What Makes ArangoDB a Graph Database?

Today, we will take a look at how ArangoDB lets you map graph data natively to the database and how the database provides efficient access to graph datasets.

By Max Neunhoeffer · May 17, 2019 · Tutorial

When looking for a solution for your project, it is important to understand what makes each technology unique and what sets it apart. For ArangoDB, that is its native multi-model approach, which includes full graph database capabilities, and I am going to explain the fundamental pieces of what that means.

Using ArangoDB as a Graph Database

If you are already familiar with graph databases, you know that a graph consists of vertices (or nodes) connected by edges. Many graph databases store the edges of a vertex directly with the vertex object. ArangoDB handles this differently (if you want a technical deep dive into ArangoDB's approach, see this article).
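
To make the difference concrete, here is a small Python sketch of the two layouts, using plain dictionaries and lists as hypothetical stand-ins for stored data. It illustrates the idea only; it does not describe how ArangoDB or any other database actually lays out data on disk.

```python
# Hypothetical in-memory stand-ins; not any database's real storage format.

# Layout 1: each vertex object carries its own list of outgoing edges.
embedded_airports = {
    "BIS": {"name": "Bismarck Municipal", "outgoing": ["MSP"]},
    "JFK": {"name": "John F Kennedy Intl", "outgoing": ["DCA"]},
}

# Layout 2: vertices and edges live in separate collections, and each edge
# refers to its endpoints only through their _id strings.
airports = [
    {"_id": "airports/BIS", "name": "Bismarck Municipal"},
    {"_id": "airports/JFK", "name": "John F Kennedy Intl"},
]
flights = [
    {"_from": "airports/BIS", "_to": "airports/MSP"},
    {"_from": "airports/JFK", "_to": "airports/DCA"},
]


def destinations_embedded(code: str) -> list[str]:
    # The edges are read straight off the vertex object.
    return embedded_airports[code]["outgoing"]


def destinations_separate(vertex_id: str) -> list[str]:
    # The edges are found in the edge collection; without an index,
    # this is a scan over every edge.
    return [edge["_to"] for edge in flights if edge["_from"] == vertex_id]


print(destinations_embedded("BIS"))           # ['MSP']
print(destinations_separate("airports/BIS"))  # ['airports/MSP']
```

In the second layout the edges can be stored independently of the vertices they connect; the trade-off is that finding a vertex's edges needs a lookup structure over the edge collection.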

Today, we will take a look at how ArangoDB lets you map graph data natively to the database and how the database provides efficient access to graph datasets through a variety of access patterns, such as traversals, shortest paths, and pattern matching.

Vertices (or nodes) are stored in normal document collections. ArangoDB's graph capabilities come from two additional pieces: the edge collection and the edge index.

Let's take a quick look at storing vertices, and then explore edge collections and indexes in a bit more detail.

Vertex Collections

To showcase the benefits of using graphs with ArangoDB, we will use the example of domestic flights in the USA. The dataset describes the relationships between airports (vertices) and the flights (edges) that connect them. We use the same dataset in our Graph Course for Beginners.

Here are two example JSON documents from our airports collection:

{
    "_key": "JFK",
    "_id": "airports/JFK",
    "_rev": "_YOO08KG-_T",
    "name": "John F Kennedy Intl",
    "city": "New York",
    "state": "NY",
    "country": "USA",
    "lat": 40.63975111,
    "long": -73.77892556,
    "vip": true
}
{
    "_key": "BIS",
    "_id": "airports/BIS",
    "_rev": "_YOSrLBe--r",
    "name": "Bismarck Municipal",
    "city": "Bismarck",
    "state": "ND",
    "country": "USA",
    "lat": 46.77411111,
    "long": -100.7467222,
    "vip": false
}

The airports collection is a normal collection of JSON documents and requires nothing special to work with a graph. Note the _id attribute, which combines the collection name and the document key (for example, airports/JFK); it will play a crucial role in our graph.
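
The _id value is simply the collection name and the _key joined by a slash. The following Python sketch shows that relationship with two small helper functions, make_id and split_id; both names are hypothetical and are not part of any ArangoDB driver.

```python
def make_id(collection: str, key: str) -> str:
    """Build a document handle of the form <collection>/<_key>."""
    return f"{collection}/{key}"


def split_id(doc_id: str) -> tuple[str, str]:
    """Split a document handle back into (collection, _key)."""
    collection, sep, key = doc_id.partition("/")
    if not (collection and sep and key):
        raise ValueError(f"not a document handle: {doc_id!r}")
    return collection, key


# Trimmed copy of the JFK document shown above, without its _id.
jfk = {"_key": "JFK", "name": "John F Kennedy Intl", "city": "New York"}
jfk["_id"] = make_id("airports", jfk["_key"])

print(jfk["_id"])                # airports/JFK
print(split_id("airports/BIS"))  # ('airports', 'BIS')
```

Splitting on the first slash works because ArangoDB document keys cannot themselves contain a slash.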

We will explore these documents a bit more in a moment. For now, just understand that our airports collection contains normal JSON documents that represent airports.

The Edge Collection

So what is an edge collection? Put simply, it is a special collection of JSON documents that describe the connection between two other documents.

Pretty simple, right? The good news is that it actually is that simple. The power of native multi-model in ArangoDB is that edges stored in an edge collection are not tied to the vertices stored in another collection; they can be stored and distributed independently, which provides advantages in data modeling flexibility and, most importantly, horizontal scalability.

Let's go a little deeper here and take a look at what exactly "describing the connection between two other documents" looks like.

A document in an edge collection always contains at least five attributes: _id, _key, _rev, _from, and _to. The 'magic' comes from the _from and _to attributes. These two attributes define the start and end points of the edge: their values are the _id attributes of the vertices that the edge connects.
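
As a rough illustration of these rules, the Python function below checks an edge document for the five attributes and verifies that _from and _to look like vertex _id values (collection/key). The function name edge_problems is hypothetical. In practice the database fills in _id and _rev (and _key, if you don't supply one) and rejects edges without valid _from and _to values itself; this sketch only mirrors the idea.

```python
REQUIRED_EDGE_ATTRIBUTES = ("_id", "_key", "_rev", "_from", "_to")


def edge_problems(doc: dict) -> list[str]:
    """Return the problems found in an edge document (an empty list means none)."""
    problems = [f"missing {name}" for name in REQUIRED_EDGE_ATTRIBUTES if name not in doc]
    for end in ("_from", "_to"):
        if end not in doc:
            continue  # already reported as missing above
        collection, sep, key = str(doc[end]).partition("/")
        if not (collection and sep and key):
            problems.append(f"{end} is not a vertex _id: {doc[end]!r}")
    return problems


# System attributes of the first flight document shown below.
stored_edge = {
    "_key": "25471",
    "_id": "flights/25471",
    "_from": "airports/BIS",
    "_to": "airports/MSP",
    "_rev": "_YOO8JXG--f",
}
print(edge_problems(stored_edge))  # []
print(edge_problems({"_from": "BIS", "_to": "airports/MSP"}))
# ['missing _id', 'missing _key', 'missing _rev', "_from is not a vertex _id: 'BIS'"]
```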

In our example, airports are the vertices, and flights 'connect' the airports with one another, so the flights are the edges of our graph. Here are two edge documents from the flights edge collection:

{
    "_key": "25471",
    "_id": "flights/25471",
    "_from": "airports/BIS",
    "_to": "airports/MSP",
    "_rev": "_YOO8JXG--f",
    "Year": 2008,
    "Month": 1,
    "Day": 2,
    "DayOfWeek": 3,
    "DepTime": 1055,
    "ArrTime": 1224,
    "DepTimeUTC": "2008-01-02T16:55:00.000Z",
    "ArrTimeUTC": "2008-01-02T18:24:00.000Z",
    "UniqueCarrier": "9E",
    "FlightNum": 5660,
    "TailNum": "85069E",
    "Distance": 386
}
{
    "_key": "71374",
    "_id": "flights/71374",
    "_from": "airports/JFK",
    "_to": "airports/DCA",
    "_rev": "_YOO8LYG--N",
    "Year": 2008,
    "Month": 1,
    "Day": 4,
    "DayOfWeek": 5,
    "DepTime": 1604,
    "ArrTime": 1724,
    "DepTimeUTC": "2008-01-04T21:04:00.000Z",
    "ArrTimeUTC": "2008-01-04T22:24:00.000Z",
    "UniqueCarrier": "MQ",
    "FlightNum": 4755,
    "TailNum": "N854AE",
    "Distance": 213
}

In addition to the required attributes, these documents contain all of the information about individual flights. The first flight goes from Bismarck (BIS) to Minneapolis (MSP), while the second goes from John F Kennedy (JFK) to Ronald Reagan (DCA). You can tell this from the _from and _to fields in the documents.

The rest of the fields hold the information related to each flight, including the date, departure and arrival times, flight number, and more. Although these documents are part of an edge collection, they can still be queried like documents in standard collections. You could even use nested properties on edges if you wanted to.
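
Because edges are ordinary JSON documents, filtering them by their own attributes looks the same as filtering any other collection. The sketch below does this in plain Python over trimmed copies of the two flight documents above. The comment shows roughly how the same filter could be written in AQL with FOR, FILTER, and RETURN; it is an approximation and is not run against a database here.

```python
# Trimmed copies of the two flight edges shown above.
flights = [
    {"_key": "25471", "_from": "airports/BIS", "_to": "airports/MSP",
     "Year": 2008, "Month": 1, "Day": 2, "UniqueCarrier": "9E",
     "FlightNum": 5660, "Distance": 386},
    {"_key": "71374", "_from": "airports/JFK", "_to": "airports/DCA",
     "Year": 2008, "Month": 1, "Day": 4, "UniqueCarrier": "MQ",
     "FlightNum": 4755, "Distance": 213},
]


# Roughly equivalent AQL:
#   FOR f IN flights
#     FILTER f.Year == 2008 AND f.Month == 1 AND f.Day == 4
#     RETURN f.FlightNum
def flight_numbers_on(year: int, month: int, day: int) -> list[int]:
    """Return the flight numbers of all edges departing on the given date."""
    return [
        f["FlightNum"]
        for f in flights
        if (f["Year"], f["Month"], f["Day"]) == (year, month, day)
    ]


print(flight_numbers_on(2008, 1, 4))  # [4755]
```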

Now, let's change our approach slightly. Say we want to get a flight from Bismarck (BIS) to Denver (DEN). How could we do that? We can query the flights edge collection for a flight that goes from Bismarck to Denver, which is possible thanks to the _from and _to attributes: Bismarck appears as "_from": "airports/BIS" in the example document above, and the _to fields tell us the destination airport of each flight. The query returns the matching flight together with the documents for the airports it connects.

In this case, flight 14426 departs from Bismarck and lands in Denver. We also have all of the information we need about each airport, because the edge refers to the actual airport documents. We can find this connection because the flights edge collection relates airports to one another through the flights between them.
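
A minimal Python sketch of this lookup: it filters the edges on _from and _to and then resolves both endpoints to their airport documents. The BIS-to-DEN edge and the Denver document are hypothetical placeholders. The AQL in the comment, which uses the DOCUMENT() function to fetch the airports, is one plausible way to write the query, not necessarily the one used for the result described above.

```python
# Vertex documents keyed by _id (trimmed; the Denver entry is a hypothetical placeholder).
airports = {
    "airports/BIS": {"_id": "airports/BIS", "name": "Bismarck Municipal", "city": "Bismarck"},
    "airports/MSP": {"_id": "airports/MSP", "city": "Minneapolis"},
    "airports/DEN": {"_id": "airports/DEN", "city": "Denver"},
}

# Edge documents; "hypothetical-1" stands in for a real BIS -> DEN flight.
flights = [
    {"_key": "25471", "_from": "airports/BIS", "_to": "airports/MSP"},
    {"_key": "hypothetical-1", "_from": "airports/BIS", "_to": "airports/DEN"},
]


# Roughly equivalent AQL:
#   FOR f IN flights
#     FILTER f._from == "airports/BIS" AND f._to == "airports/DEN"
#     RETURN { flight: f, origin: DOCUMENT(f._from), destination: DOCUMENT(f._to) }
def direct_flights(origin_id: str, destination_id: str) -> list[dict]:
    """Return each matching edge together with the vertex documents it connects."""
    return [
        {
            "flight": f,
            "origin": airports.get(f["_from"]),
            "destination": airports.get(f["_to"]),
        }
        for f in flights
        if f["_from"] == origin_id and f["_to"] == destination_id
    ]


for match in direct_flights("airports/BIS", "airports/DEN"):
    print(match["flight"]["_key"], match["origin"]["city"], "->", match["destination"]["city"])
# hypothetical-1 Bismarck -> Denver
```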

Edge Index

In ArangoDB, every collection has a primary index on its document keys, which offers a quick way to look up documents by either their _key or _id attribute. Edge collections additionally have an implicitly created, hash-based edge index, which provides quick access to the _from and _to fields of the edge documents. This means our queries can fetch connected edges quickly, with constant lookup time.
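
Conceptually, the edge index behaves like two hash maps: one from each _from value to the edges that start there, and one from each _to value to the edges that end there. The toy EdgeIndex class below is a hypothetical Python illustration of that idea using dictionaries, which give average constant-time lookups; it says nothing about how ArangoDB actually implements or stores its edge index.

```python
from collections import defaultdict


class EdgeIndex:
    """Toy edge index: two hash maps keyed by _from and by _to."""

    def __init__(self) -> None:
        self._by_from: dict[str, list[dict]] = defaultdict(list)
        self._by_to: dict[str, list[dict]] = defaultdict(list)

    def insert(self, edge: dict) -> None:
        self._by_from[edge["_from"]].append(edge)
        self._by_to[edge["_to"]].append(edge)

    def outbound(self, vertex_id: str) -> list[dict]:
        # .get() avoids creating empty entries for unknown vertices.
        return list(self._by_from.get(vertex_id, []))

    def inbound(self, vertex_id: str) -> list[dict]:
        return list(self._by_to.get(vertex_id, []))


index = EdgeIndex()
index.insert({"_key": "25471", "_from": "airports/BIS", "_to": "airports/MSP"})
index.insert({"_key": "71374", "_from": "airports/JFK", "_to": "airports/DCA"})

print([e["_key"] for e in index.outbound("airports/BIS")])  # ['25471']
print([e["_key"] for e in index.inbound("airports/DCA")])   # ['71374']
```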

Because the edge index covers only the _from and _to fields, it is most useful for equality lookups, such as finding the connections to or from a specific airport. For other queries, such as range queries or sorting, the edge index offers little benefit. Although you cannot explicitly create additional edge indexes, you can include the _from and _to fields in your own indexes to improve query performance. We showcase some of the performance benefits of using indexes with graphs in our performance benchmark, and we also offer a performance course with strategies for improving AQL query speed.
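
To see why a hash-based index helps with equality but not with ranges, compare it with a sorted index. The Python sketch below keeps (_from, DepTimeUTC, _key) entries in sorted order and uses binary search from the standard bisect module to answer "departures from this airport within a time window", which a pure hash lookup cannot do without scanning. The class name and the second BIS flight are hypothetical. In ArangoDB, the comparable tool is a user-defined index that combines _from or _to with other attributes (the ArangoDB documentation calls these vertex-centric indexes); this toy class does not reflect how such an index is implemented.

```python
import bisect


class SortedDepartureIndex:
    """Toy sorted index on (_from, DepTimeUTC); hypothetical, for illustration only."""

    def __init__(self, edges: list[dict]) -> None:
        # ISO 8601 timestamps in the same format sort correctly as strings.
        self._entries = sorted((e["_from"], e["DepTimeUTC"], e["_key"]) for e in edges)

    def departures(self, vertex_id: str, start: str, end: str) -> list[str]:
        """Return _keys of edges from vertex_id with start <= DepTimeUTC < end."""
        # A 2-tuple (vertex_id, t) sorts just before every (vertex_id, t, key) entry,
        # so both bounds land exactly at the edges of the half-open window.
        lo = bisect.bisect_left(self._entries, (vertex_id, start))
        hi = bisect.bisect_left(self._entries, (vertex_id, end))
        return [key for _, _, key in self._entries[lo:hi]]


index = SortedDepartureIndex([
    {"_key": "25471", "_from": "airports/BIS", "DepTimeUTC": "2008-01-02T16:55:00.000Z"},
    {"_key": "71374", "_from": "airports/JFK", "DepTimeUTC": "2008-01-04T21:04:00.000Z"},
    {"_key": "hypothetical-2", "_from": "airports/BIS", "DepTimeUTC": "2008-01-03T09:00:00.000Z"},
])

print(index.departures("airports/BIS", "2008-01-02T00:00:00.000Z", "2008-01-03T00:00:00.000Z"))
# ['25471']
print(index.departures("airports/BIS", "2008-01-01T00:00:00.000Z", "2008-01-05T00:00:00.000Z"))
# ['25471', 'hypothetical-2']
```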

Conclusion

As promised, I have shown you a little piece of what makes ArangoDB a highly flexible graph database, but edge collections are just one of the many features ArangoDB has to offer.

This example shows just one connection from Bismarck to Denver, but what if there were no direct flight to Denver? Using our edge collection and the power of graph traversals, we can write more complex queries that find connecting flights, all flights from or to a specific airport, all flights between two airports, and more. If you would like to learn more about graph traversals, pattern matching, and complex queries, take our Graph Course for Freshers, which takes you from zero to advanced with the ArangoDB Query Language (AQL).
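
Finding connecting flights is a graph traversal. As a minimal sketch, the Python function below runs a breadth-first search over edge documents and returns every route from one airport to another using at most max_legs flights. The BIS-to-DEN and MSP-to-DEN edges are hypothetical, and the AQL in the comment (an OUTBOUND traversal with a depth range of 1..2) only approximates how the same question might be asked in ArangoDB.

```python
from collections import defaultdict, deque

# Edge documents reduced to _from/_to; only BIS -> MSP and JFK -> DCA come from the article.
flights = [
    {"_from": "airports/BIS", "_to": "airports/MSP"},
    {"_from": "airports/BIS", "_to": "airports/DEN"},  # hypothetical
    {"_from": "airports/MSP", "_to": "airports/DEN"},  # hypothetical
    {"_from": "airports/JFK", "_to": "airports/DCA"},
]


# Roughly comparable AQL:
#   FOR v, e, p IN 1..2 OUTBOUND "airports/BIS" flights
#     FILTER v._id == "airports/DEN"
#     RETURN p.vertices[*]._id
def routes(edges: list[dict], start: str, goal: str, max_legs: int = 2) -> list[list[str]]:
    """Breadth-first search for all routes from start to goal using at most max_legs edges."""
    outgoing: dict[str, list[str]] = defaultdict(list)
    for edge in edges:
        outgoing[edge["_from"]].append(edge["_to"])

    found = []
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        if len(path) - 1 >= max_legs:
            continue  # this path already uses the maximum number of flights
        for nxt in outgoing.get(path[-1], []):
            if nxt in path:
                continue  # never revisit an airport on the same route
            new_path = path + [nxt]
            if nxt == goal:
                found.append(new_path)
            else:
                queue.append(new_path)
    return found


for route in routes(flights, "airports/BIS", "airports/DEN"):
    print(" -> ".join(route))
# airports/BIS -> airports/DEN
# airports/BIS -> airports/MSP -> airports/DEN
```

Because the search is breadth-first, direct flights are reported before connections with a stop.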

Published at DZone with permission of Max Neunhoeffer, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
