DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports Events Over 2 million developers have joined DZone. Join Today! Thanks for visiting DZone today,
Edit Profile Manage Email Subscriptions Moderation Admin Console How to Post to DZone Article Submission Guidelines
View Profile
Sign Out
Refcards
Trend Reports
Events
Zones
Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Partner Zones AWS Cloud
by AWS Developer Relations
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Partner Zones
AWS Cloud
by AWS Developer Relations
  1. DZone
  2. Data Engineering
  3. Databases
  4. Performance of Graph vs. Relational Databases

Performance of Graph vs. Relational Databases

Josh Adell user avatar by
Josh Adell
·
Feb. 04, 13 · Interview
Like (2)
Save
Tweet
Share
34.36K Views

Join the DZone community and get the full member experience.

Join For Free

Curator's Note:  Here's a post from back in 2011 that offers a general look at the difference in performance of graph databases and relational databases.


A few weeks ago, Emil Eifrem, CEO of Neo Technology gave a webinar introduction to graph databases. I watched it as a lead up to my own presentation on graph databases and Neo4j.


Right around the 54 minute mark, Emil talks about a very interesting experiment showing the performance difference between a relational database and a graph database for a certain type of problem called "arbitrary path query", specifically, given 1,000 users with an average of 50 "friend" relationships each, determine if one person is connected to another in 4 or fewer hops.


Against a popular open-source relational database, the query took around 2,000 ms. For a graph database, the same determination took 2 ms. So the graph database was 1,000 times faster for this particular use case.


Not satisfied with that, they then decided to run the same experiment with 1,000,000 users. The graph database took 2 ms. They stopped the relational database after several days of waiting for results.


I showed this clip at my presentation to a few people who stuck around afterwards, and we tried to figure out why the graph database had the same performance with 1,000 times the data, while the relational database became unusably slow. The answer has to do with the way in which each type of database searches information.


Relational databases search all of the data looking for anything that meets the search criteria. The larger the set of data, the longer it takes to find matches, because the database has to examine everything in the collection.


Here's an example: let's assume there is a table with "friend" relationships:

> SELECT * FROM friends;
+-------------+--------------+
| user_id     | friend_id    |
+-------------+--------------+
| 1           | 2            |
| 1           | 3            |
| 1           | 4            |
| 2           | 5            |
| 2           | 6            |
| 2           | 7            |
| 3           | 8            |
| 3           | 9            |
| 3           | 10           |
+-------------+--------------+

In order to see if user "9" is connected to user "2", the database has to find all the friends of user "9", and see if user "2" is in that list. If not, find all of their friends, and then see if user "2" is in that list. The database has to scan the entire table each time. This means that if you double the number of rows in the table, you've doubled the amount of data to search, and thus doubled the amount of time it takes to find what you are looking for. (Even with indexing, it still has to find the values in the index tree, which involves traversing that tree. The index tree grows larger with each new record, meaning the time it takes to traverse grows larger as well. And for each search, you always start at the root of the tree.)


Each new batch of friends to look at requires an entirely new scan of the table/index. So more records leads to more search time.


Conversely, a graph database looks only at records that are directly connected to other records. If it is given a limit on how many "hops" it is allowed to make, it can ignore everything more than that number of hops away.

Only the blue records are ever seen during the search. And since a graph traversal "remembers" where it is at any time, it never has to start from the beginning, only from its last known position.


You could add another ring of records around the outside, or add a thousand more rings of records outside, but the search would never need to look at them because they are more steps away than the limit. The only way to increase the number of records searched (and thereby decrease performance) is to add more records within the 2-step limit, or add more relationships between the existing records.


So the reason why having 1,000 vs. 1,000,000 records causes such a stark difference between a relational and a graph database is that relational database performance decreases in relation to the number of records in the table, while graph database performance decreases in relation to the number of connections between the records.

Database Relational database Graph (Unix)

Published at DZone with permission of Josh Adell, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

Popular on DZone

  • Microservices Testing
  • A Gentle Introduction to Kubernetes
  • REST vs. Messaging for Microservices
  • GitLab vs Jenkins: Which Is the Best CI/CD Tool?

Comments

Partner Resources

X

ABOUT US

  • About DZone
  • Send feedback
  • Careers
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 600 Park Offices Drive
  • Suite 300
  • Durham, NC 27709
  • support@dzone.com
  • +1 (919) 678-0300

Let's be friends: