DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Please enter at least three characters to search
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Zones

Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks

The software you build is only as secure as the code that powers it. Learn how malicious code creeps into your software supply chain.

Apache Cassandra combines the benefits of major NoSQL databases to support data management needs not covered by traditional RDBMS vendors.

Generative AI has transformed nearly every industry. How can you leverage GenAI to improve your productivity and efficiency?

Modernize your data layer. Learn how to design cloud-native database architectures to meet the evolving demands of AI and GenAI workloads.

Related

  • Why SQL Isn’t the Right Fit for Graph Databases
  • Graph-Oriented Solutions Enhancing Flexibility Over Mutant Requirements
  • How Graph Databases Fight Organized Crime
  • Distribution and Partitioning in Graph Databases

Trending

  • Docker Model Runner: Streamlining AI Deployment for Developers
  • How to Convert XLS to XLSX in Java
  • Beyond Simple Responses: Building Truly Conversational LLM Chatbots
  • Contextual AI Integration for Agile Product Teams
  1. DZone
  2. Data Engineering
  3. Databases
  4. Performance of Graph vs. Relational Databases

Performance of Graph vs. Relational Databases

By 
Josh Adell user avatar
Josh Adell
·
Feb. 04, 13 · Interview
Likes (2)
Comment
Save
Tweet
Share
35.4K Views

Join the DZone community and get the full member experience.

Join For Free

Curator's Note:  Here's a post from back in 2011 that offers a general look at the difference in performance of graph databases and relational databases.


A few weeks ago, Emil Eifrem, CEO of Neo Technology gave a webinar introduction to graph databases. I watched it as a lead up to my own presentation on graph databases and Neo4j.


Right around the 54 minute mark, Emil talks about a very interesting experiment showing the performance difference between a relational database and a graph database for a certain type of problem called "arbitrary path query", specifically, given 1,000 users with an average of 50 "friend" relationships each, determine if one person is connected to another in 4 or fewer hops.


Against a popular open-source relational database, the query took around 2,000 ms. For a graph database, the same determination took 2 ms. So the graph database was 1,000 times faster for this particular use case.


Not satisfied with that, they then decided to run the same experiment with 1,000,000 users. The graph database took 2 ms. They stopped the relational database after several days of waiting for results.


I showed this clip at my presentation to a few people who stuck around afterwards, and we tried to figure out why the graph database had the same performance with 1,000 times the data, while the relational database became unusably slow. The answer has to do with the way in which each type of database searches information.


Relational databases search all of the data looking for anything that meets the search criteria. The larger the set of data, the longer it takes to find matches, because the database has to examine everything in the collection.


Here's an example: let's assume there is a table with "friend" relationships:

> SELECT * FROM friends;
+-------------+--------------+
| user_id     | friend_id    |
+-------------+--------------+
| 1           | 2            |
| 1           | 3            |
| 1           | 4            |
| 2           | 5            |
| 2           | 6            |
| 2           | 7            |
| 3           | 8            |
| 3           | 9            |
| 3           | 10           |
+-------------+--------------+

In order to see if user "9" is connected to user "2", the database has to find all the friends of user "9", and see if user "2" is in that list. If not, find all of their friends, and then see if user "2" is in that list. The database has to scan the entire table each time. This means that if you double the number of rows in the table, you've doubled the amount of data to search, and thus doubled the amount of time it takes to find what you are looking for. (Even with indexing, it still has to find the values in the index tree, which involves traversing that tree. The index tree grows larger with each new record, meaning the time it takes to traverse grows larger as well. And for each search, you always start at the root of the tree.)


Each new batch of friends to look at requires an entirely new scan of the table/index. So more records leads to more search time.


Conversely, a graph database looks only at records that are directly connected to other records. If it is given a limit on how many "hops" it is allowed to make, it can ignore everything more than that number of hops away.

Only the blue records are ever seen during the search. And since a graph traversal "remembers" where it is at any time, it never has to start from the beginning, only from its last known position.


You could add another ring of records around the outside, or add a thousand more rings of records outside, but the search would never need to look at them because they are more steps away than the limit. The only way to increase the number of records searched (and thereby decrease performance) is to add more records within the 2-step limit, or add more relationships between the existing records.


So the reason why having 1,000 vs. 1,000,000 records causes such a stark difference between a relational and a graph database is that relational database performance decreases in relation to the number of records in the table, while graph database performance decreases in relation to the number of connections between the records.

Database Relational database Graph (Unix)

Published at DZone with permission of Josh Adell, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

Related

  • Why SQL Isn’t the Right Fit for Graph Databases
  • Graph-Oriented Solutions Enhancing Flexibility Over Mutant Requirements
  • How Graph Databases Fight Organized Crime
  • Distribution and Partitioning in Graph Databases

Partner Resources

×

Comments
Oops! Something Went Wrong

The likes didn't load as expected. Please refresh the page and try again.

ABOUT US

  • About DZone
  • Support and feedback
  • Community research
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 100
  • Nashville, TN 37211
  • support@dzone.com

Let's be friends:

Likes
There are no likes...yet! 👀
Be the first to like this post!
It looks like you're not logged in.
Sign in to see who liked this post!