DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Please enter at least three characters to search
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Zones

Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks

Because the DevOps movement has redefined engineering responsibilities, SREs now have to become stewards of observability strategy.

Apache Cassandra combines the benefits of major NoSQL databases to support data management needs not covered by traditional RDBMS vendors.

The software you build is only as secure as the code that powers it. Learn how malicious code creeps into your software supply chain.

Generative AI has transformed nearly every industry. How can you leverage GenAI to improve your productivity and efficiency?

Related

  • Why SQL Isn’t the Right Fit for Graph Databases
  • Graph-Oriented Solutions Enhancing Flexibility Over Mutant Requirements
  • How Graph Databases Fight Organized Crime
  • Distribution and Partitioning in Graph Databases

Trending

  • Cloud Security and Privacy: Best Practices to Mitigate the Risks
  • Rust, WASM, and Edge: Next-Level Performance
  • Endpoint Security Controls: Designing a Secure Endpoint Architecture, Part 2
  • Efficient API Communication With Spring WebClient
  1. DZone
  2. Data Engineering
  3. Databases
  4. The Secret Sauce of Graph Databases

The Secret Sauce of Graph Databases

Index-free adjacency is the secret sauce of graph databases. Read on to learn exactly what it is and how graph databases can aid fraud detection, recommendations, and more.

By 
Max De Marzi user avatar
Max De Marzi
·
Nov. 04, 17 · Opinion
Likes (8)
Comment
Save
Tweet
Share
9.1K Views

Join the DZone community and get the full member experience.

Join For Free

This article is featured in the new DZone Guide to Databases. Get your free copy for more insightful articles, industry statistics, and more!

There is no magic in computer science. There are only two things: algorithms and data structures. Choose the right combinations and they will help you solve the problem. Choose the wrong ones and they will make the problem seem insurmountable.

Imagine you’ve been tasked with keeping a list of people, a list of things, and a list of things each person likes. If you use a relational database, you would have three tables: one table of people with a primary key of personId, one table of things with a primary key of thingId, and a join table between them holding foreign keys to both tables. Let’s say we start with personId 83. How do we find the names of the things this person likes? We would scan down the join table looking for all the entries that had personId 83 in them, and collect the thingIds they referred. Then, we would scan down the things table, getting the name property of each of the thing Ids we collected. This algorithm would work, but it would not be very efficient.

Using Indexes

We could add an index to the keys of our tables. Now, instead of scanning the join table looking for entries that have personId 83, we can search a B-tree. A B-tree can perform a search operation in logarithmic time, using Big O notation: O(log n). As the data grows 10x in size, our search speed would only slow down by 2x. Starting with personId 83, we would perform one search operation to find all the thingIds they liked, and for each thingId, we would perform another search operation to get the name property of each one. Much better than scanning down the whole table.

Alternatively, we could have used a key-value store. KV stores use one big table and hashes to search in an average of O(1) time, and a worst case of O(n). Starting once again at personId 83, we would perform one hash operation to get to a list of thingIds they liked, and for each thingId, we would perform another hash operation to get the name property of each one. This would be faster, but we run into a problem: overhead from keeping a large hash table and the inevitable hash collisions that would creep up our search time.

Using Pointers

Three tables didn’t work well. One table didn’t work well. What if we tried two tables?

Imagine one table to hold both people and things, and another table to hold the relationships between them. The entry for personId 83 would have a pointer to its first relationship. That first relationship would point to the entry for the thing that was liked as well as another pointer to the next relationship of personId 83. Instead of searching in a B-tree or hashing keys, we would chase pointers in two arrays which is always an O(1) operation.

That’s the secret to graph databases. It is called “index-free adjacency” but all it really means is that objects are connected at all times without having to search. This is an important distinction from other databases as it makes data “intelligent.”

Imagine a row in a relational database. It has no idea which join tables are pointing at it and what else those join tables are pointing to. In the graph, a row is transformed into a Node and it knows exactly what it connects to and what is connecting to it by Relationships that are now first-class data citizens and hold referential integrity by design.

Reinvention

“There is nothing new except what has been forgotten,” said Rose Bertin to Marie Antoinette in 1785 — and it applies here, as well. Charles Bachman built the Integrated Data Store (IDS) in 1963, which managed relationships between records using chains of pointers. This later evolved into the network database model from CODASYL, which fell out of favor in the1970s as relational databases emerged and eventually took over in the 1980s. We are in many ways still trying to build the technology showcased in 1968 by Douglas Engelbart in “the mother of all demos.” But before we get lost in the past, let’s jump back to the present and see where graph databases can help us today.

Use Cases

A graph database is a general-purpose database. Anywhere you use a relational database, you can use a graph database, but it really shines when you can reimagine your data as a graph. There are hundreds of use cases, but the most popular ones seem to be real-time recommendations, fraud detection, master data management, network and IT operations, product hierarchies, rules engines, routing, ancestry, and data lineage. A big shift in capabilities occurs when you can take a query that wasn’t able to perform in real time and suddenly have it complete in milliseconds. There are many accounts of companies that had failed to achieve results in Teradata or Oracle for years, which are now finding success in weeks using graph databases.

Modeling

To take full advantage of a graph database you must forget what you know about modeling in relational databases. There is no third normal form for graph databases. The shape of your data model will be heavily dictated by the kind of queries you need to run. It is a mistake to try to simply translate a relational model to the graph. It is also a mistake to see the graph in a flat two-dimensional plane. A graph allows for any number of dimensions and the ability to quickly traverse in and out of them. Yes, it can be more complicated to model things correctly initially, but the graph is pretty forgiving as it allows new types of nodes and relationships to easily be added. Careful constructs in direction and relationship types allow for a kind of data sharding or built-in pagination not easily doable in relational databases. The end goal is to ensure each query looks at the least amount of the graph as possible in order to find the correct answer quickly.

Queries

Getting the data into the graph is one thing, getting the answers out is another problem altogether. A language land war is currently being fought amongst the vendors. On one side you have Gremlin, a powerful imperative language built by a genius level developer to be used by genius-level developers. On the other side, you have Cypher, an ASCII art-inspired declarative language that aims to be as easy to learn as SQL once was. Regardless of which side you choose, the real win is being able to express your ideas in a graph-like way, not the manner in which those ideas are interpreted. Much like the in the movie “Arrival,” thinking in graphs requires a kind of graph epiphany, but once achieved, it opens up a new world of possibilities.

Future

Once dismissed as the red-headed stepchild of the NoSQL world, over the last four years, graph databases have become the fastest growing category with new vendors emerging and existing vendors adding graph capabilities to their current solutions. As our industry continues to mature and evolve, it is important to know when to use the right tool for the job. There are over 300 databases listed on db-engines.com and you owe it to yourself to explore some graph databases. But be warned, once you experience their power and flexibility, you may never wish to go back to relational databases ever again.

This article is featured in the new DZone Guide to Databases. Get your free copy for more insightful articles, industry statistics, and more!

Database Relational database Graph (Unix)

Opinions expressed by DZone contributors are their own.

Related

  • Why SQL Isn’t the Right Fit for Graph Databases
  • Graph-Oriented Solutions Enhancing Flexibility Over Mutant Requirements
  • How Graph Databases Fight Organized Crime
  • Distribution and Partitioning in Graph Databases

Partner Resources

×

Comments
Oops! Something Went Wrong

The likes didn't load as expected. Please refresh the page and try again.

ABOUT US

  • About DZone
  • Support and feedback
  • Community research
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 100
  • Nashville, TN 37211
  • support@dzone.com

Let's be friends:

Likes
There are no likes...yet! 👀
Be the first to like this post!
It looks like you're not logged in.
Sign in to see who liked this post!