How Do You Know If a Graph Database Solves the Problem?
Evaluate whether graphs fit your use case before exploring them in depth as a solution
One of the greatest questions to consistently badger a developer is "what technology should I use?" Days of thought and analysis determine which option(s) (from an ever-growing number) best suit the need, manage volume and demand, plan for long-term strategy, simplify or reduce support, and get approved by colleagues and management.
Those steps may even seem easy compared to real life. The decision's complexity can be compounded by how much buy-in is needed and by the constraints of existing technology and developer knowledge. For instance, investing in an unknown or newer solution means budgeting for learning costs.
If you are researching graph databases, you may have been awed by the complexity they can handle or the simplicity of interacting with your data. Perhaps you were star-struck by pretty visualizations or the possibilities of lightning-fast queries. Then again, maybe you are desperate to learn something new and want to experiment with this graph database stuff.
But how do you know for sure that graph is the right solution for your business or technical need? What kind of investigation is needed to be certain of its value? What makes a graph database special over another solution for your project?
In this post, I want to highlight some scenarios that help you recognize when a use case is not a good fit. These are not strict guidelines, but rather opportunities to evaluate whether graphs fit your use case before exploring them in depth as a solution.
If you only want the benefits of graph, there are many vendor-specific pages, such as Neo4j's “Why Graph Databases?” page.
Self-Evaluation: Are You Desperate to Use a Graph Database on Anything?
I think we as developers (or <insert position title here>) want so strongly to use something new that we choose the solution and apply it to the next “victim” project that comes along. Most of us probably know not to do this, but reality often gets lost in deadlines and desperation.
To alter this mindset, we need to put each problem through analysis before evaluating various solutions. What is our motivation for using this technology? What will it provide that others cannot? Possible solutions should be drawn out and well-researched to see what the advantages and disadvantages of each are. From there, a few reviews from others can catch any missing thoughts or remove options that do not meet enough requirements.
When Are Graph Databases NOT a Good Fit?
Like most companies, Neo4j is biased towards its products and their usefulness. We all wish our products could be used for everything, but there will never be anything in this world that is one-size-fits-all. There are too many unique ideas, people, problems, and technologies for that to exist (and that's a good thing!). Most of what you will learn about a product is likely from the company itself, which usually focuses on the positive aspects and what it does well.
But what about knowing what you cannot or should not use it to do?
If your use case avoids all of the following scenarios, that should help confirm that graph is an excellent option. If your use case fits any of these scenarios, though, this will hopefully help you avoid using the wrong tool for the job. While this list is not comprehensive, it covers the most common or easily identifiable cases.
Where Data Is Disconnected and Relationships Do Not Matter
If you have transactional data and do not care how it relates to other transactions, people, etc., then graph is probably not the solution. There are some cases where a technology simply stores data, and analysis of the connections and meanings within it is not important.
Requirements for write-only transactions and simple queries without SQL join statements are good indicators that your use case may not be suited to a graph database. You might have queries that rely on sequentially-indexed data (the next record stored next to the previous one in storage), rather than relationship-indexed data (the record is stored nearest those it is related to).
Searching for individual pieces of data or a list of items also points to other solutions, as such searches are not interested in the context of that data. Overall, graph solutions focus on, and provide the most value from, data that is highly connected and where queries search for possible connections (if they don’t already exist). If this doesn't fit your use case, another kind of technology may suit it better.
Where You Are Optimizing for Writing and Storing Data and Not Reading/Querying
Though this was mentioned in the point above, I want to focus on it separately. If the use case is only looking to write data to the store and not expecting to analyze results, then graph may not solve the problem. Graph databases are designed to traverse stored data very quickly and retrieve results in milliseconds. If the use case is not expected to utilize this advantage, then you probably want to find another solution.
Where the Core Data Model Stays Consistent and Data Structure Is Fixed/Tabular
If you are collecting a constant, unchanging set of data, then graph may not be the most appropriate solution. Graphs are well-suited to storing many element types and can easily adapt to changing business needs.
Take, for instance, a scenario in which you need to track the number of people who call your business. You only need to store an ID, name, and phone number in your Customer table for this. There is no need to retain more information from the customer, so the columns on the table will not change and everyone calling your business can be assigned an ID, name, and phone number. This is a good example for a relational database.
If the requirements are expected to grow and other types of analysis will be needed, the table can still adapt to include email address, company name, order numbers, etc. There is still enough flexibility to handle empty values (not all customers create orders or work for a company), to store other types of entities (like orders), or to adapt data definitions (e.g., a customer could also be an employee).
In short, if the requirements are narrowed to a specific need and the scope is expected to remain somewhat limited, then graph may not be the best fit.
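As a rough illustration of the fixed, tabular case, here is a minimal sketch using Python's built-in sqlite3 module. The table name, column names, and sample rows are assumptions made up for this example, not part of any real system. It shows that counting callers is a single aggregate over one table, and that a later requirement (an email address) can be absorbed by adding a nullable column.

```python
import sqlite3

# Hypothetical schema for illustration: a fixed customer table with three columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, phone TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO customer (id, name, phone) VALUES (?, ?, ?)",
    [(1, "Jennifer", "555-0100"), (2, "Michael", "555-0101")],
)

# Counting callers is one aggregate over one table; no relationships are involved.
caller_count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]

# If requirements grow, a new nullable column absorbs the change;
# existing rows simply hold NULL for the new field.
conn.execute("ALTER TABLE customer ADD COLUMN email TEXT")
conn.execute(
    "UPDATE customer SET email = ? WHERE id = ?", ("jennifer@example.com", 1)
)
rows = conn.execute(
    "SELECT id, name, phone, email FROM customer ORDER BY id"
).fetchall()
```

Nothing in this workload asks how one customer relates to another, which is why a single table is enough here.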
Where Queries Execute Bulk Data Scans or Start From an Unknown Data Point
If your queries are doing table scans to find a match or searching for data that fits a general category, then a graph solution is not best-suited to the task. A graph database is optimized to traverse relationships from a starting point. It is not optimized for searching the entire graph without a specific target area in mind.
Queries such as the first one below end up scanning a potentially massive graph containing many types of information for a single result (is Jennifer an order, an item, a customer, an employee, or something else?). The second query, however, starts from a particular person and looks at who that person knows.
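// Unanchored: no label, so every node in the graph is checked for a matching name
MATCH (n)
WHERE n.name = "Jennifer"
RETURN n;

// Anchored: start from the Person named Jennifer and follow her KNOWS relationships
MATCH (n:Person {name: "Jennifer"})-[r:KNOWS]->(p:Person)
RETURN p;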
When the majority of your queries look like the first one and the performance of those queries is highly important, you need to consider non-graph solutions. While graph can still handle those queries, the technology is not optimized for maximum performance on bulk scans or unknown starting points.
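To make the difference concrete, here is a small Python sketch over a toy in-memory graph. It is not how any particular graph database is implemented; the node ids, labels, names, and the lookup table that stands in for locating the starting node are all assumptions for illustration. The unanchored search has to inspect every node, while the anchored one jumps to a single starting node and follows only its relationships.

```python
# Toy in-memory property graph; all node ids, labels, and names are made up.
nodes = {
    1: {"label": "Person", "name": "Jennifer"},
    2: {"label": "Person", "name": "Michael"},
    3: {"label": "Person", "name": "Diana"},
    4: {"label": "Order", "name": "Order-1001"},
    5: {"label": "Item", "name": "Jennifer"},  # same name, different kind of thing
}
knows = {1: [2, 3], 2: [3]}  # outgoing KNOWS relationships, keyed by node id

# A lookup table over Person names stands in for however a database
# locates the starting node (an assumption for this sketch).
person_by_name = {
    props["name"]: node_id
    for node_id, props in nodes.items()
    if props["label"] == "Person"
}


def scan_by_name(name):
    """Unanchored search: inspect every node, whatever its label."""
    visited = 0
    matches = []
    for node_id, props in nodes.items():
        visited += 1
        if props["name"] == name:
            matches.append(node_id)
    return matches, visited


def friends_of(name):
    """Anchored traversal: find one start node, then follow only its KNOWS edges."""
    start = person_by_name[name]
    return [nodes[friend]["name"] for friend in knows.get(start, [])]


matches, visited = scan_by_name("Jennifer")
friends = friends_of("Jennifer")
```

In this toy graph the scan touches all five nodes and returns two unrelated things named Jennifer, while the traversal touches only the starting person and her two KNOWS relationships. The cost of the scan grows with the size of the whole graph; the cost of the traversal grows with the neighborhood of the starting node.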
Where It Is Used as a Key-Value Store (Like a Cache)
If you are only interested in a lookup operation, then a graph database is not the solution for you. As discussed above, graph analysis benefits from relationships among data. A lookup from a known key does not maximize what graph databases were created to do.
As an example, someone might use a database as a cache to store session data for an application. You might store the session ID in cache, but then write the session details to the database. When you need to retrieve session details or run analyses on them, you would send the session ID (as the key) to return the value (probably properties stored on an entity).
This method does not utilize any relationships because it is using a known key to return a single object or detail data on one entity. When reviewing your use case, ensure that you understand the storage and retrieval mechanisms of each technology. Doing a lookup might fit a key-value store or even relational database more appropriately, giving you better performance.
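The access pattern described here can be sketched in a few lines of Python. A plain dictionary stands in for the key-value store, and the function names and session fields are hypothetical. The point is only that a known key returns a single value and no relationship is ever followed.

```python
# Hypothetical session store: a plain dict standing in for a key-value store.
session_store = {}


def save_session(session_id, details):
    """Write the session details under a known key."""
    session_store[session_id] = details


def load_session(session_id):
    """Retrieve by key; returns None when the session is unknown."""
    return session_store.get(session_id)


save_session("abc123", {"user": "jennifer", "cart_items": 3})
found = load_session("abc123")
missing = load_session("no-such-session")
```

If every query in your use case looks like `load_session`, there is nothing for a traversal to do.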
Where Large Amounts of Text or BLOBs Need to Be Stored as Properties
If you are storing and retrieving entity properties that contain extremely large values (such as BLOBs, CLOBs, text paragraphs, etc.), then another technology solution might be a better choice. Graph databases are very good at traversing relationships between small data entities, and not as performant when you store a lot of properties on a single node or large values in those properties. This is because the query can hop from entity to entity, but then also needs extra processing to pull out the details of each entity along a path.
Sometimes, this issue can be corrected by re-organizing the data model. For instance, if you stored all information about an employee on a single graph node (address, job info, orders, benefit elections, salary info), it would create a very cumbersome node containing lots of properties and potentially large values. You could re-model this into separate entities for company, address, and position details, simplifying the model and improving query performance.
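The re-modeling idea can be sketched in Python as splitting one wide record into several small entities plus the relationships between them. The field names, sample values, and relationship type names (LIVES_AT, WORKS_FOR, HOLDS) are assumptions chosen for illustration; a real model would depend on your data and queries.

```python
# One "fat" employee record, as it might sit on a single node. All values are made up.
employee = {
    "name": "Jennifer",
    "street": "1 Main St",
    "city": "Springfield",
    "company": "Acme",
    "title": "Engineer",
    "salary": 100000,
}


def split_employee(record):
    """Split one wide record into small entities plus relationships between them."""
    entities = {
        "employee": {"name": record["name"]},
        "address": {"street": record["street"], "city": record["city"]},
        "company": {"name": record["company"]},
        "position": {"title": record["title"], "salary": record["salary"]},
    }
    # Relationship type names are illustrative only.
    relationships = [
        ("employee", "LIVES_AT", "address"),
        ("employee", "WORKS_FOR", "company"),
        ("employee", "HOLDS", "position"),
    ]
    return entities, relationships


entities, relationships = split_employee(employee)
```

After the split, a query that only needs to know where someone works can follow one relationship to a small company entity without loading address or salary details.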
However, you may have some cases where you need those large values stored in a single property, and the queries are not graph-specific. For this type of use case, a graph database is not recommended.
Of course, no single item listed above will always appear alone. The delineations between some of the scenarios often blur and cross boundaries, so there may be aspects of your project that are reasons against using a graph database, as well as reasons in support of using one. While that may complicate the decision, it ultimately comes down to evaluating the positives and negatives of each technology to determine the best fit.
When Are Graph Databases a Good Fit?
I will not spend too much time here, as I briefly mentioned some of graph technology's key strengths and you can learn more from company resources, employee discussions, and customer feedback, but I want to close with some positives. :)
Scenarios where users want to understand relationships in their data (hidden and obvious) will thrive with a graph database. If you want to know customer interests to gear messages toward topic areas or understand the layout of a network to analyze impacts, a graph database is perfectly suited to these use cases and queries. Graphs can allow businesses to create well-rounded, diverse customer profiles or scrutinize bank transactions to find outliers that could be signs of fraud.
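As one small example of the network impact analysis mentioned above, here is a Python sketch that walks a made-up dependency network breadth-first to find everything downstream of a failed device. The device names and the `dependents` mapping are hypothetical, and a graph database would express this as a traversal query rather than application code.

```python
from collections import deque

# Hypothetical network: each device maps to the devices that depend on it.
dependents = {
    "router-1": ["switch-a", "switch-b"],
    "switch-a": ["server-1", "server-2"],
    "switch-b": ["server-3"],
    "server-1": [],
    "server-2": [],
    "server-3": [],
}


def impacted_by(failed):
    """Breadth-first traversal from the failed device to everything downstream of it."""
    seen = {failed}
    queue = deque([failed])
    order = []
    while queue:
        current = queue.popleft()
        for nxt in dependents.get(current, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


impact = impacted_by("switch-a")
full_outage = impacted_by("router-1")
```

The question "what breaks if this device fails?" is answered entirely by following relationships from a known starting point, which is the shape of query this section describes as a good fit.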
They also exceed performance expectations for data science and analytics purposes. Graph algorithms are expanding the value of running more complex analysis on connected data to highlight patterns for decision-making.
Graph technology is used in all types of industries for business-critical systems and backbone processes. Anything where data looks like a network is an indicator that a graph can maximize value.
Conclusion
We have only scratched the surface of what a graph database can and cannot do. There are many finer, more minute details that go into a decision for one technology or another. With this post, I wanted to give you a few tools to help with that decision. Whether you choose a graph database or not, the goal is to find the best tool to meet (and hopefully exceed) the requirements.
Best wishes on your next project and happy evaluating!