Thanks to Jim Webber, Chief Scientist at Neo4j, for sharing his thoughts on the current state of databases and their future amid the influx of big data. Neo4j is the leading platform for graph technology and connected data, and it continues to grow as it helps companies across numerous industries corral their unstructured data.
Q: What are the keys to successfully implementing a graph database strategy?
A: Developers have to be willing to step out of the comfort zone they have with relational databases. Letting go of what you’re familiar with, even when the new method is far superior to your standard approach, is a step many people are reluctant to take. Graph databases provide degrees of freedom that can be scary at first but are ultimately empowering as you begin to grasp the technology’s capabilities. Those degrees of freedom are initially bewildering (and I say that as a keen graph advocate), but once you’re over that curve, you’ll never go back to the implicit complexities of RDBMS or suffer the weak data models of NoSQL.
Q: How can companies benefit from graph databases?
A: First and foremost, there is an incredible leap in performance when going from relational databases to graph databases. Our users regularly report that their queries go from minutes to milliseconds, orders of magnitude faster. Relational databases are built on set theory, so when a query joins huge data sets, you end up with lots of intermediate sets. Modern operating systems are wonderful: they won’t fail the database even when it needs more memory than the available RAM, but the excess will spill over to virtual memory on disk. This means that your relational query will chug along at disk speed.
The secret sauce with native graph databases is that we are doing pointer chasing, which we like to call constant-time traversal. I think of it as sending a robot into a graph, where it roams around the corridors of the graph dataset to find what it’s looking for. This is very performant: on my bog-standard laptop, you can see fourteen to sixteen million traversals every second.
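To make the contrast between set-based joins and pointer chasing concrete, here is a minimal Python sketch. It is a toy model under stated assumptions, not Neo4j’s storage engine: a sorted edge table searched with `bisect` stands in for a relational B-tree index (each lookup costs O(log n) in the total number of rows, and each hop materializes an intermediate set), while the hypothetical `Node` class stores direct references to its neighbours, so each hop simply follows a pointer. The names `friends_via_index`, `friends_of_friends_join`, `Node`, and `friends_of_friends_traversal` are illustrative only.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass, field


# --- Relational-style sketch (assumption: sorted list stands in for a B-tree index)
def friends_via_index(table: list[tuple[int, int]], person: int) -> set[int]:
    # Binary search over the WHOLE table: cost grows with total row count.
    lo = bisect_left(table, (person, float("-inf")))
    hi = bisect_right(table, (person, float("inf")))
    return {friend for _, friend in table[lo:hi]}


def friends_of_friends_join(table: list[tuple[int, int]], person: int) -> set[int]:
    # Hop 1 materializes an intermediate set of direct friends...
    hop1 = friends_via_index(table, person)
    # ...and hop 2 does one index lookup per member, building a second set.
    hop2: set[int] = set()
    for friend in hop1:
        hop2 |= friends_via_index(table, friend)
    return hop2 - hop1 - {person}


# --- Graph-style sketch (hypothetical Node type, not Neo4j's record format)
@dataclass(eq=False)  # eq=False keeps identity-based hashing, so Nodes can go in sets
class Node:
    name: str
    neighbours: list["Node"] = field(default_factory=list)


def friends_of_friends_traversal(start: Node) -> set[str]:
    direct = set(start.neighbours)
    result: set[str] = set()
    for friend in start.neighbours:
        for fof in friend.neighbours:  # follow stored references; no index lookup
            if fof is not start and fof not in direct:
                result.add(fof.name)
    return result


# Usage: the same small friendship graph in both representations.
edges = [(1, 2), (2, 3), (1, 4), (4, 5), (2, 4)]
# Store both directions and sort, so the table can be binary-searched.
table = sorted(edges + [(b, a) for a, b in edges])
print(sorted(friends_of_friends_join(table, 1)))  # → [3, 5]

names = {1: "alice", 2: "bob", 3: "carol", 4: "dave", 5: "erin"}
nodes = {i: Node(n) for i, n in names.items()}
for a, b in edges:
    nodes[a].neighbours.append(nodes[b])
    nodes[b].neighbours.append(nodes[a])
print(sorted(friends_of_friends_traversal(nodes[1])))  # → ['carol', 'erin']
```

Both functions return the same answer; the difference is in how they get there. In this sketch the traversal’s cost depends only on how many relationships it actually touches, not on the total size of the dataset, which is roughly the property the phrase “constant-time traversal” refers to.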
Perhaps just as importantly, graph databases are immensely helpful for modeling and for communicating with business stakeholders. The data model you draw on a whiteboard is the same model that is stored in the database. There is no need to normalize or denormalize it, so there is no technical obfuscation of the domain view. This allows people on the business side to look at the data in a language that is meaningful to them: their data is stored their way.
Q: How have databases changed most recently?
A: Five years ago, NoSQL was just becoming accepted as an alternative to relational databases. Around two years ago, though, graph databases began to really distinguish themselves and demonstrate unique business value. Nowadays, graphs are no longer a niche market and are being adopted by many of the top companies in industries like retail, finance, healthcare, manufacturing, and security. I also find the scientific use cases extremely socially rewarding, and who doesn’t love that graphs are going to get NASA to Mars two years early?
Q: What are some “real-world” problems your clients are solving with graph databases?
A: There are a ton of great examples! eBay ShopBot provides a personalized shopping experience, and its underlying technology is Neo4j. This is just one example of the tremendous uses of graph technology within artificial intelligence (AI) and machine learning (ML). In another, a very prominent retailer’s promotions engine is driven by a Neo4j graph database. In fact, after implementing Neo4j to make smarter product recommendations to consumers, they had their most successful Cyber Monday ever this past year. A third example: a large international investment bank is using a Neo4j graph database for authentication and identity management; think of it as LDAP on steroids. This has removed rather unfulfilling, error-prone work from their security team and lets them focus on higher-impact projects. Importantly, these deployments, like many Neo4j deployments, are on the critical path to revenue: businesses bet on graphs.
Q: What are the biggest challenges you have to help clients get past when they are implementing a graph database?
A: Getting past their comfort level with relational databases, given their long-standing familiarity with that technology. We have to encourage developers to learn this new way of doing things, which will ultimately make their lives, and the lives of those around them, simpler and easier. But the psychological barrier is real, as it was even for those of us who work on graphs every day.
Q: What are the biggest opportunities in the evolution of graph databases?
A: Right now, AI and ML are probably the areas with the most potential to use the power of graph databases. AI looks like an even bigger technology wave than the one we saw with Spark and Hadoop. Graphs are already being used as a model for AI in fraud detection, e-government, and everyday life in the B2C and B2B world. It’s just a matter of continuing to evangelize graphs and letting more people know that there is a better solution out there.
Although AI and ML have a ton of momentum right now, we are also continuing to focus heavily on projects in industries like retail and finance that can have a huge impact on companies’ profit margins.
Q: What do developers need to do to become more proficient with graph databases?
A: First, I think we need to understand and be honest about the limits of our current technology and recognize when we’ve passed the point of diminishing returns: fancier and fancier indexes don’t scale. Approach graph databases with an open mind and be willing to learn, as life may well become easier once you have a better grasp of the technology.
If you are currently getting poor performance out of your relational database, I encourage you to spend an afternoon testing Neo4j to see whether you can make progress on the problems you’ve been running into. If you’re able to make good headway, we may be a great fit for you!
Finally, having been a developer myself, I know that a developer’s work is often driven by its impact on business value. To help create the next big wave of business value, I think that injecting an element of computer science into your work with data will go a long way. That computer science perspective will help you multiply the business value of your data, and graph databases are a core part of it.