In less than a week, a group of developers (which could still include you, at a discount!) will gather in San Francisco to talk with some of the world's experts on graph databases, which have a growing number of practical applications in computing. I'm referring to the Graph Connect conference, the first conference focused solely on graph theory and graph technology. Sessions will range from high-level overviews to very specific use cases, all while providing practical guidance on graph databases like Neo4j and on graph processing techniques.
I recently talked with Dr. Jim Webber, chief scientist at Neo Technology, to get his insights into recent developments around graph databases and some background on the growing interest in them. Neo Technology is the company organizing the Graph Connect conference.
Neo4j's New Release
I asked Dr. Webber about the recent advances made in Neo4j 1.8 and he gave me an overview of new features in two categories: those that were new to the database engine and those that were visible to users.
Under the covers, he said, there were many improvements to the engine and the traversal framework, which is now faster on denser graphs.
On the user-facing side, Neo4j's main query language, Cypher, now has tools and usability features that allow even non-programmers to understand and modify the structure of a graph database. Dr. Webber told me that in the previous version, getting the full expressiveness of graph models out of Cypher required some knowledge of low-level code.
Now, Dr. Webber says, if you can sketch the graph on a whiteboard, you can use Cypher to create the model. No programming knowledge is required to use this new release of Cypher, and the community around Neo4j will continue to build on this abstraction in future releases.
I asked Dr. Webber about Neo4j's peak scalability limits, which had been a point of contention at some meetups I recently attended. He said they have seen only a few use cases that reach the single-digit billions of nodes. As a result, they've kept the filesystem maximum at 34 billion nodes, which has left headroom for every use case so far. If the need arises, they can always add more bits on the filesystem. At that point, the tradeoff becomes those extra bits, and the larger footprint on disk that comes with them, vs. the realistic headroom needed for nodes, relationships, and properties.
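The 34 billion figure lines up with a 35-bit ID space, since 2^35 is about 34.4 billion. The Python sketch below assumes IDs are fixed-width unsigned integers (the interview doesn't describe Neo4j's actual record layout, so the 35-bit width and the per-record field count are illustrative assumptions) and shows both sides of the tradeoff Dr. Webber describes: each extra bit doubles the headroom, but it also widens every stored reference.

```python
# Illustrative arithmetic only. Assumption: IDs are fixed-width unsigned
# integers; 35 bits is inferred from 2**35 (34,359,738,368) being roughly
# 34 billion and is not stated in the interview.

def max_ids(bits: int) -> int:
    """Number of distinct IDs an unsigned field of `bits` bits can address."""
    return 2 ** bits


def extra_bytes(records: int, id_fields_per_record: int, extra_bits: int) -> int:
    """Extra storage from widening every ID field by `extra_bits`,
    assuming bits are packed tightly (real stores often pad to whole bytes)."""
    return records * id_fields_per_record * extra_bits // 8


for bits in (32, 35, 36):
    print(f"{bits}-bit IDs: {max_ids(bits):,} addressable records")

# Hypothetical store: 1 billion records, each holding 4 ID references.
print(f"{extra_bytes(1_000_000_000, 4, 1):,} extra bytes for one more bit")
```

One more bit moves the ceiling from about 34 billion to about 68 billion, while in this hypothetical store it costs roughly 500 MB of extra disk before any padding.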
When I asked about Martin Fowler's new book, "NoSQL Distilled," Dr. Webber told me that he had worked with Fowler at ThoughtWorks and had early access to the book. Jim offered pragmatic tips on query languages and APIs, as well as a good deal of low-level advice for the book's graph database section, and was pleased to see that most of his suggestions made it into the final release.
Dr. Webber's verdict was that the book does an excellent job of opening up the field of NoSQL. In the graph database section in particular, it takes a graph model that may be unfamiliar and grounds its concepts for a general developer audience.
On the topic of polyglot persistence (another major theme in the book), Dr. Webber said that the idea is inarguably sensible. The days of just putting things in "the database" are over, because that phrase almost always meant one type of database, an RDBMS. Now the ideal is to pick the right database, or databases, for the type of data you're storing. That puts the burden of choosing on developers, but the resulting systems always end up better.
Dr. Webber said Spring Data is an excellent example of a technology that provides a facade over different data stores to enable polyglot persistence. He even described a real-world scenario in which Neo4j was paired with Riak as a document store: Neo4j held the connected metadata used for queries, and Riak stored the videos themselves. In fact, Dr. Webber has spoken with Basho, the company behind Riak, on numerous occasions, and they told him they have recommended this setup themselves.
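The split Dr. Webber describes can be sketched as a thin facade, loosely in the spirit of what Spring Data offers. In this Python sketch, simple in-memory classes stand in for Neo4j and Riak, and `VideoCatalog`, `related_videos`, and the `TAGGED`/`TAGS` relationship names are all hypothetical; no real Neo4j, Riak, or Spring Data API is used.

```python
class BlobStore:
    """In-memory stand-in for a key-value store such as Riak: opaque bytes by key."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]


class GraphStore:
    """In-memory stand-in for a graph database such as Neo4j: typed, directed edges."""

    def __init__(self) -> None:
        self._edges: dict[tuple[str, str], set[str]] = {}

    def relate(self, src: str, rel: str, dst: str) -> None:
        self._edges.setdefault((src, rel), set()).add(dst)

    def neighbours(self, src: str, rel: str) -> set[str]:
        return set(self._edges.get((src, rel), set()))


class VideoCatalog:
    """Hypothetical facade: metadata lives in the graph, video payloads in the blob store."""

    def __init__(self, graph: GraphStore, blobs: BlobStore) -> None:
        self.graph = graph
        self.blobs = blobs

    def add_video(self, video_id: str, data: bytes, tags: list[str]) -> None:
        self.blobs.put(video_id, data)  # large payload goes to the key-value store
        for tag in tags:                # connected metadata goes to the graph
            self.graph.relate(video_id, "TAGGED", tag)
            self.graph.relate(tag, "TAGS", video_id)

    def related_videos(self, video_id: str) -> set[str]:
        """Videos sharing at least one tag: a two-hop query answered by the graph alone."""
        related: set[str] = set()
        for tag in self.graph.neighbours(video_id, "TAGGED"):
            related |= self.graph.neighbours(tag, "TAGS")
        related.discard(video_id)
        return related

    def fetch(self, video_id: str) -> bytes:
        return self.blobs.get(video_id)


catalog = VideoCatalog(GraphStore(), BlobStore())
catalog.add_video("v1", b"...", ["cats"])
catalog.add_video("v2", b"...", ["cats", "dogs"])
catalog.add_video("v3", b"...", ["dogs"])
print(sorted(catalog.related_videos("v2")))  # ['v1', 'v3']
```

The design point is that callers only see `VideoCatalog`: relationship queries never touch the video bytes, and the blob store never has to answer a "related to" question.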
Graph Theory - Still Alive
Graph theory is 275 years old, Dr. Webber told me, and it's not a dead branch of science. There is still plenty of active research going on in psychology, math, sociology, and, of course, computing. Not only are graph theory's models more expressive in many cases, they are also excellent for predictive analysis.
The great thing about working with graph data sets, Webber says, is being able to draw on the centuries of graph theory and discrete math that have already been done in the field. Many properties of graphs are well known, so if you identify and apply them to your data, you'll get plenty of initial insights.
Relational vs. Graph Thinking
I told Dr. Webber that many of the developers I talk to about graph databases find graph models more intuitive. He agreed that graph databases can be very intuitive, at least for those whose thinking hasn't been shaped by the mainstream business-information-systems approach that dominates the industry and seems to have wired developers' minds around tables. Most people who try graphs never want to go back, and Dr. Webber admits he cursed a lot more when he worked with relational databases, simply because getting little things done was sometimes so hard.
Dr. Webber recently returned from Splashconf (formerly OOPSLA), which had an academic flavor but still included a great industry tutorial for Neo4j. He enjoyed seeing the different domains where graph databases are being used and learning about their users. He also enjoyed joking with Oracle folks about tables vs. graphs.
Next up for him is the Graph Connect conference, which he called "fundamentally exciting." "For the last few years, we've been the weirdos, talking about graph databases up on our soapboxes," he says. But now he sees that graph databases are no longer the niche they once were. Graphs are going to be fundamentally important for people working on predictive models and in many other areas, he says.
Jim Webber thinks graph databases will become more popular as general-purpose databases, especially on the web (which is essentially an enormous graph). He says that harnessing graphs could allow many businesses to quickly master their domains in the current technology climate, and he's always excited to talk with the people who feel the pain of SQL joins most often, because they are usually the ones who will benefit most from graph databases.
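To make the join-versus-graph contrast concrete, here is a toy Python comparison of the same multi-hop "who can I reach?" query run two ways over hypothetical "knows" data. It is a sketch, not how Neo4j or any relational database is implemented: the join-style version rescans the whole edge table at every hop (a real RDBMS would normally use an index), while the traversal version only follows the adjacency lists of nodes it actually reaches.

```python
from collections import defaultdict

# Hypothetical sample data: directed "knows" edges, as rows of a join table.
KNOWS = [
    ("alice", "bob"), ("bob", "carol"), ("carol", "dave"),
    ("alice", "erin"), ("erin", "frank"),
]


def reachable_by_joins(rows, start, hops):
    """Relational style: each hop is one more join against the edge table.
    This toy version scans every row per hop instead of using an index."""
    frontier, seen = {start}, {start}
    for _ in range(hops):
        frontier = {dst for src, dst in rows if src in frontier} - seen
        seen |= frontier
    return seen - {start}


def build_adjacency(rows):
    """Store each node's outgoing neighbours with the node itself."""
    adj = defaultdict(list)
    for src, dst in rows:
        adj[src].append(dst)
    return adj


def reachable_by_traversal(adj, start, hops):
    """Graph style: follow each reached node's own neighbour list."""
    frontier, seen = {start}, {start}
    for _ in range(hops):
        frontier = {n for node in frontier for n in adj.get(node, ())} - seen
        seen |= frontier
    return seen - {start}


print(sorted(reachable_by_traversal(build_adjacency(KNOWS), "alice", 2)))
# ['bob', 'carol', 'erin', 'frank']
```

Both functions return the same answer; what differs is how much data each hop has to touch, and in the relational form every extra hop is another join to write and run.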