With the release of a new version of the open source graph database, Neo4j, and the fast-approaching Graph Connect conference (the first EVER graph database-focused conference, btw), we thought it'd be a good idea to talk to a couple of leaders in the graph database space at Neo Technology, the key commercial backer behind Neo4j.
By the way, you can get into Graph Connect at a 20% discount using this code: GCFON
My first interviewee was Philip Rathle, the senior director of products at Neo Technology. We started by talking about the new features of Neo4j 1.8, including:
- Zero-downtime rolling upgrades in HA clusters, for nicer administrative ops
- Streamed responses to REST API requests, for faster remote access
- Bi-directional traversals, branch state and path expanders in the traversal framework, for even faster queries
- Support in the Cypher language for writing graph data and updating auto-indexes
- Support for explicit transactions in neo4j-shell, on the command line and through the web
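To give a feel for why bi-directional traversals can speed up path queries, here is a minimal Python sketch of a bidirectional breadth-first search over a plain adjacency dictionary. It is not Neo4j's traversal framework or its API. The function name `bidirectional_path_length` and the sample graph are assumptions made purely for illustration, and the graph is assumed to be undirected.

```python
from collections import deque


def bidirectional_path_length(graph, start, goal):
    """Return the hop count of a shortest path, or None if none exists.

    `graph` is assumed to be an undirected adjacency dict:
    every edge appears in the neighbor lists of both endpoints.
    """
    if start == goal:
        return 0

    # Distance from each endpoint to every node that side has discovered.
    dist_from_start = {start: 0}
    dist_from_goal = {goal: 0}
    frontier_start = deque([start])
    frontier_goal = deque([goal])

    while frontier_start and frontier_goal:
        # Always expand the smaller frontier to keep the search balanced.
        if len(frontier_start) <= len(frontier_goal):
            frontier, dist, other = frontier_start, dist_from_start, dist_from_goal
        else:
            frontier, dist, other = frontier_goal, dist_from_goal, dist_from_start

        best = None
        # Expand one complete level before checking for a meeting point.
        for _ in range(len(frontier)):
            node = frontier.popleft()
            for neighbor in graph.get(node, ()):
                if neighbor in other:
                    # The two searches touch: combine both partial distances.
                    candidate = dist[node] + 1 + other[neighbor]
                    if best is None or candidate < best:
                        best = candidate
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    frontier.append(neighbor)
        if best is not None:
            return best

    # One side ran out of nodes without meeting the other.
    return None


# Hypothetical sample graph: a chain a-b-c-d plus an isolated node e.
sample = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c"],
    "e": [],
}
hops = bidirectional_path_length(sample, "a", "d")
```

Searching from both ends means each side only has to explore roughly half the path depth, which on a highly connected graph usually touches far fewer nodes than a single search from the start.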
"The 1.8 version improves upon the world's leading graph database, adding lots of features and enhancements that make Neo4j 1.8 the fastest and most robust database we've ever shipped," said Rathle.
Further on we discussed how there had been a lot of 'bashing' posts around NoSQL databases over the past few years, especially targeted towards MongoDB. I asked him if he'd seen similar unfair/user-error-based criticisms of Neo4j. He's seen it only rarely. These were the main points it boiled down to from Philip's perspective:
- 95% of the use cases for Neo4j will work smoothly out of the box.
- For the remaining 5% or so of use cases, he suggested reaching out to Neo Technology to see whether you need app-level sharding, clustering, or other techniques, depending on what you're trying to scale.
Neo4j was featured in Martin Fowler's new book, "NoSQL Distilled," and I asked Philip about his thoughts on the book.
"It's a great book to understand what NoSQL is all about, what the pieces are, and how they fit," said Rathle. "But it's more than simply a book that NoSQL developers should read. It's an important book for any developer, period, because we're all dealing with persistence in some form or another."
He said the book drew a clear distinction between aggregate-oriented databases such as KV stores, doc stores, and column stores, which are optimized for atomic intelligence, and graph databases, which are optimized for understanding data connections.
Martin Fowler is also noticing the movement toward polyglot persistence (relational and non-relational databases working together in a single system). I asked Philip for a few examples of companies using Neo4j with other databases, and he mentioned a few:
Polyglot persistence usually comes about in one of two ways:
1) Someone with an existing system, often relational, suddenly finds that the connectedness of the data and queries is such that the existing system can't perform fast enough in real time. While a few customers have replaced the entire system with Neo4j, this is usually an extreme solution, and isn't as cost effective as moving just the parts of the system that are highly connected.
2) Someone is building a new system, and recognizes that the data in that system fits into distinct categories, for example: huge volumes of simple time-series data that don't need to be inter-related, giant multimedia files, and then something closely knit and highly interconnected. As the data volumes grow and SLAs become more rigorous, it starts to make sense to store data in a place that's optimized for that type of data. In this example, you might use something like Cassandra, S3, and Neo4j.
A few examples: Telenor, one of the world's 10 largest telcos, replaced part of a Sybase application with Neo4j for hierarchical queries that needed to run very fast, but kept much of their existing database around.
On the other hand, we have a life sciences customer who provides gene sequencing as a service and uses an Amazon S3-like filesystem to store gene sequences, which are large binary files, and Neo4j to store and relate metadata about the genome.
Graph Theory in other areas of life
Graph databases are so successful and growing in popularity right now because many businesses are evolving beyond atomic intelligence and making huge competitive gains by leveraging connected intelligence. Graph databases are the best way to do this. Interestingly, Euler's Graph Theory, nearly 300 years old, has made an enormous impact on mathematics and the sciences, and has long been proven as a powerful and accurate model for describing many things in nature. Only recently have graphs been used as the basis for a database management system, and the opportunities are just beginning to unfold. As for graphs and what you can model with them, Philip had several examples:
- The Human Brain - The most powerful device in the known universe (as Philip described it) is made up of neurons and synapses, which are directly comparable to the nodes and edges of graph models
- Geography/cartography - Euler first arrived at Graph Theory through a path-finding problem, the Seven Bridges of Königsberg
- Relationships - Between people, classifications (ontologies), and almost anything being compared and connected.
- Network management - This can overlap with relationships, but we're talking about machine networks mainly.
Even today, our understanding of Graph Theory continues to evolve.
Philip was also happy to talk about recent and upcoming conferences for Neo Technology. JavaOne was another big success for Graph DB sessions, as was the NoSQL Now Conference in San Jose. They also just had some great sessions at SpringOne. Worldwide, Philip says interest has been off the charts. On any given day, there are multiple Neo events happening around the world, with new meetups springing up nearly weekly. (Example below from last Tuesday):
Tuesday, October 16
- Manchester Talk: The Challenge of Connected Data
- London Talk @ BigDataCon: New Opportunities for Connected Data
- NY Meetup: GraphHub East Workshop
- Washington DC Meetup @AOL: Spring Data Neo4j, graph power with spring ease of use
- SF Meetup: GraphHub West Workshop
It makes sense then that Neo Technology is hosting its own conference, Graph Connect, on November 5th-6th. It's the first conference to focus completely on graph databases, and it'll feature speaking heavyweights like Eric Evans and Peter Bell, and a great mixture of startups (like FiftyThree and Squidoo), Global 2000s, cloud players (Heroku and Twitter), and graph and database experts. Philip says it will provide the best first-hand experience you could get with graph databases, from tutorials with experts, to tales from the trenches from graph database users, to the trends behind connected data and how to know when you need a graph database.
I concluded by asking Philip how pervasive he thinks graph databases will become with the emergence of massive, unconnected data sets and a web environment that demands connections more than ever. In a way, he says, the graph has, whether we realize it or not, dominated web search over the last decade. One of the great innovations that Google brought in the late 1990s in organizing our journey through the web was PageRank, which is a graph algorithm. Just this year Google stepped it up even further, introducing the Knowledge Graph as the basis for their next-generation search engine.
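PageRank treats the web as a graph of pages and links, and scores a page by how much rank flows into it from the pages that link to it. As a rough illustration, here is a minimal Python sketch of the textbook power-iteration form. It is not Google's production implementation; the damping factor of 0.85, the iteration count, the function name `pagerank`, and the tiny four-page link graph are all assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank scores by repeated redistribution of rank.

    `links` maps each page to the list of pages it links to.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start with rank spread evenly across all pages.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page gets a small baseline share (the "random jump").
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly across its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank to everyone.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical link graph: three pages point at "c", and "c" points at "a".
web = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}
scores = pagerank(web)
top_page = max(scores, key=scores.get)
```

In this toy graph the page with the most incoming links, "c", ends up with the highest score, and "d", which nothing links to, ends up with the lowest.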
Another cool emerging project using graph databases is Mozilla's recently unveiled Pancake project. It stores the user's browsing data/history in a graph structure via Neo4j. The world wide web is a graph, as are browsing patterns, so what better way to represent it than as a graph!
From the Pancake project site:
The nodes of the graph are sites that have been visited and contain the following information:
- The date of the visit
- The URL of the page that was visited
- The title of the page that was visited
- The id of the 'stack' that this visit is part of
- The id of the 'session' that this visit is part of
- A unique ID of the device that generated the visit
It does not talk to anything. It only exposes an API to the fxhome-lattice service.
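To make the node description above a little more concrete, here is a small Python sketch of what one visit record might look like. The Pancake site lists only the kinds of information each node holds, so the class name `Visit`, every field name, the helper `group_by_stack`, and the sample values are hypothetical; this is not Pancake's actual schema or its Neo4j model.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Visit:
    """One visited page, with the six pieces of information listed above."""
    visited_at: datetime  # the date of the visit
    url: str              # the URL of the page that was visited
    title: str            # the title of the page that was visited
    stack_id: str         # the 'stack' this visit is part of
    session_id: str       # the 'session' this visit is part of
    device_id: str        # the device that generated the visit


def group_by_stack(visits):
    """Group visits by stack id, oldest visit first within each stack."""
    stacks = defaultdict(list)
    for visit in sorted(visits, key=lambda v: v.visited_at):
        stacks[visit.stack_id].append(visit)
    return dict(stacks)


# Hypothetical sample data.
visits = [
    Visit(datetime(2012, 10, 16, 9, 5), "http://www.neo4j.org/", "Neo4j",
          "stack-1", "session-1", "device-a"),
    Visit(datetime(2012, 10, 16, 9, 0), "http://graphconnect.com/", "GraphConnect",
          "stack-1", "session-1", "device-a"),
    Visit(datetime(2012, 10, 16, 9, 10), "http://www.neotechnology.com/", "Neo Technology",
          "stack-2", "session-1", "device-a"),
]
stacks = group_by_stack(visits)
```

In a graph database, the stack, session, and device ids would more naturally become relationships between nodes than plain properties, which is where a graph model starts to pay off for browsing history.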
For a long time I've personally been 'all in' on graph databases and the expectations of their growth in the IT space simply because graphs are everywhere. To learn more, check out www.neotechnology.com for lots of stories, videos, and examples... or go straight to www.neo4j.org to try out a Graph DB yourself. And if you can make it, visit GraphConnect!
For a 20% discounted ticket to the Graph Connect conference in San Francisco, CA from November 5-6, go to graphconnect.com and enter the registration code: GCFON