If you haven’t yet heard the news from my keynote presentation at GraphConnect San Francisco, you missed what I believe is going to be a defining moment not only for Neo4j but for the entire world of graph technologies.
Setting aside a few small but smart players such as my buddy Marko Rodriguez and his merry band of graphistas at Aurelius, Neo4j has had the privilege and peril of being nearly alone in the graph database sector. But, as graph technologies go mainstream, there are a lot more friends joining us in the space, from Oracle and IBM to Datastax and Amazon.
With this mainstream momentum, it’s going to be beneficial for everyone if we can all agree on one common language to speak.
The analogy to SQL couldn’t be clearer.
The original relational database (RDBMS) players all started with their own individual query languages. But just as relational databases were on the cusp of mainstream adoption, the major industry players all coalesced around SQL – and that’s exactly where we are with graphs today.
Why We Need a Common Graph Query Language
As more users learn about graphs and as more tools and vendors enter the graph space, we’re at a time when a shared graph query language – agnostic of vendor or platform – will be a huge benefit to both vendors and users.
A high-quality query language that already has broad adoption is extremely valuable because of its reusability across platforms. A common language also helps grow the wider graph space, encouraging healthy competition (another advantage to users).
I believe that graph query language is Cypher.
In the past few months, I’ve been in deep conversations with CIOs from Fortune 500 (and even Fortune 20) enterprises about adopting graph databases. They want to integrate graphs broadly across their organizations, but they’re averse to using a query language that’s tied to a single vendor.
With graphs now on the cusp of going mainstream, the time to introduce a common graph query language couldn’t be better.
All this is why we’ve started the openCypher project.
Believe it or not, Cypher is our third attempt at a graph query language, but we knew we’d struck gold when we released Neo4j 2.0 with Cypher as a first-class citizen.
That was the day we all witnessed the demand for graph databases take off completely. As an objective measure of our success, the db-engines ranking for graph databases began to soar at the same time.
As a result of its wide adoption, Cypher has had a lot of real-world validation and users love it (not the only necessary validation, but certainly an important one).
Cypher is particularly well-suited to the challenges of querying connected data because it uses symbols to express patterns that correspond to our visual understanding and intuitive representation of data. As a declarative query language, Cypher lets users focus on their respective domain and express what data to retrieve, instead of getting lost in the mechanics of data access.
Designed to be a human-readable query language, Cypher is suitable for both the developer and the operations professional.
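To make the declarative point concrete, here is a minimal Python sketch that contrasts a hand-written traversal with a Cypher pattern. The graph data, the `EDGES` list and the `friends_of_friends` helper are made up for illustration, and the Cypher statement is held only as a string for comparison; nothing here talks to a real database.

```python
# Toy in-memory graph: each (source, relationship, target) triple is one edge.
# All names and data are hypothetical and exist only for this illustration.
EDGES = [
    ("Alice", "KNOWS", "Bob"),
    ("Bob", "KNOWS", "Carol"),
    ("Bob", "KNOWS", "Dave"),
    ("Alice", "WORKS_WITH", "Erin"),
]

# Declarative: the Cypher pattern draws the shape of the data that is wanted.
# (Shown for comparison only; it is not executed by this script.)
CYPHER_QUERY = """
MATCH (a {name: 'Alice'})-[:KNOWS]->()-[:KNOWS]->(fof)
RETURN fof.name
"""


def friends_of_friends(edges, start):
    """Imperative: spell out every step of the two-hop traversal by hand."""
    # First hop: everyone the start node KNOWS.
    friends = [dst for src, rel, dst in edges if src == start and rel == "KNOWS"]
    result = []
    # Second hop: everyone those friends KNOW, without duplicates.
    for friend in friends:
        for src, rel, dst in edges:
            if src == friend and rel == "KNOWS" and dst not in result:
                result.append(dst)
    return result


print(friends_of_friends(EDGES, "Alice"))  # ['Carol', 'Dave']
```

The Python function has to describe how to walk the edges, while the Cypher pattern only describes what a matching path looks like and leaves the access mechanics to the database.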
The expressive querying of Cypher is inspired by a number of different approaches and established practices. Most of the keywords, such as ORDER BY, are inspired by SQL, while pattern matching borrows from SPARQL. In addition, some of the collection semantics have been borrowed from languages such as Haskell and Python.
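As a small illustration of the borrowed collection semantics, the sketch below places a Python list comprehension next to a comparable Cypher list comprehension, which appears only in a comment. The variable names are arbitrary and chosen for this example.

```python
# Numbers 0 through 10 inclusive.
numbers = list(range(0, 11))

# Python list comprehension: keep the even numbers, then square them.
even_squares = [x * x for x in numbers if x % 2 == 0]

# A comparable Cypher expression (Cypher's range() includes its upper bound):
#   RETURN [x IN range(0, 10) WHERE x % 2 = 0 | x * x] AS even_squares

print(even_squares)  # [0, 4, 16, 36, 64, 100]
```

Both forms express a filter followed by a projection over a list in a single expression.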
In fact, Cypher is the closest thing to drawing on a white board with a keyboard. Graph databases are whiteboard friendly; Cypher makes them keyboard friendly.
Welcome to the openCypher Project
The openCypher project makes Cypher available to everyone – every data store, every tooling provider, every application developer. It promises to be just as instrumental in the growth of graph processing and analysis as SQL was in accelerating the adoption of RDBMS.
So what does this mean in practice? openCypher is an open source project that delivers four key artifacts released under permissive licenses:
- Cypher reference documentation: Comprehensive user documentation describing use of the Cypher query language, with examples and tutorials.
- Technology compatibility kit (TCK): The TCK consists of a number of tests that a software supplier would run in order to self-certify support for a given version of Cypher.
- Reference implementation: Distributed under the Apache 2.0 license, the reference implementation is a fully functional implementation of key parts of the stack needed to support Cypher inside a data platform or tool. The first planned deliverable is a parser that will take a Cypher statement and parse it into an AST (abstract syntax tree) representation. The reference implementation complements the documentation and tests by providing permissively licensed, working implementations of Cypher that can be used as examples or as a foundation for one’s own implementation.
- Cypher language specification: Licensed under a Creative Commons license, the Cypher language specification is a technical expression of the language syntax to enable parsers to auto-generate the query syntax. A full semantic specification is also planned as a part of the openCypher project.
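To give a feel for what "parsing a Cypher statement into an AST" means, here is a deliberately tiny Python sketch. It is not the openCypher parser and does not reflect its design: the `Node`, `Relationship` and `MatchQuery` classes and the `parse` function are hypothetical names invented for this example, and the regular expression accepts only one fixed statement shape, MATCH (x)-[:TYPE]->(y) RETURN z.

```python
import re
from dataclasses import dataclass


# Hypothetical AST node types, invented for this illustration.
@dataclass
class Node:
    variable: str


@dataclass
class Relationship:
    rel_type: str


@dataclass
class MatchQuery:
    source: Node
    relationship: Relationship
    target: Node
    returns: str


# Accepts exactly one statement shape: MATCH (x)-[:TYPE]->(y) RETURN z
_STATEMENT = re.compile(
    r"^\s*MATCH\s+\((\w+)\)-\[:(\w+)\]->\((\w+)\)\s+RETURN\s+(\w+)\s*$",
    re.IGNORECASE,
)


def parse(statement: str) -> MatchQuery:
    """Turn the query text into a small tree of typed objects."""
    match = _STATEMENT.match(statement)
    if match is None:
        raise ValueError(f"unsupported statement: {statement!r}")
    source, rel_type, target, returns = match.groups()
    return MatchQuery(Node(source), Relationship(rel_type), Node(target), returns)


tree = parse("MATCH (a)-[:KNOWS]->(b) RETURN b")
print(tree.relationship.rel_type)  # KNOWS
```

Once a statement is a tree rather than a string, a tool can inspect or transform it (for example, listing the relationship types a query touches) without re-reading the text. A real parser would be generated from or checked against the full grammar rather than a single regular expression.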
We explicitly structured openCypher around working code – this isn’t just a theoretical, academic discussion in a committee. We want openCypher to be a substance-oriented initiative that collaborates around actual working code.
Join the Growing openCypher Community
The openCypher project already has the support of a wide community of graph technology players, including Oracle, Databricks (the company behind Apache Spark), Neo Technology, Tableau, Structr and a host of others.
However, we don’t just take suggestions from the big players – we welcome suggestions from everyone. It’s our aim to make the process of specifying and evolving the Cypher query language as open as possible. If you’re passionate about graph technology and want to influence how people interact with graphs, then we need you.
You can help out by reading through and commenting on published language proposals, or – if you want to go all in – write your own proposal with an implementation.
openCypher is a continual work in progress. Over the next few months, we will move more and more of the language artifacts over to GitHub to make them available to everyone. In the meantime, join our mailing list to stay on the cutting edge of the evolution of Cypher.
When Andrés first talked me into creating a query language for Neo4j, I couldn’t have imagined it would ever reach this point. That being said, I’m extremely proud to launch the openCypher project today.
Cypher has already introduced a world of opportunity to today’s graph developers; openCypher aims to open up many more.