Creating a DSL for Cypher graph queries
My first assignment at Neo4j was to create a Java DSL for the Cypher query language, which is used to access data from the Neo4j database in a graphy way.
First off, why a DSL? There are plenty of reasons why using a DSL instead of strings is a good idea. From a practical point of view, a DSL and a decent IDE make creating queries much easier, as you can use code completion to build the query. No need to refer to manuals and cheat sheets if you forget the syntax. Second, I have found it useful to create queries iteratively in a layered architecture, whereby the domain model creates a base query that describes some concept, like “all messages in my inbox”, the application layer takes this and enhances it with filtering, like “all messages in my inbox that are sent from XYZ”, and finally the UI adds the ordering and paging. Doing something like this with plain strings would be extremely difficult, since each layer would have to splice text into the right place of a query it did not write.
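The layering idea can be sketched with a small immutable builder. This is only an illustration under stated assumptions, not the Cypher DSL itself: the class `LayeredQuerySketch`, the `Query` type, its `where`, `orderBy` and `page` methods, the node id 1, and the `msg.sender` and `msg.date` properties are all made up for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of layered query construction; not the Cypher DSL API.
class LayeredQuerySketch {

    // Immutable query value: every refinement returns a new copy, so each
    // layer can extend a shared base query without affecting other callers.
    static final class Query {
        private final String start, match, returns, orderBy; // orderBy may be null
        private final List<String> filters;
        private final int skip, limit; // negative skip means "no paging"

        Query(String start, String match, String returns) {
            this(start, match, returns, Collections.<String>emptyList(), null, -1, -1);
        }

        private Query(String start, String match, String returns,
                      List<String> filters, String orderBy, int skip, int limit) {
            this.start = start;
            this.match = match;
            this.returns = returns;
            this.filters = filters;
            this.orderBy = orderBy;
            this.skip = skip;
            this.limit = limit;
        }

        Query where(String predicate) {
            List<String> extended = new ArrayList<String>(filters);
            extended.add(predicate);
            return new Query(start, match, returns, extended, orderBy, skip, limit);
        }

        Query orderBy(String expression) {
            return new Query(start, match, returns, filters, expression, skip, limit);
        }

        Query page(int newSkip, int newLimit) {
            return new Query(start, match, returns, filters, orderBy, newSkip, newLimit);
        }

        @Override
        public String toString() {
            StringBuilder sb = new StringBuilder();
            sb.append("START ").append(start).append(" MATCH ").append(match);
            if (!filters.isEmpty()) {
                sb.append(" WHERE ").append(String.join(" and ", filters));
            }
            sb.append(" RETURN ").append(returns);
            if (orderBy != null) {
                sb.append(" ORDER BY ").append(orderBy);
            }
            if (skip >= 0) {
                sb.append(" SKIP ").append(skip).append(" LIMIT ").append(limit);
            }
            return sb.toString();
        }
    }

    // Domain layer: "all messages in my inbox" (node id 1 is made up).
    static Query inbox() {
        return new Query("inbox=node(1)", "(inbox)-->(msg)", "msg");
    }

    // Application layer: narrow the base query to one sender.
    static Query inboxFrom(String sender) {
        return inbox().where("msg.sender=\"" + sender + "\"");
    }

    public static void main(String[] args) {
        // UI layer: add ordering and paging last.
        System.out.println(inboxFrom("XYZ").orderBy("msg.date").page(0, 10));
    }
}
```

Because each step returns a new value, the domain layer's base query stays reusable no matter what the layers above add to it.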
After a brief read-through of Martin Fowler’s book on DSLs, to make sure I hadn’t missed any major useful patterns, I went to work. Together with Michael Hunger and Andres Taylor the DSL was quickly iterated, and I will show you a few examples below.
Here’s a Cypher query example:
START n=node(3,1) WHERE (n.age<30 and n.name="Tobias") or not(n.name="Tobias") RETURN n
This can be expressed using the Cypher DSL like so:
start( node( "n", 3, 1 ) ).
    where( prop( "n.age" ).lt( 30 ).and( prop( "n.name" ).eq( "Tobias" ) ).
        or( not( prop( "n.name" ).eq( "Tobias" ) ) ) ).
    returns( nodes( "n" ) )
There’s obviously a whole bunch of static method imports going on here to allow this kind of syntax in Java. Each clause, such as “start”, takes one or more expressions and returns a fluent DSL object that tells you which clauses can come next. If you use code completion it thus becomes really easy to build these queries without having to know the syntax. Instead, all of your brain cycles can be spent on figuring out how to construct the MATCH and WHERE clauses, which is usually the tricky part.
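One common way to make each clause “know” what may follow is to have every step return a narrower interface. The sketch below shows that general technique only; the names `FluentStagesSketch`, `AfterStart`, `AfterWhere` and `Finished` are invented here and are not the types the Cypher DSL actually uses, and clauses take plain strings to keep the example short.

```java
// Hypothetical sketch of a staged fluent builder; the interface names are
// invented for illustration and are not the Cypher DSL's own types.
class FluentStagesSketch {

    // After START, a query may continue with WHERE or go straight to RETURN.
    interface AfterStart {
        AfterWhere where(String predicate);
        Finished returns(String expression);
    }

    // After WHERE, only RETURN is offered, so code completion cannot
    // suggest a second WHERE at this point.
    interface AfterWhere {
        Finished returns(String expression);
    }

    // A finished query offers no further clauses; toString() renders it.
    interface Finished {
    }

    // One mutable builder implements every stage; callers only ever see the
    // narrow interface returned by the previous step.
    private static final class Builder implements AfterStart, AfterWhere, Finished {
        private final StringBuilder text = new StringBuilder();

        public AfterWhere where(String predicate) {
            text.append(" WHERE ").append(predicate);
            return this;
        }

        public Finished returns(String expression) {
            text.append(" RETURN ").append(expression);
            return this;
        }

        @Override
        public String toString() {
            return text.toString();
        }
    }

    // Entry point, meant to be statically imported by client code.
    static AfterStart start(String startExpression) {
        Builder builder = new Builder();
        builder.text.append("START ").append(startExpression);
        return builder;
    }

    public static void main(String[] args) {
        System.out.println(start("n=node(3,1)").where("n.age<30").returns("n"));
    }
}
```

With this shape, writing `start(...).where(...).where(...)` is a compile error rather than a malformed query discovered at runtime.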
For the WHERE-clause you have the option of using an infix or prefix notation for the expressions. In other words, these two expressions are the same:
prop( "n.age" ).lt( 30 )
lt( "n.age", 30 )
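A simple way to offer both notations is to let the prefix form delegate to the infix form. The following is a minimal sketch of that idea with made-up classes (`InfixPrefixSketch`, `Prop`, `Expr`); how the real DSL implements the two forms is not shown in this post.

```java
// Hypothetical sketch: a prefix helper that delegates to the infix method.
class InfixPrefixSketch {

    // A rendered boolean expression.
    static final class Expr {
        private final String text;

        Expr(String text) {
            this.text = text;
        }

        @Override
        public String toString() {
            return text;
        }
    }

    // A property reference that supports infix comparison.
    static final class Prop {
        private final String name;

        Prop(String name) {
            this.name = name;
        }

        // Infix style: prop( "n.age" ).lt( 30 )
        Expr lt(Object value) {
            return new Expr(name + "<" + value);
        }
    }

    static Prop prop(String name) {
        return new Prop(name);
    }

    // Prefix style: lt( "n.age", 30 ), implemented via the infix form so
    // both notations always render the same text.
    static Expr lt(String name, Object value) {
        return prop(name).lt(value);
    }

    public static void main(String[] args) {
        System.out.println(prop("n.age").lt(30));
        System.out.println(lt("n.age", 30));
    }
}
```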
The most distinctive clause of the Cypher query language is the MATCH clause, which allows you to do pattern matching in the graph. Here’s an example of what that looks like:
START a=node(3),c=node(2) MATCH p=(a)-->(b)-->(c) RETURN nodes(p)
Given two nodes, figure out all the ways to get from a to c through exactly one intermediate node b. This can be expressed with the DSL like this:
start( node( "a", 3 ), node( "c", 2 ) ).
    match( path( "p" ).from( "a" ).out().to( "b" ).link().out().to( "c" ) ).
    returns( nodesOf( "p" ) )
As you can see, a query written with the DSL is a little longer than the plain string, but I hope that it “reads” well enough not to be too difficult to parse in your head. When in doubt you can always call .toString() on the DSL result to see what the generated query looks like.
One trick that Michael Hunger showed me was the Java instance initialization block trick (that’s a mouthful!). Basically, when you instantiate a Java object through an anonymous subclass, it is possible to add an initializer block, something like the static block in classes, which is executed as part of the initialization phase of the object. For DSLs, this can be exploited by adding the terms as protected methods. Here’s how you would use the Cypher DSL with this style:
assertEquals( "START john=node(0) RETURN john", new CypherQuery()
{{
    starts( node( "john", 0 ) ).returns( nodes( "john" ) );
}}.toString() );
In this case there are no static imports at all. Instead “starts”, “node” and “nodes” are protected methods in the CypherQuery class, which makes them available for code completion goodness in the initialization block. This style thus avoids all the static imports, but makes it very hard to do the iterative query construction mentioned earlier. If you build your queries in one step, however, it could be useful.
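To see the mechanics in isolation, here is a self-contained sketch of the initializer block style. `MiniQuery` is a hypothetical stand-in written for this example, not the real CypherQuery class, and its methods take and return plain strings for brevity.

```java
// Hypothetical sketch of the instance initializer ("double brace") style.
class InitializerBlockSketch {

    // Stand-in for a query base class whose DSL terms are protected methods.
    static class MiniQuery {
        private final StringBuilder text = new StringBuilder();

        protected MiniQuery starts(String startExpression) {
            text.append("START ").append(startExpression);
            return this;
        }

        protected MiniQuery returns(String expression) {
            text.append(" RETURN ").append(expression);
            return this;
        }

        protected String node(String name, long id) {
            return name + "=node(" + id + ")";
        }

        @Override
        public String toString() {
            return text.toString();
        }
    }

    static String build() {
        // The outer braces declare an anonymous subclass of MiniQuery; the
        // inner braces are its instance initializer, which runs right after
        // the MiniQuery constructor and can call the inherited methods
        // without any static imports.
        return new MiniQuery() {{
            starts(node("john", 0)).returns("john");
        }}.toString();
    }

    public static void main(String[] args) {
        System.out.println(build());
    }
}
```

Note that the whole query has to be assembled inside the block, which is why this style does not combine well with handing a partial query to another layer.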
For a more complete set of examples, please see the Cypher reference manual tests in GitHub here.
So what do you do with the DSL builder once you have created the query? In the first version the main thing you can do is to call .toString() to get the query as a string, and then use that to invoke Cypher. However, this forces Cypher to parse the query, which can be costly. If you have queries that are repeated often then you can use parameterized queries, so that the query text stays the same between calls and you only have to do this parsing once. In the long term, we are working on allowing the Cypher engine to execute Cypher DSL queries directly, thus skipping the parsing step entirely.
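Why parameters save parsing work can be shown with a toy simulation. The `Engine` class below is entirely hypothetical and only assumes that an engine can reuse a parsed query when it sees identical query text again; it does not call Neo4j. The `{id}` placeholder follows Cypher's parameter syntax at the time of writing.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical simulation of a parse cache keyed by query text; no Neo4j involved.
class ParseCacheSketch {

    static final class Engine {
        private final Map<String, String> parsed = new HashMap<String, String>();
        int parseCount = 0;

        void execute(String queryText, Map<String, Object> params) {
            String plan = parsed.get(queryText);
            if (plan == null) {
                // Only unseen query text pays the (simulated) parsing cost.
                parseCount++;
                plan = "plan for " + queryText;
                parsed.put(queryText, plan);
            }
            // A real engine would now run the plan with params bound.
        }
    }

    // Inlining the literal changes the text on every call: one parse each.
    static int parsesWithLiterals() {
        Engine engine = new Engine();
        for (int id = 0; id < 3; id++) {
            engine.execute("START n=node(" + id + ") RETURN n",
                    Collections.<String, Object>emptyMap());
        }
        return engine.parseCount;
    }

    // A parameter keeps the text constant: parsed once, then reused.
    static int parsesWithParameters() {
        Engine engine = new Engine();
        for (int id = 0; id < 3; id++) {
            Map<String, Object> params = new HashMap<String, Object>();
            params.put("id", id);
            engine.execute("START n=node({id}) RETURN n", params);
        }
        return engine.parseCount;
    }

    public static void main(String[] args) {
        System.out.println("literals:   " + parsesWithLiterals() + " parses");
        System.out.println("parameters: " + parsesWithParameters() + " parses");
    }
}
```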
If you are using Neo4j and Cypher, please try out this DSL! You can find it in Maven repos here. If you have feedback on how to improve it, please let me know, preferably through the Neo4j mailing lists.
The Cypher DSL has also been integrated with the QueryDSL library, which makes for even more static typing goodness. In a future post I will show how that works, and how to set it up.
Source: https://rickardoberg.wordpress.com/2011/11/14/creating-a-dsl-for-cypher-graph-queries/