neo4j/cypher/Lucene: Dealing with special characters

Neo4j uses Lucene to index the nodes and relationships in the graph, and one thing that can be confusing at first is how to handle special characters in Lucene queries.

For example, let’s say we set up a database with the following data:

CREATE ({name: "-one"})
CREATE ({name: "-two"})
CREATE ({name: "-three"})
CREATE ({name: "four"})

Suppose that, for whatever reason, we only want to return the nodes whose name begins with a hyphen.

A hyphen is a special character in Lucene (it is the operator that excludes a term), so if we forget to escape it we’ll end up with an impressive stack trace:

START p = node:node_auto_index("name:-*") RETURN p;
==> RuntimeException: org.apache.lucene.queryParser.ParseException: Cannot parse 'name:-*': Encountered " "-" "- "" at line 1, column 5.
==> Was expecting one of:
==>     <BAREOPER> ...
==>     "(" ...
==>     "*" ...
==>     <QUOTED> ...
==>     <TERM> ...
==>     <PREFIXTERM> ...
==>     <WILDTERM> ...
==>     "[" ...
==>     "{" ...
==>     <NUMBER> ...

So we change our query to escape the hyphen:

START p = node:node_auto_index("name:\-*") RETURN p;

which results in the following exception:

==> SyntaxException: invalid escape sequence
==> Think we should have better error message here? Help us by sending this query to cypher@neo4j.org.
==> Thank you, the Neo4j Team.
==> "START p = node:node_auto_index("name:\-*") RETURN p"

The problem is that the Cypher parser also treats ‘\’ as an escape character, so we need to use two of them. Cypher reduces ‘\\’ to a single ‘\’, and that single backslash is what escapes the hyphen for Lucene:

START p = node:node_auto_index("name:\\-*") RETURN p;
==> +------------------------+
==> | p                      |
==> +------------------------+
==> | Node[4]{name:"-one"}   |
==> | Node[5]{name:"-two"}   |
==> | Node[6]{name:"-three"} |
==> +------------------------+
==> 3 rows
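
If the query string is assembled in code rather than typed by hand, the Cypher layer of escaping can be applied mechanically. The following is a minimal Ruby sketch, not part of neo4j or any client library: `cypher_string_literal` is a hypothetical helper name, and it assumes the Cypher parser accepts `\\` and `\"` inside a double-quoted string.

```ruby
# Hypothetical helper (an assumption, not a neo4j or neography API):
# wrap text in a double-quoted Cypher string literal, escaping every
# backslash and double quote so the Cypher parser hands the text on unchanged.
def cypher_string_literal(text)
  escaped = text.gsub(/[\\"]/) { |ch| "\\#{ch}" }
  "\"#{escaped}\""
end

# What Lucene should receive: a single backslash before the hyphen.
lucene_query = 'name:\-*'

cypher = "START p = node:node_auto_index(#{cypher_string_literal(lucene_query)}) RETURN p"
puts cypher
# prints: START p = node:node_auto_index("name:\\-*") RETURN p
```

The printed query matches the hand-written one above: the Lucene query carries one backslash, and the second one is added only for the benefit of the Cypher parser.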

Alternatively, as Chris pointed out, we could make use of parameters, in which case we don’t need to worry about how the Cypher parser handles escaping. The hyphen still has to be escaped for Lucene, but only once:

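# params.rb
require 'neography'

neo = Neography::Rest.new
# {query} is a Cypher parameter, so its value never passes through Cypher's string parsing
query = "START p = node:node_auto_index({query}) RETURN p"
# Single-quoted Ruby string: the backslash is kept as-is, so Lucene receives name:\-*
result = neo.execute_query(query, { :query => 'name:\-*' })
p result["data"].map { |x| x[0]["data"] }

$ bundle exec ruby params.rb
[{"name"=>"-one"}, {"name"=>"-two"}, {"name"=>"-three"}]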
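
With parameters, the only escaping left is the Lucene one, and that can be automated when the prefix comes from user input. Here is a rough sketch under stated assumptions: `lucene_escape` and `prefix_query` are hypothetical helper names, and the character list mirrors what `QueryParser.escape` covers in Lucene 3.x, so it may need adjusting for other Lucene versions.

```ruby
# Characters with special meaning to the Lucene 3.x query parser.
# Assumption: later Lucene versions may treat more characters as special.
LUCENE_SPECIAL = /[\\+\-!():^\[\]"{}~*?|&]/

# Hypothetical helper: put a backslash in front of every special character.
def lucene_escape(value)
  value.gsub(LUCENE_SPECIAL) { |ch| "\\#{ch}" }
end

# Hypothetical helper: prefix query on one indexed property.
# The trailing * is added after escaping so it stays a wildcard.
def prefix_query(field, prefix)
  "#{field}:#{lucene_escape(prefix)}*"
end

params = { :query => prefix_query('name', '-') }
puts params[:query]
# prints: name:\-*
```

The resulting hash could be passed as the second argument to `neo.execute_query` in the script above. The sketch does not handle whitespace in the prefix, which the Lucene parser would treat as a term separator.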


Published at DZone with permission of Mark Needham, DZone MVB. See the original article here.
