
Deep Dive on Fulltext Indexing with Neo4j

by Stefan Armbruster · Nov. 12, 2014

In a previous blog post I explained the differences between the types of indexes available in Neo4j. A common requirement in many projects is fulltext indexing. With current versions of Neo4j (2.1.5 as of now), this can only be accomplished with manual indexes.

In this article I want to explain how to use language-specific analyzers for fulltext indexing and how to run regular expression searches against such indexes.

The reference manual's section on fulltext indexing describes how to provide a custom analyzer class by specifying the config parameter analyzer upon index creation. Its value is the fully qualified class name of the analyzer. There are two ways to create such a manual index, either by using the Java API:

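import java.util.Collections;
import java.util.Map;
import org.neo4j.graphdb.*;
import org.neo4j.graphdb.index.*;

GraphDatabaseService graphDb = ...; // an existing database instance
IndexManager indexManager = graphDb.index();
try (Transaction tx = graphDb.beginTx()) {
    // "analyzer" takes the fully qualified class name of the analyzer
    Map<String, String> params = Collections.singletonMap("analyzer",
        "my.package.Analyzer");
    Index<Node> index = indexManager.forNodes("myfulltextindex", params);
    tx.success(); // mark the transaction successful so the new index is committed
}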

or by using the REST API (here with the wonderful httpie command line HTTP client):

http -v -j localhost:7474/db/data/index/node \
   name=myfulltextindex config:='{"analyzer":"my.package.Analyzer"}'

Lucene provides an optional set of language-specific analyzers. These analyzers have some knowledge about the language they operate on and use it for word stemming; see http://www.evelix.ch/unternehmen/Blog/evelix/2013/11/11/inner-workings-of-the-german-analyzer-in-lucene for details on the internals of the GermanAnalyzer. For example, the German word for houses, “Häuser”, is stemmed to its singular form “Haus”. Consequently, a query for “Haus” matches both “Haus” and “Häuser”.
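
To see why a query for “Haus” also finds “Häuser”, consider the plain-Java toy below. The `toyGermanStem` function is hypothetical and drastically simpler than Lucene's GermanStemmer: it only lowercases, folds umlauts and strips a trailing "er". What it illustrates is that the indexed text and the query term both go through the same analysis, so both words end up as the same index term.

```java
import java.util.*;

public class StemmingDemo {
    // Hypothetical toy stemmer: lowercase, fold umlauts, strip a trailing "er".
    // Lucene's real GermanStemmer applies many more rules than this.
    static String toyGermanStem(String word) {
        String w = word.toLowerCase(Locale.GERMAN)
                .replace('\u00e4', 'a').replace('\u00f6', 'o').replace('\u00fc', 'u');
        if (w.endsWith("er") && w.length() > 4) {
            w = w.substring(0, w.length() - 2);
        }
        return w;
    }

    // Builds a tiny inverted index: stemmed term -> ids of documents containing it.
    static Map<String, Set<Integer>> buildIndex(List<String> docs) {
        Map<String, Set<Integer>> index = new HashMap<>();
        for (int id = 0; id < docs.size(); id++) {
            for (String token : docs.get(id).split("\\s+")) {
                index.computeIfAbsent(toyGermanStem(token), k -> new TreeSet<>()).add(id);
            }
        }
        return index;
    }

    // The query word is stemmed exactly like the indexed text before lookup.
    static Set<Integer> query(Map<String, Set<Integer>> index, String word) {
        return index.getOrDefault(toyGermanStem(word), Collections.emptySet());
    }

    public static void main(String[] args) {
        // Document 1 contains "Fünf Häuser", written with Unicode escapes.
        List<String> docs = List.of("Ein Haus am See", "F\u00fcnf H\u00e4user stehen hier");
        Map<String, Set<Integer>> index = buildIndex(docs);
        System.out.println(query(index, "Haus"));          // prints [0, 1]
        System.out.println(query(index, "H\u00e4user"));   // prints [0, 1]
    }
}
```

Both lookups hit the same term, "haus", which is the effect the GermanAnalyzer achieves with its far more careful rules.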

The language-specific analyzers reside in an optional jar file, lucene-analyzers-3.6.2.jar, which does not ship with Neo4j by default. Therefore, copy lucene-analyzers-3.6.2.jar into Neo4j’s plugins folder.

When you try, for example, to use Lucene’s GermanAnalyzer directly with

http -v -j localhost:7474/db/data/index/node name=fulltext_de \
   config:='{"analyzer":"org.apache.lucene.analysis.de.GermanAnalyzer"}'

you get back HTTP status 500, and the log files show a strange exception: java.lang.InstantiationException: org.apache.lucene.analysis.de.GermanAnalyzer. The reason is that Neo4j tries to instantiate the analyzer class through a no-arg default constructor. Unfortunately, Lucene’s language-specific analyzers don’t have such a constructor (see the Lucene javadocs). The solution is to write a thin analyzer class with a default constructor that internally uses the Lucene-provided analyzer as a delegate.
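
The failure and the fix can be reproduced without Neo4j or Lucene. This is a minimal sketch under stated assumptions: `TextAnalyzer` is a hypothetical stand-in for Lucene's `Analyzer` base class, `NeedsVersionAnalyzer` plays the role of an analyzer such as GermanAnalyzer that lacks a no-arg constructor, `DefaultConstructorWrapper` is the thin delegating wrapper, and `load` imitates reflective instantiation with `Class.newInstance()`, which is an assumption about how Neo4j creates the analyzer, not something taken from its source.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for Lucene's Analyzer base class.
interface TextAnalyzer {
    List<String> analyze(String text);
}

// Stand-in for a language analyzer that only offers a constructor with arguments.
class NeedsVersionAnalyzer implements TextAnalyzer {
    private final String version;

    NeedsVersionAnalyzer(String version) {
        this.version = version;
    }

    public List<String> analyze(String text) {
        return Arrays.asList(text.toLowerCase(Locale.ROOT).split("\\s+"));
    }
}

// Thin wrapper: public no-arg constructor, all work delegated to the real analyzer.
class DefaultConstructorWrapper implements TextAnalyzer {
    private final TextAnalyzer delegate = new NeedsVersionAnalyzer("3.6");

    public DefaultConstructorWrapper() {}

    public List<String> analyze(String text) {
        return delegate.analyze(text);
    }
}

public class AnalyzerLoading {
    // Imitates creating an analyzer from its configured class name via a no-arg constructor.
    @SuppressWarnings("deprecation")
    static TextAnalyzer load(String className) throws ReflectiveOperationException {
        return (TextAnalyzer) Class.forName(className).newInstance();
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        try {
            load(NeedsVersionAnalyzer.class.getName());
        } catch (InstantiationException e) {
            System.out.println("failed: " + e);
        }
        TextAnalyzer analyzer = load(DefaultConstructorWrapper.class.getName());
        System.out.println(analyzer.analyze("Fuenf Haeuser")); // prints [fuenf, haeuser]
    }
}
```

Loading `NeedsVersionAnalyzer` fails with a `java.lang.InstantiationException` whose message is the class name, the same shape as the error in Neo4j's log, while the wrapper loads fine because its no-arg constructor builds the delegate itself.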

To simplify this setup, I’ve created a small project on GitHub called neo4j-fti. It contains the mentioned wrappers in the package org.neo4j.contrib.fti.analyzers for every language that has a Lucene analyzer. It also provides a kernel extension for Neo4j that automatically creates fulltext indexes based on a config option. In neo4j.properties you need to set:

fullTextIndexes=fulltext_de:org.neo4j.contrib.fti.analyzers.German,\
    fulltext_en:org.neo4j.contrib.fti.analyzers.English
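
The value of fullTextIndexes is a comma-separated list of indexName:analyzerClass pairs, which the kernel extension has to split before it can create the indexes. The sketch below shows one way to do that; `FullTextIndexConfig` and `parseFullTextIndexes` are hypothetical names, not code from neo4j-fti, and the example assumes the file is read with java.util.Properties, which is what joins the backslash-continued line.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

public class FullTextIndexConfig {
    // Hypothetical parser for "indexName:analyzerClass,indexName:analyzerClass,...".
    // Returns index name -> fully qualified analyzer class name, in declaration order.
    static Map<String, String> parseFullTextIndexes(String value) {
        Map<String, String> result = new LinkedHashMap<>();
        if (value == null || value.trim().isEmpty()) {
            return result;
        }
        for (String entry : value.split(",")) {
            String trimmed = entry.trim();
            int colon = trimmed.indexOf(':');
            if (colon <= 0 || colon == trimmed.length() - 1) {
                throw new IllegalArgumentException(
                        "expected indexName:analyzerClass but got '" + trimmed + "'");
            }
            result.put(trimmed.substring(0, colon).trim(), trimmed.substring(colon + 1).trim());
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Same text as the neo4j.properties snippet, including the backslash continuation.
        String fileContent = "fullTextIndexes=fulltext_de:org.neo4j.contrib.fti.analyzers.German,\\\n"
                + "    fulltext_en:org.neo4j.contrib.fti.analyzers.English\n";
        Properties props = new Properties();
        props.load(new StringReader(fileContent)); // joins the continued line into one value
        Map<String, String> indexes = parseFullTextIndexes(props.getProperty("fullTextIndexes"));
        indexes.forEach((name, analyzer) -> System.out.println(name + " -> " + analyzer));
    }
}
```

Each resulting pair could then be handed to `IndexManager.forNodes` together with an "analyzer" config entry, as in the Java snippet earlier in this post.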

Additionally, this project features an example of how to use regular expressions to search an index. With the Java API, you need to pass a Lucene RegexQuery based on a Term holding your regular expression. The RegexQuery class isn’t part of lucene-core either, so be sure to have lucene-queries in your Neo4j plugins folder as well. This example is exposed in an unmanaged extension using the following code snippet:

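import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.regex.RegexQuery;
import org.neo4j.graphdb.*;
import org.neo4j.graphdb.index.*;

List<Long> result = new ArrayList<>();
try (Transaction tx = graphDatabaseService.beginTx()) {
    IndexManager indexManager = graphDatabaseService.index();
    if (!indexManager.existsForNodes(indexName)) {
        throw new IllegalArgumentException("index " + indexName + " does not exist");
    }
    Index<Node> index = indexManager.forNodes(indexName);
    // RegexQuery (from lucene-queries) matches the regex against the field's indexed terms
    IndexHits<Node> hits = index.query(new RegexQuery(new Term(field, regex)));
    try {
        for (Node node : hits) {
            result.add(node.getId());
        }
    } finally {
        hits.close(); // release the underlying Lucene resources
    }
}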
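
RegexQuery is a term-level query: the regular expression is checked against the terms the analyzer stored for the field, not against the original sentence. The stdlib-only sketch below imitates this with java.util.regex; `TermRegexDemo` and `matchingTerms` are hypothetical, the term list is invented for illustration (it is not actual GermanAnalyzer output), and it assumes, as a simplification, that a term has to match the whole expression.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class TermRegexDemo {
    // Returns the indexed terms that match the whole regular expression,
    // imitating a term-by-term regex query (a simplification, not Lucene's code).
    static List<String> matchingTerms(List<String> indexedTerms, String regex) {
        Pattern pattern = Pattern.compile(regex);
        List<String> matches = new ArrayList<>();
        for (String term : indexedTerms) {
            if (pattern.matcher(term).matches()) {
                matches.add(term);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Hypothetical terms as an analyzer might store them: lowercased and stemmed.
        List<String> terms = List.of("haus", "hund", "stadt", "hals", "baum");
        System.out.println(matchingTerms(terms, "h.*s")); // prints [haus, hals]
    }
}
```

Because the terms in this sketch are lowercased, a pattern such as H.*s would return nothing.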

Assuming an index named fulltext_de has been configured with the German analyzer (see above), the following httpie commands create a node, add it to the fulltext index, and perform a regular expression index query:


# create a node
http -j localhost:7474/db/data/cypher query="create (n:Blog {description:'Auf der Straße stehen fünf Häuser'}) return id(n)"

# add it to the index (xxxx is the node id returned by the previous call)
http -j localhost:7474/db/data/index/node/fulltext_de \
   uri="http://localhost:7474/db/data/node/xxxx" \
   key="description" value="Auf der Straße stehen fünf Häuser"

# query the index for terms starting with "h" and ending with "s";
# the URL is quoted so the shell does not try to expand the *
http "localhost:7474/regex/fulltext_de/description/h.*s"

Published at DZone with permission of Stefan Armbruster, DZone MVB.
