
Lucene Full Text Indexing with Neo4j

by Romiko Derbynew · Mar. 24, 12

I spent some time working on full text search for Neo4j. The basic goals were as follows:

    • control the pointers of the index
    • full text search
    • all operations are done via REST
    • can create an index when creating a node
    • can update an index
    • can check if an index exists
    • when bootstrapping Neo4j in the cloud, run index checks
    • query the index using the Lucene full text query language
Download:
This is based on Neo4jClient:
http://nuget.org/list/packages/neo4jclient
Source code at:
http://hg.readify.net/neo4jclient/

Introduction

With the above objectives, I decided to go with manual indexing. The main reason is that I can put an index entry pointing to node A based on values in node B.

Imagine the following.

You have node A with a list of names: surname, first name and middle name. However, node A also has a relationship to node B, which holds other names, perhaps display names, avatar names and AKAs.

With manual indexing, all of the name entries from node A and node B can point to node A only.

In a REST call to the Neo4j server, it would look something like this in Fiddler.

[image: Fiddler capture of the REST call]

Notice the following:

URL: http://localhost:7474/db/data/index/node/{indexname}/{key}/{value}

So, if we were adding three names for the same client from two different nodes, you would have the same index name and key, with a different value in each URL. The node pointer (in the request body) is then the address of the node.
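To make the URL pattern concrete, here is a minimal sketch that builds one index-entry address per value for the same index name and key. The class and method names are hypothetical (they are not part of Neo4jClient), the index root is simply the localhost address from the pattern above, and URL-escaping each segment is my assumption about what a caller would want.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only; not part of Neo4jClient.
public static class IndexEntryUrls
{
    // Assumed index root, taken from the URL pattern shown above.
    public const string NodeIndexRoot = "http://localhost:7474/db/data/index/node";

    // Builds {root}/{indexName}/{key}/{value}, escaping each path segment.
    public static string Build(string indexName, string key, string value)
    {
        return string.Join("/", new[]
        {
            NodeIndexRoot,
            Uri.EscapeDataString(indexName),
            Uri.EscapeDataString(key),
            Uri.EscapeDataString(value)
        });
    }

    // One URL per value; the index name and key stay the same.
    public static IList<string> BuildAll(string indexName, string key, IEnumerable<string> values)
    {
        return values.Select(v => Build(indexName, key, v)).ToList();
    }
}
```

Calling IndexEntryUrls.BuildAll("clients", "name", new[] { "bob", "bobby", "bobs" }) gives three URLs that differ only in the last segment. Each would be POSTed with the address of node A in the request body.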

Neo4jClient NuGet package

I have updated Neo4jClient, which is on NuGet, to now support:

  • creating exact or fulltext indexes on their own, so that they just exist
  • creating exact or fulltext index entries when creating a node; the node reference is calculated automatically
  • updating an index
  • deleting entries from an index

[image: class diagram for the indexing solution in Neo4jClient]

RestSharp

The Neo4jClient package uses RestSharp, which makes all the index calls a trivial task for us. Let's have a look at some of the code inside the client to see how to consume the manual index API from .NET. In the next section we'll look at how we consume this code from another application.

public Dictionary<string, IndexMetaData> GetIndexes(IndexFor indexFor)
{
    CheckRoot();

    string indexResource;
    switch (indexFor)
    {
        case IndexFor.Node:
            indexResource = RootApiResponse.NodeIndex;
            break;
        case IndexFor.Relationship:
            indexResource = RootApiResponse.RelationshipIndex;
            break;
        default:
            throw new NotSupportedException(string.Format("GetIndexes does not support indexfor {0}", indexFor));
    }

    var request = new RestRequest(indexResource, Method.GET)
    {
        RequestFormat = DataFormat.Json,
        JsonSerializer = new CustomJsonSerializer { NullHandling = JsonSerializerNullValueHandling }
    };

    var response = client.Execute<Dictionary<string, IndexMetaData>>(request);

    if (response.StatusCode != HttpStatusCode.OK)
        throw new NotSupportedException(string.Format(
            "Received an unexpected HTTP status when executing the request.\r\n\r\n\r\nThe response status was: {0} {1}",
            (int)response.StatusCode,
            response.StatusDescription));

    return response.Data;
}

public bool CheckIndexExists(string indexName, IndexFor indexFor)
{
    CheckRoot();

    string indexResource;
    switch (indexFor)
    {
        case IndexFor.Node:
            indexResource = RootApiResponse.NodeIndex;
            break;
        case IndexFor.Relationship:
            indexResource = RootApiResponse.RelationshipIndex;
            break;
        default:
            throw new NotSupportedException(string.Format("IndexExists does not support indexfor {0}", indexFor));
    }

    var request = new RestRequest(string.Format("{0}/{1}", indexResource, indexName), Method.GET)
    {
        RequestFormat = DataFormat.Json,
        JsonSerializer = new CustomJsonSerializer { NullHandling = JsonSerializerNullValueHandling }
    };

    var response = client.Execute<Dictionary<string, IndexMetaData>>(request);

    // The index exists if the server answers 200 OK for its resource.
    return response.StatusCode == HttpStatusCode.OK;
}

void CheckRoot()
{
    if (RootApiResponse == null)
        throw new InvalidOperationException(
            "The graph client is not connected to the server. Call the Connect method first.");
}

public void CreateIndex(string indexName, IndexConfiguration config, IndexFor indexFor)
{
    CheckRoot();

    string nodeResource;
    switch (indexFor)
    {
        case IndexFor.Node:
            nodeResource = RootApiResponse.NodeIndex;
            break;
        case IndexFor.Relationship:
            nodeResource = RootApiResponse.RelationshipIndex;
            break;
        default:
            throw new NotSupportedException(string.Format("CreateIndex does not support indexfor {0}", indexFor));
    }

    var createIndexApiRequest = new
        {
            name = indexName.ToLower(),
            config
        };

    var request = new RestRequest(nodeResource, Method.POST)
        {
            RequestFormat = DataFormat.Json,
            JsonSerializer = new CustomJsonSerializer { NullHandling = JsonSerializerNullValueHandling }
        };
    request.AddBody(createIndexApiRequest);

    var response = client.Execute(request);

    if (response.StatusCode != HttpStatusCode.Created)
        throw new NotSupportedException(string.Format(
            "Received an unexpected HTTP status when executing the request.\r\n\r\nThe index name was: {0}\r\n\r\nThe response status was: {1} {2}",
            indexName,
            (int)response.StatusCode,
            response.StatusDescription));
}

public void ReIndex(NodeReference node, IEnumerable<IndexEntry> indexEntries)
{
    CheckRoot();

    var nodeAddress = string.Join("/", new[] { RootApiResponse.Node, node.Id.ToString() });

    // Flatten every index entry into (index name, key, value) triples.
    var updates = indexEntries
        .SelectMany(
            i => i.KeyValues,
            (i, kv) => new { IndexName = i.Name, kv.Key, kv.Value });

    foreach (var update in updates)
    {
        // Skip null values instead of abandoning the remaining updates.
        if (update.Value == null)
            continue;

        // Dates are stored as ticks so they can be range searched.
        string indexValue;
        if (update.Value is DateTimeOffset)
        {
            indexValue = ((DateTimeOffset)update.Value).UtcTicks.ToString();
        }
        else if (update.Value is DateTime)
        {
            indexValue = ((DateTime)update.Value).Ticks.ToString();
        }
        else
        {
            indexValue = update.Value.ToString();
        }

        AddNodeToIndex(update.IndexName, update.Key, indexValue, nodeAddress);
    }
}

public void DeleteIndex(string indexName, IndexFor indexFor)
{
    CheckRoot();

    string indexResource;
    switch (indexFor)
    {
        case IndexFor.Node:
            indexResource = RootApiResponse.NodeIndex;
            break;
        case IndexFor.Relationship:
            indexResource = RootApiResponse.RelationshipIndex;
            break;
        default:
            throw new NotSupportedException(string.Format("DeleteIndex does not support indexfor {0}", indexFor));
    }

    var request = new RestRequest(string.Format("{0}/{1}", indexResource, indexName), Method.DELETE)
    {
        RequestFormat = DataFormat.Json,
        JsonSerializer = new CustomJsonSerializer { NullHandling = JsonSerializerNullValueHandling }
    };

    var response = client.Execute(request);

    if (response.StatusCode != HttpStatusCode.NoContent)
        throw new NotSupportedException(string.Format(
            "Received an unexpected HTTP status when executing the request.\r\n\r\nThe index name was: {0}\r\n\r\nThe response status was: {1} {2}",
            indexName,
            (int)response.StatusCode,
            response.StatusDescription));
}

void AddNodeToIndex(string indexName, string indexKey, string indexValue, string nodeAddress)
{
    // URL shape: {nodeIndex}/{indexName}/{key}/{value}; the body carries the node address.
    var nodeIndexAddress = string.Join("/", new[] { RootApiResponse.NodeIndex, indexName, indexKey, indexValue });
    var request = new RestRequest(nodeIndexAddress, Method.POST)
    {
        RequestFormat = DataFormat.Json,
        JsonSerializer = new CustomJsonSerializer { NullHandling = JsonSerializerNullValueHandling }
    };
    request.AddBody(string.Join("", client.BaseUrl, nodeAddress));

    var response = client.Execute(request);

    if (response.StatusCode != HttpStatusCode.Created)
        throw new NotSupportedException(string.Format(
            "Received an unexpected HTTP status when executing the request.\r\n\r\nThe index name was: {0}\r\n\r\nThe response status was: {1} {2}",
            indexName,
            (int)response.StatusCode,
            response.StatusDescription));
}

public IEnumerable<Node<TNode>> QueryIndex<TNode>(string indexName, IndexFor indexFor, string query)
{
    CheckRoot();

    string indexResource;

    switch (indexFor)
    {
        case IndexFor.Node:
            indexResource = RootApiResponse.NodeIndex;
            break;
        case IndexFor.Relationship:
            indexResource = RootApiResponse.RelationshipIndex;
            break;
        default:
            throw new NotSupportedException(string.Format("QueryIndex does not support indexfor {0}", indexFor));
    }

    var request = new RestRequest(indexResource + "/" + indexName, Method.GET)
        {
            RequestFormat = DataFormat.Json,
            JsonSerializer = new CustomJsonSerializer { NullHandling = JsonSerializerNullValueHandling }
        };

    // The Lucene query string is passed as the "query" parameter.
    request.AddParameter("query", query);

    var response = client.Execute<List<NodeApiResponse<TNode>>>(request);

    if (response.StatusCode != HttpStatusCode.OK)
        throw new NotSupportedException(string.Format(
            "Received an unexpected HTTP status when executing the request.\r\n\r\nThe index name was: {0}\r\n\r\nThe response status was: {1} {2}",
            indexName,
            (int)response.StatusCode,
            response.StatusDescription));

    return response.Data == null
        ? Enumerable.Empty<Node<TNode>>()
        : response.Data.Select(r => r.ToNode(this));
}

Using the Neo4jClient from within an application

Create an index and check if it exists

This is useful when bootstrapping Neo4j: you can check whether any index that should be there is missing, and if so create it, enumerate all the nodes for that index and add their entries.

public void CreateIndexesForAgencyClients()
{
    var agencies = graphClient
        .RootNode
        .Out<Agency>(Hosts.TypeKey)
        .ToList();

    foreach (var agency in agencies)
    {
        var indexName = IndexNames.Clients(agency.Data);
        var indexConfiguration = new IndexConfiguration
            {
                Provider = IndexProvider.lucene,
                Type = IndexType.fulltext
            };

        // Only create and populate the index when it is missing.
        if (!graphClient.CheckIndexExists(indexName, IndexFor.Node))
        {
            Trace.TraceInformation("CreateIndexIfNotExists {0} for agency key {1}", indexName, agency.Data.Key);
            graphClient.CreateIndex(indexName, indexConfiguration, IndexFor.Node);
            PopulateAgencyClientIndex(agency.Data);
        }
    }
}

Create an index node entry when creating a node

var indexEntries = GetIndexEntries(agency.Data, client, clientViewModel.AlsoKnownAses);

var clientNodeReference = graphClient.Create(
    client,
    new[] { new ClientBelongsTo(agencyNode.Reference) },
    indexEntries);

public IEnumerable<IndexEntry> GetIndexEntries(Agency agency, Client client, IEnumerable<AlsoKnownAs> alsoKnownAses)
{
    var indexKeyValues = new List<KeyValuePair<string, object>>
    {
        new KeyValuePair<string, object>(AgencyClientIndexKeys.Gender.ToString(), client.Gender)
    };

    if (client.DateOfBirth.HasValue)
    {
        var dateOfBirthUtcTicks = client.DateOfBirth.Value.UtcTicks;
        indexKeyValues.Add(new KeyValuePair<string, object>(AgencyClientIndexKeys.DateOfBirth.ToString(), dateOfBirthUtcTicks));
    }

    var names = new List<string>
    {
        client.GivenName,
        client.FamilyName,
        client.PreferredName,
    };

    if (alsoKnownAses != null)
    {
        names.AddRange(alsoKnownAses.Where(a => !string.IsNullOrEmpty(a.Name)).Select(aka => aka.Name));
    }

    indexKeyValues.AddRange(names.Select(name => new KeyValuePair<string, object>(AgencyClientIndexKeys.Name.ToString(), name)));

    return new[]
    {
        new IndexEntry
        {
            Name = IndexNames.Clients(agency),
            KeyValues = indexKeyValues.Where(v => v.Value != null)
        }
    };
}
		

Reindex a node

Notice there was a call to PopulateAgencyClientIndex in the code above. This is done in our bootstrap to ensure indexes are always there as expected. If for some reason they are not, they are created and populated using the ReIndex feature.

void PopulateAgencyClientIndex(Agency agency)
{
    var clients = graphClient
        .RootNode
        .Out<Agency>(Hosts.TypeKey, a => a.Key == agency.Key)
        .In<Client>(ClientBelongsTo.TypeKey);

    foreach (var client in clients)
    {
        var clientService = clientServiceCallback();
        var akas = client.Out<AlsoKnownAs>(IsAlsoKnownAs.TypeKey).Select(a => a.Data);
        var indexEntries = clientService.GetIndexEntries(agency, client.Data, akas);
        graphClient.ReIndex(client.Reference, indexEntries);
    }
}
		

Querying a full text search index using Lucene

Below is sample code to query full text search. Say you have a person with

name: bob, surname: van de builder, aka1: bobby, aka2: bobs, preferredname: bob the builder

The index entries will need to look like this:

key: value
name: bob
name: van
name: de
name: builder
name: bobby
name: bobs

Remember, Lucene has a whitespace analyser, so any name containing spaces must be split into separate index entries. What we do is split names on whitespace, and the parts become our collection of index entries. The above applies to the full text search context.
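As a rough sketch of that splitting step, the helper below turns a list of names into one key/value pair per whitespace-separated part. The class name and method are hypothetical and not part of Neo4jClient. Dropping duplicate parts is my own assumption, made so the same word is not indexed twice.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only; not part of Neo4jClient.
public static class NameIndexEntries
{
    // Splits every name on whitespace and emits one (key, part) pair per distinct part.
    public static IList<KeyValuePair<string, object>> ForNames(string key, IEnumerable<string> names)
    {
        return names
            .Where(n => !string.IsNullOrWhiteSpace(n))
            // A null separator array means "split on any whitespace".
            .SelectMany(n => n.Split((char[])null, StringSplitOptions.RemoveEmptyEntries))
            // Assumption: repeated parts (e.g. "bob" twice) only need one entry.
            .Distinct(StringComparer.OrdinalIgnoreCase)
            .Select(part => new KeyValuePair<string, object>(key, part))
            .ToList();
    }
}
```

The resulting pairs could then be assigned to the KeyValues of an index entry, in the same way the name entries are built in GetIndexEntries.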

Note: if you use an exact index, composite entries are needed for multi-word values, since you are no longer using Lucene's full text search capabilities, e.g.

name: bob the builder

This is good to know, because things like postal code or gender searches, where exact matches are required, do not need full text indexes.

Let's check out an example of querying an index.

[Test]
public void VerifyWhenANewClientIsCreateThatPartialNameCanBeFuzzySearchedInTheFullTextSearchIndex()
{
    using (var agency = Data.NewTestAgency())
    using (var client = Data.NewTestClient(agency, c =>
    {
        c.Gender = Gender.Male;
        c.GivenName = "joseph";
        c.MiddleNames = "mark";
        c.FamilyName = "kitson";
        c.PreferredName = "joey";

        c.AlsoKnownAses = new List<AlsoKnownAs>
            {
               new AlsoKnownAs { Name = "j-man" },
               new AlsoKnownAs { Name = "j-town" }
            };
    }
        ))
    {
        var indexName = IndexNames.Clients(agency.Agency.Data);
        // Fuzzy (~) match on both name parts; + marks each clause as required.
        const string partialName = "+name:joe~+name:kitson~";
        var result = graphClient.QueryIndex<Client>(indexName, IndexFor.Node, partialName);
        Assert.AreEqual(client.Client.Data.UniqueId, result.First().Data.UniqueId);
    }
}
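If you would rather not hand-write the query string, something like the following could assemble it from a list of terms. This is only a sketch: the class is hypothetical, separating the clauses with a space and lowercasing the terms are my assumptions, and it does not escape Lucene special characters.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only; not part of Neo4jClient.
public static class FuzzyNameQuery
{
    // Builds "+key:term~" for every non-empty term, so that each term is a required fuzzy match.
    public static string Build(string key, IEnumerable<string> terms)
    {
        var clauses = terms
            .Where(t => !string.IsNullOrWhiteSpace(t))
            // Assumption: terms are lowercased before being used in the query.
            .Select(t => string.Format("+{0}:{1}~", key, t.Trim().ToLowerInvariant()));

        // Assumption: clauses are separated by a single space.
        return string.Join(" ", clauses);
    }
}
```

The returned string can be passed as the query argument of QueryIndex.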

Dates

You may have noticed in some of the code that when I store date entries in the index, I store them as ticks, i.e. as long numbers. This is awesome, as it gives raw power to searching dates via longs :)

[Test]
public void VerifyWhenANewClientIsCreateThatTheDateOfBirthCanBeRangeSearchedInTheFullTextSearchIndex()
{
    // Arrange
    const long dateOfBirthTicks = 634493518171556320;
    using (var agency = Data.NewTestAgency())
    using (var client = Data.NewTestClient(agency, c =>
    {
        c.Gender = Gender.Male;
        c.GivenName = "joseph";
        c.MiddleNames = "mark";
        c.FamilyName = "kitson";
        c.PreferredName = "joey";
        c.DateOfBirth = new DateTimeOffset(dateOfBirthTicks, new TimeSpan());
        c.CurrentAge = null;
        c.AlsoKnownAses = new List<AlsoKnownAs>
            {
               new AlsoKnownAs { Name = "j-man" },
               new AlsoKnownAs { Name = "j-town" }
            };
    }
        ))
    {
        // Act
        var indexName = IndexNames.Clients(agency.Agency.Data);
        // Lucene range syntax needs the TO keyword in upper case.
        var partialName = string.Format("dateofbirth:[{0} TO {1}]", dateOfBirthTicks - 5, dateOfBirthTicks + 5);
        var result = graphClient.QueryIndex<Client>(indexName, IndexFor.Node, partialName);
        // Assert
        Assert.AreEqual(client.Client.Data.UniqueId, result.First().Data.UniqueId);
    }
}
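For reference, a range query like the one in the test could be built from two DateTimeOffset values as sketched below. The helper is hypothetical and not part of Neo4jClient. It uses UtcTicks, matching how the dates were written to the index, and the key is whatever name the date was indexed under.

```csharp
using System;

// Hypothetical helper for illustration only; not part of Neo4jClient.
public static class DateRangeQuery
{
    // Builds an inclusive Lucene range query over UTC ticks: key:[from TO to].
    public static string Build(string key, DateTimeOffset from, DateTimeOffset to)
    {
        return string.Format("{0}:[{1} TO {2}]", key, from.UtcTicks, to.UtcTicks);
    }
}
```

One thing to keep in mind: if the ticks are indexed as plain strings, range comparison is lexicographic, so it only follows numeric order when both bounds and the stored values have the same number of digits.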
	

Summary

Well, I hope you found this post useful. Neo4jClient is on NuGet, so have a bash at using it. I would love to hear your feedback.

Download

NuGet package:
http://nuget.org/list/packages/neo4jclient
Source code at:
http://hg.readify.net/neo4jclient/

Cheers


Published at DZone with permission of Romiko Derbynew, DZone MVB. See the original article here.
