Neo4j 2.0: Labels, Indexes, and the Like

Last week I gave a couple of talks about modelling with Neo4j at meetups in Amsterdam and Antwerp, and there were a few questions about how indexing works with the labels being introduced in Neo4j 2.0.

As well as defining properties on nodes, we can assign them one or more labels, which are used to categorize nodes into groups.

For example, in the football graph we might choose to tag player nodes with the label ‘Player’:

CREATE (randomPlayer:Player {name: "Random Player"})

If we then wanted to find that player, we could use the following query:

MATCH (p:Player) 
WHERE p.name = "Random Player" 
RETURN p

A common assumption amongst the attendees was that the properties of labeled nodes are automatically indexed. This isn’t the case, as we can see by profiling the query above:

$ PROFILE MATCH (p:Player) WHERE p.name = "Random Player" RETURN p;
==> +-----------------------------------+
==> | p                                 |
==> +-----------------------------------+
==> | Node[31382]{name:"Random Player"} |
==> +-----------------------------------+
==> 1 row
==> 
==> Filter(pred="(Product(p,name(0),true) == Literal(Random Player) AND hasLabel(p:Player(8)))", _rows=1, _db_hits=524)
==> NodeByLabel(label="Player", identifier="p", _rows=524, _db_hits=0)

Instead, what we have is a ‘label scan’: we iterate over the 524 nodes labeled ‘Player’, check whether each one has a ‘name’ property equal to ‘Random Player’ (one db hit per node, 524 in total), and return those that do.

This is different from a ‘full node scan’, which visits every node in the graph and checks first for the appropriate label and then for the property, e.g.

$ PROFILE MATCH p WHERE "Player" IN LABELS(p) AND p.name = "Random Player" RETURN p;
==> +-----------------------------------+
==> | p                                 |
==> +-----------------------------------+
==> | Node[31382]{name:"Random Player"} |
==> +-----------------------------------+
==> 1 row
==> 
==> Filter(pred="(any(-_-INNER-_- in LabelsFunction(p) where Literal(Player) == -_-INNER-_-) AND Product(p,name(0),true) == Literal(Random Player))", _rows=1, _db_hits=524)
==> AllNodes(identifier="p", _rows=11443, _db_hits=11443)
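The difference between the two plans can be sketched with a toy Python model. This is only an illustration, not how Neo4j stores or scans data internally: the functions `build_graph`, `label_scan` and `all_nodes_scan` and the dictionary-based node layout are hypothetical, and the node counts are simply copied from the profile output above (11443 nodes in total, 524 of them labeled ‘Player’).

```python
# Toy model of the two scan strategies; NOT Neo4j's internal implementation.
# Node counts mirror the PROFILE output: 11443 nodes, 524 labelled Player.

def build_graph(total=11443, players=524):
    """Return (nodes, label_index): a node list plus a label -> node ids map."""
    nodes = []
    for i in range(total):
        if i < players:
            nodes.append({"labels": {"Player"}, "name": f"Player {i}"})
        else:
            nodes.append({"labels": {"Other"}, "name": f"Other {i}"})
    # Make the last Player node the one we will search for.
    nodes[players - 1]["name"] = "Random Player"
    label_index = {}
    for node_id, node in enumerate(nodes):
        for label in node["labels"]:
            label_index.setdefault(label, []).append(node_id)
    return nodes, label_index


def label_scan(nodes, label_index, label, name):
    """Visit only the nodes carrying the label, then filter on the property."""
    visited = 0
    matches = []
    for node_id in label_index.get(label, []):
        visited += 1
        if nodes[node_id]["name"] == name:
            matches.append(node_id)
    return matches, visited


def all_nodes_scan(nodes, label, name):
    """Visit every node, checking the label and then the property."""
    visited = 0
    matches = []
    for node_id, node in enumerate(nodes):
        visited += 1
        if label in node["labels"] and node["name"] == name:
            matches.append(node_id)
    return matches, visited


nodes, label_index = build_graph()
print(label_scan(nodes, label_index, "Player", "Random Player"))  # ([523], 524)
print(all_nodes_scan(nodes, "Player", "Random Player"))           # ([523], 11443)
```

Both functions find the same node, but the label scan only touches the 524 ‘Player’ nodes while the full scan touches all 11443, which matches the `_rows` figures reported for `NodeByLabel` and `AllNodes`.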

If we want to index a specific property of ‘Player’ nodes, then we need to explicitly create an index on that property for that label:

$ CREATE INDEX ON :Player(name);
==> +-------------------+
==> | No data returned. |
==> +-------------------+
==> Indexes added: 1
==> 0 ms

If we want to see the indexes defined on our database we can run the following command in webadmin:

$ schema
==> Indexes
==>   ON :Player(name) ONLINE  
==> 
==> No constraints

or its equivalent in Neo4j browser:

[Screenshot: the schema listing in the Neo4j browser]

Now if we repeat our initial query we can see that it’s a straight schema/index lookup:

$ PROFILE MATCH (p:Player) WHERE p.name = "Random Player" RETURN p;
==> +-----------------------------------+
==> | p                                 |
==> +-----------------------------------+
==> | Node[31382]{name:"Random Player"} |
==> +-----------------------------------+
==> 1 row
==> 
==> SchemaIndex(identifier="p", _db_hits=0, _rows=1, label="Player", query="Literal(Random Player)", property="name")
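Why the lookup no longer touches every ‘Player’ node can be sketched with another toy Python model. It assumes the index behaves like a map from property value to node ids, which is a simplification and not a description of Neo4j's actual index implementation; `create_index`, `index_lookup` and the sample data are hypothetical names made up for illustration.

```python
# Toy model of a schema index; NOT Neo4j's actual index implementation.

def create_index(nodes, label, prop):
    """Scan the labelled nodes once and map each property value to node ids."""
    index = {}
    for node_id, node in enumerate(nodes):
        if label in node["labels"] and prop in node["props"]:
            index.setdefault(node["props"][prop], []).append(node_id)
    return index


def index_lookup(index, value):
    """Answer an equality predicate with a single dictionary access."""
    return index.get(value, [])


# Hypothetical sample data: 524 Player nodes, one of them named "Random Player".
nodes = [{"labels": {"Player"}, "props": {"name": f"Player {i}"}} for i in range(523)]
nodes.append({"labels": {"Player"}, "props": {"name": "Random Player"}})

name_index = create_index(nodes, "Player", "name")
print(index_lookup(name_index, "Random Player"))  # [523]
print(index_lookup(name_index, "Nobody"))         # []
```

The cost of scanning the labeled nodes is paid once, when the index is built, so each later equality lookup goes straight to the matching node instead of filtering all 524 candidates.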

Based on a few runs, the query takes 1ms with the index defined and 10ms without it. The ‘full node scan’ approach takes ~40ms, and that’s with a very small database of 30,000 nodes, so I wouldn’t recommend it under a production load.


Published at DZone with permission of Mark Needham, DZone MVB. See the original article here.

