Over a million developers have joined DZone.

Neo4j: Modelling Sub Types

· Java Zone

Check out this 8-step guide to see how you can increase your productivity by skipping slow application redeploys and by implementing application profiling, as you code! Brought to you in partnership with ZeroTurnaround.

A question which sometimes comes up when discussing graph data modelling is how you go about modelling sub/super types.

In my experience there are two reasons why we might want to do this:

  • To ensure that certain properties exist on bits of data
  • To write drill down queries based on those types

At the moment the former isn’t built into Neo4j and you’d only be able to achieve it by wiring up some code in a pre commit hook of a transaction event handler so we’ll focus on the latter.

The typical example used for showing how to design sub types is the animal kingdom and I managed to find a data set from Louiseville, Kentucky’s Animal Services which we can use.

In this case the sub types are used to represent the type of animal, breed group and breed. We then also have ‘real data’ in terms of actual dogs under the care of animal services.

We effectively end up with two graphs in one – a model and a meta model:

2014 10 20 22 32 31

The cypher query to create this graph looks like this:

LOAD CSV WITH HEADERS FROM "file:/Users/markneedham/projects/neo4j-subtypes/data/dogs.csv" AS line
MERGE (animalType:AnimalType {name: "Dog"})
MERGE (breedGroup:BreedGroup {name: line.BreedGroup})
MERGE (breed:Breed {name: line.PrimaryBreed})
MERGE (animal:Animal {id: line.TagIdentity, primaryColour: line.PrimaryColour, size: line.Size})
MERGE (animalType)<-[:PARENT]-(breedGroup)
MERGE (breedGroup)<-[:PARENT]-(breed)
MERGE (breed)<-[:PARENT]-(animal)

We could then write a simple query to find out how many dogs we have:

MATCH (animalType:AnimalType)<-[:PARENT*]-(animal)
RETURN animalType, COUNT(*) AS animals

==> +--------------------------------+
==> | animalType           | animals |
==> +--------------------------------+
==> | Node[89]{name:"Dog"} | 131     |
==> +--------------------------------+
==> 1 row

Or we could write a slightly more complex query to find the number of animals at each level of our type hierarchy:

MATCH path = (animalType:AnimalType)<-[:PARENT]-(breedGroup)<-[:PARENT*]-(animal)
RETURN [node IN nodes(path) | node.name][..-1] AS breed, COUNT(*) AS animals

==> +-----------------------------------------------------+
==> | breed                                     | animals |
==> +-----------------------------------------------------+
==> | ["Dog","SETTER/RETRIEVE","LABRADOR RETR"] | 15      |
==> | ["Dog","SETTER/RETRIEVE","GOLDEN RETR"]   | 13      |
==> | ["Dog","POODLE","POODLE MIN"]             | 10      |
==> | ["Dog","TERRIER","MIN PINSCHER"]          | 9       |
==> | ["Dog","SHEPHERD","WELSH CORGI CAR"]      | 6       |
==> +-----------------------------------------------------+
==> 5 rows

We might then decide to add an exercise sub graph which indicates how much exercise each type of dog requires:

MATCH (breedGroup:BreedGroup)
MERGE (exercise:Exercise {type: "2 hours hard exercise"})
MERGE (exercise)<-[:REQUIRES_EXERCISE]-(breedGroup);

MATCH (breedGroup:BreedGroup)
WHERE breedGroup.name IN ["TERRIER", "SHEPHERD"]
MERGE (exercise:Exercise {type: "1 hour gentle exercise"})
MERGE (exercise)<-[:REQUIRES_EXERCISE]-(breedGroup);

We could then query that to find out which dogs need to come out for 2 hours of hard exercise:

MATCH (exercise:Exercise {type: "2 hours hard exercise"})<-[:REQUIRES_EXERCISE]-()<-[:PARENT*]-(dog)
WHERE NOT (dog)<-[:PARENT]-()

==> +-----------------------------------------------------------+
==> | dog                                                       |
==> +-----------------------------------------------------------+
==> | Node[541]{id:"664427",primaryColour:"BLACK",size:"SMALL"} |
==> | Node[542]{id:"543787",primaryColour:"BLACK",size:"SMALL"} |
==> | Node[543]{id:"584021",primaryColour:"BLACK",size:"SMALL"} |
==> | Node[544]{id:"584022",primaryColour:"BLACK",size:"SMALL"} |
==> | Node[545]{id:"664430",primaryColour:"BLACK",size:"SMALL"} |
==> | Node[546]{id:"535176",primaryColour:"BLACK",size:"SMALL"} |
==> | Node[567]{id:"613557",primaryColour:"WHITE",size:"SMALL"} |
==> | Node[568]{id:"531376",primaryColour:"WHITE",size:"SMALL"} |
==> | Node[569]{id:"613567",primaryColour:"WHITE",size:"SMALL"} |
==> | Node[570]{id:"531379",primaryColour:"WHITE",size:"SMALL"} |
==> +-----------------------------------------------------------+
==> 10 rows

In this query we ensured that we only returned dogs rather than breeds by checking that there was no incoming PARENT relationship. Alternatively we could have filtered on the Animal label…

MATCH (exercise:Exercise {type: "2 hours hard exercise"})<-[:REQUIRES_EXERCISE]-()<-[:PARENT*]-(dog:Animal)

or if we wanted to only take the dogs out for exercise perhaps we’d have Dog label on the appropriate nodes.

People are often curious why labels don’t have super/sub types between them but I tend to use labels for simple categorisation – anything more complicated and we may as well use the built in power of the graph model!

The code is on github should you wish to play with it.


The Java Zone is brought to you in partnership with ZeroTurnaround. Check out this 8-step guide to see how you can increase your productivity by skipping slow application redeploys and by implementing application profiling, as you code!


Published at DZone with permission of Mark Needham, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

The best of DZone straight to your inbox.

Please provide a valid email address.

Thanks for subscribing!

Awesome! Check your inbox to verify your email so you can start receiving the latest in tech news and resources.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}