Over a million developers have joined DZone.

Neo4j/Cypher: Redundant Relationships

· Database Zone

Build fast, scale big with MongoDB Atlas, a hosted service for the leading NoSQL database. Try it now! Brought to you in partnership with MongoDB.

Last week I was writing a query to find the top scorers in the Premier League so far this season alongside the number of games they’ve played in which initially read like this:

START player = node:players('name:*')
MATCH player-[:started|as_sub]-playedLike-[:in]-game-[r?:scored_in]-player
WITH player, COUNT(DISTINCT game) AS games, COLLECT(r) AS allGoals
RETURN player.name, games, LENGTH(allGoals) AS goals
ORDER BY goals DESC
LIMIT 5

+------------------------------------+
| player.name        | games | goals |
+------------------------------------+
| "Luis Suárez"      | 30    | 22    |
| "Robin Van Persie" | 30    | 19    |
| "Gareth Bale"      | 27    | 17    |
| "Michu"            | 29    | 16    |
| "Demba Ba"         | 28    | 15    |
+------------------------------------+
5 rows
1 ms

I modelled whether a player started a game or came on as a substitute with separate relationship types ‘started’ and ‘as_sub’ but in this query we’re not interested in that, we just want to know whether they played.

In the world of relational database design we tend to try and avoid redundancy but with graphs this isn’t such a big deal so I thought I may as well add a ‘played’ relationship whenever a ‘as_sub’ or ‘started’ one was being created.

We can then simplify the above query to read like this:

START player = node:players('name:*')
MATCH player-[:played]-playedLike-[:in]-game-[r?:scored_in]-player
WITH player, COUNT(DISTINCT game) AS games, COLLECT(r) AS allGoals
RETURN player.name, games, LENGTH(allGoals) AS goals
ORDER BY goals DESC
LIMIT 5

+------------------------------------+
| player.name        | games | goals |
+------------------------------------+
| "Luis Suárez"      | 30    | 22    |
| "Robin Van Persie" | 30    | 19    |
| "Gareth Bale"      | 27    | 17    |
| "Michu"            | 29    | 16    |
| "Demba Ba"         | 28    | 15    |
+------------------------------------+
5 rows
0 ms

When I’m querying I often forget that I modelled starting/substitute separately and think the data has screwed up and it’s always because I’ve forgotten to include the ‘as_sub’ relationship.

Having the ‘played’ relationship means that no longer happens which is cool.

I have a reasonably small data set so I haven’t seen any performance problems from creating this redundancy.

However, since the maximum number of relationships going out from a player would be unlikely to be more than 1000s I don’t think it will become one either.

As always I’d be interested in thoughts from others who have come across similar problems or can see something that I’ve missed.




Now it's easier than ever to get started with MongoDB, the database that allows startups and enterprises alike to rapidly build planet-scale apps. Introducing MongoDB Atlas, the official hosted service for the database on AWS. Try it now! Brought to you in partnership with MongoDB.

Topics:

Published at DZone with permission of Mark Needham, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

The best of DZone straight to your inbox.

SEE AN EXAMPLE
Please provide a valid email address.

Thanks for subscribing!

Awesome! Check your inbox to verify your email so you can start receiving the latest in tech news and resources.
Subscribe

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}