Neo4j/Cypher: Redundant Relationships
Join the DZone community and get the full member experience.
Join For FreeLast week I was writing a query to find the top scorers in the Premier League so far this season alongside the number of games they’ve played in which initially read like this:
START player = node:players('name:*') MATCH player-[:started|as_sub]-playedLike-[:in]-game-[r?:scored_in]-player WITH player, COUNT(DISTINCT game) AS games, COLLECT(r) AS allGoals RETURN player.name, games, LENGTH(allGoals) AS goals ORDER BY goals DESC LIMIT 5
+------------------------------------+ | player.name | games | goals | +------------------------------------+ | "Luis Suárez" | 30 | 22 | | "Robin Van Persie" | 30 | 19 | | "Gareth Bale" | 27 | 17 | | "Michu" | 29 | 16 | | "Demba Ba" | 28 | 15 | +------------------------------------+ 5 rows 1 ms
I modelled whether a player started a game or came on as a substitute with separate relationship types ‘started’ and ‘as_sub’ but in this query we’re not interested in that, we just want to know whether they played.
In the world of relational database design we tend to try and avoid redundancy but with graphs this isn’t such a big deal so I thought I may as well add a ‘played’ relationship whenever a ‘as_sub’ or ‘started’ one was being created.
We can then simplify the above query to read like this:
START player = node:players('name:*') MATCH player-[:played]-playedLike-[:in]-game-[r?:scored_in]-player WITH player, COUNT(DISTINCT game) AS games, COLLECT(r) AS allGoals RETURN player.name, games, LENGTH(allGoals) AS goals ORDER BY goals DESC LIMIT 5
+------------------------------------+ | player.name | games | goals | +------------------------------------+ | "Luis Suárez" | 30 | 22 | | "Robin Van Persie" | 30 | 19 | | "Gareth Bale" | 27 | 17 | | "Michu" | 29 | 16 | | "Demba Ba" | 28 | 15 | +------------------------------------+ 5 rows 0 ms
When I’m querying I often forget that I modelled starting/substitute separately and think the data has screwed up and it’s always because I’ve forgotten to include the ‘as_sub’ relationship.
Having the ‘played’ relationship means that no longer happens which is cool.
I have a reasonably small data set so I haven’t seen any performance problems from creating this redundancy.
However, since the maximum number of relationships going out from a player would be unlikely to be more than 1000s I don’t think it will become one either.
As always I’d be interested in thoughts from others who have come across similar problems or can see something that I’ve missed.
Published at DZone with permission of Mark Needham, DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.
Trending
-
Five Java Books Beginners and Professionals Should Read
-
Front-End: Cache Strategies You Should Know
-
Alpha Testing Tutorial: A Comprehensive Guide With Best Practices
-
Top Six React Development Tools
Comments