Neo4j / Cypher: Converting Queries from 1.9 to 2.0


I’ve been playing around with the most recent Neo4j 2.0 milestone release, 2.0.0-M05, and one of the first things I did was translate the queries from my football data set, which were written against Neo4j 1.9.

The following query calculates, for each player, the number of goals scored in matches that were shown on television, in matches that weren’t, and in total.

START player=node:players('name:*')
MATCH player-[:played|subbed_on]->stats-[:in]->game-[t?:on_tv]->channel
WITH COLLECT([stats.goals, TYPE(t)]) AS games, player
RETURN player.name,
       REDUCE(goals = 0, g' IN FILTER(g IN games WHERE HEAD(TAIL(g)) IS NULL)| goals + HEAD(g')) AS nonTvGoals,
       REDUCE(goals = 0, g' IN FILTER(g IN games WHERE HEAD(TAIL(g)) <> NULL)| goals + HEAD(g')) AS tvGoals,
       REDUCE(goals = 0, g' in games | goals + HEAD(g')) AS totalGoals
ORDER BY tvGoals DESC
LIMIT 10

We use a legacy index to get all of the players, find the games they participated in, check whether those games were on television, and then group by player to work out how many goals were scored in televised and non-televised games.

When we evaluate that query we get the following result:

==> +--------------------------------------------------------+
==> | player.name        | nonTvGoals | tvGoals | totalGoals |
==> +--------------------------------------------------------+
==> | "Robin Van Persie" | 11         | 15      | 26         |
==> | "Gareth Bale"      | 8          | 13      | 21         |
==> | "Luis Suárez"      | 12         | 11      | 23         |
==> | "Theo Walcott"     | 5          | 9       | 14         |
==> | "Demba Ba"         | 7          | 8       | 15         |
==> | "Edin Dzeko"       | 7          | 7       | 14         |
==> | "Santi Cazorla"    | 5          | 7       | 12         |
==> | "Juan Mata"        | 6          | 6       | 12         |
==> | "Steven Gerrard"   | 3          | 6       | 9          |
==> | "Carlos Tevez"     | 5          | 6       | 11         |
==> +--------------------------------------------------------+
==> 10 rows

The first step was to get rid of the legacy index and replace it with a label-based one. I created a ‘Player’ label for this purpose, along with an index on its name property:

START player = node:players('name:*') SET player :Player RETURN player
CREATE INDEX on :Player(name)
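
As a quick sanity check that the migration worked, something like the following counts the nodes that now carry the new label:

// Count the nodes that picked up the Player label
MATCH (player:Player)
RETURN COUNT(player) AS players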

Then I got rid of the legacy index lookup and replaced it with a label-based one:

MATCH (player:Player)-[:played|subbed_on]->stats-[:in]->game-[t?:on_tv]->channel
WITH COLLECT([stats.goals, TYPE(t)]) AS games, player
RETURN player.name,
       REDUCE(goals = 0, g' IN FILTER(g IN games WHERE HEAD(TAIL(g)) IS NULL)| goals + HEAD(g')) AS nonTvGoals,
       REDUCE(goals = 0, g' IN FILTER(g IN games WHERE HEAD(TAIL(g)) <> NULL)| goals + HEAD(g')) AS tvGoals,
       REDUCE(goals = 0, g' in games | goals + HEAD(g')) AS totalGoals
ORDER BY tvGoals DESC
LIMIT 10

Unfortunately, you can’t use optional relationships like this, and the query fails with the following exception:

PatternException: Can't use optional patterns without explicit START clause. Optional relationships: `t`
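
Just to show that the exception really is about the missing START clause, a cut-down version of the same pattern (my own illustrative query, reusing the legacy index from above) still parses happily when we keep the explicit start point:

// The optional on_tv relationship is accepted here because START is explicit
START player=node:players('name:*')
MATCH player-[:played|subbed_on]->stats-[:in]->game-[t?:on_tv]->channel
RETURN player.name, COUNT(t) AS televisedAppearances
LIMIT 5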

A neat workaround which Andres showed me is to first match all of the players and then make them available using a WITH statement:

MATCH (player:Player)
WITH player
MATCH player-[:played|subbed_on]->stats-[:in]->game-[t?:on_tv]->channel
WITH COLLECT([stats.goals, TYPE(t)]) AS games, player
RETURN player.name,
       REDUCE(goals = 0, g' IN FILTER(g IN games WHERE HEAD(TAIL(g)) IS NULL)| goals + HEAD(g')) AS nonTvGoals,
       REDUCE(goals = 0, g' IN FILTER(g IN games WHERE HEAD(TAIL(g)) <> NULL)| goals + HEAD(g')) AS tvGoals,
       REDUCE(goals = 0, g' in games | goals + HEAD(g')) AS totalGoals
ORDER BY tvGoals DESC
LIMIT 10

If we evaluate this query, it returns the same results as before. However, given that we now have better collection support, it seemed a shame to still use HEAD and TAIL, so I replaced those with collection index lookups:

MATCH (player:Player)
WITH player
MATCH player-[:played|subbed_on]->stats-[:in]->game-[t?:on_tv]->channel
WITH COLLECT([stats.goals, TYPE(t)]) AS games, player
RETURN player.name,
       REDUCE(goals = 0, g' IN FILTER(g IN games WHERE g[1] IS NULL)| goals + g'[0]) AS nonTvGoals,
       REDUCE(goals = 0, g' IN FILTER(g IN games WHERE g[1] <> NULL)| goals + g'[0]) AS tvGoals,
       REDUCE(goals = 0, g' in games | goals + g'[0]) AS totalGoals
ORDER BY tvGoals DESC
LIMIT 10
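
As a quick aside, HEAD(TAIL(...)) and the [1] index really do pick out the same element; a throwaway query along these lines shows the equivalence on our own data:

// The second collection element via HEAD/TAIL and via an index lookup
MATCH (player:Player)
WITH COLLECT(player.name) AS names
RETURN HEAD(TAIL(names)) AS viaHeadTail, names[1] AS viaIndex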

I didn’t like those array indexes either, and since we can now create maps, I thought the query would be clearer if we collected a map on line 4 of the query rather than an array:

MATCH (player:Player)
WITH player
MATCH player-[:played|subbed_on]->stats-[:in]->game-[t?:on_tv]->channel
WITH COLLECT({goals: stats.goals, ontv: TYPE(t)}) AS games, player
RETURN player.name,
       REDUCE(goals = 0, g' IN FILTER(g IN games WHERE g.ontv IS NULL)| goals + g'.goals) AS nonTvGoals,
       REDUCE(goals = 0, g' IN FILTER(g IN games WHERE g.ontv <> NULL)| goals + g'.goals) AS tvGoals,
       REDUCE(goals = 0, g' in games | goals + g'.goals) AS totalGoals
ORDER BY tvGoals DESC
LIMIT 10
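
To see what those collected maps look like before they are rolled up, something like this returns a handful of the individual entries:

// Each entry pairs a goals count with the on_tv relationship type (or null)
MATCH (player:Player)
WITH player
MATCH player-[:played|subbed_on]->stats-[:in]->game-[t?:on_tv]->channel
RETURN {goals: stats.goals, ontv: TYPE(t)} AS entry
LIMIT 5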

The next tweak would be to remove the FILTER and replace that with a list comprehension:

MATCH (player:Player)
WITH player
MATCH player-[:played|subbed_on]->stats-[:in]->game-[t?:on_tv]->channel
WITH COLLECT({goals: stats.goals, ontv: TYPE(t)}) AS games, player
RETURN player.name,
       REDUCE(goals = 0, g' IN [g IN games WHERE g.ontv IS NULL] | goals + g'.goals) AS nonTvGoals,
       REDUCE(goals = 0, g' IN [g IN games WHERE g.ontv <> NULL] | goals + g'.goals) AS tvGoals,
       REDUCE(goals = 0, g' in games | goals + g'.goals) AS totalGoals
ORDER BY tvGoals DESC
LIMIT 10
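
The FILTER call and the list comprehension are interchangeable; a throwaway comparison makes that easy to see:

// Both expressions keep the elements greater than 1
RETURN FILTER(x IN [1, 2, 3] WHERE x > 1) AS withFilter,
       [x IN [1, 2, 3] WHERE x > 1] AS withComprehension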

Although the query ends up being a couple of lines longer than its 1.9 cousin, I think there’s less noise and the intent is clearer.
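
One caveat worth flagging: later 2.0 milestones drop the ‘?’ optional-relationship syntax in favour of OPTIONAL MATCH, so the final query will eventually need one more pass, roughly along these lines (an untested sketch, which also swaps the <> NULL comparison for the safer IS NOT NULL):

// Sketch only: OPTIONAL MATCH replaces the t?:on_tv syntax in later releases
MATCH (player:Player)-[:played|subbed_on]->stats-[:in]->game
OPTIONAL MATCH game-[t:on_tv]->channel
WITH COLLECT({goals: stats.goals, ontv: TYPE(t)}) AS games, player
RETURN player.name,
       REDUCE(goals = 0, g IN [x IN games WHERE x.ontv IS NULL] | goals + g.goals) AS nonTvGoals,
       REDUCE(goals = 0, g IN [x IN games WHERE x.ontv IS NOT NULL] | goals + g.goals) AS tvGoals,
       REDUCE(goals = 0, g IN games | goals + g.goals) AS totalGoals
ORDER BY tvGoals DESC
LIMIT 10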



Published at DZone with permission of Mark Needham, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
