Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Neo4j/Cypher: SQL Style GROUP BY WITH LIMIT Query

DZone's Guide to

Neo4j/Cypher: SQL Style GROUP BY WITH LIMIT Query

· Java Zone
Free Resource

Build vs Buy a Data Quality Solution: Which is Best for You? Gain insights on a hybrid approach. Download white paper now!

A few weeks ago I wrote a blog post where I described how we could construct a SQL GROUP BY style query in cypher and last week I wanted to write a similar query but with what I think would be a LIMIT clause in SQL.

I wanted to find the maximum number of goals that players had scored in a match for a specific team and started off with the following query to find all the matches that players had scored in:

START team = node:teams('name:"Manchester United"')
MATCH team-[:home_team|away_team]-game-[:scored_in]-player-[:played]-stats-[:for]-team, 
      stats-[:in]-game
RETURN DISTINCT player.name, stats.goals, game.name

We find all the matches where Manchester United were playing and then get a list of the players who scored for them in those games:

+--------------------------------------------------------------------------------+
| player.name         | stats.goals | game.name                                  |
+--------------------------------------------------------------------------------+
| "Javier Hernández"  | 1           | "Manchester United vs Wigan Athletic"      |
| "Robin Van Persie"  | 1           | "Manchester United vs Sunderland"          |
| "Danny Welbeck"     | 1           | "Manchester United vs Stoke City"          |
| "Rafael"            | 1           | "Queens Park Rangers vs Manchester United" |
| "Wayne Rooney"      | 1           | "Manchester United vs Norwich City"        |
| "Shinji Kagawa"     | 1           | "Manchester United vs Fulham"              |
| "Shinji Kagawa"     | 3           | "Manchester United vs Norwich City"        |
...
+--------------------------------------------------------------------------------+
50 rows

Our next step would be to only return the unique combinations of players and goals:

START team = node:teams('name:"Manchester United"')
MATCH team-[:home_team|away_team]-game-[:scored_in]-player-[:played]-stats-[:for]-team, 
      stats-[:in]-game
RETURN DISTINCT player.name, stats.goals
+-----------------------------------+
| player.name         | stats.goals |
+-----------------------------------+
| "Nemanja Vidic"     | 1           |
| "Shinji Kagawa"     | 1           |
| "Danny Welbeck"     | 1           |
| "Darren Fletcher"   | 1           |
| "Wayne Rooney"      | 2           |
| "Javier Hernández"  | 1           |
| "Nani"              | 1           |
| "Tom Cleverley"     | 1           |
| "Robin Van Persie"  | 2           |
| "Shinji Kagawa"     | 3           |
...
| "Robin Van Persie"  | 3           |
+-----------------------------------+
21 rows

Since we only want to return each player once along with their maximum value from the ‘stats.goals’ column all we have to do now is make use of the MAX function:

START team = node:teams('name:"Manchester United"')
MATCH team-[:home_team|away_team]-game-[:scored_in]-player-[:played]-stats-[:for]-team, 
      stats-[:in]-game
RETURN DISTINCT player.name, MAX(stats.goals) AS goals
ORDER BY goals DESC
+-----------------------------+
| player.name         | goals |
+-----------------------------+
| "Robin Van Persie"  | 3     |
| "Shinji Kagawa"     | 3     |
| "Javier Hernández"  | 2     |
| "Wayne Rooney"      | 2     |
| "Paul Scholes"      | 1     |
| "Ryan Giggs"        | 1     |
| "Tom Cleverley"     | 1     |
...
| "Rafael"            | 1     |
| "Darren Fletcher"   | 1     |
| "Nani"              | 1     |
+-----------------------------+
16 rows


Build vs Buy a Data Quality Solution: Which is Best for You? Maintaining high quality data is essential for operational efficiency, meaningful analytics and good long-term customer relationships. But, when dealing with multiple sources of data, data quality becomes complex, so you need to know when you should build a custom data quality tools effort over canned solutions. Download our whitepaper for more insights into a hybrid approach.

Topics:

Published at DZone with permission of Mark Needham, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}