Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Neo4j 2.0.0: Optimising a Football Query

DZone's Guide to

Neo4j 2.0.0: Optimising a Football Query

· Java Zone
Free Resource

Navigate the Maze of the End-User Experience and pick up this APM Essential guide, brought to you in partnership with CA Technologies

A couple of months ago I wrote a blog post explaining how I’d applied Wes Freeman’s Cypher optimisation patterns to a query – since then Neo4j 2.0.0 has been released and I’ve extended the model so I thought I’d try again.

The updated model looks like this:

2014 01 31 22 25 20

The query is similar to before – I want to calculate the top away goal scorers in the 2012-2013 season. I started off with this:

MATCH (game)<-[:contains_match]-(season:Season),
      (team)<-[:away_team]-(game),
      (stats)-[:in]->(game),
      (team)<-[:for]-(stats)<-[:played]-(player)
WHERE season.name = "2012-2013"
RETURN player.name, 
       COLLECT(DISTINCT team.name), 
       SUM(stats.goals) as goals
ORDER BY goals DESC
LIMIT 10

When I executed this query using my Python query tuning tool and the average time was 3.31 seconds.

I separated the MATCH statements into smaller individual statements just to see what would happen:

MATCH (game)<-[:contains_match]-(season:Season)
MATCH (team)<-[:away_team]-(game)
MATCH (stats)-[:in]->(game)
MATCH (team)<-[:for]-(stats)<-[:played]-(player)
WHERE season.name = "2012-2013"
RETURN player.name, 
       COLLECT(DISTINCT team.name), 
       SUM(stats.goals) as goals
ORDER BY goals DESC
LIMIT 10

That reduced the time to 178 milliseconds which is quite a nice improvement for so little effort. As I understand it, this is down to the traversal matcher handling smaller patterns more effectively than it handles longer patterns.

The next step was to move the WHERE clause up so that it filtered out rows right at the beginning of the query rather than letting them hang around for another 3 MATCH statements:

MATCH (game)<-[:contains_match]-(season:Season)
WHERE season.name = "2012-2013"
MATCH (team)<-[:away_team]-(game)
MATCH (stats)-[:in]->(game)
MATCH (team)<-[:for]-(stats)<-[:played]-(player)
RETURN player.name, 
       COLLECT(DISTINCT team.name), 
       SUM(stats.goals) as goals
ORDER BY goals DESC
LIMIT 10

That took the time down to 131 milliseconds. At this stage I also tried putting the ‘MATCH (team)<-[:away_team]-(game)' line first to see what would happen.

MATCH (team)<-[:away_team]-(game)
MATCH (game)<-[:contains_match]-(season:Season)
WHERE season.name = "2012-2013"
MATCH (stats)-[:in]->(game)
MATCH (team)<-[:for]-(stats)<-[:played]-(player)
RETURN player.name, 
       COLLECT(DISTINCT team.name), 
       SUM(stats.goals) as goals
ORDER BY goals DESC
LIMIT 10

I expected it to be a bit slower since we were now keeping around more games than necessary again and as expected the time rose to 172 milliseconds – slightly quicker than our first attempt at tweaking.

I reverted that change and realised that there would be a lot of ‘stats’ nodes which didn’t have any goals associated with them and would therefore have a ‘goals’ property value of 0. I tried filtering those nodes out before the ‘SUM’ part of the query:

MATCH (game)<-[:contains_match]-(season:Season)
WHERE season.name = "2012-2013"
MATCH (team)<-[:away_team]-(game)
MATCH (stats)-[:in]->(game)
MATCH (team)<-[:for]-(stats)<-[:played]-(player)
WHERE stats.goals > 0
RETURN player.name,
COLLECT(DISTINCT team.name),
SUM(stats.goals) as goals
ORDER BY goals DESC
LIMIT 10

This proved to be a very good optimisation – the time reduced to 47 milliseconds, an improvement of almost 3x on the previous optimisation and 63x quicker than the original query.

The main optimisation pattern used here was reducing the number of rows being passed through the query.

Ideally you don’t want to be passing unnecessary rows through each stage of the query – rows can be filtered out either by using more specific MATCH clauses or in this case by a WHERE clause.

Wes and I presented a webinar on Cypher query optimisation last week which so if you want to learn more about tuning queries that might be worth a watch.



Thrive in the application economy with an APM model that is strategic. Be E.P.I.C. with CA APM.  Brought to you in partnership with CA Technologies.

Topics:

Published at DZone with permission of Mark Needham, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

THE DZONE NEWSLETTER

Dev Resources & Solutions Straight to Your Inbox

Thanks for subscribing!

Awesome! Check your inbox to verify your email so you can start receiving the latest in tech news and resources.

X

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}