DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Zones

Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks

Last call! Secure your stack and shape the future! Help dev teams across the globe navigate their software supply chain security challenges.

Modernize your data layer. Learn how to design cloud-native database architectures to meet the evolving demands of AI and GenAI workloads.

Releasing software shouldn't be stressful or risky. Learn how to leverage progressive delivery techniques to ensure safer deployments.

Avoid machine learning mistakes and boost model performance! Discover key ML patterns, anti-patterns, data strategies, and more.

Related

  • Extract Information From PDF Invoice
  • Why Database Migrations Take Months and How to Speed Them Up
  • Unmasking Entity-Based Data Masking: Best Practices 2025
  • How Trustworthy Is Big Data?

Trending

  • Why Database Migrations Take Months and How to Speed Them Up
  • *You* Can Shape Trend Reports: Join DZone's Software Supply Chain Security Research
  • Build Your First AI Model in Python: A Beginner's Guide (1 of 3)
  • How to Build Local LLM RAG Apps With Ollama, DeepSeek-R1, and SingleStore
  1. DZone
  2. Data Engineering
  3. Databases
  4. Neo4j/Cypher: WITH, COLLECT, and EXTRACT

Neo4j/Cypher: WITH, COLLECT, and EXTRACT

By 
Mark Needham user avatar
Mark Needham
·
Mar. 22, 13 · Interview
Likes (0)
Comment
Save
Tweet
Share
17.3K Views

Join the DZone community and get the full member experience.

Join For Free

As I mentioned in my last post I’m trying to get the hang of the WITH statement in neo4j’s cypher query language and I found another application when trying to work out which opponents teams played on certain days.

I started out with a query which grouped the data set by day and showed the opponents that were played on that day:

START team = node:teams('name:"Manchester United"')
MATCH team-[h:home_team|away_team]-game-[:on_day]-day
RETURN DISTINCT day.name, COLLECT(TRIM(REPLACE(REPLACE(game.name, "Manchester United", ""), "vs", "")))
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| day.name    | opponents                                                                                                                                                                                                                       |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| "Sunday"    | ["Liverpool","Everton","Southampton","Liverpool","Newcastle United","Chelsea","Manchester City","Swansea City","Tottenham Hotspur"]                                                                                             |
| "Wednesday" | ["Southampton","West Ham United","Newcastle United"]                                                                                                                                                                            |
| "Monday"    | ["Everton"]                                                                                                                                                                                                                     |
| "Saturday"  | ["Reading","Fulham","Wigan Athletic","Tottenham Hotspur","Stoke City","Arsenal","Queens Park Rangers","Sunderland","West Bromwich Albion","Norwich City","Reading","Aston Villa","Norwich City","Fulham","Queens Park Rangers"] |
| "Tuesday"   | ["Wigan Athletic"]                                                                                                                                                                                                              |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
5 rows

The way we’ve got the opponents is a bit of a hack – the name of the two teams is in the ‘name’ property of a game node and we’ve removed ‘Manchester United’ and the word ‘vs’ to get the opponent’s name.

I thought it’d be cool if we could separate the games on each day based on whether Manchester United were playing at home or away.

With a lot of help from Wes Freeman we ended up with the following query which does the job:

START team = node:teams('name:"Manchester United"')
MATCH team-[h:home_team|away_team]-game-[:on_day]-day 
WITH day.name as d, game, team, h 
MATCH team-[:home_team|away_team]-game-[:home_team|away_team]-opp 
WITH d, COLLECT([type(h),opp.name]) AS games 
RETURN d, 
  EXTRACT(c in FILTER(x in games: HEAD(x) = "home_team") : HEAD(TAIL(c))) AS home,   
  EXTRACT(c in FILTER(x in games: HEAD(x) = "away_team") : HEAD(TAIL(c))) AS away

We use a similar approach with COLLECT as in the previous post whereby we have a collection of tuples describing whether Manchester United were at home or not and who they were playing.

A neat thing that Wes pointed out is that since there are only 2 teams per game we’re able to get the opponent node easily because it’s the only other node that can match the ‘home_team|away_team” relationship since we’ve already matched our team.

If we run the query just up to the last WITH we get the following result:

+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| d           | games                                                                                                                                                                                                                                                                                                                                                                                                                                             |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| "Sunday"    | [["home_team","Liverpool"],["home_team","Everton"],["away_team","Southampton"],["away_team","Liverpool"],["away_team","Newcastle United"],["away_team","Chelsea"],["away_team","Manchester City"],["away_team","Swansea City"],["away_team","Tottenham Hotspur"]]                                                                                                                                                                                 |
| "Wednesday" | [["home_team","Southampton"],["home_team","West Ham United"],["home_team","Newcastle United"]]                                                                                                                                                                                                                                                                                                                                                    |
| "Monday"    | [["away_team","Everton"]]                                                                                                                                                                                                                                                                                                                                                                                                                         |
| "Saturday"  | [["home_team","Reading"],["home_team","Fulham"],["home_team","Wigan Athletic"],["home_team","Tottenham Hotspur"],["home_team","Stoke City"],["home_team","Arsenal"],["home_team","Queens Park Rangers"],["home_team","Sunderland"],["home_team","West Bromwich Albion"],["home_team","Norwich City"],["away_team","Reading"],["away_team","Aston Villa"],["away_team","Norwich City"],["away_team","Fulham"],["away_team","Queens Park Rangers"]] |
| "Tuesday"   | [["away_team","Wigan Athletic"]]                                                                                                                                                                                                                                                                                                                                                                                                                  |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
5 rows

We then use the FILTER function to choose either the opponents Manchester United played at home or away and then we use the EXTRACT function to get the opponent from the tuple:

+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| d           | home                                                                                                                                                      |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| "Sunday"    | ["Liverpool","Everton"]                                                                                                                                   |
| "Wednesday" | ["Southampton","West Ham United","Newcastle United"]                                                                                                      |
| "Monday"    | []                                                                                                                                                        |
| "Saturday"  | ["Reading","Fulham","Wigan Athletic","Tottenham Hotspur","Stoke City","Arsenal","Queens Park Rangers","Sunderland","West Bromwich Albion","Norwich City"] |
| "Tuesday"   | []                                                                                                                                                        |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
5 rows
 
+-----------------------------------------------------------------------------------------------------------------------------+
| d           | away                                                                                                          |
+-----------------------------------------------------------------------------------------------------------------------------+
| "Sunday"    | ["Southampton","Liverpool","Newcastle United","Chelsea","Manchester City","Swansea City","Tottenham Hotspur"] |
| "Wednesday" | []                                                                                                            |
| "Monday"    | ["Everton"]                                                                                                   |
| "Saturday"  | ["Reading","Aston Villa","Norwich City","Fulham","Queens Park Rangers"]                                       |
| "Tuesday"   | ["Wigan Athletic"]                                                                                            |
+-----------------------------------------------------------------------------------------------------------------------------+

(I ran the query twice alternating between the last two lines so that it’s readable here. In actual fact the away teams would be in a column next to the home teams)

I thought it was quite interesting how many games Manchester United play away on a Sunday – I think all of those games were probably televised so I thought they’d be more evenly split between home and away matches. Adding televised matches is perhaps another layer to add to the graph.

It’s probably more useful to summarise how many games were played on each day at home and away rather than who they’re against and we can use the REDUCE function to do this:

START team = node:teams('name:"Manchester United"')
MATCH team-[h:home_team|away_team]-game-[:on_day]-day 
WITH day.name as dayName, game, team, h 
MATCH team-[:home_team|away_team]-game-[:home_team|away_team]-opp 
WITH dayName, COLLECT([type(h),opp.name]) AS games 
RETURN dayName, 
  REDUCE(homeGames=0, game in EXTRACT(c in FILTER(x in games: head(x) = "home_team") : HEAD(TAIL(c))) : homeGames + 1) as home,   
  REDUCE(awayGames=0, game in EXTRACT(c in FILTER(x in games: head(x) = "away_team") : HEAD(TAIL(c))) : awayGames + 1) as away,
  REDUCE(totalGames=0, game in games : totalGames + 1) as total
+-----------------------------------+
| dayName     | home | away | total |
+-----------------------------------+
| "Sunday"    | 2    | 7    | 9     |
| "Wednesday" | 3    | 0    | 3     |
| "Monday"    | 0    | 1    | 1     |
| "Saturday"  | 10   | 5    | 15    |
| "Tuesday"   | 0    | 1    | 1     |
+-----------------------------------+
5 rows

An alternative way of writing the initial query would be the following which Michael Hunger suggested on the thread:

START team = node:teams('name:"Manchester United"')
MATCH p=team-[:home_team|away_team]-game-[:home_team|away_team]-(), game-[:on_day]-day
WITH day.name as dayName, COLLECT([LAST(p), HEAD(RELS(p))]) AS opponents
WITH dayName,  
  EXTRACT(y in FILTER(x in opponents: TYPE(HEAD(TAIL(x))) = "home_team") : HEAD(y)) AS home,
  EXTRACT(y in FILTER(x in opponents : TYPE(HEAD(TAIL(x))) = "away_team") : HEAD(y)) AS away
RETURN dayName, 
  EXTRACT(team in home: team.name) AS homeOpponents,
  EXTRACT(team in away: team.name) AS awayOpponents
ORDER BY dayName

Here we take a slightly different approach where we make use of functions that we can apply to a matching path. We create a collection of tuples where LAST(p) matches the opponent node and HEAD(RELS(p)) matches the ‘home_team’ or ‘away_team’ relationship accordingly.

We then filter the collection to find the times that we played at home and away. This is done by taking the second value from the tuple and then calling TYPE on it which either returns ‘home_team’ or ‘away_team’. We then extract the first value from the tuple which is the opponent node.

In the last part of the query we extract the name from the opponent nodes.

Extract Database

Published at DZone with permission of Mark Needham, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

Related

  • Extract Information From PDF Invoice
  • Why Database Migrations Take Months and How to Speed Them Up
  • Unmasking Entity-Based Data Masking: Best Practices 2025
  • How Trustworthy Is Big Data?

Partner Resources

×

Comments

The likes didn't load as expected. Please refresh the page and try again.

ABOUT US

  • About DZone
  • Support and feedback
  • Community research
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 100
  • Nashville, TN 37211
  • support@dzone.com

Let's be friends: