Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Neo4j: LOAD CSV - Handling Conditionals

DZone's Guide to

Neo4j: LOAD CSV - Handling Conditionals

· Java Zone
Free Resource

Build vs Buy a Data Quality Solution: Which is Best for You? Gain insights on a hybrid approach. Download white paper now!

While building up the Neo4j World Cup Graph I’ve been making use of the LOAD CSV function and I frequently found myself needing to do different things depending on the value in one of the columns.

For example I have one CSV file which contains the different events that can happen in a football match:

match_id,player,player_id,time,type
"1012","Antonin Panenka","174835",21,"penalty"
"1012","Faisal Al Dakhil","2204",57,"goal"
"102","Roger Milla","79318",106,"goal"
"102","Roger Milla","79318",108,"goal"
"102","Bernardo Redin","44555",115,"goal"
"102","Andre Kana-biyik","174649",44,"yellow"

If the type is ‘penalty’, ‘owngoal’ or ‘goal’ then I want to create a SCORED_GOAL relationship whereas if it’s ‘yellow’, ‘yellowred’ or ‘red’ then I want to create a RECEIVED_CARD relationship instead.

I learnt – from reading a cypher script written by Chris Leishman – that we can make FOREACH mimic a conditional by creating a collection with one item in to represent ‘true’ and an empty collection to represent ‘false’.

In this case we’d end up with something like this to handle the case where a row represents a goal:

LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/mneedham/neo4j-worldcup/master/data/import/events.csv" AS csvLine
 
// removed for conciseness
 
// goals
FOREACH(n IN (CASE WHEN csvLine.type IN ["penalty", "goal", "owngoal"] THEN [1] else [] END) |
  FOREACH(t IN CASE WHEN team = home THEN [home] ELSE  END |
    MERGE (stats)-[:SCORED_GOAL]->(penalty:Goal {time: csvLine.time, type: csvLine.type})
  )		
)

And equally when we want to process a row that represents a card we’d have this:

// cards
FOREACH(n IN (CASE WHEN csvLine.type IN ["yellow", "red", "yellowred"] THEN [1] else [] END) |
  FOREACH(t IN CASE WHEN team = home THEN [home] ELSE [away] END |
    MERGE (stats)-[:RECEIVED_CARD]->(card {time: csvLine.time, type: csvLine.type})
  )		
)

And if we put everything together we get this:

USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/mneedham/neo4j-worldcup/master/data/import/events.csv" AS csvLine
 
MATCH (home)<-[:HOME_TEAM]-(match:Match {id: csvLine.match_id})-[:AWAY_TEAM]->(away)
 
MATCH (player:Player {id: csvLine.player_id})-[:IN_SQUAD]->(squad)<-[:NAMED_SQUAD]-(team)
MATCH (player)-[:STARTED|:SUBSTITUTE]->(stats)-[:IN_MATCH]->(match)
 
// goals
FOREACH(n IN (CASE WHEN csvLine.type IN ["penalty", "goal", "owngoal"] THEN [1] else [] END) |
  FOREACH(t IN CASE WHEN team = home THEN [home] ELSE [away] END |
    MERGE (stats)-[:SCORED_GOAL]->(penalty:Goal {time: csvLine.time, type: csvLine.type})
  )		
)
 
// cards
FOREACH(n IN (CASE WHEN csvLine.type IN ["yellow", "red", "yellowred"] THEN [1] else [] END) |
  FOREACH(t IN CASE WHEN team = home THEN [home] ELSE [away] END |
    MERGE (stats)-[:RECEIVED_CARD]->(card {time: csvLine.time, type: csvLine.type})
  )		
)
;

You can have a look at the [a href="https://github.com/mneedham/neo4j-worldcup/blob/master/data/import/loadEvents.cyp"]code on github or follow the instructions to get all the World Cup graph into your own local Neo4j.

Feedback welcome as always.


Build vs Buy a Data Quality Solution: Which is Best for You? Maintaining high quality data is essential for operational efficiency, meaningful analytics and good long-term customer relationships. But, when dealing with multiple sources of data, data quality becomes complex, so you need to know when you should build a custom data quality tools effort over canned solutions. Download our whitepaper for more insights into a hybrid approach.

Topics:

Published at DZone with permission of Mark Needham, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}