DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports Events Over 2 million developers have joined DZone. Join Today! Thanks for visiting DZone today,
Edit Profile Manage Email Subscriptions Moderation Admin Console How to Post to DZone Article Submission Guidelines
View Profile
Sign Out
Refcards
Trend Reports
Events
Zones
Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
What's in store for DevOps in 2023? Hear from the experts in our "DZone 2023 Preview: DevOps Edition" on Fri, Jan 27!
Save your seat
  1. DZone
  2. Data Engineering
  3. Databases
  4. Neo4j: Analyzing a CSV File Using LOAD CSV and Cypher

Neo4j: Analyzing a CSV File Using LOAD CSV and Cypher

Want to try to comb through a CSV file for data? Using Neo4j's Cypher and LOAD CSV, you can easily parse the data and pull it up for your own use.

Mark Needham user avatar by
Mark Needham
·
Feb. 21, 17 · Tutorial
Like (1)
Save
Tweet
Share
5.71K Views

Join the DZone community and get the full member experience.

Join For Free

Last week, we ran our first online meetup in several years and I wanted to wanted to analyze the stats that YouTube lets you download for an event.

The file I downloaded looked like this:

$ cat ~/Downloads/youtube_stats_pW9boJoUxO0.csv 
Video IDs:, pW9boJoUxO0, Start time:, Wed Feb 15 08:57:55 2017, End time:, Wed Feb 15 10:03:10 2017
Playbacks, Peak concurrent viewers, Total view time (hours), Average session length (minutes)
348, 112, 97.125, 16.7456896552, 

Country code, AR, AT, BE, BR, BY, CA, CH, CL, CR, CZ, DE, DK, EC, EE, ES, FI, FR, GB, HU, IE, IL, IN, IT, LB, LU, LV, MY, NL, NO, NZ, PK, PL, QA, RO, RS, RU, SE, TR, US, VN, ZA
Playbacks, 2, 2, 1, 14, 1, 10, 2, 1, 1, 1, 27, 1, 1, 1, 3, 1, 25, 54, 1, 4, 6, 8, 1, 1, 1, 1, 1, 23, 1, 1, 1, 1, 1, 1, 2, 6, 22, 1, 114, 1, 1
Peak concurrent viewers, 2, 1, 1, 4, 1, 5, 1, 1, 0, 0, 11, 1, 1, 1, 2, 1, 6, 25, 1, 3, 3, 2, 1, 1, 1, 1, 1, 9, 1, 1, 0, 1, 0, 1, 1, 3, 7, 0, 44, 1, 0
Total view time (hours), 1.075, 0.0166666666667, 0.175, 2.58333333333, 0.00833333333333, 3.01666666667, 0.858333333333, 0.0583333333333, 0.0, 0.0, 8.69166666667, 0.8, 0.0166666666667, 0.0583333333333, 0.966666666667, 0.0166666666667, 4.20833333333, 20.8333333333, 0.00833333333333, 1.39166666667, 1.75, 0.766666666667, 0.00833333333333, 0.15, 0.0333333333333, 1.05833333333, 0.0333333333333, 7.36666666667, 0.0583333333333, 0.916666666667, 0.0, 0.00833333333333, 0.0, 0.00833333333333, 0.4, 1.10833333333, 5.28333333333, 0.0, 32.7333333333, 0.658333333333, 0.0
Average session length (minutes), 32.25, 0.5, 10.5, 11.0714285714, 0.5, 18.1, 25.75, 3.5, 0.0, 0.0, 19.3148148148, 48.0, 1.0, 3.5, 19.3333333333, 1.0, 10.1, 23.1481481481, 0.5, 20.875, 17.5, 5.75, 0.5, 9.0, 2.0, 63.5, 2.0, 19.2173913043, 3.5, 55.0, 0.0, 0.5, 0.0, 0.5, 12.0, 11.0833333333, 14.4090909091, 0.0, 17.2280701754, 39.5, 0.0


$ cat ~/Downloads/youtube_stats_pW9boJoUxO0.csv Video IDs:, pW9boJoUxO0, Start time:, Wed Feb 15 08:57:55 2017, End time:, Wed Feb 15 10:03:10 2017 Playbacks, Peak concurrent viewers, Total view time (hours), Average session length (minutes) 348, 112, 97.125, 16.7456896552, Country code, AR, AT, BE, BR, BY, CA, CH, CL, CR, CZ, DE, DK, EC, EE, ES, FI, FR, GB, HU, IE, IL, IN, IT, LB, LU, LV, MY, NL, NO, NZ, PK, PL, QA, RO, RS, RU, SE, TR, US, VN, ZA Playbacks, 2, 2, 1, 14, 1, 10, 2, 1, 1, 1, 27, 1, 1, 1, 3, 1, 25, 54, 1, 4, 6, 8, 1, 1, 1, 1, 1, 23, 1, 1, 1, 1, 1, 1, 2, 6, 22, 1, 114, 1, 1 Peak concurrent viewers, 2, 1, 1, 4, 1, 5, 1, 1, 0, 0, 11, 1, 1, 1, 2, 1, 6, 25, 1, 3, 3, 2, 1, 1, 1, 1, 1, 9, 1, 1, 0, 1, 0, 1, 1, 3, 7, 0, 44, 1, 0 Total view time (hours), 1.075, 0.0166666666667, 0.175, 2.58333333333, 0.00833333333333, 3.01666666667, 0.858333333333, 0.0583333333333, 0.0, 0.0, 8.69166666667, 0.8, 0.0166666666667, 0.0583333333333, 0.966666666667, 0.0166666666667, 4.20833333333, 20.8333333333, 0.00833333333333, 1.39166666667, 1.75, 0.766666666667, 0.00833333333333, 0.15, 0.0333333333333, 1.05833333333, 0.0333333333333, 7.36666666667, 0.0583333333333, 0.916666666667, 0.0, 0.00833333333333, 0.0, 0.00833333333333, 0.4, 1.10833333333, 5.28333333333, 0.0, 32.7333333333, 0.658333333333, 0.0 Average session length (minutes), 32.25, 0.5, 10.5, 11.0714285714, 0.5, 18.1, 25.75, 3.5, 0.0, 0.0, 19.3148148148, 48.0, 1.0, 3.5, 19.3333333333, 1.0, 10.1, 23.1481481481, 0.5, 20.875, 17.5, 5.75, 0.5, 9.0, 2.0, 63.5, 2.0, 19.2173913043, 3.5, 55.0, 0.0, 0.5, 0.0, 0.5, 12.0, 11.0833333333, 14.4090909091, 0.0, 17.2280701754, 39.5, 0.0

I want to look at the country specific stats so the first 4 lines aren’t interesting to me:

$ tail -n+5 youtube_stats_pW9boJoUxO0.csv > youtube.csv


$ tail -n+5 youtube_stats_pW9boJoUxO0.csv > youtube.csv

I then put the file into the import directory of Neo4j and wrote the following query to return a row representing each country and its score for each of the metrics:

load csv with headers from "file:///youtube.csv" AS row
WITH [key in keys(row) where key <> "Country code"] AS keys, row, row["Country code"] AS heading
UNWIND keys AS key
RETURN key AS country, heading AS key, row[key] AS value

╒═════════╤═══════════╤═══════╕
│"country"│"key"      │"value"│
╞═════════╪═══════════╪═══════╡
│" SE"    │"Playbacks"│"22"   │
├─────────┼───────────┼───────┤
│" GB"    │"Playbacks"│"54"   │
├─────────┼───────────┼───────┤
│" FR"    │"Playbacks"│"25"   │
├─────────┼───────────┼───────┤
│" RS"    │"Playbacks"│"2"    │
├─────────┼───────────┼───────┤
│" LV"    │"Playbacks"│"1"    │
└─────────┴───────────┴───────┘


load csv with headers from "file:///youtube.csv" AS row WITH [key in keys(row) where key <>"Country code"] AS keys, row, row["Country code"] AS heading UNWIND keys AS key RETURN key AS country, heading AS key, row[key] AS value ╒═════════╤═══════════╤═══════╕ │"country"│"key" │"value"│ ╞═════════╪═══════════╪═══════╡ │" SE" │"Playbacks"│"22" │ ├─────────┼───────────┼───────┤ │" GB" │"Playbacks"│"54" │ ├─────────┼───────────┼───────┤ │" FR" │"Playbacks"│"25" │ ├─────────┼───────────┼───────┤ │" RS" │"Playbacks"│"2" │ ├─────────┼───────────┼───────┤ │" LV" │"Playbacks"│"1" │ └─────────┴───────────┴───────┘

Now I want to create a node representing each country and create a property for each of the metrics. Since the property names are going to be dynamic, I’ll make use of the APOC library, which I drop into my plugins directory. I then tweaked the query to create the nodes:

load csv with headers from "https://dl.dropboxusercontent.com/u/14493611/youtube.csv" AS row
WITH [key in keys(row) where key <> "Country code"] AS keys, row, row["Country code"] AS heading
UNWIND keys AS key
WITH key AS country, heading AS key, row[key] AS value
MERGE (c:Country {name: replace(country, " ", "")})
WITH *
CALL apoc.create.setProperty(c, key, toInteger(value))
YIELD node
RETURN COUNT(*)


We can now see which country provided the most viewers:

MATCH (n:Country) 
RETURN n.name, n.Playbacks AS playbacks, n.`Total view time (hours)` AS viewTimeInHours, n.`Peak concurrent viewers` AS peakConcViewers, n.`Average session length (minutes)` AS aveSessionMins
ORDER BY playbacks DESC
LIMIT 10

╒════════╤═══════════╤═════════════════╤═════════════════╤════════════════╕
│"n.name"│"playbacks"│"viewTimeInHours"│"peakConcViewers"│"aveSessionMins"│
╞════════╪═══════════╪═════════════════╪═════════════════╪════════════════╡
│"US"    │"114"      │"32"             │"44"             │"17"            │
├────────┼───────────┼─────────────────┼─────────────────┼────────────────┤
│"GB"    │"54"       │"20"             │"25"             │"23"            │
├────────┼───────────┼─────────────────┼─────────────────┼────────────────┤
│"DE"    │"27"       │"8"              │"11"             │"19"            │
├────────┼───────────┼─────────────────┼─────────────────┼────────────────┤
│"FR"    │"25"       │"4"              │"6"              │"10"            │
├────────┼───────────┼─────────────────┼─────────────────┼────────────────┤
│"NL"    │"23"       │"7"              │"9"              │"19"            │
├────────┼───────────┼─────────────────┼─────────────────┼────────────────┤
│"SE"    │"22"       │"5"              │"7"              │"14"            │
├────────┼───────────┼─────────────────┼─────────────────┼────────────────┤
│"BR"    │"14"       │"2"              │"4"              │"11"            │
├────────┼───────────┼─────────────────┼─────────────────┼────────────────┤
│"CA"    │"10"       │"3"              │"5"              │"18"            │
├────────┼───────────┼─────────────────┼─────────────────┼────────────────┤
│"IN"    │"8"        │"0"              │"2"              │"5"             │
├────────┼───────────┼─────────────────┼─────────────────┼────────────────┤
│"IL"    │"6"        │"1"              │"3"              │"17"            │
└────────┴───────────┴─────────────────┴─────────────────┴────────────────┘


MATCH (n:Country) RETURN n.name, n.Playbacks AS playbacks, n.`Total view time (hours)` AS viewTimeInHours, n.`Peak concurrent viewers` AS peakConcViewers, n.`Average session length (minutes)` AS aveSessionMins ORDER BY playbacks DESC LIMIT 10 ╒════════╤═══════════╤═════════════════╤═════════════════╤════════════════╕ │"n.name"│"playbacks"│"viewTimeInHours"│"peakConcViewers"│"aveSessionMins"│ ╞════════╪═══════════╪═════════════════╪═════════════════╪════════════════╡ │"US" │"114" │"32" │"44" │"17" │ ├────────┼───────────┼─────────────────┼─────────────────┼────────────────┤ │"GB" │"54" │"20" │"25" │"23" │ ├────────┼───────────┼─────────────────┼─────────────────┼────────────────┤ │"DE" │"27" │"8" │"11" │"19" │ ├────────┼───────────┼─────────────────┼─────────────────┼────────────────┤ │"FR" │"25" │"4" │"6" │"10" │ ├────────┼───────────┼─────────────────┼─────────────────┼────────────────┤ │"NL" │"23" │"7" │"9" │"19" │ ├────────┼───────────┼─────────────────┼─────────────────┼────────────────┤ │"SE" │"22" │"5" │"7" │"14" │ ├────────┼───────────┼─────────────────┼─────────────────┼────────────────┤ │"BR" │"14" │"2" │"4" │"11" │ ├────────┼───────────┼─────────────────┼─────────────────┼────────────────┤ │"CA" │"10" │"3" │"5" │"18" │ ├────────┼───────────┼─────────────────┼─────────────────┼────────────────┤ │"IN" │"8" │"0" │"2" │"5" │ ├────────┼───────────┼─────────────────┼─────────────────┼────────────────┤ │"IL" │"6" │"1" │"3" │"17" │ └────────┴───────────┴─────────────────┴─────────────────┴────────────────┘

The United States is unsurprisingly first, followed by the UK, Germany, and France. We ran the meetup at 5 p.m. UK time, so it was a friendly enough time for this side of the globe but not so friendly for Asia or Australia so it’s not too surprising we don’t see anybody from there!

For my last trick, I wanted to see the full names of the countries so I downloaded the 2 digit codes for each country along with their full name.

I then updated my graph:

load csv with headers from "file:///countries.csv" AS row
MATCH (c:Country {name: row.Code})
SET c.fullName = row.Name;


load csv with headers from "file:///countries.csv" AS row MATCH (c:Country {name: row.Code}) SET c.fullName = row.Name;

Now let’s re-run our query and show the country fullnames instead:

MATCH (n:Country) 
RETURN n.fullName, n.Playbacks AS playbacks, n.`Total view time (hours)` AS viewTimeInHours, n.`Peak concurrent viewers` AS peakConcViewers, n.`Average session length (minutes)` AS aveSessionMins
ORDER BY playbacks DESC
LIMIT 10

╒════════════════╤═══════════╤═════════════════╤═════════════════╤════════════════╕
│"n.fullName"    │"playbacks"│"viewTimeInHours"│"peakConcViewers"│"aveSessionMins"│
╞════════════════╪═══════════╪═════════════════╪═════════════════╪════════════════╡
│"United States" │"114"      │"32"             │"44"             │"17"            │
├────────────────┼───────────┼─────────────────┼─────────────────┼────────────────┤
│"United Kingdom"│"54"       │"20"             │"25"             │"23"            │
├────────────────┼───────────┼─────────────────┼─────────────────┼────────────────┤
│"Germany"       │"27"       │"8"              │"11"             │"19"            │
├────────────────┼───────────┼─────────────────┼─────────────────┼────────────────┤
│"France"        │"25"       │"4"              │"6"              │"10"            │
├────────────────┼───────────┼─────────────────┼─────────────────┼────────────────┤
│"Netherlands"   │"23"       │"7"              │"9"              │"19"            │
├────────────────┼───────────┼─────────────────┼─────────────────┼────────────────┤
│"Sweden"        │"22"       │"5"              │"7"              │"14"            │
├────────────────┼───────────┼─────────────────┼─────────────────┼────────────────┤
│"Brazil"        │"14"       │"2"              │"4"              │"11"            │
├────────────────┼───────────┼─────────────────┼─────────────────┼────────────────┤
│"Canada"        │"10"       │"3"              │"5"              │"18"            │
├────────────────┼───────────┼─────────────────┼─────────────────┼────────────────┤
│"India"         │"8"        │"0"              │"2"              │"5"             │
├────────────────┼───────────┼─────────────────┼─────────────────┼────────────────┤
│"Israel"        │"6"        │"1"              │"3"              │"17"            │
└────────────────┴───────────┴─────────────────┴─────────────────┴────────────────┘


MATCH (n:Country) RETURN n.fullName, n.Playbacks AS playbacks, n.`Total view time (hours)` AS viewTimeInHours, n.`Peak concurrent viewers` AS peakConcViewers, n.`Average session length (minutes)` AS aveSessionMins ORDER BY playbacks DESC LIMIT 10 ╒════════════════╤═══════════╤═════════════════╤═════════════════╤════════════════╕ │"n.fullName" │"playbacks"│"viewTimeInHours"│"peakConcViewers"│"aveSessionMins"│ ╞════════════════╪═══════════╪═════════════════╪═════════════════╪════════════════╡ │"United States" │"114" │"32" │"44" │"17" │ ├────────────────┼───────────┼─────────────────┼─────────────────┼────────────────┤ │"United Kingdom"│"54" │"20" │"25" │"23" │ ├────────────────┼───────────┼─────────────────┼─────────────────┼────────────────┤ │"Germany" │"27" │"8" │"11" │"19" │ ├────────────────┼───────────┼─────────────────┼─────────────────┼────────────────┤ │"France" │"25" │"4" │"6" │"10" │ ├────────────────┼───────────┼─────────────────┼─────────────────┼────────────────┤ │"Netherlands" │"23" │"7" │"9" │"19" │ ├────────────────┼───────────┼─────────────────┼─────────────────┼────────────────┤ │"Sweden" │"22" │"5" │"7" │"14" │ ├────────────────┼───────────┼─────────────────┼─────────────────┼────────────────┤ │"Brazil" │"14" │"2" │"4" │"11" │ ├────────────────┼───────────┼─────────────────┼─────────────────┼────────────────┤ │"Canada" │"10" │"3" │"5" │"18" │ ├────────────────┼───────────┼─────────────────┼─────────────────┼────────────────┤ │"India" │"8" │"0" │"2" │"5" │ ├────────────────┼───────────┼─────────────────┼─────────────────┼────────────────┤ │"Israel" │"6" │"1" │"3" │"17" │ └────────────────┴───────────┴─────────────────┴─────────────────┴────────────────┘

And that’s the end of my analysis with no relationships in sight!

Database CSV Neo4j

Published at DZone with permission of Mark Needham, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

Popular on DZone

  • Top 10 Secure Coding Practices Every Developer Should Know
  • A Brief Overview of the Spring Cloud Framework
  • Understanding gRPC Concepts, Use Cases, and Best Practices
  • How to Check Docker Images for Vulnerabilities

Comments

Partner Resources

X

ABOUT US

  • About DZone
  • Send feedback
  • Careers
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 600 Park Offices Drive
  • Suite 300
  • Durham, NC 27709
  • support@dzone.com
  • +1 (919) 678-0300

Let's be friends: