Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Detecting Rogue Spaces in CSV Headers With Load CSV in Neo4j

DZone's Guide to

Detecting Rogue Spaces in CSV Headers With Load CSV in Neo4j

Rogue spaces, or any other null value in your CSV header, can lead to big trouble. See how you can get past them with some code.

· Database Zone
Free Resource

Whether you work in SQL Server Management Studio or Visual Studio, Redgate tools integrate with your existing infrastructure, enabling you to align DevOps for your applications with DevOps for your SQL Server databases. Discover true Database DevOps, brought to you in partnership with Redgate.

Last week, I was helping someone load the data from a CSV file into Neo4j, and we were having trouble filtering out rows which contained a null value in one of the columns.

This is what the data looked like:

load csv with headers from "file:///foo.csv" as row
RETURN row
╒══════════════════════════════════╕
│row                               │
╞══════════════════════════════════╡
│{key1: a,  key2: (null),  key3: c}│
├──────────────────────────────────┤
│{key1: d,  key2: e,  key3: f}     │
└──────────────────────────────────┘

We’d like to filter out any rows which have ‘key2’ as null, so let’s tweak our query to do that:

load csv with headers from "file:///foo.csv" as row
WITH row WHERE NOT row.key2 is null
RETURN row

Hmmm, that’s odd, it’s got rid of both rows. We’d expect to see the second row, since that doesn’t have a null value.

At this point, we might suspect that what we’re seeing on the screen isn’t actually what the data looks like. Let’s write the following query to check our header values:

load csv with headers from "file:///foo.csv" as row
WITH row LIMIT 1
UNWIND keys(row) AS key
RETURN key, SIZE(key)
╒═════╤═════════╕
│key  │SIZE(key)│
╞═════╪═════════╡
│key1 │4        │
├─────┼─────────┤
│ key2│5        │
├─────┼─────────┤
│ key3│5        │
└─────┴─────────┘

The second column tells us that there are some extra characters in the columns for ‘key2’ and ‘key3’ or rather ‘ key2’ and ‘ key3’. In this case, they are spaces, but it could easily be another character:

load csv with headers from "file:///foo.csv" as row
WITH row LIMIT 1
UNWIND keys(row) AS key
RETURN key, replace(key, " ", "_SPACE_") AS spaces
╒═════╤═══════════╕
│key  │spaces     │
╞═════╪═══════════╡
│key1 │key1       │
├─────┼───────────┤
│ key2│_SPACE_key2│
├─────┼───────────┤
│ key3│_SPACE_key3│
└─────┴───────────┘

If we clean up our CSV file and try, again everything works as expected:

load csv with headers from "file:///foo.csv" as row
WITH row LIMIT 1
UNWIND keys(row) AS key
RETURN key, SIZE(key)
╒════╤═════════╕
│key │SIZE(key)│
╞════╪═════════╡
│key1│4        │
├────┼─────────┤
│key2│4        │
├────┼─────────┤
│key3│4        │
└────┴─────────┘
load csv with headers from "file:///foo.csv" as row
WITH row WHERE NOT row.key2 is null
RETURN row
╒═══════════════════════════╕
│row                        │
╞═══════════════════════════╡
│{key1: d, key2: e, key3: f}│
└───────────────────────────┘

It’s easier than you think to extend DevOps practices to SQL Server with Redgate tools. Discover how to introduce true Database DevOps, brought to you in partnership with Redgate

Topics:
csv ,database ,neo4j ,headers

Published at DZone with permission of Mark Needham, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

The best of DZone straight to your inbox.

SEE AN EXAMPLE
Please provide a valid email address.

Thanks for subscribing!

Awesome! Check your inbox to verify your email so you can start receiving the latest in tech news and resources.
Subscribe

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}