Neo4j 2.0: Importing Data the Spreadsheet Way!

[This post was originally written by Pernilla Lindh]

Hi all graphistas out there,

And happy new year! I hope you had an excellent start; let's keep this year rocking with a spirit of graph-love! Our Rik Van Bruggen did a lovely blog post in March last year on how to import data into Neo4j using spreadsheets. It is simple and easy to understand, but only covers Neo4j version 1.9.3. Now it’s a new year, and in December we launched a shiny new version of Neo4j, the 2.0.0 release! Baadadadaam! So I thought I'd better provide an update to his blog post, in the spirit of his work. (Thank you Rik!)

You can still use the Neo4j CSV batch-importer (now for 2.0.0) from Michael Hunger, or look at other Data Import Options.

If you simply want to use Cypher, Rik’s way is much easier. That’s why I have updated Rik’s old Cypher statements in a new spreadsheet that shows how to import into Neo4j 2.0.0.

How does it work?

Open the spreadsheet.
 

The sheet is composed of two parts:

  • columns A, B and C: these contain the data for the Nodes of our graph, using a custom “id”, a “name”, and a “gender” as properties.

  • columns F, G and H: these contain the data for the Relationships of our graph, having a “from-id” (where the relationship starts), a “to-id” (where the relationship ends), and a “relationship type”. Columns F and G reference the nodes by their ids in column A.

And then comes the secret sauce: how to create Cypher statements from this node and relationship information.

For this we use very simple statements that combine the columns mentioned above, the Cypher syntax and string concatenation. Look at the columns D and I:

Nodes

We just use this formula to create the Cypher statement.

 ="MERGE (meetup:Event {id:'"&A3&"', name:'"&B3&"'})"

Instead of CREATE we use MERGE, which is a new feature in 2.0.0: it creates the node only if it does not exist yet, and otherwise creates no new node. You can read more about it in the Neo4j Manual.

Output for row 3:

  MERGE (meetup:Event {id:'153602002', name:'Meetup Malmö'})

The next row looks different. Since we know that all attendees in the sheet attend our meetup, we can create that relationship right away. So we combine the creation of the “Person” node with connecting it to the meetup node we just created.

 ="MERGE (_"&A4&":Person {id:'"&A4&"', name:'"&B4&"', gender:'"&C4&"'})-[:ATTENDS]->(meetup)"

Output for row 4:

 MERGE (_2:Person {id:'2', name:'Donald Duck', gender:'man'})-[:ATTENDS]->(meetup)

As you can see, it takes the id, name and gender properties from columns A, B and C, and puts these into a “MERGE” Cypher statement.
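For readers who would like to see the concatenation outside of a spreadsheet, here is a minimal Python sketch of what the two formulas do. The function names are illustrative assumptions and not part of the spreadsheet. Like the formulas, the sketch does not escape quotes inside the values.

```python
def event_statement(event_id, name):
    """Mirror of the row 3 formula: MERGE the meetup (Event) node."""
    return "MERGE (meetup:Event {id:'" + event_id + "', name:'" + name + "'})"


def person_statement(person_id, name, gender):
    """Mirror of the row 4 formula: MERGE a Person together with its ATTENDS relationship."""
    return (
        "MERGE (_" + person_id + ":Person {id:'" + person_id
        + "', name:'" + name + "', gender:'" + gender
        + "'})-[:ATTENDS]->(meetup)"
    )


# Sample values taken from the outputs shown for rows 3 and 4.
print(event_statement("153602002", "Meetup Malmö"))
print(person_statement("2", "Donald Duck", "man"))
```

Copying the formula down the rows corresponds to calling person_statement once per attendee row.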

Relationships

Originally Rik used the (now legacy) Neo4j-AutoIndex to look up nodes to connect. We can use a schema index and do the same with MATCH.

In this particular dataset we don't have to create an index on our labels and node properties, but since I can, I will show you how.

create index on :Person(id)

When you create an index, you specify the label and the node property that you want to index.

Time to create some more relationships. Let’s look at the Cypher statements that create them.

="WITH 1 as dummy MATCH (p1:Person {id:'"&E4&"'}), (p2:Person {id:'"&F4&"'}) MERGE (p1)-[:"&G4&"]->(p2);"

The WITH 1 as dummy prefix is there for the Neo4j browser, where all the MATCH and MERGE clauses follow each other without separation in a single big query: Cypher requires a WITH between a MERGE and a MATCH that follows it.

Output for row 2:

MATCH (p1:Person {id:'2'}), (p2:Person {id:'5'}) MERGE (p1)-[:WORKS_WITH]->(p2);

This one is a little bit more complicated, as it uses Neo4j’s MATCH statement in order to create the relationship. We first have to look up the start node and the end node using the “id” property. Then the MERGE statement creates the relationship based on the relationship type in column G.
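The same lookup-then-merge pattern can be sketched in Python. This is only an illustration under assumptions: the function names are made up, rows are taken to be (from-id, to-id, relationship type) tuples, and the second sample row is hypothetical. The sketch also shows how the WITH 1 as dummy prefix chains several MATCH and MERGE pairs into one query for the browser.

```python
def relationship_statement(from_id, to_id, rel_type):
    """Look up both Person nodes by id, then MERGE the relationship between them."""
    return (
        "MATCH (p1:Person {id:'" + from_id + "'}), (p2:Person {id:'" + to_id + "'}) "
        + "MERGE (p1)-[:" + rel_type + "]->(p2)"
    )


def browser_query(rows):
    """Chain all statements into one query, with no semicolons in between.

    The WITH 1 as dummy prefix separates each MERGE from the MATCH that follows.
    """
    return "\n".join(
        "WITH 1 as dummy " + relationship_statement(*row) for row in rows
    )


rows = [
    ("2", "5", "WORKS_WITH"),  # the row shown in the output above
    ("5", "2", "KNOWS"),       # hypothetical row, for illustration only
]
print(browser_query(rows))
```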

Then we copy each of the formulas down across all the rows we want to cover.

Having done this, we end up with two columns, each containing a number of Cypher statements. So then what?

The Instructions Sheet

In the first sheet of the spreadsheet, you will find a bunch of instructions. Basically, you need to go through the following steps:

  • download and unzip Neo4j server.

  • copy/paste the Cypher statements from the top part of the Import Sheet into a text file or the browser window directly

  • All these statements form a single large Cypher statement, as the browser can currently only execute a single Cypher statement at a time

  • drag the file into the browser input area and then execute it

  • If you want to use the Neo4j-Shell for importing larger amounts of data, use the approach shown in the second tab titled: “For the shell”

  • It uses one Cypher statement (terminated with a semicolon) per line

  • and a begin / commit block around the statements to speed up the import with a single transaction

  • paste all statements into a file and use bin/neo4j-shell -file import.txt or copy and paste directly into the browser
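The shell variant described in the steps above can be sketched in Python as well. The function name and the sample statements are assumptions for illustration, and the exact layout of the begin and commit lines should be checked against the “For the shell” tab of the spreadsheet.

```python
def shell_script(statements):
    """Wrap the statements in a begin / commit block, one per line.

    Each statement gets exactly one terminating semicolon.
    """
    lines = ["begin"]
    for statement in statements:
        lines.append(statement.rstrip().rstrip(";") + ";")
    lines.append("commit")
    return "\n".join(lines) + "\n"


# Hypothetical sample statements; in practice these come from the spreadsheet.
statements = [
    "MERGE (meetup:Event {id:'153602002', name:'Meetup Malmö'})",
    "MATCH (p1:Person {id:'2'}), (p2:Person {id:'5'}) MERGE (p1)-[:WORKS_WITH]->(p2);",
]
print(shell_script(statements), end="")
```

Saving that output as import.txt gives a file that can be passed to bin/neo4j-shell -file import.txt.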

And there we go: the dataset gets created, and Neo4j is ready for use. I hope this little overview was useful for you - it sure was useful for me when getting my hands dirty for the first time :) …

Note: Make sure you have the newest Java running on your device. You can download it here.

(I made that mistake.)

Time to DIY! Good luck!

Cheers,

Pernilla

Published at DZone with permission of {{ articles[0].authors[0].realName }}, DZone MVB. (source)

Opinions expressed by DZone contributors are their own.
