
Playing with a Neo4j Batch Importer - Part 1



Data is everywhere… all around us, but sometimes the medium it is stored in can be a problem when analyzing it. Chances are you have a ton of data sitting around in a relational database in your current application… or you have begged, borrowed or scraped to get the data from somewhere and now you want to use Neo4j to find how this data is related.

Michael Hunger wrote a batch importer to load CSV data quickly, but for some reason it hasn’t received a lot of love. We’re going to change that today, and I’m going to walk you through getting your data out of tables and into nodes and edges.

Let’s clone the project and jump in.

git clone git://github.com/jexp/batch-import.git
cd batch-import


It uses Maven, so if you haven’t already, go ahead and install it.

sudo apt-get install maven2


Now let’s assemble the project per the instructions:

mvn clean compile assembly:single


If you did it right, you should see:

[INFO] ------------------------------------------------------------------------
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 47 seconds
[INFO] Finished at: Tue Feb 28 15:50:14 UTC 2012
[INFO] Final Memory: 13M/33M
[INFO] ------------------------------------------------------------------------


Awesome… let’s create some test data. Michael packed in a data generator, so let’s compile it and run it.

javac ./src/test/java/TestDataGenerator.java -d .
java TestDataGenerator

It will take a little while, and then you should see this:

Creating 7500000 and 41242882 Relationships took 13 seconds.


Really? Where?

ls -al
-rw-r--r--  1 max max  111388909 2012-02-28 16:11 nodes.csv
-rw-r--r--  1 max max 1217775358 2012-02-28 16:11 rels.csv


So what’s in nodes.csv?

head -5 nodes.csv

Node    Rels    Property
0       4       TEST
1       0       TEST
2       1       TEST
3       1       TEST


The format is one node per line: property_1, property_2, property_3, separated by tabs, with a header row on top. Now rels.csv:

head -5 rels.csv

Start   Ende    Type    Property
5496772 6842185 FIVE    Property
7416995 6166503 FOUR    Property
6712458 6853172 THREE   Property
1291639 296708  TWO     Property

The format is start node reference, end node reference, relationship type, then property_1, also separated by tabs.
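
If you want to hand-roll a tiny data set in the same shape before pointing the importer at your own data, here is a minimal sketch. The sample file names and the count_bad_rows helper are my own inventions for illustration; they are not part of the batch importer. All the helper does is check that every data row has as many tab-separated columns as the header row.

```shell
#!/bin/sh
# Sketch only: sample file names and count_bad_rows are illustrative,
# not something the batch importer ships with.
dir=$(mktemp -d)

# nodes file: a header row, then one tab-separated line per node.
printf 'Node\tRels\tProperty\n0\t1\tTEST\n1\t1\tTEST\n' > "$dir/nodes-sample.csv"

# rels file: start reference, end reference, relationship type, one property.
printf 'Start\tEnde\tType\tProperty\n0\t1\tONE\tProperty\n' > "$dir/rels-sample.csv"

# Print the number of data rows whose column count differs from the header's.
count_bad_rows() {
  awk -F '\t' 'NR == 1 { cols = NF; next } NF != cols { bad++ } END { print bad + 0 }' "$1"
}

bad_nodes=$(count_bad_rows "$dir/nodes-sample.csv")
bad_rels=$(count_bad_rows "$dir/rels-sample.csv")
echo "malformed rows: nodes=$bad_nodes rels=$bad_rels"
```

A stray space where a tab should be is an easy mistake to make when exporting from a database, so a column count like this is cheap insurance before a long import.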

Now we are ready to try out this test data. Run the command:

java -server -Xmx4G -jar target/batch-import-jar-with-dependencies.jar target/db nodes.csv rels.csv 


…and go grab a soda or cup of coffee unless you happen to like watching dots on the screen, as this will take a minute or three depending on your hardware. If you are doing this test on an EC2 c1.medium instance it ain’t gonna work (trust me, I know), so do it on a box with at least 4 GB of RAM:

Importing 7500000 Nodes took 17 seconds
Lots of dots....
Importing 41242882 Relationships took 164 seconds
203 seconds
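
To put those timings in perspective, here is a quick back-of-the-envelope rate calculation. It uses only the counts and seconds printed in the output above; the variable names are mine, and your numbers will differ with your hardware.

```shell
#!/bin/sh
# Figures copied from the importer output above.
nodes=7500000
node_secs=17
rels=41242882
rel_secs=164

# Integer division is precise enough for a rough rate.
nodes_per_sec=$(( nodes / node_secs ))
rels_per_sec=$(( rels / rel_secs ))
echo "about $nodes_per_sec nodes/s and $rels_per_sec relationships/s"
```

On my run that works out to roughly 440 thousand nodes and 250 thousand relationships per second.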


OK, so where is it?

ls -al target/db

-rw-r--r-- 1 max max   67500025 2012-02-28 08:58 neostore.nodestore.db
-rw-r--r-- 1 max max 1998458182 2012-02-28 08:58 neostore.propertystore.db
-rw-r--r-- 1 max max 1361015130 2012-02-28 08:58 neostore.relationshipstore.db
...and a bunch of other files.
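
Those file sizes are not random. As a rough sketch, dividing each store size from the listing by the number of records imported shows the fixed record size the store appears to use. The variable names are mine, and reading the few leftover bytes as header or bookkeeping data is my assumption, not something the importer reports.

```shell
#!/bin/sh
# Counts from the import output, sizes from the ls listing above.
nodes=7500000
rels=41242882
node_store=67500025
rel_store=1361015130

# Whole bytes per record, plus whatever does not divide evenly.
node_record=$(( node_store / nodes ))
rel_record=$(( rel_store / rels ))
node_left=$(( node_store % nodes ))
rel_left=$(( rel_store % rels ))

echo "node record: $node_record bytes, $node_left bytes left over"
echo "relationship record: $rel_record bytes, $rel_left bytes left over"
```

That comes out to 9 bytes per node and 33 bytes per relationship, which is handy for estimating how much disk your own import will need.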


Great. Now, assuming you have my Neography gem installed, let’s get a fresh copy of Neo4j and put these files in there.

echo "require 'neography/tasks'" >> Rakefile
rake neo4j:install
mv target/db neo4j/data/graph.db
rake neo4j:start


Go to your Neo4j Dashboard and take a look.

Now everything should be working correctly. In part 2 of this series, I’ll show you how to write some SQL queries to get your data into Neo4j.

