
Playing with a Neo4j Batch Importer - Part 1




Data is everywhere… all around us, but sometimes the medium it is stored in can be a problem when analyzing it. Chances are you have a ton of data sitting around in a relational database in your current application… or you have begged, borrowed or scraped to get the data from somewhere and now you want to use Neo4j to find how this data is related.

Michael Hunger wrote a batch importer to load CSV data quickly, but for some reason it hasn’t received a lot of love. We’re going to change that today, and I’m going to walk you through getting your data out of tables and into nodes and edges.

Let’s clone the project and jump in.

git clone git://github.com/jexp/batch-import.git
cd batch-import


It uses Maven, so if you haven’t already, go ahead and install it.

sudo apt-get install maven2


Now let’s assemble the project per the instructions:

mvn clean compile assembly:single


If you did it right, you should see:

[INFO] ------------------------------------------------------------------------
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 47 seconds
[INFO] Finished at: Tue Feb 28 15:50:14 UTC 2012
[INFO] Final Memory: 13M/33M
[INFO] ------------------------------------------------------------------------


Awesome… let’s create some test data. Michael packed in a data generator, so let’s compile it and run it.

javac ./src/test/java/TestDataGenerator.java -d .
java TestDataGenerator

It will take a little while, and then you should see this:

Creating 7500000 and 41242882 Relationships took 13 seconds.


Really? Where?

ls -al
-rw-r--r--  1 max max  111388909 2012-02-28 16:11 nodes.csv
-rw-r--r--  1 max max 1217775358 2012-02-28 16:11 rels.csv


So what’s in nodes.csv?

head -5 nodes.csv

Node    Rels    Property
0       4       TEST
1       0       TEST
2       1       TEST
3       1       TEST


The format is property_1, property_2, property_3, separated by tabs. And what about rels.csv?

head -5 rels.csv

Start   Ende    Type    Property
5496772 6842185 FIVE    Property
7416995 6166503 FOUR    Property
6712458 6853172 THREE   Property
1291639 296708  TWO     Property

The format is start node reference, end node reference, relationship type, property_1, also separated by tabs.
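
If you want to try the importer on a handful of your own rows first, the two layouts above are easy to reproduce in a few lines of Python. This is a minimal sketch, not part of the batch importer: the helper names `write_tsv` and `read_tsv` and the sample rows are made up for illustration, and it assumes that a header row followed by tab-separated values, as in the `head` output, is the whole format.

```python
import csv
import os
import tempfile


def write_tsv(path, header, rows):
    """Write a header line and data rows as tab-separated values."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        writer.writerow(header)
        writer.writerows(rows)


def read_tsv(path):
    """Return (header, rows) read back from a tab-separated file."""
    with open(path, newline="") as f:
        reader = csv.reader(f, delimiter="\t")
        header = next(reader)
        return header, list(reader)


if __name__ == "__main__":
    out_dir = tempfile.mkdtemp()  # throwaway directory for the sample files
    nodes_path = os.path.join(out_dir, "nodes.csv")
    rels_path = os.path.join(out_dir, "rels.csv")

    # Same columns as the generated nodes.csv; the values are made up.
    write_tsv(nodes_path, ["Node", "Rels", "Property"],
              [[0, 1, "TEST"], [1, 1, "TEST"], [2, 0, "TEST"]])
    # Same columns as the generated rels.csv (the header really says "Ende").
    write_tsv(rels_path, ["Start", "Ende", "Type", "Property"],
              [[0, 1, "ONE", "Property"], [1, 2, "TWO", "Property"]])

    print(read_tsv(nodes_path))
    print(read_tsv(rels_path))
```

Reading the files back is a cheap way to confirm the tabs and header landed where you expect before pointing the importer at them.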

Now we are ready to try out this test data. Run the command:

java -server -Xmx4G -jar target/batch-import-jar-with-dependencies.jar target/db nodes.csv rels.csv 


…and go grab a soda or a cup of coffee, unless you happen to like watching dots on the screen, as this will take a minute or three depending on your hardware. If you are doing this test on an EC2 c1.medium instance it ain’t gonna work (trust me, I know), so do it on a box with at least 4 GB of RAM:

Importing 7500000 Nodes took 17 seconds
Lots of dots....
Importing 41242882 Relationships took 164 seconds
203 seconds
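
If you want to compare runs on different hardware, it helps to turn those timings into rates. Here is a quick sketch that uses only the numbers from the output above (the function name `per_second` is just for illustration):

```python
def per_second(count, seconds):
    """Average number of items processed per second."""
    return count / seconds


nodes, node_secs = 7_500_000, 17
rels, rel_secs = 41_242_882, 164
total_secs = 203

print(f"nodes/s: {per_second(nodes, node_secs):,.0f}")
print(f"rels/s:  {per_second(rels, rel_secs):,.0f}")
# Seconds of the total that the log does not attribute to either phase.
print(f"other:   {total_secs - node_secs - rel_secs} s")
```

On this box that is roughly 440,000 nodes and 250,000 relationships per second, with about 22 seconds of the total not attributed to either phase.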


OK, so where is it?

ls -al target/db

-rw-r--r-- 1 max max   67500025 2012-02-28 08:58 neostore.nodestore.db
-rw-r--r-- 1 max max 1998458182 2012-02-28 08:58 neostore.propertystore.db
-rw-r--r-- 1 max max 1361015130 2012-02-28 08:58 neostore.relationshipstore.db
...and a bunch of other files.
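
The file sizes line up neatly with the counts from the import. Dividing one by the other is only an inference from the numbers in this listing, not something the importer reports, but it suggests fixed-size records in the node and relationship stores:

```python
def bytes_per_record(file_size, record_count):
    """Average bytes per record; ignores any file header or trailer."""
    return file_size / record_count


node_store = 67_500_025      # size of neostore.nodestore.db in bytes
rel_store = 1_361_015_130    # size of neostore.relationshipstore.db in bytes
nodes = 7_500_000
rels = 41_242_882

print(f"node store: {bytes_per_record(node_store, nodes):.3f} bytes/node")
print(f"rel store:  {bytes_per_record(rel_store, rels):.3f} bytes/relationship")
```

That comes out at about 9 bytes per node and 33 bytes per relationship, which is handy for guessing how much disk a bigger import will need.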


Great. Now, assuming you have my Neography gem installed, let’s get a fresh copy of Neo4j and put these files in there.

echo "require 'neography/tasks'" >> Rakefile
rake neo4j:install
mv target/db neo4j/data/graph.db
rake neo4j:start


Go to your Neo4j Dashboard and take a look.

Now everything should be working correctly. In part 2 of this series, I’ll show you how to write some SQL queries to get your data into Neo4j.

