Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Neo4j With Scala: Migrating Data

DZone's Guide to

Neo4j With Scala: Migrating Data

PostgreSQL? Sure. Cassandra? No problem. Oracle? You're covered. Learn how you can migrate your data from a variety of databases to Neo4j.

· Database Zone
Free Resource

What if you could learn how to use MongoDB directly from the experts, on your schedule, for free? We've put together the ultimate guide for learning MongoDBSign up and you'll receive instructions for how to get started!

Hello, folks!

Let's continue our journey to Neo4j with Scala. Earlier, we discussed the use of Neo4j with Scala and Neo4j APOC with Scala. In this blog, we are going to discuss how we can migrate data from another database, such as MySQL, PostgreSQL, Oracle, and Cassandra.

But before starting the journey, to those who caught the train late, this is what happened up until now:

  1. Getting Started Neo4j with Scala: An Introduction.
  2. Neo4j with Scala: Defining User Defined Procedures and APOC.

So now we have some basic understanding of Neo4j and APOC with Scala. Now, let's get started.

We use many databases for storing the data. But when we have lots of data and tables, after time, it becomes tough to make queries and execute them on the database. We have to be extra cautious when performing these tasks, and it gets boring to see same screen again and again.

We can use Neo4j for this and keep it simple and fresh in the process. Before starting the discussion, we have to keep the Neo4j APOC Kit (download and install) in the $Neo4j_Home/plugins folder. Now, let's get to it.

PostgreSQL

When we use PostgreSQL, we have to download the JDBC .jar file (download), keep it in the $Neo4j_Home/plugins folder, and restart Neo4j. After restarting the Neo4j server, we are ready to migrate the data.

  • Now we load the driver with the APOC.
CALL apoc.load.driver('org.postgresql.Driver');


  • Now we create the call for fetching the data from the PostgreSQL, where we have a table named employee_details to Neo4j.
with 'jdbc:postgresql://localhost:5432/testdb?user=postgres&password=postgres' as url
CALL apoc.load.jdbc(url,'employee_details') YIELD row
RETURN count(*);


  • If we don’t want to use these steps, than we can provide URL in $Neo4j_Home/conf/neo4j.conf and restart the server:
apoc.jdbc.postgresql_url.url=jdbc:postgresql://localhost:5432/testdb?user=postgres&password=postgres


We can now directly fetch data. We don’t need to load the driver, either.

CALL apoc.load.jdbc('postgresql_url','employee_details') YIELD row
RETURN count(*);


  • Create nodes and relations in the data.
/**
* Here we define the schema and key. In first column, we define those column_name
* that can be null and in the second we those column name which we want unique.
*/

CALL apoc.schema.assert( {Detail:['name','age','address','salary']},
{Detail:['id']});

/**
* Here we load data in the neo4j and create node with the help of schema which we define
* earlier.
*/

CALL apoc.load.jdbc('jdbc:postgresql://localhost:5432/testdb?user=postgres&password=postgres','employee_details') yield row
CREATE (t:Detail {id:toString(row.id), name:row.name,
age:toString(row.age), address:row.address, salary:toString(row.salary)})
return t;


Screenshot from 2016-09-09 11:09:37.png

screenshot-from-2016-09-09-010005

MySQL

If we want to migrate data from MySQL, we have to download the JDBC .jar file (download), keep it in $Neo4j_Home/plugins, and update $Neo4j_Home/conf/neo4j.conf:

apoc.jdbc.mysql_url.url=jdbc:mysql://localhost:3306/test?user=user&password=pass


Restart the Neo4j server and we are set to migrate the data.

  • We hit the MySQL and start fetching data and perform count operation.
CALL apoc.load.jdbc('mysql_url','employee_data') yield row
RETURN count(*);


Screenshot from 2016-09-10 12:36:34 (copy).png

Cassandra

Now we want migrate data from Cassandra to Neo4j. First, import data into Cassandra. If we don’t have data in Cassandra, we can use it for testing.

  • We have to run the following command for the initial setup in Cassandra:
url -OL https://raw.githubusercontent.com/neo4j-contrib/neo4j-cassandra-connector/master/db_gen/playlist.cql
curl -OL https://raw.githubusercontent.com/neo4j-contrib/neo4j-cassandra-connector/master/db_gen/artists.csv
curl -OL https://raw.githubusercontent.com/neo4j-contrib/neo4j-cassandra-connector/master/db_gen/songs.csv
$CASSANDRA_HOME/bin/cassandra
$CASSANDRA_HOME/bin/cqlsh -f playlist.cql


  • Now that the data is in, we have to download the JDBC .jar file (download) and keep it in $Neo4j_Home/plugins. We can provide the URL in the $Neo4j_Home/conf/neo4j.conf:
apoc.jdbc.cassandra_songs.url=jdbc:cassandra://localhost:9042/playlist


After restarting the Neo4j server, we're ready to migrate the data.

  • We hit the Cassandra button to start fetching data and performing the count operation.
CALL apoc.load.jdbc('cassandra_songs','artists_by_first_letter') yield row
RETURN count(*);


screenshot-from-2016-09-09-130005

  • Let’s create an index, constraints, and relations for the data.
/**
* Here we define schema and key.
*/
CALL apoc.schema.assert(
    {Track:['title','length']},
    {Artist:['name'],Track:['id'],Genre:['name']});


Screenshot from 2016-09-09 13:12:25.png

  • Now we will load the data and perform a merge and create operation so that we can create the nodes and the relationships between the node.
/**
* Here we load data in the neo4j and create node with the help of schema which we define
* earlier.
*/
CALL apoc.load.jdbc('cassandra_songs','track_by_artist') yield row
MERGE (a:Artist {name:row.artist})
MERGE (g:Genre {name:row.genre})
CREATE (t:Track {id:toString(row.track_id), title:row.track,
length:row.track_length_in_seconds})
CREATE (a)-[:PERFORMED]->(t)
CREATE (t)-[:GENRE]->(g);


Screenshot from 2016-09-09 13:13:28.png

  • We can see the relation graph, and it should look something like this:
/**
* For Displaying Performed Relation
*/

MATCH p=()-[r:PERFORMED]->() RETURN p LIMIT 25;


Performed.png

/**
* For Displaying GENRE Relation
*/

MATCH p=()-[r:GENRE]->() RETURN p LIMIT 100;


Screenshot from 2016-09-09 14:11:41.png

Oracle

We are in the last, but not least, database to migrate to Neo4j. As usual, download the JDBC .jar file (download), keep it in $Neo4j_Home/plugins, and restart Neo4j. We can provide the URL in $Neo4j_Home/conf/neo4j.conf:

apoc.jdbc.oracle_url.url=jdbc:oracle:thin:user/password@127.0.0.1:1521/XE


  • After restarting the Neo4j server, we are set to migrate the data from Oracle to Neo4j. Let's fetch the data from Oracle, where we have a table named employee_details, and bring it to Neo4j. Now we load the driver with the APOC:
CALL apoc.load.jdbc('oracle_url','employee_details') YIELD row
RETURN count(*);


screenshot-from-2016-09-10-123634

  • Let’s create an index, constraints, and relationships for the data:
/**
* Here we define schema and key.
*/
CALL apoc.schema.assert(
    {EMPINFO:['name', 'age','salary']},
    {EMPINFO:['id'],ADDRESS:['address']});


Screenshot from 2016-09-10 14:53:00.png

  • Now we will load the data and perform a merge and create operation so that we can create the nodes and the relationships between the nodes.
/**
* Here we load data in the neo4j and create node with the help of schema which we define
* earlier.
*/
CALL apoc.load.jdbc('oracle_url','employee_details') yield row
MERGE (g:ADDRESS {name:row.ADDRESS})
CREATE (t:EMPINFO {id:toString(row.ID), name:row.NAME, age:toString(row.AGE), salary:toString(row.SALARY)})
CREATE (t)-[:LIVE]->(g);


screenshot-from-2016-09-10-145400

  • We can see a relationship graph, which should look something like this:
/**
* For Displaying Performed Relation
*/

MATCH p=()-[r:LIVE]->() RETURN p LIMIT 25;


screenshot-from-2016-09-10-145506

Now we can see that it is easy to migrate data from other databases to Neo4j.

After importing the data to Neo4j, we have to think about syncing it. We can use the schedule process, which can be time-based and automatically syncs data between the databases. We can also used event-based integration where we will define the events at which we want to update the database.

What if you could learn how to use MongoDB directly from the experts, on your schedule, for free? We've put together the ultimate guide for learning MongoDBSign up and you'll receive instructions for how to get started!

Topics:
neo technology ,database ,cassanda ,mysql ,data migration

Published at DZone with permission of Anurag Srivastava, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

THE DZONE NEWSLETTER

Dev Resources & Solutions Straight to Your Inbox

Thanks for subscribing!

Awesome! Check your inbox to verify your email so you can start receiving the latest in tech news and resources.

X

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}