Neo4j With Scala: Migrating Data
PostgreSQL? Sure. Cassandra? No problem. Oracle? You're covered. Learn how you can migrate your data from a variety of databases to Neo4j.
Join the DZone community and get the full member experience.
Join For FreeHello, folks!
Let's continue our journey to Neo4j with Scala. Earlier, we discussed the use of Neo4j with Scala and Neo4j APOC with Scala. In this blog, we are going to discuss how we can migrate data from another database, such as MySQL, PostgreSQL, Oracle, and Cassandra.
But before starting the journey, to those who caught the train late, this is what happened up until now:
- Getting Started Neo4j with Scala: An Introduction.
- Neo4j with Scala: Defining User Defined Procedures and APOC.
So now we have some basic understanding of Neo4j and APOC with Scala. Now, let's get started.
We use many databases for storing the data. But when we have lots of data and tables, after time, it becomes tough to make queries and execute them on the database. We have to be extra cautious when performing these tasks, and it gets boring to see same screen again and again.
We can use Neo4j for this and keep it simple and fresh in the process. Before starting the discussion, we have to keep the Neo4j APOC Kit (download and install) in the $Neo4j_Home/plugins folder. Now, let's get to it.
PostgreSQL
When we use PostgreSQL, we have to download the JDBC .jar file (download), keep it in the $Neo4j_Home/plugins folder, and restart Neo4j. After restarting the Neo4j server, we are ready to migrate the data.
- Now we load the driver with the APOC.
CALL apoc.load.driver('org.postgresql.Driver');
- Now we create the call for fetching the data from the PostgreSQL, where we have a table named employee_details to Neo4j.
with 'jdbc:postgresql://localhost:5432/testdb?user=postgres&password=postgres' as url
CALL apoc.load.jdbc(url,'employee_details') YIELD row
RETURN count(*);
- If we don’t want to use these steps, than we can provide URL in $Neo4j_Home/conf/neo4j.conf and restart the server:
apoc.jdbc.postgresql_url.url=jdbc:postgresql://localhost:5432/testdb?user=postgres&password=postgres
We can now directly fetch data. We don’t need to load the driver, either.
CALL apoc.load.jdbc('postgresql_url','employee_details') YIELD row
RETURN count(*);
- Create nodes and relations in the data.
/**
* Here we define the schema and key. In first column, we define those column_name
* that can be null and in the second we those column name which we want unique.
*/
CALL apoc.schema.assert( {Detail:['name','age','address','salary']},
{Detail:['id']});
/**
* Here we load data in the neo4j and create node with the help of schema which we define
* earlier.
*/
CALL apoc.load.jdbc('jdbc:postgresql://localhost:5432/testdb?user=postgres&password=postgres','employee_details') yield row
CREATE (t:Detail {id:toString(row.id), name:row.name,
age:toString(row.age), address:row.address, salary:toString(row.salary)})
return t;
MySQL
If we want to migrate data from MySQL, we have to download the JDBC .jar file (download), keep it in $Neo4j_Home/plugins, and update $Neo4j_Home/conf/neo4j.conf:
apoc.jdbc.mysql_url.url=jdbc:mysql://localhost:3306/test?user=user&password=pass
Restart the Neo4j server and we are set to migrate the data.
- We hit the MySQL and start fetching data and perform count operation.
CALL apoc.load.jdbc('mysql_url','employee_data') yield row
RETURN count(*);
Cassandra
Now we want migrate data from Cassandra to Neo4j. First, import data into Cassandra. If we don’t have data in Cassandra, we can use it for testing.
- We have to run the following command for the initial setup in Cassandra:
url -OL https://raw.githubusercontent.com/neo4j-contrib/neo4j-cassandra-connector/master/db_gen/playlist.cql
curl -OL https://raw.githubusercontent.com/neo4j-contrib/neo4j-cassandra-connector/master/db_gen/artists.csv
curl -OL https://raw.githubusercontent.com/neo4j-contrib/neo4j-cassandra-connector/master/db_gen/songs.csv
$CASSANDRA_HOME/bin/cassandra
$CASSANDRA_HOME/bin/cqlsh -f playlist.cql
- Now that the data is in, we have to download the JDBC .jar file (download) and keep it in $Neo4j_Home/plugins. We can provide the URL in the $Neo4j_Home/conf/neo4j.conf:
apoc.jdbc.cassandra_songs.url=jdbc:cassandra://localhost:9042/playlist
After restarting the Neo4j server, we're ready to migrate the data.
- We hit the Cassandra button to start fetching data and performing the count operation.
CALL apoc.load.jdbc('cassandra_songs','artists_by_first_letter') yield row
RETURN count(*);
- Let’s create an index, constraints, and relations for the data.
/**
* Here we define schema and key.
*/
CALL apoc.schema.assert(
{Track:['title','length']},
{Artist:['name'],Track:['id'],Genre:['name']});
- Now we will load the data and perform a merge and create operation so that we can create the nodes and the relationships between the node.
/**
* Here we load data in the neo4j and create node with the help of schema which we define
* earlier.
*/
CALL apoc.load.jdbc('cassandra_songs','track_by_artist') yield row
MERGE (a:Artist {name:row.artist})
MERGE (g:Genre {name:row.genre})
CREATE (t:Track {id:toString(row.track_id), title:row.track,
length:row.track_length_in_seconds})
CREATE (a)-[:PERFORMED]->(t)
CREATE (t)-[:GENRE]->(g);
- We can see the relation graph, and it should look something like this:
/**
* For Displaying Performed Relation
*/
MATCH p=()-[r:PERFORMED]->() RETURN p LIMIT 25;
/**
* For Displaying GENRE Relation
*/
MATCH p=()-[r:GENRE]->() RETURN p LIMIT 100;
Oracle
We are in the last, but not least, database to migrate to Neo4j. As usual, download the JDBC .jar file (download), keep it in $Neo4j_Home/plugins, and restart Neo4j. We can provide the URL in $Neo4j_Home/conf/neo4j.conf:
apoc.jdbc.oracle_url.url=jdbc:oracle:thin:user/password@127.0.0.1:1521/XE
- After restarting the Neo4j server, we are set to migrate the data from Oracle to Neo4j. Let's fetch the data from Oracle, where we have a table named employee_details, and bring it to Neo4j. Now we load the driver with the APOC:
CALL apoc.load.jdbc('oracle_url','employee_details') YIELD row
RETURN count(*);
- Let’s create an index, constraints, and relationships for the data:
/**
* Here we define schema and key.
*/
CALL apoc.schema.assert(
{EMPINFO:['name', 'age','salary']},
{EMPINFO:['id'],ADDRESS:['address']});
- Now we will load the data and perform a merge and create operation so that we can create the nodes and the relationships between the nodes.
/**
* Here we load data in the neo4j and create node with the help of schema which we define
* earlier.
*/
CALL apoc.load.jdbc('oracle_url','employee_details') yield row
MERGE (g:ADDRESS {name:row.ADDRESS})
CREATE (t:EMPINFO {id:toString(row.ID), name:row.NAME, age:toString(row.AGE), salary:toString(row.SALARY)})
CREATE (t)-[:LIVE]->(g);
- We can see a relationship graph, which should look something like this:
/**
* For Displaying Performed Relation
*/
MATCH p=()-[r:LIVE]->() RETURN p LIMIT 25;
Now we can see that it is easy to migrate data from other databases to Neo4j.
After importing the data to Neo4j, we have to think about syncing it. We can use the schedule process, which can be time-based and automatically syncs data between the databases. We can also used event-based integration where we will define the events at which we want to update the database.
Published at DZone with permission of Anurag Srivastava, DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.
Comments