Using APOC With Neo4j
Using APOC (Awesome Procedures On Cypher) with Neo4j can be a blessing for developers. Read on to learn what it is and how to use it with different databases.
As we know, Neo4j pulls developers away from troublesome databases. It does not free you from your old databases overnight, but it does provide support for working with them through predefined procedures.
Relational databases offer advantages such as performance, scalability, productivity, ease of use, and security, and Neo4j provides some amazing tools that deliver the same benefits.
Yes, I am talking about APOC and using APOC with Neo4j. This is a blessing for developers. It provides many predefined procedures and user-defined functions so that we can easily use them and improve our productivity.
APOC stands for Awesome Procedures On Cypher. APOC is a library of procedures for various areas. It was introduced with Neo4j 3.0.
There are many areas in which we can use APOC, including:
- Graph algorithms.
- Metadata.
- Manual indexes and relationship indexes.
- Full-text search.
- Integration with other databases like MongoDB, Elasticsearch, Cassandra, and relational databases.
- Path expansion.
- Import and export.
- Date and time functions.
- Loading of XML and JSON from APIs and files.
- String and text functions.
- Concurrent and batched Cypher execution.
- Spatial functions and locking.
- Collection and map utilities.
There are two ways to get APOC and use it with Neo4j.
First Way
- Download the binary JAR from the latest release.
- Put it into your $Neo4j_Home/plugins/ folder.
- Restart your Neo4j server.
Second Way
- Clone the neo4j-apoc-procedures repository.
- Go to the folder (cd neo4j-apoc-procedures).
- Create a JAR with Maven (mvn clean compile install).
- Copy your JAR file from the target folder to the $Neo4j_Home/plugins/ folder (cp target/apoc-1.0.0-SNAPSHOT.jar $Neo4j_Home/plugins/).
- Restart your Neo4j server.
Now, you are ready to use APOC with Neo4j. We will discuss data migration between the other databases and Neo4j.
We use many databases for storing data. But when we have a large amount of data and many tables, it becomes hard to write queries and execute them on the database, and the work gets both error-prone and tedious. When we work with another database and consider moving to Neo4j, we face the issue of migrating the data into Neo4j. We are going to discuss migrating data from some commonly used databases.
Oracle
Oracle is the first database we migrate to Neo4j. We can download the Oracle JDBC JAR file, put it in $Neo4j_Home/plugins, and restart Neo4j. We can provide the URL in $Neo4j_Home/conf/neo4j.conf as:
apoc.jdbc.oracle_url.url=jdbc:oracle:thin:user/password@127.0.0.1:1521/XE
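The alias entries in this article all follow the same apoc.jdbc.<alias>.url=<JDBC URL> pattern. As a minimal sketch, assuming you would rather script the change than edit neo4j.conf by hand, the Python helper below adds or replaces such a line in the text of a config file. The function name upsert_jdbc_alias and the sample config text are hypothetical and not part of APOC or Neo4j.

```python
def upsert_jdbc_alias(conf_text: str, alias: str, jdbc_url: str) -> str:
    """Return conf_text with exactly one apoc.jdbc.<alias>.url line.

    Any existing line for the same alias is dropped and the new line is
    appended at the end, so calling this twice does not duplicate the entry.
    """
    key = f"apoc.jdbc.{alias}.url"
    new_line = f"{key}={jdbc_url}"
    kept = [
        line
        for line in conf_text.splitlines()
        if not line.strip().startswith(key + "=")
    ]
    kept.append(new_line)
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    # Hypothetical starting content; a real neo4j.conf holds many more settings.
    conf = "# existing settings stay untouched\n"
    conf = upsert_jdbc_alias(
        conf, "oracle_url", "jdbc:oracle:thin:user/password@127.0.0.1:1521/XE"
    )
    print(conf, end="")
```

The helper only transforms text. Reading and writing the actual file, and restarting Neo4j afterwards, are left to the caller.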
After restarting the Neo4j server, we are set to migrate the data from Oracle to Neo4j. We fetch the data from the Oracle table employee_details through the oracle_url alias and first count the rows to check the connection.
CALL apoc.load.jdbc('oracle_url','employee_details') YIELD row
RETURN count(*);
Let’s create indexes and constraints for the data.
/**
* Here we define schema and key.
*/
CALL apoc.schema.assert(
{EMPINFO:['name', 'age','salary']},
{EMPINFO:['id'],ADDRESS:['address']});
Now, we load the data and perform merge and create operations so that we can create the nodes and the relationships between them.
/**
* Here we load data in Neo4j and create a node with the help of the schema that we defined
* earlier.
*/
CALL apoc.load.jdbc('oracle_url','employee_details') YIELD row
// Merge on the 'address' property so the lookup matches the unique constraint defined above.
MERGE (g:ADDRESS {address:row.ADDRESS})
CREATE (t:EMPINFO {id:toString(row.ID), name:row.NAME, age:toString(row.AGE), salary:toString(row.SALARY)})
CREATE (t)-[:LIVE]->(g);
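The difference between MERGE and CREATE in this load is easy to miss. MERGE reuses an address node when one already exists, while CREATE adds a new employee node for every row. The following is a plain-Python model of that behavior only. It is a sketch and not how Neo4j executes the query, and the function name load_rows and the sample rows are made up for illustration.

```python
def load_rows(rows):
    """Model the load: one ADDRESS per distinct value, one EMPINFO per row."""
    addresses = set()  # models MERGE: an address is added only if it is missing
    employees = []     # models CREATE: every row adds a new employee node
    lives = []         # one LIVE relationship per row (employee id, address)
    for row in rows:
        addresses.add(row["ADDRESS"])
        employee = {
            "id": str(row["ID"]),
            "name": row["NAME"],
            "age": str(row["AGE"]),
            "salary": str(row["SALARY"]),
        }
        employees.append(employee)
        lives.append((employee["id"], row["ADDRESS"]))
    return addresses, employees, lives


if __name__ == "__main__":
    # Hypothetical rows shaped like the employee_details table.
    sample = [
        {"ID": 1, "NAME": "Asha", "AGE": 30, "SALARY": 5000, "ADDRESS": "Delhi"},
        {"ID": 2, "NAME": "Ravi", "AGE": 41, "SALARY": 7000, "ADDRESS": "Delhi"},
        {"ID": 3, "NAME": "Meera", "AGE": 28, "SALARY": 6500, "ADDRESS": "Pune"},
    ]
    addresses, employees, lives = load_rows(sample)
    print(len(addresses), len(employees), len(lives))
```

With these three sample rows the model ends up with two addresses, three employees, and three relationships, because two employees share an address.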
We can display the resulting relationship graph with the following query:
/**
* For Displaying Performed Relation
*/
MATCH p=()-[r:LIVE]->() RETURN p LIMIT 25;
MySQL
We want to migrate data from MySQL. Like before, we have to download the JDBC JAR file, put it in $Neo4j_Home/plugins, and update $Neo4j_Home/conf/neo4j.conf with:
apoc.jdbc.mysql_url.url=jdbc:mysql://localhost:3306/test?user=user&password=pass
Restart the Neo4j server. Now, we are set to migrate the data from MySQL to Neo4j.
We query MySQL, fetch the data, and perform the count operation.
CALL apoc.load.jdbc('mysql_url','employee_data') yield row
RETURN count(*);
PostgreSQL
When we use PostgreSQL, we have to download the JDBC JAR file, put it in $Neo4j_Home/plugins, and restart Neo4j. After restarting the Neo4j server, we are set to migrate the data from PostgreSQL to Neo4j.
Now, we load the driver with APOC.
CALL apoc.load.driver('org.postgresql.Driver');
Now, we create the call for fetching the data from the PostgreSQL table employee_details into Neo4j.
with 'jdbc:postgresql://localhost:5432/testdb?user=postgres&password=postgres' as url
CALL apoc.load.jdbc(url,'employee_details') YIELD row
RETURN count(*);
If we don’t want to use these steps, then we can provide the URL in $Neo4j_Home/conf/neo4j.conf and restart the server:
apoc.jdbc.postgresql_url.url=jdbc:postgresql://localhost:5432/testdb?user=postgres&password=postgres
We can now fetch data directly. We don’t need to load the driver.
CALL apoc.load.jdbc('postgresql_url','employee_details') YIELD row
RETURN count(*);
Create the schema and the nodes for the data.
/**
* Here we define schema and key. The first map lists, per label, the properties
* to index; the second lists the properties that we want to be unique.
*/
CALL apoc.schema.assert( {Detail:['name','age','address','salary']},
{Detail:['id']});
/**
* Here we load data in Neo4j and create a node with the help of schemas which we defined
* earlier.
*/
CALL apoc.load.jdbc('jdbc:postgresql://localhost:5432/testdb?user=postgres&password=postgres','employee_details') yield row
CREATE (t:Detail {id:toString(row.id), name:row.name,
age:toString(row.age), address:row.address, salary:toString(row.salary)})
return t;
Cassandra
Now, we migrate data from Cassandra to Neo4j. If we don’t have any data in Cassandra yet, we first import some sample data, which we can also use for tests.
We have to run the following commands to set up the initial data in Cassandra:
curl -OL https://raw.githubusercontent.com/neo4j-contrib/neo4j-cassandra-connector/master/db_gen/playlist.cql
curl -OL https://raw.githubusercontent.com/neo4j-contrib/neo4j-cassandra-connector/master/db_gen/artists.csv
curl -OL https://raw.githubusercontent.com/neo4j-contrib/neo4j-cassandra-connector/master/db_gen/songs.csv
$CASSANDRA_HOME/bin/cassandra
$CASSANDRA_HOME/bin/cqlsh -f playlist.cql
We have set up our Cassandra database with the data. We have to download the JDBC JAR file and put it in $Neo4j_Home/plugins. We can provide the URL in $Neo4j_Home/conf/neo4j.conf as:
apoc.jdbc.cassandra_songs.url=jdbc:cassandra://localhost:9042/playlist
Restart the Neo4j server. Now, we are set to migrate the data from Cassandra to Neo4j.
We query Cassandra, fetch the data, and perform the count operation.
CALL apoc.load.jdbc('cassandra_songs','artists_by_first_letter') yield row
RETURN count(*);
Let’s create indexes and constraints for the data.
/**
* Here we define schema and key.
*/
CALL apoc.schema.assert(
{Track:['title','length']},
{Artist:['name'],Track:['id'],Genre:['name']});
Now, we will load the data and perform merge and create operations so that we can create the nodes and the relationships between them.
/**
* Here we load data in Neo4j and create nodes with the help of the schema that we defined
* earlier.
*/
CALL apoc.load.jdbc('cassandra_songs','track_by_artist') yield row
MERGE (a:Artist {name:row.artist})
MERGE (g:Genre {name:row.genre})
CREATE (t:Track {id:toString(row.track_id), title:row.track,
length:row.track_length_in_seconds})
CREATE (a)-[:PERFORMED]->(t)
CREATE (t)-[:GENRE]->(g);
We can display the resulting relationship graph with the following query:
/**
* For Displaying Performed Relation
*/
MATCH p=()-[r:PERFORMED]->() RETURN p LIMIT 25;
After importing the data into Neo4j, we have to think about keeping the data in sync. We can use a time-based scheduling process that automatically syncs data between the databases. We can also use event-based integration, where we define the events on which we want to update the database.
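One common way to keep a time-based sync cheap is to remember when the last run happened and to reload only the rows that changed since then. The Python sketch below shows that selection step on in-memory rows. It assumes the source table has an updated_at column, which the tables in this article do not necessarily have, and the function name select_changed is hypothetical.

```python
from datetime import datetime


def select_changed(rows, last_sync):
    """Return the rows modified after last_sync and the watermark for the next run."""
    changed = [row for row in rows if row["updated_at"] > last_sync]
    # If nothing changed, keep the old watermark so no later change is skipped.
    next_sync = max((row["updated_at"] for row in changed), default=last_sync)
    return changed, next_sync


if __name__ == "__main__":
    # Hypothetical source rows; only 'updated_at' matters for the selection.
    source = [
        {"id": 1, "updated_at": datetime(2017, 1, 1)},
        {"id": 2, "updated_at": datetime(2017, 1, 5)},
    ]
    changed, watermark = select_changed(source, datetime.min)
    print(len(changed), watermark)
```

A scheduled job could call this on every run, pass the changed rows to the load step, and store the returned watermark for the next run.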
Note: As discussed above, if you do not add the JDBC URL under an alias in $Neo4j_Home/conf/neo4j.conf, then you have to load the driver in Neo4j and pass the full JDBC URL in the query. Otherwise, you only have to provide the alias in the query.
This is a basic example of using APOC. It is also the first step toward using Neo4j and replacing your old databases with it without losing your data. After migrating the data, you are ready to use Neo4j with the data that existed in the old databases.
Published at DZone with permission of Anurag Srivastava, DZone MVB. See the original article here.