Introducing the Neo4j to OrientDB Importer
If you're looking to move your graphs from Neo4j to OrientDB, the latter has come up with a tool that should smooth over your database migration.
You could have several reasons in mind when considering moving databases. For instance, when it comes to migrating from Neo4j to OrientDB, it could be because of:
A more permissive Apache 2 license, which allows OrientDB Community Edition to be free for any usage, even commercial.
More data models, including the document model, the object model (with support for inheritance and polymorphism), and the reactive model.
Features like SQL for your queries; different schema modes (schema-less, schema-full, and schema-hybrid); additional constraints and index types, including full-text and spatial indexes and indexes on multiple properties; a more feature-rich Graph Editor, which not only allows you to visualize your graph but also to add, edit, and delete vertices and edges; dynamic triggers, sequences, and sharding.
A more feature-rich Community Edition, which, for instance, does include high availability, existence constraints, and user and role management.
Alternatives to Neo4j's limitations, including the space reclaim issue and lack of multitenancy.
A wide set of security features, including record-level security, auditing, support for Kerberos, and encryption at rest.
Transparent pricing and price model.
On top of all this, you get what you might expect from a graph/document multi-model database, including index-free adjacency, TinkerPop Standard Compliance, ACID transactions, server-side functions, REST API, a set of connectors, and a web GUI.
Once you have decided that you want to experiment with, or perform an actual migration to, OrientDB, you will need to handle a data migration.
In this article, we will introduce the Neo4j to OrientDB Importer, a new, easy-to-use tool that allows you to import a Neo4j graph database into OrientDB in a few simple steps.
In case you are coming from the RDBMS world, you may want to check out the OrientDB Teleporter, a tool that allows you to import or synchronize your relational data to OrientDB.
Let's start this article with a quick introduction of OrientDB.
OrientDB
OrientDB is an open-source, distributed, graph/document multi-model database. It is licensed under Apache 2, a permissive, commercial-friendly license, which means that OrientDB is free for any use, including commercial, with no hidden restrictions.
OrientDB is actually a fully featured open source product because its Community Edition includes a wide set of features, including multi-master replication, user and role management, and existence constraints.
OrientDB has full native graph capabilities coupled with features that you normally find only in document databases, so it allows you to combine the power of graphs and the flexibility of documents into a single, scalable, high-performance operational database.
But OrientDB is even more than this: It is a multi-model database. Multi-model means that it supports different data models. In particular, the data models supported by OrientDB are:
Graph
Document
Key/Value
Object
As a multi-model database, OrientDB allows you to handle a wider set of use cases, and to solve many different data problems within a single product. As a result, the complexity linked with dealing with different Database Management Systems in a polyglot persistence scenario is reduced.
This preserves the advantage of polyglot persistence (access to different data models) while removing its limitations.
To implement the different data models, OrientDB does not make use of layers or interfaces to the database engine. It is a true multi-model database that combines the features of the four data models and builds them into the core. As a result, OrientDB is faster and better optimized than a solution that makes use of additional layers.
OrientDB is available in two Editions: Community and Enterprise.
The Enterprise Edition is built on top of the OrientDB Community Edition and provides additional features and tools, including visual server management and monitoring, enhanced HA features (like multiple datacenter replication and visual HA configuration and monitoring), a query profiler, scheduled, hot, and incremental backups, alerts, auditing, enhanced security, and the aforementioned Teleporter tool.
In addition to more features and tools, OrientDB's Enterprise Edition includes Commercial Licenses and 24x7 Commercial Support Services.
Neo4j to OrientDB Migration Strategies
At OrientDB Ltd, the private company that officially leads the development of the OrientDB Open Source Project, we want to give you an easy and straightforward process when you migrate your data from Neo4j.
The legacy migration method made use of GraphML, an XML-based file format for graphs. However, this method did not perform well for big graphs and had some limitations. We therefore asked ourselves what we could do as a company to make your migration easier, and decided to create a new Importer.
Starting from OrientDB version 2.2.13, using the new Neo4j to OrientDB Importer is the preferred way to migrate from Neo4j.
Note:
If your data is in CSV format, you can migrate to OrientDB using the OrientDB ETL tool.
If you are using an RDBMS, you can migrate using the Teleporter.
The Neo4j to OrientDB Importer
The Neo4j to OrientDB Importer is an open source, GPL-licensed command line tool that can help you automate and simplify your migration from Neo4j.
The Importer is written in Java and uses the Neo4j Java API to interact with Neo4j and the OrientDB Java Graph API to interact with OrientDB.
Imported Neo4j items are the following:
Nodes
Relationships
Constraints
Indexes
Installation
The Importer is provided as an external plugin for the OrientDB Server, and is available as a ZIP or tar.gz archive. You can download it from Maven Central or from the OrientDB website.
To install the plugin, unpack the archive into your OrientDB server directory. Make sure that the version of your OrientDB Server and the version of the Importer are the same, and upgrade your OrientDB server if necessary.
On Linux systems, to unpack the archive, you can use a command similar to the following:
tar xfv orientdb-neo4j-importer-VERSION.tar.gz -C path_to_orientDB/ --strip-components=1
In this case, VERSION is the version you are using.
Migration Scenarios
A typical scenario of a migration done with the Neo4j to OrientDB Importer consists of the following steps:
A copy of the Neo4j database directory (typically graph.db) is created in a safe place.
OrientDB is installed.
The Neo4j to OrientDB Importer is installed.
The migration process is started from the command line, passing to the Neo4j to OrientDB Importer the copy of the Neo4j's database directory created earlier.
OrientDB (embedded or server) is started and the newly imported graph database can be used.
Notes:
Since currently only exclusive, local connections are allowed, there must be no running servers on the Neo4j database directory or on the target OrientDB import directory during the migration.
As an alternative to creating a copy of the Neo4j database directory, and in case you can schedule a Neo4j shutdown, you can:
Shut down your Neo4j Server
Start the migration by passing the original Neo4j database directory to the Neo4j to OrientDB Importer (a good practice is to create a backup first anyway)
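As a minimal sketch of the copy step described above, the following Python snippet copies a Neo4j database directory to a separate location before the import. The function name copy_neo4j_store and the example paths are hypothetical, and the sketch assumes nothing is writing to the source directory while it is being copied.

```python
import shutil
from pathlib import Path


def copy_neo4j_store(source_dir: str, backup_dir: str) -> Path:
    """Copy a Neo4j database directory (for example graph.db) to backup_dir.

    Hypothetical helper: it assumes no server is writing to source_dir
    while the copy is in progress.
    """
    source = Path(source_dir)
    target = Path(backup_dir)
    if not source.is_dir():
        raise FileNotFoundError(f"Neo4j database directory not found: {source}")
    if target.exists():
        # Refuse to overwrite, so an earlier copy is never replaced silently.
        raise FileExistsError(f"Backup target already exists: {target}")
    shutil.copytree(source, target)
    return target
```

The returned path is the directory you would then pass to the Importer as the Neo4j database directory.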
Usage
After installation, the Neo4j to OrientDB Importer can be launched using the provided orientdb-neo4j-importer.sh script (or orientdb-neo4j-importer.bat for Windows systems).
Syntax
OrientDB-Neo4j-Importer
-neo4jlibdir <neo4jlibdir>
-neo4jdbdir <neo4jdbdir>
[-odbdir <odbdir>]
[-o true | false]
Where:
neo4jlibdir (mandatory option) is the full path to the Neo4j lib directory (e.g. D:\neo4j\neo4j-community-3.0.6\lib). On Windows systems, this parameter must be the first passed parameter.
neo4jdbdir (mandatory option) is the full path to the Neo4j graph database directory (e.g. D:\neo4j\neo4j-community-3.0.6\data\databases\graph.db).
odbdir (optional) is the full path to a directory where the Neo4j database will be migrated. The directory will be created by the import tool. In case the directory already exists, the Neo4j to OrientDB Importer will behave according to the value of the option o (see below). The default value of odbdir is $ORIENTDB_HOME/databases/neo4j_import.
o (optional). If true, the odbdir directory will be overwritten, if it exists. If false and the odbdir directory exists, a warning will be printed and the program will exit. The default value of o is false.
Note: If the Neo4j to OrientDB Importer is launched without parameters, it fails because -neo4jlibdir and -neo4jdbdir are mandatory.
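To make the option rules above concrete, here is a small Python sketch that assembles the argument list for the Linux/Mac launcher. The helper build_importer_command is hypothetical and not part of the Importer; only the script name and the four options come from the syntax described above.

```python
from typing import List, Optional


def build_importer_command(
    neo4j_lib_dir: str,
    neo4j_db_dir: str,
    odb_dir: Optional[str] = None,
    overwrite: Optional[bool] = None,
    script: str = "./orientdb-neo4j-importer.sh",
) -> List[str]:
    """Assemble the Importer command line as a list of arguments.

    Hypothetical helper: it only mirrors the documented options.
    """
    if not neo4j_lib_dir or not neo4j_db_dir:
        # Both options are mandatory; the Importer fails without them.
        raise ValueError("-neo4jlibdir and -neo4jdbdir are mandatory")
    # -neo4jlibdir goes first, which is also what Windows requires.
    command = [script, "-neo4jlibdir", neo4j_lib_dir, "-neo4jdbdir", neo4j_db_dir]
    if odb_dir is not None:
        command += ["-odbdir", odb_dir]
    if overwrite is not None:
        command += ["-o", "true" if overwrite else "false"]
    return command
```

Leaving odb_dir and overwrite unset keeps the Importer's own defaults ($ORIENTDB_HOME/databases/neo4j_import and false).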
Example
A typical import command looks like the following (please adapt the value of the -neo4jlibdir and -neo4jdbdir parameters to your specific case):
Windows:
orientdb-neo4j-importer.bat ^
-neo4jlibdir="D:\neo4j\neo4j-community-3.0.7\lib" ^
-neo4jdbdir="D:\neo4j\neo4j-community-3.0.7\data\databases\graph.db"
Linux/Mac:
./orientdb-neo4j-importer.sh \
-neo4jlibdir /neo4j/neo4j-community-3.0.7/lib \
-neo4jdbdir /neo4j/neo4j-community-3.0.7/data/databases/graph.db
Migration Details
Internally, the Neo4j to OrientDB Importer uses Neo4j's Java API to read the graph database from Neo4j and OrientDB's Java API to store the graph in OrientDB.
The import consists of four phases:
Phase 1: Initialization of the Neo4j and OrientDB servers
Phase 2: Migration of nodes and relationships
Phase 3: Schema migration
Phase 4: Shutdown of the servers and summary info
General Migration Details
The following are some general migration details that are good to keep in mind:
During the import, OrientDB's Write Ahead Log (WAL) and WAL_SYNC_ON_PAGE_FLUSH are disabled, and OrientDB is prepared for massive inserts (OIntentMassiveInsert).
In case a node in Neo4j has no Label, it will be imported in OrientDB into the Class "GenericClassNeo4jConversion".
Starting from version 2.2.14, in case a node in Neo4j has multiple Labels, it will be imported into the Class "MultipleLabelNeo4jConversion". Before 2.2.14, only the first Label was imported.
The list of original Neo4j Labels is stored as a property in each imported OrientDB vertex (property: "Neo4jLabelList").
During the import, a non-unique index is created on the property "Neo4jLabelList". This allows you to query by Label even over nodes migrated into the single Class "MultipleLabelNeo4jConversion", using queries like:
SELECT FROM V WHERE Neo4jLabelList CONTAINS 'your_label_here'
or the equivalent with the MATCH syntax:
MATCH {class: V, as: your_alias, where: (Neo4jLabelList CONTAINS 'your_label')} RETURN your_alias
Original Neo4j IDs are stored as properties in the imported OrientDB vertices and edges (Neo4jNodeID for vertices and Neo4jRelID for edges). Such properties can be (manually) removed at the end of the import, if not needed.
During the import, an OrientDB index is created on the property Neo4jNodeID for all imported vertex classes (node Labels in Neo4j). This speeds up vertex lookup during edge creation. The created indexes can be (manually) removed at the end of the import, if not needed.
In case a Neo4j Relationship has the same name as a Neo4j Label, e.g. "RelationshipName", the Neo4j to OrientDB Importer will import that relationship into OrientDB in the Class E_RelationshipName (i.e. prefixing the Neo4j RelationshipType with an E_).
During the creation of properties in OrientDB, Neo4j Char data type is mapped to a String data type.
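The class-naming rules listed above can be summarized in a short Python sketch. This is not the Importer's actual code (the Importer is written in Java): the function names are hypothetical, the rules reflect version 2.2.14 as described here, and comparing relationship names to Labels with exact, case-sensitive matching is an assumption.

```python
from typing import Iterable, List

GENERIC_CLASS = "GenericClassNeo4jConversion"
MULTI_LABEL_CLASS = "MultipleLabelNeo4jConversion"


def vertex_class_for(labels: List[str]) -> str:
    """Return the OrientDB vertex Class a node with these Labels lands in."""
    if not labels:
        return GENERIC_CLASS       # node without Labels
    if len(labels) == 1:
        return labels[0]           # Class named after the single Label
    return MULTI_LABEL_CLASS       # node with multiple Labels (2.2.14 and later)


def edge_class_for(relationship_type: str, all_labels: Iterable[str]) -> str:
    """Return the OrientDB edge Class for a Neo4j relationship type."""
    # Assumption: the name clash is detected with an exact string comparison.
    if relationship_type in set(all_labels):
        return "E_" + relationship_type
    return relationship_type
```

For example, vertex_class_for(["Person", "Employee"]) yields "MultipleLabelNeo4jConversion", while the original Labels remain available through the Neo4jLabelList property.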
Details on Schema Migration
The following are some schema-specific migration details that are good to keep in mind:
If in Neo4j there are no constraints or indexes, and if we exclude the properties and indexes created for internal purposes (Neo4jNodeID, Neo4jRelID, Neo4jLabelList and corresponding indexes), the imported OrientDB database is schemaless.
If in Neo4j there are constraints or indexes, the imported OrientDB database is schema-hybrid (with some properties defined). In particular, for any constraint and index:
The Neo4j property on which the constraint or index is defined is determined.
A corresponding property is created in OrientDB (hence the schema-hybrid mode).
If a Neo4j unique constraint is found, a corresponding unique index is created in OrientDB.
In case the creation of the unique index fails, a non-unique index will be created instead. Note: this failure can happen, by design, when migrating nodes that have multiple Labels, as they are imported into a single vertex Class.
If a Neo4j index is found, a corresponding (non-unique) OrientDB index is created.
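The fallback from a unique to a non-unique index can be illustrated with a small Python sketch. The Importer's internal logic is not shown in this article, so the sketch simply models "creation of the unique index fails" as "the class already contains duplicate values for the property", which is an assumption. The function name is hypothetical; UNIQUE and NOTUNIQUE are OrientDB index types.

```python
from typing import Hashable, Iterable


def index_type_for_constraint(values: Iterable[Hashable]) -> str:
    """Pick the index type for a property that had a Neo4j unique constraint.

    Assumption: a unique index cannot be built when duplicates are present,
    in which case a non-unique index is used as the fallback.
    """
    seen = set()
    for value in values:
        if value in seen:
            return "NOTUNIQUE"   # duplicate found: fall back
        seen.add(value)
    return "UNIQUE"
```

Under this assumption, values that were unique per Label in Neo4j may collide once nodes with multiple Labels share a single vertex Class, which would explain the fallback.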
Migration Best Practices
Below are some migration best practices.
Check if you are using Labels with the same name but different case, e.g. LABEL and LAbel, and if you really need them. If the correct Label is Label, change LABEL and LAbel to Label in the original Neo4j database before the import. If you really cannot change them, be aware that with the current version of the Neo4j to OrientDB Importer such nodes will be aggregated into a single OrientDB vertex Class.
Check if you are using relationships with the same name but different case, e.g. relaTIONship and RELATIONSHIP, and if you really need them. If the correct relationship is Relationship, change relaTIONship and RELATIONSHIP to Relationship before the import. If you really cannot change them, be aware that with the current version of the Neo4j to OrientDB Importer such relationships will be aggregated into a single OrientDB edge Class.
Check your constraints and indexes before starting the import. Sometimes you have more constraints or indexes than needed, e.g. old ones that you created on Labels that you are not using anymore. These constraints will be migrated as well, so a best practice is to check that you have defined, in Neo4j, only those that you really want to import. To check constraints and indexes in Neo4j, you can type :schema in the Browser and then click on the "play" icon. Please delete the items you do not need.
Check if you are using nodes with multiple Labels, and if you really need more than one Label on them. Be aware that with the current version of the Neo4j to OrientDB Importer such nodes with multiple Labels will be imported into a single OrientDB Class ("MultipleLabelNeo4jConversion").
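The case check recommended for Labels and relationship names can be automated once you have a list of names. The following Python sketch groups names that differ only by case; the function name is hypothetical, and the list of names is assumed to have been exported from Neo4j beforehand by whatever means you prefer.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def find_case_collisions(names: Iterable[str]) -> Dict[str, List[str]]:
    """Group Label or relationship names that differ only by letter case."""
    groups = defaultdict(set)
    for name in names:
        groups[name.lower()].add(name)
    # Keep only groups with more than one spelling: these would be
    # aggregated into a single OrientDB Class by the import.
    return {
        key: sorted(variants)
        for key, variants in groups.items()
        if len(variants) > 1
    }
```

An empty result means no names would be merged because of case differences.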
Migration Tuning, Monitoring, and Troubleshooting
The parameter -XX:MaxDirectMemorySize=4g is hardcoded inside the start scripts orientdb-neo4j-importer.sh and orientdb-neo4j-importer.bat.
Depending on the amount of available memory on your system, you may want to increase this value.
During the migration, for each type of imported Neo4j item (nodes, relationships, constraints, and indexes), a completion percentage is written to the shell from which the import was started, allowing you to monitor progress.
A log file is created as well. The log can be found at path_to_orientDB/log/orientdb-neo4j-importer.log. For large imports, a best practice is to monitor the produced import log, using a program like tail, e.g.
tail -n 100 -f path_to_orientDB/log/orientdb-neo4j-importer.log
In case of problems, details of the errors that occurred are written to the migration log file. You can use this file to troubleshoot the migration, or to open an issue.
Connecting to the Newly Imported Database
After the migration process, you may start an OrientDB server using the server.sh or server.bat scripts.
You can connect to the newly imported database through Studio or the Console, using one of OrientDB's default database users, e.g. the user admin with password admin.
Please secure your database by removing the default users, if you don't need them, or by creating new users.
For further information on using OrientDB, please refer to the Getting Started Guide.
Query Strategies
This section includes a few strategies that you can use to query your data after the import.
First of all, please be aware that in OrientDB you can query your data using either SQL or pattern matching. In case you are familiar with Neo4j's Cypher query language, it may be easier for you to use our pattern matching (have a look at the MATCH syntax for more details). However, keep in mind that depending on your specific use case, OrientDB's SQL can be of great help.
Counting All Nodes
To count all nodes (vertices):
Neo4j's Cypher | OrientDB's SQL
MATCH (n) RETURN count(n) | SELECT COUNT(*) FROM V
Counting All Relationships
To count all relationships (edges):
Neo4j's Cypher | OrientDB's SQL
MATCH ()-->() RETURN count(*) | SELECT COUNT(*) FROM E
Querying Nodes by Original Neo4j ID
If you would like to query nodes by their original Neo4j Node ID, you can use the property Neo4jNodeID, which is created automatically for you during the import, and indexed as well.
To query a node that belongs to a specific Class with name ClassName, you can execute a query like:
SELECT FROM ClassName WHERE Neo4jNodeID = your_id_here
To query a node regardless of the Class where it has been included in, you can use a query like:
SELECT FROM V WHERE Neo4jNodeID = your_id_here
Querying Relationships by Original Neo4j ID
The strategy to query relationships by their original Neo4j Relationship ID will be improved in an upcoming hotfix (see GitHub Issue #9, which also includes a workaround).
Querying Nodes by Original Neo4j Labels
In case the original nodes have just one Label, they are migrated into an OrientDB Class whose name equals the Neo4j Label name. In this simple case, to query nodes by Label you can execute a query like the following:
Neo4j's Cypher | OrientDB's SQL
MATCH (n:LabelName) RETURN n | SELECT FROM LabelName
or using the MATCH syntax:
MATCH {class: LabelName, as: n} RETURN n
More generally speaking, since the original Neo4j Label is stored inside the property Neo4jLabelList, to query imported nodes (vertices) using their original Neo4j Label, you can use queries like the following:
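Neo4j's Cypher | OrientDB's SQL
MATCH (n:LabelName) RETURN n | SELECT FROM V WHERE Neo4jLabelList CONTAINS 'LabelName'
or using the MATCH syntax:
MATCH {class: V, as: n, where: (Neo4jLabelList CONTAINS 'LabelName')} RETURN n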
This is, in particular, the strategy that must be followed in case the original Neo4j's nodes have multiple Labels (and are hence migrated into the single OrientDB Class "MultipleLabelNeo4jConversion").
Note that the property Neo4jLabelList has an index on it.
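If you need to generate these Label queries for many Labels, a small helper can build the query strings for you. The Python sketch below is only an illustration: the function name and the alias n are hypothetical, and it produces text to paste into Studio or the Console rather than talking to OrientDB itself.

```python
from typing import Tuple


def label_queries(label: str) -> Tuple[str, str]:
    """Build the SQL and MATCH queries that select vertices by Neo4j Label."""
    if "'" in label:
        # Quoting rules are out of scope for this sketch.
        raise ValueError("labels containing single quotes are not handled here")
    sql = "SELECT FROM V WHERE Neo4jLabelList CONTAINS '" + label + "'"
    match = (
        "MATCH {class: V, as: n, where: (Neo4jLabelList CONTAINS '"
        + label
        + "')} RETURN n"
    )
    return sql, match
```

Both queries rely on the Neo4jLabelList property, so they also work for nodes imported into "MultipleLabelNeo4jConversion".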
Limitations
Every tool has some limitations. The following are the limitations of the current version (2.2.14) of the Neo4j to OrientDB Importer:
Currently only local migrations are allowed.
Schema limitations:
As we saw, in case a node in Neo4j has multiple Labels, it will be imported into a single OrientDB Class ("MultipleLabelNeo4jConversion").
Note that the information about the original set of Labels is not lost but stored into an internal property of the imported vertex ("Neo4jLabelList"). As a result, it will be possible to query nodes with a specific Neo4j Label. Note also that the nodes imported into the single class "MultipleLabelNeo4jConversion" can then be moved to other Classes, according to your specific needs, using the MOVE VERTEX command.
Neo4j Nodes with the same Label but different case, e.g. LABEL and LAbel, will be aggregated into a single OrientDB vertex Class.
Neo4j Relationships with the same name but different case, e.g. relaTIONship and RELATIONSHIP, will be aggregated into a single OrientDB edge Class.
Migration of Neo4j's Existence Constraints (only available in the Neo4j Enterprise Edition) is currently not implemented.
Note that future versions of the Importer may have solved some limitations; please check the latest version of the official documentation for up-to-date information on this topic.
Migration Examples
More and more users and data scientists are using the Neo4j to OrientDB Importer to migrate their data to OrientDB, with interesting use-cases.
In this article we wanted to focus more on giving a general overview of the Importer, rather than presenting specific migration examples. This video provides details on how to migrate the Panama Papers database. Other Articles with in-depth explanations of real-case migrations may follow this one.
However, we wanted to include in this section a few details about the migration of the Neo4j example database northwind.
Assuming that:
You already downloaded and installed OrientDB and the Neo4j to OrientDB Importer (details on how to install OrientDB can be found here)
/home/santo/neo4j/neo4j-community-3.0.7/lib is the full path to the directory that includes the Neo4j libraries
/home/santo/neo4j/data/graph.db_northwind is the full path to the directory that contains the Neo4j northwind database
/home/santo/orientdb/orientdb-community-2.2.14/databases/northwind_import is the full path to the directory where you would like to migrate the northwind database
No Neo4j and OrientDB servers are running on those directories
You can import the northwind database into OrientDB with a command similar to the following:
./orientdb-neo4j-importer.sh \
-neo4jlibdir /home/santo/neo4j/neo4j-community-3.0.7/lib \
-neo4jdbdir /home/santo/neo4j/data/graph.db_northwind \
-odbdir /home/santo/orientdb/orientdb-community-2.2.14/databases/northwind_import
The following is the output that is written by the Neo4j to OrientDB Importer during this migration:
Neo4j to OrientDB Importer v.2.2.14 - Copyrights (c) 2016 OrientDB LTD
WARNING: 'o' option not found. Defaulting to 'false'.
Please make sure that there are no running servers on:
'/home/santo/neo4j/data/graph.db_northwind' (Neo4j)
and:
'/home/santo/orientdb/orientdb-community-2.2.14/databases/northwind' (OrientDB)
Initializing Neo4j...Done
Initializing OrientDB...Done
Importing Neo4j database:
'/home/santo/neo4j/data/graph.db_northwind'
into OrientDB database:
'/home/santo/orientdb/orientdb-community-2.2.14/databases/northwind'
Getting all Nodes from Neo4j and creating corresponding Vertices in OrientDB...
1035 OrientDB Vertices have been created (100% done)
Done
Creating internal Indices on properties 'Neo4jNodeID' & 'Neo4jLabelList' on all OrientDB Vertices Classes...
10 OrientDB Indices have been created (100% done)
Done
Getting all Relationships from Neo4j and creating corresponding Edges in OrientDB...
3139 OrientDB Edges have been created (100% done)
Done
Getting Constraints from Neo4j and creating corresponding ones in OrientDB...
0 OrientDB UNIQUE Indices have been created
Done
Getting Indices from Neo4j and creating corresponding ones in OrientDB...
5 OrientDB Indices have been created (100% done)
Done
Import completed!
Shutting down OrientDB...Done
Shutting down Neo4j...Done
===============
Import Summary:
===============
- Found Neo4j Nodes : 1035
-- With at least one Label : 1035
--- With multiple Labels : 0
-- Without Labels : 0
- Imported OrientDB Vertices : 1035 (100%)
- Found Neo4j Relationships : 3139
- Imported OrientDB Edges : 3139 (100%)
- Found Neo4j Constraints : 0
- Imported OrientDB Constraints (UNIQUE Indices created) : 0
- NOT UNIQUE Indices created due to failure in creating UNIQUE Indices : 0
- Found Neo4j (non-constraint) Indices : 5
- Imported OrientDB Indices : 5 (100%)
- Additional created Indices (on vertex properties 'Neo4jNodeID' & 'Neo4jLabelList') : 10
- Total Import time: : 32 seconds
-- Initialization time : 7 seconds
-- Time to Import Nodes : 5 seconds (198.35 nodes/sec)
-- Time to Import Relationships : 7 seconds (465.24 rels/sec)
-- Time to Import Constraints and Indices : 4 seconds (1.18 indices/sec)
-- Time to create internal Indices (on vertex properties 'Neo4jNodeID' & 'Neo4jLabelList') : 9 seconds (1.13 indices/sec)
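The summary printed at the end of the import can also be checked programmatically, for instance to confirm that the number of imported vertices and edges matches what was found in Neo4j. The Python sketch below parses summary lines of the form shown above; the function names are hypothetical, and the line format is assumed to match this sample output.

```python
import re
from typing import Dict

# Matches lines such as "- Found Neo4j Nodes : 1035" and captures the
# label and the first integer after the colon.
SUMMARY_LINE = re.compile(r"^-+\s*(?P<key>[^:]+?)\s*:\s*(?P<value>\d+)\b")


def parse_import_summary(text: str) -> Dict[str, int]:
    """Extract the integer reported on each summary line, keyed by its label."""
    counts: Dict[str, int] = {}
    for line in text.splitlines():
        found = SUMMARY_LINE.match(line.strip())
        if found:
            counts[found.group("key")] = int(found.group("value"))
    return counts


def counts_match(counts: Dict[str, int], found_key: str, imported_key: str) -> bool:
    """Return True when both keys are present and report the same number."""
    return found_key in counts and counts.get(found_key) == counts.get(imported_key)
```

With the output above, comparing "Found Neo4j Nodes" with "Imported OrientDB Vertices", and "Found Neo4j Relationships" with "Imported OrientDB Edges", is a quick sanity check after a large import.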
The following is a partial visualization of the northwind database done with the Graph Editor included in OrientDB's Studio:
As you can see from the Limit field, the visualization is limited to 200 vertices.
The image below, instead, includes the graph returned by the following MATCH query:
MATCH {class: Order, where: (orderID = 10344)}--{as: n} RETURN $pathelements
(The query returns all nodes connected to the Order with orderID 10344.)
From Studio's Schema Manager, you can check all imported Vertex Classes (node Labels in Neo4j), Edge Classes (Relationship Types in Neo4j), and Indexes:
V and E are special classes: they include all Vertices and all Edges.
As they become available, more migration examples and tutorials may be found in the official Neo4j to OrientDB Importer documentation.
Roadmap
There are exciting features on the Roadmap of the Neo4j to OrientDB Importer.
At OrientDB Ltd, we feel it is important to be open with our community, and that's why we use GitHub for issues and enhancements.
At the time of writing this article, the main, noteworthy, areas of improvements of the Neo4j to OrientDB Importer are the following:
Integration with Studio (OrientDB's management tool) - Studio's Issue #432
We want to help you migrate your data in a visual way (in addition to the command line method), so we are working to integrate the Neo4j to OrientDB Importer into Studio, the OrientDB Management Tool.
Customized mapping between Neo4j Labels and OrientDB Classes - Issue #8
We want to improve the migration in those cases where Neo4j nodes have multiple labels by providing you a way to define a customized mapping between Neo4j Labels and OrientDB Classes.
Better customization through a configuration file - Issue #10
We want to help you customize your migration further by using a configuration file that allows you to set custom names for the properties Neo4jNodeID, Neo4jRelID, and Neo4jLabelList, and for the classes GenericClassNeo4jConversion and MultipleLabelNeo4jConversion.
All issues and enhancements can be found here. A list of prioritized enhancements, along with some other project information, can be found here.
Additional Resources
A list of resources that can help you get started with the Neo4j to OrientDB Importer can be found below:
GitHub repository (feedback, questions, issues, enhancements, contributions).
Tutorial video (note that the video was created in November 2016. Please consider that depending on when you are reading this Article, the video may be outdated; the migration example should give you an idea anyway).
Conclusions
We started this article with an introduction to OrientDB, the open-source, distributed graph/document multi-model database, and a brief discussion of some advantages of OrientDB over Neo4j.
The legacy Neo4j to OrientDB migration method, which makes use of GraphML, has some limitations and does not perform well for big graphs. At OrientDB Ltd, we want to give you an easy and straightforward process when you migrate your data from Neo4j. For this reason we developed and released the Neo4j to OrientDB Importer, a new, easy-to-use tool that allows you to import a Neo4j graph database into OrientDB in a few simple steps.
We discussed how to install and use the Neo4j to OrientDB Importer, the typical migration scenarios, and some migration best practices, and we also included internal details on the migration. We described how to tune, monitor, and troubleshoot the migration, and listed the current limitations of the Importer.
We then discussed some query strategies and converted some Neo4j Cypher queries to OrientDB SQL, providing ready-to-use query examples.
In this article, we wanted to focus more on giving a general overview of the Importer, rather than presenting specific migration examples. However, we provided a link to a video that includes details on how to migrate the Panama Papers database, and we added a section with some details about the migration of the Neo4j example database northwind. We included a few visualizations of the northwind database done with the Graph Editor included in Studio, the OrientDB Management Tool, as well as a screenshot of Studio's Schema Manager, where you can check all imported Vertex Classes (node Labels in Neo4j), Edge Classes (Relationship Types in Neo4j), and Indexes.
We concluded this article sharing the roadmap of the Neo4j to OrientDB Importer, as we feel it is important to be open with our community, and providing a list of resources.
If you’re coming from Neo4j and recently decided to make the switch to OrientDB, or if you would just like to try OrientDB with an existing Neo4j Database, the Neo4j to OrientDB Importer makes it easy for you to import your nodes, relationships, constraints, and indexes.
We hope you find the Importer tool useful, and we invite you to contact us via GitHub or our Google group for issues, enhancements or questions.