
Marrying JPA with Graph Databases


For a month or two, I have been exploring Neo4J, a graph database built for storing huge amounts of data. Another popular graph database that I will be delving into is InfiniteGraph from Objectivity.

I have also been working as a key contributor on Kundera (a JPA 2.0 based object-datastore mapping library for NoSQL datastores). It already supports popular databases like Cassandra, HBase, MongoDB, and Redis. So the next database on our minds to support was this wonderful and popular graph database: you guessed it right, Neo4J.

The JPA specification was not written with NoSQL datastores in mind, and graph databases are altogether a different story: they take object-mapping challenges to the next level. It made us sweat and argue for countless hours over how to fit JPA into the graph world. We dived deeper into both JPA and Neo4J, and here is how our journey unfolded and the key decisions we made.

Spring Data Neo4j is another similar effort that enables POJO-based development for Neo4J. In terms of ease of use, our goals converged. It introduces its own annotations for two categories of entities: NodeEntity and RelationshipEntity. We were constrained to JPA standards and decided not to introduce any new annotations.

The next item on our minds was to make the rules for entity definition expressive enough for users to describe a graph structure in the form of Java entity classes. Here is what we thought best suited and possible:

1. Both node and relationship POJOs would be annotated with @Entity.

2. Because of a graph’s very nature, relationships between entities are always @ManyToMany. So, we decided to discard other forms of relationships for the sake of simplicity.

3. The biggest challenge was to fit a “relationship entity” between different classes of “node entities”.

Take, for example, the case of Actor and Movie node entities joined via the relationship entity Role: an Actor is related to a Movie via a Role. Until now we had been keeping a List or Set of entities as relations, but this approach wasn’t sufficient because there is a third dimension here (in the form of Role).

Map collections in JPA came to our rescue. This means the Actor entity class can hold a Map containing Role as the key and Movie as the value. So far so good. The next thing was to choose the relationship type and direction.

The relationship type can be read from the @MapKeyJoinColumn annotation. Direction is implicit because a bidirectional relationship always has an owning side (mappedBy is specified on the other side). So the relationship direction can easily be derived as OUTGOING from Actor to Movie.

4. The next consideration was to replicate the flexibility of navigation that Neo4J provides in jumping from one node to other nodes via relationships. A bidirectional relationship makes it possible to navigate from Actor to Movie via Role and vice versa. We decided to let users define incoming and outgoing node entity attributes in the relationship entity too (in addition to the relationship entity’s own attributes), which makes it easy to navigate from Role to both Actor and Movie.
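To illustrate points 1–4, here is a minimal sketch of how such entities might be laid out. Class and field names are illustrative only, and the exact annotations Kundera expects may differ in the released version:

```java
import java.util.Map;
import javax.persistence.*;

@Entity
class Actor {
    @Id
    private String id;

    @Column
    private String name;

    // Map collection: Role (the relationship entity) is the key,
    // Movie is the value. The key join column name doubles as the
    // relationship type in the graph.
    @ManyToMany(cascade = CascadeType.ALL)
    @MapKeyJoinColumn(name = "ACTS_IN")
    private Map<Role, Movie> movies;
}

@Entity
class Movie {
    @Id
    private String id;

    @Column
    private String title;

    // Actor is the owning side, so the direction is derived as
    // OUTGOING from Actor to Movie.
    @ManyToMany(mappedBy = "movies")
    private Map<Role, Actor> actors;
}

@Entity
class Role {
    @Id
    private String id;

    @Column
    private String roleName;

    // Incoming/outgoing node attributes (point 4) allow navigating
    // from the relationship back to both of its ends.
    private Actor actor; // outgoing end
    private Movie movie; // incoming end
}
```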

5. In my previous experience with other NoSQL databases, it didn’t matter whether the database was on localhost or some other machine: we provided a host and port for creating a connection and used it just like an RDBMS. With Neo4J, we’ve got two options:

  • Embedded database – when the database is expected to run on the same machine (faster but less flexible)
  • REST interface – when the database is on some other machine (slower but more flexible)

This means we needed to build two translations for users’ CRUD calls and give users a way of choosing between Embedded and REST.
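The choice would naturally live in the persistence unit configuration. A hedged sketch of what that could look like follows; the property names here are assumptions for illustration and may not match the released Kundera configuration:

```xml
<persistence-unit name="neo4jPU">
    <provider>com.impetus.kundera.KunderaPersistence</provider>
    <properties>
        <!-- Embedded mode: point at a local datastore directory -->
        <property name="kundera.datastore.file.path" value="/path/to/neo4j/db"/>
        <!-- REST mode (alternative): host and port of a remote Neo4J server -->
        <!-- <property name="kundera.nodes" value="localhost"/> -->
        <!-- <property name="kundera.port" value="7474"/> -->
        <property name="kundera.dialect" value="neo4j"/>
    </properties>
</persistence-unit>
```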

6. The next item on our plate was how to interpret JPA queries and run them on indexes. Since indexes in Neo4J are stored in Lucene in the simplest configuration (and it was easy to build a JPA-to-Lucene conversion engine), we decided to translate all JPA queries into Lucene queries and run them directly on the indices.

We also identified three types of native queries: Lucene, Cypher, and Gremlin. We started with Lucene because it was the simplest to implement, and decided to add support for the remaining ones in subsequent releases.
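To give a flavor of the idea (not Kundera’s actual conversion engine, which is far more general), here is a toy translator that turns a single “field = value” predicate from a JPQL string into a Lucene term query:

```java
// Toy illustration of translating a JPA query into a Lucene index
// query. Only handles one equality predicate; the real engine must
// cover the full JPQL grammar.
public class JpaToLucene {

    /** Translate "SELECT a FROM Actor a WHERE a.name = 'Keanu'"
     *  into a Lucene term query like "name:Keanu". */
    static String toLucene(String jpql) {
        int whereIdx = jpql.toUpperCase().indexOf(" WHERE ");
        if (whereIdx < 0) {
            return "*:*"; // no predicate: match all indexed documents
        }
        String predicate = jpql.substring(whereIdx + 7).trim();
        String[] parts = predicate.split("=", 2);
        String field = parts[0].trim();
        // strip the alias prefix ("a.name" -> "name")
        int dot = field.indexOf('.');
        if (dot >= 0) {
            field = field.substring(dot + 1);
        }
        // strip surrounding single quotes from the literal value
        String value = parts[1].trim().replaceAll("^'|'$", "");
        return field + ":" + value;
    }

    public static void main(String[] args) {
        System.out.println(toLucene(
            "SELECT a FROM Actor a WHERE a.name = 'Keanu'"));
    }
}
```

Running the resulting Lucene query against the node index then yields the matching nodes, which are mapped back to entities.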

Summing it all up, it was challenging but rewarding to marry these two heterogeneous worlds. Once this fructifies, we shall pursue further refinement and additions. I shall post Kundera-Neo4J documentation links and code examples once it’s released.

