Over a million developers have joined DZone.

Solving ORM: Keep the O, Drop the R, No Need for the M

Download Forrester’s “Vendor Landscape, Application Performance Management” report that examines the evolving role of APM as a key driver of customer satisfaction and business success, brought to you in partnership with BMC.

ORM has a simple, production-ready solution hiding in plain sight in the Java world. Let’s go through it in this post, alongside with the following topics:

  • ORM / Hibernate in 2014 - the word on the street
  • ORM is still the Vietnam of Computer Science
  • ORM has 2 main goals only
  • When does ORM make sense?
  • A simple solution for the ORM problem
  • A production-ready ORM Java-based alternative

ORM / Hibernate in 2014 - the word on the street

It's been almost 20 years since ORM is around, and soon we will reach the 15th birthday of the creation of the de-facto and likely best ORM implementation in the Java world: Hibernate.

We would then expect that this is by know a well understood problem. But what are developers saying these days about Hibernate and ORM?

Let's take some quotes from two recent posts on this topic: Thoughts on Hibernate and JPA Hibernate Alternatives:

There are performance problems related to using Hibernate

A lot of business operations and reports involve writing complex queries. Writing them in terms of objects and maintaining them seems to be difficult

We shouldn't be needing a 900 page book to learn a new framework.

As Java developers we can easily relate to that: ORM frameworks tend to give cryptic error messages, the mapping is hard to do and the runtime behavior namely with lazy initialization exceptions can be surprising when first encountered.

Who hasn't had to maintain that application that uses Open Session In View pattern that generated a flood of SQL requests that took weeks to optimize?

I believe it literally can take a couple of years to really understand Hibernate, lot's of practice and several readings of the Java Persistence with Hibernate book (still 600 pages in it's upcoming second edition).

Are the criticisms on Hibernate warranted?

I personally don't think so, in fact most developers really criticize the complexity of the object-relational mapping approach itself, and not a concrete ORM implementation of it in a given language.

This sentiment seems to come and go in periodic waves, maybe when a newer generation of developers hits the labor force. After hours and days trying to do what feels it should be much simpler, it's only a natural feeling.

The fact is that there is a problem: why do many projects spend 30% of their time developing the persistence layer still today?

ORM is the Vietnam of Computer Science

The problem is that the ORM problem is complex, and there are no good solutions. Any solution to it is a huge compromise.

ORM has been famously named almost 10 years ago the Vietnam of Computer Science, in a blog post from one of the creators of Stackoverflow, Jeff Atwood.

The problems of ORM are well known and we won't go through them in detail here, here is a summary from Martin Fowler on why ORM is hard:

  • object identity vs database identity
  • How to map object oriented inheritance in the relational world
  • unidirectional associations in the database vs bi-directional in the OO world
  • Data navigation - lazy loading, eager fetching
  • Database transactions vs no rollbacks in the OO world

This is just to name the main obstacles. The problem is also that it's easy to forget what we are trying to achieve in the first place.

ORM has 2 main goals only

ORM has two main goals clearly defined:

  • map objects from the OO world into tables in a relational database

  • provide a runtime mechanism for keeping an in-memory graph of objects and a set of database tables in sync

Given this, when should we use Hibernate and ORM in general?

When does ORM make sense ?

ORM makes sense when the project at hand is being done using aDomain Driven Development approach, where the whole program is built around a set of core classes called the domain model, that represent concepts in the real world such as CustomerInvoice, etc.

If the project does not have a minimum threshold complexity that needs DDD, then an ORM can likely be overkill. The problem is that even the most simple of enterprise applications are well above this threshold, so ORM really pulls it's weight most of the time.

It's just that ORM is hard to learn and full of pitfalls. So how can we tackle this problem?

A simple solution for the ORM problem

Someone once said something like this:

A smart man solves a problem, but a wise man avoids it.

As often happens in programming, we can find the solution by going back to the beginning and see what we are trying to solve:

ORM

So we are trying to synchronize an in-memory graph of objects with a set of tables. But these are two completely different types of data structures!

But which data structure is the most generic? It turns out that the graph is the most generic one of the two: actually a set of linked database tables is really just a special type of graph.

The same can be said of basically almost any other data structure.

Graphs and their traversal are very well understood and have a body of knowledge of decades available, similar to the theory on which relational databases are built upon: Relational Algebra.

Solving the impedance mismatch

The logical conclusion is that the solution for the ORM impedance mismatch is removing the mismatch itself:

Let's store the graph of in-memory domain objects in a transactional-capable graph database!

This solves the mapping problem, by removing the need for mapping in the first place.

A production-ready solution for the ORM problem

This is easier said than done, or is it? It turns out that graph databases have been around for years, and the prime example in the Java community is Neo4j.

Neo4j is a stable product that is well understood and documented, see theNeo4J in Action book. It can used as an external server or in embedded mode inside the Java process itself.

But it's core API is all about graphs and nodes, something like this:

GraphDatabaseService gds = new EmbeddedGraphDatabase("/path/to/store");  
Node forrest=gds.createNode();  
forrest.setProperty("title","Forrest Gump");  
forrest.setProperty("year",1994);  
gds.index().forNodes("movies").add(forrest,"id",1);
 
Node tom=gds.createNode(); 

The problem is that this is too far from domain driven development, writing to this would be like coding JDBC by hand.

This is the typical task of a framework like Hibernate, with the big difference that because the impedance mismatch is minimal such framework can operate in a much more transparent and less intrusive way.

It turns out that such framework is already written (there are even two).

Spring Data Neo4J

One of the creators of the Spring framework Rod Johnson took the task of implementing himself the initial version of the Neo4j integration, theSpring Data Neo4j project.

This is an important extract from the foreword of Rod Johnson in the documentation concerning the design of the framework:

Its use of AspectJ to eliminate persistence code from your domain model is truly innovative, and on the cutting edge of today's Java technologies.

So Spring Data Neo4J is a AOP-based framework that wraps domain objects in a relatively transparent way, and synchronizes a in-memory graph of objects with a Neo4j transactional data store.

It's aimed to write the persistence layer of the application in a simplified way, similar to Spring Data JPA.

How does the mapping to a graph database look like

It turns out that there is limited mapping needed (tutorial). We need for one to mark which classes we want to make persistent, and define a field that will act as an Id:

@NodeEntity
class Movie {  
    @GraphId Long nodeId;
    String id;
    String title;
    int year;
    Set<role> cast;
}

There are other annotations (5 more per the docs) for example for defining indexing and relationships with properties, etc. Compared with Hibernate there is only a fraction of the annotations for the same domain model.

What does the query language look like?

The recommended query language is Cypher, that is an ASCII art based language. A query can look for example like this:

 // returns users who rated a movie based on movie title (movieTitle parameter) higher than rating (rating parameter)
    @Query("start movie=node:Movie(title={0}) " +
           "match (movie)<-[r:RATED]-(user) " +
           "where r.stars > {1} " +
           "return user")
     Iterable<user> getUsersWhoRatedMovieFromTitle(String movieTitle, Integer rating);

The query language is very different from JPQL or SQL and implies a learning curve.

Still after the learning curve this language allows to write performant queries that usually can be problematic in relational databases.

Performance of Queries in Graph vs Relational databases

Let's compare some frequent query types of queries and see how they should perform in a graph vs relational databases:

  • lookup by Id: This is implemented for example by doing a binary search on an index tree, finding a match and following a 'pointer' to the result. This is a (very) simplified description, but it's likely identical for both databases. There is no apparent reason why such query would take more time in a graph database than in a relational DB.

  • lookup parent relations: This is the type of query that relational databases struggle with. Self-joins might result in cartesian products of huge tables, bringing the database to an halt. A graph database can perform those queries in a fraction of that.

  • lookup by non-indexed column: Here the relational database can scan tables faster due to the physical structure of the table and the fact than one read usually brings along multiple rows. But this type of queries (table scans) are to be avoided in relational databases anyway.

There is more to say here, but there is no indication (no readilly-available DDD-related public benchmarks) that a graph-based data store would not be appropriate for doing DDD due to query performance.

Prefer a JPA based solution?

The Hibernate team has written Hibernate OGM, which provides JPA support for many NoSQL data stores, including Neo4j.

Conclusions

I personally cannot find any (conceptual) reasons why a transaction-capable graph database would not be an ideal fit for doing Domain Driven Development, as an alternative to a relational database and ORM.

No data store will ever fit perfectly every use case, but we can ask the question if graph databases shouldn't become the default for DDD, and relational the exception.

The disappearance of ORM would imply a great reduction of the complexity and the time that it takes to implement a project.

The future of DDD in the enterprise

The removal of the impedance mismatch and the improved performance of certain query types could be the killer features that drive the adoption of a graph based DDD solution.

We can see practical obstacles: operations prefer relational databases, vendor contract lock-in, having to learn a new query language, limited expertise in the labor market, etc.

But the economic advantage is there, and the technology is there also. And when that is case, it's usually only a matter of time.

What about you, could you think of any reason why Graph-based DDD would not work? Feel free to chime in on the comments bellow.

See Forrester’s Report, “Vendor Landscape, Application Performance Management” to identify the right vendor to help IT deliver better service at a lower cost, brought to you in partnership with BMC.

Topics:

The best of DZone straight to your inbox.

SEE AN EXAMPLE
Please provide a valid email address.

Thanks for subscribing!

Awesome! Check your inbox to verify your email so you can start receiving the latest in tech news and resources.
Subscribe

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}