A Phase Shift for the ORM
I came to know about Squealer from one of Dean's tweets over the last weekend. The README in its git repo makes a succinct point about the role that relational mappers will play in the days to come. It says "... ORMs had it the wrong way around: that the application should be persisting its data in a manner natural to it, and that external systems (like reporting and decision support systems - or even numbskull integration at the persistence layer) should bear the cost of mapping."
I have expressed similar observations in the past, when I talked about the rationale for modeling data close to the way applications will use it. I talk about this same architecture in an upcoming IEEE Software Multi-Paradigm Programming Special Issue for Sep/Oct 2010.
In most applications that build a domain model in an object-oriented language and persist data in a relational store, we use the ORM layer as follows:
It sits between the domain model and the relational database and isolates one from the other, at the expense of an intrusive framework inside an otherwise uncomplicated application architecture.
The scenario changes if we allow the application to manipulate and persist data in the same form that it uses for modeling its domain. My email application needs an address book as a composite object, not as something torn apart into multiple relational tables in the name of normalization. It will be better if I can program the domain model to access and persist all its entities in a document store that gives it the same flexibility. Then the application layer does not have to deal with the translation between the two data models, which adds a significant layer of complexity today. The normal online application flow doesn't need an ORM.
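To make the address book example concrete, here is a minimal sketch in Python. The class names, the `alice` key and the in-memory dict standing in for a document store bucket are all assumptions made up for illustration; they are not part of any particular product's API. The point is only that the composite object goes to the store and comes back in the same shape, with no mapping layer in between.

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class Contact:
    name: str
    emails: list[str] = field(default_factory=list)

@dataclass
class AddressBook:
    owner: str
    contacts: list[Contact] = field(default_factory=list)

def to_document(book: AddressBook) -> str:
    # The whole composite becomes one JSON document; nothing is split into tables.
    return json.dumps(asdict(book))

def from_document(doc: str) -> AddressBook:
    raw = json.loads(doc)
    return AddressBook(
        owner=raw["owner"],
        contacts=[Contact(**c) for c in raw["contacts"]],
    )

# A plain dict stands in for a bucket in a document store (hypothetical).
store = {}
book = AddressBook("alice", [Contact("Dean", ["dean@example.com"])])
store[book.owner] = to_document(book)
restored = from_document(store["alice"])
```

The only "translation" here is serialization to JSON, which preserves the nesting of the domain object instead of flattening it.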
How does the translation layer get shifted?
Consider an example application that uses Terrastore as the persistent storage for implementing online domain functionalities. Terrastore is a document database that resembles CouchDB, MongoDB and Riak in that data is stored in the form of JSON documents. However, unlike the others, it emphasizes the "C" component of the CAP theorem and provides advanced scalability and elasticity features without sacrificing consistency.
Terrastore is built on top of Terracotta, which is itself an ACID-based object database that can store more data than fits in your RAM, clustered across multiple machines. Terrastore uses the storage and clustering capabilities of Terracotta and adds more advanced features like partitioning, data manipulation and querying through client APIs.
As long as you are using Terrastore as your persistent database for the domain model, your data layer is at the same level of abstraction as your domain layer. You manipulate objects in memory and store them in Terrastore in the same format. Terrastore, like many other NoSQL stores, is schemaless and offers a collection-based interface for storing JSON documents. So far there is no impedance mismatch to handle between the application and the data model.
Have a look at the following figure where there's no additional framework sitting between the application model and the data storage (Terrastore).
However, there are many reasons why you would want a relational database as an underlying store. Requirements like ad hoc reporting, decision support systems and data warehouses are best supported by relational engines or by the extensions that relational vendors offer. Such applications are not real time and can very well be served from a snapshot of the data that lags the online version.
Many NoSQL stores, and many SQL ones as well, offer commit handlers for publishing async jobs. In Terrastore you can write custom event handlers that publish information from the document store to an underlying relational store. It's as simple as implementing the terrastore.event.EventListener interface. This is well illustrated in the above figure. Translation to the relational model takes place here, which is one level down the stack in your application architecture. Terrastore enqueues events synchronously in FIFO order while the handlers execute asynchronously, which is exactly what you need to scale out your application.
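The shape of such a handler can be sketched in a few lines of Python. This is not the Terrastore API: the class `RelationalPublisher`, the method `on_value_changed` and the `contacts` table are hypothetical names chosen for illustration, and SQLite stands in for the relational store. What the sketch does show is the pattern described above: the write path only enqueues the event in FIFO order, and a background worker does the translation into rows.

```python
import json
import queue
import sqlite3
import threading

class RelationalPublisher:
    """Hypothetical listener that replays document changes into a SQL table."""

    def __init__(self, conn):
        self.conn = conn
        self.events = queue.Queue()  # FIFO: events are handled in arrival order
        worker = threading.Thread(target=self._drain, daemon=True)
        worker.start()

    def on_value_changed(self, key, document):
        # Called on the write path; it only enqueues, so the writer never blocks on SQL.
        self.events.put((key, document))

    def _drain(self):
        while True:
            key, document = self.events.get()
            try:
                doc = json.loads(document)
                with self.conn:  # one transaction per document
                    self.conn.execute("DELETE FROM contacts WHERE owner = ?", (key,))
                    self.conn.executemany(
                        "INSERT INTO contacts (owner, name) VALUES (?, ?)",
                        [(key, c["name"]) for c in doc["contacts"]],
                    )
            finally:
                self.events.task_done()

# check_same_thread=False lets the worker thread use this in-memory connection.
conn = sqlite3.connect(":memory:", check_same_thread=False)
conn.execute("CREATE TABLE contacts (owner TEXT, name TEXT)")
publisher = RelationalPublisher(conn)

publisher.on_value_changed(
    "alice", json.dumps({"contacts": [{"name": "Dean"}, {"name": "Bob"}]})
)
publisher.events.join()  # wait for the backlog to drain before reading
rows = conn.execute("SELECT owner, name FROM contacts ORDER BY name").fetchall()
```

The `join()` call is there only so the example can read the result deterministically; in a running application the online flow would not wait for the relational copy to catch up.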
I took up Terrastore just as an example - you can do most of this (some of it differently) with other NoSQL stores serving as the frontal persistent store of your application. In real life usage, choose the store that best fits the needs of your domain model. It can be a document store, a generic key value store or a graph store.
The example with which I started the post, Squealer, provides a simple, declarative Ruby DSL for mapping values from document trees into relations. It was built to serve exactly the above use case, with MongoDB as the frontal store of your application and MySQL providing the system of record as an underlying relational store. In this model, too, we see the translation layer shift from the main online application functionality to a more downstream component of the application architecture stack. As the document says "It can be used both in bulk operations on many documents (e.g. a periodic batch job) or executed for one document asynchronously as part of an after_save method (e.g. via a Resque job). It is possible that more work may done on this event-driven approach, perhaps in the form of a squealerd, to reduce latency". Nice!
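To give a feel for what a declarative mapping from a document tree into relations involves, here is a small Python sketch. It is not Squealer's DSL (which is Ruby) and it does not reproduce its syntax; the `MAPPING` structure, the `dig` and `to_rows` helpers and the sample document are all assumptions invented for this illustration.

```python
def dig(doc, path):
    # Follow a dotted path such as "profile.name" down a nested dict.
    for key in path.split("."):
        doc = doc[key]
    return doc

# Hypothetical mapping: one row per document for "users",
# one row per embedded email for "emails", linked back by user_id.
MAPPING = {
    "users": {
        "each": None,
        "columns": {"id": "_id", "name": "profile.name"},
    },
    "emails": {
        "each": "profile.emails",
        "columns": {"address": "address"},
        "parent": ("user_id", "_id"),
    },
}

def to_rows(doc, mapping):
    rows = {}
    for table, spec in mapping.items():
        # Either the document itself or a list of embedded sub-documents.
        items = [doc] if spec["each"] is None else dig(doc, spec["each"])
        table_rows = []
        for item in items:
            row = {col: dig(item, path) for col, path in spec["columns"].items()}
            if "parent" in spec:
                col, path = spec["parent"]
                row[col] = dig(doc, path)  # foreign key taken from the root document
            table_rows.append(row)
        rows[table] = table_rows
    return rows

doc = {
    "_id": 7,
    "profile": {
        "name": "Alice",
        "emails": [{"address": "a@example.com"}, {"address": "alice@example.org"}],
    },
}
rows = to_rows(doc, MAPPING)
```

The resulting `rows` dict holds one list of row dicts per table, ready to be inserted by a batch job or by an asynchronous handler like the one discussed earlier. The normalization happens here, downstream, and the online application never sees it.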
All the above examples show that the translation layer, which has for so long sat between the core application domain logic and the persistent data model, has undergone a phase shift. With many of today's NoSQL data stores allowing you to model your data closer to how the domain needs it, you can push the translation further downstream. But again, you can only do this for applications that fit this paradigm. For applications that need instant access to the relational store, this will not be a good fit. Use it wherever you deem it applicable - besides the simplicity that you will get in your application level programming model, you can also scale your application more easily when you have a scalable frontal database along with non-blocking asynchronous writes shoveling data into the relational model.