After this metaphorical article by Uncle Bob on the impedance mismatch between objects and relational schemas, I figured I would write a technical explanation of the mismatch and of the advantages of using ORMs that try to solve it (hint: by severely limiting what you can do).
As Uncle Bob correctly states, the mapping performed by an ORM does not fill relational tables starting from objects, but from the internal data model a language uses to represent objects in memory. This means an ORM usually requires access to all the private fields of objects, compromising encapsulation.
However, it is misleading to think that an ORM does not work with objects as input and output, as it often does a lot of work to keep the object model intact. It accesses fields by reflection to avoid forcing your properties to be public; it substitutes Proxies for real objects to avoid instantiating a large object graph; and after all, it recreates instances of your favorite classes, conforming to your API instead of handing you a generic ResultSet object.
Let's try a thought experiment: suppose you are able to take a periodic snapshot of RAM (every second, for example, could be enough for many web applications with non-critical data). This means that on platforms such as Java and .NET you can keep, with enough memory, your whole object graph as instantiated objects instead of as rows on the disk of a database server.
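A rough approximation of this thought experiment, assuming Python and its standard `pickle` module in place of a real RAM snapshot, could look like the sketch below. The `snapshot` and `restore` helpers are hypothetical; a timer calling `snapshot` every second is left out.

```python
import os
import pickle
import tempfile


def snapshot(root, path):
    """Serialize the whole graph reachable from `root` to `path`."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file first, then rename it over the target, so
    # a crash in the middle of a write never leaves a half-written snapshot.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as tmp:
        pickle.dump(root, tmp)
    os.replace(tmp_path, path)


def restore(path):
    """Load the graph back, for example after a restart."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

References shared between objects survive the round trip, since `pickle` stores each object in the graph only once.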
Here are the relational mapping features that you lose in this situation:
- the memory savings: you have to allocate as much RAM as is needed for all the objects, even if you use only a small working set of them at any time.
- incidentally, this means you have to keep your objects on a single machine, as the various pointers/handles connecting them cannot cross between machines without some abomination such as RMI/SOAP/insert your favorite remote procedure call protocol. I think secondary servers (holding a read-only copy of your object graph) and sharding (partitioning the Aggregates of your object model) could still be possible.
- querying on B-trees or other indexing data structures cannot be performed. By default, every search on your object model other than finding by id is a linear search, unless you take the time to introduce and maintain several additional data structures in your Repository classes.
- you cannot perform transactions over multiple objects, even inside a single Aggregate containing just a few of them. Not only does this mean that in case of exceptions your objects may end up in an inconsistent state, but you also have to resort to synchronization to avoid exposing changes to one object in the Aggregate while the others still have to be updated.
- whenever you update the code of a class on a production system, you need to hot swap the code in while ensuring backward compatibility with the other classes involved. How to swap it in is a non-trivial problem and requires an API in your object model (Erlang does this with functions, safely).
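As an illustration of the querying point in the list above, here is a minimal sketch of a Repository that maintains one secondary index by hand. The `User` and `UserRepository` classes are hypothetical; the thing to notice is that every method changing an indexed field has to update the index too, which is the kind of bookkeeping a database does for you.

```python
from collections import defaultdict


class User:
    def __init__(self, user_id, city):
        self.user_id = user_id
        self.city = city


class UserRepository:
    def __init__(self):
        self._by_id = {}                  # primary lookup: id -> object
        self._by_city = defaultdict(set)  # secondary index: city -> ids

    def add(self, user):
        self._by_id[user.user_id] = user
        self._by_city[user.city].add(user.user_id)

    def move(self, user_id, new_city):
        # Changing an indexed field means touching the index as well.
        user = self._by_id[user_id]
        self._by_city[user.city].discard(user_id)
        user.city = new_city
        self._by_city[new_city].add(user_id)

    def find_by_city_scan(self, city):
        # The default: a linear search over every object in memory.
        return [u for u in self._by_id.values() if u.city == city]

    def find_by_city(self, city):
        # The indexed version: a hash lookup plus one fetch per match.
        ids = self._by_city.get(city, ())
        return [self._by_id[i] for i in sorted(ids)]
```

If any code assigns `user.city` directly instead of going through `move()`, the index silently goes stale, so the field effectively has to be owned by the Repository.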
On the other hand, to provide all these features, ORMs and relational databases put strong constraints on your object model. Thinking of your object model first is a very good discipline for limiting the influence of these relational constraints; but if you fail to adapt the object model to its datastore (be it relational or NoSQL-based), you fail at reality and may lose the powerful ORM features listed above (such as *querying*). Losing them is indeed intended in some architectural styles, but all architectural styles have limits of applicability.
ORMs impose the following constraints:
- Only Entities and Value Objects can be persisted, as they are the objects modelling the state of your application, along with the behavior that can be kept in them.
- This also means there is a standard bag of tricks to establish outward references from Entities and Value Objects when they are thawed from the data store (such as a Service Locator, __wakeup() methods, or hooks in the reconstitution process).
- Data structures that were unlimited in size while in memory, such as strings, usually have to be given a maximum size for performance reasons (char(16), varchar(255) instead of longtext).
- Objects modelling machine resources cannot be persisted, since the stored copy cannot be kept in sync with the machine freeing those resources; say goodbye to Memcache connections or open files.
- There has to be a way to build an object from state instead of its public API, such as a no-arguments constructor (some languages are more flexible on this).
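Two of the constraints above, building an object from state and re-establishing outward references on thaw, can be sketched together in a few lines of Python. Everything here is hypothetical (`Invoice`, `Mailer`, `freeze`, `thaw`, the `on_reconstitute` hook and the `services` dictionary acting as a tiny Service Locator); real ORMs differ in the details.

```python
class Mailer:
    """Hypothetical service; it is never persisted."""

    def __init__(self):
        self.sent = []

    def send(self, to, text):
        self.sent.append((to, text))


class Invoice:
    def __init__(self, number, amount, mailer):
        # The public API validates its input and wants a collaborator.
        if amount <= 0:
            raise ValueError("amount must be positive")
        self.number = number
        self.amount = amount
        self._mailer = mailer

    def on_reconstitute(self, services):
        # Hook called after thawing, similar in spirit to PHP's __wakeup().
        self._mailer = services["mailer"]

    def remind(self, to):
        self._mailer.send(to, f"Invoice {self.number}: {self.amount} due")


def freeze(obj, persistent_fields):
    """Extract only the fields that belong in the data store."""
    return {name: getattr(obj, name) for name in persistent_fields}


def thaw(cls, state, services):
    """Rebuild an object from stored state, bypassing its constructor."""
    obj = cls.__new__(cls)         # allocate without running __init__
    obj.__dict__.update(state)     # write the fields directly
    obj.on_reconstitute(services)  # re-attach non-persistent collaborators
    return obj
```

Note that `thaw` never runs the validation in `__init__`: the stored state is trusted, which is exactly the access to object internals discussed at the beginning of this article.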
There are good reasons for ORMs to require this from the object models they have to persist: otherwise ORMs would be impossible to build, or even more complex than they are now, which is telling.
Despite all our conceptual discussions, Von Neumann doesn't care: machines only execute a list of instructions for the CPU, whether they violate the architecture or not. It is a violation of "pure" object modeling to use ORMs to persist objects; but so are all the nice features we get from this architectural style, and we usually don't want to give them up.