ORM Is an Offensive Anti-Pattern
{Editor's note: Thanks to Yegor Bugayenko, a new MVB at DZone. Among other things, Yegor blogs about Java and DevOps. We're pleased to have him on board as a Most Valuable Blogger. Check out his blog, yegor256.com.}
TL;DR: ORM is a terrible anti-pattern that violates all principles of object-oriented programming, tearing objects apart and turning them into dumb and passive data bags. There is no excuse for ORM existence in any application, be it a small web app or an enterprise-size system with thousands of tables and CRUD manipulations on them. What is the alternative? SQL-speaking objects.
How ORM Works
Object-relational mapping (ORM) is a technique (a.k.a. design pattern) of accessing a relational database from an object-oriented language (Java, for example). There are multiple implementations of ORM in almost every language; for example: Hibernate for Java, ActiveRecord for Ruby on Rails, Doctrine for PHP, and SQLAlchemy for Python. In Java, the ORM design is even standardized as JPA.
First, let's see how ORM works, by example. Let's use Java, PostgreSQL, and Hibernate. Let's say we have a single table in the database, called post:
+-----+------------+--------------------------+
| id  | date       | title                    |
+-----+------------+--------------------------+
| 9   | 10/24/2014 | How to cook a sandwich   |
| 13  | 11/03/2014 | My favorite movies       |
| 27  | 11/17/2014 | How much I love my job   |
+-----+------------+--------------------------+
Now we want to CRUD-manipulate this table from our Java app (CRUD stands for create, read, update, and delete). First, we should create a Post class (I'm sorry it's so long, but that's the best I can do):
    @Entity
    @Table(name = "post")
    public class Post {
      private int id;
      private Date date;
      private String title;
      @Id
      @GeneratedValue
      public int getId() {
        return this.id;
      }
      @Temporal(TemporalType.TIMESTAMP)
      public Date getDate() {
        return this.date;
      }
      public String getTitle() {
        return this.title;
      }
      public void setDate(Date when) {
        this.date = when;
      }
      public void setTitle(String txt) {
        this.title = txt;
      }
    }
Before any operation with Hibernate, we have to create a session factory:
    SessionFactory factory = new AnnotationConfiguration()
      .configure()
      .addAnnotatedClass(Post.class)
      .buildSessionFactory();
This factory will give us "sessions" every time we want to manipulate Post objects. Every manipulation with the session should be wrapped in this code block:
    Session session = factory.openSession();
    // declared outside try so that the catch block can see it
    Transaction txn = null;
    try {
      txn = session.beginTransaction();
      // your manipulations with the ORM, see below
      txn.commit();
    } catch (HibernateException ex) {
      // txn is still null if beginTransaction() itself failed
      if (txn != null) {
        txn.rollback();
      }
    } finally {
      session.close();
    }
When the session is ready, here is how we get a list of all posts from that database table:
    List posts = session.createQuery("FROM Post").list();
    for (Post post : (List<Post>) posts) {
      System.out.println("Title: " + post.getTitle());
    }
I think it's clear what's going on here. Hibernate is a big, powerful engine that makes a connection to the database, executes necessary SQL SELECT requests, and retrieves the data. Then it makes instances of class Post and stuffs them with the data. When the object comes to us, it is filled with data, and we should use getters to take them out, like we're using getTitle() above.
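To make the "stuffing" step concrete, here is a toy sketch of what an ORM engine does with each fetched row. It is not how Hibernate is actually implemented: ToyPost, ToyMapper, and the map that stands in for a database row are hypothetical names used only for illustration. The mapper creates an empty object and pushes column values into its private fields through reflection; afterwards the caller can only pull them back out through getters.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical anemic entity: fields plus getters, no behavior.
final class ToyPost {
    private int id;
    private String title;

    int getId() {
        return this.id;
    }

    String getTitle() {
        return this.title;
    }
}

// Hypothetical stand-in for the ORM engine (not Hibernate's real code).
final class ToyMapper {
    // Creates an empty bean and copies each column value into the
    // field that has the same name as the column.
    static <T> T hydrate(Class<T> type, Map<String, Object> row) {
        try {
            Constructor<T> ctor = type.getDeclaredConstructor();
            ctor.setAccessible(true);
            T bean = ctor.newInstance();
            for (Map.Entry<String, Object> cell : row.entrySet()) {
                Field field = type.getDeclaredField(cell.getKey());
                field.setAccessible(true);
                field.set(bean, cell.getValue());
            }
            return bean;
        } catch (ReflectiveOperationException ex) {
            throw new IllegalStateException(ex);
        }
    }
}

final class ToyMapperDemo {
    public static void main(String[] args) {
        // The map plays the role of one row of the post table.
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("id", 9);
        row.put("title", "How to cook a sandwich");
        ToyPost post = ToyMapper.hydrate(ToyPost.class, row);
        System.out.println(post.getId() + ": " + post.getTitle());
    }
}
```

Note that all the knowledge about where the data comes from lives in ToyMapper, while ToyPost only carries values around.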
When we want to do a reverse operation and send an object to the database, we do all of the same but in reverse order. We make an instance of class Post, stuff it with the data, and ask Hibernate to save it:
    Post post = new Post();
    post.setDate(new Date());
    post.setTitle("How to cook an omelette");
    session.save(post);
This is how almost every ORM works. The basic principle is always the same: ORM objects are anemic envelopes with data. We are talking with the ORM framework, and the framework is talking to the database. Objects only help us send our requests to the ORM framework and understand its response. Besides getters and setters, objects have no other methods. They don't even know which database they came from.
This is how object-relational mapping works.
What's wrong with it, you may ask? Everything!
What's Wrong With ORM?
Seriously, what is wrong? Hibernate has been one of the most popular Java libraries for more than 10 years already. Almost every SQL-intensive application in the world is using it. Each Java tutorial would mention Hibernate (or maybe some other ORM like TopLink or OpenJPA) for a database-connected application. It's a de facto standard and still I'm saying that it's wrong? Yes.
I'm claiming that the entire idea behind ORM is wrong. Its invention was maybe the second big mistake in OOP after the NULL reference.
Actually, I'm not the only one saying something like this, and definitely not the first. A lot about this subject has already been published by very respected authors, including OrmHate by Martin Fowler, Object-Relational Mapping Is the Vietnam of Computer Science by Jeff Atwood, The Vietnam of Computer Science by Ted Neward, ORM Is an Anti-Pattern by Laurie Voss, and many others.
However, my argument is different from what they're saying. Even though their reasons are practical and valid, like "ORM is slow" or "database upgrades are hard", they miss the main point. You can see a very good, practical answer to these practical arguments given by Bozhidar Bozhanov in his ORM Haters Don't Get It blog post.
The main point is that ORM, instead of encapsulating database interaction inside an object, extracts it away, literally tearing a solid and cohesive living organism apart. One part of the object keeps the data while another one, implemented inside the ORM engine (session factory), knows how to deal with this data and transfers it to the relational database. Look at this picture; it illustrates what ORM is doing.
I, being a reader of posts, have to deal with two components: 1) the ORM and 2) the "obtruncated" object returned to me. The behavior I'm interacting with is supposed to be provided through a single entry point, which is an object in OOP. In the case of ORM, I'm getting this behavior via two entry points: the ORM and the "thing", which we can't even call an object.
Because of this terrible and offensive violation of the object-oriented paradigm, we have a lot of practical issues already mentioned in respected publications. I can only add a few more.
SQL is not hidden. Users of ORM should speak SQL (or its dialect, like HQL). See the example above; we're calling session.createQuery("FROM Post") in order to get all posts. Even though it's not SQL, it is very similar to it. Thus, the relational model is not encapsulated inside objects. Instead, it is exposed to the entire application. Everybody, with each object, inevitably has to deal with a relational model in order to get or save something. Thus, ORM doesn't hide and wrap the SQL but pollutes the entire application with it.
Difficult to test. When some object is working with a list of posts, it needs to deal with an instance of SessionFactory. How can we mock this dependency? Do we have to create a mock of it? How complex is this task? Look at the code above, and you will realize how verbose and cumbersome that unit test will be. Instead, we can write integration tests and connect the entire application to a test version of PostgreSQL. In that case, there is no need to mock SessionFactory, but such tests will be rather slow, and even more important, our having-nothing-to-do-with-the-database objects will be tested against the database instance. A terrible design.
Again, let me reiterate. Practical problems of ORM are just consequences. The fundamental drawback is that ORM tears objects apart, terribly and offensively violating the very idea of what an object is.
SQL-Speaking Objects
What is the alternative? Let me show it to you by example. Let's try to design that class, Post, my way. We'll have to break it down into two classes: Post and Posts, singular and plural. I already mentioned in one of my previous articles that a good object is always an abstraction of a real-life entity. Here is how this principle works in practice. We have two entities: database table and table row. That's why we'll make two classes; Posts will represent the table, and Post will represent the row.
As I also mentioned in that article, every object should work by contract and implement an interface. Let's start our design with two interfaces. Of course, our objects will be immutable. Here is how Posts would look:
    @Immutable
    interface Posts {
      Iterable<Post> iterate();
      Post add(Date date, String title);
    }
This is how a single Post would look:
    @Immutable
    interface Post {
      int id();
      Date date();
      String title();
    }
Here is how we will list all posts in the database table:
    Posts posts = // we'll discuss this right now
    for (Post post : posts.iterate()) {
      System.out.println("Title: " + post.title());
    }
Here is how we will create a new post:
    Posts posts = // we'll discuss this right now
    posts.add(new Date(), "How to cook an omelette");
As you see, we have true objects now. They are in charge of all operations, and they perfectly hide their implementation details. There are no transactions, sessions, or factories. We don't even know whether these objects are actually talking to PostgreSQL or if they keep all the data in text files. All we need from Posts is an ability to list all posts for us and to create a new one. Implementation details are perfectly hidden inside. Now let's see how we can implement these two classes.
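Because callers see only the two interfaces, the storage behind them can be anything. As a minimal sketch of that idea, here is an in-memory implementation that could stand in for the database in a unit test. MemPost and MemPosts are hypothetical names that do not come from the article, the @Immutable annotation is left out so the snippet compiles with the JDK alone, and the list passed to the constructor plays the role of the table.

```java
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

interface Post {
    int id();
    Date date();
    String title();
}

interface Posts {
    Iterable<Post> iterate();
    Post add(Date date, String title);
}

// Hypothetical in-memory row: it simply holds its own values.
final class MemPost implements Post {
    private final int number;
    private final Date when;
    private final String text;

    MemPost(int id, Date date, String title) {
        this.number = id;
        this.when = new Date(date.getTime()); // defensive copy
        this.text = title;
    }

    public int id() {
        return this.number;
    }

    public Date date() {
        return new Date(this.when.getTime());
    }

    public String title() {
        return this.text;
    }
}

// Hypothetical in-memory "table"; the list stands in for the database.
final class MemPosts implements Posts {
    private final List<Post> rows;

    MemPosts(List<Post> storage) {
        this.rows = storage;
    }

    public Iterable<Post> iterate() {
        return new ArrayList<>(this.rows); // snapshot of current rows
    }

    public Post add(Date date, String title) {
        // Ids are assigned sequentially, starting at 1.
        Post post = new MemPost(this.rows.size() + 1, date, title);
        this.rows.add(post);
        return post;
    }
}

final class MemPostsDemo {
    public static void main(String[] args) {
        Posts posts = new MemPosts(new ArrayList<>());
        posts.add(new Date(), "How to cook an omelette");
        for (Post post : posts.iterate()) {
            System.out.println("Title: " + post.title());
        }
    }
}
```

The client code in main() is the same loop shown earlier; only the constructor call would change when switching to a database-backed implementation.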
I'm going to use jcabi-jdbc as a JDBC wrapper, but you can use something else or just plain JDBC if you like. It doesn't really matter. What matters is that your database interactions are hidden inside objects. Let's start with Posts and implement it in class PgPosts ("pg" stands for PostgreSQL):
    @Immutable
    final class PgPosts implements Posts {
      private final DataSource dbase;
      public PgPosts(DataSource data) {
        this.dbase = data;
      }
      // The jcabi-jdbc calls below declare the checked SQLException;
      // its handling is left out to keep the example short.
      public Iterable<Post> iterate() {
        return new JdbcSession(this.dbase)
          .sql("SELECT id FROM post")
          .select(
            new ListOutcome<Post>(
              new ListOutcome.Mapping<Post>() {
                @Override
                public Post map(final ResultSet rset) throws SQLException {
                  // PgPost needs the data source as well as the row ID
                  return new PgPost(PgPosts.this.dbase, rset.getInt(1));
                }
              }
            )
          );
      }
      public Post add(Date date, String title) {
        return new PgPost(
          this.dbase,
          new JdbcSession(this.dbase)
            .sql("INSERT INTO post (date, title) VALUES (?, ?)")
            .set(new Utc(date))
            .set(title)
            .insert(new SingleOutcome<Integer>(Integer.class))
        );
      }
    }
Next, let's implement the Post interface in class PgPost:
    @Immutable
    final class PgPost implements Post {
      private final DataSource dbase;
      private final int number;
      public PgPost(DataSource data, int id) {
        this.dbase = data;
        this.number = id;
      }
      public int id() {
        return this.number;
      }
      public Date date() {
        return new JdbcSession(this.dbase)
          .sql("SELECT date FROM post WHERE id = ?")
          .set(this.number)
          .select(new SingleOutcome<Utc>(Utc.class))
          .getDate(); // unwrap Utc into java.util.Date
      }
      public String title() {
        return new JdbcSession(this.dbase)
          .sql("SELECT title FROM post WHERE id = ?")
          .set(this.number)
          .select(new SingleOutcome<String>(String.class));
      }
    }
This is how a full database interaction scenario would look using the classes we just created:
    Posts posts = new PgPosts(dbase);
    for (Post post : posts.iterate()) {
      System.out.println("Title: " + post.title());
    }
    Post post = posts.add(new Date(), "How to cook an omelette");
    System.out.println("Just added post #" + post.id());
You can see a full practical example here. It's an open source web app that works with PostgreSQL using the exact approach explained above: SQL-speaking objects.
What About Performance?
I can hear you screaming, "What about performance?" In that script a few lines above, we're making many redundant round trips to the database. First, we retrieve post IDs with SELECT id and then, in order to get their titles, we make an extra SELECT title call for each post. This is inefficient, or simply put, too slow.
No worries; this is object-oriented programming, which means it is flexible! Let's create a decorator of PgPost that will accept all data in its constructor and cache it internally, forever:
    @Immutable
    final class ConstPost implements Post {
      private final Post origin;
      private final Date dte;
      private final String ttl;
      public ConstPost(Post post, Date date, String title) {
        this.origin = post;
        this.dte = date;
        this.ttl = title;
      }
      public int id() {
        return this.origin.id();
      }
      public Date date() {
        return this.dte;
      }
      public String title() {
        return this.ttl;
      }
    }
Pay attention: this decorator doesn't know anything about PostgreSQL or JDBC. It just decorates an object of type Post and pre-caches the date and title. As usual, this decorator is also immutable.
Now let's create another implementation of Posts that will return the "constant" objects:
    @Immutable
    final class ConstPgPosts implements Posts {
      // ...
      public Iterable<Post> iterate() {
        return new JdbcSession(this.dbase)
          .sql("SELECT * FROM post")
          .select(
            new ListOutcome<Post>(
              new ListOutcome.Mapping<Post>() {
                @Override
                public Post map(final ResultSet rset) throws SQLException {
                  return new ConstPost(
                    // PgPost needs the data source as well as the row ID
                    new PgPost(ConstPgPosts.this.dbase, rset.getInt(1)),
                    Utc.getTimestamp(rset, 2),
                    rset.getString(3)
                  );
                }
              }
            )
          );
      }
    }
Now all posts returned by iterate() of this new class are pre-equipped with dates and titles fetched in one round trip to the database.
Using decorators and multiple implementations of the same interface, you can compose any functionality you wish. What is most important is that while functionality is being extended, the complexity of the design is not escalating, because classes don't grow in size. Instead, we're introducing new classes that stay cohesive and solid, because they are small.
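The effect of the decorator on round trips can be checked without a database. The sketch below is an illustration under stated assumptions: SlowPost is a hypothetical stand-in for PgPost that bumps a counter every time it would have issued a query and returns fixed values, and ConstPost repeats the decorator from above without the @Immutable annotation so the snippet compiles with the JDK alone.

```java
import java.util.Date;
import java.util.concurrent.atomic.AtomicInteger;

interface Post {
    int id();
    Date date();
    String title();
}

// Hypothetical stand-in for PgPost: each date() or title() call
// counts as one round trip to the database.
final class SlowPost implements Post {
    private final int number;
    private final AtomicInteger trips;

    SlowPost(int id, AtomicInteger counter) {
        this.number = id;
        this.trips = counter;
    }

    public int id() {
        return this.number;
    }

    public Date date() {
        this.trips.incrementAndGet();
        return new Date(0L);
    }

    public String title() {
        this.trips.incrementAndGet();
        return "How to cook a sandwich";
    }
}

// The decorator from the article: answers from pre-fetched values.
final class ConstPost implements Post {
    private final Post origin;
    private final Date dte;
    private final String ttl;

    ConstPost(Post post, Date date, String title) {
        this.origin = post;
        this.dte = date;
        this.ttl = title;
    }

    public int id() {
        return this.origin.id();
    }

    public Date date() {
        return this.dte;
    }

    public String title() {
        return this.ttl;
    }
}

final class ConstPostDemo {
    public static void main(String[] args) {
        AtomicInteger trips = new AtomicInteger();
        Post slow = new SlowPost(9, trips);
        // Two round trips happen here, once, while building the decorator.
        Post fast = new ConstPost(slow, slow.date(), slow.title());
        for (int i = 0; i < 3; i++) {
            fast.title(); // served from the cached value
        }
        System.out.println("Round trips: " + trips.get()); // prints "Round trips: 2"
    }
}
```

In ConstPgPosts even those two calls disappear, because the values arrive with the same SELECT that lists the rows.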
What About Transactions?
Every object should deal with its own transactions and encapsulate them the same way as SELECT or INSERT queries. This will lead to nested transactions, which is perfectly fine provided the database server supports them. If there is no such support, create a session-wide transaction object that will accept a "callable" class. For example:
    final class Txn {
      private final DataSource dbase;
      public Txn(DataSource data) {
        this.dbase = data;
      }
      // Callable.call() may throw any Exception, and we rethrow it
      public <T> T call(Callable<T> callable) throws Exception {
        JdbcSession session = new JdbcSession(this.dbase);
        try {
          session.sql("START TRANSACTION").execute();
          T result = callable.call();
          session.sql("COMMIT").execute();
          return result;
        } catch (Exception ex) {
          session.sql("ROLLBACK").execute();
          throw ex;
        }
      }
    }
Then, when you want to wrap a few object manipulations in one transaction, do it like this:
    new Txn(dbase).call(
      new Callable<Integer>() {
        @Override
        public Integer call() {
          Posts posts = new PgPosts(dbase);
          Post post = posts.add(new Date(), "How to cook an omelette");
          // comments() is not declared in the Post interface shown earlier
          post.comments().post("This is my first comment!");
          return post.id();
        }
      }
    );
This code will create a new post and post a comment to it. If one of the calls fails, the entire transaction will be rolled back.
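The commit-or-rollback flow of such a transaction object can be exercised without a database. The following is only a sketch under assumptions: Sql is a hypothetical one-method interface standing in for the JDBC session, and the demo passes a list that records the statements instead of executing them. A real implementation would also have to make sure that every statement inside the callable runs on the same connection, which this sketch does not attempt.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

// Hypothetical minimal session: executes a single SQL statement.
interface Sql {
    void exec(String query);
}

// Same shape as the transaction object above, but built on the
// hypothetical Sql interface instead of a real JDBC session.
final class Txn {
    private final Sql session;

    Txn(Sql sql) {
        this.session = sql;
    }

    <T> T call(Callable<T> callable) throws Exception {
        this.session.exec("START TRANSACTION");
        try {
            T result = callable.call();
            this.session.exec("COMMIT");
            return result;
        } catch (Exception ex) {
            // Any failure inside the callable undoes the whole unit of work.
            this.session.exec("ROLLBACK");
            throw ex;
        }
    }
}

final class TxnDemo {
    public static void main(String[] args) throws Exception {
        List<String> log = new ArrayList<>();
        Txn txn = new Txn(log::add); // record statements instead of running them
        int id = txn.call(() -> 42);
        System.out.println("Result " + id + ", statements " + log);
        log.clear();
        try {
            txn.call(() -> {
                throw new IllegalStateException("comment failed");
            });
        } catch (IllegalStateException ex) {
            System.out.println("Failed, statements " + log);
        }
    }
}
```

The first call records START TRANSACTION followed by COMMIT; the second records START TRANSACTION followed by ROLLBACK and rethrows the original exception to the caller.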
This approach looks object-oriented to me. I'm calling it "SQL-speaking objects", because they know how to speak SQL with the database server. It's their skill, perfectly encapsulated inside their borders.
Published at DZone with permission of Yegor Bugayenko. See the original article here.