Here's the Problem with Database Consistency Among the NoSQL Crowd
I've written a few times about database consistency before, mainly in conjunction with NoSQL and the concept of Eventual consistency. Now, I'm about to do an update on the subject, as I have come to realize a few things.
What can an old-timer like myself contribute, an SQL guy for 25 years
who remembers punk rock and even The Beatles, and who has hair growing
out of his ears? Well, let me begin by stating what
I mean when I say Database consistency. What I mean is Consistency as the C in ACID
(no, we aren't talking drugs here, we are talking databases). Let's see
what Wikipedia, the online authoritative reference work on just about anything on
this planet, from the size of J-Lo's feet to the number of atoms in the
universe (those two numbers are quite far apart, by the way), has to say: "The consistency property
ensures that any transaction will bring the database from one valid
state to another. Any data written to the database must be valid
according to all defined rules, including but not limited to
constraints, cascades, triggers, and any combination thereof." In
other words, consistency means that the database is always in a
consistent state: the different data items in it (rows, if you wish) are
"in sync" with each other. I think most of you agree with this notion.
Now, when it comes to NoSQL databases, like MongoDB, the terminology is different. These guys introduced Eventual consistency,
which means that the database will eventually reach a consistent state
with regard to a specific transaction that changes the "state" of the
database. But there are multiple transactions going on at the same time, and
in an Eventual consistency model they aren't necessarily
consistent with each other, as they aren't on the same node. But
the theory goes that some time, eventually, they will be. If the system
never stops, and transactions keep coming, then eventual consistency is
bound to happen within 100 ms or less of the point in time when
pigs fly. But if you stop all state-changing transactions, then the
database will reach a consistent state. Eventually.
Now, in NoSQL circles there is a thing called a Consistent read.
If my database is consistent, then any read is consistent, right? And
in the case of us SQL RDBMS folks, consistency is about the state of
the database when I write to it. Well, if you have an eventual
consistency model, where you have data distributed all over the place,
things are different. To begin with, the basic thing that you have to
make sure of, and the NoSQL databases do this, is that the writes
to the database are all applied in order (we know this from MySQL also, where it
is part of the issue with the MySQL slaves, and the NoSQL guys aren't
fixing this particular bottleneck). And here we mean that they are in order
on each and every node. Across nodes, we don't care, which is where I
get my ability to scale out writes from!
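As a rough illustration of "in order on each node, don't care across nodes", here is a minimal sketch. It assumes every write carries the name of the node it came from and a per-node sequence number starting at 1; the `OrderedApplier` class and that numbering scheme are hypothetical, not taken from any real product.

```python
class OrderedApplier:
    """Applies writes from each origin node strictly in that node's own order."""

    def __init__(self):
        self.next_seq = {}  # origin -> next sequence number we expect
        self.pending = {}   # origin -> {seq: (key, value)} held back for now
        self.data = {}
        self.applied = []   # (origin, seq) in the order they were applied

    def receive(self, origin, seq, key, value):
        held = self.pending.setdefault(origin, {})
        held[seq] = (key, value)
        expected = self.next_seq.get(origin, 1)
        # Apply as many writes from this origin as are now in sequence.
        while expected in held:
            k, v = held.pop(expected)
            self.data[k] = v
            self.applied.append((origin, expected))
            expected += 1
        self.next_seq[origin] = expected


r = OrderedApplier()
r.receive("n1", 2, "x", "second")  # arrives early, so it is held back
r.receive("n2", 1, "y", "other")   # another node's write is not blocked by n1
r.receive("n1", 1, "x", "first")   # unblocks n1: seq 1 and then seq 2 apply
```

Writes from `n2` never wait for writes from `n1`, which is the part that lets writes scale out, while within one origin the order is strictly serial.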
A consistent read is a
read where the data I am reading is in a consistent state, or sometimes
one where the data I am reading is the most recent data. These two aren't always the same,
but the second (reading the most recent data) typically implies the first,
although I assume this is not always the case. This is VERY different
from the meaning of Database Consistency as we RDBMS folks look at it.
All the same, the concept sure is useful, and as NoSQL distributed
systems don't need to keep the data consistent on a global level, a
lot of shortcuts can be taken. But having Read Consistency has little to do with Database Consistency.
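Here is a small sketch of that difference, under made-up assumptions: two shards, each with a primary copy and a lagging replica, and a hypothetical `consistent=True` flag that simply reads from the primary. It is not the API of any real NoSQL database. A transfer between two accounts that live on different shards is done as two separate single-node writes.

```python
class Shard:
    """A toy shard with a primary copy and a lagging replica (hypothetical)."""

    def __init__(self, balances):
        self.primary = dict(balances)
        self.replica = dict(balances)

    def write(self, account, balance):
        self.primary[account] = balance  # the replica is not updated yet

    def sync(self):
        self.replica = dict(self.primary)

    def read(self, account, consistent=False):
        # consistent=True reads the primary, i.e. the most recent data.
        source = self.primary if consistent else self.replica
        return source[account]


s1 = Shard({"alice": 100})
s2 = Shard({"bob": 0})

# Move 40 from alice to bob as two separate single-node writes.
s1.write("alice", 60)
stale = s1.read("alice")                   # 100, the replica lags behind
fresh = s1.read("alice", consistent=True)  # 60, the most recent value
total_mid = fresh + s2.read("bob", consistent=True)  # 60, so 40 has vanished

s2.write("bob", 40)
total_end = s1.read("alice", consistent=True) + s2.read("bob", consistent=True)
```

Every consistent read above returns the most recent data for its item, and still `total_mid` is off by 40, because nothing ties the two writes together.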
Your NoSQL fans will complain here and try to tell you that these
achieve the same thing, but they don't. Achieving global Database
Consistency costs an arm and a leg or two in performance, but the
database is ALWAYS consistent.
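For comparison, this is roughly what Database Consistency in the ACID sense looks like from the application side. The sketch uses SQLite through Python's standard `sqlite3` module only because it needs no server; the `account` table, its `CHECK` rule, and the helper functions are invented for the example.

```python
import sqlite3


def open_bank():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account ("
        " id INTEGER PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.execute("INSERT INTO account (id, balance) VALUES (1, 100)")
    conn.execute("INSERT INTO account (id, balance) VALUES (2, 0)")
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move money between accounts: both updates happen, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the CHECK rule fired, nothing was changed


def balances(conn):
    rows = conn.execute("SELECT id, balance FROM account ORDER BY id")
    return dict(rows.fetchall())
```

Calling `transfer(conn, 1, 2, 500)` on a fresh database returns `False` and leaves both balances as they were, even though the credit to account 2 had already been executed inside the transaction. The database moves from one valid state to another, or it does not move at all.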
So these are two different things, both with
advantages and disadvantages, but they are STILL different! And the
NoSQL folks confuse things further by letting you turn off even Read Consistency,
somewhat implying that turning it on means you get Database Consistency,
and that the Read Consistency model (which is very, very simple, by the
way) means you get the effect of Database Consistency using Eventual
Consistency. Nope. You don't. Which doesn't make it bad, but IT IS NOT THE SAME THING!
/Karlsson
Source: http://karlssonondatabases.blogspot.com/2012/02/more-on-database-consistency.html