Eventual consistency is everywhere in the real world

DZone 's Guide to

Eventual consistency is everywhere in the real world

· Agile Zone ·
Free Resource

Our habit, at least in the 2000s, was to build web sites and web applications backed by a relational database. In my case these web products were built with PHP, a language that does not usually spawn resident processes running on the server, but only consists of scripts started on demand.

This kind of storage and language gave me (and some other millions LAMP developers) a bias towards the immediately consistent model: a transaction is performed during each HTTP request, and at the end of it everything that should have been changed is consistently in the new state; every row in every table of the database is up to date.
Today we're seeing a transition towards a model full of NoSQL databases, often based on eventual consistency; towards message queues, event-driven programming and asynchronous processing; have you heard of Gearman?

These concrete examples are taken from software development and from the real world: I hope they aid you in adding the new paradigm to your toolbox (without necessarily disregarding the old one) and in evangelizing it with other people.

What's reality now?

Although many architectural choices try to hide this fact, our applications can be built as distributed systems running on different machines. It's not the time for defaulting to a MySQL connection to localhost or to a machine in the Intranet anymore.

Our data can be kept in different machines and different projects, owned by different people (integrated anything with Facebook?) Eventual consistency is the only way to scale in these scenarios; and by scale I do not mean supporting a million user on a single CPU, but just building a useful application that does not have to wait 60 seconds for each page to load.

What was the world about?

Once upon a time, everything worked with paper (at least until several tens of years ago); immediate consistency is an invention of IT people. When paper sheets had to travel between organizations and different places, there was no requirement for immediate consistency, as it wasn't even possible.

Sometimes immediate consistency is an improvement that makes the fortune of a startup: Dropbox and Google Docs eliminate the pain of synchronization and multiple versions of files and documents in a beautiful way.

Sometimes the business doesn't care really much (or at all) for forcing consistency over a large set of data.

Amazon recommendations

Amazon has a nice system for increasing sales: presenting recommendations on the page of a product, basing on which products Amazon's users has bought in the past:
The set of bought-together-products for recommendations are usually built over the whole incidence matrix users/products. Updating immediately such a model would be impossible: every time a product is bought the page would be locked up.

So how Amazon scales to selling hundreds of products continuously and generating their pages containing recommendations at the same time? In the simplest way: with a little layer of caching.

In fact, almost every cache is based on eventual consistency. I would exclude cases when cache invalidation is implemented deterministically; in all the other cases, caches are the most diffused examples of eventual consistency. The difference between them and NoSQL consistency is that many caches expire after a finite amount of time. You can cache an HTML page for 10 seconds, but the content of a CouchDB view would always favor availability to consistency; even in the case where the results are very old, querying a CouchDB view would never block waiting for recomputation.

Google search results

The same consistency model applies to Google (and any search engine): crawling takes a finite amount of time, and search results are at least always inconsistent with the current state of the web. It couldn't be different.

However, there is another level of eventual consistency. Although the indexes are probably updated hourly in 2011, spreading updates across data centers took quite a while in the past, in a phenomenon called Google dance.

During a Google dance, each data center would return different results and ordering in SERPs, while the PageRank of each result was updated. Yet Google didn't suffer this occasional lack of consistency and went on to become a web superpower.


According to the CAP theorem, we can only achieve two properties in a distributed system, chosen between Consistency, Availability and Partition Tolerance.

Although Dropbox enabled immediate consistency via synchronization in many cases, it must work with Partition Tolerance: it's a distributed system between your devices (pc, tablets, smartphones) which you can use even without a connection. You can work on files in Dropbox's local folder while commuting, in a gallery, on the sea, in the mountains, where 3G is not available or when you want to save battery or bandwidth for later.

Thus Dropbox promises to give you a folder that syncs. It hides lots of complexity under the carpet, like its delta compression algorithms and management of conflicts, or its syncing on LAN whenever possible and syncing over the Internet in any other case.

Dropbox embraces eventual consistency: transactional consistency would be impossible to obtain over a large network like Internet as your word processor would freeze each time you hit Ctrl+S and the updates are pushed to other devices. Immediate consistency is also impossible to obtain in case of a partition in the network like the absence of a working connection.

The final result? You work on your document, which will be updated when the connection is available. And then updated on your other devices when their connection is available. Sometimes it takes a lot to sync, if you have large new files to upload. But you can continue working, although not from multiple devices at the same time (Dropbox is oriented to personal syncing, not on collaboration. So it's not a real limitation.)

The hottest startup of the moment, based on explicitly avoiding immediate consistency and getting away with it.

Banking and money

It still takes some days to transfer money between Italian banks. You can imagine vans full of bank notes and policemen driving from one bank to another. Actually today banks communicate over a digital network, but still the protocols work as before. Wire transfers are scheduled for the end of the day, and transfer are made off hours (probably to simplify the accounting rules).

Bank transfers are always chosen as the example of a database transaction. That scenario may be true only when the accounts are situated in the same bank: it's not uncommon to transfer money between San Paolo and ING Direct and actually see the money vanish for some days as it is subtracted from one account and not yet added to the other one (someone with more knowledge in the banking domain could clarify this scenario.)

Of course banks provide every kind of logs so that money never disappears, but if you observe the external state of the system you will notice that at a first glance you are poorer while a wire transfer is taking place.

Again, sometimes immediate consistency is an innovation: Paypal transfers money overall the world in seconds, allowing a buyer to checkout on an e-commerce site istantaneously.

However, vendors accepting Paypal will often tell you We have accepted your order, now wait for us to come back to you... while they look for the physical item in their warehouses. Online offers are not always consistent with the actual inventory...


Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}