Generally, persistent disk-oriented systems will require the additional 3rd phase in commit protocol in order to ensure data consistency in case of failures. In my previous blog I covered why the 2-Phase-Commit protocol (without 3rd phase) is sufficient to handle failures for distributed in-memory caches. The explanation was based on the open source GridGain architecture, however it can be applied to any in-memory distributed system.
In this blog we will cover a case when an in-memory cache serves as a layer on top of a persistent database. In this case the database serves as a primary system of records, and distributed in-memory cache is added for performance and scalability reasons to accelerate reads and (sometimes) writes to the data. Cache must be kept consistent with database which means that a cache transaction must merge with the database transaction.
When we add a persistent store to an in-memory cache, our primary goal is to make sure that the cache will remain consistent with on-disk database at all times.
In order to keep the data consistent between memory and database, data is automatically loaded on demand whenever a read happens and the data cannot be found in cache. This behavior is called read-through. Alternatively, whenever a write operation happens, data is stored in cache and is automatically persisted to the database. This behavior is called write-through. Additionally, there is also a mode called write-behind which batches up the writes in memory and flushes them to the database in one bulk operation (we will not be covering this mode here).
Figure 1: Read-through operation for K2
When we add a persistent store to the
2-Phase-Commit protocol, in order to merge cache and database transactions into one, the
coordinator will have to write the transactional changes to the database before it sends the
Commit message to the other participants. This way, if database transaction fails, the coordinator can still send the
Rollback message to everyone involved, so that the cached data will remain consistent with database. Figure below illustrates this behavior.
Figure 2: Two-Phase-Commit with In-Memory-Cache and Database
Handling failures is actually more straight forward whenever a database is present rather than when it is not. We always assume that the database must have the utmost up-to-date copy, and it is acceptable to reload data from the database into cache whenever in doubt (see
). Just like in
my previous blog
, the most challenging scenario here is when the
node crashes (potentially together with other nodes), because in this case we cannot tell whether it crashed before it was able to commit to the database or not. Other failure scenarios are handled the same way with database present as without.
Whenever we cannot tell whether the database commit had happened or not, we can simply
the relevant data from database into cache upon committing the transaction. This effectively ensures that database and cache always remain in consistent state.
When working with in-memory caches, we can always manage to keep the data within transactions consistent by slightly enhancing the standard
2-Phase-Commit protocol. The main advantage of in-memory vs. disk is that failure handling does not introduce any additional overhead, and we do not need to add an expensive
3rd phase to the
2-Phase-Commit protocol in order to keep caches consistent with databases in case of failures.