Good news loyal readers, Trek is here to dive down into the transactional semantics of NuoDB again. Huzzah! In particular, I'd like to direct the worthy attention of this blog's readership to the meaning of traditional SQL isolation levels in terms of MVCC. Some of you may be wondering what an isolation level is. It is one of the dustier corners of the SQL standard, so I'll start with a quick review of why and whither isolation levels.
Isolation Levels: Quick and Dirty Overview
Database transactional semantics are often described with the handy acronym: ACID. And a fully transactional system should exhibit atomicity, consistency, isolation and durability. However, all those properties have a cost. Sometimes, an application developer would like to relax some of the A, C and I properties if it means an increase in performance or a simplification of the application. An example would be an application that has some logic who's job is to update some statistics. In this case, let's say it needs to update a global accumulator for client-side measured latency and bump up a counter for number of clients. This could all be done with full-blown snapshot isolation for each transaction, however they would interfere with each other. Because addition is inherently commutative, the application developer knows that these two operations don't need to operate against the 'exact' visible version always. They can be performed against the 'latest' version. This should allow 2 transactions to update the counters without causing transaction aborts. In 'classic' SQL, there are 4 specified isolation levels. They are, in order of increasing isolation: READ_UNCOMMITTED, READ_COMMITTED, REPEATABLE_READ, SERIALIZABLE. Check here for an in depth overview.
If you followed that link, or are familiar with the standard's definition of isolation levels, you may recall that it doesn't mention versions or visibility. That's because the standard's definition was in terms of a lock-based database. In fact, the write behavior for each of these isolation levels is defined in terms of the lock's mode (shared vs. exclusive) and the duration. This is very confusing for developers working on an MVCC database like NuoDB, because the whole point of MVCC was to eliminate the need for explicit table/row locking. But don't fret, potentially frustrated developer, the rest of this blog post is dedicated to describing these isolation levels in MVCC terms. And then providing helpful examples!
NuoDB's Isolation Levels:
NuoDB is a distributed MVCC database. Therefore, discussion of read anomalies and lock durations aren't a straightforward way to describe isolation level semantics. From the point of view of MVCC, everything needs to be described in terms of visibility. From the point of view of NuoDB, there are two visibility domains. One is the most recent committed version of a record (I'll call this MOST_RECENT), the other is the version of the record that was the most recent committed at some point in time (I'll call this SNAPSHOT_VERSION).
MOST_RECENT visibility for any row is the most recent version of that row that was installed by a committed transaction. Because NuoDB is a distributed system, a node will be informed about a commit by the node that was responsible for the committing transaction. Therefore, there's a kind of temporal skew possible where some nodes will interpret a given transaction as committed, and others (who haven't yet received and processed the commit message) will interpret it as currently active. What this means is that MOST_RECENT version isn't stable across invocations, because transactions executing on remote nodes may transition from active to committed state between version calculations.
SNAPSHOT_VERSION visibility is completely stable. When a transaction is started, the set of all transactions already committed determine exactly the set of visible records (for the entirety of that transaction's duration). This means that under snapshot isolation, a transaction will never see the set of rows returned by a SELECT change from one invocation to the next (except for when that transaction itself has updated the contents of the table(s) being SELECTed from). The consequence of this is that it is possible to get update conflicts where two transactions in snapshot isolation are attempting to update the same row version. And that the only way to resolve such conflicts may be to roll back one of the conflicting transactions (to 'reset' its snapshot).
Isolation Level MVCC Semantics:
The table below summarizes the isolation levels NuoDB supports by their MVCC behavior. We here at NuoDB's undersea geodesic dome lair, have elected to call snapshot isolation CONSISTENT_READ, but it's classic snapshot isolation. NuoDB doesn't support REPEATABLE_READ exactly, because snapshot isolation is strictly stronger than repeatable_read (repeatable_read permits phantom reads, snapshot isolation prevents them).
Isolation level visibility rules
|Isolation Level||Version read by SELECT||Version written by UPDATE|
READ_COMMITTED is similar to what one would get with many NoSQL products. It always returns the most recent version. For transactions that don't care about global read consistency or who are doing strictly commutative or idempotent operations, READ_COMMITTED offers the minimal level of isolation. An application with less stringent consistency requirements may be able to run in READ_COMMITTED mode correctly, and achieve higher throughput than at CONSISTENT_READ.
CONSISTENT_READ is good ol' snapshot isolation. This is arguably the most sensible isolation level for a distributed database, as it completely insulates transactions from non-conflicting write behavior on other nodes. NB: in NuoDB, this isolation level is also what you get if you specify SERIALIZABLE (this is not unusual in an MVCC database). Although snapshot isolation is not a standard isolation level, it is supported on many databases. In fact, every DB that uses MVCC will be able to do snapshot isolation. It is stronger than REPEATABLE_READ but not as strict as global serializability.
WRITE_COMMITTED is a NuoDB-only isolation level. It is intended for use in situations where the reads should be stable, but the updates can be correctly performed against the most recent committed version. The example of the statistics updating transaction would be an example of a transaction that could benefit from WRITE_COMMITTED.
Example 1: Reads
The following examples will make these visibility rules more concrete. For these examples, assume the following initial state:
SQL> create table t1 (f1 integer); SQL> insert into t1 values (1), (3), (5), (7), (9); SQL> select * from t1; F1 --- 1 3 5 7 9
The important point here is that a transaction in READ_COMMITTED will be able to read the latest committed version, therefore in this example it saw the inserted value 2 AFTER transaction 1 committed. Those transactions with snapshot read visibility won't see any change in SELECT output. It is important to remember that transaction 1 had to commit before any change became visible to other transactions. NuoDB doesn't permit so-called dirty reads. And that is a good thing.
Example 2: Conflicting Updates
This example will illustrate differences in isolation levels when 2 transaction's write sets overlap.
READ_COMMITTED/WRITE_COMMITTED: Here, transaction 2 detected transaction 1's updates and blocked awaiting a final state for transaction 1. When transaction 1 committed, transaction 2 evaluated the update predicate in terms of the most recent committed version, since there was no longer a record where f1 = 3, transaction 2's update updated no rows.
CONSISTENT_READ/Snapshot Isolation: The update causes an update conflict in transaction 2. This is because there's no consistent way for transaction 2 to both select a record version and update that record version without clobbering transaction 1's update. Therefore the update will fail, and the client must decide what to do next.
Example 3: Non-overlapping Writes
This example will illustrate the write behavior when the write sets do not overlap, but the writes are all against the same table.
This case illustrates the main difference between CONSISTENT_READ and the other isolation levels. In CONSISTENT_READ, because the updates shared no rows in common, both transactions could proceed without coordination. With READ_COMMITTED and WRITE_COMMITTED, because the version to be updated depends on the most recent committed value, any table update will force the late-arriving transaction (transaction 2) to have to block until the updating transaction (transaction 1) has finished. If the updater commits, then the other transaction will use the updated row version. If the updating transaction aborts, then the other transaction will have to use the previous row version. This choice of row versions cannot be performed until the updating transaction has finished.
Isolation levels are a complicated topic. On the one hand, they're all about letting a user relax the consistency properties of the system to squeeze out some performance, and consistency is totally a good thing. On the other hand, they allow a sufficiently capable developer to significantly increase the performance of a DB application. They are part of the standard, so NuoDB supports them. From the examples above, I hope the meaning of the actual isolation levels is reasonably clear. I hope it's also clear that there's no cut-and-dried formula for choosing an isolation level. In general, CONSISTENT_READ (snapshot) isolation is totally safe and probably ideal for most applications. If the transactions won't be updating the same rows all the time, CONSISTENT_READ is usually the most performant option because it won't require blocking on rows that aren't part of the update set. However, if there will be a lot of write contention on some shared rows, READ_COMMITTED or WRITE_COMMITTED might be the best option. I hope you found this blog post to be useful and informative. Stay tuned for more information, especially as it relates to error handling and update latency.