CRDTs Explained: How Conflict-Free Replicated Data Types Work
Explore Conflict-free Replicated Data Types. Data structure designed to ensure that data on different replicas will eventually converge into a consistent state.
Join the DZone community and get the full member experience.
Join For FreeIn today's post, I would like to dive deeper into one of the newest—relatively speaking— topics in the distributed systems domain. As you may have guessed already, the spotlight is on Conflict-free Replicated Data Types or CRDTs for short. I will explain what they are and what role they play in the larger landscape of distributed systems.
Let’s start our journey from explaining what Strong Eventual Consistency (SEC) means in this context.
Why SEC Matters
Consistency is one of the most important—if not the most important—traits in any system. However, the original strong consistency model imposes a significant toll on performance. It also limits the scalability and availability of our systems.
As a result, “weaker” consistency models became more and more popular and widely adopted. Eventual consistency promises to solve some of the issues created by strong consistency models. However, it also introduces some totally new types of problems—conflict resolution is one of them.
SEC aims to tackle this particular issue. It is a consistency model built atop an eventual consistency that aims to provide a conflict-free environment to ensure availability in the face of failure. It also reduces the cognitive load put on system architects by removing the need for implementing complex conflict-resolution and rollback logic.
The theoretical base for SEC is simple mathematics properties like monotonicity, commutativity, and associativity. As such, it is only valid for very specific data types and operations. These data types are commonly denoted as CRDT.
It's not surprising, taking into consideration SEC was introduced in the original, as far as I know, CRDT paper.
What are CRDTs?
CRDTs are a data structure designed to ensure that data on different computers (replicas) will eventually converge—and will be merged—into a consistent state. All of that, no matter what modifications were made and without any special conflict resolution code or user intervention.
Additionally, CRDTs are decentralized, and thus, they do not need any coordination between the replicas. Particular replicas exchange data between each other. This trait makes them quite interesting and different from algorithms used in most online gaming and Distributed File Systems (DFS).
We can differentiate two basic types of CRDTs: object-based and state-based. There is also a delta-based type which is an extension on top of the state-based CRDTs family.
Convergent Replicated Data Types (CvRDTs)
CvRDTs are state-based CRDTs. They rely on continuous exchanges of current states between particular replicas. By the way, this is a classic use case for the gossip protocol. When a replica receives the new version of the state, it uses a predefined merge function, effectively updating its own state. In such a setting, when updates stop coming, all the replicas will reach the same, consistent state.
Keep in mind that the key here is that the replicas exchange the total state each time. Thus, the size of messages may become quite big.
Commutative Replicated Data Types (CmRDTs)
CmRDTs (also called operation-based CRDT) are an alternative to state-based types. Contrary to state-based types, they do not have a merge method. Instead, they split the update operations into two steps: prepare update and effect-update. The first phase is executed locally at a particular replica, and it is directly followed by the second phase that executes across all other replicas, effectively equalizing the state across the whole deployment.
However, for the second phase to work correctly, CmRDTs require a reliable communication protocol that provides causal ordering of messages. While it is not a very complex problem, because such tools are very common nowadays, it adds another layer of complexity.
Equivalence
There is one interesting fact regarding both CvRDTs and CmRDTs. They are equivalent to each other, at least from a mathematical perspective. In the previously linked paper, there is an entire subsection (3.2) explaining in great detail why this statement holds true. I will not be copy-pasting the same text here—TLDR, it is based on emulating one type with the other.
Delta-State Conflict-Free Replicated Data Types (δ-CRDT)
Paulo Sérgio Almeida et al., in their paper Efficient State-based CRDTs by Delta-Mutation, proposed δ-CRDT. It is an extension on top of classic state-based CRDTs which addresses its biggest weakness; the continuous exchange of messages containing the full state of the object. There are two key concepts used to achieve this, namely, the delta-mutator and a delta-state.
The δ-state is a representation of changes applied by the mutator to the current state. This delta is later sent to other replicas, effectively reducing the size of messages. Additionally, to reduce the number of messages exchanged between replicas, we can group multiple deltas into a delta-group.
I do not want to get too much into the details of different types; there is much more here to uncover. If you are interested in all the math behind CRDTs, you can find all of these details here.
CRDTs Fault Tolerance
In terms of classic availability and fault tolerance, CRDTs are quite an interesting case. Their base consistency model—SEC—promises to provide very high resilience. It is possible, mostly thanks to the eventual consistency of SEC itself, but also the resilient nature of the CRDTs algorithms themselves.
In case of state-based CRDTs, they exchange full state between each other, so besides the case of total failure, sooner or later the replicas should be able to converge into a consistent state.
On the other hand, in the case of operation-based (op-based) CRDTs, the update-effect is cumulative, so again no matter the order of messages spread throughout the replicas, they will also be able to converge on an equivalent state.
With delta CRDTs, the situation is similar, as it is built upon both op- and state-based types.
There are 3 traits of CRDTs that make them especially resilient:
-
Decentralized
CRDTs operate without a central coordinator, eliminating single points of failure. Thus, they naturally handle network partitions. Updates are applied locally and propagate when communication is restored. -
Asynchronous Communication
CRDTs utilize only async communication, either via a gossip-based protocol or some broadcasting protocols. Nodes do not need to wait for any type of acknowledgments, nor do they use any type of consensus algorithm. The CRDT-wide state convergence happens asynchronously. -
Node Failures and Recovery
Nodes continue to store and process their local state even in case of network failure. Upon recovery, they can synchronize with other replicas to merge any missed updates.
Byzantine Fault Tolerance
Despite all the traits above and in general, very high fault tolerance, CRDTs are not fully invincible. There is a very particular type of failures which CRDTs cannot easily recover from—Byzantine faults. Ironically, the exact same thing that makes CRDTs so highly available—decentralization—is also the main factor of them being susceptible to Byzantine fault.
Byzantine faults occur when nodes in a distributed system behave maliciously or send malformed states, potentially leading to inconsistencies. In such a situation, reaching a consistent state across all the replicas through a gossip-based protocol or broadcast can be highly problematic. Unfortunately, at least in this case, CRDTs heavily rely on exactly these approaches.
Making CRDTs Byzantine fault-tolerant is a relatively new and hot topic among researchers focused on distributed systems, with Martin Kleppmann’s paper Making CRDTs Byzantine Fault Tolerant being one of the most cited CRDTs papers ever.
CRDTs vs CAP
CAP Theorem describes the spectrum of availability and consistency while stating that having both of them at the same time is not possible. CRDTs put this claim into question to some extent, at least part of it, as CAP is more nuanced than just consistency vs availability. CRDTs promise very high availability and eventual consistency.
CRDT replicas are always available for reads and writes no matter the network partition or failures, and what is more, any subset of communicating replicas will eventually be consistent. While it is not the same as the lineralization required by CAP, it still gives strong guarantees as to the eventual state consistency.
CRDTs show that CAP is more of a spectrum than an exact choice, and that we can balance both availability and consistency throughout our system.
Types of CRDT
The full list of all existing CRDTs is very, very long and would require multiple pages to list, not to mention describe. Here I will cover only some basic types which can later be used to build more complex structures. Let’s start with a simple register.
Register
Register is the simplest CRDT structure. It is responsible for holding a single value, like a variable. There are two basic semantics for building CRDT registers, depending on how they approach the resolving of concurrent writes:
- Multi-value Register - Stores and returns all concurrently written values, effectively returning a multi-set. Requires a conflict resolution mechanism on a higher level.
- Last-write-wins Register (LWW) - As the name suggests, only the newest value will be stored in the register.
Counter
A counter is similar to a register in the fact that it stores only one value, to be precise, a numeric type. In the case of the counter, we can also differentiate two basic types:
- Grow-only counter (GCounter) - The simplest counter that only supports an increment operation. In this counter, each replica holds its own state, and the global state is the sum of all local counters.
- Positive-Negative Counter (PN-Counter) - Somewhat more complex counter; it supports both increment and decrement operations. It tracks increments and decrements as two counters (GCounters in particular). The result is computed by counting the difference between them. Global state, similarly as in the case of GCounter, is the total sum of all counters across the nodes.
Set
Surprising as it may be, this is just a normal set, but distributed in a CRDT manner. We have multiple different set-like CRDTs.
- Grow-only set (GSet) is one of the most basic ones. It works almost the same way as GCounter, so I will not spend too much time on it.
- Another one is USet that works in a similar fashion to PN-Counter, using GSet to handle adds and removes. The USet returns the set difference between them.
- We also have Add-wins sets that favor the add operation while resolving conflicts between addition and removal of a particular element in the set.
- There is Remove-wins set that works in the directly opposite manner to Add-wins set and favors removal operations during conflict resolution.
- Later, we have even more CRDTs like Last-write-wins set, ORSet (observable-removal), and many more.
Sequence
Sequence CRDTs are a very specialized type of structure. They are extensively used in the field of collaborative editing—documents shared and edited in Google Docs. There are multiple open-source implementations of this type of CRDT. Here are a few examples, with Yjs probably being the most popular one (over 17k stars on GitHub), followed by Automerge (4k stars on GitHub), and many, many more.
Map
The case of Map is very similar to the Set CRDTs. We have Add-wins Map, Remove-wins Map, Last-write-wins Map. All of these structures have similar behavior as their set counterparts but with one difference: the conflict resolution is handled on a per-key basis.
An interesting case is the Multi-value Map, similar to Multi-value Register, where the result of each concurrent put operation is stored within the same key, and conflict resolution needs to be handled on a higher level.
A more advanced case of a Map-based structure is a Map of CRDTs, for example a PN-Counter Map that holds PN-Counters as entry values. There is some more nuance behavior when we want to update such entries, but in the end, composing CRDTs is a relatively easy task.
This is just a simplified and shortened list of all available basic CRDTs, probably not even all high-level types are covered above. For example, we also have graph-based CRDTs which can be implemented using a GSet of GSets. As to the full one, I’m not sure if it even exists, however the list available here is somewhat lengthier. Moreover, as you could see above, with PN-Counter you can build more complex CRDTs from simpler ones as building blocks.
Summary
You could read a quite comprehensive introduction to the subject of CRDTs above. You now know what CvRDTs mean and what the difference is between them and delta CRDTs. You also have some insight on how they behave when put in unfavorable situations. Moreover, you know some of the basic CRDT types, what they are, and where you can use them.
If you would like to read more about CRDTs, here is a very good page. It is run by Martin Kleppmann and aggregates a lot of data around CRDTs, like white papers and actual implementations.
Thank you for your time.
Published at DZone with permission of Bartłomiej Żyliński. See the original article here.
Opinions expressed by DZone contributors are their own.
Comments