CAP Is Not the Whole Story: Introducing Trust and Blockchain
In this article, an expert gives us a great introduction to several key principles that are currently guiding the advancement of blockchain technology.
Join the DZone community and get the full member experience.Join For Free
The CAP theorem asserts that in any distributed data store only two out of three guarantees can be provided regarding consistency, availability, and partition tolerance. But what about trust?
In commercial systems on the internet partition tolerance can never be fully guaranteed, limiting the choice to either consistency or availability. As availability has the most significant impact on revenue, the system design of distributed data stores for businesses reduces to a choice between different eventual consistency recovery strategies.
Seen in this light, although blockchain is a strong eventual consistency recovery scheme with reconciliation through a consensus protocol, it is an inefficient and expensive method for achieving such consistency.
However, when a fourth overlooked guarantee is added to the CAP mix — namely the question of trust — blockchain comes into its own. In this article, I argue that the significance of blockchain when considering distributed data lies in its provision of a new approach to an extended version of the CAP theorem: CAPT, which asks, “to what level can you guarantee consistency, availability, partition tolerance, and trust?”
Historically, distributed data stores have been owned and operated by a single entity such as a corporation, often acting as a middleman in transacting between different external parties. The question is then reduced to a binary choice — either you trust the entity and therefore submit transactions to and retrieve data from their system, or you do not trust them and either: select another service provider, “roll your own” solution, or abstain completely.
A result has been that the issue of trust does not figure in Brewer’s CAP theorem. The theorem has been refined over the years to consider consistency and availability as lying on a continuous spectrum and has informed further research into ACID versus BASE, but apart from this, it has remained unchanged. Partitions in traditional database network architecture are seen as very unusual events.
In a blockchain, the occurrence of partitions is significantly more likely. Blockchain attempts to handle a new emerging case in data storage systems whereby parties that are completely untrusted or only partially trusted are participating in the network on an equal footing.
Not only are accidental partitions due to network or node failures a possibility, but a public blockchain, in particular, has to deal with maliciously generated partitions, chain forks due to competing nodes, and software forks due to community disagreement. Standard fault tolerance methods such as Praxos and Raft are no longer sufficient, and a new strategy is required to resolve these discrepancies.
Blockchain originated from an attempt to solve the “double-spending” problem in the context of a decentralized system, in which a digital token recorded on a ledger or in a database can be spent more than once unless suitable consistency approaches are taken. In a centralized system, the double-spend issue is not a problem, because the central authority has the final say over the “single view of the truth.”
So let us examine how blockchain resolves the fourth guarantee of trust and allows for an estimate of such trust on a spectrum, from completely untrusted or unknown and untraceable participants on the network, right up to a fully trusted network of participants. This will highlight the observation that ultimately the purpose of blockchain is enabling the sharing of data and transmittal of transactions or states between different stakeholders across low trust boundaries, and that this should inform the development of use cases for blockchain.
What Is Trust?
When you submit your personal data to a third party for storage, dissemination, and retrieval, or delegate your financial transactions to a bank or credit card provider, you are making a trust decision.
Yet hacks such as the Ashley-Madison and Equifax data leaks show that in many cases such trust is unfounded. Similarly, the emergence of Bitcoin as an alternative value transmission protocol emerged as a response to perceived malfeasance by the financial sector. And yet the CAP theorem is notably silent on the issue of trust.
This may be due to the development and deployment of public, decentralized, and distributed systems being relatively new, or perhaps the fact that third-party data storage and transaction systems are closed to inspection. Such “security through obscurity” makes a meaningful assessment of the risk associated with trust impossible until it is revealed as unfounded after a breach. As with the initial formulation of the CAP theorem, trust is then a Boolean option – either you trust the system completely (T = 1), or you do not (T = 0).
Understanding how an estimate of trust can be calculated for a blockchain requires a review of the conditions under which a “single view of the truth” may be lost and subsequently regained, and is the topic of the next section.
Uncles, Orphans, and Forks
There are three changes that dramatically affect blockchain consistency and that are not standard partitioning as experienced by conventional distributed databases: orphan or uncle blocks, soft forks, and hard forks.
At their simplest, each is an event in which the data chain comprising the blockchain splits and grows along two separate paths, with subsets of the blockchain participants following each path. In a successful blockchain implementation, the split should eventually be resolved, with one chain winning out over the other deterministically.
(Note: There is some confusion regarding the terminology for accidental chain forks based
on duplicate “next block” candidates for extending the chain. The terms “stale blocks”
and “orphan blocks” are used interchangeably by some authors. Others reserve orphan
blocks to mean blocks for which the parent block has not yet been downloaded during
a peer node sync operation. The confusion results from a misuse of the term “orphan”
in the original Bitcoin code base. At the time of writing, although the term orphan is a
misnomer, it is most commonly used to indicate valid yet non-included blocks. Ethereum
uses the term “uncle block” to describe these.)
Orphan Blocks and Chains
The blockchain extends over time as successive blocks of data are added to the chain, forming a hash-linked list. A fork due to orphan blocks occurs when there are two equally or nearly equally valid candidates for the next block of data in the blockchain, at which point the list becomes a tree. This event can occur when the two blocks are found close in time and are submitted to the network at different “ends” so that one subset of nodes receives the first candidate before the second, and the second subset of nodes receives the second candidate before the first. Each subset of nodes continues to attempt to grow the chain from their chosen candidate. Eventually, one chain will grow faster than the other, and participants working on the shorter chain will switch, resulting in the more successful chain growing even faster, until all participants have switched to the winning chain. The blocks in the losing chain are then described as “orphaned.”
Orphan blocks are generally a natural event in the blockchain ecosystem, and will always be resolved eventually. In Bitcoin, depending on the node in question, about 0.1% to 2% of valid blocks observed are orphans, and they typically resolve within one further block generation cycle. In Ethereum, orphan or uncle blocks are actually encouraged, and although their transaction content does not form part of the data set, the effort expended in creating them counts towards the proof-of-work hardening of the chain.
However, malicious blocks can also be created, for example, in an attempt to attach to an earlier block in the chain. If a malicious block breaks the consensus protocol it will be rejected by the rest of the network, but a valid malicious block may intentionally be created earlier in the chain in order to create a new fork that benefits the attacker. In this circumstance, the blockchain has to rely on the majority of the network being benign and ignoring the new “evil” chain in favor of the old.
Soft forks occur when part of the network agrees to alter the blockchain protocols in a backward non-compatible manner that places further restrictions on what constitutes a valid block. Under a soft fork, participants who refuse or fail to upgrade to the new protocol will still accept blocks that are created by the new updated software. However, they may continue to submit blocks that are no longer accepted by the participants who have upgraded. If the majority of the network has accepted the soft fork, the laggards may end up mining blocks that will not make it into the main chain, which is a waste of their computing power and hence the financial rewards that public blockchains typically award to block-creators. This provides an incentive for the participants to upgrade.
Soft forks allow for a gradual upgrade of the nodes operating on the network, provided a majority accepts the changes when they are released. They are the preferred method for improving the underlying blockchain mechanisms because they usually result in less chaos. Technically, the new software could reject the validity of the existing chain up to the fork point, if not for the fact that the software normally has a marker indicating when the fork occurred and uses the old protocols for validation up to that point.
The most difficult change to adopt is the hard fork, under which the majority of the nodes agree to alter the protocols in a backward-compatible manner, by extending the protocol. Under a hard fork, future blocks will be rejected by nodes that do not upgrade. Hard forks are the most likely to generate problems in the blockchain ecosystem and have resulted in blockchain systems splitting in two permanently, for example, Bitcoin Cash (BCH) versus Bitcoin Classic (BTC).
An example of an unintended hard fork (which subsequently had to be reverted) was the update of Bitcoin from 0.7 to 0.8 in March 2013, in which the underlying database was changed from Berkeley DB to LevelDB. Unbeknownst to the developers, version 0.8 with LevelDB was able to handle the inclusion of more transactions into a block than version 0.7 was. The split between 0.7 and 0.8 miners on the network was approximately 40/60, and, as a result, two chains developed. The final orphaned chain managed to reach an unprecedented length of 24 blocks before the fork was resolved. This remains the longest orphan chain in Bitcoin history to date.
Traditionally centralized data storage systems do not need to deal with forks, as the authority can force through any protocol or design change they chose. Similarly, orphan blocks do not exist, as data is not stored by adding packaged data to a hash-linked list.
Although including the calculations in this article is beyond the current scope, you can read more about assigning actual numbers to trust in a companion paper to this article published here. In particular, a simple yet effective model of a general blockchain is presented, allowing a theoretical calculation of the probability of an orphan block or chain forming at any given time interval within a blockchain system. For Bitcoin, the predicted value of 1.74% corresponded very closely to an empirically measured value of 1.69%.
Private and public blockchain systems face different issues in building consensus, but they do have some features in common. Both public and private blockchains will face a consistency problem due to network latency, which needs to be overcome by the blockchain consensus protocol. The probability of a consensus discrepancy and the expected resolution time under normal operation can be calculated empirically or theoretically with a reasonable level of accuracy.
Public blockchains also suffer from a significant level of risk due to malicious attackers. Although it is possible to calculate the chances of success given knowledge of an attackers resources, it is not possible to provide an estimate of the probability that someone will actually mount an attack against a given public blockchain.
Private blockchains are less likely to suffer an attack, as the participants can be clearly identified, and due to the audit trail left by the tamper-proof record of the blockchain, any damaging activity can be attributed to the responsible party, who can then be expelled from the blockchain system. The damage can subsequently be corrected.
Unlike traditional centralized data storage solutions, blockchain allows for the sharing, editing, and creation of data between different entities across lower trust boundaries, and, under certain circumstances, it is possible to produce a measure of the likelihood of, and extent to which, the system will be inconsistent, together with an estimated resolution time. In the CAP theorem, there is no accurate determination of the level of consistency or availability provided; nor is the probability of partitioning within the system able to be estimated.
With Blockchain, we can begin to examine the possibility of distributed data storage systems in which the probability of the system functioning can be calculated, despite trust in the parties involved being less than absolute. This is a significant advance in our technical capabilities.
Published at DZone with permission of Keir Finlow-Bates. See the original article here.
Opinions expressed by DZone contributors are their own.