Over a million developers have joined DZone.

Merkle Trees vs Blockchains vs Hashgraphs

DZone's Guide to

Merkle Trees vs Blockchains vs Hashgraphs

In this post, we compare how blockchains built on Merkle Trees, as well as the latest ledger technology, Hashgraphs. Interested? You should be!

· Security Zone ·
Free Resource

Secure and manage your open source software with Flexera.

This is another new article in a mini-series that started with ‘Old-School’ Merkle Trees Rock, Merkle Trees In Pictures, Blockchains in pictures, Choosing Between Blockchains And Vanilla Merkle TreesStep Aside Blockchains, Hashgraphs Are Giving Plain Merkle Trees A Turbo Boosnow this one.

Blockchains are famous enough right now, but they build on older Merkle tree concepts which were solid enough for certain categories of applications on their own. Hashgraphs are the newest of all and they push Merkle trees back to prominence for more categories of subscribable data storage for simple or sophisticated ledger-style applications. This article compares the three:

Criteria Vanilla Merkle tree Blockchain Hashgraph
Simple Visualization Merkle tree in pictures. Diagrams show the effect of records being appended to the chain, not activity in arriving at consensus towards that end. Blockchain in pictures. Same as Merkle tree for data at rest, See later for consensus building.
Nature? Tree-like facilitating adds, changes, and deletes (moves too); Everything is mutable. Chain-like notwithstanding an append-only restriction; nodes are immutable; with a Merkle tree as underlying an implementation aspect; it is really momentarily tree-like, facilitating what appears chain-like if you zoom out. As a Merkle tree for the enduring record, at least.
Transaction commit control Via gatekeeping central authority (or delegates).

There is a possibility that in high enough volumes this centralized thing can slow the whole operation down.

There’s a confidence, though, that once an item is accepted it’s in for good.
After distributed consensus:

Nodes cooperate at scale to agree about changes to be made to the head of the chain and spend time attempting to verify consensus amongst themselves.

There is community debate as to how costly this activity is. Certainly for the block-chain variants that make it sufficiently hard for nodes to do enough mining to be able to add items to the chain transactions are not free. And as attempts are made to slow the algorithms down sufficiently to give a chance to multiple miners to participate, the cost goes up (roughly proportional to the rise in the power of computers).

There are also false additions to the chain that get undone a short period later as it becomes clear that some alternate consensus won. [See efficiency later on]. Those losing out have to re-start to some degree in getting their transactions on the chain.

There’s little attempt to determine the malevolence of the anonymous nodes so chaos ensues, much more than is wanted.
After distributed consensus:

Using an algorithm that does not require costly verifications, nodes cooperate at scale to agree about changes about to be made to the head of the chain. The algorithm is said to have a Byzantine fault tolerance (with proof) that negates the need to do those verifications.

Transactions accepted via the distributed consensus for hashgraphs are never subject to error and correction. At least in that mutation of the tree based on inputs. The inputs could still be wrong but that is another thing.

Bad actors could still be nodes participating in the distributed consensus, but they will have had to pass through an enrollment process that possibly includes screening to be admitted to the group doing consensus (if that particular hashgraph requires permissions).
Mistakes and losses during transactions Depending on other controls, a plain Merkle tree solution may be able to spot mistakes by end users (even indirectly) and participate in their correction. There are plenty of cases of the unwary permanently losing items of value in the attempt to complete a transaction.  
Relationship between nodes? One publisher (and delegates), many subscribers, some of which may be authorized to suggest record adds, changes, and deletions.

A traditional account system may optionally govern that, meaning enrollment and approval is a gate before that, even if (optionally) the reading of data may not require a login.
Nodes are peers, and there is no bar on new nodes joining in for multiple purposes. Nodes are full peers, and there is no bar on new nodes joining unpermissioned ledgers. Hashgraph doesn’t intrinsically require the slow addition of nodes, but for permissioned solutions, the architects could slow addition if they had a business reason to do so.
Node from which you would navigate? The root. The address for which remains constant. The apparent storage for records could be a single machine, or federation of machines as indicated by the root node and only discovered during node traversal. The retention and availability of the entire tree is a requirement for the system to work. The end of the chain (with caveats). The location of that effectively moves constantly. From a subscriber point of view, there is no requirement to have the entire history permanently stored, only the cumulative effects of the chain additions. As a Merkle tree, but not using a centralized (and potentially overloaded) location for that root.
Growth and performance? Suits a limited number of records, a percentage of which could change in a given period as everything is going through a single machine with classic CPU/RAM/network constraints.

There is also an arguable storage aspect as data in a tree has hashes that go back to the root of the tree and that needs to be retained even if the actual storage were distributed in a modern block-style solution
Suits an ever-increasing* number of records, including acceleration of that over time. Many blockchain implementations do not keep history at all, but only the ‘provable’ current situation. Pulling in multiple node’s CPUs, memory and network give blockchains the feel of a scalable performance of course.

* But not exponentially so. At least not for the proof of work varieties.
As a plain Merkle-tree, but as each participating node keeps its own copy, and may shard in order to scale higher than ordinary centralized Merkle trees.

The write performance for an unsharded solution would be the same as for a plain Merkle tree, but a system’s read performance keeps improving with more nodes.
Decentralized? Centralized (at least the root node is) in that all records being added changed and deleted have to be routed through that. Toward that end, it is not necessary for the node requesting mutation to have calculated the hashes in advance of that, though they probably have as their own verification step. Decentralized with a consensus-based authority featuring “proof of stake” or “proof of work” as ways to show authentic chain growth, and avoid inauthentic forks/versions of the chain. All nodes should be running the same algorithms and protocols in order for this to work. Decentralized but with a consensus algorithm that does not require double-checking over TCP/IP to verify the gossip it received and is therefore Byzantine in nature. All nodes should be running the same algorithms and protocols in order for this to work.
Fairness Posting records to the chain must go through the centralized authority or one of their delegates or partners. Fairness is a choice for those intermediaries as they are gatekeepers and you will have to have trusted them to have processed your modification or addition, and not tampered with it as it passed through their hands. Blockchains are intrinsically fair to all unless more than one-third of participating nodes (miners) are malicious, as that is the threshold needed for nefarious records to be appended. In that eventuality, the chain will have been forked in an inauthentic direction.

Note: temporary apparent unavailability of non-nefarious nodes is still counted in a 2:1 determination.
Hashgraphs are intrinsically fair to all unless more than one-third of participating nodes are malicious, as that is the threshold to needed for nefarious records to be appended. In that eventuality, the chain will have been forked in an inauthentic direction.

Note: temporary apparent unavailability of non-nefarious nodes is still counted in a 2:1 determination.
Efficiency Efficiency is not criteria for plain Merkle trees. Imperfect efficiency as reaching consensus is wasteful, with some miner's work not resulting in records appended in the chain, as viewed later, and the bandwidth needed to reach consensus is inversely related to the size of the blocks. For proof of work variants there is a CPU/electricity inefficiency too. Together the inefficiency factors may cause inflationary effects (at least to crypto-currencies on the chain). Perfect efficiency as reaching consensus is not wasteful.
Retains History? The subscriber can miss some changes to the tree, only “current” is purported to be available. Retained, notwithstanding forks, pruning, and lost transactions This could be up to the node itself - it could be journalling changes made and doesn’t have to inform the other nodes that it is doing so. Or it could be part of the Merkle-tree mutation rules that all the participating nodes are aware of.
Scope? Most likely to be single purpose (say ‘2016 election results by voting district’). Can be general purpose (e.g. Etherium). Specific purpose ones are as likely though. Single purpose and general purpose hashgraphs will exist.
Subscribable? Is the appeal of this, including partial-tree subscriptions, but the onus is on the subscriber to do enough I/O to determine what changed (and whether that was interesting to the subscriber) based on what node they were subscribing to originally. Possible, logistically hard in the case of the biggest implementations, and ultimately unlikely by a non-government entity. All nodes participate in changes as they play a part in the distributed consensus. As such, they can subscribe if they like.
Tamper evidence? Records arriving in the tree cannot be hidden. Especially not ‘snuck in’ when everyone thinks that corner of the tree isn’t being updated.

The central authority (or their delegates) can alter items a requester would want posted, but that requester would know at least. Similarly the authorized can put anything in the tree and nobody else could say that was wrong, because you have to trust the authority (and their partners).
Yes, is a strong feature. Yes, is a strong feature.
Truthfulness of underlying payload data? Tech does not speak to data’s accuracy, only to its authenticity, and even then only as far as the central authority can be trusted. Tech does not speak to data’s accuracy, but via its “consensus” aspects makes a strong assertion about authenticity. Tech does not speak to data’s accuracy, but via its “consensus” aspects makes a strong assertion about authenticity.
Cost of initial implementation? $ $ - $$$ $ - $$
Worst case scenarios? 1) Rate of change too high for subscribers to keep up.

2) Rate of change too high for the implicit server to calculate hashes recursively back to root meaningfully.

3) Widespread trust in central authority is lost and value of the tree disappears precipitously.
1) Proof of work, stake, and consensus too costly to operate continually. With the “Dyson Sphere and Fermi Paradox” exaggeration (Nat Pryce quote) noted.

2)Value of the chain disappears precipitously because of lost participant trust (or other human factors) including decisions to no longer trust that may be considered irrational, invalid, or hasty (a ‘run’ for example).

3)Hashgraph proves to be a better implementation by multiple criteria.
1) At least ⅓ of the full nodes are malicious and colluding. (Or, if weighted by stake, then nodes controlling at least ⅓ of the stake are malicious and colluding).

Incidentally, Hashgraph algorithms and its ‘gossip’ can be used for non-Merkle-Tree solutions.

Hashgraph’s inventor, Leemon Baird, when I posed the question, said:

"Non Merkle-Tree solutions are fine for three specific cases:

  1. Outside users trust you completely.
  2. Outside users always read everything.
  3. Outside users don’t exist."

In all other cases, Merkle trees are pretty important.

For 1, an outside user asks a full node or central server for just one piece of data, and simply trusts that it will be correct. That has obvious problems.

For 2, the outside user never asks for one piece of data. They simply download the entire database, or difference since the last time they downloaded it. In that case, you can periodically calculate a hash of everything, and use that like the root of a Merkle tree. But even then, it can be more efficient to store it as a Merkle tree.

For 3, this actually happens in some cases. If the “central server” is only used by its owner, then there’s no need to bother with trees. If a hashgraph network is permissioned, with no outside users, then you wouldn’t have to bother with trees. But these are very limited cases. In most real-world situations, even if you started this limited, you’d eventually outgrow it and want something more powerful. Like a Merkle Tree.

Thanks also to Prasanna Pendse for advice. Also Swirlds.com execs: Leemon Baird (in particular), Mance Harmon, and Andrew Masanto for details on Hashgraphs for this comparison.

Security vulnerabilities put your data and your customer’s data at risk. Find out how FlexNet Code Insight integrates into your SDLC and makes monitoring open source security a breeze, so you can worry less and build more.  Brought to you in partnership with Flexera.

merkle trees ,blockchain ,security ,hashgraph

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}