HDFS Erasure Coding in Hadoop 3.0
HDFS Erasure Coding in Hadoop 3.0
Integrating Erasure Coding with HDFS can improve storage efficiency while still providing similar data durability as traditional replication-based HDFS deployments.
Join the DZone community and get the full member experience.Join For Free
HDFS Erasure Coding (EC) in Hadoop 3.0 is the solution to the problem that we had in the earlier version of Hadoop: its 3x replication factor, which is the simplest way to protect our data even in the failure of data nodes but needs too much extra storage. Now, with EC, storage overhead is magically reduced to 50% — it was previously 200% because of HDFS's default 3x replication factor. It also seems like extra work to store two more blocks other than our original data block with the same amount of resources as the original data block.
Therefore, in Hadoop 3.0, using Erasure Coding in place of replication provides improved fault-tolerance with much less storage space. In typical EC setups, the storage overhead is no more than 50%.
To understand Erasure Coding in detail, I would like to introduce two terms:-
- Durability: How many simultaneous failures can be tolerated?
- Storage Efficiency: How much storage is used for data?
When the Replication Factor Was 3
Data durability is 2, as we can handle two simultaneous failures.
Storage efficiency is 33% (useful block/total blocks, i.e. 33%).
Apart from this, it causes 200% overhead in making two extra data copies in storage.
Today’s HDFS Erasure Coding
There are two algorithms available.
1. XOR Algorithm (Simple EC Algo)
This is the simplest implementation of HDFS Erasure Coding. Let’s assume X and Y are data cells; the parity cell is the XOR of these two data cells.
Here, data durability is 1 as if can handle 1 simultaneous failure and storage efficiency is 75% (as we are using only one extra block, i.e. 3/4).
x ⊕ y is XOR by which only one parity bit is generated and if any bit is lost, it can be recovered by the remaining data cells and a parity bit. It is very limited since it produces one parity bit, so XOR operations can tolerate only one failure with n group size but we get the benefit of better storage efficiency by using XOR algorithm.
2. Reed-Solomon Algorithm (Improved EC Algorithm)
The limitation of XOR operations is solved by improved EC algorithm, commonly known as the Reed-Solomon algorithm. Reed-Solomon uses linear algebra operations to generate multiple parity cells where, instead of getting only one fault tolerance at a time, we can tolerate multiple failures per group. It works by multiplying a Generator Matrix (GT) with d data cells to generate codeword with d data cells and p parity cells. In Reed-Solomon, fault tolerance is up to p, i.e. (number of parity cells) cells and storage efficiency is d/d+p where d is data cells and p is parity cells.
In this particular example, when you look at the codeword, 6 (blue cells) are the actual data cells and 3 (red cells) are the parity cells, which are simply obtained by multiplied our data cells to generation matrix.
Storage failure can be recovered by the multiplying inverse of generator matrix with the extended codewords as long as k out of k+m cells are available.
Therefore, data durability is 3 (as it can handle two simultaneous failures), storage efficiency is 67% (as we are using only one extra block, i.e. 6/9), and we only need to store half the number of cells compared to the original number of cells. We can conclude that we also have only 50% overhead in it.
Advantages of HDFS Erasure Coding in Hadoop
- Saving storage: Initially, blocks are triplicated when they are no longer changed by any additional data. After this, a background task encodes it into codeword and delete its replicas.
- Two-way recovery: HDFS block errors are discovered and recovered not only during reading the path and we can check it actively in the background.
- Low overhead: Overhead is reduced from 200% to just 50% in RS encoding algorithm.
Integrating EC (Reed-Solomon algorithm) with HDFS can improve storage efficiency while still providing similar data durability as traditional replication-based HDFS deployments. As an example, a 3x replicated file withsix6 blocks will consume 6*3 = 18 blocks of disk space. But with EC (6 data, 3 parity) deployment, it will only consume nine blocks of disk space.
Published at DZone with permission of Ayush Tiwari , DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.