{{announcement.body}}
{{announcement.title}}

Crunch Time: 10 Best Compression Algorithms

DZone 's Guide to

Crunch Time: 10 Best Compression Algorithms

Take a look at these compression algorithms that reduce the file size of your data to make them more convenient and efficient.

· Big Data Zone ·
Free Resource

Data compression is the process of reducing file sizes while retaining the same or a comparable approximation of data. This is accomplished by eliminating unnecessary data or by reformatting data for greater efficiency. 

When compressing data, you can use either lossy or lossless methods. Lossy methods permanently erase data while lossless preserve all original data. The type you use depends on how high fidelity you need your files to be.

In this article, you will discover six different types of lossless data compression algorithms, and four image and video compression algorithms based on deep learning.

6 Lossless Data Compression Algorithms

Lossless compression algorithms are typically used for archival or other high fidelity purposes. These algorithms enable you to reduce file size while ensuring that files can be fully restored to their original state if need be. 

There is a variety of algorithms you can choose from when you need to perform lossless compression. Below are six commonly used ones.

1. LZ77

LZ77, released in 1977, is the base of many other lossless compression algorithms. It uses a “sliding window” method. In this method, LZ77 manages a dictionary that uses triples to represent:

  • Offset—the distance between the start of a phrase and the beginning of a file. 

  • Run-length—the number of characters that make up a phrase.

  • Deviating characters—markers indicating a new phrase. This includes an indication that the phrase is equal to the original phrase and which characters differ. 

As a file is parsed, the dictionary is dynamically updated to reflect the compressed data contents and size. For example, a file containing the the string "abbadabba" is compressed to a dictionary entry of "abb(0,1,'d')(0,3,'a')". You can see a breakdown of this process below.

Position

Symbol

Output

0

a

a

1

b

b

2

b

b

3

a

(0,1,’d’)

4

d

5

a

(0,3,’a’)

6

b

7

b

8

a

In this example, the substitution is slightly larger than the input but with a realistic input (which is much longer) the substitution is typically considerably smaller. 

2. LZR

LZR, released in 1981 by Michael Rodeh, modifies LZ77. It is designed as a linear alternative to LZ77 but can be used for any offset within a file. When used non-linearly, it requires a significant amount of memory, making LZ77 a better option. 

3. LZSS

Lempel-Ziv-Storer-Szymanski (LZSS), released in 1982, is an algorithm that improves on LZ77. It does this by including a method that detects whether a substitution decreases file size. If it doesn’t, the input is left in the original form. LZSS also eliminates the use of deviating characters, only using offset-length pairs. This method is commonly used for archive formats, such as RAR, or for compression of network data. 

4. DEFLATE

DEFLATE, released in 1993 by Phil Katz, combines LZ77 or LZSS preprocessor with Huffman coding. Huffman coding is an algorithm developed in 1952. It is an entropy encoding method that assigns codes based on the frequency of a character. 

5. LZMA

Lempel-Ziv Markov chain Algorithm (LZMA), released in 1998, is a modification of LZ77 designed for the 7-Zip archiver with a .7z format. It uses a chain compression method that applies the modified LZ77 algorithm at a bit rather than byte level. The output from this compression is then processed using arithmetic coding for further compression. Depending on the exact implementation, other compression steps may also be performed.

6. LZMA2

LZMA2, released in 2009, is a refinement of LZMA. It improves the performance of LZMA with greater multithreading capabilities and improved handling of incompressible data. 

4 Image and Video Compression Algorithms Based on Deep Learning

In addition to the static algorithms introduced above, there are several algorithms that are based on deep learning that you can use. 

1. Multi-Layer Perceptron (MLP)-Based Compression

MLP is a technology that uses multiple neuron layers to intake, process, and output data. It can be applied to dimension reduction tasks and data compression. The first MLP-based algorithm was developed in 1988 and integrated the existing processes of:

  • Binary coding—standard two-symbol coding.

  • Quantization—constraint of input from a continuous set to a discrete set.

  • Spatial domain transformation—pixel-by-pixel changes to data.

MLP algorithms used outputs from the above processes in a decomposition neural network to determine optimal binary code combinations. Later, this method was refined with predictive techniques which enabled accurate approximation of data based on neighboring data via backpropagation. 

2. DeepCoder - Deep Neural Network Based Video Compression

DeepCoder is a Convolutional Neural Network (CNN) based framework, which presents an alternative to traditional video compression techniques. The model uses separate Convolutional Neural Networks (CNNs) for predictive and residual signals. It encodes feature maps into a binary stream using scalar quantization and a traditional file compression algorithm, Huffman encoding. The model is said to provide superior performance compared to the well-known H.264/AVC video coding standard. See the full paper.

3. Convolutional Neural Network (CNN)-Based Compression

CNNs are layered neural networks, often used for image recognition and feature detection. When applied to compression, these networks use convolution operations to calculate the correlation between neighboring pixels.

CNNs show better compression results than the MLP-based algorithms, with improved super-resolution performance and artifact reduction. Additionally, CNN-based compression improves the quality of JPEG images by reducing the peak signal-to-noise ratio (PSNR) and the structural similarity (SSIM). CNN-based compression can also achieve the performance of the High-Efficiency Video Coding (HEVC) standard by using entropy estimation. 

4. Generative Adversarial Network (GAN)-Based Compression

GANs are a form of neural network that use two networks in competition to produce more accurate analyses and predictions. GAN-based compression algorithms were first developed in 2017. These algorithms can compress files up to two and a half times smaller than other commonly used methods, such as JPEG or WebP. 

You can use GAN-based methods for real-time compression through the use of parallel processing. This method works by compressing images based on the most relevant features. When decoded, images are then reconstructed according to predictions made from these features. In comparison with CNN-based compression, GAN-based compression can produce higher quality images by eliminating adversarial loss.

Conclusion

Compression algorithms can help you optimize file size. Different algorithms provide different results. This article reviewed six static algorithms for lossless compression, and four algorithms for deep learning usage. However, if you can’t find the right algorithm here, you can take a look at this guide and refine your search. There is no shortage of algorithms, but you need to be specific when looking for the right algorithm for your project.

Topics:
algorithm, big data, compression, compression algorithm, data compression, lossless

Published at DZone with permission of Leah Fainchtein Buenavida . See the original article here.

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}