LZ* compression algorithms

While implementing a small, naive log aggregation tool, I had a moment to consider which type of compression to use on the log files at rest. Beyond compression efficiency and how much space the files take up, the main implication of this choice is how the files can be used once stored. If you are using tools like gzip/gunzip with awk or other simple command-line tools, or scripts written in common languages like Python or Ruby, gzip compression poses very little problem, since in most cases you are processing one file at a time.

If you want to use a distributed computation system like Hadoop, on the other hand, this can be a problem. Gzip files can’t be split, so only a single mapper can work on a given file at a time. If your files are small this may not matter, but if they are large it can. Other tools such as Impala will outright refuse to work on gzipped data, which may further limit your options. I started looking into alternative compression algorithms that these tools do support, and one name that kept coming up was LZO. Looking it up on Wikipedia didn’t offer much insight into what it actually is. Since I was implementing my aggregation tool in Go, I checked the standard library and found only LZW compression.

Are LZO and LZW the same? Are they related? Is one better than the other? I found very little help in the Google results (but perhaps that is just me). In the end I implemented gzip compression in my program, but little did I know that even gzip belongs to the same family as these LZ-prefixed algorithms: it is built on DEFLATE, which combines LZ77 with Huffman coding.

I just started watching a new series of videos on the Google Developers YouTube channel called Compressor Head. Episode 2 covers the LZ compression family: how these algorithms work, in easy-to-understand terms, and which programs we know today inherit from the fundamental algorithms LZ77 and LZ78. I highly recommend watching them!

Published at DZone with permission of
