
LZ* compression algorithms



While implementing a small, naive log aggregation tool, I had a moment to consider what type of compression to use on the log files at rest. Beyond compression efficiency and how much space the files take up, the main implication of this choice is how the files can be used once stored. If you work with tools like gzip/gunzip and awk or other simple command-line tools, or with scripts written in common languages like Python or Ruby, gzip compression poses very little problem, since in most cases you are processing one file at a time.

If you want to use a distributed computation system like Hadoop, on the other hand, gzip can be a problem. Gzip files can’t be split, so only a single mapper can work on a given file at a time. If your files are small this may not matter, but if they are large it can. Other tools such as Impala will outright refuse to work on gzipped data, which may further limit your options. I started looking into alternative compression algorithms that these tools do support, and one name that kept coming up was LZO. Looking it up on Wikipedia doesn’t offer much insight into what it actually is. Since I was implementing my aggregation tool in Golang, I checked the standard library and only found LZW compression.
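For reference, the LZW support mentioned above lives in Go’s `compress/lzw` package. The following is only a minimal sketch of a round trip through it, not the code from my tool: the helper names `lzwCompress` and `lzwDecompress` and the sample log line are made up for illustration, and the choice of `lzw.LSB` with 8-bit literals is an arbitrary assumption.

```go
package main

import (
	"bytes"
	"compress/lzw"
	"fmt"
	"io"
)

// lzwCompress is a hypothetical helper that compresses data with the
// standard library's LZW writer. LSB bit order and 8-bit literals are
// an arbitrary choice for this sketch.
func lzwCompress(data []byte) ([]byte, error) {
	var buf bytes.Buffer
	w := lzw.NewWriter(&buf, lzw.LSB, 8)
	if _, err := w.Write(data); err != nil {
		return nil, err
	}
	// Close flushes any pending codes to the underlying buffer.
	if err := w.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// lzwDecompress reverses lzwCompress; order and literal width must match.
func lzwDecompress(data []byte) ([]byte, error) {
	r := lzw.NewReader(bytes.NewReader(data), lzw.LSB, 8)
	defer r.Close()
	return io.ReadAll(r)
}

func main() {
	// Made-up, highly repetitive log data.
	logData := bytes.Repeat([]byte("GET /index.html 200\n"), 500)

	packed, err := lzwCompress(logData)
	if err != nil {
		panic(err)
	}
	unpacked, err := lzwDecompress(packed)
	if err != nil {
		panic(err)
	}
	fmt.Printf("original %d bytes, compressed %d bytes, round trip ok: %v\n",
		len(logData), len(packed), bytes.Equal(logData, unpacked))
}
```

Note that this produces a raw LZW stream with no container format around it, so it is not something Hadoop-style tools would recognise as LZO or any other supported codec.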

Are LZO and LZW the same? Are they related? Is one better than the other? I also found very little help in the Google results (but perhaps that is just me). In the end I implemented gzip compression in my program, but little did I know that even gzip belongs to the same family as these LZ-prefixed algorithms: it uses DEFLATE, which combines LZ77 with Huffman coding.
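To give an idea of what gzip compression looks like in Go, here is a minimal sketch using the standard `compress/gzip` package. It is not the code from my aggregation tool: the helper names `gzipBytes` and `gunzipBytes` and the sample log line are hypothetical, and it works on in-memory byte slices rather than files to keep the example short.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// gzipBytes is a hypothetical helper that gzip-compresses data in memory.
func gzipBytes(data []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(data); err != nil {
		return nil, err
	}
	// Close flushes remaining data and writes the gzip trailer;
	// skipping it leaves a truncated stream.
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// gunzipBytes decompresses a gzip stream produced by gzipBytes.
func gunzipBytes(data []byte) ([]byte, error) {
	zr, err := gzip.NewReader(bytes.NewReader(data))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}

func main() {
	// Made-up log data; real logs will compress differently.
	logData := bytes.Repeat([]byte("INFO request served in 12ms\n"), 1000)

	packed, err := gzipBytes(logData)
	if err != nil {
		panic(err)
	}
	unpacked, err := gunzipBytes(packed)
	if err != nil {
		panic(err)
	}
	fmt.Printf("original %d bytes, gzipped %d bytes, round trip ok: %v\n",
		len(logData), len(packed), bytes.Equal(logData, unpacked))
}
```

The output of `gzipBytes` is an ordinary gzip stream, so writing it to a file should give something that gunzip and similar tools can read, with the same non-splittable limitation described earlier.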

I just started watching a new series of videos on the Google Developers YouTube channel called Compressor Head. Episode 2 covers the LZ compression family: how these algorithms work, in easy-to-understand terms, and which programs we know today descend from the fundamental algorithms LZ77 and LZ78. I highly recommend watching them!



Published at DZone with permission of Oliver Hookins, DZone MVB. See the original article here.

