
Evaluating Database Compression Methods



This blog post is an update to our last post discussing database compression methods and how they stack up against each other.

When Vadim and I wrote about Evaluating Database Compression Methods last month, we claimed that evaluating database compression algorithms was easy these days because there are ready-to-use benchmark suites such as lzbench.

As easy as it was to do an evaluation with this tool, it turned out it was also easy to make a mistake. Due to a bug in the benchmark, we got incorrect results for the LZ4 compression algorithm, and as such made some incorrect claims and observations in the original article. A big thank you to Yann Collet for reporting the issue!

In this post, we will restate and correct the important observations and recommendations that were incorrect in the last post. You can view the fully updated results in this document.
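For readers who want to cross-check results like these outside of lzbench, here is a minimal Python sketch of the same measurement idea: compress a sample of your own data and report the compression ratio and compress/decompress throughput. It assumes the third-party python-lz4 package (pip install lz4) alongside the standard-library zlib, and the file name sample.ibd is only a placeholder; this is an illustration of the methodology, not the benchmark we actually ran.

```python
import time
import zlib

import lz4.frame  # third-party: pip install lz4


def benchmark(name, compress, decompress, data, repeat=5):
    """Print compression ratio and compress/decompress throughput in MB/s."""
    start = time.perf_counter()
    for _ in range(repeat):
        compressed = compress(data)
    compress_secs = (time.perf_counter() - start) / repeat

    start = time.perf_counter()
    for _ in range(repeat):
        decompress(compressed)
    decompress_secs = (time.perf_counter() - start) / repeat

    mb = len(data) / 1e6
    print(f"{name:8s} ratio={len(data) / len(compressed):5.2f}  "
          f"compress={mb / compress_secs:8.1f} MB/s  "
          f"decompress={mb / decompress_secs:8.1f} MB/s")


if __name__ == "__main__":
    # "sample.ibd" is a placeholder: use a representative slice of your own
    # data, since the compressibility of the sample dominates the results.
    data = open("sample.ibd", "rb").read()

    benchmark("zlib-6", lambda d: zlib.compress(d, 6), zlib.decompress, data)
    benchmark("lz4", lz4.frame.compress, lz4.frame.decompress, data)
```

Running this against a copy of a real table or data file is a quick sanity check of the relative numbers a benchmark suite reports.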

Compression Performance

There was little change in compression performance after the correction. LZ4 is still the fastest, though not as fast as we originally reported.

Compression Ratio

The compression ratio (uncompressed size divided by compressed size) is where our results changed substantially. We originally reported LZ4 achieving a ratio of only 1.89, by far the lowest among the compression engines we compared. After our correction, the ratio is 3.89: better than Snappy and on par with QuickLZ, while also delivering much better performance.

LZ4 is a superior engine in terms of the compression ratio achieved versus the CPU spent.

Compression vs Decompression

Comparing compression and decompression, LZ4 now shows the highest ratio between compression and decompression performance of the compression engines we looked at.
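If you want to measure that asymmetry yourself, a sketch along these lines (again assuming the third-party python-lz4 package and a placeholder input file) times both directions and prints how many times faster decompression runs than compression; the actual numbers depend heavily on the data.

```python
import time

import lz4.frame  # third-party: pip install lz4

data = open("sample.ibd", "rb").read()   # placeholder: use your own data
compressed = lz4.frame.compress(data)
repeat = 10

start = time.perf_counter()
for _ in range(repeat):
    lz4.frame.compress(data)
compress_secs = (time.perf_counter() - start) / repeat

start = time.perf_counter()
for _ in range(repeat):
    lz4.frame.decompress(compressed)
decompress_secs = (time.perf_counter() - start) / repeat

# Both speeds are expressed relative to the uncompressed size.
mb = len(data) / 1e6
print(f"compress:   {mb / compress_secs:8.1f} MB/s")
print(f"decompress: {mb / decompress_secs:8.1f} MB/s")
print(f"decompression/compression speed ratio: {compress_secs / decompress_secs:.1f}x")
```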

Compression Speed vs Block Size

Compression speed was not significantly affected by the LZ4 block size, which makes LZ4 great for compressing both large and small objects. The highest compression speed was achieved with a 64KB block size, which was neither the largest nor the smallest of the sizes we tested.

Compression Ratio vs Block Size

We saw some positive impact on the compression ratio from increasing the block size. However, increasing the block size beyond 64KB did not substantially improve the compression ratio, making 64KB an excellent block size for LZ4: it gives the best compression speed and a compression ratio that is about as good as it gets. A 64KB block size works great for other data as well, though we can't say how universal it is.
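As a rough way to reproduce this block-size effect on your own data, the sketch below (assuming python-lz4 and a placeholder file name; the sweep helper is ours, not part of any library) splits the input into independent fixed-size blocks, compresses each with lz4.block, and reports ratio and speed per block size.

```python
import time

import lz4.block  # third-party: pip install lz4


def sweep(data, block_sizes=(4096, 16384, 65536, 262144, 1048576)):
    """Compress data in independent fixed-size blocks; report ratio and speed."""
    for block_size in block_sizes:
        blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]

        start = time.perf_counter()
        compressed = [lz4.block.compress(block) for block in blocks]
        elapsed = time.perf_counter() - start

        # lz4.block prepends a small size header to each block by default,
        # which slightly penalizes the ratio at very small block sizes.
        ratio = len(data) / sum(len(c) for c in compressed)
        speed = len(data) / 1e6 / elapsed
        print(f"block={block_size // 1024:5d}KB  ratio={ratio:5.2f}  speed={speed:8.1f} MB/s")


# "sample.ibd" is a placeholder: substitute a representative chunk of your data.
sweep(open("sample.ibd", "rb").read())
```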

Updated Recommendations

Most of our recommendations still stand after reviewing the updated results, with one important change: if you're looking for a fast compression algorithm with decent compression, consider LZ4. It offers better performance as well as a better compression ratio than Snappy, at least on the data sets we tested.
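For completeness, here is what a round trip with LZ4 looks like in application code, using the third-party python-lz4 frame API; bindings with a similar surface exist for most languages.

```python
import lz4.frame  # third-party: pip install lz4

payload = b"example row data, reasonably repetitive " * 1000

compressed = lz4.frame.compress(payload)
restored = lz4.frame.decompress(compressed)

assert restored == payload
print(f"{len(payload)} bytes -> {len(compressed)} bytes "
      f"(ratio {len(payload) / len(compressed):.2f})")
```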



Published at DZone with permission of Peter Zaitsev, DZone MVB. See the original article here.

