Dynamic Compression Acceleration
On our benchmark machine, we managed to import the full StackOverflow dataset and reduce our overall benchmark time by almost 10%. How'd we do it?
After talking about what specifying LZ4 acceleration does, let's get down to business and talk about how we use it.
In our benchmark, we ran into a problem. Our benchmark hardware is too fast. I’m testing on a disk that can deliver 250,000 IOPS and can write over 1GB/second. On the other hand, if I’m running on Amazon, the maximum IOPS that I can pay for is 20,000. Azure seems to have a limit of 5,000 IOPS per disk — and standard SSD will give you about 75,000 IOPS, while HDD will typically have IOPS in the low hundreds. I’m using IOPS because it is an easy single metric to compare, but the same holds for disk bandwidth, write speed, etc.
That means that we are actually testing on hardware that is likely to be better than what we’ll typically run on, which means that we need to be careful about what kind of optimizations we bring in. It would be easy to optimize for a specific machine, at the cost of worse performance in the general case.
Case in point: in many cases, the cost of actually writing to disk on that particular machine is low enough that it isn’t worth putting a lot of effort into compressing the data. That isn’t the case all the time, though — for example, if we are applying pressure on the I/O system or if we have some other stuff going on that will impact our writes.
On that particular machine, however, it got to the point where we were seeing higher compression times than write times, which is pretty ridiculous. We aren’t actually saving anything by doing that. But instead of tailoring a solution to a single machine, we decided to go in a bit of a roundabout way. When we start, our initial assumption is that we are on a machine with higher I/O cost than CPU cost, which will cause us to put more effort into compressing the data to reduce the number of writes that we have to make.
On the other hand, after we start writing, we can start measuring the relative costs and adjust accordingly. In other words, based on how expensive it is to compress the data versus write the data to disk, we can dynamically adjust the cost of compression until we hit the sweet spot for the machine and environment that we are running on. The beauty of this behavior is that it will automatically adjust to pressures. So, if someone is running a backup on this disk and slowing us down, we’ll learn about it and increase the compression effort to compensate. Once the I/O pressure is off, we can relax our compression ratio to reduce the total resources consumed.
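The feedback loop described above can be sketched roughly as follows. This is a hypothetical illustration, not RavenDB's actual implementation: the class name, the acceleration bounds, and the doubling/halving policy are all assumptions. The one real convention it relies on is LZ4's, where a higher acceleration factor means faster but weaker compression.

```python
# Hypothetical sketch of the dynamic compression feedback loop.
# Names and thresholds are illustrative, not RavenDB's actual code.

class AdaptiveAcceleration:
    """Tune the LZ4 acceleration factor from the observed ratio of
    compression time to write time for each batch of writes.

    LZ4 convention: higher acceleration = faster, weaker compression.
    """

    MIN_ACCELERATION = 1    # maximum compression effort
    MAX_ACCELERATION = 64   # near-memcpy speed, minimal compression

    def __init__(self):
        # Initial assumption: I/O is more expensive than CPU, so put
        # maximum effort into shrinking the data we have to write.
        self.acceleration = self.MIN_ACCELERATION

    def record(self, compression_secs, write_secs):
        """Called after each batch with the measured costs."""
        if compression_secs > write_secs:
            # CPU is the bottleneck: back off, compress less aggressively.
            self.acceleration = min(self.acceleration * 2,
                                    self.MAX_ACCELERATION)
        elif write_secs > compression_secs * 2:
            # I/O is the bottleneck (e.g. a backup is hammering the
            # disk): spend more CPU to shrink the writes.
            self.acceleration = max(self.acceleration // 2,
                                    self.MIN_ACCELERATION)
        # Otherwise we are near the sweet spot; leave it alone.


# A fast local disk makes compression the dominant cost, so the
# tuner relaxes compression effort; when writes slow down, it
# tightens it again.
tuner = AdaptiveAcceleration()
tuner.record(compression_secs=0.030, write_secs=0.010)
print(tuner.acceleration)  # effort reduced: acceleration doubled to 2
tuner.record(compression_secs=0.001, write_secs=0.050)
print(tuner.acceleration)  # I/O pressure: back to maximum effort, 1
```

The key design point, as described in the post, is that no threshold is tuned for one machine: the controller converges on whatever balance the current hardware and I/O load dictate, and re-converges when conditions change.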
On our benchmark machine, we managed to import the full StackOverflow dataset — just over 18 million documents totaling over 50 GB — in under 8 minutes, reducing our overall benchmark time by almost 10%.
Published at DZone with permission of Oren Eini, CEO of RavenDB, DZone MVB. See the original article here.