
Voron Internals: The Diff Is the Way


Read on to see how the developers behind Voron used diffs to reduce their journal size and increase performance.


I talked about the goals of using diffs for the journals in a previous post, but in this one, I want to talk about what it actually did. To start with, it turns out that using diffs in the journal invalidates quite a few optimizations that we had. We had a whole bunch of tricks to avoid writing data to the data file if there were additional modifications coming to it. When using diffs, we can't do that, because the next diff builds on the previous version. That actually ended up improving the code quality, because a bunch of pretty tricky code had to go away.
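To make that chaining concrete, here is a minimal sketch (in Python, purely illustrative; none of these names come from Voron) of why an intermediate page version can no longer be skipped once the journal stores diffs:

    def apply_diff(page: bytearray, diff):
        """Apply a list of (offset, new_bytes) patches to a page in place."""
        for offset, data in diff:
            page[offset:offset + len(data)] = data

    # Transaction 1 turns version v0 of a page into v1, and transaction 2
    # turns v1 into v2. The diff for transaction 2 was computed against v1,
    # so v1 must actually be materialized in the data file before that diff
    # can be applied. The old optimization -- skipping the write because a
    # later modification will overwrite the page anyway -- would leave the
    # second diff with the wrong base.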

It was tricky code because we tried to reduce the data file I/O, but only in the cases where that was allowed, which wasn't always. Now we always write the latest version of all the modified pages, which might add some additional I/O in some cases, but in practice, this didn't show up in our benchmarks. And the reduction in complexity might have been worth it even if it had.

We actually have two diff implementations: one that compares two versions of a page and finds the differences, and one that compares a page against zeros. That, in addition to collapsing runs of zeros, is the entire implementation. The output is actually not as small as it could be, because we insert some metadata into the diff stream to allow easier debugging and error detection. But the nice thing about the diff output is that we are still compressing it, so working extra hard to reduce the uncompressed data isn't that important if compression will cover it anyway.
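A minimal sketch of the two variants might look like the following (again Python, and illustrative only; it omits the debugging metadata mentioned above, and a production implementation would likely compare whole machine words at a time rather than single bytes):

    def compute_diff(old: bytes, new: bytes):
        """Yield (offset, changed_bytes) runs where `new` differs from `old`.

        Unchanged runs are collapsed simply by never being emitted, which
        is also what collapses runs of zeros in the against-zeros variant."""
        assert len(old) == len(new)
        i, n = 0, len(new)
        while i < n:
            if old[i] == new[i]:
                i += 1
                continue
            start = i
            while i < n and old[i] != new[i]:
                i += 1
            yield start, new[start:i]

    def compute_diff_from_zeros(new: bytes):
        """The second variant: diff a page against a page of zeros,
        e.g. for a freshly allocated page."""
        return compute_diff(bytes(len(new)), new)

Applying such a diff is just writing each run back at its offset, and the serialized runs are what goes to the compressor, which is why squeezing the last few bytes out of the uncompressed form isn't a priority.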

We tested journal writes before and after using diffs, and we found the following:

Journal size (inserts): 37% reduction in size
Journal size (updates): 80% reduction in size
Bulk insert speed: 5%–10% speed improvement *
Writes/sec (inserts): 12% improvement
Writes/sec (updates): 18% improvement

* If the sizes went down so significantly, why didn't the insert and update speeds improve by a comparable amount?

The answer is that, for the most part, while reducing the amount of data we are writing is awesome, the major limiting factor is the number of times we write to the journal, rather than the number of bytes. And we have to write once per transaction, so that is a limiting factor.
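As a rough illustration (with made-up numbers, for the sake of the example): if a single journal write costs on the order of 1 ms end to end, including waiting for the disk to acknowledge the flush, then a serial writer is capped near 1,000 transactions per second whether each write carries 4 KB or 400 bytes. Shrinking the diff reduces the bandwidth we consume, but it cannot reduce that fixed per-write cost.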

However, the current numbers we are looking at come from a benchmark showing roughly 30,000 write requests/second, and we are unable to get anything higher because we have saturated the local network. We are currently setting up a dedicated testing environment to see how far we can push this.



Published at DZone with permission of Ayende Rahien, DZone MVB.
