Over a million developers have joined DZone.

Write Optimizations in Voron

· Performance Zone

See Gartner’s latest research on the application performance monitoring landscape and how APM suites are becoming more and more critical to the business, brought to you in partnership with AppDynamics.

As I mentioned, there is one scenario that produces worse and worse performance for LMDB: random writes. The longer the data set, the more expensive the cost. This is due to the nature of LMDB as a Copy On Write database. Here is a simple explanation of the cost:

We want to update the value in the red square.


Because the tree is Copy On Write, we need to modify the following pages:


And as the tree goes deeper, it becomes worse. Naturally, one consequence of doing random writes is tree fragmentation. In my tests, on a small 1 million records dataset we would get a cost of over a hundred pages (roughly 4 MB) to update a hundred 16 + 100 bytes values. Just to clarify, those tests were done on the LMDB code, not on Voron. So I am pretty sure that this isn’t my problem. Indeed, I confirmed that this is a general problem with COW approach.

I accepted that, and the benefits that we get are significant enough to accept that. But… I never really was complete in my acceptance. I really wanted a better way.

And after adding Write Ahead Log to Voron, I started thinking. Is there a reason to continue this? We are already dealing with writing to the log. So why modify the entire tree?

Here is how it looks like:


Now, when you are looking at the tree, and you ask for the B page, you’ll get the one in the log. However, older transactions would get to see the B page in the data file, not the one in the log. So we maintain MVCC.

This complicates matters somewhat because we need to sync the data file flushing with existing transactions (we can’t flush the file until all transactions have progressed to the point where they look at the new pages in the log), but that is something that we have to do anyway. And that would cut down the amount of writes that we have to do by a factor of ten, according to my early estimation. Hell, this should improve write performance on sequential writes.

For that matter, look what happens when we need to split page D:


Here we need to touch the tree, but only need to touch the the immediate root. Which again, gives us a lot of benefits in the size we need to touch.

The Performance Zone is brought to you in partnership with AppDynamics.  See Gartner’s latest research on the application performance monitoring landscape and how APM suites are becoming more and more critical to the business.


Published at DZone with permission of Ayende Rahien, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

The best of DZone straight to your inbox.

Please provide a valid email address.

Thanks for subscribing!

Awesome! Check your inbox to verify your email so you can start receiving the latest in tech news and resources.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}