Over a million developers have joined DZone.

Reviewing Lightning Memory-Mapped Database Library: On Page Splits and Other Painful Things

· Database Zone

Build fast, scale big with MongoDB Atlas, a hosted service for the leading NoSQL database. Try it now! Brought to you in partnership with MongoDB.

image

I don’t know if you noticed,  but the LMDB codebase is giving me serious headache issues. The bloody thing is a very dense piece of code, but at the same time, it is also quite interesting. In particular, B-Trees are pretty much The Answer for a lot of on disk data structures, but they tend to be quite hard to handle properly. So I am looking forward to this piece of code, in which I am going to figure out if I can figure out this code. Just to note mdb_page_split is also another 400 lines method with goto sprinkled all over.

In order to do that, I wrote the following code (in a single transaction):

image

And then I spent another few minutes trying to get it to compile. Mostly because I started out with “for (int i=0; …” and C doesn’t allow that (blah!).

Anyway, I got the page to split, and now I want to see how it actually behaves. I am currently at the first part where we take the root (currently leaf) page and turn it into a branch. This is an interesting case of a root split.

We start with:

image

And the first thing we do is to go to this mode:

image

We add an empty implicit pointer to the previous root. But this is really just the beginning, now we need to divide the entries that used to be in the root between the left & right sides. This is made complex by the fact that you might have it setup so it has a few large & small values, so just cutting them in the middle would produce a result that would be too big. At least, that is what a comment says. I am not sure how that can be. We are going from one page to two pages, so I don’t see how you can get into a situation where that would happen. I guess it is time to slot into more code.

Okay, I got it, the issue is that we do a page split because we need to add a new item. So that is the reason we have to jump through those hops. Because we add a new item (that is already too big for the original page, since that is why we are actually splitting that).

Another complex issue with page splits is that they might be recursive. If we need to split a page, we need to add an entry to the parent page, which might cause that page to split, etc…

An interesting observation on the sizes involved, a LMDB page has 4084 bytes available for storage. The data for the page number is 8 bytes long (it uses pointer size page number) and assuming keys that are 16 bytes keys in size (and including about 8 bytes for node header), we get about 128 keys per branch page. Even considering that B-Tree are specifically designed to make themselves efficient in the presence of memory hierarchies, it is  quite impressive.

Consider, assuming a full tree, if we hold just the root and the first level in memory, we would consume about 512kb. And that would give us just one seek to get any of ~2 million items. As an aside, one reason that I like reading this code is that for once, this is a B-Tree implementation that isn’t covered in locking code, which is how this is usually works in RDBMS.

Another aspect that is quite interesting is that this code really shows important aspects for working with relational databases. It is all about keeping things inside the same page, with spill over to overflow pages slowing things down because you need to do another seek. Or a really obvious explanation why page splits are such a bad thing, and a lot of other details that you learn when you go a bit deep into relational databases but (at least for me) have never been real before I started dealing with building databases myself.

Now it's easier than ever to get started with MongoDB, the database that allows startups and enterprises alike to rapidly build planet-scale apps. Introducing MongoDB Atlas, the official hosted service for the database on AWS. Try it now! Brought to you in partnership with MongoDB.

Topics:

Published at DZone with permission of Ayende Rahien, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

The best of DZone straight to your inbox.

SEE AN EXAMPLE
Please provide a valid email address.

Thanks for subscribing!

Awesome! Check your inbox to verify your email so you can start receiving the latest in tech news and resources.
Subscribe

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}