Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Reviewing Lightning Memory-Mapped Database Library: Partial

DZone's Guide to

Reviewing Lightning Memory-Mapped Database Library: Partial

· Big Data Zone
Free Resource

Effortlessly power IoT, predictive analytics, and machine learning applications with an elastic, resilient data infrastructure. Learn how with Mesosphere DC/OS.

Continuing in my quest to learn more, I decided to go over the LMDB codebase.

LMDB is:

LMDB is an ultra-fast, ultra-compact key-value data store developed by Symas for the OpenLDAP Project. It uses memory-mapped files, so it has the read performance of a pure in-memory database while still offering the persistence of standard disk-based databases.

It has some interesting feature set, and a really small codebase. I am anxious to see how they managed to do so much.

Interestingly, the data model for LMDB is quite different from the usual append-only / transaction log. Instead, it allows only a single concurrent writer, and modify the data in place. There also appears to be a lot of dire warnings regarding usage of long transactions, since they would result in increased file size, presumably because the db couldn’t find the pages the scavenge because they are locked by ongoing transactions.

One thing that I should note already, the code (I am currently about 1/3 of the way of lmdb.h file) is very well commented, and it explains a lot about what is going on. If the rest of the code is like that, this is going to be really nice to read. Okay, I take it back. The main files seems to be lmdb.h and mdb.c. The first one is 1,300 lines and the second is over 7,500 lines. Admittedly, this is impressive in the sense that pretty much everything is done there, and there are a lot of docs. But damn, I wish this was better organized. Right now my head feels like I need to pop up and take a breath.

I read ~2,000 lines of code so far, and I haven’t found anything that does something. It is all headers or comments or macros up until now. I skipped over to the code, and it is really hard to understand what is going on.

Take this code snippet:

image

It shows a lot of the things that make it hard to work with. effectively random naming convention (under_score, Pascal_naming, SHOUT_NAMING, etc), the goto trick is used WAY too much, lovely variable names such as n2. This is part of a method that goes on for something like 150 – 200 lines. And it include the following code:

image

Quick, can you tell me how many variables are declared here? And note that they are all local variables. I counted it twice, once getting to 17 and once getting 18.

That is just too much, I am not going to go any deeper. The leveldb codebase was easy to follow, it had structure. This codebase is just a code dump. It might be a really good codebase, for what it needs to do, but I literally can’t follow it.

Learn to design and build better data-rich applications with this free eBook from O’Reilly. Brought to you by Mesosphere DC/OS.

Topics:

Published at DZone with permission of Oren Eini, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

THE DZONE NEWSLETTER

Dev Resources & Solutions Straight to Your Inbox

Thanks for subscribing!

Awesome! Check your inbox to verify your email so you can start receiving the latest in tech news and resources.

X

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}