Reviewing FASTER (Part 1): Reading the Paper
Let's take a look at FASTER and the paper that was published about it to find out the implementation details beyond the abstract.
FASTER is a fast key-value store from Microsoft. Its paper was published a while ago, and now that the source is available, I thought that I would take a peek. You might say that I have a professional interest in such a project.
I decided to read the paper first before reading the code because I want to figure out what isn’t in the paper. The implementation details beyond the abstract are what I find most interesting, to be honest. But if I’m going to read the paper, I’m also going to add some comments on it. Please note that the comments are probably not going to make much sense unless you read the paper.
The epoch basics section explains how they use a global epoch counter along with a thread-local value and a shared table that marks which epoch is visible to each thread. They use this to provide synchronization points, I assume (I don’t know yet). This resonates very strongly with how LMDB’s global transaction table operates.
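To make sure I follow the idea, here is a minimal sketch of how I imagine the global counter and the per-thread table fit together. The names, the fixed-size thread table, and the "idle" sentinel are all my own assumptions, not FASTER’s actual code:

```cpp
#include <array>
#include <atomic>
#include <cstdint>
#include <limits>

// Sketch: a global epoch counter plus a shared table of per-thread
// epochs. An epoch E is "safe" once every active thread has observed
// an epoch greater than E, so nobody can still touch E-era state.
constexpr int kMaxThreads = 8;
constexpr uint64_t kIdle = std::numeric_limits<uint64_t>::max();

std::atomic<uint64_t> g_epoch{1};
std::array<std::atomic<uint64_t>, kMaxThreads> g_thread_epochs{};

void init() {
    for (auto& e : g_thread_epochs) e.store(kIdle);
}

// On entry, a thread copies the current global epoch into its slot.
void protect(int tid) {
    g_thread_epochs[tid].store(g_epoch.load(std::memory_order_acquire),
                               std::memory_order_release);
}

void unprotect(int tid) {
    g_thread_epochs[tid].store(kIdle, std::memory_order_release);
}

// Everything strictly below the minimum epoch any thread holds is safe.
uint64_t safe_epoch() {
    uint64_t min_seen = g_epoch.load(std::memory_order_acquire);
    for (auto& e : g_thread_epochs) {
        uint64_t v = e.load(std::memory_order_acquire);
        if (v != kIdle && v < min_seen) min_seen = v;
    }
    return min_seen - 1;
}
```

Note that a thread only ever writes to its own slot, which is presumably why this scales: there is no contended write, only the scan in `safe_epoch`.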
I like the concept of the drain list, which is executed whenever an epoch becomes safe. I would assume that they use that to let the clients know that their operation was accepted and what its state was.
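My mental model of the drain list, again as a guess rather than FASTER’s actual structure: actions are tagged with the epoch in which they were enqueued and run once that epoch becomes safe. I’m passing the safe epoch in from the caller here instead of computing it from the epoch table:

```cpp
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical sketch of a drain list: deferred actions tagged with
// the epoch they were enqueued in, executed once that epoch is safe.
struct DrainList {
    std::vector<std::pair<uint64_t, std::function<void()>>> pending;

    void defer(uint64_t epoch, std::function<void()> action) {
        pending.emplace_back(epoch, std::move(action));
    }

    // Run and drop every action whose epoch is now safe; keep the rest.
    void drain(uint64_t safe_epoch) {
        auto it = pending.begin();
        while (it != pending.end()) {
            if (it->first <= safe_epoch) {
                it->second();
                it = pending.erase(it);
            } else {
                ++it;
            }
        }
    }
};
```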
I wasn’t able to figure out what they use the tag for in the hash bucket entry. I think the way it works is that you have 2^k hash buckets, use k bits of the hash to find the appropriate bucket, and then scan the bucket for a match on a 15-bit tag taken from the remaining bits. I’m not really sure how that works with collisions, though. I assume that this will be clearer when I get to the code. I like the two-phase scan approach to ensure that you have no conflicts when updating an entry.
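Here is the bit split as I currently picture it; which bits go where is my guess, and the names are mine. The point of the tag would be to reject most non-matching keys in a bucket with a single comparison, without chasing the record pointer:

```cpp
#include <cstdint>

// Guess at the bucket/tag split: low bits of the 64-bit hash pick the
// bucket, and the top 15 bits are stored as a tag in each bucket entry.
constexpr uint64_t kNumBuckets = 1 << 20;  // must be a power of two
constexpr uint64_t kTagBits = 15;

uint64_t bucket_index(uint64_t hash) {
    return hash & (kNumBuckets - 1);       // low bits -> bucket
}

uint64_t tag(uint64_t hash) {
    return hash >> (64 - kTagBits);        // top 15 bits -> tag
}

// Two keys that collide on the bucket index usually still differ in
// tag, so a scan filters most mismatches cheaply. A tag match is only
// "maybe equal" -- the full key still has to be compared afterwards.
bool entry_may_match(uint64_t entry_tag, uint64_t key_hash) {
    return entry_tag == tag(key_hash);
}
```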
The paper keeps repeating the speed numbers of 160 million ops/sec and talking about 20 million ops/sec as being “merely.” Just based on my reading so far, I don’t see how this can happen. What is interesting to me is the definition of ops. Is it something like incrementing the same (small) set of counters? If that is the case, then the perf numbers both make more sense and are of less interest. Typically, when talking about ops/sec in such scenarios, we talk about inserts/updates to individual documents/rows/objects. Again, I don’t know yet, but that is my thinking so far.
One thing that I find sad is that this isn’t a durable store. A failure in the middle of operations would cause some data to be completely lost. It seems like they have support for checkpoints, so you don’t lose everything, but I’m not sure how often those happen, and the benchmarks they are talking about were run without them. Interestingly enough, the benchmarks were also run without garbage collection. I haven’t gotten to the discussion on that yet, so I’m not exactly sure what that means. Another missing feature here is that there is no support for atomicity. You cannot ensure that two separate operations will run as a single atomic step.
The benchmark machine has 28 cores, 256GB of RAM, and a 3.2TB NVMe drive. This is a really nice machine, but from the get-go, I can tell that this is not going to be a fair comparison to most other storage engines. FASTER is explicitly designed to work mostly in memory and with a high degree of parallelism. That is great, but it gives up some important features (atomic batches and durability, also known as transactions). The data sizes they tested are:
- 250 million records with 8-byte keys and 8-byte values — just under 4GB in total.
- 250 million records with 8-byte keys and 100-byte values — just under 32GB in total.
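A quick sanity check on those sizes (my own arithmetic, not from the paper): the raw key+value payload alone comes to 4GB and 27GB respectively, so the quoted totals presumably include per-record headers and hash index overhead on top of that.

```cpp
#include <cstdint>

// Raw payload only: 250M records at 8+8 bytes and at 8+100 bytes.
// Record headers and the hash index are not counted here.
constexpr uint64_t kRecords = 250'000'000ULL;

constexpr uint64_t kSmallPayload = kRecords * (8 + 8);    // 4.0e9 bytes,  ~4 GB
constexpr uint64_t kLargePayload = kRecords * (8 + 100);  // 2.7e10 bytes, ~27 GB
```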
I’m not sure why they think that this is going to provide a larger-than-memory setup. In particular, they mention a memory budget of 2GB, but I assume that this is just internal configuration. There is also going to be quite a lot of memory cached in the OS’ page cache, for example, and I don’t see anything controlling that. Maybe I will when I go through the code.
Okay, the garbage collection they refer to is related to how they compact the log. They use an interesting approach where they just discard it at some point, and any pointer to the removed section is considered to be deleted automatically. That is likely to be very efficient, assuming that you don’t care about older data.
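The mechanism, as I read it, can be sketched in a few lines; the names are mine and this ignores how the begin address is actually persisted and coordinated:

```cpp
#include <cstdint>

// Sketch of truncation-as-GC: the log keeps a begin address, and
// compaction just advances it. Any pointer that resolves below the
// begin address is treated as "not found", so discarding old data
// needs no per-record deletion work at all.
struct Log {
    uint64_t begin_address = 0;  // everything below this is discarded

    void truncate_to(uint64_t new_begin) {
        if (new_begin > begin_address) begin_address = new_begin;  // never moves back
    }

    // A lookup landing in the discarded region is simply a miss.
    bool is_live(uint64_t record_address) const {
        return record_address >= begin_address;
    }
};
```

That would explain why it is cheap: deletion is a single counter bump, at the cost of dropping everything older than the cutoff, matching the "assuming you don’t care about older data" caveat above.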
All in all, I feel that I have a vague understanding of how FASTER works and a much better grasp of what it does and how it is utilized. I’m looking forward to diving into the code.
Published at DZone with permission of Oren Eini, DZone MVB. See the original article here.