Reviewing FASTER: When the Data Hits the Disk
Let's take a look at a review of FASTER and explore what happens when the data hits the disk, paying attention to how its log-based system handles data.
So far, I have ignored anything in FASTER about how the data actually hits the disk. Based on my reading of the code and the paper, here is what I think is going on. FASTER works in segments and in conjunction with its allocator. When you create a new instance, you have to define the log size. From looking at the code, they seem to favor 16GB as the default size of the log. This is passed to PersistentMemoryMalloc, which uses pages of 32MB each to manage the memory. Out of these 16GB, 14.4GB are considered mutable and 1.6GB is considered read-only.
On startup, this class allocates two pages (64MB). Whenever FASTER needs more memory, such as to store a new record, it will eventually call this method:
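The original code listing isn't reproduced here, but based on the description, here is a minimal sketch of what that allocation path plausibly looks like. The names (`Reserve`, `num_slots`) come from the post; everything else is my approximation, not FASTER's actual code:

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

constexpr uint64_t kPageSize = 32ull * 1024 * 1024; // 32MB pages

struct PageOffset {
  std::atomic<uint64_t> tail{0};

  // Reserve `num_slots` bytes from the tail of the page (despite the name,
  // it is a byte count). Returns the offset within the page, or UINT64_MAX
  // if the page is full and the caller must call NewPage().
  uint64_t Reserve(uint64_t num_slots) {
    uint64_t offset = tail.fetch_add(num_slots, std::memory_order_acq_rel);
    if (offset + num_slots > kPageSize) {
      return UINT64_MAX; // out of space in this 32MB page
    }
    return offset;
  }
};
```

The single `fetch_add` is what makes allocation cheap: no lock, just one atomic instruction per reservation.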
Again, we have num_slots that actually means a size in bytes, but I'll leave my pet peeves aside.
You can see that this allocates from the tail of the page using Reserve, which does an atomic operation. If we run out of space in the 32MB page, the caller needs to call NewPage() to handle the new allocation. This plays together with the buffer management and the epochs. In particular, here is how a new page is allocated in the normal case. Assuming we just started and we have consumed 64MB of memory, the next entry will allocate the third page. Allocating a new page will also move the read-only boundary once the amount of allocated memory goes beyond the 14.4GB mutable budget.
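To make the boundary math concrete, here is a hedged sketch of how the read-only boundary could lag the tail by the mutable budget. The 16GB/14.4GB split comes from the post; the function name and exact arithmetic are my assumptions:

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t kLogSize    = 16ull * 1024 * 1024 * 1024; // 16GB log
constexpr uint64_t kMutableLag = kLogSize / 10 * 9;          // ~14.4GB mutable

// When a new page is allocated at `tail_address`, shift the read-only
// boundary so that at most the mutable budget behind the tail stays
// mutable; everything below the returned address becomes read-only.
uint64_t NewReadOnlyAddress(uint64_t tail_address) {
  if (tail_address < kMutableLag) {
    return 0; // log hasn't filled the mutable region yet
  }
  return tail_address - kMutableLag;
}
```

Until the log grows past 14.4GB, nothing is marked read-only; after that, the read-only region trails the tail by a fixed distance.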
What this means, in practice, is that FASTER typically has a 14.4GB range in which all operations are working on purely mutable memory. That means that two impressions on the same ad will end up being very close to simply Interlocked.Increment on the relevant value. This is the key to the performance that FASTER exhibits.
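Sketched in C++ rather than .NET terms, the "two impressions on the same ad" case reduces to a single atomic add on a record that lives in the mutable region. The record type here is hypothetical, purely for illustration:

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical record type: when it lives in the mutable region, an
// update-in-place on its counter is a single atomic add -- no copy,
// no allocation, no lock.
struct AdRecord {
  std::atomic<uint64_t> impressions{0};
};

void RecordImpression(AdRecord& rec) {
  rec.impressions.fetch_add(1, std::memory_order_relaxed);
}
```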
What happens once we start going beyond the 14.4GB? FASTER will begin to mark pages as read-only. Any new modification to data in the read-only section will create a new copy of it in the mutable section.
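The copy-on-write rule can be sketched as a simple address comparison; this is my reading of the behavior, not FASTER's actual code:

```cpp
#include <cassert>
#include <cstdint>

struct LogState {
  uint64_t read_only_address; // records below this address are immutable
  uint64_t tail_address;      // next free address in the mutable region
};

// Returns the address where the updated record ends up: in place if it is
// still in the mutable region, otherwise a fresh copy appended at the tail.
uint64_t Update(LogState& log, uint64_t record_address, uint64_t record_size) {
  if (record_address >= log.read_only_address) {
    return record_address; // mutable: update in place
  }
  uint64_t copy_address = log.tail_address; // read-only: copy to the tail
  log.tail_address += record_size;
  return copy_address;
}
```

The old copy in the read-only region is simply left behind; it will eventually be flushed to disk and then pushed out of memory.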
At the same time, PageAlignedShiftReadOnlyAddress() will also start an async process of writing these read-only pages to disk. Here is how it works:
If the read_only_address was updated, FASTER calls BumpCurrentEpoch() and will execute OnPagesMarkedReadOnly() when the epoch moves beyond this range (this works because it is then guaranteed that no one may mutate this memory). That method looks like this:
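The epoch mechanics deserve a small illustration. What follows is a toy, single-threaded sketch of the drain-list pattern; FASTER's real epoch protection is lock-free and tracks a per-thread epoch table, so treat this only as a conceptual model:

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// An action registered at epoch E runs only once every thread has advanced
// past E, which guarantees no thread can still be writing to the pages
// that are about to be flushed.
struct EpochManager {
  uint64_t current_epoch = 1;
  std::vector<std::pair<uint64_t, std::function<void()>>> drain_list;

  void BumpCurrentEpoch(std::function<void()> action) {
    drain_list.emplace_back(current_epoch, std::move(action));
    ++current_epoch;
  }

  // Called once all threads have observed epochs greater than `safe_epoch`.
  void AdvanceSafeEpoch(uint64_t safe_epoch) {
    auto it = drain_list.begin();
    while (it != drain_list.end()) {
      if (it->first <= safe_epoch) {
        it->second(); // safe to run, e.g. OnPagesMarkedReadOnly()
        it = drain_list.erase(it);
      } else {
        ++it;
      }
    }
  }
};
```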
The notion of read_only_address and safe_read_only_address is discussed quite nicely in the paper, by the way. AsyncFlushPages() writes the data to disk, as you would expect, and updates various in-memory structures.
Note that just having the data written to disk doesn't remove the in-memory copy. It remains available in memory until it is pushed out of the log by new data. Now that we understand how the data goes to the log and then to the disk, I want to figure out how the data is written to the disk itself. The actual writes are handled here:
What you can see is that the destination offset is used to divide the data on disk into segments, each 1GB in size. In other words, FASTER writes the data in 1GB segments that are sequential over time.
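The segment arithmetic is simple enough to sketch: a write at a given log offset lands in segment `offset / 1GB`, at `offset % 1GB` within that file. The struct and function names here are mine, for illustration only:

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t kSegmentSize = 1ull * 1024 * 1024 * 1024; // 1GB segments

// Which on-disk segment file a log offset maps to, and where inside it.
struct SegmentTarget {
  uint64_t segment;        // index of the 1GB file
  uint64_t offset_in_file; // byte offset inside that file
};

SegmentTarget Locate(uint64_t destination_offset) {
  return {destination_offset / kSegmentSize,
          destination_offset % kSegmentSize};
}
```

Because log offsets only grow, segment files are created strictly in order, which is what makes the time-based deletion policy below workable.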
This also plays into the expiration policy that FASTER employs. Because it uses a log-based system, it accumulates data over time and needs to handle that. The current way it deals with the problem is to simply delete old files; this gets rid of the data in roughly time-based order, which suits the use case the paper talks about. An alternative would be to read the old files and move the still-valid entries to the head of the log. That doesn't seem to be implemented, and I think it would be pretty hard to do and likely consume a lot of resources.
I'll probably have another summary post about FASTER, but that is pretty much it. I've ignored other parts (recovery, the several state machines used to handle internal state, etc.), but they aren't important to grokking what it is actually doing. It is an interesting codebase, but it feels… incomplete. However, I'll reserve such thoughts for the summary post.
Published at DZone with permission of Oren Eini, DZone MVB. See the original article here.