Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Reviewing FASTER: Reading Data From Disk

DZone's Guide to

Reviewing FASTER: Reading Data From Disk

Let's take a look at a review of FASTER and explore how to read the data from the disk itself.

· Database Zone ·
Free Resource

Download "Why Your MySQL Needs Redis" and discover how to extend your current MySQL or relational database to a Redis database.

One of the things that FASTER claims is that it is suitable for larger than memory datasets with its hybrid log approach. I briefly went over that code during my review, but I want to dedicate this post to discussing how FASTER work with the disk.

FASTER is using async I/O on Linux and Windows to do its I/O, which present its own challenges, in particular, how do you handle an operation that is going to need to hit the disk ( to read old data). Another thing that I would like to discover is how does it save the data to disk. We’ll start from the reading data part.

Reading in FASTER looks like this:

image

You pass a context, a callback, and serial number. I’m not sure what the serial is about, I think it is about checkpoints, but I'm not sure. You’ll be called with the results once the operation is executed.

Here is the core of the Read method:

image

FASTER first checks the in-memory details, and if it isn’t there, it is either not found or on disk. This is actually really interesting because it implies that FASTER keeps track of all of the data items purely in memory. Let’s go to InternalRead and figure out how that works. We already looked into most of that in the previous post. FindEntry is called to find the entry by it’s hash. FASTER keep all the hashes in memory, even while it is writing entries to disk. This way, it can easily answer if an entry exists or not. Note that as I read things, if FASTER has more than a certain number of values, hash collision rate is going to reach high percentage, requiring often expensive I/O to figure out whatever the value exists.

If the value is in memory, your ReadContext.GetAtomic() method will be called. Here is one such implementation:

image

Where the value was defined as:

image

If the value has already been moved to the read-only section, it will use the Get() method, instead, as an optimization. If the value is not on the mutable section or the read-only section, it is on the disk, in which case, we have this code:

image

The go_async() method just records the status of the operation when we started the async process; it doesn’t actually invoke anything async. That is done in the caller code, Read().

image

This, in turn, gets us to this piece of code:

image

Note that the code is full of handling of the current phase of the thread. I’m ignoring all of these for now; they aren’t important.

The next thing to look at is the async I/O request, which gets us to:

image

We first register the pending I/O, then actually start to process the async call. Except not really. AsyncGetFromDisk() isn’t actually doing I/O. Instead, it seems to be focused on limiting the number of concurrent I/O operations that are going on.

In this case, if there are more than 120 pending I/Os, it will call io_getevents() in Linux and GetQueuedCompletionStatus() and try to process any completed I/O immediately.

image

ProtectAndDrain is more interesting. It asks the epoch to complete any pending operations. Such actions are the creation of a new segment file or the moving of a segment from memory to disk.

I find it interesting that FASTER chose to stall the thread until all pending I/Os are completed. Given their model, it would make more sense to push such operation into the thread ops and resume process other requests. I guess that they expect this to be either a rare case or using this for throttling overall system load. You might also have noticed the num_records arguments, this is used in the hlog method:

image

That was confusing. I expected this to be the number of records (as in, how many records are read from disk), but this is the number of bytes to read.

The C++ memory model make async code a bit complex. In particular, if you’ll look at the first code snippet in this post, you’ll see that we pass a stack allocated struct to the Read() method. Because this method can complete in an async manner, what FASTER will do is to perform a deep copy of the data. Combine that with lambda’s capturing state, and I’m pretty sure that this code will cause a very subtle memory corruption in rare cases.

image

What I think will happen is that we capture by ref the stack variable and in 99% of the cases, we’ll run this in sync mode, meaning that there will be no issues. Only if the value needs to be read from disk will you get an issue. Because that function will already return but you still have the callback (and the captured reference now pointing to something completely different) still active. I think that a C++ developer would recognize this, and the fact that C++ requires you to be explicit about your captures makes this a lot easier to deal with.

It is important to note that there is no good way to actually handle the async nature of certain operations here. Any code that actually handles the operation needs to go in the callback.

Another aspect to remember is that just reading from the disk doesn’t mean that you got the right value. You might have gotten a hash collision:

image

In other words, if you have a way to generate hash collisions as soon as the value hits the disk, you are going to be faced with making N disk I/O requests to find if you have the value or not. Denial of service attacks against hash tables are well known and represent a significant risk to services.

Next on my list is seeing how FASTER is actually writing the data to disk, but that will be in a separate post.

Read "Developing Apps Using Active-Active Redis Enterprise" and discover the advantages over other active-actve databases.

Topics:
database ,faster ,reading data from a disk ,internalread

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}