So far, we have looked at naïve options for data storage and for building indexes, and we found them lacking: too much complexity, and performance costs too high to be practical.
In this post, I want to explore the Log Structured Merge (LSM) option. This is a pretty simple solution. Our file format remains simple: just a flat list of records, but with one small twist. For each collection of data (we can call it a table, an index, or whatever), all the records inside that file are sorted by some criteria.
In other words, here is our file again:
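To make that concrete, here is a minimal sketch of such a sorted file, modeled as an in-memory list (the record fields, keys, and the `lookup` helper are all illustrative, not a real on-disk format). Because the records are sorted, a binary search can find a key without scanning the whole file:

```python
import bisect

# One "file": a flat list of records, kept sorted by key (here, the username).
records = sorted([
    ("arava", {"email": "arava@example.com"}),
    ("kids", {"email": "kids@example.com"}),
    ("oren", {"email": "oren@example.com"}),
])

def lookup(file_records, key):
    """Binary search the sorted records for a key."""
    i = bisect.bisect_left(file_records, (key,))
    if i < len(file_records) and file_records[i][0] == key:
        return file_records[i][1]
    return None  # not in this file

print(lookup(records, "oren"))  # found via binary search, no full scan
```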
But what about updates? As we mentioned, adding a user with the username "baseball" will force us to move quite a lot of data. Well, the answer to that is that we are not going to modify the existing file. Indeed, in LSM, once a file has been written out, it can never be changed again. Instead, we are going to create a new file, with the new information.
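That write path can be sketched under the same illustrative in-memory model: each batch of updates becomes a brand-new sorted file, and the older files are never touched.

```python
# Sketch of never-modify-in-place: each write batch is flushed as a new
# sorted file. Names and structures here are illustrative only.
files = []  # oldest first; each element stands in for one immutable file

def write_batch(updates):
    """Flush a batch of updates as a brand-new sorted file."""
    files.append(sorted(updates.items()))

write_batch({"arava": 1, "oren": 2})
write_batch({"baseball": 3})  # no need to shift any data in the older file
```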
When we query, we'll search the files in descending order, so newer files are checked first. That allows us to see the updated information. Such a system also relies on tombstone markers to delete values, and it is even possible to run range searches by scanning multiple files (merge sorting on the fly). Of course, over time, the number of files you are using is going to increase, so any LSM solution also has a merge phase (it is right there in the name), where the data spread over many files is merged together.
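The read path described above might be sketched like this (purely illustrative: the `TOMBSTONE` marker, the file layout, and the overlay-style range scan all stand in for what a real implementation would do on disk):

```python
# Sketch: reading over immutable sorted files, newest first. A special
# TOMBSTONE marker records deletions.
TOMBSTONE = object()

files = [  # oldest first; each inner list is one sorted, immutable file
    [("arava", 1), ("oren", 2)],              # first file written
    [("arava", TOMBSTONE), ("baseball", 3)],  # later file: a delete + an insert
]

def get(key):
    # Newest file wins, so search in reverse write order.
    for f in reversed(files):
        for k, v in f:
            if k == key:
                return None if v is TOMBSTONE else v
    return None

def range_scan(lo, hi):
    # Conceptually a k-way merge sort over the files; for brevity we
    # overlay newest values over oldest, then emit in sorted order.
    merged = {}
    for f in files:  # oldest first, so later files overwrite earlier ones
        for k, v in f:
            merged[k] = v
    for k in sorted(merged):
        if lo <= k <= hi and merged[k] is not TOMBSTONE:
            yield (k, merged[k])
```

Note how the tombstone hides "arava" from both point lookups and range scans, even though the original record still physically exists in the older file; only a merge would actually reclaim that space.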
This leads to some interesting challenges. Scanning a file to see if a value is there can be expensive (seeks, again), so we will typically use something like a bloom filter to skip that if possible. Merging files is expensive (a lot of I/O), so we want to be sure that we aren't doing that too often, and yet not doing that means that we have to do a lot more operations, so there are a lot of heuristics involved.
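As an illustration of the skip-the-scan idea, here is a toy Bloom filter (the size, hash count, and SHA-256-based hashing are arbitrary choices, not tuned): a negative answer means the key is definitely not in the file, so the expensive seek can be skipped; a positive answer only means "maybe, go scan".

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter: sizes and hashing are illustrative, not tuned."""

    def __init__(self, size=1024, hashes=3):
        self.size = size
        self.hashes = hashes
        self.bits = bytearray(size)

    def _positions(self, key):
        # Derive several bit positions from the key deterministically.
        for i in range(self.hashes):
            h = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def add(self, key):
        for p in self._positions(key):
            self.bits[p] = 1

    def might_contain(self, key):
        # False => definitely absent; True => maybe present (scan the file).
        return all(self.bits[p] for p in self._positions(key))

bf = BloomFilter()
bf.add("oren")
print(bf.might_contain("oren"))      # True
print(bf.might_contain("baseball"))  # almost certainly False (tiny false-positive chance)
```

A real LSM engine keeps one such filter per file, so most point lookups touch only the one or two files that might actually hold the key.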
LSM can be a particularly good solution for certain kinds of data stores. Lucene, for example, is able to significantly optimize the way it works as a result of using LSM, clearing internal data structures during the merge operation. Other databases that use LSM include LevelDB, RocksDB, Cassandra, etc.
Personally, I don't like LSM solutions very much. In pretty much every such solution I have seen, the merge heuristics had an uncanny knack for scheduling expensive merges exactly when I didn't want any background work happening. And there is quite a bit of complexity involved in managing such a potentially large number of files. There is another issue as well: it is pretty hard to get physical separation of the data with LSM; you typically have to use a separate set of files for each collection, which doesn't help very much either.
A much more elegant solution in my view is the B+Tree, but I’ll keep that for the next post.