Reviewing Resin (Part 7)
We conclude our review of the Resin NoSQL database by analyzing I/O-related issues and considering how we may have been unfair in some of our reviews.
Looking back at this series, I have the strong feeling that I’m being unfair to Resin. I’m judging it using the same criteria I would use to judge our own highly optimized production code. The projects have very different goals, maturities, and environments. That said, I think that a lot of the comments I have on the project are at the implementation level. That is, they can be fixed (except maybe the analyzer/tokenizer pipeline) by simply optimizing one method at a time. Even the architectural change to how the text is analyzed isn’t very big. What is important is that the code is quite clear, is easy to follow, and has a well-defined structure. That means that it is actually possible to make these changes as the project matures.
And now that this is out of the way, let me cover some of the things that I would have done differently in the codebase. A lot of them are I/O-related. The way all those different files are used is decidedly not optimal: in particular, files are constantly opened and closed, with reads and seeks all over the place. The actual design seems to be based around LSM (log-structured merge), even if this isn't stated explicitly. That already has pretty good semantics for writes, but reads are currently probably leaning very heavily on the file system cache, and that won’t work once the data grows beyond what the cache can hold.
When running on a Unix system, you also need to consider the fact that there is a limit on the number of files you can have open, so a smaller number of files is generally preferred. I would go with merging all those files into a single large one, similar to the compound format that Lucene uses.
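To make the single-file idea concrete, here is a minimal sketch (in Python for brevity). The layout is an assumption made up for illustration: it is neither Resin's format nor Lucene's actual compound format, and the names `pack_compound`, `read_directory`, and `read_member` are hypothetical. Member payloads are written back to back, followed by a directory and a fixed-size footer that points at it.

```python
import os
import struct
import tempfile

# Hypothetical layout (not Resin's or Lucene's real format):
#   [payload of every member, back to back]
#   [directory: per member -> name length, name, offset, size]
#   [footer: directory offset (8 bytes), member count (4 bytes)]
FOOTER = struct.Struct("<QI")
ENTRY_TAIL = struct.Struct("<QQ")  # offset, size


def pack_compound(paths, out_path):
    """Concatenate several files into one and append a directory."""
    entries = []
    with open(out_path, "wb") as out:
        for path in paths:
            with open(path, "rb") as src:
                data = src.read()
            entries.append((os.path.basename(path), out.tell(), len(data)))
            out.write(data)
        dir_offset = out.tell()
        for name, offset, size in entries:
            raw = name.encode("utf-8")
            out.write(struct.pack("<H", len(raw)) + raw)
            out.write(ENTRY_TAIL.pack(offset, size))
        out.write(FOOTER.pack(dir_offset, len(entries)))


def read_directory(f):
    """Return {name: (offset, size)} from an open compound file."""
    f.seek(-FOOTER.size, os.SEEK_END)
    dir_offset, count = FOOTER.unpack(f.read(FOOTER.size))
    f.seek(dir_offset)
    directory = {}
    for _ in range(count):
        (name_len,) = struct.unpack("<H", f.read(2))
        name = f.read(name_len).decode("utf-8")
        directory[name] = ENTRY_TAIL.unpack(f.read(ENTRY_TAIL.size))
    return directory


def read_member(f, directory, name):
    """Read one logical file out of the compound file."""
    offset, size = directory[name]
    f.seek(offset)
    return f.read(size)
```

With this shape, a reader holds one file handle for the whole index and resolves each logical file to an (offset, size) pair, instead of opening and closing many small files.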
Once that is done, I would also memory map the entire file and use direct memory accesses to handle all I/O. This has several very important advantages. First, I’m being a lot more explicit about using the file system cache, and that allows us to avoid a lot of system calls. Second, the data is already mostly structured as arrays, so accessing it through mapped memory would be very natural. This also avoids the need to manually buffer things in our own memory, which is always nice.
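As a rough illustration of what array access over a memory-mapped file looks like, here is a small Python sketch. The file layout (a 4-byte count followed by little-endian 32-bit integers) and the names `write_int_array` and `MappedIntArray` are assumptions for the example, not anything taken from Resin.

```python
import mmap
import os
import struct
import tempfile


def write_int_array(path, values):
    """Write a count header followed by little-endian 32-bit ints."""
    with open(path, "wb") as f:
        f.write(struct.pack("<I", len(values)))
        f.write(struct.pack(f"<{len(values)}i", *values))


class MappedIntArray:
    """Read-only view over the file: no read() calls, no private buffer."""

    def __init__(self, path):
        self._file = open(path, "rb")
        # Length 0 maps the whole file.
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)
        (self._count,) = struct.unpack_from("<I", self._map, 0)

    def __len__(self):
        return self._count

    def __getitem__(self, i):
        if not 0 <= i < self._count:
            raise IndexError(i)
        # Decodes directly from the mapped pages; the OS pages data in on demand.
        (value,) = struct.unpack_from("<i", self._map, 4 + 4 * i)
        return value

    def close(self):
        self._map.close()
        self._file.close()
```

Element lookups become offset arithmetic over the mapping, and caching is left to the operating system rather than to a hand-rolled buffer.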
Next, there is the need to consider consistency checks. Resin, as it stands now (I’m not sure if this is an explicit design decision), takes the position that it is not its job to ensure file consistency. Lucene makes some attempt to ensure consistency and usually fails at that horribly at the most inconvenient moments. Adding a hash to the file would make it possible to verify that the data is okay, but it means having to read the entire file when you open it, which is probably too expensive.
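The trade-off is easy to see in a sketch. The Python below (used for brevity) appends a CRC32 trailer to a file and verifies it; CRC32, the trailer placement, and the function names are assumptions chosen for the example, not a description of what Resin or Lucene do. Note that `verify` has to walk every byte of the file, which is exactly the cost described above.

```python
import os
import struct
import tempfile
import zlib

TRAILER = struct.Struct("<I")  # CRC32 of everything before the trailer


def write_with_checksum(path, payload):
    """Write the payload followed by its CRC32."""
    with open(path, "wb") as f:
        f.write(payload)
        f.write(TRAILER.pack(zlib.crc32(payload)))


def verify(path, chunk_size=1 << 20):
    """Return True if the stored CRC32 matches. Reads the whole file."""
    size = os.path.getsize(path)
    if size < TRAILER.size:
        return False
    remaining = size - TRAILER.size
    crc = 0
    with open(path, "rb") as f:
        while remaining:
            chunk = f.read(min(chunk_size, remaining))
            if not chunk:
                return False
            crc = zlib.crc32(chunk, crc)  # running checksum across chunks
            remaining -= len(chunk)
        (stored,) = TRAILER.unpack(f.read(TRAILER.size))
    return crc == stored
```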
The other aspect that needs attention is the data structure used. In particular, LcrsTrie is a good way to save space and might work well for in-memory usage, but it isn’t a good choice for persistent data structures. B+Tree or SST are the common choices and need to be evaluated for the job.
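For readers unfamiliar with the SST (sorted string table) option, here is a deliberately simplified Python sketch of the idea: records are written once in key order, and lookups binary-search an index of keys. The record layout and the function names are hypothetical, and real implementations typically store a sparse index inside the file rather than rebuilding a full one by scanning, so treat this as an outline of the concept only.

```python
import bisect
import os
import struct
import tempfile

RECORD_HEADER = struct.Struct("<HI")  # key length, value length


def write_sst(path, items):
    """Write key/value pairs in sorted key order, each length-prefixed."""
    with open(path, "wb") as f:
        for key, value in sorted(items.items()):
            k = key.encode("utf-8")
            f.write(RECORD_HEADER.pack(len(k), len(value)))
            f.write(k)
            f.write(value)


def load_index(path):
    """Scan once; return sorted keys and the matching value locations."""
    keys, locations = [], []
    with open(path, "rb") as f:
        while True:
            header = f.read(RECORD_HEADER.size)
            if len(header) < RECORD_HEADER.size:
                break
            klen, vlen = RECORD_HEADER.unpack(header)
            keys.append(f.read(klen).decode("utf-8"))
            locations.append((f.tell(), vlen))
            f.seek(vlen, os.SEEK_CUR)  # skip the value bytes
    return keys, locations


def lookup(f, keys, locations, key):
    """Binary-search the index, then read the value with a single seek."""
    i = bisect.bisect_left(keys, key)
    if i == len(keys) or keys[i] != key:
        return None
    offset, size = locations[i]
    f.seek(offset)
    return f.read(size)
```

Because the file is immutable and sorted, a lookup costs one binary search plus one seek, and range scans are sequential reads.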
As part of this, and this is quite important, I would recommend taking a look at the full I/O picture. That means:
- How do you write to disk?
- How do you update data?
- Do you have write amplification (merges)?
- Are you trying for consistency/ACID?
- Can you explain how your data is persisted using algorithms/approaches that are well-known and trusted?
The last point is good both for external users (who can then reason about your system) and for yourself. If you are using LSM, you know that you have a set of problems (compactions, write amplification) and solutions (auto optimize over time, etc.) that are well-known, and you can make use of that. If you are using B+Trees, then the problem and solution space is different, but there is even more information about them.
If you are aiming for consistency, are you using a WAL (write-ahead log) or append-only files? What are your consistency guarantees, etc.?
Those are all questions that need answers, and they have an impact on the design of the project as a whole.
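To give a sense of what answering the append-only question involves, here is a minimal Python sketch of an append-only log with crash recovery. Everything in it is an assumption for illustration (the record format, the use of CRC32, the names `append_record` and `recover`); it is not how Resin persists data. Each record carries its length and checksum, every append is fsynced, and recovery keeps the intact prefix while truncating a torn or corrupt tail.

```python
import os
import struct
import tempfile
import zlib

HEADER = struct.Struct("<II")  # payload length, CRC32 of payload


def append_record(f, payload):
    """Append one record and force it to disk before returning."""
    f.write(HEADER.pack(len(payload), zlib.crc32(payload)))
    f.write(payload)
    f.flush()
    os.fsync(f.fileno())


def recover(path):
    """Return all intact records and truncate anything after them."""
    records = []
    good_end = 0
    with open(path, "r+b") as f:
        while True:
            header = f.read(HEADER.size)
            if len(header) < HEADER.size:
                break  # clean end of log, or a torn header
            length, crc = HEADER.unpack(header)
            payload = f.read(length)
            if len(payload) < length or zlib.crc32(payload) != crc:
                break  # torn or corrupt record: stop here
            records.append(payload)
            good_end = f.tell()
        f.truncate(good_end)
    return records
```

The guarantee this gives is narrow but easy to state: a record is durable once `append_record` returns, and after a crash the log contains a prefix of the records that were appended.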
And with this, this series is over. I have to say that I didn’t think that I would have so much to write about. It is a very interesting project.
Published at DZone with permission of Oren Eini, DZone MVB. See the original article here.