It appears that I misread the code in my previous post. In particular, I looked at the commit log but missed the most recent changes to how HyperLevelDB does its writes. Robert Escriva was kind enough to point me in the right direction.
The way that this works is a lot more elegant, I think.
When you want to make a write to a file, you ask for a segment at a particular offset. If we have that offset already mapped, we give it to the caller. Otherwise, we increase the file size if needed, then map the next segment. That part is done under a lock, so there isn’t an issue of contention over the end of the file. That is much nicer than the pwrite method.
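To make the flow concrete, here is a minimal sketch of the "ask for a segment at an offset" idea. It is not HyperLevelDB's actual code (which is C++). The names `SegmentedFile`, `get_segment`, `write_at` and the `SEGMENT_SIZE` value are all made up for illustration, and I am assuming a write never crosses a segment boundary.

```python
import mmap
import os
import tempfile
import threading

# Hypothetical segment size; it must be a multiple of the mmap granularity.
SEGMENT_SIZE = mmap.ALLOCATIONGRANULARITY * 16


class SegmentedFile:
    """Toy model: hand out the memory-mapped segment that covers an offset."""

    def __init__(self, path):
        self._fd = os.open(path, os.O_RDWR | os.O_CREAT)
        self._lock = threading.Lock()
        self._segments = {}  # segment index -> mmap object

    def get_segment(self, offset):
        index = offset // SEGMENT_SIZE
        # Growing the file and mapping a new segment happen under the lock,
        # so writers never race over the end of the file.
        with self._lock:
            segment = self._segments.get(index)
            if segment is None:
                end = (index + 1) * SEGMENT_SIZE
                if os.fstat(self._fd).st_size < end:
                    os.ftruncate(self._fd, end)  # increase the file size if needed
                segment = mmap.mmap(self._fd, SEGMENT_SIZE,
                                    offset=index * SEGMENT_SIZE)
                self._segments[index] = segment
            return segment

    def write_at(self, offset, data):
        start = offset % SEGMENT_SIZE
        if start + len(data) > SEGMENT_SIZE:
            raise ValueError("sketch assumes a write stays inside one segment")
        # The copy itself runs outside the lock, so writers can proceed in parallel.
        self.get_segment(offset)[start:start + len(data)] = data

    def close(self):
        for segment in self._segments.values():
            segment.flush()
            segment.close()
        os.close(self._fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as directory:
        log = SegmentedFile(os.path.join(directory, "example.log"))
        log.write_at(0, b"first")
        log.write_at(SEGMENT_SIZE + 8, b"second")  # maps a second segment
        log.close()
```

Only the bookkeeping (the size check, the file growth, the mapping) is serialized here. Copying the bytes into the mapped memory happens after the lock is released, which is where the concurrency comes from.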
That said, I was still not sure about the issue with the two concurrent transactions. What actually happens here is that while we gain concurrency in the I/O, there is still some serialization going on. In other words, even if transaction B is flushed to disk before transaction A, it still waits for transaction A to complete, which alleviates the concern I had about it.
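The ordering described above can be modeled with a small sketch. This is only my illustration of the behavior, not how HyperLevelDB implements it: `OrderedCommitter`, `begin`, `complete` and `run_demo` are hypothetical names, and the "flush" is simulated with a list append rather than real I/O.

```python
import threading


class OrderedCommitter:
    """Toy model: I/O may finish in any order, completion is in sequence order."""

    def __init__(self):
        self._cond = threading.Condition()
        self._next_seq = 0          # next sequence number to hand out
        self._next_to_complete = 0  # lowest sequence number not yet completed
        self.completed = []         # names, in the order they were completed

    def begin(self):
        with self._cond:
            seq = self._next_seq
            self._next_seq += 1
            return seq

    def complete(self, seq, name):
        with self._cond:
            # Block until every earlier transaction has completed.
            self._cond.wait_for(lambda: self._next_to_complete == seq)
            self.completed.append(name)
            self._next_to_complete += 1
            self._cond.notify_all()


def run_demo():
    committer = OrderedCommitter()
    flushed = []                    # names, in the order their "I/O" finished
    b_flushed = threading.Event()

    a = committer.begin()           # transaction A starts first
    b = committer.begin()           # transaction B starts second

    def transaction_a():
        b_flushed.wait()            # force A's simulated flush to finish after B's
        flushed.append("A")
        committer.complete(a, "A")

    def transaction_b():
        flushed.append("B")         # B's simulated flush finishes first
        b_flushed.set()
        committer.complete(b, "B")  # ...but B still waits here for A

    threads = [threading.Thread(target=transaction_b),
               threading.Thread(target=transaction_a)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return flushed, committer.completed


if __name__ == "__main__":
    flushed_order, completed_order = run_demo()
    print("flushed:", flushed_order)
    print("completed:", completed_order)
```

In this model B's data reaches "disk" first, yet B is not reported as complete until A is, so a caller never observes B as done while A is still pending.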