Reviewing LevelDB: Part IX, Compaction is the New Black
after going over the VersionSet, i understand how leveldb decides when to compact and what it decides to compact. more or less, at least.
this means that i can now mostly figure out what this does:
a user may force manual compaction of a range, or we may have reasons of our own to decide to compact, based on leveldb heuristics. either way, we get the compaction object, which tells us what files we need to merge.
there is a check there for whether we can do a trivial compaction, that is, just move the file from the current level to level+1. the interesting thing is that we avoid doing that if this is going to cause issues in level+2 (requiring a more expensive compaction later on).
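to make the trivial move rule concrete, here is a minimal sketch of how i read it. the types `FileMeta` and `CompactionSketch`, and the `max_grandparent_overlap_bytes` parameter, are names i made up for illustration; they are not the leveldb types, and the real threshold comes from leveldb's own configuration.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a table file's metadata; only the size matters here.
struct FileMeta {
  uint64_t file_size;
};

// Hypothetical stand-in for the compaction object described in the post.
struct CompactionSketch {
  std::vector<FileMeta> level_inputs;       // files picked from `level`
  std::vector<FileMeta> next_level_inputs;  // overlapping files in `level + 1`
  std::vector<FileMeta> grandparents;       // overlapping files in `level + 2`
};

// Sum of the sizes of all files in the list.
uint64_t TotalFileSize(const std::vector<FileMeta>& files) {
  uint64_t sum = 0;
  for (const FileMeta& f : files) sum += f.file_size;
  return sum;
}

// A move is "trivial" when a single file overlaps nothing in level + 1 and
// would not overlap too much data in level + 2 once it lands in level + 1.
// The byte limit is passed in because it is an assumption of this sketch.
bool IsTrivialMove(const CompactionSketch& c,
                   uint64_t max_grandparent_overlap_bytes) {
  return c.level_inputs.size() == 1 &&
         c.next_level_inputs.empty() &&
         TotalFileSize(c.grandparents) <= max_grandparent_overlap_bytes;
}
```

the point of the third condition is that a file moved as-is into level+1 still has to be merged with everything it overlaps in level+2 some day, so a cheap move now can just defer a large cost.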
but the interesting work is done in DoCompactionWork, where we actually do compaction of complex data.
this is the first time that i see the code refer to snapshots. older values are only discarded when no active snapshot can still see them, so data doesn't “go away” for users while they are holding a snapshot active.
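here is a small sketch of that snapshot rule as i understand it. it assumes the entries arrive sorted by key, with the newest write for each key first, and that `smallest_snapshot` is the sequence number of the oldest snapshot still held (or the latest sequence number when there are no snapshots). `Entry` and `DropShadowed` are hypothetical names, and deletion markers are left out entirely to keep it short.

```cpp
#include <cstdint>
#include <limits>
#include <string>
#include <vector>

// Hypothetical entry type: a user key plus the sequence number of the write.
struct Entry {
  std::string key;
  uint64_t sequence;
  std::string value;
};

// Input must be sorted by key ascending, then sequence descending.
// An entry is dropped only when a newer entry for the same key is already
// visible to the oldest live snapshot, so no reader can still need it.
std::vector<Entry> DropShadowed(const std::vector<Entry>& merged,
                                uint64_t smallest_snapshot) {
  const uint64_t kMaxSequence = std::numeric_limits<uint64_t>::max();
  std::vector<Entry> kept;
  bool has_current_key = false;
  std::string current_key;
  uint64_t last_sequence_for_key = kMaxSequence;

  for (const Entry& e : merged) {
    if (!has_current_key || e.key != current_key) {
      // First (newest) occurrence of this key: nothing shadows it yet.
      current_key = e.key;
      has_current_key = true;
      last_sequence_for_key = kMaxSequence;
    }
    // The previous entry for this key is newer than `e`. If even the oldest
    // snapshot can see that newer entry, `e` is invisible to everyone.
    const bool drop = last_sequence_for_key <= smallest_snapshot;
    last_sequence_for_key = e.sequence;
    if (!drop) kept.push_back(e);
  }
  return kept;
}
```

for example, with writes to the same key at sequences 9, 6 and 3 and a snapshot held at 7, the write at 6 has to stay because the snapshot reader still sees it, while the write at 3 can go.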
the actual work starts in here:
this gives us the ability to iterate over all of the entries in all of the files that need compaction.
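the idea of one iterator over many sorted files can be sketched as a k-way merge. this is my own illustration, not the leveldb code: `Entry`, `Before` and `MergeRuns` are made-up names, each input file is modeled as an in-memory sorted vector, and the heap is just one common way to do the merge.

```cpp
#include <cstddef>
#include <cstdint>
#include <queue>
#include <string>
#include <vector>

// Hypothetical entry type: a user key plus the sequence number of the write.
struct Entry {
  std::string key;
  uint64_t sequence;
  std::string value;
};

// Key ascending, then sequence descending, so the newest write for a key
// comes first.
bool Before(const Entry& a, const Entry& b) {
  if (a.key != b.key) return a.key < b.key;
  return a.sequence > b.sequence;
}

// Merge several individually sorted runs (one per input file) into one
// sorted stream, using a heap of cursors into each run.
std::vector<Entry> MergeRuns(const std::vector<std::vector<Entry>>& runs) {
  struct Cursor {
    std::size_t run;  // which input run
    std::size_t pos;  // position inside that run
  };
  auto later = [&runs](const Cursor& a, const Cursor& b) {
    // priority_queue keeps the "largest" element on top, so invert the
    // comparison to get the smallest entry first.
    return Before(runs[b.run][b.pos], runs[a.run][a.pos]);
  };
  std::priority_queue<Cursor, std::vector<Cursor>, decltype(later)> heap(later);

  for (std::size_t r = 0; r < runs.size(); ++r) {
    if (!runs[r].empty()) heap.push(Cursor{r, 0});
  }

  std::vector<Entry> merged;
  while (!heap.empty()) {
    const Cursor c = heap.top();
    heap.pop();
    merged.push_back(runs[c.run][c.pos]);
    // Advance the cursor of the run we just consumed from.
    if (c.pos + 1 < runs[c.run].size()) heap.push(Cursor{c.run, c.pos + 1});
  }
  return merged;
}
```

the useful property is that every version of a given key ends up next to each other in the output, newest first, no matter which file it came from.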
and then we go over it:
but note that on each iteration we check whether we need to call CompactMemTable(); i followed the code there, and we finally write stuff to disk! i am quite excited about this, but i'll handle that in a future blog post. i want to see how actual compaction works.
we then have this:
this is there to stop a compaction that would force a very expensive compaction next time, i think.
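my reading of that check, as a sketch: while writing keys in order, keep track of how many bytes of level+2 files the current output file has already passed over, and cut the output file once that grows past a limit. `OutputSplitter`, `GrandparentFile` and `max_overlap_bytes` are hypothetical names for this illustration, and plain string comparison stands in for the real key comparator.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical summary of one level + 2 file: its largest key and its size.
struct GrandparentFile {
  std::string largest_key;
  uint64_t file_size;
};

// Tracks how much level + 2 data the current output file overlaps.
class OutputSplitter {
 public:
  // `grandparents` must be sorted by largest_key.
  OutputSplitter(std::vector<GrandparentFile> grandparents,
                 uint64_t max_overlap_bytes)
      : grandparents_(std::move(grandparents)),
        max_overlap_bytes_(max_overlap_bytes) {}

  // Call once per key, in sorted order, before writing the key. Returns true
  // when the current output file should be closed and a new one started.
  bool ShouldStopBefore(const std::string& key) {
    // Skip every grandparent file that ends before this key. Each skipped
    // file is one the current output has fully passed over.
    while (index_ < grandparents_.size() &&
           key > grandparents_[index_].largest_key) {
      if (seen_key_) overlapped_bytes_ += grandparents_[index_].file_size;
      ++index_;
    }
    seen_key_ = true;

    if (overlapped_bytes_ > max_overlap_bytes_) {
      overlapped_bytes_ = 0;  // the next output file starts counting afresh
      return true;
    }
    return false;
  }

 private:
  std::vector<GrandparentFile> grandparents_;
  uint64_t max_overlap_bytes_;
  std::size_t index_ = 0;
  uint64_t overlapped_bytes_ = 0;
  bool seen_key_ = false;
};
```

the effect is that no single output file in level+1 ends up spanning a huge slice of level+2, which is what would make its own future compaction expensive.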
as a side note, this really drives me crazy:
note that we have current_output() and FileSize() in here. i don't really care what naming convention you use, but i would really rather that you had one. if there is one for the leveldb code base, i can't say that i figured it out. it seems like mostly it is PascalCase, but every now and then we have this_style methods.
back to the code, it took me a while to figure it out.
the iterator will return values in sorted key order, which means that if you have the same key in multiple levels, we need to ignore the older values. after that is done, we now have this:
this is where we are actually writing stuff out to the sst file! this is quite exciting :-). i have been trying to figure that out for a while now.
the rest of the code in the function is mostly stuff related to stats bookkeeping, but this looks important:
this generates the actual VersionEdit, which will remove all of the files that were compacted and add the new file that was created as a new version to the VersionSet.
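as a rough model of what such an edit amounts to, here is a sketch where a version is just a map from level to the set of file numbers in it. `VersionEditSketch`, `VersionSketch`, `EditForCompaction` and `Apply` are all names i invented for this; the real leveldb classes carry much more (key ranges, file sizes, log numbers and so on).

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <utility>
#include <vector>

// Hypothetical edit: a list of files to remove and files to add,
// each identified by (level, file number).
struct VersionEditSketch {
  std::set<std::pair<int, uint64_t>> deleted_files;
  std::vector<std::pair<int, uint64_t>> new_files;

  void DeleteFile(int level, uint64_t number) {
    deleted_files.insert(std::make_pair(level, number));
  }
  void AddFile(int level, uint64_t number) {
    new_files.push_back(std::make_pair(level, number));
  }
};

// Hypothetical version: level -> file numbers living in that level.
using VersionSketch = std::map<int, std::set<uint64_t>>;

// Record a finished compaction: every input file goes away, and every
// output file is added to level + 1.
VersionEditSketch EditForCompaction(
    int level, const std::vector<uint64_t>& level_inputs,
    const std::vector<uint64_t>& next_level_inputs,
    const std::vector<uint64_t>& outputs) {
  VersionEditSketch edit;
  for (uint64_t n : level_inputs) edit.DeleteFile(level, n);
  for (uint64_t n : next_level_inputs) edit.DeleteFile(level + 1, n);
  for (uint64_t n : outputs) edit.AddFile(level + 1, n);
  return edit;
}

// Applying an edit never mutates the base version; it produces a new one.
VersionSketch Apply(const VersionSketch& base, const VersionEditSketch& edit) {
  VersionSketch next = base;
  for (const auto& d : edit.deleted_files) next[d.first].erase(d.second);
  for (const auto& f : edit.new_files) next[f.first].insert(f.second);
  return next;
}
```

keeping the old version untouched matters here, since readers that started before the compaction finished may still be using the old set of files.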
good work so far, even if i say so myself. we actually got to where we are building the sst files. now it is time to look at the code that builds those tables. next post, table builder...
Published at DZone with permission of Oren Eini, DZone MVB. See the original article here.