I am currently playing around with persistent B+ trees. This is done using memory-mapped files and pointer arithmetic in C#. I consider this a non-trivial exercise, and certainly quite outside of my usual haunts. Obviously, as you can imagine, there are bugs. Those can be pretty hard to figure out because they can arise from any number of issues (usually it is me, but still …).
After fighting for a few hours with a set of bugs that just wouldn’t let me be, I decided that I had enough of that and set out to design the debug hooks I need to actually figure it out. Because I am dealing with pointers, I couldn’t just use the normal methods that I would use when debugging C# code. To top that, I think that most of you would agree that explaining a data structure (any data structure) without a whiteboard is nigh impossible. In the same manner, trying to figure out a bug in a B+ tree by following the member references was too hard. I wanted to actually visualize it.
I could throw it into a TreeView in a few minutes, but I wanted to take the time to do it right. I have a feeling that this is going to be useful for the future as well, so I learned how to use Graphviz’s dot language and wrote the code to dump the structure of the B+Tree into a properly formatted dot file, after which I could ask dot.exe to turn that into an image. The piece of code that I was having trouble with was the page split function.
Here is the before snapshot. Note that we have nearly run out of room in this page.
And here is the output of the after snapshot. As you can see, this really makes no sense at all. We have a page split, but it all went into the same page?
What actually happened is that I had a bug where I wired both the left and right pages in the split to the original page. Once that was fixed (and once I actually saw the data), it was very easy to see what was wrong. It was a matter of changing a single parameter. After the fix, I could visually see that everything was fine.
Leaving aside the details about the actual bug and fix, the lesson that I wish to impart today is that building those sort of capabilities into your systems is important. Now that I have this ability, I fully expect to be start using it when I need to debug the next issue.
In fact, I ran into the next issue quite quickly. After inserting quite a lot of data, I messed up the pointers and returned what looked very much like corrupted data. It was pretty hard to isolate (as usual with pointer issues) and, since the actual error occurred at some point during the process, I wasn’t really clear on where and how to figure it out. I ended up just dumping the tree content into a file after every operation, and then checking the operations until I found the one where we started to have corruption.
That allowed me to figure out that the corruption began at step 98 and iteration four.
As it turned out, that was pretty hard to figure out because I didn’t realize that when I was splitting the page I needed to compact the data on the original page. Too much thinking in List<T> and not enough thinking about the layout of the problem in memory.