Solving a RavenDB Technical Interview Question

DZone 's Guide to

Solving a RavenDB Technical Interview Question

Hibernating Rhinos goes open kimono and shows us the interview question they ask candidates applying to the firm — along with the ideal solution.

· Database Zone ·
Free Resource

For the interview question answered in this article, see this post.

So the first thing that we need to decide is what will be the data format on the trie we implemented to answer the original question. Since we have only 32KB to work with, we need to consider the actual storage carefully.

32KB is small enough to fit in an unsigned short, so all the references we’ll use will be shorts. We also need to store a bit of metadata, so we’ll use the first 4 bytes as the header for just that.

  • ushort SpaceUsed;
  • ushort LastAllocation;

Now that we have this, we need to decide how to store the actual data. To make things easy, we are going to define the following way to allocate memory:

public bool TryAllocate(sbyte size, out ushort pos)
    pos = 0;
    // we use one byte for the size
    if( _header->LastAllocation + size + 1 > 32*1024)
        return false;
    pos = _header->LastAllocation + 1;
    _header->LastAllocation+=size +1;
    _mem[pos - 1] = size;// size marker
    return true;

public void Free(ushort pos)
    _header->SpaceUsed -= _mem[pos-1]-1;
    _mem[pos-1] *= -1; // delete marker

This is about the simplest way that you can go about doing things. Note that we use a length prefix value, and we limit allocations to a max of 127 bytes each. We use a negative size to indicate a delete marker.

So basically, now we have a pretty trivial way to allocate memory, and we can implement the trie as we would normally. There are a few wrinkles, however.

Deleting the memory doesn’t actually make it eligible for reuse, and it is quite likely to get fragmented easily. In order to handle that, we will track the amount of space that is used, and if we get to the end of the space, we’ll check the UsedSpace value. If this is still too little, we can abort -- there is no available space here. However, if we go to the end of the buffer, but we have free space available, we can do the following:

  • Scan the buffer for available spots (find available locations that have negative size).
  • Failing that, we will copy the data to a temporary buffer, then re-add everything to the buffer from scratch. In other words, we defrag it.

Another issue we have is that the maximum size we can allocate is 127. This value is big enough that most actual strings can fit into it nicely, but a trie already has the property that a large string can be broken into pieces, so we’ll just cut each node in the trie to a max size of 127. Actually, the max size is likely to be less than that, because there is also some information that we need to keep track of per entry:

  • byte NumberOfChildren;
  • byte Flags; // node type, (internal, leaf or both)
  • ushort ChildrenPosition;

So in practice we have about 123 bytes to work with for the length. Note that we don’t store the string value of the node’s length (we can get that from the allocation information), and that we store the actual children in an array that is stored separately. This allows us to easily add items to the trie as child nodes. If the node is a leaf node, and we also need to store the actual value (which is 8 bytes), we store that information at the end of the value (giving us 115 bytes for that section of the value).

All in all, there is going to be a bit of pointer arithmetic and a bit of bit counting, but it's likely to be a pretty simple implementation.

Note that additional optimizations would be to try align everything so it will fit into a cache line, trying to place nodes near their children (which are more likely to be followed), etc.

interview answers, interview questions

Published at DZone with permission of Oren Eini , DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}