I mentioned in passing that a lot of the actual memory use in RavenDB 4.0 is in unmanaged code. That frees us from the benevolent tyranny of the garbage collector and lets us optimize our memory utilization to fit our usage scenarios. In particular, most of our memory is used in a thread-local manner, by one of two kinds of threads: dedicated threads that do a specific type of work (indexing, replication, etc.) and request processing threads (which handle all requests).
A typical scenario for us is a lot of requests coming in all of a sudden and requiring us to allocate a bunch of memory, for example, if we have just run a benchmark. In production, the equivalent is a spike in traffic that then turns into a lull.
One of the common hotspots in high-performance servers is memory allocations. In particular, allocating and de-allocating can kill your performance. So you don’t do that. Instead, you pool the memory and reuse it, hopefully without major costs for doing that.
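To make the pooling idea concrete, here is a minimal sketch in Python. It is not RavenDB's code: `BufferPool`, `rent`, and `give_back` are hypothetical names, and a `bytearray` stands in for a block of unmanaged memory. The point is only that a buffer, once allocated, is handed back and reused rather than freed.

```python
class BufferPool:
    """Hypothetical single-threaded pool: reuse buffers instead of reallocating."""

    def __init__(self, buffer_size):
        self.buffer_size = buffer_size
        self._free = []        # buffers parked here, waiting to be reused
        self.allocations = 0   # how many times we had to allocate fresh memory

    def rent(self):
        # Prefer a parked buffer; allocate only when the pool is empty.
        if self._free:
            return self._free.pop()
        self.allocations += 1
        return bytearray(self.buffer_size)

    def give_back(self, buf):
        # Nothing is freed here, which is exactly why a spike leaves memory behind.
        self._free.append(buf)


pool = BufferPool(4096)
for _ in range(1000):
    buf = pool.rent()
    buf[0] = 1            # pretend to do some work with the buffer
    pool.give_back(buf)
```

A thousand rent/return cycles cost a single allocation. The flip side is that the pool never shrinks on its own.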
Which is all well and good, until you get that spike. In this case, you allocated memory to handle the traffic, and after the load is over, you are still holding onto that memory. We probably should give it back so something else can use it. Sure, it will be paged out to disk eventually, etc., and virtual memory is there to cover us, but that is really not the right way to write high-performance stuff.
The way we want to handle it is to wait until everything is calm and then start de-allocating all the memory that isn’t being used. But that leads to a problem. How can we do this if everything is thread local? And we do want everything to be thread local, because synchronization on memory allocation/de-allocation is one of the most common performance killers.
So we handle this by using a MutliReaderSingleWriterStack. The idea is that we have a designated data structure that has only a single writer and is held in a thread-local value, but which is expected to be read by multiple threads. In particular, the other thread that is going to be reading from it is the idle thread. Once a minute, the idle thread scans all the threads, finds all the saved memory that we have allocated but haven’t used in the past minute, and clears it.
Because this can run concurrently with the thread using the memory itself, we only free memory that is parked (as in, waiting for work to come), not memory that is actively used. We also need to make sure that the cleanup thread doesn’t take idle memory to dispose of it at the same moment the owning thread is reaching into its memory bank to pull a new range. Here, we need a thread-safe operation, and we get it by using an interlocked operation on the InUse field. Both the cleanup thread and the owning thread will try to set that value, but only one of them will succeed.
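Here is a small sketch of that handshake. Python has no interlocked compare-exchange, so the hypothetical `InUseFlag.compare_exchange` below emulates one with a tiny internal lock; in .NET this role would be played by an `Interlocked.CompareExchange` call on the field. `ParkedMemory`, `try_claim`, and `race` are likewise made-up names for illustration.

```python
import threading


class InUseFlag:
    """Emulated interlocked flag: 0 means parked, 1 means claimed."""

    def __init__(self):
        self._guard = threading.Lock()
        self._value = 0

    def compare_exchange(self, expected, new):
        # Atomically: if the value equals `expected`, store `new`.
        # Always returns the value that was there before the call.
        with self._guard:
            old = self._value
            if old == expected:
                self._value = new
            return old


class ParkedMemory:
    def __init__(self, size):
        self.in_use = InUseFlag()
        self.data = bytearray(size)   # stands in for unmanaged memory


def try_claim(mem):
    # Whoever flips 0 -> 1 owns the memory; everyone else sees 1 and backs off.
    return mem.in_use.compare_exchange(0, 1) == 0


def race(rounds=100):
    """Let an 'owner' and a 'cleaner' fight over the same parked memory."""
    results = []
    for _ in range(rounds):
        mem = ParkedMemory(64)
        winners = []
        barrier = threading.Barrier(2)

        def contender(name):
            barrier.wait()            # start both contenders as close together as we can
            if try_claim(mem):
                winners.append(name)
                if name == "cleaner":
                    mem.data = None   # the cleaner won, so it releases the memory

        threads = [threading.Thread(target=contender, args=(n,))
                   for n in ("owner", "cleaner")]
        for th in threads:
            th.start()
        for th in threads:
            th.join()
        results.append((winners, mem.data))
    return results


results = race()
```

Every round ends with exactly one winner. If the cleaner wins, the memory is released and the owner has to get memory some other way; if the owner wins, the cleaner simply moves on.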
In this way, under load, we get to use purely local memory, but we also have automatic cleanup going on if we have an idle period.