Over a million developers have joined DZone.

Compression & Memory Sizing in RavenDB

· Performance Zone

Download Forrester’s “Vendor Landscape, Application Performance Management” report that examines the evolving role of APM as a key driver of customer satisfaction and business success, brought to you in partnership with BMC.

We had an interesting issue at a customer recently, and it took a while to resolve, so it is interesting. The underlying problem as stated by the customer was that they tried to reset an index, and RavenDB took all the memory on the machine. That is kinda rude, and we have taken steps to ensure that this wouldn’t happen, so that was surprising.

So we settled down to figure out what was going on, and we were able to reproduce this locally. That was strange, and quite annoying. We finally narrowed things down to very large documents. There was a collection on this database where most documents were several MB in size. We do a lot of rate limiting, to avoid overloading, but we do usually do that on the count of documents, not the size of them. Except, that we actually do limit the size where it matter.

When we load documents to be indexed, we specify a maximum size (usually 128MB) per batch. However… what actually happens is that we go to the storage and ask it, give us X amount of documents, up to Y amount in size. The question is, what size. And in this case, we are using the size on disk, which has a close correlation to the actual size taken when loading into a JSON object. Except… when we use compression.

When we have compression, what actually happens is that we limited to the maximum size of the compressed data, which can be 10% – 15% of the actual data size when we decompress it in memory. That was a large part of the problem. Another issue was what happened when we had prefetching enabled. We routinely prefetch data to memory so we won’t have to wait for I/O. The prefetcher only considered documents count, but when each document is multiple MB, we really needed to consider both count and size.

Both issues were fixed, and there are no longer any issues with this dataset. Interestingly enough, the customer had no issue when creating the database, because it only had to deal with whatever changed, and not the entire data set.

See Forrester’s Report, “Vendor Landscape, Application Performance Management” to identify the right vendor to help IT deliver better service at a lower cost, brought to you in partnership with BMC.


Published at DZone with permission of Ayende Rahien, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

The best of DZone straight to your inbox.

Please provide a valid email address.

Thanks for subscribing!

Awesome! Check your inbox to verify your email so you can start receiving the latest in tech news and resources.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}