
Compression & Memory Sizing in RavenDB


We had an interesting issue at a customer recently, and it took a while to resolve. The underlying problem, as stated by the customer, was that they tried to reset an index and RavenDB took all the memory on the machine. That is kinda rude, and we have taken steps to ensure that this wouldn't happen, so that was surprising.

So we settled down to figure out what was going on, and we were able to reproduce this locally. That was strange, and quite annoying. We finally narrowed things down to very large documents: there was a collection on this database where most documents were several MB in size. We do a lot of rate limiting to avoid overloading the system, but we usually limit on the count of documents, not their size. Except that we actually do limit the size where it matters.

When we load documents to be indexed, we specify a maximum size (usually 128MB) per batch. What actually happens is that we go to the storage and ask it: give us X documents, up to Y amount in size. The question is, which size? In this case, we were using the size on disk, which correlates closely with the actual size taken when loading into a JSON object. Except… when we use compression.
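
To make the mechanics concrete, here is a minimal sketch (in Python, with invented names; this is not RavenDB's actual code) of a batch loader capped by both document count and total size. Note which size gets summed:

```python
# Illustrative sketch only -- invented names, not RavenDB's actual code.
from dataclasses import dataclass
from typing import Iterable, List

@dataclass
class Doc:
    etag: int
    size_on_disk: int  # with compression enabled, this is the *compressed* size

MAX_BATCH_DOCS = 1024                # hypothetical count limit
MAX_BATCH_SIZE = 128 * 1024 * 1024   # 128 MB per-batch size limit

def load_batch(docs: Iterable[Doc]) -> List[Doc]:
    """Stop at whichever limit is hit first: document count or total size."""
    batch: List[Doc] = []
    total = 0
    for doc in docs:
        if len(batch) >= MAX_BATCH_DOCS or total + doc.size_on_disk > MAX_BATCH_SIZE:
            break
        batch.append(doc)
        total += doc.size_on_disk  # the trouble: this budgets the on-disk size
    return batch
```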

When we have compression, what actually happens is that we were limiting on the size of the compressed data, which can be only 10%–15% of the actual data size once we decompress it in memory. That was a large part of the problem. Another issue was what happened when we had prefetching enabled. We routinely prefetch data to memory so we won't have to wait for I/O. The prefetcher only considered document count, but when each document is multiple MB, we really needed to consider both count and size.
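
Here is a sketch of both fixes under the same illustrative model, reusing the hypothetical Doc type from above: budget the batch by an estimated decompressed size, and have the prefetcher honor a size cap alongside the count cap. The 10x ratio is just a stand-in for the 10%–15% figure mentioned above:

```python
# Sketch of the fixes, same illustrative model as above (not RavenDB's code).
COMPRESSION_RATIO_GUESS = 10  # compressed data can be ~10%-15% of the real size

def effective_size(doc: Doc, compressed: bool) -> int:
    # Estimate the in-memory (decompressed) size; a real implementation
    # would use the stored uncompressed length rather than a guess.
    return doc.size_on_disk * COMPRESSION_RATIO_GUESS if compressed else doc.size_on_disk

def prefetch(docs: Iterable[Doc], compressed: bool,
             max_docs: int = 1024,
             max_size: int = 128 * 1024 * 1024) -> List[Doc]:
    """Prefetch until either the count cap or the size cap is reached,
    rather than stopping on the count cap alone."""
    buffered: List[Doc] = []
    total = 0
    for doc in docs:
        size = effective_size(doc, compressed)
        if len(buffered) >= max_docs or total + size > max_size:
            break
        buffered.append(doc)
        total += size
    return buffered
```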

Both issues were fixed, and there are no longer any problems with this dataset. Interestingly enough, the customer had no issue when the database was first created, because indexing then only had to deal with whatever changed, not the entire data set at once.



Published at DZone with permission of Oren Eini, DZone MVB.

