
RavenDB: .NET Memory Management

By Oren Eini · Jul. 09, 12

We just got a support issue from a customer regarding out-of-control memory usage in RavenDB during indexing. That was very surprising, because a few months ago I spent a few extremely intense weeks making sure that this wouldn't happen, building RavenDB's auto-tuning support.

Luckily, the customer was able to provide us with a way to reproduce things locally. And that is where things get interesting. Here are a few fun facts. Looking at the timeline of the documents, it was something like this:

[Image: timeline of the documents' sizes in the dataset]

Note that the database actually had several hundred thousand documents; I am showing you this merely to give you some idea of the relative sizes.

As it turned out, this particular mix of timeline sizes is quite unhealthy for RavenDB during the indexing period. Why?

RavenDB has a batch size: the number of documents that will be indexed in a particular batch. This is used to balance throughput against latency. The bigger the batch, the higher the latency, but the greater the throughput.

Along with the actual number of documents to index, we also need to balance things like CPU and memory usage. RavenDB assumes that the cost of processing a batch of documents is roughly proportional to the number of documents.

In other words, if we just used 1 GB to index 512 documents, we would probably use roughly 2 GB to index the next 1,024 documents. This is a perfectly reasonable assumption to make, but it hides an implicit assumption: that the size of documents is roughly the same across the entire data set. This is important because otherwise, you get the following situation (sketched in code after the list):

  • Index 512 documents – 1 GB consumed; there are more docs and more than 2 GB of available memory, so double the batch size.
  • Index 1,024 documents – 2.1 GB consumed; there are more docs and more than 4 GB of available memory, so double the batch size.
  • Index 2,048 documents – 3 GB consumed; there are more docs and enough memory, so double the batch size.
  • Index 4,096 documents – and here we get to the obese documents!
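
To make the heuristic concrete, here is a minimal sketch of count-based auto-tuning along those lines. This is not RavenDB's actual code; the names (BatchSizeTuner, NextBatchSize) and the thresholds are invented for illustration:

```csharp
// A minimal sketch of count-based batch auto-tuning, as described above.
// Illustrative only; not RavenDB's actual implementation.
public class BatchSizeTuner
{
    private int batchSize = 512;                 // starting batch size
    private const int MaxBatchSize = 128 * 1024; // hypothetical upper bound

    // lastBatchBytes: memory consumed indexing the previous batch.
    // availableBytes: free memory right now.
    public int NextBatchSize(long lastBatchBytes, long availableBytes, bool moreDocsPending)
    {
        if (!moreDocsPending)
            return batchSize;

        // Implicit assumption: a batch of 2N documents will cost roughly twice
        // what the last batch of N documents cost. That only holds when
        // document sizes are roughly uniform across the data set.
        long projectedBytes = lastBatchBytes * 2;
        if (availableBytes > projectedBytes && batchSize < MaxBatchSize)
            batchSize *= 2;

        return batchSize;
    }
}
```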

By the time we get to the obese documents, we have already increased our batch size significantly, so we are actually trying to read a lot of documents, and suddenly a lot of them are very big.

That caused RavenDB to try to consume more and more memory. Now, if it had enough memory to do so, it would detect that it was using too much memory and drop back. But the way this dataset is structured, by the time we get there, we are trying to load tens of thousands of documents, many of them in the multi-megabyte range.
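
You can watch this happen with a toy simulation: a run of small documents followed by a run of obese ones (the sizes below are invented; only the skew matters). The first three batches look cheap, so the count-based doubling keeps going, and the fourth batch lands squarely on the obese documents:

```csharp
// A toy simulation (not RavenDB code) of the failure mode described above.
using System;
using System.Linq;

class Simulation
{
    static void Main()
    {
        // 3,584 small (~1 MB) documents followed by 4,096 obese (~20 MB) ones.
        long[] docSizes = Enumerable.Repeat(1L << 20, 3584)
            .Concat(Enumerable.Repeat(20L << 20, 4096))
            .ToArray();

        int batchSize = 512, index = 0;
        while (index < docSizes.Length)
        {
            long batchBytes = docSizes.Skip(index).Take(batchSize).Sum();
            Console.WriteLine($"batch of {batchSize,5} docs -> {batchBytes >> 20:N0} MB");
            index += batchSize;
            batchSize *= 2; // count-based doubling, blind to physical size
        }
    }
}
```

On this data, the first three batches weigh in at roughly 0.5, 1, and 2 GB, and then the fourth jumps to about 80 GB.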

This was pretty hard to fix, not because of the actual issue, but because just reproducing it was tough, since we had other issues simply getting the data in. For example, if you tried to import this dataset and chose a batch size greater than 128, you would also get failures, because suddenly a batch of extremely large documents would all fall within a single batch, resulting in an error saving them to the database.

The end result of this issue is that we now take the actual physical size into account in many more places inside RavenDB, and this error has been eradicated. We also have much nicer output for the Smuggler tool. :-)
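
One plausible shape for that kind of fix is to cap each batch by accumulated physical size as well as by document count. Again, this is a hypothetical sketch; the Document type, SizeInBytes, and the limits are invented, not RavenDB's internals:

```csharp
// Size-aware batching: stop filling a batch when either the document count
// or the accumulated byte size hits its cap. Hypothetical sketch.
using System.Collections.Generic;

public sealed class Document
{
    public string Key { get; set; }
    public long SizeInBytes { get; set; }
}

public static class Batching
{
    public static List<Document> TakeBatch(IEnumerator<Document> docs, int maxCount, long maxBytes)
    {
        var batch = new List<Document>();
        long bytes = 0;
        while (batch.Count < maxCount && docs.MoveNext())
        {
            batch.Add(docs.Current);
            bytes += docs.Current.SizeInBytes;
            // Stop early once the batch is physically large: a handful of
            // multi-megabyte documents can outweigh thousands of small ones.
            if (bytes >= maxBytes)
                break;
        }
        return batch;
    }
}
```

With a byte cap in place, a run of multi-megabyte documents simply produces smaller batches instead of an ever-growing memory footprint.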

On a somewhat related note: RavenDB and obese documents.

RavenDB doesn't actually have a maximum document size limitation. In contrast to other document databases, which have a hard limit at 8 or 16 MB, you can have a document as big as you want.* That doesn't mean you should work with obese documents, though. Documents that are multiple megabytes in size tend to be… awkward to work with, and they generally don't respect the most important aspect of document modeling in RavenDB: follow the transaction boundary. What does it mean that obese documents are awkward to work with?

Just that: it is awkward. Serialization times are proportional to the document size, as are retrieval times from the server, and of course the actual memory usage on both server and client is affected by the size of the documents. It is often easier to work with many smaller documents than with a few obese ones.

* Well, to be truthful, we do have a hard limit; it is somewhere just short of the 2 GB mark, but we don't consider that realistic.


Published at DZone with permission of Oren Eini, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
