Over a million developers have joined DZone.

Getting Documents from Disk Using the RavenDB Indexing Process

· Database Zone

Sign up for the Couchbase Community Newsletter to stay ahead of the curve on the latest NoSQL news, events, and webinars. Brought to you in partnership with Coucbase.

As I noted in my previous post, we have done major optimizations for RavenDB. One of the areas where we improved the performance was reading the documents from the disk for indexing.

In Pseudo Code, it looks like this:

while database_is_running:
  stale = find_stale_indexes()
  lastIndexedEtag = find_last_indexed_etag(stale)
  docs_to_index = get_documents_since(lastIndexedEtag, batch_size)

As it turned out, we had a major optimization option here, because of the way the data is actually structured on disk. In simple terms, we have an on disk index that lists the documents in the order in which they were updated, and then we have the actual documents themselves, which may be anywhere on the disk.

Instead of loading the documents in the orders in which they were modified, we decided to try something different. We first query the information we need to find the document on disk from the index, then we sort them based on the optimal access pattern, to reduce disk movement and ensure that we have as sequential reads as possible. Then we take those results in memory and sort them based on their last update time again.

This seems to be a perfectly obvious thing to do, assuming that you are aware of such things, but it is actually something that is very easy not to notice. The end result is quite promising, and it contributed to the 7+ times improvements in perf that we had for indexing costs.

But surprisingly, it wasn’t the major factor, I’ll discuss a huge perf boost in this area tomorrow.

The Getting Started with NoSQL Guide will get you hands-on with NoSQL in minutes with no coding needed. Brought to you in partnership with Couchbase.

Topics:

Published at DZone with permission of Ayende Rahien, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

The best of DZone straight to your inbox.

SEE AN EXAMPLE
Please provide a valid email address.

Thanks for subscribing!

Awesome! Check your inbox to verify your email so you can start receiving the latest in tech news and resources.
Subscribe

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}