Optimizing Performance of RavenDB's Indexing Process
The process RavenDB uses to index documents is fairly complex. To understand exactly what happens, I decided to break it down into pseudocode.
It looks something like this:
    while database_is_running:
        stale = find_stale_indexes()
        lastIndexedEtag = find_last_indexed_etag(stale)
        docs_to_index = get_documents_since(lastIndexedEtag, batch_size)
        filtered_docs = execute_read_filters(docs_to_index)
        indexing_work = []
        for index in stale:
            index_docs = select_matching_docs(index, filtered_docs)
            if index_docs.empty:
                set_indexed(index, lastIndexedEtag)
            else:
                indexing_work.add(index, index_docs)
        for work in indexing_work:
            work.index(work.index_docs)
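To make the flow easier to follow, here is a minimal runnable sketch of a single pass of that loop in Python. It is an in-memory toy and not RavenDB's actual implementation. The `Doc` and `Index` classes, the `run_indexing_batch` function, and the rule that an index matches documents by collection name are all hypothetical stand-ins for the helpers named in the pseudocode. The sketch also assumes that an index's marker advances to the last etag of the batch once the batch has been handled.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    etag: int          # monotonically increasing change marker
    collection: str    # hypothetical: used to decide which index wants the doc


@dataclass
class Index:
    name: str
    collection: str                      # hypothetical matching rule
    last_indexed_etag: int = 0
    entries: list = field(default_factory=list)

    def matches(self, doc: Doc) -> bool:
        return doc.collection == self.collection


def run_indexing_batch(docs, indexes, batch_size, read_filter=lambda d: True):
    """Run one pass of the loop. Returns False when no index is stale."""
    if not docs:
        return False
    newest = max(d.etag for d in docs)

    # find_stale_indexes: any index that has not yet seen the newest etag
    stale = [ix for ix in indexes if ix.last_indexed_etag < newest]
    if not stale:
        return False

    # find_last_indexed_etag: start from the stale index that is furthest behind
    start = min(ix.last_indexed_etag for ix in stale)

    # get_documents_since: load one shared batch for all stale indexes
    newer = sorted((d for d in docs if d.etag > start), key=lambda d: d.etag)
    batch = newer[:batch_size]
    batch_end = batch[-1].etag

    # execute_read_filters: drop documents that must not be visible
    visible = [d for d in batch if read_filter(d)]

    # select_matching_docs: build the per-index work list
    work = []
    for ix in stale:
        matching = [d for d in visible
                    if d.etag > ix.last_indexed_etag and ix.matches(d)]
        if not matching:
            # nothing to do for this index, just move its marker forward
            ix.last_indexed_etag = max(ix.last_indexed_etag, batch_end)
        else:
            work.append((ix, matching))

    # the actual indexing step, one unit of work per index
    for ix, matching in work:
        ix.entries.extend(d.etag for d in matching)
        ix.last_indexed_etag = max(ix.last_indexed_etag, batch_end)
    return True
```

Calling `run_indexing_batch` repeatedly until it returns `False` plays the role of the outer `while` loop. Note that the batch is loaded once and shared by every stale index, which is the shape the pseudocode describes.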
And now let me show you the areas in which we did some perf work:
All of which gives us a major boost in system performance. I’ll discuss each part of that work in detail, don’t worry.