Solr Optimization – document cache
A few months ago (here) we looked at filterCache. This time I’m returning to the optimization topic to take a look at documentCache.
What does it contain?
So let’s start with information about the data that documentCache holds. documentCache contains Lucene documents that were fetched from the index. So little and so much.
What is it used for?
Every object (Lucene document) stored in documentCache contains a list of references to the fields that are stored with the document. Thanks to this, once a document has been fetched and put into the cache, it doesn’t have to be fetched from the index again while another query is processed. This is why the number of I/O operations is reduced when rendering the query results list.
What to remember when using documentCache?
When using documentCache you have to remember two important things:
- documentCache can’t be autowarmed because it operates on identifiers that change after every commit operation.
- If you use lazy field loading (enableLazyFieldLoading=true), documentCache functionality is somewhat limited. The document stored in the documentCache will contain only those fields that were passed to the fl parameter. If a later query asks for additional fields of a document that is already in the cache, those additional fields will be fetched from the index.
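To make the second point concrete, here is a minimal sketch of where lazy field loading is switched on in solrconfig.xml. The placement inside the query section follows the stock Solr example configuration; the value shown is only an illustration, not a recommendation.

```xml
<!-- solrconfig.xml (fragment): lazy field loading lives in the <query> section -->
<query>
  <!-- true: only the fields requested via fl are loaded right away;
       other stored fields are read from the index when first asked for -->
  <enableLazyFieldLoading>true</enableLazyFieldLoading>
</query>
```

With this setting, a request with fl=id,name (hypothetical field names) would leave a cached document holding only those two fields, and a later request that also asks for another stored field would have to go back to the index for it.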
The standard documentCache definition looks like this:
<documentCache class="solr.FastLRUCache" size="16384" initialSize="16384"/>
Let’s recall those parameters:
- class – class implementing the cache,
- size – the maximum cache size,
- initialSize – initial size of the cache.
How to configure
The usual question about any cache is: what size should I set? According to the information from the Solr wiki (http://wiki.apache.org/solr/SolrCaching#documentCache), the maximum size shouldn’t be less than the product of the number of concurrent queries and the maximum number of documents fetched by a single query. It is a simple relation that should ensure that Solr won’t have to fetch the same documents from the index again during query processing.
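As a rough worked example of that rule, assume (purely for illustration) that Solr handles at most 20 concurrent queries and that no query fetches more than 100 documents. The product is 20 × 100 = 2000, so the cache size should not go below that value. The numbers are made up; take yours from your own traffic and the largest rows value you allow.

```xml
<!-- Assumed for illustration: 20 concurrent queries x 100 documents per query = 2000 -->
<!-- size is set to that product; initialSize matches it, as in the definition above -->
<documentCache class="solr.FastLRUCache" size="2000" initialSize="2000"/>
```

Treat the product as a lower bound rather than a target: a larger size simply trades more memory for fewer repeated reads from the index.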
Last few words
In the case of documentCache we don’t have to worry about how we construct our queries to properly use this cache. But please remember that documentCache requires memory: the more fields you store in the index, the more memory it needs.