When working with Solr, one problem comes up from time to time: the structure of the index has to change. The reason for the change (new functional requirements, optimization, or anything else) is not important. What matters is the question that follows: should we remove the index, or simply change the structure and do a full re-index? Contrary to appearances, the answer depends on what exactly we changed in the structure of the index.
Personally, I am an advocate of solutions that have the smallest chance of causing problems; I just like to sleep at night. In my opinion, removing the index after updating its structure and then fully re-indexing the data is one of those solutions. I am aware, however, that this approach is not always acceptable. So when can we safely keep the index, and when does keeping it expose us to potential problems with Solr?
Changes to the index structure can be divided into three areas that cover most of the modifications we make:
- Adding / removing a field
- Similarity modification
- Field modification
Adding / removing a field
The first type of modification is quite simple: if we add a field to schema.xml or remove one from it, there is no need to remove the entire index before re-indexing. Solr handles adding a new field to an existing index. Be aware, however, that documents which are not re-indexed after this operation will not be updated automatically, so they will not contain the new field.
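To check that a change really falls into this first category, it can help to compare the field names in the old and new schema.xml. The following is only a minimal sketch: the two schema fragments, the field names, and the helper names `field_names` and `diff_fields` are made up for illustration, and it only looks at `<field>` elements (not dynamic fields or field types).

```python
import xml.etree.ElementTree as ET

# Hypothetical schema fragments, used only for illustration.
OLD_SCHEMA = """<schema name="example" version="1.5">
  <fields>
    <field name="id" type="string" indexed="true" stored="true"/>
    <field name="title" type="text" indexed="true" stored="true"/>
  </fields>
</schema>"""

NEW_SCHEMA = """<schema name="example" version="1.5">
  <fields>
    <field name="id" type="string" indexed="true" stored="true"/>
    <field name="title" type="text" indexed="true" stored="true"/>
    <field name="author" type="string" indexed="true" stored="true"/>
  </fields>
</schema>"""


def field_names(schema_xml):
    """Return the set of names of all <field> elements in a schema."""
    root = ET.fromstring(schema_xml)
    return {field.get("name") for field in root.iter("field")}


def diff_fields(old_schema, new_schema):
    """Return (added, removed) field names as two sorted lists."""
    old, new = field_names(old_schema), field_names(new_schema)
    return sorted(new - old), sorted(old - new)


added, removed = diff_fields(OLD_SCHEMA, NEW_SCHEMA)
print("added:", added)
print("removed:", removed)
```

If the only differences are added or removed field names, the change is of the first type described above. A field that keeps its name but changes its attributes will not show up in this diff at all, so it has to be checked separately.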
Similarity modification
The second case, changing the class responsible for Similarity, also does not force us to delete the index after the change. But unlike the previous case, if we want Solr to calculate the score correctly, and thus sort results in the correct order, we have to re-index all documents already present in the index.
Field modification
Let's stop for a minute on the third case. Suppose that we slightly modify a field in the index for a prosaic reason: we are no longer interested in the normalization of its length. We set omitNorms="true" (I assume that the previous setting was omitNorms="false"). If we then re-index all the documents without removing the index, the Lucene index will, after its segments are merged, still contain length normalization information for the field. Something went wrong. This is precisely the case when it is necessary to delete the index after changing its structure and before the full re-index. At first glance this seems to be a very small change, but it has side effects. It is worth remembering that some field properties are overridden by others, as with length normalization: if one segment has length normalization and another does not, the segment created by merging them will have it.
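One way to clear the index before the full re-index is a delete-by-query that matches every document, sent to Solr's update handler. The sketch below only builds such a request and does not send it. The base URL and the core name are placeholders, and I am assuming that a delete-all followed by a commit is an acceptable way to clear the index in your setup; stopping Solr and removing the index directory is the other common approach.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_delete_all_request(solr_base_url, core):
    """Build (but do not send) a request that deletes all documents in a core."""
    # The query *:* matches every document in the index.
    body = b"<delete><query>*:*</query></delete>"
    # commit=true asks Solr to commit right after the delete.
    query_string = urlencode({"commit": "true"})
    url = "{}/{}/update?{}".format(solr_base_url.rstrip("/"), core, query_string)
    return Request(
        url,
        data=body,
        headers={"Content-Type": "text/xml"},
        method="POST",
    )


# Placeholder URL and core name; adjust to your installation.
request = build_delete_all_request("http://localhost:8983/solr/", "mycore")
print(request.get_method(), request.full_url)
```

To actually run the delete, pass the request to `urllib.request.urlopen(request)`, and start the full re-index only after it has succeeded.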