Understanding Solr Soft Commits And Data Durability
I ran into an interesting problem today. I was working on the first project where we legitimately needed Solr soft commits, and while testing my configuration I wanted to prove to myself that the soft commits were performing as expected. Namely, I expected soft commits to flush all added documents to an in-RAM index so that they would appear in search results. Furthermore, and importantly, I expected soft commits not to flush the indices to disk. Disk IO is expensive, and since we are very IO constrained in this project, my main goal was to prove that soft commits were not writing to disk. The problem was that I couldn’t prove it! At least not at first. However, by the time I finally understood what was happening, I had also gained a much better understanding of how Solr’s new soft commits actually work. In the remainder of this post I will walk you through my experiments and have you think through them with me as if you were sitting beside me.
Here’s the setup: in Solr’s solrconfig.xml, I configured Solr to soft commit all pending documents every 10 seconds and to hard commit every 30 seconds.
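For reference, a schedule like this is normally expressed with the autoSoftCommit and autoCommit elements inside the updateHandler section of solrconfig.xml. The fragment below is a sketch of that setup, not a copy of the exact file used in this experiment; the updateHandler class shown is the stock one and is an assumption, and both maxTime values are in milliseconds.

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Soft commit: make pending documents searchable every 10 seconds -->
  <autoSoftCommit>
    <maxTime>10000</maxTime>
  </autoSoftCommit>

  <!-- Hard commit: flush the index to disk every 30 seconds -->
  <autoCommit>
    <maxTime>30000</maxTime>
  </autoCommit>
</updateHandler>
```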
I used SolrJ to load about 100 documents into Solr, then jumped over quickly to a browser and periodically issued a series of “select all” queries to Solr. Here’s about how it went:
- Time 0:01 — no documents
- Time 0:02 — no documents
- Time 0:03 — no documents
- Time 0:04 — no documents
- Time 0:05 — no documents
- Time 0:06 — no documents
- Time 0:07 — no documents
- Time 0:08 — no documents
- Time 0:09 — no documents
- Time 0:10 — 100 documents
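The manual browser refreshing above could also be scripted. Here is a minimal Python sketch, assuming Solr's JSON response format (the hit count sits at response.numFound). The function names, the base URL, and the core name are all hypothetical and only there for illustration. The fetch and sleep parameters are injectable so the polling loop can be exercised without a running Solr.

```python
import json
import time
import urllib.parse
import urllib.request


def select_all_url(base_url, core):
    """Build a "select all" URL that asks only for the hit count (rows=0)."""
    params = urllib.parse.urlencode({"q": "*:*", "rows": 0, "wt": "json"})
    return f"{base_url.rstrip('/')}/{core}/select?{params}"


def parse_num_found(body):
    """Extract response.numFound from a Solr JSON select response."""
    return json.loads(body)["response"]["numFound"]


def fetch(url):
    """Fetch a URL and return the body as text."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8")


def wait_for_docs(url, fetch=fetch, interval=1.0, max_polls=30, sleep=time.sleep):
    """Poll once per interval until documents appear.

    Returns (polls_made, num_found); num_found is 0 if nothing showed up.
    """
    for attempt in range(1, max_polls + 1):
        count = parse_num_found(fetch(url))
        print(f"poll {attempt}: {count} documents")
        if count > 0:
            return attempt, count
        sleep(interval)
    return max_polls, 0
```

Against a local instance, this would be called as wait_for_docs(select_all_url("http://localhost:8983/solr", "collection1")), where both the address and the core name are assumptions about the setup.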
So far so good. Soft commit appeared to be working; I just needed to verify that nothing was written to disk. I immediately stopped Solr (a control-C in the terminal running Solr), then restarted it, and what did I see? Documents: 100 of them!
“So,” I said grumpily, “Solr is actually hard committing when I’m telling it to soft commit!” But then it occurred to me that maybe Solr was doing something smart to protect me from data loss. “What about that updateLog thing I see in the updateHandler?”
After tracking down the log files (they’re stored as a sibling of the data directory), I found that, sure enough, they keep track of all the items that haven’t yet been hard committed to Solr. So, first lesson:
Whenever Solr wakes up, it looks for these logs and replays them back to recover from any potential data loss.
So, now it was clear what I had to do. I spun up the same experiment: I submitted about 100 docs to Solr and kept refreshing the “select all” query until, after 10 seconds, my documents appeared. I then stopped Solr (control-C) and this time, smiling smugly, I deleted the log files, restarted Solr, resubmitted my “select all” query and found … 100 documents.
At this point, words don’t readily express the confusion that I felt.
I was becoming upset that I was wrong, and I was still concerned that maybe Solr was writing my soft commits to disk. Then it occurred to me that perhaps my method of killing Solr (control-C) was simply not violent enough.
I created one more experiment. This time I:
- started Solr
- submitted 100 or so documents
- re-ran the “select all” query repeatedly until, after 10 seconds, the documents appeared
- killed the process with
kill -9 <processId>
- removed the log files
- restarted Solr
And finally, once I started Solr again and queried for all documents, there were none! So, my second lesson for the day:
When Solr is shut down politely, it does you the favor of hard committing all outstanding documents to disk.
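To keep the three experiments straight, it can help to write the two lessons down as a toy model. The Python sketch below is only a mental model of the behavior observed in this post, not a description of Solr's real internals. The class, its fields, and the run_experiment helper are all made up for illustration.

```python
class ToySolr:
    """Toy model of the behavior observed in the experiments; not real Solr."""

    def __init__(self, disk_index=(), tlog=()):
        # On startup, replay logged documents that never reached the disk index.
        self.disk_index = list(disk_index) + list(tlog)
        self.tlog = []          # durable: transaction log on disk
        self.uncommitted = []   # volatile: added since the last hard commit
        self.visible = list(self.disk_index)  # what a "select all" returns

    def add(self, doc):
        self.tlog.append(doc)         # logged right away for crash recovery
        self.uncommitted.append(doc)  # not searchable yet

    def soft_commit(self):
        # Makes documents searchable without writing the index to disk.
        self.visible = self.disk_index + self.uncommitted

    def hard_commit(self):
        # Writes the index to disk; the log no longer needs these documents.
        # (Whether a hard commit reopens the searcher is not modeled here.)
        self.disk_index += self.uncommitted
        self.uncommitted = []
        self.tlog = []

    def stop(self, polite):
        # A polite stop (control-C) hard commits; kill -9 does not.
        if polite:
            self.hard_commit()
        # Only the durable state survives the process.
        return {"disk_index": list(self.disk_index), "tlog": list(self.tlog)}


def run_experiment(polite, delete_tlog):
    """Add 100 docs, soft commit, stop, optionally delete logs, restart."""
    solr = ToySolr()
    for i in range(100):
        solr.add(f"doc-{i}")
    solr.soft_commit()
    durable = solr.stop(polite)
    if delete_tlog:
        durable["tlog"] = []
    return len(ToySolr(**durable).visible)
```

In this model, run_experiment(polite=True, delete_tlog=False) and run_experiment(polite=True, delete_tlog=True) correspond to my first two attempts, and run_experiment(polite=False, delete_tlog=True) corresponds to the last one. The fourth combination, kill -9 with the logs left in place, is one I did not actually run; the model simply follows the first lesson there.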
And so my final assessment of the situation is that not only do the soft commits perform as expected, but Solr does several things to make sure you’re not losing any of your data.