Understanding Solr Soft Commits And Data Durability

By John Berryman · Apr. 29, 13

I ran into an interesting problem today. I was working on the first project where we legitimately needed Solr soft commits, and while testing my configuration I wanted to prove to myself that the soft commits were performing as expected. Namely, I expected soft commits to flush all added documents to an in-RAM index so that they would appear in search results. Furthermore, and importantly, I expected soft commits not to flush the indices to disk. Disk IO is expensive, and since we are very IO constrained in this project, my main goal was to prove that soft commits were not writing to disk. The problem was that I couldn’t prove it! At least not at first. However, by the time I finally understood what was happening, I had also gained a much better understanding of how Solr’s new soft commits actually work. In the remainder of this post I will walk you through my experimentation and have you think through it with me as if you were sitting beside me.

Here’s the setup: in Solr’s solrconfig.xml, I indicated that I wanted Solr to soft commit all pending documents every 10 seconds and to hard commit every 30 seconds.

    <updateHandler class="solr.DirectUpdateHandler2">
      <updateLog>
        <str name="dir">${solr.ulog.dir:}</str>
      </updateLog>
      <autoCommit>
        <maxTime>30000</maxTime>
      </autoCommit>
      <autoSoftCommit>
        <maxTime>10000</maxTime>
      </autoSoftCommit>
    </updateHandler>
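Since the maxTime values are in milliseconds, it is easy to misread them. As a quick sanity check, here is a minimal Python sketch that parses the fragment and reports both intervals in seconds. The function name commit_intervals is made up for illustration, and the sketch assumes the fragment is passed in on its own; in a full solrconfig.xml the updateHandler element is nested inside the config root, and values written as ${...} property substitutions would need to be resolved first.

```python
import xml.etree.ElementTree as ET

# The update handler fragment from this post, with whitespace restored.
SOLRCONFIG_FRAGMENT = """
<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>
  <autoCommit>
    <maxTime>30000</maxTime>
  </autoCommit>
  <autoSoftCommit>
    <maxTime>10000</maxTime>
  </autoSoftCommit>
</updateHandler>
"""


def commit_intervals(xml_text):
    """Return (soft, hard) auto-commit intervals in seconds, None if unset.

    Hypothetical helper: expects <updateHandler> to be the root element.
    """
    root = ET.fromstring(xml_text)

    def seconds(tag):
        node = root.find(f"{tag}/maxTime")
        # maxTime is expressed in milliseconds.
        return None if node is None else int(node.text) / 1000

    return seconds("autoSoftCommit"), seconds("autoCommit")


soft, hard = commit_intervals(SOLRCONFIG_FRAGMENT)
print(f"soft commit every {soft:g}s, hard commit every {hard:g}s")
# prints: soft commit every 10s, hard commit every 30s
```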

I used SolrJ to load about 100 documents into Solr, then jumped over quickly to a browser and periodically issued a “select all” query to Solr. Here’s about how it went:

  • Time 0:01 — no documents
  • Time 0:02 — no documents
  • Time 0:03 — no documents
  • Time 0:04 — no documents
  • Time 0:05 — no documents
  • Time 0:06 — no documents
  • Time 0:07 — no documents
  • Time 0:08 — no documents
  • Time 0:09 — no documents
  • Time 0:10 — 100 documents
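I did this by hand in a browser, but the same polling could be scripted. The following is a rough sketch, not what I actually ran: it asks Solr's select handler for q=*:* with rows=0 and reads numFound from the JSON response once a second. The helper names are made up for illustration, and the base URL in the usage note (host, port, and core name) is an assumption about your setup.

```python
import json
import time
import urllib.parse
import urllib.request


def select_all_url(base_url):
    """Build a "select all" query that returns only the hit count."""
    params = urllib.parse.urlencode({"q": "*:*", "rows": 0, "wt": "json"})
    return f"{base_url}/select?{params}"


def num_found(response_body):
    """Extract the total hit count from a Solr JSON response body."""
    return json.loads(response_body)["response"]["numFound"]


def fetch_body(url):
    """Fetch a URL and return the response body as text."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")


def poll(base_url, seconds=10, interval=1.0, fetch=fetch_body, sleep=time.sleep):
    """Query once per interval; return a list of (second, numFound) pairs."""
    url = select_all_url(base_url)
    timeline = []
    for second in range(1, seconds + 1):
        sleep(interval)
        timeline.append((second, num_found(fetch(url))))
    return timeline
```

Against a local instance this might be called as poll("http://localhost:8983/solr/collection1"), where the URL is an assumed default. The fetch and sleep parameters exist only so the loop can be exercised without a running server.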

So far so good. Soft commit appeared to be working; I just needed to verify that nothing was written to disk. I immediately stopped Solr (a control-C in the terminal running Solr) and then restarted it. And what did I see? Documents, 100 of them!

“So,” I said grumpily, “Solr is actually hard committing when I’m telling it to soft commit!” But then it occurred to me that maybe Solr was doing something smart to protect me from data loss. “What about that updateLog thing I see in the updateHandler?”

After tracking down the log files (they’re stored as a sibling of the data directory), I found that, sure enough, they keep track of all the documents that haven’t yet been hard committed to Solr. So, first lesson:

Whenever Solr wakes up, it looks for these logs and replays them back to recover from any potential data loss.

So, now it was clear what I had to do. I spun up the same experiment: I submitted about 100 docs to Solr and kept refreshing the “select all” query until, after 10 seconds, my documents appeared. I then stopped Solr (control-C) and this time, smiling smugly, I deleted the log files. I restarted Solr, resubmitted my “select all” query, and found … 100 documents.

At this point, words don’t readily express the confusion that I felt. However, this emoticon does a pretty good job:

:-/

I was becoming upset that I was wrong, and I was still concerned that maybe Solr was writing my soft commits to disk. Then it occurred to me that perhaps my method of killing Solr, control-C, was simply not violent enough.

I created one more experiment. This time I:

  • turned on Solr
  • submitted 100 or so documents
  • re-queried “all docs” repeatedly until, after 10 seconds, they appeared
  • killed the process with kill -9 <processId>
  • removed the log files
  • restarted Solr

And finally, once I had started Solr again and queried for all documents, there were none! So, my second lesson for the day:

When Solr is shut down politely, it does you the favor of hard committing all outstanding documents to disk.
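The difference between control-C and kill -9 comes down to signals: control-C sends SIGINT, which a process can catch and use as a chance to clean up, while kill -9 sends SIGKILL, which cannot be caught at all. The sketch below is not Solr code. It uses a tiny stand-in child process that writes a marker file in its SIGINT handler (playing the role of the final hard commit) and shows that the marker only appears after the polite stop. It assumes a POSIX system; the marker file and the stop_and_check helper are made up for illustration.

```python
import os
import signal
import subprocess
import sys
import tempfile

# Stand-in for a server with a shutdown hook (NOT Solr itself).
CHILD = r"""
import signal, sys, time
marker = sys.argv[1]

def on_sigint(signum, frame):
    # Polite shutdown path: flush pending work before exiting.
    with open(marker, "w") as f:
        f.write("flushed")
    sys.exit(0)

signal.signal(signal.SIGINT, on_sigint)
print("ready", flush=True)
while True:
    time.sleep(0.1)
"""


def stop_and_check(sig):
    """Start the child, stop it with `sig`, and report whether it flushed."""
    with tempfile.TemporaryDirectory() as tmp:
        marker = os.path.join(tmp, "flushed.marker")
        proc = subprocess.Popen(
            [sys.executable, "-c", CHILD, marker],
            stdout=subprocess.PIPE,
            text=True,
        )
        proc.stdout.readline()  # wait until the handler is installed
        proc.send_signal(sig)
        proc.wait(timeout=10)
        proc.stdout.close()
        return os.path.exists(marker)


if __name__ == "__main__":
    print("SIGINT  flushed:", stop_and_check(signal.SIGINT))
    print("SIGKILL flushed:", stop_and_check(signal.SIGKILL))
```

With SIGINT the handler runs and the marker is written; with SIGKILL the process is gone before it can do anything, which matches what the kill -9 experiment showed.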

And so my final assessment of the situation is that not only do the soft commits perform as expected, but Solr does several things to make sure you’re not losing any of your data.
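To tie the experiments together, here is a toy model of the behavior I observed. It is only a sketch of the two lessons above, not how Solr is implemented: the class and method names are invented, it assumes a hard commit empties the update log, and it assumes that log replay at startup ends with the replayed documents committed.

```python
class ToyCore:
    """Toy model of the observed behavior; not Solr's implementation."""

    def __init__(self):
        self.disk_index = []  # hard-committed documents (survive any stop)
        self.tlog = []        # update log on disk: docs not yet hard committed
        self.pending = []     # in RAM: added but not yet searchable
        self.visible = []     # in RAM: soft committed and searchable

    def add(self, doc):
        self.tlog.append(doc)  # logged first, so a crash can be replayed
        self.pending.append(doc)

    def soft_commit(self):
        self.visible += self.pending  # searchable, but still RAM only
        self.pending = []

    def hard_commit(self):
        self.soft_commit()
        self.disk_index += self.visible
        self.visible = []
        self.tlog = []  # simplification: the log is emptied

    def num_found(self):
        return len(self.disk_index) + len(self.visible)

    def polite_stop(self):  # control-C
        self.hard_commit()

    def kill_9(self):  # kill -9: RAM is lost, files on disk remain
        self.pending, self.visible = [], []

    def delete_tlog(self):
        self.tlog = []

    def restart(self):
        # Replay whatever the log still holds, then commit it.
        replay, self.tlog = self.tlog, []
        for doc in replay:
            self.add(doc)
        self.hard_commit()


def experiment(stop, delete_logs):
    """Load 100 docs, soft commit, stop, optionally delete logs, restart."""
    core = ToyCore()
    for i in range(100):
        core.add(f"doc-{i}")
    assert core.num_found() == 0    # nothing visible before the soft commit
    core.soft_commit()
    assert core.num_found() == 100  # visible after the soft commit
    getattr(core, stop)()
    if delete_logs:
        core.delete_tlog()
    core.restart()
    return core.num_found()
```

In this model, experiment("polite_stop", False) and experiment("polite_stop", True) both return 100, and experiment("kill_9", True) returns 0, matching my three runs. The fourth combination, experiment("kill_9", False), returns 100; I did not run that one, but it is what the first lesson predicts.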


Published at DZone with permission of John Berryman, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
