Lucene and Solr's CheckIndex to the Rescue!

By Rafał Kuć · Sep. 22, 11

When using Lucene and Solr, we are used to very high reliability. However, there may come a day when Solr informs us that our index is corrupted and we need to do something about it. Is restoring the index from a backup or doing a full reindex the only way to repair it? No. There is hope in the form of the CheckIndex tool.

What is CheckIndex?

CheckIndex is a tool available in the Lucene library that checks the index files and can write a new segments file that no longer references the broken segments. This means the tool can repair a broken index at the cost of losing the documents stored in those segments, and thus save us from having to restore the index from a backup (if we have one, of course) or fully reindex all the documents that were stored in Solr.

Where do I start?

Please note that, according to the Javadocs, this tool is experimental and may change in the future. Therefore, before starting to work with it, we should create a copy of the index. It is also worth knowing that the tool analyzes the index byte by byte, so for large indexes the analysis and repair may take a long time. It is important not to run the tool with the -fix option while the index is in use by Solr or any other application based on the Lucene library. Finally, be aware that running the tool in repair mode may remove some or all of the documents stored in the index.
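
The advice to copy the index first can be sketched in a few lines of Java. This is only an illustration under stated assumptions: IndexBackup and copyIndex are hypothetical names (not part of Lucene or Solr), the index directory is assumed to be flat, and nothing should be writing to the index while the copy runs.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of Lucene or Solr.
class IndexBackup {

    // Copies every regular file from indexDir into backupDir and
    // returns the number of files copied. Subdirectories are skipped,
    // on the assumption that the index directory is flat.
    static int copyIndex(Path indexDir, Path backupDir) throws IOException {
        Files.createDirectories(backupDir);
        int copied = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir)) {
            for (Path file : files) {
                if (Files.isRegularFile(file)) {
                    Files.copy(file, backupDir.resolve(file.getFileName()),
                            StandardCopyOption.REPLACE_EXISTING);
                    copied++;
                }
            }
        }
        return copied;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("Usage: java IndexBackup <index_path> <backup_path>");
            return;
        }
        int copied = copyIndex(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("Copied " + copied + " files");
    }
}
```

Stop Solr (or whatever application writes to the index) before copying, so the backup is a consistent snapshot.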

How to run it

To run the utility, go to the directory where the Lucene library files are located and run the following command:
java -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex index_path -fix

In my case, it looked as follows:

java -cp lucene-core-2.9.3.jar -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex e:\solr\solr\data\index\ -fix
After a while I got the following output:
Opening index @ e:\solr\solr\data\index\

Segments file=segments_2 numSegments=1 version=FORMAT_DIAGNOSTICS [Lucene 2.9]
1 of 1: name=_0 docCount=19
compound=false
hasProx=true
numFiles=11
size (MB)=0,018
diagnostics = {os.version=6.1, os=Windows 7, lucene.version=2.9.3 951790 - 2010-06-06 01:30:55, source=flush, os.arch=x86, java.version=1.6.0_23, java.vendor=Sun Microsystems Inc.}
no deletions
test: open reader.........OK
test: fields..............OK [15 fields]
test: field norms.........OK [15 fields]
test: terms, freq, prox...OK [900 terms; 1517 terms/docs pairs; 1707 tokens]
test: stored fields.......OK [232 total field count; avg 12,211 fields per doc]
test: term vectors........OK [3 total vector count; avg 0,158 term/freq vector fields per doc]

No problems were detected with this index.

This means that the index is correct and there was no need for any corrective action. Additionally, you can learn some interesting things about the index ;)
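
To run the same command from a script or a scheduled job, it can be assembled and launched from Java. The following is a minimal sketch, assuming a `java` executable on the PATH; CheckIndexLauncher and buildCommand are hypothetical names introduced here for illustration. Leaving out -fix makes the run report-only, which is a safer default.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical wrapper around the command line shown in the article.
class CheckIndexLauncher {

    // Builds the argument list for a CheckIndex run.
    // -fix is appended only when the caller explicitly asks for a repair.
    static List<String> buildCommand(String luceneJar, String indexPath, boolean fix) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-cp");
        cmd.add(luceneJar);
        cmd.add("-ea:org.apache.lucene...");            // enable assertions in Lucene packages
        cmd.add("org.apache.lucene.index.CheckIndex");  // class names are case-sensitive
        cmd.add(indexPath);
        if (fix) {
            cmd.add("-fix");
        }
        return cmd;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length < 2) {
            System.err.println("Usage: java CheckIndexLauncher <lucene_jar> <index_path> [-fix]");
            return;
        }
        boolean fix = args.length > 2 && "-fix".equals(args[2]);
        Process process = new ProcessBuilder(buildCommand(args[0], args[1], fix))
                .inheritIO()   // show the CheckIndex output on our own console
                .start();
        System.out.println("CheckIndex exited with code " + process.waitFor());
    }
}
```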

Broken index

But what happens when the index is broken? There is only one way to find out: let’s try. I corrupted one of the index files and ran the CheckIndex tool. The following appeared on the console:


Opening index @ e:\solr\solr\data\index\

Segments file=segments_2 numSegments=1 version=FORMAT_DIAGNOSTICS [Lucene 2.9]
1 of 1: name=_0 docCount=19
compound=false
hasProx=true
numFiles=11
size (MB)=0,018
diagnostics = {os.version=6.1, os=Windows 7, lucene.version=2.9.3 951790 - 2010-06-06 01:30:55, source=flush, os.arch=x86, java.version=1.6.0_23, java.vendor=Sun Microsystems Inc.}
no deletions
test: open reader.........FAILED
WARNING: fixIndex() would remove reference to this segment; full exception:
org.apache.lucene.index.CorruptIndexException: did not read all bytes from file "_0.fnm": read 150 vs size 152
at org.apache.lucene.index.FieldInfos.read(FieldInfos.java:370)
at org.apache.lucene.index.FieldInfos.<init>(FieldInfos.java:71)
at org.apache.lucene.index.SegmentReader$CoreReaders.<init>(SegmentReader.java:119)
at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:652)
at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:605)
at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:491)
at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:903)

WARNING: 1 broken segments (containing 19 documents) detected
WARNING: 19 documents will be lost

NOTE: will write new segments file in 5 seconds; this will remove 19 docs from the index. THIS IS YOUR LAST CHANCE TO CTRL+C!
5...
4...
3...
2...
1...
Writing...
OK
Wrote new segments file "segments_3"

As you can see, all 19 documents that were in the index have been removed. This is an extreme case, but you should be aware that the tool can behave this way.
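
Because a repair can silently drop documents, it may help to inspect the output of a report-only run before deciding whether to use -fix. Here is a rough sketch, assuming the message wording seen in the Lucene 2.9.3 output in this article (other versions may phrase it differently); CheckIndexReport and its methods are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for the console output shown in this article.
class CheckIndexReport {

    // Matching is case-insensitive so it does not depend on exact capitalization.
    private static final Pattern CLEAN = Pattern.compile(
            "no problems were detected with this index", Pattern.CASE_INSENSITIVE);
    private static final Pattern LOST = Pattern.compile(
            "warning: (\\d+) documents will be lost", Pattern.CASE_INSENSITIVE);

    // True when the output contains the "no problems" summary line.
    static boolean isClean(String output) {
        return CLEAN.matcher(output).find();
    }

    // Number of documents a repair would remove,
    // or -1 when the warning line is absent from the output.
    static int documentsLost(String output) {
        Matcher matcher = LOST.matcher(output);
        return matcher.find() ? Integer.parseInt(matcher.group(1)) : -1;
    }

    public static void main(String[] args) {
        String sample = "WARNING: 1 broken segments (containing 19 documents) detected\n"
                + "WARNING: 19 documents will be lost\n";
        System.out.println("clean: " + isClean(sample));
        System.out.println("documents lost: " + documentsLost(sample));
    }
}
```

A wrapper script could refuse to run the repair when the number of lost documents exceeds a threshold and fall back to restoring the backup instead.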

The end

If you keep in mind the basic assumptions behind the CheckIndex tool, you may one day find yourself in a situation where it comes in handy, and you will not have to ask yourself, “When was the last backup made?”


Published at DZone with permission of Rafał Kuć, DZone MVB. See the original article here.
