DZone

The New Spell Checker in Solr 4.0

by Rafał Kuć · Apr. 30, 12 · Interview


One of the new features that will be introduced in Solr 4.0 is a new SpellChecker implementation that doesn’t require its own index. I decided to take a quick look at it and share my thoughts.

What We Have Today

As of today (Solr 3.6), we can use the following SpellChecker implementations:

  • org.apache.solr.spelling.IndexBasedSpellChecker
  • org.apache.solr.spelling.FileBasedSpellChecker

With the upcoming Solr 4.0, we will get a new implementation:

  • org.apache.solr.spelling.DirectSolrSpellChecker


Current Problems

In most of the cases I worked with, the main problem with IndexBasedSpellChecker was the need to rebuild its index. In some cases the rebuild took a long time and it wasn’t possible to rebuild that index after every commit, which for some users was a big issue. Of course, FileBasedSpellChecker didn’t have that problem, but in my case it was only used as a support mechanism for IndexBasedSpellChecker.
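
For comparison, here is a minimal sketch of the kind of IndexBasedSpellChecker setup that leads to the rebuild problem described above. The field name and the spellcheckIndexDir value are assumptions chosen for illustration; the point is only that a separate index directory exists and that buildOnCommit triggers a rebuild of it after each commit.

```xml
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="classname">solr.IndexBasedSpellChecker</str>
    <!-- field used as the source of suggestions (assumed name) -->
    <str name="field">title</str>
    <!-- separate spell checking index kept on disk (assumed location) -->
    <str name="spellcheckIndexDir">./spellchecker</str>
    <!-- rebuild the spell checking index after every commit;
         this is the step that can become too slow on large indices -->
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>
```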

Configuration

The DirectSolrSpellChecker configuration is similar to the one you are used to in Solr 3, with a few additional parameters. Here is a sample configuration:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">textTitle</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">title</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <str name="distanceMeasure">internal</str>
    <float name="accuracy">0.7</float>
    <int name="maxEdits">2</int>
    <int name="minPrefix">1</int>
    <int name="maxInspections">5</int>
    <int name="minQueryLength">4</int>
    <float name="maxQueryFrequency">0.01</float>
    <float name="thresholdTokenFrequency">.01</float>
  </lst>
</searchComponent>

The meaning of each parameter:

  • queryAnalyzerFieldType – name of the field type whose analyzer will be used to analyze the SpellChecker query.
  • field – field whose contents will be used to build SpellChecker results.
  • classname – SpellChecker implementation class.
  • distanceMeasure – algorithm used to calculate the distance between terms; in our case we use the default one (Levenshtein).
  • accuracy – minimum accuracy a suggestion must achieve to be counted as a proper one.
  • maxEdits – maximum number of edits considered during term enumeration. This property can be set to 1 or 2.
  • minPrefix – minimum common prefix required during term enumeration.
  • maxInspections – maximum number of checks for each suggestion.
  • minQueryLength – minimum length of a query word for it to be considered for correction.
  • maxQueryFrequency – maximum percentage of documents in which a word can appear for it to still be considered for correction (a value of 0.01 means 1%).
  • thresholdTokenFrequency – minimum percentage of documents in which a suggestion has to appear in order to be considered proper (a value of .01 means 1%).


The configuration attributes above show that DirectSolrSpellChecker gives us a high degree of control over its behavior.

Usage

DirectSolrSpellChecker is no different from other SpellChecker implementations when it comes to using it. As with the previous implementations, you can configure Solr to add SpellChecker results to every query response, or configure a new handler and decide when to query it for results. We wrote about how to use SpellChecker in the past, in the “Car sale application” example.
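
As a rough sketch of the second option, a dedicated handler could be wired to the spellcheck component as shown below. The handler name /spell and the spellcheck.count value are assumptions made for illustration; spellcheck.dictionary refers to the spellchecker named default in the sample configuration.

```xml
<requestHandler name="/spell" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- turn spell checking on for requests sent to this handler -->
    <str name="spellcheck">true</str>
    <!-- use the spellchecker registered under the name "default" -->
    <str name="spellcheck.dictionary">default</str>
    <!-- number of suggestions to return per term (assumed value) -->
    <str name="spellcheck.count">5</str>
  </lst>
  <!-- run the spellcheck search component after the standard ones -->
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
```

With a handler like this, only the requests sent to /spell pay the cost of generating suggestions, while the regular search handler stays untouched.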

What Can We Expect?

According to the information in JIRA issue LUCENE-2507, DirectSolrSpellChecker will not only remove the need for a separate index, but will also improve the quality of suggestions. From what you can see in that issue, DirectSolrSpellChecker gives better results than the previous implementations, although it is slightly slower. I think that won’t be an issue as long as you don’t use SpellChecker with every query.




Published at DZone with permission of Rafał Kuć, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
