Solr SearchComponent: Did You Mean Re-searcher?

Solr makes spellchecking easy. Super-easy, in fact. All you need to do is configure a spellcheck component in solrconfig.xml, and voilà, spellcheck suggestions!

However, that's not how Google does spellchecking. What Google does is determine whether the query contains a misspelling and, if so, transparently correct the misspelled term and perform the search, while still giving you the option of searching for the original term via a link.

Now, whilst it'd be uber-cool to have an exact equivalent in Solr, you'd need some statistical data to be able to perform this efficiently. A naive version is to use the spellcheck corrections to transparently perform a new query when the original query returns x hits or fewer, where x is some arbitrarily small number.
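Stripped of any Solr types, the naive rule boils down to a small decision function. The sketch below is only an illustration: the class name ResearchDecision and the method queryToRerun are hypothetical, and it assumes the spellchecker has already produced a single corrected query string (a collation), or null if it had nothing to suggest.

```java
import java.util.Optional;

// Hypothetical helper, not part of Solr: isolates the "should we re-search?" rule.
final class ResearchDecision {
    private ResearchDecision() {}

    /**
     * Returns the corrected query to re-run, or an empty Optional when the
     * original results should be kept.
     *
     * @param hits      number of hits for the original query
     * @param threshold re-search only when hits is at or below this value
     * @param collation corrected query from the spellchecker, or null if none
     */
    public static Optional<String> queryToRerun(long hits, long threshold, String collation) {
        if (hits > threshold) {
            return Optional.empty(); // enough results, no correction needed
        }
        if (collation == null || collation.trim().isEmpty()) {
            return Optional.empty(); // nothing to correct the query to
        }
        return Optional.of(collation);
    }

    public static void main(String[] args) {
        System.out.println(queryToRerun(0, 0, "solr search"));  // Optional[solr search]
        System.out.println(queryToRerun(12, 0, "solr search")); // Optional.empty
    }
}
```

With a threshold of 0, a re-search happens only when the original query finds nothing at all, which is the most conservative setting.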

Here's a simple SearchComponent that does just that:

import org.apache.solr.common.util.NamedList;
import org.apache.solr.handler.component.QueryComponent;
import org.apache.solr.handler.component.ResponseBuilder;

import java.io.IOException;

public class AutoSpellcheckResearcher extends QueryComponent {
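  // if no more than *threshold* hits are returned, a re-search is triggered
  private int threshold = 0;

  @Override public void init(NamedList args) {
    super.init(args);
    // "threshold" is optional in solrconfig.xml; keep the default of 0 if it is absent
    Object configured = (args == null) ? null : args.get("threshold");
    if (configured instanceof Number) {
      this.threshold = ((Number) configured).intValue();
    }
  }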

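  @Override public void prepare(ResponseBuilder rb) throws IOException {
    // intentionally empty: QueryComponent.prepare() is only called from process(),
    // and only when a re-search is actually needed
  }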

  @Override public void process(ResponseBuilder rb) throws IOException {
    long hits = rb.getNumberDocumentsFound();
    if (hits > threshold) {
      return; // enough hits, keep the original results
    }
    final NamedList responseValues = rb.rsp.getValues();
    // written by the spellcheck component, which must run before this one
    NamedList spellcheckresults = (NamedList) responseValues.get("spellcheck");
    if (spellcheckresults == null) return;
    NamedList suggestions = (NamedList) spellcheckresults.get("suggestions");
    if (suggestions == null) return;
    // a NamedList when extended collation results are enabled, otherwise a plain String
    Object collation = suggestions.get("collation");
    String collationQuery = null;
    if (collation instanceof NamedList) {
      collationQuery = (String) ((NamedList) collation).get("collationQuery");
    } else if (collation instanceof String) {
      collationQuery = (String) collation;
    }
    if (collationQuery == null) return;
    responseValues.add("researched.original", rb.getQueryString());
    responseValues.add("researched.replaced", collationQuery);
    // drop the original results, then re-run the search with the corrected query
    responseValues.remove("response");
    rb.setQueryString(collationQuery);
    super.prepare(rb);
    super.process(rb);
  }

  @Override public String getDescription() {
    return "AutoSpellcheckResearcher";
  }

  @Override public String getSource() {
    return "1.0";
  }
}
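
To try the component out, it has to be registered in solrconfig.xml and placed after the spellcheck component, since it reads the spellcheck section of the response. The fragment below is a rough sketch under several assumptions: the package name com.example, the component name autoSpellcheckResearcher and the /select handler are placeholders, a search component named spellcheck is assumed to be configured already, and the exact spellcheck parameters available depend on your Solr version.

```xml
<!-- Hypothetical registration; adjust the class name to your own package -->
<searchComponent name="autoSpellcheckResearcher" class="com.example.AutoSpellcheckResearcher">
  <!-- read in init(); re-search when the original query returns this many hits or fewer -->
  <int name="threshold">0</int>
</searchComponent>

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <!-- the re-searcher relies on a collation being present in the response -->
    <str name="spellcheck.collate">true</str>
  </lst>
  <arr name="last-components">
    <!-- order matters: spellcheck must run before the re-searcher -->
    <str>spellcheck</str>
    <str>autoSpellcheckResearcher</str>
  </arr>
</requestHandler>
```

When a re-search happens, the response carries researched.original and researched.replaced alongside the new results, so a front end can show a "showing results for ... / search instead for ..." link.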


Published at DZone with permission of Kelvin Tan. See the original article here.

Opinions expressed by DZone contributors are their own.
