Over a million developers have joined DZone.
Platinum Partner

Solr SearchComponent: Did You Mean Re-searcher?

· Big Data Zone

The Big Data Zone is presented by Exaptive.  Learn about how to rapidly iterate data applications, while reusing existing code and leveraging open source technologies.

Solr makes Spellcheck easy. Super-easy, in fact. All you need to do is to change some stuff in solrconfig.xml, and voila, spellcheck suggestions!

However, that's not how Google does spellchecking. What Google does is determine if the query has a mis-spelling, and if so, transparently correct the misspelled term for you and perform the search, but also giving you the option of searching for the original term via a link.

Now, whilst it'd be uber-cool to have an exact equivalent in Solr, you'd need some statistical data to be able to perform this efficiently. A naive version is to use spellcheck corrections to transparently perform a new query when the original query returned less than x hits, where x is some arbitrarily small number.

Here's a simple SearchComponent that does just that:

import org.apache.solr.common.util.NamedList;
import org.apache.solr.handler.component.QueryComponent;
import org.apache.solr.handler.component.ResponseBuilder;

import java.io.IOException;

public class AutoSpellcheckResearcher extends QueryComponent {
  // if less than *threshold* hits are returned, a re-search is triggered
  private int threshold = 0;

  @Override public void init(NamedList args) {
    super.init(args);
    this.threshold = (Integer) args.get("threshold");
  }

  @Override public void prepare(ResponseBuilder rb) throws IOException {
  }

  @Override public void process(ResponseBuilder rb) throws IOException {
    long hits = rb.getNumberDocumentsFound();
    if (hits <= threshold) {
      final NamedList responseValues = rb.rsp.getValues();
      NamedList spellcheckresults = (NamedList) responseValues.get("spellcheck");
      if (spellcheckresults != null) {
        NamedList suggestions = (NamedList) spellcheckresults.get("suggestions");
        if (suggestions != null) {
          final NamedList collation = (NamedList) suggestions.get("collation");
          if (collation != null) {
            String collationQuery = (String) collation.get("collationQuery");
            if (responseValues != null) {
              responseValues.add("researched.original", rb.getQueryString());
              responseValues.add("researched.replaced", collationQuery);
              responseValues.remove("response");
            }
            rb.setQueryString(collationQuery);
            super.prepare(rb);
            super.process(rb);
          }
        }
      }
    }
  }

  @Override public String getDescription() {
    return "AutoSpellcheckResearcher";
  }

  @Override public String getSource() {
    return "1.0";
  }
}


The Big Data Zone is presented by Exaptive.  Learn how rapid data application development can address the data science shortage.

Topics:

Published at DZone with permission of Kelvin Tan .

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}