Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Solr SearchComponent: Did You Mean Re-searcher?

DZone's Guide to

Solr SearchComponent: Did You Mean Re-searcher?

· Big Data Zone ·
Free Resource

The open source HPCC Systems platform is a proven, easy to use solution for managing data at scale. Visit our Easy Guide to learn more about this completely free platform, test drive some code in the online Playground, and get started today.

Solr makes Spellcheck easy. Super-easy, in fact. All you need to do is to change some stuff in solrconfig.xml, and voila, spellcheck suggestions!

However, that's not how Google does spellchecking. What Google does is determine if the query has a mis-spelling, and if so, transparently correct the misspelled term for you and perform the search, but also giving you the option of searching for the original term via a link.

Now, whilst it'd be uber-cool to have an exact equivalent in Solr, you'd need some statistical data to be able to perform this efficiently. A naive version is to use spellcheck corrections to transparently perform a new query when the original query returned less than x hits, where x is some arbitrarily small number.

Here's a simple SearchComponent that does just that:

import org.apache.solr.common.util.NamedList;
import org.apache.solr.handler.component.QueryComponent;
import org.apache.solr.handler.component.ResponseBuilder;

import java.io.IOException;

public class AutoSpellcheckResearcher extends QueryComponent {
  // if less than *threshold* hits are returned, a re-search is triggered
  private int threshold = 0;

  @Override public void init(NamedList args) {
    super.init(args);
    this.threshold = (Integer) args.get("threshold");
  }

  @Override public void prepare(ResponseBuilder rb) throws IOException {
  }

  @Override public void process(ResponseBuilder rb) throws IOException {
    long hits = rb.getNumberDocumentsFound();
    if (hits <= threshold) {
      final NamedList responseValues = rb.rsp.getValues();
      NamedList spellcheckresults = (NamedList) responseValues.get("spellcheck");
      if (spellcheckresults != null) {
        NamedList suggestions = (NamedList) spellcheckresults.get("suggestions");
        if (suggestions != null) {
          final NamedList collation = (NamedList) suggestions.get("collation");
          if (collation != null) {
            String collationQuery = (String) collation.get("collationQuery");
            if (responseValues != null) {
              responseValues.add("researched.original", rb.getQueryString());
              responseValues.add("researched.replaced", collationQuery);
              responseValues.remove("response");
            }
            rb.setQueryString(collationQuery);
            super.prepare(rb);
            super.process(rb);
          }
        }
      }
    }
  }

  @Override public String getDescription() {
    return "AutoSpellcheckResearcher";
  }

  @Override public String getSource() {
    return "1.0";
  }
}


Managing data at scale doesn’t have to be hard. Find out how the completely free, open source HPCC Systems platform makes it easier to update, easier to program, easier to integrate data, and easier to manage clusters. Download and get started today.

Topics:

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}