Autocomplete Using Elasticsearch

We take a look at how to implement autocomplete using Elasticsearch and nGrams in this post. Read on for more information.

In a movie database such as IMDb, when a user types ‘g’, the search bar suggests titles the user might be looking for, such as Gone Girl or other movies with ‘g’ in the title. This is autocomplete, or word completion, and it has become an essential part of many applications. Autocomplete speeds up human-computer interaction by predicting the intended word from very few characters.

In this post, I’ll discuss result-suggest autocomplete using Elasticsearch, in which the predictions are based on the data that already exists in the data store.

(There is another type of autocomplete, search-suggest autocomplete, which works from previously searched phrases, but it is not covered in this post.)

Analyzers

Whenever we insert data into Elasticsearch, it analyzes the data so that an appropriate inverted index can be created.
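
To make the idea of an inverted index concrete, here is a minimal Python sketch. It is not how Elasticsearch stores data internally; it only maps each lowercased, whitespace-separated term to the IDs of the documents that contain it. The sample titles and the `build_inverted_index` helper are assumptions made for illustration.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Toy analysis step: lowercase, then split on whitespace.
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


# Hypothetical sample documents (ID -> title).
docs = {1: "Gone Girl", 2: "The Girl on the Train"}
index = build_inverted_index(docs)

print(sorted(index["girl"]))  # [1, 2]
print(sorted(index["gone"]))  # [1]
```

Looking up a term is then a single dictionary access, which is why the way text is analyzed at index time decides what a query can find.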

An analyzer consists of a tokenizer and zero or more token filters, which together transform the data so that the business needs are met.
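
The tokenizer-plus-filters structure can be sketched in a few lines of Python. This is an illustrative model only, not Elasticsearch code: `whitespace_tokenizer`, `lowercase_filter`, and `analyze` are hypothetical helpers that mimic a whitespace tokenizer followed by a lowercase token filter.

```python
from typing import Callable, List

Tokenizer = Callable[[str], List[str]]
TokenFilter = Callable[[List[str]], List[str]]


def whitespace_tokenizer(text: str) -> List[str]:
    """Split the text into tokens on runs of whitespace."""
    return text.split()


def lowercase_filter(tokens: List[str]) -> List[str]:
    """Lowercase every token."""
    return [token.lower() for token in tokens]


def analyze(text: str, tokenizer: Tokenizer, filters: List[TokenFilter]) -> List[str]:
    """Run the tokenizer once, then apply each token filter in order."""
    tokens = tokenizer(text)
    for token_filter in filters:
        tokens = token_filter(tokens)
    return tokens


print(analyze("Gone Girl", whitespace_tokenizer, [lowercase_filter]))
# ['gone', 'girl']
```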

NGrams Analyzer

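An n-gram is a contiguous sequence of n items from a given sequence of text. Here the items are characters, so each token of the indexed text is broken into its contiguous substrings (not permutations). For example, with gram lengths 1 to 2, the token “go” yields “g”, “o”, and “go”.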
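
As a rough illustration, the following Python sketch generates the character n-grams of a single token. The function name `char_ngrams` is an assumption for this example, and the order of the output is not meant to match the order in which Elasticsearch emits tokens.

```python
def char_ngrams(token, min_gram, max_gram):
    """Return every contiguous substring of token with length min_gram..max_gram."""
    grams = []
    for start in range(len(token)):
        for size in range(min_gram, max_gram + 1):
            end = start + size
            if end > len(token):
                break  # longer grams would run past the end of the token
            grams.append(token[start:end])
    return grams


print(char_ngrams("gone", 1, 2))
# ['g', 'go', 'o', 'on', 'n', 'ne', 'e']
```

Note that a wide range of gram lengths multiplies the number of terms stored per token, so the index grows accordingly.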

Mapping And Settings

{
  "settings": {
    "analysis": {
      "filter": {
        "gramFilter": {
          "type": "nGram",
          "min_gram": 1,
          "max_gram": 20,
          "token_chars": [
            "letter",
            "digit"
          ]
        }
      },
      "analyzer": {
        "gramAnalyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [
            "lowercase",
            "gramFilter"
          ]
        },
        "whitespaceAnalyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [
            "lowercase"
          ]
        }
      }
    }
  },
  "mappings": {
    "movies": {
      "properties": {
        "Title": {
          "type": "string",
          "analyzer": "gramAnalyzer",
          "search_analyzer": "whitespaceAnalyzer"
        },
        .
        .
        .
      }
    }
  }
}

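Notice that we have defined a gramFilter of type nGram. min_gram and max_gram are the minimum and maximum number of characters that you want in the generated tokens. token_chars lists the character classes that a gram may contain; note that the Elasticsearch reference documents this setting for the nGram tokenizer rather than the nGram token filter, so it may have no effect in this filter definition.

The mapping also uses two analyzers: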

  • gramAnalyzer
  • whitespaceAnalyzer

The natural question is: why do we need two analyzers?

It’s because we want to analyze the stored data and the search query differently.

  • The search text is lowercased and split on whitespace (whitespaceAnalyzer).
  • The stored data is lowercased and then passed through gramFilter (gramAnalyzer).
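
The effect of analyzing the two sides differently can be simulated with a small Python sketch. It is a simplified model, assuming that a match query succeeds when at least one search term equals an indexed term; the helper names and the sample title "Big" are made up for illustration.

```python
def ngrams(token, min_gram=1, max_gram=20):
    """All contiguous substrings of token with length min_gram..max_gram."""
    return {
        token[i:i + n]
        for i in range(len(token))
        for n in range(min_gram, max_gram + 1)
        if i + n <= len(token)
    }


def whitespace_analyzer(text):
    """Model of whitespaceAnalyzer: lowercase, split on whitespace."""
    return set(text.lower().split())


def gram_analyzer(text):
    """Model of gramAnalyzer: lowercase, split on whitespace, then n-gram each token."""
    terms = set()
    for token in text.lower().split():
        terms |= ngrams(token)
    return terms


def matches(title, query, search_analyzer):
    """True if any analyzed search term equals a term indexed for the title."""
    return bool(gram_analyzer(title) & search_analyzer(query))


print(matches("Gone Girl", "go", whitespace_analyzer))  # True
print(matches("Big", "go", whitespace_analyzer))        # False
print(matches("Big", "go", gram_analyzer))              # True
```

The last line shows what would go wrong with a single analyzer: if the query were also split into grams, “go” would produce the term “g”, and every title containing a “g” would match.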

Once our analyzers are ready, we need to apply them to the field that we want to make suggestions for (in our example, the Title field).

Searching

We can execute a match query on the “Title” field to use the autocomplete functionality.

The query looks like this:

{
  "query": {
    "match": {
      "Title": "go"
    }
  }
}

This query returns every movie in the Elasticsearch index whose Title contains ‘go’, regardless of case, because both the stored grams and the search text are lowercased.
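
For readers who want to send this query from code, here is a hedged Python sketch that uses only the standard library. `build_match_query` and `search` are hypothetical helpers, and the base URL and index name passed to `search` are assumptions you would replace with your own; the `search` function is defined but not called, because it needs a running cluster.

```python
import json
import urllib.request


def build_match_query(field, text):
    """Build the body of a match query on a single field."""
    return {"query": {"match": {field: text}}}


def search(base_url, index, body):
    """POST the body to the index's _search endpoint and return the parsed JSON.

    base_url and index are placeholders, for example a local node and an
    index of your choosing.
    """
    request = urllib.request.Request(
        f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


body = build_match_query("Title", "go")
print(json.dumps(body, indent=2))
```

In an autocomplete UI, the text typed so far would be passed as the second argument on each keystroke, ideally with some debouncing on the client side.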

An activator template implementing this feature can be found here.

References:
 1. https://qbox.io/blog/multi-field-partial-word-autocomplete-in-elasticsearch-using-ngrams
 2. https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-ngram-tokenizer.html

Published at DZone with permission of Kunal Kapoor, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
