
Hibernate Search based Autocomplete Suggester

By Nishant Chandra · Oct. 07, 13

In this article, I will show how to implement autocomplete using Hibernate Search.

The same can be achieved with Solr or Elasticsearch, but I chose Hibernate Search because it is the simplest to get started with, integrates easily with an existing application, and is built on the same core, Lucene. And we get all of this without the overhead of managing a Solr/Elasticsearch cluster. In short, I find Hibernate Search to be the go-to search engine for simple use cases.

For our use case, we will build autocomplete over product titles, since user queries are often searches for a product title. While typing, users should immediately see titles matching their input, and Hibernate Search does the hard work of filtering the relevant documents in near real time.

Let's start with the following JPA-annotated Product entity class.

public class Product {

    @Id
    @Column(name = "sku")
    private String sku;

    @Column(name = "upc")
    private String upc;

    @Column(name = "title")
    private String title;

    ....
}


We are interested in returning suggestions based on the 'title' field. The title will be indexed using two strategies: Edge N-Gram and N-Gram.

Edge N-Gram matches only from the left edge of the suggestion text. For this we use KeywordTokenizerFactory (which emits the entire input as a single token) and EdgeNGramFilterFactory, along with some regex cleansing.

N-Gram matches from the start of every word, so you get right-truncated suggestions for any word in the text, not only the first one. The main difference from Edge N-Gram is the tokenizer: StandardTokenizerFactory, combined with NGramFilterFactory.

Using these strategies, if the document field is "A brown fox", then the query:
a) "A bro" - will match
b) "bro" - will match
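To make the two strategies concrete, here is a small plain-Java sketch of roughly the token sets the two filters produce. GramDemo and its method names are illustrative, and this is a simplification of what Lucene actually emits (it ignores stop words and token positions):

```java
import java.util.ArrayList;
import java.util.List;

public class GramDemo {

    // Edge N-Grams: prefixes of the whole (keyword-tokenized) input,
    // roughly what KeywordTokenizerFactory + EdgeNGramFilterFactory produce.
    static List<String> edgeNGrams(String text, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (int len = minGram; len <= Math.min(maxGram, text.length()); len++) {
            grams.add(text.substring(0, len));
        }
        return grams;
    }

    // N-Grams: all substrings of a single token within the size bounds,
    // roughly what NGramFilterFactory produces for each token emitted
    // by StandardTokenizerFactory.
    static List<String> nGrams(String token, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (int len = minGram; len <= Math.min(maxGram, token.length()); len++) {
            for (int start = 0; start + len <= token.length(); start++) {
                grams.add(token.substring(start, start + len));
            }
        }
        return grams;
    }
}
```

For the token "brown", edgeNGrams("brown", 3, 50) yields [bro, brow, brown], while nGrams("brown", 3, 5) also contains inner grams such as "row" and "own", which is why the N-Gram field can match fragments beyond the prefix.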

Implementation: in the entity defined above, we map the 'title' property using the strategies described. Below are the annotations that instruct Hibernate Search how to analyze and index 'title'.

@Entity
@Table(name = "item_master")
@Indexed(index = "Products")
@AnalyzerDefs({

    @AnalyzerDef(name = "autocompleteEdgeAnalyzer",
        // Emit the entire input as a single token
        tokenizer = @TokenizerDef(factory = KeywordTokenizerFactory.class),
        filters = {
            // Replace non-alphanumeric characters with spaces
            @TokenFilterDef(factory = PatternReplaceFilterFactory.class, params = {
                @Parameter(name = "pattern", value = "([^a-zA-Z0-9\\.])"),
                @Parameter(name = "replacement", value = " "),
                @Parameter(name = "replace", value = "all") }),
            // Normalize token text to lowercase, as the user is unlikely to
            // care about casing when searching for matches
            @TokenFilterDef(factory = LowerCaseFilterFactory.class),
            @TokenFilterDef(factory = StopFilterFactory.class),
            // Index partial words starting at the front, so we can provide
            // autocomplete functionality
            @TokenFilterDef(factory = EdgeNGramFilterFactory.class, params = {
                @Parameter(name = "minGramSize", value = "3"),
                @Parameter(name = "maxGramSize", value = "50") }) }),

    @AnalyzerDef(name = "autocompleteNGramAnalyzer",
        // Split input into tokens on whitespace and punctuation
        tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
        filters = {
            @TokenFilterDef(factory = WordDelimiterFilterFactory.class),
            // Normalize token text to lowercase, as the user is unlikely to
            // care about casing when searching for matches
            @TokenFilterDef(factory = LowerCaseFilterFactory.class),
            // Index n-grams of every token
            @TokenFilterDef(factory = NGramFilterFactory.class, params = {
                @Parameter(name = "minGramSize", value = "3"),
                @Parameter(name = "maxGramSize", value = "5") }),
            @TokenFilterDef(factory = PatternReplaceFilterFactory.class, params = {
                @Parameter(name = "pattern", value = "([^a-zA-Z0-9\\.])"),
                @Parameter(name = "replacement", value = " "),
                @Parameter(name = "replace", value = "all") })
        }),

    @AnalyzerDef(name = "standardAnalyzer",
        // Split input into tokens on whitespace and punctuation
        tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
        filters = {
            @TokenFilterDef(factory = WordDelimiterFilterFactory.class),
            // Normalize token text to lowercase, as the user is unlikely to
            // care about casing when searching for matches
            @TokenFilterDef(factory = LowerCaseFilterFactory.class),
            @TokenFilterDef(factory = PatternReplaceFilterFactory.class, params = {
                @Parameter(name = "pattern", value = "([^a-zA-Z0-9\\.])"),
                @Parameter(name = "replacement", value = " "),
                @Parameter(name = "replace", value = "all") })
        })
})
public class Product {

    ....
}

Explanation: two custom analyzers, autocompleteEdgeAnalyzer and autocompleteNGramAnalyzer, have been defined per the strategies in the previous section, plus a standardAnalyzer for ordinary matching. Next, we apply these analyzers to the 'title' field to create separate indexed fields. Here is how we do it:

@Column(name = "title")
@Fields({
    @Field(name = "title", index = Index.YES, store = Store.YES,
        analyze = Analyze.YES, analyzer = @Analyzer(definition = "standardAnalyzer")),
    @Field(name = "edgeNGramTitle", index = Index.YES, store = Store.NO,
        analyze = Analyze.YES, analyzer = @Analyzer(definition = "autocompleteEdgeAnalyzer")),
    @Field(name = "nGramTitle", index = Index.YES, store = Store.NO,
        analyze = Analyze.YES, analyzer = @Analyzer(definition = "autocompleteNGramAnalyzer"))
})
private String title;

Start indexing:

public void index() throws InterruptedException {
    getFullTextSession().createIndexer().startAndWait();
}
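For the mass indexer to have somewhere to write, Hibernate Search needs a Lucene directory configured. A minimal sketch of the relevant Hibernate properties (the indexBase path below is an assumed example, not from this project):

```xml
<!-- hibernate.cfg.xml: store the Lucene index on the filesystem -->
<property name="hibernate.search.default.directory_provider">filesystem</property>
<property name="hibernate.search.default.indexBase">/var/lucene/indexes</property>
```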


Once indexing completes, inspect the index using Luke and you should be able to see the title analyzed and stored as N-Grams and Edge N-Grams.

Search Query:

private static final String TITLE_EDGE_NGRAM_INDEX = "edgeNGramTitle";
private static final String TITLE_NGRAM_INDEX = "nGramTitle";

@Transactional(readOnly = true)
public synchronized List<Product> getSuggestions(final String searchTerm) {

    QueryBuilder titleQB = getFullTextSession().getSearchFactory()
            .buildQueryBuilder().forEntity(Product.class).get();

    Query query = titleQB.phrase().withSlop(2).onField(TITLE_NGRAM_INDEX)
            .andField(TITLE_EDGE_NGRAM_INDEX).boostedTo(5)
            .sentence(searchTerm.toLowerCase()).createQuery();

    FullTextQuery fullTextQuery = getFullTextSession().createFullTextQuery(
            query, Product.class);
    fullTextQuery.setMaxResults(20);

    @SuppressWarnings("unchecked")
    List<Product> results = fullTextQuery.list();
    return results;
}

And we have a working suggester.
What next? Expose the functionality via a REST API and integrate it with jQuery; examples of both are easy to find.
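If you hand-roll that REST endpoint, the suggestion titles need to be serialized for the jQuery client. A minimal sketch (SuggestionJson is an illustrative name; in practice a library such as Jackson would do this for you):

```java
import java.util.List;

public class SuggestionJson {

    // Serialize suggestion titles as a JSON array of strings,
    // escaping backslashes and double quotes.
    static String toJsonArray(List<String> titles) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < titles.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append('"')
              .append(titles.get(i).replace("\\", "\\\\").replace("\"", "\\\""))
              .append('"');
        }
        return sb.append(']').toString();
    }
}
```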

You can also apply the same strategy with Solr and Elasticsearch.
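For reference, the autocompleteEdgeAnalyzer above translates almost one-to-one into a Solr field type. A sketch (the field type name is mine, and this schema fragment is untested):

```xml
<fieldType name="title_edge_ngram" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="([^a-zA-Z0-9\.])" replacement=" " replace="all"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="50"/>
  </analyzer>
</fieldType>
```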


