Hibernate Search based Autocomplete Suggester

By Nishant Chandra · Oct. 07, 13 · Interview

In this article, I will show how to implement auto-completion using Hibernate Search.

The same can be achieved with Solr or ElasticSearch, but I decided to use Hibernate Search because it is the simplest to get started with, integrates easily with an existing application, and builds on the same core: Lucene. We get all of this without the overhead of managing a Solr or ElasticSearch cluster. Overall, I have found Hibernate Search to be a good fit for simple search use cases.

For our use case, we build auto-completion on product titles, since user queries are most often searches for a product title. While typing, users should immediately see titles matching their input, and Hibernate Search should do the hard work of filtering the relevant documents in near real time.

Let's start with the following JPA-annotated Product entity class.

public class Product {

  @Id
  @Column(name = "sku")
  private String sku;

  @Column(name = "upc")
  private String upc;

  @Column(name = "title")
  private String title;

  ....
}


We are interested in returning suggestions based on the 'title' field. The title will be indexed using two strategies: N-Gram and Edge N-Gram.

Edge N-Gram - This matches only from the left edge of the suggestion text. For this we use KeywordTokenizerFactory (which emits the entire input as a single token) and EdgeNGramFilterFactory, along with some regex cleansing.

N-Gram - This matches inside every word of the text, so that you can get suggestions for any word, not only the first one. The main difference from Edge N-Gram is the tokenizer, StandardTokenizerFactory, which splits the text into words; it is combined with NGramFilterFactory, which emits grams from every position in each word.

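Using these strategies, if the document field is "A brown fox" and the query is
a) "A bro" - it will match, because it is a left-edge prefix of the title (Edge N-Gram)
b) "bro" - it will match, because it is a gram inside the word "brown" (N-Gram)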
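To make the two cases concrete, here is a small plain-Java sketch of how the grams could be generated. It is only an illustration and not Lucene's implementation: the class and method names are hypothetical, and a whitespace split stands in for the standard tokenizer.

```java
import java.util.ArrayList;
import java.util.List;

public class NGramSketch {

    // Prefixes of the whole input, from minGram to maxGram characters long.
    public static List<String> edgeNGrams(String input, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        int longest = Math.min(maxGram, input.length());
        for (int len = minGram; len <= longest; len++) {
            grams.add(input.substring(0, len));
        }
        return grams;
    }

    // Every substring of the token that is minGram to maxGram characters long.
    public static List<String> nGrams(String token, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (int start = 0; start < token.length(); start++) {
            for (int len = minGram; len <= maxGram && start + len <= token.length(); len++) {
                grams.add(token.substring(start, start + len));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        String title = "A brown fox".toLowerCase();

        // Edge strategy: the whole title is treated as one token.
        List<String> edge = edgeNGrams(title, 3, 50);
        System.out.println(edge.contains("a bro")); // true
        System.out.println(edge.contains("bro"));   // false

        // N-gram strategy: split into words first (assumption: whitespace
        // split as a stand-in for the standard tokenizer), then gram each word.
        List<String> inner = new ArrayList<>();
        for (String word : title.split("\\s+")) {
            inner.addAll(nGrams(word, 3, 5));
        }
        System.out.println(inner.contains("bro"));  // true
    }
}
```

The edge grams only ever start at the first character of the title, which is why "bro" is found through the N-gram list alone.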

Implementation: In the entity defined above, we can map the 'title' property twice, once per strategy. The annotations below define the analyzers that Hibernate Search will use to index 'title'.

@Entity
@Table(name = "item_master")
@Indexed(index = "Products")
@AnalyzerDefs({

  @AnalyzerDef(name = "autocompleteEdgeAnalyzer",

    // Keep the entire title as a single token
    tokenizer = @TokenizerDef(factory = KeywordTokenizerFactory.class),

    filters = {
      // Replace every character that is not a letter, digit or dot with a space
      @TokenFilterDef(factory = PatternReplaceFilterFactory.class, params = {
        @Parameter(name = "pattern", value = "([^a-zA-Z0-9\\.])"),
        @Parameter(name = "replacement", value = " "),
        @Parameter(name = "replace", value = "all") }),
      // Normalize token text to lowercase, as the user is unlikely to
      // care about casing when searching for matches
      @TokenFilterDef(factory = LowerCaseFilterFactory.class),
      @TokenFilterDef(factory = StopFilterFactory.class),
      // Index prefixes of the title, so we can provide
      // autocomplete functionality
      @TokenFilterDef(factory = EdgeNGramFilterFactory.class, params = {
        @Parameter(name = "minGramSize", value = "3"),
        @Parameter(name = "maxGramSize", value = "50") }) }),

  @AnalyzerDef(name = "autocompleteNGramAnalyzer",

    // Split input into word tokens
    tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),

    filters = {
      // Split tokens into subwords on intra-word delimiters
      @TokenFilterDef(factory = WordDelimiterFilterFactory.class),
      // Normalize token text to lowercase, as the user is unlikely to
      // care about casing when searching for matches
      @TokenFilterDef(factory = LowerCaseFilterFactory.class),
      // Index grams of 3 to 5 characters taken from every position in a token
      @TokenFilterDef(factory = NGramFilterFactory.class, params = {
        @Parameter(name = "minGramSize", value = "3"),
        @Parameter(name = "maxGramSize", value = "5") }),
      // Replace every character that is not a letter, digit or dot with a space
      @TokenFilterDef(factory = PatternReplaceFilterFactory.class, params = {
        @Parameter(name = "pattern", value = "([^a-zA-Z0-9\\.])"),
        @Parameter(name = "replacement", value = " "),
        @Parameter(name = "replace", value = "all") })
    }),

  @AnalyzerDef(name = "standardAnalyzer",

    // Split input into word tokens
    tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),

    filters = {
      // Split tokens into subwords on intra-word delimiters
      @TokenFilterDef(factory = WordDelimiterFilterFactory.class),
      // Normalize token text to lowercase, as the user is unlikely to
      // care about casing when searching for matches
      @TokenFilterDef(factory = LowerCaseFilterFactory.class),
      // Replace every character that is not a letter, digit or dot with a space
      @TokenFilterDef(factory = PatternReplaceFilterFactory.class, params = {
        @Parameter(name = "pattern", value = "([^a-zA-Z0-9\\.])"),
        @Parameter(name = "replacement", value = " "),
        @Parameter(name = "replace", value = "all") })
    }) // Def
})
public class Product {

  ....
}

Explanation: Two custom analyzers, autocompleteEdgeAnalyzer and autocompleteNGramAnalyzer, have been defined following the strategies described in the previous section; standardAnalyzer is a plain analyzer for the regular 'title' field. Next, we apply these analyzers to the 'title' property to create separate index fields. Here is how we do it:

@Column(name = "title")
@Fields({
  @Field(name = "title", index = Index.YES, store = Store.YES,
    analyze = Analyze.YES, analyzer = @Analyzer(definition = "standardAnalyzer")),
  @Field(name = "edgeNGramTitle", index = Index.YES, store = Store.NO,
    analyze = Analyze.YES, analyzer = @Analyzer(definition = "autocompleteEdgeAnalyzer")),
  @Field(name = "nGramTitle", index = Index.YES, store = Store.NO,
    analyze = Analyze.YES, analyzer = @Analyzer(definition = "autocompleteNGramAnalyzer"))
})
private String title;

Start indexing:

public void index() throws InterruptedException {
  // Rebuild the index for all indexed entities and block until it is done
  getFullTextSession().createIndexer().startAndWait();
}


Once indexing completes, inspect the index with Luke; you should see the title analyzed and indexed as N-Grams and Edge N-Grams.

Search Query:

import java.util.List;

import org.apache.lucene.search.Query;
import org.hibernate.search.FullTextQuery;
import org.hibernate.search.query.dsl.QueryBuilder;
import org.springframework.transaction.annotation.Transactional;

private static final String TITLE_EDGE_NGRAM_INDEX = "edgeNGramTitle";
private static final String TITLE_NGRAM_INDEX = "nGramTitle";

@Transactional(readOnly = true)
public synchronized List<Product> getSuggestions(final String searchTerm) {

  QueryBuilder titleQB = getFullTextSession().getSearchFactory()
    .buildQueryBuilder().forEntity(Product.class).get();

  // Search both gram fields; matches on the edge N-gram field are boosted
  Query query = titleQB.phrase().withSlop(2).onField(TITLE_NGRAM_INDEX)
    .andField(TITLE_EDGE_NGRAM_INDEX).boostedTo(5)
    .sentence(searchTerm.toLowerCase()).createQuery();

  FullTextQuery fullTextQuery = getFullTextSession().createFullTextQuery(
    query, Product.class);
  // Return at most 20 suggestions
  fullTextQuery.setMaxResults(20);

  @SuppressWarnings("unchecked")
  List<Product> results = fullTextQuery.list();
  return results;
}
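The effect of boosting one field can be pictured with a toy ranking in plain Java. This is a deliberately simplified sketch and not how Lucene scores documents: the class, the constants and the scoring rule are all assumptions made for illustration, with a prefix check standing in for a hit on edgeNGramTitle and a substring check standing in for a hit on nGramTitle.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class BoostSketch {

    static final float EDGE_BOOST = 5f;  // mirrors boostedTo(5) on the edge field
    static final float NGRAM_BOOST = 1f; // assumed default weight of the other field

    // Toy score: a prefix match on the whole title counts five times
    // as much as a match inside any single word.
    public static float score(String title, String term) {
        String t = title.toLowerCase();
        String q = term.toLowerCase();
        float score = 0f;
        if (t.startsWith(q)) {
            score += EDGE_BOOST;         // stand-in for an edge N-gram hit
        }
        for (String word : t.split("\\s+")) {
            if (word.contains(q)) {
                score += NGRAM_BOOST;    // stand-in for an N-gram hit
                break;
            }
        }
        return score;
    }

    // Keep only titles that match at all, highest score first.
    public static List<String> rank(List<String> titles, String term) {
        List<String> hits = new ArrayList<>();
        for (String title : titles) {
            if (score(title, term) > 0) {
                hits.add(title);
            }
        }
        hits.sort(Comparator.comparingDouble((String title) -> score(title, term)).reversed());
        return hits;
    }
}
```

With the titles "A brown fox", "Brown leather boots" and "Dark brown belt", ranking for "brown" in this sketch puts "Brown leather boots" first, because it is the only title that also matches from the left edge.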

And we have a working suggester.
What next? Expose the functionality through a REST API and integrate it with jQuery; examples of both are easy to find.
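As a rough idea of what such an endpoint might look like, here is a minimal sketch using the JDK's built-in com.sun.net.httpserver package. Everything in it is an assumption for illustration: the /suggest path, the q parameter, port 8080 and the in-memory suggest() method, which merely stands in for a call to getSuggestions().

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.InetSocketAddress;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SuggestEndpointSketch {

    // Hypothetical sample data; a real service would query the index instead.
    static final List<String> TITLES =
        Arrays.asList("A brown fox", "Brown leather boots", "Red fox plush toy");

    // Stand-in for getSuggestions(): case-insensitive substring match, max 20 hits.
    public static List<String> suggest(String term) {
        List<String> matches = new ArrayList<>();
        String needle = term.trim().toLowerCase();
        if (needle.length() < 3) {
            return matches; // the shortest indexed gram is 3 characters
        }
        for (String title : TITLES) {
            if (matches.size() < 20 && title.toLowerCase().contains(needle)) {
                matches.add(title);
            }
        }
        return matches;
    }

    // Extract and URL-decode one parameter from a raw query string.
    public static String queryParam(String rawQuery, String name) {
        if (rawQuery == null) {
            return "";
        }
        for (String pair : rawQuery.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0 && pair.substring(0, eq).equals(name)) {
                try {
                    return URLDecoder.decode(pair.substring(eq + 1), "UTF-8");
                } catch (UnsupportedEncodingException e) {
                    throw new IllegalStateException(e);
                }
            }
        }
        return "";
    }

    // Minimal JSON array writer; escapes only backslashes and double quotes.
    public static String toJson(List<String> values) {
        StringBuilder json = new StringBuilder("[");
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) {
                json.append(',');
            }
            String escaped = values.get(i).replace("\\", "\\\\").replace("\"", "\\\"");
            json.append('"').append(escaped).append('"');
        }
        return json.append(']').toString();
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/suggest", exchange -> {
            String term = queryParam(exchange.getRequestURI().getRawQuery(), "q");
            byte[] body = toJson(suggest(term)).getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "application/json; charset=utf-8");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
    }
}
```

A client-side autocomplete widget could then request /suggest?q=bro as the user types and render the returned JSON array.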

You can also use the same strategy with Solr and ElasticSearch.
