A Phrase-based, Out-of-order Solr Autocomplete Suggester

Solr has a number of Autocomplete implementations that are great for most purposes. However, a client of mine recently had some fairly specific requirements for Autocomplete:

1. Phrase-based substring matching
2. Out-of-order matches ('foo bar' should match 'the bar is foo')
3. Fallback matching on a secondary field when substring matching on the primary field fails, e.g., 'windstopper jac' matches nothing in the 'title' field but does match in the 'category' field

The most direct way to model this would probably have been to create a separate Solr core and use n-gram plus shingles indexing, along with Solr queries, to obtain results. However, because the index was fairly small, I decided to go with an in-memory approach.

The general strategy was:

1. For each entry in the primary field, create n-gram tokens and add them to a Guava Table where the row key is the n-gram, the column key is the original string, and the value is a distance score.
2. For each entry in the secondary field, create n-gram tokens and add them to a Guava Multimap where the key is the n-gram and the value is the term.
3. When an Autocomplete query is received, split it on spaces, then look up each token in the primary table.
4. If no matches are found, look up the tokens in the secondary Multimap.
5. Return results.
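The steps above can be sketched roughly as follows. This is a minimal illustration, not the original implementation: to stay dependency-free it uses nested JDK maps as stand-ins for the Guava Table and Multimap, and the class name `InMemorySuggester`, the `MIN_GRAM` value, the position-only placeholder score, and the rule that every query token must match are all assumptions.

```java
import java.util.*;

public class InMemorySuggester {
    // Assumed minimum n-gram length; the article does not state one.
    static final int MIN_GRAM = 2;

    // Stand-in for a Guava Table<String, String, Double>: n-gram -> (phrase -> score).
    private final Map<String, Map<String, Double>> primary = new HashMap<>();
    // Stand-in for a Guava Multimap<String, String>: n-gram -> terms.
    private final Map<String, Set<String>> secondary = new HashMap<>();

    // Every substring of the word with at least MIN_GRAM characters.
    static Set<String> ngrams(String word) {
        Set<String> grams = new LinkedHashSet<>();
        for (int i = 0; i < word.length(); i++) {
            for (int j = i + MIN_GRAM; j <= word.length(); j++) {
                grams.add(word.substring(i, j));
            }
        }
        return grams;
    }

    public void addPrimary(String phrase) {
        int offset = 0; // character offset of the current word within the phrase
        for (String word : phrase.toLowerCase().split(" ")) {
            // Placeholder score (assumption): words nearer the start score higher.
            double score = 1.0 / (1 + offset);
            for (String gram : ngrams(word)) {
                primary.computeIfAbsent(gram, k -> new HashMap<>())
                       .merge(phrase, score, Double::max);
            }
            offset += word.length() + 1;
        }
    }

    public void addSecondary(String term) {
        for (String word : term.toLowerCase().split(" ")) {
            for (String gram : ngrams(word)) {
                secondary.computeIfAbsent(gram, k -> new TreeSet<>()).add(term);
            }
        }
    }

    public List<String> suggest(String query) {
        String[] tokens = query.trim().toLowerCase().split("\\s+");

        // Primary lookup: keep only phrases matched by every token, in any order.
        Map<String, Double> totals =
            new HashMap<>(primary.getOrDefault(tokens[0], Collections.emptyMap()));
        for (int i = 1; i < tokens.length; i++) {
            Map<String, Double> row = primary.getOrDefault(tokens[i], Collections.emptyMap());
            totals.keySet().retainAll(row.keySet());
            totals.replaceAll((phrase, sum) -> sum + row.get(phrase));
        }
        if (!totals.isEmpty()) {
            List<String> ranked = new ArrayList<>(totals.keySet());
            // Highest summed score first, ties broken alphabetically.
            ranked.sort(Comparator.comparingDouble((String p) -> -totals.get(p))
                                  .thenComparing(Comparator.<String>naturalOrder()));
            return ranked;
        }

        // Fallback: the same intersection against the secondary index.
        Set<String> fallback =
            new TreeSet<>(secondary.getOrDefault(tokens[0], Collections.emptySet()));
        for (int i = 1; i < tokens.length; i++) {
            fallback.retainAll(secondary.getOrDefault(tokens[i], Collections.emptySet()));
        }
        return new ArrayList<>(fallback);
    }
}
```

With 'the bar is foo' indexed as a primary entry and 'windstopper jackets' as a secondary one, `suggest("foo bar")` finds the primary phrase regardless of word order, while `suggest("windstopper jac")` misses the primary index and falls through to the secondary one. Storing every substring keeps lookups to a single map access per token, at the cost of memory, which is only reasonable for a small index.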

The scoring for the primary table was simple, based on the length of the word and the token's distance from the start of the string.
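The exact formula is not given here, so the following is only one plausible shape for such a score. The class `NgramScore`, the `score` method, and the way the two factors are combined are hypothetical: the score grows as the n-gram covers more of its word and shrinks as the word sits further from the start of the string.

```java
public class NgramScore {
    /**
     * Hypothetical distance score (an assumption, not the article's formula).
     *
     * @param gram   the n-gram being indexed
     * @param word   the word the n-gram was taken from
     * @param offset character offset of the word from the start of the string
     */
    public static double score(String gram, String word, int offset) {
        // Fraction of the word covered by the n-gram, in (0, 1].
        double coverage = (double) gram.length() / word.length();
        // Words nearer the start of the string get a larger weight.
        double positionWeight = 1.0 / (1 + offset);
        return coverage * positionWeight;
    }
}
```

Under this formula a full-word match at the start of the string, such as `score("jacket", "jacket", 0)`, gets the maximum value of 1.0, while a shorter n-gram or a later offset scores lower.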

Published at DZone with permission of Kelvin Tan. See the original article here.
