Using Guava's Multimap to Improve Solr's Autocomplete Suggester

by Kelvin Tan · Mar. 16, 12

Context-less, multi-term autocomplete is difficult.

Given the term "di", we can look at our index and rank terms starting with "di" by frequency and return the n most frequent terms. Solr's TSTLookup and FSTLookup do this very well.
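To make the single-term case concrete, here is a minimal sketch of prefix completion ranked by frequency. It does not reproduce how TSTLookup or FSTLookup work internally: a `TreeMap` stands in for the ternary search tree or finite state transducer, and the class name `PrefixSuggest` and the term frequencies are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only: a TreeMap stands in for the TST/FST structures
// behind Solr's TSTLookup and FSTLookup. Term frequencies are made-up inputs.
public class PrefixSuggest {

  /** Returns up to n terms starting with prefix, most frequent first. */
  static List<String> suggest(TreeMap<String, Integer> termFreqs, String prefix, int n) {
    // Keys from prefix (inclusive) up to prefix + Character.MAX_VALUE
    // (exclusive) are the keys that start with prefix.
    List<Map.Entry<String, Integer>> ranked = new ArrayList<>(
        termFreqs.subMap(prefix, prefix + Character.MAX_VALUE).entrySet());
    // Highest frequency first; ties broken alphabetically.
    ranked.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
        .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
    List<String> out = new ArrayList<>();
    for (Map.Entry<String, Integer> e : ranked) {
      if (out.size() == n) break;
      out.add(e.getKey());
    }
    return out;
  }

  public static void main(String[] args) {
    TreeMap<String, Integer> freqs = new TreeMap<>(Map.of(
        "disney", 3, "discovery", 2, "diners", 1, "walt", 1));
    System.out.println(suggest(freqs, "di", 2)); // [disney, discovery]
  }
}
```

Applied to each term of "walt di" independently, this ranking happily suggests "discovery" or "diners" after "walt", which is the problem the rest of the article addresses.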

However, given the query "walt di", we can no longer complete each term independently without looking silly, especially if the corpus is a list of US companies (hint: think Mickey Mouse). There's little excuse for suggesting "walt discovery" or "walt diners" when no document in our corpus contains that combination of terms.

In the absence of a large number of historical user queries to augment the autocomplete, context is king when it comes to multi-term queries.

The simplest way I can think of doing this, if it is feasible memory-wise, is to store, for each term, the terms that immediately follow it. For example, given the field value "international business machines", the following mappings would be created:

international=>business
business=>machines
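A minimal sketch of building these mappings with only the JDK, assuming a `Map<String, Set<String>>` as a stand-in for Guava's `HashMultimap`, and lowercasing plus whitespace splitting as a hypothetical stand-in for a Lucene analyzer. `SuccessorIndex` and `build` are illustrative names.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// JDK-only sketch: Map<String, Set<String>> plays the role of Guava's
// HashMultimap; whitespace splitting is a hypothetical stand-in for an analyzer.
public class SuccessorIndex {

  /** Maps each term to the set of terms that immediately follow it. */
  static Map<String, Set<String>> build(List<String> fieldValues) {
    Map<String, Set<String>> successors = new LinkedHashMap<>();
    for (String value : fieldValues) {
      String prev = null;
      for (String term : value.toLowerCase(Locale.ROOT).split("\\s+")) {
        if (term.isEmpty()) continue; // leading whitespace yields an empty token
        if (prev != null) {
          // Sets drop duplicate pairs, as HashMultimap does.
          successors.computeIfAbsent(prev, k -> new LinkedHashSet<>()).add(term);
        }
        prev = term;
      }
    }
    return successors;
  }

  public static void main(String[] args) {
    System.out.println(build(List.of("International Business Machines")));
    // {international=[business], business=[machines]}
  }
}
```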

Out-of-order queries wouldn't be supported with this system, nor would term skips (e.g. "international machines").
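The query side isn't covered here, so the following is a hypothetical sketch of how the mappings could be used: take the last complete term of the query as context, fetch the terms that followed it in the corpus, and keep those that start with the partial term. `ContextSuggest.suggest` is an illustrative name, and the successor map is hard-coded rather than built from an index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical query-side sketch: only the last complete term is used as
// context, matching the one-term lookback of the term=>next-term mappings.
public class ContextSuggest {

  /** For "walt di": terms that followed "walt" and start with "di". */
  static List<String> suggest(Map<String, Set<String>> successors, String query) {
    String q = query.toLowerCase(Locale.ROOT).trim();
    List<String> out = new ArrayList<>();
    int split = q.lastIndexOf(' ');
    if (split < 0) return out; // single term: use plain prefix lookup instead
    String context = q.substring(0, split).trim();
    String lastComplete = context.substring(context.lastIndexOf(' ') + 1);
    String partial = q.substring(split + 1);
    // TreeSet gives a deterministic (alphabetical) order for the sketch.
    for (String next : new TreeSet<>(successors.getOrDefault(lastComplete, Set.of()))) {
      if (next.startsWith(partial)) out.add(context + " " + next);
    }
    return out;
  }

  public static void main(String[] args) {
    Map<String, Set<String>> successors = Map.of(
        "walt", Set.of("disney"),
        "international", Set.of("business"),
        "business", Set.of("machines"));
    System.out.println(suggest(successors, "walt di"));                   // [walt disney]
    System.out.println(suggest(successors, "international ma"));          // []
    System.out.println(suggest(successors, "international business ma")); // [international business machines]
  }
}
```

Because only one preceding term is consulted, "international ma" yields nothing, which is the term-skip limitation noted above.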

Here's a method fragment that does just this:

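```java
import com.google.common.collect.HashMultimap;
import java.io.StringReader;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.document.Fieldable;

// Lucene 3.x API. Assumes reader (IndexReader), field (String) and
// a (Analyzer) are in scope, inside a method that throws IOException.
HashMultimap<String, String> map = HashMultimap.create();
// Bound by maxDoc(), not numDocs(): with deletions, doc ids run up to
// maxDoc() - 1, so deleted docs must be skipped explicitly.
for (int i = 0; i < reader.maxDoc(); ++i) {
  if (reader.isDeleted(i)) continue;
  Fieldable fieldable = reader.document(i).getFieldable(field);
  if (fieldable == null) continue;
  String fieldVal = fieldable.stringValue();
  if (fieldVal == null) continue;
  TokenStream ts = a.tokenStream(field, new StringReader(fieldVal));
  CharTermAttribute attr = ts.addAttribute(CharTermAttribute.class);
  ts.reset();
  String prev = null;
  while (ts.incrementToken()) {
    // intern() so repeated terms share a single String instance
    String v = attr.toString().intern();
    if (prev != null) {
      map.put(prev, v); // record that v immediately follows prev
    }
    prev = v;
  }
  ts.end();
  ts.close();
}
```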

Guava's Multimap is a natural fit for this: it maps each key to a collection of values, and HashMultimap ignores duplicate key/value pairs. Solr already has a Guava dependency, so we might as well make full use of it.

Published at DZone with permission of Kelvin Tan. See the original article here.