DZone
Java Zone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
  • Refcardz
  • Trend Reports
  • Webinars
  • Zones
  • |
    • Agile
    • AI
    • Big Data
    • Cloud
    • Database
    • DevOps
    • Integration
    • IoT
    • Java
    • Microservices
    • Open Source
    • Performance
    • Security
    • Web Dev
DZone > Java Zone > 6-Step Lucene Multi-Point Spatial Search Algorithm

6-Step Lucene Multi-Point Spatial Search Algorithm

Kelvin Tan user avatar by
Kelvin Tan
·
Apr. 17, 12 · Java Zone · Interview
Like (0)
Save
Tweet
6.93K Views

Join the DZone community and get the full member experience.

Join For Free

This post describes a method of augmenting the lucene-spatial contrib package to support multi-point searches. It is quite similar to the method described http://www.supermind.org/blog/548/multiple-latitudelongitude-pairs-for-a-single-solrlucene-doc with some minor modifications.

The problem is as follows:

A company (mapped as a Lucene doc) has an address associated with it. It also has a list of store locations, which each have an address. Given a lat/long point, return a list of companies which have either a store location or an address within x miles from that point. There should be the ability to search on just company addresses, store locations, or both.


This problem requires that you index a "primary" lat/long pair, and multiple "secondary" lat/long pairs, and be able to search only primary lat/long, only secondary lat/long or both.

This excludes the possibility of using SOLR-2155 or LUCENE-3795 as-is. I'm sure it would have been possible to patch either to do so

Also, SOLR-2155 depended on Solr, and I needed a pure Lucene 3.5 solution. And MultiValueSource, which SOLR-2155 uses, does not appear to be supported in Lucene 3.5.

The SOLR-2155 implementation is also pretty inefficient: it creates a List object
for every single doc in the index in order to support multi-point search.

The general outline of the method is:

1. Search store locations index and collect company IDs and distances
2. Augment DistanceFilter with store location distances
3. Add a BooleanQuery with company IDs. This is to include companies in the final result-set whose address does not match, but have one or more store locations which do
4. Search company index
5. Return results

The algorithm in detail:

1. Index the company address with the company document, i.e the document containing company fields such as name etc

2. In a separate index (or in the same index but in a different document "type"), index the store locations, adding the company ID as a field.

3. Given a lat/long point to search, first search the store locations index. Collect a unique list of company doc-ids:distance in a LinkedHashMap, checking for duplicates. Note that this is the lucene doc-id of the store location's corresponding company, NOT the company ID field value. This will be used to augment the distancefilter in the next stage.

Hint: you'll need to use TermDocs to get this, like so:

for(int i=0;i<locationHits.docs.totalHits;++i) {
      int locationDocId = locationHits.docs.scoreDocs[i].doc;
      int companyId = companyIds[locationDocId];
      double distance = locationHits.distanceFilter.getDistance(locationDocId);
      if(companyDistances.containsKey(companyId)) continue;
      Term t = new Term("id", Integer.toString(companyId));
      TermDocs td = companyReader.termDocs(t);
      if (td.next()) {
        int companyDocId = td.doc();
        companyDistances.put(companyDocId, distance);
      }
      td.close();
    }

Since the search returns results sorted by distance (using lucene-spatial's DistanceFilter), you're assured to have a list of company doc ids in ascending order of distance.

In this same pass, also collect a list of company IDs. This will be used to build the BooleanQuery used in the company search.

4. Set company DistanceFilter's distances. Note: in Lucene 3.5, I added a one-line patch to DistanceFilter so that setDistances() calls putAll() instead of replacing the map.

final DistanceQueryBuilder dq = new DistanceQueryBuilder(centerLat, centerLng, milesF, "lat", "lng", CartesianTierPlotter.DEFALT_FIELD_PREFIX, true, 0, 20);
dq.getDistanceFilter().setDistances(companyDistances);
 

5. Build BooleanQuery including company IDs

    BooleanQuery bq = new BooleanQuery();
    for(Integer id: companyIds) bq.add(new TermQuery(new Term("id", Integer.toString(id))), BooleanClause.Occur.SHOULD);
    bq.add(distanceQuery, BooleanClause.Occur.SHOULD);

6. Search and return results

Lucene Algorithm

Published at DZone with permission of Kelvin Tan. See the original article here.

Opinions expressed by DZone contributors are their own.

Popular on DZone

  • 20 Testing Tools and Libraries You Need to Know
  • Password Authentication: How to Correctly Do It
  • 27 Free Web UI Mockup Tools
  • Checklist for API Verification

Comments

Java Partner Resources

X

ABOUT US

  • About DZone
  • Send feedback
  • Careers
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • MVB Program
  • Become a Contributor
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 600 Park Offices Drive
  • Suite 300
  • Durham, NC 27709
  • support@dzone.com
  • +1 (919) 678-0300

Let's be friends:

DZone.com is powered by 

AnswerHub logo