DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports Events Over 2 million developers have joined DZone. Join Today! Thanks for visiting DZone today,
Edit Profile Manage Email Subscriptions Moderation Admin Console How to Post to DZone Article Submission Guidelines
View Profile
Sign Out
Refcards
Trend Reports
Events
Zones
Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Partner Zones AWS Cloud
by AWS Developer Relations
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Partner Zones
AWS Cloud
by AWS Developer Relations
The Latest "Software Integration: The Intersection of APIs, Microservices, and Cloud-Based Systems" Trend Report
Get the report
  1. DZone
  2. Data Engineering
  3. Data
  4. Advanced Filter Caching in Solr

Advanced Filter Caching in Solr

Yonik Seeley user avatar by
Yonik Seeley
·
Feb. 13, 12 · Interview
Like (0)
Save
Tweet
Share
9.08K Views

Join the DZone community and get the full member experience.

Join For Free

Advanced Filter Caching is a relatively new feature in Solr, available in version 3.4 and above. It allows precise control over how Solr handles filter queries in order to maximize performance, including the ability to specify if a filter is cached, the order filters are evaluated, and post filtering.

Filter Queries in Solr

Adding a filter expressed as a query to a Solr request is a snap… simply add an additional fq parameter for each filter query.

http://localhost:8983/solr/select?
   q=cars
   &fq=color:red
   &fq=model:Corvette
   &fq=year:[2005 TO *]

By default, Solr resolves all of the filters *before* the main query. Each filter query is looked up individually in Solr’s filterCache (which is pretty advanced itself, supporting concurrent lookups, different eviction policies such as LRU or LFU, and auto-warming). Caching each filter query separately accelerates Solr’s query throughput by greatly improving cache hit rates since many types of filters tend to be reused across different requests.

To Cache or not to Cache

The new advanced filter control API adds the ability to *not* cache a filter. Some filters may see almost no reuse across different requests, and not attempting to cache them can lead to a smaller, more effective filterCache with a higher hit rate.

To tell Solr not to cache a filter, we use the same powerful local params DSL that adds metadata to query parameters and is used to specify different types of query syntaxes and query parsers. For a normal query that does not have any localParam metadata, simply prepend a local param of cache=false. For example:

&fq={!cache=false}year:[2005 TO *]

To add cache=false to a filter query that already had localParams, simply add it right in with the rest of the params. For example, if we want to use Solr’s native spatial abilities to restrict our matches to locations within 50 km of Stanford, our filter query would look like:

&fq={!geofilt sfield=location pt=37.42,-122.17 d=50} 

It’s easy to modify this filter to tell Solr not to cache it by adding cache=false in with the rest of the local parameters:

&fq={!geofilt sfield=location pt=37.42,-122.17 d=50 cache=false} 

Leapfrog anyone?

When a filter isn’t generated up front and cached, it’s executed in parallel with the main query. First, the filter is asked about the first document id that it matches. The query is then asked about the first document that is equal to or greater than that document. The filter is then asked about the first document that is equal to or greater than that. The filter and the query play this game of leapfrog until they land on the same document and it’s declared a match, after which the document is collected and scored.

How much is that filter?

Advanced filtering adds even more fine grained control by introducing the notion of cost. If there are multiple non-cached filters in a response, filters with a lower cost will be checked before those with a higher cost.

&fq={!cache=false cost=10}year:[2005 TO *]
&fq={!geofilt cache=false cost=20}
&pt=37.42,-122.17
&sfield=location
&d=50

In the example above, the filter based on year has a lower cost and will thus always be checked before the spatial filter.

As an aside, notice how spatial queries will use global spatial request parameters if they are not specified locally. This can make it even easier to construct requests containing spatial functions.

Expensive Filters

Some filters are slow enough that you don’t even want to run them in parallel with the query and other filters, even if they are consulted last, since asking them “what is the next doc you match on or after this given doc” is so expensive. For these types of filters, you really want to only ask them “do you match this doc” only after the query and all other filters have been consulted. Solr has special support for this called “post filtering“.

Post filtering is triggered by filters that have a cost>=100 and have explicit support for it. If there are multiple post filters in a single request, they will be ordered by cost.

The frange qparser has post filter support and allows powerful queries specifying ranges over arbitrarily complex function queries.

For example, if we wanted to take the log of popularity, divide it by the square root of the distance, and filter out documents with a result less than 5, we could run this as a post filter using frange:

&fq={!frange l=5 cache=false cost=200}div(log(popularity),sqrt(geodist()))

Post filtering support for the spatial filter queries bbox and geofilt has just been added to Solr 4.0 too. To execute our previous un-cached spatial filter as a post filter, simply modify it’s cost to be greater than 100:

&fq={!geofilt cache=false cost=150}
&pt=37.42,-122.17
&sfield=location
&d=50

Custom Filters

If you have expensive custom logic you’d like to add as a post filter (say per-document custom security ACLs), you can implement your own QParserPlugin that returns Query objects that implement Solr’s PostFilter interface. You can set the default cost or hardcode a cost higher than 100 if you want to only support post filtering. Then, you can use your custom parser as you would any other builtin query type via fq={!myqueryparser} and Solr will handle the rest!

Try it out!

In conclusion, hopefully this gives more insight into just one of many factors working under the hood to make Solr so fast.
To try out the latest functionality, you can always get a nightly build of trunk. Feedback is always appreciated!

Source:  http://www.lucidimagination.com/blog/2012/02/10/advanced-filter-caching-in-solr/

Filter (software) Cache (computing) Database

Opinions expressed by DZone contributors are their own.

Popular on DZone

  • Journey to Event Driven, Part 1: Why Event-First Programming Changes Everything
  • Create Spider Chart With ReactJS
  • Top 5 Data Streaming Trends for 2023
  • Microservices Testing

Comments

Partner Resources

X

ABOUT US

  • About DZone
  • Send feedback
  • Careers
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 600 Park Offices Drive
  • Suite 300
  • Durham, NC 27709
  • support@dzone.com
  • +1 (919) 678-0300

Let's be friends: