
Solving Issues with Date Boosting and NOW in Solr

By Erick Erickson · Apr. 04, 12 · Interview

More NOW evil

Prompted by a subtle issue a client raised, I was thinking about date boosting. According to the Wiki, a good way to boost by date is something like the following:

http://localhost:8983/solr/select?q={!boost b=recip(ms(NOW,manufacturedate_dt),3.16e-11,1,1)}ipod

(See: date boosting link.) And this works well, no question.

However, there’s a subtle issue when paging. NOW evaluates to the current time, and every subsequent request will have a different value for NOW. This blog post about the effects of this on filter queries provides some useful background.

What does that have to do with date boosting?

Imagine that you have multiple pages of results. Typically, one constructs a series of page links to get to subsequent pages, something like

http://your solr addr/select?q=searchterms&start=10&rows=10

but you need to add the date boosting too, right? So each of these URLs will have the date boost from above appended (or you may have it in your default params in solrconfig.xml). And here’s where this fragment causes some “interesting” behavior: ms(NOW,manufacturedate_dt)

There are two issues here.

  1. You can actually repeat or skip results as you page. This is due to the “bucketing” of results. A few seconds can change the boost calculations just enough to cause some documents to be skipped or repeated as you page.
  2. Your queryResultCache is useless.
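
To get a feel for how the boost drifts between page requests, here is a minimal Python sketch. It relies on Solr's documented definition recip(x,m,a,b) = a/(m*x+b); the document date and the 20-second gap between requests are made-up values for illustration only.

```python
from datetime import datetime, timedelta, timezone

def recip(x, m, a, b):
    # Same formula as Solr's recip function: a / (m*x + b)
    return a / (m * x + b)

def date_boost(now, doc_date):
    # Mirrors recip(ms(NOW,manufacturedate_dt),3.16e-11,1,1)
    age_ms = (now - doc_date).total_seconds() * 1000.0
    return recip(age_ms, 3.16e-11, 1, 1)

# Hypothetical manufacturedate_dt value for one document
doc_date = datetime(2012, 3, 1, tzinfo=timezone.utc)

# NOW as seen by the request for page 1, and by the request for page 2
page1_now = datetime(2012, 3, 28, 10, 30, 29, tzinfo=timezone.utc)
page2_now = page1_now + timedelta(seconds=20)  # assumed delay before clicking "next"

boost_page1 = date_boost(page1_now, doc_date)
boost_page2 = date_boost(page2_now, doc_date)

# The same document gets a slightly different boost on each request
print(boost_page1, boost_page2, boost_page1 - boost_page2)
```

The difference is tiny, but documents whose scores are nearly tied can swap places between the two requests, which is how a result ends up repeated or skipped across pages.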

A quick review of queryResultCache

The queryResultCache is just a map from a query to some number of documents, in order: the results of that search. How many documents are kept per entry is configurable in solrconfig.xml, and people typically store 2 or 3 pages of results per query. This is adequate for the usual user experience; users rarely go to the second page, much less the third. When a page request comes in whose results aren’t in the queryResultCache, the query is re-executed.

But, critically for this discussion, the use of NOW in date boosting means that no query that uses date boosting is ever fetched from the queryResultCache!

I’m exaggerating a bit. It’s possible to do limited “date math” with the date boost function; things like …ms(NOW/MINUTE,manufacturedate_dt)… are possible. Using this technique reduces the problem, but doesn’t eliminate it.
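
The effect of rounding can be illustrated with a toy model. This is not Solr's actual cache implementation: it is a plain Python set standing in for the queryResultCache, keyed on the boost fragment after NOW has been replaced by a concrete instant. The request times are invented for the example.

```python
from datetime import datetime, timedelta, timezone

def resolved_key(now, round_to_minute):
    """Return the boost fragment with NOW replaced by a concrete instant."""
    if round_to_minute:
        # Rough equivalent of NOW/MINUTE: drop seconds and sub-seconds
        now = now.replace(second=0, microsecond=0)
    return f"ms({now.isoformat()},manufacturedate_dt)"

def count_cache_hits(request_times, round_to_minute):
    cache = set()  # toy stand-in for queryResultCache
    hits = 0
    for now in request_times:
        key = resolved_key(now, round_to_minute)
        if key in cache:
            hits += 1
        else:
            cache.add(key)
    return hits

# Seven requests, 10 seconds apart: 10:30:05 through 10:31:05
start = datetime(2012, 3, 28, 10, 30, 5, tzinfo=timezone.utc)
requests = [start + timedelta(seconds=10 * i) for i in range(7)]

bare_hits = count_cache_hits(requests, round_to_minute=False)
rounded_hits = count_cache_hits(requests, round_to_minute=True)
print(bare_hits, rounded_hits)  # prints "0 5"
```

With a bare NOW every request produces a new key, so nothing is ever reused. With minute rounding the six requests inside 10:30 share a key, but the request at 10:31:05 misses again, which is why rounding reduces the problem without eliminating it.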

What can be done?

I haven’t thought of a clean way to change the Solr query process to handle this. I can imagine a new parameter like “nowIs=2012-03-28T10:30:29Z”, with the understanding that all references to NOW in the query get this substituted, but that feels kludgy. Not to mention that doing this right would touch lots of places. And I guarantee that it would be much harder to get right than I think…

Another possibility is to limit the problem. Using an expression like NOW/DAY+1DAY would confine the problem to paging requests that span midnight. Note that this also affects the scoring of documents put in the index today. And if you try this on a raw URL, you need to URL-escape the ‘+’ as %2B.
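
If the URL is built in code rather than typed by hand, a standard URL-encoding routine handles the ‘+’ for you. A minimal Python sketch, assuming the localhost address and the manufacturedate_dt field from the earlier example:

```python
from urllib.parse import quote, urlencode

boost_query = "{!boost b=recip(ms(NOW/DAY+1DAY,manufacturedate_dt),3.16e-11,1,1)}ipod"
params = {"q": boost_query, "start": 10, "rows": 10}

# quote_via=quote encodes spaces as %20 rather than '+', so the only way a
# literal '+' could reach Solr is if we forgot to encode it; here it becomes %2B.
query_string = urlencode(params, quote_via=quote)
url = "http://localhost:8983/solr/select?" + query_string
print(url)
```

An unescaped ‘+’ in a query string is decoded as a space, which would turn NOW/DAY+1DAY into an invalid date math expression.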

A third possibility is to use the fact that Solr happily ignores URL parameters it doesn’t understand. You could create a custom QueryComponent and do the substitutions there. This also lets you recognize that your index has changed and re-execute the query in that case. There are some interesting new capabilities coming up in the SearcherLifetimeManager (see Mike McCandless’ blog post here) that could help as well, although I haven’t looked at it closely. One could perhaps just write a custom query component that recognized “ms(NOW” and substituted the formatted time into the query, but anything that simple would probably have unexpected side effects.

Another solution is to simply construct your paging URLs with a raw time rather than NOW. This would look like:

{!boost b=recip(ms(2012-03-28T10:40:00Z,manufacturedate_dt),3.16e-11,1,1)}ipod
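
One way to do this in application code is to capture the time once, when the first page is requested, and reuse it in every paging link. The following Python sketch assumes that approach; paging_urls is a hypothetical helper, and the host, field name, and page size are placeholders taken from the examples above.

```python
from datetime import datetime, timezone
from urllib.parse import quote, urlencode

SOLR_SELECT = "http://localhost:8983/solr/select"  # placeholder address

def paging_urls(terms, fixed_now, pages, rows=10):
    """Build one URL per page, all sharing the same literal timestamp."""
    # Solr expects UTC timestamps in the form 2012-03-28T10:40:00Z
    stamp = fixed_now.strftime("%Y-%m-%dT%H:%M:%SZ")
    q = "{!boost b=recip(ms(%s,manufacturedate_dt),3.16e-11,1,1)}%s" % (stamp, terms)
    return [
        SOLR_SELECT + "?" + urlencode(
            {"q": q, "start": page * rows, "rows": rows}, quote_via=quote
        )
        for page in range(pages)
    ]

# Time of the user's first search, captured once and reused for later pages
first_request = datetime(2012, 3, 28, 10, 40, 0, tzinfo=timezone.utc)
urls = paging_urls("ipod", first_request, pages=3)
for u in urls:
    print(u)
```

Because every page uses the same timestamp, the boost for a given document is identical across pages, and the query text no longer changes from one request to the next.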

The easiest solution is to ignore the problem entirely. Mainly I’m posting this as an interesting dive into the subtleties with NOW, and how it can produce effects you don’t anticipate. If you’re interested in squeezing every last bit of performance out of your Solr instances, and you do heavy boosting by date, you might want to address this “problem”.

But except for date rounding (e.g. using NOW/DAY+1DAY rather than a bare NOW), I’d never do this kind of thing unless I had absolute proof that I needed to, because:

  1. Any solution that implements this kind of process will take time and effort you could put into other parts of your application.
  2. In most applications, your users will never notice anyway. The only time this shows up is when you page and you happen to hit an edge case. Users rarely go to even the second page of search results so it’s a vanishingly small ROI for the coding/QA effort unless and until there is a demonstrated need.

 


Published at DZone with permission of Erick Erickson. See the original article here.
