Solr relevancy function queries

Lucene’s default ranking function uses factors such as tf, idf, and norm to help calculate relevancy scores.
Solr has now exposed these factors as function queries.

  • docfreq(field,term) returns the number of documents that contain the term in the field.
  • termfreq(field,term) returns the number of times the term appears in the field for that document.
  • idf(field,term) returns the inverse document frequency for the given term, using the Similarity for the field.
  • tf(field,term) returns the term frequency factor for the given term, using the Similarity for the field.
  • norm(field) returns the “norm” stored in the index, the product of the index time boost and the length normalization factor.
  • maxdoc() returns the number of documents in the index, including those that are marked as deleted but have not yet been purged.
  • numdocs() returns the number of documents in the index, not including those that are marked as deleted but have not yet been purged.
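
To make the tf and idf factors concrete, here is a minimal Python sketch. It assumes the classic Lucene default Similarity formulas (tf as the square root of the raw frequency, idf as 1 + ln(N / (df + 1))); a custom Similarity on the field would give different values. The article does not say which document count Solr passes to idf, so `doc_count` is a placeholder, and all the numbers are hypothetical.

```python
import math

def tf(freq: float) -> float:
    """Term frequency factor: square root of the raw count (assumed classic default)."""
    return math.sqrt(freq)

def idf(doc_freq: int, doc_count: int) -> float:
    """Inverse document frequency: 1 + ln(doc_count / (doc_freq + 1)) (assumed classic default)."""
    return 1.0 + math.log(doc_count / (doc_freq + 1))

def tf_idf(freq: float, doc_freq: int, doc_count: int) -> float:
    """Simple tf*idf product, the same shape as mul(tf(...), idf(...))."""
    return tf(freq) * idf(doc_freq, doc_count)

# Hypothetical values: the term occurs 4 times in this document,
# 9 documents contain it, and the index holds 10 documents.
score = tf_idf(freq=4, doc_freq=9, doc_count=10)
```

Here termfreq and docfreq correspond to the raw inputs (`freq` and `doc_freq`), while tf and idf are the factors derived from them.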


We can use these new functions to develop and test custom ranking functions! For example, if we wanted a simple tf*idf score for a given term, we could issue the following function query (if you have Solr’s example server running with the exampledocs indexed, just click on the following link):

http://localhost:8983/solr/select/?fl=score,id&defType=func&q=mul(tf(text,memory),idf(text,memory))
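
If you would rather build the request from a script than paste it into a browser, the sketch below assembles the same URL with Python's standard library. The host, path, and parameters come from the link above; the helper name `func_query_url` is made up for illustration, and nothing here actually contacts Solr.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Endpoint taken from the article's example link.
SOLR_SELECT = "http://localhost:8983/solr/select/"

def func_query_url(function: str, fields: str = "score,id") -> str:
    """Build a select URL whose q parameter is parsed as a function query."""
    params = {"fl": fields, "defType": "func", "q": function}
    # urlencode percent-encodes the commas and parentheses in the function.
    return SOLR_SELECT + "?" + urlencode(params)

url = func_query_url("mul(tf(text,memory),idf(text,memory))")

# Round-trip check: decoding the query string gives back the original function.
decoded = parse_qs(urlsplit(url).query)
```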

To avoid repeating the field and term (text,memory), we can pull them out into separate query parameters, f and t, and reference them in the function as $f and $t:

http://localhost:8983/solr/select/?fl=score,id&defType=func&q=mul(tf($f,$t),idf($f,$t))&f=text&t=memory
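
The following sketch shows the idea in Python: the function is written once as a template, and the field and term travel as separate request parameters. The `expand` helper is only a local illustration of what the $f and $t references resolve to; it is not how Solr implements the substitution.

```python
import re
from urllib.parse import urlencode

def expand(template: str, params: dict) -> str:
    """Locally mimic parameter dereferencing: replace $name with params[name]."""
    return re.sub(r"\$(\w+)", lambda m: params[m.group(1)], template)

template = "mul(tf($f,$t),idf($f,$t))"
refs = {"f": "text", "t": "memory"}

# What the function looks like once the references are resolved.
expanded = expand(template, refs)

# The query string keeps the template intact and sends f and t alongside it.
query_string = urlencode({"fl": "score,id", "defType": "func", "q": template, **refs})
```

Changing the term then only means changing the `t` parameter, which is convenient when the function is long.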

Utilizing Solr’s new ability to sort by arbitrary function queries, we could now sort a query by the number of times a specific term appears in each document. The following query searches for documents matching “DDR”, but then sorts by the number of times “memory” appears in the text field.

http://localhost:8983/solr/select/?fl=score,id&q=DDR&sort=termfreq(text,memory) desc
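
Note that the sort parameter contains a space before the direction, which needs to be encoded when the URL is built programmatically. A minimal sketch, using the endpoint from the link above and a hypothetical helper name `sorted_query_url`:

```python
from urllib.parse import urlencode, urlsplit, parse_qs

SOLR_SELECT = "http://localhost:8983/solr/select/"  # endpoint from the article

def sorted_query_url(query: str, sort_function: str, direction: str = "desc") -> str:
    """Build a select URL that sorts matches by a function query."""
    params = {
        "fl": "score,id",
        "q": query,
        # The space between the function and the direction is encoded as '+'.
        "sort": f"{sort_function} {direction}",
    }
    return SOLR_SELECT + "?" + urlencode(params)

url = sorted_query_url("DDR", "termfreq(text,memory)")
decoded = parse_qs(urlsplit(url).query)
```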

We could also utilize the “norm” function to sort by the longest field first. This assumes there were no index time boosts and thus the norm is just the standard length normalization factor.

http://localhost:8983/solr/select/?fl=score,id&q=DDR&sort=norm(text) asc
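
Why does an ascending sort put the longest field first? Assuming the classic default length normalization of 1/sqrt(number of terms), longer fields get smaller norms. The sketch below uses hypothetical field lengths to show the ordering; it ignores the lossy compression applied when the norm is stored in the index, which can make fields of similar length tie.

```python
import math

def length_norm(num_terms: int) -> float:
    """Length normalization factor with no index time boost (assumed classic default)."""
    return 1.0 / math.sqrt(num_terms)

# Hypothetical documents and the number of terms in their text field.
field_lengths = {"doc_a": 4, "doc_b": 100, "doc_c": 25}

# Ascending norm order: the smallest norm, and so the longest field, comes first.
by_norm_asc = sorted(field_lengths, key=lambda doc: length_norm(field_lengths[doc]))
```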

Given Solr’s plethora of function queries (including the new spatial queries that return distance between points), the possibilities are almost endless. To try this out, you’ll need a recent nightly build of Solr 4.0-dev, or LucidWorks Enterprise, our commercial version of Solr.

Published at DZone with permission of Yonik Seeley. See the original article here.
