Solr relevancy function queries

Lucene’s default ranking function uses factors such as tf, idf, and norm to help calculate relevancy scores.
Solr has now exposed these factors as function queries.

  • docfreq(field,term) returns the number of documents that contain the term in the field.
  • termfreq(field,term) returns the number of times the term appears in the field for that document.
  • idf(field,term) returns the inverse document frequency for the given term, using the Similarity for the field.
  • tf(field,term) returns the term frequency factor for the given term, using the Similarity for the field.
  • norm(field) returns the “norm” stored in the index: the product of the index-time boost and the length normalization factor.
  • maxdoc() returns the number of documents in the index, including those that are marked as deleted but have not yet been purged.
  • numdocs() returns the number of documents in the index, not including those that are marked as deleted but have not yet been purged.
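To make the relationship between these functions concrete, here is a minimal Python sketch over a toy in-memory "index". It assumes the formulas I understand Lucene's default Similarity to use (tf as the square root of the raw count, idf as 1 + ln(N / (docFreq + 1))); a field configured with a different Similarity will return different values. The `docs` list and the helper signatures are hypothetical and only mirror the Solr function names; they are not part of any Solr API.

```python
import math

# Toy "index": each document is the token list of a single field.
# Nothing is deleted here, so maxdoc() and numdocs() would both be len(docs).
docs = [
    ["ddr", "memory", "module", "memory"],
    ["usb", "cable"],
    ["memory", "card"],
]

def termfreq(doc, term):
    """Raw number of times the term appears in this document's field."""
    return doc.count(term)

def docfreq(docs, term):
    """Number of documents whose field contains the term at least once."""
    return sum(1 for doc in docs if term in doc)

def tf(doc, term):
    """Term frequency factor (assumed: square root of the raw count)."""
    return math.sqrt(termfreq(doc, term))

def idf(docs, term):
    """Inverse document frequency (assumed: 1 + ln(N / (docFreq + 1)))."""
    return 1.0 + math.log(len(docs) / (docfreq(docs, term) + 1))

# Simple tf*idf for the term "memory" in the first document.
score = tf(docs[0], "memory") * idf(docs, "memory")
```

The distinction to notice is that termfreq and docfreq are raw counts, while tf and idf are those counts after the Similarity has transformed them.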


We can use these new functions to develop and test custom ranking functions! For example, if we wanted simple tf*idf for a given term, we could issue the following function query (if you have Solr’s example server running with the exampledocs indexed, just click on the following link):

http://localhost:8983/solr/select/?fl=score,id&defType=func&q=mul(tf(text,memory),idf(text,memory))

To avoid repeating the field and term (text,memory), we can pull them out into separate request parameters, f and t, and reference them in the function as $f and $t:

http://localhost:8983/solr/select/?fl=score,id&defType=func&q=mul(tf($f,$t),idf($f,$t))&f=text&t=memory
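If you want to issue this request from a script rather than a browser, the URL can be assembled from a parameter dictionary. The following is a minimal sketch using only Python's standard library; the helper name `tfidf_query_url` is hypothetical, and the base URL assumes the example server from above.

```python
from urllib.parse import urlencode

# Assumed location of Solr's example server, as used in the links above.
SOLR_SELECT = "http://localhost:8983/solr/select/"

def tfidf_query_url(field, term, base=SOLR_SELECT):
    """Build a tf*idf function query URL, passing field and term as $f and $t."""
    params = {
        "fl": "score,id",
        "defType": "func",
        "q": "mul(tf($f,$t),idf($f,$t))",
        "f": field,
        "t": term,
    }
    # Keep parentheses, commas and "$" unescaped so the URL stays readable.
    return base + "?" + urlencode(params, safe="(),$")

url = tfidf_query_url("text", "memory")
```

Because the function body refers only to $f and $t, trying another field or term means changing one argument instead of editing the expression in two places. The resulting `url` can be fetched with `urllib.request.urlopen(url)` once a Solr instance is running.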

Utilizing Solr’s new ability to sort by arbitrary function queries, we could now sort a query by the number of times a specific term appears in each document. The following query searches for documents matching “DDR”, but then sorts by the number of times “memory” appears in the text field.

http://localhost:8983/solr/select/?fl=score,id&q=DDR&sort=termfreq(text,memory) desc
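A browser will encode the space in the sort parameter for you, but a script has to do it explicitly. Here is a small sketch, again with the standard library only; `sorted_query_url` is a hypothetical helper name, and the base URL assumes the example server.

```python
from urllib.parse import urlencode

def sorted_query_url(query, sort_function, descending=True,
                     base="http://localhost:8983/solr/select/"):
    """Build a URL that runs `query` and sorts results by a function query."""
    direction = "desc" if descending else "asc"
    params = {
        "fl": "score,id",
        "q": query,
        "sort": f"{sort_function} {direction}",
    }
    # urlencode turns the space before asc/desc into "+";
    # parentheses and commas are left readable.
    return base + "?" + urlencode(params, safe="(),")

url = sorted_query_url("DDR", "termfreq(text,memory)")
```

Note that the main query (q=DDR) still decides which documents match; the function only controls their order.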

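We could also use the “norm” function to sort by the longest field first. This assumes there were no index-time boosts, so the norm is just the standard length normalization factor. Because that factor shrinks as the field gets longer, sorting in ascending order puts the longest fields first.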

http://localhost:8983/solr/select/?fl=score,id&q=DDR&sort=norm(text) asc
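Why ascending order yields the longest fields can be seen with a few numbers. This sketch assumes the default length normalization of 1/sqrt(number of terms) and no index-time boost; the document names and lengths are made up for illustration. The norm actually stored in the index is compressed, so real values are only approximations of these.

```python
import math

def length_norm(num_terms):
    """Assumed default length normalization: 1 / sqrt(number of terms)."""
    return 1.0 / math.sqrt(num_terms)

# Hypothetical documents mapped to the number of terms in their text field.
field_lengths = {"doc_short": 4, "doc_medium": 25, "doc_long": 100}

# Sorting by norm ascending, as sort=norm(text) asc does.
by_norm_asc = sorted(field_lengths, key=lambda doc: length_norm(field_lengths[doc]))
```

Here the norms are 0.5, 0.2 and 0.1, so `by_norm_asc` starts with the 100-term document.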

Given Solr’s plethora of function queries (including the new spatial queries that return distance between points), the possibilities are almost endless. To try this out, you’ll need a recent nightly build of Solr 4.0-dev, or LucidWorks Enterprise, our commercial version of Solr.

Published at DZone with permission of Yonik Seeley.
