Solr relevancy function queries

by Yonik Seeley · Apr. 21, 11 · Java Zone · News

Lucene’s default ranking function uses factors such as tf, idf, and norm to help calculate relevancy scores.
Solr has now exposed these factors as function queries.

  • docfreq(field,term) returns the number of documents that contain the term in the field.
  • termfreq(field,term) returns the number of times the term appears in the field for that document.
  • idf(field,term) returns the inverse document frequency for the given term, using the Similarity for the field.
  • tf(field,term) returns the term frequency factor for the given term, using the Similarity for the field.
  • norm(field) returns the “norm” stored in the index, the product of the index-time boost and the length normalization factor.
  • maxdoc() returns the number of documents in the index, including those that are marked as deleted but have not yet been purged.
  • numdocs() returns the number of documents in the index, not including those that are marked as deleted but have not yet been purged.
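
As a rough illustration of what tf and idf compute, here is a minimal Python sketch. It assumes the field uses Lucene's classic default Similarity (tf as the square root of the raw count, idf as 1 + ln(N / (docFreq + 1))); a custom Similarity would give different numbers. The counts in the example are hypothetical, and which document count Solr passes as N is not stated here, so treat that as an assumption as well.

```python
import math


def tf(freq: float) -> float:
    """Term frequency factor: square root of the raw in-document count
    (assumes Lucene's classic default Similarity)."""
    return math.sqrt(freq)


def idf(doc_freq: int, num_docs: int) -> float:
    """Inverse document frequency: 1 + ln(num_docs / (doc_freq + 1))
    (assumes Lucene's classic default Similarity)."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def tf_idf(freq: float, doc_freq: int, num_docs: int) -> float:
    """Simple tf*idf, the same product as mul(tf(...), idf(...))."""
    return tf(freq) * idf(doc_freq, num_docs)


# Hypothetical counts: the term occurs 4 times in this document
# and appears in 9 of 100 documents overall.
print(tf_idf(freq=4, doc_freq=9, num_docs=100))  # roughly 6.61
```

Rarer terms get a larger idf, so the same in-document count contributes more to the score when few documents contain the term.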


We can use these new functions to develop and test custom ranking functions! For example, if we wanted simple tf*idf for a given term, we could issue the following function query (if you have Solr’s example server running with exampledocs indexed, just open the following link):

http://localhost:8983/solr/select/?fl=score,id&defType=func&q=mul(tf(text,memory),idf(text,memory))

To avoid repeating the field and term (text,memory) in each function call, we can pull them out into separate query parameters and reference them as $f and $t:

http://localhost:8983/solr/select/?fl=score,id&defType=func&q=mul(tf($f,$t),idf($f,$t))&f=text&t=memory
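
When issuing this request from a script rather than a browser, the parameters need URL encoding. The following Python sketch builds the same request; the helper name func_query_url is hypothetical, and the base URL assumes the example server from this article.

```python
from urllib.parse import urlencode


def func_query_url(field, term, base="http://localhost:8983/solr/select/"):
    """Build the tf*idf function query, passing the field and term once
    through the f and t parameters that $f and $t refer to."""
    params = {
        "fl": "score,id",
        "defType": "func",
        "q": "mul(tf($f,$t),idf($f,$t))",
        "f": field,
        "t": term,
    }
    # urlencode percent-encodes the parentheses, commas and dollar signs.
    return base + "?" + urlencode(params)


print(func_query_url("text", "memory"))
```

Changing the field or term now means changing one argument instead of editing every function call inside q.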

Utilizing Solr’s new ability to sort by arbitrary function queries, we could now sort a query by the number of times a specific term appears in each document. The following query searches for documents matching “DDR”, but then sorts by the number of times “memory” appears in the text field.

http://localhost:8983/solr/select/?fl=score,id&q=DDR&sort=termfreq(text,memory) desc
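
The sort value contains a space between the function and the direction, which has to be encoded when the URL is built programmatically. A small Python sketch, with a hypothetical helper name and the example server's base URL assumed:

```python
from urllib.parse import urlencode


def sorted_query_url(query, sort_func, direction="desc",
                     base="http://localhost:8983/solr/select/"):
    """Build a request that matches on `query` but orders the results
    by the value of the function `sort_func`."""
    if direction not in ("asc", "desc"):
        raise ValueError("direction must be 'asc' or 'desc'")
    params = {
        "fl": "score,id",
        "q": query,
        "sort": f"{sort_func} {direction}",
    }
    # urlencode turns the space in the sort value into '+'.
    return base + "?" + urlencode(params)


print(sorted_query_url("DDR", "termfreq(text,memory)"))
```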

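We could also use the “norm” function to sort by the longest field first. This assumes there were no index-time boosts, so the norm is just the standard length normalization factor, which gets smaller as the field gets longer; sorting in ascending order therefore puts the longest fields first.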

http://localhost:8983/solr/select/?fl=score,id&q=DDR&sort=norm(text) asc
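
To see why ascending order surfaces the longest fields, here is a short Python sketch. It assumes Lucene's classic default length normalization, boost / sqrt(number of terms), and uses hypothetical field lengths. The norm actually stored in the index is compressed into a single byte, so fields of similar length may end up with the same stored value.

```python
import math


def length_norm(num_terms: int, boost: float = 1.0) -> float:
    """Index-time boost times 1/sqrt(num_terms)
    (assumes Lucene's classic default Similarity)."""
    return boost / math.sqrt(num_terms)


# Hypothetical documents and the number of terms in their text field.
lengths = {"short": 4, "medium": 25, "long": 100}

# Ascending by norm, as in sort=norm(text) asc.
ordered = sorted(lengths, key=lambda doc: length_norm(lengths[doc]))
print(ordered)  # ['long', 'medium', 'short']
```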

Given Solr’s plethora of function queries (including the new spatial queries that return distance between points), the possibilities are almost endless. To try this out, you’ll need a recent nightly build of Solr 4.0-dev, or LucidWorks Enterprise, our commercial version of Solr.

Published at DZone with permission of Yonik Seeley. See the original article here.
