Options to Tune Document’s Relevance in Solr

by Chris Smith · Jan. 16, 12 · Interview

When I was working at Lucid Imagination, a customer once asked me how they could modify the score of documents in Solr so that the most relevant results appear higher in the results list. While trying to answer the question, I realized that there are many different options, and that not all of them are easy to understand, so I decided to write some notes summarizing the most commonly used ones. After that, I was asked the same question many times, so I decided to turn those notes into a blog post.

There are two stages at which documents can be boosted: at index time and at query time.

Originally Authored by Tomás Fernández Löbbe


At Index Time

This is probably the simplest way, because there are not too many options. It is also the most static way of adding boosts, as changing the boost for a document requires re-indexing it.

When updating documents using the XMLUpdateRequestHandler, the way to boost a document is to add the optional attribute “boost” to the doc element. When using SolrJ, the way to do it is by using the method

document.setDocumentBoost(x)

The default boost for a document is 1, so setting a value between 0 and 1 lowers the document's weight instead of raising it.

It is also possible to add different boosts to different fields of a document. The only requirement here is that the boosted fields must store the norms (“omitNorms” attribute in the schema must be set to “false”). The way of applying the boosts when using the XMLUpdateRequestHandler is similar to boosting the whole document, but instead of adding the “boost” attribute to the doc element, add it to the field element. When using SolrJ:

document.addField("title", "Foo Bar", x);

It’s important to know that the boost (either for a document or for a field) will be considered when calculating the final score for a document given a search. It is not the final score of the document. Boosting documents is not the same as sorting documents.
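As a rough illustration of the XML form described above, the following sketch builds an update message with a "boost" attribute on both the doc element and one field element. It only assumes a Solr version that still accepts index-time boosts; the helper name build_add_message, the field names, and the boost values are made up for the example.

```python
import xml.etree.ElementTree as ET


def build_add_message(fields, doc_boost=None, field_boosts=None):
    """Build a Solr XML <add> message with optional index-time boosts.

    fields: mapping of field name -> text value (hypothetical schema).
    doc_boost: optional boost for the whole document.
    field_boosts: optional mapping of field name -> boost.
    """
    field_boosts = field_boosts or {}
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    if doc_boost is not None:
        doc.set("boost", str(doc_boost))
    for name, value in fields.items():
        field = ET.SubElement(doc, "field", name=name)
        if name in field_boosts:
            # Field-level boosts need norms (omitNorms="false") in the schema.
            field.set("boost", str(field_boosts[name]))
        field.text = value
    return ET.tostring(add, encoding="unicode")


message = build_add_message(
    {"id": "doc-1", "title": "Foo Bar"},
    doc_boost=2.5,
    field_boosts={"title": 2.0},
)
print(message)
```

The resulting string would be posted to the update handler; how it is sent (HTTP client, core name, commit strategy) is left out on purpose.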

At Query Time


Boosting at query time works a little differently from boosting at index time. It is much more dynamic, as it doesn’t require re-indexing and can be specified with every new request to Solr. Also, what gets boosted is not a document or a field, but a subquery of the search. The simplest way to achieve query time boosting is by appending the ^ character plus the boost number to a term in the query, for example:

foo^5 bar

Much more complex expressions can also be used for query time boosting, like:

title:(foo bar)^5 OR content:(foo bar)^2 OR foo OR bar

title:(foo bar)^5 OR title:"foo bar"^20 OR …

The syntax can be very simple for simple cases, but it will get more and more complex with more complex use cases.

The above syntax is Lucene’s query syntax. It is supported by the Lucene Query Parser and the Extended Dismax Query Parser, but not by the Dismax Query Parser.
However, this syntax requires either an expert user who knows how to use it, or some application logic that injects it in the background after the user enters the query and before it is sent to Solr. Dismax provides other alternatives for query time boosting that are as dynamic as the previous one, but with a much easier syntax (all of them are also supported by Extended Dismax).
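The "application logic" option could look roughly like the sketch below, which wraps the raw user terms in one boosted clause per field before the query is sent to Solr. The function name boost_user_query and the field boosts are hypothetical, and the sketch deliberately does not escape Lucene special characters in the user input, which a real application would have to do.

```python
def boost_user_query(user_query, field_boosts):
    """Expand raw user terms into a boosted Lucene-syntax query string.

    field_boosts: mapping of field name -> boost (illustrative values).
    Note: special characters in user_query are NOT escaped here.
    """
    terms = " ".join(user_query.split())  # normalize whitespace
    clauses = [
        f"{field}:({terms})^{boost}" for field, boost in field_boosts.items()
    ]
    return " OR ".join(clauses)


query = boost_user_query("foo   bar", {"title": 5, "content": 2})
print(query)  # title:(foo bar)^5 OR content:(foo bar)^2
```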

Query Time Boosting with the Dismax Query Parser


Boosting Fields

The Dismax Query Parser (QP) will create a query that will be executed on many different fields, even if the user hasn’t specified any. This is one of the most important improvements of the Dismax QP over the Lucene QP. But sometimes, not all the fields have the same importance. Sometimes, a hit on the title field is more important than a hit on the content field, or a hit on the content can be more important than a hit on the comments field. The Dismax Query Parser provides the ability to consider some fields more important than others with the “qf” (named after “query fields”) parameter, the same that is used for specifying the different fields on which to execute the user query. A common value for this parameter could be:

qf=title^5 content^2 comments^0.5

This will translate a user query like “foo bar” into something similar to:

title:(foo bar)^5 OR content:(foo bar)^2 OR comments:(foo bar)^0.5
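When these parameters travel in a request URL, the spaces and the ^ character have to be URL-encoded. A minimal sketch using only the Python standard library follows; the "/select" handler path is an assumption about the deployment, and the host and core name are omitted.

```python
from urllib.parse import urlencode, parse_qs

# Dismax request parameters; the qf value is the one used in the text.
params = {
    "defType": "dismax",
    "q": "foo bar",
    "qf": "title^5 content^2 comments^0.5",
}

query_string = urlencode(params)  # encodes spaces and '^' safely
print("/select?" + query_string)

# Decoding the query string gives back the original qf value.
decoded = parse_qs(query_string)
print(decoded["qf"][0])
```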


Boosting Phrases

In the same way as with query fields, the Dismax Query Parser will also execute the user query as a phrase query on the fields specified in the “pf” (phrase fields) parameter. As with the qf parameter, a different boost can be specified for each of the phrase fields:

pf=title^20 content^10

This will translate a user query like foo bar into:

title:"foo bar"^20 OR content:"foo bar"^10

This phrase query is used only to boost the documents that result from the original query.


Boost Queries

Sometimes it is necessary to boost some documents regardless of the user query. A typical example is boosting sponsored documents: the user searches for “car rental”, but the application has some sponsored documents that should be boosted. A good way of doing this is by using boost queries. A boost query is a query that is executed in the background along with the user query, and that boosts the documents that match it.

For this example, the boost query (specified by the “bq” parameter) would be something like:

bq=sponsored:true

The boost query won’t determine which documents are considered a hit and which are not; it will only influence the score of the results.
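The toy model below illustrates that distinction: the main query alone decides which documents are hits, and the boost query only adds to the score of hits that also match it. This is not Solr's real scoring; the documents, the term-counting score, and the constant bonus are all invented for the example.

```python
# Hypothetical mini-index for illustration only.
docs = [
    {"id": "a", "text": "cheap car rental", "sponsored": False},
    {"id": "b", "text": "car rental deals", "sponsored": True},
    {"id": "c", "text": "bike repair", "sponsored": True},
]


def main_score(doc, query):
    """Toy relevance: count of query terms present in the text."""
    words = doc["text"].split()
    return sum(1.0 for term in query.split() if term in words)


def search(query, bq_bonus=1.0):
    """Rank hits of the main query; sponsored hits get an additive bonus."""
    results = []
    for doc in docs:
        score = main_score(doc, query)
        if score == 0:
            continue  # not a hit: matching the boost query alone is not enough
        if doc["sponsored"]:
            score += bq_bonus  # the boost query only influences the score
        results.append((doc["id"], score))
    return sorted(results, key=lambda r: r[1], reverse=True)


print(search("car rental"))
```

Document "c" is sponsored but never appears in the results, because it does not match the user query.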


Boost Functions

Boost functions are very similar to boost queries; in fact, they can achieve the same goals. The difference is that a boost function is an arbitrary function instead of a query (see http://lucidworks.lucidimagination.com/display/solr/Function+Queries). A typical example is boosting documents that are more recent than others. Imagine a forum search application where the user is searching for forum entries with the text “foo bar”. The application should display all the entries that talk about “foo bar”, but the most recent entries are usually more important (most users will want to see up-to-date entries rather than historical ones). The boost function is executed in the background with each user query and adjusts the score of the matching documents.

For this example, a boost function (specified by the “bf” parameter) could be something like:

bf=recip(ms(NOW,publicationDate),3.16e-11,1,1)

As with boost queries, this function will not determine which documents are a hit and which are not; it will just add to their score.
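To get a feel for the numbers, recip(x,m,a,b) evaluates a/(m*x+b), and ms(NOW,publicationDate) is the document's age in milliseconds. The sketch below reimplements that arithmetic in plain Python for a few ages; the helper names and the chosen ages are illustrative, and the constants are the ones from the example above.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000


def recip(x, m, a, b):
    """Reciprocal function as defined for Solr function queries: a / (m*x + b)."""
    return a / (m * x + b)


def recency_boost(age_days):
    """Value of recip(ms(NOW,publicationDate),3.16e-11,1,1) for a given age."""
    return recip(age_days * MS_PER_DAY, 3.16e-11, 1, 1)


for age in (0, 30, 365, 3650):
    print(f"{age:>5} days old -> boost {recency_boost(age):.3f}")
```

With m = 3.16e-11, the product m*x is close to 1 for a one-year-old document, so the boost falls from 1.0 for a brand-new entry to roughly 0.5 after a year and keeps shrinking from there.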

A note on boost functions: boost functions can also be used with the Lucene QP by using the “_val_” special key inside the query.


Tie Breaker

The “tie” (tie breaker) parameter is very important, but not easy to understand. First, it is important to understand what a dismax query is (http://www.lucidimagination.com/blog/2010/05/23/whats-a-dismax/). With DisMax queries, each term of the user input is executed against several fields. If a term hits in more than one field of the same document, the highest-scoring hit is used. But what happens with the other subqueries that hit in that document for the term? That is what the “tie” parameter defines. DisMax calculates the score for a term query as:

score= [score of the top scoring subquery] + tie * (sum of other hitting subqueries)

Consequently, the “tie” parameter is a value between 0 and 1 that defines whether DisMax considers only the maximum hit score for a term (tie=0), the sum of all the hits for a term (tie=1), or something between those two extremes.
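The formula can be checked with a few lines of arithmetic. The sketch below applies it to one term that hits three fields of the same document; the per-field scores are invented for the example.

```python
def dismax_score(subquery_scores, tie=0.0):
    """Top subquery score plus tie times the sum of the other hitting ones."""
    hits = [s for s in subquery_scores if s > 0]
    if not hits:
        return 0.0
    top = max(hits)
    return top + tie * (sum(hits) - top)


# Hypothetical scores of one term in title, content and comments.
field_scores = [0.8, 0.3, 0.1]

for tie in (0.0, 0.1, 1.0):
    print(f"tie={tie}: {dismax_score(field_scores, tie):.2f}")
```

With tie=0 only the best field counts (0.80), with tie=1 all fields are summed (1.20), and a small value such as 0.1 mostly keeps the best field while letting the others break ties (0.84).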


The boost Parameter

The “boost” parameter is very similar to the “bf” parameter, but instead of adding the function's result to the score, it multiplies the score by it. This is only available in the “Extended Dismax Query Parser” or the “Lucid Query Parser”.
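The practical difference between adding and multiplying shows up when the relevance score and the function value live on different scales. The following sketch compares the two on made-up numbers; neither the scores nor the function values come from a real Solr instance.

```python
def additive(score, fn_value):
    """bf-style combination: the function value is added to the score."""
    return score + fn_value


def multiplicative(score, fn_value):
    """boost-style combination: the score is multiplied by the function value."""
    return score * fn_value


# Hypothetical documents: (relevance score, recency function value).
old_but_relevant = (8.0, 0.1)
new_but_weaker = (4.0, 1.0)

for name, combine in (("additive", additive), ("multiplicative", multiplicative)):
    old = combine(*old_but_relevant)
    new = combine(*new_but_weaker)
    winner = "old" if old > new else "new"
    print(f"{name}: old={old:.1f} new={new:.1f} -> {winner} document ranks first")
```

With these numbers the additive form barely moves the ranking, because the function value is small compared to the relevance score, while the multiplicative form scales each score proportionally and flips the order.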


A note on the parameters

All the above parameters can be specified when configuring Solr (in the solrconfig.xml file) but they can also be changed on each request just by sending the parameter on the request with the new value.
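The override behaviour can be pictured as a simple merge in which request parameters win over configured defaults. The sketch below models only that idea; the dictionary of defaults stands in for a handler's configuration in solrconfig.xml, and its values are invented.

```python
# Stand-in for defaults configured on a request handler (hypothetical values).
handler_defaults = {
    "defType": "dismax",
    "qf": "title^5 content^2",
    "tie": "0.1",
}


def effective_params(request_params):
    """Merge defaults with request parameters; the request takes precedence."""
    return {**handler_defaults, **request_params}


# A request that overrides only the tie breaker for this one search.
print(effective_params({"q": "foo bar", "tie": "0.5"}))
```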


Source: http://www.lucidimagination.com/blog/2011/12/14/options-to-tune-document%E2%80%99s-relevance-in-solr/
