An introduction to query time boosts in Solr

It is possible to boost certain documents at query time in Solr. Query time boosting can be a powerful resource for finding the most relevant and "best" content. Of course the more information you index, the more fields you will be able to use for your query time boosts. A useful application of query time boosting is giving a boost to the newest content.
 

Looking at an Example


Below is an example of a boost function that scores more recent content higher than older content. But first, for this example, we should define a schema.xml file:
<field name="id" type="number" indexed="true" stored="true" required="true" />
<field name="title" type="text_en" indexed="true" stored="true" />
<field name="submit_date" type="date" indexed="true" stored="true" />
<field name="rating" type="number" indexed="true" stored="true" />


I won't go into detail too much about this schema, but it refers to a document with an ID, a title, a submit date, and a rating, which will be relevant to other examples in this article.

The first example we will go over is the one given in the Solr Relevancy wiki:

{!boost b=recip(ms(NOW/HOUR,submit_date),3.16e-11,1,1)}


There are many things going on in this one boost query, so I'll break it down from the inside out:

  • NOW - The current time, handled as milliseconds since the Epoch (January 1, 1970, midnight UTC/GMT)
  • /HOUR - This operation rounds NOW down to the start of the current hour
  • submit_date - a field of the documents, in this case the submit date.
  • ms(NOW/HOUR,submit_date) - ms is a function. As explained in the FunctionQuery wiki page, ms returns the difference in milliseconds between its arguments. So in this case, it is the difference between NOW (rounded to the hour) and the submit date of the document.
  • recip(ms(NOW/HOUR,submit_date), 3.16e-11, 1, 1) - recip is another function that represents this mathematical operation: a/(m*x+b) where x, m, a, and b are the arguments of the recip function - recip(x,m,a,b)
  • {!boost b=...} - this is local-parameter syntax that selects the boost query parser. The function given in b is multiplied into the score of the query that follows.
Adding that boost to your query gives more recent documents a higher score. To see why, look at the mathematical operation of the recip function:

recip(x,m,a,b) = a/(m*x+b)

Here x is the difference in time between NOW and the submit_date, while a, m, and b are constants. So the smaller the difference between NOW and submit_date (or the closer NOW is to a document's submit_date), the higher the value that function returns. The higher the value, the higher the score Solr assigns the document. The constant 3.16e-11 is roughly 1 divided by the number of milliseconds in a year, so a document that is one year old receives a boost of about one half.
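To get a feel for the numbers, the recency boost can be reproduced outside Solr. The following is a minimal Python sketch, not Solr code: the helper names (recip, ms, recency_boost) and the fixed reference time are assumptions made for illustration, and only the formula a/(m*x+b) and the constants come from the example above.

```python
from datetime import datetime, timezone


def recip(x, m, a, b):
    """Plain-Python mirror of the formula a / (m*x + b)."""
    return a / (m * x + b)


def ms(later, earlier):
    """Difference between two datetimes in milliseconds."""
    return (later - earlier).total_seconds() * 1000.0


# Hypothetical fixed stand-in for NOW/HOUR so the numbers are reproducible.
NOW = datetime(2014, 11, 5, 0, 0, 0, tzinfo=timezone.utc)


def recency_boost(submit_date):
    """Boost value for a document, using the constants from the example."""
    return recip(ms(NOW, submit_date), 3.16e-11, 1, 1)


fresh = recency_boost(NOW)                                           # no age at all
one_year = recency_boost(datetime(2013, 11, 5, tzinfo=timezone.utc))  # one year old
old = recency_boost(datetime(2000, 4, 4, tzinfo=timezone.utc))        # over 14 years old

for label, value in (("fresh", fresh), ("one year", one_year), ("old", old)):
    print(f"{label:>8}: {value:.4f}")
```

A brand-new document gets a boost of 1, a one-year-old document gets roughly half of that, and a document from the year 2000 gets only a small fraction. The boost never reaches zero, so old documents are demoted rather than excluded.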
 

The Solr Admin Interface and running queries


If you are not already familiar with the Solr Admin interface, it is really useful for testing queries. Figure 1 shows the query interface.
Figure 1: Solr Admin Interface for querying a collection
The interface is full of useful tools and information, but for the purposes of this article, the useful fields to make note of are the q and fl inputs. q is where your query goes and fl is a comma separated list of the fields you want to view in the output. An important field to put in the fl input is score. This field is useful for determining the impact of your query.

For my index, if I run a query to find all documents with 'test' in the title and I want to see the submit_date, title, and score of each document, I would set q to:
 
(title:test)


and fl to:
 
submit_date,title,score


You will get results like this:
 
<result name="response" numFound="xxxx" start="0" maxScore="6.8792486">
  <doc>
    <date name="submit_date">2000-04-04T18:31:01Z</date>
    <str name="title">testing</str>
    <float name="score">6.8792486</float>
  </doc>
  <doc>
    <date name="submit_date">2000-03-24T23:37:50Z</date>
    <str name="title">testing</str>
    <float name="score">6.8792486</float>
  </doc>
  <doc>
  .
  .
  .


As you can see, the top results that Solr is returning are from the year 2000. If I add the boost from the example above, q becomes:
 
{!boost b=recip(ms(NOW/HOUR,submit_date),3.16e-11,1,1)}(title:test)


and that query returns:
 
<result name="response" numFound="xxxxx" start="0" maxScore="6.8649974">
  <doc>
    <date name="submit_date">2014-11-04T23:45:05Z</date>
    <str name="title">test</str>
    <float name="score">6.8649974</float>
  </doc>
  <doc>
    <date name="submit_date">2014-10-31T17:20:54Z</date>
    <str name="title">TEST</str>
    <float name="score">6.7861075</float>
  </doc>
  <doc>
  .
  .
  .


These results are a lot more recent, which could be useful in cases where you would want to give newer content a higher boost than older content.
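Outside the Admin interface, the same q and fl values are sent as request parameters. Below is a minimal Python sketch of how the boosted request URL could be assembled. The host, port, and collection name are hypothetical placeholders, and the helper build_boosted_query_url is made up for this illustration. The sketch only builds the URL and does not contact a server.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Hypothetical location of the select handler; adjust to your installation.
SOLR_SELECT_URL = "http://localhost:8983/solr/collection1/select"


def build_boosted_query_url(user_query, boost_function, fields):
    """Wrap user_query in {!boost b=...} and URL-encode the parameters."""
    q = "{!boost b=%s}%s" % (boost_function, user_query)
    params = {"q": q, "fl": ",".join(fields), "wt": "xml"}
    return SOLR_SELECT_URL + "?" + urlencode(params)


url = build_boosted_query_url(
    "(title:test)",
    "recip(ms(NOW/HOUR,submit_date),3.16e-11,1,1)",
    ["submit_date", "title", "score"],
)
print(url)

# Decode the query string again to confirm q and fl survived the encoding.
decoded = parse_qs(urlsplit(url).query)
print(decoded["q"][0])
print(decoded["fl"][0])
```

Encoding matters here because the boost syntax contains braces, parentheses, and commas. Letting urlencode handle them avoids hand-escaping mistakes.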

Setting q to {!boost b=rating}(title:test) will return results with test in the title and give a boost to documents with a higher rating value. You can also use multiple boosts, so if you wanted to boost by rating and the submit_date you could set q to:
 
{!boost b=rating}{!boost b=recip(ms(NOW/HOUR,submit_date),3.16e-11,1,1)}(title:test)
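Because each boost multiplies the score of the query it wraps, stacked boosts combine as a product. The following Python sketch illustrates that arithmetic only. The base scores, ratings, and recency values are made-up numbers, and boosted_score is a hypothetical helper, not part of Solr.

```python
def boosted_score(base_score, *boost_values):
    """Multiply a base relevancy score by each boost value in turn."""
    score = base_score
    for value in boost_values:
        score *= value
    return score


# Hypothetical documents: (title, base score, rating, recency boost)
docs = [
    ("recent, rated 4", 6.88, 4, 0.90),
    ("old, rated 5", 6.88, 5, 0.06),
    ("recent, rated 0", 6.88, 0, 0.95),
]

ranked = sorted(
    ((title, boosted_score(base, rating, recency)) for title, base, rating, recency in docs),
    key=lambda pair: pair[1],
    reverse=True,
)

for title, score in ranked:
    print(f"{title:>16}: {score:.3f}")
```

One consequence of multiplying is that a boost value of 0 wipes out the score entirely, so a document with a rating of 0 falls to the bottom no matter how well it matches or how recent it is.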



Tradeoffs



Performance

Evaluating one or more math functions for every matching document in a large index adds time to query execution. Complex boosting operations might not be viable for real-time applications, but they can still be useful for applications where speed is not a necessity.
 

Pitfalls

When applying functions to fields, you have to be careful not to give "bad" documents an infinite score. For example, suppose you added the following boost, which divides the number 1 by the rating of the document:
 
{!boost b=div(1,rating)}


the score will be infinite if the rating of a particular document is 0 (zero). So when you test your query in the Solr Admin interface, you might get something that looks like this:
 
<result name="response" numFound="xxxx" start="0" maxScore="Infinity">
  <doc>
    <str name="title">Words</str>
    <int name="rating">0</int>
    <float name="score">Infinity</float>
  </doc>
  <doc>
    <str name="title">Different Words</str>
    <int name="rating">0</int>
    <float name="score">Infinity</float>
  </doc>
  <doc>
  .
  .
  .


When the score of one or more documents is infinite, you can no longer control the order in which those documents are returned. A useful function in this case is the scale function:
 
scale(x,minTarget,maxTarget)


where the return value of the function is between minTarget and maxTarget depending on the relative value of x to other documents, so our original boost would be changed to:
 
{!boost b=div(1,scale(rating,1,2))}
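The effect of scaling can be checked with a few lines of Python. This is a sketch of min-max scaling as described above, assuming the result is computed relative to the smallest and largest value across the documents. It is not Solr's implementation, and the ratings are made-up sample data.

```python
def scale(values, min_target, max_target):
    """Map values linearly so the smallest becomes min_target and the largest max_target."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical; this sketch maps them all to min_target.
        return [float(min_target)] * len(values)
    factor = (max_target - min_target) / (hi - lo)
    return [(v - lo) * factor + min_target for v in values]


ratings = [0, 1, 3, 5]          # hypothetical ratings, including the problematic 0
scaled = scale(ratings, 1, 2)   # every value now lies between 1 and 2
boosts = [1 / s for s in scaled]  # div(1, ...) can no longer divide by zero

for rating, s, b in zip(ratings, scaled, boosts):
    print(f"rating={rating}  scaled={s:.2f}  boost={b:.3f}")
```

Since the scaled value is never below 1, the boost stays between 0.5 and 1 and every document keeps a finite score.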



This article demonstrates only a small part of what can be accomplished with query time boosting. Query time boosting can be a useful tool for finding the newest and highest rated content. Depending on your index and the function queries used, boosting may increase query time. You can boost on any field in your index, and with function queries you can use functions of a field to affect the relevancy score.