Boosting

Published on November 12, 2016 by Bo Andersen

This article explains how to boost terms and query clauses when searching in Elasticsearch.

When searching for multiple terms, it is sometimes useful to assign a higher or lower priority to certain terms. Elasticsearch lets you do this by appending the boost operator (^) and a positive floating point number to a term. Below is an example query.

GET /ecommerce/product/_search?q=name:(pasta spaghetti^2.0)

As you can see in this query, I am boosting the term spaghetti by appending the boost operator (^) followed by a floating point number. The parentheses apply the name field to both terms; without them, spaghetti would be searched in the default field instead. The default boost value is 1, so any number greater than 1 increases the importance of a term, and any value between 0 and 1 decreases it. In this case, I am increasing the importance of the term spaghetti, so documents that contain this term within their name field get a boost in their relevancy scores.

It is also possible to add a boost to a phrase. So let’s change the query to search for the phrase "pasta spaghetti" rather than two separate terms.

GET /ecommerce/product/_search?q=name:"pasta spaghetti"^2.0

As you can see, the boost is simply specified at the end of the phrase, after the last quotation mark. Documents containing this phrase within their name field will have a boosted relevancy score. Of course this makes more sense if you are searching more fields or applying some other constraints to the documents, but the purpose of this query is just to show you how to boost a phrase.
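
When the search is issued from application code rather than from a console, the boost operator is simply part of the q string. The following is a minimal Python sketch of that idea using only the standard library. The helper names boosted and search_path are hypothetical, and the sketch assumes the same ecommerce index and product type as the queries above. It only builds the request path and does not contact a cluster.

```python
from urllib.parse import quote


def boosted(text: str, boost: float) -> str:
    """Append the ^ boost operator to a term, or to a quoted phrase."""
    # The article describes boosts as positive numbers, so reject anything else.
    if boost <= 0:
        raise ValueError("boost must be a positive number")
    # Text containing a space is treated as a phrase and wrapped in quotes.
    if " " in text:
        text = f'"{text}"'
    return f"{text}^{boost}"


def search_path(field: str, expression: str) -> str:
    """Build the request path for a query string search on one field."""
    # Percent-encode the query so that spaces, quotes and ^ survive in a URL.
    return "/ecommerce/product/_search?q=" + quote(f"{field}:{expression}")


# Parentheses group both terms under the name field.
term_query = search_path("name", "(pasta " + boosted("spaghetti", 2.0) + ")")
phrase_query = search_path("name", boosted("pasta spaghetti", 2.0))

print(term_query)
print(phrase_query)
```

Keeping the boost formatting in one small function means the operator and the quoting rule for phrases are written down in a single place.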

Now that you know how to boost terms and phrases with query string searches, let’s take a look at how to do this with the query DSL. I will just copy in an example query because it’s something that you have all seen before. The query is a bool query that requires documents to contain the term pasta and gives a higher score to documents that also contain the term spaghetti or noodle.

GET /ecommerce/product/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "name": "pasta" } }
      ],
      "should": [
        {
          "match": {
            "name": {
              "query": "spaghetti"
            }
          }
        },
        {
          "match": {
            "name": {
              "query": "noodle"
            }
          }
        }
      ]
    }
  }
}

At the moment, the query doesn’t contain boosts for any terms. Within the field object nested inside a query (in this case a match query), you can add a boost property with a positive floating point number as its value. I will just go ahead and add that to the query.

GET /ecommerce/product/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "name": "pasta" } }
      ],
      "should": [
        {
          "match": {
            "name": {
              "query": "spaghetti",
              "boost": 2.0
            }
          }
        },
        {
          "match": {
            "name": {
              "query": "noodle",
              "boost": 1.5
            }
          }
        }
      ]
    }
  }
}

By default, a document that satisfies the match queries in the bool query’s should clause gets a higher relevance score. In this query, however, I explicitly specify a boost value for each match query, which controls how much each query raises the relevance score when it is satisfied. Note that the boost value is not linear: a boost of 2 will not make a document’s score twice as high. How the score is calculated and how a boost value affects it are beyond the scope of this article; just know that the higher the boost, the higher the score will be.

In this example, I have specified that the term spaghetti is more important than the term noodle. If you run this query and inspect the results, you will see that all of the matches contain the term pasta and that the matches containing the term spaghetti have a higher score. After these documents comes a document that includes the term noodle. This follows our boosts exactly, which specify that spaghetti is more important than noodle.
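
If the boosts are tuned in application code, the should clauses can be generated from a mapping of term to boost. Here is a minimal Python sketch of that, assuming the name field from the query above. The function names boosted_match and pasta_query are hypothetical and not part of any Elasticsearch client. The sketch only builds the request body, which could then be sent with whatever HTTP or Elasticsearch client you use.

```python
import json


def boosted_match(field: str, text: str, boost: float = 1.0) -> dict:
    """Build a match query clause that carries an explicit boost."""
    # The article describes boosts as positive numbers, so reject anything else.
    if boost <= 0:
        raise ValueError("boost must be a positive number")
    return {"match": {field: {"query": text, "boost": boost}}}


def pasta_query(optional_boosts: dict) -> dict:
    """Require 'pasta' in name and add one boosted should clause per term."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {"name": "pasta"}}],
                "should": [
                    boosted_match("name", term, boost)
                    for term, boost in optional_boosts.items()
                ],
            }
        }
    }


body = pasta_query({"spaghetti": 2.0, "noodle": 1.5})
print(json.dumps(body, indent=2))
```

Holding the boosts in a single mapping makes it easy to try different values without editing the query structure by hand.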

Boosts can be added in the same way for other query types, such as the match_phrase query. As you have seen, boosting is very useful for specifying which terms, phrases or queries are more or less important, enabling you to easily manipulate the prioritization of the search results. There are other ways of doing this if you need complete control, but this is by far the easiest way, and all you need to know about for now.
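
As a sketch of the match_phrase case mentioned above, the boost property sits next to query inside the field object, just as it does for match. The Python below only builds the request body. The helper name boosted_phrase is hypothetical, and the phrase and boost value are illustrative.

```python
import json


def boosted_phrase(field: str, phrase: str, boost: float) -> dict:
    """Build a match_phrase clause with a boost, mirroring the match form."""
    return {"match_phrase": {field: {"query": phrase, "boost": boost}}}


body = {
    "query": {
        "bool": {
            "must": [{"match": {"name": "pasta"}}],
            # Optional clause: documents containing the phrase score higher.
            "should": [boosted_phrase("name", "pasta spaghetti", 2.0)],
        }
    }
}
print(json.dumps(body, indent=2))
```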

Bo Andersen

About the Author

I am a back-end web developer with a passion for open source technologies. I have been a PHP developer for many years, and also have experience with Java and Spring Framework. I currently work full time as a lead developer. Apart from that, I also spend time on making online courses, so be sure to check those out!

One comment on »Boosting«

  1. Ed Webb

    Hi Bo, Thanks for the article. In your example, the set of results when you search for just “pasta” could contain results which do not have either “spaghetti” or “noodle”. These results will not appear in your boosting query since it must match at least one of spaghetti or noodle. I have a situation where I want to boost results of a query based on the values of a different field, but I want the results still to contain the documents where this field does not match – just with a lower relevance score. Is there a way to do this?
