Boosting

This entry is part 30 of 35 in the series Complete Guide to Elasticsearch

This article explains how to boost terms and query clauses when searching in Elasticsearch.

When searching for multiple terms, it is sometimes useful to be able to assign a higher or lower priority to certain terms. Elasticsearch lets you do this by attaching a boost, a positive floating point number, to a term. Below is an example query.

GET /ecommerce/product/_search?q=name:(pasta spaghetti^2.0)

As you can see in this query, I am boosting the term spaghetti by using the boost operator (^) followed by a floating point number. The parentheses group both terms under the name field; without them, only pasta would be matched against the name field, and spaghetti would be searched for in the default fields. The default boost value is 1, so any number greater than 1 will increase the importance of a term, and any value between 0 and 1 will decrease it. In this case, I am increasing the importance of the term spaghetti so that documents that contain this term within their name field will get a boost in their relevancy scores.

It is also possible to add a boost to a phrase. So let’s change the query to search for the phrase pasta spaghetti rather than for two separate terms.

GET /ecommerce/product/_search?q=name:"pasta spaghetti"^2.0

As you can see, the boost is simply specified at the end of the phrase, after the last quotation mark. Documents containing this phrase within their name field will have a boosted relevancy score. Of course this makes more sense if you are searching more fields or applying some other constraints to the documents, but the purpose of this query is just to show you how to boost a phrase.
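If you build these query strings in application code, the boost operator can be appended by a small helper. The following is only a sketch in Python: the function names boosted and name_query are hypothetical and not part of any Elasticsearch client. It assumes that terms are grouped in parentheses so that all of them apply to the name field, and that the string is URL-encoded before it is placed in the q parameter.

```python
from urllib.parse import quote


def boosted(text: str, boost: float) -> str:
    """Append the ^ boost operator to a term or phrase.

    Text that contains whitespace is treated as a phrase and wrapped
    in double quotes, so the boost applies to the phrase as a whole.
    """
    if boost < 0:
        raise ValueError("boost must not be negative")
    if any(ch.isspace() for ch in text):
        text = f'"{text}"'
    return f"{text}^{boost}"


def name_query(*parts: str) -> str:
    """Group the parts so that every one of them targets the name field."""
    return "name:(" + " ".join(parts) + ")"


# Two terms, with spaghetti boosted.
term_q = name_query("pasta", boosted("spaghetti", 2.0))

# A single boosted phrase.
phrase_q = name_query(boosted("pasta spaghetti", 2.0))

# Encode characters such as spaces, quotes and ^ before sending the request.
term_url = "/ecommerce/product/_search?q=" + quote(term_q)

print(term_q)
print(phrase_q)
print(term_url)
```

Treating any text with whitespace as a phrase is a simplification made for this example; real input may need escaping of reserved query string characters as well.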

Now that you know how to boost terms and phrases with query string searches, let’s take a look at how to do this with the query DSL. I will just copy in an example query because it’s something that you have all seen before. The query is a bool query that requires documents to contain the term pasta and boosts documents that contain the terms spaghetti and noodle.

GET /ecommerce/product/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "name": "pasta" } }
      ],
      "should": [
        {
          "match": {
            "name": {
              "query": "spaghetti"
            }
          }
        },
        {
          "match": {
            "name": {
              "query": "noodle"
            }
          }
        }
      ]
    }
  }
}

At the moment, the query doesn’t contain boosts for any terms. Within a field object that is nested within a query object (in this case a match query), one can add a boost property with a positive floating point number as its value. I will just go ahead and add that to the query.

GET /ecommerce/product/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "name": "pasta" } }
      ],
      "should": [
        {
          "match": {
            "name": {
              "query": "spaghetti",
              "boost": 2.0
            }
          }
        },
        {
          "match": {
            "name": {
              "query": "noodle",
              "boost": 1.5
            }
          }
        }
      ]
    }
  }
}
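
The same request body can also be assembled in code rather than written by hand. Below is a minimal sketch in Python that only builds the JSON; boosted_match is a hypothetical helper name, and sending the body to the cluster is left out because it depends on the client you use.

```python
import json


def boosted_match(field: str, text: str, boost: float = 1.0) -> dict:
    """Build a match clause with an explicit boost for the given field."""
    return {"match": {field: {"query": text, "boost": boost}}}


query = {
    "query": {
        "bool": {
            # Documents must contain the term pasta.
            "must": [{"match": {"name": "pasta"}}],
            # Optional clauses that raise the score when they match.
            "should": [
                boosted_match("name", "spaghetti", 2.0),
                boosted_match("name", "noodle", 1.5),
            ],
        }
    }
}

# Serialize to the JSON body of GET /ecommerce/product/_search.
body = json.dumps(query, indent=2)
print(body)
```

Keeping the boost values as function arguments makes it easy to experiment with different weights without editing the JSON itself.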

By default, if the match queries within the bool query’s should clause are satisfied, then a given document will get a higher relevance score. In this query, however, I explicitly specify a boost value for each match query, meaning that I am specifying how much each of the queries will boost the relevance score if satisfied. Note that the effect of a boost is not linear, meaning that a boost of 2 will not result in a document’s final score being twice as high. The inner workings of how the score is calculated and how a boost value affects it are beyond the scope of this article, but just know that the higher the boost, the higher the score will be.

In this example, I have specified that the term spaghetti is more important than the term noodle. If I run this query and inspect the results, you will see that all of the matches contain the term pasta and that the matches that contain the term spaghetti have the highest scores. After these documents, we find a document that includes the term noodle. This is exactly what our boosts ask for, since they specify that spaghetti is more important than noodle.
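
To get a feeling for why doubling a boost does not double a document’s score, here is a deliberately simplified toy model in Python. It is not Elasticsearch’s actual scoring formula: it assumes that a document’s score is just the sum of the scores of the clauses it matches, and that a boost multiplies only its own clause’s contribution. The clause scores are made-up numbers.

```python
def total_score(clause_scores: dict, boosts: dict) -> float:
    """Toy model: sum each clause score multiplied by its boost (default 1.0)."""
    return sum(
        score * boosts.get(clause, 1.0)
        for clause, score in clause_scores.items()
    )


# Made-up scores for a document that matches both pasta and spaghetti.
clause_scores = {"pasta": 1.5, "spaghetti": 0.5}

unboosted = total_score(clause_scores, {})
boosted = total_score(clause_scores, {"spaghetti": 2.0})

# The spaghetti contribution doubles, but the total grows by less than 2x.
ratio = boosted / unboosted
print(unboosted, boosted, ratio)
```

Even in this simplified model, only the boosted clause’s share of the score changes, so the overall score rises by a smaller factor than the boost itself.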

Boosts can be added in the same way for other query types, such as the match_phrase query. As you have seen, boosting is very useful for specifying which terms, phrases or queries are more or less important, enabling you to easily manipulate the prioritization of the search results. There are other ways of doing this if you need complete control, but this is by far the easiest way, and all you need to know about for now.
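
As an illustration of the previous paragraph, the sketch below builds a bool query in Python whose should clause uses a match_phrase query with a boost above 1 and a match query with a boost below 1, which lowers that clause’s importance. The helper name boosted_clause and the chosen values are assumptions made for this example only.

```python
import json


def boosted_clause(query_type: str, field: str, text: str, boost: float) -> dict:
    """Build a boosted clause for a query type such as match or match_phrase."""
    if boost < 0:
        raise ValueError("boost must not be negative")
    return {query_type: {field: {"query": text, "boost": boost}}}


query = {
    "query": {
        "bool": {
            "must": [{"match": {"name": "pasta"}}],
            "should": [
                # The exact phrase is considered more important.
                boosted_clause("match_phrase", "name", "pasta spaghetti", 2.0),
                # A boost between 0 and 1 makes this clause less important.
                boosted_clause("match", "name", "noodle", 0.5),
            ],
        }
    }
}

print(json.dumps(query, indent=2))
```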

