Searching with Query DSL: Compound Queries

Published on November 12, 2016 by Bo Andersen

Now that I have shown you how to use full text and term level queries with the query DSL, it’s time to take a look at compound queries. Remember that compound queries consist of leaf queries or other compound queries.

To start off, I will perform a search that uses simple boolean logic. The example searches for products with both “pasta” and “spaghetti” in the name.

GET /ecommerce/product/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "name": "pasta" } },
        { "match": { "name": "spaghetti" } }
      ]
    }
  }
}

The conditions specified by the leaf queries within the must array must all be true for a document to match, like the logical AND. The matches within the must clause contribute to documents’ relevancy scores.

While must clauses define conditions that must be satisfied, must_not clauses define conditions that must not be true. This can, for instance, be used to exclude documents with particular values for particular fields. I will show you an example that matches documents with “pasta” in their names, but where “spaghetti” must not be part of the name.

GET /ecommerce/product/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "name": "pasta" } }
      ],
      "must_not": [
        { "match": { "name": "spaghetti" } }
      ]
    }
  }
}

Scrolling through the results will show that the product with both “pasta” and “spaghetti” in its name has been excluded.

The bool query also has a should parameter. When a should clause matches, the document’s relevancy score is increased; when it does not match, it has no effect. Should clauses can therefore be used to rank documents higher when they satisfy the search criteria and also match some optional condition.

There is one exception to this. If the bool query has no must (or filter) clause, then at least one of the should clauses must match. In this case, the should clauses behave like the logical OR, where at least one must be true. If at least one must clause is present, then no should clauses are required to match, and they only affect the relevancy scores of matched documents.

Let me show you an example of how the should clause can be used.

GET /ecommerce/product/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "name": "pasta" } }
      ],
      "should": [
        { "match": { "name": "spaghetti" } }
      ]
    }
  }
}

In this example, product names must contain the term “pasta”. I have also added a match query to the should clause, which looks for the term “spaghetti” in the product name. If I run this query, you can see that products with “pasta” in their name are matched, but you may also notice that a product with both “pasta” and “spaghetti” has the highest relevance score. Let me explain why. Because we have a must clause, the should clause is optional and no match is required. However, documents for which the should clause matches are assigned a higher relevance score. In this example, that is the case for the product that includes both “pasta” and “spaghetti” in its name.

Now let’s see what happens if I remove the must clause.

GET /ecommerce/product/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "name": "spaghetti" } }
      ]
    }
  }
}

Because we no longer have a must clause, at least one should clause must match. In my case I only have one, so the match query must match. If I run the query and inspect the results, you can see that only the product that includes the term “spaghetti” in its name is matched.
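To make the OR behavior easier to see, here is a small sketch of the same kind of query with two should clauses and no must clause. It assumes the same ecommerce index as the examples above. A product should match if its name contains either term, and a product containing both terms would typically receive a higher score.

```json
GET /ecommerce/product/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "name": "pasta" } },
        { "match": { "name": "spaghetti" } }
      ]
    }
  }
}
```

If you need should clauses to be required even when a must clause is present, the bool query also accepts a minimum_should_match parameter, which sets how many should clauses have to match.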

Now that we have seen how must, must_not and should clauses can be nested within bool queries, it’s also worth mentioning that for even more flexibility and more complex queries, it’s possible to nest bool queries inside any of these clauses.
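As a rough sketch of such nesting, the query below requires “pasta” in the name and, through a second bool query placed inside the must clause, at least one of “spaghetti” or “penne”. The term “penne” is only an assumption for the sake of illustration and may not exist in your data. Because the nested bool query has no must clause of its own, at least one of its should clauses has to match.

```json
GET /ecommerce/product/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "name": "pasta" } },
        {
          "bool": {
            "should": [
              { "match": { "name": "spaghetti" } },
              { "match": { "name": "penne" } }
            ]
          }
        }
      ]
    }
  }
}
```

In boolean terms, this reads as pasta AND (spaghetti OR penne).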

There are two more queries that I would like to mention. The first one is the function_score query, which lets you provide a function that modifies the score of documents that were returned by a query. This can be useful for boosting newly added documents or boosting documents based on popularity. There is also a query named boosting, which can be used to reduce the score of documents that match a certain query. It takes a positive query, which is your typical query, and a negative query. Documents that match the negative query have their score lowered, and you can define by how much with the negative_boost parameter.
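To give an idea of what these two queries look like, here are two minimal sketches. The first uses function_score with a field_value_factor function to favor popular products. Note that the popularity field is hypothetical; the example index in this post may not contain such a field, so treat it as a placeholder for a numeric field in your own mapping.

```json
GET /ecommerce/product/_search
{
  "query": {
    "function_score": {
      "query": { "match": { "name": "pasta" } },
      "field_value_factor": {
        "field": "popularity",
        "modifier": "log1p",
        "missing": 1
      },
      "boost_mode": "sum"
    }
  }
}
```

The second sketch uses the boosting query. It matches products with “pasta” in the name, but multiplies the score of those that also contain “spaghetti” by the negative_boost value. The value 0.2 is an arbitrary choice for illustration.

```json
GET /ecommerce/product/_search
{
  "query": {
    "boosting": {
      "positive": { "match": { "name": "pasta" } },
      "negative": { "match": { "name": "spaghetti" } },
      "negative_boost": 0.2
    }
  }
}
```

Unlike must_not, the negative query does not exclude documents. They are still returned, just ranked lower.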

There are other compound queries as well, but these were the ones that you are most likely to come across.

Bo Andersen

About the Author

I am a back-end web developer with a passion for open source technologies. I have been a PHP developer for many years, and also have experience with Java and Spring Framework. I currently work full time as a lead developer. Apart from that, I also spend time on making online courses, so be sure to check those out!
