Introduction to Searching
Before we get into actually performing searches against an Elasticsearch cluster, I want to introduce the basic concepts of searching in Elasticsearch.
In this post, I will talk about relevancy and scoring in Elasticsearch. I will also briefly introduce the two ways of searching, as well as the various types of queries.
Relevancy & Scoring
Elasticsearch calculates a score for each document that matches a given query and ranks the documents according to that score. The higher the score, the more relevant the document is to the search query. When searching, there are two different contexts in which queries can be applied: query context and filter context. Queries within the query context answer the question “how well does the document match?”, assuming that it does match. Queries applied in the filter context, on the other hand, answer the question “does the document match?” If a document does not match a filter, then it is discarded and will not be part of the results. It is important to note that filters do not affect the scores of matching documents; they only remove documents that do not satisfy the filters' requirements.
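To make the two contexts more concrete, here is a minimal Python sketch that builds a request body with a bool query, whose must clause runs in query context and whose filter clause runs in filter context. The in_stock field and the function name are made up for illustration; only the name field comes from the examples in this post.

```python
import json


def build_scored_and_filtered_query(text, in_stock_min):
    """Build a bool query that scores on `text` and filters on stock level."""
    return {
        "query": {
            "bool": {
                # Query context: "how well does the document match?"
                "must": [{"match": {"name": text}}],
                # Filter context: "does the document match?" (no effect on the score)
                # `in_stock` is a hypothetical numeric field.
                "filter": [{"range": {"in_stock": {"gte": in_stock_min}}}],
            }
        }
    }


body = build_scored_and_filtered_query("pasta", 1)
print(json.dumps(body, indent=2))
```

Documents that fail the range filter are dropped, while the remaining ones are ranked only by how well their names match the search text.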
Ways of Searching
Now we will briefly discuss the two ways of searching with Elasticsearch, namely with a query string and with the query DSL.
Query String
The first way that you can search using Elasticsearch is with query strings. This method uses only a URL to perform searches, and the search query is defined by using a q query parameter. It is usually used for simple queries and ad-hoc queries on the command line, but it also supports rather advanced queries. Below is an example of what such a query could look like.
GET http://localhost:9200/ecommerce/product/_search?q=pasta
The URL specifies the endpoint of the Elasticsearch cluster and searches for documents of the product mapping type within the ecommerce index. The _search API is used to perform the search, and the search query is specified as a query parameter. All fields are searched by default. This HTTP GET request is all it takes to perform a search in Elasticsearch.
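If you build such URLs from code rather than typing them by hand, the search text should be URL-encoded. The following is a small Python sketch of that; the function name is my own, and the host, index and mapping type are simply the ones from the example above.

```python
from urllib.parse import urlencode


def build_search_url(base_url, index, mapping_type, query_text):
    """Return a query string search URL with the search text safely encoded."""
    # urlencode escapes spaces and special characters in the q parameter.
    query_string = urlencode({"q": query_text})
    return f"{base_url}/{index}/{mapping_type}/_search?{query_string}"


url = build_search_url("http://localhost:9200", "ecommerce", "product", "pasta")
print(url)  # http://localhost:9200/ecommerce/product/_search?q=pasta
```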
Query DSL
The other way is using the so-called query DSL. With this approach, a query is defined in JSON within the request body of a request, rather than in the URL. The query DSL is more flexible and supports more features than the query string approach, and is therefore often used for more advanced queries. For more complex queries, the JSON can also be easier to read than a long query string. Below, you can see a query that is similar to the one you saw a moment ago.
GET http://localhost:9200/ecommerce/product/_search
{
  "query": {
    "match": {
      "name": "pasta"
    }
  }
}
The only difference is that with this query, I explicitly define which field I am searching, but I could also specify the _all field if I wanted to. In this case, I am searching for products that contain the term pasta in their names. As you can see, the q parameter has been replaced with JSON in the request body.
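As a rough sketch of how the same request could be prepared from Python using only the standard library, the snippet below builds the request object without sending it. The function name is hypothetical, and actually sending the request (for example with urllib.request.urlopen) assumes that a cluster is listening on localhost:9200.

```python
import json
from urllib.request import Request


def build_match_request(base_url, index, mapping_type, field, text):
    """Prepare a query DSL search request with a match query in the body."""
    body = {"query": {"match": {field: text}}}
    return Request(
        url=f"{base_url}/{index}/{mapping_type}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        # Set explicitly, because urllib defaults to POST when a body is present.
        method="GET",
    )


request = build_match_request(
    "http://localhost:9200", "ecommerce", "product", "name", "pasta"
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```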
Types of Queries
Now that I have introduced you to how you can perform searches with Elasticsearch, I will move on to talking about the various types or categories of queries.
Leaf & Compound Queries
There are two main types of queries in Elasticsearch: leaf queries and compound queries. Leaf queries look for particular values within particular fields. This could be pasta within a product’s name field, as in the previous examples. These queries can be used by themselves, without being part of a compound query. They can also be used within compound queries to construct more advanced queries. Compound queries, in turn, are queries that wrap leaf clauses or even other compound query clauses. They are used to combine multiple queries, usually using boolean logic, and can also be used to alter the behavior of queries.
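The wrapping described above can be sketched in Python as follows. The helper functions are my own, and the description field is a hypothetical field used only for illustration. A bool query (compound) wraps another bool query, which in turn wraps two match queries (leaf).

```python
import json


def match(field, text):
    """Leaf query: look for `text` within a single field."""
    return {"match": {field: text}}


def bool_query(must=(), should=(), must_not=()):
    """Compound query: combine other queries using boolean logic."""
    clauses = {"must": list(must), "should": list(should), "must_not": list(must_not)}
    # Leave out clause types that were not used.
    return {"bool": {name: queries for name, queries in clauses.items() if queries}}


# A compound query wrapping two leaf queries.
pasta_or_spaghetti = bool_query(
    should=[match("name", "pasta"), match("name", "spaghetti")]
)

# A compound query wrapping another compound query and a leaf query.
query = {
    "query": bool_query(
        must=[pasta_or_spaghetti],
        must_not=[match("description", "gluten")],  # hypothetical field
    )
}
print(json.dumps(query, indent=2))
```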
Full Text
Full text queries are used for full text searches, meaning looking for a term in potentially a lot of text. The examples in this post were all full text searches. When adding documents or modifying values of documents, the values for full text fields are analyzed. Depending on the analyzer that is used, this process can involve tokenizing the text, lowercasing the tokens and removing stop words. For searches to match, a field’s analyzer is also applied to the query string before executing a search query.
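To illustrate why the same analysis has to be applied both when indexing and when searching, here is a deliberately simplified toy analyzer in Python. It is not how Elasticsearch's analyzers are implemented, and the stop word list is a tiny made-up sample; it only mimics the three steps mentioned above.

```python
import re

# A tiny illustrative stop word list, not the list Elasticsearch uses.
STOP_WORDS = {"a", "an", "and", "the", "of", "with"}


def toy_analyze(text):
    """Tokenize on word characters, lowercase the tokens and drop stop words."""
    tokens = re.findall(r"\w+", text)
    lowered = [token.lower() for token in tokens]
    return [token for token in lowered if token not in STOP_WORDS]


# Analysis at index time and at search time produces comparable terms.
indexed_terms = toy_analyze("The Best Pasta with Tomato Sauce")
query_terms = toy_analyze("PASTA")

print(indexed_terms)  # ['best', 'pasta', 'tomato', 'sauce']
print(all(term in indexed_terms for term in query_terms))  # True
```

Without analyzing the query string, PASTA would not match the lowercased term pasta that was stored for the document.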
Term Level
Term level queries are used for matching exact values and are therefore usually used for structured data such as numbers and dates. An example could be finding persons born between the years 1980 and 2000. Because term level queries are used for exact matching, the search queries are not analyzed before being executed.
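The birth year example could look roughly like the following when expressed as a range query, which is one of the term level queries. This is only a sketch: the birth_year field is hypothetical, and I am assuming that both 1980 and 2000 should be included.

```python
import json


def year_range_query(field, start_year, end_year):
    """Build a range query matching values from start_year to end_year, inclusive."""
    return {"query": {"range": {field: {"gte": start_year, "lte": end_year}}}}


# `birth_year` is a hypothetical numeric field on a person document.
born_between = year_range_query("birth_year", 1980, 2000)
print(json.dumps(born_between, indent=2))
```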
Joining Queries
Because performing joins in a distributed system is expensive in terms of performance, Elasticsearch offers two forms of joins that are designed to scale horizontally. The first is the nested query, which you may remember from when I talked about the nested data type. If a document contains a field of the type nested with an array of objects, then each of these objects can be queried as independent documents by using the nested query. Next are the has_child and has_parent queries, which are useful for parent-child relationships between two document types. The has_child query returns parent documents whose child documents match a given query, while the has_parent query returns child documents whose parent document matches a given query. So, the has_child query is used to add query clauses to child documents, whereas the has_parent query is used to add query clauses to parent documents.
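Below is a sketch of what the request bodies for these joining queries could look like, built in Python. All field and type names here are hypothetical: I am assuming a product document with a nested reviews field for the first query, and a review document type that is a child of product for the other two.

```python
import json

# nested query: each object in the hypothetical `reviews` array is queried
# as if it were an independent document.
nested_query = {
    "query": {
        "nested": {
            "path": "reviews",
            "query": {"match": {"reviews.text": "delicious"}},
        }
    }
}

# has_child query: returns parent documents (products) whose child
# documents (hypothetical `review` type) match the inner query.
has_child_query = {
    "query": {
        "has_child": {
            "type": "review",
            "query": {"match": {"text": "delicious"}},
        }
    }
}

# has_parent query: returns child documents (reviews) whose parent
# document (product) matches the inner query.
has_parent_query = {
    "query": {
        "has_parent": {
            "parent_type": "product",
            "query": {"match": {"name": "pasta"}},
        }
    }
}

for body in (nested_query, has_child_query, has_parent_query):
    print(json.dumps(body, indent=2))
```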
Geo Queries
Last but not least, there are geo queries. As you might recall, there are two data types for geo fields: geo_point and geo_shape. There are a number of geo queries that use these fields to perform geographical searches, such as finding nearby points of interest based on GPS coordinates. There are quite a few of these queries, so I am not going to go through them now.
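Just to give an impression of the nearby points of interest example, here is a sketch that builds a geo_distance query in Python. The location field is a hypothetical geo_point field, and the coordinates and distance are arbitrary example values.

```python
import json


def nearby_query(field, lat, lon, distance):
    """Build a geo_distance query matching points within `distance` of (lat, lon)."""
    return {
        "query": {
            "geo_distance": {
                "distance": distance,
                # `field` is assumed to be mapped as a geo_point.
                field: {"lat": lat, "lon": lon},
            }
        }
    }


nearby = nearby_query("location", 55.68, 12.57, "2km")
print(json.dumps(nearby, indent=2))
```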
That is all I wanted you to know before diving into searching, which also happens to be the most fun part. So join me again in the next post, where I will show you how to search in Elasticsearch.