In this part of the series, we will look at mapping in Elasticsearch. I will introduce the concept and then cover both dynamic and explicit mapping.
What is Mapping in Elasticsearch?
As I briefly mentioned in the article about Elasticsearch terminology, mapping describes how documents and their fields are indexed and stored. This typically involves defining the data types of the fields that make up the documents, much like defining column types in a relational database. Mapping can also be used to define the format of date fields, for instance, and to define whether field values should be indexed into the catch-all field named _all. There is more to mapping than this, but the most important part is defining the data types and formats of fields.
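To make this a bit more concrete, here is a minimal sketch of what such a mapping could look like, written as a Python dictionary and printed as the JSON that Elasticsearch expects. The article type and the field names (title, published_at, internal_notes) are made up for illustration, and I am assuming the 2.x-style syntax in which text fields use the string type; newer versions use different type names.

```python
import json

# Hypothetical mapping for an "article" mapping type (2.x-style syntax assumed).
article_mapping = {
    "article": {
        "properties": {
            # A plain text field; its values go into _all by default.
            "title": {"type": "string"},
            # A date field with an explicit format.
            "published_at": {"type": "date", "format": "yyyy-MM-dd"},
            # A field whose values should not be indexed into _all.
            "internal_notes": {"type": "string", "include_in_all": False},
        }
    }
}

# Serialize to the JSON body that would be sent to Elasticsearch.
mapping_json = json.dumps(article_mapping, indent=2)
print(mapping_json)
```

Note that Python's False is serialized as the JSON value false, which is what the mapping needs.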
Field Data Types
A mapping type contains fields, such as the title or content of an article type. Each field has a data type, such as string, number or boolean. These data types can be defined in the mapping, similar to how data types are defined for columns in relational databases. We will take a more detailed look at this in the next article in this series, where I will walk through the most important data types. For now, you should just know that this information is associated with mapping types.
Each mapping type contains a number of meta fields for various purposes, some of which can be customized. Every document has these meta fields associated with it, in addition to the JSON object that was added to the index. A few examples of such meta fields are _id, _type, _uid and _index. We will get back to meta fields in one of the next articles, so I am not going to go into much detail now.
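As a rough illustration of the difference between meta fields and the document itself, the sketch below takes a hypothetical response for a single document and separates the two. The index name, type name and field values are invented, and the way I build the _uid value (type and ID joined by a # character) is an assumption based on how Elasticsearch versions with mapping types compose it.

```python
# A hypothetical response when fetching one document from the "blog" index.
response = {
    "_index": "blog",
    "_type": "article",
    "_id": "1",
    "_source": {"title": "Mapping in Elasticsearch"},
}


def split_meta(doc):
    """Separate meta fields (leading underscore) from the original JSON object."""
    meta = {key: value for key, value in doc.items()
            if key.startswith("_") and key != "_source"}
    return meta, doc.get("_source", {})


meta, source = split_meta(response)

# Assumption: _uid combines the type and the ID as "type#id".
uid = "{}#{}".format(meta["_type"], meta["_id"])

print(meta)
print(source)
print(uid)
```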
Mapping does not necessarily have to be defined explicitly before adding documents of a given type to an index. In fact, dynamic mapping refers to the automatic detection and addition of new types and fields. This means that you can add a document without first having to define the mapping type and fields. It is even possible to add a document without creating an index first! In this case, Elasticsearch will take care of creating the index, mapping type and fields. This is done automatically, and Elasticsearch will infer the data types based on the document’s data.
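To give an idea of what this inference looks like, here is a simplified sketch in Python. It is not how Elasticsearch is implemented, and the rules are only an approximation that I am assuming for illustration: whole numbers become long, decimals become double, strings that look like a yyyy-MM-dd date become date, and null values add no field. The real rules are more nuanced and vary between versions.

```python
from datetime import datetime


def infer_type(value):
    """Guess a field type roughly the way dynamic mapping might (simplified)."""
    if isinstance(value, bool):  # check bool first, since bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        try:
            datetime.strptime(value, "%Y-%m-%d")  # very naive date detection
            return "date"
        except ValueError:
            return "string"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        # Arrays take the type of their first non-null element.
        for item in value:
            if item is not None:
                return infer_type(item)
    return None  # null values and empty arrays do not add a field


def infer_mapping(document):
    """Build a field name -> type dictionary, skipping fields with no type."""
    mapping = {}
    for field, value in document.items():
        field_type = infer_type(value)
        if field_type is not None:
            mapping[field] = field_type
    return mapping


# A hypothetical document added without any explicit mapping.
article = {
    "title": "Hello",
    "views": 42,
    "rating": 4.5,
    "published": True,
    "published_at": "2016-05-01",
    "tags": ["search", "mapping"],
    "author": {"name": "Jane"},
    "subtitle": None,
}

inferred = infer_mapping(article)
print(inferred)
```

The sketch also hints at why inference can go wrong: the first document decides the type of each field, so a value that merely looks like a date, or a whole number in a field that later holds decimals, leads to a mapping you did not intend.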
Explicit mapping refers to adding mapping information explicitly instead of letting Elasticsearch infer it from added documents. This can be done when creating an index, but also by issuing a PUT request to an existing index. Explicit mapping is useful if you have specific requirements for a mapping type’s data, such as date formats. It can also be useful if Elasticsearch cannot infer the correct mapping information from the added documents. While dynamic mapping is very convenient, especially when getting started with Elasticsearch, it can be a good idea to define explicit mappings, because they document the data and the requirements for it.
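Both approaches can be sketched as follows. The code only builds the two PUT requests with Python's standard library; it does not send them, so it runs without a cluster. The node address (localhost:9200), the blog index, the article type and the field names are all assumptions of mine, and the URL layout with the type name in the path follows the 2.x-style API.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumed address of a local node


def put_request(path, body):
    """Build (but do not send) a PUT request with a JSON body."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# 1) Define the mapping while creating a hypothetical "blog" index.
create_index = put_request("/blog", {
    "mappings": {
        "article": {
            "properties": {
                "published_at": {"type": "date", "format": "yyyy-MM-dd"},
            }
        }
    }
})

# 2) Add a new field to the "article" type of the existing index.
add_field = put_request("/blog/_mapping/article", {
    "properties": {
        "views": {"type": "integer"},
    }
})

for request in (create_index, add_field):
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))

# To actually send a request against a running node:
# urllib.request.urlopen(create_index)
```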
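There are a few gotchas regarding mapping that you should be aware of. First of all, the mapping of an existing field cannot be changed, with few exceptions. You can add new fields to a mapping type, but you cannot, for example, change the data type of a field that already exists. This means that if you have data in an index and need a different mapping, you have to create a new index with the new mappings and add the data to the new index. This is because the existing data has already been indexed according to the old mapping and would effectively be invalidated if that mapping were changed.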
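One way to move the data is to read the documents from the old index and write them to the new one with the bulk API. The sketch below covers only the second half: it turns a list of documents, in the shape in which a search returns them, into a bulk request body that targets the new index. The index name blog_v2 and the sample documents are invented, and fetching the documents and sending the body to the _bulk endpoint are left out.

```python
import json


def bulk_body(hits, new_index):
    """Build a newline-delimited bulk body that indexes hits into new_index."""
    lines = []
    for hit in hits:
        # Action line: keep the original type and ID, but target the new index.
        action = {"index": {"_index": new_index,
                            "_type": hit["_type"],
                            "_id": hit["_id"]}}
        lines.append(json.dumps(action))
        # Source line: the original JSON document.
        lines.append(json.dumps(hit["_source"]))
    # Every line, including the last one, must end with a newline.
    return "".join(line + "\n" for line in lines)


# Hypothetical documents read from the old index.
old_hits = [
    {"_index": "blog", "_type": "article", "_id": "1",
     "_source": {"title": "First post"}},
    {"_index": "blog", "_type": "article", "_id": "2",
     "_source": {"title": "Second post"}},
]

body = bulk_body(old_hits, "blog_v2")
print(body)
```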
Secondly, it is important to know that fields are shared across the mapping types within an index. This means that if a title field exists in both an employee and an article mapping type, the field must have exactly the same mapping in each type. This is quite inconvenient if you need different mappings, but the easy solution is to prefix field names with the name of the mapping type. In this example, the title fields would then be named employee_title and article_title. In my humble opinion, this is really a shame, because it does not look pretty and is not convenient for developers, but it is nevertheless how Elasticsearch works for the time being.
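The prefixing workaround can be sketched like this. The two helper functions are my own and not part of Elasticsearch: one finds fields that are mapped differently in different types, and the other renames those fields with the type name as a prefix. The field definitions use 2.x-style syntax, which is an assumption.

```python
def find_conflicts(mappings):
    """Return the names of fields mapped differently in different types."""
    seen = {}  # field name -> first definition encountered
    conflicts = set()
    for type_mapping in mappings.values():
        for field, definition in type_mapping["properties"].items():
            if field in seen and seen[field] != definition:
                conflicts.add(field)
            seen.setdefault(field, definition)
    return sorted(conflicts)


def prefix_conflicts(mappings, conflicts):
    """Rename conflicting fields to <type>_<field>; leave other fields alone."""
    result = {}
    for type_name, type_mapping in mappings.items():
        properties = {}
        for field, definition in type_mapping["properties"].items():
            name = "{}_{}".format(type_name, field) if field in conflicts else field
            properties[name] = definition
        result[type_name] = {"properties": properties}
    return result


# Hypothetical mappings in which "title" needs a different definition per type.
mappings = {
    "employee": {"properties": {
        "title": {"type": "string", "index": "not_analyzed"},
        "hired_at": {"type": "date"},
    }},
    "article": {"properties": {
        "title": {"type": "string"},
    }},
}

conflicts = find_conflicts(mappings)
fixed = prefix_conflicts(mappings, conflicts)

print(conflicts)
print(sorted(fixed["employee"]["properties"]))
print(sorted(fixed["article"]["properties"]))
```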
In the next two articles, we will go through field data types and meta fields in more detail.