Introduction to Elasticsearch
Elasticsearch is an open source search server built on Apache Lucene. It uses Lucene as its core search engine but adds many features that Lucene itself does not provide; for example, all interaction with Elasticsearch goes through a comprehensive REST API. Like Apache Lucene, Elasticsearch is written in Java, so it runs on the JVM and works on many different operating systems. Elasticsearch was designed to be scalable from the start and therefore has a distributed architecture. It is designed to take data from any data source and make it searchable.
Communication with Elasticsearch happens through an HTTP REST API. For instance, an employee can be fetched by issuing an HTTP request with cURL. We will see plenty of examples of this throughout this series, so we won't go into detail right now; just know that requests usually follow the pattern below, where the ID is optional.
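curl -X [HTTP method] [node address]:[port]/[index name]/[type name]/[ID]
E.g.:
curl -X GET http://localhost:9200/mywebshop/employee/123
Note that the type name in this pattern reflects Elasticsearch versions before 7.0: mapping types were deprecated in 7.0 and removed in 8.0, where documents are addressed as /[index name]/_doc/[ID] instead.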
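As a minimal sketch of the request pattern above, the following Python snippet assembles the URL from its parts and wraps it in a standard-library urllib request without sending it. The helper name build_request is hypothetical, and the snippet assumes a node reachable over plain HTTP at localhost:9200; the index, type, and ID values are the ones from the cURL example.

```python
import urllib.request


def build_request(method, node, port, index, doc_type, doc_id=None):
    """Hypothetical helper: build (but do not send) a request following
    [node address]:[port]/[index name]/[type name]/[ID], with the ID optional."""
    parts = [index, doc_type]
    if doc_id is not None:
        parts.append(str(doc_id))
    url = f"http://{node}:{port}/" + "/".join(parts)
    return urllib.request.Request(url, method=method)


# Equivalent of: curl -X GET http://localhost:9200/mywebshop/employee/123
req = build_request("GET", "localhost", 9200, "mywebshop", "employee", 123)
print(req.get_method(), req.full_url)
# GET http://localhost:9200/mywebshop/employee/123

# Actually sending the request requires a running node, e.g.:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode())
```

Building the request separately from sending it keeps the URL logic easy to inspect without a cluster at hand.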
The documents stored in Elasticsearch are JSON documents, and they are schema-less in much the same way as in many NoSQL databases: you don't have to define fields and their data types before adding data, as you would in a relational database, because Elasticsearch can infer a mapping from the documents themselves. Elasticsearch is open source, but it is developed by Elasticsearch BV, a company that provides commercial solutions related to the search engine.
Elasticsearch is near real-time: when you add, modify, or delete a document, the change does not become visible to searches immediately, but typically within about a second, because by default each index is refreshed once per second. This is a trade-off of its distributed, Lucene-based architecture; a relational database running on a single machine, by contrast, usually makes committed changes visible right away.
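To illustrate schema-less indexing and the near real-time behavior, here is a sketch that builds a request to index a JSON document without defining any mapping first. It assumes a node at localhost:9200 and reuses the mywebshop/employee names from the earlier cURL example; the helper name build_index_request and the document's fields are hypothetical. The refresh=wait_for query parameter asks Elasticsearch to respond only after a refresh has made the document searchable, which avoids the roughly one-second delay at the cost of a slower write.

```python
import json
import urllib.request


def build_index_request(node, port, index, doc_type, doc_id, document,
                        wait_for_refresh=True):
    """Hypothetical helper: build (but do not send) a PUT request that
    indexes `document` as JSON. No mapping is defined beforehand;
    Elasticsearch infers field types from the document (dynamic mapping)."""
    url = f"http://{node}:{port}/{index}/{doc_type}/{doc_id}"
    if wait_for_refresh:
        # Respond only once a refresh has made the document visible to searches.
        url += "?refresh=wait_for"
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Illustrative document; the field names are made up for this example.
employee = {"name": "Jane Doe", "department": "Sales", "age": 34}
req = build_index_request("localhost", 9200, "mywebshop", "employee", 123, employee)
print(req.get_method(), req.full_url)
# PUT http://localhost:9200/mywebshop/employee/123?refresh=wait_for

# Sending requires a running node:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read()))
```

Leaving wait_for_refresh off returns as soon as the write is accepted; the document then becomes searchable after the next scheduled refresh.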
Prominent Users
Elasticsearch has become very popular in the past few years. Part of its success comes from how easy it is to use compared to competitors such as Apache Solr, and part from its comprehensive feature set. Within the enterprise, its success is also largely due to how well it scales thanks to its distributed architecture. The popularity is not without reason: many large organizations use Elasticsearch.
Here is what you will learn:
- The architecture of Elasticsearch
- Mappings and analyzers
- Many kinds of search queries (simple and advanced alike)
- Aggregations, stemming, auto-completion, pagination, filters, fuzzy searches, etc.
- ... and much more!
