Elasticsearch Index v7.6

In this article, we discuss how to work with indexes in Elasticsearch, and how to search and visualize the results.

Elasticsearch, which is based on Lucene, is a distributed document store. It is a highly effective way of indexing your information so that it can be correlated and queried quickly for analysis. In this blog, I will walk you through the steps required to create an index, search it, and visualize the results.

What Is an Index?

In the context of ES, an index is a collection of documents.

The two basic elements in an index are documents and fields.

Index in Elasticsearch

Some points to remember for an index are:

  • An index is a logical grouping of one or more physical shards.
  • The documents of an index may be distributed across multiple shards.
  • Shards can be primaries or replicas.
  • Each document belongs to exactly one primary shard.
  • The number of primary shards is fixed at the time of index creation.
  • The number of replica shards can be changed at any time.
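
The last few points follow from how documents are routed to shards. Below is a minimal sketch, assuming a simplified rule of "hash of the document ID modulo the number of primary shards". Elasticsearch uses its own Murmur3-based hash of the routing value, so `zlib.crc32` is only a stand-in here, and `pick_shard` is a hypothetical helper, not an Elasticsearch API.

```python
import zlib


def pick_shard(doc_id: str, num_primary_shards: int) -> int:
    """Return the primary shard number a document ID would route to.

    Simplified stand-in: Elasticsearch hashes the routing value (the _id by
    default) with Murmur3; crc32 is used here only for illustration.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


doc_ids = [str(i) for i in range(1, 9)]

# With a fixed shard count, every document maps to exactly one primary shard.
with_three = {doc_id: pick_shard(doc_id, 3) for doc_id in doc_ids}

# With a different shard count the same IDs can map to different shards,
# which is why the primary count is fixed when the index is created.
with_four = {doc_id: pick_shard(doc_id, 4) for doc_id in doc_ids}
moved = [d for d in doc_ids if with_three[d] != with_four[d]]

print(with_three)
print("documents that would move:", moved)
```

Replica shards are copies of primaries and play no part in this calculation, which is consistent with the replica count being adjustable later.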

What Is Elasticsearch?

ES is a distributed document store for complex data structures serialized as JSON.

In typical dev or prod environments, ES is deployed as a cluster of nodes (master-eligible nodes and data nodes).

  • Based on Apache Lucene.
  • Accessible from any of the member nodes.
  • It uses an inverted index to store text fields.
  • Indexes all data in every field.
  • Every field has a dedicated, optimized data store.
  • Numeric fields and geo points are stored in BKD trees.
  • Can execute structured, full text, and complex queries.
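
To make the inverted index idea concrete, here is a toy sketch in Python. It assumes a naive lowercase, whitespace-based tokenizer; Lucene's analyzers and on-disk structures are far more involved, and `build_inverted_index` is a hypothetical helper for illustration only.

```python
from collections import defaultdict


def build_inverted_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each lowercase token to the set of document IDs containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


# Made-up documents, used only to demonstrate the lookup.
docs = {
    "1": "distributed document store",
    "2": "document search engine",
}
inverted = build_inverted_index(docs)

# Looking up a term is a dictionary access instead of a scan of every document.
print(sorted(inverted["document"]))  # ['1', '2']
print(sorted(inverted["search"]))    # ['2']
```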

Cluster Health

https://myserver.elastic.com:9200/_cat/health?v

The URL above queries the "myserver.elastic.com" server and shows the cluster health in the browser (or on the console, if invoked via curl).

The output looks something like the following:

Plain Text
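
To run the same health check from a script rather than a browser, something like the following Python sketch could be used. It only builds the request and parses a tabular response; `cat_health_request` and `parse_cat_table` are hypothetical helpers, and the sample text is a made-up, shortened stand-in for real output, which has more columns.

```python
import urllib.request


def cat_health_request(base_url: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for the _cat/health endpoint."""
    url = f"{base_url.rstrip('/')}/_cat/health?v"
    return urllib.request.Request(url, method="GET")


def parse_cat_table(text: str) -> list[dict[str, str]]:
    """Turn the whitespace-separated table returned with ?v into dicts."""
    lines = [line.split() for line in text.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    return [dict(zip(header, row)) for row in rows]


request = cat_health_request("https://myserver.elastic.com:9200")
print(request.full_url)

# Hypothetical, shortened sample; values are invented for illustration.
sample = "cluster     status\nmy-cluster  green\n"
print(parse_cat_table(sample))
```

Passing `request` to `urllib.request.urlopen` would actually send it; that step is left out so the sketch runs without a cluster.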


Time for Action

Let’s add a document into an index. 

One of the most powerful features of Elasticsearch is dynamic mapping (more details below), which lets you start exploring your data as quickly as possible.

To index a document, you don't have to first create an index, define a mapping type, or define your fields. You can just index a document, and the index, type, and fields will spring to life automatically.
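
As a rough illustration of what "spring to life automatically" means for fields, the sketch below approximates the default type that dynamic mapping picks for a JSON value. It is a simplification under stated assumptions: it ignores date detection, arrays, and nulls, and `infer_field_type` is a hypothetical helper, not part of Elasticsearch.

```python
def infer_field_type(value) -> str:
    """Approximate the field type dynamic mapping would pick for a JSON value."""
    # bool must be checked before int: in Python, True is also an int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "text"  # by default also gets a "keyword" sub-field
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"unsupported value: {value!r}")


# The same age, sent as a string and as a number, ends up with different types.
print(infer_field_type("25"))  # text
print(infer_field_type(25))    # long
```

The difference between the two calls matters later in this article, when a search sorts on a field that was first indexed as a string.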

In the command below, I am trying to issue a request for a document with _id 1.

GET /profile/_doc/1

The GET command above addresses the index "profile" and, within that index, requests the document with ID "1".

Note: The index does not exist yet, so we are issuing a GET command against a missing index.

As expected, the error below gets thrown.

JSON


The point is, you can create an index on the fly by indexing its first document, but the index must already exist before you can search it or get a document from it.
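
A script that reads documents therefore has to cope with a 404 response. Here is a minimal sketch, assuming a cluster at the hypothetical address `http://localhost:9200`; `get_document` and `fake_missing_index` are illustrative helpers, and the stub stands in for a real server so the example runs without one. The stub's error body is deliberately minimal and not the real Elasticsearch error payload.

```python
import io
import json
import urllib.error
import urllib.request


def get_document(base_url, index, doc_id, opener=urllib.request.urlopen):
    """Fetch a document; return (status_code, parsed_json_body)."""
    url = f"{base_url.rstrip('/')}/{index}/_doc/{doc_id}"
    try:
        with opener(urllib.request.Request(url, method="GET")) as response:
            return response.status, json.load(response)
    except urllib.error.HTTPError as err:
        # A missing index (or document) comes back as an HTTP error with a
        # JSON body, so report it instead of letting the exception escape.
        return err.code, json.loads(err.read().decode("utf-8") or "{}")


def fake_missing_index(request):
    """Stand-in for a server without the index: always raises a 404."""
    body = io.BytesIO(b'{"status": 404}')
    raise urllib.error.HTTPError(request.full_url, 404, "Not Found", None, body)


status, body = get_document(
    "http://localhost:9200", "profile", 1, opener=fake_missing_index
)
print(status, body)
```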

Add the First Element

I will issue a PUT request to the same path, "/profile/_doc/1", with the JSON body shown below.

PUT /profile/_doc/1

JSON


The response in my case is shown below; it indicates that document "1" was indexed successfully.

JSON


What if you do not specify a body? Give it a quick try; in all probability it will result in a parse exception.
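
For readers who prefer a script to the console, the PUT step could be sketched in Python as follows. The document fields, the `http://localhost:9200` address, and the `index_document_request` helper are all assumptions made for illustration; the original request body is not reproduced here.

```python
import json
import urllib.request


def index_document_request(base_url, index, doc_id, document):
    """Build (but do not send) a PUT request that indexes one document."""
    if document is None:
        # Local guard only: the server would reject a request without a body.
        raise ValueError("a JSON document is required")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/{index}/_doc/{doc_id}",
        data=json.dumps(document).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical profile document, invented for this example.
request = index_document_request(
    "http://localhost:9200", "profile", 1, {"name": "Jane Doe", "age": 25}
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```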

"The automatic detection and addition of new fields is called dynamic mapping." (Elasticsearch documentation, Dynamic Mapping)

Now, if you issue the GET again, the response contains the newly indexed document.

JSON


If you think ingesting one document at a time is tedious, there are other options like the Bulk API.
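
As a rough idea of what a Bulk API payload looks like, the sketch below builds the newline-delimited JSON body in Python: one action line followed by one source line per document. The sample documents and the `build_bulk_body` helper are hypothetical.

```python
import json


def build_bulk_body(index: str, documents: dict[str, dict]) -> str:
    """Build a newline-delimited JSON body for the _bulk endpoint."""
    lines = []
    for doc_id, source in documents.items():
        # Action line: which index and ID the next line should be stored under.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        # Source line: the document itself.
        lines.append(json.dumps(source))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


# Made-up documents for illustration.
docs = {
    "1": {"name": "Jane Doe", "age": 25},
    "2": {"name": "John Roe", "age": 31},
}
body = build_bulk_body("profile", docs)
print(body)
```

The body would be sent with `POST /_bulk` and a `Content-Type: application/x-ndjson` header.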

Search

I repeated the add-document step for a few more data items. The index was then ready for the next step: querying the ingested data.

GET _search

A simple command (from the official Elastic documentation):

JSON


It queries the index "profile" and returns all matches, essentially a "SELECT *", and asks for the results to be sorted by age in ascending order.
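
A match-all search sorted on one field can be sketched in Python as below. This is an illustration under assumptions, not the exact command from the documentation: the `search_request` helper and the `http://localhost:9200` address are hypothetical, and POST is used because `_search` accepts it as well as GET.

```python
import json
import urllib.request


def search_request(base_url, index, sort_field, order="asc"):
    """Build (but do not send) a match_all search sorted on one field."""
    body = {
        "query": {"match_all": {}},
        "sort": [{sort_field: order}],
    }
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",  # _search accepts POST as well as GET
    )


request = search_request("http://localhost:9200", "profile", "age")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```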

Instead of the result, I got an exception like the one below.

JSON


Mistake I Made

The error message is self-explanatory, but to be precise, I made a mistake while adding the documents: I specified the age as a string, and the sort expects it to be numeric.

Two things I could have done:

  1. Specified it as numeric.
  2. Defined the mapping explicitly.

Mappings

If you issue a GET _mappings request for the index, it shows the mapping that was inferred for the index when it was created.

JSON


You can see that age is mapped as text here, hence the exception.

Corrections

A PUT mapping request for profile will not work, because the index already exists and is not empty.

JSON


If the above command is used, it will throw an exception, as shown below:

JSON


The easiest fix is either to delete the index and re-create the documents, or to sort and aggregate on the "age.keyword" sub-field shown in the mapping.
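
One caveat with the keyword route, as far as I understand it, is that keyword values are compared as strings, so ages would sort lexicographically rather than numerically. The plain Python sketch below shows the difference with made-up values; it does not call Elasticsearch.

```python
# The same ages, once as strings (as a keyword field would see them)
# and once as numbers (as an integer field would see them).
ages_as_strings = ["9", "25", "100"]
ages_as_numbers = [9, 25, 100]

# String comparison goes character by character, so "100" sorts before "25".
print(sorted(ages_as_strings))  # ['100', '25', '9']

# Numeric comparison gives the order a reader would expect for ages.
print(sorted(ages_as_numbers))  # [9, 25, 100]
```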

I chose to delete the index and re-create it, as shown below.

DELETE /profile

This command will delete the index along with all the documents.
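
Because the delete is irreversible, a script that issues it may want a small safety check first. A minimal sketch, assuming the hypothetical helper `delete_index_request` and the address `http://localhost:9200`; the guard against wildcards and `_all` is my own precaution, not something Elasticsearch requires.

```python
import urllib.request


def delete_index_request(base_url: str, index: str) -> urllib.request.Request:
    """Build (but do not send) a DELETE request for exactly one index."""
    # Refuse names that could match more than one index.
    if not index or index == "_all" or any(ch in index for ch in "*,"):
        raise ValueError(f"refusing to delete ambiguous index name: {index!r}")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/{index}", method="DELETE"
    )


request = delete_index_request("http://localhost:9200", "profile")
print(request.get_method(), request.full_url)
```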

New Index

Creating the new index as shown below should take care of the mistake I made earlier.

JSON


Time to fire the search again.

JSON


This time the response, shown below, is a successful search.

JSON
 


What Is Mapping?

Explaining mappings in detail would need a blog of its own, but the short answer is:

Mapping defines how a document and the fields it contains are stored and indexed.

It helps to identify at a high level:

  • Which fields are date fields (should be stored like a date).
  • Which string fields are to be used for full-text searching.
  • Which fields are numeric (for operations like sorting and aggregating).
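
The three kinds of fields listed above could be declared explicitly when the index is created. Below is a sketch of such a request body, built in Python. Only "age" comes from this article; the "name" and "joined" fields, the `create_index_request` helper, and the `http://localhost:9200` address are assumptions for illustration.

```python
import json
import urllib.request

# Hypothetical explicit mapping for a "profile" index (typeless 7.x format).
profile_mapping = {
    "mappings": {
        "properties": {
            "name": {"type": "text"},      # string field for full-text search
            "age": {"type": "integer"},    # numeric field for sort/aggregate
            "joined": {"type": "date"},    # date field
        }
    }
}


def create_index_request(base_url, index, body):
    """Build (but do not send) a PUT request that creates an index."""
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/{index}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


request = create_index_request("http://localhost:9200", "profile", profile_mapping)
print(request.get_method(), request.full_url)
print(json.dumps(profile_mapping, indent=2))
```

With a mapping like this in place before the first document is indexed, a string such as "25" in the age field would no longer silently turn the field into text.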
