How Does Elasticsearch Real-time Search?

by Hüseyin Akdoğan · Nov. 25, 14 · Big Data Zone
Real-time search is undoubtedly one of the most important features of Elasticsearch. In this post we'll take a close look at how Elasticsearch provides it.

Real time

First, the concept of real time: in general, a real-time system is one in which the delay between data going in and results coming out is small. Data is processed as it arrives, rather than being accumulated first and processed later.

Elasticsearch is well known for real-time search: by default, a document added to an index becomes searchable within about one second. (Because of this short delay, the Elasticsearch documentation describes it as near real-time search.)

How?

Disks can become an I/O bottleneck when data is persisted. On top of that, the mechanisms used to prevent data loss, such as calling fsync to make sure data has physically reached the disk, add time to every write.

To avoid this bottleneck and make new documents searchable quickly, Elasticsearch relies on the file-system cache that sits between it and the disk.

A new segment is first written to the file-system cache, where it can be opened for search, and is flushed to disk only later. This lightweight process of writing and opening a new segment is called a refresh. By default, every shard is refreshed automatically once per second, which is how Elasticsearch makes new documents searchable almost immediately.
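The visibility rule of a refresh can be illustrated with a small toy model. This is a sketch under simplifying assumptions, not Elasticsearch code: the names `buffer`, `segments`, `index_doc`, `refresh` and `search` are hypothetical, and the model ignores the disk flush, segment merging and the transaction log entirely. It only shows that a newly indexed document becomes visible to search once a refresh has turned it into a segment.

```shell
#!/usr/bin/env bash
# Toy model of the refresh cycle (illustration only, not Elasticsearch code).
# New documents wait in an in-memory buffer; refresh() turns the buffer into
# a new "segment" that search() can see. Nothing here touches a real disk.

buffer=()      # documents indexed since the last refresh (not searchable yet)
segments=()    # each element is one segment: document titles joined by newlines

index_doc() {
  buffer+=("$1")
}

refresh() {
  # An empty buffer produces no new segment.
  if (( ${#buffer[@]} > 0 )); then
    segments+=("$(printf '%s\n' "${buffer[@]}")")
    buffer=()
  fi
}

search() {
  # Print every searchable title, one per line, segment by segment.
  local seg
  for seg in "${segments[@]}"; do
    printf '%s\n' "$seg"
  done
}
```

Used like the test below: after `index_doc "Document A"; refresh; index_doc "Document B"`, `search` prints only `Document A`, because Document B is still in the buffer; after one more `refresh`, both titles are printed.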

Test time

The refresh interval described above raises two questions:

  1. What happens if a new document is searched for less than one second after it is indexed?
  2. Can a document be retrieved without waiting for the automatic shard refresh that Elasticsearch manages?

The short answers:

  1. A search does not return the document yet.
  2. Yes.

Let's clarify this with a simple example.

hakdogan$ curl -XPUT localhost:9200/kodcucom/document/1 -d'{
  "title": "Document A"
}'

We sent a document to Elasticsearch with index kodcucom, type document, and ID 1. Its only field is title, with the value "Document A". Let's retrieve this document from Elasticsearch.

hakdogan$ curl -XGET localhost:9200/kodcucom/document/1?pretty
{
  "_index" : "kodcucom",
  "_type" : "document",
  "_id" : "1",
  "_version" : 1,
  "found" : true,
  "_source":{
"title": "Document A"
}
}

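As expected, the document was returned. (Retrieving a single document by ID with the GET API is real-time: it returns the latest version of the document even before a refresh. The refresh interval affects search requests.) So what happens if the time between indexing a document and searching for it is shorter than the default refresh interval?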

Let’s see.

hakdogan$ curl -XPUT localhost:9200/kodcucom/document/2 -d'{"title": "Document B"}'; curl -XGET localhost:9200/kodcucom/_search?pretty
{"_index":"kodcucom","_type":"document","_id":"2","_version":1,"created":true}{
  "took" : 38,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "kodcucom",
      "_type" : "document",
      "_id" : "1",
      "_score" : 1.0,
      "_source":{
"title": "Document A"
}
    } ]
  }
}

As the output shows, when we index a document and search immediately afterwards, Elasticsearch returns only the earlier document; Document B is not visible yet. So how can we make a new document searchable right away?

Let’s see.

hakdogan$ curl -XPUT localhost:9200/kodcucom/document/3 -d'{"title": "Document C"}'; curl -XGET localhost:9200/kodcucom/_refresh; curl -XGET localhost:9200/kodcucom/_search?pretty
{"_index":"kodcucom","_type":"document","_id":"3","_version":1,"created":true}{"_shards":{"total":10,"successful":5,"failed":0}}{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "kodcucom",
      "_type" : "document",
      "_id" : "1",
      "_score" : 1.0,
      "_source":{
"title": "Document A"
}
    }, {
      "_index" : "kodcucom",
      "_type" : "document",
      "_id" : "2",
      "_score" : 1.0,
      "_source":{"title": "Document B"}
    }, {
      "_index" : "kodcucom",
      "_type" : "document",
      "_id" : "3",
      "_score" : 1.0,
      "_source":{"title": "Document C"}
    } ]
  }
}

Here we explicitly refresh the kodcucom index before sending the search request, so all three documents are returned.
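Instead of a separate _refresh call, the index API also accepts a refresh parameter that refreshes the affected shard right after the write, so the document is searchable as soon as the request returns. The sketch below assumes the same local node and the 1.x-era index/type/id URL layout used above; the function name `index_and_search` and the `ES` variable are hypothetical helpers for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: index a document with ?refresh=true, then search.
# Assumes an Elasticsearch node at $ES (default localhost:9200).
ES="${ES:-localhost:9200}"

index_and_search() {
  local id=$1 title=$2   # title must not contain double quotes (no JSON escaping)
  # refresh=true refreshes the shard that received the document before the
  # index request returns, so the following search can already see it.
  curl -s -XPUT "$ES/kodcucom/document/$id?refresh=true" \
       -d "{\"title\": \"$title\"}"
  echo
  curl -s -XGET "$ES/kodcucom/_search?pretty"
}

# Example: index_and_search 4 "Document D"
```

Refreshing on every write has the same resource cost discussed below, so this is better suited to tests and occasional writes than to heavy indexing.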

The automatic refresh interval can be changed:

  1. By setting the index.refresh_interval parameter in the configuration file, which applies to all indices in the cluster.
  2. On a per-index basis, by updating the index settings.
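A minimal sketch of the per-index option, using the update-index-settings API (`PUT /<index>/_settings`). It assumes a node at `ES` (default localhost:9200); the helper name `set_refresh_interval` is hypothetical. In the configuration file of that era, the cluster-wide equivalent would be a line such as `index.refresh_interval: 30s` in elasticsearch.yml.

```shell
#!/usr/bin/env bash
# Hypothetical helper: change the automatic refresh interval of one index
# through the update-index-settings API. Assumes a node at $ES.
ES="${ES:-localhost:9200}"

set_refresh_interval() {
  local index=$1 interval=$2   # e.g. "1s" (default), "30s", or "-1" (disabled)
  curl -s -XPUT "$ES/$index/_settings" \
       -d "{\"index\": {\"refresh_interval\": \"$interval\"}}"
}

# Example: refresh kodcucom every 30 seconds instead of every second.
# set_refresh_interval kodcucom 30s
```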

You can also turn automatic refresh off entirely. Keep in mind that refreshing is costly in terms of system resources, so take this into account when changing the automatic refresh interval.

A longer refresh interval allows faster indexing, but new documents and changes to existing documents will not appear in search results until the next refresh.
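This trade-off is commonly used during large bulk loads: switch automatic refresh off, load the data, then restore the interval and refresh once. The sketch below assumes a node at `ES` (default localhost:9200) and a hypothetical file `docs.json` already in the newline-delimited _bulk format; `bulk_load` is an illustrative name, not part of Elasticsearch.

```shell
#!/usr/bin/env bash
# Hypothetical bulk-load helper: disable refresh, load, restore, refresh once.
ES="${ES:-localhost:9200}"

bulk_load() {
  local index=$1 file=$2
  # 1. Turn automatic refresh off while loading ("-1" disables it).
  curl -s -XPUT "$ES/$index/_settings" -d '{"index": {"refresh_interval": "-1"}}'
  # 2. Send the documents with the bulk API; --data-binary keeps the newlines.
  curl -s -XPOST "$ES/$index/_bulk" --data-binary "@$file"
  # 3. Restore the default one-second interval.
  curl -s -XPUT "$ES/$index/_settings" -d '{"index": {"refresh_interval": "1s"}}'
  # 4. Refresh once so everything just loaded becomes searchable now.
  curl -s -XPOST "$ES/$index/_refresh"
}

# Example: bulk_load kodcucom docs.json
```

Until step 3 runs, documents loaded in step 2 are not visible to search, exactly as described above.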

Published at DZone with permission of Hüseyin Akdoğan. See the original article here.
