How Does Elasticsearch Real-time Search?

by Hüseyin Akdoğan · Nov. 25, 14 · Interview

Real-time search is undoubtedly one of the most important features of Elasticsearch. Today we'll take a close look at how Elasticsearch provides it.

Real time

In general, a real-time system is one in which the delay between input and output is small. Data is processed as it arrives instead of being accumulated first.

Today, Elasticsearch is one of the best-known solutions for real-time search: with default settings, a record added to it becomes searchable within about 1 second.

How?

Disks can become an I/O bottleneck at the data persistence step. In addition, some of the mechanisms used to prevent data loss add to the time cost.

Elasticsearch uses the file-system cache that sits between itself and the disk to avoid this bottleneck and to ensure that a new document can be searched in real time.

A new segment is written to the file-system cache first and only later flushed to disk. This lightweight process of writing and opening a new segment is called a refresh in Elasticsearch. By default, every shard is refreshed automatically once every second. This is how Elasticsearch supports real-time search.
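
The visibility rule described above can be illustrated with a toy shell model. This is only a sketch of the observable behavior, not of how Elasticsearch is implemented: the variables and the functions toy_index, toy_refresh, and toy_search are hypothetical names invented for this illustration.

```shell
# Toy model (assumption: illustrative only, no Elasticsearch involved).
buffer=""       # documents indexed since the last refresh
searchable=""   # documents in segments that search can see

# Indexing only adds the document to the buffer.
toy_index() { buffer="${buffer:+$buffer }$1"; }

# A refresh turns the buffered documents into a new searchable segment.
toy_refresh() {
  [ -n "$buffer" ] && searchable="${searchable:+$searchable }$buffer"
  buffer=""
}

# Search sees only what has already been refreshed.
toy_search() { echo "$searchable"; }

toy_index "A"
toy_search      # prints an empty line: "A" is not searchable yet
toy_refresh
toy_search      # prints: A
```

In the real system the refresh step runs automatically once per second by default, which is why the gap between indexing and searching is usually not noticed.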

Test time

The refresh interval described above may bring the following questions to mind:

  1. What happens when a new document is searched for less than 1 second after it was indexed?
  2. Can documents be searched without depending on the automatic shard refresh period managed by Elasticsearch?

Short answers:

  1. The search does not return the document.
  2. Yes.

Now let's clarify this with a simple example.

hakdogan$ curl -XPUT localhost:9200/kodcucom/document/1 -d'{
> "title": "Document A"
> }'

We sent a document to Elasticsearch. The index name is kodcucom, the type is document, and the id is 1. The document's only field is title, with the value "Document A". Let's fetch this document from Elasticsearch.

hakdogan$ curl -XGET localhost:9200/kodcucom/document/1?pretty
{
  "_index" : "kodcucom",
  "_type" : "document",
  "_id" : "1",
  "_version" : 1,
  "found" : true,
  "_source":{
"title": "Document A"
}
}

As expected, the document was returned. But what happens if the time between indexing a document and searching for it is shorter than the default shard refresh interval?

Let’s see.

hakdogan$ curl -XPUT localhost:9200/kodcucom/document/2 -d'{"title": "Document B"}'; curl -XGET localhost:9200/kodcucom/_search?pretty
{"_index":"kodcucom","_type":"document","_id":"2","_version":1,"created":true}{
  "took" : 38,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "kodcucom",
      "_type" : "document",
      "_id" : "1",
      "_score" : 1.0,
      "_source":{
"title": "Document A"
}
    } ]
  }
}

As can be seen, Elasticsearch returned only the earlier document when the create and search requests were issued one right after the other. So how can we get the new document immediately?

Let’s see.

hakdogan$ curl -XPUT localhost:9200/kodcucom/document/3 -d'{"title": "Document C"}'; curl -XGET localhost:9200/kodcucom/_refresh; curl -XGET localhost:9200/kodcucom/_search?pretty
{"_index":"kodcucom","_type":"document","_id":"3","_version":1,"created":true}{"_shards":{"total":10,"successful":5,"failed":0}}{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "kodcucom",
      "_type" : "document",
      "_id" : "1",
      "_score" : 1.0,
      "_source":{
"title": "Document A"
}
    }, {
      "_index" : "kodcucom",
      "_type" : "document",
      "_id" : "2",
      "_score" : 1.0,
      "_source":{"title": "Document B"}
    }, {
      "_index" : "kodcucom",
      "_type" : "document",
      "_id" : "3",
      "_score" : 1.0,
      "_source":{"title": "Document C"}
    } ]
  }
}

In this command, we perform a refresh on the kodcucom index before the search request. As a result, the new document was returned.
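
An explicit _refresh call is not the only option. As a hedged sketch: the index API also accepts a refresh query parameter, which refreshes the affected shard as part of the write. The helper names index_url and index_now and the ES_HOST variable are hypothetical, and the Content-Type header is an assumption that matters only on newer Elasticsearch versions.

```shell
ES_HOST="${ES_HOST:-localhost:9200}"   # assumption: the local node used above

# Hypothetical helper: URL for indexing one document with an immediate refresh.
# Arguments: index, type, id.
index_url() {
  echo "$ES_HOST/$1/$2/$3?refresh=true"
}

# Hypothetical helper: index a document and make it searchable right away.
# Arguments: index, type, id, JSON body.
index_now() {
  curl -XPUT "$(index_url "$1" "$2" "$3")" \
       -H 'Content-Type: application/json' -d "$4"
}
```

For example, index_now kodcucom document 4 '{"title": "Document D"}' (a made-up fourth document) followed directly by a search should include the new document. Because each such request triggers a refresh, this is better suited to occasional writes than to heavy indexing.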

The automatic refresh interval can be changed in two ways:

  1. By setting the index.refresh_interval parameter in the configuration file. This applies to all indices in the cluster.
  2. On a per-index basis, by updating the index settings.

You can also turn off automatic refresh entirely. Keep in mind that the refresh operation is costly in terms of system resources, so take this into account if you want to change the automatic refresh interval.

Lengthening the automatic refresh interval enables faster indexing, but new documents and changes to existing documents will not appear in searches during that period.
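
The per-index option might look like the following sketch, which sends index.refresh_interval to the update settings endpoint of a single index. The helper names refresh_interval_body and set_refresh_interval, the ES_HOST variable, and the 30s value are assumptions chosen for illustration.

```shell
ES_HOST="${ES_HOST:-localhost:9200}"   # assumption: the local node used above

# Hypothetical helper: JSON body that sets index.refresh_interval.
refresh_interval_body() {
  printf '{"index":{"refresh_interval":"%s"}}\n' "$1"
}

# Hypothetical helper: apply the interval to a single index.
# Arguments: index name, interval (for example 30s).
set_refresh_interval() {
  curl -XPUT "$ES_HOST/$1/_settings" \
       -H 'Content-Type: application/json' \
       -d "$(refresh_interval_body "$2")"
}
```

Calling set_refresh_interval kodcucom 30s would make new documents in kodcucom appear in searches up to 30 seconds after indexing, unless a refresh is requested explicitly.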
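
One common way to use this trade-off is during a large import: turn automatic refresh off, load the data, then restore the interval and refresh once. The sketch below assumes that a refresh_interval of -1 disables automatic refresh; the helper names run, set_interval, bulk_begin, and bulk_end and the DRY_RUN and ES_HOST variables are hypothetical.

```shell
ES_HOST="${ES_HOST:-localhost:9200}"   # assumption: the local node used above

# Hypothetical helper: run a command, or only print it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then echo "$@"; else "$@"; fi
}

# Hypothetical helper: set index.refresh_interval for one index.
set_interval() {
  run curl -XPUT "$ES_HOST/$1/_settings" \
      -H 'Content-Type: application/json' \
      -d "{\"index\":{\"refresh_interval\":\"$2\"}}"
}

# Before a large import: turn automatic refresh off.
bulk_begin() { set_interval "$1" "-1"; }

# After the import: restore the interval (default 1s) and refresh once.
bulk_end() {
  set_interval "$1" "${2:-1s}"
  run curl -XPOST "$ES_HOST/$1/_refresh"
}
```

A typical sequence would be bulk_begin kodcucom, then the indexing requests, then bulk_end kodcucom. Setting DRY_RUN=1 prints the curl commands instead of sending them, which is handy for checking what would be executed.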

Published at DZone with permission of Hüseyin Akdoğan. See the original article here.