Elasticsearch Query and Indexing Architecture

This article breaks down Elasticsearch's core architecture by explaining how search queries and indexing requests flow through the system.

By Udbhav Prasad · Nov. 26, 24 · Tutorial

What Is Elasticsearch?

Elasticsearch is a distributed, open-source search and analytics engine built atop the Apache Lucene library. It also offers vector search, which can serve as the retrieval layer for retrieval-augmented generation (RAG) in AI applications. Applications can store structured and unstructured data in Elasticsearch, with or without a predefined schema, by sending JSON payloads to an Elasticsearch cluster.
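
As a rough illustration of "sending JSON payloads," the sketch below builds (but does not send) an HTTP request that would index one document through the `PUT /<index>/_doc/<id>` REST endpoint. The host `localhost:9200`, the index name `articles`, and the sample document are assumptions made up for this example, not values from the article.

```python
import json
import urllib.request

# Assumptions for illustration only: a local cluster on the default HTTP port,
# a hypothetical index name, and a made-up document.
BASE_URL = "http://localhost:9200"
INDEX = "articles"


def build_index_request(doc_id: str, document: dict) -> urllib.request.Request:
    """Build (without sending) a PUT /<index>/_doc/<id> request."""
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/{INDEX}/_doc/{doc_id}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


request = build_index_request("1", {"title": "Query and indexing", "views": 4300})
print(request.get_method(), request.full_url)
# prints: PUT http://localhost:9200/articles/_doc/1
```

Passing the request to `urllib.request.urlopen` would actually send it, which requires a reachable cluster and is left out here on purpose.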

Elasticsearch Architecture

From the ground up, the main components of an Elasticsearch cluster are:

Document

A document is the smallest record of information stored by Elasticsearch and is represented as JSON. A document consists of multiple fields (key-value pairs) of different types. The fields can follow a predefined schema (a mapping), or the index can be schema-less, in which case Elasticsearch infers the data type of each new field as it is indexed.
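
To make the idea of type inference concrete, here is a simplified sketch of how field types could be guessed from a JSON document. It only approximates Elasticsearch's dynamic mapping behavior: the function name `infer_field_type`, the sample document, and the ISO-only date check are assumptions for this example, and the real rules are more detailed.

```python
from datetime import datetime


def infer_field_type(value) -> str:
    """Guess a field type from a JSON value (simplified approximation)."""
    # bool must be checked before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        # An array takes the type of its first non-null element.
        for item in value:
            if item is not None:
                return infer_field_type(item)
        return "unmapped"
    if isinstance(value, str):
        # Assumption: only ISO-8601 strings count as dates in this sketch.
        try:
            datetime.fromisoformat(value)
            return "date"
        except ValueError:
            return "text"
    return "unmapped"  # e.g. null: no type can be inferred yet


# Made-up sample document.
document = {
    "title": "Elasticsearch internals",
    "views": 4300,
    "rating": 4.5,
    "published": "2024-11-26",
    "featured": True,
    "tags": ["search", "lucene"],
    "author": {"name": "Udbhav Prasad"},
}

mapping = {field: infer_field_type(value) for field, value in document.items()}
print(mapping)
```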

Index

An index is a logical collection of documents with the same schema, identified by an index name.

Shard

Elasticsearch indexes are split into manageable units called shards, each of which holds a subset of the index's documents. Shards are the basic unit of search. Each primary shard can have one or more replica copies, which are placed on other nodes for redundancy and fault tolerance.

Node

A node is an independent instance of Elasticsearch and manages a collection of shards that belong to one or more indexes. Nodes can take on different roles, such as data node, master node, and ingest node.

Cluster

An Elasticsearch cluster is a collection of interconnected nodes. All nodes in a cluster can handle requests from clients and communicate with each other. Each node in a cluster owns a subset of the shards that belong to an index.
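
The relationship between nodes and shards can be sketched with a toy allocator. This is not Elasticsearch's actual allocation algorithm; it is a round-robin stand-in that preserves one real property: copies of the same shard never share a node. The function name `allocate_shards` and the node names are hypothetical, and raising an error when there are too few nodes is a simplification (a real cluster would leave the extra replicas unassigned).

```python
from collections import defaultdict


def allocate_shards(nodes, num_primaries, num_replicas):
    """Toy round-robin allocation: copies of one shard land on distinct nodes."""
    if num_replicas + 1 > len(nodes):
        # Simplification for this sketch; see the note above the code.
        raise ValueError("need at least one node per shard copy")
    allocation = defaultdict(list)
    for shard in range(num_primaries):
        for copy in range(num_replicas + 1):
            node = nodes[(shard + copy) % len(nodes)]
            role = "primary" if copy == 0 else "replica"
            allocation[node].append((shard, role))
    return dict(allocation)


allocation = allocate_shards(["node-1", "node-2", "node-3"], num_primaries=3, num_replicas=1)
for node, shards in allocation.items():
    print(node, shards)
# node-1 [(0, 'primary'), (2, 'replica')]
# node-2 [(0, 'replica'), (1, 'primary')]
# node-3 [(1, 'replica'), (2, 'primary')]
```

With this layout, losing any single node still leaves one copy of every shard available on the remaining nodes.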

Query Architecture

The following architecture diagram outlines the flow of a search request:

Architecture diagram showing flow of a search request

  1. The user or application makes a search query. The query can be handled by any node in the cluster. The node that handles the request is the “coordinating” node.
  2. The coordinating node forwards the query to one copy (either the primary or a replica) of every shard involved in the search.
  3. Each shard executes the query locally and returns a lightweight set of results to the coordinating node.
  4. The coordinating node merges the results it receives. This is the end of the “query” phase. The query phase identifies the bare-bones documents that form the search result, but the full document still needs to be retrieved.
  5. The coordinating node sends fetch requests to the owning shards, which enrich the documents in the result set.
  6. The enriched documents are returned to the coordinating node.
  7. The full set of search results, ranked and enriched, is returned to the caller.
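
The two phases described in the steps above can be sketched in memory. Everything here is an assumption made for illustration: the `SHARDS` data, the term-counting `score` function (a stand-in for real relevance scoring), and the function names. The point is only the shape of the flow: shards return lightweight hits, the coordinating node merges them, and full documents are fetched only for the winners.

```python
# Hypothetical in-memory "shards": each maps a document ID to its full source.
SHARDS = [
    {"a1": {"title": "elasticsearch query flow"}, "a2": {"title": "lucene segments"}},
    {"b1": {"title": "query and fetch phases"}, "b2": {"title": "elasticsearch query tips"}},
]


def score(doc, terms):
    """Stand-in relevance score: number of query terms present in the title."""
    words = doc["title"].split()
    return sum(1 for term in terms if term in words)


def query_phase(shard_id, terms, size):
    """A shard returns only (score, shard_id, doc_id) for its local top hits."""
    hits = [(score(doc, terms), shard_id, doc_id) for doc_id, doc in SHARDS[shard_id].items()]
    hits = [hit for hit in hits if hit[0] > 0]
    # Highest score first; ties broken by shard ID, then document ID.
    return sorted(hits, key=lambda h: (-h[0], h[1], h[2]))[:size]


def fetch_phase(shard_id, doc_id):
    """The owning shard returns the full document for one hit."""
    return SHARDS[shard_id][doc_id]


def search(query, size=2):
    """Play the coordinating node: scatter, merge, then fetch."""
    terms = query.split()
    partial = [hit for sid in range(len(SHARDS)) for hit in query_phase(sid, terms, size)]
    top = sorted(partial, key=lambda h: (-h[0], h[1], h[2]))[:size]
    return [
        {"_id": doc_id, "_score": s, "_source": fetch_phase(sid, doc_id)}
        for s, sid, doc_id in top
    ]


results = search("elasticsearch query")
print([hit["_id"] for hit in results])
# prints: ['a1', 'b2']
```

Each shard only needs to return its own top `size` hits, because no document outside a shard's local top `size` can appear in the global top `size`.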

Indexing Architecture

The following architecture diagram outlines the flow of an indexing request:

Architectural diagram outlining the flow of an indexing request

  1. The user sends a JSON document for Elasticsearch to index. If a document with the same ID already exists, the index operation replaces it with the new document; merging fields into an existing document is done through a partial update instead. The node that first receives the request is the “coordinating” node.
  2. The coordinating node identifies the primary shard for the incoming document by hashing its routing value, which defaults to the document ID, and forwards the request to the data node that owns the primary shard.
  3. The primary shard validates the operation and executes it locally.
  4. The primary shard then forwards the operation to all of its in-sync replica shards in parallel.
  5. The replica shards apply the operation locally on their nodes.
  6. Steps 6, 7, and 8 in the diagram show the acknowledgment of the write bubbling up from the replica shards to the primary shard, then to the coordinating node, and finally to the caller.
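
The write path above can be sketched as a small simulation. The shard counts, the `ShardCopy` class, and the function names are hypothetical, and `zlib.crc32` is only a stand-in for the hash function Elasticsearch actually uses. Replicas are also written one after another here, whereas the real system forwards to them in parallel.

```python
import zlib

# Assumed cluster shape for this sketch.
NUM_PRIMARY_SHARDS = 3
REPLICAS_PER_SHARD = 1


def route(doc_id: str, num_shards: int = NUM_PRIMARY_SHARDS) -> int:
    """Pick a primary shard from the document ID (crc32 is a stand-in hash)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


class ShardCopy:
    """One copy of a shard, either the primary or a replica."""

    def __init__(self, name: str):
        self.name = name
        self.docs = {}

    def apply(self, doc_id: str, source: dict) -> bool:
        self.docs[doc_id] = dict(source)  # store the document under its ID
        return True  # acknowledgment back to the sender


primaries = [ShardCopy(f"shard-{i}-primary") for i in range(NUM_PRIMARY_SHARDS)]
replicas = [
    [ShardCopy(f"shard-{i}-replica-{r}") for r in range(REPLICAS_PER_SHARD)]
    for i in range(NUM_PRIMARY_SHARDS)
]


def index_document(doc_id: str, source: dict) -> dict:
    """Play the coordinating node for a single indexing request."""
    shard = route(doc_id)  # step 2: find the primary shard
    if not isinstance(source, dict):  # step 3: validate the operation
        raise TypeError("document must be a JSON object")
    primaries[shard].apply(doc_id, source)  # step 3: execute on the primary
    # Steps 4-5: forward to every replica (sequential here, parallel in practice).
    acks = [replica.apply(doc_id, source) for replica in replicas[shard]]
    # Steps 6-8: the acknowledgment travels back to the caller.
    return {"_id": doc_id, "shard": shard, "acknowledged": all(acks)}


result = index_document("42", {"title": "hello"})
print(result["acknowledged"])
# prints: True
```

Because the shard is derived from a hash modulo the number of primary shards, the same document ID always lands on the same shard, which is also why the number of primary shards cannot simply be changed after documents have been routed.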

Conclusion

This article described the components of an Elasticsearch cluster: documents, indexes, shards, nodes, and the cluster itself. It also outlined the lifetime of a search request and an indexing request. Elasticsearch's distributed architecture makes it easy to add and remove nodes as the cluster scales. Combined with features like schema-less indexing and vector search, this makes Elasticsearch a widely used choice for organizations that need to store, search, and analyze large volumes of data in near real time.
