Elasticsearch Query and Indexing Architecture

This article breaks down Elasticsearch's core architecture by explaining how search queries and indexing requests flow through the system.

By Udbhav Prasad · Nov. 26, 24 · Tutorial

What Is Elasticsearch?

Elasticsearch is a distributed, open-source search and analytics engine built atop the Apache Lucene library. Elasticsearch also offers vector search and supports retrieval-augmented generation (RAG) workflows for AI applications. Applications can store structured and unstructured data in Elasticsearch, with or without a defined schema, by sending JSON payloads to an Elasticsearch cluster.
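
As a rough sketch of what "sending a JSON payload" can look like, the snippet below builds, but does not send, an HTTP request for the index API path `PUT /<index>/_doc/<id>`. The cluster address `http://localhost:9200`, the index name `products`, and the document fields are assumptions made for illustration, and the helper `build_index_request` is hypothetical.

```python
import json
import urllib.request


def build_index_request(base_url: str, index: str, doc_id: str,
                        document: dict) -> urllib.request.Request:
    """Build (but do not send) an HTTP request that indexes one JSON document."""
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_doc/{doc_id}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


request = build_index_request(
    "http://localhost:9200",  # assumed address of a local cluster
    "products",               # hypothetical index name
    "1",                      # document ID chosen by the caller
    {"name": "mechanical keyboard", "price": 89.5, "in_stock": True},
)

print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To actually send it to a running cluster: urllib.request.urlopen(request)
```

Any node in the cluster can receive this request, so the address only needs to point at one reachable node.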

Elasticsearch Architecture

From the ground up, the main components of an Elasticsearch cluster are:

Document

A document is the smallest record of information stored by Elasticsearch and is represented as JSON. A document consists of multiple fields (key-value pairs) of different types. It can follow a predefined schema or be schema-less, in which case Elasticsearch infers the data types of new fields as they are indexed.
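
To make type inference for schema-less documents more concrete, here is a deliberately simplified sketch of how field types might be derived from JSON values. It is not Elasticsearch's implementation: the function names are hypothetical, date detection is approximated with Python's ISO parser, and real dynamic mapping has more rules (for example, strings normally get a `text` field with a `keyword` sub-field).

```python
from datetime import datetime


def infer_field_type(value) -> str:
    """Return a simplified Elasticsearch-style field type for a JSON value."""
    if isinstance(value, bool):   # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        try:
            datetime.fromisoformat(value)  # stand-in for date detection
            return "date"
        except ValueError:
            return "text"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        # An array takes the type of its first non-null element.
        for item in value:
            if item is not None:
                return infer_field_type(item)
    return "unmapped"  # null or empty array: nothing to infer yet


def infer_mapping(document: dict) -> dict:
    """Infer a type for every top-level field of a document."""
    return {field: infer_field_type(value) for field, value in document.items()}


# Hypothetical document used only for illustration.
doc = {
    "title": "Intro to sharding",
    "views": 4200,
    "rating": 4.5,
    "published": "2024-11-26",
    "featured": False,
    "tags": ["search", "architecture"],
    "author": {"name": "Jane Doe"},
}
print(infer_mapping(doc))
```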

Index

An index is a logical collection of documents with the same schema, identified by an index name.

Shard

Elasticsearch indexes are split into manageable units called shards, each of which holds a subset of the index's documents. Shards are the basic unit of search. Each primary shard can have one or more replica copies placed on other nodes for redundancy and fault tolerance.
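
The placement idea can be sketched with a toy round-robin allocator that never puts two copies of the same shard on one node. This is an illustration under assumptions, not Elasticsearch's allocation algorithm: the function `allocate_shards` and the node names are hypothetical, and a real cluster would leave extra replicas unassigned instead of rejecting the configuration.

```python
def allocate_shards(nodes: list[str], num_primaries: int,
                    num_replicas: int) -> dict[str, list[str]]:
    """Round-robin placement where copies of one shard land on distinct nodes."""
    if num_replicas + 1 > len(nodes):
        # Simplification: this sketch rejects the input outright.
        raise ValueError("need at least one node per shard copy")

    allocation: dict[str, list[str]] = {node: [] for node in nodes}
    for shard in range(num_primaries):
        # Copy 0 is the primary; copies 1..num_replicas are replicas.
        for copy in range(num_replicas + 1):
            node = nodes[(shard + copy) % len(nodes)]
            label = f"P{shard}" if copy == 0 else f"R{shard}"
            allocation[node].append(label)
    return allocation


layout = allocate_shards(["node-1", "node-2", "node-3"],
                         num_primaries=3, num_replicas=1)
for node, shards in layout.items():
    print(node, shards)
```

With three nodes, three primaries, and one replica each, every node ends up holding one primary and the replica of a different shard, so losing a single node never removes both copies of any shard.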

Node

A node is an independent instance of Elasticsearch and manages a collection of shards that belong to one or more indexes. Nodes can have different roles, such as data node, master node, and ingest node.

Cluster

An Elasticsearch cluster is a collection of interconnected nodes. All nodes in a cluster can handle client requests and communicate with each other. Each node in a cluster owns a subset of the shards that belong to an index.

Query Architecture

The following architecture diagram outlines the flow of a search request:

[Figure: architecture diagram showing the flow of a search request]

  1. The user or application makes a search query. The query can be handled by any node in the cluster. The node that handles the request is the “coordinating” node.
  2. The coordinating node forwards the query to one copy (primary or replica) of each shard involved in the search.
  3. Each shard executes the query locally and returns a lightweight set of results to the coordinating node.
  4. The coordinating node merges the results it receives. This is the end of the “query” phase. The query phase identifies the bare-bones documents that form the search result, but the full document still needs to be retrieved.
  5. The coordinating node sends fetch requests to the owning shards, which enrich the documents in the result set.
  6. The enriched documents are returned to the coordinating node.
  7. The full set of search results, ranked and enriched, is returned to the caller.
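
The two phases above can be sketched as a small in-memory simulation. This is a toy model under assumptions, not how Elasticsearch is implemented: shards are plain dictionaries, the score is a naive term count, and the names `query_phase`, `fetch_phase`, and `search` are hypothetical.

```python
import heapq

# Hypothetical shards: each maps a document ID to its full document.
SHARDS = [
    {"d1": {"text": "shard shard replica"}, "d2": {"text": "node cluster"}},
    {"d3": {"text": "shard node"}, "d4": {"text": "shard shard shard"}},
]


def query_phase(shard: dict, term: str, size: int) -> list:
    """Run the query on one shard; return lightweight (score, doc_id) hits."""
    hits = []
    for doc_id, doc in shard.items():
        score = doc["text"].lower().split().count(term)  # naive term count
        if score > 0:
            hits.append((score, doc_id))
    return heapq.nlargest(size, hits)


def fetch_phase(shard: dict, doc_ids: list) -> dict:
    """Return the full documents that this shard owns among doc_ids."""
    return {doc_id: shard[doc_id] for doc_id in doc_ids if doc_id in shard}


def search(shards: list, term: str, size: int) -> list:
    """Play the coordinating node: scatter, merge, fetch, and return."""
    # Query phase: every shard returns only scores and IDs.
    per_shard = [query_phase(shard, term, size) for shard in shards]
    top = heapq.nlargest(size, (hit for hits in per_shard for hit in hits))

    # Fetch phase: ask the owning shards for the full documents.
    wanted = [doc_id for _, doc_id in top]
    fetched = {}
    for shard in shards:
        fetched.update(fetch_phase(shard, wanted))

    return [{"_id": doc_id, "_score": score, "_source": fetched[doc_id]}
            for score, doc_id in top]


results = search(SHARDS, "shard", size=2)
for hit in results:
    print(hit)
```

Each shard returns at most `size` lightweight hits, so the coordinating node merges a small list of scores and IDs and fetches full documents only for the final winners.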

Indexing Architecture

The following architecture diagram outlines the flow of an indexing request:

[Figure: architecture diagram outlining the flow of an indexing request]

  1. The user sends a JSON document for Elasticsearch to index. If a document with the same ID already exists, the index API replaces the whole document and increments its version; merging new fields into an existing document is done with the update API instead. The node that first receives the request is the “coordinating” node.
  2. The coordinating node identifies the primary shard of the incoming document by hashing a routing value, which defaults to the document ID, and forwards the request to the data node that owns the primary shard.
  3. The primary shard validates the operation and executes it locally.
  4. The primary shard then forwards the operation to all its replicas in parallel.
  5. The replica shards apply the operation locally on their nodes.
  6. Each replica shard acknowledges the write to the primary shard.
  7. The primary shard acknowledges the write to the coordinating node.
  8. The coordinating node acknowledges the write to the caller.
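
The write path above can be sketched as a toy model. Several things here are assumptions for illustration: the `Shard` and `Index` classes are hypothetical, `zlib.crc32` stands in for the hash function Elasticsearch actually uses for routing, and replicas are updated one after another where a real cluster forwards the operation in parallel.

```python
import zlib


class Shard:
    """A toy shard copy that stores documents in memory."""

    def __init__(self, name: str):
        self.name = name
        self.docs = {}

    def apply(self, doc_id: str, doc: dict) -> bool:
        self.docs[doc_id] = doc
        return True  # acknowledgment of the write


class Index:
    """A toy index with primaries and replicas, acting as coordinating node."""

    def __init__(self, num_primaries: int, num_replicas: int):
        self.primaries = [Shard(f"P{i}") for i in range(num_primaries)]
        self.replicas = [
            [Shard(f"R{i}.{r}") for r in range(num_replicas)]
            for i in range(num_primaries)
        ]

    def route(self, doc_id: str) -> int:
        # Stand-in hash: pick the primary shard from the document ID.
        return zlib.crc32(doc_id.encode("utf-8")) % len(self.primaries)

    def index(self, doc_id: str, doc: dict) -> dict:
        shard_num = self.route(doc_id)           # find the primary shard
        if not isinstance(doc, dict):            # primary validates the operation
            raise TypeError("document must be a JSON object")
        self.primaries[shard_num].apply(doc_id, doc)   # primary executes locally
        acks = [replica.apply(doc_id, dict(doc))       # replicas apply the operation
                for replica in self.replicas[shard_num]]
        # Acknowledgments bubble back up to the caller.
        return {"_id": doc_id, "shard": shard_num, "acked_copies": 1 + sum(acks)}


idx = Index(num_primaries=3, num_replicas=2)
result = idx.index("doc-1", {"title": "hello"})
print(result)
```

Because routing depends only on the document ID and the number of primary shards, the same ID always maps to the same shard, which is why the number of primary shards cannot be changed without reindexing.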

Conclusion

This article describes the different components of an Elasticsearch cluster: documents, indexes, shards, and nodes. It also outlines the lifetime of a search request and an indexing request. Elasticsearch's distributed architecture makes it easy to add and remove nodes as the cluster scales. Combined with features like schema-less indexing and support for AI search features, this makes Elasticsearch a common choice for organizations that need to store, search, and analyze large volumes of data in near real time.
