Apache Doris vs Elasticsearch: An In-Depth Comparative Analysis

Apache Doris excels in complex analytics with SQL support and high performance, while Elasticsearch is ideal for full-text search and real-time retrieval.

By haijun huang · Apr. 23, 25 · Analysis

In the field of big data analytics, Apache Doris and Elasticsearch (ES) are frequently utilized for real-time analytics and retrieval tasks. However, their design philosophies and technical focuses differ significantly.

This article offers a detailed comparison across six dimensions: core architecture, query language, real-time capabilities, application scenarios, performance, and enterprise practices.

1. Core Design Philosophy: MPP Architecture vs. Search Engine Architecture

Apache Doris employs a typical MPP (Massively Parallel Processing) distributed architecture, tailored for high-concurrency, low-latency real-time online analytical processing (OLAP) scenarios. It comprises Frontend (FE) and Backend (BE) components and leverages multi-node parallel computing and columnar storage to manage massive datasets efficiently. This design enables Doris to return query results with sub-second latency, making it well suited to complex aggregations and analytical queries on large datasets.

In contrast, Elasticsearch is based on a full-text search engine architecture, utilizing a sharding and inverted index design that prioritizes rapid text retrieval and filtering. ES stores data as documents, with each field indexed via an inverted index, excelling in keyword searches and log queries. However, it struggles with complex analytics and large-scale aggregation computations.
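
To make the contrast concrete, here is a minimal Python sketch of the two storage ideas. It is a toy model, not how either system is implemented: the log records, field names, and helper functions are all hypothetical. An inverted index maps each token to the documents that contain it, which makes keyword lookup cheap; a columnar layout keeps each field in its own array, so an aggregation touches only the columns it needs.

```python
from collections import defaultdict

# Hypothetical log records, used only for illustration.
LOGS = [
    {"id": 1, "service": "api", "latency_ms": 120, "message": "timeout calling db"},
    {"id": 2, "service": "web", "latency_ms": 45, "message": "request ok"},
    {"id": 3, "service": "api", "latency_ms": 300, "message": "db timeout retry"},
]


def build_inverted_index(docs, field):
    """Map each token of `field` to the set of document ids containing it."""
    index = defaultdict(set)
    for doc in docs:
        for token in doc[field].lower().split():
            index[token].add(doc["id"])
    return index


def to_columns(docs):
    """Pivot row-oriented records into one list per column."""
    columns = defaultdict(list)
    for doc in docs:
        for name, value in doc.items():
            columns[name].append(value)
    return dict(columns)


def avg_by(columns, key_col, value_col):
    """Average `value_col` per `key_col`, scanning only those two columns."""
    totals = defaultdict(lambda: [0, 0])  # key -> [sum, count]
    for key, value in zip(columns[key_col], columns[value_col]):
        totals[key][0] += value
        totals[key][1] += 1
    return {key: s / n for key, (s, n) in totals.items()}


index = build_inverted_index(LOGS, "message")
columns = to_columns(LOGS)
print(sorted(index["timeout"]))                  # [1, 3]
print(avg_by(columns, "service", "latency_ms"))  # {'api': 210.0, 'web': 45.0}
```

The keyword lookup never reads the latency values, and the aggregation never reads the message text, which is the intuition behind each engine's strength.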

The core architectural differences are summarized below:

| Architectural Philosophy | Apache Doris (MPP Analytical Database) | Elasticsearch (Distributed Search Engine) |
|---|---|---|
| Design Intent | Geared toward real-time data warehousing/BI, supporting high-throughput parallel computing OLAP engine; emphasizes high-concurrency aggregation queries and low latency | Focused on full-text search/log retrieval, built on Lucene’s inverted index; excels at keyword search and filtering, primarily a search engine despite structured query support |
| Data Storage | Columnar storage with column-encoded compression, achieving high compression ratios (5-10×) to save space; supports multiple table models (Duplicate, Aggregate, Unique) with pre-aggregation during writes | Document storage, with inverted indexes per field (low compression ratio, ~1.5×); schema changes are challenging post-index creation, requiring reindexing for field additions or modifications |
| Scalability and Elasticity | Shared-nothing node design for easy linear scaling; supports strict read-write separation and multi-tenant isolation; version 3.0 introduces storage-compute separation for elastic scaling | Scales via shard replicas but is constrained by single-node memory and JVM GC limits, risking memory shortages during large queries; thread pool model offers limited isolation |
| Typical Features | Fully open-source (Apache 2.0), MySQL protocol compatible; no external dependencies, offers materialized views and rich SQL functions for enhanced analytics | Core developed by Elastic (license changes over time), natively supports full-text search and near-real-time indexing; rich ecosystem (Kibana, Logstash), with some advanced features requiring paid plugins |


Analysis: Doris’s MPP architecture provides a natural edge in big data aggregation analytics, leveraging columnar storage and vectorized execution to optimize IO and CPU usage. Features like pre-aggregation, materialized views, and a scalable design make it outperform ES in large-scale data analytics. 
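
The pre-aggregation mentioned above is tied to the three table models (Duplicate, Aggregate, Unique). The following Python sketch shows the semantics of each in a few lines. It is a simplified illustration under assumed data, not Doris code: the rows are hypothetical, and SUM stands in for whichever aggregation function a real Aggregate table would declare.

```python
from collections import defaultdict

# Hypothetical ingested rows in arrival order: (user_id, day, clicks).
ROWS = [
    ("u1", "2025-04-01", 3),
    ("u2", "2025-04-01", 1),
    ("u1", "2025-04-01", 5),
]


def duplicate_model(rows):
    """Keep every row exactly as written (detail data such as raw logs)."""
    return list(rows)


def aggregate_model(rows):
    """Merge rows that share a key at write time; here the value is summed."""
    merged = defaultdict(int)
    for user_id, day, clicks in rows:
        merged[(user_id, day)] += clicks
    return dict(merged)


def unique_model(rows):
    """Keep only the most recent row per key (primary-key style upsert)."""
    latest = {}
    for user_id, day, clicks in rows:
        latest[(user_id, day)] = clicks
    return latest


print(len(duplicate_model(ROWS)))                 # 3
print(aggregate_model(ROWS)[("u1", "2025-04-01")])  # 8
print(unique_model(ROWS)[("u1", "2025-04-01")])     # 5
```

With the Aggregate behavior, a query for total clicks per user reads one already-merged row per key instead of re-scanning every raw event.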

Conversely, Elasticsearch’s search engine roots make it superior for instant searches and basic metrics, but it falters in complex SQL analytics and joins. Doris also offers greater schema flexibility, allowing real-time column/index modifications, while ES’s fixed schemas often necessitate costly reindexing. 

Overall, Doris emphasizes analytical power and usability, while ES prioritizes retrieval, giving Doris an advantage in complex enterprise analytics.

2. Query Language: SQL vs. DSL in Ease of Use and Expressiveness

Doris and ES diverge sharply in query interfaces: Doris natively supports standard SQL, while Elasticsearch uses a JSON DSL (Domain Specific Language). Doris is compatible with the MySQL protocol and offers robust SQL-92 features such as SELECT, WHERE, GROUP BY, ORDER BY, multi-table JOINs, and subqueries, along with window functions, UDFs/UDAFs, and materialized views. This comprehensive SQL support allows analysts and engineers to perform complex queries using familiar syntax without learning a new language.

Elasticsearch, however, employs a proprietary JSON-based DSL, distinct from SQL, requiring nested structures for filtering and aggregation. This presents a steep learning curve for new users and complicates integration with traditional BI tools.

The comparison is detailed below:

| Query Language | Apache Doris (SQL Interface) | Elasticsearch (JSON DSL) |
|---|---|---|
| Syntax Style | Standard SQL (MySQL-like), intuitive and readable | Proprietary DSL (JSON), nested and less intuitive |
| Expressiveness | Supports multi-table JOINs, subqueries, views, UDFs for complex logic; enables direct associative analytics | Limited to single-index queries, no native JOINs or subqueries; complex analytics require pre-processed data models |
| Learning Cost | SQL is widely known, low entry barrier; mature debugging tools available | DSL is custom, high learning threshold; error troubleshooting is challenging |
| Ecosystem Integration | MySQL protocol compatible, integrates seamlessly with BI tools (e.g., Tableau, Grafana) | Closed ecosystem, difficult to integrate with BI tools without plugins; Kibana offers basic visualization |


Analysis: Doris’s SQL interface excels in usability and efficiency, lowering the entry threshold by leveraging familiar syntax. For instance, aggregating log data by multiple dimensions in Doris requires a simple SQL GROUP BY, while ES demands complex, nested DSL aggregations, reducing development efficiency.
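
The multi-dimension aggregation mentioned above can be sketched side by side. The Python helpers below only build the two query texts; nothing is sent to a server. The table/index name `app_logs` and the fields `service` and `level` are hypothetical, and the helper functions are illustrative rather than part of any client library.

```python
import json


def group_by_sql(table, dimensions):
    """Build a COUNT(*) ... GROUP BY statement over the given dimensions."""
    cols = ", ".join(dimensions)
    return f"SELECT {cols}, COUNT(*) AS cnt FROM {table} GROUP BY {cols}"


def nested_terms_aggs(dimensions):
    """Build an Elasticsearch search body with one `terms` aggregation
    nested inside the previous one for each dimension."""
    body = {"size": 0}  # size 0: return aggregation buckets only, no hits
    level = body
    for dim in dimensions:
        level["aggs"] = {f"by_{dim}": {"terms": {"field": dim}}}
        level = level["aggs"][f"by_{dim}"]
    return body


dims = ["service", "level"]
print(group_by_sql("app_logs", dims))
print(json.dumps(nested_terms_aggs(dims), indent=2))
```

Each extra dimension adds one column name to the SQL but one more nesting level to the DSL body. The two are also not strictly equivalent: a `terms` aggregation returns only the top buckets by default, so matching the GROUP BY result exactly would need further tuning.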

Doris’s support for JOINs and subqueries also suits data warehouse modeling (e.g., star schemas), whereas ES’s lack of JOINs necessitates pre-denormalized data or application-layer processing. Thus, Doris outperforms in query ease and power, enhancing integration with analytics ecosystems.
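
The difference between query-time JOINs and pre-denormalized data can be illustrated with a small star-schema example. In this sketch Python's built-in `sqlite3` module is used purely as a stand-in SQL engine so the example runs anywhere; it is not Doris, and the table and column names are hypothetical.

```python
import sqlite3

# sqlite3 is only a stand-in SQL engine; all names below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_service (service_id INTEGER PRIMARY KEY, team TEXT);
    CREATE TABLE fact_requests (service_id INTEGER, latency_ms INTEGER);
    INSERT INTO dim_service VALUES (1, 'payments'), (2, 'search');
    INSERT INTO fact_requests VALUES (1, 120), (1, 80), (2, 40);
""")

# With JOIN support, the fact table is joined to the dimension at query time.
joined = conn.execute("""
    SELECT d.team, AVG(f.latency_ms)
    FROM fact_requests AS f
    JOIN dim_service AS d ON d.service_id = f.service_id
    GROUP BY d.team
    ORDER BY d.team
""").fetchall()
print(joined)  # [('payments', 100.0), ('search', 40.0)]

# Without JOINs, the dimension attribute is copied into every document
# before indexing (denormalization in the application layer).
teams = dict(conn.execute("SELECT service_id, team FROM dim_service"))
denormalized = [
    {"team": teams[service_id], "latency_ms": latency}
    for service_id, latency in conn.execute(
        "SELECT service_id, latency_ms FROM fact_requests"
    )
]
print(len(denormalized))  # 3
```

In the denormalized form, renaming a team means rewriting every document that carries the old name, whereas the joined form needs a single update to the dimension table.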

3. Real-Time Data Processing Mechanisms: Write Architecture and Data Updates

Doris and ES adopt distinct approaches to real-time data ingestion and querying. Elasticsearch prioritizes near-real-time search with document-by-document writes and frequent index refreshes. Data is ingested via REST APIs (e.g., Bulk), tokenized, and indexed, becoming searchable after periodic refreshes (default: 1 second). This ensures rapid log retrieval but incurs high write overhead, with CPU-intensive indexing limiting single-core throughput to ~2 MB/s, often causing bottlenecks during peaks.

Apache Doris, conversely, uses a high-throughput batch write architecture. Data is imported in small batches (via Stream Load, or via Routine Load from queues like Kafka) and written in columnar format across multiple replicas. Because it avoids building an index on every field, Doris achieves roughly 5x the write throughput of ES in ES Rally benchmarks, and its direct queue integration simplifies ingestion pipelines.
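
The two ingestion styles can be compared at the request level. The sketch below builds, but does not send, an Elasticsearch Bulk API body and a description of a Doris Stream Load request. The host, database, table, credentials, and label are hypothetical, port 8030 is assumed to be the FE HTTP port (the usual default), and the header set is a minimal assumption that should be checked against the Stream Load documentation for the Doris version in use.

```python
import base64
import json


def es_bulk_body(index, docs):
    """Bulk API body in NDJSON: one action line plus one source line per doc."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the bulk body must end with a newline


def doris_stream_load_request(fe_host, db, table, docs, label, user, password):
    """Describe a Stream Load HTTP PUT that carries a whole batch as JSON lines.

    Assumption: FE HTTP port 8030 and the headers shown here; verify both
    against your deployment before use.
    """
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return {
        "method": "PUT",
        "url": f"http://{fe_host}:8030/api/{db}/{table}/_stream_load",
        "headers": {
            "Authorization": f"Basic {token}",
            "Expect": "100-continue",
            "label": label,  # a unique label guards against loading a batch twice
            "format": "json",
            "read_json_by_line": "true",
        },
        "body": "\n".join(json.dumps(d) for d in docs),
    }


docs = [{"service": "api", "latency_ms": 120}, {"service": "web", "latency_ms": 45}]
print(es_bulk_body("app_logs", docs))
request = doris_stream_load_request(
    "fe.example.internal", "demo", "app_logs", docs, "batch-001", "root", ""
)
print(request["url"])
```

The bulk body interleaves an action line with every document, each of which is then analyzed and indexed; the Stream Load request hands over the batch as one unit to be written in columnar form.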

Key differences in updates and real-time capabilities include:

  • Storage mechanism: Doris’s columnar storage achieves 5:1 to 10:1 compression, using ~20% of ES’s space for the same data, enhancing IO efficiency. ES’s inverted indexes yield a ~1.5:1 compression ratio, inflating storage.
  • Data updates: Doris’s Unique Key model supports primary key updates with minimal performance loss (<10%), while ES’s document updates require costly reindexing (up to 3x performance hit). Doris’s Aggregate Key model ensures consistent aggregations during imports, unlike ES’s less flexible, eventually consistent rollups.
  • Query visibility: ES offers second-level visibility post-refresh, ideal for instant log retrieval. Doris achieves sub-minute visibility via batch imports, sufficient for most real-time analytics, with memory-buffered data ensuring timely query access.

Analysis: Doris excels in high-throughput, consistent analysis, while ES focuses on millisecond writes and near-real-time retrieval. Doris’s batch writes and compression outperform ES in write performance (5x), query speed (2.3x), and storage efficiency (1/5th), making it ideal for high-frequency writes and fast analytics, with flexible schema evolution further enhancing its real-time capabilities.
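
The storage figures above can be cross-checked with a short calculation. This sketch takes the stated compression ratios (5:1 to 10:1 for Doris, about 1.5:1 for ES) at face value and applies them to a hypothetical 1,000 GB of raw data; replicas and index overhead are ignored.

```python
def stored_gb(raw_gb, compression_ratio):
    """On-disk size for a given raw volume and compression ratio."""
    return raw_gb / compression_ratio


RAW_GB = 1000  # hypothetical raw data volume

es = stored_gb(RAW_GB, 1.5)
doris_low = stored_gb(RAW_GB, 10)   # best case, 10:1
doris_high = stored_gb(RAW_GB, 5)   # worst case, 5:1

for label, doris in (("10:1", doris_low), ("5:1", doris_high)):
    print(f"Doris at {label}: {doris:.0f} GB, {doris / es:.0%} of ES's {es:.0f} GB")
# Doris at 10:1: 100 GB, 15% of ES's 667 GB
# Doris at 5:1: 200 GB, 30% of ES's 667 GB
```

Under these assumptions Doris uses between 15% and 30% of the ES footprint, so the quoted figure of about 20% corresponds to a compression ratio of roughly 7.5:1, the middle of the stated range.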

4. Typical Application Scenario Comparison: Log Analysis, BI Reporting, etc.

Doris and ES shine in different scenarios due to their architectural strengths:

| Scenario | Apache Doris | Elasticsearch |
|---|---|---|
| Log Analysis | Excels in storage and multi-dimensional analysis of large logs; supports long-term retention and fast aggregations/JOINs. Enterprises report 10x faster analytics and 60% cost savings, integrating search and analysis with inverted index support | Ideal for real-time log search and simple stats; fast keyword retrieval suits monitoring and troubleshooting (e.g., ELK). Struggles with complex aggregations and long-term analysis due to cost and performance limits |
| BI Reporting | Perfect for interactive reporting and ad-hoc analysis; full SQL and JOINs support data warehousing and dashboards. A logistics firm saw 5-10x faster queries and 2x concurrency | Rarely used for BI; lacks JOINs and robust SQL, limiting complex reporting. Best for simple metrics in monitoring, not rich BI logic |


Analysis: In log analysis, Doris and ES complement each other: ES handles real-time searches, while Doris manages long-term, complex analytics. For BI, Doris’s SQL and performance make it far superior, directly supporting enterprise data warehouses and reporting.

5. Performance Benchmark Comparison

Benchmarks run with ES Rally, Elasticsearch's official benchmarking tool, highlight Doris’s edge:

Figure: Apache Doris vs. Elasticsearch performance comparison for log analysis (write throughput, storage, query response time).

Doris achieves a write speed of 550 MB/s (5x ES), uses 1/5th the storage, and delivers queries about 2.3x faster, with larger gaps on some workloads (e.g., about 1s versus 6-7s for an aggregation over 40M log records). Its MPP architecture keeps performance stable under high concurrency, whereas ES can run into memory limits.
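
To put the write figures in operational terms, here is a back-of-the-envelope calculation. The ES rate is not stated directly; it is derived from the "5x" claim. The 10 TB daily volume is hypothetical, and the sketch assumes both systems sustain their benchmark rate on the benchmark hardware, which real clusters may not.

```python
DORIS_MB_S = 550           # write speed quoted above
ES_MB_S = DORIS_MB_S / 5   # implied by "5x ES": 110 MB/s
DAILY_MB = 10 * 1_000_000  # hypothetical 10 TB of logs per day (decimal units)


def hours_to_ingest(volume_mb, rate_mb_s):
    """Hours needed to write `volume_mb` at a sustained `rate_mb_s`."""
    return volume_mb / rate_mb_s / 3600


print(f"Doris: {hours_to_ingest(DAILY_MB, DORIS_MB_S):.1f} h")  # Doris: 5.1 h
print(f"ES:    {hours_to_ingest(DAILY_MB, ES_MB_S):.1f} h")     # ES:    25.3 h
```

At the implied ES rate, a day's worth of this hypothetical volume would take more than 24 hours to ingest on comparable hardware, which is one way to read why write throughput drives cluster sizing.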

6. Enterprise Practice Cases

  • 360 Security Browser: Replaced ES with Doris, improving analytics speed by 10x and cutting storage costs by 60%.
  • Tencent Music: Reduced storage by a reported 80% (the cited figures, 697 GB to 195 GB, work out to roughly 72%) and boosted write throughput 4x with Doris.
  • Large bank: Enhanced log analysis efficiency, eliminating redundancy.
  • Payment firm: Achieved 4x write speed, 3x query performance, and 50% storage savings.

These cases underscore Doris’s strength in large-scale writes and complex queries, often alongside ES, which continues to handle search.

Summary

Doris excels in complex analytics, SQL usability, and efficiency, ideal for unified real-time platforms, while ES dominates in full-text search and real-time queries. Enterprises can combine them — Doris for analysis, ES for retrieval — to maximize value, with Doris poised to expand in analytics and ES in intelligent search.