It's 2025: How Do You Choose Between Doris and ClickHouse?
Doris or ClickHouse? Doris excels in complex SQL analytics, real-time updates, and usability, while ClickHouse offers extreme performance in single-table queries.
Database selection is a challenge every data engineer faces.
Apache Doris and ClickHouse, two mainstream analytical databases, are often compared. Each has its strengths and suits different scenarios, which makes the choice difficult.
I once managed a data platform architecture for a large group and faced the decision: Should we replace Hadoop with ClickHouse or Doris?
The debate within the company was intense. Those in favor of ClickHouse argued for its superior performance, while those supporting Doris emphasized its comprehensive ecosystem and high usability.
Ultimately, it took us nearly two months of comprehensive testing to make a decision.
Architecture: Complex vs. Simple
Doris uses a typical architecture that separates FE (frontend) and BE (backend) nodes, which keeps cluster management simple.
- FE handles metadata management and query planning; multiple FE nodes keep metadata consistent through a Paxos-like replication protocol (provided by Berkeley DB Java Edition).
- BE is responsible for data storage and execution engines, supporting automatic balancing and fault recovery.
ClickHouse, on the other hand, was designed around single-node operation. To build a cluster, you configure distributed tables and rely on ZooKeeper (or ClickHouse Keeper) for node coordination. This design makes ClickHouse very flexible for small-scale deployments, but managing large clusters becomes complex.
Imagine you are managing a 100-node cluster:
- With Doris, you simply add new BE nodes through the FE node, and the system automatically handles data distribution.
- With ClickHouse, you face the cumbersome tasks of managing local tables, distributed tables, and ZooKeeper configurations.
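To make the contrast above concrete, here is a minimal sketch that only generates the statements involved; it does not connect to any cluster. The host names, cluster name, and table layout are hypothetical placeholders, and the helper functions are illustrative rather than part of either project. The Doris side assumes the default BE heartbeat port 9050.

```python
def doris_add_backends(hosts, heartbeat_port=9050):
    """Return one ALTER SYSTEM statement per new BE node (to be run against an FE)."""
    return [f'ALTER SYSTEM ADD BACKEND "{host}:{heartbeat_port}";' for host in hosts]


def clickhouse_sharded_table(cluster, database, table, columns, order_by):
    """Return the two DDL statements a sharded, replicated ClickHouse table typically needs."""
    # 1) The local table that physically stores each shard's data.
    local = (
        f"CREATE TABLE {database}.{table}_local ON CLUSTER {cluster} ({columns}) "
        f"ENGINE = ReplicatedMergeTree('/clickhouse/tables/{{shard}}/{table}', '{{replica}}') "
        f"ORDER BY {order_by};"
    )
    # 2) The distributed table that routes reads and writes to the local tables.
    distributed = (
        f"CREATE TABLE {database}.{table} ON CLUSTER {cluster} AS {database}.{table}_local "
        f"ENGINE = Distributed({cluster}, {database}, {table}_local, rand());"
    )
    return [local, distributed]


if __name__ == "__main__":
    for stmt in doris_add_backends(["be-101.example.internal", "be-102.example.internal"]):
        print(stmt)
    for stmt in clickhouse_sharded_table(
        "analytics", "demo", "events", "id UInt64, ts DateTime", "(ts, id)"
    ):
        print(stmt)
```

On the ClickHouse side, the cluster definition and the `{shard}` and `{replica}` macros also have to exist in each server's configuration, which is where much of the operational work sits.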
Performance: Extreme vs. Comprehensive
ClickHouse excels in query performance. In single-table query scenarios, especially where extreme performance is required, ClickHouse usually comes out on top.
Doris, with its robust MPP execution framework and vectorized execution engine, performs exceptionally well in complex SQL analytics.
Especially for multi-table joins, Doris supports shuffle joins between large tables, enabling complex analytical queries that ClickHouse struggles with.
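The idea behind a shuffle join can be sketched in a few lines of plain Python. This is a toy, single-process illustration of the technique, not Doris's execution engine: both inputs are hash-partitioned on the join key so that matching rows meet on the same "node", and each node then runs an independent local hash join. The function name and the `num_nodes` parameter are assumptions made for the example.

```python
from collections import defaultdict


def shuffle_join(left, right, key, num_nodes=4):
    """Inner-join two lists of dicts on `key` by hash-partitioning both sides."""
    # Shuffle phase: each row goes to the node chosen by hashing its join key,
    # so rows with equal keys from both tables land on the same node.
    left_parts = defaultdict(list)
    right_parts = defaultdict(list)
    for row in left:
        left_parts[hash(row[key]) % num_nodes].append(row)
    for row in right:
        right_parts[hash(row[key]) % num_nodes].append(row)

    # Join phase: every node builds a hash table on its slice of the right
    # side and probes it with its slice of the left side.
    joined = []
    for node in range(num_nodes):
        build = defaultdict(list)
        for row in right_parts[node]:
            build[row[key]].append(row)
        for row in left_parts[node]:
            for match in build.get(row[key], []):
                joined.append({**row, **match})
    return joined


if __name__ == "__main__":
    orders = [{"user_id": 1, "amount": 10}, {"user_id": 2, "amount": 20}]
    users = [{"user_id": 1, "city": "Paris"}, {"user_id": 3, "city": "Lima"}]
    print(shuffle_join(orders, users, "user_id"))
```

Because neither side has to fit on a single node, this pattern is what lets two large tables be joined; the cost is the network transfer of the shuffle phase.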
Additionally, ClickHouse's update and delete operations, implemented as mutations, are asynchronous by default. After you execute a delete command, the data does not immediately disappear from query results; it remains until the background merge completes. This design optimizes write performance at the cost of immediate consistency.
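The visibility gap described here can be shown with a small toy model. This is a deliberately simplified simulation and not ClickHouse's actual storage code; the class and method names are invented for the illustration.

```python
class AsyncDeleteTable:
    """Toy model: deletes are queued and applied only by a later merge."""

    def __init__(self, rows):
        self.rows = list(rows)
        self.pending_deletes = []

    def delete(self, predicate):
        # The call returns immediately; nothing is removed yet.
        self.pending_deletes.append(predicate)

    def select(self):
        # Queries still see rows whose deletion has not been merged.
        return list(self.rows)

    def background_merge(self):
        # The merge rewrites the data without rows matched by queued deletes.
        self.rows = [
            row for row in self.rows
            if not any(pred(row) for pred in self.pending_deletes)
        ]
        self.pending_deletes.clear()


if __name__ == "__main__":
    table = AsyncDeleteTable([{"user": "alice"}, {"user": "bob"}])
    table.delete(lambda row: row["user"] == "alice")
    print(table.select())   # alice is still returned at this point
    table.background_merge()
    print(table.select())   # only bob remains after the merge
```

Any query that runs between `delete` and `background_merge` still returns the deleted row, which is the window the paragraph above warns about.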
Doris, in contrast, supports synchronous updates and deletes, so changes are visible immediately. Its Unique Key model uses Merge-on-Write to achieve true primary key deduplication, with performance claimed to be up to 10 times better than ClickHouse for this kind of workload.
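As a rough intuition for Merge-on-Write, the sketch below resolves duplicate keys at write time by marking the superseded row, so reads only have to skip marked rows instead of merging versions. It is a toy in-memory model under that assumption, not Doris's implementation, and all names in it are hypothetical.

```python
class MergeOnWriteTable:
    """Toy unique-key table that resolves duplicates at write time."""

    def __init__(self):
        self.rows = []        # append-only storage of (key, value) pairs
        self.deleted = set()  # positions of superseded or deleted rows
        self.latest = {}      # key -> position of the current version

    def upsert(self, key, value):
        old = self.latest.get(key)
        if old is not None:
            self.deleted.add(old)  # mark the previous version while writing
        self.rows.append((key, value))
        self.latest[key] = len(self.rows) - 1

    def delete(self, key):
        pos = self.latest.pop(key, None)
        if pos is not None:
            self.deleted.add(pos)  # the row is hidden as soon as this returns

    def scan(self):
        # Reads only skip marked positions; no version merging is needed.
        return [row for i, row in enumerate(self.rows) if i not in self.deleted]


if __name__ == "__main__":
    table = MergeOnWriteTable()
    table.upsert("u1", "bronze")
    table.upsert("u1", "gold")
    print(table.scan())
    table.delete("u1")
    print(table.scan())
```

The trade-off in this model is that writes do extra work (a key lookup and a mark) so that reads stay cheap and always see one row per key.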
Many business scenarios require real-time data updates, such as user tagging and real-time dashboards. With ClickHouse, you might encounter issues where deleted users still appear in query results, which can severely impact business operations.
Ecosystem and Usability
Doris excels in out-of-the-box usability.
- It supports standard SQL syntax and is compatible with the MySQL protocol, allowing seamless integration with various BI tools.
- Doris also provides rich connectors for seamless integration with data sources like Hive, Iceberg, Hudi, and Paimon.
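Because Doris speaks the MySQL protocol, an ordinary MySQL client library can talk to an FE node. The sketch below assumes the third-party PyMySQL package and the default FE query port 9030; the host names, credentials, catalog name, and metastore URI are placeholders, and `run_query` needs a reachable cluster to do anything useful.

```python
def doris_connection_params(fe_host, user="root", password="", database=None, query_port=9030):
    """Build keyword arguments for a MySQL-protocol client pointed at a Doris FE."""
    params = {"host": fe_host, "port": query_port, "user": user, "password": password}
    if database:
        params["database"] = database
    return params


def run_query(sql, **params):
    """Run one statement through PyMySQL (imported lazily; needs a live cluster)."""
    import pymysql  # third-party dependency: pip install pymysql

    conn = pymysql.connect(**params)
    try:
        with conn.cursor() as cur:
            cur.execute(sql)
            return cur.fetchall()
    finally:
        conn.close()


# Hypothetical catalog definition for a Hive Metastore; the URI is a placeholder.
CREATE_HIVE_CATALOG = """
CREATE CATALOG hive_lake PROPERTIES (
    'type' = 'hms',
    'hive.metastore.uris' = 'thrift://metastore.example.internal:9083'
)
"""

if __name__ == "__main__":
    params = doris_connection_params("fe.example.internal", database="demo")
    print(params)
    # With a running cluster, the same connection could create the catalog:
    # run_query(CREATE_HIVE_CATALOG, **params)
```

The same connection parameters are what a BI tool would be given when Doris is registered as a MySQL data source.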
ClickHouse, however, uses its own SQL dialect, which, while powerful, comes with a higher learning curve.
- It has historically lacked full support for some standard SQL features, such as correlated subqueries, including correlated EXISTS predicates.
Many companies already have extensive MySQL experience, so choosing Doris can reduce the team's learning cost. The Doris community is also particularly active.
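When an engine does not accept a correlated EXISTS, the query can often be rewritten with an uncorrelated IN subquery. The sketch below uses Python's built-in SQLite purely as a stand-in to show that the two forms return the same rows; it says nothing about how either Doris or ClickHouse plans them, and the tables are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (user_id INTEGER, amount REAL);
    INSERT INTO users VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
    INSERT INTO orders VALUES (1, 120.0), (1, 30.0), (3, 80.0);
""")

# Correlated form: the inner query refers to the outer row (u.id).
CORRELATED = """
    SELECT name FROM users u
    WHERE EXISTS (
        SELECT 1 FROM orders o WHERE o.user_id = u.id AND o.amount > 50
    )
    ORDER BY name
"""

# Rewritten form: the inner query is self-contained and evaluated once.
REWRITTEN = """
    SELECT name FROM users
    WHERE id IN (SELECT user_id FROM orders WHERE amount > 50)
    ORDER BY name
"""

with_exists = conn.execute(CORRELATED).fetchall()
with_in = conn.execute(REWRITTEN).fetchall()

if __name__ == "__main__":
    print(with_exists)
    print(with_in)
```

Rewrites like this are usually mechanical for simple predicates, but they are exactly the kind of dialect-specific work that adds to the learning curve.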
A friend of mine, who manages a data team, found that their existing MySQL DBAs could quickly get up to speed with Doris, significantly improving team efficiency.
Choosing the Right Database
How do you choose between these two databases? Here are some guidelines:
ClickHouse is suitable for scenarios where:
- Extreme query performance is required.
- The data model is relatively fixed, and wide tables are the norm.
- There is a dedicated database engineer to maintain the system.
- Data updates are infrequent.
Doris is suitable for scenarios where:
- Complex SQL analytics are required.
- Real-time data updates are necessary.
- Ease of maintenance and usability are priorities.
- Integration with multiple data sources is needed.
- High-concurrency queries are required.
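The two checklists above can be turned into a simple tally for a first-pass discussion. This is only a sketch: the requirement labels are made up for the example, and weighting every item equally is an assumption, not something the guidelines prescribe.

```python
CLICKHOUSE_SIGNALS = {
    "extreme_single_table_performance",
    "fixed_wide_table_model",
    "dedicated_database_engineer",
    "infrequent_updates",
}
DORIS_SIGNALS = {
    "complex_sql_analytics",
    "real_time_updates",
    "ease_of_maintenance",
    "multi_source_integration",
    "high_concurrency",
}


def tally(requirements):
    """Count how many stated requirements match each checklist."""
    reqs = set(requirements)
    unknown = reqs - CLICKHOUSE_SIGNALS - DORIS_SIGNALS
    if unknown:
        raise ValueError(f"unknown requirement(s): {sorted(unknown)}")
    return {
        "ClickHouse": len(reqs & CLICKHOUSE_SIGNALS),
        "Doris": len(reqs & DORIS_SIGNALS),
    }


if __name__ == "__main__":
    print(tally(["real_time_updates", "high_concurrency", "infrequent_updates"]))
```

A tally like this is a conversation starter; a hard requirement such as real-time updates may outweigh several softer preferences.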
For a real-world reference, see last year's case study from Kuaishou: Doris's lakehouse capabilities allowed them to improve query performance while simplifying data governance.
Conclusion
There is no absolute right or wrong when choosing a database; the key is to find the tool that best fits your business scenario.
If your team has limited resources and prioritizes usability and comprehensiveness, Doris is the better choice; if you are solely focused on extreme single-table query performance and have a dedicated team to maintain it, ClickHouse might be more suitable.
Database selection is like ordering a meal in a restaurant: no single dish suits everyone. Both databases are evolving rapidly, and the competition will only intensify.
Stay tuned for more interesting, useful, and valuable content in the next issue!