Why We Chose Iceberg Over Delta After Evaluating Both at Scale
Delta often performs better for Spark workloads, while Iceberg tends to be stronger for a multi-engine environment. The right choice depends on your platform use case.
When people compare Delta Lake and Apache Iceberg, the discussion often stays too abstract. Most articles describe features at a high level, but platform decisions are usually made in much more practical terms: Which format fits your workloads better? Which one is easier to operate? Which one creates fewer long-term constraints?
This article is a practitioner-style comparison of the dimensions that matter most in day-to-day platform work: write-heavy operations, multi-engine reads, schema evolution, compaction, and time travel.
The examples here are generalized and illustrative. The goal is not to prove that one format is universally better, but to show where each one tends to fit best.
1. MERGE Performance on Large Tables
For write-heavy Spark-centric workloads, Delta Lake often has an advantage.
In environments where tables are updated frequently using MERGE INTO, Delta tends to perform well because of its tight integration with Spark and the way it uses table metadata for pruning and transactional processing. In practice, that can make Delta a strong fit for fact tables that see frequent upserts and incremental corrections.
Iceberg also supports MERGE INTO, and the syntax is very similar, but performance can vary more depending on engine version, metadata layout, partitioning strategy, and write patterns. In many teams, Iceberg’s strengths show up more clearly on interoperability and table management than on highly write-optimized merge-heavy pipelines.
Example
MERGE INTO transactions AS target
USING daily_updates AS source
ON target.transaction_id = source.transaction_id
WHEN MATCHED THEN UPDATE SET
    target.amount = source.amount,
    target.status = source.status,
    target.updated_at = current_timestamp()
WHEN NOT MATCHED THEN INSERT *;
The SQL is familiar across both formats. The difference is usually not in syntax, but in how the underlying metadata and engine integration affect execution.
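To make the role of metadata more concrete, here is a minimal Python sketch of range-based file pruning, the general idea behind skipping files that cannot contain any of the keys in a merge batch. It is a simplified model, not the actual implementation of either format: `FileStats`, `files_to_rewrite`, and the file names are hypothetical, and real engines track statistics for many columns and combine them with partition information.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileStats:
    """Hypothetical per-file statistics, similar in spirit to what a table format records."""
    path: str
    min_id: int
    max_id: int


def files_to_rewrite(files, source_ids):
    """Return the files whose [min_id, max_id] range overlaps the source key range."""
    if not source_ids:
        return []
    lo, hi = min(source_ids), max(source_ids)
    return [f for f in files if f.max_id >= lo and f.min_id <= hi]


files = [
    FileStats("part-000.parquet", 1, 1000),
    FileStats("part-001.parquet", 1001, 2000),
    FileStats("part-002.parquet", 2001, 3000),
]

# A daily batch touching keys 1500..1800 only overlaps the middle file,
# so the other two files never need to be opened or rewritten.
candidates = files_to_rewrite(files, [1500, 1650, 1800])
print([f.path for f in candidates])  # ['part-001.parquet']
```

The better the data layout lines up with the merge key, the fewer files a merge has to touch, which is one reason partitioning and clustering choices can matter as much as the format itself.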
Practical takeaway: If your platform is heavily centered on Spark and frequent upserts, Delta is often worth serious consideration.
2. Multi-Engine Read Access
This is one of Iceberg’s clearest strengths.
If your lakehouse needs to be read consistently by more than one engine, such as Spark for batch processing, Flink for streaming, and Trino for interactive analytics, Iceberg generally offers a cleaner model. Its catalog-oriented design fits naturally into multi-engine environments and reduces the need for workarounds or compatibility layers.
Delta can absolutely work in multi-engine environments too, but the experience may depend more heavily on surrounding infrastructure, connector maturity, and version alignment.
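One way to picture a catalog-oriented design is as a single shared pointer per table that every engine reads, and that writers may only move with an atomic compare-and-swap. The sketch below is a toy in-memory model under that assumption. `ToyCatalog`, `CommitConflict`, and the metadata paths are hypothetical names for illustration and do not correspond to any specific catalog implementation.

```python
import threading


class CommitConflict(Exception):
    """Raised when another writer committed first."""


class ToyCatalog:
    """Hypothetical in-memory catalog: table name -> current metadata location."""

    def __init__(self):
        self._pointers = {}
        self._lock = threading.Lock()

    def load(self, table):
        with self._lock:
            return self._pointers.get(table)

    def commit(self, table, expected, new):
        # Swap the pointer only if nobody else changed it since we read it.
        with self._lock:
            if self._pointers.get(table) != expected:
                raise CommitConflict(f"{table}: expected {expected!r}")
            self._pointers[table] = new


catalog = ToyCatalog()
catalog.commit("db.transactions", None, "metadata/v1.json")

# Two "engines" resolve the table through the same catalog,
# so they agree on the current table state.
spark_view = catalog.load("db.transactions")
trino_view = catalog.load("db.transactions")

# The first writer wins; a writer holding a stale pointer must reload and retry.
catalog.commit("db.transactions", spark_view, "metadata/v2.json")
try:
    catalog.commit("db.transactions", trino_view, "metadata/v3.json")
    stale_commit_rejected = False
except CommitConflict:
    stale_commit_rejected = True
```

Because no engine owns the pointer, no engine is privileged: any reader or writer that can talk to the catalog participates on the same terms.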
Practical takeaway: If your operating model is intentionally multi-engine, Iceberg usually feels more natural.
3. Schema Evolution Under Live Traffic
Both formats handle common schema evolution tasks well, especially adding columns.
Where the difference becomes more noticeable is in changes such as renames and more advanced schema evolution workflows. Iceberg is often favored here because of its metadata-driven design and stable column identity model, which can make schema evolution cleaner in environments where tables continue to be read while changes are happening.
Delta also supports schema evolution well, but the exact experience for renames and similar operations can depend on platform, version, and configuration. In some environments, teams need to plan these changes more carefully.
Example
-- Iceberg
ALTER TABLE lakehouse.transactions
RENAME COLUMN old_column_name TO new_column_name;
This kind of metadata-oriented change is one reason Iceberg is often attractive in environments where schemas continue to evolve over time.
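The reason a rename can be a metadata-only change is easier to see with a small model of stable column identity. In the Python sketch below, data is stored against numeric column IDs and the schema only maps names to those IDs. `ToySchema` and its methods are hypothetical and heavily simplified; they illustrate the idea rather than any format's real metadata layout.

```python
class ToySchema:
    """Hypothetical schema that maps column names to stable numeric IDs."""

    def __init__(self, columns):
        # IDs are assigned once and never reused.
        self._name_to_id = {name: i + 1 for i, name in enumerate(columns)}
        self._next_id = len(columns) + 1

    def rename(self, old, new):
        # Metadata-only change: the ID stays the same, only the name moves.
        if new in self._name_to_id:
            raise ValueError(f"column {new!r} already exists")
        self._name_to_id[new] = self._name_to_id.pop(old)

    def add(self, name):
        if name in self._name_to_id:
            raise ValueError(f"column {name!r} already exists")
        self._name_to_id[name] = self._next_id
        self._next_id += 1

    def read(self, stored_row, columns):
        # Stored rows are keyed by column ID, so files written before a
        # rename stay readable; columns added later read back as None.
        return {c: stored_row.get(self._name_to_id[c]) for c in columns}


schema = ToySchema(["transaction_id", "old_column_name"])
old_file_row = {1: "tx-42", 2: 19.99}  # written before the rename

schema.rename("old_column_name", "new_column_name")
schema.add("status")

row = schema.read(old_file_row, ["transaction_id", "new_column_name", "status"])
print(row)  # {'transaction_id': 'tx-42', 'new_column_name': 19.99, 'status': None}
```

No data file is rewritten at any point, which is why this style of change can be made while readers are still active.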
Practical takeaway: If your platform expects ongoing schema change across shared datasets, Iceberg may offer a cleaner long-term experience.
4. Compaction and File Maintenance
Neither format removes the need for operational discipline.
At production scale, both Delta and Iceberg require active file management. Left unattended, small files and fragmented layouts will eventually affect performance and create unnecessary cost. The difference is more in how compaction is expressed and tuned.
Delta generally provides a more straightforward operational experience for many teams. Iceberg often gives you more control, which can be valuable, but also means tuning matters more.
Example Patterns
Delta-Style Approach
OPTIMIZE transactions
ZORDER BY (account_id);
Iceberg-Style Approach
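# The sort strategy needs a sort order: either one already defined on the
# table or one passed explicitly through sort_order, as done here.
spark.sql("""
    CALL lakehouse.system.rewrite_data_files(
        table => 'db.transactions',
        strategy => 'sort',
        sort_order => 'account_id'
    )
""")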
This is an area where usability and flexibility trade off against each other.
- Delta often feels simpler to operate
- Iceberg often offers more tuning flexibility
- Neither one should be treated as “set and forget”
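Whichever command you use, the underlying idea is usually some form of bin packing: group small files into rewrite units close to a target size. The Python sketch below shows a naive greedy version of that idea. `plan_compaction` and the 128 MB target are assumptions made for illustration, not the planning logic of either format.

```python
def plan_compaction(file_sizes_mb, target_mb=128):
    """Greedily group small files into rewrite groups of at most target_mb.

    Files already at or above the target are left alone.
    """
    groups, current, current_size = [], [], 0
    for size in sorted(s for s in file_sizes_mb if s < target_mb):
        if current and current_size + size > target_mb:
            groups.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    groups.append(current)
    # Rewriting a group of one file gains nothing, so drop those groups.
    return [g for g in groups if len(g) > 1]


sizes = [4, 8, 16, 32, 64, 100, 256]
plan = plan_compaction(sizes)
print(plan)  # [[4, 8, 16, 32, 64]]
```

In this example five small files collapse into one rewrite of 124 MB, while the 100 MB and 256 MB files are left untouched. Target size and grouping thresholds are exactly the kind of knobs that need tuning per workload.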
Practical takeaway: Treat compaction as part of the platform operating model, not as a one-time optimization.
5. Time Travel and Auditability
Both Delta Lake and Iceberg support time travel, which is one of the major strengths of modern table formats.
That means teams can query a table as it existed at an earlier timestamp or snapshot, which is valuable for debugging, auditability, recovery, and reproducibility.
The difference is more in the operational feel:
- Delta’s log-oriented model can be easier to inspect when debugging transaction history
- Iceberg’s metadata model can be more compact and more aligned with how files and snapshots are managed across engines
Both are strong here.
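Conceptually, a timestamp-based time travel query resolves to "the latest snapshot committed at or before this time". The Python sketch below models that lookup with a sorted list and a binary search. `ToySnapshotLog`, the integer timestamps, and the snapshot IDs are hypothetical; real formats store far more per snapshot and apply retention rules that can expire old versions.

```python
from bisect import bisect_right


class ToySnapshotLog:
    """Hypothetical append-only history of (commit_ts, snapshot_id) pairs."""

    def __init__(self):
        self._timestamps = []
        self._snapshot_ids = []

    def commit(self, ts, snapshot_id):
        if self._timestamps and ts <= self._timestamps[-1]:
            raise ValueError("commit timestamps must be strictly increasing")
        self._timestamps.append(ts)
        self._snapshot_ids.append(snapshot_id)

    def as_of(self, ts):
        # Find the latest snapshot committed at or before ts.
        i = bisect_right(self._timestamps, ts)
        if i == 0:
            raise LookupError(f"no snapshot at or before {ts}")
        return self._snapshot_ids[i - 1]


log = ToySnapshotLog()
log.commit(100, "snap-a")
log.commit(200, "snap-b")
log.commit(300, "snap-c")

print(log.as_of(250))  # snap-b
```

A query for a time before the first commit fails rather than returning an empty table, which mirrors the practical point that time travel only reaches as far back as the retained history.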
Practical takeaway: This category is less about which format is “better” and more about which metadata model your team prefers to reason about.
So, Which One Should You Choose?
There is no universal winner. The better choice depends on the kind of platform you are building.
Choose Delta Lake when:
- Your environment is primarily Spark-centric
- Write-heavy MERGE workloads are a major priority
- You value tighter integration and simpler write-path ergonomics
- Your users are mostly operating within one engine ecosystem
Choose Apache Iceberg when:
- Your platform is intentionally multi-engine
- You expect Spark, Trino, Flink, or other engines to read the same tables
- Schema evolution is an ongoing reality
- You want a format that fits well into a more open lakehouse architecture
A Simple Way to Think About It
A useful mental model is this:
- Delta often feels optimized for tightly integrated Spark-first operations
- Iceberg often feels optimized for broader interoperability and long-term openness
That does not mean Delta is closed off or that Iceberg is always slower. It means each format tends to shine in a different operating model.
Comparison Summary
| Dimension | Delta Lake | Apache Iceberg | General edge |
|---|---|---|---|
| MERGE-heavy Spark workloads | Often strong | Good, but can vary more by engine/setup | Delta |
| Multi-engine access | Possible, but may depend on connectors/integration | Strong native fit | Iceberg |
| Schema evolution | Strong, with experience depending on setup/version | Strong and often cleaner for evolving shared datasets | Iceberg |
| Compaction | Straightforward ergonomics | More configurable | Depends on needs |
| Time travel | Strong | Strong | Tie |
| Cross-engine support | More environment-dependent | Broad and natural fit | Iceberg |
Final Thoughts
The most important part of this decision is not feature parity. It is platform fit.
If your world is centered on Spark, frequent upserts, and tightly controlled write patterns, Delta may be the better operational choice.
If your world is moving toward shared lakehouse tables across multiple engines, evolving schemas, and a more open architecture, Iceberg is often the stronger long-term fit.
In other words, this is less a question of “Which format is better?” and more a question of “Which format is better for the kind of platform we are trying to build?”
That is the comparison that matters.
Note: This article presents generalized architectural observations and illustrative examples based on common lakehouse design patterns. It does not describe any specific internal implementation.
Opinions expressed by DZone contributors are their own.