Why We Chose Iceberg Over Delta After Evaluating Both at Scale

Delta often performs better for Spark workloads, while Iceberg tends to be stronger for a multi-engine environment. The right choice depends on your platform use case.

Kuladeep Sandra

Ashwin Ramesh Kumar

May. 21, 26 · Analysis

When people compare Delta Lake and Apache Iceberg, the discussion often stays too abstract. Most articles describe features at a high level, but platform decisions are usually made in much more practical terms: Which format fits your workloads better? Which one is easier to operate? Which one creates fewer long-term constraints?

This article is a practitioner-style comparison of the dimensions that matter most in day-to-day platform work: write-heavy operations, multi-engine reads, schema evolution, compaction, and time travel.

The examples here are generalized and illustrative. The goal is not to prove that one format is universally better, but to show where each one tends to fit best.

1. MERGE Performance on Large Tables

For write-heavy Spark-centric workloads, Delta Lake often has an advantage.

In environments where tables are updated frequently using MERGE INTO, Delta tends to perform well because of its tight integration with Spark and because per-file statistics in its transaction log can help the engine skip files that cannot match, particularly when the merge condition includes a selective filter on the target table. In practice, that can make Delta a strong fit for fact tables that see frequent upserts and incremental corrections.

Iceberg also supports MERGE INTO, and the syntax is very similar, but performance can vary more depending on engine version, metadata layout, partitioning strategy, and whether the table applies row-level changes as copy-on-write or merge-on-read. In many teams, Iceberg’s strengths show up more clearly in interoperability and table management than in highly write-optimized, merge-heavy pipelines.

Example

    MERGE INTO transactions AS target
    USING daily_updates AS source
    ON target.transaction_id = source.transaction_id
    WHEN MATCHED THEN UPDATE SET
      target.amount = source.amount,
      target.status = source.status,
      target.updated_at = current_timestamp()
    WHEN NOT MATCHED THEN INSERT *;

The SQL is familiar across both formats. The difference is usually not in syntax, but in how the underlying metadata and engine integration affect execution.
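To make "how the underlying metadata affects execution" a little more concrete, here is a minimal sketch of file pruning during a merge. It is a toy model under stated assumptions, not how Delta or Iceberg actually implement MERGE: the `FileStats` class, the `files_to_rewrite` function, and the file names are all hypothetical. The only idea borrowed from the real formats is that both keep per-file column statistics in their metadata, which an engine may use to rule out files that cannot contain a matching key.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileStats:
    """Per-file min/max for the merge key (hypothetical, simplified)."""
    path: str
    min_id: int
    max_id: int


def files_to_rewrite(files, source_ids):
    """Return the files whose key range could contain any incoming key."""
    if not source_ids:
        return []
    lo, hi = min(source_ids), max(source_ids)
    # Cheap range test first, then a per-key check on the survivors.
    return [
        f for f in files
        if f.max_id >= lo and f.min_id <= hi
        and any(f.min_id <= k <= f.max_id for k in source_ids)
    ]


# Illustrative data: three files with non-overlapping key ranges.
table_files = [
    FileStats("part-000.parquet", 1, 1_000),
    FileStats("part-001.parquet", 1_001, 2_000),
    FileStats("part-002.parquet", 2_001, 3_000),
]
daily_update_ids = [42, 2_500]

candidates = files_to_rewrite(table_files, daily_update_ids)
print([f.path for f in candidates])
```

In this sketch the middle file is never touched because no incoming key falls inside its range. The same logic also shows why layout matters: if every file spanned the full key range, nothing could be skipped, whichever format holds the table.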

Practical takeaway: If your platform is heavily centered on Spark and frequent upserts, Delta is often worth serious consideration.

2. Multi-Engine Read Access

This is one of Iceberg’s clearest strengths.

If your lakehouse needs to be read consistently by more than one engine, such as Spark for batch processing, Flink for streaming, and Trino for interactive analytics, Iceberg generally offers a cleaner model. Its catalog-oriented design, in which every engine resolves the current table state through a shared catalog, fits naturally into multi-engine environments and reduces the need for workarounds or compatibility layers.

Delta can absolutely work in multi-engine environments too, but the experience may depend more heavily on surrounding infrastructure, connector maturity, and version alignment.
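One way to picture the catalog-oriented model is as a single pointer per table that every engine reads and that writers may only advance atomically. The sketch below is illustrative only: `ToyCatalog`, its methods, and the metadata file names are hypothetical and do not correspond to any real catalog API. It assumes just one thing about the design, namely that a commit succeeds only if the table still points at the metadata the writer started from.

```python
import threading


class ToyCatalog:
    """Maps a table name to the location of its current metadata file."""

    def __init__(self):
        self._pointers = {}
        self._lock = threading.Lock()

    def current(self, table):
        return self._pointers.get(table)

    def commit(self, table, expected, new):
        """Swap the pointer only if nobody else committed first."""
        with self._lock:
            if self._pointers.get(table) != expected:
                return False  # caller must refresh and retry
            self._pointers[table] = new
            return True


catalog = ToyCatalog()
catalog.commit("db.transactions", None, "metadata/v1.json")

# Two hypothetical writers (say, a Spark job and a Flink job) start from v1.
base = catalog.current("db.transactions")
spark_ok = catalog.commit("db.transactions", base, "metadata/v2-spark.json")
flink_ok = catalog.commit("db.transactions", base, "metadata/v2-flink.json")

print(spark_ok, flink_ok, catalog.current("db.transactions"))
```

The second writer loses the race and has to retry against the new state, while any reader that asks the catalog sees exactly one current version. That shared point of agreement is what lets different engines work against the same table without a compatibility layer in between.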

Practical takeaway: If your operating model is intentionally multi-engine, Iceberg usually feels more natural.

3. Schema Evolution Under Live Traffic

Both formats handle common schema evolution tasks well, especially adding columns.

Where the difference becomes more noticeable is in changes such as renames and more advanced schema evolution workflows. Iceberg is often favored here because of its metadata-driven design and stable column identity model, which can make schema evolution cleaner in environments where tables continue to be read while changes are happening.

Delta also supports schema evolution well, but renaming or dropping a column requires column mapping to be enabled on the table (delta.columnMapping.mode = 'name'), which raises the table's protocol version and can lock out older readers and writers. In some environments, teams need to plan these changes more carefully.

Example

    -- Iceberg
    ALTER TABLE lakehouse.transactions
    RENAME COLUMN old_column_name TO new_column_name;

This kind of metadata-oriented change is one reason Iceberg is often attractive in environments where schemas continue to evolve over time.
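The reason a rename can be metadata-only is the stable column identity model: columns are tracked by a numeric field ID, and the name is just a label attached to that ID. The following is a deliberately simplified sketch of that idea, not Iceberg's actual metadata layout; the dictionaries and the `rename_column` and `read_rows` helpers are hypothetical.

```python
# Table schema: field ID -> current column name (IDs never change).
schema = {1: "transaction_id", 2: "old_column_name"}

# Data files written earlier store values keyed by field ID, not by name.
data_file = [{1: "t-100", 2: 25.0}, {1: "t-101", 2: 40.5}]


def rename_column(schema, old, new):
    """Return a new schema with one column renamed; data is untouched."""
    if new in schema.values():
        raise ValueError(f"column {new!r} already exists")
    if old not in schema.values():
        raise KeyError(old)
    return {fid: (new if name == old else name) for fid, name in schema.items()}


def read_rows(data_file, schema):
    """Project stored rows through whichever schema the reader holds."""
    return [{schema[fid]: value for fid, value in row.items()} for row in data_file]


new_schema = rename_column(schema, "old_column_name", "new_column_name")

print(read_rows(data_file, schema)[0])      # reader still on the old schema
print(read_rows(data_file, new_schema)[0])  # reader on the new schema
```

No data file is rewritten, and a reader holding the old schema keeps working against the same files. A name-based design would have to either rewrite the data or carry a separate mapping layer to get the same effect.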

Practical takeaway: If your platform expects ongoing schema change across shared datasets, Iceberg may offer a cleaner long-term experience.

4. Compaction and File Maintenance

Neither format removes the need for operational discipline.

At production scale, both Delta and Iceberg require active file management. Left unattended, small files and fragmented layouts will eventually degrade performance and add unnecessary cost. The difference lies more in how compaction is expressed and tuned.

Delta generally provides a more straightforward operational experience for many teams. Iceberg often gives you more control, which can be valuable, but also means tuning matters more.

Example Patterns

Delta-Style Approach

    OPTIMIZE transactions
    ZORDER BY (account_id);

Iceberg-Style Approach

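    -- Spark SQL with the Iceberg extensions enabled.
    -- sort_order is added for illustration; omit it to use the table's own sort order.
    CALL lakehouse.system.rewrite_data_files(
      table => 'db.transactions',
      strategy => 'sort',
      sort_order => 'account_id'
    );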

This is an area where usability and flexibility trade off against each other.

- Delta often feels simpler to operate
- Iceberg often offers more tuning flexibility
- Neither one should be treated as “set and forget”
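Whatever the command looks like, the planning step behind compaction is roughly the same: decide which small files to combine so that each rewritten file lands near a target size. The sketch below shows a simple greedy packing pass. It is an assumption-laden illustration rather than the algorithm either project uses; `plan_compaction`, the 512 MB target, and the file sizes are all hypothetical.

```python
def plan_compaction(file_sizes_mb, target_mb=512):
    """Greedily group files into bins of at most target_mb each."""
    bins, current, current_size = [], [], 0
    for size in sorted(file_sizes_mb, reverse=True):
        if size >= target_mb:
            continue  # already large enough; leave it alone
        if current and current_size + size > target_mb:
            bins.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        bins.append(current)
    # A bin holding a single file would be rewritten for no benefit.
    return [b for b in bins if len(b) > 1]


plan = plan_compaction([600, 300, 200, 100, 50, 10], target_mb=512)
print(plan)
```

Each inner list is one rewrite job. Tuning knobs such as the target size or the minimum number of input files change the shape of this plan, which is where the extra flexibility, and the extra tuning effort, tends to show up in practice.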

Practical takeaway: Treat compaction as part of the platform operating model, not as a one-time optimization.

5. Time Travel and Auditability

Both Delta Lake and Iceberg support time travel, which is one of the major strengths of modern table formats.

That means teams can query a table as it existed at an earlier timestamp or snapshot, which is valuable for debugging, auditability, recovery, and reproducibility.
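Conceptually, a timestamp-based time travel query resolves to "the latest snapshot committed at or before this moment". Here is a minimal sketch of that lookup, assuming an in-memory list of commits ordered by time. The `snapshots` list, the snapshot IDs, and `snapshot_as_of` are hypothetical and stand in for whatever log or metadata structure the format actually keeps.

```python
from bisect import bisect_right

# (commit timestamp in epoch seconds, snapshot id), ordered by time.
snapshots = [(1_000, "snap-a"), (2_000, "snap-b"), (3_000, "snap-c")]


def snapshot_as_of(snapshots, ts):
    """Return the id of the latest snapshot committed at or before ts."""
    times = [t for t, _ in snapshots]
    idx = bisect_right(times, ts)
    if idx == 0:
        raise LookupError(f"no snapshot at or before {ts}")
    return snapshots[idx - 1][1]


print(snapshot_as_of(snapshots, 2_500))
```

The lookup fails for a timestamp earlier than the first retained snapshot, which mirrors a practical constraint: time travel only reaches as far back as the history that retention and cleanup jobs have left in place.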

The difference is more in the operational feel:

- Delta’s log-oriented model can be easier to inspect when debugging transaction history
- Iceberg’s metadata model can be more compact and more aligned with how files and snapshots are managed across engines

Both are strong here.

Practical takeaway: This category is less about which format is “better” and more about which metadata model your team prefers to reason about.

So, Which One Should You Choose?

There is no universal winner. The better choice depends on the kind of platform you are building.

Choose Delta Lake when:

- Your environment is primarily Spark-centric
- Write-heavy MERGE workloads are a major priority
- You value tighter integration and simpler write-path ergonomics
- Your users are mostly operating within one engine ecosystem

Choose Apache Iceberg when:

- Your platform is intentionally multi-engine
- You expect Spark, Trino, Flink, or other engines to read the same tables
- Schema evolution is an ongoing reality
- You want a format that fits well into a more open lakehouse architecture

A Simple Way to Think About It

A useful mental model is this:

- Delta often feels optimized for tightly integrated Spark-first operations
- Iceberg often feels optimized for broader interoperability and long-term openness

That does not mean Delta is closed off or that Iceberg is always slower. It means each format tends to shine in a different operating model.

Comparison Summary

Dimension	Delta Lake	Apache Iceberg	General edge
MERGE-heavy Spark workloads	Often strong	Good, but can vary more by engine/setup	Delta
Multi-engine access	Possible, but may depend on connectors/integration	Strong native fit	Iceberg
Schema evolution	Strong, with experience depending on setup/version	Strong and often cleaner for evolving shared datasets	Iceberg
Compaction	Straightforward ergonomics	More configurable	Depends on needs
Time travel	Strong	Strong	Tie
Cross-engine support	More environment-dependent	Broad and natural fit	Iceberg

Final Thoughts

The most important part of this decision is not feature parity. It is platform fit.

If your world is centered on Spark, frequent upserts, and tightly controlled write patterns, Delta may be the better operational choice.

If your world is moving toward shared lakehouse tables across multiple engines, evolving schemas, and a more open architecture, Iceberg is often the stronger long-term fit.

In other words, this is less a question of “Which format is better?” and more a question of “Which format is better for the kind of platform we are trying to build?”

That is the comparison that matters.

Note: This article presents generalized architectural observations and illustrative examples based on common lakehouse design patterns. It does not describe any specific internal implementation.

Opinions expressed by DZone contributors are their own.
