Schema Evolution in Event-Driven Systems: Avro/Protobuf Strategies That Don’t Break Consumers

Evolve Avro/Protobuf safely with compatibility rules, clear contracts, and consumer-driven tests so producers can change without breaking consumers.

By Varun Pandey · Feb. 16, 26 · Analysis

Most schema-evolution advice is technically correct and still gets teams hurt.

It usually stops at “add fields, don’t remove fields,” and skips the parts that cause real incidents: semantic drift, consumer lag, unknown consumers, and silent failures. In an event-driven system, the most dangerous break is the one that doesn’t crash anything — it just produces wrong results quietly.

This article is a practical playbook for evolving schemas safely with Avro and Protobuf, with emphasis on compatibility rules, versioning, contracts, and consumer-driven testing. It’s written for the messy reality where producers ship on Tuesday, some consumers don’t upgrade until next quarter, and at least one consumer is owned by a team you don’t know exists.

The Uncomfortable Truth: You Don’t Control Your Consumers

In request/response APIs, you can sometimes force upgrades, set deprecation windows, or block old clients. With events, you publish data into a stream, and it persists — which means:

  • A consumer can replay old data at any time.
  • A new consumer can appear months later and read the full history.
  • Your “minor” change can haunt you for years.

So the goal isn’t “avoid change.” The goal is to make changes safe without synchronized deployments.

Here’s the mental model I use:

Events are public facts; schemas are the public grammar; and consumers are always behind.

Once you internalize “consumers are behind,” the best practices stop feeling like bureaucracy and start feeling like seatbelts.

Compatibility Isn’t One Thing — Pick a Policy on Purpose

When teams say “it’s backward compatible,” they often mean “it compiled.” That’s not a policy.

Define compatibility in terms of who can read whose data:

  • Backward compatible: New consumers can read old events
  • Forward compatible: Old consumers can read new events
  • Fully compatible: Both directions

For event streams with independent deployments, forward or full compatibility is what prevents breakage when producers ship first (which they usually do).

A Simple Rule That Saves Time

  • If producers can ship without coordinating consumer deployments, you need forward compatibility at a minimum.
  • If consumers can ship without coordinating producers (less common), you need backward compatibility.
  • If you want peace, aim for full compatibility whenever practical.

Avro: The “Writer vs. Reader Schema” Advantage (and How People Still Break It)

Avro’s big win is that schema resolution is built in: data is written with a writer schema, and read with a reader schema, and Avro resolves differences. That sounds foolproof until you hit the two ways teams break it:

  1. They evolve schemas without defaults/aliases.
  2. They keep structural compatibility but change meaning.

Avro Rules That Actually Matter

Safe moves (usually):

  • Add a field with a default.
  • Add a new optional branch via a union (carefully: readers whose schema lacks the new branch cannot resolve values written with it).
  • Rename a field with aliases.
  • Deprecate fields by leaving them in place while consumers migrate.

Risky moves (often breaking):

  • Remove a field that some consumer implicitly requires.
  • Change a field type (unless it’s a valid promotion like int -> long).
  • Modify enum symbols without planning for unknowns.
  • “Rename” by deleting + adding (breaks consumers and corrupts analytics).

Example: Add a Field Without Breaking Consumers

v1:

JSON
 
{
  "type": "record",
  "name": "UserProfileUpdated",
  "namespace": "com.example.events",
  "fields": [
    {"name": "userId", "type": "string"},
    {"name": "email", "type": ["null","string"], "default": null},
    {"name": "updatedAtEpochMs", "type": "long"}
  ]
}


v2 (safe add with default):

JSON
 
{
  "type": "record",
  "name": "UserProfileUpdated",
  "namespace": "com.example.events",
  "fields": [
    {"name": "userId", "type": "string"},
    {"name": "email", "type": ["null","string"], "default": null},
    {"name": "updatedAtEpochMs", "type": "long"},
    {"name": "source", "type": "string", "default": "unknown"}
  ]
}


This is mechanically compatible, but here’s the part many articles skip: defaults are semantic decisions. If downstream uses source for segmentation, and you default to "unknown", you may create a large “unknown” bucket that looks like a data quality issue. It won’t crash anything — just quietly degrade decisions.
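To make the mechanics concrete, here is a minimal sketch of the record-resolution rule in plain Python. It is not the Avro library: `resolve_record` and the dict-based field lists are assumptions made for illustration, and they mimic only one rule (a field missing from the written data is filled from the reader's default, and a missing field without a default is an error).

```python
# Simplified illustration of Avro-style record resolution (not the Avro library).
# resolve_record and the dict-based field lists are assumptions for this sketch.

READER_V2_FIELDS = [
    {"name": "userId"},
    {"name": "email", "default": None},
    {"name": "updatedAtEpochMs"},
    {"name": "source", "default": "unknown"},
]


def resolve_record(written: dict, reader_fields: list) -> dict:
    """Project a decoded writer record onto the reader's fields."""
    resolved = {}
    for field in reader_fields:
        name = field["name"]
        if name in written:
            resolved[name] = written[name]
        elif "default" in field:
            # The writer never produced this field: use the reader's default.
            resolved[name] = field["default"]
        else:
            raise ValueError(f"no value and no default for field {name!r}")
    return resolved


# An event written by a v1 producer has no "source" field.
v1_event = {"userId": "u-1", "email": None, "updatedAtEpochMs": 1700000000000}
v2_view = resolve_record(v1_event, READER_V2_FIELDS)
print(v2_view["source"])  # prints: unknown
```

Every replayed v1 event resolves to `"unknown"`, which is exactly how the oversized bucket described above gets created.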

Practical pattern: If a new field is important, introduce it in stages:

  1. Add it as optional/defaulted.
  2. Start populating it in producers.
  3. Only later consider making it “required” (and even then, be careful).

Avro Rename: Do It With Aliases (Not Delete + Add)

JSON
 
{
  "type": "record",
  "name": "UserProfileUpdated",
  "namespace": "com.example.events",
  "fields": [
    {"name": "userId", "type": "string"},
    {
      "name": "primaryEmail",
      "type": ["null","string"],
      "default": null,
      "aliases": ["email"]
    },
    {"name": "updatedAtEpochMs", "type": "long"}
  ]
}


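That one aliases line is the difference between “no one noticed” and “half the org had a bad day.” Keep in mind that Avro applies aliases on the reader side: a consumer on this schema can read events written with email, but a consumer still on the old schema will not find email in new events and falls back to its default (null here). Upgrade readers before producers start writing primaryEmail.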
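A rough sketch of how alias resolution plays out in each direction, again in plain Python rather than the Avro library. `resolve_with_aliases` is a hypothetical helper that handles only top-level field names, aliases, and defaults.

```python
# Simplified illustration of reader-side alias resolution (not the Avro library).
# resolve_with_aliases is a hypothetical helper for this sketch.

def resolve_with_aliases(written: dict, reader_fields: list) -> dict:
    """Fill each reader field from the written data, trying aliases if the name is absent."""
    resolved = {}
    for field in reader_fields:
        candidates = [field["name"], *field.get("aliases", [])]
        for candidate in candidates:
            if candidate in written:
                resolved[field["name"]] = written[candidate]
                break
        else:
            # No name or alias matched: fall back to the default, or fail.
            if "default" not in field:
                raise ValueError(f"no value and no default for {field['name']!r}")
            resolved[field["name"]] = field["default"]
    return resolved


NEW_READER = [
    {"name": "userId"},
    {"name": "primaryEmail", "default": None, "aliases": ["email"]},
    {"name": "updatedAtEpochMs"},
]
OLD_READER = [
    {"name": "userId"},
    {"name": "email", "default": None},
    {"name": "updatedAtEpochMs"},
]

old_event = {"userId": "u-1", "email": "a@example.com", "updatedAtEpochMs": 1}
new_event = {"userId": "u-1", "primaryEmail": "a@example.com", "updatedAtEpochMs": 1}

# New reader, old data: the alias maps "email" onto "primaryEmail".
print(resolve_with_aliases(old_event, NEW_READER)["primaryEmail"])  # a@example.com
# Old reader, new data: the old schema has no alias, so "email" becomes its default.
print(resolve_with_aliases(new_event, OLD_READER)["email"])  # None
```

The second result is the quiet failure mode: nothing crashes, but consumers that have not upgraded see a null email for every new event.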

Protobuf: Field Numbers Are the Contract (Everything Else Is Commentary)

Protobuf evolution is simpler on paper: keep field numbers stable and you’re usually fine. But that simplicity hides the most common real-world Protobuf incident:

Someone deletes a field and later reuses the number “because it’s available.”

That’s not “a bug.” That’s a time-delayed data corruption event.
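Here is a small sketch of why number reuse corrupts data. It hand-rolls just enough of the Protobuf wire format to encode short string fields; it is not the protobuf library. `encode_string`, `decode_strings`, and the `referrer_url` field are invented for this example, and the helpers only support field numbers below 16 and values shorter than 128 bytes.

```python
# Hand-rolled sketch of the Protobuf wire format for short string fields only.
# Not the protobuf library; helper names and "referrer_url" are hypothetical.

def encode_string(field_number: int, value: str) -> bytes:
    data = value.encode("utf-8")
    if field_number >= 16 or len(data) >= 128:
        raise ValueError("sketch supports only one-byte tags and lengths")
    tag = (field_number << 3) | 2  # wire type 2 = length-delimited
    return bytes([tag, len(data)]) + data


def decode_strings(payload: bytes, names_by_number: dict) -> dict:
    """Decode a payload of string fields, naming them by field number."""
    decoded, i = {}, 0
    while i < len(payload):
        field_number = payload[i] >> 3
        length = payload[i + 1]
        value = payload[i + 2 : i + 2 + length].decode("utf-8")
        name = names_by_number.get(field_number, f"unknown_{field_number}")
        decoded[name] = value
        i += 2 + length
    return decoded


# Written while field 4 still meant "source".
old_bytes = encode_string(1, "u-1") + encode_string(4, "mobile-app")

# Later: "source" was deleted and number 4 was reused for a different field.
reused_schema = {1: "user_id", 4: "referrer_url"}
print(decode_strings(old_bytes, reused_schema))
# {'user_id': 'u-1', 'referrer_url': 'mobile-app'}
```

The bytes carry only the number 4, never the name, so a replay of old events hands the old `source` value to whatever field owns that number today.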

Protobuf Rules That Prevent Wire Breakage

  • Never change a field number.
  • Never reuse a field number (even if the field is deleted).
  • Reserve removed field numbers (and ideally names).
  • Adding new fields is safe.
  • Renaming fields is wire-safe (names aren’t on the wire), but semantics still matter.

v1:

ProtoBuf
 
syntax = "proto3";
package com.example.events;

message UserProfileUpdated {
  string user_id = 1;
  string email = 2;
  int64 updated_at_epoch_ms = 3;
}


v2 (safe add):

ProtoBuf
 
syntax = "proto3";
package com.example.events;

message UserProfileUpdated {
  string user_id = 1;
  string email = 2;
  int64 updated_at_epoch_ms = 3;

  string source = 4; // safe
}


Delete safely:

ProtoBuf
 
message UserProfileUpdated {
  string user_id = 1;
  string email = 2;
  int64 updated_at_epoch_ms = 3;

  reserved 4;         // never reuse
  reserved "source";  // optional, but helps avoid confusion
}


The Protobuf “Presence” Trap (proto3)

Proto3 gives defaults for missing fields ("", 0, false). That can blur “unset vs empty” distinctions. If “presence” matters (it often does), use optional or wrapper types so consumers can distinguish “not provided” from “provided as empty.”
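The difference is visible at the wire level. The sketch below is hand-rolled and does not use the protobuf library; `encode_string_field` is a hypothetical helper (field numbers below 16, values shorter than 128 bytes) that models one behavior: with implicit presence a default value is not serialized, while with explicit presence (optional) a value that was set is serialized even when it is empty.

```python
# Sketch of how implicit vs explicit presence affects what reaches the wire.
# Hand-rolled, not the protobuf library; encode_string_field is hypothetical.

def encode_string_field(field_number, value, explicit_presence):
    """Encode one string field; value=None means the field was never set."""
    if value is None:
        return b""
    if value == "" and not explicit_presence:
        # Implicit presence: a default value is not serialized at all.
        return b""
    data = value.encode("utf-8")
    tag = (field_number << 3) | 2  # wire type 2 = length-delimited
    return bytes([tag, len(data)]) + data


implicit_unset = encode_string_field(2, None, explicit_presence=False)
implicit_empty = encode_string_field(2, "", explicit_presence=False)
explicit_unset = encode_string_field(2, None, explicit_presence=True)
explicit_empty = encode_string_field(2, "", explicit_presence=True)

# Without presence tracking, "unset" and "empty" are indistinguishable on the wire.
print(implicit_unset == implicit_empty)  # True
# With explicit presence, an empty value still produces bytes.
print(explicit_unset == explicit_empty)  # False
```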

Versioning: Stop Arguing About v1/v2 and Start Versioning the Right Thing

Here’s a point that’s surprisingly absent in many event-schema discussions:

Schema versioning and event versioning are not the same thing.

  • Schema version answers: “How do I decode this payload?”
  • Event version answers: “What does this event mean?”

Avro and Protobuf help you with decoding. They do not protect you from meaning changes.

A Useful Versioning Approach That Reduces Topic Sprawl

  • Keep a single topic/stream for the same event concept (e.g., UserProfileUpdated).
  • Let the schema evolve with compatibility checks.
  • Create a new event name (not just a new schema) when semantics change meaningfully.

When to create a new event (not just evolve fields):

  • Units change (cents -> dollars).
  • Time semantics change (event time -> processing time).
  • Field meaning changes (e.g., status stops being a state machine and becomes a label).
  • You need to delete/restructure in ways that aren’t compatible.
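As a sketch of the first case, consider a unit change that any structural compatibility check would accept. The `PaymentCaptured` events, the `amount` field, and the `PaymentCapturedInDollars` name are invented for this example.

```python
# Sketch: a unit change that keeps the schema identical but changes the meaning.
# Event names and fields here are invented for illustration.

def total_revenue_cents(events):
    """Consumer written against the original contract: amount is in cents."""
    return sum(e["amount"] for e in events)


before = [{"amount": 1250}, {"amount": 800}]  # producer writes cents
after = [{"amount": 12}, {"amount": 8}]       # producer silently switched to dollars

print(total_revenue_cents(before))  # 2050
print(total_revenue_cents(after))   # 20: same schema, wrong result

# Giving the changed semantics a new event name lets consumers opt in explicitly.
HANDLERS = {
    "PaymentCaptured": lambda e: e["amount"],                 # already cents
    "PaymentCapturedInDollars": lambda e: e["amount"] * 100,  # dollars to cents
}


def to_cents(event_name, event):
    return HANDLERS[event_name](event)


print(to_cents("PaymentCapturedInDollars", {"amount": 12}))  # 1200
```

A consumer that has never heard of the new name fails loudly with a `KeyError` instead of silently under-reporting revenue.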

Contracts: Schemas Define Structure; Contracts Define Behavior

Most consumer pain doesn’t come from “I couldn’t parse the message.” It comes from “I parsed it, but it didn’t mean what I thought it meant.”

A schema won’t tell you:

  • Whether a field is required for business correctness
  • Which values are valid
  • What invariants hold

That’s the job of an event contract.

What a Good Event Contract Includes (and Why It’s Rare)

Keep it short and blunt:

  • Field semantics (units, time basis, normalization)
  • Invariants (amount >= 0, timestamp not in future, etc.)
  • Enum meaning (and how to handle unknown values)
  • Deprecation policy (how long old fields remain)
  • Ownership (who to talk to when something changes)

Consumer-Driven Testing: The Missing Safety Net

Compatibility checks catch obvious structural breaks. They won’t catch:

  • “Default value made the metric wrong”
  • “Field meaning changed”
  • “Enum is now missing a case consumer relied on”

So you need tests that reflect real consumers.

Layer 1: Compatibility Gates in CI (Producer-Side)

Make it impossible to merge a schema change that violates your compatibility policy. Even if you don’t have a full registry setup, you can still do this with a small rule:

  • New schema must be compatible with the last N released schemas (or at least the latest)
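A minimal sketch of such a gate, assuming schemas are available as parsed dicts. `can_read` and `check_gate` are hypothetical helpers: they compare only field names and defaults, and ignore type promotion, unions, aliases, and nested records, so treat this as a starting point rather than a replacement for a registry's checks.

```python
# Minimal sketch of a CI compatibility gate over simplified record schemas.
# can_read and check_gate are hypothetical; only names and defaults are compared.

def can_read(reader: dict, writer: dict) -> list:
    """Return reasons why `reader` cannot read data written with `writer`."""
    writer_names = {f["name"] for f in writer["fields"]}
    return [
        f"field {f['name']!r} is missing from writer and has no default"
        for f in reader["fields"]
        if f["name"] not in writer_names and "default" not in f
    ]


def check_gate(candidate: dict, released: list, last_n: int = 3) -> list:
    """Check the candidate against the last N released schemas, in both directions."""
    problems = []
    for old in released[-last_n:]:
        # Backward: consumers on the candidate schema read old events.
        problems += [f"backward: {p}" for p in can_read(candidate, old)]
        # Forward: consumers on an old schema read candidate events.
        problems += [f"forward: {p}" for p in can_read(old, candidate)]
    return problems


v1 = {"fields": [{"name": "userId"}, {"name": "email", "default": None}]}
good = {"fields": v1["fields"] + [{"name": "source", "default": "unknown"}]}
bad = {"fields": [{"name": "email", "default": None}, {"name": "source"}]}

print(check_gate(good, [v1]))  # []
for problem in check_gate(bad, [v1]):
    print(problem)
```

In a pipeline, a non-empty result would fail the build, for example by exiting with a non-zero status.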

Layer 2: Golden-Message Replay Tests (Consumer Realism Without Coordination)

Keep a small set of real-ish serialized events produced by older versions (sanitized if needed). In CI:

  • Deserialize those golden messages using the current reader schema/code.
  • Assert invariants (not just “it deserialized”).

This catches a shocking number of “compatible but wrong” changes because it forces you to confront real payload shapes.
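A sketch of what such a test might look like. For readability the golden messages are stored as JSON strings, which is an assumption; real ones would be Avro or Protobuf bytes. `read_event` stands in for the consumer's current deserialization code, and the invariants are examples, not a complete list.

```python
# Sketch of a golden-message replay test. JSON strings stand in for real
# serialized payloads, and read_event stands in for the consumer's decoder.
import json

GOLDEN_MESSAGES = [
    # Produced by a v1 producer: no "source" field.
    '{"userId": "u-1", "email": null, "updatedAtEpochMs": 1700000000000}',
    # Produced by a v2 producer.
    '{"userId": "u-2", "email": "b@example.com",'
    ' "updatedAtEpochMs": 1700000001000, "source": "web"}',
]


def read_event(raw: str) -> dict:
    """Current reader: decode and apply the v2 default for missing fields."""
    event = json.loads(raw)
    event.setdefault("source", "unknown")
    return event


def check_invariants(event: dict) -> list:
    """Return a list of violated invariants (empty means the event is fine)."""
    violations = []
    if not event["userId"]:
        violations.append("userId is empty")
    if event["updatedAtEpochMs"] < 0:
        violations.append("updatedAtEpochMs is negative")
    if event["email"] is not None and event["email"] != event["email"].lower():
        violations.append("email is not lowercase")
    return violations


def test_golden_messages_replay():
    for raw in GOLDEN_MESSAGES:
        event = read_event(raw)
        violations = check_invariants(event)
        assert violations == [], (raw, violations)


test_golden_messages_replay()
```

The point of asserting invariants rather than stopping at a successful parse is that the test fails when an old payload decodes into something the consumer would misinterpret.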

Layer 3: Consumer Contracts (The Part That Most Teams Never Do)

Have consumers publish minimal requirements:

  • Required fields
  • Constraints
  • Assumptions (like “email may be null, but if present it’s lowercase”)

Example contract file:

JSON
 
{
  "event": "UserProfileUpdated",
  "requires": ["userId", "updatedAtEpochMs"],
  "constraints": {
    "updatedAtEpochMs": ">= 0"
  },
  "notes": [
    "email may be null",
    "timestamp is event time, not ingestion time"
  ]
}


Producer CI can validate:

  • Schema still contains required fields (or aliases/equivalents)
  • Sample payloads satisfy constraints
  • Semantic notes haven’t been violated (this part is human-reviewed, but that’s still valuable)
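The first two checks can be automated with very little code. The sketch below assumes the contract shape from the example file above; `validate_contract`, `schema_names`, and the tiny ">= N" constraint parser are hypothetical and cover only that one operator.

```python
# Sketch of producer-side validation against a consumer contract.
# validate_contract and the ">= N" parser are assumptions made for this sketch.

CONTRACT = {
    "event": "UserProfileUpdated",
    "requires": ["userId", "updatedAtEpochMs"],
    "constraints": {"updatedAtEpochMs": ">= 0"},
}


def schema_names(schema: dict) -> set:
    """All names a consumer could use: field names plus their aliases."""
    names = set()
    for field in schema["fields"]:
        names.add(field["name"])
        names.update(field.get("aliases", []))
    return names


def validate_contract(contract: dict, schema: dict, samples: list) -> list:
    problems = []
    available = schema_names(schema)
    for required in contract["requires"]:
        if required not in available:
            problems.append(f"schema no longer provides {required!r}")
    for field, rule in contract["constraints"].items():
        operator, bound = rule.split()
        if operator != ">=":
            problems.append(f"unsupported constraint {rule!r}")
            continue
        for sample in samples:
            if field in sample and not sample[field] >= float(bound):
                problems.append(f"{field}={sample[field]} violates {rule!r}")
    return problems


schema = {"fields": [{"name": "userId"}, {"name": "updatedAtEpochMs"}]}
samples = [
    {"userId": "u-1", "updatedAtEpochMs": 1700000000000},
    {"userId": "u-2", "updatedAtEpochMs": -5},
]
print(validate_contract(CONTRACT, schema, samples))
```

Here the schema passes the required-field check and the second sample is reported for violating the timestamp constraint.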

A Rollout Playbook That Avoids Synchronized Deployments

This is the sequence that keeps you from playing release-manager roulette:

Compatible Additive Change

  1. Add field (Avro default/Protobuf new field number).
  2. Deploy producer writing the new field.
  3. Deploy consumers to start using it when present.
  4. Monitor adoption (how often field is non-null/non-default).
  5. Deprecate older behavior only after adoption.
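Step 4 can start as a very small metric. This sketch assumes events are already decoded into dicts; `adoption_rate` is a hypothetical helper, not part of any monitoring library.

```python
# Sketch of an adoption metric for a newly added field.
# adoption_rate is hypothetical; events are assumed to be already-decoded dicts.

def adoption_rate(events, field, default):
    """Fraction of events where `field` is present, non-null, and not the default."""
    events = list(events)
    if not events:
        return 0.0
    populated = sum(
        1 for e in events if e.get(field) is not None and e.get(field) != default
    )
    return populated / len(events)


window = [
    {"userId": "u-1", "source": "web"},
    {"userId": "u-2", "source": "unknown"},
    {"userId": "u-3"},
    {"userId": "u-4", "source": "mobile"},
]
print(adoption_rate(window, "source", default="unknown"))  # 0.5
```

Tracking this number per producer makes it clear which services still emit the default and therefore when step 5 is safe.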

Rename or Delete

  • Avro: rename with aliases first; keep old name alive through transition; only later remove
  • Protobuf: stop populating the field; after a confidence window, remove it and reserve its number (and ideally its name) in the same change

The big idea: stop thinking in “schema migrations,” start thinking in “multi-version coexistence.”
