Real-Time Flight Schedule Changes at Scale: Event-Driven Pipelines With gRPC

Real-time flight schedule updates using gRPC, event-driven pipelines, and graph-based design for scalable airline infrastructure.

Ravi Teja Thutari

Aug. 04, 25 · Analysis

Introduction: The Challenge of Flight Schedule Changes

Travel aggregators (like online travel agencies or fare comparison platforms) handle data from hundreds of airlines, including frequent flight schedule changes — delays, cancellations, gate changes, etc. Managing these updates in real-time for millions of users is a massive challenge. Traditional approaches (like periodic polling or manual updates) can’t keep up with the volume and immediacy required. For example, if a flight is canceled or delayed, customers and downstream systems expect to know within seconds, not hours. As one source notes, use cases like airline flight cancellations or package delivery updates demand immediate notifications upon any upstream change.

To tackle this, modern travel platforms are embracing event-driven architecture (EDA) and pipeline patterns to process flight schedule changes in real-time. In an EDA, changes (events) propagate through a pipeline of microservices that react asynchronously. This decoupled design can scale to millions of events and deliver updates instantly to all interested components. A key enabler in this architecture is gRPC — a high-performance RPC framework — which, alongside message brokers, helps services communicate efficiently and reliably.

In this article, we’ll explore how an event-driven pipeline can handle flight schedule changes at scale, using a travel aggregator as the running example. We’ll discuss the architecture, the role of gRPC, and provide code snippets to illustrate the mechanics. The intended audience is software architects designing scalable, real-time systems for similar challenges.

Event-Driven Architecture for Real-Time Updates

Event-driven architecture is a paradigm where services communicate by producing and consuming events, rather than relying on direct point-to-point calls for every action. In our case, a flight schedule change (e.g., an airline updates a flight’s departure time) is treated as an event that flows through the system. Instead of each service continually polling for updates, they subscribe to events and react when an event occurs. This results in a responsive system that updates in near real-time.

In a travel aggregator, events can originate from external sources: for instance, an airline’s system might push a “flight delayed” event, or the aggregator might poll airline APIs and generate an event when a change is detected. Spotnana (a modern travel platform) processes events triggered by external parties, “such as an airline that sends us real-time information on flight schedule changes, gate changes, ... and cancellations.” Given the high volume and complexity of these events, Spotnana chose a fault-tolerant event-driven approach. This means that whenever a flight change is received, an event is published into the system, and multiple services can react to it concurrently.

Why event-driven? It offers several advantages crucial for scaling flight updates:

Decoupling: Airlines or data sources are producers of events, while internal microservices (search indices, user notification service, booking management, etc.) are consumers. They don’t call each other directly, which isolates failures and allows independent development. New services can subscribe to the “flight update” event stream without modifying the producers.
Scalability: The pipeline can handle a high throughput of messages by scaling horizontally. Multiple instances of a service can consume from an event queue (topic) in parallel. For example, Spotnana’s event framework is horizontally scalable — as load increases, more service instances spin up and each gets a share of the event stream via Kafka’s consumer groups. This design lets the system process bursts of events (like a wave of cancellations during a storm) by simply adding more consumers.
Real-time responsiveness: As soon as an event is published, it's delivered to consumers, which then immediately perform updates or notifications. This is much faster than waiting for a scheduled job or API poll cycle. A blog on data pipelines contrasts this with batch schedules: if stakeholders expect up-to-the-second data, an event-driven pipeline is the appropriate choice.
Complex workflows: A single flight change can trigger multiple follow-up actions. EDA allows modeling these as a series of events and reactions. For example, one event (flight delayed) could lead to a rebooking service scheduling new flights for affected passengers, a notification service sending alerts, and an analytics service logging the change — all in parallel, without a central bottleneck. Each service applies its business rules when it sees the event.

Designing the Flight Update Processing Pipeline

To handle flight schedule changes at scale, we design a pipeline with the following key components:

1. Event Ingestion

A component receives raw flight updates from external sources. In a travel aggregator, this might be a service that calls airline or GDS APIs for updates. Spotnana’s platform, for example, includes a Polling service that periodically checks third-party APIs for flight schedule changes, delays, cancellations, etc., then triggers appropriate actions based on those updates. This ingestion service transforms external updates into a standard internal event format (e.g., a Protobuf message FlightScheduleChanged).
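The polling variant of ingestion boils down to comparing each fresh snapshot with the previous one and emitting an event only when something changed. Here is a minimal sketch of that idea using only the Java standard library; the class name `ScheduleChangeDetector`, the snapshot shape (flight ID mapped to departure time), and the event string format are all assumptions made for illustration, not Spotnana's implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical ingestion helper: turns polled snapshots into change events. */
class ScheduleChangeDetector {
    // Last departure time seen for each flight ID.
    private final Map<String, String> lastSeen = new HashMap<>();

    /**
     * Compares a freshly polled snapshot (flight ID -> departure time)
     * with the previous values and returns one event string per changed flight.
     */
    List<String> detect(Map<String, String> snapshot) {
        List<String> events = new ArrayList<>();
        for (Map.Entry<String, String> entry : snapshot.entrySet()) {
            String previous = lastSeen.put(entry.getKey(), entry.getValue());
            // A first sighting is not a change; emit only when the value differs.
            if (previous != null && !previous.equals(entry.getValue())) {
                events.add(entry.getKey() + ":" + previous + "->" + entry.getValue());
            }
        }
        return events;
    }

    public static void main(String[] args) {
        ScheduleChangeDetector detector = new ScheduleChangeDetector();
        detector.detect(Map.of("UA123", "10:00"));                     // baseline poll
        System.out.println(detector.detect(Map.of("UA123", "11:30"))); // change detected
    }
}
```

In a real ingestion service the returned events would be serialized (for example as Protobuf) and published to the broker rather than printed.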

2. Event Broker / Stream

Once an update is captured, it’s published to an event stream or message broker. Technologies like Apache Kafka (or cloud equivalents like Amazon Kinesis or Google Pub/Sub) are commonly used to buffer and distribute events. The flight update event is published to a topic (e.g., flight_updates) with a message containing details like flight ID, old schedule, new schedule, change reason, etc. The broker acts as the heart of the pipeline, decoupling producers from consumers and handling scale — Kafka can persist and deliver millions of messages with high throughput.

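Example: In simplified Java, publishing a flight update to Kafka might look like:

    Java

// Assumes producer is a configured KafkaProducer<String, byte[]>;
// FlightUpdate and toProtobufBytes() are illustrative, not a real library API.
FlightUpdate update = new FlightUpdate(flightId, oldTime, newTime, status);
byte[] payload = update.toProtobufBytes();
// The record key (the flight ID) determines the partition.
producer.send(new ProducerRecord<>("flight_updates", flightId, payload));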

Here we use the flight ID as the partition key to preserve ordering per flight. Partitioning is critical: Kafka routes all events with the same key to the same partition (as long as the partition count does not change), and it guarantees order only within a partition. Consumers therefore see, say, Flight 123 delayed and then canceled in the correct sequence. This prevents race conditions (imagine processing a cancellation before the delay that preceded it). Per-flight ordering like this is a must for consistency.
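To make the key-to-partition idea concrete, the sketch below maps a key to a partition index by hashing its bytes. Kafka's default partitioner uses a murmur2 hash of the serialized key; this sketch swaps in `Arrays.hashCode` purely to stay dependency-free, so the partition numbers will not match what Kafka would pick. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Illustrative key-to-partition mapping (not Kafka's actual hash function). */
class PartitionSketch {
    /** Maps a key to a partition index in [0, numPartitions). */
    static int partitionFor(String key, int numPartitions) {
        int hash = Arrays.hashCode(key.getBytes(StandardCharsets.UTF_8));
        // floorMod keeps the result non-negative even when the hash is negative.
        return Math.floorMod(hash, numPartitions);
    }

    public static void main(String[] args) {
        // The same flight ID always lands on the same partition.
        System.out.println(partitionFor("UA123", 8) == partitionFor("UA123", 8));
    }
}
```

Because the result depends on `numPartitions`, adding partitions to a topic later can move a key to a different partition, which is why per-key ordering only holds while the partition count stays fixed.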

3. Event Consumers (Microservices)

Multiple consumer services subscribe to the flight_updates topic to perform different tasks in parallel. For a travel aggregator, typical consumers might include:

Search Index Service: Updates the in-memory/search database so that any flight queries reflect the new schedule immediately.
Booking Service: Checks if any customer bookings or itineraries are affected by the change and marks them for follow-up (e.g., rebooking or refund workflow if a cancellation).
Notification Service: Sends out email/SMS/app notifications to customers or agents. For instance, if a traveler’s flight is delayed, automatically send a push notification to their phone.
Analytics/Logging Service: Records the change for analytics, reporting, or machine learning (e.g., to analyze patterns of delays).
External API Gateway: If the aggregator provides APIs to clients (like travel agencies), it might push the update to them as well.

Each consumer is a separate microservice, which can be scaled horizontally. They form the downstream stages of the pipeline. In this pub-sub model, each service subscribes with its own consumer group, so every service receives every event at almost the same time and handles its piece of the workflow. This parallelism speeds up overall processing.

Moreover, because the event broker can persist events, consumers that are temporarily down or overloaded can catch up later – ensuring reliability. The system can also implement retries and dead-letter queues: if processing fails repeatedly (say, notification service can’t reach SMS API), the message can be moved to a Dead Letter Topic for manual review, without blocking other services.
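The retry and dead-letter behavior described above can be sketched as a small wrapper around an event handler. This is a simplified, in-memory illustration: the `DeadLetterSketch` class is hypothetical, the dead-letter "topic" is just a list, and a real consumer would add backoff between attempts and publish failed messages to an actual topic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical retry wrapper that parks poison messages instead of blocking. */
class DeadLetterSketch {
    /** Events that exhausted their retries; stands in for a dead-letter topic. */
    final List<String> deadLetters = new ArrayList<>();
    private final int maxAttempts;

    DeadLetterSketch(int maxAttempts) {
        this.maxAttempts = maxAttempts;
    }

    /** Runs the handler up to maxAttempts times; returns true on success. */
    boolean process(String event, Consumer<String> handler) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                handler.accept(event);
                return true;
            } catch (RuntimeException failure) {
                // A real consumer would wait (back off) here before retrying.
            }
        }
        deadLetters.add(event); // park for manual review, then move on
        return false;
    }
}
```

The key property is that `process` always returns, so one failing message never stalls the events queued behind it.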

4. Orchestration and Workflow

In some cases, processing a flight change might involve multi-step workflows (e.g., confirm a rebooked flight then issue a refund). Instead of a central orchestrator, EDA often uses choreography — each event triggers the next step via new events. Advanced implementations (like Spotnana’s GossiperFlow built on Kafka Streams) maintain a lightweight workflow state machine to handle timing and dependencies. But even without a custom workflow engine, simple coordination can be achieved by passing events along a chain of topics (for example: flight_updates -> rebooking_requests -> rebooking_confirmations topics, each handled by relevant services).
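The topic chain mentioned above can be illustrated with a tiny in-memory bus, where each handler reacts to one topic and publishes to the next. This is only a sketch of choreography under simplifying assumptions: the bus is synchronous and single-threaded, payloads are plain strings such as "UA123 CANCELLED", and the class name is made up for the example. It is not how GossiperFlow or Kafka Streams work internally.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical in-memory bus showing events chained across topics. */
class ChoreographySketch {
    private final Map<String, List<Consumer<String>>> subscribers = new HashMap<>();
    /** Every published message, in order, as "topic:payload". */
    final List<String> log = new ArrayList<>();

    void subscribe(String topic, Consumer<String> handler) {
        subscribers.computeIfAbsent(topic, t -> new ArrayList<>()).add(handler);
    }

    void publish(String topic, String payload) {
        log.add(topic + ":" + payload);
        for (Consumer<String> handler : subscribers.getOrDefault(topic, List.of())) {
            handler.accept(payload);
        }
    }

    /** Wires the chain flight_updates -> rebooking_requests -> rebooking_confirmations. */
    static ChoreographySketch wired() {
        ChoreographySketch bus = new ChoreographySketch();
        // Booking service: reacts to cancellations by requesting a rebooking.
        bus.subscribe("flight_updates", payload -> {
            if (payload.endsWith("CANCELLED")) {
                bus.publish("rebooking_requests", payload.split(" ")[0]);
            }
        });
        // Rebooking service: confirms the rebooking for downstream consumers.
        bus.subscribe("rebooking_requests",
                flightId -> bus.publish("rebooking_confirmations", flightId));
        return bus;
    }
}
```

No component here knows the whole workflow; each one only knows which topic it listens to and which topic it writes to, which is the essence of choreography.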

Throughout this pipeline, horizontal scaling is key. If flight changes spike (e.g., during a major weather disruption, thousands of flights might change), the system should scale seamlessly. In Kubernetes or cloud environments, auto-scaling can start more consumer instances when consumer lag on the flight_updates topic grows. Thanks to consumer groups, new instances join and Kafka rebalances partitions among them, so throughput grows roughly in proportion to the number of instances, up to the number of partitions in the topic. This design allows processing high message volumes without dropping events.
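One detail worth keeping in mind when sizing the topic: within a consumer group, each partition is read by at most one instance, so instances beyond the partition count sit idle. The sketch below shows a simplified round-robin assignment to make that visible. Kafka's real assignors (range, round-robin, sticky, and others) are more involved; `AssignmentSketch` is a hypothetical name used only here.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified round-robin partition assignment within one consumer group. */
class AssignmentSketch {
    /** Spreads partitions 0..numPartitions-1 over numConsumers consumers. */
    static List<List<Integer>> assign(int numPartitions, int numConsumers) {
        List<List<Integer>> assignment = new ArrayList<>();
        for (int consumer = 0; consumer < numConsumers; consumer++) {
            assignment.add(new ArrayList<>());
        }
        for (int partition = 0; partition < numPartitions; partition++) {
            assignment.get(partition % numConsumers).add(partition);
        }
        return assignment;
    }

    public static void main(String[] args) {
        System.out.println(assign(6, 3)); // every consumer gets two partitions
        System.out.println(assign(2, 4)); // two consumers receive nothing
    }
}
```

With 6 partitions, going from 3 to 6 instances can double parallelism, but a 7th instance adds nothing until the topic has more partitions.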

The Role of gRPC in the Architecture

Where does gRPC come into play in this event-driven pipeline? There are a few important places:

Internal Service Communication

Not every interaction is an event. Some components may need direct request-response calls. For example, a Booking Service might need to synchronously fetch additional data (e.g. seat map) from an Inventory Service as part of handling a flight change. Travel platforms like Spotnana use gRPC for synchronous communication between back-end microservices. gRPC is preferred over REST for these internal calls because it’s faster (binary protocol) and has schema-defined contracts (Protobuf). This ensures that even when services call each other as part of processing an event, they do so with minimal latency and overhead.

High-Volume, Low-Latency Streams

gRPC supports streaming, which can be leveraged for event flows requiring very low latency or direct server-to-server push. By default, our pipeline uses Kafka (with typical end-to-end latencies in the tens of milliseconds). But if a component needed lower latency than that, or wanted to avoid the broker for simplicity, we could use gRPC streaming, at the cost of the broker’s durability and replay. “gRPC is an efficient point-to-point communication solution... It uses HTTP/2, a binary Protobuf data format, asynchronous calls, and supports streaming.” In gRPC streaming, a server can keep pushing messages to a client over a single connection, or vice versa (or even bidirectionally). This essentially creates a broker-less event stream between services.

For instance, an API Gateway service could maintain a gRPC server-stream to a client app, sending real-time flight updates as they arrive, without the client constantly polling. One powerful feature of gRPC is streaming in both directions — clients can stream events to servers for processing, and servers can stream events to clients as they occur. This is especially useful for propagating events out to end-user applications or dashboards in real-time. In our context, a live airline dashboard could subscribe via a gRPC stream to get flight changes as push messages.

Unified Data Model via Protobuf

Both gRPC and many event brokers (like Kafka) can use Protocol Buffers (Protobuf) as the message format. By defining a Protobuf schema for flight update events, the system ensures a consistent data model across asynchronous and synchronous channels. Spotnana does this — they use Protobuf for Kafka event payloads, aligning with gRPC’s use of Protobuf for RPC definitions. This means a FlightUpdate message defined in a .proto file is used for both the Kafka topic and any gRPC calls, ensuring type safety and easy evolution of the schema. It’s a best practice to maintain these message definitions in a common repo and distribute to all services, so everyone “speaks” the same language.

Code example: defining a gRPC service for flight updates. Below is a simplified example of how one might define a gRPC service for streaming flight updates using Protobuf. This could be an internal service that other microservices or clients use to get notifications of changes in real-time.

    ProtoBuf

syntax = "proto3";

package travel.flights;
option java_package = "com.example.travel.flights"; // if using Java

message FlightUpdate {
  string flight_id = 1;
  string status = 2;        // e.g. "Delayed", "Cancelled"
  string new_departure = 3; // new departure time (ISO8601 string)
  string new_arrival = 4;   // new arrival time
  string update_time = 5;   // when this update was issued
  string reason = 6;        // optional reason code or message
}

message UpdateRequest {
  string flight_id_filter = 1;  // e.g. subscribe to a specific flight or route (optional)
}

service FlightUpdateService {
  // Client subscribes to flight updates (server will stream updates as they occur)
  rpc SubscribeUpdates(UpdateRequest) returns (stream FlightUpdate);
}
  

In this definition, SubscribeUpdates is a server-streaming RPC. A client (which could be another microservice or an external app) calls it with some filter criteria, and then the server pushes a stream of FlightUpdate messages whenever relevant events occur. Under the hood, the implementation of this service on the server side would subscribe to the Kafka topic (or another event source) and then feed the data through the gRPC stream to clients. This marries the event-driven pipeline with gRPC delivery.
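The server-side fan-out behind SubscribeUpdates can be sketched without the gRPC library: keep a list of subscriptions, and for each event read from the broker, push it to every subscriber whose filter matches. In the sketch below the `UpdateFanOut` class is hypothetical, updates are reduced to a flight ID and a status string, and the `BiConsumer` sink stands in for whatever actually writes to the stream (in a gRPC Java server that would typically be the response `StreamObserver`'s `onNext`). An empty filter is treated as "all flights", matching proto3's default for an unset string.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.BiConsumer;

/** Hypothetical fan-out from broker events to streaming subscribers. */
class UpdateFanOut {
    private static final class Subscription {
        final String flightIdFilter;
        final BiConsumer<String, String> sink;

        Subscription(String flightIdFilter, BiConsumer<String, String> sink) {
            this.flightIdFilter = flightIdFilter;
            this.sink = sink;
        }
    }

    // Copy-on-write lets subscribers register while events are being delivered.
    private final List<Subscription> subscriptions = new CopyOnWriteArrayList<>();

    /** Registers a subscriber; an empty filter means "all flights". */
    void subscribe(String flightIdFilter, BiConsumer<String, String> sink) {
        subscriptions.add(new Subscription(flightIdFilter, sink));
    }

    /** Called for each event read from the broker; pushes it to matching subscribers. */
    void onEvent(String flightId, String status) {
        for (Subscription subscription : subscriptions) {
            if (subscription.flightIdFilter.isEmpty()
                    || subscription.flightIdFilter.equals(flightId)) {
                subscription.sink.accept(flightId, status);
            }
        }
    }
}
```

A production version would also remove subscriptions when a client disconnects and apply backpressure for slow clients, both of which are omitted here.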

On the flip side, we could also have a client-streaming RPC if we needed to ingest events via gRPC. For example, if an airline partner wanted to send updates via gRPC, they could call a PublishUpdate(stream FlightUpdate) RPC to stream events into our system. gRPC’s flexibility with streaming provides many architectural options for low-latency event ingestion and dissemination.

End-to-End Workflow Example

Let’s put it all together with a concrete scenario in a travel aggregator system:

Step 1: Ingest Change Event

A flight’s departure time changes at the airline’s end. The aggregator’s ingestion service picks this up (say, via a webhook or polling the airline API). It creates a FlightUpdate event with details of the change and publishes it to the flight_updates topic on the message broker.

Step 2: Event Distribution

The message broker (Kafka) receives the event and appends it to the flight_updates log. Subscribed consumer services, which poll the topic continuously, pick up the new message almost immediately.

Step 3: Parallel Processing by Pipeline Consumers

The Search Index Service receives the event, deserializes the FlightUpdate message, and updates its in-memory data or database. Now if someone searches for that flight, they see the updated schedule.
The Notification Service receives the same event. It looks up any users who have this flight in their itinerary (likely via a quick DB query or cache) and then sends out notifications: an email to the customer, maybe an SMS to their phone, and an update to their itinerary in the mobile app. This service might itself call external APIs (e.g., Twilio for SMS) — those calls are done asynchronously, perhaps with retry logic if they fail.
The Booking Management Service gets the event and checks if the flight is part of any active bookings. If yes and the change is major (e.g., flight canceled or a long delay causing misconnect), it could create a new event like booking.disruption that kicks off a rebooking workflow (which might involve an agent or automated rules to find alternatives). This illustrates how one event can lead to another, forming an event-driven chain.
The Analytics Service logs the change in a database or data lake, tagging it with airline, delay length, etc., for future analysis of airline reliability or for feeding a machine learning model (like predicting delays). This is done asynchronously without slowing down user-critical services.

Step 4: gRPC Interactions

During these processes, some synchronous calls might happen. For example, the Notification Service might use gRPC to ask a Profile Service for the user’s preferred contact method (email/SMS) or time zone (to format the time). Because gRPC is efficient and uses Protobuf, these calls complete quickly, even under load. If the Notification Service were designed to push updates to a mobile app, it could also maintain a gRPC streaming connection with the mobile backend to instantly forward the flight change. Web push notifications are an alternative; gRPC streaming is one of several options for server push.

Step 5: Completion and Feedback

Within perhaps a few seconds, all relevant systems have processed the flight change. The website’s flight listings are updated, the user has been notified, and any necessary follow-up actions (rebooking, refunds) are in motion. The event pipeline provides eventual consistency: even if one service is a bit slower or temporarily down, Kafka’s durable log means it will catch up and process the event when it can. No manual intervention is needed for the flow to complete, except in error cases.

Throughout this pipeline, resilience is maintained. If a consumer instance fails, Kafka reassigns its partitions to another instance in the same group, which resumes from the last committed offset. If something goes wrong in processing (say, the rebooking attempt fails), that service can emit its own events or write to a dead-letter topic for human resolution, without collapsing the whole pipeline.

Benefits

By using an event-driven pipeline architecture, travel aggregators achieve a system where flight schedule changes propagate in real-time to all dependent services and users. This design brings numerous benefits:

Scalability to Millions of Events

Decoupling via a broker and horizontal scaling of consumers means the system can handle massive event volumes (thousands of updates per second) by adding more consumer instances. Spotnana’s team explicitly built for this, requiring that the system “horizontally scale to support a high throughput of messages under stress load.” The use of gRPC for efficient service-to-service calls further keeps latency and CPU overhead low, contributing to overall scalability.

Fault Tolerance

There’s no single point of failure. If one component goes down, the others are not blocked; the events will be processed when the component recovers. The pipeline can survive crashes, network blips, and even regional outages if designed with distributed brokers. Events are not lost thanks to durable logs and replication in systems like Kafka or Solace (which offers an event mesh for cross-region reliability).

Ordering and Consistency

Through partition keys and careful design, updates for each flight or booking are handled in sequence, preventing anomalies that could confuse customers (e.g., seeing a flight “un-cancelled” before it was shown canceled!). Per-key ordering, combined with exactly-once processing via Kafka transactions or effectively-once processing via idempotent consumers, keeps data consistent.
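An idempotent consumer can be as simple as remembering which event IDs it has already applied, so that a redelivered message has no further effect. The sketch below assumes each event carries a unique ID, which is not part of the FlightUpdate schema shown earlier; the `IdempotentApplier` class is likewise hypothetical, and a real service would persist the seen-ID set (and bound its size) rather than hold it in memory.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical consumer-side deduplication keyed by event ID. */
class IdempotentApplier {
    private final Set<String> seenEventIds = new HashSet<>();
    /** Latest applied status per flight ID. */
    final Map<String, String> statusByFlight = new HashMap<>();

    /** Applies the update once; returns false if this event ID was already handled. */
    boolean apply(String eventId, String flightId, String status) {
        if (!seenEventIds.add(eventId)) {
            return false; // duplicate delivery: skip side effects
        }
        statusByFlight.put(flightId, status);
        return true;
    }
}
```

With this guard in place, at-least-once delivery from the broker still yields each side effect (a notification, a rebooking request) at most once per event.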

Extensibility

New features are easier to add. If tomorrow the aggregator wants to add a machine learning service that predicts flight delays, it can simply subscribe to the stream of flight updates. It doesn’t need to alter the existing flow; it taps into the pipeline and gets events as they occur. This loose coupling accelerates development and integration of new services.

Improved Customer Experience

Ultimately, travelers are kept informed and can react quickly. For example, they get a notification to head to a new gate or that their connecting flight is rebooked, within moments of the change. This proactive communication is only possible because the architecture delivers information in real-time. As the Prefect blog noted, if up-to-the-second data is needed by stakeholders or end-users, an event-driven approach is the way to go.

Conclusion

In summary, handling flight schedule changes for a travel aggregator is a prototypical real-time big data problem, and an event-driven pipeline with gRPC-enabled microservices solves it well. The pipeline ensures that as soon as an airline announces a change, a cascade of events updates every relevant part of the system. gRPC provides the glue for high-speed, typed communication in between. This combination (EDA + gRPC) has been employed by modern travel tech companies like Spotnana to build platforms that are robust, scalable, and responsive enough to serve the $1.4 trillion travel industry.

By adopting a similar architecture, architects in any industry with high-frequency updates (travel, logistics, finance, etc.) can achieve massive scalability (handling millions of events) while maintaining an agile and reliable system. Flight changes no longer pose a nightmare for system design, but instead become just another event in the stream — one that our systems are well-equipped to catch and handle in real-time.

Opinions expressed by DZone contributors are their own.
