Modernizing Long-Running Transactions for the Microservices Era

This open-source platform solves the long-running process problem in a modern microservices environment by replacing Sagas with workflows.

Dominik Tornow

Sep. 13, 22 · Opinion

While some software transactions are simple, others require more maintenance. Long-running transactions in particular have become the bane of the microservices era, as they need careful stewardship to ensure successful completion. Developers have dealt with this problem for decades by writing extra code to manage more demanding transactions. Now, in the distributed world of microservices, they must tackle it again. So exploring this topic — and how modern execution platforms can help — offers a number of compelling insights.

How Do We Define “Long-Running Transaction” Anyway?

The simplest transactions are predictable, involving a straightforward request and response interaction. However, more complex transactions span multiple steps that can take different lengths of time to complete, especially when manual input is involved.

An example of this is booking a multi-leg air flight. A person might need to book multiple flights as part of a single transaction, but it might take time to manually check and book options for each individual leg of the journey. So the system must keep track of what’s already been booked and what’s available, as the user goes through the booking process from start to finish.

Careful design is essential in a distributed application comprising multiple microservices, because the booking transaction might last mere minutes or take days, depending on how long the user takes to complete the booking. Another example, one that often spans years, is a subscription management transaction. Streaming services like Netflix might charge subscribers every month. The system wakes up after 30 days, processes a credit card payment, and updates the customer’s account, in a series of regular smaller steps.

Both of these examples, airline booking and subscription services, show that it isn’t appropriate to define a long-running transaction based on the time it takes to execute. One might last a few minutes while the other spans years. So ultimately, uncertain execution times mean that long-running transactions cannot and should not have any time limits at all. For this reason, at Temporal, we call long-running operations arbitrary length operations.

The Importance of Strong Execution Guarantees

One thing a long-running transaction must have is a strong execution guarantee, minimizing the risk of failure. This means that it must be stateful. If any underlying step fails, due to technical or manual disruption, it should recall which step it’s at, and resume operation from that point.

The airline booking app needs to either succeed completely, with all flights booked, or not succeed at all, with no flights booked, in order to maintain the integrity of the user’s trip.

The subscription management transaction must also maintain state. It must remember that the user paid their subscription for June so that it can charge them in July. If it fails, it can’t simply restart from the beginning and recharge them for each and every month since the account was created.
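To make the point about state concrete, here is a minimal sketch of a billing loop that records its progress after every charge, so that a restart resumes after the last month already billed instead of starting over. It is an illustration only: the JSON file, the `run_billing` function, and the `fail_on` parameter are hypothetical names invented for this example, and the "charge" is a stand-in for a real payment call.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def load_state(path: Path) -> dict:
    """Return the persisted state, or a fresh one if nothing was saved yet."""
    if path.exists():
        return json.loads(path.read_text())
    return {"last_billed_index": -1, "charges": []}


def save_state(path: Path, state: dict) -> None:
    """Persist state after every step so a restart can resume from it."""
    path.write_text(json.dumps(state))


def run_billing(path: Path, months: list, fail_on: Optional[str] = None) -> dict:
    """Charge each month once, resuming after the last month already billed."""
    state = load_state(path)
    for index, month in enumerate(months):
        if index <= state["last_billed_index"]:
            continue  # charged before the failure; do not charge again
        if month == fail_on:
            raise RuntimeError(f"simulated crash before charging {month}")
        state["charges"].append(month)  # stand-in for a real payment call
        state["last_billed_index"] = index
        save_state(path, state)
    return state


def demo() -> list:
    """Crash in July, restart, and return every charge that was made."""
    months = ["May", "June", "July", "August"]
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "subscription.json"
        try:
            run_billing(path, months, fail_on="July")
        except RuntimeError:
            pass  # the process "dies" here; only the file survives
        return run_billing(path, months)["charges"]
```

After the simulated crash, the second run skips May and June and charges only July and August. A real system would also have to close the gap between taking the payment and saving the state, for example by making the payment call idempotent.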

Short-running transactions, on the other hand, only provide weak execution guarantees. They can’t ensure reliability because they can’t maintain their own state in the event of a failure. Instead, they rely on the processes that called them to monitor their state, detecting and mitigating any problems with their operation.

Sagas and the History of Microservices

The idea is not new. A paper published at Princeton University back in 1987 combined long-running and short-running transactions in a software development pattern known as a Saga. A Saga breaks down a long-running transaction’s steps into a series of short transactions.

The Saga manages each of these short transactions, issuing compensating transactions for those that fail. It maintains state across all of the short-running transactions using database storage for persistence. The Saga retries short transactions when necessary, based on message queues and timers.
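The compensation part of the pattern can be sketched in a few lines. The example below is a simplified illustration, not a reference implementation: `Step`, `run_saga`, and `book_trip` are hypothetical names, the flight legs are made up, and state lives in memory rather than in the database, queues, and timers a production Saga would need.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Step:
    name: str
    action: Callable[[], None]      # the short transaction
    compensate: Callable[[], None]  # undoes the action if a later step fails


def run_saga(steps: list) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed = []
    for step in steps:
        try:
            step.action()
        except Exception:
            for done in reversed(completed):
                done.compensate()
            return False
        completed.append(step)
    return True


def book_trip(fail_leg: Optional[str] = None) -> tuple:
    """Book three legs as one Saga; return (success, legs still booked)."""
    booked = set()

    def make_step(leg: str) -> Step:
        def action() -> None:
            if leg == fail_leg:
                raise RuntimeError(f"no seats on {leg}")
            booked.add(leg)

        def compensate() -> None:
            booked.discard(leg)  # cancel the earlier booking

        return Step(leg, action, compensate)

    legs = ["SFO-JFK", "JFK-LHR", "LHR-DEL"]
    ok = run_saga([make_step(leg) for leg in legs])
    return ok, booked
```

If the last leg fails, the two earlier bookings are cancelled in reverse order and the trip ends with nothing booked, which is the all-or-nothing outcome the airline example calls for. Everything this sketch leaves out (persistence, retries, timers) is the code developers end up writing by hand.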

Developers today use Sagas to support long-running transactions in microservices, but it’s painful. Sagas are patterns, not solutions, and developers must implement Sagas themselves. They must write code to address process failures, store state, and manage time limits. This is both burdensome and brittle.

Microservices make Sagas even harder to support because these services can call each other. A microservice conducting a short-running transaction might call another microservice for completion. The developer has to manage those relationships to support the overall Saga.

Workflows to the Rescue

Temporal, an MIT-licensed open-source platform, solves the long-running process problem in a modern microservices environment by replacing Sagas with workflows. Workflows are multi-step processes designed specifically for distributed systems that can last for indeterminate amounts of time.

Workflows exist independently of request-response flows while still providing strong execution guarantees. This means that the developer can support long-running transactions in their application without writing ad hoc state management code. The application needn’t be aware of failure. Temporal manages application state under the hood, regardless of how long that process runs.

Temporal’s environment includes redundant components and geographic independence across multiple availability zones to ensure consistency. Even in a catastrophic system state, the platform will preserve a workflow’s state until the underlying infrastructure recovers.

Not only does a workflow preserve its own state, but it also supports the chains of short transactions created when microservices call each other. If any microservice in a chain of short transactions fails, the platform can automatically restart it to help the chain complete.
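For a sense of what "automatically restart" saves the developer, here is a rough sketch of the retry logic that otherwise has to be written by hand around each call in a chain. It does not show how Temporal implements retries; `call_with_retry`, the two service functions, and the backoff values are assumptions made up for this illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.01,
) -> T:
    """Call operation, retrying with exponential backoff when it raises."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts; surface the last error
            time.sleep(base_delay * 2 ** (attempt - 1))
            attempt += 1


def demo() -> tuple:
    """An order service calls a flaky inventory service that fails twice."""
    calls = {"inventory": 0}

    def inventory_service() -> str:
        calls["inventory"] += 1
        if calls["inventory"] < 3:
            raise ConnectionError("inventory unavailable")
        return "reserved"

    def order_service() -> str:
        status = call_with_retry(inventory_service)
        return f"order confirmed ({status})"

    result = call_with_retry(order_service)
    return result, calls["inventory"]
```

In this sketch the inventory call succeeds on its third attempt and the order completes without the caller ever seeing the two failures. Unlike a workflow platform, though, this in-process loop loses all progress if the process itself crashes.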

Workflows achieve the same goals as Sagas, but with almost no work on the developer’s part, save for writing a few lines of code to set up the workflow. Just as a database hides time limits and failures from a developer when dealing with atomic transactions, a workflow hides them from developers in a microservices environment.

By managing application state as a core abstraction, Temporal modernizes the Saga solution for the microservices age. It lets developers create long-running transactions spanning multiple steps and hundreds of microservices with unpredictable execution times. This leaves programmers free to code their business logic safe in the knowledge that those workflows will stay consistent.

Whether your system is booking an air flight or managing a subscription that queries multiple systems — or any number of other examples — you can create a workflow in just a few lines of code.

Coding and managing long-running transactions in complex applications needn’t be such a Saga after all.

Opinions expressed by DZone contributors are their own.
