Developing Event-Driven, Auto-Compensating Saga Transactions For Microservices

Sagas are difficult to develop due to the complexity of compensation logic and lack of support for increasingly event-driven microservices. Find solutions here.

Paul Parkinson

Jul. 25, 24 · Tutorial

Likes (2)

Comment

Save

6.2K Views

Over the past decade, I have presented many times and written numerous blogs and source code on sagas and event-driven microservices. In those blogs, I’ve discussed the need for sagas in microservices architectures, the preferred and increased use of event-driven patterns and communication for microservices, and the difficulties in implementing sagas, particularly around developing saga participant code for compensating transactions. These are addressed in the product solution I will describe, including an example source code here, and soon, an update of the beta version of the saga workshop showing the same. The features are in the Oracle Database Free Docker container and soon in the Oracle Autonomous Database.

Part of what makes the new Oracle Saga Framework so powerful is its combined usage of other features in the Oracle Database including the TxEventQ transactional messaging system and reservation-less locking; therefore, I will describe them and how they contribute to the overall comprehensive solution as well.

Quick Background on Sagas

The saga pattern is used to provide data integrity between multiple services and to do so for potentially long-running transactions. There are many cursory blogs, as they tend to be, written on sagas and long-running transactions. In short, XA and 2PC require distributed locks (con) which manage ACID properties so that the user can simply execute rollback or commit (pro). In contrast, sagas use local transactions only and do not require distributed locks (pro) but require the user to implement compensation logic, etc. (con).

My previous blog showed examples of compensation logic and the need to explicitly maintain journals and handle a number of often subtle, but important complexities. Indeed, most agree that data integrity is perhaps the most difficult and critical of challenges when taking advantage of a microservices architecture.

The Transfer Service receives a request to transfer money from one account to another.
The transfer method that is called is annotated with @LRA(value = LRA.Type.REQUIRES_NEW, end = false); therefore, the underlying LRA client library makes a call to the Coordinator/Orchestrator service which creates a new LRA/saga and passes the LRA/saga ID back to the Transfer Service.
The Transfer Service makes a call to the (Bank)Account (for account 66) service to make the withdraw call. The LRA/saga ID is propagated as a header as part of this call.
The withdraw method that is called is annotated with @LRA(value = LRA.Type.MANDATORY, end = false); therefore, the underlying client library makes a call to the Coordinator/Orchestrator service, which recognizes the LRA/saga ID and enlists/joins the Account Service endpoint (address) to the LRA/saga started by the Transfer Service. This endpoint has a number of methods, including the complete and compensate methods that will be called when the saga/LRA is terminated/ended.
The withdraw method is executed and control returns to the Transfer Service.
This is repeated with a call from the Transfer Service to the Account service (for account 67) to make the deposit call.
Depending on the returns from the Account Service calls, the Transfer Service determines if it should close or cancel the saga/LRA. close and cancel are somewhat analogous to commit or rollback.
The Transfer Service issues the close or cancel call to the Coordinator (I will get into more details on how this is done implicitly when looking closer at the application).
The Coordinator in turn issues complete (in the case of close) or compensate calls on the participants that were joined in Saga/LRA previously.

Oracle TxEventQ (Formerly AQ) Messaging System in the Database

There are several advantages to event-driven microservices — particularly those that are close to the (potentially critical) data — including scalability and QoS levels, reliability, transactionality, integration and routing, versioning, etc.

Of course, there needs to be a messaging system/broker in order to provide event-driven sagas. The TxEventQ messaging system (formerly called AQ) has been part of the Oracle database for decades (long before Kafka existed). It provides some key differentiators not available in other messaging systems; in particular, the ability to do messaging and data operations in the same local transaction — which is required to provide transactional outbox, idempotent producers and consumers, and in particular, the robust saga simply can't do. These are described in the blog "Apache Kafka vs. Oracle Transactional Event Queues as Microservices Event Mesh," but the following table gives an idea of the common scenario that would require extra developer and admin handling or is simply not possible in Kafka and other messaging and database systems.

The scenario involves an Order micoservice inserting an order in the database and sending a message to an Inventory microservice. The Inventory microservices receives the message, updates the Inventory table, and sends a message back to the Order service, which receives that message and updates the Order in the database.

Notice how the Oracle Database and TxEventQ handle all failure scenarios automatically.

Auto-Compensating Data Types via Lock-Free Reservations

Saga and Escrow History

There is little debate that the saga pattern is currently the best approach to data integrity between microservices and long-running transactions/activities. This comes as little surprise, as it has a long history starting with the original paper that was published in 1987 which also states that a simplified and optimal implementation of the saga pattern is one where the coordinator is implemented in the database(s).

The concept of escrow concurrency and compensation-aware transactions was described even earlier in 1985. The new Oracle database feature is named "Lock-free Reservations," as a reservation journal acts as an intermediary to the actual data table for any fields marked with the keyword RESERVABLE. Here is an example of how easy it is to simply label a column/field as reservable:

CREATE TABLE bankA (
  ucid VARCHAR2(50),
  account_number NUMBER(20) PRIMARY KEY,
  account_type VARCHAR2(15) CHECK (account_type IN ('CHECKING', 'SAVING')),
  balance_amount decimal(10,2) RESERVABLE constraint balance_con check(balance_amount >= 0),
  created_at TIMESTAMP DEFAULT SYSTIMESTAMP
);

An internal reservation journal table is created and managed automatically by the database (with nomenclature SYS_RESERVJRNL_<object_number_of_base_table>) which tracks the actions made on the reservable field by concurrent transactions.

Changes requested by each transaction are verified against the journal value (not the actual database table), and thus, promises of the change are made to the transactions based on the reservation/journal. The changes are not flushed/processed on the underlying table until the commit of the transaction(s). The modifications made on these fields must be commutative; that is, relative increment/decrement operations such as quantity = quantity + 1, not absolute assignments such as quantity = 2. This is the case in the vast majority of data hot spots and indeed, even state machines work on this principle. Along with the fine-grained/column-level nature of escrow/reservations, high throughput for hot spots of concurrent transactions is attained. Likewise, transactions do not block for long-running transactions.

A customer in a store no longer locks all of a particular type of an item just because one of the items is in their cart, nor can they take items from another person's cart.

A good way to understand is to compare and contrast lock-less reservations/escrow with the concurrency mechanisms, drawbacks, and benefits of pessimistic and optimistic locking.

Pessimistic Locking

Optimistic Locking

Escrow Locking

What is extremely interesting is the fact that the journaling, etc. conducted by lock-free reservations is also used by the Oracle Saga Framework to provide auto-compensating/compensation-aware data.

The Saga framework performs compensating actions during a Saga rollback. Reservation journal entries provide the data that is required to take compensatory actions for Saga transactions. The Saga framework sequentially processes the saga_finalization$ table for a Saga branch and executes the compensatory actions using the reservation journal.

In other words, it removes the burden of coding the compensation logic, as described in the Developing Saga Participant Code For Compensating Transactions blog.

Quick Feature Comparison in Saga Implementations

In my previous blog, I used the versatile Oracle MicroTx product, written by the same team that wrote the famous Tuxedo transaction processing monitor. I've provided this table of comparison features to show what is provided by LRA in general and what unique features currently exist (others are in development) between the two Oracle Saga coordinator implementations.

	LRA	MicroTX Sagas/LRA	Oracle Database Sagas/LRA
Automatic propagation of saga/LRA ID and participant enlistment	X	X	X
Automatic coordination of completion protocol (commit/rollback).	X	X	X
Automatic timeout and recovery logic	X	X	X
REST support	X	X
Messaging support			X
Automatic recovery state maintained in participants			X
Automatic Compensating Data via Lock-free Reservations			X
XA and Try-Cancel-Commit support		X
Coordinator runs in... and HA, Security, ... is supported by...		Kubernetes	Oracle Database
Languages directly supported		Java, JavaScript	Java, PL/SQL

Application Setup and Code

Setup

There are a few simple setup steps on the database side that need to be issued just once to initialize the system. The full doc can be found here.

It is possible to use a number of different configurations for microservices, all of which are supported by the Oracle Database Saga Framework. For example, there can be schema or another isolation level between microservices, or there can be a strict database-per-service isolation. We will show the latter here and use a Pluggable Database (PDB) per service. A PDB is a devoted database that can be managed as a unit/CDB for HA, etc., making it perfect for microservices per service.

Create database links between each database for message propagation, forming an event mesh. The command looks like this:

CREATE PUBLIC DATABASE LINK PDB2_LINK CONNECT TO admin IDENTIFIED BY test USING 'cdb1_pdb2';
CREATE PUBLIC DATABASE LINK PDB3_LINK CONNECT TO admin IDENTIFIED BY test USING 'cdb1_pdb3';
CREATE PUBLIC DATABASE LINK PDB4_LINK CONNECT TO admin IDENTIFIED BY test USING 'cdb1_pdb4';

2. Grant saga-related privileges to the saga coordinator/admin.

grant saga_adm_role to admin;
grant saga_participant_role to admin;
grant saga_connect_role to admin;
grant all on sys.saga_message_broker$ to admin;
grant all on sys.saga_participant$ to admin;
grant all on sys.saga$ to admin;
grant all on sys.saga_participant_set$ to admin;

3. add_broker and add_coordinator:

exec dbms_saga_adm.add_broker(broker_name => 'TEST', broker_schema => 'admin');
exec dbms_saga_adm.add_coordinator(coordinator_name => 'CloudBankCoordinator', mailbox_schema => 'admin', broker_name => 'TEST', dblink_to_coordinator => 'pdb1_link');
exec dbms_saga_adm.add_participant(participant_name => 'CloudBank', coordinator_name => 'CloudBankCoordinator' , dblink_to_broker => 'pdb1_link' , mailbox_schema => 'admin' , broker_name => 'TEST', dblink_to_participant => 'pdb1_link');

4. add_participant(s):

exec dbms_saga_adm.add_participant(participant_name=> 'BankB' ,dblink_to_broker => 'pdb1_link',mailbox_schema=> 'admin',broker_name=> 'TEST', dblink_to_participant=> 'pdb3_link');

Application Dependencies

On the Java application side, we just need to add these two dependencies to the maven pom.xml:

<dependency>
  <groupId>com.oracle.database.saga</groupId>
  <artifactId>saga-core</artifactId>
  <version>[23.3.0,)</version>
</dependency>
<dependency>
  <groupId>com.oracle.database.saga</groupId>
  <artifactId>saga-filter</artifactId>
  <version>[23.3.0,)</version>
</dependency>

Application Source Code

As the Oracle Database Saga Framework implements the MicroProfile LRA (Long Running Actions) specification, much of the code, annotations, etc. that I've presented in previous blogs apply to this one.

However, though future support has been discussed, the LRA specification does not support messaging/eventing — only REST (it supports Async REST, but that is of course not messaging/eventing) — and so a few additional annotations have been furnished to provide such support and take advantage of the TxEventQ transactional messaging and auto-compensation functionality already described. Full documentation can be found here, but the key two additions are @Request in the saga/LRA participants and @Response in the initiating service/participant as described in the following.

Note that the Oracle Database Saga Framework also provides access to the same saga functionality via direct API calls (e.g., SagaInitiator beginSaga(), Saga sendRequest, commitSaga, rollbackSaga, etc.), and so can be used not only in JAX-RS clients but any Java client, and, of course, in PL/SQL as well. As shown in the previous blog and code repos, JAX-RS can also be used in Spring Boot.

The example code snippets below show the classic TravelAgency saga scenario (with Airline, etc., as participants) while the example I've been using in the previous blog and the GitHub repos provided continues the bank transfer scenario. The same principles apply, of course; just using different use cases to illustrate.

The Initiator (The Initiating Participant)

@LRA for demarcation of the saga/LRA indicating whether the method should start, end, or join an LRA.
@Response is an Oracle Saga-specific annotation, indicating this method collects responses from Saga participants (who were enrolled into a Saga using the sendRequest() API and the name of the participant (Airline, in this case).

    Java
   
 

   @Participant(name = "TravelAgency")
/* @Participant declares the participant’s name to the saga framework */
public class TravelAgencyController extends SagaInitiator {
/* TravelAgencyController extends the SagaInitiator class */
 @LRA(end = false)
 /* @LRA annotates the method that begins a saga and invites participants */
 @POST("booking")
 @Consumes(MediaType.TEXT_PLAIN)
 @Produces(MediaType.APPLICATION_JSON)
  public jakarta.ws.rs.core.Response booking( 
      @HeaderParam(LRA_HTTP_CONTEXT_HEADER) URI lraId,
      String bookingPayload) {
    Saga saga = this.getSaga(lraId.toString());
    /* The application can access the sagaId via the HTTP header
       and instantiate the Saga object using it */
    try {
      /* The TravelAgency sends a request to the Airline sending
         a JSON payload using the Saga.sendRequest() method */
      saga.sendRequest ("Airline", bookingPayload);
      response = Response.status(Response.Status.ACCEPTED).build();
    } catch (SagaException e) {
      response=Response.status(Response.Status.INTERNAL_SERVER_ERROR).build();
    }
  }
  @Response(sender = "Airline.*")
  /* @Response annotates the method to receive responses from a specific
     Saga participant */
  public void responseFromAirline(SagaMessageContext info) {
    if (info.getPayload().equals("success")) {
      saga.commitSaga ();
      /* The TravelAgency commits the saga if a successful response is received */
    } else {
      /* Otherwise, the TravelAgency performs a Saga rollback  */
      saga.rollbackSaga ();
    }
  }
}
  

The Participant Services

@Request is an Oracle Saga-specific annotation to indicate the method that receives incoming requests from Saga initiators.
@Complete: The completion callback (called by the coordinator) for the saga/LRA
@Compensate: The compensate callback (called by the coordinator) for the saga/LRA

The Saga framework provides a SagaMessageContext object as an input to the annotated method which includes convenience methods to get the Saga, SagaId, Sender, Payload, and Connection (to use transactionally and as an auto-compensating data type as part of the saga as described earlier).

    Java
   
 

   @Participant(name = "Airline")
/* @Participant declares the participant’s name to the saga framework */
public class Airline extends SagaParticipant {
/* Airline extends the SagaParticipant class */
  @Request(sender = "TravelAgency")
  /* The @Request annotates the method that handles incoming request from a given
     sender, in this example the TravelAgency */
  public String handleTravelAgencyRequest(SagaMessageContext    
      info) {
     
    /* Perform all DML with this connection to ensure 
       everything is in a single transaction */  
    FlightService fs = new 
      FlightService(info.getConnection());
    fs.bookFlight(info.getPayload(), info.getSagaId());
    return response;
    /* Local commit is automatically performed by the saga framework.  
       The response is returned to the initiator */  
  }
  
  @Compensate 
  /* @Compensate annotates the method automatically called to roll back a saga */
  public void compensate(SagaMessageContext info) {
    fs.deleteBooking(info.getPayload(), 
        info.getSagaId());
  }
  @Complete 
  /* @Complete annotates the method automatically called to commit a saga */
  public void complete(SagaMessageContext info) {
    fs.sendConfirmation(info.getSagaId());
  }
}
  

APEX Workflow With Oracle Saga Framework

Oracle's new APEX Workflow product has been designed to include and account for sagas. More blogs with details are coming, but to give you an idea, the following shows the same bank transfer saga we've been discussing, but defined in a workflow and with the inclusion of a manual step in the flow for approval of the transfer (a common use case in finance and other workflows).

You can read more about the workflow product in the blogs here and here.

Conclusion

Thank you for reading and of course please feel free to reach out to me with any questions or feedback.

I want to give credit to the Oracle TxEventQ and Transaction Processing teams for all the amazing work they've done to conquer some of if not the most difficult areas of data-driven microservices and simplify it for the developers. I'd like to give special credit to Oracle's Dieter Gawlick who started the work in both escrow (lock-free reservations) and compensation-aware datatypes and sagas 40 years ago and who is also the original architect of TxEventQ (formerly AQ) with its ability to do messaging and data manipulation in the same local transaction.

Data integrity Oracle Database Event Framework microservice

Published at DZone with permission of Paul Parkinson. See the original article here.

Opinions expressed by DZone contributors are their own.

Developing Event-Driven, Auto-Compensating Saga Transactions For Microservices

Sagas are difficult to develop due to the complexity of compensation logic and lack of support for increasingly event-driven microservices. Find solutions here.

Quick Background on Sagas

Oracle TxEventQ (Formerly AQ) Messaging System in the Database

Auto-Compensating Data Types via Lock-Free Reservations

Saga and Escrow History

Quick Feature Comparison in Saga Implementations

Application Setup and Code

Setup

Application Dependencies

Application Source Code

The Initiator (The Initiating Participant)

The Participant Services

APEX Workflow With Oracle Saga Framework

Other Topics: Observability, Optimizations, and Workshop

Conclusion

Partner Resources

Related

Trending

Developing Event-Driven, Auto-Compensating Saga Transactions For Microservices

Sagas are difficult to develop due to the complexity of compensation logic and lack of support for increasingly event-driven microservices. Find solutions here.

Quick Background on Sagas

Oracle TxEventQ (Formerly AQ) Messaging System in the Database

Auto-Compensating Data Types via Lock-Free Reservations

Saga and Escrow History

Quick Feature Comparison in Saga Implementations

Application Setup and Code

Setup

Application Dependencies

Application Source Code

The Initiator (The Initiating Participant)

The Participant Services

APEX Workflow With Oracle Saga Framework

Other Topics: Observability, Optimizations, and Workshop

Conclusion

Related

Partner Resources