Developing Event-Driven, Auto-Compensating Saga Transactions For Microservices
Sagas are difficult to develop due to the complexity of compensation logic and lack of support for increasingly event-driven microservices. Find solutions here.
Join the DZone community and get the full member experience.
Join For FreeOver the past decade, I have presented many times and written numerous blogs and source code on sagas and event-driven microservices. In those blogs, I’ve discussed the need for sagas in microservices architectures, the preferred and increased use of event-driven patterns and communication for microservices, and the difficulties in implementing sagas, particularly around developing saga participant code for compensating transactions. These are addressed in the product solution I will describe, including an example source code here, and soon, an update of the beta version of the saga workshop showing the same. The features are in the Oracle Database Free Docker container and soon in the Oracle Autonomous Database.
Part of what makes the new Oracle Saga Framework so powerful is its combined usage of other features in the Oracle Database including the TxEventQ transactional messaging system and reservation-less locking; therefore, I will describe them and how they contribute to the overall comprehensive solution as well.
Quick Background on Sagas
The saga pattern is used to provide data integrity between multiple services and to do so for potentially long-running transactions. There are many cursory blogs, as they tend to be, written on sagas and long-running transactions. In short, XA and 2PC require distributed locks (con) which manage ACID properties so that the user can simply execute rollback or commit (pro). In contrast, sagas use local transactions only and do not require distributed locks (pro) but require the user to implement compensation logic, etc. (con).
My previous blog showed examples of compensation logic and the need to explicitly maintain journals and handle a number of often subtle, but important complexities. Indeed, most agree that data integrity is perhaps the most difficult and critical of challenges when taking advantage of a microservices architecture.
- The Transfer Service receives a request to transfer money from one account to another.
- The
transfer
method that is called is annotated with@LRA(value = LRA.Type.REQUIRES_NEW, end = false)
; therefore, the underlying LRA client library makes a call to the Coordinator/Orchestrator service which creates a new LRA/saga and passes the LRA/saga ID back to the Transfer Service. - The Transfer Service makes a call to the
(Bank)Account
(for account 66) service to make thewithdraw
call. The LRA/saga ID is propagated as a header as part of this call. - The
withdraw
method that is called is annotated with@LRA(value = LRA.Type.MANDATORY, end = false)
; therefore, the underlying client library makes a call to the Coordinator/Orchestrator service, which recognizes the LRA/saga ID and enlists/joins the Account Service endpoint (address) to the LRA/saga started by the Transfer Service. This endpoint has a number of methods, including thecomplete
andcompensate
methods that will be called when the saga/LRA is terminated/ended. - The
withdraw
method is executed and control returns to the Transfer Service. - This is repeated with a call from the Transfer Service to the Account service (for account 67) to make the
deposit
call. - Depending on the returns from the Account Service calls, the Transfer Service determines if it should
close
orcancel
the saga/LRA.close
andcancel
are somewhat analogous tocommit
orrollback
. - The Transfer Service issues the
close
orcancel
call to the Coordinator (I will get into more details on how this is done implicitly when looking closer at the application). - The Coordinator in turn issues
complete
(in the case of close) orcompensate
calls on the participants that were joined in Saga/LRA previously.
Oracle TxEventQ (Formerly AQ) Messaging System in the Database
There are several advantages to event-driven microservices — particularly those that are close to the (potentially critical) data — including scalability and QoS levels, reliability, transactionality, integration and routing, versioning, etc.
Of course, there needs to be a messaging system/broker in order to provide event-driven sagas. The TxEventQ messaging system (formerly called AQ) has been part of the Oracle database for decades (long before Kafka existed). It provides some key differentiators not available in other messaging systems; in particular, the ability to do messaging and data operations in the same local transaction — which is required to provide transactional outbox, idempotent producers and consumers, and in particular, the robust saga simply can't do. These are described in the blog "Apache Kafka vs. Oracle Transactional Event Queues as Microservices Event Mesh," but the following table gives an idea of the common scenario that would require extra developer and admin handling or is simply not possible in Kafka and other messaging and database systems.
The scenario involves an Order
micoservice inserting an order
in the database and sending a message to an Inventory
microservice. The Inventory
microservices receives the message, updates the Inventory
table, and sends a message back to the Order
service, which receives that message and updates the Order
in the database.
Notice how the Oracle Database and TxEventQ handle all failure scenarios automatically.
Auto-Compensating Data Types via Lock-Free Reservations
Saga and Escrow History
There is little debate that the saga pattern is currently the best approach to data integrity between microservices and long-running transactions/activities. This comes as little surprise, as it has a long history starting with the original paper that was published in 1987 which also states that a simplified and optimal implementation of the saga pattern is one where the coordinator is implemented in the database(s).
The concept of escrow concurrency and compensation-aware transactions was described even earlier in 1985. The new Oracle database feature is named "Lock-free Reservations," as a reservation journal acts as an intermediary to the actual data table for any fields marked with the keyword RESERVABLE
. Here is an example of how easy it is to simply label a column/field as reservable:
CREATE TABLE bankA ( ucid VARCHAR2(50), account_number NUMBER(20) PRIMARY KEY, account_type VARCHAR2(15) CHECK (account_type IN ('CHECKING', 'SAVING')), balance_amount decimal(10,2) RESERVABLE constraint balance_con check(balance_amount >= 0), created_at TIMESTAMP DEFAULT SYSTIMESTAMP );
An internal reservation journal table is created and managed automatically by the database (with nomenclature SYS_RESERVJRNL_<object_number_of_base_table>
) which tracks the actions made on the reservable field by concurrent transactions.
Changes requested by each transaction are verified against the journal value (not the actual database table), and thus, promises of the change are made to the transactions based on the reservation/journal. The changes are not flushed/processed on the underlying table until the commit of the transaction(s). The modifications made on these fields must be commutative; that is, relative increment/decrement operations such as quantity = quantity + 1, not absolute assignments such as quantity = 2. This is the case in the vast majority of data hot spots and indeed, even state machines work on this principle. Along with the fine-grained/column-level nature of escrow/reservations, high throughput for hot spots of concurrent transactions is attained. Likewise, transactions do not block for long-running transactions.
A customer in a store no longer locks all of a particular type of an item just because one of the items is in their cart, nor can they take items from another person's cart.
A good way to understand is to compare and contrast lock-less reservations/escrow with the concurrency mechanisms, drawbacks, and benefits of pessimistic and optimistic locking.
Optimistic Locking
Escrow Locking
What is extremely interesting is the fact that the journaling, etc. conducted by lock-free reservations is also used by the Oracle Saga Framework to provide auto-compensating/compensation-aware data.
The Saga framework performs compensating actions during a Saga rollback. Reservation journal entries provide the data that is required to take compensatory actions for Saga transactions. The Saga framework sequentially processes the saga_finalization$
table for a Saga branch and executes the compensatory actions using the reservation journal.
In other words, it removes the burden of coding the compensation logic, as described in the Developing Saga Participant Code For Compensating Transactions blog.
Quick Feature Comparison in Saga Implementations
In my previous blog, I used the versatile Oracle MicroTx product, written by the same team that wrote the famous Tuxedo transaction processing monitor. I've provided this table of comparison features to show what is provided by LRA in general and what unique features currently exist (others are in development) between the two Oracle Saga coordinator implementations.
Without LRA | LRA | MicroTX Sagas/LRA | Oracle Database Sagas/LRA | |
---|---|---|---|---|
Automatic propagation of saga/LRA ID and participant enlistment |
X |
X |
X |
|
Automatic coordination of completion protocol (commit/rollback). |
X |
X |
X |
|
Automatic timeout and recovery logic |
X |
X |
X |
|
REST support |
X |
X |
||
Messaging support |
X |
|||
Automatic recovery state maintained in participants |
X |
|||
Automatic Compensating Data via Lock-free Reservations |
X |
|||
XA and Try-Cancel-Commit support |
X |
|||
Coordinator runs in... and HA, Security, ... is supported by... |
Kubernetes |
Oracle Database |
||
Languages directly supported |
Java, JavaScript |
Java, PL/SQL |
Application Setup and Code
Setup
There are a few simple setup steps on the database side that need to be issued just once to initialize the system. The full doc can be found here.
It is possible to use a number of different configurations for microservices, all of which are supported by the Oracle Database Saga Framework. For example, there can be schema or another isolation level between microservices, or there can be a strict database-per-service isolation. We will show the latter here and use a Pluggable Database (PDB) per service. A PDB is a devoted database that can be managed as a unit/CDB for HA, etc., making it perfect for microservices per service.
- Create database links between each database for message propagation, forming an event mesh. The command looks like this:
CREATE PUBLIC DATABASE LINK PDB2_LINK CONNECT TO admin IDENTIFIED BY test USING 'cdb1_pdb2'; CREATE PUBLIC DATABASE LINK PDB3_LINK CONNECT TO admin IDENTIFIED BY test USING 'cdb1_pdb3'; CREATE PUBLIC DATABASE LINK PDB4_LINK CONNECT TO admin IDENTIFIED BY test USING 'cdb1_pdb4';
2. Grant saga-related privileges to the saga coordinator/admin.
grant saga_adm_role to admin; grant saga_participant_role to admin; grant saga_connect_role to admin; grant all on sys.saga_message_broker$ to admin; grant all on sys.saga_participant$ to admin; grant all on sys.saga$ to admin; grant all on sys.saga_participant_set$ to admin;
3. add_broker
and add_coordinator
:
exec dbms_saga_adm.add_broker(broker_name => 'TEST', broker_schema => 'admin'); exec dbms_saga_adm.add_coordinator(coordinator_name => 'CloudBankCoordinator', mailbox_schema => 'admin', broker_name => 'TEST', dblink_to_coordinator => 'pdb1_link'); exec dbms_saga_adm.add_participant(participant_name => 'CloudBank', coordinator_name => 'CloudBankCoordinator' , dblink_to_broker => 'pdb1_link' , mailbox_schema => 'admin' , broker_name => 'TEST', dblink_to_participant => 'pdb1_link');
4. add_participant
(s):
exec dbms_saga_adm.add_participant(participant_name=> 'BankB' ,dblink_to_broker => 'pdb1_link',mailbox_schema=> 'admin',broker_name=> 'TEST', dblink_to_participant=> 'pdb3_link');
Application Dependencies
On the Java application side, we just need to add these two dependencies to the maven pom.xml:
<dependency> <groupId>com.oracle.database.saga</groupId> <artifactId>saga-core</artifactId> <version>[23.3.0,)</version> </dependency> <dependency> <groupId>com.oracle.database.saga</groupId> <artifactId>saga-filter</artifactId> <version>[23.3.0,)</version> </dependency>
Application Source Code
As the Oracle Database Saga Framework implements the MicroProfile LRA (Long Running Actions) specification, much of the code, annotations, etc. that I've presented in previous blogs apply to this one.
However, though future support has been discussed, the LRA specification does not support messaging/eventing — only REST (it supports Async REST, but that is of course not messaging/eventing) — and so a few additional annotations have been furnished to provide such support and take advantage of the TxEventQ transactional messaging and auto-compensation functionality already described. Full documentation can be found here, but the key two additions are @Request
in the saga/LRA participants and @Response
in the initiating service/participant as described in the following.
Note that the Oracle Database Saga Framework also provides access to the same saga functionality via direct API calls (e.g., SagaInitiator beginSaga()
, Saga sendRequest
, commitSaga
, rollbackSaga
, etc.), and so can be used not only in JAX-RS clients but any Java client, and, of course, in PL/SQL as well. As shown in the previous blog and code repos, JAX-RS can also be used in Spring Boot.
The example code snippets below show the classic TravelAgency
saga scenario (with Airline
, etc., as participants) while the example I've been using in the previous blog and the GitHub repos provided continues the bank transfer scenario. The same principles apply, of course; just using different use cases to illustrate.
The Initiator (The Initiating Participant)
@LRA
for demarcation of the saga/LRA indicating whether the method should start, end, or join an LRA.@Response
is an Oracle Saga-specific annotation, indicating this method collects responses from Saga participants (who were enrolled into a Saga using thesendRequest()
API and the name of the participant (Airline
, in this case).
@Participant(name = "TravelAgency")
/* @Participant declares the participant’s name to the saga framework */
public class TravelAgencyController extends SagaInitiator {
/* TravelAgencyController extends the SagaInitiator class */
@LRA(end = false)
/* @LRA annotates the method that begins a saga and invites participants */
@POST("booking")
@Consumes(MediaType.TEXT_PLAIN)
@Produces(MediaType.APPLICATION_JSON)
public jakarta.ws.rs.core.Response booking(
@HeaderParam(LRA_HTTP_CONTEXT_HEADER) URI lraId,
String bookingPayload) {
Saga saga = this.getSaga(lraId.toString());
/* The application can access the sagaId via the HTTP header
and instantiate the Saga object using it */
try {
/* The TravelAgency sends a request to the Airline sending
a JSON payload using the Saga.sendRequest() method */
saga.sendRequest ("Airline", bookingPayload);
response = Response.status(Response.Status.ACCEPTED).build();
} catch (SagaException e) {
response=Response.status(Response.Status.INTERNAL_SERVER_ERROR).build();
}
}
@Response(sender = "Airline.*")
/* @Response annotates the method to receive responses from a specific
Saga participant */
public void responseFromAirline(SagaMessageContext info) {
if (info.getPayload().equals("success")) {
saga.commitSaga ();
/* The TravelAgency commits the saga if a successful response is received */
} else {
/* Otherwise, the TravelAgency performs a Saga rollback */
saga.rollbackSaga ();
}
}
}
The Participant Services
@Request
is an Oracle Saga-specific annotation to indicate the method that receives incoming requests from Saga initiators.@Complete
: The completion callback (called by the coordinator) for the saga/LRA@Compensate
: The compensate callback (called by the coordinator) for the saga/LRA
The Saga framework provides a SagaMessageContext
object as an input to the annotated method which includes convenience methods to get the Saga
, SagaId
, Sender
, Payload
, and Connection
(to use transactionally and as an auto-compensating data type as part of the saga as described earlier).
@Participant(name = "Airline")
/* @Participant declares the participant’s name to the saga framework */
public class Airline extends SagaParticipant {
/* Airline extends the SagaParticipant class */
@Request(sender = "TravelAgency")
/* The @Request annotates the method that handles incoming request from a given
sender, in this example the TravelAgency */
public String handleTravelAgencyRequest(SagaMessageContext
info) {
/* Perform all DML with this connection to ensure
everything is in a single transaction */
FlightService fs = new
FlightService(info.getConnection());
fs.bookFlight(info.getPayload(), info.getSagaId());
return response;
/* Local commit is automatically performed by the saga framework.
The response is returned to the initiator */
}
@Compensate
/* @Compensate annotates the method automatically called to roll back a saga */
public void compensate(SagaMessageContext info) {
fs.deleteBooking(info.getPayload(),
info.getSagaId());
}
@Complete
/* @Complete annotates the method automatically called to commit a saga */
public void complete(SagaMessageContext info) {
fs.sendConfirmation(info.getSagaId());
}
}
APEX Workflow With Oracle Saga Framework
Oracle's new APEX Workflow product has been designed to include and account for sagas. More blogs with details are coming, but to give you an idea, the following shows the same bank transfer saga we've been discussing, but defined in a workflow and with the inclusion of a manual step in the flow for approval of the transfer (a common use case in finance and other workflows).
You can read more about the workflow product in the blogs here and here.
Other Topics: Observability, Optimizations, and Workshop
Event-driven applications are, of course, different from blocking/sync/REST applications, and lend to different patterns and advantages, particularly as far as parallelism. Therefore, settings for pool size, number of publishers and listeners, etc. are part of the saga framework in order to optimize.
As the journaling and bookkeeping are stored in the database along with the data and messaging, completion and compensation can be conducted locally there, making it in many cases unnecessary to make the callbacks to the application code that are otherwise necessary. This again greatly simplifies development and also drastically cuts down on the costs and considerations of network calls.
Microservices, and especially those containing sagas, require effective observability. Especially those containing sagas need this observability not only in the application but also in the saga coordinator, communication infrastructure, and database. Oracle has an OpenTelemetry-based solution for this that is coordinated across all tiers. A "DevOps meets DataOps" video explains this Unified Observability architecture and how it can be used with entirely open-source products such as Kubernetes (including eBPF), Prometheus, Loki and Promtail, ELK stack, Jaeger and Zipkin, Grafana, etc.
Finally, note that the existing beta workshop will soon be updated to include the new GA release of the Saga Framework which will be announced at this same blog space.
Conclusion
Thank you for reading and of course please feel free to reach out to me with any questions or feedback.
I want to give credit to the Oracle TxEventQ and Transaction Processing teams for all the amazing work they've done to conquer some of if not the most difficult areas of data-driven microservices and simplify it for the developers. I'd like to give special credit to Oracle's Dieter Gawlick who started the work in both escrow (lock-free reservations) and compensation-aware datatypes and sagas 40 years ago and who is also the original architect of TxEventQ (formerly AQ) with its ability to do messaging and data manipulation in the same local transaction.
Published at DZone with permission of Paul Parkinson. See the original article here.
Opinions expressed by DZone contributors are their own.
Comments