A quick glance into Distributed Transaction

Scenario 1: Online payment

A typical online shopping platform has an order management system, an inventory system, a purchase order system and a billing system. When a customer places an order for a particular item, the CRM application should first check whether the item is in the inventory; if it is, the ordered item is deducted from the inventory. Once the inventory is modified, a purchase order is created and the user-facing system is updated with the latest status. The platform needs to ensure that the entire online shopping process happens as a single transaction.

Scenario 2: Online transfer

Consider an online money transfer from Bank A to Bank B. The process should deduct money from the account at Bank A and credit it to the account at Bank B. The databases of both banks need to be updated to carry out this process, and we need to ensure that both updates run in a single transaction.

Points obtained by analyzing these scenarios

  • There is more than one system involved.
  • Data maintained by these systems could be in
    • The same schema with different tables
    • Different schemas
    • Different database instances
  • These systems could be geographically separated, because they may be operated by third parties.

Distributed Transactions

The key point derived from the above scenarios is that each scenario should run as a single unit. A typical implementation embeds all the steps within a single transaction, which ensures that data integrity is maintained.

Transactions that involve more than one resource or system are called 'Distributed Transactions'. Distributed transactions ensure that all resources or systems work as a single unit irrespective of where they reside. Resources here could be multiple databases or message queues. A distributed transaction can contain more than one simple transaction within it.

Distributed transactions are part of XA (eXtended Architecture), a standard from the X/Open group. The XA standard specifies the concept of a transaction manager. These transaction managers are referred to as global transaction managers, and the transactions that participate in a distributed transaction are referred to as XA transactions.

Transaction Manager

All transactions that participate in a distributed transaction are coordinated by a transaction manager. The transaction manager is typically invoked when the executing code encounters a request to start a transaction.

The transaction manager performs the following activities:

  • Demarcation – It keeps track of the start and end of every transaction.
  • Propagation – It propagates the behavior of a previously opened transaction based on the attributes set.
  • Context of execution – It ensures that operations are executed in a single context irrespective of the number of resources involved.
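The first two activities can be illustrated with a small in-memory sketch. Everything here is hypothetical: `ToyTransactionManager` and the propagation names `REQUIRED` and `REQUIRES_NEW` are assumptions made for illustration, not the API of any product named in this article.

```python
import itertools
from contextlib import contextmanager


class ToyTransactionManager:
    """Hypothetical in-memory illustration; not a real XA or JTA API."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._stack = []  # active transaction ids, innermost last
        self.log = []     # demarcation events in the order they happen

    @contextmanager
    def transaction(self, propagation="REQUIRED"):
        # Propagation: REQUIRED joins an already open transaction,
        # anything else (e.g. REQUIRES_NEW) always starts a new one.
        joined = propagation == "REQUIRED" and bool(self._stack)
        if joined:
            txid = self._stack[-1]
        else:
            txid = next(self._ids)
            self._stack.append(txid)
            self.log.append(("begin", txid))  # demarcation: start
        try:
            yield txid
        except Exception:
            if not joined:
                self.log.append(("rollback", txid))  # demarcation: end
            raise
        else:
            if not joined:
                self.log.append(("commit", txid))  # demarcation: end
        finally:
            if not joined:
                self._stack.pop()


tm = ToyTransactionManager()
with tm.transaction() as outer:
    with tm.transaction() as inner:  # joins the outer transaction
        pass
    with tm.transaction("REQUIRES_NEW") as separate:  # gets its own id
        pass
print(tm.log)
```

A joined block never commits on its own; only the block that opened the transaction records the commit or rollback.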

Currently most technologies come with their own transaction manager API. Listed below are a few of the well-known ones:

  • JBoss TS, Atomikos and Bitronix are a few of the transaction manager implementations in Java.
  • Microsoft Distributed Transaction Coordinator (MSDTC) is a component from Microsoft for implementing distributed transactions.
  • DB2 provides the DB2 transaction manager for XA transactions.

Resource Manager

Transaction managers don't deal with the resources (databases or message queues) directly during the execution of a transaction. They interact with the resources via resource managers:

  • An XA data source exists for each database.
  • An XA data source has a pool of XA connections.
  • An XA connection instance corresponds to a single database session.
  • An XA connection produces an XA resource instance, which the transaction manager uses to process the distributed transaction.
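The chain described above (data source, then connection, then resource) can be sketched as follows. The class names are hypothetical; they only mirror the naming of the Java interfaces `javax.sql.XADataSource`, `javax.sql.XAConnection` and `javax.transaction.xa.XAResource`, and the pooling is deliberately naive.

```python
class ToyXAResource:
    """Handle the transaction manager would drive (prepare/commit/rollback)."""

    def __init__(self, session_id):
        self.session_id = session_id


class ToyXAConnection:
    """Corresponds to exactly one database session."""

    def __init__(self, session_id):
        self.session_id = session_id

    def get_xa_resource(self):
        # The resource is bound to the same session as the connection.
        return ToyXAResource(self.session_id)


class ToyXADataSource:
    """One data source per database, holding a pool of connections."""

    def __init__(self, name, pool_size=2):
        self.name = name
        self._pool = [
            ToyXAConnection(f"{name}-session-{i}") for i in range(pool_size)
        ]

    def get_xa_connection(self):
        # Naive pooling: raises IndexError once the pool is exhausted.
        return self._pool.pop()


orders_db = ToyXADataSource("orders")
connection = orders_db.get_xa_connection()
resource = connection.get_xa_resource()
print(resource.session_id)
```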

Communication in Distributed Transactions

XA transactions communicate using the two-phase commit protocol, which is a type of atomic commitment protocol. The protocol tells all participants of a distributed transaction whether they should commit or roll back. This communication happens in two phases, the prepare phase and the commit phase.

Prepare Phase: In this phase, the transaction manager sends a 'Prepare' message to all resource managers participating in the transaction. This request tells the resources to take the necessary steps to be ready to either commit or abort the transaction. All the objects and data that a resource needs to commit the transaction are locked by that resource. No query can fetch the locked data until both phases are complete. Once this is done, each participant sends an acknowledgement back to the transaction manager with one of the following three responses:

Status and description:

  • Prepared – The data on the node has been modified by a statement in the distributed transaction, and the node is ready to commit or roll back. The participant executes the code up to the commit state, and all resources required to commit the transaction are allocated. Once the node is prepared, the transaction is said to be an in-doubt transaction.
  • Read only – The data on the node has not been modified by a statement in the distributed transaction, so a prepare is not necessary.
  • Abort – The node cannot enter the prepared state. It releases all the resources it holds.

Commit Phase: Once the transaction manager has received acknowledgements from all the nodes, it sends a commit message, and the participants perform the requested action on their local transaction resources.
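The two phases and the three responses can be sketched in a few lines. This is a simplified, single-process illustration: `Vote`, `Participant` and `two_phase_commit` are hypothetical names, each participant's vote is fixed up front, and real coordinators also write transaction logs and handle timeouts.

```python
from enum import Enum


class Vote(Enum):
    PREPARED = "prepared"
    READ_ONLY = "read only"
    ABORT = "abort"


class Participant:
    """Toy resource manager whose vote is decided in advance."""

    def __init__(self, name, vote):
        self.name = name
        self._vote = vote
        self.state = "active"

    def prepare(self):
        if self._vote is Vote.PREPARED:
            self.state = "in-doubt"  # locks held, waiting for the decision
        elif self._vote is Vote.READ_ONLY:
            self.state = "done"      # nothing to commit or roll back
        else:
            self.state = "aborted"   # resources released
        return self._vote

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled back"


def two_phase_commit(participants):
    # Phase 1: ask every participant to prepare and collect the responses.
    votes = {p: p.prepare() for p in participants}
    prepared = [p for p, v in votes.items() if v is Vote.PREPARED]

    # Phase 2: commit only if nobody aborted; otherwise undo prepared work.
    if any(v is Vote.ABORT for v in votes.values()):
        for p in prepared:
            p.rollback()
        return "rolled back"
    for p in prepared:
        p.commit()
    return "committed"


inventory = Participant("inventory", Vote.PREPARED)
billing = Participant("billing", Vote.PREPARED)
catalog = Participant("catalog", Vote.READ_ONLY)
print(two_phase_commit([inventory, billing, catalog]))
```

Read-only participants drop out after the first phase, so only the prepared participants receive the final commit or rollback.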

Failures

The following are reasons for communication failures in the two-phase commit protocol:

1.  Network failure between the transaction manager and the nodes

2.  Hardware crash of one of the nodes

Here are a few failure scenarios and how distributed transaction APIs address them.

Scenario: One of the nodes could not send a prepare-phase acknowledgement

If the transaction manager does not receive an acknowledgement from every node in the prepare phase, it initiates a rollback request.
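A minimal sketch of this rule, assuming each node is represented by a hypothetical zero-argument callable that returns `True` when the node is prepared and raises `TimeoutError` or `ConnectionError` when no acknowledgement arrives:

```python
def collect_votes(prepare_calls):
    """Map node name -> True if prepared, None if no acknowledgement came back."""
    votes = {}
    for name, call in prepare_calls.items():
        try:
            votes[name] = call()
        except (TimeoutError, ConnectionError):
            votes[name] = None  # a missing acknowledgement counts as a failure
    return votes


def decide(votes):
    # Commit only when every single node acknowledged the prepare request.
    return "commit" if all(v is True for v in votes.values()) else "rollback"


def unreachable_node():
    raise TimeoutError("no prepare acknowledgement received")


decision = decide(
    collect_votes({"inventory": lambda: True, "billing": unreachable_node})
)
print(decision)
```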

Scenario: One of the nodes could not send a commit-complete acknowledgement

When the node comes back up, it sends a message to the transaction manager to check the outcome of the transaction.
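This recovery step can be sketched as a node asking for the recorded outcome of each transaction it still holds in doubt. `DecisionLog` and `RecoveringNode` are hypothetical names, and the log is an in-memory dictionary standing in for the transaction manager's durable log.

```python
class DecisionLog:
    """Stand-in for the transaction manager's record of final outcomes."""

    def __init__(self):
        self._outcomes = {}

    def record(self, txid, outcome):
        self._outcomes[txid] = outcome

    def outcome_of(self, txid):
        return self._outcomes.get(txid, "unknown")


class RecoveringNode:
    """Node that restarted while some transactions were still in doubt."""

    def __init__(self, in_doubt):
        self.in_doubt = set(in_doubt)
        self.resolved = {}

    def recover(self, log):
        # sorted() copies the ids, so the set can be modified in the loop.
        for txid in sorted(self.in_doubt):
            outcome = log.outcome_of(txid)
            if outcome in ("commit", "rollback"):
                self.resolved[txid] = outcome
                self.in_doubt.discard(txid)


log = DecisionLog()
log.record("tx-1", "commit")
node = RecoveringNode({"tx-1", "tx-2"})
node.recover(log)
print(node.resolved, node.in_doubt)
```

Transactions with no recorded outcome stay in doubt until the transaction manager reaches a decision.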

Scenario:

-  The main node, where the transaction manager resides, crashes before receiving acknowledgements from the nodes

-  Communication fails before the nodes receive the prepare or commit request

When the server is restarted, the transaction manager sends a 'still waiting for acknowledgement' message to all nodes.

Scenario: A node was expected to roll back the transaction, but it has already committed

When all nodes have been in doubt for a long time, a node may commit the transaction on its own, since all nodes are in the prepared state. If the transaction manager then initiates a rollback, the node throws an exception back to the manager indicating that the transaction is already committed.

Note: The transaction manager normally logs these exceptions. If a major hardware crash or network failure leaves data inconsistent or resources locked, manual intervention is required.
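XA calls this kind of unilateral decision a heuristic outcome. The sketch below shows the mechanics with hypothetical names (`InDoubtNode`, `HeuristicCommitError`) and an assumed timeout in seconds; real resource managers differ in whether and when they take such a decision.

```python
class HeuristicCommitError(Exception):
    """Raised when a rollback arrives after the node already committed."""


class InDoubtNode:
    def __init__(self, timeout_s):
        self.timeout_s = timeout_s  # assumed limit for staying in doubt
        self.state = "in-doubt"

    def tick(self, waited_s):
        # After waiting too long, the node commits on its own.
        if self.state == "in-doubt" and waited_s >= self.timeout_s:
            self.state = "committed"

    def rollback(self):
        if self.state == "committed":
            raise HeuristicCommitError("transaction is already committed")
        self.state = "rolled back"


node = InDoubtNode(timeout_s=30)
node.tick(waited_s=45)  # exceeds the timeout, so the node commits
try:
    node.rollback()     # late rollback request from the transaction manager
    rejected = False
except HeuristicCommitError:
    rejected = True     # the manager would log this for manual follow-up
print(node.state, rejected)
```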

Distributed Transactions and Non-functional Requirements

Performance

Distributed transactions are somewhat more costly than single-resource transactions. This is because they involve additional communication between resources and the creation of transaction logs for each communication. This cost is usually acceptable when weighed against the business need to run a transaction across multiple resources, with data integrity as the primary objective.
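As a rough illustration of the extra communication, the function below counts protocol messages under a simplifying assumption: each phase costs one request and one reply per participant, and read-only participants take part in the prepare phase only. The function name and this cost model are assumptions, not measurements.

```python
def two_pc_messages(participants, read_only=0):
    """Count request and reply messages for one two-phase commit round."""
    if not 0 <= read_only <= participants:
        raise ValueError("read_only must be between 0 and participants")
    prepare_phase = 2 * participants               # prepare + acknowledgement
    commit_phase = 2 * (participants - read_only)  # commit + acknowledgement
    return prepare_phase + commit_phase


print(two_pc_messages(3))
print(two_pc_messages(3, read_only=1))
```

Under this model the message count grows linearly with the number of participants, before any logging overhead is added.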

Security

Distributed transactions are more vulnerable to security threats. This is mainly because a transaction can stay open for a long time when multiple resources are involved, and because those resources may be geographically separated. Listed below are a few points to take care of when designing distributed transactions:

  • Ensure that application users can access only the data and resources relevant to them. All transactions in a distributed environment are executed with the access privileges of the logged-in user.

  • Use cryptography to encrypt data, especially when the resources are geographically separated.

  • Ensure that sensitive transactions are performed in multiple steps, each requesting authentication over a secured channel.

  • Use IPsec or SSL to restrict who can connect to your database.

Scalability

Adding resources to a distributed transaction involves only a few configuration changes. However, this is not advised, as it can have a large impact on performance. The recommended practice is to reduce the need for distributed transactions rather than to lock multiple resources for the whole duration of a transaction.

