Recover Distributed Transactions in MySQL
How to identify and troubleshoot distributed (XA) transactions stuck in the prepared state in MySQL using XA recover, XA commit, and XA rollback commands.
Join the DZone community and get the full member experience.
Join For FreeDistributed transactions, also known as XA transactions, emerged to address the complexity of coordinating transactions across multiple databases or systems in a distributed environment. Imagine you’re conducting an orchestra where each musician represents a different database or service. Just like ensuring harmony in music requires precise coordination, maintaining transactional integrity across distributed systems demands careful orchestration.
This is where a two-phase commit (2PC), a vital aspect of XA transactions, steps in. 2PC acts as the conductor, ensuring that all musicians (or database participants) are ready to commit before the final note is played. Just as every instrument must be in tune before a symphony reaches its crescendo, 2PC ensures that all components of a distributed transaction are in sync before proceeding, thus guaranteeing the integrity of the transaction across the distributed landscape.
Typical use cases include applications using more than one database system for the same transaction. In the Java ecosystem, a use case might be an Enterprise Application (i.e. an EAR deployed on an application server) using both JPA and JMS with JTA coordinating the distributed transaction.
When it comes to MySQL being one of the systems participating, the standard flow would be:
XA START <xid>;
-- [... SQL Statements ...]
XA END <xid>;
XA PREPARE <xid>;
XA COMMIT <xid>;
<xid>
is the transaction ID, a unique identifier generated by the transaction coordinator (i.e., JTA). When the Enterprise Application uses JPA with a persistence.xml
configured to use JTA:
<persistence-unit name="samplePU" transaction-type="JTA">
<!-- [...] -->
</persistence-unit>
…SQL statements get wrapped in XA transactions.
Most of the time, the above flow works flawlessly. Until the application disconnects the session between PREPARE
and COMMIT
. This is when interesting things happen…
Symptoms
A transaction being stuck in the PREPARED
state can go undetected for some time. As far as the application is concerned, the database was inaccessible so it might retry the transaction and succeed.
But for MySQL, the transaction is still in the prepared state, waiting for a final verdict: commit or rollback. You can locate such transactions in the output of SHOW ENGINE INNODB STATUS
:
---TRANSACTION 39898344, ACTIVE (PREPARED) 1314869 sec
4 lock struct(s), heap size 1128, 17 row lock(s), undo log entries 32
Ouch! More than 15 days ago. What’s worse is that you probably don’t casually check the Innodb status output: it’s the locks that will force you to investigate. Row locks or table locks will cause timeouts for no apparent reason. Then you start searching, probably using:
SELECT * FROM information_schema.innodb_trx;
Aha! There has indeed been a transaction active for that long! Problem solved? Not yet. Here’s the surprise: TRX_MYSQL_THREAD_ID is 0. No thread running this transaction, no correlation with the running processes, and nothing to KILL
to get rid of the transaction. You decide to bite the bullet and restart the server. Still no joy: the transaction is still there, waiting for the verdict. As annoying as it might be, it makes total sense. You would want the transaction to be there but don’t know how to resolve the situation. Yet…
Recovery
In order to commit or rollback the transaction, you only need the transaction’s ID. You can list all transactions in the PREPARED
state by issuing:
XA RECOVER;
The output of this command is not exactly user friendly, so you might want to try the following:
XA RECOVER CONVERT XID;
You need XA_RECOVER_ADMIN
privilege for this command, or you will get a not-very-helpful error message:
SQL Error [1401] [XAE03]: XAER_RMERR: Fatal error occurred in the transaction branch — check your data for consistency
This will get us the precious XID
in hexadecimal. But still, this is not yet usable by XA COMMIT
or XA ROLLBACK
. For whatever reason, both commands expect the XID
broken into three parts: xid: gtrid [, bqual [, formatID ]]
:
gtrid is a global transaction identifier, bqual is a branch qualifier, and formatID is a number that identifies the format used by the gtrid and bqual values. As indicated by the syntax, bqual and formatID are optional. The default bqual value is ’’ if not given. The default formatID value is 1 if not given.
There is some string manipulation required:
- gtrid: It’s the first N bytes of the XID reported by
XA RECOVER
, where N is in thegtrid_length
column of the same - bqual: It’s the next M bytes of the XID reported by
XA RECOVER
, where M is in thebqual_length
column of the same - formatID: This is available in the column
formatID
When you are done with slicing:
XA COMMIT <gtrid> , <bqual> , <formatID>
-- ...or...
XA ROLLBACK <gtrid> , <bqual> , <formatID>
Problem solved!
Opinions expressed by DZone contributors are their own.
Comments