This post was originally written by Stephane Combaudon
One of the MySQL 5.6 features many people are interested in is Global Transaction IDs (GTIDs). This is for a good reason: reconnecting a slave to a new master has always been a challenge, while it is trivial when GTIDs are enabled. However, using GTIDs is not only about replacing the good old binlog file/position with unique identifiers; it also means using a new replication protocol. And if you are not aware of it, it can bite.
Replication protocols: old vs new
The old protocol is pretty straightforward: the slave connects to a given binary log file at a specific offset, and the master sends all the transactions from there.
The new protocol is slightly different: the slave first sends the range of GTIDs it has executed, and then the master sends every missing transaction. It also guarantees that a transaction with a given GTID can only be executed once on a specific slave.
In practice, does it change anything? Well, it may change a lot of things. Imagine the following situation: you want to start replicating from trx 4, but trx 2 is missing on the slave for some reason.
With the old replication protocol, trx 2 will never be executed, while with the new replication protocol, it WILL be executed automatically.
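The difference can be sketched with a toy model in Python. This is only an illustration, not how MySQL is implemented: the binary log is assumed to be a plain list of transaction numbers, and the names `old_protocol` and `new_protocol` are made up for this example.

```python
# Toy model: the master's binary log is an ordered list of transaction numbers.
MASTER_BINLOG = [1, 2, 3, 4, 5]


def old_protocol(binlog, start_trx):
    """Position-based: send everything from the requested starting point onward."""
    start_index = binlog.index(start_trx)
    return binlog[start_index:]


def new_protocol(binlog, slave_executed):
    """GTID-based: send every transaction the slave has not executed yet."""
    return [trx for trx in binlog if trx not in slave_executed]


# The slave has executed trx 1 and trx 3, but trx 2 is missing for some reason.
slave_executed = {1, 3}

sent_old = old_protocol(MASTER_BINLOG, start_trx=4)
sent_new = new_protocol(MASTER_BINLOG, slave_executed)

print("old protocol sends:", sent_old)  # [4, 5]
print("new protocol sends:", sent_new)  # [2, 4, 5]
```

In this model the old protocol only looks at where the slave asks to start, so trx 2 is silently left out, while the new protocol compares sets and sends trx 2 along with trx 4 and trx 5.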
Here are 2 common situations where you can see the new replication protocol in action.
Skipping transactions
It is well known that the good old SET GLOBAL sql_slave_skip_counter = N is no longer supported when you want to skip a transaction and GTIDs are enabled. Instead, to skip the transaction with GTID XXX:N, you have to inject an empty transaction:
mysql> SET gtid_next = 'XXX:N';
mysql> BEGIN; COMMIT;
mysql> SET gtid_next = 'AUTOMATIC';
Why can’t we use sql_slave_skip_counter? Because of the new replication protocol!
Imagine that we have 3 servers like the picture below:
Let’s assume that sql_slave_skip_counter is allowed and has been used on S2 to skip trx 2. What happens if you make S2 a slave of S1?
Both servers will exchange the range of executed GTIDs, and S1 will realize that it has to send trx 2 to S2. Two options then:
- If trx 2 is still in the binary logs of S1, it will be sent to S2, and the transaction is no longer skipped.
- If trx 2 no longer exists in the binary logs of S1, you will get a replication error.
This is clearly not safe, which is why sql_slave_skip_counter is not allowed with GTIDs. The only safe option to skip a transaction is to execute a fake transaction instead of the real one.
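To make the reasoning concrete, here is a small Python sketch of what happens when S2 connects to S1. It is a simplified model under stated assumptions: GTIDs are reduced to plain integers, `transactions_to_send` and `PurgedError` are hypothetical names, and the "skip counter" case models behaviour that MySQL does not actually allow with GTIDs.

```python
class PurgedError(Exception):
    """Raised when a required transaction is gone from the master's binary logs."""


def transactions_to_send(master_executed, master_binlog, slave_executed):
    """Return what a GTID-based master would send to a connecting slave."""
    missing = sorted(master_executed - slave_executed)
    purged = [trx for trx in missing if trx not in master_binlog]
    if purged:
        raise PurgedError(f"master has purged required transactions: {purged}")
    return missing


s1_executed = {1, 2, 3}

# Hypothetical sql_slave_skip_counter: trx 2 is skipped and NOT recorded on S2.
s2_after_counter_skip = {1, 3}
# Empty transaction: nothing is changed, but GTID 2 IS recorded as executed on S2.
s2_after_empty_trx = {1, 2, 3}

# Case 1: trx 2 is still in S1's binary logs, so it is sent and the skip is undone.
resent = transactions_to_send(s1_executed, {1, 2, 3}, s2_after_counter_skip)
print(resent)  # [2]

# With the empty transaction, S1 has nothing left to send.
nothing = transactions_to_send(s1_executed, {1, 2, 3}, s2_after_empty_trx)
print(nothing)  # []

# Case 2: trx 2 has been purged from S1's binary logs, so replication fails.
try:
    transactions_to_send(s1_executed, {3}, s2_after_counter_skip)
    purge_error = None
except PurgedError as exc:
    purge_error = str(exc)
print(purge_error)
```

The empty transaction works because it makes the GTID part of the slave's executed set, which is the only thing the master looks at when deciding what to send.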
Errant transactions
If you execute a transaction locally on a slave (called an errant transaction in the MySQL documentation), what will happen if you promote this slave to be the new master?
With the old replication protocol, basically nothing (to be accurate, data will be inconsistent between the new master and its slaves, but that can probably be fixed later).
With the new protocol, the errant transaction will be identified as missing everywhere and will be automatically executed on failover, which has the potential to break replication.
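The "identified as missing everywhere" part is just a set difference, which can be sketched in Python as follows. This is an illustration only: GTIDs are modeled as hypothetical (source, sequence) pairs rather than real `uuid:N` strings, and `errant_transactions` is a made-up helper. On a real server, the built-in `GTID_SUBTRACT()` function performs the equivalent subtraction on actual GTID sets.

```python
def errant_transactions(candidate_executed, others_executed):
    """Map each other server to the GTIDs the candidate has but that server lacks."""
    return {
        name: sorted(candidate_executed - executed)
        for name, executed in others_executed.items()
    }


# Transactions that originated on the master M and were replicated everywhere.
m_trx = {("M", 1), ("M", 2)}

# S1 also ran one local (errant) transaction; S2 did not.
s1 = m_trx | {("S1", 1)}
s2 = set(m_trx)

# If S1 is promoted, this is what it would try to send to every other server.
report = errant_transactions(s1, {"M": m_trx, "S2": s2})
print(report)  # {'M': [('S1', 1)], 'S2': [('S1', 1)]}
```

A non-empty list for any server means that promoting S1 would cause the errant transaction to be replayed there, or to trigger an error if it is no longer in S1's binary logs.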
Let’s say you have a master (M), and 2 slaves (S1 and S2). Here are 2 simple scenarios where reconnecting slaves to the new master will fail (with different replication errors):
# Scenario 1
# S1
mysql> CREATE DATABASE mydb;
# M
mysql> CREATE DATABASE IF NOT EXISTS mydb;
# Thanks to 'IF NOT EXISTS', replication doesn't break on S1. Now move S2 to S1:
# S2
mysql> STOP SLAVE; CHANGE MASTER TO MASTER_HOST='S1'; START SLAVE;
# This creates a conflict with existing data!
mysql> SHOW SLAVE STATUS\G
[...]
Last_SQL_Errno: 1007
Last_SQL_Error: Error 'Can't create database 'mydb'; database exists' on query. Default database: 'mydb'. Query: 'CREATE DATABASE mydb'
[...]
# Scenario 2
# S1
mysql> CREATE DATABASE mydb;
# Now, we'll remove this transaction from the binary logs
# S1
mysql> FLUSH LOGS;
mysql> PURGE BINARY LOGS TO 'mysql-bin.000008';
# M
mysql> CREATE DATABASE IF NOT EXISTS mydb;
# S2
mysql> STOP SLAVE; CHANGE MASTER TO MASTER_HOST='S1'; START SLAVE;
# The missing transaction is no longer available in the master's binary logs!
mysql> SHOW SLAVE STATUS\G
[...]
Last_IO_Errno: 1236
Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.'
[...]
As you can understand, errant transactions should be avoided with GTID-based replication. If you need to run a local transaction, your best option is to disable binary logging for that specific statement:
mysql> SET SQL_LOG_BIN = 0;
mysql> # Run local transaction
GTIDs are a great step forward in the way we are able to reconnect replicas to other servers, but they also come with new operational challenges. If you plan to use GTIDs, make sure you correctly understand the new replication protocol; otherwise you may end up breaking replication in new and unexpected ways.
I’ll do more exploration about errant transactions in a future post.