Enabling Crash-safe Slaves with MySQL 5.6
This post comes from
Being able to configure slaves to be crash-safe is one of the major improvements of MySQL 5.6 with regards to replication. However, we noticed some confusion on how to enable this feature correctly, so let’s clarify how it should be done.
1. Stop MySQL on slave
relay_log_info_repository = TABLE and
relay_log_recovery = ON in my.cnf
3. Restart MySQL and relax
The Gory Details
To fully understand why you should change the above settings if you want crash-safe slaves, let’s first look at the reasons why replication can break when a slave crashes.
On a slave, replication involves two threads: the I/O thread, which copies the binary log of the master to a local copy called the relay log, and the SQL thread, which then executes the queries written in the relay log. The current position of each thread is stored in a file:
master.info for the I/O thread and
relay-log.info for the SQL thread.
So far, so good. The first problem is that these files are not synced to disk each time they are written to; whenever there is a crash, positions that are stored are likely to be incorrect. MySQL 5.5 has a fix for this: you can
set sync_master_info = 1 and
sync_relay_log_info = 1 to make sure both files are written and synced to disk after each transaction. Syncing is not free, of course, but if you have write-back cache, these settings can be valuable.
But wait, even with
sync_master_info = 1 and
sync_relay_info = 1, bad things can happen. The reason is that replication information is written after the transaction is committed. So if a crash occurs after the transaction is committed and before the replication information is updated, replication information will be wrong when the server restarts and a transaction could be executed twice. The effect will depend on the transaction; replication may still run fine, it may be broken, or inconsistencies can even be silently created.
MySQL 5.6 tackles this problem by letting us store replication information in tables instead of files (a
mysql.slave_relay_log_info table is created when
relay_log_info_repository = TABLE and the
mysql.slave_master_info table is created with
master_info_repository = TABLE). The idea is simple: we can include the update of the replication information inside the transaction, making sure it is always in sync with the data.
In pseudo-code, instead of having:
START TRANSACTION; -- Statement 1 -- ... -- Statement N COMMIT; -- Update replication info files
The server now behaves as if we had:
START TRANSACTION; -- Statement 1 -- ... -- Statement N -- Update replication info COMMIT;
Unfortunately, this is not as straightforward as it may appear. For the SQL thread, it works well because the server can update the
slave_relay_info_info table at the same time it commits a transaction. However, for the I/O thread, the update of the table is not related to any transaction execution, so how can the server know when to update the table?
The answer is: it is controlled by
sync_master_info. The default is 10,000, meaning that the I/O thread position is only updated every 10,000 transactions. This is obviously not good to make the slave crash-safe. One solution is to set
sync_master_info = 1, but as I mentioned, it may have a performance impact (this is why 1 is not the default setting).
However, there is a more elegant solution -- using
relay_log_recovery = ON, which will require a MySQL restart. This setting makes sure that when the server starts up, the position for the I/O thread is recovered from the
slave_relay_log_info table, which is always up-to-date. Thus, you do not even need to store I/O thread information in a table for the slave to be crash-safe. In other words, setting
master_info_repository = TABLE is not necessary.
As a final side-note, once
relay_log_info_repository = TABLE, the
sync_relay_log_info setting becomes irrelevant, since the table will always be updated at each transaction commit no matter the value of the setting. So, you can safely remove it from your configuration file.
I hope this post will help you benefit from this great feature!