Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Data Integrity and Performance Considerations in MySQL Semi-Synchronous Replication

DZone's Guide to

Data Integrity and Performance Considerations in MySQL Semi-Synchronous Replication

When taking advantage of MySQL's semi-synchronous replication feature, you're going to want to take into account some data integrity and performance considerations.

· Database Zone ·
Free Resource

Download the Altoros NoSQL Performance Benchmark 2018. Compare top NoSQL solutions – Couchbase Server v5.5, MongoDB v3.6, and DataStax Enterprise v6 (Cassandra).

MySQL semi-synchronous replication provides improved data integrity because when a commit returns successfully, it’s known that the data exists in at least two places — the master and its slave. In this article, we review some of the MySQL configurations that influence the data integrity and performance aspects of semi-synchronous replication. We’ll be using InnoDB storage engine and GTID-based replication in a 3-node replica set (master and 2 slaves), which will ensure that there is redundancy in the slaves. This means that if there are issues with one slave, we can fall back on the other.

Configurations Applicable to Both Master and Slave Nodes

These configurations guarantee high durability and consistency settings for data. That is, each committed transaction is guaranteed to be present in binary logs, and also, the logs are flushed to the disk. Hence, in the case of a power failure or operating system crash, the data consistency of MySQL is always preserved.

Configurations on the Master Node.

This option is used to configure the number of slaves that must send an acknowledgment before a semi-synchronous master can commit the transaction. In the 3-node replica set, we recommend setting this to 1 so that we always have an assurance that the data is available in at least one slave while avoiding any performance impact involved in waiting for acknowledgment from both the slaves.

This option is used to configure the amount of time that a semi-synchronous master waits for slave acknowledgment before switching back to asynchronous mode. We recommend setting this to a large number so that there is no fallback to asynchronous mode, which then defeats our data integrity objective. Since we’re operating with 2 slaves and rpl_semi_sync_master_wait_for_slave_count is set to 1, we can assume that at least one of the slaves does acknowledge within a reasonable amount of time, thereby minimizing the performance impact of this setting.

Configurations on the Slave Nodes

In the slaves, it’s always important to track two positions very accurately: the current executed position of SQL thread in relay log, and the current position of the IO thread, which indicates how far the mater binary file is read and copied to slave. The consequences of not maintaining these positions are quite obvious. If there is a slave crash and restart, SQL thread can start processing transactions from a wrong offset or the IO thread can start pulling data from a wrong position in the master binary logs. Both of these cases will lead to data corruption.

It is important to ensure crash-safety of slaves through the following configurations:

Setting relay_log_info_repository to TABLE will ensure the position of the SQL thread is updated together with each transaction commit on the slave. However, it’s difficult to maintain the exact position of IO thread and flush to the disk. This is because reading master binary log and writing to slave relay log is not based on transactions. The impact on performance is very high if IO thread position has to be updated and flushed to disk after each write to slave relay logs. A more elegant solution would be to set relay_log_recovery = ON, in which case, if there’s a MySQL restart, current relay logs will be assumed to be corrupted and will be freshly pulled from the master based on the SQL thread position.

Last but not least, it’s important to note that semi-synchronous replication ensures that the data has just "reached" one of the slaves before the master committing the transaction and does not mean that the transactions are committed on the slave. Hence, it will be good to ensure that the SQL thread works with good performance. In the ideal case, the SQL thread moves hand in hand with the IO thread so we can have the benefit of the slave not only receiving the transactions, but also committing them. It’s recommended to go with a multi-threaded slave configuration so that we can get increased slave SQL thread performance. The important configurations for multi-threaded slaves are:

The above configurations are going to promise parallelism in the slave, while at the same time, preserving the order of transactions as seen on the master.

In summary, by using the above configurations on our MySQL replica set, we’re able to maintain high data integrity along with an optimal performance.

As always, if you have any questions, feel free to leave us a comment or reach out to us. 

Download the whitepaper, Moving From Relational to NoSQL: How to Get Started. We’ll take you step by step through your first NoSQL project.

Topics:
mysql ,replication ,database ,data integrity ,innodb ,database performance

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}