MySQL Replication Troubleshooting Q and A
Check out a webinar on MySQL replication, caching, and more, then get a glimpse at some in-depth questions (and their answers).
In this blog, I will provide answers to the Q & A for the MySQL Replication Troubleshooting webinar.
First, I want to thank everybody for attending the August 25 webinar. The recording and slides for the webinar are available here. Below is the list of your questions that I wasn’t able to answer during the webinar, with responses:
Q: Hi Sveta. One question: How is it possible to get N previous events using the SHOW BINLOG EVENTS command? For example, the position is 999 and I want to analyze the previous five events. Is it possible?
A: No, there is no such option. You cannot get the previous five events using SHOW BINLOG EVENTS. However, you can run mysqlbinlog with the --stop-position option and tail its output.
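The mysqlbinlog-and-tail approach can be sketched in Python. This is a minimal illustration, not a definitive tool: it assumes the text output of something like mysqlbinlog --stop-position=999 on the binary log file is available as lines, and that each event starts with a "# at <position>" line, as in mysqlbinlog's default text output. The function name last_events and the shortened sample output are made up for the example.

```python
import re
from collections import deque

# Each event in mysqlbinlog's text output starts with a line like "# at 4".
EVENT_START = re.compile(r"^# at (\d+)\s*$")


def last_events(lines, n):
    """Return the last n events as (position, [lines]) tuples."""
    events = deque(maxlen=n)  # older events fall off the left automatically
    current = None
    for line in lines:
        line = line.rstrip("\n")
        match = EVENT_START.match(line)
        if match:
            # A new event begins; remember its start position.
            current = (int(match.group(1)), [line])
            events.append(current)
        elif current is not None:
            # Any other line belongs to the event that is currently open.
            current[1].append(line)
    return list(events)


# Hypothetical, heavily shortened sample of mysqlbinlog output.
sample = """\
# at 4
#160825 12:00:00 server id 1  end_log_pos 120  Start: binlog v 4
# at 120
#160825 12:00:01 server id 1  end_log_pos 199  Query
BEGIN
# at 199
#160825 12:00:01 server id 1  end_log_pos 310  Query
INSERT INTO t VALUES (1)
""".splitlines()

for position, body in last_events(sample, 2):
    print(position, len(body))
```

To use it on a real log, the sample list could be replaced with sys.stdin and the script placed at the end of a pipe that starts with mysqlbinlog.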
Q: We are having issues with inconsistencies over time. We also have a lot of “waiting for table lock” statuses during high volume usage. Would changing these tables to InnoDB help the replicated database remain consistent?
A: Do you use MyISAM? Switching to InnoDB might help, but it depends on the types of queries you run. For example, if you often use the LOCK TABLE command, sessions will end up in the "waiting for table lock" state with InnoDB too. Regarding data consistency between the master and slave, you need to use row-based replication.
Q: For semi-sync replication, what’s the master’s behavior when the master never received ACK from any of the slaves?
A: It will time out after rpl_semi_sync_master_timeout milliseconds and then switch to asynchronous replication.
Q: We’re using MySQL on r3.4xlarge EC2 instances (16 CPU). We use RBR. innodb_read_io_threads and innodb_write_io_threads =4. We often experience lags. Would increasing these to eight offer better IO for slaves? What other parameters could boost slave IO?
A: Yes, an increased number of IO threads would most likely improve performance. Other parameters that could help are similar to the ones discussed in the “InnoDB Troubleshooting” and “Introduction to Troubleshooting Performance: What Affects Query Execution?” webinars. You need to pay attention to InnoDB options that affect IO (innodb_thread_concurrency, innodb_flush_method, innodb_flush_log_at_trx_commit, innodb_flush_log_at_timeout) and general IO options, such as sync_binlog.
Q: How many masters can I have working together?
A: What do you mean by “how many masters can [you] have working together”? Do you mean circular replication or a multi-master setup? In any case, the only limitation is hardware. For a multi-master setup, you should ensure that the slave has enough resources to process all requests. For circular replication, ensure that each of the masters in the chain can handle the increasing number of writes as they replicate down the chain, and that this does not lead to permanently increasing slave lag.
Q: What’s the best way to handle auto_increment?
A: Follow the advice in the user manual: set auto_increment_offset to a unique value on each of the servers, set auto_increment_increment to the number of servers, and never update auto-incremented columns manually.
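To see why these two settings keep the servers from colliding, here is a small Python simulation. It is only a sketch: it assumes empty tables and offsets no larger than the increment, in which case each server generates values of the form offset + k * increment. The function name auto_increment_values and the choice of three servers are hypothetical.

```python
def auto_increment_values(offset, increment, count):
    """First `count` values a server generates on an empty table."""
    return [offset + k * increment for k in range(count)]


servers = 3  # hypothetical number of masters

# auto_increment_offset is unique per server (1, 2, 3);
# auto_increment_increment equals the number of servers.
sequences = {
    server_id: auto_increment_values(offset=server_id, increment=servers, count=4)
    for server_id in range(1, servers + 1)
}

for server_id, values in sequences.items():
    print(server_id, values)

# No value is generated by more than one server.
all_values = [v for values in sequences.values() for v in values]
assert len(all_values) == len(set(all_values))
```

The sequences interleave without overlapping, which is also why manually updating an auto-incremented column can break the scheme: a hand-picked value may land in another server's sequence.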
Q: I configured multi-threaded replication. Sometimes the replication lag keeps increasing while the slave is doing “invalidating query cache entries(table)”. How should I fine-tune it?
A: The status "invalidating query cache entries(table)" means that the query cache is invalidating entries for a table that has been changed by a command currently being executed by the slave SQL thread. To avoid this issue, you need to keep the query cache small (not larger than 512 MB) and defragment it from time to time using the FLUSH QUERY CACHE command.
Q: Sometimes when IO is slow and during lag we see info: Reading event from the relay log “Waiting for master to send event”. How do we troubleshoot to get more details?
A: The "Waiting for master to send event" state shows that the slave IO thread sent a request for a new event, and is waiting for the event from the master. If you believe it hasn’t received the event in a timely fashion, check the error log files on both the master and slave for connection errors. If there is no error message, or if the message doesn’t provide enough information to solve the issue, use the network troubleshooting methods discussed in the “Troubleshooting hardware resource usage” webinar.
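Besides the error log files, the slave itself reports the IO thread's last error in the output of SHOW SLAVE STATUS (the Slave_IO_Running, Last_IO_Errno, and Last_IO_Error fields). The Python sketch below assumes that output was captured in the vertical \G format. The helper names parse_slave_status and io_thread_hints, and the sample values, are made up for illustration.

```python
def parse_slave_status(text):
    """Parse vertical-format SHOW SLAVE STATUS output into a dict."""
    status = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skips the "*** 1. row ***" separator and blank lines
        # Split on the first colon only, since values may contain colons.
        key, _, value = line.partition(":")
        status[key.strip()] = value.strip()
    return status


def io_thread_hints(status):
    """Collect human-readable hints about the slave IO thread."""
    hints = []
    if status.get("Slave_IO_Running") != "Yes":
        hints.append("IO thread is not running: "
                     + status.get("Slave_IO_Running", "unknown"))
    if status.get("Last_IO_Error"):
        hints.append("Last IO error: " + status["Last_IO_Error"])
    return hints


# Hypothetical, shortened sample output.
sample = """\
*************************** 1. row ***************************
               Slave_IO_State: Connecting to master
             Slave_IO_Running: Connecting
            Slave_SQL_Running: Yes
                Last_IO_Errno: 2003
                Last_IO_Error: error connecting to master 'repl@master:3306'
"""

for hint in io_thread_hints(parse_slave_status(sample)):
    print(hint)
```

If the fields show no error while the thread stays in "Waiting for master to send event", the next step is the network troubleshooting mentioned above.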
Published at DZone with permission of Sveta Smirnova, DZone MVB. See the original article here.