A Better MySQL Replication Heartbeat
If you’ve used MySQL replication you’ve probably discovered that slave machines can lag behind the master. Replication can also break completely, requiring hours (or days) for the slaves to catch up. Monitoring is required to catch issues before the slaves get too far behind.
Jeremy Zawodny has suggested a heartbeat mechanism to monitor the delay between the master and the slave. (I’m not sure if he came up with this solution.) His suggestion is to periodically insert a row into a heartbeat table on the master. Then you poll the table on the slave, waiting for the row to appear. The length of time you spend polling is a rough estimate of how far behind the slave is at that moment.
There are a few problems with this solution. You have to write code to poll the slave. If you poll very frequently (every second), you’ll be polling far too often when replication is actually hours behind, and you have to decide when to stop polling. If you poll less frequently (every minute), your estimate gets that much less accurate. You also have to poll every slave if there is more than one.
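For comparison, the polling approach might look roughly like the Python sketch below. The names `poll_for_heartbeat` and `row_visible` are hypothetical; `row_visible` stands in for whatever query checks the slave for the heartbeat row. The `interval` and `give_up_after` parameters make the trade-offs concrete: the estimate is only as fine as the interval, and something has to decide when to stop.

```python
import time


def poll_for_heartbeat(row_visible, interval=1.0, give_up_after=3600.0,
                       clock=time.monotonic, sleep=time.sleep):
    """Poll until row_visible() returns True; return the estimated lag in seconds.

    row_visible is a hypothetical callable that asks the slave whether the
    heartbeat row just inserted on the master has arrived. Returns None if
    the row has not appeared after give_up_after seconds.
    """
    start = clock()
    while True:
        if row_visible():
            # The estimate can overshoot the true lag by up to one interval.
            return clock() - start
        if clock() - start >= give_up_after:
            return None
        sleep(interval)
```

With a 2-second interval, a row that actually arrives after 5 seconds is reported as 6 seconds behind, and the same loop has to be run against every slave.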
A new solution
You can get MySQL to do the hard work for you by taking advantage of the difference in behavior between SYSDATE and CURRENT_TIMESTAMP. In almost all cases, when a slave runs a SQL statement it temporarily sets the “current time” to the time the statement was executed on the master. If you insert NOW at 12:00:04 on the master, the row will hold exactly 12:00:04 on the slave, no matter when it’s run there. SYSDATE does not follow this behavior: it always returns the value of the local system clock, which on a slave is the slave’s clock.
If you insert a row on the master with one column holding the value of NOW or CURRENT_TIMESTAMP and the other holding the value of SYSDATE, you can use the difference between the two values on the slave to see how far behind it is. If the slave is in sync, the two values will be identical. If the slave is one second behind, the column holding SYSDATE will be one second ahead of the column holding NOW. No polling is required to determine the current lag. (The comparison is only meaningful if the master and slave system clocks agree.)
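The idea can be modelled in a few lines of Python. This is only an illustration of the arithmetic, not of MySQL itself: `replicated_row` and `lag` are hypothetical names, the date is arbitrary, and the model assumes both servers’ clocks agree.

```python
from datetime import datetime, timedelta


def replicated_row(master_clock, apply_delay):
    """Model the row a slave ends up with for one heartbeat insert.

    master_clock: time the statement ran on the master.
    apply_delay:  how long it took the slave to get around to running it.
    """
    # NOW / CURRENT_TIMESTAMP: the slave reuses the master's statement time.
    master_time = master_clock
    # SYSDATE: read from the slave's own clock when the statement is applied.
    slave_time = master_clock + apply_delay
    return master_time, slave_time


def lag(row):
    master_time, slave_time = row
    return slave_time - master_time


in_sync = replicated_row(datetime(2024, 1, 1, 12, 0, 4), timedelta(0))
behind = replicated_row(datetime(2024, 1, 1, 12, 0, 4), timedelta(seconds=90))
print(lag(in_sync))  # 0:00:00
print(lag(behind))   # 0:01:30
```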
First, create the heartbeat table on the master. master_time will hold the time the row was inserted on the master. slave_time will hold the time the row was inserted on the slave.
create table heartbeat( master_time TIMESTAMP DEFAULT CURRENT_TIMESTAMP, slave_time TIMESTAMP NOT NULL ) ENGINE=MyISAM;
Periodically (I do it every minute), insert a row into the heartbeat table on the master.
insert into heartbeat(slave_time) values(SYSDATE());
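One way to run that insert on a schedule is a small script started from cron every minute. The sketch below is an assumption about how such a script might look, not part of the original setup: `send_heartbeat` is a hypothetical name, and `connection` is any Python DB-API 2.0 connection to the master, obtained from whichever MySQL driver you use.

```python
HEARTBEAT_SQL = "insert into heartbeat(slave_time) values(SYSDATE())"


def send_heartbeat(connection):
    """Insert one heartbeat row using a DB-API 2.0 connection to the master."""
    cursor = connection.cursor()
    try:
        cursor.execute(HEARTBEAT_SQL)
        # Commit explicitly in case the driver does not autocommit.
        connection.commit()
    finally:
        cursor.close()
```

Keeping the SQL identical to the statement above matters: master_time is filled in by its column default, and SYSDATE is evaluated separately on each server.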
To see the current replication lag at any time, calculate the difference between the current time and the time the most recent row was inserted on the master. (This estimate can be off by up to one heartbeat period.) Run this query on a slave.
select timediff(NOW(), max(master_time)) from heartbeat;
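The caveat about being off by up to one heartbeat period can be made concrete with a little arithmetic. The Python sketch below uses a hypothetical helper, `lag_bounds`, and assumes that heartbeats are sent at a strictly regular period and that the clocks agree. The query result is the upper bound; the true lag may be smaller because the next heartbeat may simply not have been sent yet.

```python
from datetime import datetime, timedelta


def lag_bounds(slave_now, newest_master_time, heartbeat_period):
    """Return (lowest, highest) plausible lag given the newest heartbeat seen."""
    # This is what the query reports: time since the newest replicated heartbeat.
    estimate = slave_now - newest_master_time
    # The slave has applied everything up to that heartbeat, and possibly
    # almost everything up to the next one, so the true lag can be smaller
    # by up to one period (but never negative).
    lowest = max(timedelta(0), estimate - heartbeat_period)
    return lowest, estimate


period = timedelta(minutes=1)
newest = datetime(2024, 1, 1, 12, 0, 0)
print(lag_bounds(datetime(2024, 1, 1, 12, 0, 45), newest, period))
print(lag_bounds(datetime(2024, 1, 1, 12, 5, 10), newest, period))
```

In the first call the slave may be fully caught up even though the query reports 45 seconds. In the second, the slave is somewhere between 4 minutes 10 seconds and 5 minutes 10 seconds behind.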
You can see how the replication delay changed over time by selecting all rows within a range. This example shows the delay for every minute of the current day. The delays are accurate to within 1 second (the resolution of these TIMESTAMP columns).
select master_time, timediff(slave_time, master_time) from heartbeat where DATE(master_time) = DATE(NOW()) order by master_time;
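Once the history is available, it is easy to summarize. The sketch below finds the worst delay in a set of rows. It assumes the rows have been fetched as (master_time, slave_time) pairs of Python datetime objects; `worst_delay` is a hypothetical helper and the sample rows are made up for illustration.

```python
from datetime import datetime, timedelta


def worst_delay(rows):
    """Return (master_time, delay) for the row with the largest delay, or None."""
    delays = [(master, slave - master) for master, slave in rows]
    if not delays:
        return None
    return max(delays, key=lambda pair: pair[1])


sample_rows = [
    (datetime(2024, 1, 1, 12, 0), datetime(2024, 1, 1, 12, 0, 0)),
    (datetime(2024, 1, 1, 12, 1), datetime(2024, 1, 1, 12, 1, 30)),
    (datetime(2024, 1, 1, 12, 2), datetime(2024, 1, 1, 12, 2, 5)),
]
when, delay = worst_delay(sample_rows)
print(when, delay)
```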