DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports Events Over 2 million developers have joined DZone. Join Today! Thanks for visiting DZone today,
Edit Profile Manage Email Subscriptions Moderation Admin Console How to Post to DZone Article Submission Guidelines
View Profile
Sign Out
Refcards
Trend Reports
Events
Zones
Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
  1. DZone
  2. Data Engineering
  3. Databases
  4. When MySQL Lies: The Truth About Slave Lag and Seconds_Behind_Master

When MySQL Lies: The Truth About Slave Lag and Seconds_Behind_Master

If you're diving into parallel replication with MySQL, take note of a bug that produces incorrect slave lag numbers in certain circumstances—and how to fix it.

Marcelo Altmann user avatar by
Marcelo Altmann
·
Feb. 06, 17 · Tutorial
Like (0)
Save
Tweet
Share
6.85K Views

Join the DZone community and get the full member experience.

Join For Free

seconds_behind_masterIn today’s blog, I will cover an issue with seconds_behind_master that one of our clients faced when running slave_parallel_works > 0. We found out that the reported seconds_behind_master from SHOW SLAVE STATUS was lying. To be more specific, I’m talking about bugs #84415 and #1654091.

MySQL will not report the correct slave lag if you have slave_parallel_workers> 0. Let’s show it in practice.

I’ll use MySQL Sandbox to speed up one master and two slaves on MySQL version 5.7.17, and sysbench to populate the database:

# Create sandboxes
make_replication_sandbox /path/to/mysql/5.7.17
# Create table with 1.5M rows on it
sysbench --test=/usr/share/sysbench/tests/db/oltp.lua --mysql-host=localhost --mysql-user=msandbox --mysql-password=msandbox --mysql-socket=/tmp/mysql_sandbox20192.sock --mysql-db=test --oltp-table-size=1500000 prepare
# Add slave_parallel_workers=5 and slave_pending_jobs_size_max=1G" on node1
echo "slave_parallel_workers=5" >> node1/my.sandbox.cnf
echo "slave_pending_jobs_size_max=1G" >> node1/my.sandbox.cnf
node1/restart


Monitor replication lag via SHOW SLAVE STATUS:

for i in {1..1000};
do
    (
        node1/use -e "SHOW SLAVE STATUSG" | grep "Seconds_Behind_Master" | awk '{print "Node1: " $2}' &
        sleep 0.1 ;
        node2/use -e "SHOW SLAVE STATUSG" | grep "Seconds_Behind_Master" | awk '{print "Node2: " $2}' &
    );
    sleep 1;
done


On a separate terminal, DELETE some rows in the test.sbtest1 table on the master, and monitor the above output once the master completes the delete command:

DELETE FROM test.sbtest1 WHERE id > 100;


Here is a sample output:

master [localhost] {msandbox} (test) > DELETE FROM test.sbtest1 WHERE id > 100;
Query OK, 1499900 rows affected (46.42 sec)
. . .
Node1: 0
Node2: 0
Node1: 0
Node2: 48
Node1: 0
Node2: 48
Node1: 0
Node2: 49
Node1: 0
Node2: 50
. . .
Node1: 0
Node2: 90
Node1: 0
Node2: 91
Node1: 0
Node2: 0
Node1: 0
Node2: 0


As you can see, node1 (which is running with slave_parallel_workers = 5) doesn’t report any lag.

The Workaround

We can workaround this issue by querying performance_schema.threads:

SELECT PROCESSLIST_TIME FROM performance_schema.threads WHERE NAME = 'thread/sql/slave_worker' AND (PROCESSLIST_STATE IS NULL  or PROCESSLIST_STATE != 'Waiting for an event from Coordinator') ORDER BY PROCESSLIST_TIME DESC LIMIT 1;


Let’s modify our monitoring script, and use the above query to monitor the lag on node1:

for i in {1..1000};
do
    (
        node1/use -BNe "SELECT PROCESSLIST_TIME FROM performance_schema.threads WHERE NAME = 'thread/sql/slave_worker' AND (PROCESSLIST_STATE IS NULL  or PROCESSLIST_STATE != 'Waiting for an event from Coordinator') ORDER BY PROCESSLIST_TIME DESC LIMIT 1 INTO @delay; SELECT IFNULL(@delay, 0) AS 'lag';" | awk '{print "Node1: " $1}' &
        sleep 0.1 ;
        node2/use -e "SHOW SLAVE STATUSG" | grep "Seconds_Behind_Master" | awk '{print "Node2: " $2}' &
    );
    sleep 1;
done


master [localhost] {msandbox} (test) > DELETE FROM test.sbtest1 WHERE id > 100;
Query OK, 1499900 rows affected (45.21 sec)
Node1: 0
Node2: 0
Node1: 0
Node2: 0
Node1: 45
Node2: 45
Node1: 46
Node2: 46
. . .
Node1: 77
Node2: 77
Node1: 78
Node2: 79
Node1: 0
Node2: 80
Node1: 0
Node2: 81
Node1: 0
Node2: 0
Node1: 0
Node2: 0


Please note that in our query to performance_schema.threads, we are filtering PROCESSLIST_STATE “NULL” and “!= Waiting for an event from Coordinator”. The correct state is “Executing Event”, but it seems like it doesn’t correctly report that state (#84655).

MySQL parallel replication is a nice feature, but we still need to make sure we are aware of any potential issues it might bring. Most monitoring systems use the output of SHOW SLAVE STATUS to verify whether or not the slave is lagging behind the master. As shown above, it has its caveats.

As always, we should test, test and test again before implementing any change like this in production!

MySQL

Published at DZone with permission of Marcelo Altmann, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

Popular on DZone

  • A Guide To Successful DevOps in Web3
  • Unleashing the Power of JavaScript Modules: A Beginner’s Guide
  • Type Variance in Java and Kotlin
  • Top Five Tools for AI-based Test Automation

Comments

Partner Resources

X

ABOUT US

  • About DZone
  • Send feedback
  • Careers
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 600 Park Offices Drive
  • Suite 300
  • Durham, NC 27709
  • support@dzone.com
  • +1 (919) 678-0300

Let's be friends: