DZone
Java Zone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
  • Refcardz
  • Trend Reports
  • Webinars
  • Zones
  • |
    • Agile
    • AI
    • Big Data
    • Cloud
    • Database
    • DevOps
    • Integration
    • IoT
    • Java
    • Microservices
    • Open Source
    • Performance
    • Security
    • Web Dev
DZone > Java Zone > Data Inconsistencies on MySQL Replicas: Beyond pt-table-checksum

Data Inconsistencies on MySQL Replicas: Beyond pt-table-checksum

Peter Zaitsev user avatar by
Peter Zaitsev
·
Nov. 18, 14 · Java Zone · Interview
Like (0)
Save
Tweet
6.44K Views

Join the DZone community and get the full member experience.

Join For Free

Originally Written by Stephane Combaudon

Percona Toolkit’s pt-table-checksum is a great tool to find data inconsistencies between a MySQL master and its replicas. However it is sometimes not enough to know that there are inconsistencies and let pt-table-sync fix the issue: you may want to know which exact rows are different to identify the statements that created the inconsistency. This post shows one way to achieve that goal.

The issue

Let’s assume you have 2 servers running MySQL 5.5: db1 the master and db2 the replica. You want to upgrade to MySQL 5.6 using an in-place upgrade and to play safe, you will upgrade db2 (the slave) first. If all goes well you will promote it and upgrade db1.

A good thing to do after upgrading db2 is to check for potential data inconsistencies with pt-table-checksum. Once checksumming is done, you can run the following query on db2 to see if there is any data drift:

mysql>SELECT db,tbl,
  SUM(this_cnt)AStotal_rows,
  COUNT(*)ASchunks
FROM percona.checksums
WHERE(master_cnt<>this_cnt
  ORmaster_crc<>this_crc
  ORISNULL(master_crc)<>ISNULL(this_crc))
GROUP BY db,tbl;
+------+-------+------------+--------+
|db  |tbl  |total_rows|chunks|
+------+-------+------------+--------+
|mydb|items|3745563|  17|
+------+-------+------------+--------+

This indicates that inconsistencies can be found in mydb.items in 17 chunks. Now the question is: which rows are different on db1 and db2?

The solution

The previous query shows that we will find inconsistencies in 17 of the chunks pt-table-checksum used. But what is a chunk?

mysql>SELECT*
  FROM percona.checksums
  WHERE this_crc!=master_crc
  ANDtbl='items'G
*****************1.row*****************
db:mydb
  tbl:items
  chunk:28
chunk_time:0.123122
  chunk_index:PRIMARY
lower_boundary:7487511
upper_boundary:7563474
this_crc:2c11da8d
this_cnt:75964
master_crc:66a1c22c
master_cnt:75964
ts:2014-10-2201:21:26
[...]

So the first chunk with inconsistencies is chunk #28, which is the set of rows where the primary key is >= 7487511 and <= 7563474.

Let’s export all these rows on db1 and db2 instance ::

# db1
mysql>SELECT*INTO outfile'/tmp/items_db1.txt'
  FROM mydb.items
  WHERE idBETWEEN7487511AND7563474;
# db2
mysql>SELECT*INTO outfile'/tmp/items_db2.txt'
  FROM mydb.items
  WHERE idBETWEEN7487511AND7563474;

Then let’s use diff to isolate non-matching rows

# Using diff to compare rows
# diff items_db1.txt items_db2.txt
75872,75874c75872,75874
<75633822127002302014-10-2202:51:33
<75633832127002402014-10-2202:51:33
<75633842127002502014-10-2202:51:33
---
>75633822127002302014-10-2202:51:34
>75633832127002402014-10-2202:51:34
>75633842127002502014-10-2202:51:34
[...]

We can see that some datetime fields are off by 1 second on the 5.6 instance.

In this case, the binlogs showed queries like:

INSERT INTO items([...],posted_at)VALUES([...],'2014-10-22 02:51:33.835249');

MySQL 5.5 rounds '2014-10-22 02:51:33.835249' to '2014-10-22 02:51:33' (ignoring the fractional part), while MySQL 5.6 rounds it to '2014-10-22 02:51:34'.

Now it’s easy to fix the application so that it works both with MySQL 5.5 and 5.6 and then continue testing MySQL 5.6.

Conclusion

The method shown above is an easy way to find the exact records that are inconsistent between the MySQL master and a replica. It is not useful if you only want to resync the slave (in this case, just run pt-table-sync) but it can be a first step in understanding how inconsistencies are created.

 

MySQL Data (computing) Database Relational database

Published at DZone with permission of Peter Zaitsev, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

Popular on DZone

  • Java: Why Core-to-Core Latency Matters
  • Growth in Java Development for Web and Mobile Apps
  • How to Configure Git in Eclipse IDE
  • 3 Predictions About How Technology Businesses Will Change In 10 Years

Comments

Java Partner Resources

X

ABOUT US

  • About DZone
  • Send feedback
  • Careers
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • MVB Program
  • Become a Contributor
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 600 Park Offices Drive
  • Suite 300
  • Durham, NC 27709
  • support@dzone.com
  • +1 (919) 678-0300

Let's be friends:

DZone.com is powered by 

AnswerHub logo