One of our Percona Support customers recently reported that Percona XtraBackup failed with a page corruption error on an InnoDB table. The customer thought it was a problem or bug in the Percona XtraBackup tool. After investigation we found that an InnoDB page was actually corrupted and a Percona XtraBackup tool caught the error as expected and hence the backup job failed.
I thought this would be an interesting topic and worthy of a blog post. In this article I will describe the innochecksum tool, when and how to use it and what are the possible fixes if an InnoDB table suffers from page corruption.
The innochecksum tool is an offline tool that prints checksums for InnoDB files. This tool reads an InnoDB tablespace file, calculates the checksum for each page and reports mismatches, if any. A checksum mismatch is an indication of corrupt pages. Being as offline tool, innochecksum can’t be used on tablespace file that a MySQL server is currently using, hence you need to shutdown the server prior to running the innochecksum tool. If you try to run the innochecksum tool on a running MySQL server, then there is a chance that innochecksum crashes or reports a bad checksum for a good page, resulting false positives results. There is chance when you run innochecksum on a tablespace file that is opened by mysqld, that pages are dirty and not checksummed yet by the InnoDB storage engine itself.
The point: don’t run innochecksum against a running server.
InnoDB corruption can be caused by many factors (e.g. power lost, faulty hardware, bugs). The InnoDB storage engine validates calculated checksum while reading pages from a tablespace on disk to the stored checksum in the page. In case, InnoDB finds page checksum mismatch it will force down the MySQL server.
Let me show you a page corruption error identified by Percona XtraBackup during a backup run in which the backup failed afterward.
xtrabackup:Database page corruption detected atpage25413,retrying... xtrabackup:Database page corruption detected atpage25413,retrying... xtrabackup:Database page corruption detected atpage25413,retrying...
First, we need to identify if the tablespace is really corrupted for that particular table. I do that with the help of the innochecksum utility as shown below. As I mentioned earlier, make sure to shut down MySQL before using the innochecksum tool.
I passed the -p (page) flag for innochecksum to only check the specific pages that were reported corrupt by Percona XtraBackup. Without passing any option to the innochecksum tool, it will check entire tablespace for corruption which will required additional server downtime. The innochecksum tool also supports the -d (debug) option to print the checksum for each page and the -v (verbose) parameter to print a progress indicator. You can find more details in the manual. If the tool reports page corruption then database table is really corrupted as below.
page25413invalid(fails log sequence number check)
In order to fix this issue, the first thing you should try is to mysqldump the corrupted table and If mysqldump succeeded then problem exists in secondary indexes for that tablespace. This is because the mysqldump utility doesn’t touch indexes as indexes are created after all rows are inserted.
If mysqldump succeeds then the problem is associated with indexes. I would suggest following options to fix the corruption.
— Execute OPTIMIZE TABLE on that table which rebuilds indexes. The table will be locked during the operation prior to MySQL 5.6.17. Since MySQL 5.6.17 OPTIMIZE TABLE is an online operation.
— Rebuild table with the pt-online-schema-change tool from Percona Toolkit. This will give the same result as OPTIMIZE TABLE a non-blocking way as the pt-online-schema=change tool is online schema change tool.
— Drop all secondary indexes and then recreate them. The table will be locked during that operation for writes only. Again, you can use pt-online-schema-change tool for this purpose without sacrificing read/writes ability on the table during the drop and create indexes operation.
Finally, I would suggest to re-run the innochecksum tool to verify the tables integrity again as this will make sure there is no more page corruption. In this case we found that the table was actually corrupted and fixing table corruption through the backup/reload table fixed the problem and Percona XtraBackup ran fine during the next run.
It is possible that mysqldump crashes a MySQL server for a corrupted table. Fortunately, Percona Server contains innodb_corrupt_table_action which you can enable. The configuration variable is dynamic in nature, this means enabling it doesn’t requires a MySQL server restart. Prior to Percona Server 5.6 innodb_corrupt_table_action was known as innodb_pass_corrupt_table. Once you enable this option, you can try mysqldump again. If you are using Oracle MySQL then I would suggest to try this with innodb_force_recovery in case mysqldump fails to dump the table contents.
As a side note, if your backup is successful without any errors while performing a backup with Percona Xtrabackup, this means your InnoDB tables don’t have any page checksum mismatch or corruption. Percona XtraBackup can validate page checksums and in case of errors it logs error and exists as I mentioned above.
There is also a modified version of the innochecksum made available by Facebook’s Mark Callaghan and can be found in this bug report which provides extra stats on tablespace undo blocks. There is another tool made by Google’s Jeremy Cole known as the InnoDB Ruby Tool to examine the internals of InnoDB.
- Innochecksum is an offline InnoDB checksum tool. This means you must stop MySQL server. Otherwise it produces “Unable to lock file” error since MySQL 5.7.
- Old versions of innochecksum only supports files up to 2GB in size. However, since MySQL 5.6 innochecksum supports files greater than 2GB in size.
- Percona Server variable innodb_corrupt_table_action is supported on tables existing in their tablespace (i.e. innodb_file_per_table).
- If you are using compressed tables (ROW_FORMAT=COMPRESSED) , then you must use innochecksum from MySQL 5.7.2 or greater, as earlier versions of innochecksum don’t support compressed tables. Check this bug for details.
New Features for the innochecksum tool from MySQL 5.7:
- As I mentioned above, since MySQL 5.7 innochecksum supports file sizes greater than 2GB.
- Since MySQL 5.7 you can log the output with the –log option.
- –page-type-summary option added for page type summaries.
- MySQL 5.7 also includes another nice option –page-type-dump which dumps the details of each page to standard output (STDOUT) or standard error (STDERR).
- Since MySQL 5.7 innochecksum can be executed on multiple user-defined system tablespace files.
- Since MySQL 5.7 innochecksum can be executed on multiple system tablespace files.
You can read more about this is in the MySQL 5.7 manual page of innochecksum.
Conclusion: In this post, we identified InnoDB page corruption using the logs generated by Percona XtraBackup and fixed them by using the mysqldump tool. But again, unfortunately, there are chances that Percona XtraBackup will not always fail in the same way when it finds corruption. So in some cases, it’s not easy to tell whether Percona XtraBackup has failed due to a bad checksum or a bug of its own. But in most cases, page corruption is the culprit if Percona XtraBackup fails to complete.
To summarize, I would say that Percona XtraBackup is a good way of verifying whether or not InnoDB pages are corrupted – and you can also verify the same thing via the mysqldump utility.