I recently worked on upgrading MySQL from one of very early MySQL 5.0 versions to Percona Server 5.1. This was a classical upgrade scenario which can cause surprises. Master and few slaves need to be upgraded. It is a shared database used by tons of applications written by many people over more than 5 years timeframe. It did not have any extensive test suite we could use for validation. As you might guess in such cases some of the original authors have moved on and nobody is exactly sure what application does or does not do with the database. Database is production critical with serious role in serious company so we can’t just do a reckless upgrade
First we needed to do a sanity check on existing replication setup.
As we’re checking replication consistency down the road we need to make
sure replication is in sync to begin with to avoid false positives. mk-table-checksum is a tool to do it. It turned out replication indeed had an issue replicating
triggers. The problem should be fixed by upgrade so we just have to keep this into account.
We move database to MySQL 5.1 As the database size is relatively small we do mysqldump and load which is a safest way, considering we’re speaking about 4 years worth of changes in versions. We also ran mysql_fix_privilege_tables to ensure all new privileges are added, which is something I frequently see forgotten
Next step is setup MySQL 5.0 to 5.1 replication to see if it runs properly. It turns out it does not because of the old bug which I’ve also seen causing upgrade problems in number of other environments. INSERT ON DUPLICATE KEY UPDATE had a unfair share of replication issues in MySQL 5.0. There are number of ways the problem can be solved but first we decide to see how broad is it. We let Slave to replicate with skip-slave-errors=1105 to see if we get any other problems spotted and in the meanwhile we go over binary logs for the last month to see how frequently this functionality is used. Happily there are only few INSERT ON DUPLICATE KEY UPDATE query instances, and only one of them into table with AUTO_INCREMENT column (and so affected by this bug). It was easy enough to change the single application not to use INSERT ON DUPLICATE KEY UPDATE in this instance so it was done.
So replication was running properly but does data match ? (This also
would cover data improperly loaded with mysqldump if there is such).
We stopped 5.0 and 5.1 slave at the same position and used
mk-table-checksum to ensure the data is in sync. mk-table-checksum can
use replication to check consistency but comparing 2 servers directly is
faster and we had a spare capacity which we could use. First we
ran the check using default CHECKSUM TABLE algorithm. We got number of
tables reporting wrong checksums while running SELECT INTO OUTFILE and
diffing these files reported no changes. It turns out there are some
subtle changes to CHECKSUM TABLE over the years which could report
different checksum in some cases. Rerunning check using BIT_XOR
algorithm eliminated those false positives. Another table remained
though. We used mk-table-sync –print
as a diff tool for MySQL to see what is different in the tables. It turned out one of the float columns stored “-0″ in MySQL 5.0 but it was displayed as “0″ when data loaded to Percona Server 5.1. This was not the issue for application and could be ignored.
So at this point we were sure the write traffic replicates properly to the new setup. It was the time to check how read traffic behaves. We stopped both slaves at the same position again and used tcpdump and mk-query-digest to get sample read traffic from both master and slave. –sample=50 (or similar) option is important to check only limited number of samples for each query type – otherwise it can take a lot of time. Running mk-upgrade with these queries showed some results differences which turned out to be false positives too – thanks to TABLE CHECKSUM mk-upgrade uses by default to check result sets. –compare-results-method rows helped to remove them and we were down to only query time differences. In most cases query time differences were not significant or Percona Server 5.1 did better but there were couple of queries where optimizer plan changed to significantly worse one and they were flagged to be fixed.
At this point we were confident enough Slaves can handle the traffic and we could put them in production. Before upgrading Master however we had to think about rollback plan if something goes wrong and we need to go back to MySQL 5.0 on the master. To do this we set up replication from Percona Server 5.1 back to MySQL 5.0 and performed the same checks again – happily replication worked and there were no “drift”. This allows us to simply to hook up old MySQL 5.0 and all it slaves as a slave off new master and keep it for some time, with rollback to old setup being trivial. This was the best choice as with New MySQL version upgrade involves new Operating System and hardware and any of them could be potential cause of rollback.
MySQL Upgrade in my opinion is the process where hiring external consultant it especially makes sense. The team, even if it includes skilled MySQL DBA typically does not need to go through major version upgrades more frequently than 3-5 years, so unless there are a lot of applications being upgraded by the same team it is hard to archive experience. Also problems you encounter during upgrade are very different depending on the upgrade version – upgrade from MySQL 4.1 to 5.0 had a lot of different issues than upgrade from MySQL 5.0 to 5.1
Also, Maatkit is Awesome, though I believe you know already.