Over a million developers have joined DZone.

What I Learned While Migrating a Customer MySQL Installation to Amazon RDS

DZone's Guide to

What I Learned While Migrating a Customer MySQL Installation to Amazon RDS

· Java Zone
Free Resource

Try Okta to add social login, MFA, and OpenID Connect support to your Java app in minutes. Create a free developer account today and never build auth again.

Originally written by Michael Coburn

Hi, I recently had the experience of assisting with a migration of a customer MySQL installation to Amazon RDS (Relational Database Service). Amazon RDS is a great platform for hosting your MySQL installation and offers the following list of pros and cons:

  • You can scale your CPU, IOPS, and storage space separately by using Amazon RDS. Otherwise you need to take downtime and upgrade physical components of a rack-mounted server.
  • Backups, software version patching, failure detection, and (some) recovery is automated with Amazon RDS.
  • You lose shell access to your DB instance
  • You lose SUPER privilege for regular users. Many SUPER-type statements and commands are provided for as a Stored Procedure.
  • It is easy to set up multiple read replicas (slaves in READ_ONLY=1 mode).
  • You can set up a secondary sychronous instance for failover in the event your primary instance fails.

While this article is written to be Amazon RDS-specific it also has implications for any sort of migration.

  1. The only way to interface with RDS is through mysql client, which means loading data must be done using SQL. This means you need to use mysqldump or mydumper, which can be a large endeavour should your dataset be north of 500GB — this is a lot of single threaded activity!  Think about not only how long dumping and loading will take, but also factor in how much time it will take for replication to catch up on the hours/days/weeks your dumping and loading procedure took.  You might need to allocate more disk space and Provisioned IOPS to your RDS node in order to improve disk throughput, along with a change to innodb_flush_log_at_trx_commit, and sync_binlog.
  2. RDS is set to UTC (system_time_zone=UTC) and this cannot be changed as in Parameter Groups you will see that default_time_zone is set as Modifiable=false. This can bite you if you are planning to use RDS as a slave for a short while before failing the application over to Amazon RDS.  If you have configured binlog_format=STATEMENT on your master and you have TIMESTAMP columns, this will lead to differences in RDS data set for absolute values ’2014-07-24 10:15:00′ vs NOW(). It is also a concern for the Developer who may not be explicitly declaring their MySQL connections to set an appropriate time zone. Often the best piece of advice can be to leave all database data in UTC no matter where the server is physically located, and deal with localization at the presentation layer.
  3. Amazon RDS by default has max_allowed_packet=1MB. This is pretty low as most other configs are 16MB so if you’re using extended-insert (by default, you are), the size of each insert statement will be close to 16MB and thus can lead to errors related to “packet too big” on Amazon RDS side, thus failing out an import.
  4. Amazon RDS does not support the SUPER privilege for regular users. For example, this becomes quite a challenge as many tools (Percona Toolkit) are authored to assume you have SUPER-level access on all nodes — simple tasks become vastly more complicated as you need to think of clever workarounds (I’m looking at you pt-table-sync!).
  5. Triggers and views thus cannot be applied using the default mysqldump syntax which includes SQL DEFINER entries — these lines are there so that a user with SUPER can “grant” another user ability to execute the trigger/view. Your load will fail if you forget this.
  6. Consider running your load with –force to the mysql client, and log to disk stderr/stdout so you can review errors later. It is painful to spend 4 days loading a 500GB database only to have it fail partially through because you forgot about SQL DEFINER issue..
  7. Consider splitting the mysqldump into two phases: –no-data so you dump schema only, and then –data-only so you get just the rows. This way you can isolate faults and solve them along the way.
  8. Skipping replication events is SLOW. You don’t have ability to do sql_slave_skip_counter (since this requires SUPER), instead you need to use an Amazon RDS function of mysql.rds_skip_repl_error. Sadly this Stored Procedure takes no argument and thus it only skips one event at a time. It takes about 2-3 seconds for each execution, so if you have a lot of events to skip, that’s a problem. Having to skip ANYTHING is indication that something went wrong in the process, so if you find yourself in the unenviable position of skipping events, know that pt-table-checksum should be able to give you an idea how widespread is the data divergence issue.
  9. pt-table-sync doesn’t work against Amazon RDS as it is written to expect SUPER because it wants to do binlog_format=STATEMENT in session, but that’s not allowed. Kenny Gryp hacked me a version to just skip this check, and Kenny also reported it for inclusion in a future release of Percona Toolkit, but in the meantime you need to work around the lack of SUPER privilege.
  10. pt-table-sync is SLOW against RDS. As pt-table-sync doesn’t log a lot of detail about where time is spent, I haven’t completely isolated the source of the latency, but I suspect this is more about network round trip than anything else.
  11. innodb_log_file_size is hardcoded to 128MB in Amazon RDS, you can’t change this.  innodb_log_files_in_group is not even showing up in Parameter Groups view but SHOW GLOBAL VARIABLES reports as 2. So you’re cookin’ on 256MB, if your writes are heavy this may become a bottleneck with little workaround available in MySQL.
  12. CHANGE MASTER isn’t available in RDS. You define RDS as a slave by calling a stored procedure where you pass the appropriate options such as CALL mysql.rds_set_external_master.

For those of you wondering about the SUPER-privilege, I was fortunate that Bill Karwin from Percona’s Support team took the time to review my post and suggested I dig into this deeper, turns out that Amazon didn’t hack MySQL to remove the SUPER privilege, but instead run the Stored Procedures with security_type of DEFINER:

mysql> select db,name,type,language,security_type,definer from proc where name = 'rds_external_master' G
*************************** 1. row ***************************
 db: mysql
 name: rds_external_master
 language: SQL
security_type: DEFINER
 definer: rdsadmin@localhost
1 row in set (0.08 sec)
mysql> show grants for 'rdsadmin'@'localhost';
| Grants for rdsadmin@localhost                                                                        |
1 row in set (0.07 sec)

So for those of you working with Amazon RDS, I hope that this list saves you some time and helps our your migration!  If you get stuck you can always contact Percona Consulting for assistance.

Build and launch faster with Okta’s user management API. Register today for the free forever developer edition!


Published at DZone with permission of Peter Zaitsev, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}