The 5 Point Checklist for Backup and Recovery in EC2
1. Perform multiple types of backups
2. Keep non-proprietary backups offsite
3. Test your backups – perform firedrills
4. Encrypt backups in S3
5. Perform Replication Integrity Checks
Perform Multiple Types of Backups
Your database tier is typically your primary datastore, so its backups are often the most crucial. Snapshots of EBS volumes are a powerful and fast way to perform full database backups in the AWS environment. The process involves briefly locking all tables, running the snapshot command, and then releasing those table locks. Be sure to test this process to ensure that the temporary locks on the database don’t create a pileup of waiting requests on your webservers.
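The lock, snapshot, unlock sequence can be sketched in Python as follows. This is a minimal sketch, not a definitive implementation: `cursor` is assumed to be any DB-API style MySQL cursor, and `take_snapshot` is a hypothetical callable that wraps whatever EBS snapshot call you use. The one detail worth copying is the `finally` clause, which releases the lock even when the snapshot call fails.

```python
from contextlib import contextmanager


@contextmanager
def global_read_lock(cursor):
    """Hold a MySQL global read lock for the duration of the with-block."""
    cursor.execute("FLUSH TABLES WITH READ LOCK")
    try:
        yield
    finally:
        # Always release the lock, even if the snapshot call raises.
        cursor.execute("UNLOCK TABLES")


def snapshot_with_lock(cursor, take_snapshot):
    """Lock tables, run the snapshot callable, unlock, and return its result.

    take_snapshot is a hypothetical callable supplied by the caller.
    """
    with global_read_lock(cursor):
        return take_snapshot()
```

The lock only holds while the connection that took it stays open, so the same cursor has to be used for both statements.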
Keep Non-proprietary Backups Offsite
EC2 snapshots are great, but they only work in EC2, so you’ll also want to perform other types of backups. Personally, I like having a few different options in the event I need to restore. Logical backups are great for restoring one table, but are slow for restoring the entire database. Hot backups are fast for restoring the whole database, but they take a lot of space and are less efficient if you just need to restore one table. So I like to have both. Percona’s xtrabackup and the associated innobackupex script provide an open-source hot backup solution for MySQL. Get it! Then intersperse those backups with mysqldumps as well, on alternating days, for example.
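One way to alternate the two backup types is to let the date pick the command. The sketch below is illustrative only: `backup_command` and the `backup_dir` argument are names made up for this example, connection credentials are left out, and the function only builds the command line, leaving it to the caller to run it.

```python
from datetime import date


def backup_command(day, backup_dir):
    """Return (kind, argv) for the backup to run on the given date."""
    # toordinal() counts days continuously, so the alternation
    # does not reset at month boundaries.
    if day.toordinal() % 2 == 0:
        # Hot backup: innobackupex writes a copy under backup_dir.
        return "hot", ["innobackupex", backup_dir]
    # Logical backup: mysqldump writes SQL to stdout, so the caller
    # needs to redirect the output to a file.
    return "logical", ["mysqldump", "--single-transaction", "--all-databases"]


kind, argv = backup_command(date.today(), "/data/backups")
print(kind, " ".join(argv))
```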
Test Your Backups – Perform Firedrills
Any good disaster recovery plan must be thoroughly tested. Set aside the time to actually run through it from start to finish. This is where the cloud really works to your advantage. Spin up all the servers that make up your entire environment (load balancer, webservers, database servers), then check out all the source code and configuration files. You put your configuration files in version control, right? Then restore the database. This fire drill tests your server spin-up scripts, your version control of source code and configuration files, and your database backups. All of these pieces must be in place for the fire drill to succeed. Lastly, running through the whole process forces you to document details, and you find out how long your disaster recovery would actually take.
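To get a real number for how long recovery takes, it helps to time each stage of the drill. Here is a minimal sketch of such a runner. The step names and the callables behind them are placeholders; in a real drill each one would call your own spin-up, checkout, and restore scripts.

```python
import time


def run_fire_drill(steps):
    """Run (name, callable) steps in order; return [(name, seconds, ok)].

    Stops at the first failing step, since later steps depend on earlier ones.
    """
    results = []
    for name, step in steps:
        start = time.perf_counter()
        try:
            step()
            ok = True
        except Exception:
            ok = False
        results.append((name, time.perf_counter() - start, ok))
        if not ok:
            break
    return results


# Placeholder steps; replace the lambdas with calls to your own scripts.
drill = [
    ("spin up servers", lambda: None),
    ("check out code and config", lambda: None),
    ("restore database", lambda: None),
]
for name, seconds, ok in run_fire_drill(drill):
    print("{:<28} {:>8.2f}s  {}".format(name, seconds, "ok" if ok else "FAILED"))
```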
Encrypt Backups in S3
S3 stores objects as private by default. However, for particularly sensitive data it makes sense to encrypt those backups as well. Remember, you control access to your encryption keys, but not where the data is stored or where it might be moved. So it can’t hurt to be extra cautious. Here’s an excellent article on the topic. Using mk-query-digest to checksum
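One way to keep the keys in your own hands is to encrypt the backup file before it ever leaves the server. The sketch below assumes GnuPG is installed and that `s3_client` is a boto3 S3 client; the function name, recipient, bucket, and key are all made up for illustration, and this is one possible approach rather than the only one.

```python
import subprocess


def encrypt_and_upload(path, recipient, bucket, key, s3_client, run=subprocess.run):
    """Encrypt a backup file with gpg, then upload only the ciphertext."""
    encrypted = path + ".gpg"
    # Public-key encryption: only the recipient's public key is needed
    # on the backup host, so the private key can be kept elsewhere.
    run(["gpg", "--encrypt", "--recipient", recipient,
         "--output", encrypted, path], check=True)
    # s3_client is assumed to be a boto3 S3 client, e.g. boto3.client("s3").
    s3_client.upload_file(encrypted, bucket, key)
    return encrypted
```

The `run` parameter exists only so the gpg call can be swapped out when testing. Whatever tool you use, practice the decrypt step during your fire drill too.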
Perform Replication Integrity Checks
A MySQL slave or passive master database can be a great way to offload backups away from the primary database server. This reduces the impact on your customers while backups are running. But MySQL replication is not bulletproof. The slaves can silently drift out of sync with the master without throwing errors. That’s why it’s important to use an integrity checking tool like Maatkit’s mk-table-checksum. This tool can be scheduled in cron to perform checksums on a slice of your database periodically.
Here’s an excellent article on using the tool: Ongoing MySQL Integrity Checking with mk-table-checksum
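To make the idea of an integrity check concrete, here is a deliberately simplified sketch that compares per-table checksums between master and slave using MySQL's `CHECKSUM TABLE` statement. This is not how mk-table-checksum works internally. The function names are invented for this example, the cursors are assumed to be DB-API style cursors, and the table names are assumed to be unqualified names in the current database.

```python
def table_checksums(cursor, tables):
    """Return {table: checksum} using MySQL's CHECKSUM TABLE statement."""
    checksums = {}
    for table in tables:
        # Table names are assumed to come from a trusted list, not user input.
        cursor.execute("CHECKSUM TABLE `{}`".format(table))
        _name, checksum = cursor.fetchone()
        checksums[table] = checksum
    return checksums


def drifted_tables(master_cursor, slave_cursor, tables):
    """Return the tables whose checksums differ between master and slave."""
    master = table_checksums(master_cursor, tables)
    slave = table_checksums(slave_cursor, tables)
    return [t for t in tables if master[t] != slave[t]]
```

A naive comparison like this can report false differences if writes land between the two queries, which is one reason to prefer a purpose-built tool for production checks.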