This blog post discusses the business continuity plan around MySQL backups and how organizations should think about them.
During the years I’ve worked in IT, I’ve learned that backups are sometimes treated as an abstract subject in organizations. Many companies have them, but don’t document the business continuity plan associated with them. I experienced this the hard way many years ago, around the time MySQL 5.0 was still widely used.
In most organizations, a couple of business continuity subjects should be described internally. For example: what is the recovery time objective (RTO), and what is the recovery point objective (RPO)? Let’s go a bit deeper into both concepts:
Recovery Point Objective
A recovery point objective describes the maximum amount of data, measured in time, that can be lost during a major incident such as a massive data center failure. One of the questions you should ask before such a situation arises is: how much recent data can we tolerate losing?
If you have a recovery point objective of over a day, your daily backup routines might cover this. However, if your recovery point objective is more stringent, you might need additional tools such as binary log streaming or incremental backups.
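The reasoning above can be sketched as a small check. This is a minimal illustration, not a definitive model: the function names are hypothetical, and it assumes that the worst-case loss equals the backup interval unless binary logs are shipped off-host, in which case it equals the shipping lag.

```python
from datetime import timedelta
from typing import Optional


def worst_case_data_loss(backup_interval: timedelta,
                         binlog_shipping_lag: Optional[timedelta] = None) -> timedelta:
    """Return the assumed worst-case window of lost data.

    Assumption: without externally stored binary logs, everything written
    since the last backup is lost. With them, only the shipping lag is lost.
    """
    if binlog_shipping_lag is not None:
        return binlog_shipping_lag
    return backup_interval


def meets_rpo(rpo: timedelta,
              backup_interval: timedelta,
              binlog_shipping_lag: Optional[timedelta] = None) -> bool:
    """True when the assumed worst-case loss fits within the RPO."""
    return worst_case_data_loss(backup_interval, binlog_shipping_lag) <= rpo


daily = timedelta(hours=24)
print(meets_rpo(timedelta(hours=36), daily))                          # daily backups alone
print(meets_rpo(timedelta(minutes=15), daily))                        # stringent RPO, backups only
print(meets_rpo(timedelta(minutes=15), daily, timedelta(minutes=1)))  # with binlog shipping
```

A real calculation would also account for how long the backup itself takes and how reliably the binary logs actually arrive at their external location.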
Recovery Time Objective
The recovery time objective is the second essential concept in building a business continuity plan: it describes how long your environment can be unavailable before it has to be back in service. Your environment has to remain active to generate traffic and, potentially, revenue.
What are the requirements promised to your customers? Are there any SLAs agreed with the customer, or is it best effort? If it’s best effort, what would be the tipping point for your users to start using an alternative service from your competitor? These are all factors to consider while determining your RTO.
If the recovery point objective and recovery time objective are stringent, additional costs might be required when buying hardware, or a secondary data center might even become mandatory. However, it’s a cost/value discussion: what makes your company lose revenue, and what is acceptable during a crisis?
Based on your business continuity requirements, you can build your disaster recovery (DR) plans. Make sure your business continuity requirements drive the DR plan, and not vice versa.
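To make the RTO discussion concrete, here is a rough back-of-the-envelope sketch. The function names and the numbers are purely illustrative assumptions. It models restore time as a fixed overhead (provisioning, fetching the backup) plus dataset size divided by a restore throughput that you would have to measure in your own environment.

```python
def estimated_restore_hours(dataset_gb: float,
                            restore_gb_per_hour: float,
                            fixed_overhead_hours: float = 0.0) -> float:
    """Estimate restore duration under a simple linear model (an assumption)."""
    if restore_gb_per_hour <= 0:
        raise ValueError("restore_gb_per_hour must be positive")
    return fixed_overhead_hours + dataset_gb / restore_gb_per_hour


def meets_rto(rto_hours: float,
              dataset_gb: float,
              restore_gb_per_hour: float,
              fixed_overhead_hours: float = 0.0) -> bool:
    """True when the estimated restore time fits within the RTO."""
    estimate = estimated_restore_hours(dataset_gb, restore_gb_per_hour,
                                       fixed_overhead_hours)
    return estimate <= rto_hours


# Hypothetical figures: 500 GB dataset, 100 GB/hour restore rate, 1 hour overhead.
print(estimated_restore_hours(500, 100, 1.0))
print(meets_rto(4, 500, 100, 1.0))
```

The only trustworthy input for the throughput figure is a timed test restore; logical and binary backups can differ widely here.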
Remember mysqldump, the original tool included in MySQL? The good thing about mysqldump is that you can actually read and even edit the output of the backup before potentially restoring data, which can prove interesting during development work.
mysqldump’s biggest drawback is that it is neither scalable nor fast when backing up large amounts of data. Additionally, restoring data is even slower, as you must replay the complete dataset on your new MySQL database servers (rebuilding indexes, heavy I/O, etc.).
mysqldump’s advantages include the convenience and flexibility of viewing or even editing the output before restoring. It gives you the ability to clone databases for development, and produce slight variations of an existing database for testing.
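As an illustration, a logical backup with mysqldump could be driven from a small script like the one below. This is a sketch under assumptions: the helper names, database name, and output path are hypothetical, and credentials are assumed to come from a MySQL option file rather than the command line.

```python
import subprocess
from typing import List


def build_mysqldump_command(database: str, output_file: str) -> List[str]:
    """Build a mysqldump invocation for a single database."""
    return [
        "mysqldump",
        "--single-transaction",  # consistent InnoDB snapshot without locking tables
        "--routines",            # include stored procedures and functions
        "--triggers",            # include triggers (already the default)
        f"--result-file={output_file}",
        database,
    ]


def run_dump(database: str, output_file: str) -> None:
    """Run the dump and raise CalledProcessError if mysqldump fails."""
    subprocess.run(build_mysqldump_command(database, output_file), check=True)


# Example with a hypothetical database name and path:
print(" ".join(build_mysqldump_command("shop", "/backups/shop.sql")))
```

Because the result is plain SQL, the output file can be inspected or edited before it is fed back into the `mysql` client.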
This tool is comparable to mysqldump. However, it dumps and restores in parallel, which provides significant benefits in backup and restoration time.
Binary backups refer to copies made of the entire MySQL dataset. Binary backups are typically faster compared to logical backups, especially on larger datasets. Several tools come to mind in these cases.
Percona XtraBackup
An open source binary backup solution for InnoDB. The good thing about XtraBackup is that it is non-locking when using MySQL with the InnoDB storage engine.
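A minimal sketch of how XtraBackup might be scripted is shown below. The helper names and directory paths are hypothetical. The `--backup`, `--prepare`, `--target-dir`, and `--incremental-basedir` options are part of the `xtrabackup` command line, but check them against the version you run.

```python
from typing import List, Optional


def build_backup_command(target_dir: str,
                         incremental_basedir: Optional[str] = None) -> List[str]:
    """Build a full backup command, or an incremental one when a base is given."""
    cmd = ["xtrabackup", "--backup", f"--target-dir={target_dir}"]
    if incremental_basedir is not None:
        # Only copy pages changed since the backup stored in incremental_basedir.
        cmd.append(f"--incremental-basedir={incremental_basedir}")
    return cmd


def build_prepare_command(target_dir: str) -> List[str]:
    """Build the prepare step that makes a full backup consistent for restore."""
    return ["xtrabackup", "--prepare", f"--target-dir={target_dir}"]


# Hypothetical directories:
print(" ".join(build_backup_command("/backups/full")))
print(" ".join(build_backup_command("/backups/inc1", "/backups/full")))
print(" ".join(build_prepare_command("/backups/full")))
```

The prepare helper covers only a full backup. Applying incremental backups to a base needs additional options and is left out of this sketch.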
MySQL Enterprise Backup
An InnoDB hot backup solution that is included in the MySQL Enterprise subscription.
These tools can offer you incremental and daily backups. However, they still don’t bring you point-in-time recovery. If your recovery point objective is very tight, you might need to store (back up) your binary logs externally and replay them on your restored database. Keep in mind that this replay potentially impacts your recovery time objective.
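Replaying binary logs on a restored database can be sketched as follows. This is an illustration under assumptions: the helper names, file names, position, and timestamp are hypothetical, and the `mysql` client is assumed to read its credentials from an option file. The `--start-position` and `--stop-datetime` options belong to `mysqlbinlog`.

```python
import subprocess
from typing import List, Optional


def build_replay_command(binlog_files: List[str],
                         stop_datetime: str,
                         start_position: Optional[int] = None) -> List[str]:
    """Build a mysqlbinlog command that decodes events up to stop_datetime."""
    cmd = ["mysqlbinlog"]
    if start_position is not None:
        # Position recorded at backup time, applied to the first binlog file.
        cmd.append(f"--start-position={start_position}")
    cmd.append(f"--stop-datetime={stop_datetime}")
    cmd.extend(binlog_files)
    return cmd


def replay_binlogs(binlog_files: List[str],
                   stop_datetime: str,
                   start_position: Optional[int] = None) -> None:
    """Pipe the decoded events into the mysql client on the restored server."""
    decode = subprocess.Popen(
        build_replay_command(binlog_files, stop_datetime, start_position),
        stdout=subprocess.PIPE,
    )
    subprocess.run(["mysql"], stdin=decode.stdout, check=True)
    decode.stdout.close()
    if decode.wait() != 0:
        raise RuntimeError("mysqlbinlog exited with an error")


# Hypothetical file names and timestamp:
print(build_replay_command(["binlog.000007", "binlog.000008"],
                           "2024-01-01 12:00:00", start_position=154))
```

The time spent in this replay step grows with the volume of changes since the last backup, which is why it counts against the recovery time objective.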
This concept is not a backup, but this technology might help you to recover your database and limit the recovery time significantly.
We’ve discussed having a business continuity requirement list, and some potential tools that might assist you in covering those requirements (at least on the MySQL level). One of the last important items is actual testing. Far too many companies need to recover data and only then notice that their backups are corrupted.
Make sure your organization tests its backups regularly. Are you sure they work properly? One way to exercise them is to run the regression tests for new code against a dataset restored from the backups.
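One simple form of restore testing is to compare what you expected to be in the backup with what actually came back. The sketch below is a hypothetical helper, not a complete verification: it assumes you recorded per-table row counts at backup time and collected the same counts from the restored server, for example with `SELECT COUNT(*)`.

```python
from typing import Dict, List


def find_restore_mismatches(expected: Dict[str, int],
                            restored: Dict[str, int]) -> List[str]:
    """Return a human-readable list of tables that differ after a restore."""
    problems = []
    for table, count in sorted(expected.items()):
        if table not in restored:
            problems.append(f"{table}: missing after restore")
        elif restored[table] != count:
            problems.append(
                f"{table}: expected {count} rows, found {restored[table]}"
            )
    return problems


# Hypothetical counts recorded at backup time vs. counts on the restored server.
expected_counts = {"customers": 1200, "orders": 56000, "invoices": 300}
restored_counts = {"customers": 1200, "orders": 55990}

for problem in find_restore_mismatches(expected_counts, restored_counts):
    print(problem)
```

Matching row counts do not prove the data is intact, so treat this as a first smoke test ahead of running application-level tests on the restored set.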
If you make sure you trust your backups, you might sleep better at night!