We've already discussed hitting capacity in MySQL, sharding complications and complexities, and flexing database capacity up and down. This time, we'll be discussing achieving high availability in MySQL.
When discussing high availability, the following questions usually come up first: Do you need it? Why?
Great questions. If your system is a transactional one with high value/high volume transactions that have significant financial impact, then downtime and slowdown are not acceptable and your application needs to be protected against these issues. These kinds of workloads include e-commerce, financial/trading (including ad tech and mar-tech automatic ad-purchasing systems), real-time decision support systems, and healthcare. If these systems suffer outages, people’s jobs or even lives can be on the line.
This begs the question of whether these systems should be fault tolerant, i.e. designed to handle multiple simultaneous failures without any downtime at all. Fault tolerant systems exist but aren’t usually deployed in MySQL architectures due to their very high cost. Instead, HA in MySQL usually focuses on “minimizing downtime,” with reliability targeting 5x 9s, or five minutes of downtime per year.
Figure: Availability by the 9s.
How is HA accomplished with MySQL? Typically via replication strategies.
Replication for Highly Available MySQL Systems
HA in the world of MySQL is usually replication-based, leveraging redundant servers.
This means for every production server, there is at least one additional server, containing the same data, that is ready to be switched to if the primary goes offline.
The typical method for this is via replication, specifically:
- Read slaves: Add one or more read slaves to the master via MySQL replication. The master could be a single node or a shard.
- Master/master: Add a secondary master to the master via MySQL replication. The master could be a single node or a shard.
- Certification replication: Add one or more masters to the master via certification replication. The master could be a single node or a shard.
- Group replication: Add one or more masters to the master via group replication. The master could be a single node or a shard.
Each of these replication strategies has an associated (balance) between latency, performance, and consistency. What level of consistency is necessary between nodes in your workload? And does that have an effect on performance?
MySQL HA Approaches per Deployment Architecture
Here is a grid summarizing each of the typical MySQL deployments (single node, read slaves, master/master, and sharding), and the ramifications of the four main replication approaches. This summary should help you decide which is best for your workload:
ClustrixDB is a shared-nothing, fully distributed MySQL-compatible clustered database, providing high availability by default.
How does this work?
- There are by default two complete instances of the data distributed across the cluster.
- All nodes provide linear read and write scale.
- All nodes are strongly consistent with each other at the transactional level.
- By default, you can lose a single node and the system will lose no data.
- You can add additional server redundancy by setting the nResiliency level, up to the Paxos limit. (You can lose up to one less than half the nodes in the cluster to avoid “split brain.”)
Click here to see a full recording of the final installment of the series.