Configuring SQL Server for High Availability in the Cloud
If you want SQL Server running in the cloud with a guarantee of 99.95% or greater availability, take extra steps to ensure SQL Server itself remains available.
Join the DZone community and get the full member experience.Join For Free
Cloud service providers offer SLAs guaranteeing availability of 99.95% and higher. That number might prompt one to think the cloud ideally suited for a SQL Server deployment that requires high availability (HA). Given the geographic distribution of Azure and AWS data centers, one might even think the cloud perfect for a SQL Server deployment configured with disaster recovery (DR) in mind.
But let’s rethink this.
The cloud service SLAs only guarantee cloud infrastructure. They do not guarantee the availability of the applications running on that infrastructure. If you want SQL Server running in the cloud with a guarantee of 99.95% or greater availability, you need to take extra steps to ensure that SQL Server itself remains available. This article looks at your options.
You may also like: Achieving SQL Server HA/DR With a Mix of Always On AGs and SANless SQL Server Failover Cluster Instances
Infrastructure Availability Guarantees: Important but Insufficient
Two options for HA and DR infrastructure are important to understand: Availability Zones and Availability Sets.
The Availability Set option — only available in the Azure cloud — distributes two or more virtual machines (VMs) into separate racks in the same data center. Microsoft guarantees that at least one of the VMs will be available at least 99.95% of the time.
The Availability Zone option — available in AWS as well as in some Azure regions — puts VMs in separate data centers, not just separate racks. This enables providers supporting Availability Zones to guarantee that at least one of the VMs will be available at least 99.99% of the time.
But what if a VM in either topology goes down and takes SQL Server with it? A VM in another rack, or in another data center, will still be available. However, the infrastructure SLAs provide no guarantee that data from one VM will be replicated to another. If the instance of SQL Server running on that second VM does not have a copy of the data most recently used by the primary SQL Server instance, or there is no automated way to redirect connections to the remaining node, all the end users that had been interacting with SQL Server are going to experience a complete loss of service.
As I said, infrastructure guarantees are important but insufficient.
Ensuring the Availability of SQL Server in the Cloud
Traditional SQL Server deployments configured for HA or DR have relied on SQL Server Failover Cluster Instances (FCIs) or SQL Server Always On Availability Groups. Both options are viable in the cloud, but with caveats.
Traditional SQL Server FCIs rely on shared storage, which is not available in the cloud. The only way to get around this, barring the use of a third-party tool, is to use the Storage Spaces Direct feature first introduced in Windows Server 2016 Enterprise Edition. Storage Spaces Direct creates a virtual storage area network (SAN) by pooling locally attached storage across two or more servers in a cluster. That virtual SAN can be used by a SQL Server FCI in place of shared storage and can be accessed by any of the VMs in the cluster.
The caveats? Storage Spaces Direct can work with servers configured in an Availability Set but not with servers configured in an Availability Zone. That limitation has two important ramifications for HA and DR: Availability Sets are only available in the Azure cloud. And, the infrastructure SLA for Availability Sets tops out at 99.95% because all the VMs in the set reside in a single data center. That limits the utility of this approach when it comes to DR. Finally, Storage Spaces Direct requires SQL Server 2016 or later, so it’s not a viable option if you’re using an earlier edition of SQL Server.
Unlike Storage Spaces Direct, an Always On Availability Group can span geographically distinct data centers, which means it is fully supported in both AWS and Azure. That boosts the infrastructure availability SLA up to 99.99%. The Always On Availability Group approach also monitors SQL Server availability and replicates data from the primary instance to the secondary instance(s). If the primary SQL Server instance fails, the secondary instance takes over using the data that has been replicated to it. Additional instances in remote regions can be replicated asynchronously, enabling the creation of DR configurations that protect against the failure of an entire cloud region.
Unfortunately, this approach also has caveats. Always On Availability Groups replicate only the user-defined databases. The master database (MSDB), the database of agent jobs, logins, and passwords — none of these are replicated to the secondary VMs. None of these are protected in the event of a catastrophic failure. Always On Availability Groups also requires SQL Server 2012 Enterprise Edition or later, which is expensive and which may not be compatible with your application. Finally, this approach has not been tested beyond 100 SQL Server databases or 10 AGs. Your mileage may vary depending upon your particular configuration.
The State of HA and DR Solutions Today
If you want to run SQL Server in the cloud with true HA and DR support, your options are limited — both by the narrowness of the options built into Windows Server and cloud infrastructure constraints imposed by Azure and AWS. The Always On Availability Group approach comes closest to delivering the kind of HA and DR support you’ve come to expect from an on-premises HA/DR solution.
However, implementation in the cloud requires a pricy version of SQL Server and does not replicate all your databases. The Storage Spaces Direct approach may be sufficient for those use cases that do not require availability above the 99.95% level. Still, if you’re using SQL Server 2016 or later and that level of availability meets your needs, this may be an approach that works for you.
How to Configure a SQL Server 2008 R2 Failover Cluster Instance in Azure
Opinions expressed by DZone contributors are their own.
Designing a New Framework for Ephemeral Resources
Building the World's Most Resilient To-Do List Application With Node.js, K8s, and Distributed SQL
RAML vs. OAS: Which Is the Best API Specification for Your Project?
Send Email Using Spring Boot (SMTP Integration)