Using Amazon FSx for SQL Server Failover Cluster Instances

In this article, see what you need to know about using Amazon FSx for SQL Server failover cluster instances.

By David Bermingham · Jan. 12, 21 · Opinion
Intro

If you are considering deploying your own Microsoft SQL Server instances in AWS EC2, you have some decisions to make regarding the resiliency of the solution. Sure, AWS will offer you a 99.99% SLA on your compute resources if you deploy two or more instances across different availability zones. But don't be fooled: there are a lot of other factors you need to consider when calculating your true application availability. I recently blogged about how to calculate your application availability in the cloud, and you should probably have a quick read of that article before you move on.

When it comes to ensuring your Microsoft SQL Server instance is highly available, it really comes down to two basic choices: an Always On Availability Group (AG) or a SQL Server Failover Cluster Instance (FCI). If you are reading this article, I assume you are well aware of both of these options and are seriously considering a SQL Server FCI instead of a SQL Server Always On AG.

Benefits of a Microsoft SQL Server Failover Cluster Instance

The following list summarizes what AWS says are the benefits of a SQL Server FCI:

Challenges with FCI in the Cloud

Of course, the challenge with building an FCI that spans availability zones is the lack of the shared storage device that is normally required when building a SQL Server FCI. Because the nodes of the cluster are distributed across multiple datacenters, a traditional SAN is not a viable option for shared storage. That leaves us with two choices for cluster storage: third-party storage solutions like SIOS DataKeeper, or the new Amazon FSx. Let's take a look at what you need to know before you make your choice.

Buyer Beware!

Before you decide to use FSx, you must take the following limitations into consideration.

Service Level Agreement

As I wrote in my article on how to calculate your application availability, your overall application SLA is only as good as your weakest link. In this case, the FSx SLA of 99.9% is your weakest link!

Normally 99.99% availability represents the starting point of what is considered "highly available". This is what AWS promises you for your compute resources when two or more are deployed in different availability zones.

In case you didn't know the difference between three nines and four nines...

  • 99.9% availability allows for 43.83 minutes of downtime per month
  • 99.99% availability allows for only 4.38 minutes of downtime per month
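The downtime figures above come from multiplying the allowed unavailability by the length of an average month. This sketch assumes an average month of 365.25 / 12 days (about 730.5 hours), the convention that reproduces the numbers above; an SLA document may define its measurement period differently.

```python
# Average month length in minutes, assuming 365.25 days per year split over 12 months.
MINUTES_PER_MONTH = 365.25 / 12 * 24 * 60  # 43830.0


def downtime_minutes_per_month(availability: float) -> float:
    """Return the downtime allowed per month for an availability given as 0-1."""
    return (1.0 - availability) * MINUTES_PER_MONTH


for sla in (0.999, 0.9999):
    print(f"{sla:.2%} -> {downtime_minutes_per_month(sla):.2f} minutes/month")
```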

By hosting your cluster storage on FSx, you effectively negate the benefit of your 99.99% compute availability, leaving you with at best 99.9% overall application availability.
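When every component must be up for the application to be up, the component availabilities multiply, so the result can never exceed the weakest component. A minimal sketch, assuming independent failures and using the two SLA figures quoted above:

```python
from math import prod


def serial_availability(*components: float) -> float:
    """Availability of a chain where every component must be up.

    Assumes component failures are independent.
    """
    return prod(components)


compute = 0.9999  # multi-AZ EC2 compute SLA quoted in the article
fsx = 0.999       # FSx SLA quoted in the article
overall = serial_availability(compute, fsx)
print(f"{overall:.4%}")  # slightly below the 99.9% weakest link
```

The product lands a little under 99.9%, which is why the weakest link is an upper bound rather than the exact figure.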

In contrast, with a solution like SIOS DataKeeper, you would have to experience a simultaneous failure of two EBS volumes in two different availability zones before you experienced downtime. Assuming a single EBS volume has a 99.9% SLA and that the two volumes fail independently, the probability that at least one of the EBS volumes is online at any given time is 99.9999%:

1 - (.001 * .001) = .999999
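The same calculation generalizes to any number of redundant copies: the service is down only if every copy is down at once. This sketch assumes independent failures across AZs, which is the assumption behind the 99.9999% figure; correlated failures would lower it.

```python
from math import prod


def parallel_availability(*components: float) -> float:
    """Availability when the service survives as long as at least one component is up.

    Assumes component failures are independent.
    """
    return 1.0 - prod(1.0 - a for a in components)


# Two EBS volumes in different AZs, each assumed to have a 99.9% SLA.
print(f"{parallel_availability(0.999, 0.999):.4%}")
```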

Costs

Assuming the dismal SLA of FSx didn't scare you away, let's take a close look at the costs associated with the solution compared to the SIOS DataKeeper solution. Your costs will vary greatly depending on your requirements, but once you determine the amount of storage and the speed and latency you hope to achieve, AWS has a handy calculator that you can use to compare the solutions. In the example below, I provisioned what I consider to be pretty typical of what I see in the real world. Of course, if you opt for the DataKeeper and EBS solution, you have to add the cost of DataKeeper, but even the most expensive pay-as-you-go option ($0.50/hour * 2 nodes * 730 hours = $730/month) still puts the solution at ~60% less than a comparable FSx solution.
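The DataKeeper figure reads as an hourly license rate multiplied by the number of cluster nodes and by the roughly 730 hours in a month. The sketch below reproduces that arithmetic and adds a percentage-savings helper. The function and parameter names are illustrative, and the storage estimates you feed it should come from the AWS pricing calculator for your own configuration; no FSx or EBS prices are assumed here.

```python
HOURS_PER_MONTH = 730  # approximation used in the article's arithmetic


def license_cost_per_month(hourly_rate: float, nodes: int,
                           hours: float = HOURS_PER_MONTH) -> float:
    """Pay-as-you-go license cost for all cluster nodes over one month."""
    return hourly_rate * nodes * hours


def percent_savings(candidate: float, baseline: float) -> float:
    """How much cheaper `candidate` is than `baseline`, as a percentage of `baseline`."""
    return (baseline - candidate) * 100.0 / baseline


datakeeper_license = license_cost_per_month(hourly_rate=0.50, nodes=2)
print(datakeeper_license)  # 730.0

# To compare, add datakeeper_license to your EBS estimate and use the FSx
# estimate from the AWS pricing calculator as the baseline, e.g.:
# percent_savings(ebs_estimate + datakeeper_license, fsx_estimate)
```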



Storage Location

When configuring FSx for high availability, you will want to enable multi-AZ support. By enabling multi-AZ, you effectively have a "preferred" AZ and a "standby" AZ. When you deploy your SQL Server FCI nodes, you will want to distribute those nodes across the same AZs.

In normal situations, you will want to make sure the active cluster node resides in the same AZ as the preferred FSx storage node. This minimizes the distance and latency to your storage, and it also minimizes the costs associated with data transfer across AZs. As specified in the FSx price guide, "Standard data transfer fees apply for inter-AZ or inter-region access to file systems."

Unfortunately, there is currently no way to tie storage and compute together so that when one fails over, the other fails over with it, keeping latency low and avoiding additional charges for accessing the data. Currently, the cost for transferring data across AZs, both ingress and egress, is $0.01/GB.
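To get a feel for how the inter-AZ charge adds up, the sketch below estimates a monthly bill from the volume of storage traffic that crosses AZs. It assumes the $0.01/GB rate quoted above is charged in each direction, so each GB that crosses is billed once as egress and once as ingress; that reading of "both ingress and egress" is an assumption, so check current AWS data transfer pricing. The 5,000 GB traffic figure is hypothetical.

```python
RATE_PER_GB_PER_DIRECTION = 0.01  # USD, the inter-AZ rate quoted in the article


def cross_az_transfer_cost(gb_transferred: float, same_az: bool,
                           rate: float = RATE_PER_GB_PER_DIRECTION) -> float:
    """Estimate monthly inter-AZ data transfer cost for storage traffic.

    Assumes each GB that crosses AZs is billed on the sending side (egress)
    and on the receiving side (ingress). There is no inter-AZ charge when
    the active node and the active FSx file server share an AZ.
    """
    if same_az:
        return 0.0
    return gb_transferred * rate * 2


# Hypothetical: 5,000 GB of SQL Server reads and writes in a month after the
# active node and the FSx file server end up in different AZs.
print(cross_az_transfer_cost(5_000, same_az=False))
```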

Without keeping a close eye on the state of your FSx file system and SQL Server FCI, you may not even be aware that they are running in different AZs until you notice additional latency or get an unexpected data transfer charge at the end of the month.

In contrast, in a configuration that uses SIOS DataKeeper, the storage failover is part of the SQL Server FCI recovery, ensuring that the storage always fails over with the SQL Server instance. This ensures SQL Server always reads from and writes to the EBS volumes that are directly attached to the active node.

Controlling Failover

In an FSx multi-AZ configuration, there is a preferred availability zone and a standby availability zone. Should the FSx file server in the preferred availability zone fail, the file server in the standby AZ takes over. AWS reports that this recovery takes about 30 seconds.

Unfortunately, a 30-second storage outage could also cause the SQL Server isAlive resource check to fail if the check runs while a storage failover is occurring. It's a little hit or miss, since the isAlive check is scheduled to run every 60 seconds: the check may fall outside the outage window, or it may not.
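How "hit or miss" is it? If you model the isAlive check as an instantaneous probe every 60 seconds and assume the outage is equally likely to start at any moment, an outage of D seconds overlaps a check with probability min(D / 60, 1). This is a simplification: real checks take time, and the cluster's restart and failover policies decide what happens after a failed check. The sketch computes the closed form and confirms it by sweeping start times.

```python
def overlap_probability(outage_s: float, check_interval_s: float = 60.0) -> float:
    """Chance that an outage of `outage_s` seconds contains at least one check.

    Models checks as instantaneous probes at multiples of `check_interval_s`
    and the outage start time as uniformly distributed.
    """
    return min(outage_s / check_interval_s, 1.0)


def overlap_probability_sweep(outage_s: float, check_interval_s: float = 60.0,
                              steps: int = 6000) -> float:
    """Numerical cross-check: sweep evenly spaced outage start times over one interval."""
    hits = 0
    for i in range(steps):
        start = i * check_interval_s / steps
        # First check at or after `start`; checks occur at 0, T, 2T, ...
        next_check = check_interval_s if start > 0 else 0.0
        # The outage covers [start, start + outage_s).
        if next_check < start + outage_s:
            hits += 1
    return hits / steps


print(overlap_probability(30))  # 0.5
```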

To make matters worse, FSx multi-AZ has automatic failback enabled, meaning that for every unplanned failover of FSx, you also have to deal with an unplanned failback, doubling your unplanned downtime. In contrast, when a SQL Server FCI experiences an unplanned failover, you would typically either leave it running on the secondary or schedule a failback after hours or during the next maintenance period.
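Under the same simplified model, automatic failback means each unplanned FSx incident produces two storage interruptions instead of one. The sketch below assumes the roughly 30-second interruption quoted above applies to both the failover and the failback, and that each interruption lines up with the 60-second check independently of the other. Under those assumptions it shows how failback raises both the total interruption time and the chance that at least one interruption is caught by an isAlive check.

```python
from typing import Tuple


def incident_impact(interruption_s: float = 30.0, check_interval_s: float = 60.0,
                    auto_failback: bool = True) -> Tuple[float, float]:
    """Return (total interruption seconds, chance at least one check fails) per incident.

    Assumes each interruption overlaps an instantaneous check with probability
    min(interruption / interval, 1), independently of the other interruption.
    """
    events = 2 if auto_failback else 1
    p_single = min(interruption_s / check_interval_s, 1.0)
    total_s = events * interruption_s
    p_any = 1.0 - (1.0 - p_single) ** events
    return total_s, p_any


print(incident_impact(auto_failback=False))  # (30.0, 0.5)
print(incident_impact(auto_failback=True))   # (60.0, 0.75)
```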

If you WANT to initiate a planned switchover of the FSx file server, there is no easy button or command to run. Instead, there is a workaround: change the throughput capacity of the FSx file system, which causes the FSx service to fail over to the standby node. To move it back, you once again have to change the throughput to cause a failback.


https://docs.aws.amazon.com/fsx/latest/WindowsGuide/high-availability-multiAZ.html
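The throughput workaround can be scripted with the FSx `UpdateFileSystem` API. This is a minimal sketch with boto3, assuming a multi-AZ FSx for Windows File Server file system. The function name is hypothetical, the new throughput value must be one FSx supports and must differ from the current one, and you should confirm against the AWS documentation linked above that a throughput change triggers the failover in your setup. The client is passed in so the function can be exercised without AWS credentials.

```python
from typing import Any, Dict


def force_fsx_switchover(fsx_client: Any, file_system_id: str,
                         new_throughput_mbps: int) -> Dict[str, Any]:
    """Request a throughput capacity change on an FSx for Windows file system.

    The article describes this change as a workaround that makes a multi-AZ
    file system fail over to its standby file server. `fsx_client` is expected
    to be a boto3 FSx client, i.e. boto3.client("fsx").
    """
    response = fsx_client.update_file_system(
        FileSystemId=file_system_id,
        WindowsConfiguration={"ThroughputCapacity": new_throughput_mbps},
    )
    # The response describes the file system, including its Lifecycle state.
    return response["FileSystem"]
```

For example, with boto3 installed and credentials that allow `fsx:UpdateFileSystem`, call it with `boto3.client("fsx")`, your file system ID, and a new throughput value; run it again with the original value to move the file server back.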


SQL Server Analysis Services Cluster Not Supported With FSx

If you want to include SSAS in your cluster, I'm afraid you will not be able to use FSx. The How to Cluster SQL Server Analysis Services white paper clearly states that SMB cannot be used and that clustered drives with drive letters must be used. In contrast, the DataKeeper Volume resource presents itself as a clustered disk and can be used with SSAS.

Network Saturation

When using SMB storage, every read and write has to go across the network. All of this traffic competes with client access traffic and counts toward the overall EC2 instance network utilization. Each EC2 instance size has a cap on how much network bandwidth is allocated to it, and the bigger the instance size, the more bandwidth it is allocated. You must be sure that the combination of storage traffic and client traffic does not exceed the network bandwidth allocated to your EC2 instance type. In some scenarios, you may be forced to increase your instance size to accommodate the extra traffic associated with SMB storage. To complicate matters, network throughput on some EC2 instance sizes is not guaranteed: it is listed as "up to" a certain amount, meaning that amount is the ceiling, but at certain times the instance may have access to less than the specified maximum.
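A back-of-the-envelope check can tell you whether an instance has headroom for SMB storage traffic. The sketch below uses entirely hypothetical numbers: it adds the expected storage and client traffic and compares the total to a fraction of the instance's listed bandwidth. The `usable_fraction` factor is an assumption standing in for the fact that "up to" figures are a ceiling, not a guarantee. Look up the actual figure for your instance type in the EC2 documentation.

```python
def bandwidth_headroom_gbps(instance_gbps: float, storage_gbps: float,
                            client_gbps: float, usable_fraction: float = 1.0) -> float:
    """Remaining network bandwidth after storage and client traffic.

    `usable_fraction` < 1.0 discounts instance types whose bandwidth is only
    listed as "up to" a value. A negative result means the instance is undersized.
    """
    return instance_gbps * usable_fraction - (storage_gbps + client_gbps)


# Hypothetical: a 10 Gbps "up to" instance, budgeting only half of it as sustained.
headroom = bandwidth_headroom_gbps(10.0, storage_gbps=4.0, client_gbps=2.0,
                                   usable_fraction=0.5)
print("OK" if headroom >= 0 else f"short by {-headroom:.1f} Gbps")  # short by 1.0 Gbps
```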

Summary

While FSx certainly can make sense for typical SMB uses like Windows user files and other non-critical services, its less-than-stellar SLA of only 99.9% falls far short of the 99.99% SLA commonly considered the baseline for high availability. The reason you build a SQL Server FCI that spans availability zones is to achieve a 99.99% availability SLA. As soon as you attach FSx storage as a dependency of the cluster, your 99.99% SLA goes out the window, and your cluster now has an SLA of 99.9%, or almost 44 minutes of downtime per month.


Published at DZone with permission of David Bermingham. See the original article here.

Opinions expressed by DZone contributors are their own.
