Significant disruptions and performance issues teach us harsh lessons. The public cloud would seem to be the catch-all answer for these challenges. The reality is that the public cloud solves specific challenges in ways that have become widely embraced, but plenty of shortcomings and challenges remain.
When is one cloud not enough? When we design our on-premises environments, we are bound by limiting factors that prevent true platform diversity at the underlying infrastructure layers. When we move to the public cloud, that is the right time not just to solve for deployment velocity and scalability, but to think bigger at inception.
Designing for cloud-native environments should really be the first step on a journey to a multi-cloud architecture. Here are a few reasons why this is important and should be on the minds of every application and cloud infrastructure architect.
Multi-Cloud Advantage #1: Cloud Diversity
No matter how resilient the services that back the public clouds are, they fail. That’s a fact. Since there is more than a zero percent chance of failure (and there is), we should all be architecting for diversity in those services to ensure business continuity during outages.
This doesn’t have to mean active-active deployments across both Google Cloud and AWS; it could be an active-passive deployment. Look for every opportunity to architect so that services which can be diverse across clouds are deployed that way. If one of the underlying services goes away temporarily, you can resume service on another route and service in another cloud. This design is about availability more than recoverability: presenting the service in one form or another. Performance is secondary in this part of the architecture, but that doesn’t make it unimportant.
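The active-passive pattern above can be sketched as an ordered health check: probe the active endpoint first, and fall back to the passive cloud only when it fails. This is a minimal sketch; the endpoint URLs and the flat HTTP-200 health criterion are illustrative assumptions, and real deployments would typically do this at the DNS or load-balancer layer.

```python
import urllib.request

def http_probe(url, timeout=2.0):
    """Health probe: treat an HTTP 200 within the timeout as healthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False  # unreachable or timed out

def first_healthy(endpoints, probe=http_probe):
    """Return the first endpoint that passes its health check, in priority order.

    The list is ordered active-first, so the passive cloud is only used
    when the active one fails its probe.
    """
    for url in endpoints:
        if probe(url):
            return url
    return None  # total outage across every cloud

# Hypothetical endpoints for the same service on two clouds (names are
# illustrative, not real services):
ENDPOINTS = [
    "https://svc.primary-cloud.example.com/healthz",    # active, e.g. AWS
    "https://svc.secondary-cloud.example.com/healthz",  # passive, e.g. Google Cloud
]
```

The `probe` parameter is injected so the failover order can be exercised without live endpoints, which also makes the routing decision testable on its own.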
Multi-Cloud Advantage #2: Application Resiliency
Resiliency can be built in across availability zones, and beyond that with a multi-region strategy. Again, this is great when the disruption to the cloud provider is localized. For many applications, though, we must think bigger. Can you withstand an outage that lasts for an extended period during a cloud service disruption? When we architect for application-level resiliency, we attack this constraint head on.
The ideal way to architect beyond the region is to architect beyond the cloud itself. You may stay regional to satisfy latency constraints for the consumers of the service, so why not do the same on a second cloud and build in the application resiliency to survive an entire cloud service outage? This may mean scaling back cross-region replication and deployments in favor of a more minimal configuration across cloud platforms. After all, even a cross-region deployment within a single cloud provider can suffer an outage.
Multi-Cloud Advantage #3: Price Diversity
There is a price war among the cloud providers. Why not be on the right side of the battlefield whenever you can? The only way to truly take advantage of the pricing war is to make your services deployable anywhere that a price advantage aligns with service availability.
Long-running workloads? You may want them on Google Cloud, which applies automatic sustained-use discounts to any workload running beyond a certain number of hours per month. Need the best database offering? You may find that a distributed database platform on one cloud provider and a native DBaaS (Database-as-a-Service) on another gives a better mix of resiliency and price.
Every layer of the cloud architecture has its own pricing shifts and competitive drivers, and they are there to be exploited if we use the right methods.
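The sustained-use comparison above can be made concrete with a small sketch. The hourly prices below are illustrative placeholders, and the flat 30% discount is a deliberate simplification of Google's real tiered sustained-use discounts; actual pricing varies by instance type and region and changes frequently.

```python
# Illustrative hourly prices per provider (not real, current rates).
HOURLY_PRICE = {"gcp": 0.0475, "aws": 0.0464}

FULL_MONTH_HOURS = 730  # approximate hours in a month

def monthly_cost(provider, hours, sustained_use_discount=0.30):
    """Effective monthly cost for a workload running `hours` per month.

    Applies a flat discount on GCP once the workload runs the full month,
    standing in for the real tiered sustained-use discount structure.
    """
    cost = HOURLY_PRICE[provider] * hours
    if provider == "gcp" and hours >= FULL_MONTH_HOURS:
        cost *= 1 - sustained_use_discount
    return cost

def cheapest(hours):
    """Pick the provider with the lowest effective cost for this runtime."""
    return min(HOURLY_PRICE, key=lambda p: monthly_cost(p, hours))
```

With these made-up numbers, a short-lived workload lands on the provider with the lower raw hourly rate, while an always-on workload flips to the provider with the usage discount: exactly the price-diversity decision the section describes.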
The idea of a completely diverse cloud portfolio with price and service resiliency across multiple providers may sound like the stuff of fairy tales. Stop for a moment and think about what you’re doing today in your network infrastructure. Many organizations already use diverse network designs and diverse providers across primary and backup links, ensuring they survive the failure of any single provider.
Once we solve application resiliency within an environment, we can stretch beyond the boundaries of any single service provider.
Challenge #1: Tooling and Deployment Process
Processes for deploying and managing your application infrastructure differ widely across clouds, which makes for an interesting barrier to the multi-cloud approach. The solution is to look to platforms and products that offer the right abstraction, allowing you to deploy and control your application workloads regardless of the underlying cloud provider.
There are CI/CD environments that are cross-cloud (e.g. Jenkins, Travis, CircleCI). There are deployment and provisioning products with providers across clouds (e.g. Terraform, Ansible, Puppet, Chef). This is why we designed Turbonomic at a higher level of abstraction: to provide control across infrastructure rather than risk being locked into single-provider infrastructure solutions.
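The abstraction these tools provide can be illustrated with a minimal sketch: one deployment interface, one implementation per cloud. The class and method names here are hypothetical, not any real product's API; in practice a tool like Terraform plays this role through its per-cloud providers.

```python
from abc import ABC, abstractmethod

class CloudDeployer(ABC):
    """Cloud-agnostic deployment interface (illustrative, not a real API)."""

    @abstractmethod
    def deploy(self, app: str, region: str) -> str:
        """Deploy `app` to `region` and return a deployment identifier."""

class AwsDeployer(CloudDeployer):
    def deploy(self, app, region):
        # Real code would call the AWS SDK or CloudFormation here.
        return f"aws:{region}:{app}"

class GcpDeployer(CloudDeployer):
    def deploy(self, app, region):
        # Real code would call the Google Cloud deployment APIs here.
        return f"gcp:{region}:{app}"

def deploy_everywhere(app, targets):
    """Run the same logical deployment against each configured cloud."""
    return [deployer.deploy(app, region) for deployer, region in targets]
```

Because the calling code only sees `CloudDeployer`, adding a third provider is a new subclass rather than a rewrite of the deployment process, which is the flexibility the multi-cloud approach depends on.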
Multi-cloud provisioning will mean extra work to keep your application deployment frameworks as open as possible. Once you build the process around multi-cloud, you gain flexibility, but the up-front effort is much higher, which often drives people to stay with a single-cloud architecture simply because it’s the shortest path.
Challenge #2: Sprawl and Performance
If you haven’t already lost sight of how many cloud workloads you have today, you will soon. Sprawl is a challenge all its own: a vastly spread-out set of applications and supporting services spanning availability zones, regions, and now cloud providers.
Challenge #3: Cost
What is the price of resiliency? There are three problems that we have when we talk about cost:
Knowing the price per resource
Knowing the cost per resource
Knowing the cost of the entire portfolio
You didn’t misread it: cost and price appear as different line items because I view them as two different things. Price per resource is the amount you spend on a per-hour basis to operate that resource. The “cost” per resource is the real spend measured against what you are getting in return. It may be a semantic difference, but I treat cost as the price of performance and overhead rather than the raw price itself. Context matters once we introduce application performance into the mix.
Another challenge is the overall portfolio cost. This is a challenge when you have multiple cloud providers and need a way to understand the total cost of the application as it stands across every instance and service that supports it.
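Computing that portfolio view reduces to rolling per-resource spend up to the application, across every provider it touches. A minimal sketch, with made-up application names and dollar figures standing in for real billing data:

```python
from collections import defaultdict

# Hypothetical spend records: (application, provider, monthly cost in USD).
# Real data would come from each provider's billing export.
SPEND = [
    ("checkout", "aws", 1200.0),
    ("checkout", "gcp", 450.0),   # passive standby on the second cloud
    ("reporting", "gcp", 300.0),
]

def portfolio_cost(records):
    """Total monthly cost per application, summed across every provider."""
    totals = defaultdict(float)
    for app, _provider, cost in records:
        totals[app] += cost
    return dict(totals)
```

The point of the rollup is that the standby deployment on the second cloud shows up inside the application's total, so the price of resiliency is visible as a line item rather than hidden in a second provider's bill.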
The price of resiliency is like insurance. You’re glad to have paid for it when you need it, but resent paying when it doesn’t appear necessary. When your customers and employees suddenly lose access to services and applications for an hour, or many hours, you’ll wish you had that insurance in place.
Architecting a multi-cloud environment is a challenge that is well worth the effort, in my humble opinion. And if you don’t know how much effort and cost it would incur, you will never be able to truly defend a decision not to architect it that way.