Hybrid Cloud Performance Is More Than a Balancing Act
Let's take a look at the challenges and tradeoffs you need to consider to get the most out of your multi-cloud environment.
Today’s IT environments are growing at a rapid pace, not only in size but in complexity. All of the buzzwords rolling around will have you and your CIO reeling when the focus should be on solving the true business challenges.
Having worked in IT environments of every shape and size, I can tell you that I continually found myself returning to the same key questions:
How do we assure workload performance?
How do we manage infrastructure capacity?
How do we size applications for the public cloud to get optimal performance for the lowest cost?
How do we maintain compliance for data sovereignty?
How do we maintain license compliance for workloads such as Oracle and Microsoft SQL?
How do we confidently scale the infrastructure for business continuity AND performance AND lowest cost?
How do we do all of this simultaneously without relying on “human middleware”?
Moreover, we are facing increasing pressure to deliver better results with the same or fewer resources. Sound familiar? This rolls up to what I call the triad, which is the Turbonomic approach: assuring performance, while maintaining compliance for your workloads, and while utilizing the hybrid cloud infrastructure as efficiently as possible to achieve the lowest possible cost.
Doing so with an autonomic engine, one that creates a self-managing, self-regulating hybrid cloud environment, means this can be done in software, reducing the need for your team to deal with technical debt.
Performance Is Built on a Minimum of Four Infrastructure Pillars
One of the challenges in today’s conversations around workload performance is that they tend to revolve around three common infrastructure factors: processor (CPU/vCPU), memory, and storage (capacity, IOPS, latency). The problem is that this is a three-legged stool; it leaves out the extremely important fourth pillar, network (bandwidth, latency, capability), which must be considered continuously.
Solving the challenge of performance requires assuring the optimal combination of all four of these, continuously and in real time; otherwise, we risk making decisions without all of the intelligence needed to deliver on the applications' SLAs. The desired state of your infrastructure must span every layer, from the applications down to the physical and virtual infrastructure.
To top things off, there is a fifth factor that becomes even more important: application-level Quality of Service (QoS), which defines the application's Service Level Objective, or SLO. Again, the only way to assure performance is to have a complete understanding of application demand, including application-level QoS, down to the virtual and physical infrastructure.
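To make the idea concrete, the check across all four pillars plus application-level QoS might be sketched as follows. This is a minimal illustration, not Turbonomic's actual model; the names, thresholds, and the 200 ms SLO are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class WorkloadMetrics:
    # The four infrastructure pillars plus application-level QoS
    cpu_util: float           # fraction of vCPU capacity in use
    mem_util: float           # fraction of memory in use
    storage_latency_ms: float # storage I/O latency
    net_latency_ms: float     # network latency
    response_time_ms: float   # application-level QoS measurement

def violates_slo(m: WorkloadMetrics, slo_ms: float = 200.0) -> bool:
    """A workload breaches its objective if the app-level response time
    exceeds the SLO, or if any underlying pillar is saturated."""
    saturated = m.cpu_util > 0.9 or m.mem_util > 0.9
    slow_io = m.storage_latency_ms > 20 or m.net_latency_ms > 10
    return m.response_time_ms > slo_ms or saturated or slow_io

print(violates_slo(WorkloadMetrics(0.5, 0.6, 5, 2, 150)))   # False: healthy
print(violates_slo(WorkloadMetrics(0.95, 0.6, 5, 2, 150)))  # True: CPU saturated
```

The point of the sketch is that the app-level QoS signal and the infrastructure pillars are evaluated together; looking at either alone can declare a workload healthy when it is not.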
Delivering Performance and Scalable Capacity Management
These four infrastructure pillars are challenging enough to manage given the dynamic, fluctuating nature of most application workloads. It has already been proven that this cannot be managed at human scale, as shown by the many organizations that have leveraged Turbonomic to solve it for them.
What about when we simply don’t have enough supply of resources to meet application demand? That is when we have to scale the underlying infrastructure. Scaling actions also need to be made in the context of real-time demand and all of your compliance requirements for application availability, while assuring workload performance and maintaining the lowest-cost infrastructure.
The Many Faces of IT and Business Policy Compliance
The safety of your data and the associated workloads is important to you and your customers. You may find yourself in an environment bound by regulatory restrictions on data locality, or facing many other IT compliance challenges that become exceedingly difficult to control and manage while also tackling performance. Using Turbonomic as an example, look no further than its policies: either bring in the existing placement (affinity and anti-affinity) policies from your current environment, or create more effective, truly dynamic policies to maintain compliance for your workloads right inside Turbonomic.
Policies include the ability to restrict placement or migration of workloads and data to certain geographic regions (data sovereignty), to require that workloads spread dynamically across the infrastructure for resiliency, or to dynamically group certain application workloads to satisfy other architectural constraints. All of these are applied as constraints while Turbonomic provides actions to assure workload performance in the most efficient manner while maintaining compliance.
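Conceptually, these policies act as filters on where a workload is allowed to land. The sketch below shows the idea with two hypothetical policy types, a data-sovereignty region restriction and an anti-affinity pair; the workload names, regions, and data structures are illustrative only, not Turbonomic's policy model.

```python
# Hypothetical policy data: which regions a workload may occupy (data
# sovereignty), and which workload pairs must never share a host.
ALLOWED_REGIONS = {"db-eu": {"eu-west", "eu-central"}}
ANTI_AFFINITY = {("web-a", "web-b")}

def placement_allowed(workload: str, host_region: str,
                      cohosted: set) -> bool:
    """Return True only if placing `workload` on a host in `host_region`
    alongside the `cohosted` workloads satisfies every policy."""
    regions = ALLOWED_REGIONS.get(workload)
    if regions is not None and host_region not in regions:
        return False  # would violate data sovereignty
    for a, b in ANTI_AFFINITY:
        if (workload == a and b in cohosted) or (workload == b and a in cohosted):
            return False  # anti-affinity: must spread for resiliency
    return True

print(placement_allowed("db-eu", "us-east", set()))      # False: wrong region
print(placement_allowed("web-a", "eu-west", {"web-b"}))  # False: anti-affinity
print(placement_allowed("web-a", "eu-west", {"cache"}))  # True
```

In a real engine these checks would run as constraints inside the placement decision itself, so performance actions can only ever choose compliant destinations.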
Dynamic License Compliance and Cost Control
The very same patented approach covered here extends seamlessly to maintaining license compliance and cost control. Again, all of this is done to assure application performance while achieving the other two parts of the triad.
Increased demand on licensed applications means that licensed workloads are given priority on the underlying licensed host infrastructure. Other, non-licensed workloads migrate or scale appropriately to ensure the licensed applications get the resources needed to assure performance. All of this happens dynamically, in real time, while maintaining compliance with your application licensing constraints.
Decisions and actions within Turbonomic to assure workload performance and achieve the greatest infrastructure efficiency are always made in the context of licensing requirements. This ensures the continuous compliance demanded by the performance, cost-efficiency, and compliance triad.
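The core of license-aware placement can be sketched as a simple host-selection rule: licensed workloads may only land on hosts covered by the license, and unlicensed workloads avoid those hosts so licensed capacity stays available. The host names and the `LICENSED_HOSTS` set below are hypothetical, and a real engine would weigh this against performance and cost rather than apply it in isolation.

```python
# Hypothetical set of hosts covered by an Oracle/SQL Server license.
LICENSED_HOSTS = {"host-1", "host-2"}

def choose_host(workload_licensed: bool, candidates: list):
    """Pick a destination host. Licensed workloads are confined to
    licensed hosts; unlicensed workloads prefer unlicensed hosts so the
    licensed pool stays free, falling back only if nothing else fits."""
    if workload_licensed:
        pool = [h for h in candidates if h in LICENSED_HOSTS]
    else:
        unlicensed = [h for h in candidates if h not in LICENSED_HOSTS]
        pool = unlicensed or candidates
    return pool[0] if pool else None  # None: no compliant placement exists

print(choose_host(True, ["host-3", "host-1"]))   # host-1: must stay licensed
print(choose_host(False, ["host-1", "host-3"]))  # host-3: keep license pool free
```

Keeping non-licensed workloads off licensed hosts is what lets the licensed footprint, and therefore the license cost, stay as small as demand allows.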
Overhead Is Not a Performance Solution
Ensuring you have the performance capacity to handle workload continuity in N+1 or N+2 architectures needs to be as dynamic as your workloads. Turbonomic has built in the ability to assure performance in real time and to show you what the real results would be under any number of “What if?” scenarios, such as losing a single host or multiple hosts within a cluster.
When multiple clusters are put into the environment, we fall into the trap even more deeply. Each cluster carries a percentage of local overhead, which adds up significantly across the environment. With 5-node clusters, the best practice for an N+1 configuration calls for 20% overhead so the cluster can absorb the failure of its largest node. Effectively, that means one empty host out of every five. If you have 3 clusters, that is 3 entirely unused hosts.
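The arithmetic from the paragraph above is simple enough to check directly. This sketch assumes homogeneous hosts and counts reserved failover capacity in whole hosts per cluster:

```python
def spare_hosts(clusters: list, failures_tolerated: int = 1) -> int:
    """N+1 (or N+k) headroom: each cluster must keep capacity equal to
    `failures_tolerated` of its hosts unused for failover."""
    return failures_tolerated * len(clusters)

# Three separate 5-node clusters: 20% of each cluster sits idle.
print(spare_hosts([5, 5, 5]))  # 3 unused hosts
# The same 15 nodes virtually merged into one cluster: one spare suffices.
print(spare_hosts([15]))       # 1 unused host
```

Merging the clusters does not change the failure the environment can absorb, only how much capacity must be held back to absorb it, which is exactly the saving the flattening exercise surfaces.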
Simply create a new policy and virtually merge those clusters with Turbonomic, and you will quickly see how the environment changes when these clusters are flattened.
Distributing workloads across clusters for resilience can also be done using easy-to-create placement policies, which ensure architectural compliance, such as resiliency requirements covering node failures and application redistribution.
Trade-Offs vs. Balancing
It may seem like a semantic difference, but a platform that uses real-time performance data across all layers of the stack, from true QoS metrics inside the application layer down to the virtual and physical infrastructure across CPU, memory, storage, and network, is the only way to truly deliver hybrid cloud performance.
Balancing virtual machines is a bottom-up approach meant to keep hosts happy. Distributing workloads is an important aspect of application availability, but it is typically done using the legacy approach of assigning static headroom and overprovisioning to stay under thresholds.
This is another fundamental reason why Turbonomic approached this challenge from the start with a dynamic solution: assure workload performance while maintaining compliance for placement and business continuity, and do all of this at the lowest cost to operate your infrastructure. Every action becomes a series of trade-offs that yields demonstrably better use of resources. Using Turbonomic policies, the workloads autonomically distribute themselves while still delivering on the triad of performance, efficiency, and compliance.
There will be times when dynamic workloads exceed the overhead; what would your balancing process do for you then? No matter how predictable your workloads seem, there is no better solution than a real-time platform with a complete understanding of the workloads. Did I mention agentless? Because that’s something you shouldn’t have to trade off.
Published at DZone with permission of Eric Wright , DZone MVB. See the original article here.