Application Resiliency in a Multi-Cloud World Requires a New, Modern Service Mesh
We review the need to better connect silos, so app owners can make the right decisions at the right time to deliver the best application experience for users.
The modern app is dynamic, it’s distributed, and it lives across multiple clusters and clouds. It’s likely made up of dozens, even hundreds, of microservices. And it can be spun up and scaled quickly to meet evolving user and market demands. However, architectural flexibility in a multi-cloud world often comes at the cost of visibility. With services spread across multiple cloud providers and on-premises infrastructure, cloud architects are finding it difficult to know whether modern applications are performing as intended.
This lack of visibility and awareness is a big problem in a world defined by application experience. I have spent decades working with customers who are desperately trying to solve the app resiliency problem, and I’ve observed that half of new applications fail to meet performance SLAs even though most enterprises spend two to three times more on cloud than they need to. CTOs try to solve this problem by “breaking down silos.” An issue comes up, everyone jumps into a war room to identify and fix the bottleneck, and then everyone goes right back to living in disparate silos.
Silos are not the problem. We just need a better way to connect these silos, so application owners can make the right decisions at the right time to deliver the best application experience for users.
The Need for a New Service Mesh
Solving the app resiliency problem requires a new way of thinking about applications and application experience. The modern app is essentially a network, and developers need to treat it as such. This requires a new, modern service mesh that acts like an interconnectivity superhighway, giving application owners and cloud architects critical information across disparate silos so they can make just-in-time decisions. This new, modern service mesh serves as a common abstraction layer for application services and gives cloud architects the visibility and control they need to measure and deliver consistent, policy-defined application experiences.
What should this new, modern service mesh look like? Here are five critical capabilities:
- Declarative Approach: Rather than manually stitching together the right experience, organizations need to take a declarative approach to application experience, where a desired service level objective (SLO) can be set and automatically delivered. No armies of engineers turning dozens of knobs. Just a simple declaration of expected experiences that is then delivered seamlessly to any user, anywhere, regardless of underlying infrastructure. The new, modern service mesh should be able to trigger resiliency functions such as scaling up, scaling down, and cloud bursting, and it should be able to check available capacity before triggering an action.
- Traceability: In order to automatically deliver SLOs, the new, modern service mesh needs to track the overall performance of user transactions in a specific geographic location—from the time a user clicks on an application to the time the transaction is completed—regardless of the number, location and provider of the distributed cloud services it uses.
- Context: The new, modern service mesh then needs to be able to apply these insights to specific metrics that are constantly measured and put into the proper context across multi-cloud environments.
- Testing and Iteration: Testing allows the user to roll out new services in a mesh and compare predicted experiences to a set baseline. Testing also needs to include a randomized fault injector to account for unplanned events. For example, an application owner can set up a scenario where 10 percent of services randomly fail, and the controller will then measure the overall SLO of these services. The developer can then compare multiple chaos tests against each other and use this data to drive an improvement to the resiliency architecture of the applications.
- Align to Cloud Spend: Experience doesn’t occur in a bubble, so the new, modern service mesh should be able to tie desired experiences to actual cloud costs. This allows organizations to balance experience with cloud spend. Through shadow mode simulations, an application owner should be able to simulate the cost of running their application services on the service mesh and get an approximation of what it would cost to complete the expected number of transactions.
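To make the declarative and testing ideas above more concrete, here is a minimal Python sketch of a chaos test measured against a declared SLO. It is a toy simulation under stated assumptions, not the behavior of any particular service mesh: the names `Slo` and `run_chaos_test`, the 95 percent success target, the 20 services, and the model in which each transaction touches exactly one randomly chosen service are all hypothetical and chosen only for illustration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Slo:
    """Declared target (hypothetical): fraction of transactions that must succeed."""
    min_success_rate: float


def run_chaos_test(services, transactions, failure_fraction, slo, seed=0):
    """Fail a random subset of services, then measure the success rate.

    Simplifying assumption: each transaction touches one randomly chosen
    service and succeeds only if that service is healthy.
    """
    rng = random.Random(seed)  # seeded so separate runs can be compared
    failed_count = round(len(services) * failure_fraction)
    failed = set(rng.sample(services, failed_count))
    successes = sum(
        1 for _ in range(transactions) if rng.choice(services) not in failed
    )
    success_rate = successes / transactions
    return {
        "failed_services": sorted(failed),
        "success_rate": success_rate,
        "slo_met": success_rate >= slo.min_success_rate,
    }


services = [f"svc-{i}" for i in range(20)]
slo = Slo(min_success_rate=0.95)

# Baseline with no injected faults, then a run where 10 percent of services fail.
baseline = run_chaos_test(services, 10_000, 0.0, slo)
chaos = run_chaos_test(services, 10_000, 0.10, slo)

print("baseline:", baseline["success_rate"], baseline["slo_met"])
print("chaos:   ", chaos["success_rate"], chaos["slo_met"])
```

Comparing the baseline run with the fault-injected run is the kind of side-by-side data an application owner could use to decide where the resiliency architecture needs work. A real mesh would measure live transactions rather than a simulation, and would also weigh the cost of any corrective action.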
Modern applications have gotten too complex to manage manually. We need a new declarative approach to delivering application resiliency services across multi-cloud environments. But this dynamic declarative contract requires better visibility and control over multi-cloud infrastructure through a new, modern service mesh. Only then will we be able to deliver powerful and consistent application experiences in today’s highly distributed, modern world.
Opinions expressed by DZone contributors are their own.