Implementing Adaptive Concurrency Limits
Adaptive concurrency limits are critical for maximizing the performance and reliability of your service. In this blog post, we explain why your service needs these limits and how you can implement them using FluxNinja Aperture.
Highly available and reliable services are a hallmark of any thriving business in today’s digital economy. As a service owner, it is important to ensure that your services stay within SLAs. But when bugs make it into production or user traffic surges unexpectedly, services can slow down under a large volume of requests and fail. If not addressed in time, such failures tend to cascade across your infrastructure, sometimes resulting in a complete outage.
At FluxNinja, we believe that adaptive concurrency limits are the most effective way to ensure services are protected and continue to perform within SLAs.
What Are Concurrency Limits?
Concurrency is the number of requests a service handles at any given time. It can be estimated using Little’s Law, which states that in the long-term steady state of a production system, the average number of items L in the system is the product of the average arrival rate λ and the average time W that an item spends in the system, that is, L = λW. Requests that arrive beyond L cannot be served immediately; they must be queued or rejected, which can lead to a significant build-up of queues and slower service response times. As long as a service stays within its concurrency limit, however, queues do not build up.
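As a rough, back-of-the-envelope illustration of how Little’s Law relates arrival rate and latency to in-flight requests (the numbers below are hypothetical and not taken from the demo):

// Little’s Law: L = λ × W
// λ: average arrival rate (requests per second)
// W: average time a request spends in the service (seconds)
// L: average number of requests in flight, i.e., the concurrency
function concurrency(arrivalRatePerSec, avgLatencySec) {
  return arrivalRatePerSec * avgLatencySec;
}

// Example: 200 req/s at an average latency of 50 ms keeps roughly
// 200 × 0.05 = 10 requests in flight on average.
console.log(concurrency(200, 0.05)); // 10

If a service can only sustain about 10 requests in flight, then any sustained arrival rate above 200 req/s, or any increase in latency, pushes the excess into a queue.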
Concurrency limits are hard to estimate, especially in fast-moving environments with a large number of interdependent micro-services:
- Updates in micro-services: Micro-services are updated frequently, and whatever concurrency limit you set initially could be outdated in the next release of your micro-service, resulting in a performance bottleneck or service outage. Additionally, feature additions and configuration changes make it hard to keep up with changing concurrency limits.
- High-churn environments: Scale-in and scale-out events change concurrency limits. When services scale out, limits need to be adjusted dynamically to balance the incoming traffic.
This is why dynamically setting concurrency limits (Adaptive Concurrency Limits) based on overall service health is the best way to protect a service and stay within SLAs.
Adaptive Concurrency Limits vs. Rate Limits
At first glance, both concurrency limits and rate limits seem to do the same job. But they serve very different purposes.
Rate limits are a preventive technique: they prevent misuse of a service by a particular user, making sure the service remains available for other users. But this technique does not help when a surge in overall traffic cannot be attributed to any specific user.
Adaptive Concurrency Limits, on the other hand, are a protective reliability technique. With them, it is possible to detect when the number of in-flight requests exceeds a service’s concurrency limit and have reliability interventions kick in.
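To make the distinction concrete, here is a minimal sketch, illustrative only and not Aperture’s implementation: a rate limiter caps how many requests arrive per time window regardless of how long they take, while a concurrency limiter caps how many requests are in flight at the same time.

// Rate limiter: allow at most `limit` requests per fixed time window.
function makeRateLimiter(limit, windowMs) {
  let count = 0;
  let windowStart = Date.now();
  return function allow() {
    const now = Date.now();
    if (now - windowStart >= windowMs) {
      count = 0;
      windowStart = now;
    }
    if (count >= limit) return false;
    count += 1;
    return true;
  };
}

// Concurrency limiter: allow at most `limit` requests in flight at once.
// The limit is static here; keeping it in step with service health is
// exactly what an adaptive system has to add.
function makeConcurrencyLimiter(limit) {
  let inFlight = 0;
  return {
    tryAcquire() {
      if (inFlight >= limit) return false; // queue or reject the request
      inFlight += 1;
      return true;
    },
    release() {
      inFlight = Math.max(0, inFlight - 1); // call when the request completes
    },
  };
}

Note that a slowdown in the backend barely changes what the rate limiter sees (requests per second can stay constant), but it immediately shows up as rising in-flight requests in the concurrency limiter, which is the signal that matters during overload.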
Using Aperture for Adaptive Concurrency Limits
Aperture is an open-source flow control and reliability platform that can help you set Adaptive Concurrency Limits for your services. At the heart of Aperture is a control system loop with three stages (a simplified sketch follows the list below):
- Observing: Aperture agents monitor the deviation of your service’s current latency from historical trends using Golden Signals and identify load build-up or deterioration.
- Analyzing: Aperture Controller, which is the control loop's brain, continuously evaluates deviations from SLAs and communicates flow control decisions back to the agents.
- Actuating: Aperture agents sit right next to the service instances, regulating and prioritizing requests via a scheduler.
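As a simplified sketch of this observe/analyze/actuate loop, not Aperture’s actual algorithm, a controller can learn a latency baseline with an exponential moving average (EMA) and scale the concurrency limit up or down depending on how far current latency deviates from it. The smoothing factor, tolerance, and starting limit below are hypothetical.

// Illustrative control loop, assuming periodic latency samples from the service.
const alpha = 0.1;            // EMA smoothing factor (hypothetical)
const tolerance = 1.1;        // tolerate latency up to 110% of the baseline
let emaLatencyMs = null;      // learned latency baseline
let concurrencyLimit = 100;   // limit enforced at admission time

function onLatencySample(latencyMs) {
  // Observe: update the baseline with an exponential moving average.
  emaLatencyMs =
    emaLatencyMs === null
      ? latencyMs
      : alpha * latencyMs + (1 - alpha) * emaLatencyMs;

  // Analyze: compare the current latency against the setpoint.
  const setpoint = emaLatencyMs * tolerance;
  const gradient = Math.min(setpoint / latencyMs, 1.05); // cap the ramp-up rate

  // Actuate: shrink the limit quickly under overload, grow it back slowly.
  concurrencyLimit = Math.max(1, Math.round(concurrencyLimit * gradient));
}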
To showcase how Adaptive Concurrency Limits can be set in practice, let’s take a deep dive into a demo setup of Aperture Agents and the Aperture Controller.
Demo Setup
Aperture comes with a playground, pre-configured with a traffic generator, a sample application, and an instance of Grafana that you can use to see various signals generated by a policy.
The snapshot above shows the demo application with three services and a traffic generator named wavepool-generator.
Service Topology
The demo application is an example of a micro-services topology, where requests flow from service1 to service2 and from service2 to service3. Each service adds a delay with jitter to simulate processing. service3 is the upstream service, configured with an artificial concurrency limit to simulate overload scenarios.
Traffic Pattern
The traffic generator is designed to generate a symmetrical traffic load for two types of users: subscribers and guests. The load generator alternates periodically between regular-traffic and high-traffic scenarios. Its configuration lives in playground/load_generator/scenarios/load_test.js.
// k6 load-test configuration. Each stage ramps the number of virtual users
// (VUs) to `target` over `duration`.
export let vuStages = [
  { duration: "10s", target: 5 },
  { duration: "2m", target: 5 },
  { duration: "1m", target: 30 },
  { duration: "2m", target: 30 },
  { duration: "10s", target: 5 },
  { duration: "2m", target: 5 },
];

export let options = {
  discardResponseBodies: true,
  scenarios: {
    // Two identical ramping-VU scenarios that differ only in the user type
    // passed to the VU code through an environment variable.
    guests: {
      executor: "ramping-vus",
      stages: vuStages,
      env: { USER_TYPE: "guest" },
    },
    subscribers: {
      executor: "ramping-vus",
      stages: vuStages,
      env: { USER_TYPE: "subscriber" },
    },
  },
};
This configuration produces the following traffic pattern:
- Ramp up to 5 concurrent users in 10s.
- Hold at 5 concurrent users for 2m.
- Ramp up to 30 concurrent users in 1m (overloads service3).
- Hold at 30 concurrent users for 2m (overloads service3).
- Ramp down to 5 concurrent users in 10s.
- Hold at 5 concurrent users for 2m.
Deploying Aperture Policies for Adaptive Concurrency Limits
Aperture comes with a pre-packaged list of Aperture Policies and Grafana Dashboards that can be used both as a guide for creating new policies and as ready-to-use Aperture Blueprints for generating policies customized to a service and use case. Policies are evaluated periodically, as defined in the blueprints. You can read more about Aperture policy generation in the Aperture documentation.
The playground is configured with a Latency Gradient Policy. This policy is configured to measure the service latency of service3 via a Flux Meter, and that signal is used to detect an overloaded state. The concurrency limiter is deployed on service1, which is the downstream service (see Service Topology). This ensures that when service3 is overloaded, we stop accepting additional requests at the entry point, i.e., service1, to avoid wasted work.
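In the playground, this admission decision is enforced by the Aperture Agent sitting next to the service rather than by application code, but as a mental model, "rejecting at the entry point" looks roughly like the hypothetical middleware below (the limiter and handler wiring are illustrative assumptions, not Aperture APIs):

// Hypothetical Express-style middleware illustrating admission control at the
// entry point. `limiter` is a concurrency limiter like the earlier sketch.
function admissionControl(limiter) {
  return function (req, res, next) {
    if (!limiter.tryAcquire()) {
      // Reject early instead of doing wasted work downstream.
      res.status(503).set("Retry-After", "1").send("Service overloaded");
      return;
    }
    res.on("finish", () => limiter.release()); // release when the response ends
    next();
  };
}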
Aperture comes with a dry run mode that can be configured dynamically, allowing us to validate policy behavior without affecting the incoming traffic.
When No Protection Is Set Up for Services
With the help of the Grafana dashboard provided by Aperture, the latency of service3 can be easily monitored (in this case, the Aperture policy is running in dry run mode).
Traffic Ramping Up
Once the traffic generator starts ramping up the number of users, the latency of service3 (under the Flux Meter panel) starts touching 140 ms, whereas under normal conditions it stays under 60 ms. These latency spikes can lead to a bad user experience, and if latency keeps increasing, requests will hit client timeouts and the service will become completely unavailable, potentially triggering a cascading failure throughout the application.
It is also worth mentioning that subscribed users’ workloads are not prioritized here, which means that if guest users make too many requests, subscribed users face the same consequences, such as high latency and request timeouts.
When Aperture Is Protecting the Service
Once Aperture becomes active, it starts evaluating all the signals. The Signals dashboard is available under aperture-controller inside Grafana. These signals are passed through a circuit, which converts them into control decisions.
To learn more about how a circuit works, check this circuit diagram of the policy.
Golden signal metrics in Prometheus are imported as Signals, and each signal can be plotted to understand how the circuit functions, for example:
- EMA: used to calculate the latency setpoint.
- IS_OVERLOAD: tracks whether the policy considers the service overloaded.
- LOAD_MULTIPLIER: tracks the load-shedding decisions being published to Aperture Agents.
- and so on.
Signals Dashboard
After the circuit evaluates the signals, decisions are made. One of the policy’s benefits is that the maximum acceptable latency can be customized based on requirements and SLOs.
When Aperture is protecting the service.
Here, the traffic pattern is the same as earlier. However, this time around, Aperture is using service concurrency limits to decide whether to approve a request for processing or reject it.
Normal Traffic Scenario
Under normal circumstances, latency hovers around 50 ms. During this period, Aperture learns the baseline latency by computing an exponential moving average over the latency readings. To track incoming traffic, check the “Incoming Concurrency” panel, and for the accepted traffic, check the “Accepted Concurrency” panel, as shown in the snapshot above.
Both guest and subscriber workloads, shown at indices 0 and 1 respectively in the “Workload Decisions” panel, have equal acceptance rates, as no traffic is dropped under normal load at the start.
In addition, Aperture automatically estimates the cost of admitting a request for each workload, which can be tracked in the “Workload Latency” panel. This estimation helps with prioritization and fair scheduling of requests during overload scenarios. Aperture’s scheduler can prioritize workloads based on request attributes; for instance, in this policy, subscribed users’ workloads are configured with a higher priority than guest users’ workloads.
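As a rough mental model of such prioritization, not Aperture’s actual scheduler, you can picture queued requests being admitted in priority order while staying under the current concurrency limit. The priority values below are hypothetical.

// Illustrative only: drain a queue of pending requests in priority order,
// admitting as many as the remaining concurrency budget allows.
const PRIORITY = { subscriber: 200, guest: 50 }; // hypothetical weights

function admit(queue, concurrencyLimit, inFlight) {
  // Higher priority first; within the same class, oldest request first.
  queue.sort(
    (a, b) =>
      PRIORITY[b.userType] - PRIORITY[a.userType] || a.arrivedAt - b.arrivedAt
  );
  const admitted = [];
  while (queue.length > 0 && inFlight + admitted.length < concurrencyLimit) {
    admitted.push(queue.shift());
  }
  return admitted; // the rest stay queued or are eventually rejected
}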
Traffic Ramping Up
When the traffic generator starts ramping up the number of concurrent users, service3 becomes overloaded, causing latency to go up. As soon as Aperture detects this latency spike, it limits concurrency on service1. Based on the priorities configured in the policy, more subscriber traffic is accepted than guest traffic.
During the traffic spike, the “Incoming Concurrency” graph ramps up, but the Aperture Agent automatically adjusts the “Accepted Concurrency,” flattening that graph. Eventually, as the traffic ramps down, both graphs return to normal.
In the Flux Meter panel, you can see that the latency of service3 is maintained within the configured tolerance level, ensuring the service remains responsive throughout the traffic spike.
Traffic Ramping Down
As the traffic rate returns to normal levels, the spike subsides. In the background, the Latency Gradient Policy keeps shedding load to maintain a safe concurrency limit, that is, a limit at which the service is not overloaded.
Once the service is no longer overloaded, the Latency Gradient Policy periodically tries to increase the concurrency limit of the service, eventually restoring the maximum request acceptance rate.
Across-the-board overview: Aperture-protected vs. unprotected service.
Overall, there is an enormous difference when Aperture comes into the picture, controlling the flow of requests and maintaining latency throughout the period when traffic is ramping up. Latency significantly drops when Aperture is protecting the service.
Conclusion
In this blog post, we saw how powerful Adaptive Concurrency Limits can be in protecting services from overload and how Aperture policies can be used to set them. This helps service owners with:
- Preventing cascading failures with load shedding at the right place and time.
- Providing a high-quality user experience with workload prioritization and a high capacity for critical API requests.
- Keeping services within SLA with adaptive service protection.
In future posts, we will dive deeper into how Aperture enables prevention, escalation, and recovery techniques for reliability management.
To get started with Aperture open source, visit our GitHub repository and documentation site. Join our Slack community for best practices, questions, and discussions on reliability management.