Optimizing Cost and Carbon Footprint With Smart Scaling on AWS: Part 2

Move beyond queue-length-based scaling: use custom metrics for granular control and build smarter scaling triggers.

Varun Dixit

Sep. 11, 25 · Analysis

This article builds on the ideas shared in "Optimizing Cost and Carbon Footprint with Smart Scaling," diving deeper into advanced scaling strategies on AWS. We'll take a closer look at the challenges of queue-based scaling and explore how custom load metrics can offer a better solution.

Along the way, we'll also walk through how to create these metrics in Amazon CloudWatch and highlight other AWS services that play a role in optimizing both cost and carbon footprint.

Limitations of Using Queue Length as a Scaling Trigger

While using queue length as a scaling trigger can work in some cases, it’s not always the best solution. Several factors can limit its effectiveness:

Workload Characteristics

Not every workload is queue-driven. Applications with unpredictable traffic or long-running tasks don’t always fit the queue-based model. For example, in applications with sudden bursts of activity, the queue length might lag behind the actual resource demand, causing delays in scaling and potential performance issues. Similarly, long-running tasks that hold messages in the queue can make the queue length overstate the real load, triggering unnecessary scaling actions when the system isn’t under heavy load.

Task Priority

A long queue doesn’t always mean a high load. It could just be a backlog of low-priority tasks, while a short queue could hide a surge in high-priority tasks that need immediate attention. Relying only on queue length can result in misaligned resource allocation, where the system focuses on less critical tasks and delays more urgent ones. This can harm user experience and business operations, especially in time-sensitive applications.

Hidden Bottlenecks/Underlying Issues

Queue length doesn’t always point to the root cause of system slowdowns. A long queue might be a symptom of slow downstream processing or resource limitations elsewhere in the application. In such cases, scaling based on queue length could make things worse: the extra instances put more pressure on the struggling downstream component without fixing the underlying issue. This leads to inefficient resource usage and higher costs without improving performance.

Limited Granularity

Queue length gives a broad overview of pending tasks, but doesn’t capture the full scope of the workload. It doesn’t distinguish between task types, resource needs, or how they impact system performance. Without that context, scaling decisions are less effective because they don't reflect what the application needs at any given moment. For example, if some tasks take far longer than others, two queues of the same length can represent very different amounts of work.

In summary, relying solely on queue length can lead to inefficient resource allocation and increased costs. For example, scaling up due to a long queue of low-priority tasks might result in over-provisioning and unnecessary expenses.
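To make the granularity problem concrete, here is a minimal sketch in plain Python. The task types and per-task durations are invented for illustration and are not measurements from any real system. It shows how two queues of identical length can carry very different amounts of work.

```python
from collections import Counter

# Hypothetical average processing time per task type, in seconds.
# These figures are illustrative assumptions only.
AVG_SECONDS_BY_TYPE = {"thumbnail": 2.0, "transcode": 120.0}


def estimated_work_seconds(task_types):
    """Estimate the total processing time for a list of queued task types."""
    counts = Counter(task_types)
    return sum(AVG_SECONDS_BY_TYPE[t] * n for t, n in counts.items())


light_queue = ["thumbnail"] * 100
heavy_queue = ["transcode"] * 100

# Same queue length, very different amount of pending work.
print(len(light_queue), estimated_work_seconds(light_queue))  # 100 200.0
print(len(heavy_queue), estimated_work_seconds(heavy_queue))  # 100 12000.0
```

A trigger based only on the length of these queues would treat both situations the same way.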

Custom Load Metrics as a Better Alternative

To overcome the limitations of queue-based scaling, we can define and use custom load metrics. These metrics give you a much more flexible approach to scaling, allowing you to monitor the specific performance aspects that matter most to your application's needs and business goals.

Here’s why custom load metrics are a better option than relying on queue length:

1. Granular Control

Custom metrics offer far more granular control over your scaling decisions. With queue-based scaling, as we explored in the previous article, you're working with a single indicator: queue length. As discussed in the section above, that indicator can be misleading, especially when long-running tasks or low-priority jobs inflate the queue size without reflecting actual resource demand.

Custom load metrics, on the other hand, allow you to track performance indicators that directly reflect the health and resource demands of your application. These could include things like CPU utilization, memory consumption, request latency, or even error rates. With these metrics, scaling decisions become much more accurate because they are based on actual performance data rather than just the number of messages waiting to be processed.

For instance, rather than relying solely on the number of messages in an SQS queue to scale, you could set up a custom metric to track the average API request latency. If the latency increases beyond a certain threshold, you can trigger scaling actions to add more resources. This would be a much more accurate signal of resource strain than simply looking at the queue length, which might not have changed, even though the system is underperforming.
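The decision logic behind such a latency trigger can be sketched in a few lines of Python. In practice a CloudWatch alarm would evaluate the metric for you; the function name, the 300 ms threshold, and the sample values below are assumptions chosen for illustration.

```python
from statistics import mean


def should_scale_out(latencies_ms, threshold_ms, min_samples=5):
    """Return True when average latency in the window exceeds the threshold.

    A minimum sample count is required so that a single slow request
    does not trigger scaling on its own.
    """
    if len(latencies_ms) < min_samples:
        return False
    return mean(latencies_ms) > threshold_ms


# The queue length could be identical in both of these windows.
healthy_window = [110, 120, 130, 125, 115]
strained_window = [480, 520, 610, 570, 640]

print(should_scale_out(healthy_window, threshold_ms=300))   # False
print(should_scale_out(strained_window, threshold_ms=300))  # True
```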

2. Alignment With Business Goals

One of the other key advantages of custom metrics is that you can align them directly with your business objectives. Rather than just scaling based on technical performance alone, you can create metrics that reflect the factors that are most important to your company’s success. This ensures that scaling decisions are made with a broader business perspective in mind.

Take an e-commerce platform, for example. Instead of scaling purely based on system load, custom metrics could be designed to track important business indicators like order processing time or customer conversion rates. When these metrics hit critical thresholds, you can scale up to ensure that you have enough resources to process orders quickly and maintain a smooth customer experience. This way, you ensure that scaling decisions are not just based on raw resource usage but are directly tied to factors that influence revenue, customer satisfaction, and business growth.

By aligning scaling decisions with business goals, you're not just optimizing your system for performance—you’re also ensuring that your application runs efficiently in a way that drives business outcomes.

3. Proactive Scaling

Custom load metrics also enable you to take a more proactive approach to scaling. Instead of waiting for your system to react to spikes in demand, you can anticipate increased load and scale in advance, ensuring that performance remains steady, even during peak times or unexpected traffic surges. This ability to scale ahead of time is particularly important in environments where performance is critical to user experience, such as in retail during the holiday season or during a product launch.

For example, if your application sees a regular traffic spike during lunchtime every day, you can use custom metrics to confirm this pattern and then scale resources up just before the surge, for instance with a scheduled scaling action. This ensures that your infrastructure is ready to handle the load before performance begins to degrade, preventing disruption to the user experience. Proactive scaling also reduces the risk of system overload, helping you avoid costly downtime or user dissatisfaction due to slow response times.
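One way to act on a known daily pattern is scheduled scaling in EC2 Auto Scaling, which is a different mechanism from metric-driven alarms. The sketch below only builds the request parameters; the group name, action name, capacity, and schedule are hypothetical values, and the cron expression assumes the spike starts around noon on weekdays.

```python
def lunch_scale_out_params(group_name, desired_capacity,
                           cron="30 11 * * 1-5", time_zone="UTC"):
    """Build keyword arguments for a recurring scheduled scaling action.

    The result is intended for the Auto Scaling client's
    put_scheduled_update_group_action call.
    """
    return {
        "AutoScalingGroupName": group_name,
        "ScheduledActionName": "pre-lunch-scale-out",  # hypothetical name
        "Recurrence": cron,            # 11:30 on Monday through Friday
        "TimeZone": time_zone,
        "DesiredCapacity": desired_capacity,
    }


params = lunch_scale_out_params("web-asg", desired_capacity=6)
print(params)

# With credentials configured, the call would look like this:
# import boto3
# boto3.client("autoscaling").put_scheduled_update_group_action(**params)
```

A matching scheduled action with a lower capacity after the peak would return the group to its baseline.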

4. Tailored to Unique Workloads

Another important advantage of custom metrics is that they can be designed to reflect the unique characteristics of your workloads. For some applications, a high queue length might not indicate a problem, as there may be long-running tasks that are necessary for the business to function. In these cases, a custom metric based on task completion time or the number of tasks in progress might be a better measure of when scaling is required.

For example, a video processing application might experience large fluctuations in the number of tasks in the queue, but that doesn’t always mean that additional resources are needed. Instead, using a custom metric based on task completion time or CPU usage might better reflect when additional capacity is required. By designing your custom metrics to reflect the specific needs of your application, you can avoid unnecessary scaling actions and ensure that resources are used efficiently.
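A rough way to turn task completion time into a capacity estimate is to ask how many queued tasks one worker can absorb within an acceptable wait. The sketch below assumes one task per worker at a time; the function name, limits, and numbers are illustrative assumptions rather than recommended values.

```python
import math


def desired_instances(queued_tasks, avg_task_seconds, acceptable_wait_seconds,
                      min_instances=1, max_instances=20):
    """Estimate how many workers are needed to clear the backlog in time.

    Each worker can absorb (acceptable_wait_seconds / avg_task_seconds)
    queued tasks without exceeding the acceptable wait.
    """
    if avg_task_seconds <= 0 or acceptable_wait_seconds <= 0:
        raise ValueError("durations must be positive")
    backlog_per_instance = acceptable_wait_seconds / avg_task_seconds
    needed = math.ceil(queued_tasks / backlog_per_instance)
    return max(min_instances, min(max_instances, needed))


# 600 short tasks: 600 / (300 / 2) = 4 workers.
print(desired_instances(600, avg_task_seconds=2, acceptable_wait_seconds=300))   # 4
# 600 long video jobs: 600 / (300 / 60) = 120, capped at max_instances.
print(desired_instances(600, avg_task_seconds=60, acceptable_wait_seconds=300))  # 20
```

The same queue length leads to very different answers once completion time is part of the metric.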

5. Cost Efficiency and Sustainability

With more accurate scaling decisions, custom load metrics help you avoid over-provisioning, which leads to both cost savings and a more sustainable use of resources. Instead of scaling out based on inaccurate or overly broad metrics, such as queue length, you add resources only when they are truly needed, whether because of a high API request rate or increasing latency. This matters especially in the cloud, where you’re often billed based on usage.

In addition, by scaling more efficiently, you also help reduce the environmental impact of your operations. Less over-provisioning means fewer resources are used, which can contribute to reducing your carbon footprint, an increasingly important consideration in sustainable computing.

Custom load metrics offer a significantly more refined and business-aligned approach to scaling in AWS. By monitoring specific performance indicators that reflect the actual needs of your application and business, you can make smarter scaling decisions.

Creating Custom Metrics in AWS

Setting up custom metrics in CloudWatch is straightforward, and it helps you gain deeper insights into your system’s health and performance. Here's a more detailed, step-by-step guide on how to create and leverage custom metrics in AWS to fine-tune your scaling strategy:

1. Collect Data

The first step is to collect the data that will feed into your custom metrics. This involves instrumenting your application to track relevant performance indicators such as API latency, error rates, request throughput, or even custom business metrics like order processing times. To gather this data, you can use various tools, libraries, or frameworks:

Application performance monitoring (APM) agents: These tools can automatically track and report metrics like response times and error rates.
Logging frameworks: Many logging tools allow you to track events and errors that can be translated into custom metrics.
Custom code: You can add specific code to your application that tracks and records any performance metric that is important for your business.

The key is to identify the data points that matter most to your system’s health and to ensure that they’re captured accurately.
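As a minimal sketch of the custom-code option, a small context manager can time a code path and buffer the result for later publication. The metric name, the in-memory buffer, and the `time.sleep` stand-in are assumptions for illustration; a real application would wrap its own work and flush the buffer to its metrics backend.

```python
import time
from contextlib import contextmanager

# In-memory buffer of (metric_name, value_ms) samples awaiting publication.
samples = []


@contextmanager
def timed(metric_name, sink=None):
    """Record how long the wrapped block took, in milliseconds."""
    target = samples if sink is None else sink
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs even if the wrapped block raises, so failures are timed too.
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        target.append((metric_name, elapsed_ms))


# Usage: wrap the code path you care about.
with timed("OrderProcessingTime"):
    time.sleep(0.01)  # stand-in for real work

print(samples)
```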

2. Publish Metrics

Once your application is collecting data, the next step is to publish these data points to Amazon CloudWatch as custom metrics. AWS provides SDKs for sending metric data to CloudWatch in various programming languages, including Python, Node.js, and Java. The SDK simplifies the process, letting you call the API directly from your application code. You can publish metrics either at regular intervals or in response to specific events within your application. Alternatively, metrics libraries such as Micrometer can push metrics to CloudWatch for you.

Here’s an example of how you can use the AWS SDK for Python (boto3) to publish a custom metric to CloudWatch:

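    import boto3

    # Create a CloudWatch client using the default credentials and region.
    cloudwatch = boto3.client('cloudwatch')

    # Publish a single data point for the ApiLatency metric.
    response = cloudwatch.put_metric_data(
        Namespace='MyApplication',
        MetricData=[
            {
                'MetricName': 'ApiLatency',
                # Dimensions let you slice the metric, here by environment.
                'Dimensions': [
                    {
                        'Name': 'Environment',
                        'Value': 'Production'
                    },
                ],
                'Value': 123.45,
                'Unit': 'Milliseconds'
            },
        ]
    )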

In this example, we're publishing an API latency metric with a value of 123.45 milliseconds to CloudWatch under the "MyApplication" namespace. You can create as many custom metrics as needed, each with different dimensions (like environment or region), so you can monitor various parts of your application with precision.
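When an application produces many data points, it is common to collect them and send them in groups rather than one call per value. The following is a sketch under stated assumptions: the helper names are hypothetical, and the batch size of 2 is only for demonstration, so check the current CloudWatch quotas before choosing a real value.

```python
from datetime import datetime, timezone


def build_datum(name, value, unit, environment, timestamp=None):
    """Build one MetricData entry in the shape put_metric_data expects."""
    return {
        "MetricName": name,
        "Dimensions": [{"Name": "Environment", "Value": environment}],
        "Timestamp": timestamp or datetime.now(timezone.utc),
        "Value": float(value),
        "Unit": unit,
    }


def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


latencies_ms = [101.2, 98.7, 143.0, 120.5, 110.9]
data = [build_datum("ApiLatency", v, "Milliseconds", "Production")
        for v in latencies_ms]

for chunk in batches(data, size=2):
    # With a real client this would be:
    # cloudwatch.put_metric_data(Namespace="MyApplication", MetricData=chunk)
    print(len(chunk), [d["Value"] for d in chunk])
```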

3. Create Alarms

Once your custom metrics are being published to CloudWatch, you can set up alarms that trigger actions when certain conditions or thresholds are met. The process is very similar to the one described in the previous article, except that the alarm now watches your custom metric instead of queue length.

CloudWatch alarms can be configured with conditions that align with your application’s performance needs, allowing you to be proactive rather than reactive.
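As an illustration, the parameters for an alarm on the `ApiLatency` metric from the earlier example might be assembled as follows. The alarm name, the 300 ms threshold, the evaluation settings, and the placeholder action are assumptions; the keys follow the CloudWatch `put_metric_alarm` API.

```python
def latency_alarm_params(threshold_ms, alarm_actions, environment="Production"):
    """Build keyword arguments for CloudWatch's put_metric_alarm call."""
    return {
        "AlarmName": f"ApiLatencyHigh-{environment}",  # hypothetical name
        "Namespace": "MyApplication",
        "MetricName": "ApiLatency",
        "Dimensions": [{"Name": "Environment", "Value": environment}],
        "Statistic": "Average",
        "Period": 60,              # evaluate one-minute data points
        "EvaluationPeriods": 3,    # require three consecutive breaches
        "Threshold": float(threshold_ms),
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",
        "AlarmActions": list(alarm_actions),
    }


# "<scaling-policy-arn>" is a placeholder, not a real ARN.
params = latency_alarm_params(300, ["<scaling-policy-arn>"])
print(params["AlarmName"], params["Threshold"])

# With credentials configured, the call would look like this:
# import boto3
# boto3.client("cloudwatch").put_metric_alarm(**params)
```

Requiring several consecutive breaches is a design choice that trades a slightly slower reaction for fewer scaling actions caused by brief spikes.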

4. Configure Auto Scaling Policies

Integrating CloudWatch alarms with Auto Scaling policies lets you automate capacity changes for your application. With Auto Scaling, your application can dynamically adjust its capacity, scaling out or scaling in, based on real-time performance data provided by your custom metrics.

For instance, let’s say you have a custom metric that tracks request latency. When the latency exceeds a set threshold, your CloudWatch alarm triggers an Auto Scaling policy that scales out by adding instances to handle the traffic. Similarly, when the latency falls back to an acceptable level, the system can scale in to reduce unnecessary resource usage.

By setting up auto scaling policies, you ensure that your application dynamically adjusts to meet workload demands while optimizing resource utilization. This approach is cost-effective, as it ensures that you’re not over-provisioning resources during low traffic periods or under-provisioning when demand is high.
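A step scaling policy is one way to express "add capacity when the latency alarm fires." The sketch below builds parameters for the Auto Scaling `put_scaling_policy` call; the group name, policy name, and step sizes are hypothetical. Step bounds are relative to the alarm threshold, so with a 300 ms threshold the first step covers 300 to 500 ms.

```python
def step_scaling_policy_params(group_name):
    """Build keyword arguments for a step scaling policy that adds instances."""
    return {
        "AutoScalingGroupName": group_name,
        "PolicyName": "scale-out-on-latency",  # hypothetical name
        "PolicyType": "StepScaling",
        "AdjustmentType": "ChangeInCapacity",
        "MetricAggregationType": "Average",
        "StepAdjustments": [
            # Breach of up to 200 ms above the threshold: add one instance.
            {"MetricIntervalLowerBound": 0.0,
             "MetricIntervalUpperBound": 200.0,
             "ScalingAdjustment": 1},
            # Breach of 200 ms or more above the threshold: add two instances.
            {"MetricIntervalLowerBound": 200.0,
             "ScalingAdjustment": 2},
        ],
    }


params = step_scaling_policy_params("web-asg")
print(params["PolicyType"], len(params["StepAdjustments"]))

# With credentials configured, the call would look like this:
# import boto3
# response = boto3.client("autoscaling").put_scaling_policy(**params)
# response["PolicyARN"] can then be used as an action on the CloudWatch alarm.
```

A second policy with negative adjustments, attached to a low-latency alarm, would handle scaling in.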

Beyond SQS and EC2

While SQS and EC2 are fundamental components of many AWS deployments, numerous other services can contribute to cost and carbon footprint optimization. In the next article, we will delve deeper into these services and explore how they can be leveraged to build sustainable and cost-effective solutions on AWS. Here's a glimpse of what we'll cover:

1. Serverless Computing With AWS Lambda

AWS Lambda enables you to run code without managing servers. Lambda only consumes resources when it executes functions, significantly reducing idle server costs and overall resource consumption. This eliminates the need to provision and manage servers, which reduces both costs and environmental impact, making it ideal for on-demand, event-driven workloads.

2. Managed Databases With Amazon RDS

Amazon RDS provides managed relational databases, automating tasks like patching and backups. This reduces the operational overhead and infrastructure costs of self-managed databases, allowing your team to focus on development. RDS also supports storage autoscaling, and options such as Aurora Serverless can adjust compute capacity to demand, helping to avoid over-provisioning and reducing resource waste.

3. Cost-Effective EC2 With Spot Instances

Spot Instances let you use spare EC2 capacity at deeply discounted rates. Though they can be interrupted with only a two-minute warning, they are well suited to fault-tolerant, non-critical workloads like batch processing. By leveraging Spot Instances, you can dramatically reduce EC2 costs while maintaining performance for stateless applications.

4. Optimizing EC2 With AWS Compute Optimizer

AWS Compute Optimizer analyzes your EC2 instance usage and recommends optimal instance types and sizes. This helps you avoid underutilized or over-provisioned instances, ensuring you're using the most cost-effective and performant resources, while also improving sustainability by reducing unnecessary resource consumption.

5. Analyzing Spending With AWS Cost Explorer

AWS Cost Explorer helps you track and analyze your AWS spending patterns. It provides insights into which services are driving costs and offers visibility into opportunities for savings. By regularly reviewing usage and spending, you can identify areas to optimize resources, reduce waste, and better align your costs with your business goals.

6. Cost Optimization With AWS Trusted Advisor

AWS Trusted Advisor provides recommendations for cost optimization, security, performance, and fault tolerance. It analyzes your AWS environment and highlights opportunities to improve efficiency, such as identifying underutilized resources or recommending right-sizing for instances, ultimately helping to lower costs and reduce environmental impact.

Opinions expressed by DZone contributors are their own.
