Three Tips for EC2 Monitoring using CloudWatch
An overview of a few ways to monitor EC2 using CloudWatch.
Whether you have moved your on-prem workload to the cloud or are building a cloud-native application, monitoring becomes very important. Instead of leaving this entirely to your Ops team, I encourage both developers and architects to define a monitoring strategy while designing solutions for the cloud.
In this article, I discuss a few tips for monitoring EC2 with CloudWatch and some important metrics to watch. Let's get started.
1. Instance Store vs. EBS Metrics
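Of all the metrics shown in the CloudWatch view under the "Monitoring" tab, the four Disk* metrics (DiskReadOps, DiskWriteOps, DiskReadBytes, DiskWriteBytes) are computed for instance store volumes, not for the far more frequently provisioned EBS volumes. If an instance has no instance store volumes, these metrics are either 0 or not reported at all. Developers new to AWS commonly look here for their disk metrics and get confused; EBS volume metrics are published separately, per volume, in the AWS/EBS namespace.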
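To make the distinction concrete, here is a minimal sketch that builds the keyword arguments for a CloudWatch `get_metric_statistics` request and picks the right namespace and dimension for each kind of disk metric: `AWS/EC2` with `InstanceId` for instance store metrics, `AWS/EBS` with `VolumeId` for EBS metrics. The helper name `disk_metric_query` and its defaults are hypothetical; the code only builds the request and does not call AWS.

```python
from datetime import datetime, timedelta, timezone

# Instance store disk metrics: published in the AWS/EC2 namespace per InstanceId.
INSTANCE_STORE_METRICS = ["DiskReadOps", "DiskWriteOps", "DiskReadBytes", "DiskWriteBytes"]

# EBS volume metrics: published in the AWS/EBS namespace per VolumeId.
EBS_METRICS = ["VolumeReadOps", "VolumeWriteOps", "VolumeReadBytes", "VolumeWriteBytes"]


def disk_metric_query(metric_name, resource_id, hours=1, period=300):
    """Build get_metric_statistics keyword arguments for a disk metric.

    resource_id is an instance ID for instance store metrics and a
    volume ID for EBS metrics.
    """
    if metric_name in INSTANCE_STORE_METRICS:
        namespace, dimension = "AWS/EC2", "InstanceId"
    elif metric_name in EBS_METRICS:
        namespace, dimension = "AWS/EBS", "VolumeId"
    else:
        raise ValueError(f"unknown disk metric: {metric_name}")
    end = datetime.now(timezone.utc)
    return {
        "Namespace": namespace,
        "MetricName": metric_name,
        "Dimensions": [{"Name": dimension, "Value": resource_id}],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": period,  # seconds per datapoint
        "Statistics": ["Sum"],
    }
```

With boto3 installed and credentials configured, the result could be passed as `boto3.client("cloudwatch").get_metric_statistics(**disk_metric_query("VolumeReadBytes", "vol-0123456789abcdef0"))`, where the volume ID is a placeholder.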
2. Aggregate Statistics
By default, with basic monitoring, CloudWatch provides metrics for each EC2 instance individually in 5-minute periods, free of charge. To visualize a metric aggregated across EC2 instances, you first need to enable detailed monitoring (at an additional charge), which also provides data in 1-minute periods. At the time of writing, AWS provides three ways to aggregate this data: (a) by image ID, (b) by instance type, and (c) across all instances.
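As an illustration, the sketch below builds a `get_metric_statistics` request for aggregated `CPUUtilization`. Grouping by image or instance type maps to the `ImageId` and `InstanceType` dimensions, and leaving out dimensions entirely queries the aggregate across all instances. The function name `aggregate_cpu_query` and its `group_by` keys are hypothetical, and only instances with detailed monitoring enabled contribute to these aggregates.

```python
from datetime import datetime, timedelta, timezone

# Dimension names CloudWatch uses for aggregated EC2 statistics.
AGGREGATE_DIMENSIONS = {"image": "ImageId", "type": "InstanceType"}


def aggregate_cpu_query(group_by=None, value=None, hours=3):
    """Build get_metric_statistics kwargs for CPUUtilization across instances.

    group_by=None  -> no dimensions, i.e. aggregated across all instances
    group_by="image" or "type" -> aggregated over instances sharing that
    ImageId or InstanceType (value is the AMI ID or instance type).
    """
    end = datetime.now(timezone.utc)
    params = {
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": 60,  # 1-minute granularity, as provided by detailed monitoring
        "Statistics": ["Average", "Maximum"],
    }
    if group_by is not None:
        params["Dimensions"] = [{"Name": AGGREGATE_DIMENSIONS[group_by], "Value": value}]
    return params
```

For example, `aggregate_cpu_query("type", "t3.micro")` targets all t3.micro instances with detailed monitoring, while `aggregate_cpu_query()` asks for the fleet-wide aggregate.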
If you have EC2 instances provisioned in multiple regions, create a CloudWatch dashboard that combines EC2 metrics from all of those regions in a single view. This is a great feature for the Ops team: it gives them a consolidated view without switching between regions.
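A cross-region dashboard works because each metric widget in the dashboard body carries its own `region` property. The minimal sketch below generates such a body as JSON, one widget per region laid out two per row on the dashboard's 24-column grid. The function name, widget sizes, and statistic choices are assumptions for illustration.

```python
import json


def cross_region_dashboard(instances_by_region, metric="CPUUtilization"):
    """Build a CloudWatch dashboard body with one metric widget per region.

    instances_by_region maps a region name to a list of instance IDs.
    The per-widget "region" property lets one dashboard mix regions.
    """
    widgets = []
    for i, (region, instance_ids) in enumerate(sorted(instances_by_region.items())):
        widgets.append({
            "type": "metric",
            "x": (i % 2) * 12,   # two 12-unit-wide widgets per row
            "y": (i // 2) * 6,
            "width": 12,
            "height": 6,
            "properties": {
                "metrics": [["AWS/EC2", metric, "InstanceId", iid] for iid in instance_ids],
                "region": region,
                "stat": "Average",
                "period": 300,
                "view": "timeSeries",
                "title": f"{metric} ({region})",
            },
        })
    return json.dumps({"widgets": widgets})
```

The resulting string could be published with `boto3.client("cloudwatch").put_dashboard(DashboardName="ec2-multi-region", DashboardBody=body)`; the dashboard name here is a placeholder.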
3. Key Metrics to Monitor
- CPUCreditBalance and CPUSurplusCreditBalance: If you are using a burstable instance type such as T3 or T2, you must monitor these two metrics. CPUCreditBalance tells you how many CPU credits are left for the next burst. CPUSurplusCreditBalance tells you how many surplus credits an instance in unlimited mode has spent after its credit balance reached zero; surplus credits beyond what the instance can earn in 24 hours can incur an additional charge. These are critical metrics: they help you judge whether you are using the right instance type, whether the actual processing time matches what you expected, or whether unexpected processes running in the background are eating up CPU credits.
- CPUUtilization: In AWS, you pay for the compute capacity you provision whether or not you use it, so unused CPU cycles are wasted spend. This metric not only tells you whether you could switch to a smaller instance size to save cost, but also reveals workload patterns that can inform the long-term choice between Reserved Instances and On-Demand pricing. Assuming you plan to stay in the cloud for the long term (in most cases, yes), instances that show steady CPU utilization most of the time are good candidates for Reserved Instances, which can save you a lot of money. Note that this metric is certainly not the only criterion for choosing between Reserved and On-Demand.
- NetworkIn/NetworkOut: No surprises here. Whether your workload is compute- or I/O-sensitive, you should always monitor network traffic. Both metrics report the number of bytes transferred during the period. Unexpectedly high ingress traffic calls for deeper investigation using VPC Flow Logs.
- Status check metrics (StatusCheckFailed, StatusCheckFailed_Instance, StatusCheckFailed_System): These are crucial for EC2. The recommendation is to create a CloudWatch alarm with a recover action (for system status check failures) to minimize downtime. These metrics should be a top priority in production environments.
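For the CPUCreditBalance metric above, a rough back-of-the-envelope estimate shows how long a burstable instance can sustain its current load. One CPU credit equals one vCPU running at 100% for one minute, and an instance earns credits at its baseline rate, so above baseline the balance drains by roughly (utilization − baseline) × vCPUs × 60 credits per hour. This sketch ignores credit caps and launch credits; the function name and the example values are hypothetical, and real baselines vary by instance size.

```python
def hours_until_credits_exhausted(credit_balance, avg_utilization_pct, baseline_pct, vcpus):
    """Simplified estimate of hours until CPUCreditBalance reaches zero.

    avg_utilization_pct: average CPUUtilization across the instance (percent)
    baseline_pct: the instance's baseline utilization per vCPU (percent)
    Returns None if utilization is at or below baseline (no net drain).
    """
    excess = (avg_utilization_pct - baseline_pct) / 100.0
    if excess <= 0:
        return None
    # Credits spent per hour minus credits earned per hour.
    net_drain_per_hour = excess * vcpus * 60
    return credit_balance / net_drain_per_hour
```

With a hypothetical balance of 144 credits, 40% average utilization, a 10% baseline, and 2 vCPUs, the net drain is 36 credits per hour, so the balance lasts about 4 hours.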
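For the CPUUtilization metric, the right-sizing part of the analysis can be sketched as a small summary over the `Datapoints` list that `get_metric_statistics` returns. It computes the mean and a nearest-rank 95th percentile and flags obvious over- or under-provisioning. The thresholds and the function name `rightsizing_hint` are illustrative assumptions, not AWS guidance, and the Reserved vs. On-Demand decision needs more inputs than this.

```python
import math


def rightsizing_hint(datapoints, low_pct=20.0, high_pct=80.0):
    """Summarize CPUUtilization datapoints and suggest a right-sizing direction.

    datapoints: list of dicts with an "Average" key in percent, such as the
    "Datapoints" list of a get_metric_statistics response.
    Returns None when there is no data.
    """
    values = sorted(dp["Average"] for dp in datapoints)
    if not values:
        return None
    mean = sum(values) / len(values)
    # Nearest-rank 95th percentile: ~95% of samples are at or below this value.
    p95 = values[math.ceil(0.95 * len(values)) - 1]
    if p95 < low_pct:
        hint = "underutilized: consider a smaller instance size"
    elif mean > high_pct:
        hint = "overutilized: consider a larger instance size"
    else:
        hint = "within range"
    return {"mean": mean, "p95": p95, "hint": hint}
```

Using the 95th percentile for the "too big" check avoids downsizing an instance that idles most of the time but still has regular peaks.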
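For NetworkIn and NetworkOut, the raw Sum is bytes per period, which is hard to compare against instance bandwidth. Dividing by the period length (300 seconds with basic monitoring, 60 with detailed monitoring) gives bytes per second, and multiplying by 8 gives bits. A minimal conversion helper, with a hypothetical name:

```python
def network_mbps(sum_bytes, period_seconds=300):
    """Convert a NetworkIn/NetworkOut Sum datapoint to average megabits per second.

    period_seconds: 300 for basic monitoring, 60 for detailed monitoring.
    """
    if period_seconds <= 0:
        raise ValueError("period_seconds must be positive")
    bytes_per_second = sum_bytes / period_seconds
    return bytes_per_second * 8 / 1_000_000
```

For example, a 5-minute Sum of 375,000,000 bytes works out to an average of 10 Mbps, a figure that is easier to compare with your usual baseline when deciding whether ingress is unexpected.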
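For the status check metrics, the alarm-plus-recovery recommendation can be expressed as the keyword arguments of CloudWatch `put_metric_alarm`. The sketch watches `StatusCheckFailed_System` (1 when the underlying host has a problem) and triggers the `arn:aws:automate:<region>:ec2:recover` action. The alarm naming scheme and the two-minute evaluation window are assumptions, and not every instance type supports the recover action.

```python
def status_check_recovery_alarm(instance_id, region, evaluation_periods=2):
    """Build put_metric_alarm kwargs that recover an instance on system check failure."""
    return {
        "AlarmName": f"{instance_id}-system-status-recover",  # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed_System",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,
        # Fire only after the check has failed for this many consecutive periods.
        "EvaluationPeriods": evaluation_periods,
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:recover"],
    }
```

These arguments could be passed as `boto3.client("cloudwatch", region_name=region).put_metric_alarm(**status_check_recovery_alarm(instance_id, region))`. Failures of the instance status check (`StatusCheckFailed_Instance`) are usually handled with a separate alarm, since recovery targets host-level problems.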
The complete list of available EC2 metrics can be found in the AWS documentation.
Opinions expressed by DZone contributors are their own.