Getting Started With AWS Monitoring
Take your monitoring to the cloud.
Amazon Web Services (AWS) is the most popular public cloud, with 175 services and counting. A key element of a successful cloud operation is gaining visibility into what is running where and what issues are occurring, and then dealing with those issues, preferably automatically.
In this article, I’ll discuss the basics of AWS monitoring, including Amazon services that can assist with monitoring, key metrics to watch for the most popular Amazon services, and a special focus on monitoring EC2 environments, which are the basis for most Amazon deployments.
What Is AWS Monitoring?
Amazon Web Services (AWS) is a top cloud service vendor, providing hundreds of services for cloud users worldwide. AWS offers several monitoring services, built to work natively with other AWS solutions while integrating with third-party tools.
AWS provides two widely used monitoring services:
Amazon CloudWatch: a comprehensive monitoring service that gives DevOps teams, security professionals, and developers operational and security visibility. Notable CloudWatch features include automated incident response, operational insights, troubleshooting, anomaly detection, and metric visualization. CloudWatch can monitor on-premises resources as well as resources running in the cloud.
AWS CloudTrail: a monitoring tool that tracks API usage and user activity across the AWS ecosystem. CloudTrail records user actions as events and automatically stores the resulting event logs. Each event captures details such as the identity of the user, the source IP address, and the date and time of the interaction.
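To make the CloudTrail description more concrete, here is a minimal sketch of pulling the "who, what, where, and when" out of a single event record. The sample record is hand-written for illustration and its values are made up; the helper name `summarize_event` is hypothetical. The field names (`eventTime`, `eventSource`, `eventName`, `sourceIPAddress`, `userIdentity`) follow the CloudTrail record format, but real records carry many more fields.

```python
import json

# Hand-written sample record for illustration only; all values are made up.
SAMPLE_RECORD = """
{
  "eventTime": "2020-06-01T12:34:56Z",
  "eventSource": "ec2.amazonaws.com",
  "eventName": "StopInstances",
  "awsRegion": "us-east-1",
  "sourceIPAddress": "203.0.113.10",
  "userIdentity": {"type": "IAMUser", "userName": "alice"}
}
"""


def summarize_event(raw_record: str) -> dict:
    """Extract the caller, action, source IP, and timestamp from one record."""
    record = json.loads(raw_record)
    identity = record.get("userIdentity", {})
    # Not every identity type has a userName, so fall back to the ARN or type.
    who = identity.get("userName") or identity.get("arn") or identity.get("type", "unknown")
    return {
        "who": who,
        "action": f'{record["eventSource"]}:{record["eventName"]}',
        "from_ip": record.get("sourceIPAddress"),
        "when": record["eventTime"],
    }


summary = summarize_event(SAMPLE_RECORD)
print(summary)
```

A summary like this is usually enough to answer the first audit question: which user performed which action, from where, and at what time.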
Key Metrics for AWS Monitoring
When monitoring your environment, it is important to choose certain metrics to focus on. Below are metrics you can consider when monitoring Amazon Elastic Compute Cloud (EC2), Amazon Elastic Block Store (EBS), and AWS Lambda.
Metrics for EC2 Monitoring
Amazon Elastic Compute Cloud (EC2) lets you easily provision and scale infrastructure resources on demand. The main resource EC2 provides is called an EC2 instance, which is essentially a virtual server provisioned in the AWS cloud. There is a wide range of EC2 instance types, each providing a different combination of CPU, memory, storage, and network capacity.
EC2 integrates natively with AWS monitoring tools, as well as with Elastic Load Balancing and Auto Scaling, which help you optimize usage and costs. When monitoring EC2, certain metrics help you maintain visibility, including CPUUtilization, DiskReadOps, DiskWriteOps, DiskReadBytes, and DiskWriteBytes. Together, these metrics show how heavily an instance's processor and disks are being used, which helps you spot overloaded or underused instances.
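As a rough sketch of how these metrics could be requested, the snippet below builds the parameters for a CloudWatch GetMetricStatistics query for each EC2 metric named above. It only constructs the request dictionaries and does not call AWS. The helper name `build_metric_query`, the instance ID, the one-hour window, and the five-minute period are all assumptions chosen for illustration.

```python
from datetime import datetime, timedelta, timezone

EC2_METRICS = [
    "CPUUtilization",
    "DiskReadOps",
    "DiskWriteOps",
    "DiskReadBytes",
    "DiskWriteBytes",
]


def build_metric_query(instance_id, metric_name, end_time, hours=1, period_seconds=300):
    """Build GetMetricStatistics parameters for one EC2 metric (hypothetical helper)."""
    # CPUUtilization is a percentage, so it is averaged per period;
    # the disk metrics are counters, so they are summed per period.
    statistic = "Average" if metric_name == "CPUUtilization" else "Sum"
    return {
        "Namespace": "AWS/EC2",
        "MetricName": metric_name,
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "StartTime": end_time - timedelta(hours=hours),
        "EndTime": end_time,
        "Period": period_seconds,
        "Statistics": [statistic],
    }


# A fixed timestamp and a placeholder instance ID keep the example reproducible.
now = datetime(2020, 6, 1, 12, 0, tzinfo=timezone.utc)
queries = [build_metric_query("i-0123456789abcdef0", name, now) for name in EC2_METRICS]
for query in queries:
    print(query["MetricName"], query["Statistics"])
```

With boto3 installed and credentials configured, each dictionary could be passed along as `boto3.client("cloudwatch").get_metric_statistics(**query)`.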
Metrics for Amazon EBS
Amazon Elastic Block Store (EBS) is a cloud-based block storage service, typically used as persistent storage for EC2 instances. There are two main categories of EBS volumes: hard-disk drives (HDD) and solid-state drives (SSD). Snapshots of EBS volumes are stored in Amazon Simple Storage Service (Amazon S3), and you can copy them across AWS regions as needed.
Here are key metrics you can consider when setting up your monitoring configuration:
VolumeReadBytes and VolumeWriteBytes: measure the number of bytes read from and written to an EBS volume during a specific time period.
VolumeIdleTime: measures the time during which no read or write operations were submitted to the volume, in seconds.
VolumeTotalReadTime and VolumeTotalWriteTime: measure the total time spent on all read and write operations that completed during the queried time period, in seconds.
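Raw sums are easier to reason about once they are normalized by the length of the period. The sketch below is one possible way to do that, assuming you have already retrieved the per-period sums of the metrics listed above. The helper name `summarize_ebs_period` and all sample numbers are made up for illustration.

```python
def summarize_ebs_period(period_seconds, read_bytes, write_bytes, idle_seconds,
                         total_read_time, total_write_time):
    """Turn per-period sums of EBS metrics into easier-to-read figures."""
    total_bytes = read_bytes + write_bytes
    return {
        # Average throughput over the period, in bytes per second.
        "throughput_bytes_per_sec": total_bytes / period_seconds,
        # Share of the period with no read or write activity (0.0 to 1.0).
        "idle_fraction": idle_seconds / period_seconds,
        # Combined time spent on read and write operations, in seconds.
        "io_seconds": total_read_time + total_write_time,
    }


# Made-up sums for a single 300-second period.
summary = summarize_ebs_period(
    period_seconds=300,
    read_bytes=6_000_000,      # VolumeReadBytes
    write_bytes=3_000_000,     # VolumeWriteBytes
    idle_seconds=240,          # VolumeIdleTime
    total_read_time=12.5,      # VolumeTotalReadTime
    total_write_time=7.5,      # VolumeTotalWriteTime
)
print(summary)
```

A volume that is idle for most of each period is a candidate for a smaller or cheaper volume type, while a consistently low idle fraction suggests the volume is under sustained load.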
Metrics for AWS Lambda
AWS Lambda lets you execute code without having to provision and manage any underlying server resources. The service is event-driven: you package your code as Lambda functions, which run in response to events.
Each Lambda function is triggered in response to either an event from an AWS service or an API call routed through Amazon API Gateway. You can also invoke functions manually via the user interface, or run them on a schedule.
When setting up monitoring for your AWS Lambda operation, consider the following metrics:
Duration: measures the time needed to execute a Lambda function, in milliseconds. You can use this metric to learn about overall performance.
Errors: tracks the number of executions that resulted in errors. When you see a high level of errors or a sudden increase, you can investigate the cause by analyzing Lambda logs.
Throttles: counts the number of invocation attempts that were throttled because they exceeded the concurrency limit. You can use this metric to tune concurrency limits and ensure the limits you set meet your requirements.
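The three metrics above can be combined into a simple health check. The following is a minimal sketch that works on datapoints you have already collected. It also uses an invocation count (Lambda publishes an Invocations metric, which is not covered in this article) to turn raw error counts into an error rate. The function name, the dictionary keys, and the thresholds are assumptions chosen for illustration, not recommended values.

```python
def check_lambda_health(datapoints, max_error_rate=0.05, max_avg_duration_ms=1000.0):
    """Return (period, message) pairs for periods that look unhealthy."""
    findings = []
    for point in datapoints:
        invocations = point["invocations"]
        # Guard against division by zero for periods with no traffic.
        error_rate = point["errors"] / invocations if invocations else 0.0
        if error_rate > max_error_rate:
            findings.append((point["period"], f"error rate {error_rate:.0%}"))
        if point["throttles"] > 0:
            findings.append((point["period"], f"{point['throttles']} throttled invocations"))
        if point["avg_duration_ms"] > max_avg_duration_ms:
            findings.append((point["period"], f"average duration {point['avg_duration_ms']:.0f} ms"))
    return findings


# Made-up datapoints for two consecutive five-minute periods.
sample = [
    {"period": "12:00", "invocations": 200, "errors": 2, "throttles": 0, "avg_duration_ms": 180.0},
    {"period": "12:05", "invocations": 200, "errors": 30, "throttles": 4, "avg_duration_ms": 1250.0},
]
findings = check_lambda_health(sample)
for period, message in findings:
    print(period, message)
```

In this sample, the first period is healthy and the second is flagged three times: once for its error rate, once for throttling, and once for its average duration.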
Automated vs. Manual Monitoring of EC2 Workloads
AWS offers several tools you can use to monitor workloads on EC2. Some of them automatically generate alerts on certain conditions, and some allow human operators to monitor system status.
Automated Monitoring Tools
Here are a few automatic monitoring facilities you can use to understand if something is wrong with your Amazon EC2 instances:
Amazon CloudWatch alarms: let you track one metric over a specified time frame and, when the metric crosses a certain threshold, send a notification to Amazon SNS or trigger an Auto Scaling policy on Amazon EC2. CloudWatch alarms are not triggered by a momentary shift of the metric beyond its threshold; the metric needs to stay beyond the threshold for a specified number of periods.
CloudWatch Events: lets you automate AWS services so that they respond to events. AWS delivers a stream of event data in near real time via CloudWatch Events, and you can define rules that trigger automatic actions on AWS services whenever an event matches them.
System status checks: monitor the AWS systems that your EC2 instances depend on. When a status check fails, this typically indicates a problem with the underlying infrastructure that requires AWS maintenance work. You can wait for Amazon to resolve the issue, or move your workloads to another instance.
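The point about alarms ignoring momentary spikes is easier to see in code. Below is a simplified model of that behavior, not CloudWatch's actual implementation: the alarm enters the ALARM state only after the metric has exceeded the threshold for a given number of consecutive periods. The function name and the sample CPU readings are made up, and real alarms also handle missing data and other comparison operators.

```python
def alarm_states(values, threshold, evaluation_periods):
    """Return the alarm state after each datapoint.

    The alarm fires only once the most recent `evaluation_periods`
    datapoints are all above `threshold`, so one spike is not enough.
    """
    states = []
    consecutive_breaches = 0
    for value in values:
        if value > threshold:
            consecutive_breaches += 1
        else:
            consecutive_breaches = 0
        states.append("ALARM" if consecutive_breaches >= evaluation_periods else "OK")
    return states


# Made-up CPU utilization readings, one per period.
cpu_percent = [40, 95, 50, 85, 90, 92, 60]
states = alarm_states(cpu_percent, threshold=80, evaluation_periods=3)
print(states)
```

The isolated spike to 95 leaves the alarm in the OK state. Only the third consecutive reading above 80 switches it to ALARM, and the alarm returns to OK as soon as the metric drops back below the threshold.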
Manual Monitoring Tools
The Amazon EC2 console and AWS automated monitoring tools provide a look at certain aspects of your environment. However, these tools do not monitor everything.
The Amazon EC2 dashboard, for example, shows instance state and alarm status, provides details into instance and volume metrics, and lets you see service health and scheduled events by region.
When using Amazon CloudWatch, the dashboard shows the status of current alarms and of service health, along with graphs that visualize this information. You can also set up CloudWatch to graph metrics that help you troubleshoot issues and spot trends in your EC2 environment, and use it to create and modify alarms.
You can use CloudWatch to search for AWS metrics and view all of your alarms and resources. Anything else that is not available in the EC2 console or the CloudWatch dashboard requires manual customization.
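One form this customization can take is a custom CloudWatch dashboard defined as a JSON document. The sketch below generates such a definition with one CPU graph per instance. The helper name, the instance IDs, the region, and the layout values are assumptions for illustration. The widget fields follow my understanding of the CloudWatch dashboard body format, so check them against the current documentation before relying on them.

```python
import json


def cpu_dashboard_body(instance_ids, region):
    """Build a dashboard definition with one CPU graph per instance."""
    widgets = []
    for index, instance_id in enumerate(instance_ids):
        widgets.append({
            "type": "metric",
            "x": 0,
            "y": index * 6,  # stack the widgets vertically
            "width": 12,
            "height": 6,
            "properties": {
                "metrics": [["AWS/EC2", "CPUUtilization", "InstanceId", instance_id]],
                "period": 300,
                "stat": "Average",
                "region": region,
                "title": f"CPU utilization: {instance_id}",
            },
        })
    return json.dumps({"widgets": widgets}, indent=2)


# Placeholder instance IDs and region.
body = cpu_dashboard_body(["i-0123456789abcdef0", "i-0fedcba9876543210"], "us-east-1")
print(body)
```

With boto3 installed and credentials configured, the resulting string could be submitted with `boto3.client("cloudwatch").put_dashboard(DashboardName="ec2-cpu", DashboardBody=body)`, where the dashboard name is again just a placeholder.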
Conclusion
In this article, I covered the basics of AWS monitoring. I proposed key metrics for popular Amazon environments, including:
CPU and disk utilization for EC2
Read and write volume and times for EBS
Duration, errors, and throttles for Lambda
In addition, I covered automated monitoring tools on Amazon, such as CloudWatch and CloudTrail, and the key dashboards human operators should watch in day-to-day monitoring of EC2 environments. I hope this will be helpful as you plan a monitoring strategy for your AWS workloads.