How to Optimize AWS Observability Tools
AWS provides several disparate observability tools that can be combined to give a complete picture of a system.
Amazon Web Services (AWS) is a cloud computing platform that lets companies build computational functionality on demand. It enables developers to create serverless functions quickly and deliver new features to consumers without scaling up infrastructure themselves, which would cost both time and money. The downside of this speed is that tracking and observing the health of these functions can be difficult, especially when running microservices. AWS provides several tools to help developers understand their system’s health and is in the process of delivering new tools as well.
Observability With AWS CloudWatch
CloudWatch is AWS’s monitoring and insight service. Developers use CloudWatch to collect logs from compute functions and track performance information for many AWS services. From this data, CloudWatch produces metrics and insights on which developers can build alarms and dashboards. By combining these features, developers can assemble AWS observability tooling that meets their needs.
Gaining Function Understanding Through CloudWatch Logs
Logging is a straightforward way to understand what your system is doing. However, it does not on its own give simple, at-a-glance knowledge of the health of your system. Logs are best suited to troubleshooting after you have identified a bug in a specific area or function of your system.
Developers can also use CloudWatch logs to trigger alarms or feed metrics. If your compute functions write error logs in a consistent format, a metric filter can count them separately from other logs, making it simple to trigger an alarm on error events.
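One way to count error logs is a CloudWatch Logs metric filter. The sketch below only builds the arguments for the PutMetricFilter call; the log group name, the MyApp namespace, and the ErrorCount metric name are hypothetical, and it assumes your functions write the term ERROR in their error logs.

```python
from typing import Any, Dict


def error_metric_filter_kwargs(log_group: str, namespace: str = "MyApp") -> Dict[str, Any]:
    """Build keyword arguments for the CloudWatch Logs PutMetricFilter call.

    The filter name, metric name, and namespace are illustrative choices.
    """
    return {
        "logGroupName": log_group,
        "filterName": "error-count",
        # Matches log events that contain the term ERROR (case-sensitive).
        "filterPattern": "ERROR",
        "metricTransformations": [
            {
                "metricName": "ErrorCount",
                "metricNamespace": namespace,
                "metricValue": "1",   # add 1 to the metric per matching event
                "defaultValue": 0.0,  # report 0 when no event matches
            }
        ],
    }


kwargs = error_metric_filter_kwargs("/aws/lambda/example-function")
# With boto3 installed and credentials configured, this could be applied with:
#   boto3.client("logs").put_metric_filter(**kwargs)
```

An alarm can then be attached to the resulting metric like any other CloudWatch metric.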
In CloudWatch, users pay for data collection, data storage, and log analysis. Data collection is the most expensive step, costing over 16x more than storage for each GB of data. CloudWatch costs can ramp up quickly, even with a relatively simple platform, so keeping logs to a minimum can have a significant impact on your AWS bill. Consider using debug logs that can be turned on and off in your production environment: they allow for troubleshooting when needed while keeping logging to a minimum the rest of the time.
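A minimal sketch of switchable debug logging, assuming the function reads its log level from an environment variable. The variable name LOG_LEVEL and the default of WARNING are assumptions for illustration, not an AWS convention.

```python
import logging
import os

# Levels accepted from the environment; anything else falls back to WARNING.
_LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
}


def configure_logger(name: str = "app") -> logging.Logger:
    """Return a logger whose level comes from the LOG_LEVEL environment variable."""
    level_name = os.environ.get("LOG_LEVEL", "WARNING").upper()
    logger = logging.getLogger(name)
    logger.setLevel(_LEVELS.get(level_name, logging.WARNING))
    return logger


logger = configure_logger()
logger.debug("Request payload details")   # emitted only when LOG_LEVEL=DEBUG
logger.warning("Upstream call was slow")  # emitted at the default level
```

Because the level is read from configuration rather than hard-coded, debug output can be enabled for a troubleshooting session and switched off again without changing the code.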
Using CloudWatch Metrics to Understand Service Health
Metrics in CloudWatch show developers how their system is performing. AWS provides some metrics by default, including information like a Lambda function’s duration, the number of errors returned from an API Gateway, or the used throughput of a Kinesis stream. Developers can also create custom metrics. For example, they could count the error logs CloudWatch receives and publish that number as a custom metric.
Metrics on their own are useful for troubleshooting specific data or checking in on your system. If you need to track a metric often, you can save time by adding it to a dashboard where it will be available at a glance. If you know the range of values at which the system will require maintenance, you can set an alarm to notify you when a metric enters that range.
Custom metrics have costs associated with them, and the cost varies depending on how many metrics you have (the more metrics you have, the lower the price per metric). You will still incur charges for other services used to feed the metric.
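As a sketch of such an alarm, the function below builds arguments for the CloudWatch PutMetricAlarm call on the default Lambda Duration metric. The function name, the threshold, and the SNS topic ARN are hypothetical placeholders.

```python
from typing import Any, Dict, List


def duration_alarm_kwargs(
    function_name: str,
    threshold_ms: float,
    notify_arns: List[str],
) -> Dict[str, Any]:
    """Build keyword arguments for the CloudWatch PutMetricAlarm call."""
    return {
        "AlarmName": f"{function_name}-duration-high",
        "Namespace": "AWS/Lambda",
        "MetricName": "Duration",  # reported by Lambda in milliseconds
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Average",
        "Period": 300,             # evaluate 5-minute averages
        "EvaluationPeriods": 3,    # require 3 breaching periods in a row
        "Threshold": threshold_ms,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",
        "AlarmActions": notify_arns,
    }


kwargs = duration_alarm_kwargs(
    "example-function",
    threshold_ms=2000,
    notify_arns=["arn:aws:sns:us-east-1:123456789012:example-topic"],
)
# With boto3 installed and credentials configured, this could be applied with:
#   boto3.client("cloudwatch").put_metric_alarm(**kwargs)
```

Requiring several consecutive breaching periods is a design choice that trades a slower notification for fewer false alarms on short spikes.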
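The effect of tiered pricing can be illustrated with a small calculation. The tier sizes and prices below are assumptions chosen for the example, not a quote of AWS's price list; check the current CloudWatch pricing page before budgeting.

```python
from typing import List, Optional, Tuple

# Each tier is (number of metrics the tier covers, monthly price per metric).
# A size of None means "all remaining metrics".
Tier = Tuple[Optional[int], float]

# Hypothetical tiers for illustration only.
EXAMPLE_TIERS: List[Tier] = [
    (10_000, 0.30),
    (240_000, 0.10),
    (None, 0.05),
]


def monthly_custom_metric_cost(metric_count: int, tiers: List[Tier]) -> float:
    """Return the monthly cost of metric_count custom metrics under tiered pricing."""
    if metric_count < 0:
        raise ValueError("metric_count must be non-negative")
    remaining = metric_count
    total = 0.0
    for size, price in tiers:
        if remaining <= 0:
            break
        # Fill the current tier before moving to the next, cheaper one.
        in_tier = remaining if size is None else min(remaining, size)
        total += in_tier * price
        remaining -= in_tier
    return round(total, 2)


small = monthly_custom_metric_cost(100, EXAMPLE_TIERS)
large = monthly_custom_metric_cost(12_000, EXAMPLE_TIERS)
print(small / 100, large / 12_000)  # average price per metric falls as the count grows
```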
Visualize Observability With Dashboards
Dashboards on CloudWatch are customizable pages that allow you to monitor resources at a glance. You can visualize the status of metrics and alarms or create a widget using CloudWatch logs. You can save graphs on a dashboard so that you do not have to configure the graphed metric or set up the log filter each time you enter CloudWatch. While dashboards are a better user experience than CloudWatch metrics or logs alone, the information they show is limited to the default metrics, the custom metrics you create, and the log filters you configure.
Dashboards carry a flat fee per dashboard per month. If you require several dashboards to monitor different aspects of your system, this cost may not scale well. AWS allows up to 500 metrics in a single dashboard widget and up to 2500 metrics per dashboard.
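Dashboards can also be defined as JSON rather than configured by hand. The sketch below builds a dashboard body with one metric widget and checks it against the limits quoted above; the widget title, function name, and region are hypothetical.

```python
import json
from typing import Any, Dict, List

MAX_METRICS_PER_WIDGET = 500      # limit quoted in the text above
MAX_METRICS_PER_DASHBOARD = 2500  # limit quoted in the text above


def metric_widget(title: str, metrics: List[List[str]], region: str = "us-east-1") -> Dict[str, Any]:
    """Describe one metric widget; each metric is [namespace, name, dim_name, dim_value, ...]."""
    if len(metrics) > MAX_METRICS_PER_WIDGET:
        raise ValueError("too many metrics for a single widget")
    return {
        "type": "metric",
        "x": 0, "y": 0, "width": 12, "height": 6,
        "properties": {
            "title": title,
            "metrics": metrics,
            "stat": "Sum",
            "period": 300,
            "region": region,
        },
    }


def dashboard_body(widgets: List[Dict[str, Any]]) -> str:
    """Serialize widgets into the JSON string that PutDashboard expects as DashboardBody."""
    total = sum(len(w["properties"]["metrics"]) for w in widgets)
    if total > MAX_METRICS_PER_DASHBOARD:
        raise ValueError("too many metrics for a single dashboard")
    return json.dumps({"widgets": widgets})


body = dashboard_body([
    metric_widget("Lambda errors", [["AWS/Lambda", "Errors", "FunctionName", "example-function"]]),
])
# With boto3 installed and credentials configured, this could be applied with:
#   boto3.client("cloudwatch").put_dashboard(DashboardName="example", DashboardBody=body)
```

Keeping the definition in code makes it easier to consolidate related graphs into fewer dashboards, which matters when each dashboard is billed separately.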
Send Data to AWS Elasticsearch Service
CloudWatch provides a tool allowing developers to stream logs directly into an Amazon Elasticsearch Service cluster. Elasticsearch is a source-available platform used by AWS to provide the ELK stack’s functionality inside the AWS platform. While at its core Elasticsearch is a search engine, its embedded tools like document aggregations make it an excellent log analytics tool. Since CloudWatch can stream to the AWS Elasticsearch service in near-real time, you can be sure to access knowledge about your system’s health quickly. Further, users could configure a Kibana dashboard to get a polished interface for their data.
With this AWS observability tool, take care to track how much data is streamed from CloudWatch to Elasticsearch to prevent costs from adding up. Users pay for the CloudWatch and Elasticsearch services according to standard rates and pay for each log event sent between the two services. To keep costs low, limit the logs sent to CloudWatch to those necessary for analytics.
Elasticsearch has also recently changed its licensing model. While it used to be open-source, the new model drops that label and requires SaaS platforms like AWS either to pay a premium for using the Elasticsearch source code or to contribute to the code base. It is still unknown how this change will affect the cost of the AWS Elasticsearch service, but chances are users will see an increase in their AWS bill soon.
Observability With AWS X-Ray
X-Ray is a tool within AWS that monitors distributed applications. It shows you how requests flow through microservices run on AWS. Traces can be followed across different services, accounts, and regions, so X-Ray can trace even complex request paths and lets the user track an issue through every microservice involved. The result is a map of data flowing through your system, so you can see where performance issues, choke points, or configuration issues originate.
AWS charges individually for traces recorded, retrieved, and scanned with X-Ray. For an additional cost, you can also get X-Ray Insights. While in preview only, Insights create records of when fault rates are outside your preset, allowed range. This feature is useful for at-a-glance knowledge of whether you need to troubleshoot your system or whether data is flowing through appropriately. However, adding X-Ray Insights into the mix is likely to make X-Ray even more costly.
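Traces can be followed across services because each request carries a tracing header, X-Amzn-Trace-Id, that every service passes along. The sketch below only illustrates the header's documented shape; in practice the X-Ray SDK or the AWS service in front of your code generates and propagates it, and the helper names here are hypothetical.

```python
import os
import time
from typing import Dict, Optional


def new_trace_id(now: Optional[float] = None) -> str:
    """Trace ID: version 1, epoch seconds as 8 hex digits, then 24 random hex digits."""
    epoch = int(time.time() if now is None else now)
    return f"1-{epoch:08x}-{os.urandom(12).hex()}"


def new_segment_id() -> str:
    """Segment ID: 16 random hex digits identifying one service's work on the request."""
    return os.urandom(8).hex()


def tracing_header(trace_id: str, parent_id: str, sampled: bool = True) -> str:
    """Value of the X-Amzn-Trace-Id header sent to the next service."""
    return f"Root={trace_id};Parent={parent_id};Sampled={1 if sampled else 0}"


def parse_tracing_header(header: str) -> Dict[str, str]:
    """Split a header value such as 'Root=...;Parent=...;Sampled=1' into its fields."""
    return dict(part.split("=", 1) for part in header.split(";") if "=" in part)


# Service A starts a trace and calls service B with the header attached.
trace_id = new_trace_id()
header = tracing_header(trace_id, parent_id=new_segment_id())

# Service B reads the header and records its own segment under the same trace ID.
incoming = parse_tracing_header(header)
assert incoming["Root"] == trace_id
```

Because every hop reuses the same Root value, the segments recorded by each service can be stitched back into one end-to-end trace.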
Observability With AWS CloudTrail
CloudTrail is an AWS service that keeps records of activities taken by users, roles, or services. The logged data is useful for analyzing your system’s security, giving observability into how secure your information is within your AWS setup. CloudTrail can also send automatic notifications when it detects operational issues, so developers learn promptly about problems at their platform’s critical points.
CloudTrail classifies events as either management (operations on your AWS account) or data (operations within your system or running AWS service). Each of these will incur a cost, and data events cost less than management events. A typical system can expect few management events compared to data events, so this cost discrepancy should not have an enormous impact on your bill. AWS charges more per event if you enable Insights on your CloudTrail setup, but you have the benefit of being alerted when suspicious behavior is detected. Using CloudTrail to augment security systems could pay off on its own, regardless of other observability tools used.
Observability With AWS Grafana
Grafana, an open-source data visualization service, was added as a preview service on AWS in December 2020. Grafana adds a rich visualization service to AWS that allows users to observe the health of their AWS services and monitor ongoing logs and metrics.
What Grafana Does for AWS Observability
Grafana provides superior ways to display logged data and analytics. It offers an abundance of graph types and customizable colors for a rich user interface. It also includes alerting capabilities, so you will be notified when your system is not performing as expected. With the AWS managed Grafana service, developers can interface with Grafana directly via private, native connections. Where security is crucial, using a private VPC with Grafana native to AWS may finally offer a superior user interface for system observability without the risk of sending data to another service.
Like many AWS services, Amazon-managed Grafana charges monthly for what you use. While the service is in preview, until February 2021, it is free. After the preview period is over, users can take advantage of a 90-day free trial, though there are limits to the number of account users allowed for this trial.
After these free trials, Amazon charges per active user per workspace per month. You can also choose to get an Enterprise subscription for some extra fees, both flat and per user. This subscription includes further training resources and plugins from Grafana not available on AWS.
In comparison to the cost of other AWS tools, Grafana has a much higher price. Some developers may find that a superior user interface for troubleshooting is worth paying for over the more rudimentary CloudWatch dashboards.
Summary of AWS Observability Tools
AWS has several tools available to give developers insights into the health and performance of their systems. Developers use each tool to track their AWS system’s performance, security, and configuration to find issues before they affect customers. The point here, though, is that these are all disparate tools. None provides a “single pane of glass” AWS observability functionality. However, Coralogix does.
For a complete picture of their platform, developers may choose to use all these services from AWS. Unfortunately, they may find the burden of setting up and maintaining these services too high. Further, the cost of these services does not scale well to large platforms. Typical developers will find that they need to use CloudWatch logging; beyond that, they may choose to use AWS tools or to send the logs to another service for analysis.