
Dead Man's Switch With CloudWatch

This is not a drill! Take a look at how you can construct a dead man's switch that notifies you when expected data stops arriving in your CloudWatch metrics.

by Michael Wittig · Jul. 23, 18 · Cloud Zone · Tutorial

While writing this article, I'm traveling from Frankfurt to Stuttgart by high-speed train (ICE) with a top speed of 280 km/h. It is reassuring to know that a dead man's switch stops the train immediately if the train driver becomes incapacitated, such as through death, loss of consciousness, or being bodily removed from control.

Even though you typically use CloudWatch alarms to make sure a metric does not exceed a threshold, it is also possible to build a dead man's switch with CloudWatch. Doing so allows you to monitor the health of processes and jobs. A few examples of typical failures to monitor with a dead man's switch, an approach often called heartbeat monitoring:

  • A daily backup did not complete.
  • It was not possible to generate a daily report.
  • A recurring import job failed.

The following example guides you through how to monitor a job backing up the home directory of an EC2 instance to S3 every 4 hours. You will learn how to create a dead man's switch consisting of the following building blocks:

  1. A CloudWatch custom metric collecting heartbeats from the backup job.
  2. A CloudWatch alarm monitoring the metric for missing heartbeats.

Collecting Heartbeats

An EC2 instance publishes CloudWatch metrics such as CPU utilization, the number of read operations on disk, or the number of bytes sent out. Almost every other AWS service publishes metrics as well. On top of that, you can publish a heartbeat to a custom metric.

The following snippet shows a backup script, triggered by a cron job every four hours, that performs two steps:

  1. Synchronize the folder /home to S3.
  2. Send a heartbeat to a custom metric.
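#!/bin/bash
# Stop at the first error so a failed sync never sends a heartbeat.
set -e
aws s3 sync /home s3://my-company-backup/home
# Runs only after a successful sync.
aws cloudwatch put-metric-data --namespace custom/backup --metric-data 'MetricName=heartbeat,Dimensions=[{Name=source,Value=home}],Value=1'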
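
The four-hour schedule could be expressed as a crontab entry. The following is a minimal sketch, assuming the backup script is saved at the hypothetical path /usr/local/bin/backup-home.sh; the path is not part of the original example.

```shell
#!/bin/bash
# Hypothetical install path of the backup script; adjust to your setup.
BACKUP_SCRIPT=/usr/local/bin/backup-home.sh

# Cron fields: minute hour day-of-month month day-of-week command.
# "0 */4" means minute 0 of every fourth hour (00:00, 04:00, ..., 20:00).
CRON_LINE="0 */4 * * * ${BACKUP_SCRIPT}"

# Print the entry so it can be pasted into `crontab -e`.
echo "${CRON_LINE}"
# prints: 0 */4 * * * /usr/local/bin/backup-home.sh
```

With this schedule, two consecutive successful runs are never more than four hours apart, which is the interval the alarm timeframe has to cover.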

How does publishing a heartbeat to CloudWatch work?

  1. aws cloudwatch put-metric-data sends a data point to a custom metric.
  2. custom/backup is the namespace used for this example.
  3. MetricName=heartbeat sets the name of the custom metric.
  4. The backup source (the home directory) is used as a dimension: Dimensions=[{Name=source,Value=home}]
  5. Value=1 is the data point recorded for each completed backup run.

Learn more about custom metrics. Of course, it is also possible to publish heartbeats by using one of the AWS SDKs directly from within your application.
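
Before creating an alarm, it can be useful to check that heartbeats are actually arriving. The sketch below wraps aws cloudwatch get-metric-statistics in a helper; the function name heartbeat_sum is made up for this illustration, and the date call assumes GNU date (as found on most Linux distributions).

```shell
#!/bin/bash
# Sum the heartbeats recorded during the last N hours (default: 6).
# heartbeat_sum is a hypothetical helper name, not an AWS command.
# Assumes GNU date (for the -d option) and a configured AWS CLI.
heartbeat_sum() {
  local hours="${1:-6}"
  local start end
  start="$(date -u -d "${hours} hours ago" +%Y-%m-%dT%H:%M:%SZ)"
  end="$(date -u +%Y-%m-%dT%H:%M:%SZ)"

  # Namespace, metric name, and dimension must match the values
  # used by put-metric-data in the backup script.
  aws cloudwatch get-metric-statistics \
    --namespace custom/backup \
    --metric-name heartbeat \
    --dimensions Name=source,Value=home \
    --start-time "${start}" \
    --end-time "${end}" \
    --period "$((hours * 3600))" \
    --statistics Sum \
    --query 'Datapoints[].Sum' \
    --output text
}
```

Calling heartbeat_sum 6 after the backup job has run should print the number of heartbeats recorded in the last six hours. An empty result suggests that no heartbeat reached CloudWatch in that window.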

Next, to get notified whenever the backup job stops succeeding, you only need to create a CloudWatch alarm.

Monitoring Heartbeats

A CloudWatch alarm monitors a metric and triggers actions. For example, you can use a CloudWatch alarm to notify you whenever the CPU utilization of an EC2 instance is above 80% for more than 60 minutes. However, it is also possible to implement a dead man's switch with the help of a CloudWatch alarm as described next.

As illustrated in the following figure, three steps are necessary to start creating a new CloudWatch alarm:

  1. Open the CloudWatch service within the AWS Management Console.
  2. Select Alarms from the sub-navigation.
  3. Click the Create Alarm button.

The following figure shows how to select the custom metric.

  1. Choose the namespace custom/backup.
  2. Select the metric with the metric name heartbeat and source home.
  3. Click the Next button.

The last step is to configure the alarm as illustrated in the following figure.

  1. Type in deadmanswitch-backup-home as the Name and add a Description for the alarm.
  2. Select < 0 as the threshold for the alarm ...
  3. ... for 1 out of 1 data points.
  4. Most importantly, set "Treat missing data as" to bad.
  5. Select a timeframe of 6 hours, which leaves a two-hour buffer on top of the four-hour backup interval.
  6. Select the statistic method Sum.
  7. Define an ALARM action.
  8. Create a new list and enter your email address.
  9. Don't forget to press the Create Alarm button.
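
The console steps above could also be scripted with the AWS CLI. The following is a sketch under a few assumptions: the function names and the SNS topic name are made up for this illustration, the console option bad corresponds to the CLI value breaching, and the 6-hour timeframe is expressed as a period of 21600 seconds.

```shell
#!/bin/bash
# Create an SNS topic with an email subscription and print the topic ARN.
# $1: email address to notify. The topic name is a hypothetical choice.
create_email_topic() {
  local email="$1" topic_arn
  topic_arn="$(aws sns create-topic \
    --name deadmanswitch-backup-home \
    --query TopicArn --output text)"
  # The recipient has to confirm the subscription by email.
  aws sns subscribe \
    --topic-arn "${topic_arn}" \
    --protocol email \
    --notification-endpoint "${email}" >/dev/null
  echo "${topic_arn}"
}

# Create the dead man's switch alarm for the heartbeat metric.
# $1: ARN of the SNS topic that receives the ALARM notification.
create_heartbeat_alarm() {
  local topic_arn="$1"
  aws cloudwatch put-metric-alarm \
    --alarm-name deadmanswitch-backup-home \
    --alarm-description "No backup heartbeat within 6 hours" \
    --namespace custom/backup \
    --metric-name heartbeat \
    --dimensions Name=source,Value=home \
    --statistic Sum \
    --period 21600 \
    --evaluation-periods 1 \
    --datapoints-to-alarm 1 \
    --comparison-operator LessThanThreshold \
    --threshold 0 \
    --treat-missing-data breaching \
    --alarm-actions "${topic_arn}"
}
```

A possible usage is create_heartbeat_alarm "$(create_email_topic you@example.com)", where the email address is a placeholder. Keeping the two steps in separate functions makes it possible to reuse an existing topic for several alarms.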

By default, a CloudWatch alarm enters the state INSUFFICIENT_DATA when there are no data points within the specified timeframe, which is 6 hours in our example. Because every heartbeat has the value 1, the sum is never below 0, so the threshold itself is never breached by real data. As we are configuring the alarm to treat missing data as bad, the alarm will enter the state ALARM instead of INSUFFICIENT_DATA when heartbeats stop arriving. Learn more about how alarms treat missing data.
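
To confirm that the switch works, you could read the alarm's current state and force it into ALARM once to check that the notification arrives. The following is a minimal sketch using the AWS CLI; the function names are made up for this illustration, and the alarm name deadmanswitch-backup-home is the one used in the steps above.

```shell
#!/bin/bash
# Print the current state of the alarm:
# OK, ALARM, or INSUFFICIENT_DATA.
alarm_state() {
  aws cloudwatch describe-alarms \
    --alarm-names deadmanswitch-backup-home \
    --query 'MetricAlarms[0].StateValue' \
    --output text
}

# Temporarily force the alarm into the ALARM state to test the
# notification. CloudWatch resets the state at the next evaluation.
trigger_test_alarm() {
  aws cloudwatch set-alarm-state \
    --alarm-name deadmanswitch-backup-home \
    --state-value ALARM \
    --state-reason "Manual test of the dead man's switch"
}
```

Running trigger_test_alarm followed by alarm_state should report ALARM until CloudWatch evaluates the metric again.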

Summary

Creating a dead man's switch with the help of CloudWatch allows you to monitor whether jobs are working as expected. I've used this approach to monitor an agent responsible for synchronizing data from an on-premises database to DynamoDB, for example.


Published at DZone with permission of Michael Wittig. See the original article here.
