
Monitoring CoreOS Clusters


[This article was written by Mick Emmett]

In this post you’ll learn how to get operational insights (i.e. performance metrics, container events, etc.) from CoreOS and make that super simple with etcd, fleet, and SPM.

We’ll use:

  • SPM for Docker to run the monitoring agent as a Docker container and collect Docker metrics and events for all other containers on the same host, plus metrics for the hosts themselves
  • fleet to seamlessly distribute this container to every host in the CoreOS cluster, simply by providing fleet with the unit file shown below
  • etcd to set a single property holding the SPM App Token for the whole cluster

The Big Picture

Before we get started, let’s take a step back and look at our end goal.  What do we want?  We want charts with Performance Metrics, we want Event Collection, we’d love integrated Anomaly Detection and Alerting, and we want that not only for containers, but also for hosts running containers.  CoreOS has no package manager and deploys services in containers, so we want to run the SPM agent in a Docker container, as shown in the following figure:


By the end of this post each of your Docker hosts could look like the above figure, with one or more of your own containers running your own apps, and a single SPM Docker Agent container that monitors all your containers and the underlying hosts.

3 Simple Steps

1)  Create a new SPM App of type “Docker” and copy the SPM App Token

2) Set the SPM App Token via etcd. This makes the token instantly available to all SPM agent instances in the cluster:

etcdctl set /sematext.com/myapp/spm/token/SPM_TOKEN YOUR_SPM_APP_TOKEN

Of course, you can change the “myapp” part to whatever you want.  It simply acts as a namespace in etcd in case you have multiple SPM Apps (and thus multiple SPM App Tokens).
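To illustrate that namespacing, here is a small sketch of how per-app keys could be built before being passed to etcdctl. The app names “payments” and “frontend” are made up for illustration; only the key pattern comes from the article:

```shell
# Hypothetical sketch: one etcd key per SPM App, namespaced by app name.
# "payments" and "frontend" are invented app names for illustration.
for APP in payments frontend; do
  KEY="/sematext.com/${APP}/spm/token/SPM_TOKEN"
  echo "etcdctl set ${KEY} <token-for-${APP}>"
done
```

Each app then gets its own token under its own etcd prefix, and the agent for each app reads only its own key.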

3) Grab the spm-agent.service fleet unit file and submit it to fleet:

# download service file for spm-agent-docker
wget https://raw.githubusercontent.com/sematext/spm-agent-docker/master/coreos/spm-agent.service
# load and start the service
fleetctl load spm-agent.service
fleetctl start spm-agent.service
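Once started, `fleetctl list-units` should show the unit as active on each machine. As a self-contained sketch of a quick health check (the output below is simulated, not captured from a real cluster, where you would pipe the actual command instead):

```shell
# Simulated `fleetctl list-units` output; on a real cluster, replace the
# here-string with the actual command output.
UNITS="UNIT               MACHINE                 ACTIVE   SUB
spm-agent.service  2c3d0f51.../10.0.0.1    active   running"

if echo "$UNITS" | grep -q 'spm-agent\.service.*active'; then
  STATUS="running"
else
  STATUS="missing"
fi
echo "$STATUS"
```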

Fleet unit file

What’s this fleet unit file about?  It’s simple: it reads the SPM App Token from etcd and then starts the Docker container with spm-agent-docker inside. This is what it looks like:

[Unit]
Description=SPM Docker Agent
After=docker.service
Requires=docker.service

[Service]
ExecStartPre=-/usr/bin/docker kill spm-agent
ExecStartPre=-/usr/bin/docker rm spm-agent
ExecStartPre=/usr/bin/docker pull sematext/spm-agent-docker:latest
ExecStart=/bin/sh -c 'set -ex; /usr/bin/docker run --name spm-agent -e SPM_TOKEN=$(etcdctl get /sematext.com/myapp/spm/token/SPM_TOKEN) -e HOSTNAME=$HOSTNAME -v /var/run/docker.sock:/var/run/docker.sock sematext/spm-agent-docker'
ExecStop=/usr/bin/docker stop spm-agent

[X-Fleet]
Global=true
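The key part of `ExecStart` is the command substitution: `$(etcdctl get …)` pulls the token out of etcd at start time and injects it into the container’s environment. Here is a self-contained sketch of that mechanism, with `etcdctl` stubbed out as a shell function (an assumption for illustration, since there is no etcd cluster here):

```shell
# Stub standing in for the real etcdctl binary; the token value is made up.
etcdctl() { echo "0000-1111-2222-3333"; }

# Same pattern the unit file uses: substitute the token into the environment
# that the `docker run -e SPM_TOKEN=...` flag will receive.
SPM_TOKEN=$(etcdctl get /sematext.com/myapp/spm/token/SPM_TOKEN)
echo "SPM_TOKEN=${SPM_TOKEN}"
```

Because the lookup happens on each host at service start, every machine in the cluster picks up the same token without it being baked into the unit file.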



After about a minute, you should see Docker metrics and events in SPM.


Open Sourced Everything

Everything described here is open source: the SPM Docker agent and the fleet unit file are both available in the sematext/spm-agent-docker repository on GitHub.

Summary – What this gets you

Having this little setup lets you take full advantage of SPM and Logsene, e.g. by defining intelligent alerts for metrics and logs, delivered to channels like e-mail, PagerDuty, Slack, HipChat, or any WebHook, and by correlating performance metrics, events, logs, and alerts.

Running CoreOS? Need any help getting CoreOS metrics and/or logs into SPM & Logsene?  Let us know!  Oh, and if you’re a small startup — ping @sematext — you can get a good discount on both SPM and Logsene!



