
Centralized Log Management and Monitoring for CoreOS Clusters

Check out the current state of CoreOS Monitoring and Log Management

[Note: We’re holding Docker Monitoring and Docker Logging webinars in September — sign up today!]

If you’re interested in CoreOS, logs and monitoring, check out our previous CoreOS-related posts on Monitoring CoreOS Clusters and how to get CoreOS logs into ELK in 5 minutes. They are only the start of SPM integrations with CoreOS! Case in point: we have recently optimized the SPM setup on CoreOS and integrated a logging gateway to Logsene into the SPM Agent for Docker. And that’s not all…

In this post we share the current state of CoreOS Monitoring and Log Management from Sematext, so you know what’s coming and which parts might be helpful for your organization. We cover:

  1. Feature Overview
  2. Fleet Units for SPM
  3. How to Set Up Monitoring and Logging Services

1. Feature Overview

  • Quick setup
    • add monitoring and logging for the whole cluster in 5 minutes
  • Collection of Performance Metrics for the CoreOS Cluster
    • Metrics for all CoreOS cluster nodes (hosts)
      • CPU, Memory, Disk usage
    • Detailed metrics for all containers on each host
      • CPU, Memory, Limits, Failures, Network and Disk I/O, …
    • Anomaly detection and alerts for all metrics
    • Anomaly detection and alerts for all logs
  • Correlated Container Events, Metrics and Logs
    • Docker Events like start/stop/destroy are related to deployments, maintenance or sometimes to errors and unwanted restarts;  correlation of metrics, events and logs is the natural way to discover problems using SPM.
  • Docker Events

  • Centralized configuration via etcd
    • There is often a mix of configurations in environment variables, static settings in cloud configuration files, and combinations of confd and etcd. We decided to have all settings stored in etcd, so the settings are done only once and are easy to access.
    • SPM Agent for Docker includes a logging gateway service that receives log messages via TCP. Service discovery is handled via etcd (where the exposed TCP port is stored). All received messages are parsed, and the following formats are supported:
      • journalctl -o short | short-iso | json
      • integrated messages parser (e.g. for dockerd time, level and message text)
      • line delimited JSON
      • plain text messages
      • In cases where the parsing fails, the gateway adds a timestamp and keeps the message 1:1.
    • The logging gateway is configured with the Logsene App Token, so senders do not need to supply it; this makes the gateway compatible with most Unix tools, e.g. journalctl -o json -n 10 | netcat localhost 9000
    • SPM for Docker collects all logs from containers directly from the Docker API. The logging gateway is typically used for system logs – or anything else configured in journald (see “Log forwarding service” below)
    • The transmission to Logsene receivers is encrypted via HTTPS.
  • The log forwarding service streams logs to the logging gateway by pulling them from journald. In addition, it saves the ‘last log time’ to recover after a service restart. Most people take this for granted, but not all logging services have such a recovery function. Many tools just capture the current log stream, and people often realize this only when they miss logs one day because of a reboot, network outage, software update, etc. These are exactly the situations where you would like to know what is going on!
  • SPM integrations into CoreOS
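
The ‘last log time’ recovery described in the list above can be sketched in a few lines of shell. This is a minimal illustration, not the actual Sematext forwarder: the state file path, the function names resume_from and forward_logs, and the default port 9000 are assumptions made for this example.

```shell
#!/bin/sh
# Hypothetical sketch of a journald forwarder that remembers where it stopped.
# STATE_FILE and GATEWAY_PORT are illustrative defaults, not Sematext settings.
STATE_FILE="${STATE_FILE:-/var/tmp/last-log-time}"
GATEWAY_PORT="${GATEWAY_PORT:-9000}"

# Print the time to resume from: the saved time if there is one, else "now".
resume_from() {
  if [ -s "$STATE_FILE" ]; then
    cat "$STATE_FILE"
  else
    echo "now"
  fi
}

# Follow the journal from the resume point and send each entry to the gateway.
# After every successful send, record the current time in STATE_FILE.
# (Simplification: a real forwarder would store the entry's own timestamp
# and would not open a new TCP connection per line.)
forward_logs() {
  journalctl -o json -f --since "$(resume_from)" |
    while IFS= read -r entry; do
      printf '%s\n' "$entry" | ncat localhost "$GATEWAY_PORT" &&
        date '+%Y-%m-%d %H:%M:%S' > "$STATE_FILE"
    done
}
```

Calling forward_logs starts the stream. After a restart it resumes from the saved time, so entries written while the forwarder was down are picked up rather than skipped.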

    2. Fleet Units for SPM

    SPM agent services are installed via fleet (a distributed init system) across the whole cluster. Let’s look at the unit files before we fire them up into the cloud.

    The first unit file, spm-agent.service, starts SPM Agent for Docker. It reads the SPM and Logsene app tokens and the port for the logging gateway from etcd. It starts on every CoreOS host (global unit).

    Fleet Unit File – SPM Agent incl. Log Gateway: spm-agent.service

    The second unit file, logsene.service, forwards logs from journald to the logging gateway running as part of spm-agent-docker. All fields stored in the journal (down to source-code level and line numbers provided by Go modules) are then available in Logsene.

    Fleet Unit File – Log forwarder: logsene.service
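
For readers unfamiliar with fleet, here is a hypothetical sketch of what a global log-forwarding unit could look like. It is not the content of logsene.service: the unit description, the ExecStart line and the helper name write_example_unit are assumptions for illustration. The etcd key is the one used in the setup steps of this post, and Global=true in the [X-Fleet] section is the fleet option that schedules a unit on every host.

```shell
#!/bin/sh
# Write a hypothetical fleet unit file to the path given as $1.
# The unit body is an illustrative sketch, not the real logsene.service.
write_example_unit() {
  cat > "$1" <<'EOF'
[Unit]
Description=Example journald log forwarder (hypothetical sketch)
After=docker.service

[Service]
Restart=always
RestartSec=10s
# Look up the gateway port in etcd at start-up, then stream the journal to it
ExecStart=/bin/sh -c 'journalctl -o json -f | ncat localhost $(etcdctl get /sematext.com/myapp/logsene/gateway_port)'

[X-Fleet]
Global=true
EOF
}
```

A file written this way, for example with write_example_unit example-forwarder.service, could then be loaded and started with fleetctl in the same way as the real unit files.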

    3. Set Up Monitoring and Logging Services

    Preparation:

    1. Get a free account at apps.sematext.com
    2. Create an SPM App of type “Docker” and copy the SPM Application Token
    3. Store the configuration in etcd
    # PREPARATION
    # set your application tokens for SPM and Logsene
    export SPM_TOKEN=YOUR-SPM-TOKEN
    export LOGSENE_TOKEN=YOUR-LOGSENE-TOKEN
    # set the port for the Logsene Gateway
    export LG_PORT=9000
    # Store the tokens in etcd
    # please note the same key is used in the unit file!
    etcdctl set /sematext.com/myapp/spm/token $SPM_TOKEN
    etcdctl set /sematext.com/myapp/logsene/token $LOGSENE_TOKEN
    etcdctl set /sematext.com/myapp/logsene/gateway_port $LG_PORT
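
Because the unit files read these keys, it can help to confirm they are readable before starting anything. The helper below is a small sketch of such a check; the function name check_sematext_keys and the ETCDCTL override variable are assumptions for this example, while etcdctl get and the key paths are the ones used above.

```shell
#!/bin/sh
# Hypothetical helper: confirm that the three keys stored above are readable.
# ETCDCTL can be overridden, e.g. to point at a different etcdctl binary.
ETCDCTL="${ETCDCTL:-etcdctl}"

# Prints one "ok" or "missing" line per key (never the token values).
# Returns 0 if all keys have a non-empty value, 1 otherwise.
check_sematext_keys() {
  missing=0
  for key in \
    /sematext.com/myapp/spm/token \
    /sematext.com/myapp/logsene/token \
    /sematext.com/myapp/logsene/gateway_port
  do
    if value=$("$ETCDCTL" get "$key" 2>/dev/null) && [ -n "$value" ]; then
      echo "ok      $key"
    else
      echo "missing $key"
      missing=1
    fi
  done
  return "$missing"
}
```

Running check_sematext_keys on any cluster node lists each key with its status and returns a non-zero exit code if one of them is absent.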
    
    

    Download the fleet unit files and start the services via fleetctl:

    # INSTALLATION
    # Download the unit file for SPM
    wget https://raw.githubusercontent.com/sematext/spm-agent-docker/master/coreos/spm-agent.service
    # Start SPM Agent in the whole cluster
    fleetctl load spm-agent.service; fleetctl start spm-agent.service
    # Download the unit file for Logsene
    wget https://raw.githubusercontent.com/sematext/spm-agent-docker/master/coreos/logsene.service
    # Start the log forwarding service
    fleetctl load logsene.service; fleetctl start logsene.service
    

    Check the installation

    systemctl status spm-agent.service
    systemctl status logsene.service
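
systemctl reports on the local host only. For a cluster-wide view, the output of fleetctl list-units can be filtered for units that are not active. The sketch below assumes the usual column layout of that command (UNIT, MACHINE, ACTIVE, SUB); the function name inactive_units is made up for this example.

```shell
#!/bin/sh
# Hypothetical filter: reads `fleetctl list-units` output on stdin and prints
# unit, machine and state for every unit whose ACTIVE column is not "active".
# Assumes whitespace-separated columns: UNIT MACHINE ACTIVE SUB.
inactive_units() {
  awk '$1 != "UNIT" && $3 != "active" { print $1, $2, $3 }'
}

# Usage (on a cluster node):
#   fleetctl list-units | inactive_units
```

An empty result means every listed unit, including spm-agent.service and logsene.service, is reported as active on all machines.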
    

    Send a few log lines to see them in Logsene.

    journalctl -o json -n 10 | ncat localhost 9000

    After about a minute you should see Metrics in SPM and Logs in Logsene.
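
The gateway also accepts line-delimited JSON and plain text, so application or script output can be sent the same way. The following is a small sketch under stated assumptions: the JSON field names "message" and "severity" and the helper names json_line and send_to_gateway are illustrative, not a documented Logsene schema, and the port defaults to the 9000 used above.

```shell
#!/bin/sh
GATEWAY_PORT="${GATEWAY_PORT:-9000}"

# Build one line of JSON. "message" and "severity" are illustrative field
# names; the values must not contain double quotes or backslashes, because
# this sketch does no JSON escaping.
json_line() {
  printf '{"message":"%s","severity":"%s"}\n' "$1" "$2"
}

# Send whatever arrives on stdin to the logging gateway over TCP.
send_to_gateway() {
  ncat localhost "$GATEWAY_PORT"
}

# Usage:
#   json_line "deployment finished" "info" | send_to_gateway
#   echo "plain text message" | send_to_gateway
#   journalctl -o short-iso -n 5 | send_to_gateway
```

Keeping the sender as a plain pipe means anything that writes lines to stdout can feed the gateway without extra tooling.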

    Cluster Health in ‘Birds Eye View’

    Host and Container Metrics Overview for the whole cluster

    Logs and Metrics

    Open-Source Resources

    Some of the things described here are open-sourced:

    Summary – What this gets you

    Here’s what this setup provides for you:

    • Operating System metrics of each CoreOS cluster node
    • Container and Host Metrics on each node
    • All Logs from Docker containers and Hosts (via journald)
    • Docker Events from all nodes
    • CoreOS logs from all nodes

    Having this setup allows you to take full advantage of SPM and Logsene by defining intelligent alerts for metrics and logs (delivered via channels like e-mail, PagerDuty, Slack, HipChat or any WebHook), as well as making correlations between performance metrics, events, logs, and alerts.

    Running CoreOS? Need any help getting CoreOS metrics and/or logs into SPM & Logsene?  Let us know!  Oh, and if you’re a small startup — ping @sematext — you can get a good discount on both SPM and Logsene!

    Published at DZone with permission of Radu Gheorghe, DZone MVB. See the original article here.
