A log management tool like Loggly has become an essential part of the operational infrastructure for monitoring cloud-based applications. In an application stack made up of clusters of multiple machines, logging into an individual machine is inconvenient at best and infeasible when a cluster has a large number of machines, as is common with Hadoop or Elasticsearch. The automated provisioning and elasticity of cloud infrastructure make it impractical to count on logging in to an individual machine to look at its log files.
Log management solutions usually provide a web UI for users to search logs as well as a query language to filter information to suit specific needs. Though such methods are powerful, many sysadmins and operations engineers miss the option of logging into a problem machine and tailing the application logs to look for clues during an outage. Loggly’s Live Tail addresses that need: Using Live Tail, you can tail the stream of log aggregates and filter it to do whatever you could do in a UNIX shell.
The usage of this powerful utility is very simple. Here’s an example of how you would use Live Tail to print the log lines that contain the word “404”:
$ ./livetail -m "404"
The output of Live Tail can be piped into any other UNIX commands, like sed or awk, for further processing. In this post, I will show you how to leverage that feature to integrate Loggly with another popular cloud-monitoring platform, Datadog.
Setup and Usage
If you have been using Loggly already, you can set up Live Tail as a client anywhere on a Linux, Mac OS X, or Windows environment. It is a Java application and the steps to install the tool are available here.
The general usage of the tool is the following:
$ livetail -m <matcher pattern> -i <ignore pattern>
To mimic the behavior of “tail -f”, you can match every line, as shown below. In that case, you may have to filter the output using other UNIX commands like grep or awk:
$ ./livetail -m ".*"
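Where grep or awk becomes unwieldy, the same filtering can be done in a short script that reads Live Tail's output from standard input. The following is a minimal sketch, not part of Live Tail itself: the script name filter_tail.py and the helper matching_lines are hypothetical, and it assumes only that Live Tail writes one log event per line, treating each line as opaque text.

```python
import re
import sys


def matching_lines(lines, pattern):
    """Yield each line (without its trailing newline) that matches the regex."""
    regex = re.compile(pattern)
    for line in lines:
        if regex.search(line):
            yield line.rstrip("\n")


# Hypothetical usage: ./livetail -m ".*" | python filter_tail.py '\b404\b'
# The pattern argument is required; without it the script does nothing.
if __name__ == "__main__" and len(sys.argv) == 2:
    for hit in matching_lines(sys.stdin, sys.argv[1]):
        print(hit, flush=True)  # flush so hits appear as events stream in
```

The word-boundary pattern keeps a line containing “4040” from being reported as a 404, which a plain substring match would not do.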
Integrating With Datadog
While Loggly is dedicated to aggregation, indexing, and searching of logs, Datadog is a platform suitable for consolidating various aspects of monitoring for alerting and operational analysis. It covers traditional aspects of network and system monitoring and also provides plugins to cover monitoring of popular components used to build cloud infrastructure and applications. It also tracks various metrics, both system and custom, with which you can create charts and operational dashboards.
Datadog also has an Events Dashboard where important operational events such as alerts from the monitors are published. Using APIs, custom messages can be posted on this dashboard, and that could be very useful for the first responders to production issues, typically a NOC or on-call team in the operations group.
I built a sample integration of Live Tail with Datadog to meet these objectives:
- Using Live Tail, locate Apache error logs corresponding to HTTP status 404 (Resource Not Found).
- Post each 404 log entry as an event to Datadog’s Events Dashboard.
- Aggregate the number of 404 accesses by minute and publish it as a custom time series metric on Datadog and use that data to create graphs.
The tailing of Loggly logs for 404s is simple, as in the usage below:
$ ./livetail -m "404"
To integrate with Datadog, the output of the livetail command is piped into a simple Python script. It posts every 404 incident as an event to the Events Dashboard. The script also counts such incidents and reports them as a per-minute aggregate to the Datadog backend. This custom metric is then used to create a couple of charts on an operational dashboard.
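The flow just described can be sketched as follows. This is not the actual postMetrics2Datadog.py, only a minimal illustration under stated assumptions: the names process_stream, post_event, and post_metric are hypothetical, the two callbacks stand in for whatever Datadog client calls the real script makes, and every input line is assumed to be a 404 log entry because Live Tail has already applied the -m "404" matcher.

```python
import time

METRIC_NAME = "loggly.livetail.dd_integ.count_404"  # metric name used in this post


def process_stream(lines, post_event, post_metric, clock=time.time):
    """Post one event per line and one count per minute bucket.

    post_event(text) and post_metric(name, timestamp, value) are hypothetical
    callbacks; a real script would make Datadog API calls inside them.
    """
    current_minute = None  # start (epoch seconds) of the minute being counted
    count = 0
    for line in lines:
        minute = int(clock()) // 60 * 60  # truncate arrival time to the minute
        if current_minute is not None and minute != current_minute:
            # The minute rolled over: publish the finished bucket.
            post_metric(METRIC_NAME, current_minute, count)
            count = 0
        current_minute = minute
        count += 1
        post_event(line.rstrip("\n"))
    if count:
        # Input ended: publish the last, partially filled bucket.
        post_metric(METRIC_NAME, current_minute, count)
```

To try it without Datadog, call process_stream(sys.stdin, print, print) at the end of a pipeline. Note one simplification in this sketch: a bucket is published only when the next 404 arrives or the input ends, so minutes with no 404s are never reported as zero.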
The latest version of the Python script, postMetrics2Datadog.py, that integrates Live Tail with Datadog is available at GitHub:
Details on the Datadog API are available here: http://docs.datadoghq.com/api/. In this example, the custom metric published to Datadog is labelled “loggly.livetail.dd_integ.count_404”.
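For readers who want to see what such API calls might look like, here is a minimal sketch that builds (but does not send) requests for Datadog's v1 HTTP endpoints for events and time series. It is an illustration, not the code from postMetrics2Datadog.py, which may well use the Datadog Python client instead. The helper names, the event title, and the source:loggly tag are hypothetical, and the base URL assumes an account on Datadog's default US site.

```python
import json
import urllib.request

API_BASE = "https://api.datadoghq.com/api/v1"  # assumption: default US site
METRIC_NAME = "loggly.livetail.dd_integ.count_404"  # metric name used in this post


def build_request(path, payload, api_key):
    """Return a JSON POST request for a Datadog v1 endpoint (not yet sent)."""
    return urllib.request.Request(
        API_BASE + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "DD-API-KEY": api_key},
        method="POST",
    )


def event_request(log_line, api_key):
    """Build the request that posts one 404 log line as an event."""
    payload = {
        "title": "HTTP 404 seen by Live Tail",  # hypothetical title
        "text": log_line,
        "tags": ["source:loggly"],  # hypothetical tag
    }
    return build_request("/events", payload, api_key)


def metric_request(timestamp, count, api_key):
    """Build the request that publishes one per-minute 404 count."""
    series = {
        "metric": METRIC_NAME,
        "points": [[int(timestamp), count]],  # [epoch seconds, value]
        "type": "count",
        "interval": 60,  # the count covers a 60-second window
    }
    return build_request("/series", {"series": [series]}, api_key)
```

Passing one of these requests to urllib.request.urlopen sends it, using a real API key in place of the placeholder. Keeping request construction separate from sending makes the payloads easy to inspect before anything reaches the Datadog backend.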
The script runs in the background and posts events and the custom metric to Datadog whenever it sees a log entry containing “404”. To generate a 404 error, request a non-existent file from one of the web servers, either from a browser or using curl:
$ curl http://18.104.22.168/file-not-installed
A sample event posted to the Events Dashboard:
The charts created using the custom time series metric published by the Live Tail – Datadog integration:
While the interactive uses of Live Tail’s features on the command line are powerful for triaging production issues, Live Tail also facilitates the generation of operational metrics that could be used to fine-tune the underlying system components Loggly helps to monitor.