
Kubernetes Logging and Monitoring: The EFK Stack — Part 1: Fluentd Architecture and Configuration

Learn about best practices, architecture, and configuration of fluentd in the EFK logging and monitoring stack for Kubernetes.


In the previous article, we discussed the proven components and architecture of the EFK logging and monitoring stack for Kubernetes, consisting of Fluentd, Elasticsearch, and Kibana.

In this article, we'll dive deeper into best practices and configuration of fluentd.

What Is fluentd?

Fluentd is an efficient log aggregator. It is written in Ruby and scales very well. For most small-to-medium-sized deployments, fluentd is fast and consumes relatively few resources. Fluent Bit, a newer project from the creators of fluentd, claims to scale even better and has an even smaller resource footprint. For the purposes of this discussion, we'll focus on fluentd, as it is more mature and more widely used.

How Does fluentd Work?

Fluentd scrapes logs from a given set of sources, processes them (converting them into a structured data format), and then forwards them to services like Elasticsearch, object storage, etc. Fluentd is especially flexible when it comes to integrations: it works with 300+ log storage and analytics services.

  1. Fluentd gets data from multiple sources.
  2. It structures and tags data.
  3. It then sends the data to multiple destinations, based on matching tags.
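The three steps above map directly onto fluentd's configuration directives. Here is a minimal sketch; the tag names, file paths, and added field are illustrative, not taken from the article's deployment:

```
<source>           # 1. get data from a source
  @type tail
  path /var/log/app/*.log
  pos_file /var/log/app.log.pos
  format json
  tag app.logs
</source>

<filter app.logs>  # 2. structure/enrich records carrying a matching tag
  @type record_transformer
  <record>
    environment "staging"   # static field added to every record
  </record>
</filter>

<match app.logs>   # 3. route matching tags to a destination
  @type stdout
</match>
```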

[Figure: fluentd architecture]

Source Configuration in fluentd

For the purpose of this discussion, to capture all container logs on a Kubernetes node, the following source configuration is required:

<source>
  @id fluentd-containers.log
  @type tail
  path /var/log/containers/*.log
  pos_file /var/log/fluentd-containers.log.pos
  time_format %Y-%m-%dT%H:%M:%S.%NZ
  tag raw.kubernetes.*
  format json
  read_from_head true
</source>
  1. id: A unique identifier to reference this source. It can be used for further filtering and routing of structured log data.
  2. type: A built-in directive understood by fluentd. In this case, "tail" instructs fluentd to gather data by tailing logs from a given location. Another example is "http", which instructs fluentd to collect data via GET requests on an HTTP endpoint.
  3. path: Specific to the "tail" type. Instructs fluentd to collect all logs under the /var/log/containers directory. This is the location the Docker daemon uses on a Kubernetes node to store stdout from running containers.
  4. pos_file: Used as a checkpoint. If the fluentd process restarts, it uses the position recorded in this file to resume log collection.
  5. tag: A custom string for matching sources to destinations/filters. fluentd matches source/destination tags to route log data.
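To make the "format json" step concrete, here is a small Python sketch that parses a log line the way Docker's json-file logging driver writes it under /var/log/containers/; the sample line and pod name are hypothetical:

```python
import json

# Hypothetical line as Docker's json-file logging driver writes it
# under /var/log/containers/ on a Kubernetes node.
raw_line = ('{"log":"GET /healthz 200\\n",'
            '"stream":"stdout",'
            '"time":"2019-01-01T12:00:00.000000000Z"}')

# `format json` parses each line into a structured record.
record = json.loads(raw_line)

# `tag raw.kubernetes.*` expands the * from the source file path,
# yielding something like (hypothetical pod name):
tag = "raw.kubernetes.var.log.containers.myapp-abc12.log"

print(record["stream"])        # the container's output stream
print(record["log"].strip())   # the raw log message
```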

Routing Configuration in fluentd

Let's look at the config instructing fluentd to send logs to Elasticsearch:

<match **>
  @id elasticsearch
  @type elasticsearch
  @log_level info
  include_tag_key true
  type_name fluentd
  host "#{ENV['OUTPUT_HOST']}"
  port "#{ENV['OUTPUT_PORT']}"
  logstash_format true
  <buffer>
    @type file
    path /var/log/fluentd-buffers/kubernetes.system.buffer
    flush_mode interval
    retry_type exponential_backoff
    flush_thread_count 2
    flush_interval 5s
    retry_max_interval 30
    chunk_limit_size "#{ENV['OUTPUT_BUFFER_CHUNK_LIMIT']}"
    queue_limit_length "#{ENV['OUTPUT_BUFFER_QUEUE_LIMIT']}"
    overflow_action block
  </buffer>
</match>
  1. The "match" directive indicates a destination. It is followed by a tag match pattern. In this case, we want to capture all logs and send them to Elasticsearch, so we simply use **.
  2. id: A unique identifier for the destination.
  3. type: Identifier of a supported output plugin. In this case, we are using the Elasticsearch output plugin (fluent-plugin-elasticsearch).
  4. log_level: Sets the logging verbosity for this plugin itself; messages at level "info" and above (info, warn, error, fatal) will appear in fluentd's own logs.
  5. host/port: The Elasticsearch host/port. Credentials can be configured as well, but are not shown here.
  6. logstash_format: Elasticsearch builds inverted indices on the log data forwarded by fluentd for searching, so it needs to interpret the data. By setting logstash_format to "true", fluentd forwards the structured log data in Logstash format, which Elasticsearch understands.
  7. Buffer: fluentd allows a buffer configuration for the event that the destination becomes unavailable, e.g. if the network goes down or Elasticsearch is down. Buffer configuration also helps reduce disk activity by batching writes.
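To see how retry_type exponential_backoff with retry_max_interval 30 paces retries between failed flushes, here is a rough Python sketch. fluentd's actual implementation adds randomized jitter and has its own defaults, so treat this as an approximation only:

```python
def retry_schedule(retries, base_wait=1.0, max_interval=30.0):
    """Approximate exponential backoff: the wait doubles after each
    failed flush attempt, capped at max_interval (seconds)."""
    return [min(base_wait * (2 ** i), max_interval) for i in range(retries)]

# First six retry waits with a 1s base and a 30s cap:
print(retry_schedule(6))  # [1.0, 2.0, 4.0, 8.0, 16.0, 30.0]
```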

Fluentd as Kubernetes Log Aggregator

To collect logs from a Kubernetes cluster, fluentd is deployed as a privileged DaemonSet. That way, it can read logs from the log locations on each Kubernetes node. Kubernetes ensures that exactly one fluentd container is always running on each node in the cluster. For the impatient, you can simply deploy it as a Helm chart:

$ helm install stable/fluentd-elasticsearch
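Under the hood, a chart like this creates a DaemonSet that mounts the node's log directory into the fluentd container. A stripped-down sketch of the key parts (the names, labels, and image tag here are illustrative, not taken from the actual chart):

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: fluentd
  template:
    metadata:
      labels:
        app: fluentd
    spec:
      containers:
      - name: fluentd
        image: fluent/fluentd-kubernetes-daemonset  # illustrative image
        volumeMounts:
        - name: varlog
          mountPath: /var/log   # where the tail source reads container logs
          readOnly: true
      volumes:
      - name: varlog
        hostPath:
          path: /var/log        # the node's log directory
```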

To summarize, fluentd is a highly scalable log aggregation solution. It provides a compelling option for log management in a Kubernetes cluster. In the next post, we will look at fluentd deployment along with Elasticsearch and Kibana for an end-to-end log management solution.


