Fluentd, Kubernetes, and Google Cloud Platform: A Few Recipes for Streaming Logging

This tutorial shows you how to seamlessly get Fluentd, Kubernetes, and Google Cloud Platform talking to each other to serve your logging needs.

By John Hammink · Jun. 16, 16 · Tutorial

Maybe you already know about Fluentd's unified logging layer. Maybe you're already familiar with the idea that logs are streams, not files, and that it's therefore necessary to think of the logging layer dynamically, as a stream.

In fact, it's that last point that's crucial to understanding how Fluentd is configured. It's all about how we handle the different elements of the stream: where we get the data from, what we do with it once we have it, where we send the processed data, and how we handle it as we send it on its way. In this blog, we'll review these concepts and apply them in the following "recipes":

  1. Logging stdout commands from a Docker container (but keeping the config when the container dies!).
  2. Handling JSON logs.
  3. Sorting messages by levels.
  4. Splitting the data stream to two destinations.

[Figure: container logging]

As it turns out, Google Cloud Platform and Kubernetes now include Fluentd as the default logging layer, so you can do precisely these kinds of things. But first, let's look at the directives in a fluentd.conf file:

  1. source directives determine the input sources.
  2. match directives determine the output destinations.
  3. filter directives determine the event processing pipelines.
  4. system directives set system-wide configuration.
  5. label directives group the output and filter for internal routing.
  6. @include directives include other files.
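
Before moving on to the recipes, here is a minimal, hypothetical fluentd.conf sketch (not taken from any of the examples below) showing how the less-discussed directive types (system, @include, filter, and label) fit together with source and match. The grep filter and the app.** tag are assumptions purely for illustration:

# system-wide settings
<system>
  log_level info
</system>

# pull in additional configuration files
@include conf.d/*.conf

# accept events over TCP (default port 24224) and route them to the label below
<source>
  type forward
  @label @PIPELINE
</source>

<label @PIPELINE>
  # event processing pipeline: keep only messages containing ERROR
  <filter app.**>
    type grep
    regexp1 message ERROR
  </filter>

  # print whatever survives the filter to the console
  <match app.**>
    type stdout
  </match>
</label>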

A Basic Recipe (For Logging Docker Stdout Commands)

For our purposes today, we'll be primarily concerned with source and match directives. Below is a sample configuration for logging stdout commands from a Docker container directly to Treasure Data (and, because our configuration lives on the Ubuntu host, it doesn't die with the Docker container!):

<source>
  type forward
</source>

<match td.*.*>
  type tdlog
  apikey "<YOUR_TREASURE_DATA_API_KEY>"
  auto_create_table
  buffer_type file
  buffer_path /var/log/td-agent/buffer/td
  flush_interval 5s

  <secondary>
    type file
    path /var/log/td-agent/failed_records
  </secondary>
</match>

## match tag=docker.** and dump to console
<match docker.**>
  type stdout
</match>

So, what's happening here?

Our source directive tells us that we are using the forward input plugin, which tells Fluentd to listen to a TCP socket to receive the event stream.

We have two match directives. The one at the end assumes we've set our logging option accordingly when we launched the container:

--log-opt fluentd-tag=td.docker.{{.Name}}

This directive tells us to use the stdout plugin to print events to standard out.
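
For context, here's a hedged example of how such a container might be launched with Docker's fluentd logging driver. The image, command, and container name are made up for illustration, and the address shown is simply the driver's default:

docker run \
  --log-driver=fluentd \
  --log-opt fluentd-address=localhost:24224 \
  --log-opt fluentd-tag=td.docker.{{.Name}} \
  --name my-app \
  ubuntu /bin/echo "Hello Fluentd!"

With this in place, anything the container writes to stdout arrives at the forward input above, tagged td.docker.my-app.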

However, it's the first directive that's the most interesting. Assuming the same logging options, we match everything that’s tagged td.*.* and, using the tdlog output plugin, send each console output as a single record to a Treasure Data Database named Docker, where the table is also the name of the Docker container:

  • auto_create_table creates tables on the first instance.

  • buffer_type file buffers to a file.

  • buffer_path specifies where the buffer files live (here, on the Ubuntu host running Fluentd).

  • flush_interval 5s specifies a five-second interval to flush the buffer and write to the Treasure Data table.

Are you starting to see how this works? For more specifics on Fluentd configuration and parameters, see the related article.

Logging Results to Google Cloud Platform

[Figure: Fluentd and Google Cloud Platform]

Ready to see how Fluentd works with Google Cloud Platform? Let's look at a few different scenarios. Thanks to the Kubernetes team for making these configurations (and ones like these) available on GitHub.

Handling JSON Logs

# example: {"log":"[info:2016-02-16T16:04:05.930-08:00] Some log text here\n","stream":"stdout","time":"2016-02-17T00:04:05.931087621Z"}

<source>
  type tail
  format json
  time_key time
  path /var/log/containers/*.log
  pos_file /var/log/gcp-containers.log.pos
  time_format %Y-%m-%dT%H:%M:%S.%NZ
  tag reform.*
  read_from_head true
</source>

<match reform.**>
  type record_reformer
  enable_ruby true
  tag kubernetes.${tag_suffix[4].split('-')[0..-2].join('-')}
</match>

Here, we're tailing our JSON-formatted container logs and retagging the results for Kubernetes. We have to work with the timestamp, so we've included the time_key and time_format directives. Lastly, we're tagging the data stream with kubernetes and the appropriate unique suffix derived from the log file name. We've also specified a position file and set read_from_head to true. This enables us to stop and restart processing if, for some reason, our stream is interrupted.
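
To make the tag rewrite concrete, here's a hedged walk-through using a made-up container log file name:

# Hypothetical log file picked up by the tail input:
#   /var/log/containers/nginx-proxy-4f9c8.log
# in_tail expands the * in "tag reform.*" to the file path with "/" replaced by ".":
#   reform.var.log.containers.nginx-proxy-4f9c8.log
# tag_suffix[4] keeps everything after the first four dot-separated parts:
#   nginx-proxy-4f9c8.log
# split('-')[0..-2].join('-') drops the final dash-delimited piece (hash plus extension):
#   nginx-proxy
# record_reformer then re-emits the event with the tag:
#   kubernetes.nginx-proxy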

Sorting Out Messages By Different Levels

# Examples:
# time="2016-02-04T06:51:03.053580605Z" level=info msg="GET /containers/json"
# time="2016-02-04T07:53:57.505612354Z" level=error msg="HTTP Error" err="No such image: -f" statusCode=404

<source>
  type tail
  format /^time="(?<time>[^)]*)" level=(?<severity>[^ ]*) msg="(?<message>[^"]*)"( err="(?<error>[^"]*)")?( statusCode=(?<status_code>\d+))?/
  time_format %Y-%m-%dT%H:%M:%S.%NZ
  path /var/log/docker.log
  pos_file /var/log/gcp-docker.log.pos
  tag docker
</source>

We can tail different messages, using a regular expression to capture the parts of each message: the time, level (severity), message, error (if any), and status code. Note that we must parse the time format on input. Again, we're using a position file to keep our place in the stream. We should also include a match section to route the data to a specific destination, as in the sketch below.
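
For completeness, here's a minimal, hypothetical match stanza that could follow this source block; it simply sends the docker-tagged events to Google Cloud Logging with the same google_cloud output plugin the next recipe uses:

<match docker>
  type google_cloud
  flush_interval 5s
</match>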

Splitting the Data Stream to Two Destinations

# Example:
# I0603 15:31:05.793605       6 cluster_manager.go:230] Reading config from path /etc/gce.conf
<source>
  type tail
  format multiline
  multiline_flush_interval 5s
  format_firstline /^\w\d{4}/
  format1 /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)/
  time_format %m%d %H:%M:%S.%N
  path /var/log/cluster-autoscaler.log
  pos_file /var/log/gcp-cluster-autoscaler.log.pos
  tag cluster-autoscaler
</source>

# We use 2 output stanzas - one to handle the container logs and one to handle
# the node daemon logs, the latter of which explicitly sends its logs to the
# compute.googleapis.com service rather than container.googleapis.com to keep
# them separate since most users don't care about the node logs.
<match kubernetes.**>
  type google_cloud
  # Set the chunk limit conservatively to avoid exceeding the GCL limit
  # of 10MiB per write request.
  buffer_chunk_limit 2M
  # Cap the combined memory usage of this buffer and the one below to
  # 2MiB/chunk * (24 + 8) chunks = 64 MiB
  buffer_queue_limit 24
  # Never wait more than 5 seconds before flushing logs in the non-error case.
  flush_interval 5s
  # Never wait longer than 30 seconds between retries.
  max_retry_wait 30
  # Disable the limit on the number of retries (retry forever).
  disable_retry_limit
</match>

# Keep a smaller buffer here since these logs are less important than the user's
# container logs.
<match **>
  type google_cloud
  detect_subservice false
  buffer_chunk_limit 2M
  buffer_queue_limit 8
  flush_interval 5s
  max_retry_wait 30
  disable_retry_limit
</match>


Here, we're parsing the multiline autoscaler log, managing our buffer, queue, and chunk sizes, and, in both output stanzas (one for container logs, one for node daemon logs), retrying forever.
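
Looking back at the source block, here's what the format1 expression extracts from the example line at the top of this recipe:

# Input line:
#   I0603 15:31:05.793605       6 cluster_manager.go:230] Reading config from path /etc/gce.conf
#
# Captured fields:
#   severity -> I
#   time     -> 0603 15:31:05.793605   (parsed with time_format %m%d %H:%M:%S.%N)
#   pid      -> 6
#   source   -> cluster_manager.go:230
#   message  -> Reading config from path /etc/gce.conf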

What’s Next?

Would you like to build the easiest possible logging infrastructure you can? Get Fluentd!

  • fluentd.org

There are more than two hundred input, output, and other plugins available. You can see them sorted in descending order by popularity here:

  • fluentd.org/plugins/all

If you are interested in seeing the plugins by category, go here:

  • fluentd.org/plugins

Related Refcard: Getting Started With Kubernetes


Published at DZone with permission of John Hammink, DZone MVB. See the original article here.
