Mastering Fluent Bit: Top Tips Using Telemetry Pipeline Parsers for Developers (Part 8)

This intro to mastering Fluent Bit covers handy tips and tricks for speeding up the inner development loop using parsers in telemetry pipelines.

By Eric D. Schabell · DZone Core · Oct. 27, 25 · Tutorial


This series is a general-purpose getting-started guide for those of us wanting to learn about the Cloud Native Computing Foundation (CNCF) project Fluent Bit.

Each article in this series addresses a single topic by providing insights into what the topic is, why we are interested in exploring that topic, where to get started with the topic, and how to get hands-on with learning about the topic as it relates to the Fluent Bit project.

The idea is that each article can stand on its own, but that they also lead down a path that slowly increases our abilities to implement solutions with Fluent Bit telemetry pipelines.

Let's take a look at the topic of this article: Fluent Bit parser tips for developers. In case you missed the previous article, check out the top 3 telemetry pipeline output plugins for developers, where you get tips on the best of Fluent Bit for your developer experience.

This article will be a hands-on tour of the things that help you as a developer testing out your Fluent Bit pipelines. We'll take a look at the top tip when using a parser for your telemetry pipeline configuration in Fluent Bit.

All examples in this article were run on OSX, and the reader is assumed to be able to convert the actions shown here to their own local machine.

Where to Get Started

You should have explored the previous articles in this series to install and get started with Fluent Bit on your local developer machine, using either the source code or container images. Links at the end of this article will point you to a free hands-on workshop that lets you explore more of Fluent Bit in detail.

You can verify that you have a functioning installation by testing your Fluent Bit, either using a source installation or a container installation, as shown below:

Shell
 
# For source installation.
$ fluent-bit -i dummy -o stdout

# For container installation.
$ podman run -ti ghcr.io/fluent/fluent-bit:4.0.8 -i dummy -o stdout

...
[0] dummy.0: [[1753105021.031338000, {}], {"message"=>"dummy"}]
[0] dummy.0: [[1753105022.033205000, {}], {"message"=>"dummy"}]
[0] dummy.0: [[1753105023.032600000, {}], {"message"=>"dummy"}]
[0] dummy.0: [[1753105024.033517000, {}], {"message"=>"dummy"}]
...


Let's look at a few tips and tricks to help you with your local development testing of Fluent Bit parsers.

Parsing in a Telemetry Pipeline

For details about the service section of the configurations used in the rest of this article, see this article. For now, we focus on our Fluent Bit pipeline, and specifically on the parsers that can be of great help in managing our telemetry data during testing in our inner developer loop.

The figure below shows the phases of a telemetry pipeline. The second phase is the parser, where unstructured input data is turned into structured data. Fluent Bit does this using parsers that we configure to manipulate the unstructured data, producing structured data for the next phases of our pipeline.

[Figure: Parsing in a telemetry pipeline]


The online workshop includes an example of unstructured log data:

Plain Text
 
192.168.2.20 - - [28/Jul/2006:10:27:10 -0300] "GET /cgi-bin/try/ HTTP/1.0" 200 3395


When unstructured log data is parsed by Fluent Bit, the results become structured data such as the following:

JSON
 
{
  "host":    "192.168.2.20",
  "user":    "-",
  "method":  "GET",
  "path":    "/cgi-bin/try/",
  "code":    "200",
  "size":    "3395"
}
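To make the idea concrete, here is a minimal Python sketch of what a regex parser does with the log line above: each named capture group becomes a key in the structured record. The pattern and the names APACHE_LIKE and parse_line are hypothetical, written for illustration; this is not the regex in Fluent Bit's built-in apache parser. Python also spells named groups (?P<name>...), whereas Fluent Bit's regex syntax uses (?<name>...).

```python
import re

# Hypothetical, simplified Apache-style pattern for illustration only; it is
# not the regex shipped in Fluent Bit's built-in "apache" parser. Python writes
# named groups as (?P<name>...), while Fluent Bit uses (?<name>...).
APACHE_LIKE = re.compile(
    r'^(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<code>\d{3}) (?P<size>\S+)$'
)


def parse_line(line):
    """Turn one unstructured log line into a dict, or return None on no match."""
    match = APACHE_LIKE.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # The JSON above has no "time" key: when a parser declares a time key,
    # Fluent Bit uses that field as the event timestamp and, by default, drops
    # it from the record. We mimic that by removing it here.
    record.pop("time")
    return record


raw = '192.168.2.20 - - [28/Jul/2006:10:27:10 -0300] "GET /cgi-bin/try/ HTTP/1.0" 200 3395'
print(parse_line(raw))
# {'host': '192.168.2.20', 'user': '-', 'method': 'GET', 'path': '/cgi-bin/try/', 'code': '200', 'size': '3395'}
```

Every value stays a string, just as in the JSON example; converting types is a separate concern.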


The Fluent Bit parser engine is configurable and can process log entries in several formats, including LTSV and Logfmt, but the two most commonly used are:

  • JSON maps
  • Regular expressions
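The two formats listed above differ in where the structure comes from: a JSON map already carries its keys, while a regex parser derives them from named capture groups. The Python sketch below contrasts the two; parse_with is a hypothetical helper that only roughly mirrors the idea, not a Fluent Bit API.

```python
import json
import re


def parse_with(fmt, text, pattern=None):
    """Hypothetical helper: structure a raw string in the style of a json or regex parser."""
    if fmt == "json":
        # JSON maps: the keys are already present in the input.
        return json.loads(text)
    if fmt == "regex":
        # Regular expressions: the keys come from the named capture groups.
        match = re.match(pattern, text)
        return match.groupdict() if match else None
    raise ValueError(f"unsupported format in this sketch: {fmt}")


print(parse_with("json", '{"code": "200", "type": "success"}'))
print(parse_with("regex", "200 success", r"^(?P<code>\S+) (?P<type>\S+)$"))
# Both calls print {'code': '200', 'type': 'success'}
```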

By default, Fluent Bit provides a set of pre-configured parsers that can be used for different use cases, such as logs from these formats:

  • Apache
  • NGINX
  • Docker
  • Syslog rfc5424
  • Syslog rfc3164

Parsers are typically defined in separate configuration files that are referenced from the main Fluent Bit configuration file and loaded at start time. We can also load parsers from the command line, but we won't cover that here. With all of this in mind, let's look at the parser that developers will most want to know about.

1. Regular Expression Parser 

One of the more common use cases for telemetry pipelines that developers will encounter is having multiple event streams producing data whose keys are not unique unless the data is cleaned up during parsing. To illustrate, let's see how Fluent Bit lets us both parse and filter events from multiple input sources to clean up any duplicate keys before sending them onward to a destination.

To provide an example, we start with a simple Fluent Bit configuration file, fluent-bit.yaml, that uses the dummy input plugin to generate two types of events, both using the same key, which causes confusion if we try querying them without cleaning them up first:

YAML
 
service:
  flush: 1
  log_level: info
  http_server: on
  http_listen: 0.0.0.0
  http_port: 2020
  hot_reload: on

pipeline:
  inputs:
    # This entry generates a successful message.
    - name:  dummy
      tag:   event.success
      dummy: '{"message":"true 200 success"}'

    # This entry generates a failure message.
    - name:  dummy
      tag:   event.error
      dummy: '{"message":"false 500 error"}'

  outputs:
    - name: stdout
      match: '*'
      format: json_lines
      json_date_format: java_sql_timestamp   


Our configuration tags each successful event with event.success and each failure event with event.error. The confusion comes from both dummy event definitions putting all of their data under the same key, message. This makes our incoming events confusing to deal with.
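To see why a shared message key is awkward, here is a small Python sketch using the two record bodies the dummy inputs emit. Without parsing, even a simple question such as "which events failed?" means string matching inside the value of message. The records list and the substring check are our own illustration, not anything Fluent Bit does.

```python
# Record bodies emitted by the two dummy inputs, paired with their tags.
records = [
    ("event.success", {"message": "true 200 success"}),
    ("event.error", {"message": "false 500 error"}),
]

# Every body has only the single key "message", so there is no "code" field
# to query; we must search inside the free-form string instead.
keys = sorted({key for _, body in records for key in body})
failed = [tag for tag, body in records if " 500 " in body["message"]]
print(keys)    # ['message']
print(failed)  # ['event.error']
```

Substring checks like this are brittle: a different message layout silently breaks them, which is exactly what parsing into separate keys avoids.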

Let's run this to confirm our working test environment:

Shell
 
# For source installation.
$ fluent-bit --config fluent-bit.yaml

# For container installation, after building a new image with your
# configuration using a Buildfile as follows:
#
# FROM ghcr.io/fluent/fluent-bit:4.1.0
# COPY ./fluent-bit.yaml /fluent-bit/etc/fluent-bit.yaml
# CMD [ "fluent-bit", "-c", "/fluent-bit/etc/fluent-bit.yaml" ]
#
$ podman build -t fb -f Buildfile

$ podman run --rm fb

...
{"date":"2025-10-26 19:59:34.508732","message":"true 200 success"}
{"date":"2025-10-26 19:59:34.508837","message":"false 500 error"}
{"date":"2025-10-26 19:59:35.509396","message":"true 200 success"}
{"date":"2025-10-26 19:59:35.509456","message":"false 500 error"}
{"date":"2025-10-26 19:59:36.508828","message":"true 200 success"}
...
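The date field in this output comes from the stdout plugin's json_lines format combined with json_date_format: java_sql_timestamp. As a rough illustration, this Python sketch renders an event timestamp (seconds since the epoch) in the same YYYY-MM-DD HH:MM:SS.ffffff shape and places it before the record's fields. The to_json_line helper is hypothetical, and we assume the timestamp is rendered in UTC.

```python
import json
from datetime import datetime, timezone


def to_json_line(timestamp, body, date_key="date"):
    """Hypothetical helper: mimic one json_lines record with a java_sql_timestamp-style date."""
    # Date, time, and six fractional digits; UTC is an assumption of this sketch.
    when = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    record = {date_key: when.strftime("%Y-%m-%d %H:%M:%S.%f")}
    record.update(body)
    # Compact separators match the spacing seen in the stdout output above.
    return json.dumps(record, separators=(",", ":"))


print(to_json_line(1.25, {"message": "true 200 success"}))
# {"date":"1970-01-01 00:00:01.250000","message":"true 200 success"}
```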


Now we have dirty ingested data coming into our pipeline, showing that we have multiple messages on the same key. To be able to clean this up for usage before passing on to the backend (output), we need to make use of both the Parser and Filter phases.

First, in the Parser phase, where unstructured data is converted into structured data, we'll use Fluent Bit's built-in regex parser format to structure the duplicate messages into something more usable. To set up the parser configuration, we create a new file called parsers.yaml in our favorite editor and add the following configuration. It defines a parser named message_cleaning_parser, selects the regex format, and supplies the regular expression that converts each message into a structured format (note that the parser is actually applied to incoming messages in the next phase of the telemetry pipeline):

YAML
 
# This parser uses the built-in regex format; the parser filter
# applies it to the message value of each incoming event.
#
parsers:
  - name: message_cleaning_parser
    format: regex
    regex: '^(?<valid_message>[^ ]+) (?<code>[^ ]+) (?<type>[^ ]+)$'
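Before wiring the parser into the pipeline, it can help to check the regex on its own. This Python sketch applies the same pattern to both dummy messages and to one hypothetical malformed message. The only change from the parser definition is that Python spells named groups (?P<name>...) where the Fluent Bit definition uses (?<name>...); MESSAGE_CLEANING is our own name.

```python
import re

# The regex from message_cleaning_parser, with Fluent Bit's (?<name>...) groups
# rewritten as Python's (?P<name>...) groups.
MESSAGE_CLEANING = re.compile(
    r"^(?P<valid_message>[^ ]+) (?P<code>[^ ]+) (?P<type>[^ ]+)$"
)

# The two dummy messages, plus a hypothetical malformed one that should not match.
for message in ("true 200 success", "false 500 error", "missing-fields"):
    match = MESSAGE_CLEANING.match(message)
    print(match.groupdict() if match else f"no match: {message!r}")
# {'valid_message': 'true', 'code': '200', 'type': 'success'}
# {'valid_message': 'false', 'code': '500', 'type': 'error'}
# no match: 'missing-fields'
```

Because [^ ]+ never matches a space and the pattern is anchored with ^ and $, only values with exactly three space-separated tokens are structured.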


In the Filter phase, the previously defined parser is put to the test. To set up the filter configuration, we add a new filters section to fluent-bit.yaml as shown below. It defines a filter using the parser filter plugin, matches all incoming events, selects the value of the message key to feed into the parser, and applies message_cleaning_parser to it:

YAML
 
service:
  flush: 1
  log_level: info
  http_server: on
  http_listen: 0.0.0.0
  http_port: 2020
  hot_reload: on
  parsers_file: parsers.yaml

pipeline:
  inputs:
    # This entry generates a successful message.
    - name:  dummy
      tag:   event.success
      dummy: '{"message":"true 200 success"}'

    # This entry generates a failure message.
    - name:  dummy
      tag:   event.error
      dummy: '{"message":"false 500 error"}'

  filters:
    - name: parser
      match: '*'
      key_name: message
      parser: message_cleaning_parser

  outputs:
    - name: stdout
      match: '*'
      format: json_lines
      json_date_format: java_sql_timestamp    
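Conceptually, the parser filter does three things for every record whose tag matches: it reads the value under key_name, runs the named parser on it, and, with the filter's reserve_data option off (the default), replaces the record body with the parsed fields. The Python sketch below approximates that behavior. The names parser_filter and PARSERS are hypothetical, fnmatch stands in for Fluent Bit's tag matching, and we assume a value that does not match passes through unchanged. The lookup by name also shows why the parser must be loaded through parsers_file before the filter can refer to it.

```python
import re
from fnmatch import fnmatchcase

# Hypothetical registry standing in for the parsers loaded via parsers_file.
PARSERS = {
    "message_cleaning_parser": re.compile(
        r"^(?P<valid_message>[^ ]+) (?P<code>[^ ]+) (?P<type>[^ ]+)$"
    ),
}


def parser_filter(tag, body, match="*", key_name="message",
                  parser="message_cleaning_parser", reserve_data=False):
    """Hypothetical approximation of the parser filter for a single record."""
    # fnmatchcase approximates Fluent Bit's wildcard tag matching.
    if not fnmatchcase(tag, match):
        return body
    # Look the parser up by name; an unknown name raises KeyError, much as the
    # filter cannot use a parser that was never loaded.
    regex = PARSERS[parser]
    value = body.get(key_name)
    parsed = regex.match(value) if isinstance(value, str) else None
    if parsed is None:
        # Assumption: records whose value does not match pass through unchanged.
        return body
    if reserve_data:
        # Keep the other original fields, dropping the raw key_name value.
        others = {k: v for k, v in body.items() if k != key_name}
        return {**others, **parsed.groupdict()}
    # Default: the parsed fields replace the original record body.
    return parsed.groupdict()


print(parser_filter("event.error", {"message": "false 500 error"}))
# {'valid_message': 'false', 'code': '500', 'type': 'error'}
```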


Note that we also have to reference parsers.yaml with parsers_file in the service section so that our filter can find the parser we defined. Now, when we run the configuration, we see the following:

Shell
 
# For source installation.
$ fluent-bit --config fluent-bit.yaml

# For container installation, after building a new image with your
# configuration using a Buildfile as follows:
#
# FROM ghcr.io/fluent/fluent-bit:4.1.0
# COPY ./fluent-bit.yaml /fluent-bit/etc/fluent-bit.yaml
# COPY ./parsers.yaml /fluent-bit/etc/parsers.yaml
# CMD [ "fluent-bit", "-c", "/fluent-bit/etc/fluent-bit.yaml" ]
#
$ podman build -t fb -f Buildfile

$ podman run --rm fb 

...
{"date":"2025-10-26 20:15:54.233766","valid_message":"true","code":"200","type":"success"}
{"date":"2025-10-26 20:15:54.234199","valid_message":"false","code":"500","type":"error"}
{"date":"2025-10-26 20:15:55.234238","valid_message":"true","code":"200","type":"success"}
{"date":"2025-10-26 20:15:55.234323","valid_message":"false","code":"500","type":"error"}
{"date":"2025-10-26 20:15:56.233915","valid_message":"true","code":"200","type":"success"}
{"date":"2025-10-26 20:15:56.234009","valid_message":"false","code":"500","type":"error"}
...


Note the alternating event lines, whose parsed messages now contain a separate key for each field (valid_message, code, and type), which simplifies later querying. The original message key has been parsed into these fields, resolving the confusing use case.
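The payoff is that downstream questions become simple key lookups. In this Python sketch, a few records copied from the parsed output above are loaded and filtered by code with a plain equality check, with no string matching inside a message value. In practice, this query would run in whatever backend receives the data; the list and variable names are our own.

```python
import json

# A few json_lines records copied from the parsed output above.
lines = [
    '{"date":"2025-10-26 20:15:54.233766","valid_message":"true","code":"200","type":"success"}',
    '{"date":"2025-10-26 20:15:54.234199","valid_message":"false","code":"500","type":"error"}',
    '{"date":"2025-10-26 20:15:55.234238","valid_message":"true","code":"200","type":"success"}',
]
records = [json.loads(line) for line in lines]

# Each field is now its own key, so filtering is a plain equality check.
errors = [r for r in records if r["code"] == "500"]
print(len(errors), errors[0]["date"])  # 1 2025-10-26 20:15:54.234199
```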

This covers the top tip for developers getting started with Fluent Bit while trying to leverage a parser to clean up their telemetry data quickly and speed up their inner development loop.

More in the Series

In this article, you learned a handy tip for using a Fluent Bit parser, together with the parser filter, to clean up your telemetry data and improve the inner developer loop experience. This article is based on this free online workshop.

There will be more in this series as you continue to learn how to configure, run, manage, and master the use of Fluent Bit in the wild. Next up, we'll explore some of the more interesting Fluent Bit processors for developers.


Published at DZone with permission of Eric D. Schabell. See the original article here.
