
Use Fluentd for Real-Time MongoDB Log Collection

By Sadayuki Furuhashi · Jul. 22, 12

About

This post shows how to use Fluentd’s MongoDB plugin to aggregate semi-structured logs in real-time.

Background

Fluentd is an advanced open-source log collector developed at Treasure Data, Inc (see previous post). Because Fluentd handles logs as semi-structured data streams, the ideal database should have strong support for semi-structured data. There are several databases that meet this criterion, but we believe MongoDB is the market leader.

For those of you who do not know what MongoDB is, it is an open-source, document-oriented database developed at 10gen, Inc. It is schema-free and uses a JSON-like format to manage semi-structured data.

This post shows how to import Apache logs into MongoDB with Fluentd, using a very small amount of configuration.

Mechanism

Fluentd does three things:

  1. It continuously “tails” the access log.
  2. It parses the incoming log entries into meaningful fields (such as ip, path, etc) and buffers them.
  3. It writes the buffered data to MongoDB periodically.

Install

For simplicity, this post shows the one-node configuration. You should have the following software installed on the same node.

  • Fluentd with MongoDB Plugin
  • MongoDB
  • Apache (with the Combined Log Format)

The most recent deb/rpm packages of Fluentd include the MongoDB plugin. If you prefer to install the plugin with RubyGems, gem install fluent-plugin-mongo does the job.

  • Debian Package
  • RPM Package

For MongoDB, please refer to the downloads page.

  • MongoDB Downloads
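
For a gem-based setup, the following commands are a minimal sketch. They assume Ruby and RubyGems are already installed and that mongod is running on the default port; adjust package names and paths for your distribution.

$ gem install fluentd fluent-plugin-mongo
$ mongo --eval "db.version()"

The first command installs Fluentd itself plus the MongoDB output plugin; the second is a quick check that MongoDB is reachable on localhost:27017.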

Configuration

Let’s move on to the actual configuration. If you installed the deb/rpm package, Fluentd’s config file is located at /etc/td-agent/td-agent.conf. Otherwise, it is located at /etc/fluentd/fluentd.conf.

Tail Input

For input, let’s set up Fluentd to continuously track the Apache access log (usually at /var/log/apache2/access_log). This is what the Fluentd configuration looks like:

<source>
  type tail
  format apache
  path /var/log/apache2/access_log
  tag mongo.apache
</source>

Let’s go through the configuration line by line.

  1. type tail: The tail plugin continuously tracks the log file. This handy plugin is part of Fluentd’s core plugins.
  2. format apache: Use Fluentd’s built-in Apache log parser.
  3. path /var/log/apache2/access_log: Assuming the Apache log is in /var/log/apache2/access_log.
  4. tag mongo.apache: Attaches the tag mongo.apache to each parsed record. The tag is what routes the record to the matching <match> output section described next.

That’s it. Fluentd will now emit a JSON-formatted stream of records for MongoDB to consume.
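
For example, a combined-format access-log line like the one below (an illustrative sample) would be parsed into a structured record; the field names match those that appear later in MongoDB, though the exact set depends on the log format and parser version.

127.0.0.1 - - [27/Nov/2011:07:56:27 +0000] "GET / HTTP/1.1" 200 44 "-" "ApacheBench/2.3"

{ "host" : "127.0.0.1", "user" : "-", "method" : "GET", "path" : "/", "code" : "200", "size" : "44" }

The timestamp in brackets is carried separately as the record’s event time, which is why a time field shows up in MongoDB later.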

MongoDB Output

The output configuration should look like this:

<match mongo.**>
  # plugin type
  type mongo

  # mongodb db + collection
  database apache
  collection access

  # mongodb host + port
  host localhost
  port 27017

  # interval
  flush_interval 10s
</match>

The match section specifies a glob-like pattern to match against tags (not a regexp). If a tag matches, the config inside <match>...</match> is applied to those records. In this example, records tagged mongo.apache (generated by the tail input) always match.

The ** in mongo.** matches zero or more period-delimited tag elements (e.g. mongo, mongo.a, mongo.a.b). flush_interval indicates how often the buffered data is written to the database (every 10 seconds here). The other options specify MongoDB’s host, port, database, and collection.
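
After editing the configuration, restart the agent so the changes take effect. With the deb/rpm (td-agent) packages this is typically done as follows; the exact command depends on your init system.

$ sudo /etc/init.d/td-agent restart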

Test

To test the configuration, just send some requests to the Apache server however you like. This example uses the ab (Apache Bench) program.

$ ab -n 100 -c 10 http://localhost/
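
Note that records are written to MongoDB only every flush_interval (10 seconds in this configuration), so give it a moment. If nothing shows up, the agent’s own log is the first place to look; with the deb/rpm packages it is usually at the path below.

$ tail -f /var/log/td-agent/td-agent.log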

Then, let’s access MongoDB and see the stored data.

$ mongo
> use apache
> db.access.find()
{ "_id" : ObjectId("4ed1ed3a340765ce73000001"), "host" : "127.0.0.1", "user" : "-", "method" : "GET", "path" : "/", "code" : "200", "size" : "44", "time" : ISODate("2011-11-27T07:56:27Z") }
{ "_id" : ObjectId("4ed1ed3a340765ce73000002"), "host" : "127.0.0.1", "user" : "-", "method" : "GET", "path" : "/", "code" : "200", "size" : "44", "time" : ISODate("2011-11-27T07:56:34Z") }
{ "_id" : ObjectId("4ed1ed3a340765ce73000003"), "host" : "127.0.0.1", "user" : "-", "method" : "GET", "path" : "/", "code" : "200", "size" : "44", "time" : ISODate("2011-11-27T07:56:34Z") }

Conclusion

Fluentd + MongoDB make real-time log collection simple, easy, and robust.

Published at DZone with permission of Sadayuki Furuhashi, DZone MVB. See the original article here.

