Using Fluentd and MongoDB serverStatus for Real-Time Metrics
As developers, we often look for tools to make our work and processes more efficient. Sometimes we have to search for what we’re looking for and sometimes we’re lucky enough that it finds us! When our friends over at Treasure Data wrote to me about Fluentd, an open-source logging daemon written in Ruby that they created and maintain, I immediately saw value for MongoDB users looking for a quick way to collect data streams and store information in MongoDB.
Intro to Fluentd
Fluentd is an open source data collector designed to simplify and scale log management. Open-sourced in October 2011, it has gained traction steadily over the last 2.5 years: today, Fluentd has a thriving community of ~50 contributors and 1,900+ stargazers on GitHub with companies like Slideshare and Nintendo deploying it across hundreds of machines in production.
Fluentd has broad use cases: Slideshare integrates it into their company-wide infrastructure monitoring system, and Change.org uses it to route their log streams into various backends.
Most relevant to MongoDB developers, many folks use Fluentd to aggregate logs into MongoDB. The MongoDB community was one of the first to take notice of Fluentd, and the MongoDB plugin is one of the most downloaded Fluentd plugins to date.
Tutorial: Using MongoDB serverStatus for real-time & historical metrics
Today we’ll provide a tutorial on using Fluentd with MongoDB. To make things interesting, we decided to get a bit meta: we’ll show you how to store MongoDB serverStatus output in MongoDB itself. The serverStatus command returns a document that provides an overview of the database process’s state.
With this data you can easily create real-time and/or historical metrics that you’re interested in. These metrics may be particularly useful for benchmarking, testing in development or monitoring your MongoDB’s overall health.
Installing Fluentd
If you need to install Fluentd, you can find detailed installation instructions on the project site. Fluentd is written in Ruby for flexibility, with performance-sensitive parts in C. However, since not all developers use Ruby, a stable distribution of Fluentd called td-agent was created. This allows developers unfamiliar with Ruby to quickly get up and running with Fluentd and avoid having to install the “fluentd” gem. The differences between td-agent and the fluentd gem can be found here.
For the purposes of this tutorial, we’ll assume you’ve installed td-agent for Mac; I’ll be using the Mac OSX distribution. However, if you’re using fluentd just replace all instances of “td-agent” with “fluentd” and all the steps will still apply.
Setting up your Fluentd configuration file
First, you’ll need to locate your td-agent.conf file. This is the config file that allows the user to control the input and output behavior of Fluentd by selecting plugins and specifying plugin parameters. If you don’t know where it is, you can run the command “td-agent” from your terminal and the streaming logs will output the config file path location (amongst other information). By default on OSX, the file path is /usr/local/etc/td-agent/td-agent.conf.
Configuring the serverStatus input plugin
Once you’ve found the config file, you can define a data input source to collect from.
First we’ll specify an input plugin – the serverStatus plugin that we’ve written for this tutorial. You’ll want to change your config file to look like the following:
<source>
  type serverstatus
  uri mongodb://dbUser:dbPass@host:port/admin
  # Replica sets use "uris" array param
  # uris ["mongodb://dbUser:dbPass@host1:port1/admin", "mongodb://dbUser:dbPass@host2:port2/admin", ...]
  stats_interval 5s # How frequently you get the server status. Every minute by default
</source>
Next you’ll need to save the serverStatus plugin code so that Fluentd can load and run the plugin. In the same directory as your config file there resides a “plugins” folder. Go ahead and save the serverStatus plugin code in a file named “in_serverstatus.rb” in the “plugins” folder.
module Fluent
  class ServerStatusInput < Input
    Plugin.register_input('serverstatus', self)

    config_param :uris, :array, :default => nil
    config_param :uri, :string, :default => "mongodb://localhost:27017"
    config_param :stats_interval, :time, :default => 60 # every minute
    config_param :tag_prefix, :string, :default => "serverstatus"

    def initialize
      super
      require 'mongo'
    end

    def configure(conf)
      super

      unless @uris or @uri
        raise ConfigError, 'uris or uri must be specified'
      end

      if @uris.nil?
        @uris = [@uri]
      end

      @conns = @uris.map do |uri_str|
        uri_str = "mongodb://#{uri_str}" if not uri_str.start_with?("mongodb://")
        uri = Mongo::URIParser.new(uri_str)
        [Mongo::MongoClient.from_uri(uri_str), uri]
      end
    end

    def start
      @loop = Coolio::Loop.new
      tw = TimerWatcher.new(@stats_interval, true, @log, &method(:collect_serverstatus))
      tw.attach(@loop)
      @thread = Thread.new(&method(:run))
    end

    def run
      @loop.run
    rescue
      log.error "unexpected error", :error=>$!.to_s
      log.error_backtrace
    end

    def shutdown
      @loop.stop
      @thread.join
    end

    def collect_serverstatus
      begin
        for conn, conn_uri in @conns
          stats = conn.db('admin').command(:serverStatus => true)
          make_data_msgpack_compatible(stats)
          tag = [@tag_prefix, conn_uri.host.gsub(/[\.-]/, "_"), conn_uri.port].join(".")
          Engine.emit(tag, Engine.now, stats)
        end
      rescue => e
        log.error "failed to collect MongoDB stats", :error_class => e.class, :error => e
      end
    end

    # MessagePack doesn't like it when the field is of Time class.
    # This is a convenient method that traverses through the
    # getServerStatus response and update any field that is of Time class.
    def make_data_msgpack_compatible(data)
      if [Hash, BSON::OrderedHash].include?(data.class)
        data.each {|k, v|
          if v.respond_to?(:each)
            make_data_msgpack_compatible(v)
          elsif v.class == Time
            data[k] = v.to_i
          end
        }
        # serverStatus's "locks" field has "." as a key, which can't be
        # inserted back to MongoDB without wreaking havoc. Replace it with
        # "global"
        data["global"] = data.delete(".") if data["."]
      elsif data.class == Array
        data.each_with_index { |v, i|
          if v.respond_to?(:each)
            make_data_msgpack_compatible(v)
          elsif v.class == Time
            data[i] = v.to_i
          end
        }
      end
    end

    class TimerWatcher < Coolio::TimerWatcher
      def initialize(interval, repeat, log, &callback)
        @callback = callback
        @log = log
        super(interval, repeat)
      end

      def on_timer
        @callback.call
      rescue
        @log.error $!.to_s
        @log.error_backtrace
      end
    end
  end
end
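To see what the clean-up step in the plugin does to a document, here is a minimal standalone sketch of the same idea that runs without Fluentd or the mongo gem. The name sanitize_times! and the sample document are made up for illustration, and the sketch only handles plain Hash and Array values rather than BSON::OrderedHash.

```ruby
# Standalone sketch of the plugin's clean-up step (hypothetical helper name).
# Converts Time values to integer seconds and renames a "." key to "global".
def sanitize_times!(data)
  case data
  when Hash
    data.each do |key, value|
      if value.is_a?(Time)
        data[key] = value.to_i          # overwriting an existing key is safe while iterating
      elsif value.is_a?(Hash) || value.is_a?(Array)
        sanitize_times!(value)
      end
    end
    # Done after the loop, because adding a key during iteration raises an error.
    data["global"] = data.delete(".") if data.key?(".")
  when Array
    data.each_with_index do |value, index|
      if value.is_a?(Time)
        data[index] = value.to_i
      elsif value.is_a?(Hash) || value.is_a?(Array)
        sanitize_times!(value)
      end
    end
  end
  data
end

# Made-up sample shaped loosely like a serverStatus document.
sample = {
  "localTime" => Time.at(1_400_000_000),
  "locks"     => { "." => { "timeLockedMicros" => { "R" => 10 } } },
  "history"   => [Time.at(0), "x"]
}
p sanitize_times!(sample)
```

After the call, every Time in the sample has become an integer and the "." key under "locks" has been renamed to "global", which is the shape the plugin emits.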
The serverStatus input plugin runs the serverStatus command once every stats_interval (5 seconds in the config above, 60 seconds by default) and applies a tag to the data; in this case, serverstatus.hostName.portNumber. The tag is used by the output plugin to easily identify and store tagged data. For more on tags, I recommend checking out these 5 quick slides about the “Life of a Fluentd event”.
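The tag construction can be tried on its own. This small sketch mirrors the expression used in collect_serverstatus; build_tag is a hypothetical helper name and the host names are invented. Dots and hyphens in the host are replaced with underscores, presumably so the host stays a single dot-separated part of the tag.

```ruby
# Mirrors the tag expression from the plugin; build_tag is a hypothetical name.
def build_tag(prefix, host, port)
  [prefix, host.gsub(/[.-]/, "_"), port].join(".")
end

puts build_tag("serverstatus", "localhost", 27017)
puts build_tag("serverstatus", "db-1.example.com", 27017)
```

The second call yields serverstatus.db_1_example_com.27017, so two servers on different hosts or ports always end up with different tags.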
Configuring the out_mongo output plugin
Now that we have our input plugin set up, we need to set up an output plugin to store our data to our target destination (a MongoDB). If you’re using td-agent, it already comes bundled with a MongoDB output plugin called out_mongo or out_mongo_replset. If you’re using fluentd, you can install it by running the command below.
% fluent-gem install fluent-plugin-mongo
With the output plugin installed, we can now add output parameters to our config file such as database location, credentials and other options. We’ll add to our existing config file the following code.
<source>
  type serverstatus
  uri mongodb://dbUser:dbPass@host:port/admin
  stats_interval 5s # How frequently you get the server statuses. Every minute by default
</source>

<match serverstatus.**>
  type mongo
  user dbUser
  pass dbPass
  host hostName
  port portNumber
  database dbName

  # See https://github.com/fluent/fluent-plugin-mongo#mongotag-mapped-mode for details
  tag_mapped
  remove_tag_prefix serverstatus.
  collection misc

  flush_interval 10s
</match>
The output section begins with a match pattern (serverstatus.**) that we’ve set to match the tags applied by the input plugin, all of which start with “serverstatus”. Specify where you’d like the output to be stored (your database information) and you’re good to go!
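With tag_mapped and remove_tag_prefix set, the collection each record lands in is derived from its tag. The sketch below only approximates that mapping as the fluent-plugin-mongo README linked in the config describes it; collection_name_for is a hypothetical helper, not the plugin's actual code.

```ruby
# Rough approximation of tag-mapped mode: strip the configured prefix from
# the tag and use the remainder as the collection name. Hypothetical helper.
def collection_name_for(tag, prefix)
  tag.start_with?(prefix) ? tag[prefix.length..-1] : tag
end

puts collection_name_for("serverstatus.localhost.27017", "serverstatus.")
```

Under that assumption, data from a server tagged serverstatus.localhost.27017 would be stored in a collection named localhost.27017, giving one collection per monitored server.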
Using multiple inputs/outputs
If you’d like to monitor multiple MongoDB deployments and/or use multiple outputs, the plugins support this too! To get serverStatus of more than one MongoDB, you can list URIs in the config file using the “uris” array parameter. The output for these MongoDBs will have different tags, making it easy to determine what data came from where.
To configure multiple outputs, you’ll need to use the “copy” output plugin. In the example below, we’ve modified our existing configuration file’s output code to also print the input results to the console.
# Input code goes here

<match serverstatus.**>
  type copy
  <store>
    type stdout
  </store>
  <store>
    type mongo
    user dbUser
    pass dbPass
    host hostName
    port portNumber
    database dbName

    # See https://github.com/fluent/fluent-plugin-mongo#mongotag-mapped-mode for details
    tag_mapped
    remove_tag_prefix serverstatus.
    collection misc

    flush_interval 10s
  </store>
</match>
Running Fluentd
Once you have the plugins set up, you can run Fluentd with either the ‘fluentd’ or ‘td-agent’ command from the command line. If you’re using the multiple output configuration from above, you’ll see the serverStatus data printing in your console as it is collected (every 5 seconds with the stats_interval above) and being flushed to your MongoDB every 10 seconds.
With access to this data, you can calculate many interesting metrics that can help monitor the health of your MongoDB. However, you may notice that a lot of the metrics reported in serverStatus are growing totals as opposed to rates. For instance, instead of getting a simple updates-per-second number, serverStatus will give you the total number of update queries that have been made against the server since it started.
Creating useful metrics
Luckily it’s very simple to extract rates from multiple serverStatus documents. Since we’ve set the stats interval to every 5 seconds, to get updates-per-second we take the update numbers from 2 sequential serverStatus documents, subtract them and divide by 5. Assuming serverStatusB was recorded after serverStatusA:
(serverStatusB.opcounters.update - serverStatusA.opcounters.update) / 5 seconds
This will give you the average rate of update queries during that 5 second period.
You can use this same technique with any of the counted metrics in serverStatus, including other opcounters, asserts, network bytesIn and bytesOut, page faults and index hits and misses.
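As a minimal sketch of this calculation, the Ruby below turns two stored serverStatus documents into a per-second rate for any counter, addressed by a dotted path such as opcounters.update. The helper names and the sample numbers are invented for illustration. Returning nil when a counter goes backwards is an assumption added here, since the totals start over when the server restarts.

```ruby
# Look up a nested counter such as "opcounters.update" in a document.
def counter_at(doc, path)
  path.split(".").reduce(doc) { |node, key| node.fetch(key) }
end

# Average per-second rate between two documents taken interval_seconds apart.
# Returns nil if the counter decreased (assumed to mean the server restarted).
def rate_per_second(older, newer, path, interval_seconds)
  delta = counter_at(newer, path) - counter_at(older, path)
  return nil if delta < 0
  delta.to_f / interval_seconds
end

# Made-up excerpts of two sequential serverStatus documents.
status_a = { "opcounters" => { "update" => 1200 }, "network" => { "bytesIn" => 50_000 } }
status_b = { "opcounters" => { "update" => 1250 }, "network" => { "bytesIn" => 58_000 } }

puts rate_per_second(status_a, status_b, "opcounters.update", 5)  # 10.0
puts rate_per_second(status_a, status_b, "network.bytesIn", 5)    # 1600.0
```

The interval is passed in rather than hard-coded so the same function still works if you change stats_interval in the config.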
Happy hacking!
We hope you found this tutorial helpful and informative… the possibilities with Fluentd and MongoDB are endless! Be sure to reach out to support@mongolab.com if you have any questions about MongoDB and check out the Fluentd mailing list for any questions about Fluentd!
-Chris@MongoLab
Published at DZone with permission of Chris Chang, DZone MVB. See the original article here.