
MongoDB Performance Monitoring Using the ELK Stack

Keep an eye on your database's performance with this guide to setting up monitoring via the ELK Stack

By Daniel Berman · Jul. 18, 16 · Tutorial

MongoDB, one of the most popular NoSQL databases today, is designed to support massive amounts of data processing and storage. The tool is used by many well-known, modern IT organizations such as Facebook, eBay, Foursquare, and Expedia. Monitoring is a critical component of all database administration, and tight monitoring of your MongoDB cluster will allow you to assess the state of your database. However, due to its complex architecture, which allows for virtually unlimited scaling, monitoring is a challenging task.

In this article, we will explain how to collect and analyze some of the MongoDB metrics using the ELK Stack so that you can keep a close eye on your MongoDB performance and growth.

MongoDB Metrics to Track

In this article, we will use the latest version of MongoDB (version 3.2) and focus on metrics that are available with the WiredTiger storage engine, which became MongoDB's default storage engine in version 3.2. We will focus on tracking and metric analysis to get an overview of database performance, resource utilization, and saturation. These metrics are accessible using MongoDB commands.

Throughput

MongoDB (with the WiredTiger storage engine) provides several commands that can be used to collect metrics using the mongo shell. The mongo shell is an interactive JavaScript interface for MongoDB that allows you to query data and take administrative actions.

One of the richest commands is serverStatus (i.e., db.serverStatus()), which provides a lot of information about operations, connections, journaling, background flushing, memory, locking, asserts, cursors, and the cache.

These throughput metrics are important as they can be used to avoid many performance issues, such as resource overloading. To get a general overview of your MongoDB cluster's activities, you should first look at the number of read/write clients and the number of database operations that they perform. These metrics can be retrieved using the serverStatus opcounters and globalLock objects.
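As a quick sanity check, you can pull just these two sub-documents from the command line. This is a minimal sketch that assumes a default local mongod instance; --quiet suppresses the shell banner and printjson is the standard mongo shell helper for pretty-printing:

mongo --quiet --eval 'printjson({opcounters: db.serverStatus().opcounters, globalLock: db.serverStatus().globalLock})'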

The objects' output is in JSON, as shown in the example below:

"opcounters":{
    "insert":0,
    "query":1,
    "update":12,
    "delete":5,
    "getmore":0,
    "command":23
    }


The opcounters.query and opcounters.getmore values indicate the number of read requests received since the mongod instance (the process that handles data requests and manages data access) last started. Similarly, opcounters.insert, opcounters.update, and opcounters.delete return the number of write requests received.

By monitoring the number of read and write requests, you can quickly prevent resource saturation as well as spot bottlenecks and the root causes of overloads. In addition, these metrics will allow you to assess when and how you need to scale your cluster.
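For example, to turn the raw counters into a rough read-requests-per-interval figure, you can sample opcounters twice and compare the values. The following bash sketch is illustrative only; the 10-second interval and the focus on read counters (query plus getmore) are arbitrary choices, and a default local connection is assumed:

# Sample the read counters, wait, and sample again
BEFORE=$(mongo --quiet --eval 'var o = db.serverStatus().opcounters; print(o.query + o.getmore)')
sleep 10
AFTER=$(mongo --quiet --eval 'var o = db.serverStatus().opcounters; print(o.query + o.getmore)')
echo "read requests in the last 10 seconds: $((AFTER - BEFORE))"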

As shown below, globalLock is a document that reports on the database's lock state and can provide you with information regarding read/write request statuses. These values allow you to check whether requests are accumulating faster than they are being processed. The same applies to activeClients.readers and activeClients.writers, which can help you understand the relationship between the number of currently active clients and your database load.

"globalLock": {
   "totalTime": NumberLong(254415000),
   "currentQueue": {
       "total": 0,
       "readers": 0,
       "writers": 0
        },
   "activeClients": {
        "total": 8,
        "readers": 0,
        "writers": 0
        }
    }
…
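A queue that keeps growing is the warning sign to watch for. As a rough illustration (again assuming a default local connection), you can print the queue depths and active client count directly:

mongo --quiet --eval 'var gl = db.serverStatus().globalLock; print("queued readers: " + gl.currentQueue.readers + ", queued writers: " + gl.currentQueue.writers + ", active clients: " + gl.activeClients.total)'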


Performance and Failover

Using a replica set (MongoDB's primary-secondary replication mechanism, which facilitates load balancing and failover) is a must to ensure production robustness. The oplog (operations log) is the main component of the MongoDB replication mechanism. Below, you can see the relevant metrics that can be retrieved using the db.getReplicationInfo() and replSetGetStatus commands.

As shown below, each replica set member's status includes a few indicators, such as the replica state and the optimeDate field (the date when the last entry from the oplog was applied to that member), which is important for calculating the replication lag metric:

...
"members" : [
    {
        "_id" : 0,
        "name" : "<host1>",
        "health" : 1,
        "state" : 1,
        "stateStr" : "PRIMARY",
        "uptime" : 250,
        "optime" : {
            "ts" : Timestamp(1447946570, 1),
            "t" : NumberLong(1)
            },
        "optimeDate" : <date>,
        "infoMessage" : "could not find member to sync from",
        "electionTime" : <timestamp>,
        "electionDate" : <date>,
        "configVersion" : 1,
        "self" : true
        },
    {
        "_id" : 1,
        "name" : "<host2>",
        "health" : 1,
        "state" : 2,
        "stateStr" : "SECONDARY",
        "uptime" : 13,
        "optime" : {
            "ts" : Timestamp(1447946549, 1),
            "t" : NumberLong(-1)
            },
        "optimeDate" : <date>,
        "lastHeartbeat" : <date>,
        "lastHeartbeatRecv" : <date>,
        "pingMs" : NumberLong(0),
        "configVersion" : 1
        },
    {
        "_id" : 2,
        "name" : "<host3>",
        "health" : 1,
        "state" : 2,
        "stateStr" : "SECONDARY",
        "uptime" : 13,
        "optime" : {
            "ts" : Timestamp(1447946549, 1),
            "t" : NumberLong(-1)
            },
        "optimeDate" : <date>,
        "lastHeartbeat" : <date>,
        "lastHeartbeatRecv" : <date>,
        "pingMs" : NumberLong(0),
        "configVersion" : 1
        }
    ]
…


Replication lag shows the difference between the primary and secondary nodes. Because you want to avoid serving outdated information, it's important to keep this difference as small as possible. If there are no load issues, your replication lag will be zero, which is ideal. However, if the number rises for your secondary nodes, the integrity of your data is at risk. To avoid such events, we recommend setting alerts on these metrics so that you can constantly monitor your replica set's status. Learn more about replication lag here.
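If you want to eyeball the lag before wiring up alerts, one way is to compare each secondary's optimeDate with the primary's using rs.status(). This is only a sketch and assumes it is run against a member of the replica set:

mongo --quiet --eval '
    var s = rs.status();
    var primary = s.members.filter(function (m) { return m.stateStr === "PRIMARY"; })[0];
    s.members.forEach(function (m) {
        if (m.stateStr === "SECONDARY") {
            // Subtracting Date objects yields milliseconds
            print(m.name + " lag (seconds): " + (primary.optimeDate - m.optimeDate) / 1000);
        }
    });
'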

Resource Utilization

One of the most important metrics is the number of client connections, which includes the currently active connections as well as the unused, available ones. These can be reported using serverStatus:

...
"connections": {
    "current": 1,
    "available": 818,
    "totalCreated": NumberLong(4)
    }
...


An unexpected rise in the number of client connections can occur if connections are not handled well or if there is an issue inside the MongoDB driver that is used for handling them. Tracking these metrics over time will allow you to establish a baseline and define the relevant summary metrics.
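To make this metric easier to alert on, you can express connection usage as a percentage of the total the server will accept. A minimal sketch, assuming a default local connection:

mongo --quiet --eval 'var c = db.serverStatus().connections; print("connection usage: " + (100 * c.current / (c.current + c.available)).toFixed(2) + "%")'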

Another very important set of metrics is related to storage. These can be retrieved using the db.stats() command, which returns statistics for the selected database. Running it via the mongo shell to get statistics for the test_mongo_db database looks like this:

mongo test_mongo_db --eval "db.stats()"


The next JSON snippet is from the db.stats() output:

{
    "db": "test",
    "collections": 1,
    "objects": 1,
    "avgObjSize": 40,
    "dataSize": 40,
    "storageSize": 4096,
    "numExtents": 0,
    "indexes": 1,
    "indexSize": 4096,
    "fileSize": 0,
    "ok": 1
    }


If you look inside the output of the db.stats() command, you will find (as in the example above) the number of collections in the database (the collections property), the number of objects (documents) across all of the collections (the objects property), the size of all documents (the dataSize property, in bytes), the size of all indexes (the indexSize property, in bytes), and the total amount of space allocated to collections for document storage (the storageSize property, in bytes).

Monitoring the dataSize, indexSize, and storageSize metrics will show you changes in physical storage allocation and will help you keep your cluster healthy, with enough storage to serve your database. On the other hand, a large drop in dataSize can also indicate that there are many requested deletions, which should be investigated to confirm that they are legitimate operations.
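If you have more than a couple of databases, it can be handy to print these storage figures for all of them in one pass. The following sketch is not part of the original tutorial; it simply combines the standard listDatabases command with the db.getSiblingDB() helper:

mongo --quiet --eval '
    db.adminCommand("listDatabases").databases.forEach(function (d) {
        var s = db.getSiblingDB(d.name).stats();
        print(d.name + ": dataSize=" + s.dataSize + " storageSize=" + s.storageSize + " indexSize=" + s.indexSize + " (bytes)");
    });
'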

The next metrics that should be monitored are the memory metrics from serverStatus. The ones of interest are virtual memory usage, located in the mem.virtual property (in MB), and the amount of memory used by the database, located in the mem.resident property (in MB). As with the storage metrics, memory metrics are important to monitor because overloading RAM on your server(s) is never good: it can slow down or crash your server, which will leave your cluster weakened. Or, even worse, if you have only one dedicated server, MongoDB can dramatically slow down or even crash.

Another important metric is located in the extra_info.page_faults property of the serverStatus output: the number of page faults, that is, the number of times MongoDB had to go to disk because the requested data was not in memory.

"mem": {
    "bits": 64,
    "resident": 82,
    "virtual": 245,
    "supported": true,
    "mapped": 0,
    "mappedWithJournal": 0
    }
"extra_info": {
    "note": "fields vary by platform",
    "heap_usage_bytes": 59710000,
    "page_faults": 1
    }
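As with the other metrics, these values can be pulled on their own for a quick check. A minimal sketch, assuming a default local connection:

mongo --quiet --eval 'var ss = db.serverStatus(); print("resident MB: " + ss.mem.resident + ", virtual MB: " + ss.mem.virtual + ", page faults: " + ss.extra_info.page_faults)'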


Collecting and Monitoring Using ELK

In this section, we will describe how to ship, store, and monitor the MongoDB performance metrics detailed above using the Logz.io ELK Stack.

We will use Ubuntu Server 16.04 on the Amazon cloud. You can also read our step-by-step article if you would like to know how to install and configure the ELK Stack on the Amazon cloud.

Extracting the MongoDB Metrics

In the next step, we will demonstrate how to ship metrics to Elasticsearch with Logstash. Using some programming to retrieve the metrics will give you better control and allow you to run complex pre-shipping actions.

To ship the logs, we will create a Logstash configuration file that defines the input, how to interpret it, and where to send it. Learn more about Logstash configuration here.

Before we create the Logstash configuration file, we will describe how to retrieve the MongoDB metrics using the mongo shell interface from your OS's bash shell.

If we want to execute the serverStatus command from our terminal, without staying in the mongo shell program, we can use the --eval flag of the mongo shell as follows:

mongo --eval "db.serverStatus()"


And the output:

MongoDB shell version: 3.2.7
connecting to: <db>
{
    <the JSON objects>
    }


As you can see, the first two lines of the output contain information about the MongoDB shell version and which database the shell is connected to. Since this format does not comply with strict JSON rules and complicates our Logstash configuration, we will use a pipeline to cut off the first two lines of the output with the tail command.

So, our command will look like this:

mongo --eval 'db.serverStatus()' | tail -n +3


Now, the output will contain only the JSON part.

Next, we want to remove the NumberLong(x) and ISODate(x) wrappers from the JSON. Left as-is, they will trigger a JSON parsing exception in Logstash, and storing the documents in Elasticsearch will fail. To transform the text stream, we will use the sed command with regex patterns that find the NumberLong and ISODate data types and replace them with the arguments that exist inside them:

{
    ….
    "localTime": ISODate("2016-06-23T16:43:19.105Z"),
    …
    "connections": {
        ….
        "totalCreated": NumberLong(62)
        …
        }
    }


Now, combining the commands in a pipeline and adding the text transformation, the final command looks as follows:

mongo --eval 'db.serverStatus()' | tail -n +3 | sed 's/\(NumberLong([[:punct:]]\?\)\([[:digit:]]*\)\([[:punct:]]\?)\)/\2/' | sed 's/\(ISODate(\)\(.*\)\()\)/\2/'


Running this command will generate pure JSON without the MongoDB metadata.

In addition to the serverStatus command, we will also use the db.stats() command to gather storage metrics for specific databases. For the purpose of this tutorial, we created two databases, named test_mongo_db_1 and test_mongo_db_2, whose storage allocation we want to monitor.

Again, we will combine the commands for gathering storage statistics for these two databases with the tail command in a pipeline to comply with the JSON formatting rules:

mongo test_mongo_db_1 --eval "db.stats()" | tail -n +3

mongo test_mongo_db_2 --eval "db.stats()" | tail -n +3


Configuring Logstash

Next, we will take the commands created above and place them in the Logstash configuration file (logstash.config) using the exec input plugin. To forward the data to Elasticsearch, we will use the elasticsearch output plugin:

input {
    exec {
        command => "mongo --eval 'db.serverStatus()' | tail -n +3 | sed 's/\(NumberLong([[:punct:]]\?\)\([[:digit:]]*\)\([[:punct:]]\?)\)/\2/' | sed 's/\(ISODate(\)\(.*\)\()\)/\2/'"
        interval => 7
        type => "db.serverstatus"
        }
    exec {
        command => "mongo test_mongo_db_1 --eval 'db.stats()' | tail -n +3"
        interval => 7
        type => "db.test_mongo_db_1.stats"
        }
    exec {
        command => "mongo test_mongo_db_2 --eval 'db.stats()' | tail -n +3"
        interval => 7
        type => "db.test_mongo_db_2.stats"
        }
    }
filter {
    json {
        source => "message"
        }
    }
output {
    elasticsearch {
        hosts => ["localhost:9200"]
        }
    }


We're now going to start Logstash with this configuration using the following command:

./bin/logstash -f logstash.config


After a short while, you will begin to see the first MongoDB metrics arriving via Logstash.
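One way to verify that the documents are actually being indexed is to query Elasticsearch directly. The example below assumes the default logstash-* index naming that the elasticsearch output plugin uses when no index is specified:

curl -s 'localhost:9200/logstash-*/_search?size=1&pretty'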

The Kibana Discover section


Shipping to Logz.io Using Logstash

Logz.io provides the ELK Stack as an end-to-end service, so the logs that you send to us are indexed and stored in Elasticsearch and available in real time through Kibana.

While we support a wide range of techniques for shipping the logs (available under the Log Shipping section in the UI), in the next section I will explain how to use our Logstash integration to ship MongoDB logs to Logz.io.

In the Logz.io UI, select the Log Shipping tab located at the top of the page, and under the Platforms menu on the left, select the Logstash item.

On the right, you will see what needs to be added to the current Logstash configuration to send the logs to Logz.io. Two additional changes are required: the first is adding your account token using the mutate filter plugin, and the second is changing the output, where the elasticsearch output is replaced with a tcp output pointing to the listener.logz.io server, which is in charge of processing incoming logs.

The Logstash shipping page

After making these changes, the Logstash configuration file for shipping the logs to Logz.io looks like this:

input {
    exec {
        command => "mongo --eval 'db.serverStatus()' | tail -n +3 | sed 's/\(NumberLong([[:punct:]]\?\)\([[:digit:]]*\)\([[:punct:]]\?)\)/\2/' | sed 's/\(ISODate(\)\(.*\)\()\)/\2/'"
        interval => 7
        type => "db.serverstatus"
        }
     exec {
        command => "mongo test_mongo_db_1 --eval 'db.stats()' | tail -n +3"
        interval => 7
        type => "db.test_mongo_db_1.stats"
        }
     exec {
        command => "mongo test_mongo_db_2 --eval 'db.stats()' | tail -n +3"
        interval => 7
        type => "db.test_mongo_db_2.stats"
        }
    }
filter {
    json {
        source => "message"
        }
    mutate {
        add_field => {"token" => "<token>"}
        }
    }
output {
    tcp {
        host => "listener.logz.io"
        port => 5050
        codec => json_lines
        }
    }


After starting Logstash with the new configuration file, you will notice that logs begin to appear in the Discover section of the Logz.io UI.

The Logz.io Discover section after starting Logstash with the new configuration

Shipping to Logz.io Using Amazon S3

Another way to ship logs to Logz.io is via AWS S3. You would first need to create the log files themselves from the MongoDB command output and then use the AWS CLI to sync them to an S3 bucket.

Creating the Log Files

In the previous section, we used a command pipeline to execute the commands and filter their output. The next step is to redirect this output to a file.

First, we will create a new log file:

mongo --eval 'db.serverStatus()' | tail -n +3 | sed 's/\(NumberLong([[:punct:]]\?\)\([[:digit:]]*\)\([[:punct:]]\?)\)/\2/' | sed 's/\(ISODate(\)\(.*\)\()\)/\2/' >> mongo_server_status_$(date +"%Y-%m-%d-%H")


Next, we will do the same for the commands that generate the database stats:

mongo test_mongo_db_1 --eval 'db.stats()' | tail -n +3 >> mongo_test_mongo_db_1_stats_$(date +"%Y-%m-%d-%H")

mongo test_mongo_db_2 --eval 'db.stats()' | tail -n +3 >> mongo_test_mongo_db_2_stats_$(date +"%Y-%m-%d-%H")


We can now schedule these commands as cron jobs to collect the logs periodically.
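For example, a crontab entry (added with crontab -e) could run the collection every few minutes. The script path and schedule below are hypothetical; the idea is simply to wrap the commands above in a small script and let cron append to the hourly log files:

# Hypothetical schedule: run the collection script every 5 minutes
*/5 * * * * /opt/mongo-metrics/collect.sh >> /var/log/mongo-metrics/collector.log 2>&1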

Syncing With S3 and Shipping to Logz.io

Logz.io supports shipping from S3 natively. In the Logz.io UI, open the Log Shipping section and expand the AWS section. Select the S3 Bucket option and configure Logz.io so that it can read from your S3 bucket.

To find more information on how to configure this type of log shipping and how to use the AWS CLI sync command to copy files to an S3 bucket, you can read the section "S3 Syncing and Shipping" in our article on creating a PCI DSS dashboard.
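In short, the sync itself comes down to a single AWS CLI command run periodically; the local directory and bucket name below are placeholders:

aws s3 sync /var/log/mongo-metrics s3://<your-bucket>/mongo-metrics/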

The MongoDB Performance Dashboard

Now that all of our MongoDB metrics are shipped to Elasticsearch, we are ready to build a monitoring dashboard. We will start with a series of Kibana visualizations for the throughput metrics.

First, as an example, we will create a line chart that visualizes the number of read requests. After clicking on the Visualize section and selecting the line chart visualization type from the menu, we set up the metrics fields on the left side of Kibana:

The metrics configuration for query number

A line chart for query number

We will do the same thing for the rest of the throughput metrics. The configuration will only differ in the aggregation fields used (for the query chart, we selected opcounters.query from the field dropdown).

After adding and saving these charts to a Kibana dashboard, you will be able to see the throughput metrics visualized:

A dashboard with visualized throughput metrics

In a similar fashion, we can visualize the other metrics described in the MongoDB metrics section above.

The final dashboard for MongoDB metrics

The MongoDB dashboard in ELK Apps


To help you hit the ground running, we've added this dashboard to ELK Apps, our free library of ready-made visualizations and dashboards that can be installed in one click. Simply search for MongoDB on the ELK Apps page and click to install.

Your job doesn't necessarily stop there: set up alerts for the metrics that we have described here. Learn how to create alerts for the ELK Stack.


Published at DZone with permission of Daniel Berman, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
