By monitoring NGINX you can catch two categories of issues: resource issues within NGINX itself, and problems developing elsewhere in your web infrastructure. Most NGINX users will benefit from monitoring requests per second, which provides a high-level view of combined end-user activity; server error rate, which indicates how often your servers are failing to process seemingly valid requests; and request processing time, which describes how long your servers are taking to process client requests (and which can point to slowdowns or other problems in your environment).
More generally, there are at least three key categories of metrics to watch:
- Basic activity metrics
- Error metrics
- Performance metrics
Below we’ll break down a few of the most important NGINX metrics in each category, as well as metrics for a fairly common use case that deserves special mention: using NGINX Plus for reverse proxying. We will also describe how you can monitor all of these metrics with your graphing or monitoring tools of choice.
This Refcard references metric terminology introduced in our Monitoring 101 series, which provides a framework for metric collection and alerting.
Basic Activity Metrics
Whatever your NGINX use case, you will no doubt want to monitor how many client requests your servers are receiving and how those requests are being processed.
NGINX Plus can report basic activity metrics exactly like open-source NGINX, but it also provides a secondary module that reports metrics slightly differently. This Refcard first discusses open-source NGINX, then the additional reporting capabilities provided by NGINX Plus.
NGINX
The diagram below shows the lifecycle of a client connection and how the open-source version of NGINX collects metrics during a connection.
Figure 1: Open-Source NGINX Client Connection Lifecycle
Accepts, handled, and requests are ever-increasing counters. Active, waiting, reading, and writing grow and shrink with request volume.
NAME | DESCRIPTION | METRIC TYPE |
accepts | Count of client connections attempted by NGINX | Resource: Utilization |
handled | Count of successful client connections | Resource: Utilization |
active | Currently active client connections | Resource: Utilization |
dropped (calculated) | Count of dropped connections (accepts – handled) | Work: Errors* |
requests | Count of client requests | Work: Throughput |
*Strictly speaking, dropped connections is a metric of resource saturation, but since saturation causes NGINX to stop servicing some work (rather than queuing it up for later), “dropped” is best thought of as a work metric.
The accepts counter is incremented when an NGINX worker picks up a request for a connection from the OS, whereas handled is incremented when the worker actually gets a connection for the request (by establishing a new connection or reusing an open one). These two counts are usually the same—any divergence indicates that connections are being dropped, often because a resource limit, such as NGINX’s worker_connections limit, has been reached.
Once NGINX successfully handles a connection, the connection moves to an active state, where it remains as client requests are processed:
ACTIVE STATE | DESCRIPTION |
Waiting | An active connection may also be in a Waiting substate if there is no active request at the moment. New connections can bypass this state and move directly to Reading, most commonly when using “accept filter” or “deferred accept,” in which case NGINX does not receive notice of work until it has enough data to begin working on the response. Connections will also be in the Waiting state after sending a response if the connection is set to keep-alive. |
Reading | When a request is received, the connection moves out of the waiting state, and the request itself is counted as Reading. In this state NGINX is reading a client request header. Request headers are lightweight, so this is usually a fast operation. |
Writing | After the request is read, it is counted as Writing, and remains in that state until a response is returned to the client. This means that the request is Writing while NGINX is waiting for results from upstream systems (systems “behind” NGINX), and while NGINX is operating on the response. Requests will often spend the majority of their time in the Writing state. |
Often a connection will only support one request at a time. In this case, the number of Active connections == Waiting connections + Reading requests + Writing requests. However, the newer SPDY and HTTP/2 protocols allow multiple concurrent requests/responses to be multiplexed over a connection, so Active may be less than the sum of Waiting, Reading, and Writing. (As of this writing, NGINX does not support HTTP/2, but expects to add support during 2015.)
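The identity above can be checked against a single sample of the gauges. The following is a minimal sketch; the `ConnectionGauges` type and `gauges_consistent` helper are hypothetical names introduced here for illustration, and the sample values are made up.

```python
from dataclasses import dataclass


@dataclass
class ConnectionGauges:
    """One point-in-time sample of the open-source NGINX gauges (hypothetical type)."""
    active: int
    reading: int
    writing: int
    waiting: int


def gauges_consistent(g: ConnectionGauges) -> bool:
    """Return True when active == waiting + reading + writing.

    This holds when every connection carries at most one request at a time.
    With multiplexed protocols the right-hand side can exceed `active`.
    """
    return g.active == g.waiting + g.reading + g.writing


# Illustrative values only: 6 + 179 + 106 == 291
sample = ConnectionGauges(active=291, reading=6, writing=179, waiting=106)
print(gauges_consistent(sample))
```

A result of False is not necessarily an error; as noted above, it may simply mean that some connections are carrying more than one request.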
NGINX Plus
As mentioned above, all of open-source NGINX’s metrics are available within NGINX Plus, but Plus can also report additional metrics. This section covers the metrics that are only available from NGINX Plus.
Figure 2: NGINX Plus Client Connection Lifecycle
Accepted, dropped, and total are ever-increasing counters. Active, idle, and current track the current number of connections or requests in each of those states, so they grow and shrink with request volume.
NAME |
DESCRIPTION |
METRIC TYPE |
accepted |
Count of client connections attempted by NGINX |
Resource: Utilization |
dropped |
Count of dropped connections |
Work: Errors* |
active |
Currently active client connections |
Resource: Utilization |
idle |
Client connections with zero current requests |
Resource: Utilization |
total |
Count of client requests |
Work: Throughput |
The accepted counter is incremented when an NGINX Plus worker picks up a request for a connection from the OS. If the worker fails to get a connection for the request (by establishing a new connection or reusing an open one), then the connection is dropped, and the dropped counter is incremented. Ordinarily connections are dropped because a resource limit, such as NGINX Plus’s worker_connections limit, has been reached.
Active and idle are the same as the active and waiting states in open-source NGINX as described above, with one key exception: in open-source NGINX, waiting falls under the active umbrella, whereas in NGINX Plus idle connections are excluded from the active count. Current is the same as the combined reading + writing states in open-source NGINX.
Total is a cumulative count of client requests. Note that a single client connection can involve multiple requests, so this number may be significantly larger than the cumulative number of connections. In fact, (total / accepted) yields the average number of requests per connection.
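As a small worked example of the (total / accepted) ratio, the sketch below computes the average number of requests per connection. The function name is hypothetical, and returning 0.0 when no connections have been accepted is an assumption made here to avoid dividing by zero.

```python
def requests_per_connection(total: int, accepted: int) -> float:
    """Average requests per connection, computed as total / accepted.

    Returns 0.0 when no connections have been accepted yet (an assumption
    of this sketch, chosen to avoid a division by zero).
    """
    if accepted <= 0:
        return 0.0
    return total / accepted


# Illustrative values: 300 requests over 100 connections
print(requests_per_connection(total=300, accepted=100))  # 3.0
```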
Metric Differences Between Open-Source and Plus
NGINX (OPEN-SOURCE) | NGINX PLUS |
accepts | accepted |
dropped must be calculated | dropped is reported directly |
reading + writing | current |
waiting | idle |
active (includes “waiting” states) | active (excludes “idle” states) |
requests | total |
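The mapping in this table can be expressed as a small conversion, which may help when dashboards need to compare the two editions on common terms. This is only a sketch: `to_plus_style` and the dictionary keys are hypothetical names chosen here, not part of any NGINX API.

```python
def to_plus_style(m: dict) -> dict:
    """Translate open-source NGINX metrics into their NGINX Plus equivalents.

    `m` is assumed to hold the open-source values under the keys
    accepts, handled, requests, active, reading, writing, and waiting.
    """
    return {
        "accepted": m["accepts"],
        "dropped": m["accepts"] - m["handled"],   # must be calculated in open-source
        "active": m["active"] - m["waiting"],     # Plus excludes idle connections
        "idle": m["waiting"],
        "current": m["reading"] + m["writing"],
        "total": m["requests"],
    }


# Illustrative values only
open_source = {
    "accepts": 1000, "handled": 998, "requests": 2500,
    "active": 291, "reading": 6, "writing": 179, "waiting": 106,
}
print(to_plus_style(open_source))
```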
Metric to Alert on: Dropped Connections
The number of connections that have been dropped is equal to the difference between accepts and handled (NGINX) or is exposed directly as a standard metric (NGINX Plus). Under normal circumstances, dropped connections should be zero. If your rate of dropped connections per unit time starts to rise, look for possible resource saturation.
Figure 3: NGINX Dropped Connections Per Second
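For open-source NGINX, an alert on dropped connections has to be derived from two successive samples of the accepts and handled counters. The sketch below assumes each sample is an `(accepts, handled)` tuple; the function names and the choice to report zero after a counter reset are assumptions made here for illustration.

```python
def dropped_total(accepts: int, handled: int) -> int:
    """Cumulative dropped connections: accepts - handled."""
    return accepts - handled


def new_drops(prev: tuple, curr: tuple) -> int:
    """Connections dropped between two (accepts, handled) samples.

    If the counters went backwards (for example after a restart), this
    sketch reports 0 rather than a negative number.
    """
    delta = dropped_total(*curr) - dropped_total(*prev)
    return max(delta, 0)


def should_alert(prev: tuple, curr: tuple, threshold: int = 0) -> bool:
    """Alert when more than `threshold` connections were dropped in the interval."""
    return new_drops(prev, curr) > threshold


# Illustrative values: 10 connections dropped since the previous sample
print(should_alert(prev=(1000, 1000), curr=(1500, 1490)))  # True
```

Dividing the result of `new_drops` by the sampling interval gives the per-second rate plotted in charts like the one above.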
Metric to Alert on: Requests Per Second
Sampling your request data (requests in open-source, or total in Plus) with a fixed time interval provides you with the number of requests you’re receiving per unit of time, often minutes or seconds. Monitoring this metric can alert you to spikes in incoming web traffic, whether legitimate or nefarious, or sudden drops, which are usually indicative of problems. A drastic change in requests per second can alert you to problems brewing somewhere in your environment, even if it cannot tell you exactly where those problems lie. Note that all requests are counted the same, regardless of their URLs.
Figure 4: NGINX Requests Per Second
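Because the request count is an ever-increasing counter, a per-second rate comes from the difference between two samples divided by the time between them. The following is a minimal sketch; `rate_per_second` is a hypothetical helper, and returning `None` when the counter goes backwards (as it would after a server restart) is a design choice assumed here.

```python
from typing import Optional


def rate_per_second(prev_count: int, curr_count: int,
                    interval_seconds: float) -> Optional[float]:
    """Convert two samples of a cumulative counter into a per-second rate.

    Returns None if the counter decreased, which suggests a reset; the
    caller can skip that data point instead of plotting a negative rate.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    delta = curr_count - prev_count
    if delta < 0:
        return None
    return delta / interval_seconds


# Illustrative values: 600 new requests over a 60-second interval
print(rate_per_second(1000, 1600, 60))  # 10.0
```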
Collecting Activity Metrics
Open-source NGINX exposes these basic server metrics on a simple status page. Because the status information is displayed in a standardized form, virtually any graphing or monitoring tool can be configured to parse the relevant data for analysis, visualization, or alerting. NGINX Plus provides a JSON feed with much richer data. A later section of this Refcard will discuss how to enable NGINX metrics collection.
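To illustrate how a tool might parse the status page, here is a rough sketch. It assumes the plain-text layout produced by the open-source stub_status module; the `SAMPLE` text uses made-up numbers, and `parse_stub_status` is a hypothetical helper rather than part of any monitoring product.

```python
import re

# Illustrative status page text; the numbers are invented for this example.
SAMPLE = """Active connections: 291
server accepts handled requests
 16630948 16630948 31070465
Reading: 6 Writing: 179 Waiting: 106
"""


def parse_stub_status(text: str) -> dict:
    """Extract the counters and gauges from a stub_status style page."""
    active = re.search(r"Active connections:\s*(\d+)", text)
    # The three cumulative counters sit alone on one line, in this order.
    counters = re.search(r"^\s*(\d+)\s+(\d+)\s+(\d+)\s*$", text, re.MULTILINE)
    states = re.search(
        r"Reading:\s*(\d+)\s+Writing:\s*(\d+)\s+Waiting:\s*(\d+)", text
    )
    if not (active and counters and states):
        raise ValueError("unrecognized status page format")

    accepts, handled, requests = (int(v) for v in counters.groups())
    reading, writing, waiting = (int(v) for v in states.groups())
    return {
        "active": int(active.group(1)),
        "accepts": accepts,
        "handled": handled,
        "requests": requests,
        "reading": reading,
        "writing": writing,
        "waiting": waiting,
        "dropped": accepts - handled,  # calculated, not reported
    }


metrics = parse_stub_status(SAMPLE)
print(metrics["requests"], metrics["dropped"])
```

In practice the text would be fetched over HTTP from wherever the status page is exposed; that step is left out here because the location depends on your configuration.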