Introducing Capacity Analysis for Python
Editor's Note: This article was originally written by Graham Dumpleton.
Last week, we released the latest version of our Python agent. This release includes a few updates that were frequently requested by our users, including better support for TastyPie which should be particularly useful for Django users. One of the features I’m most excited about is the Capacity Analysis report and that’s what I’ll be discussing in this post.
The Capacity Analysis report helps you know if your app has enough instances deployed to keep up with request load. In other words, it helps you find out how busy your application is under varying loads, allowing you to tune your configuration for optimal performance. I’m so interested in this report because, as the author of Apache/mod_wsgi, I am always being asked what the best mix of processes and threads is when configuring the WSGI server hosting an application.
Unfortunately, there is no simple answer I can give, as it depends on many factors. Ultimately, the only way to determine the optimal configuration is to monitor your specific web application once it has been deployed to production and use that information to iteratively tune the configuration. The Capacity Analysis report that comes with this version of the Python agent provides you with key metrics you can use to answer this question.
What Does the Capacity Analysis Report Show?
The Capacity Analysis report captures information in three separate charts:
The first chart, App instance busy, shows how busy your application instances are (i.e. what percentage of the time instances are processing requests). You can use this graph to determine if you have the right number of instances for your application. As app utilization approaches 100%, your application needs more instances to handle incoming requests.
To understand how this works, imagine your app constantly receives one request per minute. If your app could only serve one request per minute, your utilization would be 100%. If you halved the time it takes to serve a request, or doubled the number of app servers, your utilization would go down to 50%. If multithreaded server processes are being used, the utilization figure is calculated across all threads which are actively handling requests in that process. A figure of 50% can therefore indicate that the equivalent of five out of ten available threads were being fully utilized during that period of time.
So essentially, the chart looks at how long your requests take to serve and how much capacity you have, and expresses the result as a percentage. When a multiprocess server configuration is being used, the chart shows the average as well as the utilization of the least and most heavily loaded instances during that time.
The next chart to look at is App instance analysis. This chart shows the total number of instances running along with the concurrent instance load. The concurrent instance load is the number of fully busy instances that would be needed to handle the load on your web application. It is computed from the average utilization in the App instance busy chart, multiplied by the number of instances you have running.
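The arithmetic in this example can be sketched in a few lines of Python. This is only an illustration of the idea, not the agent's actual calculation; the function name utilization and its parameters are hypothetical, and it assumes the number of request-handling threads is fixed for the interval.

```python
def utilization(busy_seconds, interval_seconds, threads=1):
    """Fraction of the available thread-time spent handling requests.

    busy_seconds: total time all threads spent handling requests
    interval_seconds: length of the sampling interval
    threads: number of request-handling threads (assumed fixed)
    """
    capacity = interval_seconds * threads
    if capacity <= 0:
        raise ValueError("interval_seconds and threads must be positive")
    return busy_seconds / capacity


# One request per minute, each taking a full minute to serve.
print(utilization(busy_seconds=60, interval_seconds=60))   # 1.0
# Halve the time it takes to serve that request.
print(utilization(busy_seconds=30, interval_seconds=60))   # 0.5
# Ten threads, with the equivalent of five kept fully busy.
print(utilization(busy_seconds=300, interval_seconds=60, threads=10))  # 0.5
```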
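As a rough sketch of that multiplication, assuming utilization is expressed as a fraction between 0 and 1 (the function name and the sample figures here are illustrative, not taken from the agent):

```python
def concurrent_instance_load(average_utilization, instance_count):
    """Number of fully busy instances equivalent to the observed load."""
    return average_utilization * instance_count


# Hypothetical figures: five instances averaging 40% utilization
# carry the same load as two instances that are busy all the time.
load = concurrent_instance_load(average_utilization=0.40, instance_count=5)
print(f"{load:.1f} fully busy instances")  # 2.0 fully busy instances
```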
The final chart, App instance restarts by host, shows the number of instance restarts for each time interval. You can use this to figure out if your application instances are restarting too frequently.
How Can I See the Number of Available Threads?
The Capacity Analysis report isn’t a new feature of New Relic. It’s actually been available as part of the Ruby agent for some time. At this stage, we have simply taken the existing report as implemented for the Ruby agent and updated the Python agent so it can be used with it as well. Because of this, the current report doesn’t break out any more detailed information about the number of threads which may be available and/or used within each process. We are working on adding this in some way, but in the interim it’s still possible to get this information by creating a Custom Dashboard which accesses and charts the underlying metrics which are used in creating the Capacity Analysis report for the Python agent.
The names of the metrics available for the Python agent which relate to capacity analysis and can be charted using a Custom Dashboard are:
When using these metrics in a Custom Dashboard, we can focus in on specific measures related to thread usage within application processes.
In some cases it can help to understand better the inner workings of specific WSGI servers. For example, there’s an interesting aspect of how Apache/mod_wsgi daemon mode works that shows up in these particular charts. In this case, the Apache/mod_wsgi daemon mode configuration was:
WSGIDaemonProcess processes=5 threads=25
Although mod_wsgi will create 25 threads, they are maintained in a LIFO queue. This means that when handling a request, a preference is given to using a thread which has recently handled a request. If the number of threads is over-specified, any additional threads beyond what is required will sit there dormant and never be used. Because they aren’t used, they won’t be counted against threads available by the agent.
That was the situation in this particular example. Threads available averaged fewer than four across all processes. Even the maximum in any one process reached only 12.
This meant that the configuration of ‘threads=25’ was actually a lot more than was needed to satisfy the load. A small amount of memory could have been saved by reducing the number of configured threads.
In the same example, overall capacity used reached up to 40%. Some headroom is required for expansion and spikes in traffic, but the number of instances could also have been reduced.
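The effect of handing requests to the most recently used thread can be seen in a small simulation. This is a simplified model written for illustration only, not mod_wsgi's implementation; the function simulate_pool, its parameters, and the arrival pattern are all assumptions.

```python
from collections import deque


def simulate_pool(configured_threads, arrivals, lifo=True):
    """Return the set of thread ids that ever handle a request.

    arrivals: list of (start_time, duration) tuples, one per request.
    lifo: if True, reuse the most recently idle thread first;
          if False, rotate through idle threads in FIFO order.
    """
    idle = deque(range(configured_threads))
    busy = []  # (finish_time, thread_id) for in-flight requests
    used = set()
    for start, duration in sorted(arrivals):
        # Hand back threads whose requests finished before this arrival.
        still_busy = []
        for finish, thread_id in sorted(busy):
            if finish <= start:
                idle.append(thread_id)
            else:
                still_busy.append((finish, thread_id))
        busy = still_busy
        if not idle:
            raise RuntimeError("all threads busy; request would have to wait")
        thread_id = idle.pop() if lifo else idle.popleft()
        used.add(thread_id)
        busy.append((start + duration, thread_id))
    return used


# Hypothetical load: one request per second, each taking 2.5 seconds,
# so at most three requests are ever in flight at once.
arrivals = [(second, 2.5) for second in range(100)]
print(len(simulate_pool(25, arrivals, lifo=True)))   # 3
print(len(simulate_pool(25, arrivals, lifo=False)))  # 25
```

With LIFO selection only three of the 25 configured threads are ever touched, which is why an over-specified thread count leaves the rest dormant. With FIFO rotation every thread would eventually be used even though the load is identical.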
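One way to turn that observation into a rough instance count is sketched below. It assumes the 40% figure is the peak average utilization across the five processes, and the 70% target used to leave headroom is an arbitrary illustrative value, not a recommendation from the report.

```python
import math


def instances_needed(peak_utilization, instance_count, target_utilization):
    """Estimate how many instances keep utilization at or below a target.

    peak_utilization: highest observed average utilization (0 to 1)
    instance_count: number of instances running when it was observed
    target_utilization: utilization you are willing to run at (0 to 1)
    """
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in the range (0, 1]")
    busy_instances = peak_utilization * instance_count
    return max(1, math.ceil(busy_instances / target_utilization))


# Five processes peaking at 40%, with a hypothetical 70% target.
print(instances_needed(0.40, 5, 0.70))  # 3
```

Any estimate like this should be checked against the charts after the change is deployed, since request patterns rarely stay constant.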
What About Co-Routine Based Systems?
Calculating how busy a process is for single or multithreaded systems is relatively straightforward. For co-routine based systems, such as gevent and eventlet, it gets a bit trickier. This is because there isn’t really an upper limit on the number of co-routines you could have against which to measure a utilization factor. Instead, co-routines are created on demand for each request and destroyed at the end of the request.
As a result we are still tweaking how we can represent instance load for a co-routine based system and the Capacity Analysis report will currently not provide data when such a co-routine based system is used.
How Much Overhead Does this Introduce?
Overhead of any monitoring is something we take very seriously. Because of this, the code that calculates the thread utilization underlying these metrics is currently implemented as a C extension to the Python agent. This allows us to implement the measure with no noticeable overhead. Since it relies on a C extension, the system on which the agent package is installed must have a compiler present. If it does not, the C extension can’t be compiled at the time of installation and the feature will be disabled. We are currently working on a pure Python implementation, but we want to make sure that the feature will have no performance impact.
Try It Out For Yourself
If you haven’t already, you should try New Relic today to see how you can improve the performance of your Python applications.