How to Load Test & Tune Performance on Your API (Part II)

There's nothing worse than hearing "it's slow." Load testing is integral to improving performance. Learn how to load test and tune performance on your API.

These posts are based on Mark’s presentation at APIStrat/APIDays Berlin. The video is now available on YouTube.

[This article was written by Victor Delgado]

Here is the second part of our how-to on running a load test on your API. In the first part, we walked through setting up a load testing environment, deciding which metrics to measure, and the different approaches to measuring them. We also provided some guidance on which tools to use and obtained real data points about how our API was performing.

We will now look at ways of securely exposing your API to the public while making sure that its performance is not affected.

How adding an access control layer affects your API

At this point we have a reasonably high-performance API, but what would happen if someone started sending traffic at a rate beyond 16,000 requests/second? How can you prevent any single API user from blasting your API with requests and affecting all the other developers consuming it?

There are multiple ways to approach this problem, the most obvious of which would be to build a rate limiting system into your own API server. We believe in separating concerns as much as possible, which is why we think it is better to have a separate layer in your stack play that role.

Here is where an API gateway comes in handy. API gateways are a common architectural solution to the problem of managing access to APIs. Besides the access control role, API gateways are also typically used as an orchestration layer to manipulate and expose different interfaces for combinations of internal APIs.

In this case we are going to use our own API gateway which, together with the 3scale platform, enables API providers to open their API to the world in a safe way. It is an easy way to start requiring authentication and setting rate limits on each of your API endpoints.

There are several ways you can integrate your API with 3scale, but we are going to choose the on-premise API gateway based on Nginx, a high-performance proxy. We will deploy the 3scale API gateway in front of our API server. The reasoning behind using the proxy is not only performance: since it is deployed in front of our API, traffic that is either unauthenticated or above its allowed limits will be rejected without ever reaching our API server. This makes it much easier to reason about our API server, since we can design it for a known amount of traffic. Our API gateway will deal with everything else. Of course, we will always be able to tweak the limits from the 3scale admin dashboard, to increase them when we grow our API server capacity or to add new limits if the number of users grows unexpectedly.
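
To make the idea of rejecting traffic at the edge more concrete, here is a minimal Python sketch of a per-key, fixed-window rate limit check. It is an illustration only and not how the 3scale gateway is implemented: the FixedWindowLimiter class, the limit of 3 requests per 60-second window, and the status codes chosen are all assumptions made for this example.

```python
import time
from collections import defaultdict


class FixedWindowLimiter:
    """Illustrative per-key rate limiter (hypothetical, not 3scale's code)."""

    def __init__(self, limit, window_seconds, known_keys):
        self.limit = limit                    # max requests per key per window
        self.window_seconds = window_seconds  # window length in seconds
        self.known_keys = set(known_keys)     # keys allowed to call the API
        self.counts = defaultdict(int)        # (key, window index) -> hits

    def check(self, user_key, now=None):
        """Return the HTTP status a gateway could answer with."""
        if user_key not in self.known_keys:
            return 403  # unknown key: the request never reaches the API server
        now = time.time() if now is None else now
        window = int(now // self.window_seconds)
        self.counts[(user_key, window)] += 1
        if self.counts[(user_key, window)] > self.limit:
            return 429  # over the limit: rejected at the gateway
        return 200      # within limits: forward to the API server


limiter = FixedWindowLimiter(limit=3, window_seconds=60, known_keys={"ABCD123"})
statuses = [limiter.check("ABCD123", now=0) for _ in range(4)]
print(statuses)
print(limiter.check("unknown-key", now=0))
```

The sketch never discards old windows, so a real implementation would also need to expire counters. The point is only that the decision happens before the API server is involved.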

[Figure: API gateway diagram]

As engineers we are cautious when adding extra layers to our stack. So our goal here is to make sure that this extra layer will not impact the great performance of our API that we are so proud of.

We will deploy the gateway on the same type of AWS instance we are using for our API server, a c4.large with 2 virtual cores. Setting up the gateway is almost a one-click process, since we can take advantage of our AMI available in the AWS Marketplace.

After configuring our API in 3scale, we get our Nginx configuration files tailored for us, which we use to start the API gateway. The first step will be running the same exact tests we did before with the default configurations, as we get them from the 3scale dashboard. The only modification is that our test requests now include a parameter with the API key:

GET http://our-api-gateway.com/question?user_key=ABCD123

Running the same test as before at 10,000 requests/second, we hit a large number of errors, and Loader.io ends up halting the test.

[Screenshot: Loader.io results for the halted test]

These all appear to be 500 status code responses or other unspecified network errors. After verifying that the API server has returned no errors, we narrow down the possible sources to Nginx, where the logs show plenty of these:

2015/04/12 23:07:10 [alert] 2573#0: *147508 256 worker_connections are not enough while connecting to upstream, client: ..., server: , request: "GET /question HTTP/1.1", upstream: "http://.../question", host: "..."

Nginx needs to open at least two connections for every request, one to the client and another one to the proxied server. It also needs to open connections to 3scale, although not for every request (we are using the Extended Capacity version of the 3scale API gateway which batches the authorization and reporting calls to 3scale).

Taking that into consideration, at a rate of 10000 rps, the number of simultaneous open connections Nginx will need to keep will be much higher than 256.
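
A rough way to see why 256 is too low is Little's law: the number of requests in flight is approximately the arrival rate multiplied by the time each request takes. The Python sketch below applies it. The 10,000 requests/second and the two connections per request come from the text above, while the 50 ms average latency is an assumed value chosen only for illustration, and the estimate ignores connection reuse and the calls to 3scale.

```python
def estimated_connections(requests_per_second, avg_latency_ms,
                          connections_per_request=2):
    """Estimate simultaneous connections via Little's law (L = rate * time)."""
    in_flight = requests_per_second * avg_latency_ms / 1000
    return in_flight * connections_per_request


# 10,000 rps comes from the test above; 50 ms latency is an assumed value.
needed = estimated_connections(10_000, 50)
print(f"~{needed:.0f} simultaneous connections vs. a limit of 256")
```

Keep in mind that worker_connections is a per-worker limit, so the total capacity also depends on how many worker processes Nginx runs.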

Fine tuning your API proxy for maximum performance

The number of simultaneous open connections is configured through the worker_connections directive. That setting is also highly dependent on the underlying system: Nginx will never be able to open more connections than the number of sockets the system can offer. If we want to raise that number, there are a couple of settings to modify.

In /etc/security/limits.conf:
# increase the number of file descriptors available for
# any process (you will need to reboot to apply this)

# add these to the bottom of the file
* soft nofile 200000
* hard nofile 200000

In /etc/sysctl.conf:
# increase the system-wide limit on the number of open files for all processes
# apply the change running:  sudo sysctl -p /etc/sysctl.conf
fs.file-max = 5000000

After this, you can tell Nginx to use more connections by setting the following in the nginx.conf file:

worker_rlimit_nofile 100000;
events {
    worker_connections  100000;
}
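
Before restarting Nginx, it can be worth confirming that the system-level limits were actually applied. The following Python sketch is one possible check: it reads the per-process open-file limits of the current process and, on Linux, the system-wide fs.file-max value. The helper name open_file_limits is made up for this example, and the check reflects the user and shell that run it, which is not necessarily the environment Nginx runs in.

```python
import resource
from pathlib import Path


def open_file_limits():
    """Return (soft, hard) per-process limits and the system-wide maximum."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    file_max_path = Path("/proc/sys/fs/file-max")  # only present on Linux
    file_max = int(file_max_path.read_text()) if file_max_path.exists() else None
    return soft, hard, file_max


soft, hard, file_max = open_file_limits()
print(f"soft nofile: {soft}, hard nofile: {hard}, fs.file-max: {file_max}")
```

For an Nginx worker that is already running, /proc/<pid>/limits on Linux shows the limits that are actually in effect for that process.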

There are other optimizations that can be done both at the Nginx and system level. You can read about some of them in the Nginx documentation.

The settings that brought the greatest benefits in our case were those related to optimizing the utilization of the network resources:

  • Enabling keepalive to the upstream. This requires setting the HTTP protocol to the 1.1 version, which is ok since it is the version used by Node.js by default.
  • Reducing the tcp_fin_timeout to minimize connections in TIME_WAIT state so that we don’t have many stale connections. Also enabling tcp_tw_reuse with the same goal.
  • Setting the worker_processes to “auto” so that Nginx will always start one worker process for every available CPU core.

There are many more settings, especially at the system level, that are generally considered to improve the performance of web servers. You can check a more complete list here. Make sure you know your environment characteristics before applying them.

There are other settings that might be misleading since they can have a much greater impact in a load testing environment than with production traffic. For instance, increasing the value of the keepalive_requests directive in Nginx allows more requests to be sent through one keep-alive connection. Since in our benchmark all the load is being sent from a single source, this has a bigger benefit than a scenario in which the load might come from several origins.
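
A small back-of-the-envelope model in Python illustrates why. It assumes, as a simplification, that requests are split evenly across clients, that each client uses one connection at a time, and that a connection is closed after keepalive_requests requests. The values 100 and 10,000 for the limit and the number of clients are example figures, not measurements from our tests.

```python
import math


def connections_opened(total_requests, clients, keepalive_requests):
    """Connections opened if each client reuses one connection at a time.

    Simplified model (an assumption): requests are split evenly across
    clients, and a connection is closed after `keepalive_requests` requests.
    """
    per_client = math.ceil(total_requests / clients)
    return clients * math.ceil(per_client / keepalive_requests)


total = 600_000  # 10,000 requests/sec for 60 seconds
for clients in (1, 10_000):
    for limit in (100, 10_000):
        opened = connections_opened(total, clients, limit)
        print(f"clients={clients:>6} keepalive_requests={limit:>6} "
              f"-> {opened} connections")
```

In this model, raising the limit cuts the connections opened by a single client by a factor of 100, while 10,000 clients sending 60 requests each open the same number of connections either way.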

We want to see if the optimizations took effect, so we run our 10,000 requests/sec Loader.io benchmark for 60 seconds. Here is the output from a couple of these test runs:

[Screenshots: Loader.io results from two test runs]

The results are fine now, and we are reaching our expected rate with effectively 0 errors. There is a slight difference, which is that there are some spikes where the latency increases. These spikes happen when Nginx needs to report to 3scale, which impacts a few of the incoming client requests that are being processed just at those times. The batching of the reporting calls in the API gateway is done per client, which is why the impact here is very visible since we are producing a large amount of traffic from a single client. Also, during these tests the batching was being done rather frequently. If we set the period to be longer than the duration of our test, we end up with a result that is almost identical to when we were hitting the API directly.
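
The effect of the reporting period can be illustrated with a hypothetical model of per-client batching. The Python sketch below is not 3scale's implementation: the BatchedReporter class, the simulated rate of 100 requests/second, and the 5-second and 120-second periods are assumptions chosen to show how the number of remote reporting calls changes with the period.

```python
class BatchedReporter:
    """Hypothetical per-client batching of usage reports (not 3scale's code)."""

    def __init__(self, period_seconds, send):
        self.period_seconds = period_seconds
        self.send = send      # callable(user_key, hits), e.g. a remote call
        self.pending = {}     # user_key -> hits since the last flush
        self.last_flush = {}  # user_key -> time of the last flush

    def record(self, user_key, now):
        """Count one request; flush the client's batch once the period elapsed."""
        self.pending[user_key] = self.pending.get(user_key, 0) + 1
        started = self.last_flush.setdefault(user_key, now)
        if now - started >= self.period_seconds:
            self.send(user_key, self.pending.pop(user_key))
            self.last_flush[user_key] = now


def count_reports(period_seconds, test_seconds=60, rps=100):
    """Count remote reporting calls made during a simulated single-client test."""
    reports = []
    reporter = BatchedReporter(period_seconds,
                               lambda key, hits: reports.append(hits))
    for i in range(test_seconds * rps):
        reporter.record("ABCD123", now=i / rps)
    return len(reports)


print(count_reports(period_seconds=5))    # several reports during the test
print(count_reports(period_seconds=120))  # period outlasts the 60-second test
```

With a single client generating all the traffic, every flush lands on that client's request path, which is consistent with the latency spikes described above.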

[Screenshot: Loader.io results with a longer reporting period]

Finally, we repeat the same test using wrk as we did initially, and we confirm that the performance is in line with what we saw when going directly to our API, with one big, positive difference: we now have a layer in front of our API that gives us control over it, adding rate limiting and authentication and letting us see analytics of our usage. And all of this without modifying a single line of our API server code.

Running 1m test @ http://api-gateway/question
  4 threads and 1000 connections
  Thread Stats   Avg       Stdev     Max      +/- Stdev
    Latency      85.27ms   90.29ms   1.27s    92.76%
    Req/Sec      3.98k     588.91    4.27k    64.50%
  Latency Distribution
     50%   85.75ms
     75%   96.61ms
     90%  127.49ms
     99%  230.90ms
  955320 requests in 1.00m, 194.17MB read
Requests/sec:  15942.54
Transfer/sec:      3.23MB

In these two posts, we have covered the end-to-end process of running a useful load test on your API. We started by providing advice on the necessary preparation and on setting up the environment. We reviewed a few of the many load testing tools that can be found on the market and in the open-source community. We discovered the peak performance our API could achieve and set the goal of adding an access control layer without affecting that performance.

Our main takeaways from this experience:

  • Always set your objectives before running a test. It is best if those goals are based on data about your current traffic. That will mark the performance goal you have to achieve.
  • Invest time in preparing and measuring a good and stable baseline environment. Your test results are worthless if they can’t be compared with a known and solid starting point.
  • Avoid letting the testing tool become the limiting factor. Research the characteristics of each tool (e.g. single-threaded vs. multi-threaded) so that you know its limits.
  • Look across all layers of the stack when fine-tuning the performance of your components. Increasing a limit on the API server will do nothing if the restricting factor is at the system level.
  • Having an API management layer will bring you peace of mind, since your API will be safely protected behind it and will only receive a predictable amount of traffic. The API gateway will deal with everything else.

We hope that this how-to will prove useful for companies that have jumped aboard the API wagon and are wondering how their API performs under load.

Published at DZone with permission of Steven Willmott, DZone MVB. See the original article here.
