JMeter has been used extensively for performance testing, thanks to its numerous capabilities: it can test a wide range of protocols, applications, and servers, and it offers fast test plan development, dynamic HTML reporting, multithreading, and scriptable samplers.
WRK is another performance testing tool that runs on a single multi-core CPU. WRK supports response processing, custom reporting, and HTTP request message generation via Lua scripts.
Recently, while running performance tests on Ballerina (a general-purpose, concurrent, and strongly typed programming language with both textual and graphical syntaxes, optimized for microservices use cases), we observed different performance results under these two tools for certain scenarios we tested.
This was particularly the case when running performance tests with a higher number of concurrent users (e.g. 2000), which led us to investigate the behavior of the two tools across a range of scenarios: different concurrency levels, message sizes, and server response times.
For the sake of simplicity, we evaluated the two tools against a Netty server (instead of Ballerina).
In this blog, I will present some of these results and observations.
It is worth mentioning that the underlying concurrency models of the two tools are different. In WRK, a “connection” represents a “user”, and multiple connections are handled by a single thread. WRK distributes the total number of connections evenly among the threads, and the connections are reused over the course of the performance test. Although we can control the ratio between the number of threads and the number of connections, the author of WRK recommends setting the number of threads equal to the number of physical CPU cores. The following figure illustrates the concurrency model implemented in WRK. You may refer to this article for more details (image source).
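As a concrete sketch of this model, the WRK invocation below simulates 2000 users with 4 threads, so each thread multiplexes 500 connections. The host, port, and path are hypothetical placeholders, not the actual test endpoint:

```shell
# 4 threads (= physical CPU cores), 2000 connections, 2-minute run,
# with detailed latency statistics enabled. WRK splits the connections
# evenly, so each thread multiplexes 2000 / 4 = 500 connections.
wrk -t4 -c2000 -d120s --latency http://netty-host:8688/echo
```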
JMeter, in contrast, creates a thread and a connection for each user, and we cannot control the ratio between the number of connections and the number of threads. This means that when running performance tests with a large number of users, JMeter creates a large number of threads, and the resulting context switching can have an impact on performance. We can, however, minimize this effect by using multiple JMeter instances when running performance tests with a large number of users.
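For reference, a distributed JMeter run along these lines could be launched as shown below. The remote host IPs, test plan name, and output paths are hypothetical:

```shell
# Non-GUI mode (-n): the client drives two remote JMeter server instances (-R),
# collects the samples into a .jtl file (-l), and generates the HTML
# dashboard report at the end of the run (-e -o). Each simulated user on
# a server instance gets its own thread and connection.
jmeter -n -t echo_test.jmx -R 10.0.0.11,10.0.0.12 -l results.jtl -e -o report/
```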
Performance Testing Environment
As discussed, the objective of this article is to investigate the differences in performance test results obtained under two performance testing tools, namely WRK and JMeter. Our performance testing scenario is implemented using a simple Netty server which echoes the HTTP message it receives back to the client (i.e. JMeter/WRK). For WRK, the performance test was run on two EC2 instances with 4 vCPUs each: 1 WRK instance and 1 Netty server instance. In the case of JMeter, I used a clustered setup, and the performance test was run on four EC2 instances with 4 vCPUs each: 2 JMeter server instances (with clustering), 1 JMeter client instance, and 1 Netty server instance.
The performance evaluation was done by varying the number of concurrent users (500 to 4000), the message size (0.1 KB, 10 KB, 50 KB), and the Netty server's response time (10 ms, 1000 ms). Here the server response time refers to the time the server waits before sending the response back to the client; this delay is implemented via a sleep. As per the author's recommendation, I set the number of threads in WRK to 4 (equal to the number of CPU cores); increasing the number of threads beyond 4 did not make a significant difference in the results.
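Putting the test matrix together, a sweep over the WRK runs for a given server delay might look like the sketch below. The host, port, path, and payload scripts are hypothetical; each `payload_<size>.lua` would be a WRK Lua script (passed via `-s`) that sets the HTTP request body to the corresponding message size:

```shell
# Hypothetical sweep: concurrency x payload size, for one server delay setting.
# payload_<size>.lua is an assumed wrk Lua script that builds a request
# body of the corresponding size.
for users in 500 1000 2000 4000; do
  for size in 0.1k 10k 50k; do
    wrk -t4 -c"${users}" -d120s -s "payload_${size}.lua" \
        http://netty-host:8688/echo
  done
done
```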
Server With Fast Response Time (10 ms)
Let’s now consider the behavior of a server with fast response time. The following figures illustrate performance results obtained under the two tools when the response time of the server is 10 ms.
We note that for small message sizes (0.1 KB), JMeter produced better TPS results compared to WRK, while the average latency obtained under WRK was slightly better than under JMeter. As the message size increases, WRK tends to produce better throughput results; however, the latency results obtained under WRK are higher compared to JMeter, particularly for large message sizes. For example, with 2000 concurrent users, the average latency under WRK was 21 seconds while the average latency under JMeter was 1 second. For the same scenario, the TPS under WRK was 1835 requests/second while the TPS under JMeter was 1437 requests/second.
Server With Slow Response Time (1000 ms)
Let’s now consider the behavior when the response time of the server is 1000 ms. In this case, we note that both tools produce very similar results, except when testing large concurrencies with large message sizes. There, JMeter produced much better latency results, while there was no difference in the TPS under the two tools. The following figure illustrates this behavior:
Conclusion
In this article, I investigated the performance results obtained under two different performance testing tools: JMeter and WRK. Under lower concurrencies (< 500 users), both tools produced similar results; as the number of concurrent users increases, significant deviations appear in the results. We considered servers with both slow and fast response times. One main observation was that the latency results under JMeter are much better (up to 20x) compared to WRK for a range of scenarios. In the case of TPS, there was no difference in the results for the slow-response-time server. For the fast-response-time server, WRK produced better TPS results (up to 1.25x) for larger message sizes (> 1 KB), while JMeter produced better TPS results (up to 1.12x) for smaller message sizes (< 1 KB).