How to Achieve Better Accuracy in Latency Percentiles in JMeter Dashboard
When doing performance tests, accuracy is all-important. See how two performance engineers used JMeter to increase the accuracy of their tests.
Introduction
There are a number of ways to evaluate the performance of systems using the data collected during a performance test. Latency analysis is one such important analysis technique in which we analyze the behavior of the latency. This analysis can be as simple as calculating the average latency/mean latency/latency percentiles or it can be a rather complex process in which we fit distributions to the data to study the characteristics of the latency distribution.
The "latency percentile" is an important performance metric used to analyze latency. Since it gives the latency value below which a given percentage of requests fall, it can be considered a measure of the quality of service of the application or system being evaluated. For example, if the 99th latency percentile of your system is 5 ms, then 99% of the requests served by the system have latency below 5 ms. For large datasets, there are methods that estimate the latency percentiles rather than compute them exactly. The accuracy of these estimates varies with the underlying algorithm and the parameters used.
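To make the definition concrete, here is a minimal sketch that computes an exact percentile with the nearest-rank method. The function name and the latency values are hypothetical, chosen only for illustration. Nearest-rank is one of several percentile definitions; tools that interpolate between neighbouring values can report slightly different numbers for the same data.

```python
import math


def percentile_nearest_rank(values, p):
    """Return the smallest value such that at least p% of values are <= it."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(values)
    # 1-based rank of the percentile within the sorted data
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical latencies in milliseconds (not taken from the test in this article)
latencies_ms = [2, 3, 3, 4, 4, 4, 5, 5, 40, 120]

p90 = percentile_nearest_rank(latencies_ms, 90)
p99 = percentile_nearest_rank(latencies_ms, 99)
print(p90, p99)
```

Even in this tiny dataset, a couple of slow requests dominate the upper percentiles, which is why they are sensitive to which samples are included in the calculation.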
Apache JMeter™ is a great tool which has been designed to load test the functional behavior of applications and measure their performance. At WSO2, we use JMeter to test the performance of most of our products. JMeter has a great set of features such as the ability to test various protocols/applications/servers, an IDE that allows fast test plan development, dynamic HTML reporting, multithreading and scriptable samplers, and the ability to test using a large number of concurrent users (which is achieved by running multiple instances of JMeter).
When we run performance tests, we can configure JMeter to create text files containing the results of a test. These files are called JTL files. Since the JTL files contain latency values for each request, we can use this information for latency analysis. This can be done using the listeners already available in JMeter (e.g. Aggregate Report) or by loading the JTL file into statistical software (such as R).
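As an illustration of working with a JTL file directly, the sketch below reads one column from a CSV-format JTL. It assumes the file was saved as CSV with a header row; the three-row excerpt is hypothetical, and the choice of the `elapsed` column (rather than `Latency`) is an assumption you should adjust to the metric you want to analyze.

```python
import csv
import io
import statistics

# Hypothetical excerpt of a CSV-format JTL with a header row.
# A real JTL usually has more columns; only a few are shown here.
SAMPLE_JTL = """timeStamp,elapsed,label,responseCode,success,Latency
1500000000000,12,HTTP Request,200,true,11
1500000000010,48,HTTP Request,200,true,47
1500000000020,7,HTTP Request,200,true,6
"""


def read_column(jtl_file, column="elapsed"):
    """Read one numeric column (in milliseconds) from an open CSV JTL file."""
    reader = csv.DictReader(jtl_file)
    return [int(row[column]) for row in reader]


# For a real file, use: open("results.jtl", newline="") instead of StringIO
elapsed_ms = read_column(io.StringIO(SAMPLE_JTL))
mean_ms = statistics.mean(elapsed_ms)
print(elapsed_ms, mean_ms)
```

Once the values are in a list, any exact percentile calculation can be applied to them without going through the dashboard.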
JMeter Dashboard
Recently, we have started using the JMeter Dashboard for obtaining performance results. The JMeter Dashboard can generate graphs and statistics from the JTL. While analyzing the latency percentile values in the JMeter Dashboard, we noticed that, for certain scenarios we tested, there was a significant difference between the actual (exact) latency percentile values and the percentile values calculated in the dashboard. The exact values were calculated using R (a statistical software package). Interestingly enough, the JMeter Aggregate Report produced the same result as R.
For example, see the following result:
| Source | Average Latency | 90th Percentile | 95th Percentile | 99th Percentile | Throughput |
|---|---|---|---|---|---|
| Dashboard | 87.47 | 170 | 996 | 2296.97 | 5706.3 |
| Exact value (throughput from Aggregate Report) | 87.47 | 70 | 321 | 2009 | 5706.3 |
Note the following:
There is no difference in the average latency.
There is no difference in the throughput.
The 90th percentile is significantly higher in the dashboard.
The 95th percentile is significantly higher in the dashboard.
The 99th percentile is higher in the dashboard.
The above result was obtained by loading the JTL file of a 10 min performance test. The total time of the test was 15 min, of which the first 5 min was the warmup period. The total number of requests in the test was 3421980.
Improving Accuracy in the Latency Percentiles
The way to address the above is to increase the default value of the following property: jmeter.reportgenerator.statistic_window. Note that this property only affects the latency percentile values (because it is only used in the PercentileAggregator class, the component implemented for latency percentile calculation in JMeter).
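As a sketch of how the property can be overridden, one option is to add it to the user.properties file in JMeter's bin directory before generating the dashboard. The value shown is simply the sample count of this particular test, not a general recommendation.

```properties
# user.properties
# Window size used for the dashboard percentile calculation (default: 20000)
jmeter.reportgenerator.statistic_window=3421980
```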
The following table shows the impact of statistic_window on the results. Note that the number of samples = 3421980
| Configuration | Average Latency | 90th Percentile | 95th Percentile | 99th Percentile | Throughput |
|---|---|---|---|---|---|
| Dashboard: statistic_window=20k (default) | 87.47 | 170 | 996 | 2296.97 | 5706.3 |
| Dashboard: statistic_window=200k | 87.47 | 81 | 394 | 2057 | 5706.3 |
| Dashboard: statistic_window=500k | 87.47 | 72 | 355.95 | 2013 | 5706.3 |
| Dashboard: statistic_window=1000k | 87.47 | 70.9 | 336 | 1993 | 5706.3 |
| Dashboard: statistic_window=10000k | 87.47 | 70 | 321 | 2009 | 5706.3 |
| Dashboard: statistic_window=3000k | 87.47 | 71 | 324 | 2017 | 5706.3 |
| Dashboard: statistic_window=3421980 | 87.47 | 70 | 321 | 2009 | 5706.3 |
| Exact value = R result = Aggregate Report result | 87.47 | 70 | 321 | 2009 | 5706.3 |
statistic_window = sample count
When statistic_window equals the total number of samples, we get 100% accuracy (i.e. the exact value) in the dashboard results.
jmeter.reportgenerator.statistic_window < sample count
When jmeter.reportgenerator.statistic_window < sample count, only the last statistic_window samples in the JTL file are used for calculating the latency percentiles, which is why we do not get the exact result. The following diagram shows the samples used when statistic_window=20000 (i.e. the default).
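The effect described above can be reproduced with a simplified model. This is not JMeter's actual implementation; it is a sketch that assumes the percentile is computed over only the most recent `window` samples, using hypothetical data and a nearest-rank percentile.

```python
import math
from collections import deque


def percentile_nearest_rank(values, p):
    """Exact nearest-rank percentile of the given values."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


def windowed_percentile(samples, p, window):
    """Percentile over only the last `window` samples (simplified model)."""
    # A bounded deque silently discards the oldest samples once it is full
    recent = deque(samples, maxlen=window)
    return percentile_nearest_rank(recent, p)


# Hypothetical run: 90 fast requests followed by 10 slow ones
samples_ms = [10] * 90 + [500] * 10

exact_p90 = percentile_nearest_rank(samples_ms, 90)
windowed_p90 = windowed_percentile(samples_ms, 90, window=20)
full_window_p90 = windowed_percentile(samples_ms, 90, window=len(samples_ms))
print(exact_p90, windowed_p90, full_window_p90)
```

With a window of 20, half of the retained samples are slow ones, so the 90th percentile is far above the exact value. With a window equal to the sample count, the two agree.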
jmeter.reportgenerator.statistic_window = -1
Setting the property to -1 (i.e. an infinite window) also gives 100% accuracy (i.e. the exact result) in the latency percentiles.
Further Analysis
As pointed out above, the JTL file which we analyzed consisted of 3421980 samples. We now create multiple JTLs from the original JTL (with 3421980 samples). Each of these JTLs consists of 20000 samples. For each JTL, we compute the percentile values using the Dashboard report (using the default window size). In this case, the Dashboard report should produce an exact result (because sample count = default window size = 20000).
(Figure: JTL 1, JTL 2, and JTL 3, each a 20000-sample subset of the original JTL.)
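The splitting step can be sketched as follows, assuming the smaller JTLs are consecutive slices of the original samples. The data is hypothetical, and a chunk size of 4 stands in for 20000 to keep the example short.

```python
import math


def percentile_nearest_rank(values, p):
    """Exact nearest-rank percentile of the given values."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


def chunk(samples, size):
    """Split samples into consecutive chunks; the last chunk may be shorter."""
    return [samples[i:i + size] for i in range(0, len(samples), size)]


# Hypothetical latencies (ms) that degrade over time
samples_ms = [5, 6, 7, 8, 9, 10, 11, 50, 60, 70, 80, 900]

per_chunk_p90 = [percentile_nearest_rank(c, 90) for c in chunk(samples_ms, 4)]
overall_p90 = percentile_nearest_rank(samples_ms, 90)
print(per_chunk_p90, overall_p90)
```

When latency drifts during a run, each chunk yields a different percentile, and none of them needs to match the value computed over the full dataset.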
The objective is to investigate the deviation in the latency percentiles when you use different subsets of data from the original dataset. The results are shown below:
| Dataset | Average Latency | 90th Percentile | 95th Percentile | 99th Percentile | Throughput |
|---|---|---|---|---|---|
| Sample 1 (exact value) | 87.8 | 72 | 285 | 1966.95 | 3047.85 |
| Sample 2 (exact value) | 113.5 | 130 | 614.95 | 2007.97 | 2979.74 |
| Sample 3 (exact value) | 134.39 | 170 | 996 | 2296.97 | 2598.75 |
| Exact value obtained using R/JMeter Aggregate Report (using all data) | 87.47 | 70 | 321 | 2009 | 5706.3 |
We note that there is a significant variation in the percentile values among the different samples. In fact, for this particular test, the values become worse as time progresses. The possible reasons for this behavior are:
20000 (default window size) samples are not sufficient to capture the behavior of the full latency distribution.
The system has not arrived at a steady state. This means that we need to increase the warmup period.
Other reasons we have not covered.
JMeter Memory
It is worth pointing out that when you increase the statistic_window you may need to increase the memory you allocate for JMeter. In this particular test, we allocated 2GB of heap memory for JMeter.
Conclusion
In this article, we have discussed the use of latency percentiles as a metric for measuring performance and how to increase the accuracy of the results that appear in the JMeter Dashboard. We noted that there is a way to get the exact result (i.e. 100% accuracy): set jmeter.reportgenerator.statistic_window = -1, i.e. an infinite window. However, when you set this property to -1, you may need to increase the amount of memory you allocate for JMeter, in particular if you have a large number of samples. If there is not enough memory for that, you can instead raise the property above its default to a value that fits in memory, which still improves the accuracy of the results. In the article, we investigated the impact of the window size on the accuracy.