Improving the Performance of a Real-Time Streaming Solution by Auto-Tuning the JVM
Performance tuning allows us to improve the performance of applications, but doing it manually is not always practical because the parameter space is very large. Autotuning finds the best parameters automatically so as to optimize a given performance criterion.
OpenTuner is a tuning framework that allows you to automatically find optimal configuration and tuning parameters for a given application or program. It supports complex and user-defined data types and uses numerous search techniques to obtain the combination of parameters that results in the best performance according to a chosen criterion (e.g., total run time, average latency, throughput). OpenTuner has been used to tune seven distinct applications, and the results show that it is possible to obtain up to a 2.8x improvement in performance. For more information about the OpenTuner framework, refer to this article.
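The core idea behind such a tuner can be illustrated with a minimal random-search sketch. This is plain Python, not the actual OpenTuner API; the parameter names, candidate values, and the synthetic cost function are all hypothetical, standing in for real JVM flags and real timing runs:

```python
import random

# Hypothetical parameter space: each entry maps a flag name to candidate values.
PARAM_SPACE = {
    "TieredCompilation": [True, False],
    "InlineSmallCode": [1000, 2000, 4000],
    "ReservedCodeCacheSize_mb": [48, 128, 240],
}

def measure_runtime(config):
    # In a real tuner this would launch the JVM with `config` and time the run.
    # Here a synthetic cost function keeps the sketch self-contained.
    cost = config["InlineSmallCode"] / 1000.0
    cost += 0.5 if config["TieredCompilation"] else 1.0
    cost += 48.0 / config["ReservedCodeCacheSize_mb"]
    return cost

def random_search(trials=100, seed=0):
    # Sample random configurations and keep the fastest one seen so far.
    rng = random.Random(seed)
    best_cfg, best_time = None, float("inf")
    for _ in range(trials):
        cfg = {k: rng.choice(v) for k, v in PARAM_SPACE.items()}
        t = measure_runtime(cfg)
        if t < best_time:
            best_cfg, best_time = cfg, t
    return best_cfg, best_time

best_cfg, best_time = random_search()
print(best_cfg, best_time)
```

OpenTuner itself goes well beyond random search, using an ensemble of techniques (e.g., hill climbing and evolutionary search) and picking whichever performs best on the problem at hand.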
JATT, a HotSpot autotuner, has been designed specifically to autotune the Java Virtual Machine (JVM). JATT organizes the JVM flags into groups and finds the combination of flags that results in the best performance. The rationale for introducing different flag groups is to reduce the search space and avoid invalid flag combinations.
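The grouping idea can be sketched as follows. The flag names below are real HotSpot flags, but the group contents and candidate values are illustrative assumptions, not JATT's actual groups:

```python
from itertools import product

# Illustrative flag groups; tuning one group at a time keeps the search space
# small and avoids mixing flags that interact in invalid ways.
FLAG_GROUPS = {
    "code_cache": {
        "-XX:ReservedCodeCacheSize": ["128m", "240m"],
        "-XX:InitialCodeCacheSize": ["8m", "16m"],
    },
    "compilation": {
        "-XX:CompileThreshold": ["1000", "10000"],
        "-XX:+TieredCompilation": [True, False],
    },
}

def candidates(group):
    """Enumerate every flag combination within a single group."""
    spec = FLAG_GROUPS[group]
    keys = list(spec)
    for values in product(*(spec[k] for k in keys)):
        args = []
        for key, val in zip(keys, values):
            if isinstance(val, bool):
                # Boolean flags toggle between -XX:+Flag and -XX:-Flag.
                name = key.replace("-XX:+", "")
                args.append(f"-XX:{'+' if val else '-'}{name}")
            else:
                args.append(f"{key}={val}")
        yield args

print(list(candidates("code_cache")))
```

Tuning each group separately turns one huge search over all flags into several small, well-formed searches.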
In this article, we will share the experience we gained while tuning an event-based solution using JATT. This event-based solution has been specifically designed to solve the DEBS 2016 grand challenge. The solution analyzes a dynamic (evolving) social-network graph. There are two main objectives:
Identification of the posts that currently trigger the most activity in the social network.
Identification of large communities that are currently involved in a topic.
We ran the experiment under five different flag groups, namely code cache, compilation, compiler, memory, and compiler-compilation-memory. For more information about different flag groups, refer to this article.
JATT Configuration Details
In this section, we will provide some configuration details of JATT. The source code for the JATT can be found here. Note that this is not the original JATT repository. In this repository, we have made a few minor improvements to the original JATT.
To run the autotuner:

python src/javaProgramTuner.py source=program-to-tune iterations=2 flags=bytecode configfile=results
|Parameter|Description|
|---|---|
|javaProgramTuner.py|Python script used for tuning the Java program.|
|source|Java program to be autotuned.|
|iterations|Number of iterations to run the program to obtain the average runtime.|
|flags|Flag group: bytecode, code cache, compilation, compiler, deoptimization, GC, interpreter, memory, priorities, temporary, or user-defined flag combinations (separated by commas).|
|configfile|File into which the tuned results are written.|
The parameter values and ranges used by the auto-tuner can be found here.
The performance tests were done on two different machines (see below for details). In all the experiments, we maintained the maximum and minimum heap memory sizes at 4GB.
| |Machine A|Machine B|
|---|---|---|
|Processor|Intel Core i7-3520M CPU @ 2.90 GHz x 4|Intel Core i5 CPU M 560 @ 2.67 GHz x 4|
|RAM|8 GB|8 GB|
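Pinning the heap while the other flags vary can be sketched like this (the main class name is a placeholder, and the tuned flags shown are just examples):

```python
# Heap sizes held constant across all experiments; the auto-tuner never changes these.
HEAP_ARGS = ["-Xms4g", "-Xmx4g"]

def java_command(tuned_flags, main_class="EventProcessor"):
    """Assemble the java invocation: fixed heap first, then the tuned flags."""
    return ["java", *HEAP_ARGS, *tuned_flags, main_class]

cmd = java_command(["-XX:+TieredCompilation", "-XX:CompileThreshold=5000"])
print(" ".join(cmd))
```

Fixing the heap this way ensures that any measured improvement comes from the tuned flags rather than from extra memory.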
We configured JATT to find JVM parameters that minimize the total run time of the program. For each flag group, we obtained the optimal parameter values, then ran the program 50 times with those values and computed the average run time.
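The measurement step can be sketched as follows, with a stub standing in for the real JVM launch (a real harness would use `subprocess` and `time.perf_counter()` around each run):

```python
from statistics import mean

RUNS = 50  # number of repetitions used to average out run-to-run noise

def run_once(flags):
    # Stand-in for launching the JVM with `flags` and timing the run;
    # the constant below is synthetic, purely so the sketch is runnable.
    return 12.0  # seconds

def average_runtime(flags, runs=RUNS):
    """Average the runtime over several runs of the same configuration."""
    return mean(run_once(flags) for _ in range(runs))

print(average_runtime(["-XX:+TieredCompilation"]))
```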
The following figures show the behavior of the runtime of the program under different flag groups.
The following figure shows the factor of improvement under different flag groups.
The factor of improvement is given by:
Factor of improvement = Default runtime/Optimized runtime
The default runtime is the runtime of the program with the default JVM arguments. The optimized runtime is the runtime of the program with the optimal parameter values. As pointed out earlier, we kept the minimum and maximum heap sizes at 4 GB when measuring both the default and optimized run times; these values were not modified by the auto-tuner.
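As a worked example of the formula (the runtimes below are hypothetical, chosen only to illustrate the calculation):

```python
def factor_of_improvement(default_runtime, optimized_runtime):
    """Ratio of default to optimized runtime; values above 1.0 mean a speedup."""
    return default_runtime / optimized_runtime

# Hypothetical runtimes in seconds:
print(factor_of_improvement(54.0, 50.0))  # 1.08
```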
First, note that there is some improvement under all types of flag groups on both machines, although the maximum factor of improvement we obtained is less than 1.1. The factors of improvement obtained on the two machines differ and depend on the flag group. For example, the highest factor of improvement on machine A is 1.05 (obtained under the compiler-compilation-gc flag group), while the highest factor of improvement on machine B is 1.08 (obtained under the compiler flag group).
One observation is that the optimal flag combination for a given flag group on one machine may not result in a similar performance improvement on the other machine. In fact, we have seen performance degrade when using one machine's optimal flag combination on the other: when we applied machine B's optimal code-cache parameter set on machine A, we noticed a 30% degradation in performance.
We also noted that the optimal parameter values are workload dependent. The parameters that yield the best performance under one type of workload and traffic may not result in similar performance improvements under a different type of workload, even on the same hardware.
Autotuning the JVM can result in some performance improvement. In this particular problem, we managed to get a maximum factor of improvement of 1.08. We noted that the optimal JVM parameter and flag values depend on both the hardware and workload characteristics: the optimal parameter set for one machine may not result in a similar performance improvement on another machine with different hardware.