On a major B2B application, we studied the behavior of different GC algorithms. The application is a web service provider that serves SOAP and REST requests from its clients; it has no web browser interactions. It runs on an 8-core CPU under Red Hat Linux 6.9, using Java 7, Tomcat 7 and other popular Java frameworks.
This study was conducted over a 3-hour period in the production environment during off-peak hours. The application runs on multiple JVM instances across multiple servers. We configured 4 JVM instances with the settings below. The remaining JVM instances kept running with their old settings (which I can't disclose and which are not of interest to this article). Traffic was evenly distributed across all JVM instances by the load balancer's round-robin algorithm.
G1 GC: -Xms6144m -Xmx6144m -XX:MaxPermSize=512m -XX:PermSize=300m -XX:+UseG1GC -XX:MaxGCPauseMillis=500
CMS GC: -Xms6144m -Xmx6144m -XX:MaxPermSize=512m -XX:PermSize=300m -XX:NewRatio=1 -XX:+UseConcMarkSweepGC
Parallel GC: -Xms6144m -Xmx6144m -XX:MaxPermSize=512m -XX:PermSize=300m -XX:NewRatio=1 -XX:-UseParallelOldGC
Serial GC: -Xms6144m -Xmx6144m -XX:MaxPermSize=512m -XX:PermSize=300m -XX:NewRatio=1 -XX:-UseSerialGC
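Note that in the JVM settings, heap size (-Xms, -Xmx) and Perm size (-XX:PermSize, -XX:MaxPermSize) are identical across all four instances. -XX:NewRatio=1 is set for every instance except G1, which sets -XX:MaxGCPauseMillis instead. Apart from that, only the GC algorithm varies.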
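When several instances are started with different GC flags, it can help to confirm which collector each JVM actually selected. The following is a minimal sketch, not part of the original study: the class name `ActiveCollectors` is hypothetical, and it relies only on the standard `java.lang.management` API to list the collectors the running JVM reports.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the original study.
final class ActiveCollectors {

    // Returns the names of the garbage collectors the running JVM exposes via JMX.
    static List<String> names() {
        List<String> names = new ArrayList<String>();
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(bean.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // On HotSpot the names typically identify the collector pair in use, e.g.
        // "G1 Young Generation"/"G1 Old Generation", "ParNew"/"ConcurrentMarkSweep",
        // "PS Scavenge"/"PS MarkSweep", or "Copy"/"MarkSweepCompact".
        System.out.println(names());
    }
}
```

Running this inside each instance (or reading the same MXBeans over a JMX connection) gives a quick check that a flag typo did not leave the JVM on its default collector.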
Key Performance Indicators
In any study, key performance indicators should be carefully identified. As far as a garbage collection study is concerned, (in my humble opinion) the key performance indicators are:
Memory & CPU Utilization
Latency and throughput are slightly confusing terms. Let me try to clarify them through an example. Let's say your application runs for a 1-minute period (i.e. 60 seconds). In this period, 5 GCs run:
1st GC took: 1 second
2nd GC took: 2 seconds
3rd GC took: 1 second
4th GC took: 1 second
5th GC took: 1 second
Latency is the maximum GC pause time. In this example the maximum GC pause time is 2 seconds, so the latency is 2 seconds. Latency is an important KPI because the application freezes during GC pauses. Let's say your application's SLA commitment is 600 ms and your average response time is 500 ms. Then you are within the SLA limits, which is a good thing. Now suppose a GC runs and takes 2 seconds to complete. A request caught in that pause can take up to 2 seconds and 500 ms to respond, which breaches the SLA commitment. Latency has a direct impact on your end users' experience.
Throughput, in general, is the amount of work done per unit of time. In a GC study it is expressed as the percentage of time the application spends doing its own work rather than garbage collection. In this example the total time spent in GC is 6 seconds (i.e. the sum of the 1st, 2nd, 3rd, 4th and 5th GC times). That means 10% of the time is spent in GC (i.e. 6 / 60), so throughput is 90% (i.e. 100% - 10%). High throughput means your application is doing more useful work with less GC overhead. In this example, 90% is poor throughput.
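The arithmetic in this example can be written out as a small sketch. The class and method names below (`GcKpiExample`, `maxPauseSeconds`, `throughputPercent`, `worstCaseResponseMs`) are hypothetical and exist only for illustration; the inputs are the five pause times, the 60-second window, and the 500 ms / 600 ms SLA figures from the text.

```java
// Hypothetical illustration of the latency and throughput definitions used in this article.
final class GcKpiExample {

    // Latency: the longest single GC pause in the observation window.
    static double maxPauseSeconds(double[] pausesSeconds) {
        double max = 0.0;
        for (double p : pausesSeconds) {
            if (p > max) {
                max = p;
            }
        }
        return max;
    }

    // Throughput: percentage of the window NOT spent in GC pauses.
    static double throughputPercent(double[] pausesSeconds, double windowSeconds) {
        double total = 0.0;
        for (double p : pausesSeconds) {
            total += p;
        }
        return 100.0 * (windowSeconds - total) / windowSeconds;
    }

    // Worst-case response time for a request that waits out a full GC pause.
    static long worstCaseResponseMs(long averageResponseMs, long gcPauseMs) {
        return averageResponseMs + gcPauseMs;
    }

    public static void main(String[] args) {
        double[] pauses = {1, 2, 1, 1, 1};   // the five GC pauses, in seconds
        double window = 60;                  // observation window, in seconds

        System.out.println("Latency (s): " + maxPauseSeconds(pauses));               // 2.0
        System.out.println("Throughput (%): " + throughputPercent(pauses, window));  // 90.0

        long worst = worstCaseResponseMs(500, 2000);
        System.out.println("Worst-case response (ms): " + worst);                    // 2500
        System.out.println("Breaches 600 ms SLA: " + (worst > 600));                 // true
    }
}
```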
One should aim for low latency and high throughput. Now a question might be, "What is the acceptable latency and throughput?" The answer is: it depends. It depends on the nature of your application, on your SLAs with your clients, on the price you are willing to pay for compute power, on your competitors' response times, etc.
The following tools were used for this study:
The CPU utilization metric was captured from the application performance monitoring tool New Relic.
Throughput and Latency metrics were captured from the universal garbage collection analysis tool GCEasy.
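The study itself derived throughput and latency from GC logs analyzed with GCEasy. As a rough in-process cross-check, a similar throughput figure can be approximated from the JVM's own management beans. The sketch below is an assumption-laden illustration, not the method used in the study: the class name `JmxGcThroughput` is hypothetical, and `getCollectionTime()` reports approximate accumulated collection time, which for concurrent collectors is not purely stop-the-world pause time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper for illustration; approximates GC throughput from JMX counters.
final class JmxGcThroughput {

    // Sums the accumulated collection time (ms) reported by all collectors.
    // A collector may report -1 when the value is undefined; those are skipped.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = bean.getCollectionTime();
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    // Percentage of uptime not attributed to garbage collection.
    static double throughputPercent(long gcMillis, long uptimeMillis) {
        if (uptimeMillis <= 0) {
            return 100.0;
        }
        return 100.0 * (uptimeMillis - gcMillis) / uptimeMillis;
    }

    public static void main(String[] args) {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        long gc = totalGcMillis();
        System.out.println("Approximate GC throughput (%): " + throughputPercent(gc, uptime));
    }
}
```

Because this measures from JVM start rather than over a chosen window, it is best treated as a sanity check alongside log-based analysis, not a replacement for it.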
The below table summarizes all the KPIs gathered from this study:
|GC Algorithm|CPU Utilization|Max Latency|Throughput|
|---|---|---|---|
|G1 GC|9.80%|780 ms||
|CMS GC|8.50%|3 sec 100 ms|97.29%|
|Parallel GC|7.60%|4 sec 560 ms||
|Serial GC|7.10%|6 sec 500 ms||
Here are some key observations from this study:
CPU utilization was comparable across all GC settings, ranging from 7.10% to 9.80%. G1 GC consumed the most CPU at 9.80%; Serial GC consumed the least at 7.10%.
Irrespective of the GC algorithm, throughput remained fairly consistent. CMS GC had slightly better throughput (97.29%) than the other GC algorithms.
G1 GC produced the best latency, because of the -XX:MaxGCPauseMillis JVM option.
-XX:MaxGCPauseMillis was set to 500 ms. This is a soft goal rather than a hard limit; G1 came reasonably close to it, with a maximum GC pause time of 780 ms.
Serial GC had the worst latency of all the GC algorithms, at 6 sec 500 ms.