
A Case Study: Different GC Algorithms Behavior in Production


How different garbage collection algorithms consume CPU and affect latency and throughput.


This study compared the behavior of different GC algorithms on a major B2B application. The application is essentially a web service provider servicing SOAP and REST requests from its clients; it has no web browser interactions. It runs on an 8-core CPU under Red Hat Linux 6.9 and uses Java 7, Tomcat 7, and other popular Java frameworks.

The study was conducted over a three-hour period in the production environment during off-peak hours. The application runs on multiple JVM instances across multiple servers. We configured four of those JVM instances with the settings listed below; the remaining instances kept their old settings (which I can't disclose and which are not relevant to this article). Traffic was distributed evenly across all JVM instances by the load balancer's round-robin algorithm.

G1 GC: -Xms6144m -Xmx6144m -XX:MaxPermSize=512m -XX:PermSize=300m -XX:+UseG1GC -XX:MaxGCPauseMillis=500


CMS GC:  -Xms6144m -Xmx6144m -XX:MaxPermSize=512m -XX:PermSize=300m -XX:NewRatio=1  -XX:+UseConcMarkSweepGC


Parallel GC: -Xms6144m -Xmx6144m -XX:MaxPermSize=512m -XX:PermSize=300m -XX:NewRatio=1 -XX:+UseParallelOldGC


Serial GC: -Xms6144m -Xmx6144m -XX:MaxPermSize=512m -XX:PermSize=300m -XX:NewRatio=1 -XX:+UseSerialGC

Note that in these JVM settings the heap size (-Xms, -Xmx, -XX:NewRatio), perm size (-XX:MaxPermSize, -XX:PermSize), and all other parameters are kept identical; only the GC algorithm varies.
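
The article doesn't say how these options were applied to the Tomcat 7 instances; a typical (assumed) approach is to export them through CATALINA_OPTS in Tomcat's bin/setenv.sh. For example, for the G1 instance:

export CATALINA_OPTS="-Xms6144m -Xmx6144m -XX:MaxPermSize=512m -XX:PermSize=300m -XX:+UseG1GC -XX:MaxGCPauseMillis=500"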

Key Performance Indicators

In any study, the key performance indicators should be carefully identified. For a garbage collection study, (in my humble opinion) the key performance indicators are:

  1. Memory & CPU Utilization

  2. Latency

  3. Throughput

Latency and throughput are slightly confusing terms. Let me attempt to clarify them through an example. Let's say your application runs for a one-minute period (i.e. 60 seconds). In this one-minute period, 5 GCs run.

  • 1st GC took: 1 second

  • 2nd GC took: 2 seconds

  • 3rd GC took: 1 second

  • 4th GC took: 1 second

  • 5th GC took: 1 second

Latency

Latency is the maximum GC pause time. In this example, the maximum GC pause time is 2 seconds, so the latency is 2 seconds. Latency is an important KPI because the application freezes during GC pauses. Let's say your application's SLA commitment is 600 ms and your average response time is 500 ms. Then you are within the SLA limits, which is a good thing. Now say a GC runs and takes 2 seconds to complete. A request caught in that window can take 500 ms plus the 2-second pause, i.e. 2.5 seconds, which means you have breached the SLA commitment. Latency has a direct impact on your end users' experience.

Throughput

In the GC context, throughput is the percentage of time the application spends doing useful work rather than garbage collection. In this example, the total time spent on GC is 6 seconds (adding the 1st, 2nd, 3rd, 4th, and 5th GC times), which means 10% of the time is spent in GC (i.e. 6 / 60). The throughput is therefore 90% (i.e. 100% - 10%). High throughput means your application is doing more useful work with less GC overhead. In this example, 90% is a poor throughput.
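
To make the arithmetic concrete, here is a minimal Java sketch (my own illustration, not from the article) that computes latency and throughput from the five example pause times above:

// Minimal sketch: latency and throughput from the example GC pauses
// over a 60-second observation window.
public class GcKpiExample {
    public static void main(String[] args) {
        double[] gcPausesSeconds = {1.0, 2.0, 1.0, 1.0, 1.0};
        double windowSeconds = 60.0;

        double maxPause = 0.0;    // latency = maximum GC pause
        double totalGcTime = 0.0; // total time spent in GC
        for (double pause : gcPausesSeconds) {
            maxPause = Math.max(maxPause, pause);
            totalGcTime += pause;
        }

        // Throughput = percentage of the window NOT spent in GC.
        double throughputPercent = 100.0 * (1.0 - totalGcTime / windowSeconds);

        // Prints: Latency: 2.0 s, Throughput: 90.0%
        System.out.printf("Latency: %.1f s, Throughput: %.1f%%%n",
                maxPause, throughputPercent);
    }
}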

One should aim for low latency and high throughput. Now, a fair question is: "What is an acceptable latency and throughput?" The answer is: it depends. It depends on the nature of your application, on your SLA agreements with your clients, on the price you are willing to pay for compute power, on your competitors' response times, and so on.

Tools

The following tools were used for this study:

  1. The CPU utilization metric was captured from the application performance monitoring tool New Relic.

  2. Throughput and Latency metrics were captured from the universal garbage collection analysis tool GCEasy.
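
Note that GCEasy analyzes GC log files, so GC logging must be enabled on each JVM under study. The article doesn't list the logging flags that were used; on Java 7, a typical (assumed) configuration would be:

GC logging: -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<gc-log-file>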

Performance Summary

The table below summarizes the KPIs gathered from this study:

GC Algorithm   CPU Utilization   Max Latency     Throughput   Complete Report
G1             9.80%             780 ms          96.96%       G1 GC Report
CMS            8.50%             3 sec 100 ms    97.294%      CMS GC Report
Parallel       7.60%             4 sec 560 ms    97.022%      Parallel GC Report
Serial         7.10%             6 sec 500 ms    96.861%      Serial GC Report


Here are some key observations from this study:

  • CPU utilization was comparable across all GC settings; there is no significant difference. Among them, G1 GC consumed the most CPU, at 9.80%, while Serial GC consumed the least, at 7.10%.

  • Throughput remained fairly consistent irrespective of the GC algorithm, with CMS GC showing slightly better throughput (97.294%) than the other algorithms.

  • G1 GC produced the best latency, thanks to the -XX:MaxGCPauseMillis pause-time goal.

  • -XX:MaxGCPauseMillis was set to 500 ms. This value is a goal rather than a hard guarantee, and it was closely honored here: the maximum observed GC pause time was 780 ms.

  • Serial GC had the worst latency among all GC algorithms, at 6 sec 500 ms.

