Choosing the Best Garbage Collection Algorithm for Better Performance in Java
Take a look at several of the Java garbage collectors you can use, along with a behind-the-scenes view of how each collector works.
In this article, I'm going to explain how garbage collection (GC) works behind the scenes to free up memory. Java memory management has evolved a lot over the past few Java releases. Understanding the different GC algorithms will help you tune the collector (if required) for the performance issues that commonly show up when performance testing Java-based applications. When your Java application runs, it creates objects, which take up memory. As long as an object is being used (i.e., it is referenced somewhere by the application), it continues to occupy memory. When the object is no longer used (for example, after you cleanly close a DB connection and drop the reference to it), the space it occupied can be reclaimed by garbage collection.
How do you choose the best garbage collector for your use case when performance testing a Java-based application? Before we answer that question, let's go over some basic concepts.
When we talk about garbage collection, we generally need to consider three things: memory, throughput, and latency.
The first is memory. This is the amount of memory assigned to the program, which is called heap memory. Don't confuse this with footprint (the amount of memory the GC algorithm itself requires to run).
The second thing we need to understand is throughput. Throughput is the share of total run time spent executing application code, as opposed to running garbage collection. For example, a throughput of 99% means that the code was running 99% of the time and garbage collection was running 1% of the time. For high-volume applications, we want throughput to be as high as possible in any load test we run.
The third aspect is latency. Latency is how long the program stops each time garbage collection runs. These pauses are usually measured in milliseconds, but they can reach a few seconds depending on the size of the heap and the garbage collection algorithm chosen for the load test. Ideally, we want latency to be as low as possible, or at least as predictable as possible.
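The throughput and latency definitions above can be estimated from inside a running JVM. The following is a minimal sketch using the standard `java.lang.management` beans; the class name `GcThroughput` and its helper methods are made up for illustration. Note that `getCollectionTime()` reports approximate accumulated collection time, which for concurrent collectors is not the same as stop-the-world pause time, so treat the numbers as rough indicators only.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcThroughput {
    /** Share of wall-clock time NOT spent in GC, as a percentage. */
    static double throughputPercent(long gcMillis, long uptimeMillis) {
        if (uptimeMillis <= 0) {
            return 100.0;
        }
        return 100.0 * (uptimeMillis - gcMillis) / uptimeMillis;
    }

    /** Mean time per collection in milliseconds; 0 if nothing has been collected yet. */
    static double averageCollectionMillis(long gcMillis, long collections) {
        return collections <= 0 ? 0.0 : (double) gcMillis / collections;
    }

    public static void main(String[] args) {
        long totalGcMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters may return -1 when the value is undefined for a collector.
            long time = Math.max(0, gc.getCollectionTime());
            long count = Math.max(0, gc.getCollectionCount());
            totalGcMillis += time;
            System.out.printf("%s: %d collections, avg %.2f ms%n",
                    gc.getName(), count, averageCollectionMillis(time, count));
        }
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        System.out.printf("Approximate throughput: %.2f%%%n",
                throughputPercent(totalGcMillis, uptime));
    }
}
```

With 10 ms of GC time over 1,000 ms of uptime, `throughputPercent` gives 99%, matching the example in the text. For precise pause measurements during a load test, GC logs are usually a better source than these counters.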
Generational Hypothesis of Garbage Collection
This hypothesis says that most objects die young. An object becomes eligible for garbage collection when it can no longer be accessed, which can happen when the last variable referring to it goes out of scope. It can also happen when the object's reference variable is explicitly assigned null or is reassigned to another object. An object that cannot be accessed is one that no live thread can reach through any reference variable used in the program.
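The reachability rule above can be observed with a `WeakReference`, which does not keep its target alive. This is a small sketch, with the class name `ReachabilityDemo` chosen for illustration. `System.gc()` is only a hint to the JVM, so the second result is typical behavior rather than a guarantee.

```java
import java.lang.ref.WeakReference;

class ReachabilityDemo {
    /** A weakly referenced object survives GC while a strong reference is still in use. */
    static boolean survivesWhileStronglyReferenced() {
        Object payload = new Object();
        WeakReference<Object> weak = new WeakReference<>(payload);
        System.gc();                    // only a hint to the JVM
        return weak.get() == payload;   // payload is still strongly reachable here
    }

    public static void main(String[] args) {
        System.out.println("Alive while referenced: " + survivesWhileStronglyReferenced());

        Object payload = new Object();
        WeakReference<Object> weak = new WeakReference<>(payload);
        payload = null;                 // drop the only strong reference
        System.gc();
        // Usually true on HotSpot, but the JVM is free to ignore System.gc().
        System.out.println("Cleared after nulling: " + (weak.get() == null));
    }
}
```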
Garbage collection algorithms split the heap into a Young Generation and an Old Generation. Newly created objects are placed in the Young Generation, and most of them die young, meaning they become eligible for garbage collection very quickly. That is why garbage collection runs frequently on the Young Generation; a collection of this area is called a Minor GC.
Some objects live much longer, for example objects referenced from class-level (static) variables or from other long-lived objects. Objects that are still not eligible for garbage collection after surviving many Minor GCs are promoted to the Old Generation. When the Old Generation holds a lot of objects, a Major GC is triggered.
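To see how the running JVM has split its heap into generations, you can list its heap memory pools. This is a minimal sketch using the standard `MemoryPoolMXBean` API; the class name `HeapPools` is made up for illustration. The pool names depend on the collector in use (under G1 they typically include an Eden space, a Survivor space, and an Old Gen pool), so do not rely on specific strings.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.util.ArrayList;
import java.util.List;

class HeapPools {
    /** Names of the memory pools that make up the Java heap. */
    static List<String> heapPoolNames() {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        for (String name : heapPoolNames()) {
            System.out.println(name);
        }
    }
}
```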
GC Algorithm Steps
Most garbage collection algorithms are built from three basic steps: mark, sweep, and compact.
Mark: This is the first step, where the GC walks the object graph in memory, starting from the GC roots (the references the application holds directly), and marks every reachable object as live. When the marking phase finishes, every live object is marked. The duration of the mark phase depends on the number of live objects, so increasing the heap size does not by itself lengthen the marking phase.
Sweep: Objects that were not marked as reachable are unused. They are deleted and their memory is reclaimed.
Compact: Compaction removes memory fragmentation by moving the surviving objects next to each other, which eliminates the empty gaps between allocated memory areas.
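The three steps above can be sketched on a toy heap. This is an illustrative model only, not how HotSpot implements any collector: the "heap" is an array of slots, objects are `Node` instances with outgoing references, and all names (`ToyMarkSweepCompact`, `Node`, `demo`) are invented for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class ToyMarkSweepCompact {
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        boolean marked;
        Node(String name) { this.name = name; }
    }

    /** Mark: walk the object graph from the roots and flag everything reachable. */
    static void mark(List<Node> roots) {
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node n = pending.pop();
            if (!n.marked) {
                n.marked = true;
                pending.addAll(n.refs);
            }
        }
    }

    /** Sweep: free every slot whose object was not marked. Returns the number freed. */
    static int sweep(Node[] heap) {
        int freed = 0;
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != null && !heap[i].marked) {
                heap[i] = null;
                freed++;
            }
        }
        return freed;
    }

    /** Compact: slide survivors to the front. Returns the index of the first free slot. */
    static int compact(Node[] heap) {
        int next = 0;
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != null) {
                Node survivor = heap[i];
                heap[i] = null;
                survivor.marked = false;   // reset the mark for the next cycle
                heap[next++] = survivor;
            }
        }
        return next;
    }

    static String layout(Node[] heap) {
        StringBuilder sb = new StringBuilder();
        for (Node n : heap) {
            sb.append(n == null ? "_" : n.name);
        }
        return sb.toString();
    }

    static String demo() {
        Node a = new Node("a"), b = new Node("b"), c = new Node("c"), d = new Node("d");
        a.refs.add(c);   // a is a root and keeps c alive
        b.refs.add(d);   // b and d reference each other's world but nothing reaches them
        Node[] heap = {a, b, c, d, null, null};
        mark(List.of(a));
        int freed = sweep(heap);
        int firstFree = compact(heap);
        return "freed=" + freed + " firstFree=" + firstFree + " layout=" + layout(heap);
    }

    public static void main(String[] args) {
        System.out.println(demo());   // prints "freed=2 firstFree=2 layout=ac____"
    }
}
```

Note that `b` still references `d`, yet both are collected: what matters is reachability from the roots, not whether some other object points at you.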
MARK and COPY Algorithm
Within the young generation, the space is generally divided into an EDEN space and two survivor spaces, SURVIVOR SPACE 1 (S1) and SURVIVOR SPACE 2 (S2).
All new objects are allocated in the EDEN space first. Whenever a Minor GC runs, only the live objects in the EDEN space are marked and copied over to a survivor space. This involves the steps below:
1. Mark all the objects that are live, meaning they are still being used or referenced and are not eligible for garbage collection.
2. Copy all the live objects to one of the SURVIVOR spaces, either S1 or S2.
Once all the live objects have been copied, the EDEN space contains only objects that have already been copied and objects that are eligible for garbage collection, so the whole EDEN space is wiped out.
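The copy step can be illustrated with a toy young generation. In this sketch, liveness is a plain boolean flag standing in for "reachable from a GC root", and the class and field names (`ToyYoungGeneration`, `fromSurvivor`, `toSurvivor`) are invented. It also models one detail the steps above do not spell out, stated here as an assumption about typical HotSpot behavior: survivors from the previous Minor GC are copied along with the live EDEN objects into the empty survivor space, and the two survivor spaces then swap roles.

```java
import java.util.ArrayList;
import java.util.List;

class ToyYoungGeneration {
    static final class Obj {
        final String name;
        boolean live;   // stand-in for "reachable from a GC root"
        int age;        // number of minor collections survived
        Obj(String name, boolean live) { this.name = name; this.live = live; }
    }

    final List<Obj> eden = new ArrayList<>();
    List<Obj> fromSurvivor = new ArrayList<>();   // holds survivors of the last minor GC
    List<Obj> toSurvivor = new ArrayList<>();     // always empty between collections

    void allocate(String name, boolean live) {
        eden.add(new Obj(name, live));
    }

    /** Minor GC: copy live objects into the empty survivor space, then wipe the rest. */
    void minorGc() {
        copyLive(eden);
        copyLive(fromSurvivor);
        eden.clear();            // everything left in eden is garbage or already copied
        fromSurvivor.clear();
        List<Obj> tmp = fromSurvivor;   // swap the roles of the two survivor spaces
        fromSurvivor = toSurvivor;
        toSurvivor = tmp;
    }

    private void copyLive(List<Obj> space) {
        for (Obj o : space) {
            if (o.live) {
                o.age++;
                toSurvivor.add(o);
            }
        }
    }

    static String demo() {
        ToyYoungGeneration young = new ToyYoungGeneration();
        young.allocate("a", true);
        young.allocate("b", false);
        young.allocate("c", true);
        young.minorGc();                          // a and c are copied, b is wiped with eden
        young.fromSurvivor.get(1).live = false;   // c becomes unreachable
        young.allocate("d", true);
        young.minorGc();                          // d and a are copied, c is dropped
        StringBuilder sb = new StringBuilder("eden=" + young.eden.size() + " survivors=");
        for (Obj o : young.fromSurvivor) {
            sb.append(o.name).append(':').append(o.age).append(' ');
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(demo());   // prints "eden=0 survivors=d:1 a:2"
    }
}
```

The `age` counter hints at how promotion works: a real collector promotes an object to the old generation once it has survived enough minor collections.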
MARK SWEEP and COMPACT Algorithm
This generally runs on the old generation. Let's say we have a lot of allocated objects, some of which are live and some of which are eligible for garbage collection. First, we mark only the live objects. Second, we sweep, removing all the objects that are eligible for garbage collection. Technically the memory is not erased; the data structure that tracks the heap is updated to record that those spaces are empty. Third, we compact: we move all the live objects to the left side and cluster them together. The downside of this approach is a longer GC pause, because we need to copy all the live objects to a new place and update every reference to them.
The advantage of compaction is that allocating new objects becomes trivial: all we have to do is keep a single pointer that says everything to its left is in use and everything to its right is free.
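That single-pointer scheme is commonly called bump-pointer allocation. Here is a minimal sketch over an abstract address range; the class name `BumpAllocator` and the convention of returning -1 when the request does not fit are choices made for this example.

```java
class BumpAllocator {
    private final int capacity;
    private int top;   // everything below top is in use, everything from top upward is free

    BumpAllocator(int capacity) {
        this.capacity = capacity;
    }

    /** Returns the start offset of the new block, or -1 if it does not fit. */
    int allocate(int size) {
        if (size <= 0 || size > capacity - top) {
            return -1;
        }
        int start = top;
        top += size;   // "bump" the pointer past the new block
        return start;
    }

    int used() {
        return top;
    }

    public static void main(String[] args) {
        BumpAllocator heap = new BumpAllocator(100);
        System.out.println(heap.allocate(40));   // prints 0
        System.out.println(heap.allocate(50));   // prints 40
        System.out.println(heap.allocate(20));   // prints -1, only 10 units are left
    }
}
```

Allocation is one comparison and one addition, with no free-list search. That only works while the free space is contiguous, which is exactly what compaction restores.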
Serial Garbage Collector (-XX:+UseSerialGC)
The Serial Collector has the smallest footprint of any of the collectors: it needs very few data structures of its own to run. It uses a single thread for both minor and major collections. Because the Serial Collector compacts the heap, it can allocate with the bump-pointer technique, which is why allocation is fast. This collector is generally best for applications that run on a shared CPU with very small amounts of memory.
Let's imagine that we have a quad-core CPU with four applications running on it. If our garbage collector is multi-threaded rather than single-threaded, at some point it will start four threads on all four cores and use the entire CPU for its own garbage collection, and the other applications running on that CPU will suffer. If multiple applications share a CPU and we have to ensure that our garbage collector doesn't affect the other cores or applications, we can use the Serial Garbage Collector.
Parallel/Throughput Collector (-XX:+UseParallelGC, -XX:+UseParallelOldGC)
The next collector to understand is the Parallel Collector. There is a Parallel Collector and a Parallel Old Collector; we generally only talk about the Parallel Old Collector, which uses multiple threads for both Minor GC and Major GC. This collector does not run concurrently with the application. It is called Parallel because the garbage collector itself uses multiple threads that run in parallel, but while the garbage collector is running, all application threads are stopped. If our application is deployed on a multicore or multiprocessor system, this collector gives us the greatest throughput.
It collects the largest possible amount of garbage in the shortest amount of time. However, it stops the entire application, possibly for a noticeable time, so it is the best collector only for batch applications. With batch applications we do not care about user response times, because there is no user on the front end and the application runs behind the scenes. For batch applications, the Parallel Collector is the best one to use.
Concurrent Mark and Sweep Collector (-XX:+UseConcMarkSweepGC, -XX:+UseParNewGC)
This is the Concurrent Mark and Sweep (CMS) collector. It runs concurrently with the application to mark all the live objects, so the application has to stop for less time and its latency is lower. The actual collection still involves STW (Stop the World) pauses, which means the application is stopped for a very small amount of time while the collector does its work. The CMS collector has a larger footprint than the Parallel Collector because it has more data structures to maintain. It has lower throughput than the Parallel Collector, but the advantage is that its pauses are shorter. This collector is well suited to general-purpose Java applications. Note that CMS was deprecated in JDK 9 and removed in JDK 14, so it is only an option on older JDKs.
G1 Collector (-XX:+UseG1GC) (Garbage First)
The improvement over the CMS collector is the G1 collector. Instead of laying out the heap as one contiguous young generation and one contiguous old generation, this collector divides the entire heap into multiple regions and assigns regions to the young or old generation as needed. It has a larger footprint, and its advantage is the most predictable latency, which is its best feature. When we start our application, we can pass the maximum pause time that our application can tolerate (with the -XX:MaxGCPauseMillis flag), say 10 ms for example. The G1 collector will try to ensure that each garbage collection pause takes no more than 10 ms, and if some garbage is left over, it will be taken care of in the next cycle. The pause time is a goal rather than a hard guarantee. If we want predictable latencies and pause times, the G1 collector is the best collector to use. It has been the default collector since JDK 9 and is the most commonly used collector for performance testing needs.
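The idea of collecting the most garbage-rich regions first while staying inside a pause budget can be sketched as a greedy selection. This is a deliberately simplified toy, not G1's actual heuristics: the per-region garbage and cost figures are invented inputs, and the names `ToyGarbageFirst`, `Region`, and `chooseRegions` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class ToyGarbageFirst {
    static final class Region {
        final String name;
        final int garbageKb;   // assumed estimate of reclaimable space
        final int costMs;      // assumed estimate of the time needed to collect the region
        Region(String name, int garbageKb, int costMs) {
            this.name = name;
            this.garbageKb = garbageKb;
            this.costMs = costMs;
        }
    }

    /** Pick regions with the most garbage first, skipping any that would exceed the budget. */
    static List<String> chooseRegions(List<Region> regions, int pauseBudgetMs) {
        List<Region> sorted = new ArrayList<>(regions);
        sorted.sort(Comparator.comparingInt((Region r) -> r.garbageKb).reversed());
        List<String> chosen = new ArrayList<>();
        int spentMs = 0;
        for (Region r : sorted) {
            if (spentMs + r.costMs <= pauseBudgetMs) {
                chosen.add(r.name);
                spentMs += r.costMs;
            }
        }
        return chosen;   // regions left out wait for a later cycle
    }

    public static void main(String[] args) {
        List<Region> regions = List.of(
                new Region("r1", 900, 4),
                new Region("r2", 100, 3),
                new Region("r3", 700, 5),
                new Region("r4", 500, 3));
        System.out.println(chooseRegions(regions, 10));   // prints [r1, r3]
    }
}
```

With a 10 ms budget, `r1` and `r3` use 9 ms between them, so `r4` and `r2` are left for the next cycle, which mirrors the "leftover garbage is handled later" behavior described above.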
Shenandoah Collector (-XX:+UseShenandoahGC)
There is one more collector, called the Shenandoah Collector. It improves on the G1 collector: it has a somewhat larger footprint, because it maintains more data structures behind the scenes, but it achieves even lower latency than G1.
Shenandoah is an ultra-low pause time garbage collector that reduces GC pause times by performing more garbage collection work concurrently with the running Java program. CMS and G1 both perform concurrent marking of live objects. Shenandoah adds concurrent compaction.
Epsilon Collector (-XX:+UseEpsilonGC): The JDK's Do-Nothing Collector
The Epsilon garbage collector was introduced in JDK 11 as an experimental collector that only allocates memory. It cannot release any allocated memory, so a long-running application is very likely to crash with an OutOfMemoryError. Epsilon does not run any GC cycles and therefore does not care about the object graph, object marking, object copying, and so on. Once the Java heap is exhausted, no further allocation is possible and no memory can be reclaimed, so the test will fail.
The most significant advantage is that there is no GC overhead: the JVM never pauses to clear memory, because it does not even try to release any. Epsilon was added as a baseline for testing applications for performance, memory usage, latency, and throughput improvements. It helps us work out how long it takes for the Java Virtual Machine (JVM) to exhaust all its memory and shut down. It also helps test raw application performance with no interference from GC and no GC barriers embedded in the code. Epsilon is disabled by default in JDK 11; to use it, we must enable it explicitly with -XX:+UnlockExperimentalVMOptions together with -XX:+UseEpsilonGC.
For ultra-latency-sensitive applications, and whenever you want a complete picture of memory allocations, memory footprint, and how much of your program's performance is affected by garbage collection, the Epsilon collector is the best one to use.
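A simple way to see Epsilon's behavior is a program that produces more short-lived garbage than the heap can hold. The sketch below uses a hypothetical class name, `AllocationPressure`, and arbitrary sizes. Under a regular collector it should complete, because each discarded chunk can be reclaimed. Run with a small heap and Epsilon enabled, for example `java -Xmx256m -XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC AllocationPressure.java`, it is expected to stop with an OutOfMemoryError once the total allocated exceeds the heap size.

```java
class AllocationPressure {
    // Writing to a volatile field keeps the JIT from optimizing the allocations away.
    static volatile byte[] sink;

    /** Allocates chunks one after another, dropping each previous chunk. Returns total bytes allocated. */
    static long allocateAndDiscard(int chunks, int chunkBytes) {
        long total = 0;
        for (int i = 0; i < chunks; i++) {
            sink = new byte[chunkBytes];   // the previous chunk becomes garbage
            total += chunkBytes;
        }
        return total;
    }

    public static void main(String[] args) {
        // Roughly 1 GiB of short-lived garbage in 1 MiB chunks.
        long total = allocateAndDiscard(1024, 1024 * 1024);
        System.out.println("Allocated " + total + " bytes");
    }
}
```

Timing how long the Epsilon run lasts before it fails gives a rough allocation rate for the workload, which is the kind of measurement the collector was added for.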
Z Garbage Collector (-XX:+UseZGC)
The Z Garbage Collector (ZGC) is a scalable, low-latency collector. It is a completely new GC, written from scratch. It can mark memory and copy and relocate it, all concurrently, and it can work with heaps ranging from small ones to multiple terabytes. As a concurrent garbage collector, ZGC aims to keep GC pause times below 10 milliseconds, even for bigger heap sizes. ZGC was first released as an experimental GC in JDK 11 (Linux only), received further changes in JDK 13 and 14, and became a production feature in JDK 15.
In ZGC, the stop-the-world pauses are limited to root scanning. It uses load barriers together with colored pointers to do its work concurrently while the application threads are running, and to keep track of heap usage. Colored pointers are one of the core concepts of ZGC; they enable ZGC to find, mark, locate, and remap objects. Compared to G1, ZGC has better ways of dealing with very large object allocations, it is highly performant when reclaiming and reallocating memory, and in the releases discussed here it is a single-generation GC.
ZGC divides memory into regions, also called ZPages. ZPages can be dynamically created and destroyed, and they can also be dynamically sized. Unlike other GCs, ZGC can map its physical heap regions into a bigger heap address space (which can include virtual memory), which helps avoid memory fragmentation issues.
In general, the Serial collector is for small devices or for when we want to ensure that GC doesn't affect other applications or CPUs, the Parallel Collector is best for batch applications, the CMS collector is used for general applications on JDKs that still include it, the G1 collector is best for predictable latencies, and the Shenandoah collector improves on G1's pause times and is available in some OpenJDK builds (including backports to Java 11). Epsilon and ZGC were introduced as experimental collectors in JDK 11 and have continued to change from release to release.
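Before comparing collectors in a load test, it is worth confirming which one the JVM actually picked. The sketch below reads the JVM's startup arguments and the registered collector beans through the standard `java.lang.management` API; the class name `ActiveCollector` and the flag-matching rule (arguments of the form `-XX:+Use...GC`) are assumptions made for this example.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class ActiveCollector {
    /** Filters JVM arguments down to collector selection flags such as -XX:+UseG1GC. */
    static List<String> gcFlags(List<String> jvmArgs) {
        List<String> flags = new ArrayList<>();
        for (String arg : jvmArgs) {
            if (arg.startsWith("-XX:+Use") && arg.endsWith("GC")) {
                flags.add(arg);
            }
        }
        return flags;
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("GC flags passed: " + gcFlags(jvmArgs));
        // The bean names reveal the collector in use even when no flag was passed.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println("Collector bean: " + gc.getName());
        }
    }
}
```

For example, starting the program with `java -XX:+UseSerialGC ActiveCollector.java` should list that flag, while running it without any flag shows the beans of whatever default the JDK selected.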
Thanks for reading the article, and happy learning!