Introduction to Garbage Collection in Java
Understand the trade-offs of each approach and how they can impact your application.
This article gives a brief introduction to garbage collection in the JVM, explores a little of its architecture, and looks at which algorithms we can choose. Finally, it discusses the trade-offs of each approach and how they can impact an application.
The garbage collector (GC) is a very important component of Java applications because application performance depends strongly on it, so it is no surprise that quite a few collectors are available.
This component frees the Java programmer from explicitly managing the lifecycle of objects: objects are created when needed, and when they are no longer in use, the JVM automatically frees them.
The JVM periodically searches the heap for unused objects. It starts with the GC roots, which are objects accessible from outside the heap; these mainly include thread stacks and system classes, and such objects are always reachable. The GC algorithm then scans all the objects that can be reached from the roots, and every object not reached in that scan is garbage.
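The reachability scan described above can be sketched with a toy object graph. This is only an illustration, not how HotSpot is implemented: the class ReachabilitySketch, its methods, and the object names "a" to "d" are all hypothetical, and the heap is modeled as a map from each object to the objects it references.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model of the reachability scan; every name here is hypothetical.
class ReachabilitySketch {

    // Returns every object reachable from the roots by following references.
    static Set<String> mark(Map<String, List<String>> references, List<String> roots) {
        Set<String> reached = new HashSet<>();
        Deque<String> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            String current = pending.pop();
            // Visit each object once, then queue the objects it references.
            if (reached.add(current)) {
                pending.addAll(references.getOrDefault(current, List.of()));
            }
        }
        return reached;
    }

    // Everything in the heap that the scan did not reach is garbage.
    static Set<String> garbage(Map<String, List<String>> references, List<String> roots) {
        Set<String> unreachable = new HashSet<>(references.keySet());
        unreachable.removeAll(mark(references, roots));
        return unreachable;
    }

    public static void main(String[] args) {
        Map<String, List<String>> heap = Map.of(
            "a", List.of("b"),
            "b", List.of(),
            "c", List.of("d"),  // c and d reference each other,
            "d", List.of("c")); // but no root reaches them
        System.out.println("garbage: " + garbage(heap, List.of("a")));
    }
}
```

With "a" as the only root, the sketch reports "c" and "d" as garbage even though they reference each other, which is why a reachability scan can reclaim cycles.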
But it does not stop there. Once that memory has been released, it can be used again, but keeping track of the free memory for future allocations is not simple. For that reason, after a collection the memory is also compacted to avoid fragmentation, as in the image below.
Performing these operations seems simple, but it is not. Java applications are typically multithreaded, so when the GC moves an object in memory to perform compaction, it must ensure that application threads are not using that object; otherwise a thread would lose its reference to the object being used, which may cause unexpected behavior in the application.
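To make the compaction step more concrete, here is a minimal sketch that models the heap as an array of slots, where null marks a slot freed by the collector. The class CompactionSketch and the slot contents are hypothetical. Note how every live entry can change position; that is exactly why a real collector has to coordinate with the application threads while it moves objects.

```java
import java.util.Arrays;

// Toy sliding compaction over an array of slots; null marks a freed slot.
class CompactionSketch {

    // Moves live entries to the front, preserving their order,
    // and returns the number of live entries.
    static int compact(String[] heap) {
        int next = 0; // next free slot at the front of the heap
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != null) {
                String live = heap[i];
                heap[i] = null;
                heap[next++] = live;
            }
        }
        return next;
    }

    public static void main(String[] args) {
        String[] heap = {"a", null, "b", null, null, "c"};
        int live = compact(heap);
        // Prints: [a, b, c, null, null, null] live=3
        System.out.println(Arrays.toString(heap) + " live=" + live);
    }
}
```

After compaction the free space is one contiguous block at the end, so a later allocation only needs to advance a pointer instead of searching for a hole that is large enough.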
While the GC runs, the application can be paused; these pauses are known as stop-the-world pauses. They often have a big impact on application performance, and minimizing them is an important consideration when tuning the GC.
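Before tuning anything, it helps to see how much collection work the JVM is actually doing. The sketch below uses the standard java.lang.management API to read each collector's collection count and accumulated collection time. The class name GcStatsSketch is hypothetical. The reported time is approximate, and depending on the collector it may not correspond exactly to stop-the-world pause time, so treat it as an indicator rather than a precise measurement.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Reads GC activity from the running JVM through the management API.
class GcStatsSketch {

    // Sums the approximate accumulated collection time (ms) over all collectors.
    static long totalCollectionTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long time = gc.getCollectionTime(); // -1 when the value is undefined
            if (time > 0) {
                total += time;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                + ": count=" + gc.getCollectionCount()
                + ", timeMs=" + gc.getCollectionTime());
        }
        System.out.println("total ms: " + totalCollectionTimeMillis());
    }
}
```

Sampling these values periodically and comparing the deltas gives a rough picture of how often collections happen and how much time they take.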
Organization of the Garbage Collector
Many GCs work by splitting the heap into generations, called the old generation and the young generation. The young generation is further split into sections known as Eden and the survivor spaces.
The rationale for this separation is that many objects are used only for a short period, while others are used for a long time.
How Does the Management of These Spaces Work?
All newly allocated objects go to the young generation. When this space fills up, the GC pauses all the application threads and cleans up the young generation, discarding the objects that are no longer in use. The objects still in use are moved elsewhere, to a survivor space or to the old generation; this operation is called a minor GC or young GC.
This design aims to shorten the stop-the-world pauses: because the young generation is only a part of the heap, the application threads are paused for a shorter period than if the entire heap were processed.
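The allocation and minor GC cycle can be sketched with a toy model. Everything in it is hypothetical and heavily simplified: the class YoungGenSketch, a capacity counted in objects rather than bytes, a single survivor list, and a caller-supplied predicate standing in for the reachability scan. Real JVMs use two survivor spaces and promote objects to the old generation once they have survived enough collections.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Toy model of a young generation; all names and sizes are hypothetical.
class YoungGenSketch {
    final int edenCapacity; // capacity counted in objects, not bytes
    final List<String> eden = new ArrayList<>();
    final List<String> survivor = new ArrayList<>();
    int minorGcCount = 0;

    YoungGenSketch(int edenCapacity) {
        this.edenCapacity = edenCapacity;
    }

    // Allocates into Eden, running a minor GC first when Eden is full.
    void allocate(String object, Predicate<String> stillInUse) {
        if (eden.size() == edenCapacity) {
            minorGc(stillInUse);
        }
        eden.add(object);
    }

    // Keeps only the objects still in use and empties Eden.
    void minorGc(Predicate<String> stillInUse) {
        survivor.removeIf(stillInUse.negate());
        for (String object : eden) {
            if (stillInUse.test(object)) {
                survivor.add(object);
            }
        }
        eden.clear();
        minorGcCount++;
    }

    public static void main(String[] args) {
        YoungGenSketch young = new YoungGenSketch(3);
        Predicate<String> inUse = name -> name.startsWith("keep");
        for (String name : List.of("tmp1", "keep1", "tmp2", "tmp3")) {
            young.allocate(name, inUse);
        }
        // Prints: eden=[tmp3] survivor=[keep1] minorGCs=1
        System.out.println("eden=" + young.eden + " survivor=" + young.survivor
            + " minorGCs=" + young.minorGcCount);
    }
}
```

Only the three objects in Eden are examined when the fourth allocation arrives, which mirrors why a minor GC pauses the application for less time than processing the whole heap.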
What About When the Old Generation Is Also Full? What Strategy Will Be Used?
The JVM will need to find any objects that are no longer being used in the old generation, and this is where GC algorithms differ the most.
The simpler algorithms stop all the application threads, find the unused objects, free their memory, and then compact the heap; this process is called a full GC.
More complex algorithms can find objects that are no longer being used while the application threads are running. This is possible because the phase in which they check for unused objects can occur without stopping application threads. These algorithms are called concurrent collectors.
Below is a diagram to represent the heap structure and the minor GC and full GC operations.
Cool. Now that we know there are options suited to different situations, when should we use a simple algorithm and when a more complex one?
There are trade-offs here that we are going to discuss; based on that discussion, we will have an idea of which algorithm to use.
- Is the application a REST API for which we measure the response time of individual requests?
- Those requests will be affected by pause times and, more importantly, by the long pause times of full GCs. If minimizing the effect of long pauses is our goal, a concurrent collector may be better suited.
- If the average response time is more important than the outliers (i.e., the 90th percentile response time), a nonconcurrent collector may yield better results.
- Avoiding long pauses with a concurrent collector comes at a price: higher CPU usage. If the machine doesn't have the spare CPU cycles that a concurrent collector needs, it may not be the best option.
- Does the application perform batch processing?
- If enough CPU is available, using a concurrent collector to avoid full GC pauses will allow the job to finish faster.
- If CPU is limited, the extra CPU consumed by the concurrent collector will make the job take longer than it would with a nonconcurrent collector.
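The difference between the average and the 90th percentile response time is easier to see with numbers. The sketch below uses made-up samples: in the first, two of ten requests are hit by a long pause; in the second, every request is slightly slower, standing in for the CPU overhead of a concurrent collector. The class ResponseTimeSketch and all the values are hypothetical, and the percentile uses the simple nearest-rank definition.

```java
import java.util.Arrays;

// Compares average and 90th percentile over hypothetical response times (ms).
class ResponseTimeSketch {

    static double average(long[] millis) {
        return Arrays.stream(millis).average().orElse(0);
    }

    // Nearest-rank percentile: the smallest sample such that at least
    // p percent of all samples are less than or equal to it.
    static long percentile(long[] millis, int p) {
        long[] sorted = millis.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // Made-up data: two requests are delayed by a long stop-the-world pause.
        long[] nonConcurrent = {10, 10, 10, 10, 10, 10, 10, 10, 100, 100};
        // Made-up data: no long pauses, but every request pays some CPU overhead.
        long[] concurrent = {30, 30, 30, 30, 30, 30, 30, 30, 30, 30};

        System.out.println("nonconcurrent: avg=" + average(nonConcurrent)
            + " p90=" + percentile(nonConcurrent, 90));
        System.out.println("concurrent: avg=" + average(concurrent)
            + " p90=" + percentile(concurrent, 90));
    }
}
```

In this invented data set, the nonconcurrent sample wins on average (28 ms against 30 ms) but loses badly at the 90th percentile (100 ms against 30 ms), which is the kind of trade-off the list above describes.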
In the following sections, we describe some of the algorithms available in JDK 8 through JDK 12.
Serial GC
The serial garbage collector is the simplest of the collectors. It is the default collector if the application runs on a client-class machine (32-bit JVMs on Windows) or on a single-processor machine. The serial collector once seemed destined for the trash, but containerization changed that, since containers are often limited to a single CPU.
The serial collector uses a single thread to process the heap. It stops all the application threads while the heap is processed (for either a minor or a full GC).
The serial collector is enabled with the flag -XX:+UseSerialGC (although it is usually the default in the cases where it would be used). Note that, unlike most JVM flags, the serial collector is not disabled by changing the plus sign to a minus sign (i.e., by specifying -XX:-UseSerialGC). Instead, on systems where the serial collector is the default, it is disabled by selecting a different GC algorithm.
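One way to check whether a collector was selected explicitly is to inspect the arguments the JVM was started with. The sketch below is a rough illustration: the class GcFlagSketch and its method are hypothetical, and the pattern only assumes that selection flags look like -XX:+Use...GC, as the flags in this article do. It only sees flags that were actually passed, so the result is empty when the JVM picked its default.

```java
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.Optional;

// Looks for an explicit collector-selection flag among JVM arguments.
class GcFlagSketch {

    // Returns the first argument shaped like -XX:+Use<Name>GC, if any.
    // The pattern is an assumption based on the flags named in the article.
    static Optional<String> collectorFlag(List<String> jvmArgs) {
        return jvmArgs.stream()
            .filter(arg -> arg.matches("-XX:\\+Use\\w+GC"))
            .findFirst();
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println(collectorFlag(jvmArgs)
            .map(flag -> "explicitly selected: " + flag)
            .orElse("no explicit collector flag; the JVM chose its default"));
    }
}
```

For example, running it as java -XX:+UseSerialGC GcFlagSketch.java should report the serial collector flag, while the minus form -XX:-UseSerialGC is deliberately not matched because it does not select a collector.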
Throughput (Parallel) GC
In JDK 8, the throughput collector is the default collector for any 64-bit machine with two or more CPUs. It uses multiple threads to collect the young generation, which makes minor GCs much faster than with the serial collector. It also uses multiple threads to process the old generation. Because it uses multiple threads, the throughput collector is often called the parallel collector.
The throughput collector stops all application threads during both minor and full GCs, and it fully compacts the old generation during a full GC. Because it is the default in most situations where it would be used, it does not need to be explicitly enabled. To enable it when necessary, use the flag -XX:+UseParallelGC.
Note that older versions of the JVM enabled parallel collection of the young and old generations separately, so you may see references to the flag -XX:+UseParallelOldGC. This flag is deprecated (although it still works, and you can disable it to collect only the young generation in parallel if you wish).
G1 GC
The G1 GC (garbage-first garbage collector) uses a concurrent collection strategy to collect the heap with minimal pauses. It has been the default collector since JDK 9 (and therefore in JDK 11 and later) for 64-bit JVMs on machines with two or more CPUs.
G1 GC divides the heap into regions but still treats the heap as having two generations. Some of these regions make up the young generation, which is still collected by stopping all application threads and moving all live objects into the old generation or the survivor spaces. (This is done using multiple threads.)
In G1 GC, the old generation is processed by background threads that do not need to stop the application threads to do most of their work. Because the old generation is divided into regions, G1 GC can clean up old-generation objects by copying them from one region to another, which means that it (at least partially) compacts the heap during normal processing. This helps keep G1 GC heaps from becoming fragmented, although that is still possible.
The trade-off for avoiding full GC cycles is CPU time: the (multiple) background threads that G1 GC uses to process the old generation must have CPU cycles available at the same time the application threads are running.
G1 GC is enabled by specifying the flag -XX:+UseG1GC. In most cases it is the default in JDK 11, and it also works in JDK 8, especially in later builds of JDK 8, which contain many important bug fixes and performance improvements backported from later versions.
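To confirm which of the collectors described above a JVM ended up using, one option is to look at the names of its GC management beans. The sketch below maps those names to a collector family. The class ActiveCollectorSketch is hypothetical, and the name table is an assumption based on names HotSpot commonly reports (such as "G1 Young Generation", "PS Scavenge", and "Copy"); it is not a documented contract and may differ between JVM vendors and versions.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Guesses the collector family from GC MXBean names.
class ActiveCollectorSketch {

    // The name table is an assumption, not a documented contract.
    static String family(String beanName) {
        if (beanName.startsWith("G1")) {
            return "G1";
        }
        if (beanName.startsWith("PS")) {
            return "Throughput (Parallel)";
        }
        if (beanName.equals("Copy") || beanName.equals("MarkSweepCompact")) {
            return "Serial";
        }
        return "Other (" + beanName + ")";
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + " -> " + family(gc.getName()));
        }
    }
}
```

Running the same program with -XX:+UseSerialGC, -XX:+UseParallelGC, and -XX:+UseG1GC is a quick way to see how the reported names change on a given JDK.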
What Else Comes Around?
Garbage collection remains fertile ground for JVM engineers, and there are also experimental algorithms that are still maturing and continue to evolve this powerful component of the Java world.
That is it, guys. I hope you enjoyed it; if you have any questions, or experiences from experimenting with these collectors, share them here in the comments.
See you next time.