
Java Version Upgrades: GC Overview


Ready for the latest Java version but aren't sure how that will impact your projects' performance? Here's a checklist of how to make the switch from a GC perspective.


I'm guessing many companies are on the verge of upgrading to a newer Java version, because developers love to upgrade. Alongside this adventurous attitude, though, there is also fear. The biggest fear is that applications will behave unexpectedly after the change. In my opinion, the root of this fear is the GC system, the fear of the unknown.

First, we will do a quick walkthrough of what the memory management architecture looks like, what GC types exist, and what kinds of GC algorithms are available. We will then go over how to come up with a plan that is safe and can persuade management that switching to a newer Java version won't be the end of the world. If you are pretty confident about how the GC works, you can skip to the end of the article, where the TODOs are listed. For others, let's start with the architecture of the JVM.

Memory Management Architecture

Many modern memory management (MM) systems split the available memory into two key parts based on what is stored there. Metadata (Metaspace, or PermGen space before Java 8) is where the JVM stores how data is represented. In Java, this storage is mostly populated by ClassLoaders when they load a class for the first time. It contains, for example, which methods are available in the loaded class and what types of data can be stored in a particular field. Runtime constant pools can also be found here.

The other part is where the actual runtime data resides, known as Java Runtime Memory. This can be further split into two parts, Stack and Heap. The Stack holds data like method-local variables, the call stack, and Thread-related metadata. Each stack is relatively small (~1MB).

Meanwhile, the Heap holds Object Instance-related data, like the actual content of a field defined in your class. Thread-related data is stored separately: when a thread is started, the JVM allocates a stack for it outside the Heap and Metaspace. Besides size, the Heap also has an important property: how long we actually use certain parts of its memory.
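To see the split between Heap and non-heap memory in a running JVM, a minimal sketch like the following can help. It uses the standard java.lang.management API; the class and method names (MemoryAreas, describe) are made up for illustration. On HotSpot the non-heap figure typically covers areas such as Metaspace and the code cache, while thread stacks are not reported by this bean.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper class; the name is illustrative only.
final class MemoryAreas {

    // Formats one MemoryUsage as "<label>: used=<n> MB, committed=<n> MB".
    static String describe(String label, MemoryUsage usage) {
        long mb = 1024L * 1024L;
        return label + ": used=" + (usage.getUsed() / mb) + " MB, committed="
                + (usage.getCommitted() / mb) + " MB";
    }

    public static void main(String[] args) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        // Heap: object instances. Non-heap: class metadata and other JVM-internal areas.
        System.out.println(describe("Heap", memory.getHeapMemoryUsage()));
        System.out.println(describe("Non-heap", memory.getNonHeapMemoryUsage()));
    }
}
```

The exact numbers depend on the JVM version, the collector, and the flags the process was started with.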

But why do engineers create this kind of complexity around memory management? Just so they can ask fancy interview questions, or maybe so Oracle can sell support on how to tune your JVM? Behind every complicated solution, there is the desire to build something better. Here, the reason is to have better-optimized Garbage Collection systems.

Garbage Collection

We know that memory is finite, so if we are not freeing it up efficiently, we are going to run out of it eventually, leading to us furiously reading the 'java.lang.OutOfMemoryError' class's Javadoc. When dealing with sophisticated languages, the allocation and freeing of memory is taken care of by the environment. Many things can happen in the background, but basically, the pseudo code for GC is the following.

  1. Mark the parts of memory that are potentially still in use (everything a traversal of the object graph from the GC roots can reach)

  2. Free the memory that was not marked

  3. Defragment memory by moving the still-referenced objects together

This doesn't sound too complicated, right? Well, for starters, this can be quite expensive in processing power. You can also create objects that the GC can't reclaim because something still references them, or the GC algorithm simply can't keep up when your application receives a higher load. And when the GC runs, Threads in the JVM can be suspended, making your application unresponsive.
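The three steps listed above can be sketched on a toy model. This is not how the JVM implements GC; it is a simplified illustration in which the "heap" is an array of slots, and the names ToyGc, Node, mark, and sweepAndCompact are invented for this example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: illustrates mark, sweep, and compact on an array of slots.
final class ToyGc {

    // A toy "object": a name plus its outgoing references.
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    // Step 1: mark everything reachable from the roots.
    static Set<Node> mark(List<Node> roots) {
        Set<Node> marked = new HashSet<>();          // Node uses identity equality
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node current = pending.pop();
            if (marked.add(current)) {               // false if already visited (handles cycles)
                pending.addAll(current.refs);
            }
        }
        return marked;
    }

    // Steps 2 and 3: free unmarked slots and slide the survivors together.
    // Returns the number of live objects, now stored in heap[0..result-1].
    static int sweepAndCompact(Node[] heap, Set<Node> marked) {
        int next = 0;
        for (int i = 0; i < heap.length; i++) {
            Node candidate = heap[i];
            heap[i] = null;                          // free the slot
            if (candidate != null && marked.contains(candidate)) {
                heap[next++] = candidate;            // move the survivor to the front
            }
        }
        return next;
    }
}
```

Even in this toy version the cost is visible: marking touches every live object, and compaction has to move them, which is why real collectors try hard to limit how often and how much of the heap they process.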

Generational Hypothesis

You either die as an Object part of the Young generation, or live long enough to see yourself end up in the Old generation.

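Engineers found out that most application data falls into two categories: data that can be collected quickly, and data that stays around for a long time. The gain here is that you don't have to run GCs so frequently on the Old Generation, thus saving a lot of processing power. Of course, objects of the Old Generation can refer to Young Generation objects; these are cross-generational links. The JVM has a way of dealing with this: it keeps track of such references (HotSpot uses a card table for this), so a Young Generation collection doesn't have to scan the whole Old Generation. The Young Generation itself is split into three more regions: Eden, Survivor1, and Survivor2. New objects are allocated in Eden, and objects that survive a collection are copied between the Survivor spaces until they are old enough to be promoted to the Old Generation.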
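If you want to see these regions in your own JVM, the following sketch lists the heap memory pools through the standard java.lang.management API. The class name HeapPools is invented for this example, and the pool names you get depend on the collector in use (for example, names containing "Eden Space", "Survivor Space", and "Old Gen"), so treat the output as JVM-specific.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; the name is illustrative only.
final class HeapPools {

    // Returns the names of all memory pools that belong to the heap.
    static List<String> heapPoolNames() {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Pool names vary by collector and JVM version.
        for (String name : heapPoolNames()) {
            System.out.println(name);
        }
    }
}
```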

Based on this information, we can categorize different Garbage Collection strategies.

GC Types

Type     | Target                 | Trigger                | Impact
---------|------------------------|------------------------|------------------------
Minor GC | Young generation       | Eden getting full      | No effect on latency*
Major GC | Old generation         | Minor GC fails         | Can have latency effect
Full GC  | Whole heap + MetaSpace | Minor or Major GC fail | Can have latency effect

* If it fails, it can trigger a Major GC, which can eventually have latency effects

If you are interested in what events are happening in your JVM regarding GC runs, you can start the JVM with -XX:+PrintGCDetails. From Java 9 on, this flag is deprecated in favor of unified logging (-Xlog:gc*).
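Besides GC logs, the same information is available from inside the application. The sketch below reads collection counts and accumulated collection time from the standard GarbageCollectorMXBean; the class name GcStats and its methods are invented for this example, and which bean covers minor versus major collections depends on the collector.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper class; the name is illustrative only.
final class GcStats {

    // Sums the collection counts of all collectors (-1 means "undefined" and is skipped).
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0, gc.getCollectionCount());
        }
        return total;
    }

    // One line per collector: name, number of collections, accumulated time in ms.
    static String report() {
        StringBuilder out = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            out.append(gc.getName())
               .append(": count=").append(gc.getCollectionCount())
               .append(", timeMs=").append(gc.getCollectionTime())
               .append(System.lineSeparator());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Produce some short-lived garbage so the young generation has work to do.
        for (int i = 0; i < 200_000; i++) {
            byte[] garbage = new byte[1024];
            garbage[0] = (byte) i;
        }
        System.out.print(report());
    }
}
```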

GC Algorithms

Type                  | Threads                         | Algorithm                                                                                  | Effect                                                                                                       | Default in
----------------------|---------------------------------|--------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------|---------------------------
Serial                | YoungGen: single; OldGen: single | YoungGen: mark and copy; OldGen: mark sweep compact                                        | Stop the world in every case                                                                                 | Java 6, if client class*
Parallel              | YoungGen: multi; OldGen: single  | YoungGen: mark and copy; OldGen: mark sweep compact                                        | Stop the world in both generations; OldGen cleanup is single-threaded                                        | Java 6, if server class**
ParallelOld           | YoungGen: multi; OldGen: multi   | YoungGen: mark and copy; OldGen: mark summary compact                                      | Stop the world, but multi-threaded in both generations                                                       | Not default
Concurrent Mark Sweep | YoungGen: multi; OldGen: multi   | YoungGen: mark and copy; OldGen: mostly concurrent mark and sweep, no compaction           | Only short stop-the-world phases in OldGen, but the heap can fragment and allocation is not so efficient     | Not default
G1 GC                 | YoungGen: multi; OldGen: multi   | Memory is split into chunks, each marked as usable or "under maintenance" while the GC cleans it | Best of both: the promise of fewer GC pauses and more predictable GC runs                                    | Java 9

*client class: 32-bit architecture or single processor. Today, it's kind of irrelevant.
**server class: two or more physical processors and 2 or more GB of memory.

We can decide which GC algorithm fits our needs best.
Examples: -XX:+UseSerialGC, -XX:+UseG1GC, and -XX:+UseConcMarkSweepGC (a plus sign enables an option, a minus sign disables it).

These can be fine-tuned even more with additional parameters, but tuning the GC should be a last resort. 
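When you juggle several environments, it is easy to lose track of which collector a JVM was actually started with. The following sketch filters the JVM's startup arguments for GC selection flags, using the standard RuntimeMXBean. The class name GcFlags, the method name, and the regular expression are assumptions made for this example; the pattern only catches flags of the form -XX:+Use...GC or -XX:-Use...GC.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; the name is illustrative only.
final class GcFlags {

    // Keeps only arguments that look like GC selection flags, e.g. -XX:+UseG1GC.
    static List<String> gcSelectionFlags(List<String> jvmArgs) {
        List<String> flags = new ArrayList<>();
        for (String arg : jvmArgs) {
            if (arg.matches("-XX:[+-]Use\\w*GC")) {
                flags.add(arg);
            }
        }
        return flags;
    }

    public static void main(String[] args) {
        List<String> startupArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        List<String> flags = gcSelectionFlags(startupArgs);
        // An empty list means no explicit choice was made and the JVM picked its default.
        System.out.println("GC flags on the command line: " + flags);
    }
}
```

Running it with, say, `java -XX:+UseSerialGC GcFlags` should list that flag, while running it without any GC option prints an empty list.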

Monitoring

Memory problems are hard to detect because they are not strictly bound to functional behavior. You can greenlight a project that worked fine in the e2e tests and for the 5 QA people, yet it breaks within minutes under real load in production. Know what is happening in your application! If you have already released your software, your monitoring should be on point and may only need minor tweaks. If not, you are probably doing blood sacrifices after every retro. Some important measurement values: memory usage, request response time, thread count, CPU usage, connections, and GC statistics (-XX:+PrintGCDetails).

These values should be stored as a data series so you can see behavior trendlines. You need to define your system's current thresholds. 
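As a starting point for such a data series, the sketch below captures a few of these values from inside the JVM with the standard java.lang.management API and renders them as one CSV line. The class name MetricsSample, its fields, and the CSV layout are assumptions for this example; response times and connection counts have to come from your application or infrastructure and are left out.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.util.Locale;

// Hypothetical sample type; the name and fields are illustrative only.
final class MetricsSample {
    final long timestampMillis;
    final double heapUsedPercent;   // -1 when the JVM reports no maximum heap size
    final int threadCount;
    final long totalGcTimeMillis;   // accumulated GC time since JVM start

    private MetricsSample(long timestampMillis, double heapUsedPercent,
                          int threadCount, long totalGcTimeMillis) {
        this.timestampMillis = timestampMillis;
        this.heapUsedPercent = heapUsedPercent;
        this.threadCount = threadCount;
        this.totalGcTimeMillis = totalGcTimeMillis;
    }

    // Takes one snapshot of the current JVM.
    static MetricsSample capture() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        double percent = heap.getMax() > 0 ? 100.0 * heap.getUsed() / heap.getMax() : -1;
        long gcTime = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            gcTime += Math.max(0, gc.getCollectionTime()); // -1 means "undefined"
        }
        int threads = ManagementFactory.getThreadMXBean().getThreadCount();
        return new MetricsSample(System.currentTimeMillis(), percent, threads, gcTime);
    }

    // timestamp,heapUsedPercent,threadCount,totalGcTimeMillis
    String toCsv() {
        return timestampMillis + "," + String.format(Locale.ROOT, "%.1f", heapUsedPercent)
                + "," + threadCount + "," + totalGcTimeMillis;
    }
}
```

Calling MetricsSample.capture().toCsv() on a schedule and appending the lines to a file is enough to draw a first trendline; a real monitoring system would of course do this for you.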

Examples:

  • CPU percentage never goes over 60%

  • GC pause is never higher than 0.7 sec

  • Memory usage is always under 70%

  • Avg. response time is always under 2 sec

  • Active connections are fewer than 300

You will constantly compare against this baseline. Anything worse is not acceptable. If the values start to worsen, it is worth investigating. Try to isolate parts of your system. Thread dumps are also a huge help for such investigations.
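The example thresholds above can be turned into a small check that a performance test run either passes or fails. This is only a sketch: the class name Baseline, the method signature, and the message texts are invented here, and the limits are the example values from the list, not recommendations.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical baseline check using the example thresholds from the list above.
final class Baseline {
    static final double MAX_CPU_PERCENT = 60.0;
    static final double MAX_GC_PAUSE_SEC = 0.7;
    static final double MAX_MEMORY_PERCENT = 70.0;
    static final double MAX_AVG_RESPONSE_SEC = 2.0;
    static final int MAX_ACTIVE_CONNECTIONS = 300;

    // Returns one message per violated threshold; an empty list means the run is acceptable.
    static List<String> violations(double cpuPercent, double gcPauseSec, double memoryPercent,
                                   double avgResponseSec, int activeConnections) {
        List<String> problems = new ArrayList<>();
        if (cpuPercent > MAX_CPU_PERCENT) {
            problems.add("CPU " + cpuPercent + "% is over " + MAX_CPU_PERCENT + "%");
        }
        if (gcPauseSec > MAX_GC_PAUSE_SEC) {
            problems.add("GC pause " + gcPauseSec + "s is over " + MAX_GC_PAUSE_SEC + "s");
        }
        if (memoryPercent >= MAX_MEMORY_PERCENT) {
            problems.add("Memory " + memoryPercent + "% is not under " + MAX_MEMORY_PERCENT + "%");
        }
        if (avgResponseSec >= MAX_AVG_RESPONSE_SEC) {
            problems.add("Avg response " + avgResponseSec + "s is not under " + MAX_AVG_RESPONSE_SEC + "s");
        }
        if (activeConnections >= MAX_ACTIVE_CONNECTIONS) {
            problems.add("Connections " + activeConnections + " is not under " + MAX_ACTIVE_CONNECTIONS);
        }
        return problems;
    }
}
```

Feeding the measurements of every test run through such a check makes "anything worse is not acceptable" something a CI job can enforce rather than something a person has to remember.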

The Plan

You should!

So now we know what kinds of GCs there are, how they are triggered, what they can cause, and what properties a Minimum Viable Monitoring System (MVMS) should contain. With this knowledge, and after reading many other articles, you can start creating a plan for making the switch. We did this kind of upgrade from Java 7 to Java 8. Below is a list of TODOs based on what we did for that upgrade.

  1. Use performance tests and measure the properties listed in the monitoring section. These are crucial to be able to make good decisions later on. Define thresholds for these values. It's better to run this test regularly as development goes on — bottlenecks could be introduced with new features.

  2. Create easy upgrade and downgrade scripts/configurations for your dev and prod environments. You need to be able to switch easily between the two states of the system. This includes GC configuration, not just the JVM version.

  3. Update your CI environment to enforce compilation with your current JVM version. This also means that you should not start to use the new Java language features for the first release. This is to ensure backward compatibility. Newer versions of the JVM can run code compiled on an earlier version of Java.

  4. Validate your new environment functionality with automated and manual tests. 

  5. Run performance tests in the new Java version environment. Compare the results with the ones you got in step 1. If, somehow, thresholds aren't met, first try to rule out new code changes from development as the cause.

  6. Run stability tests. Put close-to-average load on the system for at least 12 hours. This is the real test of how the GC behaves.

  7. Release the new JVM to the production system. Up to this point, we have only changed the JVM and its configuration; no Java 8 code is allowed yet.

  8. Now you are running your code on a new JVM environment, but the devs see no real benefit from the new functionalities. It's time to switch your CI environment to compile with the new version of Java.

  9. Repeat steps 4 through 7. You might think you are done, but there is one last step.

  10. Educate your fellow developers about the capabilities of your new Java version. Do some knowledge sharing. Explain how the new GC system you are using works. This is the real fruit of the upgrade, and hopefully it brings some performance gain, too.
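For step 3, it helps to verify that your build really produces bytecode for the old Java version. Every class file starts with the magic number 0xCAFEBABE followed by a minor and a major version; major version 51 corresponds to Java 7 and 52 to Java 8. The sketch below reads that header. The class name ClassFileVersion and its methods are invented for this example, and the "major minus 44" mapping is a convenience that holds for the releases discussed here.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper class; the name is illustrative only.
final class ClassFileVersion {

    // Reads the major version from the first 8 bytes of a class file.
    static int majorVersion(byte[] classBytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classBytes))) {
            if (in.readInt() != 0xCAFEBABE) {
                throw new IllegalArgumentException("Not a class file");
            }
            in.readUnsignedShort();        // minor version, not needed here
            return in.readUnsignedShort(); // major version
        } catch (IOException e) {
            throw new IllegalArgumentException("Class file header is truncated", e);
        }
    }

    // Maps a class file major version to a Java release, e.g. 51 -> 7, 52 -> 8.
    static int javaRelease(int major) {
        return major - 44;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ClassFileVersion path/to/Some.class
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        int major = majorVersion(bytes);
        System.out.println("major=" + major + ", compiled for Java " + javaRelease(major));
    }
}
```

A check like this can run in CI against a few build artifacts so that a class compiled for the newer version doesn't slip into the first release by accident.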


Topics:
java, memory management, garbage collection, jvm, java performance, tutorial

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.
