What is Garbage Collection
Garbage collection in Java is the process of freeing the dynamic memory used by objects that are no longer being used by an application. In languages such as C or C++, the developer is often responsible for managing dynamic memory (using malloc and free or new and delete). However, in Java, this task is left up to something known as the garbage collector. A garbage collector automatically frees unused memory, freeing the developer from much of this thankless memory juggling.
The most basic garbage collection algorithm works by starting at the root objects (i.e., objects on the thread stack, static objects, etc.) that are live (live meaning currently in use) and then iterating down over every reachable object. Any object that cannot be reached in this manner is garbage and can be collected. The application is paused while this process goes on. This is referred to as mark and sweep: first you mark the objects that are live, then you sweep those that are not. The time needed to do this is proportional to the number of live objects (which can be quite a large number in modern Java applications), and so more efficient collection schemes have been devised.
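To make the mark and sweep idea concrete, here is a minimal sketch of it as a toy model. The `Node` class, the `heap` list, and the method names are all made up for illustration; a real JVM walks raw memory rather than Java collections, so treat this as a picture of the idea and not of how HotSpot does it.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: names and structure are assumptions made for this example.
final class MarkSweepSketch {

    // A hypothetical heap object that simply knows which objects it references.
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    // Mark phase: walk the object graph from the roots and record everything reachable.
    static Set<Node> mark(List<Node> roots) {
        Set<Node> live = new HashSet<>();
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node n = pending.pop();
            if (live.add(n)) {           // first visit: follow its references
                pending.addAll(n.refs);
            }
        }
        return live;
    }

    // Sweep phase: drop every heap object that was not marked; returns how many were freed.
    static int sweep(List<Node> heap, Set<Node> live) {
        int before = heap.size();
        heap.removeIf(n -> !live.contains(n));
        return before - heap.size();
    }

    public static void main(String[] args) {
        Node a = new Node("a"), b = new Node("b"), c = new Node("c"), d = new Node("d");
        a.refs.add(b);   // a -> b is reachable from the root
        c.refs.add(d);   // c -> d is an island that nothing live points to
        List<Node> heap = new ArrayList<>(List.of(a, b, c, d));
        int freed = sweep(heap, mark(List.of(a)));
        System.out.println("freed " + freed + " objects, " + heap.size() + " still live");
    }
}
```

Note that the marking work grows with the number of reachable objects, which is the cost the paragraph above is pointing at.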
One such scheme comes from the natural fact that you can divide up objects based on how long they live. Most applications create a lot of very short lived objects, and fewer objects that are around for a long time (I’ve seen estimates that for the average application, 85-98% of allocated objects are short lived). You can take advantage of this fact when doing collections. In Java, objects are allocated from a region of memory known as the heap. The Java heap is generally divided up into a few spaces (it’s usually the same across implementations, but there is the odd exception or two). The major spaces are the young generation, the tenured generation (also called the old generation), and the permanent generation. The young generation is then further subdivided into the eden space and two survivor spaces. The permanent generation is generally for objects that are around for the life of the application (interned Strings, class objects, etc.) and doesn’t usually play much of a role in garbage collection. The permanent generation size is not part of the heap region defined with -Xms and -Xmx. Though a very unusual need, it is still worth noting that the permanent generation can actually be collected if needed using:
When objects are first created, they are allocated within the eden space. When the eden space becomes full, the still live objects within it are copied into one of the survivor spaces (or if they don’t fit, into the tenured space). One survivor space is always left empty, and on each young generation collection (a minor collection), the live objects from the eden space and the non empty survivor space are copied into the empty survivor space. This leaves a newly emptied survivor space for the next round, as any still live objects in the formerly full survivor space will be copied into the tenured space.
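The survivor space shuffle is easier to follow in code. Below is a rough sketch of a minor collection as a toy model: spaces are plain lists, capacity is counted in objects rather than bytes, and the `tenureAge` promotion threshold is an assumption I added to decide when a survivor moves to the tenured space. None of these names come from the JVM.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Toy model of a minor collection; all names and limits here are hypothetical.
final class MinorCollectionSketch {

    static final class Obj {
        final String name;
        int age;                                   // minor collections survived so far
        Obj(String name) { this.name = name; }
    }

    final List<Obj> eden = new ArrayList<>();
    final List<Obj> tenured = new ArrayList<>();
    List<Obj> fromSurvivor = new ArrayList<>();    // the survivor space currently in use
    List<Obj> toSurvivor = new ArrayList<>();      // kept empty between collections
    final int survivorCapacity;                    // assumed limit, counted in objects
    final int tenureAge;                           // assumed promotion threshold

    MinorCollectionSketch(int survivorCapacity, int tenureAge) {
        this.survivorCapacity = survivorCapacity;
        this.tenureAge = tenureAge;
    }

    // Copy live objects out of eden and the occupied survivor space, then swap roles.
    void minorCollect(Predicate<Obj> isLive) {
        copyLive(eden, isLive);
        copyLive(fromSurvivor, isLive);
        eden.clear();                              // whatever was not copied is garbage
        fromSurvivor.clear();
        List<Obj> emptied = fromSurvivor;
        fromSurvivor = toSurvivor;
        toSurvivor = emptied;
    }

    private void copyLive(List<Obj> space, Predicate<Obj> isLive) {
        for (Obj o : space) {
            if (!isLive.test(o)) {
                continue;                          // garbage is simply never copied
            }
            o.age++;
            // Old enough, or no room left in the survivor space: promote to tenured.
            if (o.age >= tenureAge || toSurvivor.size() >= survivorCapacity) {
                tenured.add(o);
            } else {
                toSurvivor.add(o);
            }
        }
    }
}
```

With `new MinorCollectionSketch(2, 2)`, an object that stays live lands in a survivor space after the first collection and in the tenured list after the second, which matches the "survive a couple of minor collections" behavior.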
As you can see, rather than running over every object for every collection now, you can collect the young generation more often, and the tenured generation (long lived objects) much less often. You can also optimize your collection for the characteristics of the space: usually, almost all of the objects in the young space will be garbage. In general, an object will have to survive a couple of minor collections to make it to the tenured space (first making it into a survivor space and then the tenured space). A copying collector identifies garbage by copying live objects from one space to another, and anything left over is by definition garbage. The Sun JDK uses copying collectors for the young space and mark and sweep type collectors for the tenured space.
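If you want to see the spaces described above in your own JVM, the standard `java.lang.management` API exposes them as memory pools. This is a small sketch; the class and method names are mine, and the pool names you get back depend on the JVM version and the collector in use, so I make no promise about what it prints.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Sketch: list the heap memory pools the running JVM reports.
final class HeapPoolsSketch {

    // Names of all pools the JVM classifies as heap memory.
    static List<String> heapPoolNames() {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();   // may be null if the pool is not valid
            if (pool.getType() == MemoryType.HEAP && usage != null) {
                System.out.println(pool.getName() + ": " + usage.getUsed() + " bytes used");
            }
        }
    }
}
```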
Tuning for garbage collection means adjusting the sizes of the various spaces mentioned in the previous section, as well as the algorithms used to collect them. You can do this with various JVM command line options.
The amount of RAM available for the various spaces is dependent upon the size of the heap that the JVM has allocated. Defaults are chosen based on the hardware detected, but you can usually do better by specifying good -Xms and -Xmx values yourself. On a server machine, it can be a good idea to pin those two settings together so that the JVM does not waste any time resizing itself. You generally do not want to size the heap much larger than is needed – this can needlessly increase the cost of full garbage collections, and take RAM from other important activities, such as file system caching.
|-Xms<size>||Initial Heap Size|
|-Xmx<size>||Maximum Heap Size|
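A quick way to check what heap limits the JVM actually ended up with is to ask `Runtime`. This is only a sketch with names of my own choosing; `maxMemory()` roughly tracks -Xmx and `totalMemory()` is what is currently committed, but the exact numbers can differ a little from what you passed on the command line.

```java
// Sketch: report the heap limits seen by the running JVM.
final class HeapSizeSketch {

    // Upper bound the JVM will try to use for the heap (roughly -Xmx).
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    // Heap memory currently committed by the JVM (starts out near -Xms).
    static long committedHeapBytes() {
        return Runtime.getRuntime().totalMemory();
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        System.out.println("max heap:       " + maxHeapBytes() / mb + " MB");
        System.out.println("committed heap: " + committedHeapBytes() / mb + " MB");
    }
}
```

If you pin -Xms and -Xmx to the same value, the two numbers should stay close to each other for the life of the process.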
A Note About JVM Cmd Line Options
- Boolean options – On: -XX:+<option> Off: -XX:-<option>.
- Numeric options: -XX:<option>=<number>. Numbers can include ‘m’ or ‘M’ for megabytes, ‘k’ or ‘K’ for kilobytes, and ‘g’ or ‘G’ for gigabytes (1M= 1048576). In the case of Xms and Xmx, only one X is used and no colon.
- String options: -XX:<option>=<string>
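The size suffixes described in the list above are easy to get wrong by a factor of 1000 vs 1024, so here is a tiny sketch of how such a value maps to bytes. `parseSize` is a hypothetical helper written for this article, not anything the JVM exposes.

```java
// Sketch: convert a JVM-style size such as "512m" or "2G" into bytes.
final class JvmSizeSketch {

    static long parseSize(String text) {
        String s = text.trim();
        if (s.isEmpty()) {
            throw new IllegalArgumentException("empty size");
        }
        long multiplier;
        switch (Character.toLowerCase(s.charAt(s.length() - 1))) {
            case 'k': multiplier = 1024L; break;
            case 'm': multiplier = 1024L * 1024L; break;                // 1M = 1048576
            case 'g': multiplier = 1024L * 1024L * 1024L; break;
            default:  return Long.parseLong(s);                         // plain byte count
        }
        long number = Long.parseLong(s.substring(0, s.length() - 1));
        return Math.multiplyExact(number, multiplier);                  // fail loudly on overflow
    }

    public static void main(String[] args) {
        System.out.println(parseSize("1M"));   // 1048576
    }
}
```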
You usually want to grant plenty of memory to the young generation – especially when you have multiple processors – as allocation can be parallelized and each thread will get its own private piece of the eden space to work with. You generally want the young generation to have less than half the space of the tenured generation though – especially when using the Serialized collector. About 33% is usually a good number to start from. The best size will vary from application to application depending on its distribution of young vs long lived objects. You don’t want the young space to be so small that many short lived objects are getting piled into the tenured space. You also usually don’t want it to be so large that the tenured space doesn’t have enough space available to it and/or young generation collections start taking too long to complete.
Other than sizing the total heap, sizing the new generation (another name for the young generation) can be the most important piece to good performance.
|(Since 5.0) Size of the young generation at JVM startup – this is calculated automatically if you specify NewRatio|
|(Since 1.4) The largest size the young generation can grow to (unlimited if not specified)|
|Sets the new generation to a fixed size – this is not usually recommended unless you are fixing the other memory sizes as well.|
|Sets the new generation size as a ratio to the tenured generation size.|
|You can also control the sizing of the survivor spaces – in practice this is not usually very helpful though.|
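The ratio option trips people up, so here is the arithmetic as a sketch. In HotSpot, -XX:NewRatio=n means the tenured generation is n times the size of the young generation, so the young generation gets 1/(n+1) of the combined space. The helper names are mine, and the real JVM also applies alignment and rounding that I ignore here.

```java
// Sketch: how a NewRatio value splits the heap between young and tenured.
final class NewRatioSketch {

    // NewRatio = tenured / young, so young = heap / (NewRatio + 1).
    static long youngBytes(long heapBytes, int newRatio) {
        if (newRatio < 1) {
            throw new IllegalArgumentException("newRatio must be at least 1");
        }
        return heapBytes / (newRatio + 1);
    }

    static long tenuredBytes(long heapBytes, int newRatio) {
        return heapBytes - youngBytes(heapBytes, newRatio);
    }

    public static void main(String[] args) {
        long heap = 1024L * 1024L * 1024L;   // a 1 GB heap, for illustration
        System.out.println("young:   " + youngBytes(heap, 3) + " bytes");
        System.out.println("tenured: " + tenuredBytes(heap, 3) + " bytes");
    }
}
```

A ratio of 3 makes the young generation one third the size of the tenured generation, which lines up with the "about 33%" starting point mentioned earlier.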
The best sizing is usually chosen by playing with the parameters and then testing the performance of your application. Often, the JVM uses good defaults, or depending on the garbage collector in use, resizes the spaces on its own based on historical statistics.
There are a few helpful tools that give you insight into the garbage collection process.
You can use the following command line options to generate information about the garbage collection process:
|Print info about heap and gc on each collection.|
|(Since 1.4) Print additional garbage collection info.|
|(Since 1.4) Add timestamps to the garbage collection logs.|
|Specify log file.|
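Besides the log options, you can also pull basic collection counts and times from inside the application through the standard `GarbageCollectorMXBean` interface. This is a minimal sketch with a class name of my own; the collector names it reports depend on which collectors your JVM is running.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Sketch: summarize how often each collector has run and how long it took.
final class GcStatsSketch {

    static String summary() {
        StringBuilder sb = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(gc.getName())
              .append(": ")
              .append(gc.getCollectionCount())    // -1 if the count is undefined
              .append(" collections, ")
              .append(gc.getCollectionTime())     // approximate accumulated time in ms
              .append(" ms\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Churn through some short lived garbage so there is something to report.
        for (int i = 0; i < 200_000; i++) {
            byte[] junk = new byte[1024];
            junk[0] = 1;
        }
        System.out.print(summary());
    }
}
```

This is handy for a quick sanity check or for exporting to a metrics system, but it gives far less detail than the GC logs.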
There are various tools to help you decipher these logs. One is GCViewer, which only knows how to read gc logs up to Java 5.0 (though it can partially read 6.0 files). Another nice option from IBM is PMAT, and it can read Java 6 gc logs.
There is also a very cool tool called VisualGC that you can use to visually watch how objects move between spaces in real time as your application is running. This is available as a standalone application, or as a plugin for both NetBeans and VisualVM.
The Garbage Collectors
The following applies to the Sun Java implementation as well as OpenJDK.
There are three main garbage collection schemes that you should concern yourself with (much of this applies to Java 1.4, but in general, I am targeting Java 1.5 and up). These schemes are often called collectors themselves, but generally each involves two collectors – one for the old space and one for the new space. These collector schemes are often referred to by their old space collector names: the Serialized Collector, the Throughput Collector, and the Concurrent Low Pause Collector.
There is also an older incremental collector (unsupported and also called the train collector), and an incremental collection mode for the concurrent low pause collector (which I touch on below and which is generally used when only one or two CPUs are available), but I’ll leave those for you to explore on your own if you are interested.
|Cmd Line Arg||-XX:+UseSerialGC (Since 5.0)|
|New Space Collector||Serial – single threaded, stop the world, copying collector|
|Old Space Collector||Serial Old – single threaded, stop the world, mark-sweep-compact collector|
With the serialized collector, a major collection is done when the tenured space is full. This is known as a “stop the world” collection, because all application threads will be paused while the collection occurs.
This collector is best used with small applications, applications run on a single CPU machine, or applications where pause times don’t matter. This collector is relatively efficient because it does not need to communicate between threads, but you have to be willing to accept its “stop the world” pauses. Minor collections will “stop the world” as well, but are generally fairly efficient and fast.
This collector is the only one that I have seen respect -XX:MaxHeapFreeRatio, though that still only happens if a full collection is triggered. If you were trying to keep your RAM usage to a minimum, and always return as much memory as possible to the operating system, using the serialized collector and an aggressive -XX:MaxHeapFreeRatio can be a good strategy. You might want to occasionally force a full collection with System.gc() when your application is idle.
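As a rough sketch of the idle-time trick, the helper below requests a collection and reports used heap before and after. The class and method names are made up, and keep in mind that `System.gc()` is only a request: the JVM may ignore it, and the numbers are a snapshot that other threads can change at any moment.

```java
// Sketch: request a full collection and compare used heap before and after.
final class IdleGcSketch {

    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Returns {usedBefore, usedAfter}; the JVM is free to treat System.gc() as a hint.
    static long[] requestFullGc() {
        long before = usedHeapBytes();
        System.gc();
        long after = usedHeapBytes();
        return new long[] { before, after };
    }

    public static void main(String[] args) {
        long[] used = requestFullGc();
        System.out.println("used before: " + used[0] + " bytes, after: " + used[1] + " bytes");
    }
}
```

You would call something like `requestFullGc()` from whatever hook tells you the application is idle; how you detect idleness is up to your application.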
|Cmd Line Arg||-XX:+UseParallelGC (Since 1.4.1)|
|New Space Collector||Parallel Scavenge – multi threaded, stop the world, copying collector|
|Old Space Collector||Serial Old – single threaded, stop the world, mark-sweep-compact collector|
The throughput collector uses a parallel version of the young generation collector, while the tenured generation will still use the serial collector. So while a single thread will still perform collections on the tenured space, multiple threads will work together collecting the young space.
A feature called parallel compaction was added in Java 1.5 update 6 – this feature allows the throughput collector to also perform major collections in parallel. You can enable this with -XX:+UseParallelOldGC. Using this should help a lot with scalability, as you sidestep the single collection thread bottleneck on very large heaps (multi gigabyte). I’ve read this can actually lower performance on smaller heaps due to lock contention.
The throughput collector should be the default collector chosen on server class machines (in Java 1.5 and up), but there are exceptions – for example, my MacBook Pro defaults to the CMS collector. You can always override these defaults.
Throughput is usually most useful when your application has a large number of threads creating new objects, and you have more than one processor available (though more than two is best). Typically, when you have multiple threads allocating objects, you also want to increase the size of the young generation. The number of garbage collector threads will generally be equal to the number of processors you have, but you can control that number with -XX:ParallelGCThreads=n. Sometimes you will want to lower the number of threads because each will reserve a part of the tenured generation for promotions – this can cause a fragmentation effect and effectively lower the size of the tenured generation (this is generally only an issue if your application has access to many processors or cores).
The throughput collector also supports something called Ergonomics. As part of this support, you can specify various desired behaviors for your application, and the JVM will attempt to tune various settings to meet your goals.
-XX:MaxGCPauseMillis=n hints to the throughput collector that a max pause time of n milliseconds is desired. By default there is no hint. The collector will adjust the heap size and other collection parameters in an attempt to meet the hint – keep in mind that throughput may be sacrificed in the attempt to meet this goal. There is also no guarantee that the goal will be met.
You can also specify a target goal for how much time is spent in garbage collection in comparison to running your application using -XX:GCTimeRatio. By default this is set to 1% (keep in mind that these defaults tend to change from release to release).
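The GCTimeRatio value is not a percentage, which is easy to miss. In HotSpot the goal is expressed as 1 / (1 + n), so n = 99 gives the 1% mentioned above. Here is that arithmetic as a sketch, with a helper name of my own.

```java
// Sketch: translate a GCTimeRatio value into the fraction of time allowed for GC.
final class GcTimeRatioSketch {

    // The goal is gcTime / totalTime = 1 / (1 + GCTimeRatio).
    static double gcTimeFraction(int gcTimeRatio) {
        if (gcTimeRatio < 0) {
            throw new IllegalArgumentException("gcTimeRatio must not be negative");
        }
        return 1.0 / (1.0 + gcTimeRatio);
    }

    public static void main(String[] args) {
        System.out.println("GCTimeRatio=99 -> " + gcTimeFraction(99) * 100 + "% of time in GC");
        System.out.println("GCTimeRatio=19 -> " + gcTimeFraction(19) * 100 + "% of time in GC");
    }
}
```

So lowering the value from 99 to 19 tells the collector it may spend up to about 5% of total time collecting.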
With the serialized garbage collector a generation is collected when it is full (i.e., when no further allocations can be done from that generation). This is also true of the throughput collector.
|Cmd Line Arg||-XX:+UseConcMarkSweepGC (Since 1.4.1)|
|New Space Collector||Par New – multi threaded, stop the world, copying collector that works with CMS|
|Old Space Collector||Usually CMS, the mostly concurrent low pause collector – unless there is a concurrent mode failure, in which case, Serial Old - single threaded, stop the world, mark-sweep-compact collector|
Use the concurrent low pause collector when you can afford to share the processor resources with the garbage collector while the application is running. This is usually good for an application with a lot of long lived data, meaning you need a large tenured generation space. Obviously, having multiple processors is also helpful. This collector still pauses the application threads twice in a collection: once briefly at the start (when it marks objects directly accessible from root objects), and a slightly longer pause towards the middle (when it re-marks to catch what it missed because the application kept running during concurrent marking). The rest of the collection is done concurrently using one of the available processors (or one thread). If this collector cannot complete collecting the tenured space before it is full, all threads will be paused and a full collection performed. This is known as a concurrent mode failure and likely means you need to adjust the concurrent collection parameters.
This collector is used for the tenured generation, and does the collection concurrently with the execution of the application. This collector can also be paired with a parallel version of the young generation collector (-XX:+UseParNewGC).
Note that -XX:+UseParallelGC (the throughput collector) should not be used with -XX:+UseConcMarkSweepGC, and the JVM will fail on startup if you try this with most modern JVMs. Same with -XX:+UseParallelOldGC.
The concurrent low pause collector keeps statistics so that it can best guess when to start collecting (so that it finishes before the tenured space is full). It will also start collecting when the tenured space hits a set percentage of what’s available. You can manually set this with -XX:CMSInitiatingOccupancyFraction=n. The default for this setting varies across JVMs. I’ve read that the default for 1.5 was 68%, while the default for 1.6 is 92%. You can lower this if needed to ensure that the collection is kicked off sooner, and then you will be more likely to finish the collection before the tenured space is full.
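The occupancy trigger boils down to a simple threshold check. The sketch below shows only that comparison; it is not how the JVM implements it (the real collector also weighs its own statistics, as described above), and the method name is hypothetical.

```java
// Sketch: would a concurrent cycle be started at this tenured occupancy?
final class CmsTriggerSketch {

    // True once used / capacity reaches the initiating occupancy percentage.
    static boolean shouldStartCycle(long usedBytes, long capacityBytes, int initiatingOccupancyPercent) {
        if (capacityBytes <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        // Integer comparison avoids rounding; fine for illustration-sized values.
        return usedBytes * 100L >= (long) initiatingOccupancyPercent * capacityBytes;
    }

    public static void main(String[] args) {
        long capacity = 1000L;
        System.out.println(shouldStartCycle(700L, capacity, 68));   // true
        System.out.println(shouldStartCycle(700L, capacity, 92));   // false
    }
}
```

With a threshold of 68 the cycle starts while there is still about a third of the tenured space left to absorb promotions; with 92 there is far less headroom, which is why lowering the value can help avoid concurrent mode failures.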
The concurrent low pause collector can also be used in an incremental mode that I will not go into here. This mode causes the low pause collector to occasionally yield the processor used for parallel collection back to the application, and thereby lessen its impact on application performance.
The Parallel Young Generation Collector
This collector is much like the throughput collector in that it collects the young generation in parallel. Most of what applies to the throughput collector also applies to this collector, however a different implementation is used that allows this collector to work in conjunction with the concurrent low pause collector, unlike the throughput collector. Despite some Sun/Oracle literature indicating this is off by default, it does seem to be on by default when using CMS in at least Java 6. You can disable it with:
The flip side of that coin is that while the throughput garbage collector (-XX:+UseParallelGC) can be used with adaptive sizing (-XX:+UseAdaptiveSizePolicy), the parallel young generation collector (-XX:+UseParNewGC) cannot.
-XX:+UseAdaptiveSizePolicy records statistics about GC times, allocation rates, and free space, and then sizes the young and tenured generations to best fit those statistics. This is for use with the throughput collector and is on by default.
Note: this article is biased towards server applications and the -server hotspot vm.
Usually you just want to start with the Parallel (throughput) collector. It’s the one that has ergonomics, and it will automatically adjust key settings so that most server apps will do just fine. This is the default collector on most server class systems. In general, you do not need to change any garbage collection settings until you have determined you have a garbage collection issue to solve.
When you have to confront very large heaps, the Parallel collector can start to break down – it collects the tenured space using a stop the world collection, meaning your app is frozen while the collection happens. So when you find that the Parallel collector is just not cutting it, even when using UseParallelOldGC, you might try the mostly Concurrent Low-Pause Collector. It will collect as your application is running using a thread on the side, with two much shorter stop-the-world pauses. Overall, the CMS collector is slower in terms of throughput – but your application will likely be frozen less often.
Ergonomics do not apply here, so you are on your own for coming up with good settings if the defaults don’t turn out to be a good fit – but you can often remove long “the world is stopped” pauses with this collector.
The hope is that it is just going to make sense to always use the G1 collector in the future – it attempts to offer the best of both worlds of the throughput and mostly concurrent low pause collectors.
The Garbage First (G1) Collector
The Garbage First Collector is a new garbage collector that intends to rule them all. It is available in Sun Java 6 update 14 as well as recent versions of OpenJDK 6 and early versions of OpenJDK 7. Eventually I plan to write more about this collector. Briefly: the G1 collector should combine the best of both the throughput and mostly concurrent low pause collectors. It uses new strategies to minimize stop the world pauses and maintain high throughput on multiprocessor systems with very large heaps.
Try this collector with: