This follows on from my previous two garbage collection blog posts:
Concurrent Mark Sweep
The parallel garbage collectors in Hotspot are designed to minimise the amount of time that the application spends undertaking garbage collection, which is termed throughput. This isn't an appropriate tradeoff for all applications - some require individual pauses to be short as well, which is known as a latency requirement.
The Concurrent Mark Sweep (CMS) collector is designed to be a lower latency collector than the parallel collectors. The key part of this design is trying to do part of the garbage collection at the same time as the application is running. This means that when the collector needs to pause the application's execution it doesn't need to pause for as long.
At this point you're probably thinking 'don't parallel and concurrent mean something fairly similar?' Well in the context of GC Parallel means "uses multiple threads to perform GC at the same time" and Concurrent means "the GC runs at the same time as the application its collecting".
Young Generational Collection
The young gen collector in CMS is called ParNew and it actually uses the same basic algorithm as the Parallel Scavenge collector in the parallel collectors, that I described previously.
This is still a different collector in terms of the hotspot codebase to Parallel Scavenge though because it needs to interleave its execution with the rest of CMS, and also implements a different internal API to Parallel Scavenge. Parallel Scavenge makes assumptions about which tenured collectors it works with - specifically ParOld and SerialOld. Bare in mind this also means that the young generational collector is stop the world.
As with the ParOld collector the CMS tenured collector uses a mark and sweep algorithm, in which live objects are marked and then dead objects are deleted. Deleted is really a strange term when it comes to memory management. The collector isn't actually deleting objects in the sense of blanking memory, its merely returning the memory associated with that object to the space that the memory system can allocate from - the freelist. Even though its termed a concurrent mark and sweep collector, not all phases run concurrently with the application's execution, two of them stop the world and four run concurrently.
How is GC triggered?
In ParOld garbage collection is triggered when you run out of space in the tenured heap. This approach works because ParOld simply pauses the application to collect. In order for the application to continue operating during a tenured collection, the CMS collector needs to start collecting when there is a still enough working space left in tenured.
So CMS starts based upon how full up tenured is - the idea is that the amount of free space left is your window of opportunity to run GC. This is known as the initiating occupancy fraction and is described in terms of how full the heap is, so a fraction of 0.7 gives you a window of 30% of your heap to run the CMS GC before you run out of heap.
Once the GC is triggered, the CMS algorithm consists of a series of phases run in sequence.
- Initial Mark - Pauses all application threads and marks all objects directly reachable from root objects as live. This phase stops the world.
- Concurrent Mark - Application threads are restarted. All live objects are transitively marked as reachable by following references from the objects marked in the initial mark.
- Concurrent Preclean - This phase looks at objects which have been updated or promoted during the concurrent mark or new objects that have been allocated during the concurrent mark. It updates the mark bit to denote whether these objects are live or dead. This phase may be run repeatedly until there is a specified occupancy ratio in Eden.
- Remark Since some objects may have been updated during the preclean phase its still necessary to do stop the world in order to process the residual objects. This phase does a retrace from the roots. It also processes reference objects, such as soft and weak references. This phase stops the world.
- Concurrent Sweep - This looks through the Ordinary Object Pointer (OOP) Table, which references all objects in the heap, and finds the dead objects. It then re-adds the memory allocated to those objects to its freelist. This is the list of spaces from which an object can be allocated.
- Concurrent Reset - Reset all internal data structures in order to be able to run CMS again in future.
Theoretically the objects marked during the preclean phase would get looked at during the next phase - remark - but the remark phase is stop the world, so the preclean phase exists to try and reduce remark pauses by doing part of the remark work concurrently. When CMS was originally added to HotSpot this phase didn't exist at all. It was added in Java 1.5 in order to address scenarios when a young generation scavenging collection causes a pause and is immediately followed by a remark. This remark also causes a pause, which combine to make a more painful pause. This is why remarks are triggered by an occupancy threshold in Eden - the goal is to schedule the remark phase halfway between young gen pauses.
The remark phases are also pausing, whilst the preclean isn't, which means that having precleans reduces the amount of time spent paused in GC.
Concurrent Mode Failures
Sometimes CMS is unable to meet the needs of the application and a stop-the-world Full GC needs to be run. This is called a concurrent mode failure, and usually results in a long pause. A concurrent mode failure happens when there isn't enough space in tenured to promote an object. There are two causes for this:
- An object is promoted that is too large to fit into any contiguous space in memory.
- There isn't enough space in tenured to account for the rate of live objects being promoted.
This might happen because the concurrent collection is unable to free space fast enough given the object promotion rates or because the continued use of the CMS collector has resulted in a fragmented heap and there's no individual space large enough to promote an object into. In order to properly 'defrag' the tenured heap space a full GC is required.
CMS doesn't collect permgen spaces by default, and requires the
?XX:+CMSClassUnloadingEnabled flag enabled in order to do so. If, whilst using CMS, you run out of permgen space without this flag switched on it will trigger a Full GC. Furthermore permgen space can hold references into normal heap via things like classloaders, which means that until you collect Permgen you may be leaking memory in regular heap. In Java 7 String constants from class files are also allocated in regular heap, instead of permgen, which reduces permgen consumption, but also adds to the set of object references coming into regular heap from permgen.
At the end of a CMS collection its possible for some objects to not have been deleted - this is called Floating Garbage. This happens when objects become de-referenced since the initial mark. The concurrent preclean and the remark phase ensure that all live objects are marked by looking at objects which have been created, mutated or promoted. If an object has become dereferenced between the initial mark and the remark phase then it would require a complete retrace of the entire object graph in order to find all dead objects. This is obviously very expensive, and the remark phase must be kept short since its a pausing phase.
This isn't necessarily a problem for users of CMS since the next run of the CMS collector will clean up this garbage.
Concurrent Mark and Sweep reduces the pause times observed in the parallel collector by performing some of the GC work at the same time as the application runs. It doesn't entirely remove the pauses, since part of its algorithm needs to pause the application in order to execute.
It took me a little longer than I had hoped to get round to writing this blog post - but if you want to know when my next post is published simply enter your email in the top right hand corner of this blog to subscribe by email.