java -XX:+UnlockDiagnosticVMOptions -XX:+UnlockExperimentalVMOptions
After some post-processing with vi, awk, diff, and grep I got 115 new options broken into three lists:
- Product options: 39
- Experimental options: 21
- Diagnostic options: 55
There are too many for one blog entry, so I’ll limit the scope of this post to the 39 new product options, which are likely to be the most useful.
Ahead Of Time Compilation
Ahead of Time (AOT) compilation is included (but not supported) as an experimental feature in JDK 9. Many people think that static compilation of Java code would solve the performance problems of bytecode interpretation and adaptive Just-in-time (JIT) compilation. On its own, static compilation often provides significantly worse performance than properly warmed-up JIT compiled code because of the issues associated with dynamic class loading and class initialization, the restrictions on method inlining, the inability to use speculative optimisations, and so on. That said, there are situations where AOT compilation can help with the JVM startup time associated with moving from interpreted to JIT compiled mode (there is a simpler solution to this, which is to use the ReadyNow! feature of Zing). The AOT designers are also doing some interesting things with using statically compiled code at JVM startup and then having the JIT provide more optimised code as the application runs.
The AOT code generated by the jaotc tool shipped with JDK 9 can either be tiered or non-tiered. Non-tiered code behaves the same way as statically compiled C and C++ code; the code is static and stays that way. Using tiered code enables the JVM to take advantage of running the statically compiled code when it starts but to also record profiling data in the same way it does for interpreted bytecodes. The JVM is then able to use the JIT to recompile the code using more sophisticated optimisations as it has full knowledge of the runtime. The JIT compilers used by the OpenJDK JVM (C1 and C2) use five tiers; the AOT system uses tier three, which is the C1 compiler with profiling information. Several of the new options control the AOT and Tier 3 JIT interaction.
This is an experimental feature in JDK 9 and based on the Graal compiler project. Currently, the java.base module is the only one of the core modules that is supported by this feature. More detail of AOT compilation can be found in JEP 295.
The new command line options to support this are:
- AOTLibrary (String): Which AOT Libraries to use. This can be a comma separated list for multiple libraries.
- UseAOT (+/-): Whether to use AOT compiled files. By default this is on but, since the JDK does not ship with any AOT code, the JVM will turn AOT use off unless AOT code is provided.
- PrintAOT (+/-): Print used AOT classes and methods
- Tier3AOTBackEdgeThreshold (int): Back-edge threshold at which tier 3 on-stack replacement compilation is invoked. A back-edge is a branch backwards in the code which typically denotes a loop construct.
- Tier3AOTCompileThreshold (int): Threshold at which tier 3 compilation is invoked (invocation minimum must be satisfied)
- Tier3AOTInvocationThreshold (int): Compile if number of method invocations crosses this threshold if coming from AOT
- Tier3AOTMinInvocationThreshold (int): Minimum invocation to compile at tier 3 if coming from AOT
There is a bit more detail on these in the excellent presentation by Dmitry Chuyko of Oracle’s AOT development team.
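To make the relationship between the four Tier3AOT* thresholds a little more concrete, here is a minimal sketch of how such a compile decision could be expressed. It is based only on the option descriptions above, plus one assumption: that the compile threshold is compared against invocations plus back-edges, as in HotSpot's general tiered policy. The class name, method names and the numbers in `main` are hypothetical and are not the JVM's defaults or its actual source.

```java
/** Simplified sketch of a tier 3 compile decision for AOT code; not HotSpot source. */
public final class Tier3AotPolicySketch {
    private final long invocationThreshold;    // models Tier3AOTInvocationThreshold
    private final long minInvocationThreshold; // models Tier3AOTMinInvocationThreshold
    private final long compileThreshold;       // models Tier3AOTCompileThreshold
    private final long backEdgeThreshold;      // models Tier3AOTBackEdgeThreshold

    public Tier3AotPolicySketch(long invocationThreshold, long minInvocationThreshold,
                                long compileThreshold, long backEdgeThreshold) {
        this.invocationThreshold = invocationThreshold;
        this.minInvocationThreshold = minInvocationThreshold;
        this.compileThreshold = compileThreshold;
        this.backEdgeThreshold = backEdgeThreshold;
    }

    /** Whole-method compile: enough invocations alone, or the minimum plus enough total work. */
    public boolean shouldCompileMethod(long invocations, long backEdges) {
        return invocations >= invocationThreshold
                || (invocations >= minInvocationThreshold
                    && invocations + backEdges >= compileThreshold);
    }

    /** On-stack replacement: triggered by a hot loop, counted in back-edges. */
    public boolean shouldCompileLoop(long backEdges) {
        return backEdges >= backEdgeThreshold;
    }

    public static void main(String[] args) {
        // Illustrative values only (assumption), chosen to exercise each branch.
        Tier3AotPolicySketch policy = new Tier3AotPolicySketch(10_000, 1_000, 15_000, 120_000);
        System.out.println(policy.shouldCompileMethod(10_000, 0));   // invocation threshold reached
        System.out.println(policy.shouldCompileMethod(500, 100_000)); // minimum invocations not met
        System.out.println(policy.shouldCompileLoop(120_000));        // hot loop
    }
}
```

The point of the sketch is that a method called only a few times never qualifies through the compile threshold, however long its loops run; long-running loops are handled by the separate back-edge threshold instead.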
Garbage Collection Related
- TraceOldGenTime (+/-): Trace the cumulative time for collection of the old generation.
- TraceYoungGenTime (+/-): Trace the cumulative time for collection of the young generation.
- HeapSearchSteps (int): Heap allocation steps through preferred address regions to find where it can allocate the heap. This sets the number of steps to take per region.
- ShrinkHeapInSteps (+/-): When disabled, informs the GC to shrink the Java heap directly to the target size at the next full GC rather than requiring smaller steps during multiple full GCs.
- G1UseAdaptiveIHOP (+/-): For the G1 collector use an adaptive policy for the initiating heap occupancy percentage value.
- PreTouchParallelChunkSize (int): The size of the chunk of memory for each thread when using parallel memory pre-touch. Pre-touch fills pages employed by the heap with zeros before they are needed (rather than when they are required). This can improve the performance of the allocator, although it can degrade startup time (especially with a large heap).
- CompileThresholdScaling (double): Factor to control when the first compilation happens (both with and without tiered compilation):
  - values greater than 1.0 delay counter overflow
  - values between 0 and 1.0 rush counter overflow
  - a value of 1.0 leaves compilation thresholds unchanged
  - a value of 0.0 is equivalent to -Xint

  This flag can be set as a per-method option. If a value is specified for a method, compilation thresholds for that method are scaled by both the value of the global flag and the value of the per-method flag.
- UseCodeAging (+/-): Insert a counter to measure the age of a method when it is compiled.
- UseFMA (+/-): Control whether Fused Multiply Add (FMA) instructions are used when available. An FMA instruction computes a × b + c in a single step with a single rounding, rather than as a multiply followed by an add. On x86 these instructions are part of the SIMD extensions and operate on the wide vector registers. FMA 3 is supported on Piledriver and later processors from AMD and Haswell and later processors from Intel. FMA 4 is supported on Bulldozer, Piledriver, Steamroller and Excavator processors from AMD.
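As a small illustration of what "fused" means for UseFMA, the sketch below compares an ordinary multiply-then-add with `Math.fma`, which was added to the Java API in JDK 9. The class name and the input values are my own choices for the demonstration. My understanding, which you should treat as an assumption, is that the flag only affects whether the JIT can use the hardware instruction for `Math.fma`; the numerical result is defined by the API and does not depend on the flag.

```java
/** Contrasts a*b + c computed with two roundings against a single fused rounding. */
public final class FmaDemo {
    /** Multiply, round, then add and round again. */
    public static double unfused(double a, double b, double c) {
        return a * b + c;
    }

    /** Exact a*b + c, rounded once at the end (Math.fma, available since Java 9). */
    public static double fused(double a, double b, double c) {
        return Math.fma(a, b, c);
    }

    public static void main(String[] args) {
        // a*a is exactly 1 + 2^-29 + 2^-60; the 2^-60 term is lost when the
        // product is rounded to a double on its own.
        double a = 1.0 + 0x1.0p-30;
        double c = -(1.0 + 0x1.0p-29);
        System.out.println("unfused: " + unfused(a, a, c));
        System.out.println("fused:   " + fused(a, a, c));
    }
}
```

With these inputs the unfused version loses the low-order term of the product before the addition, while the fused version keeps it, so the two results differ.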
C2 JIT Specific
These all look quite esoteric. Several relate to the use of SuperWords, which is the idea of vector operations using extremely wide registers and extensions to the x86 instruction set such as AVX, AVX 2 and AVX 512. Pete Lawrey has written more about this in a blog post.
- ArrayCopyLoadStoreMaxElem (int): The maximum number of array copy elements that will be inlined as a sequence of loads and stores.
- LoopPercentProfileLimit (int): Unroll loop bodies with this percentage node count of the profile limit.
- AllowVectorizeOnDemand (+/-): Allow vectorization (SIMD) of loops.
- UseCMoveUnconditionally (+/-): Use (x86) conditional move instructions (scalar and vector) unconditionally, skipping the check for whether their use would be profitable.
- OptoRegScheduling (+/-): This affects the way the C2 (optimizing or ‘opto’) JIT compiler allocates registers. This option enables instruction scheduling before register allocation, which is the opposite of the existing OptoScheduling option that enables instruction scheduling after register allocation.
- DoReserveCopyInSuperWord (+/-): Create reserve copy of graph in SuperWord.
- SuperWordLoopUnrollAnalysis (+/-): Map the number of unrolls for the main loop via Superword Level Parallelism analysis.
- SuperWordReductions (+/-): Enable reductions support when using SuperWords.
- RestrictReservedStack (+/-): There is an annotation, @ReservedStackAccess, which can mark a method as especially sensitive to stack overflows. The JVM can use this information to grant access to additional stack space. This command line option restricts @ReservedStackAccess only to trusted classes.
- StackReservedPages (int): Number of reserved zone pages of size 4KB (reserved for annotated methods). If pages are bigger, the reserved zone is aligned up. This has been added as part of JEP 270.
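For readers unfamiliar with what the SuperWord options earlier in this list are aimed at, here is a sketch of the two loop shapes they concern: an element-wise loop with independent iterations, and a reduction where every iteration feeds one accumulator (the case SuperWordReductions refers to). The class and method names are hypothetical. Whether C2 actually vectorizes either loop depends on the CPU, the JVM build and how hot the code gets, so this shows candidates only and guarantees nothing.

```java
/** Loop shapes that an auto-vectorizer such as C2's SuperWord pass may consider. */
public final class SuperWordCandidates {
    /** Element-wise add: iterations are independent, so several can share one SIMD instruction. */
    public static void add(int[] a, int[] b, int[] out) {
        for (int i = 0; i < out.length; i++) {
            out[i] = a[i] + b[i];
        }
    }

    /** Sum reduction: every iteration updates the same accumulator. */
    public static int sum(int[] a) {
        int total = 0;
        for (int i = 0; i < a.length; i++) {
            total += a[i];
        }
        return total;
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3, 4};
        int[] b = {10, 20, 30, 40};
        int[] out = new int[a.length];
        add(a, b, out);
        System.out.println(java.util.Arrays.toString(out));
        System.out.println(sum(out));
    }
}
```

Running with -XX:-SuperWordReductions and comparing timings of a long-running `sum` is one way to see whether the option matters on a given machine, although a proper benchmark harness is needed for trustworthy numbers.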
Segmented Code Cache
This is a new feature in JDK 9 and is described in JEP 197.
- SegmentedCodeCache (+/-): Use a segmented code cache.
- ProfiledCodeHeapSize (int): Size of the code heap containing profiled methods (in bytes)
- NonProfiledCodeHeapSize (int): Size of the code heap containing non-profiled methods (in bytes)
- NonNMethodCodeHeapSize (int): Size of the code heap containing non-method code, such as buffers, adapters and run-time stubs (in bytes)
- StartAggressiveSweepingAt (int): Force stack scanning of active methods to aggressively remove unused code when only the given percentage of the code cache is free. For a segmented code cache, it is the percentage of the non-profiled heap and for a non-segmented code cache, it is the percentage of the total code cache.
- PrintFlagsRanges (+/-): Print VM flags and their ranges and exit VM
- CreateCoredumpOnCrash (+/-): Create core/mini dump on VM fatal error.
- ErrorLogTimeout (int): Timeout, in seconds, to limit the time spent on writing a log in the case of a crash.
- CompactStrings (+/-): Enable Strings to use single byte chars in backing store (JEP 254)
- ExecutingUnitTests (+/-): Whether the JVM is running unit tests or not
- EnableDynamicAgentLoading (+/-): Allow tools to load agents with the attachment mechanism. By default, this option is on. Turning it off prevents tools from loading a dynamic agent into a running JVM; if you try to load an agent while it is off you will get a message telling you to turn it on. Being able to switch this off is a security improvement.
- SharedSymbolTableBucketSize (int): Average number of symbols per bucket in shared table. The symbol table in the JVM is stored as a hash table and to improve efficiency when shared between JVMs the bucket size can be changed.
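If you want to check from inside a running application whether a given JVM knows one of these options, and what value it has, one approach is the HotSpot diagnostic MXBean. This is a minimal sketch assuming a HotSpot-based JVM that provides `com.sun.management.HotSpotDiagnosticMXBean`. The class name `VmFlagLookup` and its `lookup` method are hypothetical helpers, and the flag names in `main` are simply taken from the list above.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;
import java.util.Optional;

/** Looks up the current value of a JVM flag by name, if this JVM recognises it. */
public final class VmFlagLookup {
    /** Returns the flag's value as a string, or empty if the flag or the MXBean is unavailable. */
    public static Optional<String> lookup(String flagName) {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            if (bean == null) {
                return Optional.empty();
            }
            VMOption option = bean.getVMOption(flagName);
            return Optional.of(option.getValue());
        } catch (IllegalArgumentException e) {
            // Thrown when the named option does not exist in this JVM.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        String[] names = {"SegmentedCodeCache", "CompactStrings", "EnableDynamicAgentLoading"};
        for (String name : names) {
            System.out.println(name + " = " + lookup(name).orElse("(not available in this JVM)"));
        }
    }
}
```

Because options come and go between releases, treating a missing flag as an empty result rather than an error keeps the same code usable across JDK versions.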
Hopefully this will be of some use, or at least an interesting look at some of the new options available in JDK 9.
Early access versions of the Zulu builds of OpenJDK 9 are available for your testing pleasure.