Class Sharing in Eclipse OpenJ9: How to Improve Memory, Performance (Part 2)
Learn how to reduce your memory footprint and improve startup performance in this tutorial on class sharing in Eclipse OpenJ9.
Memory footprint and startup time are important performance metrics for a Java virtual machine (JVM). The memory footprint becomes especially important in the cloud environment since you pay for the memory that your application uses. In this tutorial, we will show you how to use the shared classes feature in Eclipse OpenJ9 to reduce the memory footprint and improve your JVM startup time.
Runtime Bytecode Modification
Runtime bytecode modification is a popular way of instrumenting behavior into Java classes. It can be performed using the JVM Tools Interface (JVMTI) hooks (details can be found here). Alternatively, the class bytes can be replaced by the class loader before the class is defined. This presents an extra challenge to class sharing, as one JVM may cache instrumented bytecode that should not be loaded by another JVM sharing the same cache.
However, because of the dynamic nature of the OpenJ9 Shared Classes implementation, multiple JVMs using different types of modification can safely share the same cache. Indeed, if the bytecode modification is expensive, caching the modified classes has an even greater benefit, as the transformation only needs to be performed once. The only provision is that the bytecode modifications should be deterministic and predictable. Once a class has been modified and cached, it cannot be changed further.
Modified bytecode can be shared by using the modified=<context> sub-option to -Xshareclasses. The context is a user-defined name that creates a logical partition in the shared cache into which all the classes loaded by that JVM are stored. All JVMs using that particular modification should use the same modification context name, so that they all load classes from the same shared cache partition. Any JVM using the same shared cache without the modified sub-option finds and stores vanilla classes as normal.
If a JVM is running with a JVMTI agent that has registered to modify class bytes and the modified sub-option is not used, class sharing with other vanilla JVMs or JVMs using other agents is still managed safely, albeit with a small performance cost due to extra checking. Thus, it is always more efficient to use the modified sub-option.
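As a rough illustration of how a launcher might pass the same modification context to every JVM that uses a given agent, the sketch below assembles the -Xshareclasses option string. The class name, the cache name "demo", and the context name "myAgent" are hypothetical; the agent option itself (for example -agentlib or -javaagent) is left out and would still need to be supplied.

```java
import java.util.ArrayList;
import java.util.List;

public class ModifiedContextArgs {

    /** Builds the -Xshareclasses option; a null or empty context means vanilla classes. */
    public static String shareOption(String cacheName, String context) {
        StringBuilder opt = new StringBuilder("-Xshareclasses:name=").append(cacheName);
        if (context != null && !context.isEmpty()) {
            // JVMs that apply the same modification should pass the same context name
            opt.append(",modified=").append(context);
        }
        return opt.toString();
    }

    /** Assembles a command line for a child JVM (agent options are not included). */
    public static List<String> command(String javaBin, String cacheName,
                                       String context, String mainClass) {
        List<String> cmd = new ArrayList<>();
        cmd.add(javaBin);
        cmd.add(shareOption(cacheName, context));
        cmd.add(mainClass);
        return cmd;
    }

    public static void main(String[] args) {
        // Instrumented JVMs share one partition; the vanilla JVM uses the default one
        System.out.println(String.join(" ", command("java", "demo", "myAgent", "ClassLoadStress")));
        System.out.println(String.join(" ", command("java", "demo", null, "ClassLoadStress")));
    }
}
```

Both command lines name the same cache; only the presence of the modified sub-option decides which partition a JVM reads from and writes to.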
Note that this is only possible because the use of the JVMTI API tells the JVM when bytecode modification is occurring. Redefined and retransformed classes are not stored in the cache. The JVM stores vanilla class byte data in the shared cache, which allows the JVMTI ClassFileLoadHook event to be triggered for all classes loaded from the cache. Therefore, if a custom class loader modifies class bytes before defining the class without using JVMTI and without using the modified sub-option, the classes being defined are assumed to be vanilla and could be incorrectly loaded by other JVMs.
For more detailed information on sharing modified bytecode, see here.
Using the Helper API
The Shared Classes Helper API is provided by OpenJ9 so that developers can integrate class sharing support into custom class loaders. This is only required for class loaders that do not extend java.net.URLClassLoader, because class loaders that do extend it automatically inherit class-sharing support.
A comprehensive tutorial on the Helper API is beyond the scope of this article, but we will provide a general overview. If you'd like to know more details, you can find the Helper API implementation on GitHub.
The Helper API: a Summary
All the Helper API classes are in the com.ibm.oti.shared package. Each class loader wishing to share classes must get a SharedClassHelper object from a SharedClassHelperFactory. Once created, the SharedClassHelper belongs to the class loader that requested it and can only store classes defined by that class loader. The SharedClassHelper gives the class loader a simple API for finding and storing classes in the shared cache. If the class loader is garbage collected, its SharedClassHelper is also garbage collected.
Using the SharedClassHelperFactory
SharedClassHelperFactory is a singleton that is obtained using the static method com.ibm.oti.shared.Shared.getSharedClassHelperFactory(), which returns a factory if class sharing is enabled in the JVM; otherwise, it returns null.
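The null check matters in practice, and code that must also run on JVMs without the com.ibm.oti.shared package needs one more guard. The following is a minimal sketch, assuming a reflective lookup so that it compiles and runs on any JVM; the class and method names (SharedFactoryProbe, lookupFactory, describe) are hypothetical, and only the static method named above is taken from the Helper API.

```java
import java.lang.reflect.Method;

public class SharedFactoryProbe {

    /**
     * Returns the helper factory as an Object, or null when the Helper API
     * is missing from this JVM or class sharing is not enabled.
     */
    public static Object lookupFactory() {
        try {
            Class<?> shared = Class.forName("com.ibm.oti.shared.Shared");
            Method getter = shared.getMethod("getSharedClassHelperFactory");
            return getter.invoke(null); // static method, so no receiver
        } catch (ReflectiveOperationException | LinkageError e) {
            return null; // not an OpenJ9 JVM, or the API is not accessible
        }
    }

    /** Small helper so callers can report the outcome. */
    public static String describe(Object factory) {
        return factory == null
                ? "class sharing unavailable"
                : "factory: " + factory.getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println(describe(lookupFactory()));
    }
}
```

A class loader built on this would fall back to ordinary loading whenever lookupFactory() returns null.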
Using the SharedClassHelpers
There are three different types of SharedClassHelper that can be returned by the factory. Each is designed for use by a different type of class loader:
SharedClassURLClasspathHelper: This helper is designed for use by class loaders that have the concept of a URL classpath. Classes are stored and found in the shared cache using the URL classpath array. The URL resources in the classpath must be accessible on the filesystem for the classes to be cached. This helper also carries some restrictions on how the classpath can be modified during the lifetime of the helper.
SharedClassURLHelper: This helper is designed for use by class loaders that can load classes from any URL. The URL resources given must be accessible on the filesystem for the classes to be cached.
SharedClassTokenHelper: This helper effectively turns the shared class cache into a simple hash table — classes are stored against string key tokens that are meaningless to the shared cache. This is the only helper that doesn't provide dynamic update capability because the classes stored have no filesystem context associated with them.
SharedClassHelper has two basic methods, the parameters of which differ slightly between helper types:
byte[] findSharedClass(String classname...) should be called after the class loader has asked its parent for the class (if one exists). If findSharedClass() does not return null, the class loader should call defineClass() on the byte array returned. Note that this function returns a special cookie for defineClass(), not actual class bytes, so the bytes cannot be instrumented.
boolean storeSharedClass(Class clazz...) should be called immediately after a class has been defined. The method returns true if the class was successfully stored and false otherwise.
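The call order described above (parent first, then the cache, then a normal load followed by a store) can be sketched as a custom class loader. This is an illustration under stated assumptions rather than the Helper API itself: TokenCache is a hypothetical stand-in modeled on the token-style find/store pair, and the directory-based lookup is invented for the example. A real loader would wrap a helper obtained from the factory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class CachingLoaderSketch extends ClassLoader {

    /** Hypothetical stand-in for a token-style helper; NOT the OpenJ9 type. */
    public interface TokenCache {
        byte[] find(String token, String className);
        boolean store(String token, Class<?> clazz);
    }

    private final Path classDir;    // assumed location of .class files
    private final TokenCache cache; // null when class sharing is unavailable
    private final String token;

    public CachingLoaderSketch(ClassLoader parent, Path classDir,
                               TokenCache cache, String token) {
        super(parent);
        this.classDir = classDir;
        this.cache = cache;
        this.token = token;
    }

    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        // ClassLoader.loadClass() only calls findClass() after the parent
        // has failed, so the "ask the parent first" rule is already met.
        if (cache != null) {
            byte[] cached = cache.find(token, name);
            if (cached != null) {
                // Pass the returned array straight to defineClass()
                return defineClass(name, cached, 0, cached.length);
            }
        }
        byte[] bytes;
        try {
            Path file = classDir.resolve(name.replace('.', '/') + ".class");
            bytes = Files.readAllBytes(file);
        } catch (IOException e) {
            throw new ClassNotFoundException(name, e);
        }
        Class<?> clazz = defineClass(name, bytes, 0, bytes.length);
        if (cache != null) {
            cache.store(token, clazz); // store immediately after defining
        }
        return clazz;
    }

    /** Convenience check used by the demo below. */
    public static boolean canLoad(ClassLoader loader, String name) {
        try {
            loader.loadClass(name);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = new CachingLoaderSketch(
                ClassLoader.getSystemClassLoader(), Paths.get("."), null, "demo");
        System.out.println(canLoad(loader, "java.lang.String")); // served by the parent
        System.out.println(canLoad(loader, "no.such.Type"));     // not found anywhere
    }
}
```

Passing null for the cache keeps the loader usable when sharing is disabled, which mirrors the factory returning null.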
When deploying class sharing with your application, you need to consider factors such as security and cache tuning. These considerations are briefly summarized below.
By default, the shared caches are created with user-level security, so only the user that created the shared cache can access it. For this reason, the default cache name is different for each user so that clashes are avoided. On UNIX, the groupAccess sub-option gives access to all users in the primary group of the user that created the cache.
In addition to this, if there is a SecurityManager installed, a class loader can only share classes if it has been explicitly granted the correct permissions. Refer to the user guide here for more details on setting these permissions.
Garbage Collection and Just-in-time Compilation
Running with class sharing enabled has no effect on class garbage collection (GC). Classes and class loaders are still garbage collected, just as they are in the non-shared case. Also, there are no restrictions placed on GC modes or configurations when using class sharing.
It is not possible to cache just-in-time (JIT) compiled code in the class cache. The AOT code in the shared cache is also subject to JIT compilation, and it affects how and when a method is JIT-compiled. In addition, JIT hints and profile data can be stored in the shared cache. You can use the -Xscminjitdata and -Xscmaxjitdata options to set the minimum and maximum shared cache space for such JIT data.
Cache Size Limits
The current maximum theoretical cache size is 2GB. The cache size is limited by factors such as available system memory, available virtual address space, available disk space, etc. More details can be found here.
To practically demonstrate the benefits of class sharing, this section provides a simple graphical demo. The source and binaries are available on GitHub.
The demo app works on Java 8. It looks for the jre\lib directory and opens each JAR, calling Class.forName() on every class it finds. This causes about 16,000 classes to be loaded into the JVM. The demo reports on how long the JVM takes to load the classes. This is a slightly contrived example, but it effectively demonstrates the benefits of class sharing. Let's run the application and see the results.
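To make the approach concrete, here is a minimal sketch of the same idea for a single JAR: walk the entries, turn each .class entry into a class name, and ask the JVM to load it. This is not the demo's source code; the class name JarClassLoadSketch and its methods are hypothetical, and the real demo adds a GUI and scans every JAR in the directory.

```java
import java.io.IOException;
import java.util.Enumeration;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

public class JarClassLoadSketch {

    /** Maps "java/lang/String.class" to "java.lang.String"; null for non-class entries. */
    public static String toClassName(String entryName) {
        if (!entryName.endsWith(".class")) {
            return null;
        }
        String base = entryName.substring(0, entryName.length() - ".class".length());
        return base.replace('/', '.');
    }

    /** Tries to load every class listed in the JAR and returns how many succeeded. */
    public static int loadAll(String jarPath) throws IOException {
        int loaded = 0;
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                String name = toClassName(entries.nextElement().getName());
                if (name == null) {
                    continue; // resources, manifests, directories
                }
                try {
                    // initialize=false: load the class without running static initializers
                    Class.forName(name, false, ClassLoader.getSystemClassLoader());
                    loaded++;
                } catch (ClassNotFoundException | LinkageError e) {
                    // classes not visible to this loader or with missing dependencies are skipped
                }
            }
        }
        return loaded;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java JarClassLoadSketch <path-to-jar>");
            return;
        }
        long start = System.nanoTime();
        int count = loadAll(args[0]);
        long millis = (System.nanoTime() - start) / 1_000_000;
        System.out.println("Loaded " + count + " classes in " + millis + " ms");
    }
}
```

Only classes visible to the system class loader are counted, so pointing it at a JAR outside the class path will report few or no loads.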
2. Download shcdemo.jar from GitHub.
3. Run the test a couple of times without class sharing to warm up the system disk cache, using the command in Listing 11:
Listing 11. Warming up the Disk Cache
C:\OpenJ9>wa6480_openj9\j2sdk-image\bin\java -Xshareclasses:none -cp shcdemo.jar ClassLoadStress
When the window in Figure 1 appears, press the button. The app will load the classes.
Figure 1. Press the button
Once the classes have loaded, the application reports how many it loaded and how long it took, as Figure 2 shows:
Figure 2. Results are in!
You'll notice that the application probably gets slightly faster each time you run it; this is because of operating system optimizations.
4. Now, run the demo with class sharing enabled, as Listing 12 illustrates. A new shared cache is created. You can specify a cache size of about 50MB to ensure that there is enough space for all the classes. Listing 12 shows the command line and some sample output.
Listing 12. Running the Demo With Class Sharing Enabled
C:\OpenJ9>wa6480_openj9\j2sdk-image\bin\java -cp shcdemo.jar -Xshareclasses:name=demo,verbose -Xscmx50m ClassLoadStress
[-Xshareclasses persistent cache enabled]
[-Xshareclasses verbose output enabled]
JVMSHRC236I Created shared classes persistent cache demo
JVMSHRC246I Attached shared classes persistent cache demo
JVMSHRC765I Memory page protection on runtime data, string read-write data and partially filled pages is successfully enabled
JVMSHRC168I Total shared class bytes read=1111375. Total bytes stored=40947096
JVMSHRC818I Total unstored bytes due to the setting of shared cache soft max is 0. Unstored AOT bytes due to the setting of -Xscmaxaot is 0. Unstored JIT bytes due to the setting of -Xscmaxjitdata is 0.
You can also check the cache statistics using printStats, as Listing 13 shows:
Listing 13. Checking the Number of Cached Classes
C:\OpenJ9>wa6480_openj9\j2sdk-image\bin\java -cp shcdemo.jar -Xshareclasses:name=demo,printStats
Current statistics for cache "demo":
Cache created with:
-Xnolinenumbers = false
BCI Enabled = true
Restrict Classpaths = false
Feature = cr
Cache contains only classes with line numbers
base address = 0x0000000011F96000
end address = 0x0000000015140000
allocation pointer = 0x000000001403FF50
cache size = 52428192
softmx bytes = 52428192
free bytes = 10874992
ROMClass bytes = 34250576
AOT bytes = 1193452
Reserved space for AOT bytes = -1
Maximum space for AOT bytes = -1
JIT data bytes = 28208
Reserved space for JIT data bytes = -1
Maximum space for JIT data bytes = -1
Zip cache bytes = 902472
Data bytes = 351648
Metadata bytes = 661212
Metadata % used = 1%
Class debug area size = 4165632
Class debug area used bytes = 3911176
Class debug area % used = 93%
# ROMClasses = 17062
# AOT Methods = 559
# Classpaths = 3
# URLs = 0
# Tokens = 0
# Zip caches = 5
# Stale classes = 0
% Stale classes = 0%
Cache is 79% full
Cache is accessible to current user = true
5. Now, start the demo again with the same Java command line. This time, it should read the classes from the shared class cache, as you can see in Listing 14.
Listing 14. Running the Application With a Warm Shared Cache
C:\OpenJ9>wa6480_openj9\j2sdk-image\bin\java -cp shcdemo.jar -Xshareclasses:name=demo,verbose -Xscmx50m ClassLoadStress
[-Xshareclasses persistent cache enabled]
[-Xshareclasses verbose output enabled]
JVMSHRC237I Opened shared classes persistent cache demo
JVMSHRC246I Attached shared classes persistent cache demo
JVMSHRC765I Memory page protection on runtime data, string read-write data and partially filled pages is successfully enabled
JVMSHRC168I Total shared class bytes read=36841382. Total bytes stored=50652
JVMSHRC818I Total unstored bytes due to the setting of shared cache soft max is 0. Unstored AOT bytes due to the setting of -Xscmaxaot is 0. Unstored JIT bytes due to the setting of -Xscmaxjitdata is 0.
Figure 3 shows a significant improvement (about 40 percent) in class load time. Again, you should see the performance improve slightly each time you run the demo because of operating system optimizations.
Figure 3. Warm cache results
There are a few variations you can experiment with. For example, you can use the javaw command to start multiple demos and trigger class loading in all of them at the same time to see the concurrent performance.
In a real-world scenario, the overall JVM startup time benefit that can be gained from using class sharing depends on the number of classes that are loaded by the application. A HelloWorld program will not show much benefit, whereas a large web server certainly will. However, this example has hopefully demonstrated that experimenting with class sharing is very straightforward, so you can easily test the benefits.
It is also easy to see the memory savings when running the example program in more than one JVM.
Below are four VMMap snapshots obtained using the same machine as the previous examples. In Figure 4, two instances of the demo have been run to completion without class sharing. In Figure 5, two instances have been run to completion with class sharing enabled, using the same command lines as before.
Figure 4. Two instances of demo with no class sharing
Figure 5. Two instances of demo with class sharing enabled
The shared cache size is 50MB in the experiment, so the Mapped Files size of each instance in Figure 5 is 50MB more (56736KB – 5536KB = 51200KB) than in Figure 4.
You can clearly see that the memory usage (Private WS) when shared classes are enabled is significantly lower. A saving of about 70MB Private WS is achieved for 2 JVM instances. More memory saving will be observed if more instances of the demo are launched with class sharing enabled. The test results above were obtained on a Windows 10 laptop with 32GB RAM, using an Intel(R) Core(TM) i7-6820HQ CPU @ 2.70GHz.
We performed the same memory footprint experiment on a Linux x64 machine as well. Listing 15 shows the result of two JVM instances with no class sharing and Listing 16 shows the result of two JVM instances with class sharing enabled.
Looking at the results, RSS does not show much improvement when class sharing is enabled. This is because the whole shared cache is included in RSS. But, if we look at the PSS, which counts only half of the shared cache to each JVM (as it is shared by 2 JVMs), there is a saving of about 34MB.
Listing 15. Footprint on Linux With Class Sharing Disabled
pmap -X 9612
9612: xa6480_openj9/j2sdk-image/jre/bin/java -cp shcdemo.jar ClassLoadStress
Address Perm … Size Rss Pss Referenced Anonymous Swap Locked Mapping
…
======= ======= ===== ======== ========= ==== ====
2676500 118280 106192 118280 95860 0 0 KB

pmap -X 9850
9850: xa6480_openj9/j2sdk-image/jre/bin/java -cp shcdemo.jar ClassLoadStress
Address Perm … Size Rss Pss Referenced Anonymous Swap Locked Mapping
…
======= ======= ===== ======== ========= ==== ====
2676500 124852 112792 124852 102448 0 0 KB
Listing 16. Footprint on Linux With Class Sharing Enabled
pmap -X 4501
4501: xa6480_openj9/j2sdk-image/jre/bin/java -Xshareclasses:name=demo -Xscmx50m -cp shcdemo.jar ClassLoadStress
Address Perm … Size Rss Pss Referenced Anonymous Swap Locked Mapping
…
7fe7d0e00000 rw-s 4 4 2 4 0 0 0 C290M4F1A64P_demo_G35
7fe7d0e01000 r--s 33356 33356 16678 33356 0 0 0 C290M4F1A64P_demo_G35
7fe7d2e94000 rw-s 11096 48 24 48 0 0 0 C290M4F1A64P_demo_G35
7fe7d396a000 r--s 5376 1640 832 1640 0 0 0 C290M4F1A64P_demo_G35
7fe7d3eaa000 rw-s 296 0 0 0 0 0 0 C290M4F1A64P_demo_G35
7fe7d3ef4000 r--s 1072 0 0 0 0 0 0 C290M4F1A64P_demo_G35
…
======= ======= ===== ======== ====== ====== ====
2732852 120656 90817 97988 62572 0 0 KB

pmap -X 4574
4574: xa6480_openj9/j2sdk-image/jre/bin/java -Xshareclasses:name=demo -Xscmx50m -cp shcdemo.jar ClassLoadStress
Address Perm … Size Rss Pss Referenced Anonymous Swap Locked Mapping
…
7f308ce00000 rw-s 4 4 2 4 0 0 0 C290M4F1A64P_demo_G35
7f308ce01000 r--s 33356 33356 16678 33356 0 0 0 C290M4F1A64P_demo_G35
7f308ee94000 rw-s 11080 48 24 48 0 0 0 C290M4F1A64P_demo_G35
7f308f966000 r--s 5392 1632 824 1632 0 0 0 C290M4F1A64P_demo_G35
7f308feaa000 rw-s 296 0 0 0 0 0 0 C290M4F1A64P_demo_G35
7f308fef4000 r--s 1072 0 0 0 0 0 0 C290M4F1A64P_demo_G35
…
======= ======= ===== ======== ====== ====== ====
2730800 122832 92911 102584 64812 0 0 KB
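The RSS and PSS comparison can be checked directly from the totals rows of Listings 15 and 16. The short sketch below only repeats that arithmetic; the class name FootprintArithmetic is hypothetical and the figures are copied from the pmap output above.

```java
public class FootprintArithmetic {

    /** Sums per-process values given in KB. */
    public static long totalKb(long... perProcessKb) {
        long sum = 0;
        for (long kb : perProcessKb) {
            sum += kb;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Totals rows from Listing 15 (no sharing) and Listing 16 (sharing), in KB
        long rssNoSharing = totalKb(118280, 124852);
        long rssSharing = totalKb(120656, 122832);
        long pssNoSharing = totalKb(106192, 112792);
        long pssSharing = totalKb(90817, 92911);

        long rssDelta = rssNoSharing - rssSharing; // negative: RSS is slightly higher with sharing
        long pssSaved = pssNoSharing - pssSharing;

        System.out.println("RSS difference: " + rssDelta + " KB");
        System.out.println("PSS saved: " + pssSaved + " KB (about " + (pssSaved / 1024) + " MB)");

        // The largest read-only cache mapping shows how PSS splits shared pages:
        // 33356 KB resident, charged half to each of the two JVMs.
        System.out.println("Shared region PSS per JVM: " + (33356 / 2) + " KB");
    }
}
```

The PSS difference comes to 35256 KB, which matches the saving of about 34MB quoted above, while the RSS totals differ by only a few hundred KB.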
The Shared Classes feature in the OpenJ9 implementation offers a simple and flexible way to reduce memory footprint and improve JVM startup time. In this article, you have seen how to enable the feature, how to use the cache utilities, and how to get quantifiable measurements of the benefits.
Published at DZone with permission of Hang Shao. See the original article here.