5 Tips for Proper Java Heap Size
Determining the proper Java Heap size for a production system is not a straightforward exercise. In my Java EE enterprise experience, I have seen multiple performance problems caused by inadequate Java Heap capacity and tuning. This article provides 5 tips to help you determine an optimal Java Heap size, as a starting point, for your current or new production environment. Some of these tips are also very useful for the prevention and resolution of java.lang.OutOfMemoryError problems, including memory leaks.

Please note that these tips are intended to "help you" determine the proper Java Heap size. Since each IT environment is unique, you are actually in the best position to determine precisely the required Java Heap specifications of your client's environment. Some of these tips may also not be applicable to a very small standalone Java application, but I still recommend you read the entire article. Future articles will include tips on how to choose the proper Java VM garbage collector type for your environment and applications.

#1 – JVM: you always fear what you don't understand

How can you expect to configure, tune and troubleshoot something that you don't understand? You may never have the chance to write and improve the Java VM specifications, but you are still free to learn its foundations in order to improve your knowledge and troubleshooting skills. Some may disagree, but from my perspective, the belief that Java programmers are not required to know internal JVM memory management is an illusion. Java Heap tuning and troubleshooting can be an especial challenge for Java & Java EE beginners. Find below a typical scenario:

- Your client production environment is facing OutOfMemoryError on a regular basis, causing a lot of business impact. Your support team is under pressure to resolve this problem.
- A quick Google search turns up examples of similar problems and you now believe (and assume) that you are facing the same problem.
- You then grab the JVM -Xms and -Xmx values from another person's OutOfMemoryError problem case, hoping to quickly resolve your client's problem.
- You proceed and implement the same tuning in your environment. 2 days later you realize the problem is still happening (worse, or only a little better)... the struggle continues.

What went wrong?

- You failed to first acquire a proper understanding of the root cause of your problem.
- You may also have failed to properly understand your production environment at a deeper level (specifications, load situation etc.). Web searches are a great way to learn and share knowledge, but you have to perform your own due diligence and root cause analysis.
- You may also be lacking basic knowledge of the JVM and its internal memory management, preventing you from connecting all the dots.

My #1 tip and recommendation to you is to learn and understand the basic JVM principles along with its different memory spaces. Such knowledge is critical as it will allow you to make valid recommendations to your clients and properly understand the possible impact and risk associated with future tuning considerations. Now find below a quick high-level reference guide for the Java VM.

The Java VM memory is split into 3 memory spaces:

- The Java Heap. Applicable to all JVM vendors, usually split between the YoungGen (nursery) & OldGen (tenured) spaces.
- The PermGen (permanent generation). Applicable to the Sun HotSpot VM only (the PermGen space will be removed in future Java 7 or Java 8 updates).
- The Native Heap (C-Heap). Applicable to all JVM vendors.

I recommend that you review each article below, including the Sun white paper on HotSpot Java memory management. I also encourage you to download and look at the OpenJDK implementation.
## Sun HotSpot VM
http://javaeesupportpatterns.blogspot.com/2011/08/java-heap-space-hotspot-vm.html
## IBM VM
http://javaeesupportpatterns.blogspot.com/2012/02/java-heap-space-ibm-vm.html
## Oracle JRockit VM
http://javaeesupportpatterns.blogspot.com/2012/02/java-heap-space-jrockit-vm.html
## Sun (Oracle) – Java memory management white paper
http://java.sun.com/j2se/reference/whitepapers/memorymanagement_whitepaper.pdf
## OpenJDK – Open-source Java implementation
http://openjdk.java.net/

As you can see, Java VM memory management is more complex than just setting the biggest value possible via -Xmx. You have to look at all angles, including your native and PermGen space requirements along with the physical memory availability (and # of CPU cores) of your physical host(s). It can get especially tricky for a 32-bit JVM since the Java Heap and native Heap are in a race: the bigger your Java Heap, the smaller the native Heap. Attempting to set up a large Heap for a 32-bit VM, e.g. 2.5 GB+, increases the risk of native OutOfMemoryError depending on your application(s) footprint, number of Threads etc. A 64-bit JVM resolves this problem, but you are still limited by physical resource availability and garbage collection overhead (the cost of major GC collections goes up with size). The bottom line is that bigger is not always better, so please do not assume that you can run all of your 20 Java EE applications on a single 16 GB 64-bit JVM process.

#2 – Data and application is king: review your static footprint requirement

Your application(s), along with the associated data, will dictate the Java Heap footprint requirement. By static memory, I mean "predictable" memory requirements as per below.

- Determine how many different applications you are planning to deploy to a single JVM process, e.g. number of EAR files, WAR files, jar files etc.
  The more applications you deploy to a single JVM, the higher the demand on the native Heap.
- Determine how many Java classes will potentially be loaded at runtime, including third-party APIs. The more class loaders and classes you load at runtime, the higher the demand on the HotSpot VM PermGen space and internal JIT-related optimization objects.
- Determine the data cache footprint, e.g. internal cache data structures loaded by your application (and third-party APIs) such as data cached from a database, data read from a file etc. The more data caching you use, the higher the demand on the Java Heap OldGen space.
- Determine the number of Threads that your middleware is allowed to create. This is very important since Java threads require enough native memory, otherwise OutOfMemoryError will be thrown.

For example, you will need much more native memory and PermGen space if you are planning to deploy 10 separate EAR applications on a single JVM process vs. only 2 or 3. Data caching not serialized to a disk or database will require extra memory from the OldGen space. Try to come up with reasonable estimates of the static memory footprint requirement. This will be very useful for setting some starting-point JVM capacity figures before your true measurement exercise (e.g. tip #4). For a 32-bit JVM, I usually do not recommend a Java Heap size higher than 2 GB (-Xms2048m, -Xmx2048m) since you need enough memory for the PermGen and native Heap for your Java EE applications and threads. This assessment is especially important since too many applications deployed in a single 32-bit JVM process can easily lead to native Heap depletion, especially in a multi-threaded environment. For a 64-bit JVM, a Java Heap size of 3 GB or 4 GB per JVM process is usually my recommended starting point.

#3 – Business traffic sets the rules: review your dynamic footprint requirement

Your business traffic will typically dictate your dynamic memory footprint.
Concurrent users & requests generate the JVM GC "heartbeat" that you can observe from various monitoring tools, due to the very frequent creation and garbage collection of short- & long-lived objects. As you saw from the above JVM diagram, a typical ratio of YoungGen vs. OldGen is 1:3, or 33%. For a typical 32-bit JVM, a Java Heap size set at 2 GB (using the generational & concurrent collector) will typically allocate 500 MB for the YoungGen space and 1.5 GB for the OldGen space.

Minimizing the frequency of major GC collections is a key aspect of optimal performance, so it is very important that you understand and estimate how much memory you need during your peak volume. Again, your type of application and data will dictate how much memory you need. Shopping-cart types of applications (long-lived objects) involving large and non-serialized session data typically need a large Java Heap and a lot of OldGen space. Stateless and XML-processing-heavy applications (a lot of short-lived objects) require a proper YoungGen space in order to minimize the frequency of major collections.

Example:

- You have 5 EAR applications (~2 thousand Java classes) to deploy (which include middleware code as well).
- Your native heap requirement is estimated at 1 GB (it has to be large enough to handle Thread creation etc.).
- Your PermGen space is estimated at 512 MB.
- Your internal static data caching is estimated at 500 MB.
- Your total forecast traffic is 5000 concurrent users at peak hours.
- Each user session data footprint is estimated at 500 KB.
- The total footprint requirement for session data alone is 2.5 GB under peak volume.

As you can see, with such requirements, there is no way you can have all this traffic sent to a single 32-bit JVM process. A typical solution involves splitting (tip #5) the traffic across a few JVM processes and / or physical hosts (assuming you have enough hardware and CPU cores available).
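The back-of-the-envelope arithmetic from the example above can be sketched in a few lines of Java. The figures are the hypothetical estimates from the example, not measurements:

```java
// Rough capacity estimate using the hypothetical figures from the example above.
public class FootprintEstimate {
    public static void main(String[] args) {
        long nativeHeapMb  = 1024; // native heap estimate (Threads, JIT, etc.)
        long permGenMb     = 512;  // PermGen space estimate
        long staticCacheMb = 500;  // internal static data caching
        int  peakUsers     = 5000; // forecast concurrent users at peak
        long perSessionKb  = 500;  // session data footprint per user

        // Session data alone: 5000 users x 500 KB ~= 2.5 GB
        long sessionDataMb = (long) peakUsers * perSessionKb / 1024;
        long totalMb = nativeHeapMb + permGenMb + staticCacheMb + sessionDataMb;

        System.out.println("Session data at peak: " + sessionDataMb + " MB");
        System.out.println("Total rough footprint: " + totalMb + " MB");
    }
}
```

At roughly 4.4 GB total, the estimate alone rules out a single 32-bit JVM process before any load testing is done.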
However, for this example, given the high demand on static memory and to ensure a scalable environment in the long run, I would also recommend a 64-bit VM, but with a smaller Java Heap as a starting point, such as 3 GB, to minimize the GC cost. You definitely want to have an extra buffer for the OldGen space, so I typically recommend up to 50% memory footprint post major collection in order to keep the frequency of Full GCs low and leave enough buffer for fail-over scenarios. Most of the time, your business traffic will drive most of your memory footprint, unless you need a significant amount of data caching to achieve proper performance, which is typical for portal (media) heavy applications. Too much data caching should raise a yellow flag that you may need to revisit some design elements sooner rather than later.

#4 – Don't guess it, measure it!

At this point you should:

- Understand the basic JVM principles and memory spaces.
- Have a deep view and understanding of all applications along with their characteristics (size, type, dynamic traffic, stateless vs. stateful objects, internal memory caches etc.).
- Have a very good view or forecast of the business traffic (# of concurrent users etc.) for each application.
- Have some idea of whether you need a 64-bit VM or not, and which JVM settings to start with.
- Have some idea of whether you need more than one JVM (middleware) process.

But wait, your work is not done yet. While the above information is crucial for coming up with "best guess" Java Heap settings, it is always best and recommended to simulate your application(s) behaviour and validate the Java Heap memory requirement via proper profiling, load & performance testing. You can learn and take advantage of tools such as JProfiler (future articles will include tutorials on JProfiler). From my perspective, learning how to use a profiler is the best way to properly understand your application memory footprint.
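As a lightweight complement to a full profiler, the standard java.lang.management API can give you a quick programmatic read of the current heap and non-heap (PermGen) usage. A minimal sketch:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Minimal heap usage sampler; run it inside (or alongside) your application
// to get a quick view of the current memory footprint.
public class HeapSampler {
    public static void main(String[] args) {
        MemoryMXBean memoryBean = ManagementFactory.getMemoryMXBean();

        MemoryUsage heap = memoryBean.getHeapMemoryUsage();
        System.out.println("Heap used: " + heap.getUsed() / (1024 * 1024)
                + " MB, max: " + heap.getMax() / (1024 * 1024) + " MB");

        // Non-heap covers PermGen (on HotSpot) and other internal JVM spaces
        MemoryUsage nonHeap = memoryBean.getNonHeapMemoryUsage();
        System.out.println("Non-heap used: " + nonHeap.getUsed() / (1024 * 1024) + " MB");
    }
}
```

This only shows point-in-time usage; it is no substitute for a profiler, but it is handy for quick sanity checks and periodic logging.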
Another approach I use for existing production environments is heap dump analysis using the Eclipse MAT tool. Heap dump analysis is very powerful: it allows you to view and understand the entire memory footprint of the Java Heap, including class-loader-related data, and it is a must-do exercise in any memory footprint analysis, especially for memory leaks. Java profilers and heap dump analysis tools allow you to understand and validate your application memory footprint, including the detection and resolution of memory leaks.

Load and performance testing is also a must, since this will allow you to validate your earlier estimates by simulating your forecast concurrent users. It will also expose your application bottlenecks and allow you to further fine-tune your JVM settings. You can use tools such as Apache JMeter, which is very easy to learn and use, or explore other commercial products.

Finally, I have quite often seen Java EE environments running perfectly fine until the day one piece of the infrastructure starts to fail, e.g. a hardware failure. Suddenly the environment is running at reduced capacity (a reduced # of JVM processes) and the whole environment goes down. What happened? There are many scenarios that can lead to domino effects, but a lack of JVM tuning and capacity to handle fail-over (short-term extra load) is very common. If your JVM processes are running at 80%+ OldGen space capacity with frequent garbage collections, how can you expect to handle any fail-over scenario? Your load and performance testing exercise performed earlier should simulate such a scenario, and you should adjust your tuning settings properly so your Java Heap has enough of a buffer to handle extra load (extra objects) in the short term. This is mainly applicable to the dynamic memory footprint, since fail-over means redirecting a certain % of your concurrent users to the available JVM processes (middleware instances).
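One way to keep an eye on that 80%+ OldGen occupancy from inside the JVM is the collection usage threshold of the MemoryPoolMXBean, which is evaluated against occupancy measured right after each collection. A sketch; note that tenured pool names vary by vendor and collector ("PS Old Gen", "CMS Old Gen", "Tenured Gen"...), so the loose name matching below is an assumption, not a guaranteed contract:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;

// Finds the tenured/OldGen pool and arms a post-GC occupancy threshold at 80%.
public class OldGenWatch {
    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            String name = pool.getName().toLowerCase();
            boolean isOldGen = pool.getType() == MemoryType.HEAP
                    && (name.contains("old") || name.contains("tenured"));
            if (isOldGen && pool.isCollectionUsageThresholdSupported()) {
                long max = pool.getUsage().getMax();
                if (max > 0) {
                    // Threshold is checked against usage measured after a collection,
                    // which is exactly the "post major collection footprint" metric.
                    pool.setCollectionUsageThreshold((long) (max * 0.8));
                    System.out.println("Watching " + pool.getName()
                            + ", threshold: " + (long) (max * 0.8) / (1024 * 1024) + " MB");
                }
            }
        }
    }
}
```

A monitoring agent can then poll isCollectionUsageThresholdExceeded() (or register a JMX notification listener) to raise an alert before a fail-over event pushes the OldGen over the edge.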
#5 – Divide and conquer

At this point you have performed dozens of load testing iterations. You know that your JVM is not leaking memory. Your application memory footprint cannot be reduced any further. You have tried several tuning strategies, such as a large 64-bit Java Heap space of 10 GB+ and multiple GC policies, but you still do not find your performance level acceptable?

In my experience I have found that, with current JVM specifications, proper vertical and horizontal scaling, which involves creating a few JVM processes per physical host and across several hosts, will give you the throughput and capacity that you are looking for. Your IT environment will also be more fault tolerant if you break your application list into a few logical silos, each with its own JVM process, Threads and tuning values. This "divide and conquer" strategy involves splitting your application(s) traffic across multiple JVM processes and will provide you with:

- A reduced Java Heap size per JVM process (both static & dynamic footprint)
- Reduced complexity of JVM tuning
- Reduced GC elapsed and pause time per JVM process
- Increased redundancy and fail-over capabilities
- Alignment with the latest Cloud and IT virtualization strategies

The bottom line is that when you find yourself spending too much time tuning that single elephant 64-bit JVM process, it is time to revisit your middleware and JVM deployment strategy and take advantage of vertical & horizontal scaling. This implementation strategy is more taxing on the hardware, but it will really pay off in the long run.

Please provide any comments and share your experience on JVM Heap sizing and tuning.
July 19, 2012
Comments
Nov 18, 2013 · Mike Rohde
Nice post Nikita,
"The second likely cause for the lack of the GC algorithm indicates that the application performance has not been a priority to the team"
From my experience, this is precisely what I see as well, application performance is often not a priority. Proper JVM tuning and resolution of memory leaks can improve the application performance, stability & scalability significantly when prioritized properly.
Regarding CMS vs G1, I have yet to observe a true success story as well. For one client, we had to stick to CMS due to the much increased native memory footprint (increase of total Java process memory size) associated with the G1 collector.
I will post some results once I have the chance to better experiment and share real life experience with the latest G1 collector. I recommend proper caution and capacity planning before switching from CMS to G1 using existing HW specifications.
Thanks.
P-H
Aug 15, 2013 · Mr B Loid
Hi Brian,
I had a closer look at the WildFly 8 Alpha 3 memory leak. I used a combo of Plumbr 3.0, MAT and thread snapshot execution to narrow it down.
- The leaking object type is: org.jboss.weld.context.SerializableContextualInstanceImpl
- The leak is created at: org.jboss.weld.context.unbound.DependentContextImpl.addDependentInstance()
- Hard references are kept from structures such as: org.jboss.weld.context.http.HttpConversationContextImpl
I can replicate the leak very easily by executing a REST Web Service call over and over. This keeps accumulating SerializableContextualInstanceImpl instances. The instances originate from the WELD CDI implementation. I was able to extract the full stack trace showing the "leak" in action. The leak appears to be triggered during the activation of the conversation context for a servlet request and is never released. I can also see the leak triggered during the execution of the REST WS itself. I hope this helps.
## Leak - Execution path #1
(default task-5) PH Leak Creater Stack Trace: java.lang.Thread.getStackTrace(1567)
(default task-5) org.jboss.weld.context.unbound.DependentContextImpl.addDependentInstance(85)
(default task-5) org.jboss.weld.context.unbound.DependentContextImpl.get(46)
(default task-5) org.jboss.weld.manager.BeanManagerImpl.getReference(737)
(default task-5) org.jboss.weld.manager.BeanManagerImpl.getReference(793)
(default task-5) org.jboss.weld.injection.FieldInjectionPoint.inject(92)
(default task-5) org.jboss.weld.util.Beans.injectBoundFields(375)
(default task-5) org.jboss.weld.util.Beans.injectFieldsAndInitializers(387)
(default task-5) org.jboss.weld.injection.producer.DefaultInjector.inject(72)
(default task-5) org.jboss.weld.injection.producer.ResourceInjector.inject(60)
(default task-5) org.jboss.weld.injection.producer.DefaultInjector$1.proceed(66)
(default task-5) org.jboss.weld.injection.InjectionContextImpl.run(48)
(default task-5) org.jboss.weld.injection.producer.DefaultInjector.inject(64)
(default task-5) org.jboss.weld.injection.producer.BasicInjectionTarget.inject(91)
(default task-5) org.jboss.resteasy.cdi.JaxrsInjectionTarget.inject(35)
(default task-5) org.jboss.weld.bean.ManagedBean.create(158)
(default task-5) org.jboss.weld.context.AbstractContext.get(103)
(default task-5) org.jboss.weld.bean.proxy.ContextBeanInstance.getInstance(93)
(default task-5) org.jboss.weld.bean.proxy.ProxyMethodHandler.invoke(79)
(default task-5) org.jboss.tools.examples.rest.MemberResourceRESTService$Proxy$_$$_WeldClientProxy.jvmLeak(-1)
(default task-5) sun.reflect.NativeMethodAccessorImpl.invoke0(-2)
(default task-5) sun.reflect.NativeMethodAccessorImpl.invoke(57)
(default task-5) sun.reflect.DelegatingMethodAccessorImpl.invoke(43)
(default task-5) java.lang.reflect.Method.invoke(601)
(default task-5) org.jboss.resteasy.core.MethodInjectorImpl.invoke(137)
(default task-5) org.jboss.resteasy.core.ResourceMethodInvoker.invokeOnTarget(272)
(default task-5) org.jboss.resteasy.core.ResourceMethodInvoker.invoke(229)
(default task-5) org.jboss.resteasy.core.ResourceMethodInvoker.invoke(216)
(default task-5) org.jboss.resteasy.core.SynchronousDispatcher.invoke(356)
(default task-5) org.jboss.resteasy.core.SynchronousDispatcher.invoke(179)
(default task-5) org.jboss.resteasy.plugins.server.servlet.ServletContainerDispatcher.service(220)
...
## Leak - Execution path #2
(default task-5) PH Leak Creater Stack Trace: java.lang.Thread.getStackTrace(1567)
(default task-5) org.jboss.weld.context.unbound.DependentContextImpl.addDependentInstance(85)
(default task-5) org.jboss.weld.context.unbound.DependentContextImpl.get(46)
(default task-5) org.jboss.weld.manager.BeanManagerImpl.getReference(737)
(default task-5) org.jboss.weld.manager.BeanManagerImpl.getReference(757)
(default task-5) org.jboss.weld.bean.builtin.InstanceImpl.getBeanInstance(89)
(default task-5) org.jboss.weld.bean.builtin.InstanceImpl.access$100(61)
(default task-5) org.jboss.weld.bean.builtin.InstanceImpl$InstanceImplIterator.next(208)
(default task-5) org.jboss.weld.context.conversation.ConversationImpl.getConversationContext(80)
(default task-5) org.jboss.weld.context.conversation.ConversationImpl.<init>(69)
(default task-5) org.jboss.weld.context.AbstractConversationContext.associateRequest(183)
(default task-5) org.jboss.weld.context.AbstractConversationContext.activate(229)
(default task-5) org.jboss.weld.servlet.ConversationContextActivator.activateConversationContext(90)
(default task-5) org.jboss.weld.servlet.HttpContextLifecycle.requestInitialized(139)
(default task-5) org.jboss.weld.servlet.WeldListener.requestInitialized(107)
(default task-5) io.undertow.servlet.core.ApplicationListeners.requestInitialized(193)
(default task-5) io.undertow.servlet.handlers.ServletInitialHandler.handleFirstRequest(184)
(default task-5) io.undertow.servlet.handlers.ServletInitialHandler.dispatchRequest(172)
(default task-5) io.undertow.servlet.handlers.ServletInitialHandler.access$000(56)
(default task-5) io.undertow.servlet.handlers.ServletInitialHandler$1.handleRequest(107)
(default task-5) io.undertow.server.HttpHandlers.executeRootHandler(36)
(default task-5) io.undertow.server.HttpServerExchange$1.run(629)
(default task-5) java.util.concurrent.ThreadPoolExecutor.runWorker(1110)
(default task-5) java.util.concurrent.ThreadPoolExecutor$Worker.run(603)
(default task-5) java.lang.Thread.run(722)
Aug 14, 2013 · Mr B Loid
Sure Jason,
Let me work on the packaging and I will get back to you when available.
Regards,
P-H
Aug 14, 2013 · Mr B Loid
Sure Brian,
I did not have a chance to deep dive into this memory leak at this point. I will get back to you shortly with the details and location of this leak found by Plumbr.
What I will do is re-run the REST Web Service & load test cycle without the engineered memory leak so we can better expose this possible WildFly 8 Alpha3 memory leak. I will also perform a code walk-through of the affected component as well.
Regards,
P-H
Senior Technical Consultant
CGI Inc.
Jun 18, 2013 · James Sugrue
Thanks Glyn for your comments,
The Java 8 Metaspace JVM arguments are similar to Java 7 HotSpot PermSize and MaxPermSize parameters (initial & maximum size).
For Java 8, Metaspace GC will indeed be triggered once it reaches the current MetaspaceSize. MaxMetaspaceSize is simply the upper limit you can set in order to prevent, for example, a 64-bit JVM process from using too much native memory (existing leak, physical resource constraints etc.).
As you saw from the above GC log snapshot, after each Metaspace GC the JVM did trigger an expansion of the Metaspace, up to its upper limit when set. Once the Metaspace can no longer be expanded (upper limit reached, 32-bit memory address space depletion, non-availability of OS virtual memory...) an OOM is thrown by the JVM.
I will soon release a part 2 of this article focusing on garbage collection and debugging strategies for future Java 8 Metaspace / native memory leaks.
Regards,
P-H
Mar 21, 2013 · James Sugrue
Thanks Ivo for your comments,
Yes, I'm currently exploring and comparing the memory leak analysis effort & effectiveness between the Heap Dump, classic Java memory profilers vs. Plumbr active monitoring & leak detection strategies.
I will post future articles on the subject.
Regards,
P-H
Feb 13, 2013 · James Sugrue
Hi Andre and thanks for your comments,
You did not miss the core of the story. You raised a valid point and concern. I recommend that you review the Open JDK post about this change and motivation.
http://openjdk.java.net/jeps/122
One of the motivations is the convergence between Oracle JRockit & Oracle HotSpot (JRockit never had a PermGen space). Another benefit (when using the default, unbounded mode) is that it eliminates the need for PermGen sizing. Now, we have to keep in mind that native memory will be used instead, so this means that you still need to perform your due diligence as per below:
- A future upgrade project from Java 5/6/7 to Java 8 should include proper capacity planning along with performance testing (vs. an established baseline) so you can assess the change in behaviour and native memory footprint requirement for your application.
- Monitor closely the Java process size and native heap space of your production JVM processes for possible memory leaks and native memory footprint.
- If using 32-bit JVM, ensure that you have enough native heap for the JVM to dynamically grow the meta space (which is in a race with Java Heap space).
- If using 64-bit JVM, ensure that your OS has enough physical/virtual memory to allow the JVM to dynamically grow the meta space.
As you mentioned, for some production environments with limited virtual memory availability, using the cap mode may be preferable. Regardless of your decision of using default unbounded or cap mode, proper capacity planning and monitoring will be very important.
My experience with IBM JM and JRockit from production environments did teach me to always monitor closely the native heap space.
Finally, you will still be able to use MAT and perform Heap Dump analysis for Class & Class loader leaks (Java representation objects are still present). You can see this by running the sample program and replicating scenario #3. Add the -XX:+HeapDumpOnOutOfMemoryError flag and analyze the generated heap dump following the
java.lang.OutOfMemoryError: Metadata space. You will be able to pinpoint the class loader memory leak.Regards,
P-H
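For readers who want to experiment with the class loader leak pattern described above, here is a minimal, hypothetical sketch (not the article's actual sample program): a collection holding strong references to class loaders prevents them, and any classes they define, from being collected, which is what eventually exhausts Metaspace.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

public class ClassLoaderLeakSketch {
    // Strong references keep each loader (and the classes it defines) alive,
    // so Metaspace usage can only grow. In a real leak, each loader would
    // also load application classes before being "forgotten" here.
    static final List<ClassLoader> leaked = new ArrayList<>();

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            leaked.add(new URLClassLoader(new URL[0]));
        }
        System.out.println("Leaked class loaders: " + leaked.size());
    }
}
```

Running a variant of this (with each loader actually defining classes) under -XX:MaxMetaspaceSize and -XX:+HeapDumpOnOutOfMemoryError should let you reproduce the Metadata space error and inspect the retained loaders in MAT.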
Nov 28, 2012 · Mr B Loid
Hi Jose,
Let’s take a simple example of a Java Web application running on Tomcat or JBoss. Let’s assume you need, for some reason, a 30 GB Java heap size to handle your client traffic (many sessions with a high memory footprint etc.).
Partitioning your JVM process simply means that you would create, let’s say, 3 instances of Tomcat on the same host instead of only one. Each instance of Tomcat has its own JVM process (and thread pools). In this scenario, you would configure the Java heap size at 10 GB instead of 30 GB. This assumes your memory footprint is dynamic and driven by incoming client requests rather than a static footprint. Your application will obviously be more fault tolerant, with reduced impact if you need to take restart actions or deal with JVM crashes and/or stuck threads etc.
Vertical scaling all depends on your application behavior, static vs. dynamic footprint, and also the availability of CPU cores on your physical/virtual host.
Regards,
P-H
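As a small sanity check when partitioning, you can confirm the heap actually configured for each JVM instance from inside the application itself. This is a generic sketch, not specific to Tomcat:

```java
public class HeapCheck {
    public static void main(String[] args) {
        // maxMemory() reflects the -Xmx setting of this particular JVM process,
        // so each partitioned instance (e.g. started with -Xmx10g) reports
        // its own value, independently of the other instances on the host.
        long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("Max heap for this JVM (MB): " + maxHeapMb);
    }
}
```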
Nov 28, 2012 · Mr B Loid
Great post Nikita. You listed the facts well here, along with 2 problems to keep an eye on when upgrading or planning to use a 64-bit JVM. These are exactly the main problems I observed while performing multiple JVM upgrades from 32-bit to 64-bit for my clients.
Regarding problem #1, it is true that the footprint increase can be mitigated to some extent by enabling compressed references (compressed oops) for the HotSpot JVM, with minimal performance impact. Proper due diligence and load testing are still my primary recommendation so you can truly assess any delta increase and/or negative impact for your environment. This is especially important if you are planning to re-use existing hardware, in which case you will need to perform extra capacity planning analysis to ensure you have enough RAM & CPU to handle the upgrade.
Regarding problem #2, my understanding of Nikita’s point is that a JVM that big typically means dealing with a large OldGen space. The cost of the short-lived objects may not be excessive here, but the true impact can be observed when the Full GC has to clear the OldGen space (long-lived objects). Given the number of objects accumulated, this can lead to high GC times and a JVM hang if the GC policy is not tuned properly.
If your physical/virtual server has enough CPU cores, I have typically observed better throughput & capacity by splitting such a 20-30 GB JVM into sub JVM processes of up to 10-15 GB each in order to reduce internal contention.
My final point about problem #2 is thread concurrency. Running a good portion of your traffic on a single 30 GB JVM process may be appealing, but this can also lead to a significant increase in thread concurrency within the JVM/middleware/application. Again, depending on your application behavior, you may notice a throughput capacity increase by splitting (partitioning) such a big JVM process.
Regards,
P-H
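When assessing such an upgrade, it can help to verify exactly which tuning flags (e.g. -Xmx, compressed references) a running JVM was actually started with. A quick sketch using the standard management API:

```java
import java.lang.management.ManagementFactory;
import java.util.List;

public class JvmArgsCheck {
    public static void main(String[] args) {
        // Input arguments passed to this JVM at startup,
        // e.g. -Xmx30g, -XX:+UseCompressedOops, GC policy flags.
        List<String> jvmArgs =
            ManagementFactory.getRuntimeMXBean().getInputArguments();
        jvmArgs.forEach(System.out::println);
    }
}
```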
Sep 07, 2012 · James Sugrue
You are actually correct. The impact of the busy loop is more noticeable for the non-thread-safe HashMap vs. the others since the total worker thread elapsed time is less than 1 second.
The main Java program thread was actually using ~20% CPU (as per CPU-per-thread analysis) due to this busy loop in the non-thread-safe HashMap test case.
I just ran the same test again using a CountDownLatch. With the non-busy-wait approach, CPU is now mainly used by worker threads stuck in the infinite loop when running the non-thread-safe HashMap test case. The performance ratio between the thread-safe data structures remains the same; so does the risk and easy problem replication when using a plain old HashMap without proper synchronization. The improved test case eliminates the unnecessary CPU noise from the main program thread.
Thanks.
P-H
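For readers curious about the change, here is a minimal sketch of the non-busy-wait pattern described above; the worker task body is a placeholder, not the actual test harness:

```java
import java.util.concurrent.CountDownLatch;

public class WorkerSync {
    public static void main(String[] args) throws InterruptedException {
        final int workerCount = 4;
        final CountDownLatch done = new CountDownLatch(workerCount);
        for (int i = 0; i < workerCount; i++) {
            new Thread(() -> {
                try {
                    // ... worker task (e.g. map insertions) would run here ...
                } finally {
                    done.countDown(); // always signal completion
                }
            }).start();
        }
        // Blocks without consuming CPU, unlike a busy loop polling a counter.
        done.await();
        System.out.println("All workers finished");
    }
}
```

The main thread sleeps inside await() until every worker has counted down, so it no longer contributes CPU noise to the measurements.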