Over a million developers have joined DZone.

Self-Healing JVM

Microservices! They are everywhere, or at least, the term is. When should you use a microservice architecture? What factors should be considered when making that decision? Do the benefits outweigh the costs? Why is everyone so excited about them, anyway?  Brought to you in partnership with IBM.

This post is an example about an application where the first solution to each and every IT problem – “have you tried turning it off and on again” – can backfire and do more harm than good.

Instead of turning things off and on, we have an application that can literally heal itself: it fails at the beginning, but starts running smoothly after some time. To give an example of such application in action, we recreated it in the most simple form possible, gathering inspiration from what is now a five-year old post from the Heinz Kabutz’ Java Newsletter:

package eu.plumbr.test;

public class HealMe {
  private static final int SIZE = (int) (Runtime.getRuntime().maxMemory() * 0.6);

  public static void main(String[] args) throws Exception {
    for (int i = 0; i < 1000; i++) {

  private static void allocateMemory(int i) {
    try {
        byte[] bytes = new byte[SIZE];

      byte[] moreBytes = new byte[SIZE];

      System.out.println("I allocated memory successfully " + i);

    } catch (OutOfMemoryError e) {
      System.out.println("I failed to allocate memory " + i);

The code above is allocating two bulks of memory in a loop. Each of those allocation is equal to 60% of the total available heap size. As the allocations occur sequentially in the same method, then one might expect this code to keep throwing java.lang.OutOfMemoryError: Java heap space errors and never successfully complete the allocateMemory() method.

So, lets see whether our expectations are correct by starting with the static analysis of the source code:

  1. From the first fast examination, this code really cannot complete, because we try to allocate more memory than is available to JVM.
  2. If we look closer we can notice that the first allocation takes place in a scoped block, meaning that the variables defined in this block are visible only to this block. This indicates that thebytes should be eligible for GC after the block is completed. And so our code should in fact run fine right from the beginning, as at the time when it tries to allocate moreBytes the previous allocation bytes should be dead.
  3. If we now look into the compiled classfile, we will see the following bytecode:
    private static void allocateMemory(int);
           0: getstatic     #3                  // Field SIZE:I
           3: newarray       byte
           5: astore_1      
           6: getstatic     #4                  // Field java/lang/System.out:Ljava/io/PrintStream;
           9: aload_1       
          10: arraylength   
          11: invokevirtual #5                  // Method java/io/PrintStream.println:(I)V
          14: getstatic     #3                  // Field SIZE:I
          17: newarray       byte
          19: astore_1      
          20: getstatic     #4                  // Field java/lang/System.out:Ljava/io/PrintStream;
          23: aload_1       
          24: arraylength   
          25: invokevirtual #5                  // Method java/io/PrintStream.println:(I)V
    ---- cut for brevity ----

    Here we see, that on offsets 3-5 first array is allocated and stored into local variable with index 1. Then, on offset 17 another array is going to be allocated. But first array is still referenced by local variable and so the second allocation should always fail with OOM. Bytecode interpreter just cannot let GC clean up first array, because it is still strongly referenced.

    Our static code analysis has shown us that for two underlying reasons, the presented code should not run successfully and in one case, it should. Which one out of those three is the correct one? Let us actually run it and see for ourselves. It turns out that both conclusions were correct. First, application fails to allocate memory. But after some time (on my Mac OS X with Java 8 it happens at iteration #255) the allocations start succeeding:

    java -Xmx2g eu.plumbr.test.HealMe
    I failed to allocate memory 0
    I failed to allocate memory 1
    … cut for brevity ...
    I failed to allocate memory 254
    I failed to allocate memory 255
    I allocated memory successfully 256
    I allocated memory successfully 257
    Self-healing code is a reality! Skynet is near...

n order to understand what is really happening we need to think, what changes during program execution? The obvious answer is, of course, Just-In-Time compilation can occur. If you recall, Just-In-Time compilation is a JVM built-in mechanics to optimize code hotspots. For this, the JIT monitors the running code and when a hotspot is detected, JIT compiles your bytecode into native code, performing different optimizations such as method inlining and dead code elimination in the process.

Lets see if this is a case by turning on the following command line options and relaunching the program

-XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly -XX:+LogCompilation

This will generate a log file, in our case named hotspot_pid38139.log, where 38139 was the PID of your java process. In this file the following line can be found:

<task_queued compile_id='94' method='HealMe allocateMemory (I)V' bytes='83' count='256' iicount='256' level='3' stamp='112.305' comment='tiered' hot_count='256'/>

This means, that after executing allocateMemory() methods 256 times, C1 compiler has decided to queue this method for C1 tier 3 compilation. You can get more information about tiered compilation’s levels and different thresholds here. And so our first 256 iterations were run in interpreted mode, where bytecode interpreter, being a simple stack machine, cannot know in advance, if some variable, bytes in this case, will be used further on or not. But JIT sees the whole method at once and so can deduce than bytes will not be used anymore and is, in fact, GC eligible. Thus the garbage collection can eventually take place and our program has magically self-healed. Now, I can only hope none of the readers should actually be responsible for debugging such a case in production. But in case you wish to make someone’s life a misery, introducing code like this to production would be a sure way to achieve this.

This post was originally published in the 2014 Java Advent Calendar.

Discover how the Watson team is further developing SDKs in Java, Node.js, Python, iOS, and Android to access these services and make programming easy. Brought to you in partnership with IBM.


Published at DZone with permission of Nikita Salnikov-tarnovski, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

The best of DZone straight to your inbox.

Please provide a valid email address.

Thanks for subscribing!

Awesome! Check your inbox to verify your email so you can start receiving the latest in tech news and resources.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}