Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Troubleshooting Java Applications With Arthas

DZone's Guide to

Troubleshooting Java Applications With Arthas

Want to learn more about troubleshooting Java applications in production? Check out this post on how to troubleshoot Java apps with Arthas.

· Java Zone ·
Free Resource

Verify, standardize, and correct the Big 4 + more– name, email, phone and global addresses – try our Data Quality APIs now at Melissa Developer Portal!

Arthas logo

Arthas is a Java Diagnostic tool open sourced by Alibaba. Arthas can help developer trouble-shoot production issues for Java applications without modifying your code or restarting your server.

Let's start with a simple demo to see how we can use Arthas to trouble-shoot this Java program on-the-fly.

Java Program Demo

import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
public class Demo {
    static class Counter {
        private static AtomicInteger count = new AtomicInteger(0);
        public static void increment() {
            count.incrementAndGet();
        }
        public static int value() {
            return count.get();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        while (true) {
            doSomething();
            TimeUnit.SECONDS.sleep(1);
        }
    }

    private static int doSomething() {
        for (int i = 0; i< 1000; i++) {
            Counter.increment();
        }
        int value = Counter.value();
        return value;
    }
}


Save the following code to a Demo.java and run the commands in Shell:

javac Demo.java && java Demo


Install Arthas

curl -L https://alibaba.github.io/arthas/install.sh | sh


Start Arthas

To start Arthas, simply type the following command:

./as.sh


This script will automatically display the available Java process that you can attach. In this example, we have only two available Java processes to attach.

Arthas script version: 3.0.4
Found existing java process, please choose one and hit RETURN.
* [1]: 9617
  [2]: 37374 Demo


Since we are going to attach the Demo.Java process, we type 2 and hit RETURN, then Arthas will proceed to attach to the specified process.

Calculating attach execution time...
Attaching to 37374 using version 3.0.4...

real    0m0.986s
user    0m0.379s
sys     0m0.058s
Attach success.
Connecting to arthas server... current timestamp is 1537684660
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
  ,---.  ,------. ,--------.,--.  ,--.  ,---.   ,---.
 /  O  \ |  .--. ''--.  .--'|  '--'  | /  O  \ '   .-'
|  .-.  ||  '--'.'   |  |   |  .--.  ||  .-.  |`.  `-.
|  | |  ||  |\  \    |  |   |  |  |  ||  | |  |.-'    |
`--' `--'`--' '--'   `--'   `--'  `--'`--' `--'`-----'


wiki: https://alibaba.github.io/arthas
version: 3.0.4
pid: 37374
timestamp: 1537684661535

$


If you can see the Arthas logo, then congratulations, you have successfully attached to the target JVM!

Next, let's go through some of the cool features that Arthas provides and have a quick evaluation of what Arthas can do for you.

TroubleShoot With Arthas

Dashboard: Overview of Your Java Process

Type in dashboard and hit the ENTER, you will see it in the following code (press Ctrl+C to stop):

$ dashboard
ID     NAME                   GROUP          PRIORI STATE  %CPU    TIME   INTERRU DAEMON
17     pool-2-thread-1        system         5      WAITIN 67      0:0    false   false
27     Timer-for-arthas-dashb system         10     RUNNAB 32      0:0    false   true
11     AsyncAppender-Worker-a system         9      WAITIN 0       0:0    false   true
9      Attach Listener        system         9      RUNNAB 0       0:0    false   true
3      Finalizer              system         8      WAITIN 0       0:0    false   true
2      Reference Handler      system         10     WAITIN 0       0:0    false   true
4      Signal Dispatcher      system         9      RUNNAB 0       0:0    false   true
26     as-command-execute-dae system         10     TIMED_ 0       0:0    false   true
13     job-timeout            system         9      TIMED_ 0       0:0    false   true
1      main                   main           5      TIMED_ 0       0:0    false   false
14     nioEventLoopGroup-2-1  system         10     RUNNAB 0       0:0    false   false
18     nioEventLoopGroup-2-2  system         10     RUNNAB 0       0:0    false   false
23     nioEventLoopGroup-2-3  system         10     RUNNAB 0       0:0    false   false
15     nioEventLoopGroup-3-1  system         10     RUNNAB 0       0:0    false   false
Memory             used   total max    usage GC
heap               32M    155M  1820M  1.77% gc.ps_scavenge.count  4
ps_eden_space      14M    65M   672M   2.21% gc.ps_scavenge.time(m 166
ps_survivor_space  4M     5M    5M           s)
ps_old_gen         12M    85M   1365M  0.91% gc.ps_marksweep.count 0
nonheap            20M    23M   -1           gc.ps_marksweep.time( 0
code_cache         3M     5M    240M   1.32% ms )
Runtime
os.name                Mac OS X
os.version             10.13.4
java.version           1.8.0_162
java.home              /Library/Java/JavaVir
                       tualMachines/jdk1.8.0
                       _162.jdk/Contents/Hom
                       e/jre


By default, the content will be refreshed every five seconds. On top of the dashboard, you can see a list of running threads with some information, like thread ID, name, group, priority, state, CPU usage, running time, etc. They are ranked by CPU usage.

In the middle section, there are two sub-sections. On the left, it shows the memory usage with heap usage and non-heap usage. In the heap usage, you can see the usage of the young generation, including the Eden/survivor region and the old generation.

On the right, it shows the Garbage Collection statistics, including the Garbage Collection algorithm, as well as the time and count for both the young and old generations.

In the bottom section, it shows basic system information, e.g. the Java version, operating system, and Java home location.

SC: Search Class Loaded by Your JVM

Sometimes, you want to know whether a classed is loaded and where is it loaded from. SC is the command for you.

$ sc Demo$Counter -d
 class-info        Demo$Counter
 code-source       /private/var/tmp/
 name              Demo$Counter
 isInterface       false
 isAnnotation      false
 isEnum            false
 isAnonymousClass  false
 isArray           false
 isLocalClass      false
 isMemberClass     true
 isPrimitive       false
 isSynthetic       false
 simple-name       Counter
 modifier          static
 annotation
 interfaces
 super-class       +-java.lang.Object
 class-loader      +-sun.misc.Launcher$AppClassLoader@2a139a55
                     +-sun.misc.Launcher$ExtClassLoader@51d04a9
 classLoaderHash   2a139a55

Affect(row-cnt:1) cost in 13 ms.


The sc Demo$Counter tells Arthas to search for a class from all the loaded classes of this JVM. The-doption means to display the metadata in a more detail way. For example, where the class is loaded from, the modifiers, and the classloader hierarchy.

Jad: Decompile Class Into Source Code

Sometimes, you want to confirm whether your application is running the correct code or not. To do that, you have to decompile the class into the source code. The jad is the right command for you.

$ jad Demo

ClassLoader:
+-sun.misc.Launcher$AppClassLoader@2a139a55
  +-sun.misc.Launcher$ExtClassLoader@51d04a9

Location:
/private/var/tmp/

/*
 * Decompiled with CFR 0_132.
 */
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class Demo {
    public static void main(String[] arrstring) throws InterruptedException {
        do {
            Demo.doSomething();
            TimeUnit.SECONDS.sleep(1L);
        } while (true);
    }

    private static int doSomething() {
        int n;
        for (n = 0; n < 1000; ++n) {
            Counter.increment();
        }
        n = Counter.value();
        return n;
    }

    static class Counter {
        private static AtomicInteger count = new AtomicInteger(0);

        Counter() {
        }

        public static int value() {
            return count.get();
        }

        public static void increment() {
            count.incrementAndGet();
        }
    }
}


Watch: View Method Invocation Input and Results

Next, we would like to look at the return value of Counter.value(). Normally, we have to add the following:

System.out.println(value);


And, we will need to restart the demo to see the return value. However, with Arthas, you don't have to add any code or restart. You can use the watch command instead.

$ watch Demo$Counter value returnObj
Press Ctrl+C to abort.
Affect(class-cnt:1 , method-cnt:1) cost in 62 ms.
ts=2018-09-23 15:36:54;result=@Integer[24000]
ts=2018-09-23 15:36:55;result=@Integer[25000]
ts=2018-09-23 15:36:56;result=@Integer[26000]
ts=2018-09-23 15:36:57;result=@Integer[27000]
ts=2018-09-23 15:36:58;result=@Integer[28000]


The basic watch command syntax is watch class-name method-name content. Here, since Counter is an inner class of Demo, we have to use Deme$Counter to denote it. Value is the name of the method. ReturnObjis one of the pre-defined keywords that we can use to denote the content to watch. Other possible keywords include paramsthrowExptargetclassloader, and more. We will talk about that later. Here, we just need to know that returnObj means that we want to watch the return object of the specified method.

The next line:

Press Ctrl+C to abort.


This means that Arthas will start another thread to process this command and output the console. If you would like to stop watching, you can press Ctrl+C to abort.

The next line can be found below:

Affect(class-cnt:1 , method-cnt:1) cost in 62 ms.


This means that Arthas will search for the specified class and method in the Java process and change the byte-code of the method to add some logic. This means that there is one class and one method that has been modified by Arthas.

Once the method has been called, it will output the content you specified to watch:

ts=2018-09-23 15:36:54;result=@Integer[24000]


This includes a timestamp and the result, which is an integer of 24000.

Of course, this is a simple case, and if the returned object is complicated and you would like to what is inside, you can use the ognl expression to customize it. However, this is beyond the scope of this article.

Trace: Find the Bottleneck of Your Method Invocation

Next, whenever you want to find out whether there is a bottleneck, e.g. why my method running slowly, you should try the trace command. It can list all the sub method invocations with the number of invocations and the total execution time. This is very helpful to identify slow method invocations or an unexpected number of invocations.

Next, let's find out what is going on inside doSomething using the trace command.

$ trace Demo doSomething
Press Ctrl+C to abort.
Affect(class-cnt:1 , method-cnt:1) cost in 101 ms.
`---ts=2018-09-24 14:43:58;thread_name=main;id=1;is_daemon=false;priority=5;TCCL=sun.misc.Launcher$AppClassLoader@2a139a55
    `---[25.927722ms] Demo:doSomething()
        +---[min=0.00184ms,max=5.062901ms,total=9.786846ms,count=1000] Demo$Counter:increment()
        `---[0.011398ms] Demo$Counter:value()


The first several lines have been explained above, so here, we will just skip it. Let start with:

14:43:58;thread_name=main;id=1;is_daemon=false;priority=5;TCCL=sun.misc.Launcher$AppClassLoader@2a139a55


This line shows some important information on calling Thread, which includes the thread name, ID, priority, and  ThreadContextClassloader.

Next, let's look at the following lines:

 `---[25.927722ms] Demo:doSomething()
        +---[min=0.00184ms,max=5.062901ms,total=9.786846ms,count=1000] Demo$Counter:increment()
        `---[0.011398ms] Demo$Counter:value()


This shows the sub-method invocation tree of doSomething, It means in doSomething that two methods have been called; they are: Demo$Counter:increment() and Demo$Counter:value(). The former method has been called 1000 times, with a total time of 9.786846ms, and the latter method has been called once, with a total time of 0.011398ms.

Monitor: View Method Invocation Statistics

Sometimes, we want to have a performance overview for a specified method to ensure that it is working correctly. In this case,Monitor is a good friend to have.

$ monitor Demo$Counter increment -c 5
Press Ctrl+C to abort.
Affect(class-cnt:1 , method-cnt:1) cost in 17 ms.
 timestamp            class         method     total  success  fail  avg-rt(ms)  fail-rate
-------------------------------------------------------------------------------------------
 2018-09-24 15:13:31  Demo$Counter  increment  5000   5000     0     0.01        0.00%

 timestamp            class         method     total  success  fail  avg-rt(ms)  fail-rate
-------------------------------------------------------------------------------------------
 2018-09-24 15:13:36  Demo$Counter  increment  5000   5000     0     0.00        0.00%

 timestamp            class         method     total  success  fail  avg-rt(ms)  fail-rate
-------------------------------------------------------------------------------------------
 2018-09-24 15:13:41  Demo$Counter  increment  5000   5000     0     0.00        0.00%


The -c 5 option tells Arthas to output the result every five seconds. And, the output shows the total number of invocations, successive invocations (no exception is thrown out of the method), failed invocations, average response time, and failure rates during the monitor period.

Stack: View Call Stack of the Method

Sometimes, we want to know who is calling this method. Normally, when an exception is raised, there will be a stack trace. But, when your method is acting abnormally, you should be able to know who is causing this behavior without any change of your code. Here comes the stackcommand to help you solve it.

$ stack  Demo$Counter increment -n 1
Press Ctrl+C to abort.
Affect(class-cnt:1 , method-cnt:1) cost in 17 ms.
ts=2018-09-24 15:20:33;thread_name=main;id=1;is_daemon=false;priority=5;TCCL=sun.misc.Launcher$AppClassLoader@2a139a55
    @Demo.doSomething()
        at Demo.main(Demo.java:16)


The stack Demo$Counter increment basically means give me the stack trace if Demo$Counter#increment is called. The option -n 1 tells Arthas to only output the stack trace once. We are already familiar with most of the output, except for the last several lines, which outputs the stack trace of theincrement method.

Tt: Time Tunnel of Method Invocations

Sometimes, there may be too many invocations to a method, and whether you want to check the input parameter or the returned object is undecided, because you have no idea what to check. At this time, you'd better record all invocations so that you can analyze them later. At this time, you will need the ttcommand, which does exactly what you need.

$ tt Demo$Counter value -t
Press Ctrl+C to abort.
Affect(class-cnt:1 , method-cnt:1) cost in 14 ms.
 INDEX   TIMESTAMP              COST(ms)   IS-RET  IS-EXP  OBJECT  CLASS          METHOD
----------------------------------------------------------------------------------------
 1000    2018-09-24 15:24:14    0.197211   true    false   NULL    Demo$Counter    value
 1001    2018-09-24 15:24:15    0.091722   true    false   NULL    Demo$Counter    value
 1002    2018-09-24 15:24:16    0.11092    true    false   NULL    Demo$Counter    value
 1003    2018-09-24 15:24:17    0.078451   true    false   NULL    Demo$Counter    value


tt Demo$Counter value -t basically means we need to record every invocation to Demo$Counter#value, each one of the recordings is called a time fragment. The result table shows the ID of the time fragment, the timestamp that happened, the execution time, and whether it successfully returned or ended it up with an exception thrown. The last column OBJECT represents the actual object that is called. Here, it is nullbecause it is a static method.

After this recording, you can list all the recordings:

$ tt -l
 INDEX   TIMESTAMP            COST(ms)   IS-RET  IS-EXP   OBJECT  CLASS         METHOD
--------------------------------------------------------------------------------------
 1000    2018-09-24 15:24:14  0.197211   true    false    NULL    Demo$Counter  value
 1001    2018-09-24 15:24:15  0.091722   true    false    NULL    Demo$Counter  value
 1002    2018-09-24 15:24:16  0.11092    true    false    NULL    Demo$Counter  value
 1003    2018-09-24 15:24:17  0.078451   true    false    NULL    Demo$Counter  value


You can also check the method invocation details just as the watch command did. For example:

$ tt -i 1000 -w returnObj
@Integer[12444000]
Affect(row-cnt:1) cost in 181 ms.


This basically means the returned value of the time fragment with id=1000.

Exit Arthas

If you have finished troubleshooting with Arthas, there are two ways to quit Arthas:

quit or exit

These two commands will disconnect the current console connection while Arthas is still running in the target process. If you would like to connect again, you can simply use telnet to connect telnet localhost 3658.

shutdown

This command will terminate the Arthas completely. You have to run as.sh to attach again.

Conclusion

Until now, you had only taken a quick look at the installation and quick start of Arthas. We used this simple demo Java program to go through some interesting features provided by Arthas. Finally, we learned that how to quit or shutdown Arthas. Hopefully, this article can give you an overview of how Arthas can help you to quickly troubleshoot Java applications in production. You can also refer to this documentation for further reading. It's time to give it a try!

Developers! Quickly and easily gain access to the tools and information you need! Explore, test and combine our data quality APIs at Melissa Developer Portal – home to tools that save time and boost revenue. 

Topics:
java ,agent ,trace ,jvm ,troubleshoot ,tutorial

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}