Troubleshooting Java Applications With Arthas
Want to learn more about troubleshooting Java applications in production? Check out this post on how to troubleshoot Java apps with Arthas.
Join the DZone community and get the full member experience.
Join For FreeArthas is a Java Diagnostic tool open sourced by Alibaba. Arthas can help developer trouble-shoot production issues for Java applications without modifying your code or restarting your server.
Let's start with a simple demo to see how we can use Arthas to trouble-shoot this Java program on-the-fly.
Java Program Demo
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
public class Demo {
static class Counter {
private static AtomicInteger count = new AtomicInteger(0);
public static void increment() {
count.incrementAndGet();
}
public static int value() {
return count.get();
}
}
public static void main(String[] args) throws InterruptedException {
while (true) {
doSomething();
TimeUnit.SECONDS.sleep(1);
}
}
private static int doSomething() {
for (int i = 0; i< 1000; i++) {
Counter.increment();
}
int value = Counter.value();
return value;
}
}
Save the following code to a Demo.java
and run the commands in Shell:
javac Demo.java && java Demo
Install Arthas
curl -L https://alibaba.github.io/arthas/install.sh | sh
Start Arthas
To start Arthas, simply type the following command:
./as.sh
This script will automatically display the available Java process that you can attach. In this example, we have only two available Java processes to attach.
Arthas script version: 3.0.4
Found existing java process, please choose one and hit RETURN.
* [1]: 9617
[2]: 37374 Demo
Since we are going to attach the Demo.Java process, we type 2
and hit RETURN
, then Arthas will proceed to attach to the specified process.
Calculating attach execution time...
Attaching to 37374 using version 3.0.4...
real 0m0.986s
user 0m0.379s
sys 0m0.058s
Attach success.
Connecting to arthas server... current timestamp is 1537684660
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
,---. ,------. ,--------.,--. ,--. ,---. ,---.
/ O \ | .--. ''--. .--'| '--' | / O \ ' .-'
| .-. || '--'.' | | | .--. || .-. |`. `-.
| | | || |\ \ | | | | | || | | |.-' |
`--' `--'`--' '--' `--' `--' `--'`--' `--'`-----'
wiki: https://alibaba.github.io/arthas
version: 3.0.4
pid: 37374
timestamp: 1537684661535
$
If you can see the Arthas logo, then congratulations, you have successfully attached to the target JVM!
Next, let's go through some of the cool features that Arthas provides and have a quick evaluation of what Arthas can do for you.
TroubleShoot With Arthas
Dashboard: Overview of Your Java Process
Type in dashboard
and hit the ENTER
, you will see it in the following code (press Ctrl+C
to stop):
$ dashboard
ID NAME GROUP PRIORI STATE %CPU TIME INTERRU DAEMON
17 pool-2-thread-1 system 5 WAITIN 67 0:0 false false
27 Timer-for-arthas-dashb system 10 RUNNAB 32 0:0 false true
11 AsyncAppender-Worker-a system 9 WAITIN 0 0:0 false true
9 Attach Listener system 9 RUNNAB 0 0:0 false true
3 Finalizer system 8 WAITIN 0 0:0 false true
2 Reference Handler system 10 WAITIN 0 0:0 false true
4 Signal Dispatcher system 9 RUNNAB 0 0:0 false true
26 as-command-execute-dae system 10 TIMED_ 0 0:0 false true
13 job-timeout system 9 TIMED_ 0 0:0 false true
1 main main 5 TIMED_ 0 0:0 false false
14 nioEventLoopGroup-2-1 system 10 RUNNAB 0 0:0 false false
18 nioEventLoopGroup-2-2 system 10 RUNNAB 0 0:0 false false
23 nioEventLoopGroup-2-3 system 10 RUNNAB 0 0:0 false false
15 nioEventLoopGroup-3-1 system 10 RUNNAB 0 0:0 false false
Memory used total max usage GC
heap 32M 155M 1820M 1.77% gc.ps_scavenge.count 4
ps_eden_space 14M 65M 672M 2.21% gc.ps_scavenge.time(m 166
ps_survivor_space 4M 5M 5M s)
ps_old_gen 12M 85M 1365M 0.91% gc.ps_marksweep.count 0
nonheap 20M 23M -1 gc.ps_marksweep.time( 0
code_cache 3M 5M 240M 1.32% ms )
Runtime
os.name Mac OS X
os.version 10.13.4
java.version 1.8.0_162
java.home /Library/Java/JavaVir
tualMachines/jdk1.8.0
_162.jdk/Contents/Hom
e/jre
By default, the content will be refreshed every five seconds. On top of the dashboard, you can see a list of running threads with some information, like thread ID, name, group, priority, state, CPU usage, running time, etc. They are ranked by CPU usage.
In the middle section, there are two sub-sections. On the left, it shows the memory usage with heap usage and non-heap usage. In the heap usage, you can see the usage of the young generation, including the Eden/survivor region and the old generation.
On the right, it shows the Garbage Collection statistics, including the Garbage Collection algorithm, as well as the time and count for both the young and old generations.
In the bottom section, it shows basic system information, e.g. the Java version, operating system, and Java home location.
SC: Search Class Loaded by Your JVM
Sometimes, you want to know whether a classed is loaded and where is it loaded from. SC is the command for you.
$ sc Demo$Counter -d
class-info Demo$Counter
code-source /private/var/tmp/
name Demo$Counter
isInterface false
isAnnotation false
isEnum false
isAnonymousClass false
isArray false
isLocalClass false
isMemberClass true
isPrimitive false
isSynthetic false
simple-name Counter
modifier static
annotation
interfaces
super-class +-java.lang.Object
class-loader +-sun.misc.Launcher$AppClassLoader@2a139a55
+-sun.misc.Launcher$ExtClassLoader@51d04a9
classLoaderHash 2a139a55
Affect(row-cnt:1) cost in 13 ms.
The sc Demo$Counter
tells Arthas to search for a class from all the loaded classes of this JVM. The-d
option means to display the metadata in a more detail way. For example, where the class is loaded from, the modifiers, and the classloader hierarchy.
Jad: Decompile Class Into Source Code
Sometimes, you want to confirm whether your application is running the correct code or not. To do that, you have to decompile the class into the source code. The jad
is the right command for you.
$ jad Demo
ClassLoader:
+-sun.misc.Launcher$AppClassLoader@2a139a55
+-sun.misc.Launcher$ExtClassLoader@51d04a9
Location:
/private/var/tmp/
/*
* Decompiled with CFR 0_132.
*/
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
public class Demo {
public static void main(String[] arrstring) throws InterruptedException {
do {
Demo.doSomething();
TimeUnit.SECONDS.sleep(1L);
} while (true);
}
private static int doSomething() {
int n;
for (n = 0; n < 1000; ++n) {
Counter.increment();
}
n = Counter.value();
return n;
}
static class Counter {
private static AtomicInteger count = new AtomicInteger(0);
Counter() {
}
public static int value() {
return count.get();
}
public static void increment() {
count.incrementAndGet();
}
}
}
Watch: View Method Invocation Input and Results
Next, we would like to look at the return value of Counter.value()
. Normally, we have to add the following:
System.out.println(value);
And, we will need to restart the demo to see the return value. However, with Arthas, you don't have to add any code or restart. You can use the watch
command instead.
$ watch Demo$Counter value returnObj
Press Ctrl+C to abort.
Affect(class-cnt:1 , method-cnt:1) cost in 62 ms.
ts=2018-09-23 15:36:54;result=@Integer[24000]
ts=2018-09-23 15:36:55;result=@Integer[25000]
ts=2018-09-23 15:36:56;result=@Integer[26000]
ts=2018-09-23 15:36:57;result=@Integer[27000]
ts=2018-09-23 15:36:58;result=@Integer[28000]
The basic watch command syntax is watch class-name method-name content
. Here, since Counter is an inner class of Demo, we have to use Deme$Counter
to denote it. Value
is the name of the method. ReturnObj
is one of the pre-defined keywords that we can use to denote the content to watch. Other possible keywords include params
, throwExp
, target
, classloader,
and more. We will talk about that later. Here, we just need to know that returnObj
means that we want to watch the return object of the specified method.
The next line:
Press Ctrl+C to abort.
This means that Arthas will start another thread to process this command and output the console. If you would like to stop watching, you can press Ctrl+C to abort.
The next line can be found below:
Affect(class-cnt:1 , method-cnt:1) cost in 62 ms.
This means that Arthas will search for the specified class and method in the Java process and change the byte-code of the method to add some logic. This means that there is one class and one method that has been modified by Arthas.
Once the method has been called, it will output the content you specified to watch:
ts=2018-09-23 15:36:54;result=@Integer[24000]
This includes a timestamp and the result, which is an integer of 24000.
Of course, this is a simple case, and if the returned object is complicated and you would like to what is inside, you can use the ognl
expression to customize it. However, this is beyond the scope of this article.
Trace: Find the Bottleneck of Your Method Invocation
Next, whenever you want to find out whether there is a bottleneck, e.g. why my method running slowly, you should try the trace
command. It can list all the sub method invocations with the number of invocations and the total execution time. This is very helpful to identify slow method invocations or an unexpected number of invocations.
Next, let's find out what is going on inside doSomething
using the trace
command.
$ trace Demo doSomething
Press Ctrl+C to abort.
Affect(class-cnt:1 , method-cnt:1) cost in 101 ms.
`---ts=2018-09-24 14:43:58;thread_name=main;id=1;is_daemon=false;priority=5;TCCL=sun.misc.Launcher$AppClassLoader@2a139a55
`---[25.927722ms] Demo:doSomething()
+---[min=0.00184ms,max=5.062901ms,total=9.786846ms,count=1000] Demo$Counter:increment()
`---[0.011398ms] Demo$Counter:value()
The first several lines have been explained above, so here, we will just skip it. Let start with:
14:43:58;thread_name=main;id=1;is_daemon=false;priority=5;TCCL=sun.misc.Launcher$AppClassLoader@2a139a55
This line shows some important information on calling Thread
, which includes the thread name, ID, priority, and ThreadContextClassloader
.
Next, let's look at the following lines:
`---[25.927722ms] Demo:doSomething()
+---[min=0.00184ms,max=5.062901ms,total=9.786846ms,count=1000] Demo$Counter:increment()
`---[0.011398ms] Demo$Counter:value()
This shows the sub-method invocation tree of doSomething
, It means in doSomething
that two methods have been called; they are: Demo$Counter:increment()
and Demo$Counter:value()
. The former method has been called 1000 times, with a total time of 9.786846ms
, and the latter method has been called once, with a total time of 0.011398ms
.
Monitor: View Method Invocation Statistics
Sometimes, we want to have a performance overview for a specified method to ensure that it is working correctly. In this case,Monitor
is a good friend to have.
$ monitor Demo$Counter increment -c 5
Press Ctrl+C to abort.
Affect(class-cnt:1 , method-cnt:1) cost in 17 ms.
timestamp class method total success fail avg-rt(ms) fail-rate
-------------------------------------------------------------------------------------------
2018-09-24 15:13:31 Demo$Counter increment 5000 5000 0 0.01 0.00%
timestamp class method total success fail avg-rt(ms) fail-rate
-------------------------------------------------------------------------------------------
2018-09-24 15:13:36 Demo$Counter increment 5000 5000 0 0.00 0.00%
timestamp class method total success fail avg-rt(ms) fail-rate
-------------------------------------------------------------------------------------------
2018-09-24 15:13:41 Demo$Counter increment 5000 5000 0 0.00 0.00%
The -c 5
option tells Arthas to output the result every five seconds. And, the output shows the total number of invocations, successive invocations (no exception is thrown out of the method), failed invocations, average response time, and failure rates during the monitor period.
Stack: View Call Stack of the Method
Sometimes, we want to know who is calling this method. Normally, when an exception is raised, there will be a stack trace. But, when your method is acting abnormally, you should be able to know who is causing this behavior without any change of your code. Here comes the stack
command to help you solve it.
$ stack Demo$Counter increment -n 1
Press Ctrl+C to abort.
Affect(class-cnt:1 , method-cnt:1) cost in 17 ms.
ts=2018-09-24 15:20:33;thread_name=main;id=1;is_daemon=false;priority=5;TCCL=sun.misc.Launcher$AppClassLoader@2a139a55
@Demo.doSomething()
at Demo.main(Demo.java:16)
The stack Demo$Counter increment
basically means give me the stack trace if Demo$Counter#increment
is called. The option -n 1
tells Arthas to only output the stack trace once. We are already familiar with most of the output, except for the last several lines, which outputs the stack trace of theincrement
method.
Tt: Time Tunnel of Method Invocations
Sometimes, there may be too many invocations to a method, and whether you want to check the input parameter or the returned object is undecided, because you have no idea what to check. At this time, you'd better record all invocations so that you can analyze them later. At this time, you will need the tt
command, which does exactly what you need.
$ tt Demo$Counter value -t
Press Ctrl+C to abort.
Affect(class-cnt:1 , method-cnt:1) cost in 14 ms.
INDEX TIMESTAMP COST(ms) IS-RET IS-EXP OBJECT CLASS METHOD
----------------------------------------------------------------------------------------
1000 2018-09-24 15:24:14 0.197211 true false NULL Demo$Counter value
1001 2018-09-24 15:24:15 0.091722 true false NULL Demo$Counter value
1002 2018-09-24 15:24:16 0.11092 true false NULL Demo$Counter value
1003 2018-09-24 15:24:17 0.078451 true false NULL Demo$Counter value
tt Demo$Counter value -t
basically means we need to record every invocation to Demo$Counter#value
, each one of the recordings is called a time fragment. The result table shows the ID of the time fragment, the timestamp that happened, the execution time, and whether it successfully returned or ended it up with an exception thrown. The last column OBJECT
represents the actual object that is called. Here, it is null
because it is a static method.
After this recording, you can list all the recordings:
$ tt -l
INDEX TIMESTAMP COST(ms) IS-RET IS-EXP OBJECT CLASS METHOD
--------------------------------------------------------------------------------------
1000 2018-09-24 15:24:14 0.197211 true false NULL Demo$Counter value
1001 2018-09-24 15:24:15 0.091722 true false NULL Demo$Counter value
1002 2018-09-24 15:24:16 0.11092 true false NULL Demo$Counter value
1003 2018-09-24 15:24:17 0.078451 true false NULL Demo$Counter value
You can also check the method invocation details just as the watch
command did. For example:
$ tt -i 1000 -w returnObj
@Integer[12444000]
Affect(row-cnt:1) cost in 181 ms.
This basically means the returned value of the time fragment with id=1000.
Exit Arthas
If you have finished troubleshooting with Arthas, there are two ways to quit Arthas:
quit
or exit
These two commands will disconnect the current console connection while Arthas is still running in the target process. If you would like to connect again, you can simply use telnet to connect telnet localhost 3658
.
shutdown
This command will terminate the Arthas completely. You have to run as.sh
to attach again.
Conclusion
Until now, you had only taken a quick look at the installation and quick start of Arthas. We used this simple demo Java program to go through some interesting features provided by Arthas. Finally, we learned that how to quit or shutdown Arthas. Hopefully, this article can give you an overview of how Arthas can help you to quickly troubleshoot Java applications in production. You can also refer to this documentation for further reading. It's time to give it a try!
Opinions expressed by DZone contributors are their own.
Comments