Java Profiling with DTrace
Join the DZone community and get the full member experience.
Join For FreeDTrace, a dynamic tracing framework on Solaris (see dtrace(1m)) is a valuable and extremely easy-to-use tool if you find yourself analyzing Java performance issues on Solaris. While there is a lot of information on DTrace available (see the DTrace Wiki, which includes an extensive user guide), less information is available on using DTrace specifically with Java applications. My motivation with this post is to focus on profiling Java applications with DTrace.
Quick Tour
If you're not at all familiar with DTrace, look at the Getting Started section of the user guide. In short, DTrace allows you to specify a set of actions you would like performed at a location of interest (in the kernel or an application). A location of interest is referred to as a probe. In addition to a numerical ID, each probe is identified by the following four pieces of information:
- The probe Provider name
- The probe Module
- The name of the Function where the probe is located
- A Name for the probe, designed for human readability
(By the way, if you don't have a Solaris environment but would still like to follow along, you can install VirtualBox, then retrieve the ISO image for OpenSolaris, then follow the VirtualBox guide to get up and running on OpenSolaris).
What probes are available? Well, a lot. On my OpenSolaris (2009.06) environment, a count (using pfexec dtrace -l | wc -l), yields 60068 probes! (Note: this includes some Java-6-specific probes; more on that later). Most of these are fairly low-level probes from the point of view of a Java profiler, but let's take one and see what we can do with it.
Using DTrace, we can perform actions when the read system call is entered, using the following:
pfexec dtrace -n syscall::read:entry
If you enter this, you'll begin to see output such as the following:
0 57922 read:entry
0 57922 read:entry
0 57922 read:entry
0 57922 read:entry
0 57922 read:entry
Note this doesn't tell you a lot; you don't know the PIDs of the processes making the calls, and so on. DTrace has a number of built-in functions and variables that you can use to provide meaningful output at these locations. I won't go into this in any detail because there is a large body of information on these topics at the links I mentioned above. This is just the very quick tour to get to, and focus on, Java-specific probes.
At this point, before going any farther, you would begin placing your declarations and logic in a file. DTrace understands the D programming language, a language that will be familiar to most developers (especially C developers). The above syscall::read:entry example, placed in a file and enhanced to 1) apply only to a specified PID, and 2) to print out the name of the process, follows:
syscall::read:entry
/pid == $1/
{
printf("'%s' (PID: %d): entered syscall:read", execname, pid);
}
Place the following text in a file called blogExample.d, then find a process to monitor (I picked Firefox) and enter the following, substituting the PID of the process of your choice:
pfexec dtrace -s blogExample.d 697
and you should see output similar to the following:
0 57922 read:entry 'firefox-bin' (PID: 697): entered syscall:read
0 57922 read:entry 'firefox-bin' (PID: 697): entered syscall:read
0 57922 read:entry 'firefox-bin' (PID: 697): entered syscall:read
0 57922 read:entry 'firefox-bin' (PID: 697): entered syscall:read
Note the predicate (/pid == $1/) in the example which restricts actions to be performed only when pid (a built-in DTrace variable) is equal to the first argument supplied on the command line ($1). To suppress all output except for that specified in your printf() call, add the -q ("quiet") switch (you'll need to add a newline to the end of your printf() specification, also, as "quiet" mode does not provide one):
pfexec dtrace -q -s blogExample.d 697
and you should see output similar to the following:
'firefox-bin' (PID: 697): entered syscall:read
'firefox-bin' (PID: 697): entered syscall:read
'firefox-bin' (PID: 697): entered syscall:read
'firefox-bin' (PID: 697): entered syscall:read
'firefox-bin' (PID: 697): entered syscall:read
That wraps up the quick tour of DTrace. Now, on to our main topic!
DTrace Support in Java
Java SE 6 now supports DTrace, in the form of a couple of providers (specific to the HotSpot JVM):
- hotspot
- hotspot_jni
For more information, see DTrace Probes in HotSpot VM. We'll be focusing on the hotspot provider in this post. Normally, you would expect to find a provider with the dtrace -l command, looking for "hotspot". This approach gets you most of the way there; actually, providers are on a per-JVM basis, and they are identified by appending the PID to the name "hotspot." So, for example, if you have a Java process running and its PID is 1234, then there will be a DTrace provider called "hotspot1234". This naming convention is the reason that you will see references to "hotspot$1" in example D-language scripts -- the author is expecting a PID on the command line and is using it to create the provider name. This convention also spares us from having to restrict our probe actions to a specific PID, as the restriction is built in to the choice of provider.
Back to the Java-specific probes. Enter the following command in a shell:
pfexec dtrace -l | grep hotspot
 and you should see a list of all the hotspot and hotspot_jni probes. On my system, I got 499 probes with one Java process running; note there will be a complete set for each JVM. I have 26 "hotspot" probes, and 473 "hotspot_jni" probes. I will only be looking at "hotspot" probes in this post. You can also get a list of these probes, along with their arguments, in the Reference section of DTrace Probes in HotSpot VM.
Some Examples
Thread Monitoring
Note the following two probes related to thread activity:
probe thread-start(char* thread_name, uintptr_t thread_name_length,
uintptr_t java_thread_id, uintptr_t native_thread_id, bool is_daemon);
probe thread-stop(char* thread_name, uintptr_t thread_name_length,
uintptr_t java_thread_id, uintptr_t native_thread_id, bool is_daemon);
One interesting detail to note with string arguments in DTrace is that they come in pairs, where the first item is a pointer to the un-terminated string and the second item is the length of the string. Because of this convention, in example code you'll frequently see the following construct, which builds a null-terminated string from the provided information:
self->str_ptr = (char*) copyin(arg0, arg1+1);
self->str_ptr[arg1] = '\0';
self->stringVariable = (string) self->str_ptr;
Note that the above fragment could be used to construct the name of the thread from the above thread-startthread-stop probes.
To see how easy it is to use DTrace to create a simple thread profiler on Java 6, open a file in a text editor and enter the following:
hotspot$1:::thread-start
{
self->ptr = (char*) copyin(arg0, arg1+1);
self->ptr[arg1] = '\0';
self->threadname = (string) self->ptr;
printf("thread-start: '%s' (thread ID = %d; native thread ID = %d; is_daemon? %d)\n",
self->threadname, arg2, arg3, arg4);
}
hotspot$1:::thread-stop
{
self->ptr = (char*) copyin(arg0, arg1+1);
self->ptr[arg1] = '\0';
self->threadname = (string) self->ptr;
printf("thread-stop: '%s' (thread ID = %d; native thread ID = %d; is_daemon? %d)\n",
self->threadname, arg2, arg3, arg4);
}
Note as mentioned earlier the $1 macro variable, representing the JVM process ID you will be passing in on the command line. In this script, we output everything we get from these two probes, which is the thread name, ID, native ID, and a boolean "is_daemon". Save this file as threadWatcher.d and enter the following:
pfexec dtrace -q -s threadWatcher.d pid
substituting the process ID of a running JVM. In my test Java application, I then opened a file chooser and got the following output from DTrace:
thread-start: 'Basic L&F File Loading Thread' (thread ID = 42; native thread ID = 42; is_daemon? 0)Garbage Collection
thread-stop: 'Basic L&F File Loading Thread' (thread ID = 42; native thread ID = 42; is_daemon? 0)
thread-start: 'Basic L&F File Loading Thread' (thread ID = 43; native thread ID = 43; is_daemon? 0)
thread-start: 'Basic L&F File Loading Thread' (thread ID = 41; native thread ID = 41; is_daemon? 0)
thread-stop: 'Basic L&F File Loading Thread' (thread ID = 41; native thread ID = 41; is_daemon? 0)
thread-stop: 'Basic L&F File Loading Thread' (thread ID = 43; native thread ID = 43; is_daemon? 0)
Note the following probes relevant to garbage collection:
probe gc-begin(bool is_full);
probe gc-end();
probe mem-pool-gc-begin(
char* mgr_name, uintptr_t mgr_name_len, char* pool_name, uintptr_t pool_name_len,
uintptr_t initial_size, uintptr_t used, uintptr_t committed, uintptr_t max_size);
probe mem-pool-gc-end(
char* mgr_name, uintptr_t mgr_name_len, char* pool_name, uintptr_t pool_name_len,
uintptr_t initial_size, uintptr_t used, uintptr_t committed, uintptr_t max_size);
The
first two probes provide generic information about the start and stop
of a garbage collection, while the latter two provide more detailed
information. We'll output information from both to compare. Note that
the manager name and pool name are provided with the same
pointer/length data pairs as we saw earlier.
Open a file called gcWatcher.d in a text editor and enter the following:hotspot$1:::gc-begin
Save the file, then trace your Java application with the following:
{
printf("gc-begin: is_full = %d\n", arg1);
}
hotspot$1:::gc-end
{
printf("gc-end:\n");
}
hotspot$1:::mem-pool-gc-begin
{
self->ptr1 = (char*) copyin(arg0, arg1+1);
self->ptr1[arg1] = '\0';
self->mgr_name = (string) self->ptr1;
self->ptr2 = (char*) copyin(arg2, arg3+1);
self->ptr2[arg3] = '\0';
self->pool_name = (string) self->ptr2;
printf("mem-pool-gc-begin: mgr: '%s'; pool: '%s'; initial: %d; used: %d; committed: %d; max: %d\n",
self->mgr_name, self->pool_name, arg4, arg5, arg6, arg7);
}
hotspot$1:::mem-pool-gc-end
{
self->ptr1 = (char*) copyin(arg0, arg1+1);
self->ptr1[arg1] = '\0';
self->mgr_name = (string) self->ptr1;
self->ptr2 = (char*) copyin(arg2, arg3+1);
self->ptr2[arg3] = '\0';
self->pool_name = (string) self->ptr2;
printf("mem-pool-gc-end: mgr: '%s'; pool: '%s'; initial: %d; used: %d; committed: %d; max: %d\n",
self->mgr_name, self->pool_name, arg4, arg5, arg6, arg7);
} pfexec dtrace -q -s gcWatcher.d pid
For
my example, after triggering a little activity in my application, I saw
a number of minor collections; here is the output from one of them:gc-begin: is_full = 0
mem-pool-gc-begin: mgr: 'Copy'; pool: 'Code Cache'; initial: 163840; used: 1751360; committed: 1769472; max: 33554432
mem-pool-gc-begin: mgr: 'Copy'; pool: 'Eden Space'; initial: 917504; used: 917504; committed: 917504; max: 4194304
mem-pool-gc-begin: mgr: 'Copy'; pool: 'Survivor Space'; initial: 65536; used: 58904; committed: 65536; max: 458752
mem-pool-gc-begin: mgr: 'Copy'; pool: 'Tenured Gen'; initial: 4194304; used: 1406096; committed: 4194304; max: 61997056
mem-pool-gc-begin: mgr: 'Copy'; pool: 'Perm Gen'; initial: 12582912; used: 1144824; committed: 12582912; max: 67108864
mem-pool-gc-begin: mgr: 'Copy'; pool: 'Perm Gen [shared-ro]'; initial: 8388608; used: 6733248; committed: 8388608; max: 8388608
mem-pool-gc-begin: mgr: 'Copy'; pool: 'Perm Gen [shared-rw]'; initial: 12582912; used: 7676960; committed: 12582912; max: 12582912
mem-pool-gc-end: mgr: 'Copy'; pool: 'Code Cache'; initial: 163840; used: 1751360; committed: 1769472; max: 33554432
mem-pool-gc-end: mgr: 'Copy'; pool: 'Eden Space'; initial: 917504; used: 0; committed: 917504; max: 4194304
mem-pool-gc-end: mgr: 'Copy'; pool: 'Survivor Space'; initial: 65536; used: 29648; committed: 65536; max: 458752
mem-pool-gc-end: mgr: 'Copy'; pool: 'Tenured Gen'; initial: 4194304; used: 1458336; committed: 4194304; max: 61997056
mem-pool-gc-end: mgr: 'Copy'; pool: 'Perm Gen'; initial: 12582912; used: 1144824; committed: 12582912; max: 67108864
mem-pool-gc-end: mgr: 'Copy'; pool: 'Perm Gen [shared-ro]'; initial: 8388608; used: 6733248; committed: 8388608; max: 8388608
mem-pool-gc-end: mgr: 'Copy'; pool: 'Perm Gen [shared-rw]'; initial: 12582912; used: 7676960; committed: 12582912; max: 12582912
gc-end:
Note that not a great deal of information is available from the generic garbage-collection begin and end probes, while much more information is available from the mem-pool probes.
You
can do much more than simply print the values of probe arguments, of
course; see the reference documents for more information. As an
example, you can count and output the number of threads created in a
time period, gather statistics on the average lifetime of a thread,
output the average memory recovered during a garbage collection, and so
on.
Other Considerations
One question you might have is "How expensive are these operations?" It depends on the operation. The probes we have looked at above are relatively inexpensive. Other probes, such as method-entry and method-exit probes, or those monitoring object allocation, are much more expensive and indeed have a significant cost when enabled, even when they are not being used.
For this reason, there is a JVM option -- -XX:+ExtendedDTraceProbes -- which when present enables these and some other expensive probes. For more information, see the "Effect of using DTrace with Java Programs" section of the Observability Using Java Platform, Standard Edition 6, and Solaris OS article. If this option is not enabled, DTrace will ignore any of the above-mentioned probes in your profiling application, sparing you from inadvertently slowing down someone's production server.
That's the end of this quick tour. As I mentioned earlier, there are a lot of resources available on DTrace, specifically the user guide on the DTrace wiki (see the "Documentation" link, or go directly to this link).
Opinions expressed by DZone contributors are their own.
Comments