Java Profiling with MonkeyWrench
MonkeyWrench (available at http://www.gorillalogic.com/monkeywrench) is an open-source Java profiler based on the runtime's built-in management interface and instrumentation hooks. MonkeyWrench enables you to:
- Collect CPU and wall‐clock statistics at the class constructor and method level, for the classes you select, with detailed information on the stack trace signatures and threads involved with each constructor or method;
- Monitor object creation, also with associated stack trace signatures and associated threads;
- Monitor all threads in the target application, focusing on those with the most CPU time (either in a window or cumulative); also monitor thread contention and deadlocks, all sampled at the time interval of your choice;
- Graphically monitor the real‐time behavior of the JVM garbage collectors, viewing both the number of collections, and the total time consumed collecting, also sampling at the time interval of your choice.
MonkeyWrench uses well‐defined Java APIs, and its Swing interface makes it easy to begin profiling an application. It can be launched at target launch time, or, for HotSpot JVMs, a MonkeyWrench launcher application can be used to locate and attach to already‐running JVMs. MonkeyWrench provides in one place information you'd normally need to collect from multiple applications (e.g., method CPU times, hprof‐style site summaries, and thread‐contention monitoring). Finally, since the source is available, you can customize it to provide just what you want, and only what you want, for your needs.
MonkeyWrench is available both in source and binary (.jar) form. Installation instructions will start with build instructions, and then proceed to the different ways to launch MonkeyWrench. Even if you are not building MonkeyWrench, it is recommended you read the (relatively‐short) section on building, as it will help you to better understand some of the launch dependencies you will encounter later.
MonkeyWrench is a Java 6 application. While your target application may need to run in an older JVM, that’s fine, but you will need to run it in Java 6 while profiling it with MonkeyWrench. It is possible for some pre‐Java‐6 applications to not be able to run under Java 6, of course, based on choices made during their development. In those cases, it may make sense to choose a different profiler.
One act that will make deployment simpler is to, before you build, download Javassist (see http://sourceforge.net/projects/jboss/files/Javassist and download the latest version) and edit the following file in the MonkeyWrench source tree:
The Boot‐Class‐Path attribute in the manifest tells target applications where to find the JDK’s tools.jar and Javassist’s javassist.jar files when you attach to a running application. Edit this entry to reflect the locations of these two jar files in your project.
MonkeyWrench is a Maven‐2 project. The easiest path to building it is to simply unzip it, ensure Maven is in your path, set the M2 and M2_HOME environment variables, and enter
mvn clean install
The Maven pom file has dependencies on the JDK’s tools.jar and the Javassist javassist.jar file. MonkeyWrench’s use of the Javassist package is simple enough that you should be OK with any version of Javassist; the Maven pom file references version 3.11. To process the dependency on the tools.jar file, the Maven pom file needs the environment variable java.home. Note that this needs to be the JDK home (JRE is not sufficient). You can ensure that it is set to a JDK home or hard‐code the location in the pom file. If you have any problems with the build, you may need to manually download and install some resource, most likely Javassist. If you have any experience with Maven, you won’t have any trouble with this; if you are new to Maven, the error messages are fairly informative.
Per the usual Maven setup, the MonkeyWrench jar file will be placed in the target directory. Note that the name of the jar file is determined by properties in the pom file and can be changed.
The MonkeyWrench can be run in two modes:
- Set up to launch at the same time that the target application launches;
- Launched by attaching to a running target (Java HotSpot JVMs only)
Launching MonkeyWrench at target application launch time:
This is the simplest way to launch the MonkeyWrench profiler. In the launch script of your target application, supply the following argument: -javaagent:$MonkeyWrenchHome/target/GorillaProfiler-1.0-SNAPSHOT.jar
if you built MonkeyWrench from source, or
if you downloaded the binaries. Note that the full path to the MonkeyWrench jar file is required. The MonkeyWrench GUI will launch when you launch your target application. Figure 1 shows the MonkeyWrench GUI at launch time.
Figure 1 -- The MonkeyWrench GUI
Attaching MonkeyWrench to an already‐running target application:
There are a couple of ways provided to attach to a running target. Before you proceed, you must first correctly set the Boot-Class-Path attribute in the manifest file of the MonkeyWrench jar file. The reason for this step is that when you attach to a running JVM, the JVM needs to know where to find the JDK tools.jar and Javassist javassist.jar files. You can open the MonkeyWrench jar file and edit this entry directly, then replace the manifest file, or you can use the included script named (somewhat grandiosely) installer.bat. Edit the file and put in the paths to your JDK and Javassist deployment, and it will update the MonkeyWrench jar file for you.
The two ways to attach to a running target are to 1) launch the MonkeyWrench’s main() method, which will attach to a running JVM, or 2) use the MonkeyWrench launcher GUI, which makes things a little easier.
See the included run.bat script for a command‐line approach to starting the agent. You’ll need the JVM ID of your target. Luckily, this is just the OS’s process ID. Don’t forget that you can use jps to list all Java processes. In the script, note the following:
- The classpath must include the MonkeyWrench jar file, tools.jar and javassist.jar.
- The first argument to the main() method is the path to the MonkeyWrench jar file. The classpath finds the jar file to launch the agent, while the first command‐line argument supplies the agent path to the Sun attach API. This path must be a full path.
- The second command‐line argument to the MonkeyWrench main() method, and the one and only command‐line argument passed to the example script, is the process ID of the target application.
When you attach to the running application, the MonkeyWrench Swing GUI will launch and you are ready to begin profiling.
The second way to attach to a running target is with the MonkeyWrench launcher application (see Figure 2). There is a sample Windows script in the distribution top‐level directory called runLauncher.bat. Edit it as necessary to point to tools.jar and the MonkeyWrench jar file on your system (note: the launcher does not need to know about Javassist, but MonkeyWrench still does; MonkeyWrench will find Javassist by looking at the Boot-Class-Path entry in its manifest). Once you have started the launch, it will maintain an up‐to‐date list of all running Java applications on the system; double‐click on an entry for a JVM and the MonkeyWrench GUI will launch.
Figure 2 -- The MonkeyWrench Launcher
MonkeyWrench GUI Overview
The configuration screen of MonkeyWrench (see Figure 1) displays the controls for the other screens of MonkeyWrench. The “CPU & Memory Profiling” section controls the operation of the “CPU” and “Memory” tabs; the “Thread Profiling” section controls the operation of the “Threads” tab, and the “Garbage Collection Profiling” section controls the “Garbage Collection” tab. Note that only the CPU and Memory profiling involve bytecode injection, the most expensive operations in the profiler; Thread and Garbage Collection profiling use JMX beans only, and so have much less impact on the performance of your target application. The right side of this screen is a help document which contains an abbreviated version of the information in this article.
CPU and Memory profiling are turned on or off at the same time, so those two tabs are linked by one configuration. Considering this as one feature, this feature, thread profiling and garbage‐collection profiling are independent features that can be turned on or off individually. If an application is undergoing heavy garbage collection, for example, MonkeyWrench can be used to profile just the garbage collection without the additional overhead of method profiling.
To avoid leaving a target application with expensive profiling code injected, the “close” button of MonkeyWrench is disabled while CPU and memory profiling is active. To re‐enable this button, you must stop CPU and memory profiling, from the Configuration tab.
The following sections will focus on each of the remaining tabs of the application by feature. Since all of these features are available in some form in most profilers, each feature section will include a discussion of the motivation behind the specific features provided by MonkeyWrench.
Configuration is split into three areas, as described earlier. Each is covered in the following subsections.
CPU & Memory Profiling
CPU & Memory profiling is defined as the injection of bytecode to notify MonkeyWrench when methods or constructors are entered and exited. When one of these events occurs, the collected information includes system time and the current thread’s CPU time, as well as its stack trace.
The constructors and methods chosen for profiling are determined by a Java regular expression. Before you build this regular expression, you must first decide if you want the expression to define class names or method names. If you match on class names, all visible constructors and methods will be instrumented for each class that matches. If you match on method or constructor names, your regular expression must match exactly on that method or constructor. The match expression includes the full package name of the class. If you don’t want to type out the full package of a class, you can start the regex with “.*” as a shortcut, although you may end up matching more than you intended. You can also specify an exclusion regular expression; if it is specified, it takes precedence over inclusions.
For example, if you want to instrument all methods starting with “get” in the class com.example.Printer, your regular expression could be com.example.Printer.get.* and you would choose “Instrument on Method/Constructor Name”. If you want to instrument every method in the same class, you would choose “Instrument on Class Name” and give the regular expression com.example.Printer. Note that the dot “.” character actually means “match any character”, so the above expressions should actually escape the dot to match specifically on that character (e.g. com\.example\.Printer), but in practice the lazy regex will usually match just what you want. Just be aware that it might match more than you expect.
The regex window does live validation of your expression as you are typing it, to warn you of incorrectly formulated regular expressions. In these cases, the text is colored red and the tool tip tells you what the problem is. Note that the validation is aggressive and will flag an expression that might eventually be correct when you are done typing it (for example, when you are entering an escape character, or have not yet completed typing a correct sequence of characters). Figure 3 shows an example of what you would see if you were about to escape an open‐parenthesis, where your eventual regex will be “.*\(“. Validation occurs on each keystroke, so entering a “(“ after the “\” will return the display to the “valid” state.
You can specify multiple regular expressions in each window by separating them with a comma. Note that everything between commas is considered part of the regular expression! If you follow a comma with a space character and then some non‐whitespace characters, you are specifying an additional regex which starts with a space character.
Figure 3 -- Regular Expression Feedback
The update interval defaults to 5 seconds. This means that every 5 seconds, MonkeyWrench will process the data structures of method‐entry and method‐exit calls, calculate wall clock times and CPU times, and update the MonkeyWrench CPU and Memory tabs. While the collection of profiling data if fairly expensive, the updating of the display is also quite expensive. Note that a lot of activity must occur on the AWT event thread, to avoid concurrency issues with the GUI’s data structures, so there will be a noticeable impact on the performance of the MonkeyWrench and the target application when the screens are updated. If in addition you are profiling a Swing application, the update pause becomes even more noticeable. For this reason, unless you are profiling a rather narrow slice of classes in your application, it is recommended that you start with a display update interval of 30 seconds.
The “Clear CPU and Memory Data on Start/Stop” checkbox is designed to help you narrow down profiling data to specific windows of time, and to separate profiling results for one set of classes (per the regular expressions you specify) from results for a different set of classes. One strength of MonkeyWrench is the ability to clear data on demand (so, for example, you can clear profiling data just before initiating an expensive operation in your target application). This checkbox is another way of isolating profiling results.
Since bytecode injection itself is an expensive operation, a progress bar runs both when classes are being instrumented and when they are being restored to their original state. During profiling, you can click on the “View Instrumented Classes” (or “View Instrumented Methods”, depending on which instrumentation choice you have made) to verify that you instrumented the classes or methods that you specified in your regular expressions.
As was mentioned earlier, once profiling is turned on, the “close” button on MonkeyWrench is disabled. This step is taken to prevent the profiler from being disposed while the target application is still instrumented. To re‐enable the MonkeyWrench “close” button, stop CPU and Memory profiling.
For more discussion of the actual profiling data collected and displayed during this type of profiling, see the section “CPU & Memory Profiling”.
Thread profiling configuration is relatively simple; all you need to decide is how often the display updates (this interval is independent of the CPU & Memory Profiling interval) and whether or not you want thread profiling data to be cleared on start and stop. MonkeyWrench maintains records of all threads it has seen during profiling; clearing profiling data (either via the “…Now” button or at profiling start and stop) clears memory of threads which are no longer running. Note, however, that there is some data which isn’t cleared; for example, the cumulative CPU time of a thread is retrieved from the Thread JMX bean and will include CPU time accumulated before the last time the MonkeyWrench display was cleared. A good recommended update interval here is one second.
For more discussion of the data collected and displayed for thread profiling, see the section “Thread Profiling”.
Garbage Collection Profiling
The configuration options for garbage‐collection profiling are similar to those for thread profiling. The recommended (and default) display‐update interval is five seconds, although it can be set as low as one second. You can also specify whether or not to clear data on profiling start and stop.
A discussion of the data displayed in garbage‐collection profiling, and the motivation for the MonkeyWrench displays, can be found in “Garbage Collection Profiling”.
CPU & Memory Profiling
Once you have started CPU and Memory profiling, information accumulates in the “CPU” and “Memory” tabs at each display update interval. The “CPU” tab is designed to track the time spent in methods and constructors, while the “Memory” tab is designed to track the allocation of objects specified by your regular expressions. Note that if you instrument only on method names, you will not get any output in the “Memory” tab, as no constructors will be instrumented.
Figure 4 shows the CPU tab while profiling an NWIS-based water quality web service. A few selections have been made to highlight most of the features of this tab. As method and constructor‐call signatures accumulate in the display, the table in the top half of the screen displays the CPU time and wall‐clock time for each method, as well as the percentages of total CPU and wall‐clock time, and the number of invocations. Note that the totals are based on the total time accumulated for all methods being instrumented (not for all methods in the target application). There are only a few methods in the table in this example, but the number of entries can of course run into the hundreds or thousands. For this reason, a search window is provided, with “find next” and “find previous” buttons.
The StreamMonitor’s getSiteName() method is selected in the constructor/method table. When you select an entry in this table, the site‐summary table in the lower‐left corner is populated with a list of unique stack‐trace signatures associated with this method call (the term “site” is taken from the profiling utility hprof). In this case, there is only one site identified; clicking on that entry will populate the lower‐right text area with the stack trace itself.
Figure 4 -- The CPU Screen
Both tables in this screen can be sorted on all columns, ascending or descending. You will typically sort the upper table by “% Total CPU Time”, descending, and the site‐summary table by the “Occurrences” column, descending. The two percentage columns in the upper table render a proportional green bar to display the percentage, as well as displaying the amount.
Note there are two “View Threads” buttons. When you select “View Threads” on the upper table, containing the method signature data, you will open a small dialog listing the threads used to invoke that method, with a count of the number of times each thread called that method or constructor. When you select the “View Threads” button above the stack‐trace text area, you display only the threads used to call the method or constructor through the selected site or stack trace. In this example, the thread “pool‐1‐thread‐1” (a naming convention used by the Java SE webservice implementation) is the thread that has called this method for all of the occurrences (10, in this case) since we started profiling or cleared the data.
In a more‐complex or more heavily‐used application, there may in fact be many sites, or stack‐trace signatures, for a particular constructor or method call, and there may be several threads involved. These may be just different threads from a pool, but in some cases the thread display can reveal potential issues. For example, it may show that on a number of occasions, a highly compute‐intensive method is being called on the AWT event thread. Having the stack trace associated with this thread would then simplify the job of determining how such a call got scheduled on the thread responsible for GUI updates.
Finally, note the two “Clear...” buttons on this screen. Every screen (except the Configuration screen) in the MonkeyWrench contains these two buttons; one to clear the profiling data in the current screen, and one to clear the profiling data in all screens. This feature is provided as a convenience to allow you to clear the profiling data just before an important event occurs in your target application.
Figure 5 shows the “Memory” screen of the MonkeyWrench profiler, this time for the popular “jEdit” Java file editor, after a couple of files have been opened, parsed and displayed. As you might expect, a lot of objects get constructed by an editor during this type of activity, so the memory usage table (similar to the method/constructor table in the CPU screen, but in this case, showing objects instantiated) has quite a few entries. This application is an example of a case where the search window (identical in operation to the one in the CPU screen) can be quite useful.
Figure 5 -- The Memory Screen
In this case, we have selected jEdit’s ServiceManager’s inner class Descriptor. We see that there are 34 instances of this class for a total of 1088 bytes (note that the object size, as returned by the instrumentation hooks of the JVM, is a shallow size). We selected the “View Threads” button above the upper table, which displays the threads for all 34 occurrences of this object instantiation (unlike the lower “View Threads” button, which would just show the 4 occurrences for the selected site summary) and can see that 32 times, the Descriptor was instantiated on the AWT event thread, but 2 times it was instantiated on one of jEdit’s I/O threads.
Working our way down the list in the site summary table, we could find the one (or at most two) sites where the jEdit I/O thread was used to instantiate one of these objects. In fact, if we scroll down through the list of sites (practical in this case only because there are not that many sites) we would see that it is stack trace #15 which was used by this particular thread, as seen in Figure 6. Matching a thread with a stack trace can help you find errors in applications, especially those that spawn a chain of thread‐start operations (in some cases, handing over to the AWT event thread) in the process of handling a request.
Figure 6 -- Memory Screen Detail View
As with the CPU screen, the Memory screen has fully sortable tables and buttons to clear the CPU and Memory profiling data, or to clear all profiling data. While the sizes reported by the JVM for each object are probably not that useful, it can be revealing to see just how many instances of a class are being created by an application, particularly if the relevant constructor involves some resource‐intensive activity. In this case, you should also be seeing large CPU‐time numbers for the constructor on the MonkeyWrench’s CPU screen.
The MonkeyWrench Thread profiling screen displays, for each active thread, the following data:
- Thread ID
- Thread Name
- Thread state
- % of total CPU time (of all active threads)
- Total CPU time consumed by the thread
- CPU time consumed by the thread in the display interval
- Total times thread has blocked
- Number of times thread has blocked during the display interval
- Total time thread has spent blocked
- Time thread has spent blocked during the display interval
As with the other screens in the MonkeyWrench, all columns in the table are sortable, descending or ascending. In practice, you will usually sort on either the “% of Total CPU Time” or the “CPU Time In Window” columns, descending, showing you the most active threads either in general or for the current sampling interval (which you set in the Configuration screen when you started Thread profiling).
The bottom half of this screen displays a table of blocked and deadlocked threads. Because the MonkeyWrench has a significant impact on the application being profiled, many times a thread will block on a lock held by a MonkeyWrench thread (for example, the ThreadStatisticsUpdater), and vice‐versa. These blocks would not occur if the MonkeyWrench were not profiling your application, of course. For this reason, a checkbox (labeled “Ignore Profiler‐Caused Blocks”) is available; check this box and the MonkeyWrench will not display any blocks where a MonkeyWrench‐specific thread is either blocking or being blocked. In the example, we have left the box unchecked because there are no jEdit‐caused blocks, and we want to show examples of the data displayed in this table.
Figure 7 -- Threads Screen
The Blocked and Deadlocked Threads table displays the following data for each block:
- ID of blocked thread
- Name of blocked thread
- Boolean, true if the thread is deadlocked
- The lock name
- ID of the thread holding the lock
- Name of the thread holding the lock
In cases where a particular block occurs frequently, it is helpful to know the path taken by one of the threads. Selecting a entry in this table produces a window with the stack trace for that particular occurrence. Figure 8 is an example of one of the block events found when the threads were inspected.
Figure 8 -- Blocked Thread Stack Trace View
Note from Figure 7 that all of the MonkeyWrench‐caused blocks have the same lock – an instance of a com.gorillalogic.profiler.GorillaProfiler. This is the MonkeyWrench main profiler class. Since this is the class that every instrumented method in the target application calls to notify of method entries and exits, it’s a frequent object of contention. Of course this is only true during profiling, which is why the Threads screen provides the checkbox to suppress display of these contention events.
As with the other MonkeyWrench screens, the Threads screen provides buttons to clear Thread profiling data, as well as all profiling data. Additionally, it provides a button to clear the history of blocked and deadlocked threads.
Garbage Collection Profiling
While most Java profilers provide a view on garbage‐collection activity, the MonkeyWrench provides a slightly different perspective than most. This view, found on the Garbage Collection screen and shown in Figure 9, provides two time series: one of the number of garbage collections for each display interval, and one of the CPU time spent in garbage collection during the interval.
Most profilers provide a plot of the amount of memory used in a generation as a function of time. While this view can be useful when garbage collection does not occur frequently, it breaks down in situations where (for example), the Eden generation is being collected multiple times per second (which is not an unusual situation). In these cases, the sampling interval of the profiler is too large to capture the fact that many collections are occurring during each sampling interval, and a plot of the generation memory as a function of time can appear to be a random number. Not being aware of this situation can cost a lot of troubleshooting time.
When MonkeyWrench first begins monitoring garbage collection, the cursor for the current time (represented with a t = Now cursor in red) progresses from the left to the right edge of the screen. Time is marked in time before the present time (that is, the cursor), displaying the sampling interval chosen on the Configuration screen. When the “t = Now” cursor reaches the right edge of the graph, the x axis remains fixed and the data points scroll to the left. A combo box at the top of the screen allows you to switch between garbage collectors.
Undergoing multiple garbage collections per second is not necessarily undesirable; more important is the amount of time spent in garbage collection during an interval. In the example shown in Figure 9, at peak, about 40 minor collections are occurring per second, burning about 50 msec of CPU time. If an application is experiencing a very large number of collections (minor or otherwise) in an interval, and using a lot of CPU time to collect, it might signal that the generation is sized too small, or it may indicate that there is an issue in the target application requiring attention.
Figure 9 -- Garbage Collection Screen
For example, when profiling a Swing application, if you notice that a lot of minor collections occur during certain mouse movements, or when selecting a menu item, it may point to an over‐engineered AWT event listener. While the collection display won’t actually point to the section of code at fault, knowing something about what is happening when the collection activity spikes will generally help narrow down the list of possible causes.
As with the other MonkeyWrench screens, two “clear” buttons are provided: one to clear garbage‐collection profiling data, and another to clear all profiling data. This feature allows you to clear the display before triggering a target application event of interest, so that the display will show only activity since the event of interest occurred.
Concluding Remarks and Notes
MonkeyWrench uses the Javassist bytecode injection library to insert calls to the profiler. While this package is extremely simple to use, a number of issues associated with it limit the functionality of the MonkeyWrench. When Javassist is used to instrument core or extension Java classes, the resulting application is disabled, as the classloader becomes unable to locate the MonkeyWrench profiler classes. Future work in MonkeyWrench will involve swapping in other libraries (such as BCEL, or ASM) in search of improvements in this area. While it’s not currently clear exactly what causes this problem (and Javassist may not actually be the cause of the issue), the current approach in MonkeyWrench is to avoid instrumenting any core (java.*), extension (javax.*) or Sun (com.sun.*) classes. It is a development goal to eventually instrument these classes, however. To avoid circular dependencies, neither the MonkeyWrench nor the Javassist classes are instrumented, either (a situation which is not likely to change, as no great value would be obtained from doing so).
Because of the way that MonkeyWrench instruments classes, it cannot provide what is typically called “self time” (time spent only in the method and not in any methods called by the method).
Since the MonkeyWrench is a substantial Swing application in its own right and runs in the same JVM as your target application, you may need to increase the heap space for your application when profiling it. On rare occasions, the additional overhead of MonkeyWrench may cause stack size overflows, a situation which can be remedied with the JVM option –Xss. Note that StackOverflowErrors usually point to an error in application logic (for example, an infinite recursion), in which case increasing the stack size usually doesn’t help. If the error only occurs when you are profiling, though, then a simple increase above the default should get past the issue. You probably won’t see this problem, but if you do, it will probably happen when you are profiling a parser or interpreter.
A final recommendation on using the MonkeyWrench is to narrow down, as quickly as is practical, the set of classes that you wish to profile. Profiling all of an application’s classes (with the regular expression “.*”) can in some cases slow the target application down too much to profile it. If you must instrument all classes, set the display update interval to the maximum (30 seconds) and use the CPU screen to narrow the search down to a few packages. Alternatively, if you find that a package of little or no interest is incurring a great deal of profiling overhead (for example, the parse method of an XML parser), another approach is to list the package in the “Excluded Classes” regular expression.
This was a long, and hopefully comprehensive, tour. If you use MonkeyWrench, perhaps you'll encounter some of the particular profiling scenarios which prompted its development. In any case, I hope you find it useful and that it finds a spot in your profiling toolbox!