The Hidden Valley Where Performance Defects Graze, Waiting to be Slaughtered
The Hidden Valley Where Performance Defects Graze, Waiting to be Slaughtered
Before you get too overwhelmed by your stack traces in Java, take a step back and check out this article that should help.
Join the DZone community and get the full member experience.Join For Free
Container Monitoring and Management eBook: Read about the new realities of containerization.
In my previous blog post, I talked about the IT industry’s desperate need for a steady flow of Java developers with performance skills — but instead, all we get is a drip, drip, drip, where very few (if any) Java developers vet their code for performance defects before checking it into source code control. I suggested a few ideas on how to crank up that flow a bit with better education on Java performance — I’m basically looking for better performance books, and better and less expensive instruction.
To further crank up that flow a bit, this blog post imparts a few performance secrets from Chapter 11 of my new Java performance book; there is even a video at the end of the post, which is from part of a full house presentation I gave at the St. Louis Java Users Group. Here's a brief TL;DR if you just need the highlights.
The “Entry,” “Trigger,” and “Current” are three helpful landmarks for understanding stack traces, which are notoriously complicated. The Entry landmark shows where the protocol started the thread. The Trigger landmark is where your code calls into another API, and the Current is the code that’s running on the CPU at the instant the stack trace was captured. The key to fixing performance defects quickly is to understanding how your code (the Trigger) invokes the code in the bottleneck (the Current landmark).
Reproduce It Yourself
With just a 5-10 minute install, you can reproduce the exact stack trace used here on your own machine with
So let’s get started. This might be a bit controversial, but I’m going to take you to a secret place. Frankly, I’m not sure why it's so secret and why other performance experts have been so hush-hush about this, because this magic place has been around forever — since the first end user, with a beet red face, shouted and cursed at their slow computer. So performance people, if you want to know who spilled the beans — it was me. Couldn’t help it.
Perhaps others haven’t shared the location of the Hidden Valley because they thought you, the Java developer, couldn’t handle it. They thought you would be a bit scared and that you perhaps might not even admit it. But don’t worry, I’m was a bit scared myself — of Java stack traces, of course — massive, complex, hairy, 100 line stack traces, like the ones you get with a java.lang.Exception, littered with mostly unfamiliar class and package names. Stack traces also show up in thread dumps and Java profilers — to show what triggered the code eating all the CPU.
To bust through a stack trace’s complexity, the topic of this post, I like to start with a mental “you are here” arrow, like the big red ones you see on maps at Disney World. The “you are here” for a stack trace is the line at the very top — the code that’s currently on the CPU, so I call this the “Current” landmark. For example, when you use the JDK’s jstack to capture a thread dump, the Current landmark shows the code currently on the CPU for every thread at the instant you ran jstack (it just takes an instant).
Below, line 9 says
java.io.FileInputStream.readBytes() — this is the “Current” landmark in the stack trace below (captured using the JDK’s jstack tool).
I captured this stack trace while I was doing some performance tuning and this particular “Current” showed more than any other in the thread dumps — so this is a likely candidate for a performance bottleneck, the lowest hanging fruit.
But what is to be done about this particular Current:
FileInputStream.readBytes(? Should I open a bug report with the openJdk? Or should we chalk this up to the “Java is slow” category? When you’re asking these kinds of questions, you have definitely lost your way to the Hidden Valley where performance defects graze. Actually, the performance Nazi inside me (think Seinfeld’s Soup Nazi) says that you don’t get to ask these questions until you’ve done your homework: Understand your code’s role in triggering the Current activity. You also need to understand how and whether you can tweak your code to alleviate the Current bottleneck.
As a reminder, the Current is the first landmark discussed — and the second landmark, the Trigger, points to the code in your Java package that “Triggered” (aka calls) the Current activity.
In most cases, there are at least a handful of adjacent stack trace lines in your package that trigger the Current. See all the adjacent lines (four of them) in the rectangle below in my
eostermueller package? The topmost of those four lines is the Trigger, the one line of your code that made calls into someone else’s package (java.util in this case), which ultimately leads to the Current (not shown below, see Figure-1 above).
Just by looking at the stack trace, we already know a lot: Line 20 in the stack trace shows that the Trigger method,
getUuid() on line 379 of my Item.java, made a call to
java.util.UUID.randomUUID() , which ultimately caused
FileInputStream.readBytes() (the “Current”) to run.
We also learned that immediately blaming the openJdk for the performance problem was a bit premature — instead, we must first locate the Trigger to discover our own code’s role in causing the slowdown. So at last, the Trigger landmark itself is the fertile Hidden Valley, where performance defects graze. Most of the time, you’ll find performance defects here, or in the immediate vicinity. Make sure the Trigger is the first place you look, and you’ll find performance defects everywhere, and the hunting will be easy. Slaughtering sounds like overkill.
This works with Java profilers (JProfiler, yourkit), thread dumps (from jstack), or Application Performance Management (APM) tools (Dynatrace, New Relic, AppDynamics, Glowroot,org, etc…) — because they all use stack traces to display performance metrics.
But even though we’ve finally discovered this fertile hidden valley of the Trigger, this story is not over. Without the third landmark, there are still obstacles to understanding scary, complicated stack traces.
One single 100-line stack trace can be scary on its own, as shown above. But it gets worse, I promise you. What about when they gang up on you, and there are dozens of stack traces in a single thread dump? Of all those stack traces, which ones are a cause for performance concern?
That sounds really scary, but do not fear. Stay on target, red 5. Keep your pants on. Seeing all those stack traces in a thread dump once scared me, too. I would say to myself, “I get paid to program in Java — aren’t I supposed to understand what’s going on in there?”
That all changed when I discovered a simple performance secret that sounds a little obvious:
Your performance problem is caused by the threads under load.
When there are zero users on the system (perhaps at night?) there is no load. That’s right, and it’s stupid obvious. But let’s get even stupid-er. How do you know which of those dozens of threads are under load? Here is the stupid-obvious answer:
The threads under load are the ones launched
by the protocol used by your end users.
This is so obvious that all Java programmers should know this. Specifically, if an end user calls into your JVM over HTTP/S, there is real smart HTTP/S-aware Java code (that comes with your container, like Tomcat, Jetty, JBoss, WebSphere, etc.) that reads the HTTP/S request and passes it off to your java.lang.Servlet, which then “Triggers” the code that leads to the “Current.”
Those are the two landmarks we’ve discussed thus far, the Trigger and the Current. The third landmark, easy enough, is called the Entry. The Entry is the bottom-most line in the stack trace from jstack. It might take a few method calls, but the Entry leads to your container’s protocol code that launched the thread — specifically, see the HTTP 1.1 code on line 69 in the rectangle?
With all three landmarks defined, let’s return to the last scary question we raised:
Of all those stack traces [in a thread dump],
which ones are a cause for performance concern?
We now know that we should look for those threads that are under load. Unfortunately, you won’t find the threads under load carrying little signs saying “I’m under load”. Instead you should go through all the stack traces and find just those that:
Have an Entry landmark with
code from the container I’m using
- …and have code, likely in that container’s package space, that handles the protocol(s) in use by our end users.
- Have a Trigger landmark that shows some code in my package space (like eostermueller in my case) that’s been executed.
Of course, there are some exceptions to these guidelines. For example, threads running as part of a Garbage Collection algorithm are of “performance concern”, and they won’t have Entry-Trigger-Current landmarks as described. So, all performance enquiries should look at both — thread health and heap/GC health, perhaps using jstat that comes with the JDK. These two areas are items 3 and 4 in my the four-part checklist — the P.A.t.h. Checklist, outlined in my new performance book.
Here is one additional concern: if your system implements the Reactor pattern, the Entry-Trigger-Current might look slightly different, but nonetheless, the threads launched by your container will be identifiable by some kind of a pattern, which you should hone in on to identify the threads under load.
Now that we have all three landmarks defined, don’t stack traces seem a little less scary? Sure, you might not understand every line, but without too much trouble you can understand the flow from the Entry to the Trigger to the Current, and you can also use these landmarks to help identify exactly which threads in a thread dump might be of performance concern. To help remember the landmark names, just think of the acronym for “et cetera”: ETC… — Entry, Trigger Current.
Thanks for reading all the way to the end of this long blog post! As mentioned above, with just a 5-10 minute install, you can reproduce the exact stack trace used here on your own machine with this project.
Also, stay tuned for more blog posts in this series.
How did this tuning story end? How did Entry-Trigger-Current help improve the performance of the
UUID.randomUUID()code shown above? This forthcoming blog post will tell the whole story.
You need a plan for finding performance defects in any JVM, not just those few that are carefully instrumented with just the right monitoring. This forthcoming blog post will show you how defects can be detected in any JVM using the Entry-Trigger-Current landmarks along with a fantastic but undocumented technique called “mTDp” — manual Thread Dump profiling.
- Sample code for performance training gets complicated — you need visibility tools for measuring performance, but you also need quick and tidy ways to deploy fixes for code. So, one forthcoming blog post will show how the littleMock performance samples addresses these issues.
Below are a few 1-minute-ish video clips from my presentation at the St. Louis Java Users group on March 8, 2018. These shed a little more light on ETC, Entry-Trigger-Current.
Published at DZone with permission of Erik Ostermueller . See the original article here.
Opinions expressed by DZone contributors are their own.