Common Benchmarking Pitfalls
Every once in a while, each and every one of us tries to measure the performance of certain parts of the code. It can be either a fully functional application measured against a specific number of user transactions per minute, or a microbenchmark you have written to prove that your fellow developer is wrong.
Especially in the latter case, it is beneficial to be aware of the pitfalls that can skew your benchmark results. Looking into the benchmarks I created around seven years ago, I have to admit I was proving my point in several cases where I was actually incorrect. My apologies, dear friends. But I have learned something along the way and think it is worth sharing, so that you do not end up in the same stupid situation.
Pitfall 1: noisy neighbours. When running a benchmark on your own PC, you would not launch a DVD rip next to it. But what about the torrents you forgot in the background? Or Windows pulling in the next update? Problems like these have skewed my test results several times.
So one should conduct benchmarking only on a dedicated server. But be aware that it can become even trickier with all the virtualized platforms out there. When you launch your next virtual instance for a test drive, there is a chance that you are sharing the same physical hardware with somebody else. And if the resources are either tough to virtualize (such as I/O) or you run into elasticity issues, such as short intense CPU bursts followed by hypervisor-enforced throttling, you have a problem, at least when trying to achieve a predictable outcome in your benchmark.
Pitfall 2: garbage collection pauses. Measuring performance in a garbage-collected environment has to take the garbage collection pauses into account as well. So whenever you are running a benchmark, it is recommended that you turn on garbage collector logging via -XX:+PrintGCDetails and monitor the output:
0.415: [GC 0.415: [ParNew: 19136K->2112K(19136K), 0.0255607 secs] 100164K->100154K(118108K), 0.0256139 secs] [Times: user=0.04 sys=0.01, real=0.03 secs]
0.451: [GC 0.451: [ParNew: 19136K->19136K(19136K), 0.0000286 secs]0.451: [CMS0.461: [CMS-concurrent-mark: 0.033/0.099 secs] [Times: user=0.12 sys=0.04, real=0.10 secs]
0.580: [Full GC 0.580: [CMS: 107774K->107774K(107776K), 0.0553739 secs] 126910K->126910K(126912K), [CMS Perm : 4620K->4620K(21248K)], 0.0554400 secs] [Times: user=0.05 sys=0.00, real=0.05 secs]
0.635: [Full GC 0.635: [CMS: 107774K->107775K(107776K), 0.0919800 secs] 126910K->126872K(126912K), [CMS Perm : 4620K->4609K(21248K)], 0.0920382 secs] [Times: user=0.09 sys=0.00, real=0.09 secs]
From logs like these you can estimate the impact on your benchmark. As we can see from the sample above, the program has been running for just 635ms, and out of this time we have spent 270ms (42.5%) waiting for the GC to do its job. Now, I am not saying that your tests should be configured to avoid GC pauses. I am just warning that in cases where you are testing, for example, the behaviour of large data structures, you should watch out that your tests do not become GC-bound at some point.
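If you want to quantify the GC share without parsing logs by hand, the standard java.lang.management API exposes the accumulated collection time. Here is a minimal sketch; runBenchmark() is just a hypothetical placeholder for whatever workload you are measuring:
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcOverhead {

    // Sum of accumulated GC pause time, in milliseconds, across all collectors.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long time = gc.getCollectionTime();
            if (time > 0) { // getCollectionTime() returns -1 when unsupported
                total += time;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long gcBefore = totalGcMillis();
        long start = System.nanoTime();

        runBenchmark(); // hypothetical placeholder for the workload under test

        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        long gcMillis = totalGcMillis() - gcBefore;
        double gcShare = elapsedMillis > 0 ? 100.0 * gcMillis / elapsedMillis : 0.0;
        System.out.printf("ran %d ms, of which %d ms (%.1f%%) in GC%n",
                elapsedMillis, gcMillis, gcShare);
    }

    static void runBenchmark() {
        // placeholder: allocate some garbage so the numbers are non-zero
        for (int i = 0; i < 1_000_000; i++) {
            byte[] garbage = new byte[1024];
        }
    }
}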
Pitfall 3: Just In Time compilation. Modern JVMs are amazing in the sense that the JVM optimizes the code during runtime. So in most cases you should not jump to performance-related conclusions before JIT compilation has finished its duties. To be aware of what the JIT is doing during your benchmark, add -XX:+PrintCompilation to your JVM arguments and monitor the logs:
240  org.codehaus.groovy.reflection.CachedMethod::compareToMethod (125 bytes)
241  java.util.zip.ZipFile::access$300 (5 bytes)
242  java.util.ArrayList::indexOf (67 bytes)
243  java.util.Arrays::fill (28 bytes)
72   made zombie  sun.misc.URLClassPath::getLoader (136 bytes)
In most cases, the trick here is to wait until the JIT has finished its optimization tasks and switch on the monitoring from that point. For example, as seen in our last blog post, the results can vary by up to 100x performance-wise.
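Here is a minimal sketch of such a warmup phase; workUnderTest() is a hypothetical stand-in for your actual code, and the iteration counts are arbitrary:
public class WarmupExample {

    static final int WARMUP_ITERATIONS = 10_000; // arbitrary; watch the compilation logs instead of trusting a magic number
    static final int MEASURED_ITERATIONS = 100;

    static long sink; // consuming the result keeps the JIT from eliminating the work entirely

    public static void main(String[] args) {
        // Warmup phase: give the JIT a chance to profile and compile the hot path.
        for (int i = 0; i < WARMUP_ITERATIONS; i++) {
            workUnderTest();
        }

        // Measure only after warmup; -XX:+PrintCompilation output should have quieted down by now.
        long start = System.nanoTime();
        for (int i = 0; i < MEASURED_ITERATIONS; i++) {
            workUnderTest();
        }
        long avgMicros = (System.nanoTime() - start) / MEASURED_ITERATIONS / 1_000;
        System.out.println("average: " + avgMicros + " microseconds/op");
    }

    static void workUnderTest() {
        long acc = 0;
        for (int i = 0; i < 10_000; i++) {
            acc += (long) i * i;
        }
        sink = acc;
    }
}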
Pitfall 4:
Classloaders. Be aware of the classloading effects. JVM is a lazy
bastard and will not load classes before he really needs them. For the
benchmarks it means that you should not turn on your measurements before
the classloading has finished. For microbenchmarks it is oftentimes not
too difficult – having a warmup phase is all you are going to need. But
when benchmarking things in real world you might wish to switch on
-verbose:class flag to check whether all classes were loaded before your
measurements were turned on.
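For a quick sanity check without wading through -verbose:class output, you can also compare the loaded class count before and after the measured run via the standard ClassLoadingMXBean. A sketch, with warmup() and runMeasuredIteration() as hypothetical stand-ins for your own code:
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;

public class ClassLoadingCheck {

    public static void main(String[] args) {
        ClassLoadingMXBean classes = ManagementFactory.getClassLoadingMXBean();

        warmup();
        int loadedAfterWarmup = classes.getLoadedClassCount();

        runMeasuredIteration();
        int loadedAfterRun = classes.getLoadedClassCount();

        if (loadedAfterRun > loadedAfterWarmup) {
            System.out.println("warning: " + (loadedAfterRun - loadedAfterWarmup)
                    + " classes loaded during measurement, results may be skewed");
        }
    }

    // Both methods below are placeholders for the code paths you actually test.
    static void warmup() {
        runMeasuredIteration();
    }

    static void runMeasuredIteration() {
        new java.util.ArrayList<String>().add("exercise the code path");
    }
}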
Pitfall 5: Machine Architecture. If you are benchmarking something you are planning to use in production code, you should run the test on the same architecture your production environment is using. Otherwise you could get completely skewed results, whether due to the 32/64-bit difference, GC algorithm differences, heap region configurations, the number of cores, etc.
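A cheap habit that helps here is recording the environment alongside every benchmark result. A sketch using standard system properties; note that sun.arch.data.model is a HotSpot-specific property:
public class EnvironmentInfo {

    public static void main(String[] args) {
        // Print these alongside every benchmark result, so numbers taken on
        // different machines are never compared blindly.
        System.out.println("os.arch      = " + System.getProperty("os.arch"));
        System.out.println("os.name      = " + System.getProperty("os.name"));
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("java.vm.name = " + System.getProperty("java.vm.name"));
        System.out.println("data model   = " + System.getProperty("sun.arch.data.model") + "-bit"); // HotSpot-specific
        System.out.println("cores        = " + Runtime.getRuntime().availableProcessors());
        System.out.println("max heap MB  = " + Runtime.getRuntime().maxMemory() / (1024 * 1024));
    }
}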
Pitfall 6: Measuring. First, the unit of measurement. If your results are measured in nanoseconds, chances are you are in the wrong territory. Try to create more coarsely grained tests so that you can get into milliseconds at least, or better yet, into tens of seconds. Measuring really small units of work can again skew the results, since the measuring itself might change the outcome too much. Second, measure several times. Actually, tens of times if you can. And I do not mean iterating 30x over the benchmark in your code. I mean launching the whole benchmark 30 times and aggregating the results.
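One way to do that is a small runner that forks a fresh JVM per run, so every run gets its own JIT and GC history, and then aggregates the reported numbers. A sketch; MyBenchmark and the "build" classpath are hypothetical, and it assumes the benchmark prints a single elapsed-milliseconds number:
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class BenchmarkRunner {

    public static void main(String[] args) throws Exception {
        List<Long> results = new ArrayList<>();

        for (int run = 0; run < 30; run++) {
            // A fresh JVM per run: no JIT or GC state carries over between runs.
            Process p = new ProcessBuilder("java", "-cp", "build", "MyBenchmark") // hypothetical class and classpath
                    .redirectErrorStream(true)
                    .start();
            try (BufferedReader out = new BufferedReader(
                    new InputStreamReader(p.getInputStream()))) {
                // Assumes the benchmark prints exactly one number: elapsed milliseconds.
                results.add(Long.parseLong(out.readLine().trim()));
            }
            p.waitFor();
        }

        Collections.sort(results);
        System.out.println("min    = " + results.get(0) + " ms");
        System.out.println("median = " + results.get(results.size() / 2) + " ms");
        System.out.println("max    = " + results.get(results.size() - 1) + " ms");
    }
}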
Now, if everything here scared you away from writing microbenchmarks, good. Because most often your microbenchmarks do not help you out with the actual performance issues in your application. Just write simple, clear code that avoids clever optimizations. This is the code that the JITs of the present and future are most likely to optimize themselves. And that is a job which truly should be theirs, not yours.
But if, besides being scared, you also felt curious, then we have a lot more interesting content to share. Just follow us on Twitter or via RSS to be notified in time.
Published at DZone with permission of Nikita Salnikov-Tarnovski, DZone MVB.