- Nowadays, performance is completely non-predictable. You have to measure it and employ proper statistics to get some meaningful results.
- Microbenchmarking is very, very hard to do correctly. No, you misunderstand me, I mean even harder than that!
- From the resources: Profiles and result evaluation methods may be very misleading unless used correctly.
There has been another blog about it but I’d like to record here more detailed remarks.
Today we can’t estimate performance, we must measure it because the systems (JVM, OS, processor, …) are very complex with many different heuristics on various levels and thus the performance is highly unpredictable. This doesn’t apply only to Java, but also to C, C++, even to assembly code.
Example: Results during a single JVM run may be consistent (warm-up, then faster) but can vary between JVM executions even by 20%. One of the causes may be Compilation Planning (what’s inlined, …) – it’s done in a background thread and thus is inherently non-deterministic.
Therefore don’t estimate but measure and not only that – also do statistical processing of the data (how often diff. values appear, what they are, … – mean, median, standard deviation etc.).
“Profiles don’t help much; in fact, they can mislead” – Mytkowicz, Diwan etc. – “Evaluating the Accuracy of Java Proﬁlers”, PLDI ’10 – in their experiment, each of 4 leading profiles identified a different hotspot. I’d really recommend you reading the related StackOverflow discussion “If profiler is not the answer, what other choices do we have?” (the answer is: profilers have their value, but use the correct ones and use them correctly). The conclusion of the original paper:
Our results are disturbing because they indicate that proﬁler incorrectness is pervasive—occurring in most of our seven benchmarks and in two production JVM—-and signiﬁcant—all four of the state-of-the-art proﬁlers produce incorrect proﬁles. Incorrect proﬁles can easily cause a performance analyst to spend time optimizing cold methods that will have minimal effect on performance. We show that a proof-of-concept proﬁler that does not use yield points for sampling does not suffer from the above problems.
“Benchmarking is really, really hard!” and “Most benchmarks are seriously broken“. Broken means that either the measurement’s error is higher than the value being measured or that the results obtained are unrelated to intended measurements. It seems that it is actually really hard to find a (micro)-benchmark, which isn’t broken. Joshua recommends Cliff Click’s JavaOne 2009 presentation The Art of (Java) Benchmarking (see also an interesting related interview with Cliff), which I belive to have seen and which points out the various traps here. Joshu also mentiones that some frameworks, such as Google Caliper may help you to avoid the pitfalls, though I’m quite sure they can’t protect you from all.
Joshua mentiones a couple of interesting papers, you should check the slides for them. One which sounds really interesting to me is by Georges, Buytaert and Eeckhout – Statistically Rigorous Java Performance Evaluation, OOPSLA07 (20 pages). They mention there that you need to run VM 30 times to get meaningful data. From the abstract:
This paper shows that prevalent methodologies can be misleading, and can even lead to incorrect conclusions. The reason is that the data analysis is not statistically rigorous. In this paper, we present a survey of existing Java performance evaluation methodologies and discuss the importance of statistically rigorous data analysis for dealing with non-determinism. We advocate approaches to quantify startup as well as steady-state performance, and, in addition, we provide the JavaStats software to automatically obtain performance numbers in a rigorous manner. Although this paper focuses on Java performance evaluation, many of the issues addressed in this paper also apply to other programming languages and systems that build on a managed runtime system.
I find this subject very interesting because for over a year I’m involved in performance optimization of one of our data feeds, which used to run for couple of days (latest results: 1/2h [with a bit of cheating]). My experience completely supports what Joshua says – don’t guess but measure, profilers may be misleading, performance is unpredictable. Though as a collegue mentioned, in the domain of enterprise Java, our performance problems are usually caused by the database and communication with it (which 100% applies to that feed too).
I’ve already blogged about some experiences, e.g. in The power of batching or speeding JDBC by 100 (inspired by JDBC performance tuning with fetch size), check also the performance tag for interesting links. I also appreciated and applied the knowledge from Accurately computing running variance (I often wish I have slept less and paid attention more during the uni math lectures ).
The higher complexity, the higher unpredictability =>
- As an application programmer, use high-level, declarative constructs where posible to push the responsability for performance one level down to library and JVM authors, who should know better.
- Measure repeatedly and process the results with proper statistics. Don’t forget to repeat them over time, the platform evolves with every release.
Once again, microbenchmarking is hard! If you have to play with it, use something like Caliper and be aware that your results are most likely wrong anyway.