
Lies, statistics and vendors

Overview

Reading performance results supplied by vendors is a skill in itself.   It can be difficult to compare numbers from different vendors on a fair basis, and even more difficult to estimate how a product will behave in your system.

Lies and statistics

One of the few quotes from university that I remember goes roughly like this:
Peak Performance - A manufacturer's guarantee not to exceed a given rating.
-- Computer Architecture: A Quantitative Approach (1st edition)

At first this appears rather cynical, but over the years I have come to the conclusion that it is unavoidable, and once you accept this you can trust the numbers you are given if you see them in a new light.

Why is it so hard to give a trustworthy performance number?

There are many challenges in giving good performance numbers.  Most vendors try hard to give trustworthy numbers, but it is not as easy as it looks.
  • Latencies and throughputs don't follow a normal distribution, which is the basis of mathematically rigorous statistics.  This means you are modelling something for which there isn't a generally accepted mathematical model (see the sketch after this list).
  • There are many different assumptions you can make, ways to test your solution and ways to represent the results.
  • You need to use benchmarks to measure something, but those benchmarks are either a) not standard, b) not representative of your use case, or c) can be optimised for in ways which don't help you.
  • Vendors understand their products and sensibly select the best hardware for their product.  This works best if you only have one product to consider. Multi-product systems may not have an optimal hardware solution for all the products, even if your organisation allowed you to buy the optimal hardware.
  • It is easy to report the best results tested and not include results which were not so good.
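
To make the first point concrete, here is a minimal sketch in Java with made-up timings (the 98%/2% split and the microsecond figures are purely illustrative): a small fraction of slow outliers drags the tail far beyond anything a mean and standard deviation would suggest.

    import java.util.Arrays;
    import java.util.Random;

    // Illustrative only: latencies with a rare "hiccup" mode, showing how a
    // mean/standard deviation summary hides the tail that percentiles expose.
    public class SkewedLatencyDemo {
        public static void main(String[] args) {
            Random rand = new Random(1);
            double[] latenciesMicros = new double[100_000];
            for (int i = 0; i < latenciesMicros.length; i++) {
                // ~98% of calls take around 10-15 us, ~2% hit a ~1 ms stall.
                latenciesMicros[i] = rand.nextDouble() < 0.98
                        ? 10 + rand.nextDouble() * 5
                        : 1_000 + rand.nextDouble() * 500;
            }
            double mean = Arrays.stream(latenciesMicros).average().orElse(0);
            double variance = Arrays.stream(latenciesMicros)
                    .map(x -> (x - mean) * (x - mean)).average().orElse(0);
            double stdDev = Math.sqrt(variance);

            double[] sorted = latenciesMicros.clone();
            Arrays.sort(sorted);
            double p99 = sorted[(int) (sorted.length * 0.99)];

            System.out.printf("mean=%.1f us, stddev=%.1f us, 99%%ile=%.1f us%n",
                    mean, stdDev, p99);
            // The 99th percentile is dominated by the rare stalls, which the
            // mean on its own gives no hint of.
        }
    }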
Any decent vendor will use their benchmarks to optimise their solution.  The downside of this is that the solution will have been optimised more for the benchmarks they report than for use cases the vendor hasn't tested, e.g. your use case.

BTW: I often find it interesting to see what use cases the vendor had in mind when they benchmark their solutions.  This can be a good indication of a) what it is good for, b) the assumptions made in designing the solution, and c) how it is generally used already.

Should we ignore all benchmarks?

This can lead people to give up on micro-benchmarks and benchmarks in general because they have been "lied" to many times before.
However, used correctly, benchmarks can be a good guide even if they cannot give you definitive or completely reliable answers.  As such, I suggest you be highly sceptical that small differences in performance give you any indication of what you would expect to see, and only take note of wide variations in performance. By wide variations I mean differences of 3 to 10 times.

Percentiles for latency

Customers generally remember the worst service they ever got and take the average service for granted.  When looking at the latency of your systems, it is generally the higher latencies which cause the most issues, if not customer complaints.
A common approach for modelling the distribution of latencies is to sort all the latencies and report a sample of the worst.
Percentile  One in N         Scale (vs typical)  Notes
50%         "typical"        1x                  This is a good indication of what is possible; it is the most optimistic figure you could use.
90%         one in 10        2x-3x               This is a better indication of performance if tested on a real, complex system.
99%         one in 100       4x-10x              For benchmarks of simplified systems, this is a better indication of what you can realistically expect to achieve.
99.9%       one in 1,000     10x-30x             For benchmarks of simplified systems, this is a conservative indication of what you can expect.
99.99%      one in 10,000    20x-100x            Nice to have, but difficult to reproduce even for the same benchmark, let alone for a different use case. See below.
99.999%     one in 100,000   varies              Almost impossible to reproduce between systems. See below.

Generally speaking, the latencies escalate geometrically as you get into the higher percentiles. The very high percentiles have limited value, as you have to take ever more samples to get a reproducible number even on the same system from one day to the next.  They can vary dramatically based on the use case or system.
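
As a rough sketch of the approach described above (sort the recorded latencies and report the worst cases), something like the following in Java works; the percentile levels reported and the nanosecond units are illustrative choices, not from any particular benchmark.

    import java.util.Arrays;
    import java.util.Random;

    // Sketch: record a latency per operation, sort them, and report the worst
    // cases at the percentiles of interest plus the worst seen in the sample.
    public class LatencyPercentiles {
        public static void report(long[] latenciesNanos) {
            long[] sorted = latenciesNanos.clone();
            Arrays.sort(sorted);
            double[] percentiles = {0.50, 0.90, 0.99, 0.999, 0.9999};
            for (double p : percentiles) {
                // First sample at or above this percentile of the sorted data.
                int index = Math.min(sorted.length - 1, (int) (sorted.length * p));
                System.out.printf("%8.2f%% latency: %,d ns%n", p * 100, sorted[index]);
            }
            System.out.printf("worst in sample: %,d ns%n", sorted[sorted.length - 1]);
        }

        public static void main(String[] args) {
            // Dummy data purely to exercise the reporting; a real benchmark
            // would record the latency of each operation it performs.
            long[] demo = new Random(1).longs(100_000, 1_000, 1_000_000).toArray();
            report(demo);
        }
    }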

A guide to the number of samples you need for reproducible numbers

Java has an additional feature: it gets faster as it warms up.  In the past I have advocated removing these warm-up figures, but given that micro-benchmarks give overly optimistic figures, I am now more inclined to include them, if for no other reason than that it is simpler.
My rule of thumb for reproducible percentile figures is that for 1 in N, you need N^1.5 samples for simple micro-benchmarks and N^2 samples for complex systems.

Percentile       One in N          Simple test samples  Complex test samples
90%              one in 10         ~ 30                 ~ 100
99%              one in 100        ~ 1,000              ~ 10,000
99.9%            one in 1,000      ~ 30,000             ~ 1 million
99.99%           one in 10,000     ~ 1 million          ~ 100 million
99.999%          one in 100,000    ~ 30 million         ~ 10 billion
99.9999%         one in 1,000,000  ~ 1 billion          ~ 1 trillion
Maximum or 100%  never             infinite             infinite
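
For what it's worth, the rule of thumb above is easy to check in a few lines of Java; the figures it prints are orders of magnitude, not hard requirements.

    // Rough sample counts needed for a reproducible "one in N" percentile,
    // using the rule of thumb of N^1.5 for simple micro-benchmarks and
    // N^2 for complex systems.
    public class SampleCountRule {
        public static void main(String[] args) {
            long[] oneInN = {10, 100, 1_000, 10_000, 100_000, 1_000_000};
            for (long n : oneInN) {
                double simple = Math.pow(n, 1.5);
                double complex = Math.pow(n, 2.0);
                System.out.printf("one in %,d -> simple: ~%,.0f samples, complex: ~%,.0f samples%n",
                        n, simple, complex);
            }
        }
    }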

Based on this rule of thumb, I don't believe a real maximum can be measured empirically. Nevertheless, not reporting it at all isn't satisfactory either.  Some benchmarks report the "worst in sample", which is better than nothing but very hard to reproduce.

To mitigate the cost of warm-up in real systems, I suggest that latency-critical classes be pre-loaded, if not fully warmed up, on start-up of your application.
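
A minimal sketch of what such pre-loading might look like; the class names here are hypothetical placeholders for your own latency-critical classes, and a fuller warm-up would also exercise the hot code paths before going live.

    // Sketch: load (and initialise) latency-critical classes on start-up so
    // class loading doesn't hit the first real request. The class names are
    // hypothetical placeholders.
    public class StartupWarmUp {
        public static void preload() {
            String[] criticalClasses = {
                    "com.example.pricing.PriceCalculator",  // hypothetical
                    "com.example.orders.OrderMatcher"       // hypothetical
            };
            for (String name : criticalClasses) {
                try {
                    Class.forName(name);
                } catch (ClassNotFoundException e) {
                    System.err.println("Could not preload " + name + ": " + e);
                }
            }
            // Going further, you could run the hot code path a few thousand
            // times with dummy data so the JIT compiles it before go-live.
        }

        public static void main(String[] args) {
            preload();
        }
    }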

In summary

If you are looking for a performance figure you can use, I suggest using the 99th percentile as a good indication of what you can expect in a real system.  If you want to be cautious, use the 99.9th percentile.
If this number is not given, I would assume you might get about 10x the average or typical latency and 1/10th of the throughput the vendor can get under ideal conditions.  Usually this is still more than enough.
If the vendor quotes performance figures close to what you need, or, worse, doesn't quote figures at all, beware! I am amazed how many vendors will say they are fast, quick, the fastest, efficient, or high performance, but don't quote any figures at all.


