Lies, statistics and vendors
Overview
Reading performance results supplied by vendors is a skill in itself. It can be difficult to compare numbers from different vendors on a fair basis, and even more difficult to estimate how a product will behave in your system.
Lies and statistics
Why is it so hard to give a trustworthy performance number?
- Latencies and throughputs don't follow a normal distribution, which is the basis of mathematically rigorous statistics. This means you are modelling something for which there isn't a generally accepted mathematical model.
- There are many different assumptions you can make, ways to test your solution, and ways to represent the results.
- You need to use benchmarks to measure something, but those benchmarks are either a) not standard, b) not representative of your use case, or c) can be optimised for in ways which don't help you.
- Vendors understand their products and sensibly select the best hardware for them. This works best if you only have one product to consider. Multi-product systems may not have a hardware configuration that is optimal for every product, even if your organisation allowed you to buy that optimal hardware.
- It is easy to report the best results tested and leave out the results which were not so good.
BTW: I often find it interesting to see what use cases the vendor had in mind when they benchmarked their solution. This can be a good indication of a) what it is good for, b) the assumptions made in designing the solution, and c) how it is generally used already.
Should we ignore all benchmarks?
Percentiles for latency

| Percentile | One in N | Scale | Notes |
|---|---|---|---|
| 50% | "typical" | 1x | This is a good indication of what is possible. It is the most optimistic figure you could use. |
| 90% | one in ten | 2x – 3x | This is a better indication of performance if tested on a real, complex system. |
| 99% | one in 100 | 4x – 10x | For benchmarks of simplified systems, this is a better indication of what you can realistically expect to achieve. |
| 99.9% | one in 1,000 | 10x – 30x | For benchmarks of simplified systems, this is a conservative indication of what you can expect. |
| 99.99% | one in 10,000 | 20x – 100x | This number is nice to have but difficult to reproduce, even for the same benchmark, let alone for a different use case. See below. |
| 99.999% | one in 100,000 | varies | This number is almost impossible to reproduce between systems. See below. |
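To make the percentiles above concrete, here is a minimal sketch of how such figures are typically derived from recorded samples: sort the latencies and index into the sorted array at each percentile. The synthetic latency distribution (a 1µs base with a rare one-in-1,000 outlier) is purely hypothetical, for illustration only.

```java
import java.util.Arrays;
import java.util.Random;

// A minimal sketch of reporting latency percentiles from recorded samples.
// The synthetic distribution below is hypothetical, chosen only to show
// how the high percentiles diverge from the "typical" median.
public class PercentileReport {

    // Value at the given percentile (fraction in (0, 1]) of a sorted sample array.
    static long percentile(long[] sortedNanos, double fraction) {
        int index = (int) Math.ceil(fraction * sortedNanos.length) - 1;
        return sortedNanos[Math.max(0, index)];
    }

    public static void main(String[] args) {
        Random rand = new Random(1);
        long[] samples = new long[100_000];
        for (int i = 0; i < samples.length; i++) {
            // base latency plus a rare large outlier, in nanoseconds
            samples[i] = 1_000 + rand.nextInt(500)
                       + (rand.nextInt(1_000) == 0 ? 100_000 : 0);
        }
        Arrays.sort(samples);
        for (double p : new double[]{0.50, 0.90, 0.99, 0.999, 0.9999})
            System.out.printf("%8.4f%%: %,d ns%n", p * 100, percentile(samples, p));
    }
}
```

In a real system you would use a purpose-built recorder rather than sorting raw arrays, but the principle of reading off each percentile from the sample distribution is the same.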
A guide to the number of samples you need for reproducible numbers

| Percentile | One in N | Simple test samples | Complex test samples |
|---|---|---|---|
| 90% | one in ten | ~ 30 | ~ 100 |
| 99% | one in 100 | ~ 300 | ~ 10,000 |
| 99.9% | one in 1,000 | ~ 30,000 | ~ 1 million |
| 99.99% | one in 10,000 | ~ 1 million | ~ 100 million |
| 99.999% | one in 100,000 | ~ 30 million | ~ 10 billion |
| 99.9999% | one in 1,000,000 | ~ 1 billion | ~ 1 trillion |
| Maximum or 100% | never | Infinite | Infinite |
Based on this rule of thumb, I don't believe a real maximum can be measured empirically. Nevertheless, not reporting it at all isn't satisfactory either. Some benchmarks report the "worst in sample", which is better than nothing but very hard to reproduce.
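The reproducibility problem can be demonstrated directly: take two independent runs from the same (hypothetical) latency distribution with too few samples, and the medians agree closely while the high percentiles can differ wildly from run to run. This sketch uses 1,000 samples, well under the ~30,000 the table above suggests for a stable 99.9%.

```java
import java.util.Arrays;
import java.util.Random;

// Illustrates the sample-count rule of thumb: with only 1,000 samples,
// two runs from the same distribution agree at the median but can
// disagree badly at the 99.9th percentile. Distribution is hypothetical.
public class ReproducibilityDemo {

    static long percentile(long[] samples, double fraction) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        return sorted[Math.max(0, (int) Math.ceil(fraction * sorted.length) - 1)];
    }

    static long[] run(Random rand, int count) {
        long[] samples = new long[count];
        for (int i = 0; i < count; i++)
            samples[i] = 1_000 + rand.nextInt(500)
                       + (rand.nextInt(1_000) == 0 ? 100_000 : 0); // one-in-1,000 outlier
        return samples;
    }

    public static void main(String[] args) {
        Random rand = new Random();
        long[] a = run(rand, 1_000), b = run(rand, 1_000);
        System.out.println("median: " + percentile(a, 0.5) + " ns vs " + percentile(b, 0.5) + " ns");
        System.out.println("99.9%:  " + percentile(a, 0.999) + " ns vs " + percentile(b, 0.999) + " ns");
    }
}
```

Run it a few times: the median line barely moves, while the 99.9% line flips between ~1,500 ns and ~100,000 ns depending on whether an outlier happened to land in the sample. The "maximum" is even less stable, which is why it cannot be measured empirically.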
To mitigate the cost of warm-up in real systems, I suggest latency-critical classes should be preloaded, if not warmed up, on start-up of your application.
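A minimal sketch of what warming up on start-up can look like on the JVM, assuming a hypothetical hot path `processMessage`: call it enough times before serving real traffic that the JIT has compiled it (HotSpot's default compile threshold is on the order of 10,000 invocations).

```java
// A minimal sketch of warming up a latency-critical code path on start-up so
// the JIT compiles it before real traffic arrives. processMessage is a
// hypothetical stand-in for your hot path.
public class Warmup {

    // stand-in for a latency-critical operation
    static long processMessage(int id) {
        return id * 31L;
    }

    public static void main(String[] args) {
        long blackhole = 0;
        // exceed the default compile threshold so the hot path is JIT-compiled
        for (int i = 0; i < 20_000; i++)
            blackhole += processMessage(i);
        // use the result so the warm-up loop cannot be eliminated as dead code
        if (blackhole == 42)
            System.out.println(blackhole);
        System.out.println("warm-up complete");
        // ... application now serves real requests with a warm code path
    }
}
```

Note that class loading itself is a separate cost: merely referencing the classes on start-up triggers loading and static initialisation, so even a warm-up loop that exercises only part of the path helps.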
In summary
Published at DZone with permission of Peter Lawrey, DZone MVB. See the original article here.