Overview
The Yahoo! Cloud Serving Benchmark (YCSB) is a reasonably widely used tool for benchmarking key-value stores with a significant number of keys, e.g. 100 million, and a modest number of clients, i.e. those served from one machine.
In this article I look at how a test of 100 million × 1 KB key/values performed using Chronicle Map on a single machine with 128 GB of memory, dual Intel Xeon E5-2650 v2 CPUs @ 2.60 GHz and six Samsung 840 EVO SSDs.
Each 1 KB value consists of ten fields, each a 100-byte String. A more efficient design would use primitive numeric fields where possible. While the SSDs helped, the peak transfer rate was 700 MB/s, which two SATA SSDs could have supported.
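As a rough check on scale, the raw value payload can be worked out from the figures above. This sketch counts only the ten 100-byte fields per value and ignores key bytes and per-entry overhead (an assumption). It comes to 10^11 bytes, about 100 GB or 93 GiB, which is a large fraction of the machine's 128 GB of memory.

```java
// Back-of-the-envelope sizing for the data set described above.
// Assumption: only the value payload is counted; key bytes, per-entry
// headers and string encoding overhead are ignored.
class DatasetSize {
    // entries x fields x bytes per field, computed in long arithmetic.
    static long payloadBytes(long entries, int fieldsPerValue, int bytesPerField) {
        return entries * fieldsPerValue * bytesPerField;
    }

    // Converts bytes to binary gigabytes (2^30 bytes).
    static double toGiB(long bytes) {
        return bytes / (double) (1L << 30);
    }

    public static void main(String[] args) {
        long bytes = payloadBytes(100_000_000L, 10, 100);
        System.out.printf("payload: %d bytes (~%.1f GiB)%n", bytes, toGiB(bytes));
        // Peak observed transfer rate, split evenly across two drives.
        double peakMBps = 700.0;
        System.out.printf("per drive at peak (2 drives): %.0f MB/s%n", peakMBps / 2);
    }
}
```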
These benchmarks were performed using the latest version at the time of the report, Chronicle Map 2.0.6a-SNAPSHOT.
Micro-second world
Something which confounds me when reading benchmarks of key-value stores is that they start from the premise that performance is really important. IMHO, about 90% of the time performance is not the most important feature, provided you have sufficient performance.
These benchmark reports then go on to quote times in milliseconds rather than microseconds, and throughputs in the tens of thousands rather than the hundreds of thousands or millions. If performance really were that important, these products would have been built around performance instead of around the useful features they do support, such as multi-key transactions, quorum updates and other features which Chronicle Map does not support, for performance reasons.
So how does a key-value store built for performance fare with YCSB?
Throughput measures
The "50/50" test performs 50% random reads and 50% random writes; the "95/5" test performs 95% reads and 5% writes. Writes are expected to be more expensive, so a higher percentage of reads results in higher throughput.
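The read/update mix described above can be sketched as a loop that picks each operation at random with a fixed read probability. This is an illustration, not YCSB's or Chronicle Map's code: a plain `ConcurrentHashMap` stands in for the store, keys are chosen uniformly at random (YCSB can also use skewed distributions such as zipfian), and the class and method names are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadLocalRandom;

// Sketch of a read/update mix in the style of the "50/50" and "95/5" tests.
// Assumptions: keys are chosen uniformly at random, and a ConcurrentHashMap
// stands in for the key-value store (this is NOT Chronicle Map's API).
class WorkloadMix {
    // Runs `ops` operations and returns {reads, updates}.
    static long[] run(Map<String, String> store, int keyCount, long ops, double readProportion) {
        ThreadLocalRandom rnd = ThreadLocalRandom.current();
        long reads = 0, updates = 0;
        for (long i = 0; i < ops; i++) {
            String key = "user" + rnd.nextInt(keyCount);
            if (rnd.nextDouble() < readProportion) {
                store.get(key);              // random read
                reads++;
            } else {
                store.put(key, "value" + i); // random update (write)
                updates++;
            }
        }
        return new long[] {reads, updates};
    }

    public static void main(String[] args) {
        Map<String, String> store = new ConcurrentHashMap<>();
        for (int k = 0; k < 10_000; k++) store.put("user" + k, "initial");
        for (double readProportion : new double[] {0.50, 0.95}) {
            long[] c = run(store, 10_000, 1_000_000, readProportion);
            System.out.printf("read proportion %.2f -> reads=%d updates=%d%n",
                    readProportion, c[0], c[1]);
        }
    }
}
```

The 50/50 and 95/5 proportions match YCSB's core workloads A and B; in a real run each benchmark thread would execute such a loop against the store under test.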
| Threads | 50/50 read/update | 95/5 read/update |
|---|---|---|
| 1 | 122 K/s | 262 K/s |
| 2 | 235 K/s | 496 K/s |
| 4 | 339 K/s | 910 K/s |
| 8 | 565 K/s | 1.010 M/s |
| 15 | 973 K/s | 1.445 M/s |
| 30 | 816 K/s | 1.787 M/s |
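Reading the throughput table as a speed-up over one thread makes the scaling easier to see: the 50/50 mix reaches roughly 8x at 15 threads and then falls back at 30 threads, while the 95/5 mix keeps climbing to roughly 6.8x. The sketch below simply divides each row by the single-thread figure; the numbers are copied from the table (in K ops/s) and the class name is hypothetical.

```java
// Speed-up relative to one thread, computed from the throughput table above
// (values in thousands of operations per second).
class Scaling {
    static final int[] THREADS = {1, 2, 4, 8, 15, 30};
    static final double[] MIX_50_50 = {122, 235, 339, 565, 973, 816};
    static final double[] MIX_95_5 = {262, 496, 910, 1010, 1445, 1787};

    // Throughput at row `index` divided by single-thread throughput.
    static double speedup(double[] kOpsPerSec, int index) {
        return kOpsPerSec[index] / kOpsPerSec[0];
    }

    public static void main(String[] args) {
        System.out.println("threads  50/50 speed-up  95/5 speed-up");
        for (int i = 0; i < THREADS.length; i++) {
            System.out.printf("%7d  %14.2f  %13.2f%n",
                    THREADS[i], speedup(MIX_50_50, i), speedup(MIX_95_5, i));
        }
    }
}
```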
Latencies
The following latencies, measured with 8 threads, are in microseconds rather than milliseconds; only the worst-case figures are in milliseconds.
| 8 threads | 50/50 read | 95/5 read | 50/50 update | 95/5 update |
|---|---|---|---|---|
| Average | 5.7 µs | 4.9 µs | 13 µs | 12.9 µs |
| 95th percentile | 15 µs | 13 µs | 27 µs | 25 µs |
| 99th percentile | 25 µs | 30 µs | 44 µs | 47 µs |
| Worst | 52 ms | 52 ms | 52 ms | 52 ms |
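Percentile latencies like those in the table come from timing individual operations and ranking the samples. The sketch below uses the nearest-rank method over sorted `System.nanoTime()` samples; YCSB's own percentile code may bucket samples differently, and the `ConcurrentHashMap` stand-in, the sequential key order and the class name are all assumptions for illustration.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Nearest-rank percentiles over individually timed operations.
// Assumptions: a ConcurrentHashMap stands in for the store, and timings
// come from System.nanoTime(); YCSB's own percentile code may differ.
class LatencyPercentiles {
    // Smallest sample such that at least p percent of samples are <= it.
    static long percentile(long[] sorted, double p) {
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        rank = Math.min(Math.max(rank, 1), sorted.length);
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        Map<Integer, String> store = new ConcurrentHashMap<>();
        for (int k = 0; k < 100_000; k++) store.put(k, "v" + k);

        int samples = 1_000_000;
        long[] nanos = new long[samples];
        long sink = 0;
        for (int i = 0; i < samples; i++) {
            long start = System.nanoTime();
            String v = store.get(i % 100_000);
            nanos[i] = System.nanoTime() - start;
            sink += v.length(); // keep the result live so the read is not optimized away
        }
        Arrays.sort(nanos);
        double avgUs = Arrays.stream(nanos).average().orElse(0) / 1000.0;
        System.out.printf("average %.2f us, 95th %.2f us, 99th %.2f us, worst %.2f us (sink=%d)%n",
                avgUs,
                percentile(nanos, 95) / 1000.0,
                percentile(nanos, 99) / 1000.0,
                nanos[samples - 1] / 1000.0,
                sink);
    }
}
```

The worst case is simply the largest sample, which is why a single GC pause can dominate it while leaving the average and percentiles almost unchanged.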
Note: the benchmark is not designed to be GC-free and creates some garbage. The allocation rate is not particularly high, and according to Java Flight Recorder the benchmark itself uses only about a quarter of the CPU; however, the resulting GC pauses do affect the worst-case latencies.
Conclusion
Make sure the key-value store has the features you need, but if performance is critical, look for a solution designed for performance, as it can be 100x faster than full-featured products.
Room for improvement
[OVERALL], Throughput(ops/sec), 3,296,576
[READ], Operations, 190002671
[READ], AverageLatency(us), 4.81
[READ], MinLatency(us), 0
[READ], MaxLatency(us), 74864
[READ], 95thPercentileLatency(ms), 0.009
[READ], 99thPercentileLatency(ms), 0.014
[READ], Return=0, 115841209
[READ], Return=1, 74161462
[UPDATE], Operations, 9997309
[UPDATE], AverageLatency(us), 12.23
[UPDATE], MinLatency(us), 1
[UPDATE], MaxLatency(us), 75015
[UPDATE], 95thPercentileLatency(ms), 0.017
[UPDATE], 99thPercentileLatency(ms), 0.028
[UPDATE], Return=0, 9997309
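The raw YCSB output above mixes units: average, minimum and maximum latencies are in microseconds, while the percentile lines are in milliseconds, so 0.009 ms is 9 µs. The operation counts also confirm a 95/5 read/update mix. The sketch below parses lines in exactly the format shown and derives those figures. YCSB treats a return code of 0 as success; what a non-zero code means depends on the database binding, so the sketch only reports the share of reads that returned 0. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: parse YCSB summary lines such as "[READ], Operations, 190002671"
// and derive a few figures. Only the line format shown above is assumed.
class YcsbSummary {
    // Maps "SECTION/Metric" to its numeric value, e.g. "READ/Operations".
    static Map<String, Double> parse(List<String> lines) {
        Map<String, Double> metrics = new HashMap<>();
        for (String line : lines) {
            String[] parts = line.split(", ", 3);
            if (parts.length != 3 || !parts[0].startsWith("[") || !parts[0].endsWith("]")) {
                continue; // not a summary line
            }
            String section = parts[0].substring(1, parts[0].length() - 1);
            // Values may contain thousands separators, e.g. "3,296,576".
            double value = Double.parseDouble(parts[2].replace(",", "").trim());
            metrics.put(section + "/" + parts[1], value);
        }
        return metrics;
    }

    // Returns a latency metric in microseconds, converting "(ms)" metrics.
    static double micros(Map<String, Double> m, String key) {
        double v = m.get(key);
        return key.endsWith("(ms)") ? v * 1000.0 : v;
    }

    // Share of all operations that were reads.
    static double readFraction(Map<String, Double> m) {
        double reads = m.get("READ/Operations");
        double updates = m.get("UPDATE/Operations");
        return reads / (reads + updates);
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "[OVERALL], Throughput(ops/sec), 3,296,576",
                "[READ], Operations, 190002671",
                "[READ], 95thPercentileLatency(ms), 0.009",
                "[READ], 99thPercentileLatency(ms), 0.014",
                "[READ], Return=0, 115841209",
                "[UPDATE], Operations, 9997309",
                "[UPDATE], 99thPercentileLatency(ms), 0.028");
        Map<String, Double> m = parse(lines);
        System.out.printf("read fraction: %.4f%n", readFraction(m));
        System.out.printf("READ 95th: %.1f us, 99th: %.1f us%n",
                micros(m, "READ/95thPercentileLatency(ms)"),
                micros(m, "READ/99thPercentileLatency(ms)"));
        System.out.printf("UPDATE 99th: %.1f us%n", micros(m, "UPDATE/99thPercentileLatency(ms)"));
        System.out.printf("reads returning 0: %.1f%%%n",
                100.0 * m.get("READ/Return=0") / m.get("READ/Operations"));
    }
}
```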