Benchmarking Cassandra: The Right & Wrong Way to Do It
Everybody loves comparing databases. Not everybody agrees on how to do it, though. Jonathan Ellis at the DataStax Developer Blog, for example, takes issue with Thumbtack Technology's benchmarks comparing Cassandra, Couchbase, MongoDB, and Aerospike. The problem, Ellis says, is that the benchmarks give Cassandra a raw deal.
According to Ellis, the benchmarks were basically set up correctly, but ignored some major factors when it comes to benchmark hygiene:
"Our problems start with benchmark hygiene: the read runs were run one after the other rather than properly isolating them by dropping the page cache and warming up each workload separately. It also looks like no effort was made to isolate the effects of Cassandra compaction; compaction from the read/write workload could have continued into the read-heavy section."
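To make the hygiene point concrete, here is a minimal sketch of what "isolating" each run might look like in a benchmark harness. It is not taken from Ellis's or Thumbtack's setup: the function `run_isolated`, the `reset` hook, and the parameter names are all hypothetical. On a Linux host the hook could, for instance, flush the page cache (as root, `sync` and then write `3` to `/proc/sys/vm/drop_caches`) and wait until `nodetool compactionstats` reports no pending compactions; those steps are left to the caller here.

```python
import time
from typing import Callable, Dict


def run_isolated(
    workloads: Dict[str, Callable[[], None]],
    reset: Callable[[str], None],
    warmup_ops: int,
    measured_ops: int,
) -> Dict[str, float]:
    """Run each workload in isolation and return operations per second.

    All names here are illustrative assumptions, not part of any real
    benchmarking tool.
    """
    results: Dict[str, float] = {}
    for name, op in workloads.items():
        # Return the system to a known state before every workload, so
        # cache contents or background work from the previous run do not
        # leak into this one (e.g. drop the page cache, wait for compaction).
        reset(name)

        # Warm up this workload separately; these operations are not timed.
        for _ in range(warmup_ops):
            op()

        # Time only the measured operations.
        start = time.perf_counter()
        for _ in range(measured_ops):
            op()
        elapsed = time.perf_counter() - start

        results[name] = measured_ops / elapsed if elapsed > 0 else float("inf")
    return results
```

A caller would pass one callable per workload (say, a read-heavy mix and a read/write mix) plus a `reset` function suited to its own cluster. The point of the design is simply that the reset and warm-up happen once per workload, rather than once for the whole benchmark.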
And those aren't even the biggest problems with the benchmarks, Ellis says. By Thumbtack Technology's numbers, Aerospike comes out on top or on par with Couchbase, Cassandra trails behind, and MongoDB falls further behind still. Ellis goes through each part of the benchmark in detail to explain which aspect of Cassandra was misunderstood or ignored.
To really nail down the argument, though, Ellis runs his own benchmarks. Due to changes in Aerospike's API, he couldn't include Aerospike in his new benchmarks, but instead substituted HBase as another representative of the top NoSQL solutions. His results came out like this:
(Source: Jonathan Ellis at DataStax)
It's an interesting look at the various factors one must consider when making performance comparisons, or any comparisons, given the complexity of these technologies.
The cynical might observe that benchmarks coming at the request of Aerospike (as Ellis notes) show Aerospike's excellent performance, while benchmarks coming from DataStax show Cassandra's excellent performance. The even-more-cynical might observe that both show MongoDB far below all the others - but hey, MongoDB's always being mistreated.
Check out the full article from Jonathan Ellis for all the details, and if you're looking for more in the way of Cassandra's performance, you might find something interesting here:
- Tuning the JVM to Improve Performance in Cassandra
- Netflix Benchmarks on AWS Show Cassandra NoSQL Still Has the Goods
And more from Jonathan Ellis: