Benchmarking Cassandra: The Right & Wrong Way to Do It
Everybody loves comparing databases. Not everybody agrees on how to do it, though. Jonathan Ellis at the DataStax Developer Blog, for example, takes issue with Thumbtack Technology's benchmarks comparing Cassandra, Couchbase, MongoDB, and Aerospike. The problem, Ellis says, is that the benchmarks give Cassandra a raw deal.
According to Ellis, the benchmarks were basically set up correctly, but ignored some major factors when it comes to benchmark hygiene:
Our problems start with benchmark hygiene: the read runs were run one after the other rather than properly isolating them by dropping the page cache and warming up each workload separately. It also looks like no effort was made to isolate the effects of Cassandra compaction; compaction from the read/write workload could have continued into the read-heavy section.
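The kind of isolation Ellis describes can be sketched as a small harness. This is a minimal illustration, not the procedure Ellis or Thumbtack actually used: `run_isolated`, `run_workload`, and `wait_for_idle` are hypothetical names, and the page-cache drop assumes a Linux host with root access. The `wait_for_idle` hook stands in for whatever check you use to confirm that background work such as compaction has finished.

```python
import os
import time
from typing import Callable, Dict, Iterable


def drop_linux_page_cache() -> None:
    """Flush dirty pages, then ask the Linux kernel to drop its caches (needs root)."""
    os.sync()
    with open("/proc/sys/vm/drop_caches", "w") as f:
        f.write("3\n")


def run_isolated(
    workloads: Iterable[str],
    run_workload: Callable[[str], None],           # hypothetical: executes one named workload
    drop_cache: Callable[[], None] = drop_linux_page_cache,
    wait_for_idle: Callable[[], None] = lambda: None,  # hypothetical: block until compaction is done
    warmup_runs: int = 1,
) -> Dict[str, float]:
    """Time each workload on its own: idle check, cold cache, warm-up, then one measured run."""
    timings: Dict[str, float] = {}
    for name in workloads:
        wait_for_idle()   # do not let the previous workload's background work leak in
        drop_cache()      # start every workload from the same cold page cache
        for _ in range(warmup_runs):
            run_workload(name)  # warm up with this workload only
        start = time.perf_counter()
        run_workload(name)      # the measured run
        timings[name] = time.perf_counter() - start
    return timings
```

The point of the structure is that no workload inherits a cache warmed by, or compaction left over from, the one before it.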
And those aren't even the biggest problems with the benchmarks, Ellis says. By Thumbtack Technology's numbers, Aerospike comes out on top of or on par with Couchbase, Cassandra trails behind, and MongoDB falls even further back. Ellis goes through each aspect of the benchmark in detail to explain what about Cassandra was misunderstood or ignored.
To really nail down the argument, though, Ellis runs his own benchmarks. Due to changes in Aerospike's API, he couldn't include Aerospike in his new benchmarks, but instead substituted HBase as another representative of the top NoSQL solutions. His results came out like this:
(Source: Jonathan Ellis at DataStax)
It's an interesting look at the various factors one must consider when making performance comparisons, or any comparisons, given the complexity of these technologies.
The cynical might observe that benchmarks coming at the request of Aerospike (as Ellis notes) show Aerospike's excellent performance, while benchmarks coming from DataStax show Cassandra's excellent performance. The even-more-cynical might observe that both show MongoDB far below all the others - but hey, MongoDB's always being mistreated.
Check out the full article from Jonathan Ellis for all the details, and if you're looking for more in the way of Cassandra's performance, you might find something interesting here:
- Tuning the JVM to Improve Performance in Cassandra
- Netflix Benchmarks on AWS Show Cassandra NoSQL Still Has the Goods