
Benchmarking Cassandra: The Right & Wrong Way to Do it


The Performance Zone is brought to you in partnership with New Relic. New Relic APM provides constant monitoring of your apps so you don't have to.

Everybody loves comparing databases. Not everybody agrees on how to do it, though. Jonathan Ellis, writing on the DataStax Developer Blog, points to one prime example of getting it wrong: Thumbtack Technology's benchmarks comparing Cassandra, Couchbase, MongoDB, and Aerospike. The problem, Ellis says, is that the benchmarks give Cassandra a raw deal.

According to Ellis, the benchmarks were basically set up correctly but overlooked some major points of benchmark hygiene:

Our problems start with benchmark hygiene: the read runs were run one after the other rather than properly isolating them by dropping the page cache and warming up each workload separately.  It also looks like no effort was made to isolate the effects of Cassandra compaction; compaction from the read/write workload could have continued into the read-heavy section. 
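The hygiene Ellis describes (isolate each workload, let compaction settle, drop the page cache, warm up before measuring) maps fairly directly onto a test harness. Below is a minimal Python sketch, not Ellis's or Thumbtack's actual tooling. It assumes a Linux host where the script runs as root (needed to write to `/proc/sys/vm/drop_caches`) and that `nodetool compactionstats` prints a line of the form `pending tasks: N`. The `run_workload` callable is hypothetical: it stands in for whatever load generator you use (a YCSB run, for instance) and should return measured throughput for the given duration. The workload names and default durations are illustrative only.

```python
import os
import re
import subprocess
import time
from typing import Callable, Dict, Sequence


def drop_page_cache() -> None:
    """Flush dirty pages, then ask Linux to drop the page cache (requires root)."""
    os.sync()
    with open("/proc/sys/vm/drop_caches", "w") as f:
        f.write("3\n")


def pending_compactions() -> int:
    """Read the pending-compaction count from `nodetool compactionstats`.

    Assumption: the output contains a line like "pending tasks: 0".
    """
    out = subprocess.run(
        ["nodetool", "compactionstats"],
        capture_output=True, text=True, check=True,
    ).stdout
    match = re.search(r"pending tasks:\s*(\d+)", out)
    if match is None:
        raise ValueError("could not find 'pending tasks' in nodetool output")
    return int(match.group(1))


def run_isolated(
    workloads: Sequence[str],
    run_workload: Callable[[str, float], float],  # hypothetical load-generator hook
    drop_cache: Callable[[], None] = drop_page_cache,
    pending: Callable[[], int] = pending_compactions,
    warmup_s: float = 300.0,   # illustrative, not a value from the benchmarks
    measure_s: float = 600.0,  # illustrative, not a value from the benchmarks
    poll_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, float]:
    """Run each workload in isolation and return its measured throughput."""
    results: Dict[str, float] = {}
    for name in workloads:
        # Wait until compaction triggered by the previous workload has drained,
        # so it cannot bleed into this workload's numbers.
        while pending() > 0:
            sleep(poll_s)
        # Start every workload from a cold page cache rather than
        # inheriting whatever the previous run left in memory.
        drop_cache()
        # Warm-up phase: run the same workload but discard the result.
        run_workload(name, warmup_s)
        # Measured phase: only this number is reported.
        results[name] = run_workload(name, measure_s)
    return results
```

Passing `pending`, `drop_cache`, and `sleep` in as parameters keeps the sequencing testable without a live cluster; in a real run you would keep the defaults. On a multi-node cluster, the cache drop and the compaction check would need to happen on every node, which this single-host sketch does not do.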

And those aren't even the biggest problems with the benchmarks, Ellis says. By Thumbtack Technology's numbers, Aerospike comes out on top of or on par with Couchbase, Cassandra trails behind, and MongoDB falls even further back. Ellis walks through each aspect of the benchmark to explain which characteristics of Cassandra were misunderstood or ignored.

To really nail down the argument, though, Ellis runs his own benchmarks. Because of changes to Aerospike's API, he couldn't include Aerospike in his new benchmarks, so he substituted HBase as another representative of the top NoSQL solutions. His results came out like this:

[Chart: results of Ellis's benchmarks. Source: Jonathan Ellis at DataStax]

It's an interesting look at the various factors one must consider when making performance comparisons, or any comparisons, given the complexity of these technologies.

The cynical might observe that benchmarks commissioned by Aerospike (as Ellis notes) show Aerospike's excellent performance, while benchmarks coming from DataStax show Cassandra's excellent performance. The even-more-cynical might observe that both show MongoDB far below all the others. But hey, MongoDB's always being mistreated.

Check out the full article from Jonathan Ellis for all the details.

