
Solid NoSQL Benchmarks from YCSB with a Side of HBase Bologna



One of the great appeals of Cassandra is its linear scalability. Need more speed? Just add water, er... nodes to your ring. Netflix proved this out in one of the most famous Cassandra benchmarks.

Cassandra's numbers are impressive, but Netflix didn't perform a side-by-side comparison of the available NoSQL platforms. Admirably, the Yahoo! Cloud Serving Benchmark (YCSB) endeavored to perform just such a comparison. The results of that effort were recently published by Network World.

It's not surprising that Cassandra and HBase lead the pack in many of the metrics, since both are based on Google's BigTable. It was surprising, however, to see HBase's latency near zero. This warranted some digging.

Now, side-by-side comparisons are always tough because they depend heavily on system configuration and on the specifics of the use case and data model. And in Network World's article, there is a key paragraph:
"During updates, HBase and Cassandra went far ahead from the main group with the average response latency time not exceeding two milliseconds. HBase was even faster. HBase client was configured with AutoFlush turned off. The updates aggregated in the client buffer and pending writes flushed asynchronously, as soon as the buffer became full. To accelerate updates processing on the server, the deferred log flush was enabled and WAL edits were kept in memory during the flush period."

With autoflush disabled, the client buffers writes until flush is called or the write buffer fills. See:
http://hbase.apache.org/book/perf.writing.html

If I'm reading that correctly, with autoflush disabled, the durability of a put operation is not guaranteed until the flush occurs. This really sets up an unfair comparison with the other systems, where durability is guaranteed on each write. While a put sits in the client-side buffer, nothing is sent over the wire, which naturally results in near-zero latency!
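To make the point concrete, here is a minimal sketch of a buffering client. The `ToyServer` and `ToyClient` classes are hypothetical stand-ins I made up for illustration; they are not the HBase API (where, as far as I know, the corresponding knobs are `HTable.setAutoFlush(false)` and `flushCommits()`). The sketch counts round trips instead of measuring real latency.

```python
class ToyServer:
    """Hypothetical stand-in for a remote store; counts round trips, does no I/O."""

    def __init__(self):
        self.rows = {}
        self.round_trips = 0

    def write_batch(self, batch):
        # One call models one network round trip, however many rows it carries.
        self.round_trips += 1
        self.rows.update(batch)


class ToyClient:
    """Hypothetical client with an autoflush switch and a client-side write buffer."""

    def __init__(self, server, autoflush=True, buffer_limit=3):
        self.server = server
        self.autoflush = autoflush
        self.buffer_limit = buffer_limit
        self.buffer = {}

    def put(self, key, value):
        self.buffer[key] = value
        # With autoflush on, every put goes to the server immediately.
        # With it off, puts only leave the client once the buffer is full.
        if self.autoflush or len(self.buffer) >= self.buffer_limit:
            self.flush()

    def flush(self):
        if self.buffer:
            self.server.write_batch(dict(self.buffer))
            self.buffer.clear()


def demo():
    """Write four rows through an autoflushing client and a buffering client."""
    eager_server, lazy_server = ToyServer(), ToyServer()
    eager = ToyClient(eager_server, autoflush=True)
    lazy = ToyClient(lazy_server, autoflush=False, buffer_limit=3)
    for i in range(4):
        eager.put(f"row{i}", i)
        lazy.put(f"row{i}", i)
    return eager_server, lazy_server, lazy
```

After `demo()`, the autoflushing client has made four round trips and the server holds all four rows. The buffering client has made one round trip, and the fourth row still exists only in client memory. If the client died at that moment, that put would be gone, even though the benchmark already recorded it as a fast, completed write.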

The change required to YCSB to level the playing field can be seen here:

I think it's certainly worth including metrics with autoflush disabled, because that is a valid use case, but YCSB should also include the results with autoflush enabled. Similarly, it would be great to see Cassandra's numbers when using different consistency levels and replication factors. Durability is something we all need, but it isn't one-size-fits-all (which is why tunable consistency and replication are ideal).
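For a feel of why those Cassandra settings would move the numbers, here is a rough sketch of how many replica acknowledgements a request waits for at a few common consistency levels. The function names are my own, and the model is deliberately simplified: it assumes a single datacenter and ignores levels such as `LOCAL_QUORUM` and `ANY`.

```python
def quorum(replication_factor):
    """A quorum is a majority of the replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def replicas_required(level, replication_factor):
    """Replica acknowledgements a request waits for (simplified, single datacenter)."""
    required = {
        "ONE": 1,
        "QUORUM": quorum(replication_factor),
        "ALL": replication_factor,
    }
    return required[level]


def reads_see_latest_write(write_level, read_level, replication_factor):
    """Read and write replica sets must overlap when R + W > RF."""
    w = replicas_required(write_level, replication_factor)
    r = replicas_required(read_level, replication_factor)
    return w + r > replication_factor
```

With a replication factor of 3, a `QUORUM` write waits on two replicas while a `ONE` write waits on a single replica, so the two should not be expected to post the same latency. A benchmark that reports only one combination is showing one point on that curve.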
 
Anyway...
I appreciate all the work the YCSB crew put in to produce the benchmark. Thank you. This is not to say you should take the benchmarks with a grain of salt, but you may need some mustard to go with the HBase autoflush bologna.


Published at DZone with permission of Brian O'Neill, DZone MVB. See the original article here.

