A little more than a year ago, Apache Cassandra's reputation was untouchable. It was blowing other NoSQL data stores out of the water in benchmarks and in our very own DZone popularity poll. What else would you expect from the data solution that was originally designed to handle the data on Facebook? How could it not be the top solution out there?
But last year, Cassandra's reputation was tarnished a little by stories about its instability and steep learning curve. Then came migrations away from it, driven by the growing popularity of MongoDB. What really hurt Cassandra was Twitter's announcement that it would hold off on migrating its tweet storage to the NoSQL store. Twitter still used Cassandra to store geolocation data and data mining results that feed into things like local trends and @toptweets, but the damage was done.
Fast-forward to last month, and we see the stability issues fade away as Apache Cassandra reaches a major milestone in version 1.0. And just this week, Netflix published benchmarks that vindicate its 6-month migration to Cassandra. The tests ran on Amazon EC2 instances:
To measure scalability, the same test was run with 48, 96, 144 and 288 instances, with 10, 20, 30 and 60 clients respectively. The load on each instance was very similar in all cases, and the throughput scaled linearly as we increased the number of instances. Our previous benchmarks and production roll-out had resulted in many application specific Cassandra clusters from 6 to 48 instances, so we were very happy to see linear scale to six times the size of our current largest deployment. This benchmark went from concept to final result in five days as a spare time activity alongside other work, using our standard production configuration of Cassandra 0.8.6, running in our test account. The time taken by EC2 to create 288 new instances was about 15 minutes out of our total of 66 minutes. The rest of the time was taken to boot Linux, start the Apache Tomcat JVM that runs our automation tooling, start the Cassandra JVM and join the "ring" that makes up the Cassandra data store. For a more typical 12 instance Cassandra cluster the same sequence takes 8 minutes.
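For readers who want to sanity-check the numbers in that excerpt, here is a minimal Python sketch of the arithmetic. It uses only the figures quoted above; the function names are hypothetical and chosen for illustration, and nothing here reproduces Netflix's actual tooling or measurements.

```python
# Figures copied from the quoted Netflix excerpt; none are measured here.
INSTANCES = [48, 96, 144, 288]   # Cassandra instances per test run
CLIENTS = [10, 20, 30, 60]       # load-generating clients per test run


def instances_per_client(instances, clients):
    """Ratio of instances to clients for each run.

    A constant ratio is consistent with the quote's remark that the
    load on each instance was very similar in all cases.
    """
    return [i / c for i, c in zip(instances, clients)]


def scale_factor(largest_test, largest_production):
    """How many times larger the biggest test cluster is than production."""
    return largest_test / largest_production


def minutes_after_ec2_create(total_minutes, ec2_create_minutes):
    """Time left for booting Linux, starting the JVMs and joining the ring."""
    return total_minutes - ec2_create_minutes


if __name__ == "__main__":
    print(instances_per_client(INSTANCES, CLIENTS))  # [4.8, 4.8, 4.8, 4.8]
    print(scale_factor(288, 48))                     # 6.0
    print(minutes_after_ec2_create(66, 15))          # 51
```

The constant 4.8 instances per client shows the client count was scaled in step with the cluster, and 288 / 48 matches the "six times the size of our current largest deployment" claim. The excerpt does not give throughput figures, so the linear-scaling result itself cannot be checked from these numbers alone.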
The blog post is kindly divided into overview and TL;DR sections if you're not interested in the nitty-gritty performance details. Cockcroft even offered Netflix's infrastructure up for your testing curiosity... seriously: "If you are the kind of performance geek that has read this far and wishes your current employer would let you spin up huge tests in minutes and open source the tools you build, perhaps you should give us a call..."