Twitter Passes On Cassandra-based Tweet Storage For Now

An update on the state of Cassandra at Twitter sparked a huge controversy over the weekend as flame wars broke out, with some declaring that Cassandra had fallen from grace as the champion of the NoSQL movement. Others stood by Apache's prized project, arguing that detractors were grossly misinterpreting Twitter's announcement, and defended the data store against technical criticisms. Let's look at the facts first, and then the flames.

By no means has Twitter stopped using Cassandra. The company has stated that Cassandra currently stores geolocation data and data mining results that feed features such as local trends and @toptweets. Twitter also plans to use Cassandra as part of its monetization strategy: specifically, for a realtime analytics product (to be used "internally and externally") that is currently under development.

What sparked the controversy and speculation was Twitter's decision to hold off on moving tweet storage to Cassandra. "This is a change in strategy," said Ryan King, an architect at Twitter. "Instead we're going to continue to maintain our existing MySQL-based storage. We believe that this isn't the time to make large scale migration to a new technology."

Here is a sample of the backlash against Cassandra, along with the responses to the detractors:

"The bloom is starting to come off NoSQL, which is normal - it means that people & firms are trying to do more with it and most probably realizing that all of the tools, support, infrastructure, etc. surrounding alternative solutions isn't such a bad thing. And that the world of NoSQL had to start to come up with a better mantra than 'joins are bad, dude', and 'you're just protecting the status quo.' There's a *lot more* big data wrapped up inside of SQL databases and only a fraction of that in NoSQL - and there's a lot of reasons for it." --Colin Clark

"You are, for whatever reason, using the dullest of cliches as if they were informed opinion. Nobody with actual knowledge of the space says 'joins are bad, dude'. What they might say is 'When you have petabytes and low latency requirements, joins are an expensive proposition'. That is clearly a true statement and constructing indices in a column store to avoid joins is a reasonable decision to avoid that expense. Is it free? Of course not, nothing is." --Response, Benjamin Black

"For example, do I *really* need Cassandra if MySQL will work for me and I just want to get up and running quickly without writing a bunch of code?  My team was pushing greater than 20k updates per second into, GASP, Oracle 5 years ago.  Sure, it was expensive.  But it worked.  And it was worth it - or we wouldn't have spent the $$.  What's your data worth if you don't have your data? zero." --Colin Clark

"Had you spent any time on the IRC channel you would've seen this advice given repeatedly. If you don't need what Cassandra does, don't use it. That you have seen 20k updates/sec on really expensive hardware with a SQL store is neither surprising nor relevant. As you must realize, but choose to ignore, Cassandra is about more than just high per-node write throughput. It is about seamless scale-out of a single cluster, robustness in the face of node failure and network partition, etc. Can you do that with a SQL store? Certainly. Expect to pay 5x in hardware and not be able to operate multi-DC. It's what folks call a trade-off." --Response, Benjamin Black

"And then there's support - internal support. Picking a database du jour is organizationally expensive. Especially when there's probably one or two databases that Twitter could have bought off the shelf that would have solved their problems." --Colin Clark

"You have no idea what their actual problems are and are merely engaging in the favorite game of HN and similar venues: armchair engineering." --Response, Benjamin Black

And here was one blogger's take on the issue:

"Twitter is busy fighting other fires and they don't have the time to retrofit something that is (more or less) working, namely their MySQL based tweet storage, with a completely new technology based on Cassandra. Does this mean Cassandra and NoSQL suck? No, I think it's just smart project planning." --Todd Hoff, High Scalability

Perhaps a change is what Twitter needs, though. Of all the sites I visit frequently, I find Twitter the buggiest and the one with the most downtime. I'm not saying Cassandra could fix that (I have no idea), but I hope Twitter starts making more money so it can bring in some heavy-duty solutions and make the site super-reliable.

Thoughts? On Twitter… or on Cassandra?


Opinions expressed by DZone contributors are their own.
