The NoSQL hype is omnipresent, and many startups are tempted to go for Cassandra/MongoDB/HBase/Redis/… Here I’ll argue why they should rather stick to a SQL solution – MySQL or PostgreSQL.
In my previous post about Cassandra I detailed why I decided not to use it. Now, a dozen presentations watched and several dozen articles read later, I can detail why I think NoSQL is not generally a good idea for a startup.
NoSQL is great for “web-scale”. That is the mantra of NoSQL evangelists. But an important downside of NoSQL solutions, one mentioned by most sources (Twitter, Facebook, Rackspace), is that in NoSQL (at least in Cassandra and HBase) you must know upfront which questions you will be asking, because the data is laid out to serve those specific queries. You can’t just query anything you like. The relational model, on the other hand, lets you define your model first and then ask whatever question (query) comes to mind. And I can bet that a startup does not yet know all the questions it is about to ask its data store.
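To make the difference concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL or PostgreSQL. The tables, the data, and the function names are all made up for illustration, and the plain dictionary is only a rough approximation of a store whose layout was designed around a single query; it is not how Cassandra or HBase actually work internally.

```python
import sqlite3

# Hypothetical schema and data, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT, country TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         user_id INTEGER REFERENCES users(id),
                         total REAL);
    INSERT INTO users  VALUES (1, 'ann', 'DE'), (2, 'bob', 'US'), (3, 'cid', 'DE');
    INSERT INTO orders VALUES (1, 1, 20.0), (2, 1, 5.0), (3, 2, 12.5), (4, 3, 7.5);
""")


def orders_for_user(user_id):
    """The question we planned for on day one."""
    rows = conn.execute(
        "SELECT id, total FROM orders WHERE user_id = ? ORDER BY id",
        (user_id,),
    )
    return rows.fetchall()


def revenue_by_country():
    """A question nobody planned for; the schema stays exactly as it was."""
    rows = conn.execute("""
        SELECT u.country, SUM(o.total)
        FROM orders o JOIN users u ON u.id = o.user_id
        GROUP BY u.country
        ORDER BY u.country
    """)
    return rows.fetchall()


# The same data laid out around ONE query ("orders for a user"),
# approximated here with plain dictionaries keyed by user id.
orders_by_user = {1: [(1, 20.0), (2, 5.0)], 2: [(3, 12.5)], 3: [(4, 7.5)]}
country_by_user = {1: "DE", 2: "US", 3: "DE"}


def revenue_by_country_kv():
    """The unplanned question again: a full scan plus a join in application code."""
    totals = {}
    for user_id, orders in orders_by_user.items():
        country = country_by_user[user_id]
        totals[country] = totals.get(country, 0.0) + sum(t for _, t in orders)
    return sorted(totals.items())


if __name__ == "__main__":
    print(orders_for_user(1))
    print(revenue_by_country())
    print(revenue_by_country_kv())
```

In the relational version the new question is just another query. In the key-based layout it means walking every key and stitching the data together yourself, and on a large distributed store you would typically end up maintaining a second, denormalized table for it instead.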
Another thing is usability. Most developers are familiar with SQL and the relational model. And startups must get their product in front of the public fast. Why bother learning a new paradigm, a new platform, and new tools (if you are lucky enough to have tools)?
Now let’s get back to web-scale. A startup does not need web-scale. Really. You are not getting a million users overnight. Twitter didn’t. Facebook didn’t. If things work out, you can gradually upgrade your data model to meet the new demands. That’s how Twitter and Facebook did it: they started with MySQL. Oh, by the way – Twitter is still using MySQL for the most important thing – tweets. Now, if you have more data than them, you are... Facebook?
So to summarize – don’t sacrifice flexibility and ease of work for some fictional “trillions of petabytes”. If it turns out that you do need to handle huge amounts of data, the growth will be gradual enough that you will be able to restructure your data model, and it will come at a point when you know what questions you want to ask.