While some of you might be NoSQL gurus, there is often a lack of solid knowledge about NoSQL in general, along with some common myths. Specifically, topics like NoSQL applicability/use cases and its comparison (fair and unfair) against relational databases are often driven by incomplete knowledge. I do not claim to be an expert in the NoSQL domain, but I guess it might be useful to jot down whatever little I know about NoSQL in general. Someone looking at NoSQL from the viewpoint of a (relative) beginner can benefit from this post (maybe?). So let’s dive in.
What’s In a Name? NoSQL...
It does not mean "No SQL" (see how a white space can screw things up)! That’s far from the truth. It is commonly read as "Not Only SQL"! See the difference? Even NoSQL-based technologies use some kind of (mostly product-specific) language to "query" their native store, so it is not very different from a Structured Query Language after all!
Ok, so it’s "Not Only SQL". But...
What Are Its Main Characteristics?
It’s hard to list all the specific attributes of a NoSQL solution with precision. Not every solution supports all of these properties, but by and large, these are the most common ones.
The absence of a fixed schema is perhaps the most fundamental attribute of all. Most NoSQL solutions do not enforce a schema (which is nothing but a bunch of metadata about your data containers, i.e. rows and columns), so records in the same store do not have to share the same set of fields.
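To make the "no schema" idea concrete, here is a minimal sketch in plain Python. It does not use any real NoSQL product: the `users` list and the `find` helper are made up for illustration and simply stand in for a document-style store.

```python
# A toy in-memory "collection": a plain list of dicts standing in for a store.
users = []

# Two records in the same collection with different fields.
# Nothing was declared up front, and no schema migration was needed.
users.append({"id": 1, "name": "Asha", "email": "asha@example.com"})
users.append({"id": 2, "name": "Ben", "twitter": "@ben", "tags": ["admin", "beta"]})


def find(collection, **criteria):
    """Return the records whose fields match all of the given criteria."""
    return [
        doc for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


# Records that lack the "twitter" field simply do not match.
with_twitter = find(users, twitter="@ben")
```

The flip side is that the application now has to cope with fields that may or may not be present, which is why `find` uses `doc.get(...)` instead of assuming every record has every field.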
Distributed By Nature
Most NoSQL solutions support a distributed architecture where the data itself is partitioned and load balanced (two entirely different terms!) across multiple instances.
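A rough sketch of the partitioning half of that sentence: each key is hashed and the hash decides which instance owns it. The node names and the `node_for` / `partition` helpers are assumptions for illustration, and real products generally use more elaborate schemes than a simple modulo.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical instance names


def node_for(key, nodes=NODES):
    """Pick the node that owns `key` using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]


def partition(keys, nodes=NODES):
    """Group keys by the node that owns them."""
    placement = {node: [] for node in nodes}
    for key in keys:
        placement[node_for(key, nodes)].append(key)
    return placement


placement = partition([f"user:{i}" for i in range(1000)])
```

Partitioning only answers "where does this piece of data live?". Load balancing is the separate question of how incoming requests are spread across the instances, which this sketch does not attempt to model.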
BASE is an abbreviation for "Basically Available, Soft state, Eventual consistency". I do not want to go into its details [because I do not completely understand it! ;-)]. But what I do know for a fact is that BASE (for NoSQL) vs ACID (for RDBMS) is often a major point of debate.
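For what it is worth, the "eventual consistency" part can be pictured with a toy model: a write lands on one copy first and reaches the other copies a little later, so a read in between may return old data. The `Cluster` and `Replica` classes below are invented for this sketch and do not correspond to any real product’s API.

```python
class Replica:
    """One copy of the data."""

    def __init__(self):
        self.data = {}


class Cluster:
    """One primary plus replicas that receive writes lazily (toy model)."""

    def __init__(self, replica_count=2):
        self.primary = Replica()
        self.replicas = [Replica() for _ in range(replica_count)]
        self.pending = []  # writes accepted by the primary but not yet propagated

    def write(self, key, value):
        self.primary.data[key] = value
        self.pending.append((key, value))

    def read_from_replica(self, index, key):
        return self.replicas[index].data.get(key)

    def sync(self):
        """Propagate all pending writes to every replica."""
        for key, value in self.pending:
            for replica in self.replicas:
                replica.data[key] = value
        self.pending.clear()


cluster = Cluster()
cluster.write("cart:7", ["book"])
stale = cluster.read_from_replica(0, "cart:7")  # replica has not seen the write yet
cluster.sync()
fresh = cluster.read_from_replica(0, "cart:7")  # replicas have now converged
```

Here `stale` is `None` and `fresh` is `["book"]`: the system stayed available for reads the whole time, but the answer was only correct "eventually".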
Types of NoSQL Solutions
These are the most common categories/types/variants of NoSQL solutions:
- Key-value: stores data as key-value pairs where the values have no fixed representation, e.g. Redis, Oracle NoSQL.
- Document store: stores documents (XML, JSON, BSON etc.) as values, e.g. MongoDB, Couchbase.
- Graph: stores information in a graph-like structure (nodes and relationships), e.g. Neo4j, OrientDB.
- Column based (wide-column): stores data in column families, where each row can have its own set of columns, e.g. Apache Cassandra.
…or maybe a combination of two or more of the above!
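The four categories above can be pictured as four different shapes for roughly the same information. This is only a sketch using plain Python structures; the field names and the `followers_of` helper are made up, and real stores add indexing, persistence and query languages on top.

```python
# Key-value: the store sees an opaque value and can only look it up by key.
key_value = {"user:1": '{"name": "Asha", "city": "Pune"}'}

# Document: the store understands the structure and can query fields inside it.
documents = [{"_id": 1, "name": "Asha", "address": {"city": "Pune"}}]

# Graph: nodes plus labeled relationships (edges) between them.
nodes = {1: {"name": "Asha"}, 2: {"name": "Ben"}}
edges = [(1, "FOLLOWS", 2)]

# Wide-column: row key -> column family -> columns; rows need not share columns.
wide_column = {
    "user:1": {"profile": {"name": "Asha"}, "activity": {"last_login": "2015-01-01"}},
    "user:2": {"profile": {"name": "Ben", "city": "Oslo"}},
}


def followers_of(node_id):
    """Walk the edges to find the names of nodes that follow `node_id`."""
    return [
        nodes[src]["name"]
        for src, relation, dst in edges
        if relation == "FOLLOWS" and dst == node_id
    ]
```

The relationship question ("who follows Ben?") is a natural one-liner in the graph shape, while it would need extra bookkeeping in the key-value shape. That is the kind of trade-off that separates the categories.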
Ok, So Why Should I Choose a NoSQL Solution Over My Good Old RDBMS?
Alright, so this is the right moment to talk about some differences. Hopefully I can get these approximately correct if not 100% accurate.
It’s easy (at least theoretically) to add nodes/instances to a NoSQL data store to meet the increasing demands of your application. This is made possible by the fact that NoSQL solutions are designed to work well in a distributed fashion.
Flexibility is related to the "No Schema" characteristic. NoSQL solutions are flexible in the sense that they either have no schema at all or they allow relatively unstructured data to be stored without tinkering with the administration side of things (e.g. evolving the schema in the case of an RDBMS).
High availability might sound like a silly point at first. You might say that anything (including an RDBMS) can be made redundant (highly available) by adding more instances. That’s true. With NoSQL solutions, it’s just easier to do this, since they are (generally) designed from the ground up with extreme scaling in mind, which tends to make them highly available as well: if one node fails, your application does not halt. The data gets re-distributed (re-partitioned) among the remaining nodes and the show goes on.
The performance of a distributed NoSQL solution shines in problem domains involving large data sets, since it can be scaled horizontally (by adding more nodes).
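One common ingredient behind "if one node fails, your application does not halt" is replication: every key is stored on more than one node, so a read can fall back to a surviving copy. The sketch below assumes a replication factor of 2 and three made-up node names. It only models the fallback read, not the re-partitioning step, and it is not how any particular product implements it.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical instance names
REPLICATION_FACTOR = 2                  # assumed: each key lives on two nodes


def replicas_for(key, nodes=NODES, rf=REPLICATION_FACTOR):
    """Return the nodes holding copies of `key`: the hashed owner plus its successors."""
    start = int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16) % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(rf)]


def read(key, store, down=frozenset()):
    """Read from the first replica that is still up; fail only if none are."""
    for node in replicas_for(key):
        if node not in down:
            return store[node][key]
    raise RuntimeError(f"no live replica for {key}")


# Write every key to all of its replicas.
store = {node: {} for node in NODES}
for i in range(100):
    key = f"order:{i}"
    for node in replicas_for(key):
        store[node][key] = i

# Simulate losing one node: every key is still readable from its other copy.
values = [read(f"order:{i}", store, down={"node-b"}) for i in range(100)]
```

With two copies of each key, losing any single node leaves every key readable. Losing two nodes at once would not, which is why the replication factor is a tuning decision rather than a free lunch.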
More suitable for the Cloud
In my opinion, Cloud computing services (especially the PaaS ones) are about elastic scaling, i.e. cost-effective management of resources where your instances increase/decrease based on policies driven by factors like load, volume or time, along with easier setup and provisioning and a smooth upgrade/patching process. NoSQL solutions fit the bill well (I am pretty certain, at least from the scaling point of view).
Again, it’s all about horizontal scaling and not vertical scaling. Horizontal scaling means spinning up more instances (much cheaper) rather than upgrading the hardware of a single machine (which can get costly beyond a certain point).
Ok... so ‘No Caveats’ with NoSQL?
Absolutely not! As with any technology, there are pros and cons, even when it might be a perfect fit for your use case.
- For RDBMS purists, the Eventual Consistency of a NoSQL solution is not good enough. Lack of ACID properties is often cited as the topmost drawback of NoSQL stores in specific use cases/domains.
- Heterogeneous products and lack of standards: There has been an explosion of NoSQL solutions. Although many of the basic concepts and characteristics remain the same, learning NoSQL solutions from different vendors makes for a steep learning curve! This is because there is no specific standard/API around this technology yet (at least I have not seen one).
- Relatively new: this does not sound like a serious caveat, but it can take time for teams to ramp up on the technology as compared to RDBMS (which has been around for decades!).
When should I use a NoSQL solution?
I have no personal experience of implementing a NoSQL solution in production, but from a common sense perspective, this is what I think.
The best answer, as you all know, is ‘it depends’ ;-) Well, maybe not? Whether you are thinking about NoSQL vs RDBMS or comparing various NoSQL offerings, you should look at your use case and then take things from there. If you need strict ACID properties, an RDBMS is usually the safer choice. If you have large data sets and the data is non-relational in nature, it’s better to leverage NoSQL and its scalability properties. As far as choosing between a graph, key-value, document or column based NoSQL store is concerned, the answer (or maybe the question) still remains the same: ‘what does your use case require?’