Originally Written by Sam Basu
Pain is often the stimulus behind innovations. This is particularly true in software development, in what we endearingly call Pain Driven Development (PDD). Starting from the 1980s, we have all known how to handle relational data – simply put it in a Relational DataBase Management System (RDBMS) and use SQL to work with the data. For the past few years however, our industry has seen an increasing trend towards the usage of NoSQL databases, where data just isn’t stored like in relational databases.
Yes, thousands of internet debates are dedicated to the SQL vs NoSQL fight. But does it have to be a fight? If you are choosing one over the other, do you know for sure why and what the potential benefits of each are? This article is a run-down of most common benefits and pitfalls of both SQL and NoSQL approaches, including simple comparisons and developer to developer considerations. There is no right or wrong; as usual, the classic answer stands true – it depends.
Tables or Collections
The primary difference between relational and non-relational databases is the way data is stored. Relational data is tabular by nature – and hence stored in Tables with rows and columns. Tables can be related to one another and cooperate in data storage, as well as easy retrieval. Non-relational data, on the other hand, is just not meant to fit in tables of rows and columns, but rather grouped together in chunks. Non-relational data is often stored as Collections, like in documents, key-value pairs or graphs. It’s your data – and the nature of your data is one of the primary indicators towards which approach works best for data storage and retrieval.
Pre-defined or Dynamic Schema
Relational data is often referred to as structured data, because the tables have a pre-defined schema (column definitions) to describe the data. This gives data modeling paramount importance, and necessitates the ‘Get it right first time’ attitudes. While a pre-defined schema offers reliability and stability, changes to a schema on a table with pre-existing data is plain difficult. Non-relational data, on the other hand, thrives on dynamic schemas and is often referred to as unstructured data. Non-relational data can easily accommodate changes in data type/structure due to its dynamic schema support.
Normalization or Cost of storage
Data storage in relational databases aims for higher normalization – breaking up the data into smallest logical tables (related) to prevent duplication and gain leaner space utilization. While normalization of data leads to cleaner data management, it often adds a little complexity, especially on data management where a single operation may have to span numerous related tables. Also, the leaner space utilization which used to save precious database storage, is largely moot in today’s world where cost of storage (disk space) is trivial. Non-relational data, on the other hand, is stored in flat collections, where data might often be duplicated. A single chunk of data is seldom partitioned off, rather stored as an entity, thus allowing easier reads/writes to that single entity.
Vertical or Horizontal scalability
Possibly the biggest difference between SQL and NoSQL databases is the way they scale to support increasing demand. To support more concurrent usage, SQL databases scale vertically – this means adding more horsepower to the computer for faster operations on the same set of data. Since the data is stored in relational tables, the performance bottleneck of operations that span a lot of tables is overcome by adding more computational power to the machine. While SQL databases can scale insanely high, eventually there will be an upper limit to vertically scalability. NoSQL databases, on the other hand, scale horizontally. Non-relational data storage being inherently distributed in nature, scalability for NoSQL databases is achieved horizontally by adding more commodity database servers (nodes) in the resource pool to share the load.
Structured or UnStructured Queries
Relational databases manipulate data through what’s called Structured Query Language (though we all just say SQL). SQL is incredibly powerful for supporting database CRUD (Create, Read, Update, Delete) operations and is an industry-standard. Non-relational databases manipulate data in chunks (like documents) and use something called Unstructured Query Language (UnQL), which is not standard and may vary between database providers. The Primary Key concept in a relational table maps roughly to a Document ID in a non-relational storage. SQL databases employ pre-defined optimizations like column index definitions to help speed up query operations, while NoSQL databases enjoy simpler but narrow data access patterns.
Mapper or Native
The choice of SQL or NoSQL data storage also has an impact on developers, albeit a little one. Developers using object oriented programming languages are often dealing one or more data entities (complex structures with nested data, lists or arrays) to fuel the application user interface. When it comes time to talk to the underlying database, things are not always apples to apples. In case of relational storage, the data entities often need to be broken down for normalization and lean storage across multiple related tables. Thankfully, this being a long-standing problem, most programming platforms have come up with an easy solution: an ORM (Object Relational Mapper) layer. This acts as the mapping layer between the tabular relational data source and the object oriented data entities developers deal with. In case on non-relational storage however, there is no need to normalize the data; complex data entities can be stowed away as is in a single unit. Objects used in applications are often serialized into JSon and stored as JSon documents in NoSQL databases.
Transactional or Pure Scale
If your data operations demand high transactional rates or complex data queries needing control over execution plans, that old SQL database can be your best friend in terms of performance and stability. SQL databases allow fine-grained control over transactional atomicity, and easy rollbacks. While NoSQL databases can be used for transactional operations, they really shine with pure scale of operation and quantity of data.
ACID or CAP
SQL databases have been long renowned to maintain integrity through what’s called the ACID (Atomic | Consistence | Isolated | Durable ) properties and most relational storage vendors honor ACID. The goal is to support isolated indivisible transactions whose changes are persisted and leave the data in a consistent state. NoSQL databases, on the other hand, make you choose between any two priorities per the CAP (Consistency | Availability | Partition Tolerance ) theorem, since all three are difficult to achieve in a distributed node-based system.
Just Data or Big Data
While SQL databases can store and handle data reliably, the biggest proposition for NoSQL comes in the form of what we call Big Data – enormous quantities of semi-structured data entities generated by machines or our society every day. The NoSQL, schema-less approach to data management and the infinite, horizontal scaling is potentially the only way to fathom how to capture, curate, search, analyze and visualize big data.
Data records or the Internet of Things/Users
Relational databases are masters at data normalization and lean storage paired with solid performance. But in recent years, our pace of accumulating data is outweighing the operational solace needed for relational storage. What’s spurring this growth in sheer numbers are: Big Users and the Internet of Things. Exponentially increasing number of users are connected to the internet and using our applications simultaneously. The amount of machine-generated data is also increasing tremendously through the proliferation of connected devices providing digital telemetry. Enterprises are thus having to look at NoSQL technologies and infrastructures to deal with the continuous influx of semi or unstructured data.
To the Cloud
The ubiquitousness of cloud computing, in reality, has benefited both SQL and NoSQL databases. Relational storage in the cloud often comes as a service that is replicated, highly available and distributed for greater fault tolerance through horizontal sharding techniques. A NoSQL database hosted as a cloud service benefits from inherent auto-sharding, flexible elasticity for seasonal demand, integrated caching and tremendous computing power to capture, store and analyze big data.
Show me the Names
There is a perception that SQL databases are mostly proprietary, while NoSQL databases are often open source. In reality, both types of databases enjoy both open source and commercial love. The established SQL databases include Microsoft SQL Server, MySQL, SQLite, Oracle and PostGres. Popular NoSQL databases include Couchbase, MongoDB, Redis, BigTable and RavenDB.
You, like Neo in The Matrix, have a choice – blissful ignorance and illusion or acceptance of harsh reality. But choosing to use SQL or NoSQL technologies blindly or based on popular demand is harboring the illusion that that you auto-magically made the right choice. Both have pros and cons, and the right architecture depends on the requirements of the applications you build. That cranky old SQL database is still tremendously powerful and can reliably handle your transactional demands with integrity. Look for NoSQL options when you are nearing the fringe limitations of relational databases and the vastness of your data handling or scale of operations simply demands a more distributed system. With thoughtful choices, you can become the ONE – free your data and build the next-generation of amazing applications!