SQL has served as the primary data storage method for more than 40 years and went through at least two periods of exponential expansion: in the 1990s, after the rise of web applications, and over the last decade, due to the explosion mobile devices. As a result, smaller and smaller companies started finding the use of databases beneficial, while Internet giants like Google operate units of data rising up to PBs or even EBs.
Over the course of its existence, SQL gave rise to a number of iterations of itself, the most important being SQL per se, NoSQL, and NewSQL — together, they are responsible for the large portion of database market.
So, if you are to choose a database tool, which one should you opt for? Actually, there is no definite answer. Different people and companies choose different variants depending more on their preference and relative advantages for each particular project than any outright supremacy of one over all the others. So, what might these advantages and disadvantages be? Let’s take a look.
SQL is a relational database management system (RDBMS) and, as the name implies, it is built around relational algebra and tuple relational calculus. It has been the primary database solution since the '70s and only recently made room for newcomers. Whatever some people may say, this primarily signifies that it has been, and still is, good enough for a wide range of tasks. Its primary advantages are as follows:
- It uses a single uniform language (DDL) for different roles (developer, user, DBA).
- It uses a single standardized language for different RDBMS.
- It uses an advanced and non-structural querying language.
- It sticks to ACID principles (atomicity, consistency, isolation, durability), thus guaranteeing stability, security, and predictability both of the entire database and every transaction in particular.
As you can see, a lot of SQL’s benefits come from its uniformity, comfort, and ease of use. Even with a very limited knowledge of SQL (or entirely without it, if need be) you can reliably use it with the help of special tools like an online SQL Query Builder.
The drawbacks, however, make it highly unsuitable for certain types of projects. The main problem with SQL is that it is very difficult to scale, as its performance quickly deteriorates as a database grows larger. Sharding is quite problematic as well.
One of the reasons NoSQL and NewSQL appeared was the fact that old RDBMS weren’t designed to handle the number of transactions modern databases have to deal with every second. Giants like Amazon or Alibaba deal with staggering amounts of data that will choke an old RDBMS in a matter of minutes.
NoSQL (Not Only SQL)
NoSQL is achieving greater and greater popularity with every passing year, its most significant implementations being such products as Apache Cassandra, MongoDB, and others. It has been primarily developed to address the scalability problem characteristic of SQL. As a result, it is schema-free and built on distributed systems, which makes it easy to scale and shard.
However, these benefits come at the cost of relaxing ACID principles: instead of keeping all four parameters consistent throughout every transaction, NoSQL uses the principle of eventual consistency. This means that if there are no new updates for a particular data item for a certain period of time, eventually all accesses to it will return the last updated value. That’s why such systems are usually described as providing BASE guarantees (Basically Available, Soft state, Eventual consistency) — as opposed to ACID.
While this approach greatly increases access time and scalability, it may lead to data loss – the severity of the problem depends on database server support and the quality of application code. In some cases, this issue might be very serious.
Another problem presented by NoSQL is the fact that there are many types of NoSQL systems, and there is little uniformity among them. Such characteristics as flexibility, performance, complexity, scalability and so on vary greatly among different systems, which makes choosing from among them difficult even for experienced specialists. Nevertheless, when properly chosen in concordance with the project’s peculiarities, NoSQL can provide a much more efficient solution than an SQL system without a significant loss in stability.
NewSQL is a relatively new approach that seeks to unite the best of both possible worlds using modern programming languages and technology that wasn’t available before. Its goal is to combine ACID guarantees of SQL with scalability and high performance of NoSQL.
Obviously, NewSQL looks quite promising due to combining the advantages that, in the past, existed only separately; and, perhaps, at some point in the future, it will become the standard used by the majority. Unfortunately, currently most NewSQL databases are either proprietary software or only apply to specific scenarios, which significantly limits the spread and adoption of the new technology.
In addition to that, NewSQL is anything but homogenous, and each solution has drawbacks and advantages of its own. For example, SAP HANA can easily handle low-to-medium transactional workloads but doesn’t use native clustering, MemSQL is useful for clustered analytics but shows poor consistency on ACID transactions, and so on. As a result, it will probably be quite a while before these solutions become really widespread.
There are a lot of myths and misconceptions surrounding SQL: for example, that SQL is outdated and should be replaced by NoSQL or New SQL whenever possible. Of course, that's not true. Currently, there is no clear-cut leader among the three basic alternatives – each of them is more suitable for some projects and less suitable (or outright unsuitable) in other situations. Therefore, there is no universal ideal choice. For example, if your primary consideration is that a database should be always immediately available to accept new and provide existing content, you should consider an eventual consistency solution like Cassandra or Riak. If you go for fast in-memory SQL, a new in-memory database like VoltDB seems to be the obvious choice; and so on.