Reaching a New Level of NoSQL ACIDity
Take a look at making NoSQL options ACID.
There is no denying that most data is unstructured. The biggest challenge for developers is how to acquire data as smoothly and quickly as possible. As data becomes more complex, the relational model no longer holds as the default approach to storing and processing it. Outside the (now) rare cases where a relational model is the right fit, the obvious solution is to go non-relational.
Herein lies the first problem for decision makers. While there has been a big effort to make NoSQL options ACID, in most cases, that has been an afterthought. There is no denying that building an ACID system is hard work. In most cases, it becomes a tradeoff between consistency and efficiency. As a decision maker, I am always being told: “You can either have efficiency of processing or consistency.” I tend to agree: achieving both is hard.
Even in cases where ACIDity is the default mode of operation, I find subtle annotations in the margin like: “We support single document transactions.” Even in SQL-based systems, the standard allows the developer to change the isolation level. And let’s not enter into the minutiae of what guarantees the database itself asks from the OS/runtime. Spoiler alert: many database engines fail at this. The problem is that there is only a single ACID; the rest, while arguably useful, is plainly non-ACID.
What makes matters worse is that apparently, based on industry behavior, there are three levels to ACID. One for show, one for real, and the one that applies to the actual technology. As a decision maker, my job is to ensure that we treat data with the respect it deserves. I am not in show business; I get paid to ensure we keep our data safe.
The industry standard for fully transactional is ACID across your entire database. This is where a transaction must be written in its entirety as a stand-alone event to the database or not at all. This gives the database a chance to replicate throughout the cluster, giving users high-availability and data safety.
Even if the system crashes after the data was committed to a single server, but not yet replicated, once the system resumes operations the information will be immediately replicated to the rest of the nodes in the cluster.
However, that is where the good news ends. The problem is that when I am asked how safe our data is, every guarantee usually goes down the drain in the distributed systems we must build for scalability.
Say you have a reservation system for flights. The database consists of a 5-node cluster with 15 clients manned by ticket agents, all connected to different nodes. In a locally transactional database, each purchase or cancellation will be fully committed to a node and then replicated throughout the cluster.
What happens when the same seat is booked by two agents and committed to disk on two different nodes at the same time? Both bookings are committed to their respective nodes, and once it's time to replicate, they clash: each fights to replicate its version of how the entire cluster should be updated, leading to inconsistencies that must be fixed somehow.
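The double-booking scenario can be reproduced with a small in-memory simulation. This is only an illustrative sketch: the `Node` class, `commit_locally`, and `find_replication_conflicts` are hypothetical names invented for this example and do not correspond to any real database's API. Each node accepts a booking if the seat looks free locally, and the clash only shows up when the nodes compare notes.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical cluster node that commits bookings locally first."""
    name: str
    seats: dict = field(default_factory=dict)  # seat -> passenger

    def commit_locally(self, seat, passenger):
        """Commit on this node only; succeeds if the seat looks free here."""
        if seat in self.seats:
            return False
        self.seats[seat] = passenger
        return True


def find_replication_conflicts(nodes):
    """Return seats for which the nodes disagree about the owner."""
    owners_by_seat = {}
    for node in nodes:
        for seat, passenger in node.seats.items():
            owners_by_seat.setdefault(seat, set()).add(passenger)
    return {seat: owners for seat, owners in owners_by_seat.items()
            if len(owners) > 1}


# Two agents, connected to different nodes, book the same seat.
node_a = Node("A")
node_b = Node("B")
ok_a = node_a.commit_locally("12C", "alice")
ok_b = node_b.commit_locally("12C", "bob")

# Both local commits succeeded; the conflict appears at replication time.
conflicts = find_replication_conflicts([node_a, node_b])
print(ok_a, ok_b, conflicts)
```

Both agents are told their booking succeeded, and only later does the cluster discover that seat 12C has two owners. Someone, or some hand-written conflict resolution code, then has to decide who loses the seat.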
MarkLogic and MongoDB are examples of where that would happen because the developers must provide those guarantees by hand. What could possibly go wrong?
This has existed for a long time in SQL deployments. Distributed transactions, anyone? Their setup and use have always been awkward and very error-prone.
To solve these kinds of problems, RavenDB added transactional guarantees across the entire cluster in version 4.1. Instead of a transaction committing when data is safe on a specific node, it commits only once it has been agreed upon and committed across the entire cluster. This resolves the issue of concurrency and of different actions fighting to represent the new state of data.
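To make the contrast concrete, here is a toy model of a cluster-wide commit. It is a sketch under stated assumptions, not RavenDB's implementation or client API: the `Cluster` class and its `book` method are hypothetical, and the rule that a majority of nodes must be reachable before anything commits is an assumption chosen to illustrate "agreed upon by the cluster".

```python
class Cluster:
    """Hypothetical cluster with a single agreed-upon state."""

    def __init__(self, node_count):
        self.node_count = node_count
        self.committed = {}  # seat -> passenger, shared by the whole cluster

    def book(self, seat, passenger, reachable_nodes):
        """Commit a booking only if the cluster as a whole can agree on it."""
        # Assumption: agreement needs a strict majority of the nodes.
        if reachable_nodes <= self.node_count // 2:
            return False
        # The decision is made against the agreed state, not a local copy,
        # so a second booking for the same seat is rejected up front.
        if seat in self.committed:
            return False
        self.committed[seat] = passenger
        return True


cluster = Cluster(node_count=5)
first = cluster.book("12C", "alice", reachable_nodes=5)
second = cluster.book("12C", "bob", reachable_nodes=5)
isolated = cluster.book("14A", "carol", reachable_nodes=2)
print(first, second, isolated, cluster.committed)
```

In this model the second agent is refused at booking time instead of discovering a conflict after the fact, and a node cut off from the majority cannot commit at all. The price is that every such commit has to wait for the cluster to agree, which is the consistency versus efficiency tradeoff described earlier.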
Having transactional guarantees across the cluster is a real win for nonrelational data management. The right solution must make it easy to develop correct software, but it must also have other qualities that are paramount for day-to-day operations. Setting up and securing your database quickly, adding new nodes as your database grows, and maintaining the highest quality of data safety are musts for cluster operations.
Our company is an early adopter of RavenDB, and RavenDB has been ACID since before we started using it. It may be the first fully ACID, no annotations in the margin, nonrelational, transactional database.
Understanding that the proliferation of unstructured data would eventually require nonrelational databases to offer the most important components of relational databases, RavenDB has been building on the NoSQL ACID model for almost a decade.
As the Internet of Things and Big Data intensify the need for better unstructured data infrastructure, cluster-wide, fully transactional databases present the ideal solution for the coming years.