Data Storage in 2021: Choosing the Right Tools for the Job
The functional needs of a business should drive your choice of data storage.
Editor’s Note: The following is an article written for and published in DZone’s 2021 Data Persistence Trend Report.
Reading about the death of the relational database seems like a regular occurrence. However, here we are in 2021, and the relational data store is going strong. If we look at the DB-Engines Ranking website, six of the top 10 engines, including all of the top four, are relational data stores. Evidently, structured, or relational, data storage is here to stay. Yet the other four spots in the top 10 are held by non-relational engines. Could that mean that relational data storage is really dying?
The core of this question is not really which kind of data store is better: a relational, normalized structure or a non-relational, denormalized storage mechanism. No, the real core question is: What kind of data store should your organization be using in 2021?
The shortest possible way I can answer this question is as follows:
All of them.
The fact is, you’re much better off not trying to answer your data needs with a single methodology. Let’s discuss why.
Relational Data Stores
The concept of the relational data store goes all the way back to 1970 with E.F. Codd laying down the mathematics involved (see Additional Resources). That means the technology is 51 years old. Surely, such an old technology is on its last legs?
Well, the interesting thing about reality is that so very much of it maps quite neatly to a relational data store. For example, think about a boutique store with an inventory, sales transactions, and a list of customers. Each of these concepts maps very neatly into structured, relational storage: one table per concept, with each sales transaction referring to the customer and the inventory item involved.
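To make the boutique mapping concrete, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The table and column names are my own assumptions for illustration, not a prescribed design.

```python
import sqlite3

# Hypothetical schema for the boutique example; all names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    state       TEXT NOT NULL
);
CREATE TABLE inventory (
    item_id     INTEGER PRIMARY KEY,
    description TEXT NOT NULL,
    quantity    INTEGER NOT NULL CHECK (quantity >= 0)
);
CREATE TABLE sales_transactions (
    transaction_id INTEGER PRIMARY KEY,
    customer_id    INTEGER NOT NULL REFERENCES customers(customer_id),
    item_id        INTEGER NOT NULL REFERENCES inventory(item_id),
    quantity       INTEGER NOT NULL CHECK (quantity > 0),
    sold_at        TEXT NOT NULL
);
""")

# List the tables that now exist, in alphabetical order.
table_names = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(table_names)
```

Each sales transaction points at exactly one customer and one inventory item, and the database itself rejects a transaction that refers to a customer or item that does not exist.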
People also continue discovering just how versatile relational data is. The data stored is defined by rows and columns in tables, and through the Structured Query Language (SQL), you can query that data in all sorts of interesting ways, such as filtering and aggregating across the data structure, to arrive at new structures with new meanings. For example, get a count of transactions for all customers from Kansas.
Further, the basics of SQL are simple to learn and use, so everyone from your development team to the analysts is capable of putting it to work. Finally, partly because of the nature of relational data storage itself, most relational data engines have a strict transaction mechanism to ensure data consistency, meaning that each update, delete, or insert is applied exactly once and either completely or not at all. This makes things like strict inventory management or certain financial transactions possible.
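The Kansas question can be sketched as a single filter-and-aggregate query. The snippet below uses Python's built-in sqlite3 module; the tables, column names, and sample rows are assumptions made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables and sample data, for illustration only.
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, state TEXT);
CREATE TABLE sales_transactions (
    transaction_id INTEGER PRIMARY KEY,
    customer_id    INTEGER REFERENCES customers(customer_id)
);
INSERT INTO customers VALUES (1, 'Ada', 'KS'), (2, 'Grace', 'KS'), (3, 'Edgar', 'CA');
INSERT INTO sales_transactions VALUES (10, 1), (11, 1), (12, 2), (13, 3);
""")

# Filter customers by state, join to their transactions, and count per customer.
kansas_counts = conn.execute("""
    SELECT c.name, COUNT(t.transaction_id) AS transaction_count
    FROM customers AS c
    LEFT JOIN sales_transactions AS t ON t.customer_id = c.customer_id
    WHERE c.state = ?
    GROUP BY c.customer_id, c.name
    ORDER BY c.name
""", ("KS",)).fetchall()

kansas_total = sum(count for _, count in kansas_counts)
print(kansas_counts, kansas_total)  # [('Ada', 2), ('Grace', 1)] 3
```

The query never mentions how the rows are stored or indexed. It only states which rows matter and how to summarize them, which is a large part of why analysts can work with SQL directly.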
The issue with relational data storage is that you have to define a structure. Then, once you’ve defined a structure, you must adhere to that structure. The tightly defined structure can make coding much more difficult. It also makes changes harder to implement. Scaling a relational data store can be extremely difficult. Coding transactions incorrectly can cause errors, outages, and all sorts of problems. All of these weaknesses and challenges are why we’ve seen the growth in adoption of other data storage solutions.
Non-Relational Data Stores
The concept of storing data in a denormalized, or unstructured, state goes all the way back to the origins of computing before there was such a thing as data normalization. However, the modern concepts of NoSQL (originally, NO S-Q-L but currently meaning Not Only SQL) and the unstructured or semi-structured data stores go back to 2009 (see Additional Resources).
The beauty of the unstructured data store is that, quite simply, you don’t have to worry about the data structure. A data store like MongoDB stores each document you hand it as an ID/value pair. Because there is no required structure for the JSON document, you can put anything into that document and then later retrieve it from the data store. The speed of development is radically enhanced, and the ability to change is limited only by your speed at coding that change.
In a document store, you’re not limited in what you choose to place in the document as long as your code can deal with what’s there. For example, two documents describing some type of book sales and a third holding a list of movie titles are easily and readily stored side by side in the same document data store.
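As a rough sketch of that idea, the snippet below uses a plain Python dict as a stand-in for a document store. This is not MongoDB's API, and the helper names (put, get) and sample documents are hypothetical. It only shows documents of different shapes stored as ID/value pairs.

```python
import json
import uuid

# A dict stands in for the document store: generated ID -> JSON text.
store = {}

def put(document):
    """Serialize a document to JSON and store it under a new ID."""
    doc_id = str(uuid.uuid4())
    store[doc_id] = json.dumps(document)
    return doc_id

def get(doc_id):
    """Fetch a document by ID and parse it back into Python objects."""
    return json.loads(store[doc_id])

# Two book-sale documents with different fields, plus an unrelated movie list.
book_a = put({"type": "book_sale", "title": "Dune", "price": 9.99, "quantity": 2})
book_b = put({"type": "book_sale", "title": "Emma",
              "customer": {"name": "Ada", "state": "KS"}})
movies = put({"movies": ["Alien", "Arrival", "Amelie"]})

print(get(book_b)["customer"]["state"])  # KS
print(get(movies)["movies"])
```

Nothing in the store objects to the three documents having different fields. The cost is that any code reading them has to know, or check, which fields each document actually has.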
Non-relational stores are also extremely attractive because they specialize, meaning each one stores a very particular type of data that lends itself to equally particular behaviors. I already mentioned a document store, MongoDB. However, there are also search-engine-specific data stores (e.g., Elasticsearch) that focus on indexing text, wide-column data stores such as Cassandra, which group data into column families rather than fixed relational rows, and all sorts of others. Each of these different data stores is extremely efficient within its defined parameters (see Additional Resources).
In general, non-relational data stores have been designed with scaling in mind. In fact, because they’re not dependent on defined structures that require specific IDs, they lend themselves easily to scaling out. However, like so many things, there are also weaknesses.
Querying data from within a document store, for example, is much harder to do than the same kind of query within a relational data store. Setting up reporting and analytical queries into the data is much more difficult for non-developers to implement since it may require programming. Finally, where a non-relational data store lacks strict transactional consistency, it could simply be a dangerous place to store inventory or financial transactions, since a bad actor could conceivably withdraw from the bank account multiple times without the transaction showing up.
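To illustrate why such queries tend to need programming, here is a sketch of the "count sales to customers from Kansas" question asked of a handful of loosely structured documents. The documents and the is_kansas_sale helper are hypothetical, and real document stores offer their own query languages. The point is only that whoever writes the query has to account for every shape a document might take.

```python
# Hypothetical documents, as they might accumulate over time in a document store.
documents = [
    {"type": "book_sale", "title": "Dune",
     "customer": {"name": "Ada", "state": "KS"}},
    {"type": "book_sale", "title": "Emma",
     "customer": {"name": "Edgar"}},                               # state was never captured
    {"type": "book_sale", "title": "Ulysses", "buyer_state": "KS"},  # a different shape
    {"movies": ["Alien", "Arrival"]},                              # not a sale at all
]

def is_kansas_sale(doc):
    """Return True if the document is a sale to a Kansas customer, in any known shape."""
    if doc.get("type") != "book_sale":
        return False
    state = doc.get("customer", {}).get("state") or doc.get("buyer_state")
    return state == "KS"

kansas_sales = sum(1 for doc in documents if is_kansas_sale(doc))
print(kansas_sales)  # 2
```

If a fourth way of recording the state shows up later, the count silently goes wrong until someone updates the code.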
Short Answer? You Need Both
Instead of trying to fight this from either point of view, the right approach is to accept a very simple reality: Both relational and non-relational data stores serve a real purpose in the modern development landscape.
Put simply, a modern business can easily find that a relational store works better for one part of the business, or one part of the data it manages, while a non-relational store makes more sense for other parts. So rather than attempt to force all storage one way or the other, simply accept that the functional needs of the business should drive the choice of data storage.
Real-World Use Cases
Let’s say your organization makes widgets that are distributed all around the world, and you’re capturing telemetry data from those widgets, collecting millions of data points constantly. A structured data store is likely to fall over under such a load. A non-relational store, by contrast, is built to scale out under it. Also, when you decide to capture one additional point of data, you simply update your code and you’re good to go. No rebuilding a database necessary. In this situation, you’d be mad to attempt this with a relational data store.
Now, let’s completely change the scenario.
Your organization manages the sale of precious metals. You have a very precise set of stock — this much gold, that much silver, that small pile of platinum over there. In this case, you have an extremely well-defined set of data and need extremely tight controls on the inventory. You simply can’t sell more platinum than you have in the inventory. All of the flexibility and speed of development of a non-relational data store doesn’t do much for you in this scenario. Further, the lack of strict transactions could lead to all sorts of problems, like selling more platinum than you have in your inventory or selling the same piece of platinum to more than one customer. Here, you’d be mad to try to manage your data in a search engine data store.
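Here is a minimal sketch of how a strict transaction keeps the platinum stock honest, again using Python's built-in sqlite3 module. The table names, the sell helper, and the quantities are assumptions for illustration. The sale record and the stock decrement are applied together or not at all, and a CHECK constraint refuses to let stock go negative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
# Hypothetical inventory and sales tables; stock can never drop below zero.
conn.executescript("""
CREATE TABLE inventory (
    metal  TEXT PRIMARY KEY,
    ounces INTEGER NOT NULL CHECK (ounces >= 0)
);
CREATE TABLE sales (
    sale_id INTEGER PRIMARY KEY,
    metal   TEXT NOT NULL REFERENCES inventory(metal),
    ounces  INTEGER NOT NULL CHECK (ounces > 0)
);
INSERT INTO inventory VALUES ('platinum', 10);
""")

def sell(metal, ounces):
    """Record a sale and decrement stock as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("INSERT INTO sales (metal, ounces) VALUES (?, ?)",
                         (metal, ounces))
            conn.execute("UPDATE inventory SET ounces = ounces - ? WHERE metal = ?",
                         (ounces, metal))
        return True
    except sqlite3.IntegrityError:
        return False

first = sell("platinum", 8)    # succeeds, leaving 2 ounces
second = sell("platinum", 5)   # would leave -3 ounces, so the whole sale is rolled back

remaining = conn.execute(
    "SELECT ounces FROM inventory WHERE metal = 'platinum'").fetchone()[0]
sale_count = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
print(first, second, remaining, sale_count)  # True False 2 1
```

The second sale leaves no trace: its sales row is undone along with the failed stock update, so the books and the vault stay in agreement.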
No, the fact is, while technology has grown and shifted in ways that E.F. Codd could never have imagined back in 1970, the mathematics of the relational data store still work. Yet the limits of and pain involved in maintaining relational data structures, for some types of data, demand a different approach to data storage and management. In this day and age, a wise technologist will examine their data needs and pick the tool appropriate for the job, not attempt to force everything into a single box or category. This is because using the correct tool for the job makes the job easier, and making the job easier increases your chances of success.
So don’t fight about which kind of data storage is superior. Instead, define the parameters around what you need for your data stores. Then, pick the right tools for the job.
Additional Resources
Codd, E. F. (1970, June). A relational model of data for large shared data banks (P. Baxendale, Ed.). Communications of the ACM, 13(6). www.seas.upenn.edu/~zives/03f/cis550/codd.pdf.
Marr, B. (2019, October 18). What's the difference between structured, semi-structured and unstructured data? Forbes. https://www.forbes.com/sites/bernardmarr/2019/10/18/whats-the-difference-between-structured-semi-structured-and-unstructured-data/?sh=18ed93a2b4d3.
Kshirsagar, D. (2020, August 8). Understand data store models. Azure Application Architecture Guide. https://docs.microsoft.com/en-us/azure/architecture/guide/technology-choices/data-store-overview.