This article comes to you from the DZone Guide to Database and Persistence Management.
The past decade has been an extremely exciting one in all matters related to data. We have had:
- An ever-increasing amount of data produced by social media (once called “Web 2.0”)
- An ever-increasing amount of data produced by devices (a.k.a. the Internet of Things)
- An ever-increasing number of database vendors exploring “new” models
Marketers, publishers, and evangelists coined the terms “Big Data” and “NoSQL” to wrap up the fuzzy ideas about persistence technology that have emerged over the last decade. But how big is “Big Data”? Do you do “Big Data”? Do you need to do “Big Data”?
The truth is... you don’t. You can keep using the same architecture and relational databases you’ve always had, for these three reasons.
1. RAM Prices Are Crumbling
For a while, economists and researchers have been predicting the end of Moore’s Law. With single-CPU processing power reaching the limits of what’s possible in non-quantum physics, we can no longer scale up processing power cheaply. Multi-core processors have emerged, and vendors have encouraged sharing data across several machines for a long time.
But do we really need to distribute data onto several machines and deal with the struggles introduced by the CAP theorem in distributed environments? The truth is, you may not have to pay this price just yet!
While CPUs aren’t getting faster anymore, RAM continues to get cheaper. Today, a database server that can keep your whole database in memory constantly is fairly affordable.
Several commercial RDBMS vendors have implemented in-memory computing capabilities and column stores in their standard products: Oracle 12c, SQL Server 2012, and SAP HANA, to name a few. One of the most interesting examples of a “classic” application that runs on a single centralized RDBMS is the Stack Exchange platform, with around 400 GB of RAM and 343M queries per day.
This architecture has been the preferred way for any software system that profited from the free ride offered by Moore’s Law in the past: “Just throw more hardware at it, and it’ll run again.” The nightmare of having to distribute such a system is postponed yet again.
In other words: The primary source of contention, the disk, is no longer the bottleneck of your application, and you can scale up on a single machine.
2. SQL is the Best Language for Querying
Winston Churchill may have been known for saying something like: “SQL is the worst form of database querying. Except for all the other forms.”
QUEL may have been the better language to start with from a technical perspective, but due to a cunning move by Oracle and IBM in the early 80s, SQL and the ANSI/ISO SQL standard won the race. One of the most fundamental criticisms of SQL was that it is not really a relational language. This was recognized as early as 1983 by none other than C.J. Date in his paper “A Critique of the SQL Database Language,” but it was too late. SQL had already won the race. Why?
SQL has been designed to be used primarily by humans to create ad-hoc queries. It is remarkable that the SQL language is pretty much the only declarative language that has survived and maintained a continuous level of popularity.
Declarative thinking is hard, as most people prefer to instruct machines in an imperative style. We can see attempts at creating such declarative constructs not only in databases, but also in general-purpose languages, such as Java. Specifically, Java EE is full of annotations that tag types and members so that an interpreter can pick up those tags and inject behavior. Once you combine JPA, JAXB, JAX-RS, JAX-WS, EJB, perhaps even Spring, and many other tools in a single application, you will immediately see how hard it is to still understand what all those declarative items mean, and how and when they interact. Usually, that behavior is underspecified or ill-specified.
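The contrast is easy to see in miniature with Python’s built-in sqlite3 module (the table and data here are invented for illustration): the declarative query states what result is wanted and lets the engine decide how to produce it, while the imperative loop spells out every step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("alice", 10.0), ("bob", 20.0), ("alice", 5.0)])

# Declarative: state WHAT you want; the engine decides HOW.
declarative = dict(conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer"))

# Imperative: spell out the HOW, one row at a time.
imperative = {}
for customer, amount in conn.execute("SELECT customer, amount FROM orders"):
    imperative[customer] = imperative.get(customer, 0) + amount

assert declarative == imperative  # same result, two mindsets
```

The same distinction holds at any scale: the declarative form leaves the engine free to parallelize, reorder, or index its way to the answer.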
SQL is a comparatively simple declarative language, with a lot of limitations. This is good, because the lack of language features allows implementations to quickly meet the SQL standard’s requirements. In fact, this simplicity of the SQL language has even led to a variety of NoSQL databases adopting SQL or imitating it closely:
- SQL on Hadoop
- Phoenix for HBase
- Cassandra’s CQL resembles SQL
- JCR-SQL2 for content repositories
Not to mention, of course, the numerous actual SQL database implementations out there.
While popular ORMs help abstract away the tedious work of implementing CRUD with SQL, there has been no replacement thus far for SQL when it comes to querying and bulk data processing—whether your underlying storage is relational or non-relational.
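A minimal sketch of what “bulk data processing” means in practice, again using sqlite3 with an invented schema: a single set-based statement transforms all matching rows inside the engine, instead of fetching them into the application and writing them back one by one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (id INTEGER PRIMARY KEY, price REAL, category TEXT)")
conn.executemany("INSERT INTO products (price, category) VALUES (?, ?)",
                 [(100, "book"), (200, "book"), (50, "pen")])

# One set-based statement processes the whole bulk in the engine.
conn.execute("UPDATE products SET price = price * 0.5 WHERE category = 'book'")

rows = conn.execute("SELECT price FROM products ORDER BY id").fetchall()
print(rows)  # books halved, the pen untouched
```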
In other words: The non-relational-ness of SQL is one of the language’s primary advantages, allowing technologies to incorporate other aspects of data processing, such as XML, OLAP, JSON, column stores, and many more.
3. Procedural Languages are Ideal for Computing
A third advantage that relational databases have over non-relational stores is the tight integration of procedural languages with the SQL language.
If scaling vertically is your strategy, you want to leverage as many cores as possible on your single database server; therefore, you want to run computations as closely as possible to the data in your server’s RAM.
For many years, enterprise software architects have attempted to move all business logic into a middle tier, preferably written in J2EE and later in Java EE, helping large software vendors sell extremely expensive middleware components as a complement to their already expensive databases.
This may have been reasonable in the past since databases weren’t as powerful 20 years ago as they are today. Today, however, commercial SQL optimizers are extremely powerful. All you need to get your SQL up to speed is appropriate indexing. If you decide to use a platform that doesn’t allow you to use SQL, it will greatly increase your total cost of ownership for data processing logic. It is very hard to implement optimal execution plans manually, in the form of algorithms in your general-purpose imperative language, such as Java. There are ways around this, however, with APIs like jOOQ, which is a product our company works on that seamlessly integrates SQL or procedure calls into Java.
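The effect of appropriate indexing is observable even in a small embedded engine like SQLite, whose EXPLAIN QUERY PLAN output shows the optimizer switching from a full table scan to an index search (the schema is invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")

# Without an index, the planner has no choice but to scan every row.
plan_before = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM users WHERE email = ?",
    ("a@example.com",)).fetchall()
print(plan_before)  # the detail column mentions a SCAN of users

# After indexing the predicate column, the same query becomes a lookup.
conn.execute("CREATE INDEX idx_users_email ON users (email)")
plan_after = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM users WHERE email = ?",
    ("a@example.com",)).fetchall()
print(plan_after)  # the detail column mentions a SEARCH using the index
```

Commercial optimizers are far more sophisticated, but the principle is the same: the index, not hand-written algorithm code, is what changes the execution plan.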
For everything that is not feasible with SQL, you can use your relational database’s procedural language, which allows you to implement more complex and more explicit algorithms directly in the database, keeping business logic very close to the data. Not only can such languages access the data very quickly, they can also interact with SQL for bulk data processing.
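SQLite has no stored-procedure dialect of its own, but sqlite3’s `create_function` is a rough stand-in for the pattern described above: a procedural routine registered with the engine, invoked from SQL, running next to the data rather than in a middle tier (the `risk_band` function and schema are invented for illustration).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (amount REAL)")
conn.executemany("INSERT INTO payments VALUES (?)",
                 [(120.0,), (80.0,), (300.0,)])

# Procedural logic the engine calls per row, callable from SQL.
def risk_band(amount):
    return "high" if amount > 100 else "low"

conn.create_function("risk_band", 1, risk_band)

# The procedural routine participates in set-based SQL processing.
rows = conn.execute(
    "SELECT risk_band(amount), COUNT(*) FROM payments GROUP BY 1"
).fetchall()
print(rows)
```

In a server RDBMS, the equivalent routine would be written in PL/SQL, T-SQL, or SQLScript and would run entirely inside the database process.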
In other words: Moving some computation-heavy logic into the database may be your best choice for a variety of use cases. I predict that more tech firms will start building around in-memory databases and this will inevitably translate into a higher popularity for procedural SQL languages like Transact-SQL (T-SQL), PL/SQL and SQLScript.
RDBMS Have Won in the Past and Will Win Again
“When all you have is a hammer, every problem will look like a nail.”
There couldn’t be a better metaphor for RDBMS. They’re hammers. They’re Swiss-Army-Hammers with millions of tools, and most developers can go as far as they need to with just that hammer. So, keep scaling up. With the hammer: SQL.
Lukas Eder is the founder and CEO of Data Geekery GmbH, located in Zurich, Switzerland. Data Geekery is a leading innovator in the field of Java/SQL interfacing. Their flagship product, jOOQ, offers frictionless integration of the SQL language into Java.