SQL is old (or experienced, whichever you prefer)! It was designed back in the 1970s by pioneers like C. J. Date, E. F. Codd, Don Chamberlin, Ray Boyce, and others. Few languages can claim relevance over such a long history. So how is SQL still alive and kicking? How did it survive multiple waves of disruptive change? I think there are a few reasons:
SQL is super easy to understand!
SQL is composable! You construct sophisticated data processing logic by breaking it down into simpler pieces and then combining them through embedding, nesting, joining, and other powerful transforms within the language.
SQL is very extensible! In fact, SQL has adapted to many changes in data models over time. These days, SQL is being extended to support JSON, and it continues to be the de facto standard for data processing.
JSON Implementations in Databases
It is clear that databases of all types are onboarding JSON. However, there are varying levels of "native-ness" they can offer.
- At the most native end of the spectrum are the newer JSON-based operational document databases — Couchbase, MongoDB, DocumentDB, etc. — which use JSON documents as the data model and provide fast access to data natively, taking JSON in and serving JSON out. Most provide SQL-based languages to query the data (Couchbase Server N1QL and DocumentDB SQL), with the exception of MongoDB (see the find() method).
Relational databases are retrofitting JSON as a data type, much as they did with XML. SQL Server, Oracle, MySQL, and Postgres are all following this path. The query language provides functions to manipulate JSON, translating JSON in and out of their native structures — tables, columns, and rows.
Even some columnar and key-value databases like Cassandra, MapR-DB, and Riak are on board with JSON. They extend their native models with a JSON layer on top, much like the relational databases do.
Native JSON access in SQL
Not all implementations of SQL over JSON are created equal. Some have to jump through hoops (or functions) to adapt JSON to the native data type of the system. A few key attributes make JSON access native to SQL. These attributes make for a high-performance JSON/SQL combination that is both flexible when it comes to modeling data and powerful when it comes to querying it.
#1 Ingest & Serve JSON without Translation
JSON can be sparse, and its flexible structure means that if you translate JSON into other representations that lack that flexibility, you will likely bloat the data and lose performance in the back-and-forth translation. SQL should work with JSON without translation... just JSON in and JSON out.
#2 Provide Access to Nested Structures Natively
JSON is highly nested, and the ability to reference deeply nested structures without compromise is important. Using the "dot" notation, you should be able to reference attributes embedded in nested hierarchies within the document.
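For instance, a minimal N1QL sketch, assuming a hypothetical `travel` bucket whose documents nest an `airline` object and a `schedule` object:

```sql
-- Dot notation reaches directly into nested objects.
-- The "travel" bucket and its field names are illustrative assumptions.
SELECT t.airline.name,
       t.schedule.departure.airport
FROM travel t
WHERE t.airline.country = "France";
```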
The native capabilities should also allow you to fold and unfold the nesting in JSON using commands like NEST and UNNEST.
The following example shows how UNNEST flattens trips under a flight code in Couchbase Server N1QL.
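A sketch of such a query, assuming a hypothetical `flights` bucket where each document carries a `flight_code` and a nested `trips` array:

```sql
-- UNNEST produces one result row per element of the "trips" array,
-- pairing each trip with its parent flight document.
SELECT f.flight_code,
       trip.destination,
       trip.day
FROM flights f
UNNEST f.trips AS trip
WHERE trip.day = "Saturday";
```

Without UNNEST, the whole `trips` array would come back as a single nested value; with it, each trip becomes its own row that ordinary predicates and projections can work on.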
#3 JOINs and Subqueries
JSON is powerful at representing relationships, but there is still no good reason to represent "products" and "customers" in the same document. That would be insane in many cases! You may still need "order" documents referencing "products" and "customers" documents. So you need JOINs in your SQL, and support for subqueries in SELECT, FROM, and WHERE, to make sure you can reference documents without compromise and keep the language composable enough to construct the projections you like.
I want to reiterate this point because some NoSQL platforms have a "NO JOINs" policy. The following example shows options for modeling an embedded vs. referenced document model for your data. JSON can embed relationships, but for large breweries that have many beers, for instruments that produce many measurements, or in other many-to-many relationships, embedding "beers" or "measurements" does not make sense. It would make the document size explode. So we need JOINs, subqueries, and more to make sure we can model data flexibly.
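As a sketch of the referenced model in N1QL, using Couchbase's `beer-sample` sample bucket, where each beer document holds the key of its brewery in a `brewery_id` attribute:

```sql
-- ON KEYS joins each beer document to its referenced brewery document,
-- instead of embedding every beer inside the brewery document.
SELECT brewery.name AS brewery,
       beer.name    AS beer
FROM `beer-sample` beer
JOIN `beer-sample` brewery ON KEYS beer.brewery_id
WHERE beer.type = "beer"
LIMIT 5;
```

The referenced model keeps each brewery document small no matter how many beers it has, while the JOIN reassembles the relationship at query time.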
By the way, you can see more examples of document data modeling with JSON here.
#4 Handle MISSING Values
NULL is a super useful concept and has been a powerful aspect of SQL for a long time. JSON needs NULL, but that isn't enough. JSON also needs MISSING. JSON can be sparse, and JSON schemas evolve fast. So both the database and the JSON processing should allow fast evolution of the JSON schema, and should not error in cases where attributes are not merely NULL but are MISSING altogether.
So how would MISSING work in SQL? The following example from N1QL demonstrates how MISSING can be used to detect and process two versions of a flight detail document: v1 and v2 contain different representations of trips. The query adapts to run the right filter predicate for each version.
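A sketch of such a version-aware query, assuming v1 documents store a single `trip` object and v2 documents store a `trips` array (the bucket and field names are illustrative):

```sql
-- IS NOT MISSING distinguishes the two schema versions in one query:
-- v1 documents have a "trip" object; v2 replaced it with a "trips" array.
-- Documents lacking an attribute match neither branch, without erroring.
SELECT f.flight_code
FROM flights f
WHERE (f.trip IS NOT MISSING
       AND f.trip.destination = "SFO")
   OR (f.trips IS NOT MISSING
       AND ANY t IN f.trips SATISFIES t.destination = "SFO" END);
```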
#5 Handle Nested Sets and Collections
A JSON structure is full of nested documents and arrays. This is the basic muscle of the JSON data model over relational structures. The unnatural act of normalizing "order details" into a table separate from "orders" is not necessary in JSON, and that keeps the data model simple and natural for the apps. No more ORM (object-relational mapping) layers... JSON is a natural fit for serializing data in apps. However, in cases like this ("order details" embedded as an array in the "orders" document), iterating through hierarchies and arrays in SQL is critically important. Otherwise you cannot take full advantage of JSON.
So how do SQL extensions enable this flexibility? The following example from Couchbase Server N1QL demonstrates the ANY ... IN/WITHIN syntax for processing arrays. In this case, IN allows ANY to iterate over a given level of the Robotics tree (the value of "item" is bound to "AI", "SPEECH", and "P3"). WITHIN allows recursive iteration over the entire tree (the value of "item" is bound to each node of the Robotics tree, at every level).
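A sketch of the two forms, assuming a hypothetical `robotics` bucket whose documents hold the tree in a `children` array:

```sql
-- IN binds "item" only to the immediate elements of the array.
SELECT META(r).id
FROM robotics r
WHERE ANY item IN r.children SATISFIES item.name = "P3" END;

-- WITHIN binds "item" to every nested element, at any depth.
SELECT META(r).id
FROM robotics r
WHERE ANY item WITHIN r.children SATISFIES item.name = "P3" END;
```

The two queries differ only in IN vs. WITHIN: the first matches only when "P3" appears at the top level of the array, while the second matches wherever "P3" occurs in the hierarchy.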
I'll end with a prediction: given the prominence of JSON, soon you will be looking to process JSON with SQL. Before choosing your data processing platform for JSON, compare and contrast platforms against these capabilities and see how they score!
If your data processing platform can satisfy these 5 capabilities fully, you won't have to compromise the flexibility of your data model with constant acrobatics in the way you model your JSON. You won't have to work around limitations of the query language and bend your data model to those limitations! Instead, you will get a data platform and a query language flexible enough to accommodate the data model that best fits your application logic.
- SQL++: SQL++ is a highly expressive semi-structured query language that encompasses both the SQL and JSON data models. SQL++ is backward-compatible with SQL and does a good job of detailing these and other requirements for extending SQL to process JSON: http://forward.ucsd.edu/sqlpp.html
- Document Databases with native JSON processing: