
JSON Gene in SQL: How SQL Extensions Are Helping SQL Adapt to the Flexibility of JSON

How has SQL lasted since the 1970s while other technologies have been phased out or become laughing stocks of the industry?



SQL is old (or experienced, whichever you prefer)! It was designed back in the 1970s by pioneers like C. J. Date, E. F. Codd, Don Chamberlin, Ray Boyce, and others. Few languages can claim relevance over such a long history. So how is SQL still alive and kicking, and how did it survive multiple waves of disruptive change? I think there are a few reasons:

  • SQL is super easy to understand!

  • SQL is composable! You can construct very sophisticated data processing logic by breaking it down into simpler pieces and then composing them through embedding, nesting, joining, and other powerful transforms within the language.

  • SQL is very extensible! In fact, SQL has adapted to many changes in data models over time. These days, SQL is being extended to support JSON, and it continues to be the de facto standard for data processing.

JSON Implementations in Databases

It is clear that all types of databases are onboarding JSON. However, they offer varying levels of "native-ness."

  • At the most native end are the newer JSON-based operational document databases — Couchbase, MongoDB, DocumentDB, etc. These come with JSON documents as the data model and provide fast access natively, taking JSON in and serving JSON out. Most provide SQL-based languages to query the data (Couchbase Server N1QL and DocumentDB SQL); the exception is MongoDB (see its find() method).
  • Relational databases are retrofitting JSON by adding a data type, much as they did for XML. SQL Server, Oracle, MySQL, and Postgres are all following this path. The query language provides functions to manipulate JSON, translating JSON in and out of their native structures — tables, columns, and rows.

  • Even some columnar and key-value databases, like Cassandra, MapR-DB, and Riak, are on board with JSON. They are extending their native models with a JSON layer on top, much like the relational databases.

  • Aside from databases, many SQL-based analytics engines like Drill, Hive, Spark, and others have also extended their languages to process JSON.

Native JSON Access in SQL

Not all implementations of SQL over JSON are created equal. Some have to jump through hoops (or functions) to adapt JSON to the native data type of the system. A few key attributes make JSON access native to SQL. Together, they make for a high-performance JSON/SQL combination that is both flexible when it comes to modeling data and powerful when it comes to querying.

#1 Ingest & Serve JSON without Translation

JSON can be sparse, and its flexible structure means that if you translate JSON to a representation without that flexibility, you will likely bloat the data and lose performance in the back-and-forth translation. SQL should work with JSON without translation: just JSON in and JSON out.
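The idea can be sketched with a tiny Python/SQLite script (the schema and document are invented for illustration, not from any particular product): the document is stored and served verbatim, so no lossy translation to rows and columns ever happens.

```python
import json
import sqlite3

# A sketch of "JSON in, JSON out": store the document as-is so that no
# translation into rows and columns can bloat it or drop sparse attributes.
# (Schema and document are invented for illustration.)
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE docs (id TEXT PRIMARY KEY, body TEXT)")

doc = '{"type":"flight","code":"UA-870","legs":[{"from":"SFO","to":"SYD"}]}'
db.execute("INSERT INTO docs VALUES (?, ?)", ("f1", doc))

# The document comes back exactly as it went in -- no lossy round trip.
(stored,) = db.execute("SELECT body FROM docs WHERE id = 'f1'").fetchone()
assert stored == doc
print(json.loads(stored)["code"])  # -> UA-870
```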

#2 Provide Access to Nested Structures Natively

JSON is highly nested, and the ability to reference deeply nested structures without compromise is important. Using the "dot" notation, you should be able to reference attributes embedded in nested hierarchies within the document.

[Image: dot-notation reference into nested JSON attributes]
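N1QL aside, the dot-path idea can be sketched with SQLite's JSON1 functions (available in recent Python builds); the `'$.a.b.c'` path in `json_extract()` plays the role of the dot notation. The sample document is invented for illustration.

```python
import sqlite3

# Dot-style path access with SQLite's JSON1 json_extract(); the '$.a.b.c'
# path plays the role of N1QL's dot notation. (Sample document invented.)
db = sqlite3.connect(":memory:")
doc = '{"airline":{"name":"United","hub":{"city":"Chicago","code":"ORD"}}}'

(city,) = db.execute(
    "SELECT json_extract(?, '$.airline.hub.city')", (doc,)
).fetchone()
assert city == "Chicago"
```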

The native capabilities should also allow you to unfold and fold the nesting in JSON using commands like NEST and UNNEST.

The following example shows UNNEST flattening trips under a flight code in Couchbase Server N1QL.

[Image: N1QL UNNEST query flattening trips under a flight code]
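A rough analogue of UNNEST, using SQLite's JSON1 `json_each()` table-valued function to produce one row per array element (the document shape is invented for illustration; this is not N1QL):

```python
import sqlite3

db = sqlite3.connect(":memory:")
# An invented flight document with an embedded schedule array.
doc = ('{"code":"UA-870","schedule":'
       '[{"day":1,"dest":"SYD"},{"day":5,"dest":"SFO"}]}')

# json_each() flattens the array: one output row per schedule entry,
# much like UNNEST turning nested trips into top-level rows.
days = [d for (d,) in db.execute(
    "SELECT json_extract(value, '$.day') FROM json_each(?, '$.schedule')",
    (doc,))]
assert days == [1, 5]
```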

#3 JOINs and Subqueries

JSON is powerful at representing relationships, but there is still no good reason to represent "products" and "customers" in the same document. That would be insane in many cases! You still may need "order" documents referencing "products" and "customers" documents. So you need JOINs in your SQL, and support for subqueries in SELECT, FROM, and WHERE, to make sure you can reference documents without compromise and keep the language composable enough to construct the projections you like.

I want to reiterate this one more time, as some NoSQL platforms have a "no JOINs" policy. The following example shows options for modeling an embedded vs. referenced document model for your data. JSON can embed relationships, but for large breweries that have many beers, for instruments that produce many measurements, or in other many-to-many relationships, embedding "beers" or "measurements" does not make sense. It would make the document size explode. So we need JOINs, subqueries, and more to make sure we can model data flexibly.

By the way, you can see more examples of document data modeling with JSON here.

[Image: embedded vs. referenced document models]
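A minimal sketch of a JOIN across referenced documents, again using SQLite's JSON1 functions over a generic `docs` table (schema and documents invented for illustration): an "order" document references a "product" document by ID, and the JOIN resolves the reference.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE docs (id TEXT PRIMARY KEY, body TEXT);
INSERT INTO docs VALUES
  ('p1', '{"type":"product","name":"IPA","price":6}'),
  ('o1', '{"type":"order","product_id":"p1","qty":2}');
""")

# Join the order document to the product document it references,
# computing the order total from attributes of both.
row = db.execute("""
  SELECT json_extract(p.body, '$.name'),
         json_extract(o.body, '$.qty') * json_extract(p.body, '$.price')
  FROM docs o
  JOIN docs p ON p.id = json_extract(o.body, '$.product_id')
  WHERE json_extract(o.body, '$.type') = 'order'
""").fetchone()
assert row == ("IPA", 12)
```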

#4 Handle MISSING Values

NULL is a super useful concept and has been a powerful aspect of SQL for a long time. JSON needs NULL, but that isn't enough: JSON also needs MISSING. JSON can be sparse, and JSON schemas evolve fast. So both the database and the JSON processing should allow fast evolution of the JSON schema and should not error in cases where attributes are not merely NULL but MISSING altogether.

So how would MISSING work in SQL? The following example from N1QL demonstrates how MISSING can be used to detect and process two versions of a flight detail document. v1 and v2 contain different representations for trips. The query language adapts to help run the right filter predicate for each version.

[Image: N1QL query using MISSING across two document versions]
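SQLite's JSON1 offers a way to sketch the NULL-vs-MISSING distinction (this is not N1QL, where the MISSING keyword is more direct): `json_type()` returns the string `'null'` for an explicit JSON null, but SQL NULL when the path is missing altogether. The two document versions below are invented for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
v1 = '{"version":1,"trips":null}'  # attribute present, but JSON null
v2 = '{"version":2}'               # attribute missing altogether

# json_type() distinguishes the two cases: 'null' vs. SQL NULL (None).
for doc, expected in ((v1, "null"), (v2, None)):
    (t,) = db.execute("SELECT json_type(?, '$.trips')", (doc,)).fetchone()
    assert t == expected
```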

#5 Handle Nested Sets and Collections

JSON structure is full of nested documents and arrays. This is the core strength of the JSON data model over relational structures. The unnatural act of normalizing "order details" into another table away from "orders" is not necessary in JSON, and that keeps the data model simple and natural for apps. No more ORM (object-relational mapping) layers... JSON is a natural fit for serializing data in apps. However, in cases like this ("order details" embedded as an array in an "orders" document), iterating through hierarchies and arrays in SQL is critically important. Otherwise, you cannot take full advantage of JSON.

So how do SQL extensions enable this flexibility? The following example from Couchbase Server N1QL demonstrates the ANY ... IN/WITHIN syntax for processing arrays. In this case, IN allows ANY to iterate over a given level of the Robotics tree (the value of "item" is assigned to "AI", "SPEECH", and "P3"). WITHIN allows recursive iteration over the entire tree (the value of "item" is assigned to each node of the Robotics tree, including all green and gray nodes).

[Image: N1QL ANY ... IN/WITHIN query over the Robotics tree]
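The IN-vs-WITHIN distinction can be sketched in SQLite JSON1 terms: `json_each()` iterates one level of a document, while `json_tree()` walks it recursively. The Robotics document below is an invented stand-in for the article's tree.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# A stand-in for the article's Robotics tree (shape invented here).
doc = '{"Robotics":{"AI":{"ML":{}},"SPEECH":{},"P3":{}}}'

# One level, like ANY ... IN: only the direct children of Robotics.
one_level = [k for (k,) in db.execute(
    "SELECT key FROM json_each(?, '$.Robotics')", (doc,))]
assert sorted(one_level) == ["AI", "P3", "SPEECH"]

# Recursive, like ANY ... WITHIN: every node under Robotics, including ML.
all_nodes = [k for (k,) in db.execute(
    "SELECT key FROM json_tree(?, '$.Robotics') WHERE key IS NOT NULL",
    (doc,))]
assert {"AI", "SPEECH", "P3", "ML"} <= set(all_nodes)
```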

I'll end with a prediction: given the prominence of JSON, you will soon be looking to process JSON with SQL. Before choosing your data processing platform for JSON, compare and contrast platforms against these capabilities and see how they score!

If your data processing platform can satisfy these five capabilities fully, without compromise, you won't have to sacrifice the flexibility of your data model with constant acrobatics in the way you model "your JSON." You won't have to work around limitations of the query language and bend your data model to those limitations! Instead, you will get a data platform and a query language flexible enough to accommodate the data model that best fits your application logic.


