Stinger.next: The Future of SQL in Hadoop

Hortonworks’ Stinger Initiative, which finished rolling out in April, expanded the Hive engine to allow interactive SQL queries at Hadoop scale. Now Hortonworks has announced its next set of objectives for Hive, which it is calling Stinger.next.

The goal of Stinger.next is to keep improving the Hive engine so that it can be used in a wider range of situations. It will implement a number of changes that increase “the speed, scale and breadth of SQL support” in Hive.

As with the original Stinger Initiative, Stinger.next will be rolled out in incremental installments, with each installment focusing on a different major improvement to Hive. The first such improvement will add ACID transaction capabilities to enhance the ways in which users can interact with data:

Hive has been used as a write-once, read-often system, where users add partitions of data and query this data often. ACID is a major shift in the paradigm, adding SQL transactions that allow users to insert, update and delete the existing data. This allows a much wider set of use cases that require periodic modifications to the existing data. ACID will include BEGIN, COMMIT and ROLLBACK for multi-statement transactions in next releases.
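To make the transaction semantics described above concrete, here is a minimal sketch in Python. It uses the standard library's SQLite engine as a stand-in, not Hive, so it only illustrates what BEGIN, COMMIT and ROLLBACK mean for multi-statement inserts, updates and deletes. The `page_views` table and its columns are hypothetical, and Hive's own syntax and table requirements may differ.

```python
import sqlite3

# Assumption: SQLite stands in for Hive; this shows generic SQL transaction
# semantics, not Hive's actual implementation or syntax.
# isolation_level=None puts the connection in autocommit mode, so the
# BEGIN / COMMIT / ROLLBACK statements below are issued explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE page_views (user_id INTEGER PRIMARY KEY, views INTEGER)")

# A multi-statement transaction that is committed: all changes become visible.
conn.execute("BEGIN")
conn.execute("INSERT INTO page_views VALUES (1, 10)")
conn.execute("INSERT INTO page_views VALUES (2, 20)")
conn.execute("UPDATE page_views SET views = views + 5 WHERE user_id = 1")
conn.execute("COMMIT")

# A transaction that is rolled back: the delete is discarded.
conn.execute("BEGIN")
conn.execute("DELETE FROM page_views WHERE user_id = 2")
conn.execute("ROLLBACK")

rows = conn.execute(
    "SELECT user_id, views FROM page_views ORDER BY user_id"
).fetchall()
print(rows)  # [(1, 15), (2, 20)]
```

The committed update is reflected in the result, while the rolled-back delete leaves the second row in place. This is the kind of periodic modification to existing data that a write-once, read-often system cannot express.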

But implementing ACID transactions to allow for data modification is only the beginning of Stinger.next’s changes to Hive. The next change aims to bring SQL query times in Hive down to under a second, allowing for real-time data access. To achieve this, a new hybrid engine will be used that includes a process called LLAP, or Live Long and Process.

The hybrid engine approach provides fast response times by efficient in-memory data caching and low-latency processing, provided by node resident processes. However, by limiting LLAP use to the initial phases of query processing, Hive sidesteps limitations around coordination, workload management and failure isolation that are introduced by running entire query within this process as done by other databases.

Finally, SQL:2011 Analytics support will be added, bringing more of the popular SQL constructs that Hive does not yet offer, including:

·  Non Equi-Joins

·  Set Functions – Union, Except and Intersect

·  Interval types

·  Most sub-queries, nested and otherwise

·  Fixes to syntactical differences from SQL:2011 spec, such as rollup
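As a rough illustration of several constructs from the list above, the following Python sketch runs a non-equi join, two set operations and a scalar sub-query. It again uses SQLite as a stand-in engine rather than Hive, and all table and column names are hypothetical. Interval types are omitted because SQLite does not provide them.

```python
import sqlite3

# Assumption: SQLite stands in for Hive; table and column names are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER, amount INTEGER);
CREATE TABLE tiers (tier TEXT, min_amount INTEGER, max_amount INTEGER);
INSERT INTO orders VALUES (1, 50), (2, 150), (3, 900);
INSERT INTO tiers VALUES ('small', 0, 99), ('medium', 100, 499), ('large', 500, 100000);
CREATE TABLE web_users (user_id INTEGER);
CREATE TABLE app_users (user_id INTEGER);
INSERT INTO web_users VALUES (1), (2), (3);
INSERT INTO app_users VALUES (2), (3), (4);
""")

# Non-equi join: rows match on a range condition instead of equality.
tiered = conn.execute("""
    SELECT o.order_id, t.tier
    FROM orders o
    JOIN tiers t
      ON o.amount >= t.min_amount AND o.amount <= t.max_amount
    ORDER BY o.order_id
""").fetchall()

# Set operations: users present in both tables, and users only in the first.
both = conn.execute("""
    SELECT user_id FROM web_users
    INTERSECT
    SELECT user_id FROM app_users
    ORDER BY user_id
""").fetchall()
web_only = conn.execute("""
    SELECT user_id FROM web_users
    EXCEPT
    SELECT user_id FROM app_users
    ORDER BY user_id
""").fetchall()

# Scalar sub-query: orders whose amount is above the overall average.
above_avg = conn.execute("""
    SELECT order_id FROM orders
    WHERE amount > (SELECT AVG(amount) FROM orders)
    ORDER BY order_id
""").fetchall()

print(tiered)     # [(1, 'small'), (2, 'medium'), (3, 'large')]
print(both)       # [(2,), (3,)]
print(web_only)   # [(1,)]
print(above_avg)  # [(3,)]
```

Without non-equi joins, the range match in the first query would typically have to be rewritten as a cross join followed by a filter, which is one reason broader SQL coverage matters for analytic workloads.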

These are just the major advances planned for the Hive engine; further improvements are listed in Hortonworks’ announcement. As for the roll-out, Hortonworks provides the following outline of the anticipated releases:

(Figure: outline of the anticipated Stinger.next releases. Credit: Hortonworks)

With these changes implemented, Hive will become an even more useful tool for working with Big Data in Hadoop.
