
Stinger.next: The Future of SQL in Hadoop


Hortonworks’ Stinger Initiative, which finished rolling out in April, extended the Hive engine to support interactive SQL queries at Hadoop scale. Hortonworks has now announced its next set of objectives for Hive, which it calls Stinger.next.

The goal of Stinger.next is to keep improving the Hive engine so that it can be used in a wider range of situations, through a number of changes that increase “the speed, scale and breadth of SQL support” in Hive.

As with the original Stinger Initiative, Stinger.next will be rolled out in incremental installments, each focusing on a different major improvement to Hive. The first will add ACID transaction support, broadening the ways in which users can interact with their data:

Hive has been used as a write-once, read-often system, where users add partitions of data and query this data often. ACID is a major shift in the paradigm, adding SQL transactions that allow users to insert, update and delete the existing data. This allows a much wider set of use cases that require periodic modifications to the existing data. ACID will include BEGIN, COMMIT and ROLLBACK for multi-statement transactions in upcoming releases.
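The announcement does not show Hive's transactional syntax, so the following is only a minimal sketch of the multi-statement pattern it describes, using Python's built-in sqlite3 module as a stand-in engine (an assumption for illustration: this is SQLite, not Hive, and the `accounts` table is hypothetical). One transaction commits an update pair; a second transaction inserts and deletes rows and then rolls back, so none of its changes survive.

```python
import sqlite3


def run_demo() -> list[tuple[int, str, int]]:
    # Stand-in engine: SQLite, not Hive. isolation_level=None puts the
    # sqlite3 module in autocommit mode so BEGIN/COMMIT/ROLLBACK are explicit.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    cur = conn.cursor()
    # Hypothetical table used only for illustration.
    cur.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, owner TEXT, balance INTEGER)")
    cur.execute("INSERT INTO accounts VALUES (1, 'alice', 100), (2, 'bob', 50)")

    # Transaction 1: two updates that succeed or fail together, then commit.
    cur.execute("BEGIN")
    cur.execute("UPDATE accounts SET balance = balance - 30 WHERE id = 1")
    cur.execute("UPDATE accounts SET balance = balance + 30 WHERE id = 2")
    cur.execute("COMMIT")

    # Transaction 2: an insert and a delete that are rolled back,
    # so neither change is visible afterwards.
    cur.execute("BEGIN")
    cur.execute("INSERT INTO accounts VALUES (3, 'carol', 10)")
    cur.execute("DELETE FROM accounts WHERE id = 1")
    cur.execute("ROLLBACK")

    rows = cur.execute("SELECT id, owner, balance FROM accounts ORDER BY id").fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(run_demo())  # [(1, 'alice', 70), (2, 'bob', 80)]
```

Only the committed transfer is reflected in the final rows, which is the all-or-nothing behavior that BEGIN, COMMIT and ROLLBACK are meant to provide.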

But ACID transactions, which allow existing data to be modified, are only the beginning of Stinger.next’s changes to Hive. The next change aims to bring SQL query times in Hive down to sub-second levels, enabling real-time data access. To get there, Hive will use a new hybrid engine that includes a component called LLAP, or Live Long and Process.

The hybrid engine approach provides fast response times through efficient in-memory data caching and low-latency processing, provided by node-resident processes. However, by limiting LLAP use to the initial phases of query processing, Hive sidesteps the limitations around coordination, workload management and failure isolation that are introduced by running the entire query within this process, as other databases do.

Finally, Stinger.next will add SQL:2011 Analytics, bringing even more popular SQL constructs to Hive, including:

·  Non-equi joins

·  Set functions: UNION, EXCEPT and INTERSECT

·  Interval types

·  Most sub-queries, nested and otherwise

·  Fixes for syntactic differences from the SQL:2011 spec, such as ROLLUP
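To make the first two items in that list concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in engine (an assumption for illustration: this is SQLite, not Hive, and the `events`, `windows`, `a` and `b` tables are hypothetical). SQLite supports non-equi joins and UNION/EXCEPT/INTERSECT, but not interval types or ROLLUP, so those items are not shown.

```python
import sqlite3


def run_demo() -> dict[str, list[tuple]]:
    # Stand-in engine: SQLite, not Hive. All tables are hypothetical.
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE events (id INTEGER, ts INTEGER)")
    cur.execute("CREATE TABLE windows (name TEXT, start_ts INTEGER, end_ts INTEGER)")
    cur.executemany("INSERT INTO events VALUES (?, ?)", [(1, 5), (2, 15), (3, 25)])
    cur.executemany("INSERT INTO windows VALUES (?, ?, ?)", [("early", 0, 10), ("late", 10, 20)])

    # Non-equi join: rows match on a range predicate rather than key equality.
    # Event 3 (ts = 25) falls in no window, so an inner join drops it.
    non_equi = cur.execute(
        "SELECT e.id, w.name FROM events e "
        "JOIN windows w ON e.ts >= w.start_ts AND e.ts < w.end_ts "
        "ORDER BY e.id"
    ).fetchall()

    cur.execute("CREATE TABLE a (x INTEGER)")
    cur.execute("CREATE TABLE b (x INTEGER)")
    cur.executemany("INSERT INTO a VALUES (?)", [(1,), (2,), (3,)])
    cur.executemany("INSERT INTO b VALUES (?)", [(2,), (3,), (4,)])

    # Set operations over two queries; each returns distinct rows.
    union = cur.execute("SELECT x FROM a UNION SELECT x FROM b ORDER BY x").fetchall()
    except_ = cur.execute("SELECT x FROM a EXCEPT SELECT x FROM b ORDER BY x").fetchall()
    intersect = cur.execute("SELECT x FROM a INTERSECT SELECT x FROM b ORDER BY x").fetchall()

    conn.close()
    return {"non_equi": non_equi, "union": union, "except": except_, "intersect": intersect}


if __name__ == "__main__":
    for name, rows in run_demo().items():
        print(name, rows)
```

Running it prints `non_equi [(1, 'early'), (2, 'late')]`, then the union `[(1,), (2,), (3,), (4,)]`, the difference `[(1,)]` and the intersection `[(2,), (3,)]`.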

These are only the major advances planned for the Hive engine; further improvements are listed in Hortonworks’ announcement. For the roll-out, Hortonworks provides the following outline of the anticipated releases:

[Figure: Hortonworks’ outline of the anticipated Stinger.next releases (credit: Hortonworks)]

With these changes implemented, Hive will become an even more useful tool for working with Big Data in Hadoop.

