Stinger.next: The Future of SQL in Hadoop

Hortonworks’ Stinger Initiative, which finished rolling out in April, extended the Hive engine to allow interactive SQL queries at Hadoop scale. Now Hortonworks has announced its next set of objectives for Hive, which it is calling Stinger.next.

The goal of Stinger.next is to keep improving the Hive engine so that it can be used in a wider range of situations. It will implement a number of changes that increase “the speed, scale and breadth of SQL support” in Hive.

As with the original Stinger Initiative, Stinger.next will be rolled out in incremental installments, with each installment focusing on a different major improvement to Hive. The first such improvement will add ACID transaction capabilities to enhance the ways in which users can interact with data:

Hive has been used as a write-once, read-often system, where users add partitions of data and query this data often. ACID is a major shift in the paradigm, adding SQL transactions that allow users to insert, update and delete the existing data. This allows a much wider set of use cases that require periodic modifications to the existing data. ACID will include BEGIN, COMMIT and ROLLBACK for multi-statement transactions in next releases.
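To make the transaction semantics concrete, here is a minimal sketch of a multi-statement transaction with BEGIN, COMMIT and ROLLBACK. It uses Python's built-in `sqlite3` module purely as a stand-in so that it can be run anywhere; it is not Hive, and Hive's own syntax and table requirements for transactions may differ. The `page_views` table and its columns are hypothetical.

```python
import sqlite3

# SQLite is an assumption used only for illustration; this is not Hive.
# isolation_level=None puts sqlite3 in autocommit mode, so the explicit
# BEGIN / COMMIT / ROLLBACK statements below are the only transaction control.
conn = sqlite3.connect(":memory:", isolation_level=None)

# Hypothetical table, invented for this example.
conn.execute("CREATE TABLE page_views (user_id INTEGER, views INTEGER)")

# A committed multi-statement transaction: insert, update and delete together.
conn.execute("BEGIN")
conn.execute("INSERT INTO page_views VALUES (1, 10), (2, 20), (3, 30)")
conn.execute("UPDATE page_views SET views = views + 5 WHERE user_id = 1")
conn.execute("DELETE FROM page_views WHERE user_id = 3")
conn.execute("COMMIT")

# A rolled-back transaction: the update is discarded and the data is unchanged.
conn.execute("BEGIN")
conn.execute("UPDATE page_views SET views = 0")
conn.execute("ROLLBACK")

rows = conn.execute(
    "SELECT user_id, views FROM page_views ORDER BY user_id"
).fetchall()
print(rows)
```

After the rollback, the table still holds only the results of the committed transaction, which is the behavior that makes periodic modifications to existing data safe.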

But implementing ACID transactions to allow for data modification is only the beginning of Stinger.next’s changes to Hive. The next change will increase the speed of SQL queries within Hive to sub-second times, allowing for real-time data access. In order to increase query speed, a new hybrid engine will be used that includes a process called LLAP, or Live Long and Process.

The hybrid engine approach provides fast response times by efficient in-memory data caching and low-latency processing, provided by node resident processes. However, by limiting LLAP use to the initial phases of query processing, Hive sidesteps limitations around coordination, workload management and failure isolation that are introduced by running entire query within this process as done by other databases.

Finally, SQL:2011 Analytics support will add popular SQL constructs that Hive does not yet offer, including:

·  Non Equi-Joins

·  Set Functions – Union, Except and Intersect

·  Interval types

·  Most sub-queries, nested and otherwise

·  Fixes to syntactical differences from SQL:2011 spec, such as rollup
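For readers less familiar with these constructs, the sketch below shows a non-equi join, the INTERSECT and EXCEPT set operations, and a scalar sub-query. It again uses Python's built-in `sqlite3` module as a stand-in rather than Hive, so the exact syntax Hive ends up supporting may differ. All table and column names are hypothetical.

```python
import sqlite3

# SQLite is an assumption used only for illustration; this is not Hive.
conn = sqlite3.connect(":memory:")

# Hypothetical tables and data, invented for this example.
conn.executescript("""
CREATE TABLE orders (order_id INTEGER, amount INTEGER);
CREATE TABLE tiers (tier TEXT, min_amount INTEGER, max_amount INTEGER);
INSERT INTO orders VALUES (1, 50), (2, 150), (3, 900);
INSERT INTO tiers VALUES ('small', 0, 99), ('medium', 100, 499), ('large', 500, 999999);
CREATE TABLE a (x INTEGER);
CREATE TABLE b (x INTEGER);
INSERT INTO a VALUES (1), (2), (3);
INSERT INTO b VALUES (2), (3), (4);
""")

# Non-equi join: the join condition is a range test, not an equality.
non_equi = conn.execute("""
    SELECT o.order_id, t.tier
    FROM orders o
    JOIN tiers t ON o.amount BETWEEN t.min_amount AND t.max_amount
    ORDER BY o.order_id
""").fetchall()

# Set operations: rows present in both tables, and rows only in the first.
in_both = conn.execute(
    "SELECT x FROM a INTERSECT SELECT x FROM b ORDER BY x"
).fetchall()
only_in_a = conn.execute(
    "SELECT x FROM a EXCEPT SELECT x FROM b ORDER BY x"
).fetchall()

# Scalar sub-query: orders whose amount is above the average amount.
above_avg = conn.execute("""
    SELECT order_id FROM orders
    WHERE amount > (SELECT AVG(amount) FROM orders)
    ORDER BY order_id
""").fetchall()

print(non_equi, in_both, only_in_a, above_avg)
```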

And these are just the major advances planned for the Hive engine. More improvements are planned and can be found in Hortonworks’ announcement. As for the roll-out, Hortonworks provides the following outline of the anticipated releases:

(Credit: and at HortonWorks)

With these changes implemented, Hive will become an even more useful tool for working with Big Data in Hadoop.
