Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Impala and SQL on Hadoop

DZone's Guide to

Impala and SQL on Hadoop

· Big Data Zone ·
Free Resource

Hortonworks Sandbox for HDP and HDF is your chance to get started on learning, developing, testing and trying out new features. Each download comes preconfigured with interactive tutorials, sample data and developments from the Apache community.

The origins of Impala can be found in F1 – The Fault-Tolerant Distributed RDBMS Supporting Google’s Ad Business.

One of many differences between MapReduce and Impala is that in Impala, the intermediate data moves from process to process directly instead of storing it on HDFS for processes to get at the data needed for processing.  This provides a HUGE performance advantage and doing so while consuming few cluster resources.   Less hardware to-do more!

impala

There are many advantages to this approach over alternative approaches for querying Hadoop data, including:

  • Thanks to local processing on data nodes, network bottlenecks are avoided.
  • A single, open, and unified metadata store can be utilized.
  • Costly data format conversion is unnecessary and thus no overhead is incurred.
  • All data is immediately query-able, with no delays for ETL.
  • All hardware is utilized for Impala queries as well as for MapReduce.
  • Only a single machine pool is needed to scale.

We encourage you to read the documentation for further exploration!

There are still transformation steps required to optimize the queries but Impala can help to-do this for you with Parquet file format.  Better compression and optimized runtime performance is realized using the ParquetFormat though many other file types are supported.

This and a whole lot more was discussed with Marcel Kornacker the Cloudera Architect behind Impala on Episode 18 of the All Things Hadoop Podcast.

Hortonworks Community Connection (HCC) is an online collaboration destination for developers, DevOps, customers and partners to get answers to questions, collaborate on technical articles and share code examples from GitHub.  Join the discussion.

Topics:

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}