
Big Data Analytics: Interactive Queries Using Presto on Microsoft Azure Data Lake Store and Qubole


Analysts with existing skills can connect their preferred BI apps to the Qubole-managed Presto cluster via a driver to make adoption easy and maintain productivity.


In addition to the recently announced integration of Microsoft's Azure Data Lake Store (ADLS) with Qubole Data Service (QDS), Qubole now offers interactive querying on ADLS with Presto. This is a major benefit for businesses that want to run interactive queries against large datasets using the same Hive metastore that their ETL processes on Hive and their data science workloads on Spark already use.

ADLS is an enterprise-grade, hyper-scale repository for big data workloads. It enables you to capture and process data of any size, type, and ingestion speed in a single place. ADLS exposes an interface compatible with the open-source Apache® Hadoop Distributed File System (HDFS). With this HDFS support, you can easily migrate your existing Hadoop and Spark datasets to the cloud without recreating your HDFS directory structure.
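As a rough illustration of keeping the directory structure intact during migration, the sketch below maps an `hdfs://` URI to the `adl://<account>.azuredatalakestore.net/<path>` form used by ADLS. The account name `exampleaccount`, the sample path, and the helper `hdfs_to_adls` are hypothetical and exist only for this example; the code rewrites URIs and does not copy any data.

```python
from urllib.parse import urlparse

# Hypothetical ADLS account name, used only for illustration.
ADLS_ACCOUNT = "exampleaccount"


def hdfs_to_adls(hdfs_uri: str, account: str = ADLS_ACCOUNT) -> str:
    """Map an hdfs:// URI to an adl:// URI, preserving the directory path."""
    parsed = urlparse(hdfs_uri)
    if parsed.scheme != "hdfs":
        raise ValueError(f"expected an hdfs:// URI, got: {hdfs_uri!r}")
    # Only the scheme and authority change; the path is carried over as-is.
    return f"adl://{account}.azuredatalakestore.net{parsed.path}"


print(hdfs_to_adls("hdfs://namenode:8020/warehouse/sales/year=2017/part-00000.orc"))
```

Because only the scheme and host portion change, table locations and job configurations that reference the old paths can, in principle, be updated with a simple prefix substitution.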

Presto is a distributed SQL engine designed for running interactive analytic queries. Because it processes data in memory rather than relying on HDFS storage, Presto can rival the performance of traditional data warehouses at a fraction of the cost: a horizontally scalable layer of memory-optimized compute nodes is backed by affordable and secure ADLS storage for your data lake. Presto can be used in place of other well-known interactive open-source query engines such as Impala and Hive, or in place of traditional SQL data warehouses.

Analysts with existing skills can connect their preferred BI applications (e.g., Power BI, Tableau, Looker, or Qlik) to the Qubole-managed Presto cluster via a common ODBC/JDBC driver. This makes adoption easy and maintains productivity.

ADLS vs. Azure Blob Store

ADLS is storage optimized for big data workloads of all kinds (batch, interactive, and streaming) and for all types of data, both structured and unstructured. Azure Blob Store, on the other hand, is a general-purpose object store that works well for a variety of use cases but is not specially tuned for the read/write access patterns of big data workloads. With ADLS, there are no limits on the amount of data you can store, and it is optimized for high throughput and high input/output operations per second (IOPS). ADLS also enforces HTTPS for data transfer to and from the store, which provides better security.

For more details, visit here.

Highlights of Presto, ADLS, and QDS Integration

  • Use Qubole to efficiently manage your Presto clusters: it automatically takes care of provisioning and de-provisioning, and dynamically auto-scales the size of the cluster to handle the workload at any point in time.
  • Connect your favorite BI application to a Qubole-managed Presto cluster.
  • Configure QDS accounts with ADLS credentials for seamless and transparent access to ADLS on all (Hadoop, Spark, etc.) clusters in your account.
  • Run Apache Hive, Hadoop, Spark, and Presto queries through the QDS platform, which is now capable of accessing data in your ADLS.
  • Migrate data from on-premises storage to ADLS using tools built into QDS, which support a diverse set of storage solutions such as Azure SQL Service, Azure SQL Data Warehouse, Microsoft SQL Server, MySQL, and more.
  • Migrate data from cloud object stores such as Azure Blob Storage to ADLS using a distributed Hadoop (MapReduce) job.
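To make the shared-metastore idea concrete, here is a minimal sketch that builds two statements as plain strings: a Hive DDL statement registering an external table whose data lives in ADLS, and a Presto query over the same table. The table name `web_logs`, its columns, the account `exampleaccount`, and the `hive.default` catalog and schema are all assumptions made for illustration. The code only prints the SQL; submitting it through QDS is out of scope here.

```python
def external_table_ddl(table, columns, location, fmt="ORC"):
    """Build a Hive CREATE EXTERNAL TABLE statement for data at `location`."""
    cols = ",\n  ".join(f"{name} {typ}" for name, typ in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        f"STORED AS {fmt}\n"
        f"LOCATION '{location}'"
    )


# Hypothetical table definition and ADLS location, for illustration only.
hive_ddl = external_table_ddl(
    table="web_logs",
    columns=[("ts", "STRING"), ("url", "STRING"), ("status", "INT")],
    location="adl://exampleaccount.azuredatalakestore.net/data/web_logs",
)

# Once the table is in the Hive metastore, Presto can query it by name
# (assuming the Hive catalog is exposed as `hive` and the schema is `default`).
presto_query = (
    "SELECT status, count(*) AS hits\n"
    "FROM hive.default.web_logs\n"
    "GROUP BY status\n"
    "ORDER BY hits DESC\n"
    "LIMIT 10"
)

print(hive_ddl)
print(presto_query)
```

The table is defined once, in Hive, and the interactive query needs no separate schema definition on the Presto side, which is the practical payoff of using a single metastore for ETL and interactive workloads.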

Getting Started

Let's see how to get started with ADLS and the free QDS Business Edition on Azure, and how to use Presto and ADLS with QDS.

ADLS

Sign up for the Azure portal and create an ADLS account. For detailed steps, visit here.

Free QDS Business Edition on Azure

Sign up for the free QDS Business Edition on Azure by visiting here. For detailed steps, visit here.

Using Presto and ADLS With QDS

For detailed steps, visit here.


Topics:
big data ,presto ,microsoft azure ,apache hadoop ,data science ,etl ,hive ,queries

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.
