Using an Impala JDBC Driver to Query Apache Kudu
Learn how to access the Progress DataDirect Impala JDBC driver so that you can query Kudu tablets using Impala SQL syntax.
Apache Kudu is a columnar storage manager for the Apache Hadoop platform. It provides fast analytical and real-time capabilities, efficient use of CPU and I/O resources, in-place updates, and a simple, evolvable data model. You can learn more about Apache Kudu's features in the documentation.
One feature of Apache Kudu is its tight integration with Apache Impala, which lets you insert, update, delete, and query Kudu data, along with several other operations. In this tutorial, we will walk you through using the Progress DataDirect Impala JDBC driver to query Kudu tablets with Impala SQL syntax.
Prerequisites
Before you start with this tutorial, we expect you to have an existing Apache Kudu instance with Impala installed. If you don’t, you can follow this getting started tutorial to spin up an Apache Kudu VM and load the data into it.
This tutorial also assumes that you have the Progress DataDirect Impala JDBC driver. If you do not, follow these three simple steps:
- Download the Progress DataDirect Impala JDBC driver.
- Once the package is downloaded, unzip it and run the program PROGRESS_DATADIRECT_JDBC_INSTALL.exe.
- The installation process is simple: just follow the instructions. For most users, the default settings will be sufficient to install the driver successfully.
Configure and Test Connection
- To configure and connect to Apache Kudu using the DataDirect Impala JDBC driver, we will be using SQL Workbench.
- Open SQL Workbench and go to File > Connect Window, which will open a new window. On the bottom left of that window, you will find a button named Manage Drivers. Click on it.
- Add a new driver by clicking the New button. Name it Impala and browse to impala.jar, which is in the lib folder of the installation directory. Click OK once you are finished.
- You should be back at the Connect window. Create a new connection, give it any name, and choose Impala (com.ddtek.jdbc.impala.ImpalaDriver) as your driver.
Fill in the connection URL in the following format, and enter your credentials in the respective fields:
jdbc:datadirect:impala://<Server_Address>:<port>
Click the Test button, and the connection should succeed. Click OK, and you should now be able to query your Apache Kudu instance.
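If you would rather check the connection from code than from SQL Workbench, the same test can be sketched in plain JDBC. This is a minimal sketch under a few assumptions: impala.jar is on the classpath, the host `localhost`, port `21050`, and the user name and password are placeholders you replace with your own values, and the class name `ImpalaConnectionCheck` is purely illustrative.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class ImpalaConnectionCheck {

    // Builds a URL in the format shown above:
    // jdbc:datadirect:impala://<Server_Address>:<port>
    public static String buildUrl(String host, int port) {
        return "jdbc:datadirect:impala://" + host + ":" + port;
    }

    // Opens a connection and asks the driver whether it is usable.
    // Returns false instead of throwing so the caller can simply report the result.
    public static boolean canConnect(String url, String user, String password) {
        try (Connection connection = DriverManager.getConnection(url, user, password)) {
            return connection.isValid(5); // wait up to 5 seconds for the check
        } catch (SQLException e) {
            System.err.println("Connection failed: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        // Placeholder values (assumptions): replace with your server and credentials.
        String url = buildUrl("localhost", 21050);
        boolean ok = canConnect(url, "your_user", "your_password");
        System.out.println(ok ? "Connected to " + url : "Could not connect to " + url);
    }
}
```

With a JDBC 4 driver on the classpath, `DriverManager` normally locates the driver class by itself. If it does not, calling `Class.forName("com.ddtek.jdbc.impala.ImpalaDriver")` before connecting is the usual fallback.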
Sample Queries
Once you have followed this getting started tutorial for Apache Kudu, you can run queries against the data.
For example, here are a few basic queries to test it out:
SELECT * FROM sfmta LIMIT 1;
INSERT INTO sfmta VALUES (1323, 123, -122.32, 32.22, 12.322, 52.0);
SELECT * FROM sfmta WHERE report_time = 1323;
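The same three statements can also be run from a Java program over JDBC. The following is a rough sketch, not a definitive implementation: it assumes impala.jar is on the classpath and that the `sfmta` table from the getting started tutorial exists. The default URL, the credentials, and the names `KuduSampleQueries`, `formatRow`, and `run` are illustrative placeholders.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;

public class KuduSampleQueries {

    // The sample statements from this tutorial, in the order they are run.
    static final String[] QUERIES = {
        "SELECT * FROM sfmta LIMIT 1",
        "INSERT INTO sfmta VALUES (1323, 123, -122.32, 32.22, 12.322, 52.0)",
        "SELECT * FROM sfmta WHERE report_time = 1323"
    };

    // Joins the column values of one row into a single printable line.
    public static String formatRow(List<?> values) {
        StringJoiner joiner = new StringJoiner(" | ");
        for (Object value : values) {
            joiner.add(String.valueOf(value));
        }
        return joiner.toString();
    }

    // Executes one statement and prints its rows, or its update count if it returns none.
    public static void run(Connection connection, String sql) throws SQLException {
        try (Statement statement = connection.createStatement()) {
            boolean hasRows = statement.execute(sql);
            if (!hasRows) {
                System.out.println("Executed, update count: " + statement.getUpdateCount());
                return;
            }
            try (ResultSet rows = statement.getResultSet()) {
                int columns = rows.getMetaData().getColumnCount();
                while (rows.next()) {
                    List<Object> values = new ArrayList<>();
                    for (int i = 1; i <= columns; i++) {
                        values.add(rows.getObject(i));
                    }
                    System.out.println(formatRow(values));
                }
            }
        }
    }

    public static void main(String[] args) {
        // Placeholder connection details (assumptions): pass your own as arguments.
        String url = args.length > 0 ? args[0] : "jdbc:datadirect:impala://localhost:21050";
        String user = args.length > 1 ? args[1] : "your_user";
        String password = args.length > 2 ? args[2] : "your_password";
        try (Connection connection = DriverManager.getConnection(url, user, password)) {
            for (String sql : QUERIES) {
                run(connection, sql);
            }
        } catch (SQLException e) {
            System.err.println("Query run failed: " + e.getMessage());
        }
    }
}
```

Using `Statement.execute` lets one helper handle both the SELECT and the INSERT. For values that come from user input, a `PreparedStatement` with parameters would be the safer choice.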
We hope this tutorial helped you connect to Apache Kudu using the Progress DataDirect Impala JDBC driver.
Published at DZone with permission of Saikrishna Teja Bobba, DZone MVB. See the original article here.