Developing Python Applications Against Apache Phoenix (HBase)

Tim Spann explains how you can develop Python applications that access HBase big data on Hadoop via the Phoenix JDBC/SQL API.

Apache Phoenix is an SQL interface to the HBase database on Hadoop. You can also access this database from Spark with a specialized connector: Phoenix with Spark.

Phoenix gives non-JVM developers an easy, standard way to access HBase. Python is popular for data science and web applications, and this article will get you started developing Phoenix apps with Python. Check out the documentation for more detail.

Installing the library to access Phoenix from Python is very easy. First, make sure that you have Python 2.7 or Python 3.x. If you don't have pip installed, the first two commands below will install it. After that, it's a simple one-line pip install for phoenixdb.

wget https://bootstrap.pypa.io/get-pip.py
python get-pip.py
pip install phoenixdb

If you get stuck installing the Phoenix libraries for Python, check out the documentation. You will need to have Apache HBase and Apache Phoenix running. These can run either standalone or as part of a cluster; the easiest way is to download a distribution like HDP 2.5. From your Ambari admin console, grab the IP address or domain name and the port of the Phoenix Query Server. The default port is 8765, but someone may have changed it.
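
As a minimal sketch of how the host and port from Ambari turn into a Python connection: the helper names `query_server_url` and `open_connection` below are made up for illustration, while `phoenixdb.connect` is the library's own entry point. It assumes the server speaks plain HTTP on the port you found.

```python
def query_server_url(host, port=8765):
    """Return the HTTP URL of a Phoenix server (8765 is the default port)."""
    return "http://{}:{}/".format(host, port)


def open_connection(host, port=8765):
    """Open a DB-API connection to Phoenix with phoenixdb."""
    # Imported lazily so the URL helper also works where phoenixdb is absent.
    import phoenixdb

    # autocommit=True makes every statement take effect immediately.
    return phoenixdb.connect(query_server_url(host, port), autocommit=True)
```

Calling `open_connection("server")` would try to connect to `http://server:8765/`. Pass the port explicitly if your administrator changed it.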

Access Phoenix Using the Thin Client on HDP

/usr/hdp/current/phoenix-client/bin/sqlline-thin.py http://server:8765/

The newest version of the Hortonworks distribution of Hadoop, HDP 2.5, ships Apache Phoenix 4.7.0. Make sure you know which version of Phoenix you are running. phoenixdb works only with Protobuf serialization, not JSON, so if your Phoenix server is set to JSON you have to change that manually.
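
Once the server is reachable, running SQL from Python follows the standard DB-API pattern. The sketch below is illustrative only: the `users` table and the function name `store_and_list_users` are hypothetical, and `conn` is assumed to be a phoenixdb connection opened with `autocommit=True`. Note that Phoenix uses `UPSERT` rather than `INSERT`.

```python
def store_and_list_users(conn, users):
    """Create a small table, upsert (id, username) pairs, and read them back."""
    cursor = conn.cursor()

    # Hypothetical example table; IF NOT EXISTS makes the call repeatable.
    cursor.execute(
        "CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, username VARCHAR)"
    )

    # Phoenix writes rows with UPSERT; values are bound to ? placeholders.
    for user_id, username in users:
        cursor.execute("UPSERT INTO users VALUES (?, ?)", (user_id, username))

    cursor.execute("SELECT id, username FROM users ORDER BY id")
    return cursor.fetchall()
```

With a live connection, `store_and_list_users(conn, [(1, "admin"), (2, "guest")])` would return the rows stored in the table.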

To develop web applications against Phoenix, you can use Python with Flask to rapidly develop web apps or REST/JSON web APIs. Again, make sure you have Python 2.7 or 3 with pip installed. You may need root access or need to install via sudo.

To Install Python's Flask Library

pip install gunicorn flask
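
A REST/JSON endpoint over Phoenix might then look roughly like the sketch below. It is an illustration under assumptions, not a finished service: the `users` table, the `/users` route, and the names `rows_to_dicts` and `create_app` are all hypothetical, and `connect` stands for any zero-argument callable that returns a DB-API connection to Phoenix.

```python
def rows_to_dicts(columns, rows):
    """Pair each row tuple with column names so it serializes as a JSON object."""
    return [dict(zip(columns, row)) for row in rows]


def create_app(connect):
    """Build a Flask app that serves rows from a hypothetical users table."""
    # Imported here so the helper above stays usable without Flask installed.
    from flask import Flask, jsonify

    app = Flask(__name__)

    @app.route("/users")
    def list_users():
        conn = connect()
        try:
            cursor = conn.cursor()
            cursor.execute("SELECT id, username FROM users ORDER BY id")
            rows = cursor.fetchall()
        finally:
            # Always release the connection, even if the query fails.
            conn.close()
        return jsonify(users=rows_to_dicts(("id", "username"), rows))

    return app
```

To serve this with gunicorn, one option is a module `app.py` that sets `app = create_app(...)`, passing a function that calls `phoenixdb.connect` with your server URL, and then running `gunicorn app:app`.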

You can also access Phoenix through standard drivers for .NET and Java. Microsoft provides Phoenix drivers for .NET. They work fine and are well-supported for HDInsight and other hosted Phoenix servers.

Finally, you can also access Apache Phoenix via the Phoenix JDBC drivers, which can be used by BI and ETL tools as well as for coding in Java and Scala.

Leave a comment if you are looking for more help or have questions. Thanks!