Developing Python Applications Against Apache Phoenix (HBase)
Tim Spann explains how you can develop Python applications that access HBase big data on Hadoop via the Phoenix JDBC/SQL API.
First off, Apache Phoenix is an SQL interface to the HBase database on Hadoop. You can access this database with a specialized connector for Spark: Phoenix with Spark.
Phoenix gives non-JVM developers an easy, standard way to access HBase. Python is popular for data science and web applications. This article will get you started developing Phoenix apps with Python. Check out the documentation here.
Installing the library to access Phoenix from Python is very easy. First, make sure that you have Python 2.7 or Python 3.x. If you don't have pip installed, the commands below install it first. After that, phoenixdb is a simple one-line pip install.
wget https://bootstrap.pypa.io/get-pip.py
python get-pip.py
pip install phoenixdb
If you get stuck installing the Phoenix libraries for Python, check out the documentation. You will need to have Apache HBase and Apache Phoenix running. From your Ambari admin console, grab the IP address or domain name and the port of the Phoenix Query Server. The default port is 8765, but someone may have changed it. HBase and Phoenix can run either standalone or as part of a cluster. The easiest way to get both is to download a distribution like HDP 2.5.
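Once the query server is reachable, a first connection from Python might look like the sketch below. It is a minimal example, not a definitive implementation: phoenixdb follows the Python DB API 2.0, so the code uses only connect, cursor, execute, and fetchall. The host phoenix.example.com, the demo_users table, and the helper names query_server_url and fetch_all are assumptions made up for illustration.

```python
def query_server_url(host, port=8765):
    """Build the HTTP URL of the Phoenix query server (8765 is the default port)."""
    return "http://{}:{}/".format(host, port)


def fetch_all(conn, sql, params=None):
    """Run one query on a DB API 2.0 connection and return all of its rows."""
    cursor = conn.cursor()
    try:
        if params:
            cursor.execute(sql, params)
        else:
            cursor.execute(sql)
        return cursor.fetchall()
    finally:
        cursor.close()


def main():
    # Imported here so the helpers above stay usable without phoenixdb installed.
    import phoenixdb

    # Placeholder host (assumption): use the host and port shown in Ambari.
    url = query_server_url("phoenix.example.com")
    conn = phoenixdb.connect(url, autocommit=True)
    try:
        cursor = conn.cursor()
        # demo_users is a hypothetical table used only for this sketch.
        cursor.execute(
            "CREATE TABLE IF NOT EXISTS demo_users "
            "(id INTEGER PRIMARY KEY, username VARCHAR)"
        )
        # Phoenix uses UPSERT where other SQL dialects use INSERT.
        cursor.execute("UPSERT INTO demo_users VALUES (?, ?)", (1, "admin"))
        cursor.close()
        print(fetch_all(conn, "SELECT id, username FROM demo_users"))
    finally:
        conn.close()
```

Call main() after replacing the placeholder host. Because fetch_all depends only on the DB API, it can be tried against any compliant connection before pointing it at a real cluster.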
Access Phoenix Using the Thin Client on HDP
The newest version of the Hortonworks distribution of Hadoop, HDP 2.5, has Apache Phoenix 4.7.0. Make sure you know the version of Phoenix that you are running. phoenixdb doesn't work with JSON serialization, only Protobuf, so if your query server is set to JSON you have to change that manually.
To develop web applications against Phoenix, you can use Python with Flask to rapidly develop web apps or REST/JSON web APIs. Again, make sure you have Python 2.7 or 3 with pip installed. You may need root access or need to install via sudo.
To Install Python's Flask Library
pip install gunicorn flask
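With Flask installed, a small REST/JSON endpoint over Phoenix could be sketched as follows. This is an illustration under stated assumptions, not the article's own application: the /users route, the demo_users table, and the names rows_to_dicts and create_app are hypothetical, and the connect argument is any zero-argument function that returns a DB API 2.0 connection.

```python
def rows_to_dicts(columns, rows):
    """Pair each row with the column names so it serializes cleanly to JSON."""
    return [dict(zip(columns, row)) for row in rows]


def create_app(connect):
    """Build a Flask app; connect is a zero-argument function returning a connection."""
    # Imported here so rows_to_dicts stays usable without Flask installed.
    from flask import Flask, jsonify

    app = Flask(__name__)

    @app.route("/users")
    def list_users():
        # One connection per request keeps the sketch simple.
        conn = connect()
        try:
            cursor = conn.cursor()
            # demo_users is a hypothetical table used only for this sketch.
            cursor.execute("SELECT id, username FROM demo_users")
            users = rows_to_dicts(("id", "username"), cursor.fetchall())
            return jsonify(users=users)
        finally:
            conn.close()

    return app


# Possible wiring in a file named app.py (host and port are assumptions):
#   import phoenixdb
#   app = create_app(lambda: phoenixdb.connect("http://localhost:8765/", autocommit=True))
# It could then be served with: gunicorn app:app
```

Passing the connection function in, rather than hard-coding the server URL, keeps the host and port from Ambari in one place and lets the route be exercised with a stand-in connection.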
You can also access Phoenix through standard drivers for .NET and Java. Microsoft provides Phoenix drivers for .NET. They work fine and are well-supported for HDInsight and other hosted Phoenix servers.
Finally, you can also access Apache Phoenix via the Phoenix JDBC drivers, which can be used by BI and ETL tools as well as for coding in Java and Scala.
Leave a comment if you are looking for more help or have questions. Thanks!