
HBase, Phoenix, and Java — Part 2

Find out the advantages, limitations, features, and the architecture of Apache Phoenix.

By Bipin Patwardhan · Jun. 12, 18 · Opinion

Introduction

While RDBMS (Relational Database Management Systems) have been popular for decades (and will continue to remain so), in recent times we have seen the emergence and acceptance of databases and datastores that are not based on relational concepts. Such databases are typically called NoSQL, a term that initially meant "Not SQL" but has come to mean "Not only SQL." When NoSQL databases were first released, they did not support the popular SQL interface; each database had its own API for data access and manipulation. With the passage of time, however, a few solutions were developed on top of existing NoSQL databases to support an SQL interface. For example, the Hive datastore was created on top of HDFS (Hadoop Distributed File System) to provide a "table-like" interface to data, and it supports many features of SQL-92. Similarly, though HBase is a column store, an SQL interface was defined for it. This interface is Phoenix.

When news reporters write about Non-Resident Indians or about Americans of Indian origin (or for that matter, any person of Indian origin, now settled in another country), they use the phrase “You can take the XYZ out of India, but you cannot take India out of XYZ.” Looking at the SQL interfaces provided for NoSQL databases, I would like to coin the phrase, “You can take a programmer out of SQL, but you cannot take SQL out of the programmer.”

Apache Phoenix

Phoenix is an open source SQL skin for HBase. You use the standard JDBC APIs instead of the regular HBase client APIs to create tables, insert data, and query your HBase data. It enables OLTP and operational analytics in Hadoop for low-latency applications by combining the best of both worlds:

  • the power of standard SQL and JDBC APIs with full ACID transaction capabilities
  • the flexibility of late-bound, schema-on-read capabilities from the NoSQL world by leveraging HBase as its backing store

Apache Phoenix is fully integrated with other Hadoop products such as Spark, Hive, Pig, Flume, and MapReduce. It takes an SQL query, compiles it into a series of HBase scans, and orchestrates the running of those scans to produce regular JDBC result sets. Direct use of the HBase API, along with coprocessors and custom filters, results in performance on the order of milliseconds for small queries, or seconds for tens of millions of rows.

Phoenix's place in the Hadoop ecosystem is shown below.

Figure 1: Phoenix and HBase in the Hadoop Ecosystem

At first glance, given that Phoenix is an SQL skin on top of HBase, one might assume that using it results in slower access compared to the direct HBase API. In most cases, however, Phoenix has been noted to achieve performance as good as, and often better than, hand-coded HBase access (in addition to the advantage of reduced development effort). This is achieved by the following (a sketch of inspecting the compiled plan with EXPLAIN follows the list):

  • compiling SQL queries into native HBase scans
  • determining the optimal start and stop keys for each scan
  • orchestrating the parallel execution of scans
  • bringing the computation to the data by
    • pushing the predicates in the WHERE clause into a server-side filter
    • executing aggregate queries through server-side hooks (called coprocessors)
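
As a quick way to see what Phoenix compiles a query into, the EXPLAIN statement can be run over JDBC; it prints the plan, including the scans, their key ranges, and any server-side filters or aggregation. The following is a minimal sketch, not part of the original article: the connection URL and the BMARKSP table are reused from the example later in this article and should be adjusted for your cluster.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PhoenixExplainExample {
  public static void main(String[] args) throws Exception {
    // Register the Phoenix JDBC driver.
    Class.forName("org.apache.phoenix.jdbc.PhoenixDriver");
    // The URL and table name are placeholders; adjust them for your cluster.
    try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost:/hbase-unsecure");
         Statement stmt = conn.createStatement();
         // EXPLAIN prints the plan Phoenix compiled the query into: the scans,
         // their start/stop keys, and any server-side filters or aggregation.
         ResultSet plan = stmt.executeQuery("EXPLAIN SELECT COUNT(*) FROM BMARKSP")) {
      while (plan.next()) {
        System.out.println(plan.getString(1));
      }
    }
  }
}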

Advantages of Phoenix

One of the biggest advantages of using Phoenix is that it provides access to HBase using an interface that most programmers are familiar with, namely SQL and JDBC. Some of the other advantages of using Phoenix are:

  • Reduces the amount of code users need to write
  • Allows for performance optimizations transparent to the user
  • Opens the door for leveraging and integrating lots of existing tooling

Architecture

Architecturally, Phoenix consists of two components: a server component and a client component. The client component is simply the Phoenix JDBC driver and is used by each client that connects to HBase through Phoenix. On the server side, a Phoenix component resides on each RegionServer in HBase.

Let us consider the following HBase cluster architecture:

Figure 2: Sample Architecture

When Phoenix is brought into the picture, we have to note that there are two components: a server-side component that resides on each RegionServer, and the client-side JDBC library. The following diagram depicts the server-side component.

Figure 3: Sample Architecture with the Phoenix server-side component

To access a Phoenix-enabled HBase database, we need to use the Phoenix JDBC client library. When using the JDBC library, the architecture is as depicted:

Figure 4: Sample architecture with the Phoenix client-side component

Example

Let us now consider a simple example using the Phoenix API.

To run the example, we need to include the Phoenix client JAR containing the JDBC classes on the classpath; for example, phoenix-4.1.0-client-hadoop2.jar.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PhoenixExample {
  public static void main(String[] args) throws Exception {
    // Register the Phoenix JDBC driver.
    Class.forName("org.apache.phoenix.jdbc.PhoenixDriver");
    // Connect through the Phoenix client library; the URL names the ZooKeeper
    // quorum and the HBase znode.
    try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost:/hbase-unsecure");
         Statement stmt = conn.createStatement();
         ResultSet rst = stmt.executeQuery("select * from BMARKSP")) {
      System.out.println("got connection");
      // Print the first six columns of each row returned by the query.
      while (rst.next()) {
        System.out.println(rst.getString(1) + " " + rst.getString(2) + " "
            + rst.getString(3) + " " + rst.getString(4) + " "
            + rst.getString(5) + " " + rst.getString(6));
      }
    }
  }
}

Compared to the example presented in "HBase, Phoenix, and Java — Part 1," accessing data stored in HBase is much simpler using Phoenix.

Apache Phoenix Features

  • It is delivered as an embedded JDBC driver for HBase data.
  • It follows ANSI SQL standards whenever possible.
  • It allows columns to be modeled as a multi-part row key or as key/value cells.
  • It offers full query support with predicate push-down and optimal scan key formation.
  • It provides a versioned schema repository: table metadata is stored in an HBase table and versioned, so that snapshot queries over prior versions automatically use the correct schema.
  • DDL support: CREATE TABLE, DROP TABLE, and ALTER TABLE for adding/removing columns.
  • DML support: UPSERT VALUES for row-by-row insertion, UPSERT SELECT for mass data transfer between the same or different tables, and DELETE for deleting rows (see the sketch after this list).
  • It compiles an SQL query into a series of HBase scans and runs those scans in parallel to produce regular JDBC result sets.

It can also seamlessly integrate with HBase, Pig, Flume, and Sqoop.
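
To illustrate the DDL and DML support listed above, here is a minimal sketch (not from the original article) that creates a table, upserts a row, and commits. The table name (US_POPULATION), its columns, and the sample values are placeholders chosen for illustration; the connection URL matches the earlier example.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class PhoenixDdlDmlExample {
  public static void main(String[] args) throws Exception {
    Class.forName("org.apache.phoenix.jdbc.PhoenixDriver");
    try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost:/hbase-unsecure")) {
      // DDL: STATE and CITY together form a multi-part row key in the backing HBase table.
      conn.createStatement().execute(
          "CREATE TABLE IF NOT EXISTS US_POPULATION ("
        + " STATE CHAR(2) NOT NULL,"
        + " CITY VARCHAR NOT NULL,"
        + " POPULATION BIGINT"
        + " CONSTRAINT PK PRIMARY KEY (STATE, CITY))");

      // DML: Phoenix uses UPSERT VALUES rather than separate INSERT/UPDATE statements.
      try (PreparedStatement ps = conn.prepareStatement(
          "UPSERT INTO US_POPULATION (STATE, CITY, POPULATION) VALUES (?, ?, ?)")) {
        ps.setString(1, "NY");        // placeholder sample values
        ps.setString(2, "New York");
        ps.setLong(3, 8537673L);
        ps.executeUpdate();
      }

      // Phoenix buffers mutations on the client; commit() flushes them to HBase.
      conn.commit();
    }
  }
}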

Why Is Phoenix so Fast?

  • Apache Phoenix breaks up an SQL query into multiple HBase scans and runs them in parallel.
  • Phoenix was developed with the notion of bringing the computation to the data by using:
    • coprocessors to perform operations on the server side, thus minimizing client/server data transfer
    • custom filters to prune data as close to the source as possible
  • In addition, to minimize any startup cost, Phoenix uses native HBase APIs rather than going through the MapReduce framework.
  • Phoenix leverages the following HBase custom filters to provide higher performance:
    • The Essential Column Family filter improves performance when a Phoenix query filters on data split across multiple column families (CFs) by loading only the essential CFs; in a second pass, the remaining CFs are loaded as needed.
    • The Skip Scan Filter leverages SEEK_NEXT_USING_HINT of the HBase Filter API and significantly improves point queries over key columns.
  • Salting improves both read and write performance by adding an extra hash byte at the start of the row key and pre-splitting the data across a number of regions. This eliminates hot-spotting on a single or a few region servers.
  • In a single line, we can say that direct use of the HBase API, along with coprocessors and custom filters, results in performance on the order of milliseconds for small queries, or seconds for tens of millions of rows.
  • For aggregate queries, coprocessors complete partial aggregation on their local region servers and return only the relevant data to the client.
  • Phoenix supports secondary indexes to improve the performance of queries on non-row-key columns (a DDL sketch for salting and secondary indexes follows this list).
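
As a rough illustration of salting and secondary indexes, the DDL sketch below creates a salted table and an index on a non-row-key column. The table, column, and index names and the bucket count are placeholders for illustration, the connection URL matches the earlier examples, and creating a global index on a mutable table assumes the cluster has Phoenix's server-side index support configured.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class PhoenixSaltingAndIndexExample {
  public static void main(String[] args) throws Exception {
    Class.forName("org.apache.phoenix.jdbc.PhoenixDriver");
    try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost:/hbase-unsecure");
         Statement stmt = conn.createStatement()) {
      // SALT_BUCKETS prepends a hash byte to the row key and pre-splits the table,
      // spreading writes across region servers to avoid hot-spotting.
      stmt.execute(
          "CREATE TABLE IF NOT EXISTS EVENTS ("
        + " EVENT_ID VARCHAR PRIMARY KEY,"
        + " EVENT_TYPE VARCHAR,"
        + " CREATED_AT TIMESTAMP)"
        + " SALT_BUCKETS=16");

      // A secondary index lets queries that filter on a non-row-key column
      // (EVENT_TYPE here) avoid a full table scan.
      stmt.execute("CREATE INDEX IF NOT EXISTS EVENTS_BY_TYPE ON EVENTS (EVENT_TYPE)");
    }
  }
}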

Limitations

A few limitations of Phoenix:

  • Transaction support is limited and is provided through client-side batching (see the sketch after this list).
  • Joins are not completely supported; FULL OUTER JOIN and CROSS JOIN are not supported.
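
To show what client-side batching looks like in practice, here is a minimal sketch (not from the original article) that buffers several upserts and sends them to HBase in one commit. The table name (US_POPULATION) and the values are placeholders reused from the earlier sketch.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class PhoenixBatchingExample {
  public static void main(String[] args) throws Exception {
    Class.forName("org.apache.phoenix.jdbc.PhoenixDriver");
    try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost:/hbase-unsecure")) {
      // With auto-commit off (the Phoenix default), upserts are buffered on the client.
      conn.setAutoCommit(false);
      String[][] rows = { { "NY", "New York" }, { "CA", "Los Angeles" }, { "IL", "Chicago" } };
      try (PreparedStatement ps = conn.prepareStatement(
          "UPSERT INTO US_POPULATION (STATE, CITY) VALUES (?, ?)")) {
        for (String[] row : rows) {
          ps.setString(1, row[0]);
          ps.setString(2, row[1]);
          ps.executeUpdate();   // buffered locally, not yet written to HBase
        }
      }
      // commit() sends the buffered mutations to HBase as a batch; this batching
      // is the extent of the transaction support mentioned above.
      conn.commit();
    }
  }
}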

Additional Examples

Two more examples that show how to use Phoenix:

  • Open source Java projects: Apache Phoenix 
  • HBase Phoenix JDBC example

