Apache Calcite Report: Part 1

Tim Spann interviews Julian Hyde and Josh Elser, two Apache Calcite leads, and reports on why they believe Apache Calcite and Avatica are important.

I spoke with Julian Hyde (Slides) and Josh Elser (Slides), both of whom are members of the Apache Calcite PMC.

We spoke about Apache Calcite and Avatica. Avatica is a sub-project of Calcite for historical reasons, but it is beginning to be used standalone, for example for high-performance connectivity to Phoenix/HBase. Avatica provides a wire API between clients and a server. The Avatica server is an HTTP server that accepts API calls in two popular formats: JSON and Protocol Buffers (Protobuf). The Avatica client is a standard JDBC driver. Clients can be developed in many languages, since Protobuf and JSON over HTTP are widely supported. The open source community has already added a Go Avatica client focused on Apache Phoenix, and there is also a very early .NET driver.

Talk with Julian Hyde

Julian has spoken about streaming SQL (with Apache Calcite) and query optimization on Phoenix at some big conferences, including XLDB, Hadoop Summit Dublin, and the Kafka Summit. He leads the Apache Calcite team and is an architect at Hortonworks.

Why Does Calcite Matter?

  • It provides a query planner for any database engine.   

  • It provides the heavy lifting for many of the key database features you come to expect.

  • It provides a SQL query parser that produces an abstract syntax tree (AST).

  • It implements frameworks for JDBC/ODBC connectivity.

  • It works with data in multiple formats and across multiple engines.

  • It supports relational, document, and key-value (KV) stores.

  • It supports a variety of workloads.

  • You can easily introduce new data formats and engines.

  • It is a deconstructed traditional database engine.

  • You have visibility into all the components of the database engine.

  • Those components include the metadata catalog, authorization, algorithms (such as distributed join), scheduler / resource allocation, engine, data format, and storage.

  • The key value is separating queries from how the data is actually retrieved.

  • Your application shouldn't be tied to lower-level database implementations
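
To make the last two points concrete, here is a toy sketch of separating a query from how the data is fetched. It does not use Calcite or any of its APIs; the adapter classes and the `run_query` helper are hypothetical names made up for illustration. The same filter-and-project query runs unchanged against a document-style store and a key-value store, because each adapter only has to expose rows through a common `scan` method.

```python
from typing import Callable, Dict, Iterator, List

Row = Dict[str, object]


class ListOfDictsAdapter:
    """Hypothetical adapter over a document-style store (a list of dicts)."""

    def __init__(self, documents: List[Row]) -> None:
        self._documents = documents

    def scan(self) -> Iterator[Row]:
        # Documents are already row-shaped, so yield a copy of each one.
        for doc in self._documents:
            yield dict(doc)


class KeyValueAdapter:
    """Hypothetical adapter over a key-value store (key -> dict of fields)."""

    def __init__(self, store: Dict[str, Row], key_column: str) -> None:
        self._store = store
        self._key_column = key_column

    def scan(self) -> Iterator[Row]:
        # Fold the key into each row so both adapters expose the same shape.
        for key, value in self._store.items():
            yield {self._key_column: key, **value}


def run_query(adapter, predicate: Callable[[Row], bool], columns: List[str]) -> List[Row]:
    """Filter, then project. Knows nothing about how the rows are stored."""
    return [
        {column: row[column] for column in columns}
        for row in adapter.scan()
        if predicate(row)
    ]
```

For example, `run_query(KeyValueAdapter({"ann": {"age": 30}}, "name"), lambda r: r["age"] > 25, ["name"])` and the equivalent call on a `ListOfDictsAdapter` go through the same query code; only the adapter differs. A real planner does far more (parsing, optimization, pushing work down to the engine), so treat this only as an illustration of the separation.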

Talk with Josh Elser

Josh has worked on Accumulo and HBase and is now working on Phoenix Query Server internals, along with Apache Calcite and Avatica.

The primary value of this work is to allow non-JVM developers to interact with Hadoop and other big data servers using familiar enterprise tools and standards, such as ODBC and BI tools.

You can query all your data sources at the same time.

Phoenix Query Server is built on Apache Calcite's Avatica sub-project. It is a generic Jetty web server that exposes a wire API so that clients can talk to the server, and it interacts with the database through a JDBC driver.

Anyone can implement an HTTP client to call it in any of the popular languages. This will allow .NET developers and Python and Ruby scripters to access HBase.
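
As a rough illustration of how small such a client can be, here is a minimal sketch in Python using only the standard library. It assumes a Phoenix Query Server configured for JSON serialization; the URL and port are assumptions about a local setup. The request names and fields (`openConnection`, `createStatement`, `prepareAndExecute`, `closeConnection`, `connectionId`, `statementId`, `maxRowCount`) follow my reading of the Avatica JSON reference and may differ between versions, so check them against the documentation for your release. The helper function names are hypothetical.

```python
import json
import urllib.request
import uuid

# Assumption: a Phoenix Query Server listening locally with JSON serialization.
PQS_URL = "http://localhost:8765/"


def open_connection_request(connection_id):
    return {"request": "openConnection", "connectionId": connection_id}


def create_statement_request(connection_id):
    return {"request": "createStatement", "connectionId": connection_id}


def prepare_and_execute_request(connection_id, statement_id, sql, max_rows=-1):
    # Assumption: maxRowCount of -1 means "no limit"; newer versions may
    # prefer other fields for limiting rows.
    return {
        "request": "prepareAndExecute",
        "connectionId": connection_id,
        "statementId": statement_id,
        "sql": sql,
        "maxRowCount": max_rows,
    }


def close_connection_request(connection_id):
    return {"request": "closeConnection", "connectionId": connection_id}


def post(url, payload):
    """POST one JSON request to the server and decode the JSON response."""
    body = json.dumps(payload).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def run_sql(url, sql, send=post):
    """Open a connection, run one statement, and always close the connection."""
    connection_id = str(uuid.uuid4())  # the client chooses the connection id
    send(url, open_connection_request(connection_id))
    try:
        statement = send(url, create_statement_request(connection_id))
        return send(
            url,
            prepare_and_execute_request(connection_id, statement["statementId"], sql),
        )
    finally:
        send(url, close_connection_request(connection_id))
```

Calling `run_sql(PQS_URL, "SELECT * FROM my_table")` against a running server would return the decoded response for the `prepareAndExecute` call (`my_table` is a placeholder). The `send` parameter exists only so the request sequence can be exercised without a network; a production client would also need error handling, paging through result frames, and authentication where required.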

Why are there so few drivers, and why are many of the existing ones not great? Every new database has to build its own stack from scratch. The leading open source databases, PostgreSQL and MariaDB, have good ODBC drivers, but those took years and a lot of developer hours. Avatica provides a reference implementation of a JDBC driver. Protocol Buffers are the recommended format because they are backwards compatible and easier to parse than JSON. Avatica also provides a metrics system that reports utilization and information on the SQL queries that are running. This should become the universal client for database access. The hard work is defining a documented, clear, stable wire protocol. Once that exists, drivers can be implemented in a few weeks rather than months or even years.

