Apache Calcite Report: Part 1
Tim Spann interviews Julian Hyde and Josh Elser, two Apache Calcite leads, and reports on why they believe Apache Calcite and Avatica are important.
We spoke about Apache Calcite and Avatica. Avatica is a sub-project of Calcite for historical reasons, but it is beginning to be used standalone, e.g. for high-performance connectivity to Phoenix/HBase. Avatica provides a wire API between clients and a server. The Avatica server is an HTTP server that accepts API calls in either of two popular formats: JSON or Protocol Buffers (Protobuf). The Avatica client is a standard JDBC driver. Clients can be developed in many languages, since Protobuf and JSON over HTTP are widely supported. The open source community has already added a Go Avatica client focused on Apache Phoenix. There is also a very early .NET driver.
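To make the "JSON over HTTP" idea concrete, here is a minimal Python sketch of what a non-JDBC client might send. The request names and fields (`openConnection`, `createStatement`, `prepareAndExecute`, `connectionId`, `statementId`, `maxRowsTotal`) are based on the Avatica JSON reference as I understand it and may differ between Avatica versions, so treat them as assumptions to check against your server. The helper function names are made up for illustration.

```python
import json
import urllib.request
import uuid


def open_connection_request(connection_id):
    # Assumed shape of Avatica's "openConnection" JSON message.
    return {"request": "openConnection", "connectionId": connection_id}


def create_statement_request(connection_id):
    # Assumed shape of Avatica's "createStatement" JSON message.
    return {"request": "createStatement", "connectionId": connection_id}


def prepare_and_execute_request(connection_id, statement_id, sql, max_rows=-1):
    # Assumed shape of "prepareAndExecute"; -1 is meant as "no row limit".
    return {
        "request": "prepareAndExecute",
        "connectionId": connection_id,
        "statementId": statement_id,
        "sql": sql,
        "maxRowsTotal": max_rows,
    }


def encode(payload):
    # Serialize a request dict to the bytes that go in the HTTP body.
    return json.dumps(payload).encode("utf-8")


def send(url, payload):
    # POST one request to an Avatica-style server and decode the JSON reply.
    req = urllib.request.Request(
        url,
        data=encode(payload),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def new_connection_id():
    # Connection IDs are chosen by the client; a random UUID is one option.
    return str(uuid.uuid4())
```

A client would call `send(url, open_connection_request(cid))`, then create a statement, read the statement ID from the reply, and pass it to `prepare_and_execute_request`. The URL is whatever your server listens on (for example, a Phoenix Query Server endpoint). Nothing here depends on the JVM, which is the point of the wire API.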
Talk with Julian Hyde
Julian has spoken about streaming SQL (with Apache Calcite) and query optimization on Phoenix at several major conferences, including XLDB, Hadoop Summit Dublin, and Kafka Summit. He is the Apache Calcite team lead and an architect at Hortonworks.
Why Does Calcite Matter?
It provides a query planner for any database engine.
It provides the heavy lifting for many of the key database features you come to expect.
It provides a SQL query parser and an abstract syntax tree (AST).
Calcite implements JDBC/ODBC frameworks
Data in multiple formats, multiple engines
It supports relational, document and KV stores
It supports a variety of workloads
You can easily introduce new data formats and engines
Deconstructed traditional database engine
You have visibility into all the components of the database engine
These include the metadata catalog, authorization, algorithms (such as distributed joins), scheduler and resource allocation, engine, data format, and storage
The key value is separating queries from how the data is retrieved.
Your application shouldn't be tied to lower-level database implementations
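The idea of separating a query from how the data is fetched can be illustrated with a toy Python sketch. This is not Calcite's API (Calcite is a Java library with its own adapter and relational algebra machinery); every class and function name below is hypothetical. It only shows one query definition running unchanged against two different storage layouts.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, Tuple

Row = Dict[str, object]


class ListTable:
    """Hypothetical backend: rows held as a list of dicts (relational style)."""

    def __init__(self, rows):
        self._rows = list(rows)

    def scan(self) -> Iterator[Row]:
        return iter(self._rows)


class KeyValueTable:
    """Hypothetical backend: a key-value store presented as rows."""

    def __init__(self, store, key_column):
        self._store = dict(store)
        self._key_column = key_column

    def scan(self) -> Iterator[Row]:
        # Fold the key into each value dict so callers see plain rows.
        for key, value in self._store.items():
            yield {self._key_column: key, **value}


@dataclass(frozen=True)
class Query:
    """A logical query: which table, which columns, which rows to keep."""

    table: str
    columns: Tuple[str, ...]
    predicate: Callable[[Row], bool]


def execute(query, catalog):
    # The query names a table; the catalog decides which backend serves it.
    table = catalog[query.table]
    return [
        {col: row[col] for col in query.columns}
        for row in table.scan()
        if query.predicate(row)
    ]


def demo_catalogs():
    # The same employee data stored two different ways.
    as_list = ListTable([
        {"id": 1, "name": "Ann", "dept": "eng"},
        {"id": 2, "name": "Bob", "dept": "ops"},
    ])
    as_kv = KeyValueTable(
        {1: {"name": "Ann", "dept": "eng"}, 2: {"name": "Bob", "dept": "ops"}},
        key_column="id",
    )
    return {"emps": as_list}, {"emps": as_kv}
```

The application only ever builds a `Query` and calls `execute`; swapping the catalog entry changes the storage without touching the query. A real planner such as Calcite goes much further, for example by rewriting the query and pushing work down to the engine.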
Talk with Josh Elser
Josh has worked on Accumulo and HBase and is now working on Phoenix Query Server internals, along with Apache Calcite and Avatica.
The primary value of this work is to allow non-JVM developers to interact with Hadoop and other big data servers using familiar enterprise tools and standards, such as ODBC and BI tools.
You can query all your data sources at the same time.
Phoenix Query Server is built on Apache Calcite's Avatica sub-project. It is a generic Jetty web server with a wire API that lets clients talk to the server, which in turn interacts with databases via a JDBC driver.
Anyone can implement an HTTP client for it in any of the popular languages. This will allow .NET developers and Python and Ruby scripters to access HBase.
Why are there so few database drivers, and why are many of them not very good? Every new database has to build its own stack from scratch. The leading open source databases, PostgreSQL and MariaDB, have good ODBC drivers, but those took years and a lot of developer hours. Avatica provides a reference implementation of a JDBC driver. Protocol Buffers are the recommended format: they are backwards compatible and easier to parse than JSON. Avatica also provides a metrics system that reports utilization and information on the SQL queries that are running. This should become the universal client for database access. The hard work is defining a documented, clear, stable wire protocol. Once that exists, drivers can be implemented in a few weeks rather than months or even years.