Apache Kylin for OLAP on Hadoop

by Craig Lukasi · Aug. 17, 2016 · Big Data Zone · Tutorial

Traditionally, Hadoop (via MapReduce, Pig or Hive) was used to prepare data for OLAP cubes hosted in external, proprietary OLAP engines. Now, at Zaloni, we are seeing firms use Apache Kylin to get real-time query responses from OLAP cubes backed by fact tables of more than 40 billion rows. We are helping one firm unify billing data from disparate systems into OLAP cubes that support analytics its current systems cannot provide. All of this happens inside the Hadoop cluster.

The Evolution of Analytics on Hadoop

Hadoop has evolved from a distributed data platform with generic compute capabilities (via MapReduce) into a full-fledged analytics platform. Hadoop and its ecosystem tools can now tackle a broad set of use cases beyond low-cost distributed batch processing, Hadoop’s original claim to fame. From iterative machine learning to OLAP and OLTP systems, open source analytics tools that run “on the cluster” are putting pressure on the traditional players in the field (Oracle, SAS, Teradata, IBM, etc.).

Designed for Scale

Apache Kylin, named after a mythical Chinese creature, is an open source multidimensional online analytical processing (MOLAP) engine. Originally developed at eBay, Inc., Kylin is designed to handle petabyte-scale datasets. Here’s a quote from the Apache Software Foundation blog from December 2015:

"Apache Kylin is the best OLAP engine on Big Data so far," said Wilson Pang, Senior Director of Data Services and Solutions at eBay. "At eBay, we collect every user behavior on every eBay screen. While other OLAP engines struggle with the data volume, Kylin enables query responses in the milliseconds. Moreover, we are also starting to leverage Kylin for near real time data streaming storage and analytics engine. All together, Kylin serves as a critical backend component for eBay’s product analytics platform."

How It Works

Kylin achieves its speed by precomputing the measure aggregates for the various combinations of dimensions from the source data in Hive and storing the results in HBase. The Kylin query engine, accessible through Kylin’s user-friendly web UI, a REST API, or JDBC, uses the Apache Calcite query processor together with HBase features (such as fuzzy row filters) to achieve fast lookups. The HBase row keys are also compact, because the dimension values are dictionary-encoded using a trie data structure.
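To make the precomputation idea concrete, here is a toy, in-memory Python sketch. It is not Kylin’s code: Kylin builds its cuboids with jobs on the cluster and stores them in HBase, whereas this sketch keeps everything in dictionaries. The billing columns (`region`, `product`, `amount`) are hypothetical, and a sorted list stands in for Kylin’s trie-based dictionary.

```python
from collections import defaultdict
from itertools import combinations


def build_cube(rows, dimensions, measure):
    """Precompute SUM(measure) for every cuboid, i.e. every subset of the dimensions."""
    cube = {}
    for size in range(len(dimensions) + 1):
        for cuboid in combinations(dimensions, size):
            totals = defaultdict(int)
            for row in rows:
                # Group key: this row's values for the cuboid's dimensions.
                totals[tuple(row[d] for d in cuboid)] += row[measure]
            cube[cuboid] = dict(totals)
    return cube


def build_dictionary(values):
    """Order-preserving value -> integer id map (a sorted list stands in for the trie)."""
    return {value: i for i, value in enumerate(sorted(set(values)))}


def encode_key(row, cuboid, dictionaries):
    """Replace each dimension value with its small integer id to keep keys compact."""
    return tuple(dictionaries[d][row[d]] for d in cuboid)


rows = [
    {"region": "east", "product": "voice", "amount": 10},
    {"region": "east", "product": "data", "amount": 5},
    {"region": "west", "product": "voice", "amount": 7},
]
cube = build_cube(rows, ("region", "product"), "amount")
print(cube[("region",)])  # {('east',): 15, ('west',): 7}
print(cube[()])           # {(): 22}

dictionaries = {d: build_dictionary(r[d] for r in rows) for d in ("region", "product")}
print(encode_key(rows[0], ("region", "product"), dictionaries))  # (0, 1)
```

A query such as “total amount by region” becomes a single lookup in the `("region",)` cuboid instead of a scan of the raw rows. The price is that the number of cuboids doubles with each added dimension (2**n for n dimensions).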

Kylin supports only the star schema: each cube is limited to a single fact table, joined to its lookup tables.
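The star-schema restriction can be pictured as follows: one fact table whose foreign keys each point one hop to a lookup table, flattened into wide rows that a cube can then aggregate. This is a hypothetical Python sketch with made-up billing tables, not Kylin’s implementation; Kylin performs a comparable join in Hive when it builds a cube.

```python
def flatten_star(fact_rows, lookups):
    """Join a single fact table to its lookup tables.

    `lookups` maps each foreign-key column in the fact table to a lookup table,
    represented here as a dict from key value to that row's descriptive attributes.
    """
    flat = []
    for fact in fact_rows:
        row = dict(fact)
        for foreign_key, table in lookups.items():
            # One hop from the fact table to each lookup table: a star, not a snowflake.
            row.update(table[fact[foreign_key]])
        flat.append(row)
    return flat


facts = [
    {"customer_id": 1, "product_id": "P1", "amount": 10},
    {"customer_id": 2, "product_id": "P1", "amount": 7},
]
lookups = {
    "customer_id": {1: {"region": "east"}, 2: {"region": "west"}},
    "product_id": {"P1": {"product": "voice"}},
}
flat = flatten_star(facts, lookups)
print(flat[1])
# {'customer_id': 2, 'product_id': 'P1', 'amount': 7, 'region': 'west', 'product': 'voice'}
```

The sketch assumes every foreign key has a match (inner-join semantics); a missing key raises `KeyError`.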

Wizard

Building a cube is simple. Assuming your data is already in Hive tables, the wizard walks you through selecting the dimensions (which may be hierarchical), selecting the lookup tables, choosing the measures, and so on. You can also partition the cube by date, which makes it easy to refresh individual segments of the cube, for example when incremental or streaming data arrives. Once the cube is defined, you can monitor the build process in Kylin’s UI.
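The benefit of date partitioning can be shown with a small Python sketch: each segment covers one date range, a refresh rebuilds only that range, and a query combines the segments. The class, the `bill_date` column, and the monthly ranges are hypothetical; this loosely mirrors how Kylin treats cube segments and is not its implementation.

```python
from collections import defaultdict
from datetime import date


class SegmentedCube:
    """Toy cube partitioned by date into independently rebuildable segments."""

    def __init__(self, dimension, measure, date_column):
        self.dimension = dimension
        self.measure = measure
        self.date_column = date_column
        self.segments = {}  # (start, end) -> {dimension value: SUM(measure)}

    def refresh(self, rows, start, end):
        """Build or rebuild the segment covering [start, end); other segments are untouched."""
        totals = defaultdict(int)
        for row in rows:
            if start <= row[self.date_column] < end:
                totals[row[self.dimension]] += row[self.measure]
        self.segments[(start, end)] = dict(totals)

    def query(self):
        """Combine the per-segment aggregates into one answer at query time."""
        merged = defaultdict(int)
        for totals in self.segments.values():
            for key, value in totals.items():
                merged[key] += value
        return dict(merged)


rows = [
    {"bill_date": date(2016, 7, 3), "region": "east", "amount": 10},
    {"bill_date": date(2016, 8, 1), "region": "east", "amount": 5},
]
cube = SegmentedCube("region", "amount", "bill_date")
cube.refresh(rows, date(2016, 7, 1), date(2016, 8, 1))
cube.refresh(rows, date(2016, 8, 1), date(2016, 9, 1))
print(cube.query())  # {'east': 15}

# Late-arriving August data: rebuild only the August segment.
rows.append({"bill_date": date(2016, 8, 15), "region": "west", "amount": 7})
cube.refresh(rows, date(2016, 8, 1), date(2016, 9, 1))
print(cube.query())  # {'east': 15, 'west': 7}
```

Only the August segment is recomputed when new August data lands; the July aggregates are reused as-is.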

Beyond Kylin’s web UI, you can query the OLAP cubes via JDBC, from Apache Zeppelin (which ships with a Kylin interpreter), or through a well-designed REST API.
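For the REST route, here is a minimal Python sketch that uses only the standard library. It prepares, without sending, a request for Kylin’s query endpoint (`POST /kylin/api/query`, with the SQL and project name in a JSON body and HTTP Basic authentication). The host, project, table name, and credentials are placeholders, port 7070 is assumed, and the exact body fields should be checked against the Kylin version you run.

```python
import base64
import json
import urllib.request


def build_query_request(host, project, sql, user, password, port=7070, limit=100):
    """Prepare (but do not send) a POST to Kylin's /kylin/api/query endpoint."""
    body = json.dumps(
        {"sql": sql, "project": project, "offset": 0, "limit": limit}
    ).encode("utf-8")
    # The REST API is protected with HTTP Basic authentication.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"http://{host}:{port}/kylin/api/query",
        data=body,
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="POST",
    )


request = build_query_request(
    "kylin.example.com",  # hypothetical host
    "billing",            # hypothetical project name
    "SELECT region, SUM(amount) FROM billing_fact GROUP BY region",  # hypothetical table
    "ANALYST",            # placeholder user
    "secret",             # placeholder password
)
# Against a live server, urllib.request.urlopen(request).read() would return
# a JSON document containing the result rows.
```

Keeping request construction separate from sending makes the payload easy to inspect and test without a running Kylin instance.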

Other Options for OLAP on Hadoop

Kylin is just one open source option for OLAP on Hadoop. Apache Lens is another, but it is a relational OLAP (ROLAP) solution and does not currently match the responsiveness of Kylin’s precomputed cubes. Druid is also an option, but it uses its own clustering technology and does not require Hadoop. Several vendors also offer solutions that claim to provide OLAP capabilities on Hadoop.

Published at DZone with permission of Craig Lukasi.
