
Apache Zeppelin: The Road Ahead

Learn about Apache Zeppelin and the recent improvements that have been made as well as the road ahead for the project, much of which has to do with community.


Recently, the Apache Software Foundation (ASF) announced Apache Zeppelin as a Top-Level Project.

This was a great milestone for both the Zeppelin and data science communities. Since entering incubation at the ASF in December 2014, the community around Zeppelin has become larger, more diverse, more inclusive, and more vibrant. As of last week, the project has 126 contributors, 812 forks, and three releases.

Apache Zeppelin helps data analysts, data scientists, and business users get a better understanding of their data. With Apache Zeppelin, users can quickly explore data, create visualizations, and share their insights as web pages with various stakeholders.

Recent Improvements

Over the last year, a diverse group of developers has contributed several key improvements to Apache Zeppelin. Some of the highlights are:

  • Security features: authentication, access control, LDAP support
  • Sharing and collaboration: notebook import/export
  • Interpreters: most notably the R interpreter, plus others too numerous to list

The pluggable nature of the Apache Zeppelin interpreter architecture has made it easy to add support for new interpreters. There are now over 30 interpreters, supporting everything from Spark, Hive, MySQL, and R to Geode and HBase.
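
To illustrate the idea of pluggable interpreters, here is a minimal sketch in Python. It is not Zeppelin's actual interpreter API: the registry, the `register` decorator, the `run_paragraph` function, and the two toy interpreters (`upper` and `wc`) are all hypothetical names invented for this example. The only thing borrowed from Zeppelin is the convention that a notebook paragraph starts with `%` followed by the interpreter name.

```python
from typing import Callable, Dict

# Hypothetical registry (not Zeppelin's API): maps an interpreter name
# to a function that takes the paragraph body and returns its output.
INTERPRETERS: Dict[str, Callable[[str], str]] = {}


def register(name: str):
    """Decorator that plugs a new interpreter into the registry."""
    def decorator(fn: Callable[[str], str]) -> Callable[[str], str]:
        INTERPRETERS[name] = fn
        return fn
    return decorator


@register("upper")
def upper_interpreter(body: str) -> str:
    # Toy interpreter: upper-cases the paragraph body.
    return body.upper()


@register("wc")
def word_count_interpreter(body: str) -> str:
    # Toy interpreter: counts whitespace-separated words.
    return str(len(body.split()))


def run_paragraph(text: str) -> str:
    """Dispatch a paragraph of the form '%name body' to its interpreter."""
    if not text.startswith("%"):
        raise ValueError("paragraph must start with %<interpreter>")
    parts = text[1:].split(None, 1)
    if not parts:
        raise ValueError("missing interpreter name")
    name = parts[0]
    body = parts[1] if len(parts) > 1 else ""
    # An unregistered name raises KeyError.
    return INTERPRETERS[name](body)


if __name__ == "__main__":
    print(run_paragraph("%upper hello zeppelin"))
    print(run_paragraph("%wc one two three"))
```

In this sketch, supporting a new language or backend means writing one function and registering it; the dispatch code does not change. That is the property that makes a pluggable design easy to extend.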

The Road Ahead


The Apache Zeppelin community has been working on Project Helium, which aims to seed growth in all kinds of visualizations. It follows the model established by pluggable interpreters and aims to make adding a new visualization simple and intuitive. With pluggable visualization, adding support for map-based visualization becomes easy, and it will be added to Zeppelin later this year.


One of the most requested features among Zeppelin users is full support for sharing and collaboration. Data scientists and business analysts often collaborate on their work, so they should be able to read notebooks stored on a Git server and write their notebooks back to Git.

Multi-User Support

Multi-user support in Zeppelin was another highly requested feature, and it has multiple facets. The most basic is that a notebook should execute as an authenticated end user; we have added this feature to Zeppelin. Another facet is user-specific dependency management, which we plan to add.

Zeppelin is closely tied to Apache Spark. The Spark community is close to releasing Spark 2.0, and Zeppelin will start supporting it very shortly.

Notebook Organization

Another commonly requested feature is notebook organization, and the community is working to provide it.

Data Preparation

As Zeppelin’s adoption grows, so does its use in enterprises. The data scientist or data analyst workflow often starts with importing data sets, and a significant portion of time is spent improving the quality of a data set before it is used in analysis or machine learning. We plan to make data set import easier and to provide basic features for checking and validating data quality.


It takes a community to create a compelling Apache project. We truly believe in the ASF’s motto of community over code. Developers and supporters from NFLabs, Twitter, Hortonworks, MapR, Pivotal, and IBM, among many others, are now working together to deliver new features and fix issues in Apache Zeppelin. We are very thankful to this community and look forward to growing it and making Apache Zeppelin the best notebook there is.


Published at DZone with permission of Vinay Shukla, DZone MVB. See the original article here.

