Data Visualization Using Apache Zeppelin
Apache Zeppelin — an open-source data analytics and visualization platform — helps us analyze the data to gain insight and to improve and enhance business decisions.
In today's world, data is being generated at an exponential rate — so much so that analysts are predicting our global data creation to increase 10x by 2025. Businesses are now collecting data across every internal system and external source that impacts their company, and with it comes an ever-growing need to analyze the data to gain insight into how it can be used to improve and enhance their business decisions. Apache Zeppelin — an open-source data analytics and visualization platform — can take us a long way toward meeting that goal.
In this article, you'll learn how to add custom interpreters for MongoDB and MySQL and how to use them to query and visualize your data. First, let's start off with an overview of Apache Zeppelin and its feature set:
What Is Apache Zeppelin?
Apache Zeppelin is an open-source, web-based "notebook" that enables interactive data analytics and collaborative documents. The notebook is integrated with distributed, general-purpose data processing systems such as Apache Spark (large-scale data processing), Apache Flink (stream processing framework), and many others. Apache Zeppelin allows you to make beautiful, data-driven, interactive documents with SQL, Scala, R, or Python right in your browser.
Apache Zeppelin has an interactive interface that allows you to instantly see the results of your analytics and have an immediate connection with your creation:
Integrate with many different open-source big data tools such as Apache projects Spark, Flink, Hive, Ignite, Lens, and Tajo.
Create notebooks that run in your browser (both on your machine and remotely) and experiment with different types of charts to explore your data sets:
Dynamically create input forms right in your notebook.
Collaboration and Sharing
A diverse and vibrant developer community gives you access to new data sources that are being constantly added and distributed through their open-source Apache 2.0 license.
The Apache Zeppelin interpreter concept allows any language or data-processing backend to be plugged into Zeppelin. Currently, Apache Zeppelin supports many interpreters, such as Apache Spark, Python, JDBC, Markdown, and Shell.
Now, let's get started creating your custom interpreter for MongoDB and MySQL.
Add a MySQL Interpreter
In the Apache Zeppelin platform, go to the drop-down menu in the top-right and click on Interpreter:
Here's where you can find a list of all interpreters. We need to create a new one for MySQL, so click on the Create button in the upper right-hand corner:
Keep all the default options, but enter the required details and make sure that a connection to your MySQL server is established:
We also need to add the MySQL connector JAR as a custom artifact so Zeppelin knows where to load the driver from. Download the connector, place it in the interpreter/jdbc folder, and then provide the exact path to the artifact:
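As a rough sketch of the details the form asks for, the Python snippet below assembles the usual connection properties. The property names (default.driver, default.url, default.user, default.password) follow Zeppelin's generic JDBC interpreter. The host, port, database name, and credentials are placeholders assumed for illustration, not values from this article.

```python
def mysql_interpreter_properties(host, port, database, user, password):
    """Return JDBC interpreter properties for a MySQL connection.

    Every argument is a placeholder supplied by the caller.
    """
    return {
        # Driver class shipped with MySQL Connector/J 8.x; older 5.x
        # connectors use com.mysql.jdbc.Driver instead.
        "default.driver": "com.mysql.cj.jdbc.Driver",
        "default.url": f"jdbc:mysql://{host}:{port}/{database}",
        "default.user": user,
        "default.password": password,
    }


# Hypothetical values, for illustration only.
props = mysql_interpreter_properties(
    "localhost", 3306, "wordpress", "zeppelin", "secret"
)
for key, value in props.items():
    print(f"{key} = {value}")
```

Each key and value pair corresponds to one row in the Properties table of the interpreter form. The driver class has to match the version of the connector JAR you registered as an artifact.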
And that's it! To test our interpreter, we need to create a new note. But first, let's set up our MongoDB interpreter, as well.
Go back to your Interpreter page and click the Create button. We're going to use this open source MongoDB interpreter, so you'll next need to download the .zip file and rename it so that it has a .jar extension. After that, go to interpreter/, create a mongodb/ folder, and paste the .jar into the folder.
You'll now have a new Interpreter group called mongodb. Go to your Interpreter page, enter a friendly name such as mongodb, and then choose mongodb under the Interpreter group dropdown.
Now, let's enter the details of our newly created ScaleGrid MongoDB cluster in Properties. You can find these details in the Overview/Machines section of the Cluster Details page.
And we're done! Now it is time to test out our newly created interpreters.
Create a Zeppelin Note
To run queries that will help visualize our data, we need to create notes. From the Zeppelin header pane, click Notebook, and then Create a new note:
Make sure the notebook header shows a connected status as denoted by a green dot in the top-right corner:
When creating a note, you'll be presented with a dialog to enter more information. Choose the default interpreter as our newly created mysql and click Create Note.
Before we can run any queries, we also need to specify the type of interpreter we'll be using for our note. We can do that by starting our note with %mysql. This will tell Zeppelin to expect MySQL queries in that note.
And now, we're ready to query our database. For the purpose of this example, I'll use my WordPress installation that contains a typical wp_options table to query and visualize its data.
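To make the shape of such a note concrete, here is a minimal Python sketch. It builds the paragraph text (the %mysql directive followed by a query) and, so that it can run without a MySQL server, executes the same query against an in-memory SQLite stand-in for wp_options. The query and the sample rows are invented for illustration. Only the table name and its standard WordPress columns come from a typical installation.

```python
import sqlite3

# Text you would type into the Zeppelin paragraph: the interpreter
# directive on the first line, then the SQL statement.
QUERY = (
    "SELECT autoload, COUNT(*) AS option_count "
    "FROM wp_options GROUP BY autoload ORDER BY autoload"
)
PARAGRAPH = "%mysql\n" + QUERY


def run_on_stand_in(query):
    """Run the query against an in-memory SQLite copy of wp_options."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE wp_options ("
        "option_id INTEGER PRIMARY KEY, option_name TEXT, "
        "option_value TEXT, autoload TEXT)"
    )
    # Made-up sample rows, not data from a real site.
    conn.executemany(
        "INSERT INTO wp_options (option_name, option_value, autoload) "
        "VALUES (?, ?, ?)",
        [
            ("siteurl", "http://example.com", "yes"),
            ("blogname", "Demo", "yes"),
            ("cron", "a:0:{}", "yes"),
            ("_transient_demo", "1", "no"),
        ],
    )
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


print(PARAGRAPH)
print(run_on_stand_in(QUERY))
```

A grouped result like this one, with one label column and one numeric column, is the kind of output that maps cleanly onto Zeppelin's bar and pie charts.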
It works! You can now click on the various charts to visualize the data in different graph formats.
Similarly, for MongoDB, make sure you have data in the MongoDB cluster. You can add some by going to the Admin tab and running Mongo queries.
Here's an example of some MongoDB data in the note:
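Zeppelin draws its charts from tabular results, so document data has to end up as rows and columns before it can be graphed. The sketch below shows one way to flatten documents into the text format that Zeppelin's display system renders as a table: output that starts with %table, with cells separated by tabs and rows by newlines. The helper name and the sample documents are hypothetical, and the MongoDB interpreter you install may do this conversion for you in its own way.

```python
def documents_to_table(documents, fields):
    """Render dict-like documents as Zeppelin %table text.

    Cells are separated by tabs and rows by newlines; a field missing
    from a document becomes an empty cell.
    """
    lines = ["\t".join(fields)]
    for doc in documents:
        lines.append("\t".join(str(doc.get(field, "")) for field in fields))
    return "%table " + "\n".join(lines)


# Made-up documents, for illustration only.
sample = [
    {"city": "Austin", "orders": 12},
    {"city": "Boston", "orders": 7},
    {"city": "Denver"},
]
print(documents_to_table(sample, ["city", "orders"]))
```

This sketch does not escape tabs or newlines inside values, so it only suits simple scalar fields.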
Now that your data is ready for visualization and querying, you may want to show it off to your team. You can do this very easily by creating a shareable link to the note:
This shareable link will be available for anyone to view, and you can also choose to share a link to a specific graph only:
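For reference, the two kinds of links can be sketched as simple URL templates. This assumes the URL layout used by the open-source Zeppelin web UI, where a note lives under /#/notebook/ followed by its ID, and a single paragraph link adds /paragraph/, the paragraph ID, and ?asIframe. The base URL and both IDs below are hypothetical placeholders, and a hosted or customized deployment may use a different scheme.

```python
def note_link(base_url, note_id):
    """Build a link to a whole note (assumed open-source Zeppelin layout)."""
    return f"{base_url.rstrip('/')}/#/notebook/{note_id}"


def paragraph_link(base_url, note_id, paragraph_id):
    """Build a link to one paragraph, suitable for embedding in an iframe."""
    return f"{note_link(base_url, note_id)}/paragraph/{paragraph_id}?asIframe"


# Hypothetical server address and IDs.
BASE = "http://localhost:8080/"
print(note_link(BASE, "2ABCDEFGH"))
print(paragraph_link(BASE, "2ABCDEFGH", "20170101-000000_123456789"))
```

In practice you would copy these links from the Zeppelin interface rather than build them by hand. The sketch only shows how the note ID and paragraph ID fit together.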
Apache Zeppelin is an immensely helpful tool that allows teams to manage and analyze data with many different visualization options, tables, and shareable links for collaboration.
You can also explore other ways to visualize your data through MongoDB GUIs, including the top four: MongoDB Compass, Robomongo, Studio 3T, and MongoBooster.
Published at DZone with permission of Kunal Nagar, DZone MVB. See the original article here.