DZone
Apache Superset in the Production Environment

We take a look at this interesting open-source BI tool from the Apache Software Foundation and show you how to set it up in a Docker container.

By Abhishek Sharma · Dec. 21, 18 · Tutorial

Visualizing data helps build a much deeper understanding of it and speeds up analysis. Several mature paid products are available on the market. Recently, I explored an open-source product named Apache Superset, which I found very promising in this space. Some prominent features of Superset are:

  • A rich set of data visualizations.

  • An easy-to-use interface for exploring and visualizing data.

  • The ability to create and share dashboards.

After reading about Superset, I wanted to try it. Because Superset is a Python project, it can easily be installed with pip, but I decided to run it as a Docker container instead. The Apache Superset GitHub repo contains code for building and running Superset as a container. I wanted to run Superset in a fully distributed manner while changing as little code as possible, so I modified that code so that the same image could run in several different modes.

Below are the specific changes and enhancements I made to the code.

  • Different versions of the Superset image can be built from the same code.

  • Superset configurations can be easily edited and mounted into the container, with no need to rebuild the image.

  • Queries can be executed asynchronously through Celery-based workers and monitored through the Flower UI.
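The configuration file is mounted into the container rather than baked into the image. That means the metadata database and the Celery/Redis wiring for async queries can be set in that one file. Below is a minimal sketch of such a fragment. It assumes the Superset setting names of that era: `SQLALCHEMY_DATABASE_URI` for the metadata database and `CELERY_CONFIG` pointing at a class with uppercase Celery keys. The environment variables `SUPERSET_DB_URL` and `SUPERSET_REDIS_URL`, and the SQLite fallback, are hypothetical and do not come from the docker-superset repository. Async SQL Lab also needs a results backend (`RESULTS_BACKEND`). It is left out here because it requires importing a cache library.

```python
# superset_config.py: minimal sketch of what the mounted config file could hold.
# Assumes Superset 0.28-era setting names; SUPERSET_DB_URL and SUPERSET_REDIS_URL
# are hypothetical environment variables, not part of Superset or the image.
import os

# Metadata database shared by every server and worker container.
SQLALCHEMY_DATABASE_URI = os.environ.get(
    "SUPERSET_DB_URL", "sqlite:////tmp/superset.db"  # hypothetical local fallback
)

# Redis acts as both the Celery broker and the Celery result store.
REDIS_URL = os.environ.get("SUPERSET_REDIS_URL", "redis://localhost:6379/0")


class CeleryConfig:
    # Uppercase Celery setting names, as in Superset's docs of that era;
    # newer releases use Celery's lowercase names instead.
    BROKER_URL = REDIS_URL
    # Workers import SQL Lab's task module so they can run async queries.
    CELERY_IMPORTS = ("superset.sql_lab",)
    CELERY_RESULT_BACKEND = REDIS_URL


# Superset looks up its Celery settings under this name.
CELERY_CONFIG = CeleryConfig
```

Because the file lives on the host, editing it and restarting the containers should be enough to pick up a change, with no image rebuild.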

Exploration Made Easy

Development mode is an excellent choice for exploring a project. It is better still if the initial exploration includes all the features: in Superset's case, running queries in async mode and storing the results in a cache. You can explore Superset this way with the commands below.

First, pull the Superset Docker image from Docker Hub:

docker pull abhioncbr/docker-superset:<tag>

Next, download docker-compose.yml and superset-config.py from the code base, keeping the same directory structure.

Finally, start the Superset image as a container in local or prod mode using docker-compose:

cd docker-files/ && SUPERSET_ENV=<local | prod> SUPERSET_VERSION=<tag> docker-compose up -d
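The pull and start steps can also be scripted. Below is a minimal sketch, not part of the docker-superset repository. The function names are hypothetical, and it makes three assumptions:

- `docker` and `docker-compose` are on the PATH.
- The compose file lives in `docker-files/`, as in the command above.
- The two files from the code base have already been downloaded.

With `dry_run=True` it only prints the commands it would run.

```python
import os
import subprocess

IMAGE = "abhioncbr/docker-superset"
VALID_ENVS = ("local", "prod")


def compose_env(superset_env, tag, base_env=None):
    """Environment for `docker-compose up`, with SUPERSET_ENV and SUPERSET_VERSION set."""
    if superset_env not in VALID_ENVS:
        raise ValueError(f"SUPERSET_ENV must be one of {VALID_ENVS}, got {superset_env!r}")
    if not tag:
        raise ValueError("an image tag is required")
    env = dict(os.environ if base_env is None else base_env)
    env["SUPERSET_ENV"] = superset_env
    env["SUPERSET_VERSION"] = tag
    return env


def start_superset(superset_env, tag, compose_dir="docker-files", dry_run=True):
    """Pull the image, then run `docker-compose up -d` inside compose_dir."""
    env = compose_env(superset_env, tag)
    steps = [
        (["docker", "pull", f"{IMAGE}:{tag}"], None),
        (["docker-compose", "up", "-d"], compose_dir),
    ]
    for cmd, cwd in steps:
        if dry_run:
            print(f"[{cwd or '.'}] {' '.join(cmd)}")
        else:
            # check=True stops at the first command that fails.
            subprocess.run(cmd, cwd=cwd, env=env, check=True)
    return [cmd for cmd, _ in steps]


if __name__ == "__main__":
    # Replace "<tag>" with a real image tag before running for real.
    start_superset("local", "<tag>", dry_run=True)
```

The script validates `SUPERSET_ENV` up front, so a typo fails before any container starts.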

Running Superset in a Completely Distributed Mode

In my view, a production Superset deployment that serves thousands of end users should be distributed and easy to scale as requirements grow. The image below depicts such a setup:


[Image: distributed Superset deployment]

The published Superset Docker image can be used to build the setup shown above:

  • A load balancer in front routes each client request to one of the server containers.

  • Multiple containers in server mode serve the Superset UI. A server container can be started with docker run as follows (the config directory is bind-mounted from the host, which requires an absolute path):

docker run -p 8088:8088 \
-v "$(pwd)/config:/home/superset/config/" \
abhioncbr/docker-superset:<tag> \
cluster server <db_url> <redis_url>

  • Multiple containers in worker mode execute SQL queries asynchronously using the Celery executor. A worker container can be started with docker run as follows:

docker run -p 5555:5555 \
-v "$(pwd)/config:/home/superset/config/" \
abhioncbr/docker-superset:<tag> \
cluster worker <db_url> <redis_url>

  • A centralized Redis container or Redis cluster serves as the cache layer and as the Celery task queue for the workers.

  • A centralized database stores Superset's metadata.
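When several server containers share one host, each needs its own host port so the load balancer has distinct targets. The sketch below generates the matching docker run commands and the list of load-balancer targets. The following are assumptions, not requirements of the docker-superset image:

- the helper names
- the detached `-d` flag
- consecutive host ports starting at 8088
- single-host target addresses
- publishing port 5555 (Flower's default, matching the worker command above) from the first worker only

The connection strings in the example are placeholders. It prints the commands rather than running them.

```python
import shlex

IMAGE = "abhioncbr/docker-superset"


def superset_command(mode, tag, db_url, redis_url, config_dir, ports=()):
    """Build a `docker run` argv for one container in 'server' or 'worker' mode."""
    if mode not in ("server", "worker"):
        raise ValueError(f"unknown mode: {mode!r}")
    cmd = ["docker", "run", "-d"]
    for host_port, container_port in ports:
        cmd += ["-p", f"{host_port}:{container_port}"]
    # Bind-mount the host config directory (absolute path) so edits need no rebuild.
    cmd += ["-v", f"{config_dir}:/home/superset/config/"]
    cmd += [f"{IMAGE}:{tag}", "cluster", mode, db_url, redis_url]
    return cmd


def plan_cluster(tag, db_url, redis_url, config_dir, servers=2, workers=2,
                 host="127.0.0.1", first_server_port=8088):
    """Return (commands, load-balancer targets) for N servers and M workers."""
    commands, targets = [], []
    for i in range(servers):
        port = first_server_port + i
        # Each server publishes the UI port 8088 on a distinct host port.
        commands.append(superset_command("server", tag, db_url, redis_url,
                                         config_dir, ports=[(port, 8088)]))
        targets.append(f"{host}:{port}")
    for i in range(workers):
        # Only the first worker publishes port 5555, avoiding host-port clashes.
        ports = [(5555, 5555)] if i == 0 else []
        commands.append(superset_command("worker", tag, db_url, redis_url,
                                         config_dir, ports=ports))
    return commands, targets


if __name__ == "__main__":
    # Hypothetical connection strings and paths; replace with real ones.
    cmds, lb_targets = plan_cluster(
        tag="<tag>",
        db_url="postgresql://superset:secret@db-host:5432/superset",
        redis_url="redis://redis-host:6379/0",
        config_dir="/srv/superset/config",
        servers=3, workers=2,
    )
    for cmd in cmds:
        print(shlex.join(cmd))
    print("load balancer targets:", ", ".join(lb_targets))
```

Each printed line mirrors the server and worker commands above. The target list is what you would hand to the load balancer in front of the servers, for example as the entries of an nginx `upstream` block. Across multiple hosts, each server can keep port 8088 and the targets would use the host addresses instead.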

I found setting up Superset as a Docker container quite easy, and the same image can be used across different environments.
