DZone
Apache Superset in the Production Environment

We take a look at this interesting open source BI tool from the Apache Foundation, and show you how to set it up in a Docker container.

by Abhishek Sharma · Dec. 21, 18 · Tutorial

Visualizing data builds a much deeper understanding of the data and speeds up analytics around it. Several mature paid products are available on the market. Recently, I explored an open source product named Apache Superset, which I found to be a very promising product in this space. Some prominent features of Superset are:

  • A rich set of data visualizations.

  • An easy-to-use interface for exploring and visualizing data.

  • The ability to create and share dashboards.

After reading about Superset, I wanted to try it. Since Superset is a Python project, it can easily be installed using pip, but I decided to set it up as a Docker container instead. The Apache Superset GitHub repo contains code for building and running Superset as a container. Because I wanted to run Superset in a completely distributed manner, I modified that code, keeping the changes as small as possible, so that it could run in multiple different modes.

Below are the specific changes and enhancements I made to the code:

  • Different versions of the Superset image can be built from the same code.

  • Superset configurations can be easily edited and mounted into the container, with no need to rebuild the image.

  • Queries can be executed asynchronously through Celery-based executors and managed through the Flower UI.

Exploration Made Easy

Development mode is an excellent choice for exploring a project, but it is even better if the initial exploration includes all the features. In the case of Superset, that means running queries in async mode and storing the results in a cache. You can explore Superset this way with the following commands.

First, pull the Superset Docker image from Docker Hub:

docker pull abhioncbr/docker-superset:<tag>

Next, get the docker-compose.yml and superset-config.py files from the code base, keeping the same directory structure.

Lastly, start the Superset image as a container in either local or prod mode using docker-compose:

cd docker-files/ && SUPERSET_ENV=<local | prod> SUPERSET_VERSION=<tag> docker-compose up -d
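The three steps above can be combined into a small script. This is a minimal sketch, not code from the article's repository: it assumes, as the command above implies, that docker-files/docker-compose.yml reads the SUPERSET_ENV and SUPERSET_VERSION variables. The function name superset_up and the DRY_RUN switch are hypothetical; with DRY_RUN=1 the script only prints the commands it would run.

```shell
#!/bin/sh
# Hypothetical wrapper around the "Exploration Made Easy" steps.
# Set DRY_RUN=1 to print the commands instead of executing them.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# superset_up <local|prod> <tag>
superset_up() {
  up_env="$1"
  up_tag="$2"

  # Only the two modes mentioned in the article are accepted.
  case "$up_env" in
    local|prod) ;;
    *)
      echo "SUPERSET_ENV must be 'local' or 'prod', got '$up_env'" >&2
      return 1
      ;;
  esac

  # An empty tag would produce an invalid image reference, so refuse it early.
  if [ -z "$up_tag" ]; then
    echo "a Superset image tag is required" >&2
    return 1
  fi

  run docker pull "abhioncbr/docker-superset:${up_tag}"
  # Same effect as "cd docker-files/ && ... docker-compose up -d": Compose uses
  # the directory of the file passed with -f as the project directory.
  run env SUPERSET_ENV="$up_env" SUPERSET_VERSION="$up_tag" \
    docker-compose -f docker-files/docker-compose.yml up -d
}
```

For example, `DRY_RUN=1 superset_up prod <tag>` prints the two commands, and dropping DRY_RUN runs them. Passing the compose file with -f instead of using cd leaves the caller's working directory unchanged.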

Running Superset in a Completely Distributed Mode

As I understand it, a Superset deployment that serves thousands of end users in production should be distributed in nature, so that it can easily be scaled as requirements grow. The image below depicts such a setup:


[Image: a distributed Superset setup]

The published Superset Docker image can be used to build the setup shown above:

  • A load balancer in front, routing each client request to one of the server containers.

  • Multiple containers in server mode to serve the Superset UI. A server container can be started with docker run as follows:

docker run -p 8088:8088 \
-v config:/home/superset/config/ \
abhioncbr/docker-superset:<tag> \
cluster server <db_url> <redis_url>

  • Multiple containers in worker mode to execute SQL queries asynchronously using the Celery executor. A worker container can be started with docker run as follows:

docker run -p 5555:5555 \
-v config:/home/superset/config/ \
abhioncbr/docker-superset:<tag> \
cluster worker <db_url> <redis_url>

  • A centralized Redis container or Redis cluster that serves as the cache layer and as the Celery task queue for the workers.

  • Use a centralized Superset metadata database.
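Scaling out then amounts to starting several copies of the two docker run commands above. The script below is a hypothetical sketch, not from the original repository. It assumes all containers run on a single Docker host, so each server container gets its own host port starting at 8088 and each worker its own host port starting at 5555 (Flower's default port, presumably why the worker command publishes it); on a multi-host setup each host could keep the default ports. The container names, the -d flag, and the function name start_cluster are illustrative assumptions, and the load balancer, Redis, and metadata database are not started here.

```shell
#!/bin/sh
# Hypothetical launcher for the distributed setup: N server and M worker
# containers on one Docker host, all sharing one metadata DB and one Redis.
# Set DRY_RUN=1 to print the docker commands instead of executing them.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# start_cluster <tag> <db_url> <redis_url> <num_servers> <num_workers>
start_cluster() {
  sc_tag="$1"; sc_db="$2"; sc_redis="$3"
  sc_servers="$4"; sc_workers="$5"
  sc_image="abhioncbr/docker-superset:${sc_tag}"

  # Server containers serve the UI on container port 8088; the host port is
  # incremented so several servers can share one host behind the load balancer.
  i=0
  while [ "$i" -lt "$sc_servers" ]; do
    run docker run -d --name "superset-server-$i" \
      -p "$((8088 + i)):8088" \
      -v config:/home/superset/config/ \
      "$sc_image" cluster server "$sc_db" "$sc_redis"
    i=$((i + 1))
  done

  # Worker containers execute async SQL queries through Celery; each one
  # publishes container port 5555 on its own host port.
  i=0
  while [ "$i" -lt "$sc_workers" ]; do
    run docker run -d --name "superset-worker-$i" \
      -p "$((5555 + i)):5555" \
      -v config:/home/superset/config/ \
      "$sc_image" cluster worker "$sc_db" "$sc_redis"
    i=$((i + 1))
  done
}
```

For example, `DRY_RUN=1 start_cluster <tag> <db_url> <redis_url> 2 3` prints the commands for two servers and three workers; the load balancer would then route to host ports 8088 and 8089.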

I found setting up Superset as a Docker container quite easy, and the same image can be used for different environments.
