Testing Streamlit Apps Using SeleniumBase
Streamlit makes it easy to rapidly create apps in Python. But before pushing to production, write tests first! Here's how, using SeleniumBase and GitHub Actions.
In my time working at Streamlit, I’ve seen hundreds of impressive data apps, ranging from computer vision applications to public health tracking of COVID-19 and even simple children’s games. I believe the growing popularity of Streamlit comes from its fast, iterative workflows: the Streamlit “magic” functionality and auto-reloading of the front-end upon saving your Python script. Write some code, hit ‘Save’ in your editor, and then visually inspect the correctness of each code change. And with the unveiling of Streamlit sharing for easy deployment of Streamlit apps, you can go from idea to coding to deploying your app in just minutes!
Once you've created a Streamlit app, you can use automated testing to future-proof it against regressions. This post will show how to programmatically validate that a Streamlit app is unchanged visually using the Python package SeleniumBase.
Case Study: Streamlit-folium
To demonstrate how to create automated visual tests, I’m going to use the streamlit-folium GitHub repo, a Streamlit Component I created for Folium, the Python wrapper for the Leaflet.js mapping library. Visual regression tests help detect when the layout or content of an app changes, without requiring the developer to manually inspect the output each time a line of code changes in their Python library. Visual regression tests also help with your Streamlit apps' cross-browser compatibility and provide advance warning about new browser versions affecting how your app is displayed.

Setting Up A Test Harness
The streamlit-folium test harness has three files:
- tests/requirements.txt: the Python packages needed only for testing
- tests/app_to_test.py: the reference Streamlit app to test
- tests/test_package.py: the tests that demonstrate the package works as intended
The first step is to create a Streamlit app using the package to be tested and set the baseline. We can then use SeleniumBase to validate that the app's structure and visual appearance remain unchanged relative to the baseline.
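For orientation, here is a minimal sketch of what a reference app like app_to_test.py could look like (hypothetical code, not the repo's exact app; it assumes the folium_static helper from streamlit-folium):

import folium
import streamlit as st
from streamlit_folium import folium_static

# H1 heading; a later assert_text check looks for this exact string
st.title("streamlit-folium")

# render a simple Folium map through the component under test
m = folium.Map(location=[39.949610, -75.150282], zoom_start=16)
folium_static(m)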
This post focuses on test_package.py, since it’s the file that covers how to use SeleniumBase and OpenCV for Streamlit testing.
Defining Test Success
There are several ways to think about what it means for an app to “look the same” for testing purposes. I chose the following three principles for testing my streamlit-folium package:
- The Document Object Model (DOM) structure (but not necessarily the values) of the page should remain the same
- For specific values such as headings, the text should be exactly equal
- Visually, the app should look the same
I decided on these less strict definitions of “unchanged” for testing streamlit-folium because the internals of the Folium package itself appear to be non-deterministic: the same Python code will create the same-looking image, but the generated HTML will be different each time.
Testing Using SeleniumBase
SeleniumBase is an all-in-one framework written in Python that wraps the Selenium WebDriver project for browser automation. SeleniumBase has two functions we can use for the first and second testing principles listed above: check_window, which tests the DOM structure, and assert_text, which ensures a specific piece of text is shown on the page.
To check the DOM structure, we first need to generate a baseline using the check_window function. check_window has two behaviors, based on its required name argument:
- If a folder <name> within the visual_baseline/<Python file>.<test function name> path does not exist, this folder will be created with all of the baseline files
- If the folder does exist, then SeleniumBase will compare the current page against the baseline at the specified accuracy level
You can see an example of calling check_window and the resulting baseline files in the streamlit-folium repo. To keep the baseline constant between runs, I committed these files to the repo; if I were to make any substantive changes to the app I am testing (app_to_test.py), I would need to remember to set a new baseline, or the tests would fail.
With the baseline folder now present, calling check_window runs the comparison test. I chose to run the test at Level 2, with the level definitions as follows:
- Level 1 (least strict): HTML tags are compared to tags_level1.txt
- Level 2: HTML tags and attribute names are compared to tags_level2.txt
- Level 3 (most strict): HTML tags, attribute names, and attribute values are compared to tags_level3.txt
As mentioned in the “Defining Test Success” section, I run the check_window function at Level 2 because the Folium library adds a GUID-like id value to the HTML attribute values, so the tests will never pass at Level 3: the attribute values are always different between runs.
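Putting this together, the Level 2 DOM check is a single call inside the test (a sketch of the relevant lines; the name "first_test" matches the baseline folder path that appears in the screenshot test later in this post):

from seleniumbase import BaseCase

class ComponentsTest(BaseCase):
    def test_basic(self):
        self.open("http://localhost:8501")
        # first run: creates the visual_baseline/test_package.test_basic/first_test/ folder
        # subsequent runs: compare HTML tags and attribute names (Level 2) to that baseline
        self.check_window(name="first_test", level=2)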
For the second test principle (“check certain values are equal”), the assert_text method is straightforward to run:
self.assert_text("streamlit-folium")
This function checks that the exact text “streamlit-folium” is present in the app, and the test passes because it’s the value of the H1 heading in this example.
Testing Using OpenCV
While checking the DOM structure and the presence of a piece of text provides some useful information, my true acceptance criterion is that the app's visual appearance doesn’t change from the baseline. To test that the app is visually the same down to the pixel, we can use the save_screenshot method from SeleniumBase to capture the current visual state of the app, then compare it to the baseline using the OpenCV package:
from seleniumbase import BaseCase
import cv2
import time

class ComponentsTest(BaseCase):
    def test_basic(self):
        # open the app and take a screenshot
        self.open("http://localhost:8501")
        time.sleep(10)  # give leaflet time to load from web
        self.save_screenshot("current-screenshot.png")

        # test that the screenshots look exactly the same
        original = cv2.imread(
            "visual_baseline/test_package.test_basic/first_test/screenshot.png"
        )
        duplicate = cv2.imread("current-screenshot.png")

        assert original.shape == duplicate.shape

        difference = cv2.subtract(original, duplicate)
        b, g, r = cv2.split(difference)
        assert cv2.countNonZero(b) == cv2.countNonZero(g) == cv2.countNonZero(r) == 0
Using OpenCV, the first step is to read in the baseline image and the current snapshot, then check that the sizes of the two pictures are identical (the shape comparison checks that the NumPy ndarrays of pixels have the same dimensions). Assuming the pictures are the same size, we can then use OpenCV's subtract function, which calculates the per-element difference between pixels by channel (blue, green, and red). If all three channels have no differences, then we know that the visual representation of the Streamlit app is identical between runs.
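As an aside, since cv2.imread returns NumPy arrays, the same exact-equality check could be collapsed into a single assertion (an equivalent variant of the channel-by-channel comparison above, not what the repo uses):

import numpy as np

# True only if both arrays have the same shape and identical pixel values
assert np.array_equal(original, duplicate)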
Automating Tests Using GitHub Actions
With our SeleniumBase and OpenCV code set up, we can now feel free to make changes to our Streamlit Component (or other Streamlit apps) without worrying about breaking them unintentionally. In my single-contributor project, it’s easy to enforce running the tests locally, but with tools such as GitHub Actions available for free for open-source projects, setting up a Continuous Integration pipeline guarantees the tests are run for each commit.
The streamlit-folium repo has a workflow, run_tests_each_PR.yml, defined that does the following:
- Sets up a test matrix for Python 3.6, 3.7, and 3.8
- Installs the package dependencies and test dependencies
- Lints the code with flake8
- Installs chromedriver with seleniumbase
- Runs the Streamlit app to test in the background
- Runs the SeleniumBase and OpenCV tests in Python
# This workflow will install Python dependencies, run tests and lint with a variety of Python versions
# For more information see: https://help.github.com/actions/language-and-framework-guides/using-python-with-github-actions

name: Run tests each PR

on:
  push:
    branches: [master]
  pull_request:
    branches: [master]

jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: [3.6, 3.7, 3.8]

    steps:
      - uses: actions/checkout@v2
      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v2
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install flake8 pytest
          if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
          if [ -f tests/requirements.txt ]; then pip install -r tests/requirements.txt; fi
      - name: Lint with flake8
        run: |
          # stop the build if there are Python syntax errors or undefined names
          flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
          # exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
          flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
      - name: Install chromedriver
        run: |
          seleniumbase install chromedriver latest
      - name: Start Streamlit app
        run: |
          streamlit run tests/app_to_test.py &
      - name: Test with pytest
        run: |
          pytest
By having this workflow defined in your repo and required status checks enabled on GitHub, every pull request will now have the following status check appended to the bottom of the PR, letting you know the status of your changes:

Writing Tests Saves Work In The Long Run
Having tests in your codebase has numerous benefits. As explained above, automating visual regression tests allows you to maintain an app without having a human in the loop looking for changes. Writing tests is also a great signal to potential users that you care about your project's stability and long-term maintainability. Not only is it easy to write tests for a Streamlit app and have them run automatically on each GitHub commit, but the extra work of adding tests to your Streamlit project will save you time in the long run.