Testing Streamlit Apps Using SeleniumBase
Streamlit makes it easy to rapidly create apps in Python. But before pushing to production, write tests first! Here's how to do it with SeleniumBase and GitHub Actions.
Once you've created a Streamlit app, you can use automated testing to future-proof it against regressions. This post will show how to programmatically validate that a Streamlit app is unchanged visually using the Python package SeleniumBase.
Case Study: Streamlit-folium
To demonstrate how to create automated visual tests, I’m going to use the streamlit-folium GitHub repo, a Streamlit Component I created for Folium, the Python library built on leaflet.js. Visual regression tests help detect when the layout or content of an app changes, without requiring the developer to visually inspect the output each time a line of code changes in their Python library. Visual regression tests also help with your Streamlit apps' cross-browser compatibility and provide advance warning when new browser versions affect how your app is displayed.
Setting Up A Test Harness
The streamlit-folium test harness has three files:
tests/requirements.txt: the Python packages only needed for testing
tests/app_to_test.py: the reference Streamlit app to test
tests/test_package.py: the tests to demonstrate the package works as intended
The first step is to create a Streamlit app using the package to be tested and set the baseline. We can then use SeleniumBase to validate that the app's structure and visual appearance remain unchanged relative to the baseline.
This post focuses on describing test_package.py, since it’s the file that covers how to use SeleniumBase and OpenCV for Streamlit testing.
Defining Test Success
There are several ways to think about what constitutes looking the same in terms of testing. I chose the following three principles for testing my streamlit-folium package:
- The Document Object Model (DOM) structure (but not necessarily the values) of the page should remain the same
- For values such as headings, the text should be exactly equal
- Visually, the app should look the same
I decided to take these less strict definitions of “unchanged” for testing streamlit-folium because the internals of the Folium package itself appear to be non-deterministic: the same Python code will create a visually identical image, but the generated HTML will differ between runs.
Testing Using SeleniumBase
SeleniumBase is an all-in-one framework written in Python that wraps the Selenium WebDriver project for browser automation. SeleniumBase has two functions that we can use for the first and second testing principles listed above: check_window, which tests the DOM structure, and assert_text, which ensures a specific piece of text is shown on the page.
To check the DOM structure, we first need to generate a baseline using the check_window function. check_window has two behaviors, based on the name argument passed to it:
- If a folder <name> within the visual_baseline/<Python file>.<test function name> path does not exist, this folder will be created with all of the baseline files
- If the folder does exist, then SeleniumBase will compare the current page against the baseline at the specified accuracy level
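The create-or-compare behavior described above is a common pattern for snapshot tests. The following is a minimal sketch of that pattern using only the standard library; it is not SeleniumBase's implementation, and the function name check_against_baseline and the single snapshot.txt file are hypothetical choices made for illustration.

```python
from pathlib import Path


def check_against_baseline(baseline_dir: Path, name: str, current: str) -> bool:
    """Hypothetical create-or-compare helper (not part of SeleniumBase).

    First run: the <name> folder does not exist, so it is created and the
    current output is saved as the baseline. Later runs: the current output
    is compared with the saved baseline.
    """
    folder = baseline_dir / name
    snapshot = folder / "snapshot.txt"
    if not folder.exists():
        # No baseline yet: record the current output and treat the run as a pass
        folder.mkdir(parents=True)
        snapshot.write_text(current, encoding="utf-8")
        return True
    # Baseline exists: the test passes only if nothing has changed
    return snapshot.read_text(encoding="utf-8") == current
```

As with the SeleniumBase baseline files, the saved folder would be committed so every run compares against the same reference, and regenerated after an intentional change to the app.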
You can see an example of calling check_window and the resulting baseline files in the streamlit-folium repo. To keep the baseline constant between runs, I committed these files to the repo; if I were to make any substantive changes to the app I am testing (app_to_test.py), I would need to remember to set a new baseline, or the tests would fail.
With the baseline folder now present, running check_window runs the comparison test. I chose to run the test at Level 2, with the level definitions as follows:
- Level 1 (least strict): HTML tags are compared to tags_level1.txt
- Level 2: HTML tags and attribute names are compared to tags_level2.txt
- Level 3 (most strict): HTML tags, attribute names, and attribute values are compared to tags_level3.txt
As mentioned in the “Defining Test Success” section, I run the check_window function at Level 2 because the Folium library adds a GUID-like id value to HTML attributes; since those attribute values differ between runs, the tests would never pass at Level 3.
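To see why Level 2 tolerates changing id values while Level 3 does not, here is a small illustration built on Python's html.parser module. It only mimics the idea behind the three levels; it is not how SeleniumBase produces its tags_level*.txt files, and the dom_signature helper and the sample HTML (with made-up ids) are hypothetical.

```python
from html.parser import HTMLParser
from typing import List


class _TagCollector(HTMLParser):
    """Record each start tag at one of three levels of detail."""

    def __init__(self, level: int) -> None:
        super().__init__()
        self.level = level
        self.items: List[str] = []

    def handle_starttag(self, tag, attrs):
        if self.level == 1:
            # Level 1: tag names only
            self.items.append(tag)
        elif self.level == 2:
            # Level 2: tag names plus attribute names
            names = ",".join(name for name, _ in attrs)
            self.items.append(f"{tag}[{names}]")
        else:
            # Level 3: tag names, attribute names, and attribute values
            pairs = ",".join(f"{name}={value}" for name, value in attrs)
            self.items.append(f"{tag}[{pairs}]")


def dom_signature(html: str, level: int) -> List[str]:
    collector = _TagCollector(level)
    collector.feed(html)
    collector.close()
    return collector.items


# Two "runs" that differ only in a generated id value
run1 = '<div id="map_3f2a"><h1>streamlit-folium</h1></div>'
run2 = '<div id="map_9c1b"><h1>streamlit-folium</h1></div>'
print(dom_signature(run1, 2) == dom_signature(run2, 2))  # True
print(dom_signature(run1, 3) == dom_signature(run2, 3))  # False
```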
For the second test principle (“check certain values are equal”), the assert_text method is straightforward to run: calling self.assert_text("streamlit-folium") inside the test checks that the exact text “streamlit-folium” is present in the app, and the test passes because that is the value of the H1 heading in this example.
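The same "exact value" idea can be tried without a browser. This hypothetical sketch collects the text of every h1 element in an HTML string so it can be compared for exact equality; unlike SeleniumBase's assert_text, which runs against the live page, it only inspects static HTML, and the names H1TextCollector and h1_texts are made up for illustration.

```python
from html.parser import HTMLParser
from typing import List


class H1TextCollector(HTMLParser):
    """Collect the text content of every <h1> element (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()
        self._inside_h1 = False
        self._parts: List[str] = []
        self.headings: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self._inside_h1 = True
            self._parts = []

    def handle_endtag(self, tag):
        if tag == "h1" and self._inside_h1:
            # Join fragments (nested tags split the text) and trim whitespace
            self.headings.append("".join(self._parts).strip())
            self._inside_h1 = False

    def handle_data(self, data):
        if self._inside_h1:
            self._parts.append(data)


def h1_texts(html: str) -> List[str]:
    collector = H1TextCollector()
    collector.feed(html)
    collector.close()
    return collector.headings


page = '<div id="map_1"><h1>streamlit-folium</h1><p>A map</p></div>'
print(h1_texts(page) == ["streamlit-folium"])  # True
```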
Testing Using OpenCV
While checking the DOM structure and the presence of a piece of text provides some useful information, my true acceptance criterion is that the app's visual appearance doesn’t change from the baseline. To test that the app is visually the same down to the pixel, we can use SeleniumBase's save_screenshot method to capture the current visual state of the app and then compare it to the baseline using the OpenCV package.
Using OpenCV, the first step is to read in the baseline image and the current snapshot, then check that the two images are the same size (the shape comparison checks that the NumPy ndarrays of pixels have the same dimensions). Assuming both images are the same size, we can then use OpenCV's subtract function, which calculates the per-element difference between pixels for each channel (blue, green, and red). If all three channels have no differences, then we know that the visual representation of the Streamlit app is identical between runs. Note that for 8-bit images subtract clips negative results to zero, so a pixel that became brighter shows no difference in one direction; subtracting in both directions, or using absdiff, catches changes either way.
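The comparison steps above can be sketched with NumPy alone, since cv2.imread returns images as NumPy arrays in B, G, R channel order. This is a minimal sketch rather than the code in the streamlit-folium repo: the function name images_identical is hypothetical, it assumes 3-channel color images, and it takes an absolute difference on a widened integer type so that changes in either direction are counted.

```python
import numpy as np


def images_identical(baseline: np.ndarray, current: np.ndarray) -> bool:
    """Return True only if two H x W x 3 images match pixel for pixel.

    Hypothetical helper; in a real test the arrays would come from
    cv2.imread(), which loads images in B, G, R channel order.
    """
    # Step 1: different (height, width, channels) shapes can never match
    if baseline.shape != current.shape:
        return False
    # Step 2: per-element absolute difference; int16 avoids uint8 clipping/wrap-around
    diff = np.abs(baseline.astype(np.int16) - current.astype(np.int16))
    # Step 3: count differing values in each channel (B, G, R)
    per_channel = [np.count_nonzero(diff[..., c]) for c in range(diff.shape[-1])]
    return all(count == 0 for count in per_channel)


# Usage: a single brighter red value in the "current" screenshot
baseline = np.zeros((2, 2, 3), dtype=np.uint8)
current = baseline.copy()
current[0, 0, 2] = 255
print(images_identical(baseline, current))  # False

# A one-directional subtraction clipped at zero (what cv2.subtract does for uint8)
clipped = np.clip(baseline.astype(np.int16) - current.astype(np.int16), 0, 255)
print(clipped.any())  # False: the change is invisible in this direction
```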
Automating Tests Using GitHub Actions
With our SeleniumBase and OpenCV code set up, we can now feel free to make changes to our Streamlit Component (or other Streamlit apps) without worrying about breaking it unintentionally. In my single-contributor project, it’s easy to enforce running the tests locally, but with tools such as GitHub Actions available for free for open-source projects, setting up a Continuous Integration pipeline guarantees the tests are run for each commit.
The streamlit-folium repo has a workflow, run_tests_each_PR.yml, that does the following:
- Sets up a test matrix for Python 3.6, 3.7, and 3.8
- Installs the package dependencies and test dependencies
- Lints the code with flake8
- Installs Chrome with seleniumbase
- Runs the Streamlit app under test in the background
- Runs the SeleniumBase and OpenCV tests in Python
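One practical detail of running the app in the background is that the tests should not start until the Streamlit server is accepting connections. A minimal, hypothetical helper for that is sketched below with the standard library; the function name, the polling interval, and running it between a background `streamlit run tests/app_to_test.py` and the test step are assumptions, not something taken from run_tests_each_PR.yml. Streamlit serves on port 8501 by default.

```python
import socket
import time


def wait_for_port(port: int, host: str = "localhost", timeout: float = 30.0) -> bool:
    """Poll until host:port accepts TCP connections or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connection means the server is listening
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            # Not listening yet (e.g. connection refused); retry shortly
            time.sleep(0.2)
    return False


# Usage in a CI step, after starting the app in the background:
# if not wait_for_port(8501):
#     raise SystemExit("Streamlit app did not start in time")
```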
With this workflow defined in your repo and required status checks enabled on GitHub, every pull request will show a status check at the bottom, letting you know whether your changes pass the tests.
Writing Tests Saves Work In The Long Run
Having tests in your codebase has numerous benefits. As explained above, automating visual regression tests allows you to maintain an app without having a human in the loop looking for changes. Writing tests is also a great signal to potential users that you care about your project's stability and long-term maintainability. Not only is it easy to write tests for a Streamlit app and have them run automatically on each GitHub commit, but the extra work of adding tests to your Streamlit project will also save you time in the long run.
Published at DZone with permission of Randy Zwitch. See the original article here.