Charting COVID-19 Data With Python
Visualize characteristics and trends of the COVID-19 pandemic in the United States during 2020 using the integration between Python and ArcGIS Platform.
Charting provides a powerful way to visualize and explore your data by helping to uncover patterns, trends, relationships, and structures that might not be apparent when looking at a table or map. The COVID-19 pandemic has created voluminous streams of data for scientists, researchers, and decision-makers to visualize, analyze, and understand through a variety of data analysis packages and tools.
This blog walks through visualizing characteristics and trends of the COVID-19 pandemic in the United States during 2020 using the integration between Python and ArcGIS Platform.
Preparing the Data
To get started, I’ll load and prepare the data using pandas, but you can use whatever Python tools you prefer. I’m acquiring the data from the New York Times COVID-19 data repository (publicly accessible here), and I’m filtering the data to include only dates from the complete year of 2020.
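The loading code is not reproduced here, so the following is only a minimal sketch of how the preparation step could look in pandas. It assumes the raw file has `date`, `state`, and cumulative `cases` columns, and that the `cases_new` field used later in this post is the per-state daily difference of that cumulative count. The `prepare_covid` helper and the sample rows are hypothetical.

```python
import pandas as pd


def prepare_covid(raw: pd.DataFrame) -> pd.DataFrame:
    """Return one row per state and date for 2020, with a daily `cases_new` column."""
    df = raw.copy()
    df["date"] = pd.to_datetime(df["date"])
    df = df.sort_values(["state", "date"])

    # Assumption: `cases` is cumulative, so new cases are the day-over-day
    # difference within each state. The first reported day has no previous
    # row, so its cumulative value is treated as that day's new cases.
    new_cases = df.groupby("state")["cases"].diff().fillna(df["cases"])
    df["cases_new"] = new_cases.astype(int)

    # Keep only dates that fall in calendar year 2020.
    df = df[df["date"].dt.year == 2020]
    return df[["date", "state", "cases_new"]].reset_index(drop=True)


# Made-up sample rows, only to show the shape of the input and output.
sample = pd.DataFrame({
    "date": ["2020-12-30", "2020-12-31", "2021-01-01", "2020-12-30", "2020-12-31"],
    "state": ["Ohio", "Ohio", "Ohio", "Utah", "Utah"],
    "cases": [10, 15, 21, 3, 4],
})

prepared = prepare_covid(sample)
print(prepared)
```

With the real data, `raw` would come from `pd.read_csv` pointed at the states file in the New York Times repository.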
Here’s a quick look at the prepared dataset. Notice that there is an individual row for each date and state combination. These rows will be summarized and aggregated when I visualize this data with charts.
To create an ArcPy chart, you must convert your data to a supported format such as a Layer or Table object, a dataset path, or a feature service URL. For this demo, I’ll save the pandas DataFrame as an in-memory table using the ArcGIS API for Python.
Visualizing the Data
Now that I’ve prepared the data with the proper fields and saved it in a supported format, I can explore it.
First, I’ll start simple and create a bar chart showing the total COVID cases for each state. I do this by initializing an ArcPy Chart object and configuring its properties. As illustrated in the table above, the dataset contains one row for each state and date combination, so here I use a sum aggregation to calculate the total cases_new values for each state. Also take note of the dataSource property, which specifies the dataset you want to visualize. Here I’m configuring the chart to use the in-memory table I created above.
The chart above is a good first attempt, but it’s very difficult to read because of its small size. I’ll make the chart larger by setting the chart object’s displaySize property. I’ll also arrange the bars in a more logical way by sorting them in descending order, from most cases to fewest.
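To make the sum aggregation and descending sort concrete, here is a rough pandas equivalent of what the bar chart computes. This is not the ArcPy chart code itself, only a sketch of the same arithmetic, and the sample rows are made up.

```python
import pandas as pd

# Hypothetical rows in the same shape as the prepared dataset:
# one row per state and date combination.
df = pd.DataFrame({
    "date": pd.to_datetime(["2020-12-30", "2020-12-31"] * 3),
    "state": ["Ohio", "Ohio", "Utah", "Utah", "Texas", "Texas"],
    "cases_new": [10, 5, 3, 1, 20, 22],
})

# Sum aggregation: collapse the per-date rows into one total per state,
# then order the states from most cases to fewest.
totals = (
    df.groupby("state")["cases_new"]
    .sum()
    .sort_values(ascending=False)
)

print(totals)
```

Each value in `totals` corresponds to the height of one bar, and the index order matches the left-to-right order of the sorted chart.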
Now I’ll take a look at new cases per day for the entire United States by creating a bar chart with a date field on the X axis and the total aggregated daily COVID cases on the Y axis.
The chart above helps show the trajectory of daily COVID cases in the US, but noise in the dataset makes it difficult to interpret. As time progresses, the bars form many peaks and valleys, a cyclical pattern that is most likely due to inconsistent reporting of COVID cases. To combat this noise, I can re-create the same chart with a moving average line. Moving averages smooth out the noise in a temporal dataset and highlight the general pattern of the data.
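The smoothing idea can be sketched in pandas as follows. The 7-day trailing window is an assumption chosen to match a weekly reporting cycle, and it is not necessarily the period or alignment the chart's moving average uses. The sample values are made up.

```python
import pandas as pd

dates = pd.date_range("2020-03-01", periods=14, freq="D")

# Made-up daily counts with a dip on two days of each week, mimicking
# the kind of reporting noise described above.
ohio = [10, 12, 11, 13, 12, 4, 3, 17, 19, 18, 20, 19, 6, 5]
utah = [2] * 14

rows = pd.DataFrame({
    "date": list(dates) * 2,
    "state": ["Ohio"] * 14 + ["Utah"] * 14,
    "cases_new": ohio + utah,
})

# Aggregate across states to get one national total per day.
daily_total = rows.groupby("date")["cases_new"].sum()

# Trailing 7-day moving average: each value is the mean of that day and
# the six days before it. The first six days have no full window (NaN).
moving_avg = daily_total.rolling(window=7).mean()

print(pd.DataFrame({"daily_total": daily_total, "moving_avg": moving_avg}))
```

The bars correspond to `daily_total` and the overlaid line to `moving_avg`, which varies far less from day to day than the raw counts.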
I can also view aggregated COVID cases over time from a slightly different perspective by creating a calendar heat chart. This chart aggregates daily cases and displays them in a calendar grid. The calendar heat chart is effective at showing a per-day summary of temporal data, particularly when the values are unevenly distributed, because the color of each cell is determined by a graduated natural breaks scheme.
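As a rough illustration of the calendar grid, the sketch below arranges daily totals into months (rows) by days of the month (columns) with pandas. That layout is an assumption for illustration, the values are made up, and the natural breaks color classification is not reproduced.

```python
import pandas as pd

# Made-up daily totals for the first 60 days of 2020 (January and February).
df = pd.DataFrame({
    "date": pd.date_range("2020-01-01", periods=60, freq="D"),
    "cases_new": range(1, 61),
})

# Split each date into the two grid coordinates.
df["month"] = df["date"].dt.month
df["day"] = df["date"].dt.day

# One cell per calendar day. Days that do not exist in a month
# (for example February 30) are left as NaN.
grid = df.pivot_table(index="month", columns="day",
                      values="cases_new", aggfunc="sum")

print(grid)
```

A heat chart then shades each cell of `grid` by its value, so unusually high or low days stand out by position in the calendar.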
Having visualized the daily COVID cases aggregated for the entire country, I may also be interested in comparing daily cases between states. To do this, I’ll create a line chart and split the data by the state field. This creates a separate line for each state.
Above, you can see that line charts become messy and difficult to interpret when many series are displayed (such charts are sometimes referred to pejoratively as spaghetti plots). I can display this data in a clearer way by creating a matrix heat chart. Matrix heat charts are used to visualize relationships between categorical or date fields with a grid of shaded cells. Here I want to view each state on the Row axis and each day on the Column axis, and I’ll use the cases_new field to determine the intensity of the cell shading.
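The grid behind a matrix heat chart can be sketched with a pandas pivot: states become rows, days become columns, and each cell holds the `cases_new` value that drives the shading. This is an illustration of the data layout only, not the ArcPy chart code, and the sample rows are made up.

```python
import pandas as pd

# Hypothetical rows in the same shape as the prepared dataset.
df = pd.DataFrame({
    "date": pd.to_datetime(["2020-12-30", "2020-12-31"] * 3),
    "state": ["Ohio", "Ohio", "Utah", "Utah", "Texas", "Texas"],
    "cases_new": [10, 5, 3, 1, 20, 22],
})

# Rows = states, columns = days, cell value = summed new cases.
matrix = df.pivot_table(index="state", columns="date",
                        values="cases_new", aggfunc="sum")

print(matrix)
```

Because every state has its own row, a spike in one state shows up as a run of dark cells without hiding the others.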
You can see that this chart allows for an easier comparison of daily COVID cases between states because each state is displayed as a separate row, whereas the line chart forces all states to compete for the same space.
Building charts with Python and ArcGIS lets you visually explore the patterns in your data with just a few lines of code. To learn more, dig into the Pro Charts and ArcPy Chart documentation, which covers all the supported chart types and how to configure them to suit your visualization needs.
Published at DZone with permission of David Cardella. See the original article here.