Charting COVID-19 Data With Python
Visualize characteristics and trends of the COVID-19 pandemic in the United States during 2020 using the integration between Python and ArcGIS Platform.
Charting provides a powerful way to visualize and explore your data by helping to uncover patterns, trends, relationships, and structures that might not be apparent when looking at a table or map. The COVID-19 pandemic has created voluminous streams of data for scientists, researchers, and decision-makers to visualize, analyze, and understand through a variety of data analysis packages and tools.
This blog walks through visualizing characteristics and trends of the COVID-19 pandemic in the United States during 2020 using the integration between Python and ArcGIS Platform.
Preparing the Data
To get started, I’ll load and prepare the data using pandas, but you can use whatever Python tools you prefer. I’m acquiring the data from the New York Times COVID-19 data repository (publicly accessible on GitHub), and I’m filtering the data to include only dates from the complete year of 2020.
import pandas as pd
from arcgis.features import GeoAccessor
import arcpy
arcpy.env.workspace = 'memory'
DATA_URL = 'https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-states.csv'
# load data with pandas, create new fields, and filter
daily_df = (
    pd.read_csv(DATA_URL, parse_dates=['date'])
    .sort_values(['state', 'date'])
    .rename(columns={
        'cases': 'cases_total',
        'deaths': 'deaths_total'
    })
    .assign(
        cases_new = lambda df: df.groupby('state')['cases_total'].diff().clip(lower=0),
        deaths_new = lambda df: df.groupby('state')['deaths_total'].diff().clip(lower=0)
    )
    .query("'2020-01-01' <= date <= '2020-12-31'")
    .reset_index(drop=True)
)
Here’s a quick look at the prepared dataset. Notice that there is an individual row for each date and state combination. These rows will be summarized and aggregated when I visualize this data with charts.
date | state | fips | cases_total | deaths_total | cases_new | deaths_new |
---|---|---|---|---|---|---|
2020-03-13 | Alabama | 1 | 6 | 0 | NaN | NaN |
2020-03-14 | Alabama | 1 | 12 | 0 | 6.0 | 0.0 |
2020-03-15 | Alabama | 1 | 23 | 0 | 11.0 | 0.0 |
2020-03-16 | Alabama | 1 | 29 | 0 | 6.0 | 0.0 |
2020-03-17 | Alabama | 1 | 39 | 0 | 10.0 | 0.0 |
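The NaN in the first Alabama row comes from diff(), which has no earlier row within that state to subtract from. As a minimal sketch of what the diff() and clip() steps do, here is the same logic on a small made-up table (the states "A" and "B" and all counts are hypothetical, not NYT figures):

```python
import pandas as pd

# Hypothetical cumulative counts for two states (not real NYT figures).
toy = pd.DataFrame({
    "state": ["A", "A", "A", "B", "B", "B"],
    "date": pd.to_datetime(["2020-03-01", "2020-03-02", "2020-03-03"] * 2),
    "cases_total": [5, 9, 8, 0, 2, 7],  # state A's total is revised down on day 3
})

toy = toy.sort_values(["state", "date"]).assign(
    # diff() within each state turns running totals into day-over-day changes;
    # clip(lower=0) replaces negative changes (downward revisions) with 0.
    cases_new=lambda df: df.groupby("state")["cases_total"].diff().clip(lower=0)
)

print(toy)
```

Each state's first row stays NaN, and the downward revision in state A shows up as 0 new cases instead of -1.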
To create an ArcPy chart, you must convert your data to a supported format such as a Layer or Table object, a dataset path, or a feature service URL. For this demo, I’ll save the pandas DataFrame as an in-memory table using the ArcGIS API for Python.
daily_df.spatial.to_table('covid_daily')
Visualizing the Data
Now that I’ve prepared the data with the proper fields and saved it in a supported format, I can explore it.
First, I’ll start simple and create a bar chart showing the total COVID cases for each state. I do this by initializing an ArcPy Chart object and configuring the properties. As illustrated in the table above, the dataset contains one row for each state and date combination, so here I use a sum aggregation to calculate the total cases_new values for each state. Also take note of the dataSource property, which is used to specify the dataset you want to visualize. Here I’m configuring the chart to use the in-memory table I created above.
c = arcpy.Chart('bar_covid_by_state')
c.type = 'bar'
c.title = "Total COVID Cases by State"
c.xAxis.field = 'state'
c.xAxis.title = "State"
c.yAxis.field = 'cases_new'
c.yAxis.title = "Cases"
c.bar.aggregation = 'sum'
c.dataSource = 'memory/covid_daily'
c.exportToSVG('bar_covid_by_state.svg')
The chart above is a good first attempt, but it’s very difficult to read due to the small size. I’ll make the chart larger by setting the chart object’s displaySize property. I’ll also arrange the bars in a more logical way by sorting them in descending order from most cases to fewest cases.
c = arcpy.Chart('bar_covid_by_state_desc')
c.type = 'bar'
c.title = "Total COVID Cases by State"
c.xAxis.field = 'state'
c.xAxis.title = "State"
c.yAxis.field = 'cases_new'
c.yAxis.title = "Cases"
c.yAxis.sort = 'DESC'
c.bar.aggregation = 'sum'
c.displaySize = 900, 400
c.dataSource = 'memory/covid_daily'
c.exportToSVG('bar_covid_by_state_desc.svg')
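If you want to sanity-check the numbers behind the sorted bar chart, a rough pandas equivalent of "sum per state, sorted descending" is sketched below. The rows are made up, and I'm assuming the chart's sum aggregation skips nulls the way pandas does; I haven't confirmed how ArcPy treats them.

```python
import pandas as pd

# Hypothetical daily rows shaped like the prepared table (values are made up).
rows = pd.DataFrame({
    "state": ["Alabama", "Alabama", "Alaska", "Alaska", "Alaska", "Arizona"],
    "cases_new": [6.0, 11.0, float("nan"), 3.0, 4.0, 20.0],
})

# One value per state: the sum of its daily rows, largest first.
# pandas skips NaN by default, so the first-day NaN left by diff()
# does not turn a state's total into NaN.
totals_by_state = (
    rows.groupby("state")["cases_new"]
    .sum()
    .sort_values(ascending=False)
)

print(totals_by_state)
```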
Now I’ll take a look at new cases per day for the entire United States by creating a bar chart with a date field on the X axis and the total aggregated daily COVID cases on the Y axis.
c = arcpy.Chart('bar_covid_daily')
c.type = 'bar'
c.title = "Total COVID Cases by Day"
c.xAxis.field = 'date'
c.xAxis.title = "Day"
c.yAxis.field = 'cases_new'
c.yAxis.title = "New Cases"
c.bar.aggregation = 'sum'
c.color = ['#fac9c7']
c.displaySize = 800, 500
c.dataSource = 'memory/covid_daily'
c.exportToSVG('bar_covid_daily.svg')
The chart above is helpful for understanding the trajectory of daily COVID cases in the US, but it is difficult to interpret because the data is noisy. As time progresses, the bars form many peaks and valleys, and this cyclical pattern is most likely due to inconsistent reporting of COVID cases. To reduce this noise, I’ll re-create the same chart, but this time I’ll include a moving average line. Moving averages are useful for smoothing out the noise in a temporal dataset and highlighting the general pattern of the data.
c = arcpy.Chart('bar_covid_daily_moving_avg')
c.type = 'bar'
c.title = "Total COVID Cases by Day"
c.xAxis.field = 'date'
c.xAxis.title = "Day"
c.yAxis.field = 'cases_new'
c.yAxis.title = "New Cases"
c.bar.aggregation = 'sum'
c.bar.showMovingAverage = True
c.color = ['#fac9c7']
c.displaySize = 900, 500
c.dataSource = 'memory/covid_daily'
c.exportToSVG('bar_covid_daily_moving_avg.svg')
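To see why a moving average flattens a cyclical reporting pattern, here is a small pandas sketch on made-up daily totals that dip every weekend. The 7-day trailing window is my assumption, chosen to span one weekly cycle; I don't know which window the chart's moving average line uses.

```python
import pandas as pd

# Hypothetical national daily totals with a repeating weekly dip (made-up numbers).
dates = pd.date_range("2020-11-01", periods=14, freq="D")
daily = pd.Series([100, 120, 130, 125, 135, 90, 60] * 2, index=dates, name="cases_new")

# Trailing 7-day mean. The first six days lack a full window and stay NaN.
# Assumption: a 7-day window, so each average covers exactly one weekly cycle.
smoothed = daily.rolling(window=7).mean()

print(smoothed.dropna())
```

Because every full window here contains one complete weekly cycle, the smoothed values are all identical: the weekly peaks and valleys cancel out and only the underlying level remains.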
I can also view aggregated COVID cases over time from a slightly different perspective by creating a calendar heat chart. This chart aggregates daily cases and displays them in a calendar grid. The calendar heat chart is effective at showing a per-day summary of temporal data, particularly when the values are unevenly distributed, because the color of each cell is determined by a graduated natural breaks scheme.
c = arcpy.Chart('chc_covid_daily')
c.type = 'calendarHeatChart'
c.title = "Total COVID Cases by Day"
c.xAxis.field = 'date'
c.xAxis.title = "Day"
c.yAxis.field = 'cases_new'
c.yAxis.title = "Month"
c.calendarHeatChart.aggregation = 'sum'
c.displaySize = 900, 500
c.dataSource = 'memory/covid_daily'
c.exportToSVG('chc_covid_daily.svg')
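The grid behind a chart like this can be approximated in pandas by pivoting the daily totals. The sketch below assumes a layout of one row per month and one column per day of the month, which matches the axis titles used above but is my guess at the layout, not something taken from the ArcPy documentation. The values are made up.

```python
import pandas as pd

# Hypothetical national daily totals spanning a month boundary (made-up values).
dates = pd.date_range("2020-03-30", "2020-04-02", freq="D")
daily = pd.DataFrame({"date": dates, "cases_new": [10, 20, 30, 40]})

# Derive the two calendar coordinates, then pivot into a grid.
# Assumed layout: rows = month number, columns = day of month.
daily = daily.assign(month=daily["date"].dt.month, day=daily["date"].dt.day)
calendar_grid = daily.pivot_table(
    index="month", columns="day", values="cases_new", aggfunc="sum"
)

print(calendar_grid)
```

Cells with no matching date (for example, March 1 in this toy data) come out as NaN, which a heat chart would typically leave unshaded.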
Having visualized the daily COVID cases aggregated for the entire country, I may also be interested in comparing daily cases between states. To do this, I’ll create a line chart and split the data by the state field. This creates a separate line for each state.
c = arcpy.Chart('line_covid_daily_by_state')
c.type = 'line'
c.title = "Total Cases by Day"
c.xAxis.field = 'date'
c.xAxis.title = "Day"
c.yAxis.field = 'cases_new'
c.yAxis.title = "New Cases"
c.yAxis.minimum = 0
c.line.aggregation = 'sum'
c.line.splitCategory = 'state'
c.line.timeIntervalSize = 1
c.line.timeIntervalUnits = 'DAYS'
c.displaySize = 900, 500
c.dataSource = 'memory/covid_daily'
c.exportToSVG('line_covid_daily_by_state.svg')
Above, you can see that line charts become messy and difficult to interpret when many series are displayed (such charts are sometimes referred to pejoratively as spaghetti plots). I can display this data in a clearer way by creating a matrix heat chart. Matrix heat charts are used to visualize relationships between categorical or date fields with a grid of shaded cells. Here I want to view each state on the Row axis and each day on the Column axis, and I’ll use the cases_new field to determine the intensity of the cell shading.
c = arcpy.Chart('mhc_covid_by_state')
c.type = 'matrixHeatChart'
c.title = "Daily COVID Cases by State"
c.xAxis.field = 'date'
c.xAxis.title = 'Day'
c.yAxis.field = ['state', 'cases_new']
c.yAxis.title = 'State'
c.matrixHeatChart.aggregation = 'sum'
c.matrixHeatChart.classificationMethod = 'naturalBreaks'
c.matrixHeatChart.classCount = 7
c.matrixHeatChart.nullPolicy = 'zero'
c.legend.title = "Number of Cases"
c.displaySize = 800, 1200
c.dataSource = 'memory/covid_daily'
c.exportToSVG('mhc_covid_by_state.svg')
You can see that this chart allows for an easier comparison of daily COVID cases between states because each state is displayed as a separate row, whereas the line chart forces all states to compete for the same space.
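For a rough idea of the grid a matrix heat chart shades, the long table can be pivoted in pandas into one row per state and one column per day. This is only a sketch on made-up rows; I'm reading nullPolicy = 'zero' as "treat empty cells as zero" and mimicking that with fill_value=0, which is my interpretation rather than documented behavior.

```python
import pandas as pd

# Hypothetical long-format rows (made-up values). Utah has no row for 2020-03-01.
long_df = pd.DataFrame({
    "date": pd.to_datetime(["2020-03-01", "2020-03-02", "2020-03-02"]),
    "state": ["Ohio", "Ohio", "Utah"],
    "cases_new": [3.0, 5.0, 2.0],
})

# Rows = state, columns = day, cells = summed new cases.
# fill_value=0 fills state/day combinations that have no data with zero.
matrix = long_df.pivot_table(
    index="state", columns="date", values="cases_new",
    aggfunc="sum", fill_value=0,
)

print(matrix)
```

Transposing this grid (matrix.T) gives one column per state indexed by date, which is the shape the split line chart draws as separate series.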
Building charts using Python and ArcGIS allows you to visually explore the patterns found in data with just a few simple lines of code. If you're interested in learning more, the Pro Charts and ArcPy Chart documentation covers all the supported chart types and how you can configure them to suit your visualization needs.
Published at DZone with permission of David Cardella. See the original article here.