Over a million developers have joined DZone.

Introducing the PWFSLSmoke Package

DZone's Guide to

Introducing the PWFSLSmoke Package

The new PWFSLSmoke package in R from Mazama Science can assist you in monitoring air quality generated by wildfire smoke.

· Big Data Zone
Free Resource

Effortlessly power IoT, predictive analytics, and machine learning applications with an elastic, resilient data infrastructure. Learn how with Mesosphere DC/OS.

Mazama Science has just released the PWFSLSmoke package. Source code is available on GitHub. Here is the package description.

Utilities for working with air quality monitoring data with a focus on small particulates (PM2.5) generated by wildfire smoke. Functions are provided for downloading available data from the United States Environmental Protection Agency (US EPA) and it’s AirNow air quality site. Additional sources of PM2.5 data made accessible by the package include AIRSIS (password protected), the Western Regional Climate Center (WRCC), and the open-source site OpenAQ.

In this post, we discuss the reasons for creating this package and provide examples of its use.

For several years now, Mazama Science has been working with the AirFire team at The USFS Pacific Wildland Fire Sciences Lab. The PWFSLSmoke R package is being developed for PWFSL to help modelers and scientists more easily work with PM2.5 data from monitoring locations across North America.

The package makes it easier to obtain data, perform analyses, and generate reports. It includes functionality to:

  • Download and easily work with regulatory PM2.5 data from AirNow.
  • Download and quality control raw monitoring data from other sources.
  • Convert between UTC and local time zones.
  • Apply various algorithms to the data (nowcast, rolling means, aggregation, etc.).
  • Provide interactive timeseries and maps through RStudio’s Viewer pane.
  • Create a variety of publication-ready maps and time series plots.

Data Model

The PWFSLSmoke package is designed to provide a compact, full-featured suite of utilities for working with PM 2.5 data used to monitor wildfire smoke. A uniform data model provides consistent data access across monitoring data available from different agencies. The core data model in this package is defined by the ws_monitor object used to store data associated with groups of individual monitors.

To work efficiently with the package it is important to understand the structure of this data object and which functions operate on it. Package functions that begin with monitor_, expect objects of class ws_monitor as their first argument. (Note that ‘ws_’ stands for ‘wildfire smoke’.)

Monitoring data will typically be obtained from an agency charged with archiving data acquired at monitoring sites. For wildfire smoke, the primary pollutant is PM 2.5 and the sites archiving this data include AirNowWRCC and AIRSIS.

The data model for monitoring data consists of an R list with two dataframes: data and meta.

The data dataframe contains all hourly measurements organized with rows (the ‘unlimited’ dimension) as unique timesteps and columns as unique monitors. The very first column is always named datetime and contains the POSIXct datetime in Coordinated Universal Time (UTC).

The meta dataframe contains all metadata associated with monitoring sites and is organized with rows as unique sites and columns as site attributes. The following columns are guaranteed to exist in the metadataframe:

  • monitorID – unique ID for each monitoring site (i.e., each instrument deployment).
  • longitude – decimal degrees East.
  • latitude – decimal degrees North.
  • elevation – meters above sea level.
  • timezone – Olson timezone.
  • countryCode – ISO 3166-1 alpha-2 code.
  • stateCode – ISO 3166-2 alpha-2 code.

(The MazamaSpatialUtils package is used to assign time zones and state and country codes.)

Additional columns may be available in the meta data frame and these will depend on the source of the data.

It is important to note that the monitorID acts as a unique key that connects data with metadata. The monitorID is used for column names in the data data frame and for row names in the meta dataframe. So the following will always be true:

rownames(ws_monitor$meta) == ws_monitor$meta$monitorID
colnames(ws_monitor$data) == c('datetime',ws_monitor$meta$monitorID)

File Size Reduction

Most providers of monitoring data store the data as it comes in, retaining station metadata with every single datum. However, individual monitors rarely move. Thus, replication of metadata vastly inflates the size of data in this “as-it-comes-in” data management strategy. The ws_monitor data model reflects the reality of monitors that rarely move.

Our Big Data presentation describes how we used this data model to convert unwieldy EPA annual datasets into bite sized .RData files without losing any information. In one example we reduced a 665 Mb ASCII CSV file into a 3.3 Mb .RData file — 1/200th the size of the original.

Reduced file sizes make working with data from multiple monitors much easier and the remainder of this post demonstrates the package capabilities by looking at smoke generated from the 2015 wildfire season in the Pacific Northwest

Pacific Northwest 2015 Wildfires

In the summer of 2015 Washington state had several catastrophic wildfires that led to heavy smoke in eastern Washington and northern Idaho for quite a few days. We will show how the mapping and time series plotting functions in the PWFSLSmoke package can help us visualize the spatial and temporal extent of wildfire smoke during these events.

To begin, let’s have a broader look at AirNow ambient monitoring data for the Pacific Northwest — Washington, Oregon, and Idaho — from June 1 through October 31, 2015. First, we create a 24-hour rolling mean for each monitor:

PacNW <- Northwest_Megafires # To work with AirNow data directly, uncomment the next two lines 
#N_M <- airnow_load(startdate=20150531, enddate=20151101) 
#PacNW <- monitor_subset(airnow, stateCodes=c("WA", "OR", "ID")) 
PacNW_24 <- monitor_rollingMean(PacNW, width=24)

Now we can create a map where each monitor is color coded by the maximum value of this 24-hr rolling mean. By default, AQI colors and labels are used but arguments to monitorMap() and addAQILegend() allow users to specify their own.

monitorMap(PacNW_24, slice=max) 
addAQILegend(title="Max AQI", cex=0.7)

The map shows that many areas of the Pacific NW had days with unhealthy air but a cluster of sites in Idaho were particularly bad.

(Note that this is not the regulatory midnight-to-midnight AQI but a continuous 24-hr rolling mean.)

We can us an interactive “leaflet” map to zoom in and get more information:

# Commented out for the blog post #monitorLeaflet(PacNW_24, slice=max)

The monitorMap() and monitorLeaflet() plots show us pretty much the same information, except that the leaflet plot allows you to get monitor-specific metadata by clicking on a monitor. In this manner, we can assemble a list of monitorIDs in and around the Nez Perce Reservation in northern Idaho and generate a multi-monitor time series plot showing the terrible smoke in late August.

NezPerceIDs <- c("160571012","160690012","160690013","160690014","160490003","160491012") 
NezPerce <- monitor_subset(PacNW, monitorIDs=NezPerceIDs) 
monitorPlot_timeseries(NezPerce, style='gnats')

At this point, it is clear that August is the month of interest so we’ll subset all of our existing ws_monitor objects to cover the month of August with full days according to West Coast time.

PacNW <- monitor_subset(PacNW, tlim=c(20150801,20150831), 
PacNW_24 <- monitor_subset(PacNW_24, tlim=c(20150801,20150831), 
NezPerce <- monitor_subset(NezPerce, tlim=c(20150801,20150831), 

We can use the monitorPlot_dailyBarplot() function to look at official, midnight-to-midnight AQI levels for each monitor during the month of August:

layout(matrix(seq(6))) par(mar=c(1,1,1,1)) for (monitorID in NezPerceIDs) {   siteName <- NezPerce$meta[monitorID,'siteName']   monitorPlot_dailyBarplot(NezPerce, monitorID=monitorID, main=siteName, axes=FALSE)  } par(mar=c(5,4,4,2)+.1) layout(1)

We could also take a more automated approach and directly calculate the location with the worst acute smoke (worst hourly value) during this time period:

data <- PacNW$data[,-1] # omit 'datetime' column
maxPM25 <- apply(data, 2, max, na.rm=TRUE)
worstAcute <- names(sort(maxPM25, decreasing=TRUE))[1:6]
intersect(worstAcute, NezPerceIDs)
## [1] "160571012" "160490003" "160491012"
##                      siteName countyName stateCode
## 160571012 Juliaetta E-sampler      LATAH        ID

We see that three of the sites we identified “by hand” also had the worst acute smoke of any sites in the Pacific NW. The site with the highest measured value of PM 2.5 for the month of august was Julietta in Latah county, Idaho.

Let’s do a similar analysis for chronic smoke. In this case, we will calculate the number of days at or above the AQI “unhealthy” level:

PacNW_dailyAvg <- monitor_dailyStatistic(PacNW, FUN=mean, minHours=20)
## Warning in monitor_dailyStatistic(PacNW, FUN = mean, minHours = 20): Found 
## 2 timezones. Only the first will be used 
data <- PacNW_dailyAvg$data[,-1] 
unhealthyDays <- apply(data, 2, function(x){ sum(x >= AQI$breaks_24[4], na.rm=TRUE) }) 
worstChronic <- names(sort(unhealthyDays, decreasing=TRUE))[1:6] 
intersect(worstChronic, NezPerceIDs) 
## [1] "160490003" "160571012" "160690014" 
##            siteName countyName stateCode ## 160490003 Kamiah-ID      IDAHO        ID

The area around the Nez Perce Reservation again had three of the worst sites for chronic smoke with the worst being Kamiah in Idaho county, Idaho.

Latah and Idaho counties were unfortunately downwind of several extraordinarily large Washington state wildfires in 2015 with the following ignition dates:

  • August 11: Kettle Complex.
  • August 13: Grizzly Bear Complex.
  • August 13: North Star.
  • August 14: Chelan Complex.
  • August 15: Okanogan Complex.

A little googling and we can obtain a set of coordinates for these fires to use in a new “Google” map of the August daily average maxima with fire icons at fire locations:

fireLons <- c(-118.461,-117.679,-120.039,-119.002,-119.662) 
fireLats <- c(48.756,46.11,47.814,48.338,48.519) 
gmap <- monitorGoogleMap(PacNW_dailyAvg, zoom=7, centerLon=-118, centerLat=47, 
                         slice=max) addIcon('redFlame', fireLons, fireLats, 
                                            map=gmap, expansion=0.2) 
addAQILegend(cex=0.8) title("August, 2015", line=-1.5, cex.main=2)

From this map, we can see that the monitor in Omak on the Colleville reservation also registered a daily AQI level of “extreme” as it was in-between the two largest fires. The overall spatial pattern, however, shows that the worst impact from these large fires resulted from smoke drifting SE with the prevailing winds and bunching up in the valleys and plains just upwind of the Bitterroot mountains of northern Idaho.

We will now take a closer look at two tribal monitors: Omak for the Colleville tribe and Kamiah for the Nez Perce.

  Omak <- monitor_subset(PacNW, monitorIDs="530470013")
  Kamiah <- monitor_subset(PacNW, monitorIDs="160490003") 
  monitorPlot_dailyBarplot(Omak, main="August 2015 Daiy AQI -- Omak, WA",                          
                           labels_x_nudge=0.8, labels_y_nudge=250) 
  monitorPlot_dailyBarplot(Kamiah, main="August Daily AQI -- Kamiah, ID",                          
                           labels_x_nudge=0.8, labels_y_nudge=250) layout(1)

We can also examine the diurnal cycle during the very worst days:

monitorPlot_timeOfDaySpaghetti(Omak, title="Aug 23-26 Diurnal Smoke -- Omak, WA",
                               xlab='', ylab='',
monitorPlot_timeOfDaySpaghetti(Kamiah, title="Aug 23-26 Diurnal Smoke -- Kamiah, ID",
                               xlab='', ylab='',

While both sites had horrible air all day, in Omak, it was somewhat less horrible in the evenings. This is in contrast to Kamiah where the least horrible conditions were encountered in the mornings.

This ends the spatiotemporal exploration of smoke from Pacific NW mega-fire in 2015. We hope this inspires you to harness the mapping plotting functionality available in the PWFSLSmoke package.

Learn to design and build better data-rich applications with this free eBook from O’Reilly. Brought to you by Mesosphere DC/OS.

big data ,data science ,geospatial

Published at DZone with permission of Jonathan Callahan, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.


Dev Resources & Solutions Straight to Your Inbox

Thanks for subscribing!

Awesome! Check your inbox to verify your email so you can start receiving the latest in tech news and resources.


{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}