Over a million developers have joined DZone.

Visualization of probabilistic forecasts

DZone's Guide to

Visualization of probabilistic forecasts

· Big Data Zone ·
Free Resource

The open source HPCC Systems platform is a proven, easy to use solution for managing data at scale. Visit our Easy Guide to learn more about this completely free platform, test drive some code in the online Playground, and get started today.

This week my research group dis­cussed Adrian Raftery’s recent paper on “Use and Com­mu­ni­ca­tion of Prob­a­bilis­tic Fore­casts” which pro­vides a fas­ci­nat­ing but brief sur­vey of some of his work on mod­el­ling and com­mu­ni­cat­ing uncer­tain futures. Coin­ci­den­tally, today I was also sent a copy of David Spiegelhalter’s paper on “Visu­al­iz­ing Uncer­tainty About the Future”. Both are well-​​worth reading.

It made me think about my own efforts to com­mu­ni­cate future uncer­tainty through graph­ics. Of course, for time series fore­casts I nor­mally show pre­dic­tion inter­vals. I pre­fer to use more than one inter­val at a time because it helps con­vey a lit­tle more infor­ma­tion. The default in the fore­cast pack­age for R is to show both an 80% and a 95% inter­val like this:

It is some­times prefer­able to use a 50% and a 95% inter­val, rather like a box­plot:

In some cir­cles (espe­cially macro­eco­nomic fore­cast­ing), fan charts are pop­u­lar:

Per­son­ally, I don’t like these at all as they lose any spe­cific prob­a­bilis­tic inter­pretabil­ity. What does the darker shaded region actu­ally refer to? At least in the pre­vi­ous ver­sion, it is clear that the dark region con­tains 50% of the probability.

The above three exam­ples are eas­ily pro­duced using the fore­cast package:

1. fit <- ets(hsales)
2. plot(forecast(fit),include=120)
3. plot(forecast(fit,level=c(50,95)),include=120)
4. plot(forecast(fit,fan=TRUE),include=120)

For multi-​​modal dis­tri­b­u­tions I like to use high­est den­sity regions. Here is an exam­ple applied to Nicholson’s blowfly data using a thresh­old model:

The dark region has 50% cov­er­age and the light region has 95% cov­er­age. The fore­cast dis­tri­b­u­tions become bimodal after the first ten iter­a­tions, and so the 50% region is split in two to show that. This graphic was taken from a J. Fore­cast­ing paper I wrote in 1996, so these ideas have been around for a while!

It is easy enough to pro­duce fore­cast HDR with time series. Here is some R code to do it:

01. #HDR for time series object
02. # Assumes that forecasts can be computed and futures simulated from object
03. forecasthdr <- function(object, h = ifelse(object$m > 1, 2 * object$m, 10),
04.                         nsim=2000, plot=TRUE, level=c(50,95), xlim=NULL, ylim=NULL, ...)
05. {
06.   require(hdrcde)
07.   # Compute forecasts
08.   fc <- forecast(object)
09.   ft <- time(fc$mean)
11.   # Simulate future sample paths
12.   sim <- matrix(0,nrow=h,ncol=nsim)
13.   for(i in 1:nsim)
14.     sim[,i] <- simulate(object, nsim=h)
16.   # Compute HDRs
17.   nl <- length(level)
18.   hd <- array(0, c(h,nl,10))
19.   mode <- numeric(h)
20.   for(k in 1:h)
21.   {
22.     hdtmp <- hdr(sim[k,], prob=level)
23.     hd[k,,1:ncol(hdtmp$hdr)] <- hdtmp$hdr
24.     mode[k] <- hdtmp$mode
25.   }
26.   # Remove unnecessary sections of HDRs
27.   nz <- apply(abs(hd),3,sum) > 0
28.   hd <- hd[,,nz]
29.   dimnames(hd)[[1]] <- 1:h
30.   dimnames(hd)[[2]] <- level
33.   # Produce plot if required
34.   if(plot)
35.   {
36.     if(is.<span>null</span>(xlim))
37.       xlim <- range(time(object$x),ft)
38.     if(is.<span>null</span>(ylim))
39.       ylim <- range(object$x, hd)
40.     plot(object$x,xlim=xlim, ylim=ylim, ...)
41.     # Show HDRs
42.     cols <- rev(colorspace::sequential_hcl(52))[level - 49]
43.     for(k in 1:h)
44.     {
45.       for(j in 1:nl)
46.       {
47.         hdtmp <- hd[k,j,]
48.         nint <- length(hdtmp)/2
49.         for(l in 1:nint)
50.         {
51.           polygon(ft[k]+c(-1,-1,1,1)/object$m/2,
52.                   c(hdtmp[2*l-1],hdtmp[2*l],hdtmp[2*l],hdtmp[2*l-1]),
53.                 col=cols[j], border=FALSE)
54.         }
55.       }
56.       points(ft[k], mode[k], pch=19, col="blue",cex=0.8)
57.     }
58.     #lines(fc$mean,col='blue',lwd=2)
59.   }
61.   # Return HDRs
62.   return(list(hdr=hd,mode=mode,level=level))
63. }

We can apply it using the exam­ple I started with:

1. z <- forecasthdr(fit,xlim=c(1986,1998),nsim=5000,
2.               xlab="Year",ylab="US monthly housing sales")

The dots are modes of the fore­cast dis­tri­b­u­tions, and the 50% and 95% high­est den­sity regions are also shown. In this case, the dis­tri­b­u­tions are uni­modal, and so all the regions are inter­vals.

Managing data at scale doesn’t have to be hard. Find out how the completely free, open source HPCC Systems platform makes it easier to update, easier to program, easier to integrate data, and easier to manage clusters. Download and get started today.


Published at DZone with permission of

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}