Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Visualization of probabilistic forecasts

DZone's Guide to

Visualization of probabilistic forecasts

· Big Data Zone
Free Resource

Need to build an application around your data? Learn more about dataflow programming for rapid development and greater creativity. 

This week my research group dis­cussed Adrian Raftery’s recent paper on “Use and Com­mu­ni­ca­tion of Prob­a­bilis­tic Fore­casts” which pro­vides a fas­ci­nat­ing but brief sur­vey of some of his work on mod­el­ling and com­mu­ni­cat­ing uncer­tain futures. Coin­ci­den­tally, today I was also sent a copy of David Spiegelhalter’s paper on “Visu­al­iz­ing Uncer­tainty About the Future”. Both are well-​​worth reading.

It made me think about my own efforts to com­mu­ni­cate future uncer­tainty through graph­ics. Of course, for time series fore­casts I nor­mally show pre­dic­tion inter­vals. I pre­fer to use more than one inter­val at a time because it helps con­vey a lit­tle more infor­ma­tion. The default in the fore­cast pack­age for R is to show both an 80% and a 95% inter­val like this:

It is some­times prefer­able to use a 50% and a 95% inter­val, rather like a box­plot:

In some cir­cles (espe­cially macro­eco­nomic fore­cast­ing), fan charts are pop­u­lar:

Per­son­ally, I don’t like these at all as they lose any spe­cific prob­a­bilis­tic inter­pretabil­ity. What does the darker shaded region actu­ally refer to? At least in the pre­vi­ous ver­sion, it is clear that the dark region con­tains 50% of the probability.

The above three exam­ples are eas­ily pro­duced using the fore­cast package:

1. fit <- ets(hsales)
2. plot(forecast(fit),include=120)
3. plot(forecast(fit,level=c(50,95)),include=120)
4. plot(forecast(fit,fan=TRUE),include=120)

For multi-​​modal dis­tri­b­u­tions I like to use high­est den­sity regions. Here is an exam­ple applied to Nicholson’s blowfly data using a thresh­old model:



The dark region has 50% cov­er­age and the light region has 95% cov­er­age. The fore­cast dis­tri­b­u­tions become bimodal after the first ten iter­a­tions, and so the 50% region is split in two to show that. This graphic was taken from a J. Fore­cast­ing paper I wrote in 1996, so these ideas have been around for a while!

It is easy enough to pro­duce fore­cast HDR with time series. Here is some R code to do it:

01. #HDR for time series object
02. # Assumes that forecasts can be computed and futures simulated from object
03. forecasthdr <- function(object, h = ifelse(object$m > 1, 2 * object$m, 10),
04.                         nsim=2000, plot=TRUE, level=c(50,95), xlim=NULL, ylim=NULL, ...)
05. {
06.   require(hdrcde)
07.   # Compute forecasts
08.   fc <- forecast(object)
09.   ft <- time(fc$mean)
10.   
11.   # Simulate future sample paths
12.   sim <- matrix(0,nrow=h,ncol=nsim)
13.   for(i in 1:nsim)
14.     sim[,i] <- simulate(object, nsim=h)
15.   
16.   # Compute HDRs
17.   nl <- length(level)
18.   hd <- array(0, c(h,nl,10))
19.   mode <- numeric(h)
20.   for(k in 1:h)
21.   {
22.     hdtmp <- hdr(sim[k,], prob=level)
23.     hd[k,,1:ncol(hdtmp$hdr)] <- hdtmp$hdr
24.     mode[k] <- hdtmp$mode
25.   }
26.   # Remove unnecessary sections of HDRs
27.   nz <- apply(abs(hd),3,sum) > 0
28.   hd <- hd[,,nz]
29.   dimnames(hd)[[1]] <- 1:h
30.   dimnames(hd)[[2]] <- level
31.   
32.   
33.   # Produce plot if required
34.   if(plot)
35.   {
36.     if(is.<span>null</span>(xlim))
37.       xlim <- range(time(object$x),ft)
38.     if(is.<span>null</span>(ylim))
39.       ylim <- range(object$x, hd)
40.     plot(object$x,xlim=xlim, ylim=ylim, ...)
41.     # Show HDRs
42.     cols <- rev(colorspace::sequential_hcl(52))[level - 49]
43.     for(k in 1:h)
44.     {
45.       for(j in 1:nl)
46.       {
47.         hdtmp <- hd[k,j,]
48.         nint <- length(hdtmp)/2
49.         for(l in 1:nint)
50.         {
51.           polygon(ft[k]+c(-1,-1,1,1)/object$m/2,
52.                   c(hdtmp[2*l-1],hdtmp[2*l],hdtmp[2*l],hdtmp[2*l-1]),
53.                 col=cols[j], border=FALSE)
54.         }
55.       }
56.       points(ft[k], mode[k], pch=19, col="blue",cex=0.8)
57.     }
58.     #lines(fc$mean,col='blue',lwd=2)
59.   }
60.   
61.   # Return HDRs
62.   return(list(hdr=hd,mode=mode,level=level))
63. }

We can apply it using the exam­ple I started with:

1. z <- forecasthdr(fit,xlim=c(1986,1998),nsim=5000,
2.               xlab="Year",ylab="US monthly housing sales")


The dots are modes of the fore­cast dis­tri­b­u­tions, and the 50% and 95% high­est den­sity regions are also shown. In this case, the dis­tri­b­u­tions are uni­modal, and so all the regions are inter­vals.

Check out the Exaptive data application Studio. Technology agnostic. No glue code. Use what you know and rely on the community for what you don't. Try the community version.

Topics:

Published at DZone with permission of Rob J Hyndman, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

THE DZONE NEWSLETTER

Dev Resources & Solutions Straight to Your Inbox

Thanks for subscribing!

Awesome! Check your inbox to verify your email so you can start receiving the latest in tech news and resources.

X

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}