Visualization of probabilistic forecasts
Join the DZone community and get the full member experience.
Join For FreeThis week my research group discussed Adrian Raftery’s recent paper on “Use and Communication of Probabilistic Forecasts” which provides a fascinating but brief survey of some of his work on modelling and communicating uncertain futures. Coincidentally, today I was also sent a copy of David Spiegelhalter’s paper on “Visualizing Uncertainty About the Future”. Both are well-worth reading.
It made me think about my own efforts to communicate future uncertainty through graphics. Of course, for time series forecasts I normally show prediction intervals. I prefer to use more than one interval at a time because it helps convey a little more information. The default in the forecast package for R is to show both an 80% and a 95% interval like this:
It is sometimes preferable to use a 50% and a 95% interval, rather like a boxplot:
In some circles (especially macroeconomic forecasting), fan charts are popular:
Personally, I don’t like these at all as they lose any specific probabilistic interpretability. What does the darker shaded region actually refer to? At least in the previous version, it is clear that the dark region contains 50% of the probability.
The above three examples are easily produced using the forecast package:
1.
fit <- ets(hsales)
2.
plot(forecast(fit),include=120)
3.
plot(forecast(fit,level=c(50,95)),include=120)
4.
plot(forecast(fit,fan=TRUE),include=120)
For multi-modal distributions I like to use highest density regions. Here is an example applied to Nicholson’s blowfly data using a threshold model:
The dark region has 50% coverage and the light region has 95% coverage. The forecast distributions become bimodal after the first ten iterations, and so the 50% region is split in two to show that. This graphic was taken from a J. Forecasting paper I wrote in 1996, so these ideas have been around for a while!
It is easy enough to produce forecast HDR with time series. Here is some R code to do it:
01.
#HDR for time series object
02.
# Assumes that forecasts can be computed and futures simulated from object
03.
forecasthdr <- function(object, h = ifelse(object$m > 1, 2 * object$m, 10),
04.
nsim=2000, plot=TRUE, level=c(50,95), xlim=NULL, ylim=NULL, ...)
05.
{
06.
require(hdrcde)
07.
# Compute forecasts
08.
fc <- forecast(object)
09.
ft <- time(fc$mean)
10.
11.
# Simulate future sample paths
12.
sim <- matrix(0,nrow=h,ncol=nsim)
13.
for(i in 1:nsim)
14.
sim[,i] <- simulate(object, nsim=h)
15.
16.
# Compute HDRs
17.
nl <- length(level)
18.
hd <- array(0, c(h,nl,10))
19.
mode <- numeric(h)
20.
for(k in 1:h)
21.
{
22.
hdtmp <- hdr(sim[k,], prob=level)
23.
hd[k,,1:ncol(hdtmp$hdr)] <- hdtmp$hdr
24.
mode[k] <- hdtmp$mode
25.
}
26.
# Remove unnecessary sections of HDRs
27.
nz <- apply(abs(hd),3,sum) > 0
28.
hd <- hd[,,nz]
29.
dimnames(hd)[[1]] <- 1:h
30.
dimnames(hd)[[2]] <- level
31.
32.
33.
# Produce plot if required
34.
if(plot)
35.
{
36.
if(is.<
span
>null</
span
>(xlim))
37.
xlim <- range(time(object$x),ft)
38.
if(is.<
span
>null</
span
>(ylim))
39.
ylim <- range(object$x, hd)
40.
plot(object$x,xlim=xlim, ylim=ylim, ...)
41.
# Show HDRs
42.
cols <- rev(colorspace::sequential_hcl(52))[level - 49]
43.
for(k in 1:h)
44.
{
45.
for(j in 1:nl)
46.
{
47.
hdtmp <- hd[k,j,]
48.
nint <- length(hdtmp)/2
49.
for(l in 1:nint)
50.
{
51.
polygon(ft[k]+c(-1,-1,1,1)/object$m/2,
52.
c(hdtmp[2*l-1],hdtmp[2*l],hdtmp[2*l],hdtmp[2*l-1]),
53.
col=cols[j], border=FALSE)
54.
}
55.
}
56.
points(ft[k], mode[k], pch=19, col="blue",cex=0.8)
57.
}
58.
#lines(fc$mean,col='blue',lwd=2)
59.
}
60.
61.
# Return HDRs
62.
return(list(hdr=hd,mode=mode,level=level))
63.
}
We can apply it using the example I started with:
1.
z <- forecasthdr(fit,xlim=c(1986,1998),nsim=5000,
2.
xlab="Year",ylab="US monthly housing sales")
Published at DZone with permission of Rob J Hyndman, DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.
Comments