The Problem with Too Narrow Prediction Intervals
Almost all prediction intervals from time series models are too narrow. This is a well-known phenomenon and arises because they do not account for all sources of uncertainty. In my 2002 IJF paper, we measured the size of the problem by computing the actual coverage percentage of the prediction intervals on hold-out samples. We found that for ETS models, nominal 95% intervals may only provide coverage between 71% and 87%. The difference is due to missing sources of uncertainty.
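To make the idea of "actual coverage on hold-out samples" concrete, here is a minimal base R sketch. It does not reproduce the ETS study from the paper. It assumes simulated random walks with drift, with a drift of 0.5 and an error standard deviation of 1 picked only for illustration, and intervals that allow for the random error term alone. The helper `coverage` and all variable names are hypothetical.

```r
# Share of hold-out observations that fall inside their prediction intervals.
coverage <- function(actual, lower, upper) {
  mean(actual >= lower & actual <= upper)
}

set.seed(123)
n_series <- 2000   # number of simulated series (illustrative)
n_train  <- 30     # observations used for fitting
h        <- 20     # length of the hold-out sample
z        <- qnorm(0.975)

inside <- numeric(n_series)
for (i in seq_len(n_series)) {
  # Simulated random walk with drift (assumed drift 0.5, error sd 1).
  y     <- cumsum(rnorm(n_train + h, mean = 0.5, sd = 1))
  train <- y[seq_len(n_train)]
  test  <- y[n_train + seq_len(h)]

  d     <- diff(train)   # first differences of the training data
  c_hat <- mean(d)       # estimated drift
  s     <- sd(d)         # estimated error standard deviation

  point <- train[n_train] + c_hat * seq_len(h)
  half  <- z * s * sqrt(seq_len(h))   # half-width from the error term only

  inside[i] <- coverage(test, point - half, point + half)
}

# Average coverage over all series and horizons; nominal level is 0.95.
empirical_coverage <- mean(inside)
empirical_coverage
```

In this setup the empirical coverage should come out below the nominal 95%, because the intervals treat the estimated drift as if it were known.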
There are at least four sources of uncertainty in forecasting using time series models:
1. The random error term;
2. The parameter estimates;
3. The choice of model for the historical data;
4. The continuation of the historical data generating process into the future.
When we produce prediction intervals for time series models, we generally only take into account the first of these sources of uncertainty. It would be possible to account for sources 2 and 3 using simulations, but that is almost never done because it would take too much time to compute. As computing speeds increase, it might become a viable approach in the future.
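As a rough illustration of what a simulation-based approach could look like, the base R sketch below allows for parameter uncertainty (the second source) in addition to the random error term. It is one possible scheme, not a method taken from the paper: it assumes a random walk with drift fitted to simulated data, uses a simple residual bootstrap, and does nothing about model uncertainty. All names (`paths`, `c_star`, `e_star`, and so on) are hypothetical.

```r
set.seed(42)

# Simulated history: random walk with drift (values chosen for illustration only).
y <- cumsum(rnorm(50, mean = -2.5, sd = 4))
d <- diff(y)        # first differences
n <- length(d)
h <- 40             # forecast horizon
B <- 2000           # number of bootstrap replicates

c_hat <- mean(d)    # estimated drift
res   <- d - c_hat  # residuals

paths <- matrix(NA_real_, nrow = B, ncol = h)
for (b in seq_len(B)) {
  # Parameter uncertainty: re-estimate the drift on a resampled history.
  c_star <- mean(c_hat + sample(res, n, replace = TRUE))
  # Random error term: draw future errors from the residuals.
  e_star <- sample(res, h, replace = TRUE)
  paths[b, ] <- y[length(y)] + cumsum(c_star + e_star)
}

# Pointwise 95% intervals from the simulated future paths.
lower <- apply(paths, 2, quantile, probs = 0.025)
upper <- apply(paths, 2, quantile, probs = 0.975)

# Ratio of simulated widths to widths that allow for the error term only.
error_only <- qnorm(0.975) * sd(d) * sqrt(seq_len(h))
ratio <- (upper - lower) / (2 * error_only)
round(ratio[c(1, 10, 40)], 2)
```

The ratio should grow with the horizon, since the effect of a mis-estimated drift accumulates over time. Even for this very simple model the loop refits the model thousands of times, which hints at why the approach gets expensive for models that are slower to estimate.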
Even if we ignore the model uncertainty and the DGP uncertainty (sources 3 and 4), and just try to allow for parameter uncertainty as well as the random error term (sources 1 and 2), there are no closed-form solutions apart from some simple special cases.
One such special case is an ARIMA(0,1,0) model with drift, which can be written as
$$y_t = c + y_{t-1} + e_t,$$
where $e_t$ is a white noise process. In this case, it is easy to compute the uncertainty associated with the estimate of $c$, and then allow for it in the forecasts.
This model can be fitted using either the Arima function or the rwf function from the forecast package for R. If the Arima function is used, the uncertainty in $c$ is ignored, but if the rwf function is used, the uncertainty in $c$ is included in the prediction intervals. The difference can be seen in the following simulated example.
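    library(forecast)

    set.seed(22)
    # Simulate a random walk with drift: cumulative sum of normal increments
    x <- ts(cumsum(rnorm(50, -2.5, 4)))

    # rwf(): prediction intervals include the uncertainty in the drift estimate
    RWD.x <- rwf(x, h=40, drift=TRUE, level=95)
    # Arima(): prediction intervals ignore the uncertainty in the drift estimate
    ARIMA.x <- Arima(x, c(0,1,0), include.drift=TRUE)

    # Plot the Arima() forecasts, then overlay the rwf() limits as dashed lines
    plot(forecast(ARIMA.x, h=40, level=95))
    lines(RWD.x$lower, lty=2)
    lines(RWD.x$upper, lty=2)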
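The gap between the two sets of intervals can be motivated by a short derivation. As a sketch, write the model as $y_t = c + y_{t-1} + e_t$, where $e_t$ is white noise with variance $\sigma^2$ (the symbols are chosen here for illustration), and suppose the drift is estimated by the mean of the $n = T - 1$ observed first differences. The exact denominator and variance estimate used by the package functions may differ slightly from this idealized version.

```latex
\begin{align*}
\hat{c} &= \frac{1}{n}\sum_{t=2}^{T} (y_t - y_{t-1}),
  \qquad \operatorname{Var}(\hat{c}) = \frac{\sigma^2}{n}, \\
y_{T+h} - \hat{y}_{T+h|T} &= \sum_{j=1}^{h} e_{T+j} + h\,(c - \hat{c}), \\
\operatorname{Var}\left(y_{T+h} - \hat{y}_{T+h|T}\right)
  &= h\sigma^2 + \frac{h^2\sigma^2}{n}
   = \sigma^2 h \left(1 + \frac{h}{n}\right).
\end{align*}
```

Dropping the second term leaves $\sigma^2 h$, so under these assumptions the interval half-width is understated by a factor of $\sqrt{1 + h/n}$. With 49 differences and $h = 40$, as in the simulated example, that factor is roughly 1.35.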
Published at DZone with permission of Rob J Hyndman, DZone MVB. See the original article here.