# The Problem with Too Narrow Prediction Intervals

# The Problem with Too Narrow Prediction Intervals

Join the DZone community and get the full member experience.

Join For Free**The open source HPCC Systems platform is a proven, easy to use solution for managing data at scale. Visit our Easy Guide to learn more about this completely free platform, test drive some code in the online Playground, and get started today.**

Almost all prediction intervals from time series models are too narrow. This is a well-known phenomenon and arises because they do not account for all sources of uncertainty. In my 2002 IJF paper, we measured the size of the problem by computing the actual coverage percentage of the prediction intervals on hold-out samples. We found that for ETS models, nominal 95% intervals may only provide coverage between 71% and 87%. The difference is due to missing sources of uncertainty.

There are at least four sources of uncertainty in forecasting using time series models:

- The random error term;
- The parameter estimates;
- The choice of model for the historical data;
- The continuation of the historical data generating process into the future.

When we produce prediction intervals for time series models, we generally only take into account the first of these sources of uncertainty. It would be possible to account for 2 and 3 using simulations, but that is almost never done because it would take too much time to compute. As computing speeds increase, it might become a viable approach in the future.

Even if we ignore the model uncertainty and the DGP uncertainty (sources 3 and 4), and just try to allow for parameter uncertainty as well as the random error term (sources 1 and 2), there are no closed form solutions apart from some simple special cases.

One such special case is an ARIMA(0,1,0) model with drift, which can be written as

where is a white noise process. In this case, it is easy to compute the uncertainty associated with the estimate of , and then allow for it in the forecasts.

This model can be fitted using either the `Arima`

function or the `rwf`

function from the forecast package for R. If the `Arima`

function is used, the uncertainty in is ignored, but if the `rwf`

function is used, the uncertainty in is included in the prediction intervals. The difference can be seen in the following simulated example.

```
library(forecast)
set.seed(22)
x <-ts(cumsum(rnorm(50, -2.5, 4)))
RWD.x <- rwf(x, h=40, drift=TRUE, level=95)
ARIMA.x <- Arima(x, c(0,1,0), include.drift=TRUE)
plot(forecast(ARIMA.x, h=40, level=95))
lines(RWD.x$lower, lty=2)
lines(RWD.x$upper, lty=2)
```

**Managing data at scale doesn’t have to be hard. Find out how the completely free, open source HPCC Systems platform makes it easier to update, easier to program, easier to integrate data, and easier to manage clusters. Download and get started today.**

Published at DZone with permission of Rob J Hyndman , DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

## {{ parent.title || parent.header.title}}

## {{ parent.tldr }}

## {{ parent.linkDescription }}

{{ parent.urlSource.name }}