The Problem with Too Narrow Prediction Intervals

Almost all prediction intervals from time series models are too narrow. This is a well-known phenomenon, and it arises because the intervals do not account for all sources of uncertainty. In my 2002 IJF paper, we measured the size of the problem by computing the actual coverage percentage of the prediction intervals on hold-out samples. We found that for ETS models, nominal 95% intervals may only provide coverage between 71% and 87%. The difference is due to missing sources of uncertainty.
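Measuring coverage on a hold-out sample comes down to counting how many held-out observations fall inside their intervals. The base-R sketch below shows that calculation; the function name `coverage_pct` and the toy numbers are hypothetical and are not taken from the paper.

```r
# Empirical coverage (%) of prediction intervals on a hold-out sample:
# the share of held-out actual values that fall inside [lower, upper].
coverage_pct <- function(actual, lower, upper) {
  stopifnot(length(actual) == length(lower), length(actual) == length(upper))
  100 * mean(actual >= lower & actual <= upper)
}

# Toy usage (hypothetical numbers): 3 of the 4 hold-out values lie inside
# their intervals, so the empirical coverage is 75%.
actual <- c(10, 12, 15, 9)
lower  <- c( 8, 11, 10, 10)
upper  <- c(12, 14, 16, 13)
coverage_pct(actual, lower, upper)  # 75
```

For well-calibrated nominal 95% intervals, this number should be close to 95 over a large hold-out sample; values well below that indicate intervals that are too narrow.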

There are at least four sources of uncertainty in forecasting using time series models:

  1. The random error term;
  2. The parameter estimates;
  3. The choice of model for the historical data;
  4. The continuation of the historical data generating process into the future.

When we produce prediction intervals for time series models, we generally take into account only the first of these sources of uncertainty. It would be possible to account for sources 2 and 3 using simulations, but that is almost never done because it would take too long to compute. As computing speeds increase, this may become a viable approach in the future.
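As a rough illustration of allowing for source 2 by simulation, here is a base-R sketch for the simplest case, a random walk with drift (the model discussed next in this post). Each simulated path first redraws the drift from its approximate sampling distribution and then adds random errors. It assumes Gaussian errors, estimates the drift as the mean of the first differences, and treats σ as known; the function name `sim_rwd_intervals` is hypothetical, and this is not presented as how any particular package computes its intervals.

```r
# Simulated prediction intervals for a random walk with drift that allow for
# both the random error term (source 1) and uncertainty in the drift
# estimate (source 2). Assumes Gaussian errors; sigma is treated as known.
sim_rwd_intervals <- function(y, h, level = 0.95, nsim = 5000) {
  y     <- as.numeric(y)
  d     <- diff(y)                    # first differences: c + e_t
  n     <- length(d)
  c_hat <- mean(d)                    # drift estimate
  s     <- sd(d)                      # residual standard deviation
  last  <- y[length(y)]
  paths <- matrix(NA_real_, nrow = nsim, ncol = h)
  for (i in seq_len(nsim)) {
    c_star     <- rnorm(1, c_hat, s / sqrt(n))      # redraw the drift (source 2)
    paths[i, ] <- last + cumsum(c_star + rnorm(h, 0, s))  # add errors (source 1)
  }
  alpha <- (1 - level) / 2
  list(
    mean  = last + c_hat * seq_len(h),
    lower = apply(paths, 2, quantile, probs = alpha),
    upper = apply(paths, 2, quantile, probs = 1 - alpha)
  )
}

# Usage on a simulated series
set.seed(22)
y  <- cumsum(rnorm(50, -2.5, 4))
fc <- sim_rwd_intervals(y, h = 40)
```

Redrawing the drift once per path, rather than once per step, matters: a single drift error is compounded over the whole horizon, which is why this source of uncertainty grows faster with h than the error term does.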

Even if we ignore model uncertainty and data generating process (DGP) uncertainty (sources 3 and 4), and only try to allow for parameter uncertainty in addition to the random error term (sources 1 and 2), there are no closed-form solutions except in a few simple special cases.

One such special case is an ARIMA(0,1,0) model with drift, which can be written as

$$y_t = y_{t-1} + c + \varepsilon_t,$$

where $\varepsilon_t$ is a white noise process. In this case, it is easy to compute the uncertainty associated with the estimate of $c$, and then allow for it in the forecasts.
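To see where the extra uncertainty comes from, suppose (an assumption for this sketch) that $c$ is estimated by the mean of the $T-1$ first differences of a series $y_1,\dots,y_T$, and that $\varepsilon_t$ has variance $\sigma^2$. Because the future errors are independent of the past errors used to estimate $c$, their variances add:

$$
\begin{aligned}
\hat{c} &= \frac{1}{T-1}\sum_{t=2}^{T}(y_t - y_{t-1}) = c + \frac{1}{T-1}\sum_{t=2}^{T}\varepsilon_t,
\qquad \operatorname{Var}(\hat{c}) = \frac{\sigma^2}{T-1},\\
\hat{y}_{T+h|T} &= y_T + h\hat{c},\\
y_{T+h} - \hat{y}_{T+h|T} &= \sum_{j=1}^{h}\varepsilon_{T+j} - h(\hat{c} - c),\\
\operatorname{Var}\left(y_{T+h} - \hat{y}_{T+h|T}\right) &= h\sigma^2 + \frac{h^2\sigma^2}{T-1}.
\end{aligned}
$$

Ignoring the uncertainty in $c$ drops the second term, giving a 95% half-width of $1.96\,\sigma\sqrt{h}$ instead of $1.96\,\sigma\sqrt{h + h^2/(T-1)}$. The neglected term grows with $h^2$, so the intervals become increasingly too narrow at longer horizons.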

This model can be fitted using either the Arima function or the rwf function from the forecast package for R. If the Arima function is used, the uncertainty in c is ignored, but if the rwf function is used, the uncertainty in c is included in the prediction intervals. The difference can be seen in the following simulated example.

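```r
library(forecast)

# Simulate a random walk with drift: 50 steps with mean -2.5 and sd 4
set.seed(22)
x <- ts(cumsum(rnorm(50, -2.5, 4)))

# rwf() includes uncertainty in the drift estimate c; Arima() does not
RWD.x <- rwf(x, h = 40, drift = TRUE, level = 95)
ARIMA.x <- Arima(x, order = c(0, 1, 0), include.drift = TRUE)

# Shaded region: Arima intervals; dashed lines: rwf intervals
plot(forecast(ARIMA.x, h = 40, level = 95))
lines(RWD.x$lower, lty = 2)
lines(RWD.x$upper, lty = 2)
```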
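The gap between the shaded Arima intervals and the dashed rwf intervals can also be reproduced by hand. The base-R sketch below assumes the drift is estimated as the mean of the $T-1$ first differences and $\sigma$ by their standard deviation, and uses the h-step forecast error variance $h\sigma^2 + h^2\sigma^2/(T-1)$ that follows from that estimator. It compares the 95% half-width counting only the error term with the half-width that also counts drift uncertainty. The numbers come from this formula, not from the internals of rwf or Arima, so they need not match either function exactly.

```r
# Compare 95% interval half-widths for a random walk with drift with and
# without the drift-estimate term h^2 * sigma^2 / (T - 1).
set.seed(22)
x <- cumsum(rnorm(50, -2.5, 4))   # same simulated series as above
d <- diff(x)
s <- sd(d)                        # estimate of sigma
n <- length(d)                    # T - 1 first differences
h <- 1:40
z <- qnorm(0.975)

half_ignore  <- z * s * sqrt(h)            # random error term only
half_include <- z * s * sqrt(h + h^2 / n)  # plus drift-estimate uncertainty

# The ratio is sqrt(1 + h / (T - 1)), so it depends only on h and T
round(half_include[c(1, 10, 40)] / half_ignore[c(1, 10, 40)], 3)
# 1.010 1.097 1.348
```

At one step ahead the difference is about 1%, but by 40 steps ahead the interval that accounts for drift uncertainty is roughly a third wider.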

Published at DZone with permission of Rob J Hyndman, DZone MVB. See the original article here.