The Problem with Too Narrow Prediction Intervals

Almost all prediction intervals from time series models are too narrow. This is a well-known phenomenon and arises because they do not account for all sources of uncertainty. In my 2002 IJF paper, we measured the size of the problem by computing the actual coverage percentage of the prediction intervals on hold-out samples. We found that for ETS models, nominal 95% intervals may only provide coverage between 71% and 87%. The difference is due to missing sources of uncertainty.

There are at least four sources of uncertainty in forecasting using time series models:

  1. The random error term;
  2. The parameter estimates;
  3. The choice of model for the historical data;
  4. The continuation of the historical data generating process into the future.

When we produce prediction intervals for time series models, we generally take into account only the first of these sources of uncertainty. It would be possible to account for sources 2 and 3 using simulations, but that is almost never done because it would take too much time to compute. As computing speeds increase, it might become a viable approach in the future.

Even if we ignore the model uncertainty and the DGP uncertainty (sources 3 and 4), and just try to allow for parameter uncertainty as well as the random error term (sources 1 and 2), there are no closed form solutions apart from some simple special cases.

One such special case is an ARIMA(0,1,0) model with drift, which can be written as

$$y_t = c + y_{t-1} + e_t,$$

where $e_t$ is a white noise process. In this case, it is easy to compute the uncertainty associated with the estimate of c, and then allow for it in the forecasts.
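
As a sketch of why this case is tractable, the derivation below works out the forecast error variance under some simplifying assumptions: the errors $e_t$ are uncorrelated with constant variance $\sigma^2$, $\sigma^2$ is treated as known, and $c$ is estimated by the mean of the first differences. The symbols $T$ (number of observations), $h$ (forecast horizon) and $\sigma^2$ are notation introduced here, not taken from the original text.

$$y_t = c + y_{t-1} + e_t, \qquad \hat{c} = \frac{1}{T-1}\sum_{t=2}^{T}(y_t - y_{t-1}) = \frac{y_T - y_1}{T-1}, \qquad \operatorname{Var}(\hat{c}) = \frac{\sigma^2}{T-1}$$

$$\hat{y}_{T+h|T} = y_T + h\hat{c}, \qquad y_{T+h} - \hat{y}_{T+h|T} = h(c - \hat{c}) + \sum_{i=1}^{h} e_{T+i}$$

$$\operatorname{Var}\left(y_{T+h} - \hat{y}_{T+h|T}\right) = h\sigma^2 + \frac{h^2\sigma^2}{T-1} = h\sigma^2\left(1 + \frac{h}{T-1}\right)$$

The first term, $h\sigma^2$, is what remains if the uncertainty in $c$ is ignored; the second term comes from estimating $c$ and is uncorrelated with the first because $\hat{c}$ depends only on past errors. For instance, with $T = 50$ and $h = 40$ the factor in parentheses is $1 + 40/49 \approx 1.82$, so the interval is roughly 35% wider once the drift uncertainty is included.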

This model can be fitted using either the Arima function or the rwf function from the forecast package for R. If the Arima function is used, the uncertainty in c is ignored, but if the rwf function is used, the uncertainty in c is included in the prediction intervals. The difference can be seen in the following simulated example.

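library(forecast)

# Simulate a random walk with drift: 50 steps with mean -2.5 and sd 4
set.seed(22)
x <- ts(cumsum(rnorm(50, -2.5, 4)))

# rwf() includes the uncertainty in the drift estimate c in its intervals
RWD.x <- rwf(x, h=40, drift=TRUE, level=95)
# Arima() ignores the uncertainty in c when the intervals are computed
ARIMA.x <- Arima(x, c(0,1,0), include.drift=TRUE)

# Plot the ARIMA forecasts, then overlay the rwf() limits as dashed lines
plot(forecast(ARIMA.x, h=40, level=95))
lines(RWD.x$lower, lty=2)
lines(RWD.x$upper, lty=2)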
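
The plot compares the two intervals for a single series. To get a rough sense of what the difference means for coverage, the sketch below repeats the simulation many times and counts how often the value 40 steps ahead falls inside each nominal 95% interval. It uses base R only and computes the intervals directly from the textbook random-walk-with-drift standard errors, so it is an approximation of the idea rather than the forecast package's exact computation. The number of replications and all variable names are assumptions chosen for illustration.

```r
# Empirical coverage of nominal 95% intervals for a random walk with drift.
# Base R only; names and settings below are illustrative assumptions.
set.seed(1)

n_train <- 50    # length of each training series
h       <- 40    # forecast horizon
n_sims  <- 2000  # number of simulated series
z       <- qnorm(0.975)

covered_naive <- logical(n_sims)  # interval ignores uncertainty in c
covered_drift <- logical(n_sims)  # interval includes uncertainty in c

for (i in seq_len(n_sims)) {
  # Same data generating process as the example above, extended by h steps
  y     <- cumsum(rnorm(n_train + h, mean = -2.5, sd = 4))
  train <- y[1:n_train]

  d     <- diff(train)   # first differences of the training data
  c_hat <- mean(d)       # drift estimate
  s     <- sd(d)         # estimated standard deviation of the errors
  point <- train[n_train] + h * c_hat

  # Standard errors without and with the drift-estimation term
  se_naive <- s * sqrt(h)
  se_drift <- s * sqrt(h * (1 + h / length(d)))

  actual <- y[n_train + h]
  covered_naive[i] <- abs(actual - point) <= z * se_naive
  covered_drift[i] <- abs(actual - point) <= z * se_drift
}

coverage <- c(naive = mean(covered_naive), with_drift = mean(covered_drift))
print(round(coverage, 3))
```

Under these assumptions, the interval that ignores the uncertainty in c should cover noticeably less than 95% of the hold-out values, while the interval that includes it should come much closer to the nominal level. Because the data are simulated from the fitted model class, this only isolates source 2; it says nothing about sources 3 and 4.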
