
The Problem with Too Narrow Prediction Intervals


Almost all prediction intervals from time series models are too narrow. This is a well-known phenomenon, and it arises because the intervals do not account for all sources of uncertainty. In my 2002 International Journal of Forecasting (IJF) paper, we measured the size of the problem by computing the actual coverage percentage of the prediction intervals on hold-out samples. We found that for ETS models, nominal 95% intervals may provide actual coverage of only 71% to 87%. The difference is due to the missing sources of uncertainty.
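The coverage check described above can be illustrated with a small simulation. This is a minimal sketch, not the experiment from the paper: it assumes a random walk with drift as the data generating process (rather than ETS models), and the sample size `n`, horizon `h` and replicate count `reps` are arbitrary choices. It estimates the drift from each fitting sample, builds a nominal 95% interval that treats that estimate as exact, and records how often the hold-out value falls inside.

```r
# Monte Carlo estimate of the actual coverage of nominal 95% intervals that
# ignore parameter uncertainty, for a random walk with drift (assumed model).
set.seed(2002)
n <- 20       # length of the fitting sample (arbitrary)
h <- 10       # horizon checked on the hold-out sample (arbitrary)
reps <- 5000  # number of simulated series (arbitrary)
z <- qnorm(0.975)

covered <- logical(reps)
for (r in seq_len(reps)) {
  y <- cumsum(rnorm(n + h, mean = -2.5, sd = 4))  # simulate n + h observations
  train <- y[1:n]                                 # fitting sample
  actual <- y[n + h]                              # hold-out value h steps ahead
  d <- diff(train)
  point <- train[n] + h * mean(d)                 # drift forecast
  half_width <- z * sd(d) * sqrt(h)               # treats the estimated drift as known
  covered[r] <- abs(actual - point) <= half_width
}
coverage <- mean(covered)
coverage
```

With these settings the empirical coverage should come out noticeably below the nominal 95%, because the interval ignores the sampling error in both the estimated drift and the estimated standard deviation.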

There are at least four sources of uncertainty in forecasting using time series models:

  1. The random error term;
  2. The parameter estimates;
  3. The choice of model for the historical data;
  4. The continuation of the historical data generating process into the future.

When we produce prediction intervals for time series models, we generally take into account only the first of these sources of uncertainty. It would be possible to account for sources 2 and 3 using simulations, but that is almost never done because it would take too much time to compute. As computing speeds increase, it might become a viable approach in the future.
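One way such a simulation might look, sketched under stated assumptions: a parametric bootstrap for a random walk with drift, written in base R. The model, the horizon `h`, the replicate count `B` and the variable names are illustrative choices, not the approach from the paper or from the forecast package. Each replicate regenerates the historical differences from the fitted model, re-estimates the drift and noise standard deviation, and then simulates a future path with those re-estimated values, so the spread of the paths reflects sources 1 and 2 but not 3 or 4.

```r
# Parametric bootstrap sketch (assumed model: random walk with drift).
# Re-estimating the parameters in every replicate lets the simulated paths
# reflect parameter uncertainty (source 2) as well as future noise (source 1).
set.seed(22)
x <- cumsum(rnorm(50, -2.5, 4))  # simulated series, as in the example below
n <- length(x)
h <- 10      # forecast horizon (arbitrary)
B <- 2000    # number of bootstrap replicates (arbitrary)

d <- diff(x)
c_hat <- mean(d)   # drift estimate from the observed series
s_hat <- sd(d)     # noise sd estimate from the observed series

paths <- matrix(NA_real_, nrow = B, ncol = h)
for (b in seq_len(B)) {
  # Regenerate the n - 1 differences from the fitted model and re-estimate
  d_star <- rnorm(n - 1, mean = c_hat, sd = s_hat)
  c_star <- mean(d_star)
  s_star <- sd(d_star)
  # Simulate a future path from the last observation using the re-estimated values
  paths[b, ] <- x[n] + cumsum(rnorm(h, mean = c_star, sd = s_star))
}

# Pointwise 95% interval from the simulated paths
lower <- apply(paths, 2, quantile, probs = 0.025)
upper <- apply(paths, 2, quantile, probs = 0.975)
```

Accounting for source 3 in the same framework would mean repeating the model selection step inside the loop as well.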

Even if we ignore model uncertainty and uncertainty about the data generating process, or DGP (sources 3 and 4), and only try to allow for the random error term and parameter uncertainty (sources 1 and 2), there are no closed-form solutions apart from a few simple special cases.

One such special case is an ARIMA(0,1,0) model with drift, which can be written as

$$y_t = c + y_{t-1} + e_t,$$

where $e_t$ is a white noise process. In this case, it is easy to compute the uncertainty associated with the estimate of $c$, and then allow for it in the forecasts.
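For the model $y_t = c + y_{t-1} + e_t$ the adjustment has a closed form. A short derivation, assuming $e_t$ is white noise with variance $\sigma^2$, there are $T$ observations, and the drift is estimated by the mean of the first differences (the least squares estimate for the differenced series):

$$
\begin{aligned}
\hat{c} &= \frac{1}{T-1}\sum_{t=2}^{T}(y_t - y_{t-1}) = c + \frac{1}{T-1}\sum_{t=2}^{T} e_t,
\qquad \operatorname{Var}(\hat{c}) = \frac{\sigma^2}{T-1},\\
\hat{y}_{T+h|T} &= y_T + h\hat{c},\\
y_{T+h} - \hat{y}_{T+h|T} &= \sum_{j=1}^{h} e_{T+j} + h\,(c - \hat{c}),\\
\operatorname{Var}\bigl(y_{T+h} - \hat{y}_{T+h|T}\bigr) &= h\sigma^2 + \frac{h^2\sigma^2}{T-1} = h\sigma^2\Bigl(1 + \frac{h}{T-1}\Bigr).
\end{aligned}
$$

The two variance terms add because the future errors $e_{T+1},\dots,e_{T+h}$ are independent of $\hat{c}$, which depends only on $e_2,\dots,e_T$. An approximate 95% interval is then $y_T + h\hat{c} \pm 1.96\,\hat{\sigma}\sqrt{h\,(1 + h/(T-1))}$. Dropping the second term gives the variance $h\sigma^2$ used when the estimated drift is treated as known; since the extra factor $1 + h/(T-1)$ grows with $h$, the gap between the two intervals widens the further ahead we forecast.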

This model can be fitted using either the Arima function or the rwf function from the forecast package for R. If the Arima function is used, the uncertainty in c is ignored, but if the rwf function is used, the uncertainty in c is included in the prediction intervals. The difference can be seen in the following simulated example.

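```r
library(forecast)

# Simulate a random walk with drift: 50 steps with mean -2.5 and sd 4
set.seed(22)
x <- ts(cumsum(rnorm(50, -2.5, 4)))

# rwf() includes the uncertainty in the drift estimate in its intervals
RWD.x <- rwf(x, h = 40, drift = TRUE, level = 95)
# Arima() with a drift term treats the estimated drift as known
ARIMA.x <- Arima(x, order = c(0, 1, 0), include.drift = TRUE)

# Shaded band: Arima 95% interval; dashed lines: rwf 95% interval
plot(forecast(ARIMA.x, h = 40, level = 95))
lines(RWD.x$lower, lty = 2)
lines(RWD.x$upper, lty = 2)
```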
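To see where the two sets of bands come from, similar intervals can be computed in base R without the forecast package. This is a sketch: it uses the sample mean and standard deviation of the differences as estimates, which may differ slightly from the estimators inside Arima() and rwf(), so the numbers need not match the package output exactly. The naive bands use sigma_hat * sqrt(h); the adjusted bands add the variance of the drift estimate, giving sigma_hat * sqrt(h * (1 + h / (n - 1))).

```r
set.seed(22)
x <- ts(cumsum(rnorm(50, -2.5, 4)))  # same simulated series as above
h <- 40
n <- length(x)
d <- diff(x)
c_hat <- mean(d)      # estimated drift
sigma_hat <- sd(d)    # estimated innovation standard deviation
steps <- seq_len(h)
z <- qnorm(0.975)

point <- as.numeric(x[n]) + steps * c_hat
se_naive <- sigma_hat * sqrt(steps)                          # drift treated as known
se_full  <- sigma_hat * sqrt(steps * (1 + steps / (n - 1)))  # adds Var(c_hat)

naive <- cbind(lower = point - z * se_naive, upper = point + z * se_naive)
full  <- cbind(lower = point - z * se_full,  upper = point + z * se_full)

# Ratio of interval widths at the last horizon
width_ratio <- (full[h, "upper"] - full[h, "lower"]) /
  (naive[h, "upper"] - naive[h, "lower"])
width_ratio   # sqrt(1 + 40/49), about 1.35
```

Because both bands share the same centre and the same standard deviation estimate, the width ratio depends only on the horizon and the sample size: at h = 40 with 50 observations it is sqrt(1 + 40/49), about 1.35.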


Published at DZone with permission of Rob J Hyndman, DZone MVB. See the original article here.
