The Problem with Too Narrow Prediction Intervals

by Rob J Hyndman · Jan. 23, 15 · Big Data Zone

Almost all prediction intervals from time series models are too narrow. This is a well-known phenomenon and arises because they do not account for all sources of uncertainty. In my 2002 IJF paper, we measured the size of the problem by computing the actual coverage percentage of the prediction intervals on hold-out samples. We found that for ETS models, nominal 95% intervals may provide coverage of only 71% to 87%. The difference is due to missing sources of uncertainty.
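As a rough illustration of what "actual coverage" means here, the sketch below computes the share of hold-out observations that fall inside their prediction intervals. The function name `empirical_coverage` and the toy numbers are assumptions made up for illustration; they are not taken from the 2002 paper.

```r
# Fraction of hold-out observations that fall inside their prediction intervals.
# `actual`, `lower` and `upper` are numeric vectors of equal length
# (hypothetical names, chosen for this sketch).
empirical_coverage <- function(actual, lower, upper) {
  stopifnot(length(actual) == length(lower),
            length(actual) == length(upper))
  mean(actual >= lower & actual <= upper)
}

# Toy hold-out sample: the last observation falls outside its interval.
actual <- c(10.2, 11.5,  9.8, 14.1)
lower  <- c( 9.0, 10.0,  9.0, 10.0)
upper  <- c(11.0, 12.0, 11.0, 13.0)

cov_hat <- empirical_coverage(actual, lower, upper)
print(cov_hat)
```

If the intervals were well calibrated, this fraction would be close to the nominal level (for example 0.95) when averaged over many hold-out samples.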

There are at least four sources of uncertainty in forecasting using time series models:

  1. The random error term;
  2. The parameter estimates;
  3. The choice of model for the historical data;
  4. The continuation of the historical data generating process into the future.

When we produce prediction intervals for time series models, we generally only take into account the first of these sources of uncertainty. It would be possible to account for 2 and 3 using simulations, but that is almost never done because it would take too much time to compute. As computing speeds increase, it might become a viable approach in the future.
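To make the simulation idea concrete, here is a minimal base R sketch of one simple parametric bootstrap scheme that adds source 2 (parameter uncertainty) to source 1. It uses a random walk with drift as the example model because its parameters are trivial to re-estimate. The simulated series, the helper `simulate_endpoint` and the choice of a parametric bootstrap are all assumptions for illustration, not a description of any package's method.

```r
set.seed(1)
y <- cumsum(rnorm(50, mean = 0.5, sd = 1))  # hypothetical simulated history
h <- 40     # forecast horizon
B <- 5000   # number of simulated futures

d     <- diff(y)     # first differences: drift plus error under the model
n     <- length(d)
c_hat <- mean(d)     # estimated drift
s_hat <- sd(d)       # estimated error standard deviation

# Simulate the value h steps ahead of the last observation.
# refit = FALSE: parameters are treated as known (source 1 only).
# refit = TRUE : parameters are re-estimated on a bootstrap history first,
#                so their sampling variability enters the forecast (sources 1 and 2).
simulate_endpoint <- function(refit) {
  if (refit) {
    d_star <- rnorm(n, c_hat, s_hat)
    c_star <- mean(d_star)
    s_star <- sd(d_star)
  } else {
    c_star <- c_hat
    s_star <- s_hat
  }
  y[length(y)] + sum(rnorm(h, c_star, s_star))
}

ends_fixed <- replicate(B, simulate_endpoint(FALSE))
ends_refit <- replicate(B, simulate_endpoint(TRUE))

# 95% prediction intervals from the simulated endpoints
pi_fixed <- quantile(ends_fixed, c(0.025, 0.975))
pi_refit <- quantile(ends_refit, c(0.025, 0.975))

width_fixed <- unname(diff(pi_fixed))
width_refit <- unname(diff(pi_refit))
print(c(fixed = width_fixed, refit = width_refit))
```

For this toy model the loop runs quickly. The cost becomes a real obstacle when each refit involves numerical optimisation or model selection, which is what would be required to cover source 3 as well.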

Even if we ignore the model uncertainty and the DGP uncertainty (sources 3 and 4), and just try to allow for parameter uncertainty as well as the random error term (sources 1 and 2), there are no closed form solutions apart from some simple special cases.

One such special case is an ARIMA(0,1,0) model with drift, which can be written as

$$y_t = c + y_{t-1} + e_t,$$

where $e_t$ is a white noise process. In this case, it is easy to compute the uncertainty associated with the estimate of c, and then allow for it in the forecasts.
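The calculation can be sketched as follows, writing the model as $y_t = c + y_{t-1} + e_t$. The symbols $T$ (number of observations), $h$ (forecast horizon) and $\sigma^2$ (variance of $e_t$) are notation introduced here, and the derivation assumes uncorrelated errors with constant variance and a drift estimated by the mean of the first differences.

```latex
\begin{align*}
\hat{c} &= \frac{1}{T-1}\sum_{t=2}^{T}(y_t - y_{t-1}) = \frac{y_T - y_1}{T-1},
  & \operatorname{Var}(\hat{c}) &= \frac{\sigma^2}{T-1}, \\
\hat{y}_{T+h|T} &= y_T + h\hat{c}, \\
y_{T+h} - \hat{y}_{T+h|T} &= h\,(c - \hat{c}) + \sum_{i=1}^{h} e_{T+i}, \\
\operatorname{Var}\!\left(y_{T+h} - \hat{y}_{T+h|T}\right)
  &= \frac{h^2\sigma^2}{T-1} + h\sigma^2
   = \sigma^2 h\left(1 + \frac{h}{T-1}\right).
\end{align*}
```

The two terms add because $\hat{c}$ depends only on past errors, which are uncorrelated with the future errors. Treating the drift as known drops the first term and leaves $h\sigma^2$, so the understatement grows with the horizon $h$ and shrinks with the sample size $T$.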

This model can be fitted using either the Arima function or the rwf function from the forecast package for R. If the Arima function is used, the uncertainty in c is ignored, but if the rwf function is used, the uncertainty in c is included in the prediction intervals. The difference can be seen in the following simulated example.

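library(forecast)

# Simulate a random walk with drift: 50 steps, drift -2.5, error sd 4
set.seed(22)
x <- ts(cumsum(rnorm(50, -2.5, 4)))

# rwf() includes the uncertainty in the drift estimate c
RWD.x <- rwf(x, h=40, drift=TRUE, level=95)
# Arima() treats the estimated drift as known
ARIMA.x <- Arima(x, c(0,1,0), include.drift=TRUE)

# Shaded region: Arima intervals; dashed lines: rwf intervals
plot(forecast(ARIMA.x, h=40, level=95))
lines(RWD.x$lower, lty=2)
lines(RWD.x$upper, lty=2)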
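
For readers without the forecast package, the following base R sketch reproduces the idea by hand. The interval formulas are the standard ones for a random walk with drift and are an assumption of this sketch rather than a copy of the package internals, so the numbers from Arima and rwf may differ slightly (for instance in how the error variance is estimated). The helper `coverage_at` and its default arguments are hypothetical.

```r
set.seed(22)
x <- cumsum(rnorm(50, -2.5, 4))   # same simulated series as above, as a plain vector
h <- 1:40

d        <- diff(x)
n        <- length(d)             # 49 first differences
s        <- sd(d)                 # error sd around the estimated drift
se_drift <- s / sqrt(n)           # standard error of the estimated drift
z        <- qnorm(0.975)

half_ignore  <- z * s * sqrt(h)                       # drift treated as known
half_include <- z * sqrt(h * s^2 + (h * se_drift)^2)  # drift uncertainty added

# Ratio of interval widths; algebraically this equals sqrt(1 + h / n)
ratio <- half_include / half_ignore
print(round(ratio[c(1, 10, 40)], 3))

# Monte Carlo check: how often does the value h steps ahead fall inside
# each nominal 95% interval when the series really is a random walk with drift?
coverage_at <- function(h, n_obs = 50, reps = 5000) {
  hits <- replicate(reps, {
    y      <- cumsum(rnorm(n_obs + h, -2.5, 4))
    train  <- y[1:n_obs]
    d      <- diff(train)
    s      <- sd(d)
    centre <- train[n_obs] + h * mean(d)     # point forecast
    err    <- abs(y[n_obs + h] - centre)     # absolute forecast error
    c(ignore  = err <= z * s * sqrt(h),
      include = err <= z * sqrt(h * s^2 + h^2 * s^2 / length(d)))
  })
  rowMeans(hits)
}

cov40 <- coverage_at(40)
print(cov40)
```

The width ratio depends only on the horizon and the number of differences, not on the simulated values, which is why the gap between the two sets of intervals widens steadily as the horizon grows.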

Published at DZone with permission of Rob J Hyndman, DZone MVB. See the original article here.
