Generating quantile forecasts in R

From today’s email:

I have just finished reading a copy of ‘Forecasting: Principles and Practice’ and I have found the book really interesting. I have particularly enjoyed the case studies and focus on practical applications.

After finishing the book I have joined a forecasting competition to put what I’ve learnt to the test. I do have a couple of queries about the forecasting outputs required. The output required is a quantile forecast, is this the same as prediction intervals? Is there any R function to produce quantiles from 0 to 99?

If you were able to point me in the right direction regarding the above it would be greatly appreciated.

Many Thanks,

Presumably the competition is GEFCom2014, which I’ve posted about before.

The future value of a time series is unknown, so you can think of it as a random variable, and its distribution is the “forecast distribution”. A “quantile forecast” is a quantile of the forecast distribution. The usual point forecast is often the mean or the median of the forecast distribution. A prediction interval is a range of specified coverage probability under that distribution. For example, if we assume the forecast distribution is normal, then the 95% prediction interval is defined by the 2.5% and 97.5% quantiles of the forecast distribution.
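As a quick check of that last statement, here is a minimal sketch in base R. The mean of 100 and standard deviation of 5 are made-up values for illustration, not output from any model.

```r
# Hypothetical forecast mean and standard deviation, for illustration only
mu    <- 100
sigma <- 5

# 95% prediction interval built the usual way: mean +/- z * sd
z        <- qnorm(0.975)
interval <- c(mu - z * sigma, mu + z * sigma)

# The same two numbers obtained as quantiles of the normal forecast distribution
q <- qnorm(c(0.025, 0.975), mean = mu, sd = sigma)

# Compare the two constructions (up to floating-point tolerance)
print(all.equal(interval, q))
```

Under normality the two constructions agree, so a set of quantile forecasts contains every central prediction interval whose endpoints are among the requested quantiles.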

Still assuming normality, we could generate the forecast quantiles from 1% to 99% in R using

qnorm((1:99)/100, m, s)

where m and s are the estimated mean and standard deviation of the forecast distribution. So if you are using the forecast package in R, you can do something like this:

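library(forecast)
fit <- auto.arima(WWWusage)
fc <- forecast(fit, h=20, level=95)
# Recover the forecast mean and standard deviation at each horizon
# from the point forecasts and the width of the 95% interval
m <- fc$mean
s <- (fc$upper-fc$lower)/(2*qnorm(0.975))
# Rows are the 1%, ..., 99% quantiles; columns are horizons 1 to 20
qf <- matrix(0, nrow=99, ncol=20)
for(h in 1:20)
  qf[,h] <- qnorm((1:99)/100, m[h], s[h])

plot(fc)
# WWWusage has 100 observations, so the forecasts cover times 101 to 120
matlines(101:120, t(qf), col=rainbow(120), lty=1)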

Of course, assuming a normal distribution is rather restrictive and not very interesting. For a more interesting but much more complicated approach to generating quantiles, see my 2010 paper on Density forecasting for long-term peak electricity demand.
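If normality seems too restrictive, one simpler alternative (not the method of the paper mentioned above) is to simulate many future sample paths from the fitted model with bootstrapped residuals and take empirical quantiles at each horizon. The following is only a sketch: the number of simulated paths (1000) and the seed are arbitrary choices, and `paths` and `qf_sim` are just illustrative names.

```r
library(forecast)

set.seed(123)                  # arbitrary seed, for reproducibility only
fit   <- auto.arima(WWWusage)
h     <- 20                    # forecast horizon
nsim  <- 1000                  # number of simulated future paths (arbitrary)
probs <- (1:99)/100

# Each column of 'paths' is one simulated future path of length h,
# generated by resampling the residuals of the fitted model
paths <- replicate(nsim,
                   simulate(fit, nsim = h, future = TRUE, bootstrap = TRUE))

# Empirical quantiles at each horizon, giving a 99 x h matrix
# (rows are the 1%, ..., 99% quantiles; columns are horizons 1 to h)
qf_sim <- apply(paths, 1, quantile, probs = probs)

dim(qf_sim)
```

The resulting matrix has the same layout as `qf` in the normal-based code, so it can be plotted in the same way with `matlines`. The quantiles now reflect the empirical distribution of the residuals rather than a normal distribution, although they still ignore parameter uncertainty.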


