
Generating quantile forecasts in R


From today’s email:

I have just finished reading a copy of ‘Forecasting: Principles and Practice’ and I have found the book really interesting. I have particularly enjoyed the case studies and focus on practical applications.

After finishing the book I have joined a forecasting competition to put what I’ve learnt to the test. I do have a couple of queries about the forecasting outputs required. The output required is a quantile forecast. Is this the same as a prediction interval? Is there any R function to produce quantiles from 0 to 99?

If you were able to point me in the right direction regarding the above, it would be greatly appreciated.

Many Thanks,

Presumably the competition is GEFCOM2014, which I’ve posted about before.

The future value of a time series is unknown, so you can think of it as a random variable, and its distribution is the “forecast distribution”. A “quantile forecast” is a quantile of the forecast distribution. The usual point forecast is the mean or the median of the forecast distribution. A prediction interval is a range that contains the future value with a specified coverage probability under that distribution. For example, if we assume the forecast distribution is normal, then the 95% prediction interval is defined by the 2.5% and 97.5% quantiles of the forecast distribution.
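To see the connection numerically, here is a small base-R sketch using a hypothetical normal forecast distribution with mean 100 and standard deviation 10 (values made up purely for illustration). Both the median point forecast and the 95% interval limits are just quantiles, obtained with `qnorm()`:

```r
# Hypothetical forecast distribution: normal with mean 100 and sd 10
m <- 100
s <- 10

# The median (50% quantile) is one common point forecast; for a normal it equals the mean
median_fc <- qnorm(0.5, mean = m, sd = s)

# The 95% prediction interval is bounded by the 2.5% and 97.5% quantiles
pi95 <- qnorm(c(0.025, 0.975), mean = m, sd = s)

# The familiar "mean plus or minus 1.96 sd" form gives almost the same limits
pi95_approx <- m + c(-1, 1) * 1.96 * s

print(median_fc)
print(pi95)
print(pi95_approx)
```

The two interval versions differ only because 1.96 is a rounded value of `qnorm(0.975)`.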

Still assuming normality, we could generate the forecast quantiles from 1% to 99% in R using

```r
qnorm((1:99)/100, m, s)
```

where m and s are the estimated mean and standard deviation of the forecast distribution. So if you are using the forecast package in R, you can do something like this:

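```r
library(forecast)

# Fit an ARIMA model to the built-in WWWusage series and forecast 20 steps ahead
fit <- auto.arima(WWWusage)
fc <- forecast(fit, h=20, level=95)

# Back out the forecast standard deviation from the 95% interval width,
# which under normality is 2 * qnorm(0.975) * s (roughly 2 * 1.96 * s)
m <- fc$mean
s <- (fc$upper - fc$lower) / (2 * qnorm(0.975))

# Row p of qf holds the p% quantile; column h holds forecast horizon h
qf <- matrix(0, nrow=99, ncol=20)
for(h in 1:20)
  qf[,h] <- qnorm((1:99)/100, m[h], s[h])

# Overlay the 99 quantile paths on the forecast plot (WWWusage has 100 observations)
plot(fc)
matlines(101:120, t(qf), col=rainbow(120), lty=1)
```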
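If you would rather not back the standard deviation out of the interval width, base R's `arima()` and its `predict()` method return the forecast standard errors directly. The sketch below assumes an ARIMA(1,1,1) model for WWWusage purely for illustration (it need not match what `auto.arima()` selects), and fills the same 99 × 20 quantile matrix without an explicit loop:

```r
# Base R only: WWWusage (datasets) and arima()/predict() (stats) are always available
fit <- arima(WWWusage, order = c(1, 1, 1))  # model order assumed for illustration
pr <- predict(fit, n.ahead = 20)            # list with $pred (means) and $se (std. errors)

p <- (1:99) / 100
m <- as.numeric(pr$pred)
s <- as.numeric(pr$se)

# 99 x 20 matrix: row i is the p[i] quantile, column h is forecast horizon h
qf <- sapply(seq_along(m), function(h) qnorm(p, m[h], s[h]))
```

This still assumes a normal forecast distribution; only the source of the mean and standard deviation changes.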

Of course, assuming a normal distribution is rather restrictive and not very interesting. For a more interesting but much more complicated approach to generating quantiles, see my 2010 paper, “Density forecasting for long-term peak electricity demand”.
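One much simpler way to drop the normality assumption (not the method of that paper) is to simulate many future sample paths from the fitted model using resampled residuals, then take empirical quantiles at each horizon. The sketch assumes the forecast package's `simulate()` method for ARIMA fits, whose `bootstrap = TRUE` option resamples residuals instead of drawing normal errors; the seed and the 1000 paths are arbitrary choices, and parameter uncertainty is still ignored:

```r
library(forecast)

set.seed(2014)                      # arbitrary seed for reproducibility
fit <- auto.arima(WWWusage)
h <- 20
npaths <- 1000                      # more paths give smoother extreme quantiles

# Each column is one simulated future path of length h built from resampled residuals
sims <- replicate(npaths, simulate(fit, nsim = h, future = TRUE, bootstrap = TRUE))

# Empirical 1%..99% quantiles at each horizon: a 99 x h matrix, laid out like qf
qf_boot <- apply(sims, 1, quantile, probs = (1:99) / 100)
```

With 1000 paths, the 1% and 99% quantiles each rest on only about ten simulated values in the tail, so increase `npaths` if the extreme quantiles matter.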




Published at DZone with permission of
