Generating quantile forecasts in R

From today’s email:

I have just finished reading a copy of ‘Forecasting: Principles and Practice’ and I have found the book really interesting. I have particularly enjoyed the case studies and focus on practical applications.

After finishing the book I have joined a forecasting competition to put what I’ve learnt to the test. I do have a couple of queries about the forecasting outputs required. The output required is a quantile forecast, is this the same as prediction intervals? Is there any R function to produce quantiles from 0 to 99?

If you were able to point me in the right direction regarding the above it would be greatly appreciated.

Many Thanks,

Presumably the competition is GEFCOM2014, which I’ve posted about before.

The future value of a time series is unknown, so you can think of it as a random variable, and its distribution is the “forecast distribution”. A “quantile forecast” is a quantile of the forecast distribution. The usual point forecast is often the mean or the median of the forecast distribution. A prediction interval is a range of specified coverage probability under that distribution. For example, if we assume the forecast distribution is normal, then the 95% prediction interval is defined by the 2.5% and 97.5% quantiles of the forecast distribution.
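To make the link between quantiles and prediction intervals concrete, here is a minimal sketch in base R. The values of m and s are made up purely for illustration; they are not taken from any fitted model.

```r
# Hypothetical forecast mean and standard deviation, chosen only for illustration
m <- 100
s <- 10

# Under normality, the 95% prediction interval is bounded by
# the 2.5% and 97.5% quantiles of the forecast distribution
pi95 <- qnorm(c(0.025, 0.975), mean = m, sd = s)

# The 50% quantile (the median) coincides with the mean for a normal distribution
med <- qnorm(0.5, mean = m, sd = s)
```

So a quantile forecast is the more general object: any prediction interval can be read off from a suitable pair of quantiles.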

Still assuming normality, we could generate the forecast quantiles from 1% to 99% in R using

qnorm((1:99)/100, m, s)

where m and s are the estimated mean and standard deviation of the forecast distribution. So if you are using the forecast package in R, you can do something like this:

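library(forecast)
fit <- auto.arima(WWWusage)
fc <- forecast(fit, h=20, level=95)
# Rows are the 1% to 99% quantiles; columns are the forecast horizons
qf <- matrix(0, nrow=99, ncol=20)
m <- fc$mean
# Recover the standard deviation from the width of the 95% interval
s <- (fc$upper-fc$lower)/1.96/2
for(h in 1:20)
  qf[,h] <- qnorm((1:99)/100, m[h], s[h])
# Plot the forecast first so that matlines() has a plot to draw on
plot(fc)
matlines(101:120, t(qf), col=rainbow(120), lty=1)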

Of course, assuming a normal distribution is rather restrictive and not very interesting. For a more interesting but much more complicated approach to generating quantiles, see my 2010 paper on Density forecasting for long-term peak electricity demand.
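A much simpler way to relax the normality assumption is to simulate future sample paths using resampled residuals and then take empirical quantiles at each horizon. The sketch below is not the method from the paper. It assumes the forecast package's simulate() method for ARIMA fits with future=TRUE and bootstrap=TRUE, and the names nsim, paths and qf_boot, as well as the choice of 1000 paths, are introduced here only for illustration.

```r
library(forecast)

set.seed(123)                 # make the resampling reproducible
fit <- auto.arima(WWWusage)
h <- 20                       # forecast horizon
nsim <- 1000                  # number of simulated future paths (arbitrary choice)

# Each column is one simulated future path of length h,
# generated from resampled residuals rather than normal errors
paths <- replicate(
  nsim,
  as.numeric(simulate(fit, nsim = h, future = TRUE, bootstrap = TRUE))
)

# Empirical 1% to 99% quantiles at each horizon: a 99 x h matrix
qf_boot <- apply(paths, 1, quantile, probs = (1:99)/100)
```

The resulting qf_boot has the same layout as qf above, so it can be plotted the same way with matlines(101:120, t(qf_boot), lty=1) after plot(forecast(fit, h=20)).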

Published at DZone with permission of Rob J Hyndman, DZone MVB. See the original article here.
