Generating quantile forecasts in R

From today’s email:

I have just finished reading a copy of ‘Forecasting: Principles and Practice’ and I have found the book really interesting. I have particularly enjoyed the case studies and focus on practical applications.

After finishing the book I have joined a forecasting competition to put what I’ve learnt to the test. I do have a couple of queries about the forecasting outputs required. The output required is a quantile forecast, is this the same as prediction intervals? Is there any R function to produce quantiles from 0 to 99?

If you were able to point me in the right direction regarding the above it would be greatly appreciated.

Many Thanks,

Presumably the competition is GEFCom2014, which I’ve posted about before.

The future value of a time series is unknown, so you can think of it as a random variable, and its distribution is the “forecast distribution”. A “quantile forecast” is a quantile of the forecast distribution. The usual point forecast is often the mean or the median of the forecast distribution. A prediction interval is a range of specified coverage probability under that distribution. For example, if we assume the forecast distribution is normal, then the 95% prediction interval is defined by the 2.5% and 97.5% quantiles of the forecast distribution.
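To make the link between prediction intervals and quantiles concrete, here is a minimal sketch in base R. The mean of 100 and standard deviation of 10 are made-up values for illustration only; they do not come from any fitted model.

```r
# Hypothetical forecast distribution: normal with mean 100 and sd 10
m <- 100
s <- 10

# 95% prediction interval computed as mean +/- multiplier * sd
pi95 <- m + c(-1, 1) * qnorm(0.975) * s

# The same interval expressed as the 2.5% and 97.5% quantiles
q <- qnorm(c(0.025, 0.975), mean = m, sd = s)

# The two agree up to floating-point error
print(all.equal(pi95, q))

# Probability mass between the two quantiles (the coverage)
coverage <- diff(pnorm(q, mean = m, sd = s))
print(coverage)
```

The interval endpoints are simply two particular quantile forecasts; asking for all quantiles from 1% to 99% just fills in the rest of the distribution.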

Still assuming normality, we could generate the forecast quantiles from 1% to 99% in R using

qnorm((1:99)/100, m, s)

where m and s are the estimated mean and standard deviation of the forecast distribution. So if you are using the forecast package in R, you can do something like this:

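library(forecast)
# Fit an ARIMA model and forecast 20 steps ahead with a 95% interval
fit <- auto.arima(WWWusage)
fc <- forecast(fit, h=20, level=95)
# Under normality the 95% interval is mean +/- qnorm(0.975)*sd (about 1.96*sd)
m <- fc$mean
s <- (fc$upper-fc$lower)/qnorm(0.975)/2
# Rows are the 1%,...,99% quantiles; columns are horizons 1,...,20
qf <- matrix(0, nrow=99, ncol=20)
for(h in 1:20)
  qf[,h] <- qnorm((1:99)/100, m[h], s[h])
# Plot the forecasts first so that matlines() has a plot to draw on
plot(fc)
matlines(101:120, t(qf), col=rainbow(120), lty=1)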
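The code above recovers the standard deviation from the width of the prediction interval. As a rough alternative sketch that needs only base R, `predict()` on a `stats::arima` fit returns the forecast standard errors directly. The ARIMA(1,1,1) order here is an assumption chosen for illustration, not necessarily the model `auto.arima` would select.

```r
# Base R only: fit an ARIMA(1,1,1) (illustrative order) to WWWusage
fit <- arima(WWWusage, order = c(1, 1, 1))

# predict() returns point forecasts ($pred) and standard errors ($se)
p <- predict(fit, n.ahead = 20)

# Quantile levels 1%, 2%, ..., 99%
probs <- (1:99) / 100

# 99 x 20 matrix: rows are quantile levels, columns are forecast horizons
qf <- sapply(1:20, function(h) qnorm(probs, mean = p$pred[h], sd = p$se[h]))

# Show a few quantiles for the first three horizons
print(round(qf[c(5, 50, 95), 1:3], 1))
```

Row 50 holds the median, which under normality coincides with the point forecast in `p$pred`.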

Of course, assuming a normal distribution is rather restrictive and not very interesting. For a more interesting but much more complicated approach to generating quantiles, see my 2010 paper on Density forecasting for long-term peak electricity demand.
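One simple way to relax the normality assumption is to simulate future sample paths by resampling the model residuals and then take empirical quantiles at each horizon. The sketch below is a plain residual bootstrap in base R. It is not the method of the 2010 paper, and the ARIMA(1,1,1) order, the helper `sim_path`, and the 1000 replications are all assumptions made for illustration.

```r
set.seed(1)

# Illustrative ARIMA(1,1,1) fit; stats::arima writes the MA part as e[t] + theta*e[t-1]
fit   <- arima(WWWusage, order = c(1, 1, 1))
phi   <- unname(coef(fit)["ar1"])
theta <- unname(coef(fit)["ma1"])
res   <- as.numeric(residuals(fit))
y     <- as.numeric(WWWusage)
n     <- length(y)

# Hypothetical helper: simulate one future path of length h,
# drawing innovations from the fitted residuals instead of a normal distribution
sim_path <- function(h) {
  e_prev <- res[n]           # last fitted innovation
  d_prev <- y[n] - y[n - 1]  # last observed first difference
  level  <- y[n]
  out <- numeric(h)
  for (i in seq_len(h)) {
    e <- sample(res[-1], 1)  # drop the first residual (start-up value)
    d <- phi * d_prev + e + theta * e_prev
    level <- level + d
    out[i] <- level
    d_prev <- d
    e_prev <- e
  }
  out
}

# 20 x 1000 matrix of simulated paths (rows are horizons)
paths <- replicate(1000, sim_path(20))

# Empirical 1%,...,99% quantiles at each horizon: a 99 x 20 matrix
qf <- apply(paths, 1, quantile, probs = (1:99) / 100)
```

This ignores parameter uncertainty and assumes the residuals are exchangeable, so it should be read as a starting point rather than a complete density forecast.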

Published at DZone with permission of Rob J Hyndman, DZone MVB. See the original article here.
