Batch forecasting in R
I sometimes get asked about forecasting many time series automatically. Here is a recent email, for example:
I have looked but cannot find any info on generating forecasts on multiple data sets in sequence. I have been using analysis services for sql server to generate fitted time series but it is too much of a black box (or I don’t know enough to tweak/manage the inputs). In short, what package should I research that will allow me to load data, generate a forecast (presumably best fit), export the forecast then repeat for a few thousand items. I have read that R does not like ‘loops’ but not sure if the current cpu power offsets that or not. Any guidance would be greatly appreciated. Thank you!!
Loops are fine in R. They are frowned upon because people use them inappropriately when there are often much more efficient vectorized versions available. But for this task, a loop is the only approach.
Reading data and exporting forecasts is standard R and does not require any additional packages. To generate the forecasts, use the forecast package: either the ets() function or the auto.arima() function, depending on what type of data you are modelling. If it’s high-frequency data (frequency greater than 24), then you would need the tbats() function, but that is very slow.
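As a rough sketch of that choice, the snippet below fits both model classes to the built-in monthly AirPassengers series and forecasts a year ahead. The dataset is only an illustration, and the helper name choose_model is hypothetical, not something the forecast package provides.

```r
library(forecast)

# Hypothetical helper: pick a fitting function from the seasonal frequency,
# following the rule of thumb above (tbats() when frequency exceeds 24).
choose_model <- function(y, use_arima = FALSE) {
  if (frequency(y) > 24) {
    tbats(y)            # copes with long seasonal periods, but is slow
  } else if (use_arima) {
    auto.arima(y)       # automatic ARIMA selection
  } else {
    ets(y)              # automatic exponential smoothing selection
  }
}

y <- AirPassengers      # built-in monthly series, frequency 12

fit_ets   <- choose_model(y)
fit_arima <- choose_model(y, use_arima = TRUE)

# Forecast 12 months ahead from each fitted model
fc_ets   <- forecast(fit_ets, h = 12)
fc_arima <- forecast(fit_arima, h = 12)
```

Whether ets() or auto.arima() suits a given series better is an empirical question; comparing the two on held-out data is one reasonable way to decide.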
Some sample code
In the following example, there are many columns of monthly data in a csv file, with the first column containing the month of observation (beginning with April 1982). Forecasts are generated by applying forecast() directly to each time series. That will select an ETS model using the AIC, estimate the parameters, and generate forecasts. Although it returns prediction intervals, in the following code I’ve simply extracted the point forecasts (named mean in the returned forecast object because they are usually the mean of the forecast distribution).
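    library(forecast)

    # Read the data; the first column holds the month of observation
    retail <- read.csv("http://robjhyndman.com/data/ausretail.csv", header=FALSE)
    # Drop the month column; monthly series starting in April 1982
    retail <- ts(retail[,-1], frequency=12, start=1982+3/12)

    ns <- ncol(retail)   # number of series
    h <- 24              # forecast horizon in months
    fcast <- matrix(NA, nrow=h, ncol=ns)
    for(i in 1:ns)
      fcast[,i] <- forecast(retail[,i], h=h)$mean

    # One column per series, one row per forecast month
    write(t(fcast), file="retailfcasts.csv", sep=",", ncol=ncol(fcast))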
Note that the transpose of the fcast matrix is used in write() because the file is written row-by-row rather than column-by-column.
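A tiny base-R illustration of why the transpose is needed, using a made-up 2 x 3 matrix and temporary files. It also shows write.table() as one possible alternative that keeps the matrix layout without transposing.

```r
m <- matrix(1:6, nrow = 2, ncol = 3)   # rows are (1, 3, 5) and (2, 4, 6)

f_plain <- tempfile(fileext = ".csv")
f_trans <- tempfile(fileext = ".csv")
f_table <- tempfile(fileext = ".csv")

# write() emits the elements in storage (column-major) order,
# starting a new line after every `ncolumns` values.
write(m,    file = f_plain, ncolumns = ncol(m), sep = ",")
write(t(m), file = f_trans, ncolumns = ncol(m), sep = ",")

# write.table() writes the matrix row by row as it is displayed.
write.table(m, file = f_table, sep = ",",
            row.names = FALSE, col.names = FALSE)

plain_lines <- readLines(f_plain)   # "1,2,3" "4,5,6": layout scrambled
trans_lines <- readLines(f_trans)   # "1,3,5" "2,4,6": layout matches m
table_lines <- readLines(f_table)   # "1,3,5" "2,4,6": layout matches m
```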
This code does not actually do what the questioner asked as I am writing all forecasts at once rather than exporting them at each iteration. The latter is much less efficient.
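For completeness, here is a minimal sketch of what exporting at each iteration might look like. It uses simulated series rather than the retail data, and the output file is a temporary placeholder. Each forecast is appended to the file as one line, so the saved layout is one row per series rather than one column per series.

```r
library(forecast)

# Simulated stand-in for the retail data: 3 monthly series, 6 years each
set.seed(123)
sales <- ts(matrix(rnorm(72 * 3, mean = 100, sd = 5), ncol = 3),
            frequency = 12, start = c(2000, 1))

h <- 12
out_file <- tempfile(fileext = ".csv")   # placeholder output location

for (i in seq_len(ncol(sales))) {
  fc <- forecast(sales[, i], h = h)$mean
  # Append one line per series, so each forecast is saved
  # as soon as it has been computed.
  write(as.numeric(fc), file = out_file, ncolumns = h,
        sep = ",", append = TRUE)
}

# Read the results back: one row per series, one column per horizon
saved <- as.matrix(read.csv(out_file, header = FALSE))
```

The repeated file opens are what make this slower than a single write at the end, although it does mean partial results survive if a long run is interrupted.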
If ns is large, this could probably be more efficiently coded using the parallel package.
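One possible way to do that is sketched below with parLapply() on a small socket cluster. The simulated data and the choice of two workers are assumptions for illustration only; on Unix-like systems, mclapply() would be another option.

```r
library(forecast)
library(parallel)

# Simulated stand-in for the retail data: 4 monthly series, 5 years each
set.seed(42)
series <- ts(matrix(rnorm(60 * 4, mean = 100, sd = 5), ncol = 4),
             frequency = 12, start = c(2000, 1))
h <- 12

# Forecast one column and return the point forecasts as a plain vector
forecast_column <- function(i, series_mat, horizon) {
  as.numeric(forecast(series_mat[, i], h = horizon)$mean)
}

cl <- makeCluster(2)                           # two workers (an assumption)
invisible(clusterEvalQ(cl, library(forecast))) # load forecast on each worker
fc_list <- parLapply(cl, seq_len(ncol(series)), forecast_column,
                     series_mat = series, horizon = h)
stopCluster(cl)

# Bind the results into the same h-by-ns layout as the sequential loop
fcast_par <- do.call(cbind, fc_list)
```

With only a handful of series the cluster start-up cost will likely outweigh any gain; the approach is aimed at the case of thousands of series.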
Published at DZone with permission of Rob J Hyndman, DZone MVB. See the original article here.