Batch forecasting in R
I sometimes get asked about forecasting many time series automatically. Here is a recent email, for example:
I have looked but cannot find any info on generating forecasts on multiple data sets in sequence. I have been using analysis services for sql server to generate fitted time series but it is too much of a black box (or I don’t know enough to tweak/manage the inputs). In short, what package should I research that will allow me to load data, generate a forecast (presumably best fit), export the forecast then repeat for a few thousand items. I have read that R does not like ‘loops’ but not sure if the current cpu power offsets that or not. Any guidance would be greatly appreciated. Thank you!!
My response
Loops are fine in R. They are frowned upon because people use them inappropriately when there are often much more efficient vectorized versions available. But for this task, a loop is the only approach.
Reading data and exporting forecasts are standard R tasks and do not require any additional packages. To generate the forecasts, use the forecast package: either the ets() function or the auto.arima() function, depending on what type of data you are modelling. If the data are high frequency (frequency greater than 24), then you would need the tbats() function, but that is very slow.
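As a rough illustration of that choice, the sketch below wraps it in a small helper. The name fit_by_frequency and its use_arima argument are hypothetical (they are not part of the forecast package), and the series is simulated purely so the example runs on its own.

```r
library(forecast)

# Hypothetical helper, not part of the forecast package: choose a
# modelling function from the seasonal frequency of the series.
fit_by_frequency <- function(y, use_arima = FALSE) {
  if (frequency(y) > 24) {
    tbats(y)          # copes with long seasonal periods, but is slow
  } else if (use_arima) {
    auto.arima(y)     # automatic ARIMA selection
  } else {
    ets(y)            # automatic exponential smoothing selection
  }
}

# Simulated monthly series, for illustration only
set.seed(1)
y <- ts(100 + cumsum(rnorm(120)), frequency = 12, start = c(2000, 1))

fit <- fit_by_frequency(y)    # monthly data, so ets() is used
fc <- forecast(fit, h = 12)   # forecasts for the next 12 months
```

Whether ets() or auto.arima() suits a given series better is a judgment about the data that this helper does not attempt to make; the flag simply lets the caller decide.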
Some sample code
In the following example, there are many columns of monthly data in a csv file, with the first column containing the month of observation (beginning with April 1982). Forecasts are generated by applying forecast() directly to each time series. That will select an ETS model using the AIC, estimate the parameters, and generate forecasts. Although it returns prediction intervals, in the following code I’ve simply extracted the point forecasts (named mean in the returned forecast object because they are usually the mean of the forecast distribution).
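    library(forecast)

    # The first column holds the month of observation, so drop it
    retail <- read.csv("http://robjhyndman.com/data/ausretail.csv", header = FALSE)
    retail <- ts(retail[, -1], frequency = 12, start = c(1982, 4))

    ns <- ncol(retail)   # number of series
    h <- 24              # forecast horizon in months
    fcast <- matrix(NA, nrow = h, ncol = ns)
    for (i in seq_len(ns))
      fcast[, i] <- forecast(retail[, i], h = h)$mean

    write(t(fcast), file = "retailfcasts.csv", sep = ",", ncolumns = ncol(fcast))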
Note that the transpose of the fcast matrix is used in write() because the file is written row-by-row rather than column-by-column.
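The effect of the transpose is easier to see on a tiny matrix. The following sketch uses only base R; the matrix m and the temporary file are made up for illustration.

```r
# 3 forecast horizons (rows) by 2 series (columns)
m <- matrix(1:6, nrow = 3, ncol = 2)

out <- tempfile(fileext = ".csv")

# write() emits values in storage order (column by column), breaking the
# line after every `ncolumns` values. Transposing first means each line
# of the file corresponds to one row of m.
write(t(m), file = out, sep = ",", ncolumns = ncol(m))

lines_written <- readLines(out)
# lines_written is c("1,4", "2,5", "3,6"), i.e. the rows of m
```

Without the t(), the same call would write "1,2", "3,4" and "5,6", which mixes values from different rows of m.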
This code does not actually do what the questioner asked as I am writing all forecasts at once rather than exporting them at each iteration. The latter is much less efficient.
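For completeness, here is a minimal sketch of the per-iteration export the questioner described, with one csv file per series. The simulated data, the series names A, B and C, and the output folder are all assumptions made so the example is self-contained; they stand in for the real data.

```r
library(forecast)

# Simulated stand-in for the retail data: 3 monthly series
set.seed(42)
x <- replicate(3, 100 + cumsum(rnorm(96)))
colnames(x) <- c("A", "B", "C")             # hypothetical series names
sim <- ts(x, frequency = 12, start = c(1982, 4))

h <- 24
outdir <- file.path(tempdir(), "fcasts")    # hypothetical output folder
dir.create(outdir, showWarnings = FALSE)

# Forecast each series and export it before moving on to the next one
for (name in colnames(sim)) {
  fc <- forecast(sim[, name], h = h)
  write.csv(data.frame(horizon = seq_len(h), forecast = as.numeric(fc$mean)),
            file = file.path(outdir, paste0(name, ".csv")),
            row.names = FALSE)
}
```

Each iteration opens and writes a separate file, which is the overhead that writing all forecasts at once avoids.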
If ns is large, this could probably be more efficiently coded using the parallel package.
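One way this might look is sketched below, using parLapply() from the parallel package. The data are simulated, and the choice of two workers is arbitrary; how much time this saves, if any, will depend on the machine and the number of series.

```r
library(forecast)
library(parallel)

# Simulated stand-in for the retail data: 4 monthly series
set.seed(123)
sim <- ts(replicate(4, 100 + cumsum(rnorm(96))),
          frequency = 12, start = c(1982, 4))
h <- 24

cl <- makeCluster(2)                   # arbitrary number of workers
clusterEvalQ(cl, library(forecast))    # load forecast on every worker

# Split the multivariate series into a list with one ts per element
series <- lapply(seq_len(ncol(sim)), function(i) sim[, i])

# Forecast each series on the workers and keep only the point forecasts
means <- parLapply(cl, series,
                   function(y, h) as.numeric(forecast(y, h = h)$mean),
                   h = h)
stopCluster(cl)

# Reassemble into an h x ns matrix, matching the loop version
fcast <- do.call(cbind, means)
```

The resulting fcast matrix can be written out with the same write() call as before.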
Published at DZone with permission of Rob J Hyndman, DZone MVB.