Batch forecasting in R

I sometimes get asked about forecasting many time series automatically. Here is a recent email, for example:

I have looked but cannot find any info on generating forecasts on multiple data sets in sequence. I have been using analysis services for sql server to generate fitted time series but it is too much of a black box (or I don’t know enough to tweak/manage the inputs). In short, what package should I research that will allow me to load data, generate a forecast (presumably best fit), export the forecast then repeat for a few thousand items. I have read that R does not like ‘loops’ but not sure if the current cpu power offsets that or not. Any guidance would be greatly appreciated. Thank you!!

My response

Loops are fine in R. They are frowned upon because people often use them inappropriately when much more efficient vectorized versions are available. But for this task, where each series has to be modelled separately, a loop is the natural approach.

Reading data and exporting forecasts is standard R and does not require loading any additional packages. To generate the forecasts, use the forecast package: either the ets() function or the auto.arima() function, depending on what type of data you are modelling. If it’s high-frequency data (frequency greater than 24), then you would need the tbats() function, but that is very slow.
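A minimal sketch of that choice, assuming the forecast package is installed. forecast_one() is a hypothetical helper (not part of the package) that simply encodes the rule of thumb above; AirPassengers is R's built-in monthly airline series, used here only as test data.

```r
library(forecast)

# Hypothetical helper: choose a model from the series frequency, following
# the rule of thumb in the text. ETS is the default; set use_arima = TRUE
# to fit auto.arima() instead.
forecast_one <- function(y, h, use_arima = FALSE) {
  if (frequency(y) > 24) {
    fit <- tbats(y)        # ets() cannot model seasonal periods above 24; slow
  } else if (use_arima) {
    fit <- auto.arima(y)   # automatic ARIMA order selection
  } else {
    fit <- ets(y)          # automatic exponential smoothing (ETS) selection
  }
  forecast(fit, h = h)
}

fc_ets   <- forecast_one(AirPassengers, h = 12)
fc_arima <- forecast_one(AirPassengers, h = 12, use_arima = TRUE)
fc_ets$method    # name of the selected ETS model
fc_arima$method  # name of the selected ARIMA model
```

Which of the two works better depends on the data; for a few thousand items it is worth comparing them on a held-out sample before committing to one.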

Some sam­ple code

In the following example, there are many columns of monthly data in a csv file with the first column containing the month of observation (beginning with April 1982). Forecasts have been generated by applying forecast() directly to each time series. That will select an ETS model using the AIC, estimate the parameters, and generate forecasts. Although it returns prediction intervals, in the following code, I’ve simply extracted the point forecasts (named mean in the returned forecast object because they are usually the mean of the forecast distribution).

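library(forecast)  # provides forecast(), ets(), auto.arima() and tbats()
retail <- read.csv("http://robjhyndman.com/data/ausretail.csv", header = FALSE)
# Drop the month column; ts() reconstructs the dates from start and frequency
retail <- ts(retail[, -1], frequency = 12, start = c(1982, 4))
ns <- ncol(retail)  # number of series
h <- 24             # forecast horizon in months
fcast <- matrix(NA, nrow = h, ncol = ns)
for (i in seq_len(ns))
  fcast[, i] <- forecast(retail[, i], h = h)$mean  # keep point forecasts only
# Write an h x ns table: one row per horizon, one column per series
write(t(fcast), file = "retailfcasts.csv", sep = ",", ncolumns = ncol(fcast))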

Note that the transpose of the fcast matrix is passed to write(), because write() fills the file row by row while taking values from its argument in column order; transposing first keeps the file in the same layout as fcast.
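A toy illustration of why the transpose is needed, using only base R: write() flattens its argument in column-major order and then puts ncolumns values on each line. The 3 x 2 matrix m is a made-up stand-in for fcast (3 horizons, 2 series).

```r
# Made-up stand-in for fcast: columns are series, rows are horizons
m <- matrix(1:6, nrow = 3)

f1 <- tempfile(fileext = ".csv")
f2 <- tempfile(fileext = ".csv")
write(m, f1, ncolumns = ncol(m), sep = ",")     # no transpose: values regrouped
write(t(m), f2, ncolumns = ncol(m), sep = ",")  # transposed: file matches m

readLines(f1)  # "1,2" "3,4" "5,6"  (series mixed together on each line)
readLines(f2)  # "1,4" "2,5" "3,6"  (same rows as m)
```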

The batch loop above does not actually do what the questioner asked, because it writes all the forecasts at once rather than exporting them at each iteration. The latter is much less efficient.
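For comparison, a sketch of what the questioner literally asked for: forecast one item, export it, then move on to the next. It assumes the forecast package; the simulated series, the output file name and the long (item, horizon, forecast) layout are hypothetical choices made for illustration. Each write.table() call reopens and appends to the file, which is one reason this pattern is slower than writing once at the end.

```r
library(forecast)

# Hypothetical stand-in for the questioner's items: 5 simulated monthly series
set.seed(42)
n_obs <- 120
items <- ts(sapply(1:5, function(k) 100 + 10 * k +
                     10 * sin(2 * pi * (1:n_obs) / 12) + rnorm(n_obs, sd = 2)),
            frequency = 12, start = c(1982, 4))
h <- 24
out_file <- file.path(tempdir(), "item_forecasts.csv")  # hypothetical path

for (i in seq_len(ncol(items))) {
  fc <- forecast(items[, i], h = h)
  item_fc <- data.frame(item = i, horizon = seq_len(h),
                        forecast = as.numeric(fc$mean))
  # Export this item immediately: the first call creates the file with a
  # header, later calls append rows without repeating it
  write.table(item_fc, out_file, sep = ",", row.names = FALSE,
              col.names = (i == 1), append = (i > 1))
}
```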

If ns is large, the loop could probably be coded more efficiently using the parallel package.
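A minimal sketch of that idea, assuming the forecast package plus the parallel package that ships with R. It uses mclapply(), which forks worker processes and therefore cannot use more than one core on Windows (there, makeCluster() with parLapply() is the usual alternative). The simulated series and the choice of two cores are placeholders.

```r
library(forecast)
library(parallel)

# Hypothetical stand-in data: 8 simulated monthly series
set.seed(1)
n_obs <- 120
series <- ts(matrix(100 + rnorm(n_obs * 8, sd = 5), ncol = 8),
             frequency = 12, start = c(1982, 4))
h <- 24

# Forking is unavailable on Windows, so fall back to a single core there
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Each worker fits and forecasts one series and returns its point forecasts
fc_list <- mclapply(seq_len(ncol(series)), function(i) {
  as.numeric(forecast(series[, i], h = h)$mean)
}, mc.cores = n_cores)

fcast <- do.call(cbind, fc_list)  # h x ns matrix, same layout as the loop version
```

Because every series is modelled independently, the work splits cleanly across cores; the gain is largest when each fit is slow, as with tbats().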

Published at DZone with permission of {{ articles[0].authors[0].realName }}, DZone MVB. (source)

Opinions expressed by DZone contributors are their own.