
Batch forecasting in R



I sometimes get asked about forecasting many time series automatically. Here is a recent email, for example:

I have looked but cannot find any info on generating forecasts on multiple data sets in sequence. I have been using analysis services for sql server to generate fitted time series but it is too much of a black box (or I don’t know enough to tweak/manage the inputs). In short, what package should I research that will allow me to load data, generate a forecast (presumably best fit), export the forecast then repeat for a few thousand items. I have read that R does not like ‘loops’ but not sure if the current cpu power offsets that or not. Any guidance would be greatly appreciated. Thank you!!

My response

Loops are fine in R. They are frowned upon because people often use them where a much more efficient vectorized alternative is available. For this task, though, each series needs its own model, so there is nothing to vectorize and a loop is the natural approach.

Reading data and exporting forecasts is standard R and does not require any additional packages. To generate the forecasts, use the forecast package: either the ets() function or the auto.arima() function, depending on what type of data you are modelling. If it’s high-frequency data (frequency greater than 24), then you would need the tbats() function, but that is very slow.
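As a minimal sketch of that choice, the helper below picks a modelling function from the seasonal frequency of the series. The name fit_auto and its use_arima argument are my own illustrative assumptions, not part of the forecast package; only ets(), auto.arima(), tbats() and forecast() come from the package. The built-in AirPassengers series stands in for real data.

```r
library(forecast)

# Hypothetical helper (not part of the forecast package): choose a modelling
# function from the seasonal frequency. The threshold of 24 follows the text.
fit_auto <- function(y, use_arima = FALSE) {
  if (frequency(y) > 24) {
    tbats(y)        # copes with long seasonal periods, but is slow
  } else if (use_arima) {
    auto.arima(y)   # automatic ARIMA order selection
  } else {
    ets(y)          # automatic exponential smoothing model selection
  }
}

fit <- fit_auto(AirPassengers)   # monthly data, so frequency(y) is 12
fc <- forecast(fit, h = 12)      # forecasts for the next 12 months
print(fc$mean)
```

Whether ets() or auto.arima() suits a given series better depends on the data, so the use_arima switch is left to the caller rather than decided automatically.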

Some sample code

In the following example, there are many columns of monthly data in a csv file with the first column containing the month of observation (beginning with April 1982). Forecasts have been generated by applying forecast() directly to each time series. That will select an ETS model using the AIC, estimate the parameters, and generate forecasts. Although it returns prediction intervals, in the following code, I’ve simply extracted the point forecasts (named mean in the returned forecast object because they are usually the mean of the forecast distribution).

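library(forecast)

# Read the monthly series; the first column holds the month of observation
retail <- read.csv("http://robjhyndman.com/data/ausretail.csv", header=FALSE)
retail <- ts(retail[,-1], frequency=12, start=c(1982,4))

ns <- ncol(retail)  # number of series
h <- 24             # forecast horizon in months
fcast <- matrix(NA, nrow=h, ncol=ns)
for(i in seq_len(ns))
  fcast[,i] <- forecast(retail[,i], h=h)$mean

# Write all forecasts at once, one row per forecast horizon
write(t(fcast), file="retailfcasts.csv", sep=",", ncolumns=ncol(fcast))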

Note that the transpose of the fcast matrix is used in write() because the file is written row by row rather than column by column.
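The effect of the transpose can be checked on a small matrix using base R alone. This is only an illustrative sketch: the matrix m and the temporary file are stand-ins I have made up, with m playing the role of fcast.

```r
# A small matrix standing in for fcast: 3 horizons (rows) by 2 series (columns)
m <- matrix(1:6, nrow = 3, ncol = 2)
out <- tempfile(fileext = ".csv")

# write() emits the elements in column-major order, ncolumns values per line,
# so passing t(m) makes each line of the file one row of m
write(t(m), file = out, sep = ",", ncolumns = ncol(m))

# Read the file back and compare it with the original matrix
back <- as.matrix(read.csv(out, header = FALSE))
dimnames(back) <- NULL
print(all(back == m))
```

An alternative that avoids the transpose is write.table() with sep=",", row.names=FALSE and col.names=FALSE, which writes a matrix row by row as it stands.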

This code does not actually do what the questioner asked as I am writing all forecasts at once rather than exporting them at each iteration. The latter is much less efficient.
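For completeness, here is a sketch of the per-iteration export the questioner described, with one output file per series. The data (two monthly series that ship with R), the temporary output directory and the file naming are all assumptions made for illustration; they are not taken from the retail example.

```r
library(forecast)

# Stand-in data: two monthly series shipped with R, bound into one matrix
series <- cbind(mdeaths, fdeaths)
h <- 24
outdir <- tempfile("fcasts")   # hypothetical output location
dir.create(outdir)

for (i in seq_len(ncol(series))) {
  fc <- forecast(series[, i], h = h)$mean
  # One file per series, written as soon as its forecast is available
  outfile <- file.path(outdir, paste0(colnames(series)[i], ".csv"))
  write.csv(data.frame(forecast = as.numeric(fc)), outfile, row.names = FALSE)
}

print(list.files(outdir))
```

This opens and closes a file on every pass through the loop, which is where the extra cost comes from when there are thousands of series.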

If ns is large, this could probably be more efficiently coded using the parallel package.
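One possible way to do that is sketched below, using a socket cluster and parSapply() from the parallel package. The choice of two workers and the stand-in data are assumptions for illustration, and I have not benchmarked this against the plain loop; for a handful of series, the overhead of starting the workers may well outweigh any gain.

```r
library(forecast)
library(parallel)

series <- cbind(mdeaths, fdeaths)   # stand-in for the retail matrix
h <- 24

cl <- makeCluster(2)                           # 2 workers: an arbitrary choice
invisible(clusterEvalQ(cl, library(forecast))) # load forecast on every worker

# Split the matrix into a list of univariate ts objects, one per column
series_list <- lapply(seq_len(ncol(series)), function(i) series[, i])

# Each worker forecasts its share of the series; with equal-length results,
# parSapply() assembles them into a matrix with one column per series
fcast_par <- parSapply(cl, series_list, function(y, h) {
  as.numeric(forecast(y, h = h)$mean)
}, h = h)

stopCluster(cl)
print(dim(fcast_par))
```

The resulting matrix has the same layout as fcast in the loop version, so it can be written out with the same write() call.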



Published at DZone with permission of
