Forecasting Weekly Data

This is another situation where Fourier terms are useful for handling the seasonality. Not only is the seasonal period rather long, it is also non-integer (averaging 365.25÷7 = 52.18). So ARIMA and ETS models do not tend to give good results, even with a period of 52 as an approximation.
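As a rough illustration of why a non-integer period is no obstacle for Fourier terms, the sketch below builds the sine and cosine regressors by hand in base R. The helper name `fourier_terms` is hypothetical and exists only for this example; it is not part of any package.

```r
# Hypothetical helper: K pairs of Fourier terms for a seasonal period m.
# m does not need to be an integer.
fourier_terms <- function(n, K, m) {
  t <- seq_len(n)
  X <- matrix(NA_real_, nrow = n, ncol = 2 * K)
  for (k in seq_len(K)) {
    X[, 2 * k - 1] <- sin(2 * pi * k * t / m)
    X[, 2 * k]     <- cos(2 * pi * k * t / m)
  }
  colnames(X) <- paste0(rep(c("S", "C"), K), rep(seq_len(K), each = 2))
  X
}

m <- 365.25 / 7                              # average number of weeks per year
X <- fourier_terms(n = 520, K = 3, m = m)    # ten years of weekly regressors
round(m, 2)
dim(X)
```

Each column is a smooth function of time, so nothing in the construction requires the period to line up with a whole number of observations.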

Regression with ARIMA errors

The simplest approach is a regression with ARIMA errors. Here is an example using weekly data on US finished motor gasoline products supplied (in thousands of barrels per day) from February 1991 to May 2005. An updated version of the data is available from the EIA website. I select the number of Fourier terms by minimizing the AICc. The order of the ARIMA model is also selected by minimizing the AICc, although that is done within the auto.arima() function.

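library(forecast)
gas <- ts(read.csv("http://robjhyndman.com/data/gasoline.csv", header=FALSE)[,1], 
          freq=365.25/7, start=1991+31/7/365.25)
# Add Fourier pairs one at a time and stop as soon as the AICc gets worse
bestfit <- list(aicc=Inf)
for(i in 1:25)
{
  fit <- auto.arima(gas, xreg=fourier(gas, K=i), seasonal=FALSE)
  if(fit$aicc < bestfit$aicc)
    bestfit <- fit
  else break;
}
# K=12 is the number of Fourier pairs in the selected model;
# older versions of the forecast package used fourierf(gas, K=12, h=104) here
fc <- forecast(bestfit, xreg=fourier(gas, K=12, h=104))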

The fitted model has 12 pairs of Fourier terms and can be written as

    \[y_t = bt + \sum_{j=1}^{12} \left[ \alpha_j\sin\left(\frac{2\pi j t}{52.18}\right) + \beta_j\cos\left(\frac{2\pi j t}{52.18}\right) \right] + n_t\]

where n_t is an ARIMA(3,1,3) process. Because n_t is non-stationary, the model is actually estimated on the differences of the variables on both sides of this equation. That is why there is no need for an intercept term. There are 24 parameters to capture the seasonality, which is rather a lot, but apparently required according to the AICc selection. (BIC would have given fewer.) The total number of degrees of freedom is 31 (the other seven coming from the 6 ARMA parameters and the drift parameter).
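For readers who want to see the selection idea without downloading the data or installing a package, here is a simplified sketch on simulated data. It is not the procedure used above: it fits ordinary regressions with lm() (independent errors rather than ARIMA errors), compares every K from 1 to 6 instead of stopping at the first increase, and uses the textbook correction AICc = AIC + 2k(k+1)/(n-k-1). The helpers `aicc` and `fourier_design` are names made up for this example.

```r
set.seed(1)
m <- 365.25 / 7
n <- 520
t <- seq_len(n)

# Simulated weekly series with two harmonics plus noise (not the gasoline data)
y <- 10 + 2 * sin(2 * pi * t / m) + cos(2 * pi * 2 * t / m) + rnorm(n, sd = 0.5)

# Small-sample corrected AIC; k counts the regression coefficients
# plus the error variance
aicc <- function(fit) {
  k <- length(coef(fit)) + 1
  n_obs <- length(residuals(fit))
  AIC(fit) + 2 * k * (k + 1) / (n_obs - k - 1)
}

# Matrix with K sine/cosine pairs for period m
fourier_design <- function(K) {
  do.call(cbind, lapply(seq_len(K), function(k)
    cbind(sin(2 * pi * k * t / m), cos(2 * pi * k * t / m))))
}

scores <- sapply(1:6, function(K) aicc(lm(y ~ fourier_design(K))))
best_K <- which.min(scores)   # number of Fourier pairs with the smallest AICc
best_K
```

Because the simulated series contains a second harmonic, a single pair of terms leaves clear structure in the residuals, so the selected K is at least 2.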

TBATS

An alternative approach is the TBATS model introduced by De Livera et al. (JASA, 2011). This uses a state space model that is a generalization of those underpinning exponential smoothing. It also allows for automatic Box-Cox transformation and ARMA errors. The modelling algorithm is entirely automated:

gastbats <- tbats(gas)
fc2 <- forecast(gastbats, h=104)
plot(fc2, ylab="thousands of barrels per day")

(The tbats function generates some warnings here, but it still works OK. I'll fix the warnings in the next version.)

Here the fitted model is given at the top of the plot as TBATS(0.999, {2,2}, 1, {<52.18,8>}). That is, a Box-Cox transformation of 0.999 (essentially doing nothing), ARMA(2,2) errors, a damping parameter of 1 (doing nothing) and 8 Fourier pairs with period m=52.18. This model can be written as

    \begin{align*} y_t &= \ell_{t-1} + b_{t-1} + s_{t-1} + d_t\\ \ell_t &= \ell_{t-1} + b_{t-1} + \alpha d_t\\ b_t &= b_{t-1} + \beta d_t\\ s_t &= \sum_{j=1}^{8} s_{j,t}\\ s_{j,t} &= s_{j,t-1}\cos \left(\frac{2\pi j}{52.18}\right) + s_{j,t-1}^{*}\sin \left(\frac{2\pi j}{52.18}\right) + \gamma_1 d_t \\ s_{j,t}^* &= -s_{j,t-1}\sin\left(\frac{2\pi j}{52.18}\right) + s_{j,t-1}^{*}\cos\left(\frac{2\pi j}{52.18}\right)+\gamma_2 d_t, \end{align*}

where d_t is an ARMA(2,2) process and \alpha, \beta, \gamma_1 and \gamma_2 are smoothing parameters. Here the seasonality has been handled with 18 parameters (the sixteen initial values for s_{j,0} and s_{j,0}^* and the two smoothing parameters \gamma_1 and \gamma_2). The total number of degrees of freedom is 26 (the other 8 coming from the two smoothing parameters \alpha and \beta, the four ARMA parameters, and the initial level and slope values \ell_0 and b_0).
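To see what the two seasonal recursions do, here is a minimal sketch in base R for a single harmonic with the innovations d_t set to zero. It assumes the rotation angle per week is the fixed value 2πj/52.18, as in the TBATS formulation of De Livera et al.; the variable names and starting values are arbitrary choices for illustration. Under that assumption each pair (s_{j,t}, s_{j,t}^*) simply rotates, so s_{j,t} traces a sinusoid with period 52.18/j weeks.

```r
m <- 365.25 / 7
j <- 1
lambda_j <- 2 * pi * j / m        # assumed fixed rotation angle per week

n <- 105                          # roughly two years of weekly steps
s      <- numeric(n + 1)
s_star <- numeric(n + 1)
s[1]      <- 1                    # arbitrary initial value s_{j,0}
s_star[1] <- 0.5                  # arbitrary initial value s*_{j,0}

for (t in seq_len(n)) {
  # With d_t = 0 the gamma terms vanish and only the rotation remains
  s[t + 1]      <-  s[t] * cos(lambda_j) + s_star[t] * sin(lambda_j)
  s_star[t + 1] <- -s[t] * sin(lambda_j) + s_star[t] * cos(lambda_j)
}

# The recursion agrees with a fixed sinusoid determined by the initial values
tt <- 0:n
closed_form <- s[1] * cos(lambda_j * tt) + s_star[1] * sin(lambda_j * tt)
max(abs(s - closed_form))         # difference is at floating-point rounding level
```

When d_t is non-zero, the \gamma_1 and \gamma_2 terms nudge the pair at every step, which is how the seasonal pattern is allowed to change over time.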

Which to use?

In this example, the forecasts are almost identical and there is little to differentiate the two models. The TBATS model is preferable when the seasonality changes over time, or when there are multiple seasonal periods. The ARIMA approach is preferable if there are covariates that are useful predictors, as these can be added as additional regressors.
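A minimal sketch of the covariate case, using simulated data and base R's arima() with fixed AR(1) errors rather than auto.arima(). The covariate x, the coefficient values and the noise settings are all made up for illustration.

```r
set.seed(42)
m <- 365.25 / 7
n <- 300
t <- seq_len(n)

x <- rnorm(n)                                         # made-up covariate
noise <- arima.sim(list(ar = 0.5), n = n, sd = 0.5)   # AR(1) errors
y <- 5 + 1.5 * x + 2 * sin(2 * pi * t / m) + as.numeric(noise)

# One Fourier pair plus the covariate, all passed through xreg
xreg <- cbind(S1 = sin(2 * pi * t / m),
              C1 = cos(2 * pi * t / m),
              x  = x)
fit <- arima(y, order = c(1, 0, 0), xreg = xreg)
coef(fit)["x"]    # estimated covariate effect (simulated true value: 1.5)
```

The same kind of matrix can be passed as xreg in the regression with ARIMA errors above, provided future values of the covariate are available when forecasting.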

Published at DZone with permission of