Forecasting Weekly Data

Weekly data is another situation where Fourier terms are useful for handling the seasonality. Not only is the seasonal period rather long, it is non-integer (averaging 365.25÷7 = 52.18). So ARIMA and ETS models do not tend to give good results, even with a period of 52 as an approximation.
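To make the idea concrete, here is a minimal base-R sketch of Fourier terms with a non-integer period. The helper name fourier_terms is hypothetical and written only for illustration; it is meant to resemble in spirit what fourier() in the forecast package produces, not to reproduce it exactly.

```r
# Average number of weeks per year; the seasonal period is not an integer
m <- 365.25 / 7

# Hypothetical helper (not part of any package): K sine/cosine pairs of period m
fourier_terms <- function(time_idx, K, m) {
  cols <- lapply(seq_len(K), function(j) {
    cbind(sin(2 * pi * j * time_idx / m),
          cos(2 * pi * j * time_idx / m))
  })
  X <- do.call(cbind, cols)
  colnames(X) <- paste0(c("S", "C"), rep(seq_len(K), each = 2))
  X
}

# Two years of weekly time points, three harmonics (six regressors)
X <- fourier_terms(1:104, K = 3, m = m)
head(round(X, 3))

# Phase error (in radians) accumulated per year if the period were rounded to 52
phase_drift <- 2 * pi * (1 - 52 / m)
```

Because the period enters only through the ratio t/m, nothing requires m to be an integer, which is what makes this approach convenient for weekly data.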

Regression with ARIMA errors

The simplest approach is a regression with ARIMA errors. Here is an example using weekly data on US finished motor gasoline products supplied (in thousands of barrels per day) from February 1991 to May 2005. An updated version of the data is available from the EIA website. I select the number of Fourier terms by minimizing the AICc. The order of the ARIMA model is also selected by minimizing the AICc, although that is done within the auto.arima() function.

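library(forecast)  # provides auto.arima(), fourier(), fourierf() and forecast()

gas <- ts(read.csv("http://robjhyndman.com/data/gasoline.csv", header=FALSE)[,1],
          freq=365.25/7, start=1991+31/7/365.25)
# Increase the number of Fourier pairs until the AICc stops improving
bestfit <- list(aicc=Inf)
for(i in 1:25)
{
  fit <- auto.arima(gas, xreg=fourier(gas, K=i), seasonal=FALSE)
  if(fit$aicc < bestfit$aicc)
    bestfit <- fit
  else break
}
# K=12 is the value selected by the loop above for this data set.
# In recent versions of forecast, fourierf() is deprecated in favour of fourier(gas, K=12, h=104).
fc <- forecast(bestfit, xreg=fourierf(gas, K=12, h=104))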

The fitted model has 12 pairs of Fourier terms and can be written as

    \[y_t = bt + \sum_{j=1}^{12} \left[ \alpha_j\sin\left(\frac{2\pi j t}{52.18}\right) + \beta_j\cos\left(\frac{2\pi j t}{52.18}\right) \right] + n_t\]

where n_t is an ARIMA(3,1,3) process. Because n_t is non-stationary, the model is actually estimated on the differences of the variables on both sides of this equation. That is why there is no need for an intercept term. There are 24 parameters to capture the seasonality, which is rather a lot, but apparently required according to the AICc selection. (BIC would have given fewer.) The total number of degrees of freedom is 31 (the other seven coming from the 6 ARMA parameters and the drift parameter).
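The remark that BIC would have chosen fewer terms reflects its heavier per-parameter penalty. As a rough sketch, the comparison can be illustrated on a simulated series using ordinary least squares in place of ARIMA errors. That is a deliberate simplification, not the model fitted above; the simulated data and the helper names fourier_terms and aicc are assumptions made for illustration.

```r
set.seed(1)
m <- 365.25 / 7                      # non-integer weekly period
n <- 520                             # about ten years of weekly data (simulated)
time_idx <- seq_len(n)

# Simulated series: linear trend, two harmonics and white noise (assumed)
y <- 0.01 * time_idx + 2 * sin(2 * pi * time_idx / m) +
  cos(4 * pi * time_idx / m) + rnorm(n)

# Hypothetical helper: K sine/cosine pairs of period m
fourier_terms <- function(time_idx, K, m) {
  do.call(cbind, lapply(seq_len(K), function(j) {
    cbind(sin(2 * pi * j * time_idx / m),
          cos(2 * pi * j * time_idx / m))
  }))
}

# Hypothetical helper: small-sample corrected AIC for an lm fit
aicc <- function(fit) {
  k <- attr(logLik(fit), "df")       # estimated parameters, including the variance
  n_obs <- nobs(fit)
  AIC(fit) + 2 * k * (k + 1) / (n_obs - k - 1)
}

# Fit K = 1, ..., 10 and record both criteria
crit <- t(sapply(1:10, function(K) {
  fit <- lm(y ~ time_idx + fourier_terms(time_idx, K, m))
  c(K = K, AICc = aicc(fit), BIC = BIC(fit))
}))

k_aicc <- crit[which.min(crit[, "AICc"]), "K"]
k_bic  <- crit[which.min(crit[, "BIC"]), "K"]
print(crit)
```

With a sample of this size, each extra parameter costs roughly 2 under AICc and log(n) under BIC, so k_bic can never exceed k_aicc here.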


TBATS

An alternative approach is the TBATS model introduced by De Livera et al. (JASA, 2011). This uses a state space model that is a generalization of those underpinning exponential smoothing. It also allows for automatic Box-Cox transformation and ARMA errors. The modelling algorithm is entirely automated:

gastbats <- tbats(gas)
fc2 <- forecast(gastbats, h=104)
plot(fc2, ylab="thousands of barrels per day")

(The tbats function generates some warnings here, but it still works ok. I’ll fix the warnings in the next version.)

Here the fitted model is given at the top of the plot as TBATS(0.999, {2,2}, 1, {<52.18,8>}). That is, a Box-Cox transformation of 0.999 (essentially doing nothing), ARMA(2,2) errors, a damping parameter of 1 (doing nothing) and 8 Fourier pairs with period m=52.18. This model can be written as

    \begin{align*} y_t &= \ell_{t-1} + b_{t-1} + s_{t-1} + d_t\\ \ell_t &= \ell_{t-1} + b_{t-1} + \alpha d_t\\ b_t &= b_{t-1} + \beta d_t\\ s_t &= \sum_{j=1}^{8} s_{j,t}\\ s_{j,t} &= s_{j,t-1}\cos \left(\frac{2\pi j}{52.18}\right) + s_{j,t-1}^{*}\sin \left(\frac{2\pi j}{52.18}\right) + \gamma_1 d_t \\ s_{j,t}^* &= -s_{j,t-1}\sin\left(\frac{2\pi j}{52.18}\right) + s_{j,t-1}^{*}\cos\left(\frac{2\pi j}{52.18}\right)+\gamma_2 d_t, \end{align*}

where d_t is an ARMA(2,2) process and \alpha, \beta, \gamma_1 and \gamma_2 are smoothing parameters. Here the seasonality has been handled with 18 parameters (the sixteen initial values for s_{j,0} and s_{j,0}^* and the two smoothing parameters \gamma_1 and \gamma_2). The total number of degrees of freedom is 26 (the other 8 coming from the two smoothing parameters \alpha and \beta, the four ARMA parameters, and the initial level and slope values \ell_0 and b_0).
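Each harmonic in the TBATS seasonal component is a pair (s_j, s_j^*) that is rotated through a fixed angle \lambda_j = 2\pi j/m at every time step, with \gamma_1 d_t and \gamma_2 d_t nudging the pair so that the seasonal pattern can change over time. A minimal sketch, assuming the disturbances are switched off (d_t = 0) and using arbitrary initial values, shows that the recursion then traces out an ordinary sinusoid with period m:

```r
m <- 365.25 / 7              # seasonal period in weeks
j <- 1                       # first harmonic
lambda <- 2 * pi * j / m     # rotation angle per time step
n <- 200                     # number of steps to simulate

s <- numeric(n + 1)
s_star <- numeric(n + 1)
s[1] <- 1.5                  # arbitrary initial value s_{j,0} (assumed)
s_star[1] <- -0.5            # arbitrary initial value s*_{j,0} (assumed)

# Seasonal recursion with d_t = 0, so the gamma terms drop out
for (i in seq_len(n)) {
  s[i + 1]      <-  s[i] * cos(lambda) + s_star[i] * sin(lambda)
  s_star[i + 1] <- -s[i] * sin(lambda) + s_star[i] * cos(lambda)
}

# Closed form implied by repeated rotation through the angle lambda
steps <- 0:n
closed_form <- s[1] * cos(lambda * steps) + s_star[1] * sin(lambda * steps)

max_err <- max(abs(s - closed_form))
```

With non-zero d_t, the amplitude and phase of each harmonic drift over time, which is how the model accommodates seasonality that changes.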

Which to use?

In this example, the forecasts are almost identical and there is little to differentiate the two models. The TBATS model is preferable when the seasonality changes over time, or when there are multiple seasonal periods. The ARIMA approach is preferable if there are covariates that are useful predictors, as these can be added as additional regressors.
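As a rough sketch of what adding a covariate looks like, the example below binds an extra regressor to the Fourier terms before fitting. Everything here is an assumption made for illustration: the data are simulated, the holiday indicator is hypothetical, and arima() from base R with a fixed AR(1) error stands in for auto.arima(), so the block runs without the forecast package.

```r
set.seed(42)
m <- 365.25 / 7                     # non-integer weekly period
n <- 312                            # training length, about six years (simulated)
h <- 52                             # forecast horizon in weeks
idx <- seq_len(n + h)

# Hypothetical helper: K sine/cosine pairs of period m
fourier_terms <- function(time_idx, K, m) {
  X <- do.call(cbind, lapply(seq_len(K), function(j) {
    cbind(sin(2 * pi * j * time_idx / m),
          cos(2 * pi * j * time_idx / m))
  }))
  colnames(X) <- paste0(c("S", "C"), rep(seq_len(K), each = 2))
  X
}

X_all <- fourier_terms(idx, K = 2, m = m)
holiday <- rbinom(n + h, size = 1, prob = 0.2)   # hypothetical 0/1 covariate
noise <- as.numeric(arima.sim(model = list(ar = 0.5), n = n + h))
y_all <- 10 + 2 * X_all[, "S1"] + 3 * holiday + noise

# Fourier terms and the covariate enter the model as one regressor matrix
xreg_all <- cbind(X_all, holiday = holiday)
train <- seq_len(n)
future <- (n + 1):(n + h)

fit <- arima(y_all[train], order = c(1, 0, 0), xreg = xreg_all[train, ])

# Forecasting needs the future values of every regressor, including the covariate
fc <- predict(fit, n.ahead = h, newxreg = xreg_all[future, ])
holiday_effect <- coef(fit)["holiday"]
```

The practical catch is visible in the last step: future values of the covariate must be known or forecast separately before the model can produce forecasts.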

Published at DZone with permission of Rob J Hyndman, DZone MVB. See the original article here.

