Over a million developers have joined DZone.

HTS with Regressors

DZone 's Guide to

HTS with Regressors

· Big Data Zone ·
Free Resource

The hts pack­age for R allows for fore­cast­ing hier­ar­chi­cal and grouped time series data. The idea is to gen­er­ate fore­casts for all series at all lev­els of aggre­ga­tion with­out impos­ing the aggre­ga­tion con­straints, and then to rec­on­cile the fore­casts so they sat­isfy the aggre­ga­tion con­straints. (An intro­duc­tion to rec­on­cil­ing hier­ar­chi­cal and grouped time series is avail­able in this Fore­sight paper.)

The base fore­casts can be gen­er­ated using any method, with ETS mod­els and ARIMA mod­els pro­vided as options in the forecast.gts() func­tion. As ETS mod­els do not allow for regres­sors, you will need to choose ARIMA mod­els if you want to include regressors.

Sup­pose x is a matrix of his­tor­i­cal regres­sors (with each col­umn con­tain­ing one regres­sor and with the num­ber of rows equal to the num­ber of time peri­ods of his­tor­i­cal data), and f is the cor­re­spond­ing matrix of future regres­sors (with the num­ber of rows equal to the fore­cast hori­zon). Then if y is an hts or gts object, the fol­low­ing code can be used for forecasting:

fc = forecast(y, fmethod="arima", xreg="x", newxreg="f")

That will fit a regres­sion with ARIMA errors to each of the orig­i­nal series (at all lev­els of aggre­ga­tion), pro­duce fore­casts from each model, and then rec­on­cile the fore­casts so they sat­isfy the aggre­ga­tion constraints.

Infant deaths

I will illus­trate using infant death num­bers for Aus­tralia. These are dis­ag­gre­gated by state and sex as shown below. This graph is pro­duced using plot(infantgts).

[Click for a larger ver­sion of the graph]

A poten­tial fore­cast­ing method is to use a regres­sion model on the log scale with a con­stant to 1970 and a decreas­ing trend there­after. The fig­ure below shows the model for the most aggre­gated data.

To apply this model to all series, and allow for ARMA errors, we can use the fol­low­ing code.

y = window(infantgts, start=1944)
z = pmax(time(y$bts) - 1970, 0)
fz = max(z) + 1:10
fc = forecast(y, h=10, fmethod="arima", xreg=z, newxreg=fz, lambda=0)

[Click for a larger ver­sion of the graph]

I started the series at 1944 as there were a few zero obser­va­tions before that, and tak­ing logs caused prob­lems. The argu­ment lambda=0 means the mod­els are fit­ted to the logged data (although rec­on­cil­i­a­tion must occur on the orig­i­nal scale).


Published at DZone with permission of

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}