Over a million developers have joined DZone.

HTS with Regressors

DZone's Guide to

HTS with Regressors

· Big Data Zone ·
Free Resource

The open source HPCC Systems platform is a proven, easy to use solution for managing data at scale. Visit our Easy Guide to learn more about this completely free platform, test drive some code in the online Playground, and get started today.

The hts pack­age for R allows for fore­cast­ing hier­ar­chi­cal and grouped time series data. The idea is to gen­er­ate fore­casts for all series at all lev­els of aggre­ga­tion with­out impos­ing the aggre­ga­tion con­straints, and then to rec­on­cile the fore­casts so they sat­isfy the aggre­ga­tion con­straints. (An intro­duc­tion to rec­on­cil­ing hier­ar­chi­cal and grouped time series is avail­able in this Fore­sight paper.)

The base fore­casts can be gen­er­ated using any method, with ETS mod­els and ARIMA mod­els pro­vided as options in the forecast.gts() func­tion. As ETS mod­els do not allow for regres­sors, you will need to choose ARIMA mod­els if you want to include regressors.

Sup­pose x is a matrix of his­tor­i­cal regres­sors (with each col­umn con­tain­ing one regres­sor and with the num­ber of rows equal to the num­ber of time peri­ods of his­tor­i­cal data), and f is the cor­re­spond­ing matrix of future regres­sors (with the num­ber of rows equal to the fore­cast hori­zon). Then if y is an hts or gts object, the fol­low­ing code can be used for forecasting:

fc = forecast(y, fmethod="arima", xreg="x", newxreg="f")

That will fit a regres­sion with ARIMA errors to each of the orig­i­nal series (at all lev­els of aggre­ga­tion), pro­duce fore­casts from each model, and then rec­on­cile the fore­casts so they sat­isfy the aggre­ga­tion constraints.

Infant deaths

I will illus­trate using infant death num­bers for Aus­tralia. These are dis­ag­gre­gated by state and sex as shown below. This graph is pro­duced using plot(infantgts).

[Click for a larger ver­sion of the graph]

A poten­tial fore­cast­ing method is to use a regres­sion model on the log scale with a con­stant to 1970 and a decreas­ing trend there­after. The fig­ure below shows the model for the most aggre­gated data.

To apply this model to all series, and allow for ARMA errors, we can use the fol­low­ing code.

y = window(infantgts, start=1944)
z = pmax(time(y$bts) - 1970, 0)
fz = max(z) + 1:10
fc = forecast(y, h=10, fmethod="arima", xreg=z, newxreg=fz, lambda=0)

[Click for a larger ver­sion of the graph]

I started the series at 1944 as there were a few zero obser­va­tions before that, and tak­ing logs caused prob­lems. The argu­ment lambda=0 means the mod­els are fit­ted to the logged data (although rec­on­cil­i­a­tion must occur on the orig­i­nal scale).

Managing data at scale doesn’t have to be hard. Find out how the completely free, open source HPCC Systems platform makes it easier to update, easier to program, easier to integrate data, and easier to manage clusters. Download and get started today.


Published at DZone with permission of

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}