Over a million developers have joined DZone.

HTS with Regressors

DZone's Guide to

HTS with Regressors

· Big Data Zone ·
Free Resource

Hortonworks Sandbox for HDP and HDF is your chance to get started on learning, developing, testing and trying out new features. Each download comes preconfigured with interactive tutorials, sample data and developments from the Apache community.

The hts pack­age for R allows for fore­cast­ing hier­ar­chi­cal and grouped time series data. The idea is to gen­er­ate fore­casts for all series at all lev­els of aggre­ga­tion with­out impos­ing the aggre­ga­tion con­straints, and then to rec­on­cile the fore­casts so they sat­isfy the aggre­ga­tion con­straints. (An intro­duc­tion to rec­on­cil­ing hier­ar­chi­cal and grouped time series is avail­able in this Fore­sight paper.)

The base fore­casts can be gen­er­ated using any method, with ETS mod­els and ARIMA mod­els pro­vided as options in the forecast.gts() func­tion. As ETS mod­els do not allow for regres­sors, you will need to choose ARIMA mod­els if you want to include regressors.

Sup­pose x is a matrix of his­tor­i­cal regres­sors (with each col­umn con­tain­ing one regres­sor and with the num­ber of rows equal to the num­ber of time peri­ods of his­tor­i­cal data), and f is the cor­re­spond­ing matrix of future regres­sors (with the num­ber of rows equal to the fore­cast hori­zon). Then if y is an hts or gts object, the fol­low­ing code can be used for forecasting:

fc = forecast(y, fmethod="arima", xreg="x", newxreg="f")

That will fit a regres­sion with ARIMA errors to each of the orig­i­nal series (at all lev­els of aggre­ga­tion), pro­duce fore­casts from each model, and then rec­on­cile the fore­casts so they sat­isfy the aggre­ga­tion constraints.

Infant deaths

I will illus­trate using infant death num­bers for Aus­tralia. These are dis­ag­gre­gated by state and sex as shown below. This graph is pro­duced using plot(infantgts).

[Click for a larger ver­sion of the graph]

A poten­tial fore­cast­ing method is to use a regres­sion model on the log scale with a con­stant to 1970 and a decreas­ing trend there­after. The fig­ure below shows the model for the most aggre­gated data.

To apply this model to all series, and allow for ARMA errors, we can use the fol­low­ing code.

y = window(infantgts, start=1944)
z = pmax(time(y$bts) - 1970, 0)
fz = max(z) + 1:10
fc = forecast(y, h=10, fmethod="arima", xreg=z, newxreg=fz, lambda=0)

[Click for a larger ver­sion of the graph]

I started the series at 1944 as there were a few zero obser­va­tions before that, and tak­ing logs caused prob­lems. The argu­ment lambda=0 means the mod­els are fit­ted to the logged data (although rec­on­cil­i­a­tion must occur on the orig­i­nal scale).

Hortonworks Community Connection (HCC) is an online collaboration destination for developers, DevOps, customers and partners to get answers to questions, collaborate on technical articles and share code examples from GitHub.  Join the discussion.


Published at DZone with permission of

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}