Over a million developers have joined DZone.

HTS with Regressors

DZone's Guide to

HTS with Regressors

· Big Data Zone
Free Resource

Need to build an application around your data? Learn more about dataflow programming for rapid development and greater creativity. 

The hts pack­age for R allows for fore­cast­ing hier­ar­chi­cal and grouped time series data. The idea is to gen­er­ate fore­casts for all series at all lev­els of aggre­ga­tion with­out impos­ing the aggre­ga­tion con­straints, and then to rec­on­cile the fore­casts so they sat­isfy the aggre­ga­tion con­straints. (An intro­duc­tion to rec­on­cil­ing hier­ar­chi­cal and grouped time series is avail­able in this Fore­sight paper.)

The base fore­casts can be gen­er­ated using any method, with ETS mod­els and ARIMA mod­els pro­vided as options in the forecast.gts() func­tion. As ETS mod­els do not allow for regres­sors, you will need to choose ARIMA mod­els if you want to include regressors.

Sup­pose x is a matrix of his­tor­i­cal regres­sors (with each col­umn con­tain­ing one regres­sor and with the num­ber of rows equal to the num­ber of time peri­ods of his­tor­i­cal data), and f is the cor­re­spond­ing matrix of future regres­sors (with the num­ber of rows equal to the fore­cast hori­zon). Then if y is an hts or gts object, the fol­low­ing code can be used for forecasting:

fc = forecast(y, fmethod="arima", xreg="x", newxreg="f")

That will fit a regres­sion with ARIMA errors to each of the orig­i­nal series (at all lev­els of aggre­ga­tion), pro­duce fore­casts from each model, and then rec­on­cile the fore­casts so they sat­isfy the aggre­ga­tion constraints.

Infant deaths

I will illus­trate using infant death num­bers for Aus­tralia. These are dis­ag­gre­gated by state and sex as shown below. This graph is pro­duced using plot(infantgts).

[Click for a larger ver­sion of the graph]

A poten­tial fore­cast­ing method is to use a regres­sion model on the log scale with a con­stant to 1970 and a decreas­ing trend there­after. The fig­ure below shows the model for the most aggre­gated data.

To apply this model to all series, and allow for ARMA errors, we can use the fol­low­ing code.

y = window(infantgts, start=1944)
z = pmax(time(y$bts) - 1970, 0)
fz = max(z) + 1:10
fc = forecast(y, h=10, fmethod="arima", xreg=z, newxreg=fz, lambda=0)

[Click for a larger ver­sion of the graph]

I started the series at 1944 as there were a few zero obser­va­tions before that, and tak­ing logs caused prob­lems. The argu­ment lambda=0 means the mod­els are fit­ted to the logged data (although rec­on­cil­i­a­tion must occur on the orig­i­nal scale).

Check out the Exaptive data application Studio. Technology agnostic. No glue code. Use what you know and rely on the community for what you don't. Try the community version.


Published at DZone with permission of Rob J Hyndman, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.


Dev Resources & Solutions Straight to Your Inbox

Thanks for subscribing!

Awesome! Check your inbox to verify your email so you can start receiving the latest in tech news and resources.


{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}