Over a million developers have joined DZone.

HTS with Regressors

· Big Data Zone

Learn about how to rapidly iterate data applications, while reusing existing code and leveraging open source technologies, brought to you in partnership with Exaptive.

The hts pack­age for R allows for fore­cast­ing hier­ar­chi­cal and grouped time series data. The idea is to gen­er­ate fore­casts for all series at all lev­els of aggre­ga­tion with­out impos­ing the aggre­ga­tion con­straints, and then to rec­on­cile the fore­casts so they sat­isfy the aggre­ga­tion con­straints. (An intro­duc­tion to rec­on­cil­ing hier­ar­chi­cal and grouped time series is avail­able in this Fore­sight paper.)

The base fore­casts can be gen­er­ated using any method, with ETS mod­els and ARIMA mod­els pro­vided as options in the forecast.gts() func­tion. As ETS mod­els do not allow for regres­sors, you will need to choose ARIMA mod­els if you want to include regressors.

Sup­pose x is a matrix of his­tor­i­cal regres­sors (with each col­umn con­tain­ing one regres­sor and with the num­ber of rows equal to the num­ber of time peri­ods of his­tor­i­cal data), and f is the cor­re­spond­ing matrix of future regres­sors (with the num­ber of rows equal to the fore­cast hori­zon). Then if y is an hts or gts object, the fol­low­ing code can be used for forecasting:

fc = forecast(y, fmethod="arima", xreg="x", newxreg="f")

That will fit a regres­sion with ARIMA errors to each of the orig­i­nal series (at all lev­els of aggre­ga­tion), pro­duce fore­casts from each model, and then rec­on­cile the fore­casts so they sat­isfy the aggre­ga­tion constraints.

Infant deaths

I will illus­trate using infant death num­bers for Aus­tralia. These are dis­ag­gre­gated by state and sex as shown below. This graph is pro­duced using plot(infantgts).

[Click for a larger ver­sion of the graph]

A poten­tial fore­cast­ing method is to use a regres­sion model on the log scale with a con­stant to 1970 and a decreas­ing trend there­after. The fig­ure below shows the model for the most aggre­gated data.

To apply this model to all series, and allow for ARMA errors, we can use the fol­low­ing code.

y = window(infantgts, start=1944)
z = pmax(time(y$bts) - 1970, 0)
fz = max(z) + 1:10
fc = forecast(y, h=10, fmethod="arima", xreg=z, newxreg=fz, lambda=0)
plot(fc)

[Click for a larger ver­sion of the graph]

I started the series at 1944 as there were a few zero obser­va­tions before that, and tak­ing logs caused prob­lems. The argu­ment lambda=0 means the mod­els are fit­ted to the logged data (although rec­on­cil­i­a­tion must occur on the orig­i­nal scale).

The Big Data Zone is brought to you in partnership with Exaptive.  Learn how Rapid Application Development powers business. 

Topics:

Published at DZone with permission of Rob J Hyndman, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

The best of DZone straight to your inbox.

SEE AN EXAMPLE
Please provide a valid email address.

Thanks for subscribing!

Awesome! Check your inbox to verify your email so you can start receiving the latest in tech news and resources.
Subscribe

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}