
The Forecast Mean After Back-Transformation


Many functions in the forecast package for R will allow a Box-Cox transformation. The models are fitted to the transformed data and the forecasts and prediction intervals are back-transformed. This preserves the coverage of the prediction intervals, and the back-transformed point forecast can be considered the median of the forecast densities (assuming the forecast densities on the transformed scale are symmetric). For many purposes, this is acceptable, but occasionally the mean forecast is required. For example, with hierarchical forecasting the forecasts need to be aggregated, and medians do not aggregate but means do.
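To see why a back-transformed point forecast behaves like a median rather than a mean, here is a small simulation sketch. It assumes a log transformation (Box-Cox with $\lambda = 0$) and a normal forecast density on the transformed scale; the values of `mu` and `sigma` are made up for illustration and do not come from any fitted model.

```r
# Illustrative values only (not taken from any fitted model)
set.seed(1)
mu <- 5       # forecast mean on the log scale
sigma <- 0.5  # forecast standard deviation on the log scale

# Draw from the forecast density on the transformed scale, then back-transform
x <- rnorm(1e5, mean = mu, sd = sigma)
y <- exp(x)

back_transformed <- exp(mu)  # what plain back-transformation returns
sim_median <- median(y)
sim_mean <- mean(y)

# The back-transformed value sits near the median; the mean is noticeably larger
c(back_transformed = back_transformed, median = sim_median, mean = sim_mean)
```

The gap between the last two numbers is the adjustment that the rest of this post approximates.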

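It is easy enough to derive the mean forecast using a Taylor series expansion. Suppose $f(x)$ represents the back-transformation function, $\mu$ is the mean on the transformed scale and $\sigma^2$ is the variance on the transformed scale. Then using the first three terms of a Taylor expansion around $\mu$, the mean on the original scale is given by

$$\mathrm{E}[f(X)] \approx \mathrm{E}\left[f(\mu) + (X-\mu)f'(\mu) + \tfrac{1}{2}(X-\mu)^2 f''(\mu)\right] = f(\mu) + \tfrac{1}{2}\sigma^2 f''(\mu),$$

where $X$ denotes the forecast variable on the transformed scale. The linear term vanishes because $\mathrm{E}[X-\mu] = 0$.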
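As a quick sanity check of the second-order approximation $f(\mu) + \tfrac{1}{2}\sigma^2 f''(\mu)$, the sketch below compares it with a case where the exact mean is known. It assumes $f(x) = e^x$ and a normal density on the transformed scale, for which the exact mean is $e^{\mu + \sigma^2/2}$. The helper name `taylor_mean` and the numeric values are illustrative assumptions.

```r
# Second-order Taylor approximation to E[f(X)]: f(mu) + 0.5 * sigma2 * f''(mu)
# f is the back-transformation and f2 is its second derivative
taylor_mean <- function(f, f2, mu, sigma2) {
  f(mu) + 0.5 * sigma2 * f2(mu)
}

mu <- 5
sigma2 <- c(0.01, 0.05, 0.25)  # illustrative variances on the transformed scale

approx <- taylor_mean(exp, exp, mu, sigma2)  # for f = exp, f'' = exp as well
exact <- exp(mu + sigma2 / 2)                # lognormal mean

# Relative error of the approximation grows with the variance
rel_err <- (exact - approx) / exact
data.frame(sigma2, approx, exact, rel_err)
```

The approximation is best when the variance on the transformed scale is small, which is worth keeping in mind for long forecast horizons where the variance grows.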


Box-Cox transformations

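For a Box-Cox transformation,

$$f(x) = \begin{cases} (\lambda x + 1)^{1/\lambda} & \text{if } \lambda \ne 0; \\ e^x & \text{if } \lambda = 0, \end{cases}$$

so that

$$f''(x) = \begin{cases} (1-\lambda)(\lambda x + 1)^{1/\lambda - 2} & \text{if } \lambda \ne 0; \\ e^x & \text{if } \lambda = 0, \end{cases}$$

and the back-transformed mean is given by

$$\begin{cases} (\lambda\mu + 1)^{1/\lambda}\left[1 + \dfrac{\sigma^2(1-\lambda)}{2(\lambda\mu + 1)^2}\right] & \text{if } \lambda \ne 0; \\ e^\mu\left[1 + \dfrac{\sigma^2}{2}\right] & \text{if } \lambda = 0. \end{cases}$$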
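The Box-Cox case can be sketched as a small base R function. This assumes the back-transformation $f(x) = (\lambda x + 1)^{1/\lambda}$ (or $e^x$ when $\lambda = 0$) and the approximation $f(\mu) + \tfrac{1}{2}\sigma^2 f''(\mu)$. The function name `boxcox_mean` and the input values are hypothetical and are not part of the forecast package.

```r
# Approximate forecast mean after Box-Cox back-transformation.
# mu, sigma2: forecast mean and variance on the transformed scale
# lambda: Box-Cox parameter
boxcox_mean <- function(mu, sigma2, lambda) {
  if (lambda == 0) {
    exp(mu) * (1 + sigma2 / 2)
  } else {
    base <- lambda * mu + 1
    base^(1 / lambda) * (1 + sigma2 * (1 - lambda) / (2 * base^2))
  }
}

# Illustrative values only
mu <- 8
sigma2 <- 0.3
boxcox_mean(mu, sigma2, lambda = 0)    # log transformation
boxcox_mean(mu, sigma2, lambda = 0.2)
boxcox_mean(mu, sigma2, lambda = 1)    # linear back-transformation: no adjustment
```

With `lambda = 1` the back-transformation is linear, so the second derivative is zero and the function simply returns `mu + 1`.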


Therefore, to adjust the back-transformed mean obtained by R, the following code can be used, first for $\lambda = 0$ and then for $\lambda = 0.2$.

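library(forecast)
library(fma)  # assumed source of the eggs series

# Log transformation (lambda = 0)
fit <- ets(eggs, lambda=0)
fc <- forecast(fit, h=50, level=95)
# Forecast variance on the transformed scale, recovered from the 95% interval width
fvar <- ((BoxCox(fc$upper,fit$lambda)-BoxCox(fc$lower,fit$lambda))/qnorm(0.975)/2)^2
fc$mean <- fc$mean * (1 + 0.5*fvar)

# Box-Cox transformation with lambda = 0.2
fit <- ets(eggs, lambda=0.2)
fc <- forecast(fit, h=50, level=95)
fvar <- ((BoxCox(fc$upper,fit$lambda)-BoxCox(fc$lower,fit$lambda))/qnorm(0.975)/2)^2
fc$mean <- fc$mean * (1 + 0.5*fvar*(1-fit$lambda)/(fc$mean)^(2*fit$lambda))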

The second of these plots is shown below. The blue line shows the forecast medians while the red line shows the forecast means.
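The `fvar` lines in the code above recover the forecast variance on the transformed scale from the width of the 95% prediction interval. A minimal sketch of that step, assuming a normal forecast density on the transformed scale and using made-up values for `mu` and `sigma`:

```r
mu <- 5       # illustrative mean on the transformed scale
sigma <- 0.4  # illustrative standard deviation on the transformed scale

# Under normality, a 95% interval is mu -/+ qnorm(0.975) * sigma
lower <- mu - qnorm(0.975) * sigma
upper <- mu + qnorm(0.975) * sigma

# Half the interval width divided by the normal quantile gives sigma back
fvar <- ((upper - lower) / qnorm(0.975) / 2)^2
all.equal(fvar, sigma^2)
```

If the interval limits are only available on the original scale, they need to be transformed first, which is what the `BoxCox()` calls do.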

Scaled logistic transformation

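In my previous post on transformations, I described the scaled logit transformation for bounding a forecast between specified limits $a$ and $b$. In this case,

$$f(x) = \frac{(b-a)e^x}{1+e^x} + a = \frac{a + be^x}{1+e^x},$$

and so

$$f''(x) = \frac{(b-a)e^x(1-e^x)}{(1+e^x)^3},$$

and the back-transformed mean is given by

$$\frac{a + be^\mu}{1+e^\mu} + \frac{\sigma^2(b-a)e^\mu(1-e^\mu)}{2(1+e^\mu)^3}.$$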
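The scaled logit case can be sketched in base R as well, assuming the back-transformation $f(x) = (a + be^x)/(1 + e^x)$ and the approximation $f(\mu) + \tfrac{1}{2}\sigma^2 f''(\mu)$. The function name `scaled_logit_mean` and the values of `mu` and `sigma2` are hypothetical; the bounds match those used in the code in this section.

```r
# Approximate forecast mean after back-transforming a scaled logit.
# mu, sigma2: forecast mean and variance on the transformed scale
# a, b: lower and upper bounds on the original scale
scaled_logit_mean <- function(mu, sigma2, a, b) {
  emu <- exp(mu)
  (a + b * emu) / (1 + emu) +
    sigma2 * (b - a) * emu * (1 - emu) / (2 * (1 + emu)^3)
}

a <- 50
b <- 400
mu <- c(-1, 0, 1)  # illustrative values on the transformed scale
sigma2 <- 0.2

medians <- (a + b * exp(mu)) / (1 + exp(mu))  # plain back-transformation
means <- scaled_logit_mean(mu, sigma2, a, b)

# The adjustment pulls the mean towards the midpoint (a + b) / 2 and
# vanishes at mu = 0, where the back-transformation has its inflection point
data.frame(mu, medians, means)
```

Unlike the Box-Cox case with $\lambda < 1$, the adjustment here can be negative: once the median is above the midpoint of the bounds, the mean is smaller than the median.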


In R, this can be calculated as follows.

library(forecast)
library(fma)  # assumed source of the eggs series

# Bounds
a <- 50
b <- 400
# Transform data
y <- log((eggs-a)/(b-eggs))
fit <- ets(y)
fc <- forecast(fit, h=50, level=95)
# Forecast variance on the transformed scale, recovered from the 95% interval width
fvar <- ((fc$upper-fc$lower)/qnorm(0.975)/2)^2
emu <- exp(fc$mean)
# Back-transform forecasts
fc$mean <- (b-a)*exp(fc$mean)/(1+exp(fc$mean)) + a
fc$lower <- (b-a)*exp(fc$lower)/(1+exp(fc$lower)) + a
fc$upper <- (b-a)*exp(fc$upper)/(1+exp(fc$upper)) + a
fc$x <- eggs
# Plot result on original scale
plot(fc)
# Compute forecast mean
fc$mean <- 1/(1+emu)^3*((a+b*emu)*(1+emu)^2 + fvar*(b-a)*emu*(1-emu)/2)



Published at DZone with permission of Rob J Hyndman, DZone MVB. See the original article here.
