
Constants and ARIMA Models in R


This post is from my new book Forecasting: principles and practice, available freely online at OTexts.com/fpp/.

A non-seasonal ARIMA model can be written as

\begin{equation*} (1-\phi_1B - \cdots - \phi_p B^p)(1-B)^d y_t = c + (1 + \theta_1 B + \cdots + \theta_q B^q)e_t \tag{1} \end{equation*}

or equivalently as

\begin{equation*} (1-\phi_1B - \cdots - \phi_p B^p)(1-B)^d (y_t - \mu t^d/d!) = (1 + \theta_1 B + \cdots + \theta_q B^q)e_t, \tag{2} \end{equation*}

where $B$ is the backshift operator, $c = \mu(1-\phi_1 - \cdots - \phi_p)$ and $\mu$ is the mean of $(1-B)^d y_t$. R uses the parametrization of equation (2).

Thus, including a constant in a non-stationary ARIMA model is equivalent to inducing a polynomial trend of order $d$ in the forecast function. (If the constant is omitted, the forecast function includes a polynomial trend of order $d-1$.) When $d=0$, we have the special case that $\mu$ is the mean of $y_t$.
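To see why forms (1) and (2) agree, it may help to expand the left-hand side of (2). The sketch below relies only on the fact that the $d$-th difference of $t^d$ is $d!$, and uses $\phi(B)$ as shorthand (not in the original notation) for $1-\phi_1B - \cdots - \phi_p B^p$:

```latex
\begin{align*}
(1-B)^d \frac{\mu t^d}{d!} &= \frac{\mu}{d!}\, d! = \mu, \\
\phi(B)\,\mu &= \mu(1-\phi_1 - \cdots - \phi_p) = c, \\
\phi(B)(1-B)^d \left(y_t - \frac{\mu t^d}{d!}\right) &= \phi(B)(1-B)^d y_t - c.
\end{align*}
```

Moving $c$ to the right-hand side recovers (1). For $d=1$ the subtracted term is $\mu t$, a straight line with slope $\mu$; for $d=0$ it is simply $\mu$.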

Including constants in ARIMA models using R


By default, the arima() command in R sets $c=\mu=0$ when $d>0$ and provides an estimate of $\mu$ when $d=0$. The parameter $\mu$ is called the “intercept” in the R output. It will be close to the sample mean of the time series, but usually not identical to it, because the sample mean is not the maximum likelihood estimate when $p+q>0$.

The arima() command has an argument include.mean, which is TRUE by default and only has an effect when $d=0$. Setting include.mean=FALSE will force $\mu=0$.
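The behaviour described above can be checked with a minimal sketch in base R. The built-in `lh` series, the AR(1) order, and `method = "ML"` are illustrative choices, not part of the original discussion:

```r
# Base R only; `lh` ships with the datasets package.
# method = "ML" is an illustrative choice for all three fits.
fit_mean   <- arima(lh, order = c(1, 0, 0), method = "ML")   # d = 0: mu is estimated
fit_nomean <- arima(lh, order = c(1, 0, 0), method = "ML",
                    include.mean = FALSE)                    # forces mu = 0
fit_diff   <- arima(lh, order = c(1, 1, 0), method = "ML")   # d = 1: no constant

names(coef(fit_mean))    # includes "intercept"
names(coef(fit_nomean))  # AR coefficient only
names(coef(fit_diff))    # AR coefficient only

# The intercept is close to, but generally not equal to, the sample mean.
c(intercept   = unname(coef(fit_mean)["intercept"]),
  sample_mean = mean(lh))
```

Comparing the last two numbers shows how far the maximum likelihood estimate of $\mu$ sits from the sample mean for this particular series.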


The Arima() command from the forecast package provides more flexibility on the inclusion of a constant. It has an argument include.mean, which has identical functionality to the corresponding argument for arima(). It also has an argument include.drift, which allows $\mu\ne0$ when $d=1$. For $d>1$, no constant is allowed, as a quadratic or higher-order trend is particularly dangerous when forecasting. The parameter $\mu$ is called the “drift” in the R output when $d=1$.

There is also an argument include.constant which, if TRUE, will set include.mean=TRUE if $d=0$ and include.drift=TRUE if $d=1$. If include.constant=FALSE, both include.mean and include.drift will be set to FALSE. If include.constant is used, any values supplied for include.mean and include.drift are ignored.
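As a rough illustration of these arguments for $d=1$, the sketch below fits the same model three ways. It assumes the forecast package is installed; the built-in `WWWusage` series and the ARIMA(1,1,0) order are arbitrary choices made for the example:

```r
library(forecast)  # assumed to be installed

# WWWusage ships with base R; the model order is illustrative only.
fit_plain <- Arima(WWWusage, order = c(1, 1, 0))                           # no constant
fit_drift <- Arima(WWWusage, order = c(1, 1, 0), include.drift = TRUE)     # mu estimated
fit_const <- Arima(WWWusage, order = c(1, 1, 0), include.constant = TRUE)  # implies drift when d = 1

names(coef(fit_plain))  # AR coefficient only
names(coef(fit_drift))  # includes "drift"
names(coef(fit_const))  # includes "drift"
```

The last two fits should coincide, since include.constant=TRUE simply switches on include.drift for a once-differenced model.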

When $d=0$ and include.drift=TRUE, the fitted model from Arima() is

    \[(1-\phi_1B - \cdots - \phi_p B^p) (y_t - a - bt) = (1 + \theta_1 B + \cdots + \theta_q B^q)e_t.\]

In this case, the R output will label $a$ as the “intercept” and $b$ as the “drift” coefficient.
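A small simulation can make the linear-trend interpretation concrete. Everything about the data here is hypothetical: the seed, the true values $a=10$ and $b=0.5$, and the AR(1) error process are assumptions chosen for the sketch, and the forecast package is assumed to be installed:

```r
library(forecast)  # assumed to be installed

set.seed(1)  # arbitrary seed, for reproducibility only
n <- 100
# Hypothetical data: a + b*t with a = 10 and b = 0.5, plus AR(1) errors.
y <- ts(10 + 0.5 * seq_len(n) + arima.sim(list(ar = 0.5), n = n))

fit <- Arima(y, order = c(1, 0, 0), include.drift = TRUE)
coef(fit)  # AR coefficient, a constant term estimating a, and "drift" estimating b
```

The "drift" coefficient should land near the simulated slope of 0.5, which is the sense in which $b$ is a deterministic trend rather than a stochastic one.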


The auto.arima() function automates the inclusion of a constant. By default, for $d=0$ or $d=1$, a constant will be included if it improves the AIC value; for $d>1$ the constant is always omitted. If allowdrift=FALSE is specified, then the constant is only allowed when $d=0$.
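The effect of allowdrift can be sketched as follows, again assuming the forecast package is installed and using the built-in `WWWusage` series purely as an example. Which model is selected depends on the data and the package version, so no particular output is claimed:

```r
library(forecast)  # assumed to be installed

fit_auto    <- auto.arima(WWWusage)                      # constant decided automatically
fit_nodrift <- auto.arima(WWWusage, allowdrift = FALSE)  # no drift term may be selected

fit_auto     # printed summary states whether a drift or mean term was kept
fit_nodrift
```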

Eventual forecast functions

The eventual forecast function (EFF) is the limit of $\hat{y}_{t+h|t}$ as a function of the forecast horizon $h$ as $h\rightarrow\infty$.

The constant $c$ has an important effect on the long-term forecasts obtained from these models.

  • If $c=0$ and $d=0$, the EFF will go to zero.
  • If $c=0$ and $d=1$, the EFF will go to a non-zero constant determined by the last few observations.
  • If $c=0$ and $d=2$, the EFF will follow a straight line with intercept and slope determined by the last few observations.
  • If $c\ne0$ and $d=0$, the EFF will go to the mean of the data.
  • If $c\ne0$ and $d=1$, the EFF will follow a straight line with slope equal to the mean of the differenced data.
  • If $c\ne0$ and $d=2$, the EFF will follow a quadratic trend.

Seasonal ARIMA models

If a seasonal model is used, all of the above will hold with $d$ replaced by $d+D$, where $D$ is the order of seasonal differencing and $d$ is the order of non-seasonal differencing.
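The $d+D$ rule can be illustrated with Arima() on a seasonal series. This is only a sketch: it assumes the forecast package is installed, and the logged `AirPassengers` data and the two model orders are example choices rather than recommendations:

```r
library(forecast)  # assumed to be installed

y <- log(AirPassengers)  # built-in monthly series, used for illustration only

# d + D = 1: a drift term is permitted.
fit1 <- Arima(y, order = c(1, 0, 0), seasonal = c(0, 1, 1), include.constant = TRUE)
# d + D = 2: no constant is fitted even though one was requested.
fit2 <- Arima(y, order = c(0, 1, 1), seasonal = c(0, 1, 1), include.constant = TRUE)

names(coef(fit1))  # includes "drift"
names(coef(fit2))  # no "drift"
```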



Published at DZone with permission of
