Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Estimating a Nonlinear Time Series Model in R

DZone's Guide to

Estimating a Nonlinear Time Series Model in R

· Big Data Zone ·
Free Resource

The open source HPCC Systems platform is a proven, easy to use solution for managing data at scale. Visit our Easy Guide to learn more about this completely free platform, test drive some code in the online Playground, and get started today.

There are quite a few R pack­ages avail­able for non­lin­ear time series analy­sis, but some­times you need to code your own mod­els. Here is a sim­ple exam­ple to show how it can be done.

The model is a first order thresh­old autoregression:

where e_t is a Gauss­ian white noise series with vari­ance \sigma^2. The fol­low­ing func­tion will gen­er­ate some ran­dom data from this model.

simnlts <- function(n, alpha, beta, r, sigma, gamma, burnin=100)
{
  # Generate noise
  e <- rnorm(n+burnin, 0, sigma)
  # Create space for y
  y <- numeric(n+burnin)
  # Generate time series
  for(i in 2:(n+burnin))
  {
    if(y[i-1] <= r)
      y[i] <- alpha*y[i-1] + e[i]
    else
      y[i] <- beta*y[i-1] + gamma*e[i]
  }
  # Throw away first burnin values
  y <- ts(y[-(1:burnin)])
  # Return result
  return(y)
}

The “burn-​​in” para­me­ter allows the first value of the series to be a ran­dom draw from the sta­tion­ary dis­tri­b­u­tion of the process (assum­ing it is stationary).

We will esti­mate the model by min­i­miz­ing the sum of squared errors across both regimes. Other approaches that take account of the dif­fer­ent vari­ances in the two regimes may be bet­ter, but it is use­ful to keep it sim­ple, at least ini­tially. The fol­low­ing func­tion uses a non-​​linear opti­mizer to esti­mate \alpha, \beta and r. To ensure good start­ing val­ues, we begin the opti­miza­tion with \alpha=\beta=0 and set the ini­tial value of r to be the mean of the observed data.

fitnlts <- function(x)
{
  ss <- function(par,x)
  {
    alpha <- par[1]
    beta <- par[2]
    r <- par[3]
    n <- length(x)
    # Check that each regime has at least 10% of observations
    if(sum(x<=r) < n/10 | sum(x>r) < n/10)
      return(1e20)
    e1 <- x[2:n] - alpha*x[1:(n-1)]
    e2 <- x[2:n] - beta*x[1:(n-1)]
    regime1 <- (x[1:(n-1)] <= r)
    e <- e1*(regime1) + e2*(!regime1)
    return(sum(e^2))
  }
  fit <- optim(c(0,0,mean(x)),ss,x=x,control=list(maxit=1000))
  if(fit$convergence > 0)
    return(rep(NA,3))
  else
    return(c(alpha=fit$par[1],beta=fit$par[2],r=fit$par[3]))
}

There are a few things to notice here.

  • I have used a func­tion inside a func­tion. The ss func­tion com­putes the sum of squared errors. It would have been pos­si­ble to define this out­side fitnlts, but as I don’t need it for any other pur­pose it is neater to define it locally.
  • Occa­sion­ally the opti­mizer will not con­verge, and then fitnlts will return miss­ing values.
  • I have avoided the use of a loop by com­put­ing the errors in both regimes. This is much faster than the alternatives.
  • If r is out­side the range of the data, then either \alpha or \beta will be uniden­ti­fi­able. To avoid this prob­lem, I force r to be such that each régime must con­tain at least 10% of the data. This is an arbi­trary value, but it makes the esti­ma­tion more stable.
  • The func­tion being opti­mized is not con­tin­u­ous in the r direc­tion. This will cause gradient-​​based opti­miz­ers to fail, but the Nelder-​​Mead opti­mizer used here is rel­a­tively robust to such problems.
  • This par­tic­u­lar model can be more effi­ciently esti­mated by exploit­ing the con­di­tional lin­ear­ity. For exam­ple, we could loop over a grid of r val­ues (e.g., at the mid-​​points between con­sec­u­tive ordered obser­va­tions) and use lin­ear regres­sion for the other para­me­ters. But the above approach is more gen­eral and can be adapted to other non­lin­ear time series models.

The func­tions can be used as fol­lows (with some exam­ple parameters):

y <- simnlts(100, 0.5, -1.8, -1, 1, 2)
fitnlts(y)

A sim­i­lar approach can be used for other time series models.


Managing data at scale doesn’t have to be hard. Find out how the completely free, open source HPCC Systems platform makes it easier to update, easier to program, easier to integrate data, and easier to manage clusters. Download and get started today.

Topics:

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}