Estimating a Nonlinear Time Series Model in R
There are quite a few R packages available for nonlinear time series analysis, but sometimes you need to code your own models. Here is a simple example to show how it can be done.
The model is a first-order threshold autoregression:

    y_t = α y_{t-1} + e_t        if y_{t-1} ≤ r
    y_t = β y_{t-1} + γ e_t      if y_{t-1} > r

where e_t is a Gaussian white noise series with variance σ². The following function will generate some random data from this model.
simnlts <- function(n, alpha, beta, r, sigma, gamma, burnin=100)
{
  # Generate noise
  e <- rnorm(n+burnin, 0, sigma)
  # Create space for y
  y <- numeric(n+burnin)
  # Generate time series
  for(i in 2:(n+burnin))
  {
    if(y[i-1] <= r)
      y[i] <- alpha*y[i-1] + e[i]
    else
      y[i] <- beta*y[i-1] + gamma*e[i]
  }
  # Throw away first burnin values
  y <- ts(y[-(1:burnin)])
  # Return result
  return(y)
}
The “burnin” parameter allows the first value of the series to be a random draw from the stationary distribution of the process (assuming it is stationary).
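To see why the burn-in matters, here is a minimal sketch using a plain linear AR(1) for simplicity (the parameter values here are my own, chosen for illustration): the series is started at zero, so the early values are not yet draws from the stationary distribution, whose variance for an AR(1) is σ²/(1 − φ²). Discarding the initial transient leaves a sample whose variance matches the stationary one.

```r
# Illustration: burn-in for a linear AR(1), y_t = phi*y_{t-1} + e_t
set.seed(123)
phi <- 0.9; sigma <- 1; n <- 10000; burnin <- 100
e <- rnorm(n + burnin, 0, sigma)
y <- numeric(n + burnin)
for(i in 2:(n + burnin))
  y[i] <- phi*y[i-1] + e[i]
# Discard the transient from the fixed starting value y[1] = 0
y <- y[-(1:burnin)]
# Sample variance should be close to the stationary variance
c(sample_var = var(y), stationary_var = sigma^2/(1 - phi^2))
```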
We will estimate the model by minimizing the sum of squared errors across both regimes. Other approaches that take account of the different variances in the two regimes may be better, but it is useful to keep it simple, at least initially. The following function uses a nonlinear optimizer to estimate α, β and r. To ensure good starting values, we begin the optimization with α = β = 0 and set the initial value of r to be the mean of the observed data.
fitnlts <- function(x)
{
  ss <- function(par, x)
  {
    alpha <- par[1]
    beta <- par[2]
    r <- par[3]
    n <- length(x)
    # Check that each regime has at least 10% of observations
    if(sum(x <= r) < n/10 | sum(x > r) < n/10)
      return(1e20)
    e1 <- x[2:n] - alpha*x[1:(n-1)]
    e2 <- x[2:n] - beta*x[1:(n-1)]
    regime1 <- (x[1:(n-1)] <= r)
    e <- e1*(regime1) + e2*(!regime1)
    return(sum(e^2))
  }
  fit <- optim(c(0, 0, mean(x)), ss, x=x, control=list(maxit=1000))
  if(fit$convergence > 0)
    return(rep(NA, 3))
  else
    return(c(alpha=fit$par[1], beta=fit$par[2], r=fit$par[3]))
}
There are a few things to notice here.
- I have used a function inside a function. The ss function computes the sum of squared errors. It would have been possible to define this outside fitnlts, but as I don’t need it for any other purpose it is neater to define it locally.
- Occasionally the optimizer will not converge, and then fitnlts will return missing values.
- I have avoided the use of a loop by computing the errors in both regimes. This is much faster than the alternatives.
- If r is outside the range of the data, then either α or β will be unidentifiable. To avoid this problem, I force r to be such that each regime must contain at least 10% of the data. This is an arbitrary value, but it makes the estimation more stable.
- The function being optimized is not continuous in the r direction. This will cause gradient-based optimizers to fail, but the Nelder-Mead optimizer used here is relatively robust to such problems.
- This particular model can be more efficiently estimated by exploiting the conditional linearity. For example, we could loop over a grid of r values (e.g., at the midpoints between consecutive ordered observations) and use linear regression for the other parameters. But the above approach is more general and can be adapted to other nonlinear time series models.
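As a sketch of that conditional-linearity idea (the function and variable names here are my own, not from the original code): conditional on r, each regime is a regression through the origin, so α and β have closed-form least-squares estimates, and we simply keep the r giving the smallest total sum of squared errors.

```r
# Sketch: grid search over candidate thresholds r, with the regime
# coefficients estimated by least squares conditional on each r.
# Candidates are midpoints between consecutive ordered lagged values,
# trimmed so each regime keeps at least 10% of the observations.
fitnlts_grid <- function(x)
{
  n <- length(x)
  xlag <- x[1:(n-1)]
  xnow <- x[2:n]
  s <- sort(xlag)
  lo <- ceiling(0.1*n); hi <- floor(0.9*n)
  candidates <- (s[lo:(hi-1)] + s[(lo+1):hi])/2
  best <- c(ss=Inf, alpha=NA, beta=NA, r=NA)
  for(r in candidates)
  {
    regime1 <- (xlag <= r)
    # Conditional on r, each regime is a regression through the origin
    alpha <- sum(xnow[regime1]*xlag[regime1]) / sum(xlag[regime1]^2)
    beta  <- sum(xnow[!regime1]*xlag[!regime1]) / sum(xlag[!regime1]^2)
    ss <- sum((xnow[regime1] - alpha*xlag[regime1])^2) +
          sum((xnow[!regime1] - beta*xlag[!regime1])^2)
    if(ss < best["ss"])
      best <- c(ss=ss, alpha=alpha, beta=beta, r=r)
  }
  return(best[c("alpha", "beta", "r")])
}
```

Because the grid is finite, the discontinuity of the objective in r causes no trouble here, at the cost of only estimating the threshold to within the resolution of the observed data.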
The functions can be used as follows (with some example parameters):
y <- simnlts(100, 0.5, -1.8, -1, 1, 2)
fitnlts(y)
A similar approach can be used for other time series models.
Published at DZone with permission of Rob J Hyndman, DZone MVB. See the original article here.