# Seasonal Unit Roots


As discussed in the MAT8181 course, there are (at least) two kinds of non-stationary time series: those with a trend, and those with a unit root (the latter are said to be integrated). Unit root tests cannot be used to assess whether a time series is stationary or not; they can only detect integrated time series. And the same holds for seasonal unit roots.

In a previous post, we saw that it was difficult to model hourly temperatures, since most tests do not reject unit roots. Consider here the average monthly temperature, still in Montréal, QC.

```
> montreal=read.table("http://freakonometrics.free.fr/temp-montreal-monthly.txt")
> M=as.matrix(montreal[,2:13])
> X=as.numeric(t(M))
> tsm=ts(X,start=1948,freq=12)
> plot(tsm)
```

For those who don’t know Montréal, Winter and Summer are very different. We can visualize monthly differences using

```
> month=rep(1:12,length(tsm)/12)
> plot(month,as.numeric(tsm))
> lines(1:12,apply(M,2,mean),col="red",type="b",pch=19)
```

or, if we install the uroot package (which has since been removed from the CRAN repository), we can use

```
> library(uroot)
> bbplot(tsm)
```

or

```
> bb3D(tsm)
```

It looks like our time series is cyclic, because of the yearly seasonal pattern. The autocorrelation function is here

`> acf(tsm,lag=120)`

Again, this cycle can be visualized using

```
> persp(1948:2013,1:12,M,theta=-50,col="yellow",shade=TRUE,
+ xlab="Year",ylab="Month",zlab="Temperature",ticktype="detailed")
```

Now, the question is: is there a seasonal unit root? This would mean that our model should be

$(1-L^{12})\Phi(L)[X_t-\mu]=\Theta(L)\varepsilon_t$

If we forget about the autoregressive and moving average components, we can estimate

$(1-{\color{red}\alpha}L^{12})[X_t-\mu]=\varepsilon_t$

If there is a seasonal unit root, then ${\color{red}\alpha}$ should be close to 1. Somehow.

```
> arima(tsm,order=c(0,0,0),seasonal=list(order=c(1,0,0),period=12))

Call:
arima(x = tsm, order = c(0, 0, 0), seasonal = list(order = c(1, 0, 0), period = 12))

Coefficients:
        sar1  intercept
      0.9702     6.4566
s.e.  0.0071     2.1515
```

It is not far from 1. Actually, it cannot be too close to 1: if it were, we would get an error message…
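To see why the estimate approaches 1 when a seasonal unit root is actually present, here is a small simulation sketch (in Python, for illustration; the analysis itself is in R, and the data here are simulated, not the Montréal temperatures). We simulate the $\alpha=1$ case, $X_t=X_{t-12}+\varepsilon_t$, and regress $X_t$ on $X_{t-12}$:

```python
import numpy as np

# Simulate a seasonal random walk, X_t = X_{t-12} + eps_t
# (the alpha = 1 case of the model above).
rng = np.random.default_rng(42)
n = 1200
x = np.zeros(n)
x[:12] = rng.normal(0, 1, 12)
for t in range(12, n):
    x[t] = x[t - 12] + rng.normal(0, 1)

# OLS slope of X_t on X_{t-12}: should be very close to 1
y, z = x[12:], x[:-12]
alpha_hat = np.dot(z, y) / np.dot(z, z)
print(alpha_hat)
```

With an integrated series the OLS estimate converges to 1 very quickly (superconsistency), which is exactly why a fitted seasonal AR coefficient near 1 is a warning sign.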

To illustrate some interesting models, let us consider also quarterly temperatures,

```
> N=cbind(apply(montreal[,2:4],1,sum),apply(montreal[,5:7],1,sum),
+ apply(montreal[,8:10],1,sum),apply(montreal[,11:13],1,sum))
> X=as.numeric(t(N))
> tsq=ts(X,start=1948,freq=4)
> persp(1948:2013,1:4,N,theta=-50,col="yellow",shade=TRUE,
+ xlab="Year",ylab="Quarter",zlab="Temperature",ticktype="detailed")
```

(again, the aim is just to be able to write down some equations, if necessary)

Why not consider a $VAR(1)$ model on the quarterly temperature? Something like

$\left(\begin{matrix}X_t^{(Q1)}\\ X_t^{(Q2)}\\ X_t^{(Q3)}\\ X_t^{(Q4)}\end{matrix}\right)= A \left(\begin{matrix}X_{t-1}^{(Q1)}\\ X_{t-1}^{(Q2)}\\ X_{t-1}^{(Q3)}\\ X_{t-1}^{(Q4)}\end{matrix}\right)+\left(\begin{matrix}\varepsilon_{t}^{(Q1)}\\ \varepsilon_{t}^{(Q2)}\\ \varepsilon_{t}^{(Q3)}\\ \varepsilon_{t}^{(Q4)}\end{matrix}\right)$

i.e.

$\boldsymbol{Y}_t=A \boldsymbol{Y}_{t-1}+\boldsymbol{\varepsilon}_t$

where $A$ is some $4\times 4$ matrix. This model can easily be estimated,

```
> library(vars)
> df=data.frame(N)
> names(df)=paste("y",1:4,sep="")
> model=VAR(df)
> model

VAR Estimation Results:
=======================

Estimated coefficients for equation y1:
=======================================
Call:
y1 = y1.l1 + y2.l1 + y3.l1 + y4.l1 + const

       y1.l1        y2.l1        y3.l1        y4.l1        const
 -0.13943065   0.21451118   0.08921237   0.30362065 -34.74793931

Estimated coefficients for equation y2:
=======================================
Call:
y2 = y1.l1 + y2.l1 + y3.l1 + y4.l1 + const

      y1.l1       y2.l1       y3.l1       y4.l1       const
 0.02520938  0.05288958 -0.13277377  0.05134148 40.68955266

Estimated coefficients for equation y3:
=======================================
Call:
y3 = y1.l1 + y2.l1 + y3.l1 + y4.l1 + const

      y1.l1       y2.l1       y3.l1       y4.l1       const
 0.07740824 -0.21142726  0.11180518  0.12963931 56.81087283

Estimated coefficients for equation y4:
=======================================
Call:
y4 = y1.l1 + y2.l1 + y3.l1 + y4.l1 + const

      y1.l1       y2.l1       y3.l1       y4.l1       const
 0.18842863 -0.31964579  0.25099508 -0.04452577  5.73228873
```

and matrix $A$ is here

```
> A=rbind(
+ coefficients(model$varresult$y1)[1:4],
+ coefficients(model$varresult$y2)[1:4],
+ coefficients(model$varresult$y3)[1:4],
+ coefficients(model$varresult$y4)[1:4])
> A
           y1.l1       y2.l1       y3.l1       y4.l1
[1,] -0.13943065  0.21451118  0.08921237  0.30362065
[2,]  0.02520938  0.05288958 -0.13277377  0.05134148
[3,]  0.07740824 -0.21142726  0.11180518  0.12963931
[4,]  0.18842863 -0.31964579  0.25099508 -0.04452577
```

Since stationarity of this multivariate time series is closely related to the eigenvalues of this matrix, let us look at them,

```
> eigen(A)$values
[1]  0.35834830 -0.32824657 -0.14042175  0.09105836
> Mod(eigen(A)$values)
[1] 0.35834830 0.32824657 0.14042175 0.09105836
```
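The same check can be reproduced outside R. A quick numpy sketch (for illustration, using the coefficients printed above rounded to four decimals): the $VAR(1)$ is stationary when every eigenvalue of $A$ lies strictly inside the unit circle.

```python
import numpy as np

# VAR(1) coefficient matrix fitted above (rounded from the R output)
A = np.array([
    [-0.1394,  0.2145,  0.0892,  0.3036],
    [ 0.0252,  0.0529, -0.1328,  0.0513],
    [ 0.0774, -0.2114,  0.1118,  0.1296],
    [ 0.1884, -0.3196,  0.2510, -0.0445],
])

# Stationarity: all eigenvalue moduli strictly below 1
moduli = np.abs(np.linalg.eigvals(A))
print(moduli.max())  # about 0.358, matching Mod(eigen(A)$values) above
```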

So it looks like there is no stationarity issue here. A restricted model is the periodic autoregressive model, the so-called $PAR(1)$ model, discussed by Paap and Franses,

$\boldsymbol{\Phi}_0 \boldsymbol{Y}_t=\boldsymbol{\Phi}_1 \boldsymbol{Y}_{t-1}+\boldsymbol{\varepsilon}_t$

where

$\boldsymbol{\Phi}_0=\left(\begin{array}{cccc}1&0&0&0\\ -\phi_{1,2}&1&0&0\\ 0&-\phi_{1,3}&1&0\\ 0&0&-\phi_{1,4}&1\end{array}\right)$

and

$\boldsymbol{\Phi}_1=\left(\begin{array}{cccc}0&0&0&\phi_{1,1}\\ 0&0&0&0\\ 0&0&0&0\\ 0&0&0&0\end{array}\right)$

Keep in mind that this is a $VAR(1)$ model, since

$\boldsymbol{Y}_t=\underbrace{\boldsymbol{\Phi}_0^{-1} \boldsymbol{\Phi}_1}_A \boldsymbol{Y}_{t-1}+\boldsymbol{\varepsilon}_t$

This model can be estimated using a specific package (one can also look at the vignette, to get a better understanding of the syntax)

```
> library(partsm)
> detcomp <- list(regular=c(0,0,0), seasonal=c(1,0), regvar=0)
> model=fit.ar.par(wts=tsq, detcomp=detcomp, type="PAR", p=1)
> PAR.MVrepr(model)
----
Multivariate representation of a PAR model.

Phi0:

 1.000  0.000  0.000 0
-0.242  1.000  0.000 0
 0.000 -0.261  1.000 0
 0.000  0.000 -0.492 1

Phi1:

 0 0 0 0.314
 0 0 0 0.000
 0 0 0 0.000
 0 0 0 0.000

Eigen values of Gamma = Phi0^{-1} %*% Phi1:
0.01 0 0 0

Time varying accumulation of shocks:

0.010 0.040 0.155 0.314
0.002 0.010 0.037 0.076
0.001 0.003 0.010 0.020
0.000 0.001 0.005 0.010
```

Here, the characteristic equation is

$|\boldsymbol{\Phi}_0-\boldsymbol{\Phi}_1 z |=(1-\phi_{1,1}\phi_{1,2}\phi_{1,3}\phi_{1,4}\cdot z)$

so there is a (seasonal) unit root if

$\phi_{1,1}\phi_{1,2}\phi_{1,3}\phi_{1,4}=1$
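With the values fitted above ($\phi_{1,1}\approx 0.314$ from $\boldsymbol{\Phi}_1$, and $0.242$, $0.261$, $0.492$ read off $\boldsymbol{\Phi}_0$), this product is easy to check. A small numpy sketch (for illustration, with the rounded coefficients): the product also equals the nonzero eigenvalue of $\boldsymbol{\Phi}_0^{-1}\boldsymbol{\Phi}_1$ reported by partsm.

```python
import numpy as np

# Periodic AR coefficients read off the fitted Phi0 and Phi1 above
phi = [0.314, 0.242, 0.261, 0.492]

Phi0 = np.array([[ 1.0,    0.0,    0.0,   0.0],
                 [-0.242,  1.0,    0.0,   0.0],
                 [ 0.0,   -0.261,  1.0,   0.0],
                 [ 0.0,    0.0,   -0.492, 1.0]])
Phi1 = np.zeros((4, 4))
Phi1[0, 3] = 0.314

# Seasonal unit root iff the product of the coefficients equals 1;
# the product is also the nonzero eigenvalue of Gamma = Phi0^{-1} Phi1.
Gamma = np.linalg.inv(Phi0) @ Phi1
print(np.prod(phi))                             # about 0.0098, far from 1
print(np.abs(np.linalg.eigvals(Gamma)).max())   # the same value, ~0.01
```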

This is clearly not the case here. It is also possible to perform the Canova-Hansen test,

```
> CH.test(tsq)

------ - ------ ----
Canova & Hansen test
------ - ------ ----

Null hypothesis: Stationarity.
Alternative hypothesis: Unit root.
Frequency of the tested cycles: pi/2 , pi ,

L-statistic: 1.122
Lag truncation parameter: 5

Critical values:

 0.10 0.05 0.025 0.01
0.846 1.01  1.16 1.35
```

The idea is that the polynomial $(1-L^4)$ has four roots in $\mathbb{C}$,

$\{+1,+i,-1,-i\}$

since

$(1-L^4)=(1-L)(1+L)(1+L^2)=(1-L)(1+L)(1-iL)(1+iL)$
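As a quick sanity check (in Python, for illustration only), multiplying the four linear factors back together does recover $1-L^4$:

```python
import numpy as np

# Multiply out (1-L)(1+L)(1-iL)(1+iL); polynomials are coefficient
# lists in increasing powers of L, so (1-L) is [1, -1], etc.
p = [1.0]
for factor in ([1, -1], [1, 1], [1, -1j], [1, 1j]):
    p = np.polynomial.polynomial.polymul(p, factor)

print(p)  # coefficients of 1 - L^4, i.e. [1, 0, 0, 0, -1]
```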

If we get back to monthly data, $(1-L^{12})$ has twelve roots,

$\left\{\pm 1, \ \pm i, \ \pm\frac{1\pm \sqrt{3}i}{2},\pm\frac{\sqrt{3}\pm i}{2}\right\}$

each of them having different interpretations.

Here we can have 1 cycle per year (over 12 months), 2 cycles per year (over 6 months), 3 cycles per year (over 4 months), 4 cycles per year (over 3 months), 5 cycles per year (over 2.4 months), or even 6 cycles per year (over 2 months). This will depend on the argument of the root, with respectively

$\left\{\frac{\pi}{6},\frac{\pi}{3},\frac{\pi}{2},\frac{2\pi}{3},\frac{5\pi}{6},\pi\right\}$
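Those frequencies are just the arguments of the twelfth roots of unity. A small numpy sketch (illustrative, not part of the original R analysis) recovering them:

```python
import numpy as np

# The twelve roots of 1 - L^12 are the 12th roots of unity
roots = np.exp(2j * np.pi * np.arange(12) / 12)

# Their arguments, in units of pi; np.angle returns values in (-pi, pi]
args = np.angle(roots) / np.pi

# The distinct positive frequencies match the tested cycles:
# pi/6, pi/3, pi/2, 2pi/3, 5pi/6 and pi
freqs = sorted(set(round(a, 6) for a in args if a > 0))
print(freqs)
```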

The output of the test is here

```
> CH.test(tsm)

------ - ------ ----
Canova & Hansen test
------ - ------ ----

Null hypothesis: Stationarity.
Alternative hypothesis: Unit root.
Frequency of the tested cycles: pi/6 , pi/3 , pi/2 , 2pi/3 , 5pi/6 , pi ,

L-statistic: 1.964
Lag truncation parameter: 20

Critical values:

0.10 0.05 0.025 0.01
2.49 2.75  2.99 3.27
```

And it looks like we reject the assumption of a seasonal unit root: the L-statistic is below all the critical values, so the null of stationarity is not rejected. I can even mention the following testing procedure,

```
> library(forecast)
> nsdiffs(tsm, test="ch")
[1] 0
```

where the output "1" means that there is a seasonal unit root and "0" that there is none. Simple to read, isn't it? If we consider the periodic autoregressive model on the monthly data, the output is

```
> model=fit.ar.par(wts=tsm, detcomp=detcomp, type="PAR", p=1)
> model
----
PAR model of order 1 .

y_t = alpha_{1,s}*y_{t-1} + alpha_{2,s}*y_{t-2} + ... + alpha_{p,s}*y_{t-p} + coeffs*detcomp + epsilon_t,  for s=1,2,...,12
----
Autoregressive coefficients.

          s=1  s=2  s=3  s=4  s=5 s=6 s=7  s=8  s=9 s=10 s=11 s=12
alpha_1s 0.15 0.05 0.07 0.33 0.11   0 0.3 0.38 0.31 0.19 0.15 0.37
```

So, whatever the test, we always reject the assumption of a seasonal unit root. That does not mean that we cannot have a strong cycle! Actually, the series is almost periodic. But there is no unit root. So all of this makes sense (I hardly believe there could be a unit root, seasonal or not, in temperatures).

Just to make sure we get it right, consider two time series inspired by the previous one. The first is a periodic sequence (with a very, very small noise, just to avoid problems with non-definite matrices) and the second is clearly integrated.

```
> Xp1=Xp2=as.numeric(t(M))
> for(t in 13:length(M)){
+ Xp1[t]=Xp1[t-12]
+ Xp2[t]=Xp2[t-12]+rnorm(1,0,2)
+ }
> Xp1=Xp1+rnorm(length(Xp1),0,.02)
> tsp1=ts(Xp1,start=1948,freq=12)
> tsp2=ts(Xp2,start=1948,freq=12)
> par(mfrow=c(2,1))
> plot(tsp1)
> plot(tsp2)
```

```
> par(mfrow=c(1,2))
> bb3D(tsp1)
> bb3D(tsp2)
```

If we quickly look at those series, I would say that the first one has no unit root (it is not stationary, but only because the series is periodic), while there is at least one unit root in the second one. If we look at the Canova-Hansen test, we get

```
> CH.test(tsp1)

------ - ------ ----
Canova & Hansen test
------ - ------ ----

Null hypothesis: Stationarity.
Alternative hypothesis: Unit root.
Frequency of the tested cycles: pi/6 , pi/3 , pi/2 , 2pi/3 , 5pi/6 , pi ,

L-statistic: 2.234
Lag truncation parameter: 20

Critical values:

0.10 0.05 0.025 0.01
2.49 2.75  2.99 3.27

> CH.test(tsp2)

------ - ------ ----
Canova & Hansen test
------ - ------ ----

Null hypothesis: Stationarity.
Alternative hypothesis: Unit root.
Frequency of the tested cycles: pi/6 , pi/3 , pi/2 , 2pi/3 , 5pi/6 , pi ,

L-statistic: 5.448
Lag truncation parameter: 20

Critical values:

0.10 0.05 0.025 0.01
2.49 2.75  2.99 3.27
```

I know that this package has been removed from CRAN, so maybe I should not use it. Consider instead

```
> nsdiffs(tsp1, 12,test="ch")
[1] 0
> nsdiffs(tsp2, 12,test="ch")
[1] 1
```

Here we reach the same conclusion: the first series does not have a unit root, but the second one does. Be careful, though: with the Osborn-Chui-Smith-Birchenhall test,

```
> nsdiffs(tsp1, 12,test="ocsb")
[1] 1
> nsdiffs(tsp2, 12,test="ocsb")
[1] 1
```

we have the feeling that there is also a unit root in our cyclic series…

So here, at this (monthly) frequency, we do reject the assumption of a unit root, even a seasonal one, in our temperature series. We still have our high-frequency problem to deal with, some day (but I don't think I'll have enough time to introduce long-range dependence this session, unfortunately).

Published at DZone with permission of
