Over a million developers have joined DZone.

Measuring Time Series Characteristics

DZone's Guide to

Measuring Time Series Characteristics

· Big Data Zone
Free Resource

Access NoSQL and Big Data through SQL using standard drivers (ODBC, JDBC, ADO.NET). Free Download 

A few years ago, I was work­ing on a project where we mea­sured var­i­ous char­ac­ter­is­tics of a time series and used the infor­ma­tion to deter­mine what fore­cast­ing method to apply or how to clus­ter the time series into mean­ing­ful groups. The two main papers to come out of that project were:

  1. Wang, Smith and Hyn­d­man (2006) Characteristic-​​​​based clus­ter­ing for time series data. Data Min­ing and Knowl­edge Dis­cov­ery, 13(3), 335–364.
  2. Wang, Smith-​​Miles and Hyn­d­man (2009) “Rule induc­tion for fore­cast­ing method selec­tion: meta-​​​​learning the char­ac­ter­is­tics of uni­vari­ate time series”, Neu­ro­com­puting, 72, 2581–2594.

I’ve since had a lot of requests for the code which one of my coau­thors has been help­fully email­ing to any­one who asked. But to make it eas­ier, we thought it might be help­ful if I post some updated code here. This is not the same as the R code we used in the paper, as I’ve improved it in sev­eral ways (so it will give dif­fer­ent results). If you just want the code, skip to the bot­tom of the post.

Find­ing the period of the data

Usu­ally in time series work, we know the period of the data (if the obser­va­tions are monthly, the period is 12, for exam­ple). But in this project, some of our data was of unknown period and we wanted a method to auto­mat­i­cally deter­mine the appro­pri­ate period. The method we used was based on local peaks and troughs in the ACF. But I’ve since devised a bet­ter approach (prompted on cross​val​i​dated​.com) using an esti­mate of the spec­tral density:

find.freq <- function(x)
  n <- length(x)
  spec <- spec.ar(c(na.contiguous(x)),plot=FALSE)
  if(max(spec$spec)>10) # Arbitrary threshold chosen by trial and error.
    period <- round(1/spec$freq[which.max(spec$spec)])
    if(period==Inf) # Find next local maximum
      j <- which(diff(spec$spec)>0)
        nextmax <- j[1] + which.max(spec$spec[j[1]:500])
        if(nextmax <= length(spec$freq))
          period <- round(1/spec$freq[nextmax])
          period <- 1
        period <- 1
    period <- 1

The func­tion is called find.freq because time series peo­ple often call the period of sea­son­al­ity the “fre­quency” (which is of course highly confusing).

Decom­pos­ing the data into trend and sea­sonal components

We needed a mea­sure of the strength of trend and the strength of sea­son­al­ity, and to do this we decom­posed the data into trend, sea­sonal and error terms.

Because not all data could be decom­posed addi­tively, we first needed to apply an auto­mated Box-​​Cox trans­for­ma­tion. We tried a range of Box-​​Cox para­me­ters on a grid, and selected the one which gave the most nor­mal errors. That worked ok, but I’ve since found some papers that pro­vide quite good auto­mated Box-​​Cox algo­rithms that I’ve imple­mented in the fore­cast pack­age. So this code uses Guerrero’s (1993) method instead.

For sea­sonal time series, we decom­posed the trans­formed data using an stl decom­po­si­tion with peri­odic seasonality.

For non-​​seasonal time series, we esti­mated the trend of the trans­formed data using penal­ized regres­sion splines via the mgcv pack­age.

decomp <- function(x,transform=TRUE)
  # Transform series
  if(transform & min(x,na.rm=TRUE) >= 0)
    lambda <- BoxCox.lambda(na.contiguous(x))
    x <- BoxCox(x,lambda)
    lambda <- NULL
    transform <- FALSE
  # Seasonal data
    x.stl <- stl(x,s.window="periodic",na.action=na.contiguous)
    trend <- x.stl$time.series[,2]
    season <- x.stl$time.series[,1]
    remainder <- x - trend - season
  else #Nonseasonal data
    tt <- 1:length(x)
    trend <- rep(NA,length(x))
    trend[!is.na(x)] <- fitted(gam(x ~ s(tt)))
    season <- NULL
    remainder <- x - trend

Putting every­thing on a [0,1] scale

We wanted to mea­sure a range of char­ac­ter­is­tics such as strength of sea­son­al­ity, strength of trend, level of non­lin­ear­ity, skew­ness, kur­to­sis, ser­ial cor­re­lat­ed­ness, self-​​similarity, level of chaotic­ity (is that a word?) and the peri­od­ic­ity of the data. But we wanted all these on the same scale which meant map­ping the nat­ural range of each mea­sure onto [0,1]. The fol­low­ing two func­tions were used to do this.

# f1 maps [0,infinity) to [0,1]
f1 <- function(x,a,b)
  eax <- exp(a*x)
  if (eax == Inf)
    f1eax <- 1
    f1eax <- (eax-1)/(eax+b)
# f2 maps [0,1] onto [0,1]
f2 <- function(x,a,b)
  eax <- exp(a*x)
  ea <- exp(a)

The val­ues of a and b in each func­tion were cho­sen so the mea­sure had a 90th per­centile of 0.10 when the data were iid stan­dard nor­mal, and a value of 0.9 using a well-​​known bench­mark time series.

Cal­cu­lat­ing the measures

Now we are ready to cal­cu­late the mea­sures on the orig­i­nal data, as well as on the adjusted data (after remov­ing trend and seasonality).

measures <- function(x)
  N <- length(x)
  freq <- find.freq(x)
  fx <- c(frequency=(exp((freq-1)/50)-1)/(1+exp((freq-1)/50)))
  x <- ts(x,f=freq)
  # Decomposition
  decomp.x <- decomp(x)
  # Adjust data
  if(freq > 1)
    fits <- decomp.x$trend + decomp.x$season
  else # Nonseasonal data
    fits <- decomp.x$trend
  adj.x <- decomp.x$x - fits + mean(decomp.x$trend, na.rm=TRUE)
  # Backtransformation of adjusted data
    tadj.x <- InvBoxCox(adj.x,decomp.x$lambda)
    tadj.x <- adj.x
  # Trend and seasonal measures
  v.adj <- var(adj.x, na.rm=TRUE)
  if(freq > 1)
    detrend <- decomp.x$x - decomp.x$trend
    deseason <- decomp.x$x - decomp.x$season
    trend <- ifelse(var(deseason,na.rm=TRUE) < 1e-10, 0, 
    season <- ifelse(var(detrend,na.rm=TRUE) < 1e-10, 0,
  else #Nonseasonal data
    trend <- ifelse(var(decomp.x$x,na.rm=TRUE) < 1e-10, 0,
    season <- 0
  m <- c(fx,trend,season)
  # Measures on original data
  xbar <- mean(x,na.rm=TRUE)
  s <- sd(x,na.rm=TRUE)
  # Serial correlation
  Q <- Box.test(x,lag=10)$statistic/(N*10)
  fQ <- f2(Q,7.53,0.103)
  # Nonlinearity
  p <- terasvirta.test(na.contiguous(x))$statistic
  fp <- f1(p,0.069,2.304)
  # Skewness
  sk <- abs(mean((x-xbar)^3,na.rm=TRUE)/s^3)
  fs <- f1(sk,1.510,5.993)
  # Kurtosis
  k <- mean((x-xbar)^4,na.rm=TRUE)/s^4
  fk <- f1(k,2.273,11567)
  # Hurst=d+0.5 where d is fractional difference.
  H <- fracdiff(na.contiguous(x),0,0)$d + 0.5
  # Lyapunov Exponent
  if(freq > N-10)
      stop("Insufficient data")
  Ly <- numeric(N-freq)
  for(i in 1:(N-freq))
    idx <- order(abs(x[i] - x))
    idx <- idx[idx < (N-freq)]
    j <- idx[2]
    Ly[i] <- log(abs((x[i+freq] - x[j+freq])/(x[i]-x[j])))/freq
    if(is.na(Ly[i]) | Ly[i]==Inf | Ly[i]==-Inf)
      Ly[i] <- NA
  Lyap <- mean(Ly,na.rm=TRUE)
  fLyap <- exp(Lyap)/(1+exp(Lyap))
  m <- c(m,fQ,fp,fs,fk,H,fLyap)
  # Measures on adjusted data
  xbar <- mean(tadj.x, na.rm=TRUE)
  s <- sd(tadj.x, na.rm=TRUE)
  # Serial
  Q <- Box.test(adj.x,lag=10)$statistic/(N*10)
  fQ <- f2(Q,7.53,0.103)
  # Nonlinearity
  p <- terasvirta.test(na.contiguous(adj.x))$statistic
  fp <- f1(p,0.069,2.304)
  # Skewness
  sk <- abs(mean((tadj.x-xbar)^3,na.rm=TRUE)/s^3)
  fs <- f1(sk,1.510,5.993)
  # Kurtosis
  k <- mean((tadj.x-xbar)^4,na.rm=TRUE)/s^4
  fk <- f1(k,2.273,11567)
  m <- c(m,fQ,fp,fs,fk)
  names(m) <- c("frequency", "trend","seasonal",
    "dc autocorrelation","dc non-linear","dc skewness","dc kurtosis")

Here is a quick exam­ple applied to Aus­tralian monthly gas production:

     frequency              trend           seasonal    autocorrelation 
        0.1096             0.9989             0.9337             0.9985 
    non-linear           skewness           kurtosis              Hurst 
        0.4947             0.1282             0.0055             0.9996 
      Lyapunov dc autocorrelation      dc non-linear        dc skewness 
        0.5662             0.1140             0.0538             0.1743 
   dc kurtosis 

The func­tion is far from per­fect, and it is not hard to find exam­ples where it fails. For exam­ple, it doesn’t work with mul­ti­ple sea­son­al­ity — try measure(taylor) and check the sea­son­al­ity. Also, I’m not con­vinced the kur­to­sis pro­vides any­thing use­ful here, or that the skew­ness mea­sure is done in the best way pos­si­ble. But it was really a proof of con­cept, so we will leave it to oth­ers to revise and improve the code.

In our papers, we took the mea­sures obtained using R, and pro­duced self-​​organizing maps using Vis­cov­ery. There is now a som pack­age in R for that, so it might be pos­si­ble to inte­grate that step into R as well. The den­do­gram was gen­er­ated in mat­lab, although that could now also be done in R using the ggden­dro pack­age for example.

Down­load the code in a sin­gle file.

The fastest databases need the fastest drivers - learn how you can leverage CData Drivers for high performance NoSQL & Big Data Access.


Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}