Measuring Time Series Characteristics

by Rob J Hyndman · Jun. 23, 12

A few years ago, I was working on a project where we measured various characteristics of a time series and used the information to determine what forecasting method to apply or how to cluster the time series into meaningful groups. The two main papers to come out of that project were:

  1. Wang, Smith and Hyndman (2006) “Characteristic-based clustering for time series data”, Data Mining and Knowledge Discovery, 13(3), 335–364.
  2. Wang, Smith-Miles and Hyndman (2009) “Rule induction for forecasting method selection: meta-learning the characteristics of univariate time series”, Neurocomputing, 72, 2581–2594.

I’ve since had a lot of requests for the code, which one of my coauthors has been helpfully emailing to anyone who asked. But to make it easier, we thought it might be helpful if I posted some updated code here. This is not the same as the R code we used in the paper, as I’ve improved it in several ways (so it will give different results). If you just want the code, skip to the bottom of the post.

Finding the period of the data

Usually in time series work, we know the period of the data (if the observations are monthly, the period is 12, for example). But in this project, some of our data was of unknown period and we wanted a method to automatically determine the appropriate period. The method we used was based on local peaks and troughs in the ACF. But I’ve since devised a better approach (prompted on crossvalidated.com) using an estimate of the spectral density:

find.freq <- function(x)
{
  n <- length(x)
  spec <- spec.ar(c(na.contiguous(x)),plot=FALSE)
  if(max(spec$spec)>10) # Arbitrary threshold chosen by trial and error.
  {
    period <- round(1/spec$freq[which.max(spec$spec)])
    if(period==Inf) # Find next local maximum
    {
      j <- which(diff(spec$spec)>0)
      if(length(j)>0)
      {
        nextmax <- j[1] + which.max(spec$spec[j[1]:500])
        if(nextmax <= length(spec$freq))
          period <- round(1/spec$freq[nextmax])
        else
          period <- 1
      }
      else
        period <- 1
    }
  }
  else
    period <- 1
 
  return(period)
}

The function is called find.freq because time series people often call the period of seasonality the “frequency” (which is of course highly confusing).
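
As a quick sanity check (my examples, not from the original post), here is what find.freq returns on a few series that ship with R; the expected periods reflect the known structure of each series:

find.freq(as.numeric(USAccDeaths)) # monthly accidental deaths: should recover 12
find.freq(as.numeric(lynx))        # lynx trappings: roughly a 10-year cycle
find.freq(rnorm(200))              # white noise: no seasonality, so returns 1

(Stripping the ts attributes with as.numeric ensures the period really is estimated from the data rather than read off the series.)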

Decomposing the data into trend and seasonal components

We needed a measure of the strength of trend and the strength of seasonality, and to do this we decomposed the data into trend, seasonal and error terms.

Because not all data could be decomposed additively, we first needed to apply an automated Box-Cox transformation. We tried a range of Box-Cox parameters on a grid, and selected the one which gave the most normal errors. That worked OK, but I’ve since found some papers that provide quite good automated Box-Cox algorithms that I’ve implemented in the forecast package. So this code uses Guerrero’s (1993) method instead.
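
In the code below this is just a call to BoxCox.lambda() from the forecast package, where Guerrero’s method is the default. As a standalone sketch (the AirPassengers series is my choice, purely for illustration):

library(forecast)
lambda <- BoxCox.lambda(AirPassengers, method="guerrero") # select lambda
air.bc <- BoxCox(AirPassengers, lambda)                   # apply the transformation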

For seasonal time series, we decomposed the transformed data using an stl decomposition with periodic seasonality.

For non-seasonal time series, we estimated the trend of the transformed data using penalized regression splines via the mgcv package.

decomp <- function(x,transform=TRUE)
{
  require(forecast)
  # Transform series
  if(transform & min(x,na.rm=TRUE) >= 0)
  {
    lambda <- BoxCox.lambda(na.contiguous(x))
    x <- BoxCox(x,lambda)
  }
  else
  {
    lambda <- NULL
    transform <- FALSE
  }
  # Seasonal data
  if(frequency(x)>1)
  {
    x.stl <- stl(x,s.window="periodic",na.action=na.contiguous)
    trend <- x.stl$time.series[,2]
    season <- x.stl$time.series[,1]
    remainder <- x - trend - season
  }
  else #Nonseasonal data
  {
    require(mgcv)
    tt <- 1:length(x)
    trend <- rep(NA,length(x))
    trend[!is.na(x)] <- fitted(gam(x ~ s(tt)))
    season <- NULL
    remainder <- x - trend
  }
  return(list(x=x,trend=trend,season=season,remainder=remainder,
    transform=transform,lambda=lambda))
}
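
As a quick illustration (the series choice is mine, not from the original post), the monthly Mauna Loa co2 series decomposes like this:

# co2 is seasonal, so decomp() applies a Box-Cox transformation then stl
dc <- decomp(co2)
plot(cbind(data=dc$x, trend=dc$trend, seasonal=dc$season,
  remainder=dc$remainder), main="decomp() applied to co2")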

Putting everything on a [0,1] scale

We wanted to measure a range of characteristics such as strength of seasonality, strength of trend, level of nonlinearity, skewness, kurtosis, serial correlatedness, self-similarity, level of chaoticity (is that a word?) and the periodicity of the data. But we wanted all these on the same scale, which meant mapping the natural range of each measure onto [0,1]. The following two functions were used to do this.

# f1 maps [0,infinity) to [0,1]
f1 <- function(x,a,b)
{
  eax <- exp(a*x)
  if (eax == Inf)
    f1eax <- 1
  else
    f1eax <- (eax-1)/(eax+b)
  return(f1eax)
}
 
# f2 maps [0,1] onto [0,1]
f2 <- function(x,a,b)
{
  eax <- exp(a*x)
  ea <- exp(a)
  return((eax-1)/(eax+b)*(ea+b)/(ea-1))
}

The values of a and b in each function were chosen so the measure had a 90th percentile of 0.10 when the data were iid standard normal, and a value of 0.9 using a well-known benchmark time series.
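
To get a feel for the mapping, here is f1 evaluated at a few points using the (a, b) pair applied later to the nonlinearity statistic (the interpretation in the comments is mine):

f1(0, 0.069, 2.304)   # the lower bound maps to 0
f1(10, 0.069, 2.304)  # an intermediate statistic maps to about 0.23
f1(1e6, 0.069, 2.304) # very large statistics saturate at 1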

Calculating the measures

Now we are ready to calculate the measures on the original data, as well as on the adjusted data (after removing trend and seasonality).

measures <- function(x)
{
  require(forecast)
  require(tseries)   # provides terasvirta.test()
  require(fracdiff)  # provides fracdiff()
 
  N <- length(x)
  freq <- find.freq(x)
  # Map the period onto [0,1) with a logistic-style transformation
  fx <- c(frequency=(exp((freq-1)/50)-1)/(1+exp((freq-1)/50)))
  x <- ts(x,f=freq)
 
  # Decomposition
  decomp.x <- decomp(x)
 
  # Adjust data
  if(freq > 1)
    fits <- decomp.x$trend + decomp.x$season
  else # Nonseasonal data
    fits <- decomp.x$trend
  adj.x <- decomp.x$x - fits + mean(decomp.x$trend, na.rm=TRUE)
 
  # Backtransformation of adjusted data
  if(decomp.x$transform)
    tadj.x <- InvBoxCox(adj.x,decomp.x$lambda)
  else
    tadj.x <- adj.x
 
  # Trend and seasonal measures
  v.adj <- var(adj.x, na.rm=TRUE)
  if(freq > 1)
  {
    detrend <- decomp.x$x - decomp.x$trend
    deseason <- decomp.x$x - decomp.x$season
    trend <- ifelse(var(deseason,na.rm=TRUE) < 1e-10, 0, 
      max(0,min(1,1-v.adj/var(deseason,na.rm=TRUE))))
    season <- ifelse(var(detrend,na.rm=TRUE) < 1e-10, 0,
      max(0,min(1,1-v.adj/var(detrend,na.rm=TRUE))))
  }
  else #Nonseasonal data
  {
    trend <- ifelse(var(decomp.x$x,na.rm=TRUE) < 1e-10, 0,
      max(0,min(1,1-v.adj/var(decomp.x$x,na.rm=TRUE))))
    season <- 0
  }
 
  m <- c(fx,trend,season)
 
  # Measures on original data
  xbar <- mean(x,na.rm=TRUE)
  s <- sd(x,na.rm=TRUE)
 
  # Serial correlation
  Q <- Box.test(x,lag=10)$statistic/(N*10)
  fQ <- f2(Q,7.53,0.103)
 
  # Nonlinearity
  p <- terasvirta.test(na.contiguous(x))$statistic
  fp <- f1(p,0.069,2.304)
 
  # Skewness
  sk <- abs(mean((x-xbar)^3,na.rm=TRUE)/s^3)
  fs <- f1(sk,1.510,5.993)
 
  # Kurtosis
  k <- mean((x-xbar)^4,na.rm=TRUE)/s^4
  fk <- f1(k,2.273,11567)
 
  # Hurst=d+0.5 where d is fractional difference.
  H <- fracdiff(na.contiguous(x),0,0)$d + 0.5
 
  # Lyapunov exponent: a crude measure of chaoticity. For each point,
  # find its nearest neighbour and see how quickly the two trajectories
  # have diverged freq steps later.
  if(freq > N-10)
      stop("Insufficient data")
  Ly <- numeric(N-freq)
  for(i in 1:(N-freq))
  {
    idx <- order(abs(x[i] - x))
    idx <- idx[idx < (N-freq)]
    j <- idx[2]
    Ly[i] <- log(abs((x[i+freq] - x[j+freq])/(x[i]-x[j])))/freq
    if(is.na(Ly[i]) | Ly[i]==Inf | Ly[i]==-Inf)
      Ly[i] <- NA
  }
  Lyap <- mean(Ly,na.rm=TRUE)
  fLyap <- exp(Lyap)/(1+exp(Lyap))
 
  m <- c(m,fQ,fp,fs,fk,H,fLyap)
 
  # Measures on adjusted data
  xbar <- mean(tadj.x, na.rm=TRUE)
  s <- sd(tadj.x, na.rm=TRUE)
 
  # Serial
  Q <- Box.test(adj.x,lag=10)$statistic/(N*10)
  fQ <- f2(Q,7.53,0.103)
 
  # Nonlinearity
  p <- terasvirta.test(na.contiguous(adj.x))$statistic
  fp <- f1(p,0.069,2.304)
 
  # Skewness
  sk <- abs(mean((tadj.x-xbar)^3,na.rm=TRUE)/s^3)
  fs <- f1(sk,1.510,5.993)
 
  # Kurtosis
  k <- mean((tadj.x-xbar)^4,na.rm=TRUE)/s^4
  fk <- f1(k,2.273,11567)
 
  m <- c(m,fQ,fp,fs,fk)
  names(m) <- c("frequency", "trend","seasonal",
    "autocorrelation","non-linear","skewness","kurtosis",
    "Hurst","Lyapunov",
    "dc autocorrelation","dc non-linear","dc skewness","dc kurtosis")
 
  return(m)
}

Here is a quick example applied to Australian monthly gas production:

library(forecast)
measures(gas)
     frequency              trend           seasonal    autocorrelation 
        0.1096             0.9989             0.9337             0.9985 
    non-linear           skewness           kurtosis              Hurst 
        0.4947             0.1282             0.0055             0.9996 
      Lyapunov dc autocorrelation      dc non-linear        dc skewness 
        0.5662             0.1140             0.0538             0.1743 
   dc kurtosis 
        0.9992

The function is far from perfect, and it is not hard to find examples where it fails. For example, it doesn’t work with multiple seasonality; try measures(taylor), as shown below, and check the seasonality. Also, I’m not convinced the kurtosis provides anything useful here, or that the skewness measure is done in the best way possible. But it was really a proof of concept, so we will leave it to others to revise and improve the code.
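
To see the multiple-seasonality problem concretely: taylor (half-hourly electricity demand, included in the forecast package) has both daily and weekly periods, but find.freq can only ever report a single period, so the seasonal measure reflects whichever one happens to be detected:

library(forecast)
measures(taylor) # only one of the two seasonal periods is captured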

In our papers, we took the measures obtained using R, and produced self-organizing maps using Viscovery. There is now a som package in R for that, so it might be possible to integrate that step into R as well. The dendrogram was generated in MATLAB, although that could now also be done in R using the ggdendro package, for example.

Download the code in a single file.
