Over a million developers have joined DZone.

A New R Package for Detecting Unusual Time Series

· Performance Zone

Download Forrester’s “Vendor Landscape, Application Performance Management” report that examines the evolving role of APM as a key driver of customer satisfaction and business success, brought to you in partnership with BMC.

The anomalous package provides some tools to detect unusual time series in a large collection of time series. This is joint work with Earo Wang (an honours student at Monash) and Nikolay Laptev( from Yahoo Labs). Yahoo is interested in detecting unusual patterns in server metrics.

The package is based on this paper with Earo and Nikolay.

The basic idea is to measure a range of features of the time series (such as strength of seasonality, an index of spikiness, first order autocorrelation, etc.) Then a principal component decomposition of the feature matrix is calculated, and outliers are identified in 2-dimensional space of the first two principal component scores.

We use two methods to identify outliers.

  1. A bivariate kernel density estimate of the first two PC scores is computed, and the points are ordered based on the value of the density at each observation. This gives us a ranking of most outlying (least density) to least outlying (highest density).
  2. A series of\alpha–convex hulls are computed on the first two PC scores with decreasing\alpha, and points are classified as outliers when they become singletons separated from the main hull. This gives us an alternative ranking with the most outlying having separated at the highest value of\alpha, and the remaining outliers with decreasing values of\alpha.

I explained the ideas in a talk last Tuesday given at a joint meeting of the Statistical Society of Australia and the Melbourne Data Science Meetup Group. Slides are available here. A link to a video of the talk will also be added there when it is ready.

The density-ranking of PC scores was also used in my work on detecting outliers in functional data. See my2010JCGSpaperand the associated rainbow package for R.

There are two versions of the package: one under an ACM licence, and a limited version under a GPL licence. Eventually we hope to make the GPL version contain everything, but we are currently dependent on the alphahull package which has an ACM licence.

See Forrester’s Report, “Vendor Landscape, Application Performance Management” to identify the right vendor to help IT deliver better service at a lower cost, brought to you in partnership with BMC.

performance,monitoring,r,data science,time series

Published at DZone with permission of Rob J Hyndman, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

The best of DZone straight to your inbox.

Please provide a valid email address.

Thanks for subscribing!

Awesome! Check your inbox to verify your email so you can start receiving the latest in tech news and resources.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}