A New R Package for Detecting Unusual Time Series

The anomalous package provides tools for detecting unusual time series within a large collection of time series. This is joint work with Earo Wang (an honours student at Monash) and Nikolay Laptev (from Yahoo Labs). Yahoo is interested in detecting unusual patterns in server metrics.

The package is based on this paper with Earo and Nikolay.

The basic idea is to measure a range of features of each time series (such as strength of seasonality, an index of spikiness, first-order autocorrelation, etc.). A principal component decomposition of the resulting feature matrix is then calculated, and outliers are identified in the 2-dimensional space of the first two principal component scores.
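To make the feature-then-PCA step concrete, here is a minimal sketch in base R. It does not use the anomalous package: the simulated data, the helper `features_of()`, and the three hand-rolled features (lag-1 autocorrelation, autocorrelation at an assumed seasonal lag of 12, and a crude spikiness ratio) are illustrative assumptions, not the package's own feature set.

```r
# Sketch only: simulated data and hand-rolled features, not the package's API.
set.seed(1)

n_series <- 50   # number of series in the collection (assumed)
n_obs    <- 120  # length of each series (assumed)

# Simulate AR(1) series; make the last one unusual by adding a strong
# seasonal pattern with period 12.
series <- replicate(
  n_series,
  as.numeric(arima.sim(list(ar = 0.5), n = n_obs)),
  simplify = FALSE
)
series[[n_series]] <- series[[n_series]] + 5 * sin(2 * pi * seq_len(n_obs) / 12)

# Hypothetical feature function: one row of the feature matrix per series.
features_of <- function(y, period = 12) {
  r <- acf(y, lag.max = period, plot = FALSE)$acf  # r[1] is lag 0
  c(
    acf1         = r[2],                             # first-order autocorrelation
    seasonal_acf = r[period + 1],                    # autocorrelation at the seasonal lag
    spikiness    = max(abs(y - median(y))) / mad(y)  # largest deviation relative to MAD
  )
}

feature_matrix <- t(sapply(series, features_of))

# Principal components of the scaled feature matrix; keep the first two scores.
pca    <- prcomp(feature_matrix, center = TRUE, scale. = TRUE)
scores <- pca$x[, 1:2]

plot(scores, xlab = "PC1", ylab = "PC2")
```

Each point in the plot is one time series, so a series with unusual features should sit away from the main cloud of points.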

We use two methods to identify outliers.

  1. A bivariate kernel density estimate of the first two PC scores is computed, and the points are ordered by the value of the density at each observation. This gives us a ranking from most outlying (lowest density) to least outlying (highest density).
  2. A series of α-convex hulls is computed on the first two PC scores with decreasing α, and points are classified as outliers when they become singletons separated from the main hull. This gives us an alternative ranking, with the most outlying point having separated at the highest value of α, and the remaining outliers separating at decreasing values of α.
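The first of these methods can be sketched in a few lines of base R. This is an illustration under stated assumptions rather than the package's implementation: the PC scores are simulated, the helper `kde_at_points()` is hypothetical, and the density estimate is a simple product-Gaussian kernel with `bw.nrd()` bandwidths. The α-convex hull method is not sketched here, since it relies on the alphahull package.

```r
# Sketch only: simulated scores and a simple product-Gaussian KDE.
set.seed(42)

# Hypothetical PC scores: 99 points in a central cloud plus one far away (row 100).
scores <- rbind(matrix(rnorm(198), ncol = 2), c(8, 8))

# Evaluate a bivariate kernel density estimate at each observation.
# Bandwidths come from the normal reference rule (an assumption for this sketch).
kde_at_points <- function(s) {
  h1 <- bw.nrd(s[, 1])
  h2 <- bw.nrd(s[, 2])
  k1 <- dnorm(outer(s[, 1], s[, 1], "-") / h1) / h1
  k2 <- dnorm(outer(s[, 2], s[, 2], "-") / h2) / h2
  rowMeans(k1 * k2)  # average kernel contribution from every point
}

dens <- kde_at_points(scores)

# Lowest density first, so the most outlying observations lead the ranking.
ranking <- order(dens)
head(ranking, 5)
```

Here the isolated point in row 100 receives almost no density from its neighbours, so it comes first in `ranking`.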

I explained the ideas in a talk given last Tuesday at a joint meeting of the Statistical Society of Australia and the Melbourne Data Science Meetup Group. Slides are available here. A link to a video of the talk will also be added there when it is ready.

The density ranking of PC scores was also used in my work on detecting outliers in functional data. See my 2010 JCGS paper and the associated rainbow package for R.

There are two versions of the package: one under an ACM licence, and a limited version under a GPL licence. Eventually we hope to make the GPL version contain everything, but we are currently dependent on the alphahull package which has an ACM licence.

Topics:
performance, monitoring, r, data science, time series