dtaidistance.clustering.hierarchical
Time series clustering using hierarchical clustering.
author:  Wannes Meert 

copyright:  Copyright 2017-2020 KU Leuven, DTAI Research Group. 
license:  Apache License, Version 2.0, see LICENSE for details. 

class dtaidistance.clustering.hierarchical.BaseTree(**kwargs)
Base Tree abstract class.
Returns a data structure compatible with the Scipy clustering methods:
https://docs.scipy.org/doc/scipy/reference/generated/scipy.cluster.hierarchy.linkage.html
An (n-1) by 4 matrix Z is returned, where n is the number of original observations. At the i-th iteration, clusters with indices Z[i, 0] and Z[i, 1] are combined to form cluster n + i. A cluster with an index less than n corresponds to one of the original observations. The distance between clusters Z[i, 0] and Z[i, 1] is given by Z[i, 2]. The fourth value Z[i, 3] represents the number of original observations in the newly formed cluster.
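To make the layout of Z concrete, the following sketch decodes a small, hand-written linkage matrix for n = 4 observations into the original observations contained in each cluster. The matrix values are made up for illustration and cluster_members is a hypothetical helper, not part of dtaidistance; only the row layout follows the description above.

```python
def cluster_members(Z, n):
    """Map every cluster index (0 .. 2n-2) to the sorted original observations it contains.

    Z is an (n-1) by 4 linkage matrix with rows [idx_a, idx_b, distance, size].
    """
    members = {i: [i] for i in range(n)}  # indices < n are original observations
    for i, (a, b, _dist, size) in enumerate(Z):
        merged = sorted(members[int(a)] + members[int(b)])
        # The fourth column must equal the number of original observations merged
        assert len(merged) == int(size)
        members[n + i] = merged  # row i creates cluster n + i
    return members


# Hand-written example for n = 4 observations (values are illustrative only)
Z = [
    [0, 1, 0.5, 2],  # observations 0 and 1 form cluster 4
    [2, 3, 0.7, 2],  # observations 2 and 3 form cluster 5
    [4, 5, 1.2, 4],  # clusters 4 and 5 form cluster 6 (the root)
]
members = cluster_members(Z, n=4)
print(members[6])  # [0, 1, 2, 3]
```

The same function works on a NumPy linkage array, since the indices are converted with int() before lookup.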

get_linkage(node)

maxnode

plot(filename=None, axes=None, ts_height=10, bottom_margin=2, top_margin=2, ts_left_margin=0, ts_sample_length=1, tr_label_margin=3, tr_left_margin=2, ts_label_margin=0, show_ts_label=None, show_tr_label=None, cmap='viridis_r', ts_color=None)
Plot the hierarchy and time series.
Parameters:  filename – If a filename is passed, the image is written to this file.
 axes – If an axes array is passed, the image is added to this figure. Expects axes[0] and axes[1] to be present.
 ts_height – Height of a time series
 bottom_margin – Margin on bottom
 top_margin – Margin on top
 ts_left_margin – Margin on left of time series image
 ts_sample_length – Space between two points in the time series
 tr_label_margin – Margin between tree split and label
 tr_left_margin – Left margin for tree
 ts_label_margin – Margin between start of series and label
 show_ts_label – Show label indices. Boolean, callable or subscriptable object. If it is a callable object, the index of the time series will be given and the return string will be printed.
 show_tr_label – Show tree distances. Boolean, callable or subscriptable object. If it is a callable object, the index of the time series will be given and the return string will be printed.
 cmap – Matplotlib colormap name
 ts_color – function that takes the index and returns a color (compatible with the matplotlib.color color argument)
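The label options accept a Boolean, a callable or a subscriptable object. The sketch below shows how a label for one series index could be resolved under each of these forms. resolve_label is a hypothetical helper written for illustration, not part of dtaidistance, and the text shown for True (the plain index) is an assumption based on "Show label indices".

```python
def resolve_label(show, idx):
    """Hypothetical helper: return the label text for time series `idx`."""
    if show is None or show is False:
        return ""  # labels disabled
    if show is True:
        return str(idx)  # assumption: True shows the plain index
    if callable(show):
        return str(show(idx))  # a callable receives the index
    return str(show[idx])  # a subscriptable object (list, dict, ...) is indexed


names = ["walk", "run", "jump"]
print(resolve_label(True, 1))                     # 1
print(resolve_label(lambda i: f"series {i}", 1))  # series 1
print(resolve_label(names, 1))                    # run


def ts_color(idx):
    """A ts_color function in the documented form: index in, matplotlib color out."""
    return "red" if idx == 0 else "gray"
```

On a fitted tree these objects would be passed directly, e.g. plot("hierarchy.png", show_ts_label=names, ts_color=ts_color).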

to_dot()


class dtaidistance.clustering.hierarchical.Hierarchical(dists_fun, dists_options, max_dist=inf, merge_hook=None, order_hook=None, show_progress=True)
Hierarchical clustering.
Note: This method first computes the entire distance matrix. This is not ideal for extremely large data sets.
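To see why the full distance matrix limits the data set size, the sketch below counts the pairwise distances for n series and the memory they take, assuming one 8-byte float per pair of the condensed upper triangle. The storage actually used by dists_fun may differ; a full n by n array needs roughly twice as much.

```python
def distance_matrix_cost(n, bytes_per_value=8):
    """Number of unique pairs and approximate memory (bytes) for n series."""
    pairs = n * (n - 1) // 2  # each unordered pair of series is compared once
    return pairs, pairs * bytes_per_value


for n in (1_000, 10_000, 100_000):
    pairs, nbytes = distance_matrix_cost(n)
    print(f"n={n:>7,}: {pairs:>14,} distances, ~{nbytes / 1e9:.2f} GB")
```

The count grows quadratically: ten times more series means about a hundred times more distances to compute and store.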
Parameters:  dists_fun – Function to compute the pairwise distance matrix between a set of series.
 dists_options – Arguments to pass to dists_fun.
 max_dist – Do not merge or cluster series that are further apart than this.
 merge_hook – Function that is called when two series are clustered. The function definition is def merge_hook(from_idx, to_idx, distance), where from_idx and to_idx are the indices of the series.
 order_hook – Function that is called to decide on the next idx out of all shortest distances
 show_progress – Use a tqdm progress bar
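A merge_hook receives the documented arguments (from_idx, to_idx, distance). As a minimal sketch, the hook below only records every merge so it can be inspected after clustering; the names record_merge and merges are illustrative choices. The calls at the end simulate what the clustering would do, so the snippet runs without dtaidistance installed.

```python
merges = []  # collected (from_idx, to_idx, distance) tuples


def record_merge(from_idx, to_idx, distance):
    """merge_hook with the documented signature: log each merge for later inspection."""
    merges.append((from_idx, to_idx, distance))


# Simulated calls, standing in for what the clustering would do
record_merge(3, 1, 0.42)
record_merge(2, 1, 0.97)
for from_idx, to_idx, dist in merges:
    print(f"merge from {from_idx} to {to_idx} at distance {dist}")
```

With the library, the hook would be passed at construction time, e.g. clustering.Hierarchical(dtw.distance_matrix_fast, {}, merge_hook=record_merge).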

fit(series)
Merge sequences.
Parameters: series – Iterator over series. Returns: Dictionary with the prototype indices as keys and, as values, the indices of all series in that cluster.
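Because fit returns a mapping from prototype index to member indices, a common next step is one cluster label per series. The sketch below does this with a made-up example dictionary; labels_from_clusters is a hypothetical helper, and giving label -1 to a series missing from every cluster is a defensive default, not documented library behaviour.

```python
def labels_from_clusters(cluster_idx, n):
    """Return a list where entry i is the prototype index of the cluster holding series i."""
    labels = [-1] * n  # -1 marks a series not found in any cluster
    for prototype, members in cluster_idx.items():
        for idx in members:
            labels[idx] = prototype
    return labels


# Illustrative fit() result for six series (prototype index -> member indices)
cluster_idx = {0: {0, 2, 5}, 1: {1, 3}, 4: {4}}
print(labels_from_clusters(cluster_idx, 6))  # [0, 1, 0, 1, 4, 0]
```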

class dtaidistance.clustering.hierarchical.HierarchicalTree(model=None, **kwargs)
Wrapper to keep track of the full tree that represents the hierarchical clustering.
The linkage tree is available in self.linkage.
Parameters: model – Clustering object, for example an instance of class Hierarchical. If no model is given, the arguments are identical to those of class Hierarchical.
fit(series, *args, **kwargs)
Fit a hierarchical clustering tree.
The linkage tree is available in self.linkage.

get_linkage(node)

maxnode

plot(filename=None, axes=None, ts_height=10, bottom_margin=2, top_margin=2, ts_left_margin=0, ts_sample_length=1, tr_label_margin=3, tr_left_margin=2, ts_label_margin=0, show_ts_label=None, show_tr_label=None, cmap='viridis_r', ts_color=None)
Plot the hierarchy and time series.
Parameters:  filename – If a filename is passed, the image is written to this file.
 axes – If an axes array is passed, the image is added to this figure. Expects axes[0] and axes[1] to be present.
 ts_height – Height of a time series
 bottom_margin – Margin on bottom
 top_margin – Margin on top
 ts_left_margin – Margin on left of time series image
 ts_sample_length – Space between two points in the time series
 tr_label_margin – Margin between tree split and label
 tr_left_margin – Left margin for tree
 ts_label_margin – Margin between start of series and label
 show_ts_label – Show label indices. Boolean, callable or subscriptable object. If it is a callable object, the index of the time series will be given and the return string will be printed.
 show_tr_label – Show tree distances. Boolean, callable or subscriptable object. If it is a callable object, the index of the time series will be given and the return string will be printed.
 cmap – Matplotlib colormap name
 ts_color – function that takes the index and returns a color (compatible with the matplotlib.color color argument)

to_dot()


class dtaidistance.clustering.hierarchical.Hooks
static create_orderhook(weights)

static create_weighthook(weights, series)

static

class dtaidistance.clustering.hierarchical.LinkageTree(dists_fun, dists_options, method='complete')
Hierarchical clustering using the Scipy linkage function.
The linkage tree is available in self.linkage.
This is the same algorithm as in Hierarchical, but faster (about 10 times). It offers fewer options to steer the clustering (e.g. it is not possible to give weights). It still computes the entire distance matrix first and is thus not ideal for extremely large data sets.
Parameters:  dists_fun – Distance function, e.g. dtw.distance
 dists_options – Options passed to dists_fun
 method – Linkage method (see scipy.cluster.hierarchy.linkage)
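LinkageTree delegates the merging step to the Scipy linkage function, with method='complete' by default. To show what that step computes, here is a naive pure-Python complete-linkage sketch on a precomputed symmetric distance matrix that produces rows in the same [idx_a, idx_b, distance, size] layout. It is not the Scipy implementation: it runs in O(n^3) time and breaks ties by keeping the first pair found, which may differ from Scipy.

```python
def complete_linkage(dist):
    """Naive complete-linkage clustering; returns (n-1) rows [idx_a, idx_b, distance, size]."""
    n = len(dist)
    clusters = {i: [i] for i in range(n)}  # active cluster index -> original observations
    Z = []
    next_idx = n  # the i-th merge creates cluster n + i
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        for pos, a in enumerate(ids):
            for b in ids[pos + 1:]:
                # Complete linkage: the largest pairwise distance between the two clusters
                d = max(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[2]:
                    best = (a, b, d)
        a, b, d = best
        merged = clusters.pop(a) + clusters.pop(b)
        clusters[next_idx] = merged
        Z.append([a, b, d, len(merged)])
        next_idx += 1
    return Z


# Four 1-D points; the distance is the absolute difference
points = [0, 1, 5, 7]
dist = [[abs(p - q) for q in points] for p in points]
for row in complete_linkage(dist):
    print(row)  # [0, 1, 1, 2], then [2, 3, 2, 2], then [4, 5, 7, 4]
```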

fit(series)
Fit a hierarchical clustering tree.
The linkage tree is available in self.linkage.

get_linkage(node)

maxnode

plot(filename=None, axes=None, ts_height=10, bottom_margin=2, top_margin=2, ts_left_margin=0, ts_sample_length=1, tr_label_margin=3, tr_left_margin=2, ts_label_margin=0, show_ts_label=None, show_tr_label=None, cmap='viridis_r', ts_color=None)
Plot the hierarchy and time series.
Parameters:  filename – If a filename is passed, the image is written to this file.
 axes – If an axes array is passed, the image is added to this figure. Expects axes[0] and axes[1] to be present.
 ts_height – Height of a time series
 bottom_margin – Margin on bottom
 top_margin – Margin on top
 ts_left_margin – Margin on left of time series image
 ts_sample_length – Space between two points in the time series
 tr_label_margin – Margin between tree split and label
 tr_left_margin – Left margin for tree
 ts_label_margin – Margin between start of series and label
 show_ts_label – Show label indices. Boolean, callable or subscriptable object. If it is a callable object, the index of the time series will be given and the return string will be printed.
 show_tr_label – Show tree distances. Boolean, callable or subscriptable object. If it is a callable object, the index of the time series will be given and the return string will be printed.
 cmap – Matplotlib colormap name
 ts_color – function that takes the index and returns a color (compatible with the matplotlib.color color argument)

to_dot()