dupin.preprocessing.supervised

Overview

window_iter

Iterate over a sequence in slices of length window_size.

Window

Computes the error of a classifier discerning between halves of a window.

Details

Classes for use in utilizing supervised learning for event detection.

class dupin.preprocessing.supervised.Window(classifier, window_size, test_size, loss_function=None, store_intermediate_classifiers=False, n_classifiers=1, combine_errors='mean')

Computes the error of a classifier discerning between halves of a window.

The class implements a generic way of measuring the similarity between nearby sections of a sequence through the use of a rolling window and machine learning classifiers. The class outputs this similarity as a one-dimensional signal regardless of the number of input features.

The procedure is to take a sliding window of a set size across the trajectory. For each window, the left half is labeled as class 0 and the right half as class 1. The class then trains one or more weak classifiers for each window on a subset of points. The test loss on the remaining points is then aggregated across the classifiers and recorded. This test loss is the one-dimensional representation of local signal similarity: lower values indicate dissimilarity, since dissimilar halves are easier for the classifier to tell apart.
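The procedure can be sketched in pure Python as follows. This is an illustration under stated assumptions, not the library's implementation: MidpointStump and window_errors are hypothetical names, the stump thresholds only the first feature, the loss is a hand-written zero-one loss, and the test points are drawn evenly from each half so that both classes always appear in the training set.

```python
import random
from statistics import mean


class MidpointStump:
    """Hypothetical weak classifier: one threshold on the first feature."""

    def fit(self, X, y):
        mean0 = mean(row[0] for row, label in zip(X, y) if label == 0)
        mean1 = mean(row[0] for row, label in zip(X, y) if label == 1)
        self.threshold = (mean0 + mean1) / 2
        # Label assigned to points that lie above the threshold.
        self.high_label = 1 if mean1 >= mean0 else 0
        return self

    def predict(self, X):
        low_label = 1 - self.high_label
        return [self.high_label if row[0] > self.threshold else low_label for row in X]


def window_errors(signal, window_size, test_size, seed=0):
    """Return the zero-one test loss of a stump for every window position."""
    rng = random.Random(seed)
    half = window_size // 2  # assumes an even window_size
    n_test = max(1, round(test_size * half))  # test points taken from each half
    labels = [0] * half + [1] * half  # left half is class 0, right half is class 1
    errors = []
    for start in range(len(signal) - window_size + 1):
        window = signal[start:start + window_size]
        left, right = list(range(half)), list(range(half, window_size))
        rng.shuffle(left)
        rng.shuffle(right)
        test_idx = left[:n_test] + right[:n_test]
        train_idx = left[n_test:] + right[n_test:]
        stump = MidpointStump().fit(
            [window[i] for i in train_idx], [labels[i] for i in train_idx]
        )
        predicted = stump.predict([window[i] for i in test_idx])
        wrong = sum(p != labels[i] for p, i in zip(predicted, test_idx))
        errors.append(wrong / len(test_idx))
    return errors


# Step signal: 20 samples at 0.0 followed by 20 samples at 1.0, one feature each.
signal = [[0.0]] * 20 + [[1.0]] * 20
errors = window_errors(signal, window_size=10, test_size=0.4)
```

In this toy run, windows that lie entirely on one side of the step give an error of 0.5 (the stump cannot separate identical halves), while the window centered on the step (index 15) gives an error of 0.0.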

Note

The returned signal will be shorter than the original signal by window_size - 1 samples.

Warning

For this to be useful, a weak classifier must be chosen. A weak classifier is one that has low discrimination ability. This prevents the classifier from fitting noise that happens to differ between window halves. For small and intermediate window sizes, most classifiers will find noise that can (nearly) perfectly discriminate the halves of the window.

Parameters:
  • classifier (sklearn.base.ClassifierMixin) – A sklearn compatible classifier that is ready to fit to data.

  • window_size (int) – The size of windows to learn on; should be an even number for best results.

  • test_size (float) – Fraction of samples to use for computing the error through the loss function. This fraction is not fitted on.

  • loss_function (callable [[sklearn.base.ClassifierMixin, numpy.ndarray, numpy.ndarray], float], optional) – A callable that takes the fitted classifier, the test x values, and the test y values, and returns a loss (lower is better). By default, this computes the zero-one loss if sklearn is available; otherwise an error is raised.

  • store_intermediate_classifiers (bool, optional) – Whether to store the fitted classifier for each window in the sequence passed to compute. Defaults to False. Warning: If the classifier stores some or all of the sequence when fitting, as is the case for kernelized classifiers, this option will lead to a significant increase in memory use.

  • n_classifiers (int, optional) – The number of classifiers and test-train splits to use per window. Defaults to 1. Higher numbers naturally smooth the error across a trajectory.

  • combine_errors (str, optional) – The function used to reduce the errors of the n_classifiers classifiers. Defaults to “mean”. Available values are “mean” and “median”.

__init__(classifier, window_size, test_size, loss_function=None, store_intermediate_classifiers=False, n_classifiers=1, combine_errors='mean')
__weakref__

list of weak references to the object

property combine_errors

What function to reduce the errors of n_classifiers with.

Available values are “mean” and “median”.

Type:

str
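As an illustration of how the two documented options could reduce the errors of several classifiers for one window, here is a minimal sketch. REDUCERS and combine are hypothetical names, not part of dupin, and the error values are made up for the example.

```python
from statistics import mean, median

# Hypothetical lookup from the documented option names to reducers.
REDUCERS = {"mean": mean, "median": median}


def combine(errors, combine_errors="mean"):
    """Reduce the per-classifier errors of one window to a single value."""
    try:
        reducer = REDUCERS[combine_errors]
    except KeyError:
        raise ValueError(f"combine_errors must be one of {sorted(REDUCERS)}") from None
    return reducer(errors)


# Made-up test errors from five classifiers trained on one window.
per_classifier_errors = [0.0, 0.25, 0.5, 0.25, 1.0]
mean_error = combine(per_classifier_errors, "mean")
median_error = combine(per_classifier_errors, "median")
```

Here the mean is 0.4 and the median is 0.25. The median is less affected by the single unlucky split that produced an error of 1.0.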

compute(X)

Compute the loss for classifiers trained on discerning window halves.

Parameters:

X ((\(T\), \(N_f\)) np.ndarray) – A NumPy array where the first dimension is time or sequence progression and the second is features.

Returns:

errors – The list of loss function values, one for each window in X.

Return type:

list

property loss_function

Returns the loss for a fitted classifier given the test x and y.

Type:

callable [[ sklearn.base.ClassifierMixin, numpy.ndarray, numpy.ndarray ], float ]
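A custom loss only needs to follow the documented signature: fitted classifier, test x, test y, returning a float where lower is better. The sketch below writes a zero-one loss by hand with that signature. AlwaysZero is a hypothetical stand-in for a fitted classifier; the only assumption made about a real classifier is that it exposes a predict method, as scikit-learn classifiers do.

```python
def zero_one_loss(classifier, x_test, y_test):
    """Fraction of held-out points that the fitted classifier mislabels."""
    predicted = classifier.predict(x_test)
    wrong = sum(int(p != t) for p, t in zip(predicted, y_test))
    return wrong / len(y_test)


class AlwaysZero:
    """Hypothetical stand-in for a fitted classifier; predicts class 0 everywhere."""

    def predict(self, X):
        return [0 for _ in X]


# Two test points from each window half; the stand-in gets the class-1 points wrong.
loss = zero_one_loss(AlwaysZero(), [[0.1], [0.2], [0.9], [1.1]], [0, 0, 1, 1])
```

The resulting loss is 0.5, the chance level for two balanced classes.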

property n_classifiers

Number of classifiers and test-train splits per window.

Higher numbers naturally smooth the error across a trajectory.

Type:

int

property store_intermediate_classifiers

Whether to store the classifiers for each window.

If True, the classifiers are stored in classifiers_ after calling compute.

Type:

bool

property test_size

Fraction of samples to use for computing the error.

Type:

float

property window_size

The size of windows to learn on.

Type:

int

dupin.preprocessing.supervised.window_iter(seq, window_size)

Iterate over a sequence in slices of length window_size.

Parameters:
  • seq (list [any]) – The sequence to yield windows of.

  • window_size (int) – The size of the windows to iterate over.

Yields:

window (list [ any ]) – The current window of the original data.
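A sliding-window iterator of this kind can be sketched in a few lines. The name sliding_windows is hypothetical, and the stride of one is an assumption, chosen because it matches the note that the output of Window is shorter than the input by window_size - 1.

```python
def sliding_windows(seq, window_size):
    """Yield every contiguous slice of length window_size, stepping by one."""
    for start in range(len(seq) - window_size + 1):
        yield seq[start:start + window_size]


# A sequence of length 5 with window_size 3 gives 5 - 3 + 1 = 3 windows.
windows = list(sliding_windows([0, 1, 2, 3, 4], 3))
```

The three windows are [0, 1, 2], [1, 2, 3], and [2, 3, 4].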