dupin.preprocessing.supervised
Overview
window_iter | Iterate over a sequence in slices of length window_size.
Window | Computes the error of a classifier discerning between halves of a window.
Details
Classes for use in utilizing supervised learning for event detection.
- class dupin.preprocessing.supervised.Window(classifier, window_size, test_size, loss_function=None, store_intermediate_classifiers=False, n_classifiers=1, combine_errors='mean')
Computes the error of a classifier discerning between halves of a window.
The class implements a generic way of discerning the similarity between nearby sections of a sequence through the use of a rolling window and machine learning classifiers. The class then outputs this similarity as a single dimension regardless of the input's dimensionality.
The procedure is to slide a window of a set size across the trajectory. For each window, the left half is labeled as class 0 and the right half as class 1. The class then trains one or more weak classifiers for each window on a subset of points. The test loss on the remaining points is then aggregated across the classifiers and recorded. This test loss is the single-dimension representation of local signal similarity. With a loss where lower is better, such as the default zero-one loss, low values indicate that the halves are easy to tell apart (dissimilar), while values near chance indicate similar halves.
Note
The returned signal will be shorter than the original signal by window_size - 1 samples.
Warning
For this to be useful, a weak classifier must be chosen. A weak classifier is one that has low discrimination ability. This prevents the classifier from training on noise between window halves. For small and intermediate window sizes, most classifiers will find noise that can (nearly) perfectly discriminate the halves of the window.
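The windowing procedure described above can be sketched independently of dupin. The following is a minimal illustration, not the library's implementation: the helper name `window_errors`, the depth-1 decision tree used as the weak classifier, the stratified split, and the synthetic signal are all assumptions made for the example.

```python
import numpy as np
from sklearn.metrics import zero_one_loss
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def window_errors(X, window_size, test_size=0.5, seed=0):
    """Return one test loss per window of X (illustrative sketch only)."""
    half = window_size // 2
    # Left half of every window is class 0, right half is class 1.
    y = np.repeat([0, 1], [half, window_size - half])
    errors = []
    for start in range(len(X) - window_size + 1):
        window = X[start:start + window_size]
        # Stratifying the split is a choice made for this sketch.
        x_train, x_test, y_train, y_test = train_test_split(
            window, y, test_size=test_size, stratify=y, random_state=seed
        )
        # A depth-1 decision tree (a stump) serves as the weak classifier.
        clf = DecisionTreeClassifier(max_depth=1, random_state=seed)
        clf.fit(x_train, y_train)
        errors.append(zero_one_loss(y_test, clf.predict(x_test)))
    return np.array(errors)


rng = np.random.default_rng(0)
# Synthetic one-feature signal whose mean jumps from 0 to 5 at index 100.
X = np.concatenate([rng.normal(0, 1, (100, 1)), rng.normal(5, 1, (100, 1))])
errors = window_errors(X, window_size=20)
print(len(X), len(errors))  # 200 181
```

The output has `window_size - 1` fewer entries than the input. In this sketch the loss is lowest for windows centered on the jump, where the stump separates the two halves almost perfectly.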
- Parameters:
classifier (sklearn.base.ClassifierMixin) – A sklearn compatible classifier that is ready to fit to data.
window_size (int) – The size of the windows to learn on. This should be an even number for best results.
test_size (float) – Fraction of samples to use for computing the error through the loss function. The classifier is not fitted on this fraction.
loss_function (callable[[sklearn.base.ClassifierMixin, numpy.ndarray, numpy.ndarray], float], optional) – A callable that takes the fitted classifier and the test x and test y values and returns a loss (lower is better). By default this computes the zero-one loss if sklearn is available; otherwise it raises an error.
store_intermediate_classifiers (bool, optional) – Whether to store the fitted classifier for each window in the sequence passed to compute. Defaults to False. Warning: If the classifier stores some or all of the sequence when fitting, as kernelized classifiers do, this option will significantly increase memory use.
n_classifiers (int, optional) – The number of classifiers and test-train splits to use per window. Defaults to 1. Higher numbers naturally smooth the error across a trajectory.
combine_errors (str, optional) – The function used to reduce the errors of the n_classifiers classifiers. Defaults to “mean”. Available values are “mean” and “median”.
- __init__(classifier, window_size, test_size, loss_function=None, store_intermediate_classifiers=False, n_classifiers=1, combine_errors='mean')
- property combine_errors
The function used to reduce the errors of the n_classifiers classifiers.
Available values are “mean” and “median”.
- Type:
str
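As a rough illustration of what the two reductions do, the sketch below combines made-up losses from several train/test splits of a single window. The numbers and the `reducers` mapping are hypothetical and only show how “mean” and “median” differ when one split is an outlier.

```python
import numpy as np

# Hypothetical losses from n_classifiers = 5 train/test splits of one window.
split_errors = np.array([0.1, 0.2, 0.1, 0.9, 0.2])

# Map each allowed combine_errors value to a NumPy reduction.
reducers = {"mean": np.mean, "median": np.median}
combined = {name: float(fn(split_errors)) for name, fn in reducers.items()}
print(combined)
```

The median is less affected by the single large loss than the mean, which may matter when individual splits are noisy.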
- compute(X)
Compute the loss for classifiers trained on discerning window halves.
- Parameters:
X ((\(T\), \(N_f\)) np.ndarray) – A NumPy array where the first dimension is time or sequence progression and the second is features.
- Returns:
errors – The list of loss function values for each window in X.
- Return type:
- property loss_function
Returns the loss for a fitted classifier given the test x and y.
- Type:
callable[[sklearn.base.ClassifierMixin, numpy.ndarray, numpy.ndarray], float]
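A callable with this signature might look like the following sketch. The name `log_loss_on_test`, the choice of log loss, and the logistic regression used to exercise it are assumptions for illustration; any function that accepts a fitted classifier plus test x and y and returns a float where lower is better fits the documented type.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss


def log_loss_on_test(classifier, x_test, y_test):
    """Return the log loss of a fitted classifier on held-out samples."""
    probabilities = classifier.predict_proba(x_test)
    # labels=[0, 1] keeps the loss defined even if y_test holds one class.
    return float(log_loss(y_test, probabilities, labels=[0, 1]))


rng = np.random.default_rng(0)
x = rng.normal(size=(40, 2))
y = np.repeat([0, 1], 20)
# Fit on the even-indexed samples and evaluate on the odd-indexed ones.
clf = LogisticRegression().fit(x[::2], y[::2])
loss = log_loss_on_test(clf, x[1::2], y[1::2])

# Based on the signature documented above, such a callable would be passed as
# Window(clf, window_size=20, test_size=0.5, loss_function=log_loss_on_test).
```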
- property n_classifiers
Number of classifiers and test-train splits per window.
Higher numbers naturally smooth the error across a trajectory.
- Type:
int
- property store_intermediate_classifiers
Whether to store the classifiers for each window.
If True, the classifiers are stored in classifiers_ after calling compute.
- Type:
bool
- dupin.preprocessing.supervised.window_iter(seq, window_size)
Iterate over a sequence in slices of length window_size.
- Parameters:
seq (list[any]) – The sequence to yield windows of.
window_size (int) – The size of the windows to iterate over.
- Yields:
window (list[any]) – The current window of the original data.
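The behavior described for this iterator can be sketched in plain Python. The generator below is a hypothetical stand-in named `sliding_windows`, not dupin's code, and it assumes the windows advance one element at a time, which is consistent with the output being window_size - 1 shorter than the input.

```python
def sliding_windows(seq, window_size):
    """Yield every contiguous slice of seq with length window_size."""
    for start in range(len(seq) - window_size + 1):
        yield seq[start:start + window_size]


windows = list(sliding_windows([0, 1, 2, 3, 4], window_size=3))
print(windows)  # [[0, 1, 2], [1, 2, 3], [2, 3, 4]]
```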