dupin.detect

Overview

kneedle_elbow_detection

Run the KNEEDLE algorithm for elbow detection from the kneed package.

CostLinearBiasedFit

Compute a start to end linear fit and penalize error and bias.

CostLinearFit

Compute the L1 cumulative error of piecewise linear fits in time.

SweepDetector

Detects the optimal number of change points in a time series.

two_pass_elbow_detection

Return a two pass function of another elbow detection algorithm.

Details

Methods for event detection in molecular simulations.

dupin provides classes and functions to help in the event detection process, but it does not currently implement its own detection method or its own elbow detection. Instead, the package provides interfaces to ruptures for event detection and to kneed for elbow detection.

This module also provides cost functions for use with ruptures, a scheme for determining the correct number of events (SweepDetector), and some elbow detection helpers.

class dupin.detect.CostLinearBiasedFit(metric='l1')

Compute a start to end linear fit and penalize error and bias.

Works with ruptures.

class dupin.detect.CostLinearFit(metric='l1')

Compute the L1 cumulative error of piecewise linear fits in time.

Works with ruptures. Used to compute the relative cumulative L1 deviation from a linear piecewise fit of a signal.

\[C(s, e) = \min\limits_{m, b} \sum_i |y_i - (m x_i + b)|\]

\(m\) and \(b\) can be vectors in the case of a multidimensional signal (the summation then also runs across dimensions).

Parameters:
  • metric (str, optional) – What metric to use in computing the error. Defaults to "l1". Options are "l1" and "l2".

Note

For use in ruptures search algorithms. For correct use, fit must be called with the signal first.
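To make the cost \(C(s, e)\) defined above concrete, here is a minimal brute-force sketch for a one-dimensional signal. It is not dupin's implementation: the helper names l1_linear_cost and piecewise_cost are hypothetical, and the search relies on the known property that an optimal least-absolute-deviations line passes through at least two data points, assuming distinct x values.

```python
from itertools import combinations


def l1_linear_cost(xs, ys):
    """Brute-force min over (m, b) of sum_i |y_i - (m * x_i + b)|.

    Hypothetical helper for illustration only. Assumes a 1D signal with
    distinct x values, and tries every line through two data points.
    """
    if len(xs) < 2:
        return 0.0  # A single point is fit exactly.
    best = float("inf")
    for (x0, y0), (x1, y1) in combinations(zip(xs, ys), 2):
        m = (y1 - y0) / (x1 - x0)
        b = y0 - m * x0
        error = sum(abs(y - (m * x + b)) for x, y in zip(xs, ys))
        best = min(best, error)
    return best


def piecewise_cost(xs, ys, change_points):
    """Sum the per-segment cost for segments split at the given indices."""
    bounds = [0, *change_points, len(xs)]
    return sum(
        l1_linear_cost(xs[a:b], ys[a:b]) for a, b in zip(bounds, bounds[1:])
    )


# A signal that rises linearly and then falls linearly (kink at index 4).
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0]

whole = piecewise_cost(xs, ys, [])   # One line cannot follow the kink.
split = piecewise_cost(xs, ys, [4])  # Two exact linear pieces: cost 0.0.
print(whole, split)
```

Splitting at the kink drives the cumulative error to zero, which is the kind of drop a change point search looks for when it uses a cost like this.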

fit(signal)

Store signal and compute base errors for later cost checking.

class dupin.detect.SweepDetector(detector, max_change_points, elbow_detector=None, tolerance=0.001)

Detects the optimal number of change points in a time series.

This class composes a change point detection algorithm that detects a fixed number of change points with an elbow detection algorithm. The optimal change points are defined as those detected at the elbow of the cost versus number of change points curve.

Parameters:
  • detector (Union[ruptures.base.BaseEstimator, callable [[numpy.ndarray, int], tuple [list [int], float]]]) – The detector to use for each round of change point detection. Can be any callable that takes a NumPy array signal of shape \((N_{frames}, N_{features})\) and the number of change points, and returns a tuple containing the list of change points and the total cost for those change points. The argument can also be any ruptures estimator.

  • max_change_points (int) – The maximum number of change points to attempt to detect.

  • elbow_detector (callable [[list [float]], int], optional) – A callable that takes in a list of costs and returns the index of the elbow. The callable should return None if no elbow can be detected. Defaults to the KNEEDLE algorithm provided by the kneed package (see kneedle_elbow_detection for dupin defaults).

  • tolerance (float, optional) – The fractional change in cost below which to stop detecting higher numbers of change points. Since each additional change point is expected to decrease the cost by less than the previous one did, this is a reliable way to prevent wasted computation. For instance, a value of 0.01 means that if adding a change point decreases the cost by less than one percent of the previous value, the detector stops immediately regardless of max_change_points. Defaults to 0.001.

__init__(detector, max_change_points, elbow_detector=None, tolerance=0.001)

fit(data)

Fit and return change points for given data.

Compute the change points for each number of change points in [0, self.max_change_points], and detect the elbow of the associated costs, if any.

Parameters:

data (numpy.ndarray) – The data to detect change points for.

Returns:

The change points if any. An empty list means no change points were detected.

Return type:

list[int]
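The sweep described above can be sketched in plain Python. This is an illustration under stated assumptions, not dupin's implementation: brute_force_detector, half_drop_elbow, and sweep are hypothetical names, the signal is a flat list rather than a NumPy array of shape \((N_{frames}, N_{features})\), the cost is a simple squared deviation from the segment mean, and the choice to record the last cost before stopping on tolerance is a guess.

```python
from itertools import combinations


def segment_cost(values):
    """Sum of squared deviations from the segment mean (a stand-in cost)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def brute_force_detector(signal, n_change_points):
    """Toy detector: return (change_points, cost) for the best split."""
    n = len(signal)
    best_points, best_cost = [], float("inf")
    for points in combinations(range(1, n), n_change_points):
        bounds = [0, *points, n]
        cost = sum(segment_cost(signal[a:b]) for a, b in zip(bounds, bounds[1:]))
        if cost < best_cost:
            best_points, best_cost = list(points), cost
    return best_points, best_cost


def half_drop_elbow(costs):
    """Hypothetical elbow rule (not KNEEDLE): return the first index whose
    successor fails to cut the cost by at least half, or None."""
    for i in range(len(costs) - 1):
        if costs[i] == 0 or costs[i + 1] > 0.5 * costs[i]:
            return i
    return None


def sweep(signal, detector, max_change_points, elbow_detector, tolerance=1e-3):
    """Detect 0..max_change_points change points, then pick the elbow."""
    all_points, costs = [], []
    for n in range(max_change_points + 1):
        points, cost = detector(signal, n)
        previous = costs[-1] if costs else None
        all_points.append(points)
        costs.append(cost)
        # Stop once the fractional improvement falls below the tolerance.
        if previous is not None and (
            previous == 0 or (previous - cost) / previous < tolerance
        ):
            break
    elbow = elbow_detector(costs)
    return ([] if elbow is None else all_points[elbow]), costs


# Three noisy plateaus, so the expected change points are at indices 4 and 8.
signal = [0.0, 0.1, -0.1, 0.0, 5.0, 5.1, 4.9, 5.0, 10.0, 10.1, 9.9, 10.0]
change_points, costs = sweep(signal, brute_force_detector, 4, half_drop_elbow)
print(change_points)  # [4, 8]
```

Any callable with the same (signal, number of change points) to (change points, cost) shape could replace the toy detector, which is the point of the composed design.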

dupin.detect.kneedle_elbow_detection(costs, S=1, interp_method='interp1d', curve='convex', direction='decreasing', **kwargs)

Run the KNEEDLE algorithm for elbow detection from the kneed package.

Note

See the kneed documentation for more information on parameter selection in the KNEEDLE algorithm.

Parameters:
  • costs (list [float ]) – The list/array of costs along some implicit x.

  • S (int, optional) – A sensitivity parameter. Higher values require more obvious elbows/knees, while the lowest value, 1, will detect elbows soonest. Defaults to 1.

  • interp_method (str, optional) – The method of interpolation for the discrete points. Options are “interp1d” and “polynomial”. “interp1d” uses scipy.interpolate.interp1d, and “polynomial” uses numpy.polyfit. Defaults to “interp1d”.

  • curve (str, optional) – Will detect knees if “concave” and elbows if “convex”. Defaults to “convex”.

  • direction (str, optional) – Either “increasing” or “decreasing”. Whether the trend from left to right is increasing or decreasing. Defaults to “decreasing”.

  • **kwargs (dict) – Other keyword arguments to pass to kneed.KneeLocator.

Returns:

The predicted index for the elbow.

Return type:

int
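For intuition about what an elbow index means for a convex, decreasing cost curve, the following is a simplified stand-in, not the kneed implementation: it normalizes the curve to the unit square and returns the point farthest below the straight line joining the extremes. It omits KNEEDLE's smoothing, interpolation, and the sensitivity parameter S, and the name chord_elbow is hypothetical.

```python
def chord_elbow(costs):
    """Return the index farthest below the chord of a decreasing curve.

    Simplified illustration only. Assumes the costs decrease from their
    maximum to their minimum. Returns None if no point lies below the chord.
    """
    n = len(costs)
    lowest, highest = min(costs), max(costs)
    if n < 3 or highest == lowest:
        return None
    best_index, best_gap = None, 0.0
    for i, cost in enumerate(costs):
        x = i / (n - 1)                             # Normalized position.
        y = (cost - lowest) / (highest - lowest)    # Normalized cost.
        gap = (1.0 - x) - y                         # Distance below the chord.
        if gap > best_gap:
            best_index, best_gap = i, gap
    return best_index


costs = [100.0, 40.0, 15.0, 10.0, 8.0, 7.0]
print(chord_elbow(costs))  # 2
```

The cost drops steeply up to index 2 and only slowly afterwards, so index 2 is reported as the elbow. A flat curve has no elbow and yields None, matching the convention SweepDetector expects from an elbow detector.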

dupin.detect.two_pass_elbow_detection(threshold, detector=None)

Return a two pass function of another elbow detection algorithm.

The returned detector runs a first pass of the given elbow detector and determines whether the elbow is far enough along the cost curve (as set by threshold). If it is not, a second pass is run on only the points at or beyond the first pass’s estimated elbow. This helps when the first elbow is expected to come from a decrease in the cost function so large that it masks the proper number of events, such as smaller events within a phase transition.

Note

If the second pass returns None the first pass elbow will still be used.

Parameters:
  • threshold (int) – If the first pass of the elbow detector computes an elbow less than threshold, run a second pass. Otherwise, the detector just returns the first pass.

  • detector (callable [[list [float ]], int], optional) – The callable to use for both sweeps of elbow detection. Defaults to kneedle_elbow_detection.

Returns:

A new elbow detector that uses the two-pass scheme described above.

Return type:

callable [[list [float]], int]
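The two-pass scheme can be sketched as a small closure. This is an illustration, not dupin's source: two_pass and second_difference_elbow are hypothetical names, the stand-in detector is a simple sharpest-bend rule rather than KNEEDLE, and adding the first-pass index as an offset to the second-pass result is an assumption about how the indices are combined.

```python
def second_difference_elbow(costs):
    """Hypothetical stand-in detector: index of the sharpest bend."""
    if len(costs) < 3:
        return None
    return max(
        range(1, len(costs) - 1),
        key=lambda i: costs[i - 1] - 2 * costs[i] + costs[i + 1],
    )


def two_pass(threshold, detector):
    """Wrap an elbow detector so early elbows trigger a second pass."""

    def find_elbow(costs):
        first = detector(costs)
        if first is None or first >= threshold:
            return first
        # Rerun on the points at or beyond the first elbow.
        second = detector(costs[first:])
        if second is None:
            return first  # Fall back to the first-pass elbow.
        return first + second  # Assumed: shift back to the original indexing.

    return find_elbow


# One huge initial drop hides a later, smaller bend at index 4.
costs = [1000.0, 100.0, 90.0, 50.0, 12.0, 10.0, 9.0, 8.5]
print(second_difference_elbow(costs))                # 1
print(two_pass(2, second_difference_elbow)(costs))   # 4
```

With threshold set to 2, the first-pass elbow at index 1 is considered too early, so the second pass looks only at the tail of the curve and finds the later bend.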