dupin.detect
Overview
- kneedle_elbow_detection: Run the KNEEDLE algorithm for elbow detection from the kneed package.
- CostLinearBiasedFit: Compute a start-to-end linear fit and penalize error and bias.
- CostLinearFit: Compute the L1 cumulative error of piecewise linear fits in time.
- SweepDetector: Detects the optimal number of change points in a time series.
- two_pass_elbow_detection: Return a two-pass function of another elbow detection algorithm.
Details
Methods for event detection in molecular simulations.
dupin provides classes and functions to help with the event detection process, but it does not currently implement a detection method or elbow detection of its own. Instead, the package provides interfaces to ruptures for event detection and to kneed for elbow detection.
This module also provides cost functions for use with ruptures, a scheme for determining the correct number of events (SweepDetector), and some elbow detection helpers.
- class dupin.detect.CostLinearBiasedFit(metric='l1')
Compute a start-to-end linear fit and penalize error and bias.
Works with ruptures.
- class dupin.detect.CostLinearFit(metric='l1')
Compute the L1 cumulative error of piecewise linear fits in time.
Works with ruptures. Used to compute the relative cumulative L1 deviation of a signal from a piecewise linear fit.
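\[C(s, e) = \min\limits_{m, b} \sum_i |y_i - (m x_i + b)|\]
Here the sum runs over the frames \(i\) of the segment from \(s\) to \(e\), with \(x_i\) the time and \(y_i\) the signal value. \(m\) and \(b\) can be vectors in the case of a multidimensional signal, in which case the summation also runs across dimensions.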
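To make \(C(s, e)\) concrete, here is a minimal, hypothetical NumPy sketch of the unnormalized formula; `linear_l1_cost` is not part of dupin, it takes the frame index as \(x_i\) (an assumption), and it does not attempt whatever "relative" normalization dupin applies. It relies on the fact that an optimal least-absolute-deviation line passes through at least two of the data points, so trying every pair of points gives the exact minimum. Because \(m\) and \(b\) are independent per dimension, each dimension is minimized separately and the results are summed. The pair search is cubic in the segment length, so this is only for illustration on short segments.

```python
import itertools

import numpy as np


def linear_l1_cost(signal, start, end):
    """Hypothetical helper: exact C(s, e) = min_{m,b} sum_i |y_i - (m x_i + b)|.

    Uses frames start..end-1 of ``signal`` (shape (N_frames,) or
    (N_frames, N_features)) and takes the frame index as x_i.
    """
    segment = np.asarray(signal, dtype=float)[start:end]
    if segment.ndim == 1:
        segment = segment[:, None]
    n_frames = segment.shape[0]
    if n_frames < 3:
        # A line passes exactly through any one or two points.
        return 0.0
    x = start + np.arange(n_frames, dtype=float)
    total = 0.0
    for y in segment.T:  # m and b are fit independently per dimension
        best = np.inf
        # An optimal L1 (least absolute deviation) line passes through at
        # least two data points, so trying every pair gives the exact minimum.
        for i, j in itertools.combinations(range(n_frames), 2):
            slope = (y[j] - y[i]) / (x[j] - x[i])
            intercept = y[i] - slope * x[i]
            best = min(best, np.abs(y - (slope * x + intercept)).sum())
        total += best
    return float(total)
```

For example, `linear_l1_cost(np.array([0.0, 1.0, 2.0, 10.0]), 0, 4)` gives 7.0 up to floating-point rounding: the best line fits the first three points and misses the outlier by 7.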
- Parameters:
- class dupin.detect.SweepDetector(detector, max_change_points, elbow_detector=None, tolerance=0.001)
Detects the optimal number of change points in a time series.
This class composes a change point detection algorithm that detects a fixed number of change points with an elbow detection algorithm. The optimal change points are those detected at the elbow of the plot of cost versus number of change points.
- Parameters:
detector (Union[ruptures.base.BaseEstimator, callable[[numpy.ndarray, int], tuple[list[int], float]]]) – The detector to use for each round of change point detection. Can be any callable that takes in a NumPy array signal of shape \((N_{frames}, N_{features})\) and the number of change points, and returns a tuple containing the list of change points and the total cost for those change points. The argument can also be any ruptures estimator.
max_change_points (int) – The maximum number of change points to attempt to detect.
elbow_detector (callable[[list[float]], int], optional) – A callable that takes in a list of costs and outputs the elbow of the data. The callable should return None if no elbow can be detected. Defaults to the KNEEDLE algorithm provided by the kneed package (see kneedle_elbow_detection for dupin defaults).
tolerance (float, optional) – The fractional change in cost below which to stop detecting higher numbers of change points. Since detecting \(n+1\) change points typically decreases the cost less than the previous iteration did, this is an effective way to prevent wasted computation. For instance, a value of 0.01 means that if adding a change point decreases the cost by less than one percent of the previous value, the detector stops immediately regardless of max_change_points. Defaults to 0.001.
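The detector contract above (a signal and a number of change points in, a list of change points and a total cost out) can be met by any segmentation routine. As a hypothetical illustration, not part of dupin or ruptures, here is a small exact dynamic-programming detector using a squared-error (L2) segment cost. It returns change points as the frame indices where new segments begin, excluding 0 and \(N_{frames}\); that convention is an assumption. Its cost is quadratic in the number of frames per change point, so it is only meant for short signals.

```python
import numpy as np


def l2_dynp_detector(signal, n_change_points):
    """Hypothetical detector: exact L2 segmentation by dynamic programming.

    Returns (change_points, total_cost), where change_points are the frame
    indices at which new segments begin (0 and N_frames excluded).
    """
    x = np.asarray(signal, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    n_frames = x.shape[0]
    if not 0 <= n_change_points < n_frames:
        raise ValueError("need 0 <= n_change_points < N_frames")
    # Prefix sums give each segment's cost in O(N_features).
    sums = np.vstack([np.zeros(x.shape[1]), np.cumsum(x, axis=0)])
    squares = np.concatenate([[0.0], np.cumsum((x**2).sum(axis=1))])

    def segment_cost(a, b):
        # Sum of squared deviations from the mean of x[a:b].
        total = sums[b] - sums[a]
        return squares[b] - squares[a] - (total**2).sum() / (b - a)

    n_segments = n_change_points + 1
    best = np.full((n_segments + 1, n_frames + 1), np.inf)
    previous = np.zeros((n_segments + 1, n_frames + 1), dtype=int)
    best[0, 0] = 0.0
    for seg in range(1, n_segments + 1):
        for end in range(seg, n_frames + 1):
            for start in range(seg - 1, end):
                candidate = best[seg - 1, start] + segment_cost(start, end)
                if candidate < best[seg, end]:
                    best[seg, end] = candidate
                    previous[seg, end] = start
    # Walk back through the table to recover where each segment starts.
    change_points, end = [], n_frames
    for seg in range(n_segments, 1, -1):
        end = previous[seg, end]
        change_points.append(int(end))
    return sorted(change_points), float(best[n_segments, n_frames])
```

A callable like this could be passed as the detector argument, e.g. `SweepDetector(l2_dynp_detector, max_change_points=5)`.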
- fit(data)
Fit and return change points for given data.
Compute the change points for each number of change points in [0, self.max_change_points], and detect the elbow of the associated costs, if any.
- Parameters:
data (numpy.ndarray) – The data to detect change points for.
- Returns:
The change points if any. An empty list means no change points were detected.
- Return type:
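The sweep that fit performs, together with the tolerance stopping rule described for SweepDetector, can be sketched as follows. This is a hypothetical, simplified reconstruction for illustration, not dupin's implementation: `sweep_change_points` is an invented name, and two details are assumptions, namely that the result which triggers the tolerance stop is discarded and that the elbow index is used directly as the number of change points (index 0 meaning zero change points).

```python
def sweep_change_points(data, detector, max_change_points, elbow_detector, tolerance=1e-3):
    """Hypothetical sketch of a SweepDetector-style sweep.

    detector(data, n) -> (change_points, cost); elbow_detector(costs) -> int or None.
    """
    candidates = []  # candidates[n] holds the change points found for n
    costs = []
    for n in range(max_change_points + 1):
        change_points, cost = detector(data, n)
        if costs:
            previous = costs[-1]
            # Assumption: stop once the fractional decrease in cost falls
            # below tolerance (or there is no cost left to decrease) and
            # discard the result that triggered the stop.
            if previous <= 0 or (previous - cost) / previous < tolerance:
                break
        candidates.append(list(change_points))
        costs.append(cost)
    elbow = elbow_detector(costs)
    # No elbow means no change points are reported.
    return [] if elbow is None else candidates[elbow]
```

The early stop is what keeps the sweep cheap: once an extra change point barely lowers the cost, larger numbers of change points are never computed.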
- dupin.detect.kneedle_elbow_detection(costs, S=1, interp_method='interp1d', curve='convex', direction='decreasing', **kwargs)
Run the KNEEDLE algorithm for elbow detection from the kneed package.
Note
See the kneed documentation for more information on parameter selection in the KNEEDLE algorithm.
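For intuition about what KNEEDLE looks for, the following hypothetical helper implements only its core idea for the defaults curve=“convex” and direction=“decreasing”: normalize both axes to [0, 1], flip the curve so it rises, and take the point that lies farthest above the diagonal (the maximum of the difference curve). It is a simplification, not kneed's implementation: `max_difference_elbow` is an invented name, and it ignores S, interp_method, and KNEEDLE's threshold test on local maxima, so its answers can differ from kneedle_elbow_detection.

```python
import numpy as np


def max_difference_elbow(costs, min_gap=1e-12):
    """Hypothetical simplified elbow finder for a convex, decreasing curve.

    Returns the index of the elbow, or None if the curve has no bend.
    """
    y = np.asarray(costs, dtype=float)
    if y.size < 3 or np.ptp(y) == 0:
        return None
    # Normalize x (the implicit index) and the costs to [0, 1].
    x = np.linspace(0.0, 1.0, y.size)
    y_norm = (y - y.min()) / np.ptp(y)
    # Flip the decreasing convex curve so it becomes increasing and concave,
    # then measure how far it rises above the diagonal y = x.
    difference = (1.0 - y_norm) - x
    elbow = int(np.argmax(difference))
    return elbow if difference[elbow] > min_gap else None
```

For example, `max_difference_elbow([100, 50, 10, 8, 6, 5])` returns 2, while a straight line such as `[5, 4, 3, 2, 1]` returns None.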
- Parameters:
costs (list[float]) – The list/array of costs along some implicit x.
S (int, optional) – A sensitivity parameter. Higher values require more obvious elbows/knees, while the lowest value, 1, will detect elbows soonest. Defaults to 1.
interp_method (str, optional) – The method of interpolation for the discrete points. Options are “interp1d” and “polynomial”. “interp1d” uses scipy.interpolate.interp1d, and “polynomial” uses numpy.polyfit. Defaults to “interp1d”.
curve (str, optional) – Will detect knees if “concave” and elbows if “convex”. Defaults to “convex”.
direction (str, optional) – Either “increasing” or “decreasing”: whether the trend from left to right is increasing or decreasing. Defaults to “decreasing”.
**kwargs (dict) – Other keyword arguments to pass to kneed.KneeLocator.
- Returns:
The predicted index for the elbow.
- Return type:
- dupin.detect.two_pass_elbow_detection(threshold, detector=None)
Return a two pass function of another elbow detection algorithm.
The returned function runs a first pass of the elbow detector given by detector and determines whether the elbow is far enough along the cost curve (as determined by threshold). If it is not, it runs a second pass using only the points at or beyond the first pass’s estimated elbow. This is designed for cases where the first detected elbow is expected to come from such a large decrease in the cost function that the proper number of events would otherwise go undetected, such as smaller events within a phase transition.
Note
If the second pass returns None, the first pass elbow is still used.
- Parameters:
threshold (int) – If the first pass of the elbow detector computes an elbow less than threshold, run a second pass. Otherwise, the detector just returns the first pass result.
detector (callable[[list[float]], int], optional) – The callable to use for both passes of elbow detection. Defaults to kneedle_elbow_detection.
- Returns:
A new elbow detector that uses the two-pass scheme described above.
- Return type:
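The two-pass scheme can be sketched as a closure around any elbow detector. This is a hypothetical illustration, not dupin's source: `make_two_pass` is an invented name, and two behaviors are assumptions, namely that a first-pass result of None is returned unchanged and that the second-pass index (computed on the truncated cost list) is shifted by the first-pass elbow so it indexes the full cost list.

```python
def make_two_pass(threshold, detector):
    """Hypothetical sketch of a two-pass wrapper around an elbow detector."""

    def two_pass(costs):
        first = detector(costs)
        # Assumption: no first-pass elbow means no elbow at all.
        if first is None or first >= threshold:
            return first
        # Rerun on the tail of the curve, starting at the first-pass elbow.
        second = detector(list(costs)[first:])
        if second is None:
            # Fall back to the first-pass elbow, as the Note describes.
            return first
        # Assumption: map the second-pass index back onto the full cost list.
        return first + second

    return two_pass
```

Wrapping the detector in a closure keeps the result compatible with the elbow_detector argument of SweepDetector, which only passes a list of costs.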