dupin.data.reduce#

Overview

`CustomReducer`	Wrap a custom reducing callable.
`NthGreatest`	Reduce a distribution to the Nth greatest values.
`Percentile`	Reduce a distribution into percentile values.
`Tee`	Enable multiple reducers to act on the same generator-like object.

Details

Classes for transforming array quantities into scalar features.

Reduction in dupin takes an array and _reduces_ it to a fixed number of scalar values. In computer science, a reduction usually maps an array to a single value. Our usage of the term is similar; we just allow multiple reductions to happen within the same reducer. Examples of common reducers in the dupin sense are the max, min, mean, mode, and standard deviation functions.
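To make the distinction concrete, here is a minimal sketch that contrasts a single-value reduction with a reduction that yields several named scalars. It does not use dupin itself; the function names `single_reduction` and `multi_reduction` are hypothetical, and plain Python lists stand in for NumPy arrays to keep the example dependency-free.

```python
from statistics import mean


def single_reduction(values):
    """Classic reduction: many values in, one value out."""
    return max(values)


def multi_reduction(values):
    """Reduction in the dupin sense: many values in, a fixed set of named scalars out."""
    return {"max": max(values), "min": min(values), "mean": mean(values)}


values = [2.0, 4.0, 9.0, 1.0]
print(single_reduction(values))
print(multi_reduction(values))
```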

class dupin.data.reduce.CustomReducer(custom_function)[source]#

Wrap a custom reducing callable.

Parameters:

generator (GeneratorLike) – A generator-like object to reduce.
custom_function (callable [[numpy.ndarray], dict [str, float ]]) – A custom callable that takes a NumPy array and returns a dictionary whose keys name each reduction and whose values are the corresponding reduced values.

function#

The provided callable.

Type: callable [[numpy.ndarray], dict [str, numpy.ndarray ]]

__init__(custom_function)[source]#

compute(data)[source]#: Call the internal function.
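As an illustration of the kind of callable described above, the following sketch defines a function that takes a distribution and returns a dictionary of named scalar features. The function name `spread_and_median` and its feature keys are hypothetical, and the example calls the function directly on a plain list rather than wrapping it, so it runs without dupin or NumPy installed.

```python
from statistics import median


def spread_and_median(distribution):
    """Hypothetical custom reduction returning a dict of named scalar features."""
    return {
        "range": max(distribution) - min(distribution),
        "median": median(distribution),
    }


# Assumption: with dupin installed, this callable would be passed as
# `custom_function` to CustomReducer. Here it is simply called directly.
features = spread_and_median([3.0, 1.0, 4.0, 1.0, 5.0])
print(features)
```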

class dupin.data.reduce.NthGreatest(indices)[source]#

Reduce a distribution to the Nth greatest values.

This reducer returns the greatest and least values from a distribution as specified by the provided indices. Greatest values are specified by positive integers and least values by negative integers, e.g. -1 is the minimum value in the array. The feature keys are modified with the ordinal form of the index and whether it refers to a greatest or least value: -1 becomes “1st_least” and 10 becomes “10th_greatest”.

Parameters: indices (list [ int ], optional) – The ranks to query. 1 is the greatest value in the distribution, 10 the tenth greatest, and so on. Negative numbers select the smallest values in the distribution: -1 is the least value. 0 is treated as 1.

__init__(indices)[source]#

compute(distribution)[source]#: Return the signals with modified keys.
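The index and key conventions described for this class can be sketched in plain Python. This is an illustration of the documented behavior, not dupin's implementation: `ordinal` and `nth_greatest_sketch` are hypothetical names, and the suffixes for ranks other than the documented “1st” and “10th” are assumed to follow ordinary English ordinals.

```python
def ordinal(n):
    """Return n with an English ordinal suffix, e.g. 1 -> '1st', 10 -> '10th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def nth_greatest_sketch(distribution, indices):
    """Pick the Nth greatest (positive index) or Nth least (negative index) values."""
    ordered = sorted(distribution)  # ascending order
    features = {}
    for index in indices:
        rank = max(abs(index), 1)  # 0 is treated as 1
        if index < 0:
            features[f"{ordinal(rank)}_least"] = ordered[rank - 1]
        else:
            features[f"{ordinal(rank)}_greatest"] = ordered[-rank]
    return features


print(nth_greatest_sketch([5.0, 2.0, 9.0, 7.0], [1, 2, -1]))
```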

class dupin.data.reduce.Percentile(percentiles=None)[source]#

Reduce a distribution into percentile values.

The reducer sorts the input array to get the provided percentiles. It then uses the key format `f"{percentile}%"` to identify its reductions.

Parameters: percentiles (tuple [ int ], optional) – The percentiles in integer form (i.e. 100% equals 100). By default, every 10% increment from 0% to 100% (inclusive) is taken.

__init__(percentiles=None)[source]#

compute(distribution)[source]#: Return the reduced distribution.
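The sort-then-index approach and the key format described for this class might look roughly like the sketch below. The name `percentile_sketch` is hypothetical, and the rule for converting a percentile into an index (integer truncation, clamped to the last element) is an assumption; dupin's actual rounding may differ.

```python
def percentile_sketch(distribution, percentiles=None):
    """Sort the input and pick one element per requested percentile."""
    if percentiles is None:
        # Documented default: every 10% from 0% to 100% inclusive.
        percentiles = tuple(range(0, 101, 10))
    ordered = sorted(distribution)
    last = len(ordered) - 1
    features = {}
    for percentile in percentiles:
        # Assumed index rule: truncate, then clamp so 100% maps to the maximum.
        index = min(percentile * len(ordered) // 100, last)
        features[f"{percentile}%"] = ordered[index]
    return features


print(percentile_sketch([30, 10, 20, 50, 40], (0, 50, 100)))
```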

class dupin.data.reduce.Tee(reducers)[source]#

Enable multiple reducers to act on the same generator-like object.

Each reducer is run on the original distribution and their reductions are concatenated. This reducer does not create its own reductions or corresponding keys.

Parameters: reducers (list [dupin.data.base.DataReducer]) – A sequence of data reducers.

__init__(reducers)[source]#

attach_logger(logger)[source]#

Add a logger to this step in the data pipeline.

Parameters: logger (dupin.data.logging.Logger) – A logger object to store data from the data pipeline for individual elements of the composed maps.

compute(distribution)[source]#: Run all composed reducer computes.

remove_logger()[source]#: Remove a logger from this step in the pipeline if it exists.
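The composition that Tee describes, where every reducer sees the same original distribution and the results are combined, can be sketched with plain functions. The names `tee_sketch`, `extremes`, and `total` are hypothetical stand-ins, and merging dictionaries is an assumed reading of "concatenated"; this is not dupin's implementation.

```python
def tee_sketch(reducers, distribution):
    """Run every reducer on the same distribution and merge their features."""
    features = {}
    for reducer in reducers:
        # Each reducer receives the original, unmodified distribution.
        features.update(reducer(distribution))
    return features


def extremes(distribution):
    return {"max": max(distribution), "min": min(distribution)}


def total(distribution):
    return {"sum": sum(distribution)}


combined = tee_sketch([extremes, total], [1.0, 2.0, 3.0])
print(combined)
```

Note that the combining step adds no keys of its own, which matches the statement that Tee does not create its own reductions.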