dupin.data.base#

Overview

CustomGenerator

Wrap a user callable for starting a data pipeline.

DataMap

Base class for mapping distributions to another distribution.

DataModifier

Generalized modifier of data in a pipeline.

DataReducer

Base class for reducing distributions into scalar features.

Generator

The abstract base class for generating signals used for event detection.

GeneratorLike

A type hint for objects that act like data generators for dupin.

PipeComponent

Base class for piping methods for intermediate data pipeline elements.

Details

Base classes for the data module.

class dupin.data.base.CustomGenerator(custom_function)[source]#

Wrap a user callable for starting a data pipeline.

This class allows custom user functions to be the generator of the initial data of a pipeline. The call signature is arbitrary, but the expected output format is described in the Parameters section.

Parameters:

custom_function (callable [[…], dict[str, numpy.ndarray or float]]) – A custom callable that returns a dictionary with feature names as keys and feature values (either floats or arrays) as values.

function#

The provided callable.

Type:

callable [[…], dict[str, numpy.ndarray or float]]

__call__(*args, **kwargs)[source]#

Call the internal function.

__init__(custom_function)[source]#
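
A minimal usage sketch, assuming only the documented constructor and call behavior; the radial_distances function, its feature names, and the (N, 3) positions input are illustrative and not part of dupin:

    import numpy as np
    from dupin.data.base import CustomGenerator

    def radial_distances(positions):
        # Hypothetical feature function: positions is an (N, 3) array.
        distances = np.linalg.norm(positions, axis=1)
        return {
            "radial_distance": distances,             # array feature
            "mean_distance": float(distances.mean()), # scalar feature
        }

    generator = CustomGenerator(radial_distances)
    features = generator(np.random.default_rng(0).random((100, 3)))
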
class dupin.data.base.DataMap[source]#

Base class for mapping distributions to another distribution.

When the raw distribution of a given simulation snapshot is not appropriate as a feature or requires further processing, a DataMap instance can be used to wrap a Generator instance for this processing. This class automatically skips over scalar features.

This class requires the implementation of compute in subclasses.

Note

While this is named after the map operation, the returned array need not be the same size as the input.

Note

This is an abstract base class and cannot be instantiated.

abstract compute(data)[source]#

Turn a distribution into another distribution.

Parameters:

data (\((N,)\) np.ndarray of float) – The array representing a distribution to map.

Returns:

signals – Returns a dictionary with string keys describing the mapping applied to produce each associated value. For instance, if the value is the sorted distribution, a logical key would be 'sorted'. The key only needs to describe the mapping; the original distribution name is handled by the generator.

Return type:

dict[str, numpy.ndarray]
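
A minimal subclass sketch following the documented compute contract; CenteredDistribution and the 'centered' key are illustrative names rather than part of dupin:

    from dupin.data.base import DataMap

    class CenteredDistribution(DataMap):
        # Hypothetical map that shifts a distribution to zero mean.

        def compute(self, data):
            # Map the (N,) input array to another array; scalar features
            # are skipped automatically by the base class.
            return {"centered": data - data.mean()}

An instance would typically be attached to an upstream generator through PipeComponent.map rather than constructed in isolation.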

class dupin.data.base.DataModifier[source]#

Generalized modifier of data in a pipeline.

This is an abstract base class and cannot be instantiated directly.

Parameters:

generator (GeneratorLike) – A generator like object to modify.

__call__(*args, **kwargs)[source]#

Call the underlying generator and perform the modification on its output.

__init__()[source]#
__weakref__#

list of weak references to the object

attach_logger(logger)[source]#

Add a logger to this step in the data pipeline.

Parameters:

logger (dupin.data.logging.Logger) – A logger object to store data from the data pipeline for individual elements of the composed maps.

abstract compute(distribution)[source]#

Perform the data modification on the array.

remove_logger()[source]#

Remove a logger from this step in the pipeline if it exists.

update(args, kwargs)[source]#

Update data modifier before compute if necessary.

This is called before the internal generator is called. The method may consume arguments and must return the new args and kwargs (with any consumed arguments removed).
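
A sketch of the update/compute contract, assuming compute receives and returns a single array; ScaleByBoxLength and the box_length keyword are hypothetical names, not part of dupin:

    from dupin.data.base import DataModifier

    class ScaleByBoxLength(DataModifier):
        # Hypothetical modifier that rescales each distribution by a length.

        def update(self, args, kwargs):
            # Consume the hypothetical box_length keyword before the wrapped
            # generator runs, and pass the remaining arguments through.
            self._box_length = kwargs.pop("box_length", 1.0)
            return args, kwargs

        def compute(self, distribution):
            # Modify each array produced by the wrapped generator.
            return distribution / self._box_length

Per the Parameters above, the wrapped generator is supplied when the modifier is constructed.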

class dupin.data.base.DataReducer[source]#

Base class for reducing distributions into scalar features.

The class automatically skips over scalar features in its reduction. Subclasses require the implementation of compute.

Note

This is an abstract base class and cannot be instantiated.

abstract compute(distribution)[source]#

Turn a distribution into scalar features.

Parameters:

distribution (\((N,)\) np.ndarray of float) – The array representing a distribution to reduce.

Returns:

reduced_distribution – Returns a dictionary with string keys representing the type of reduction for its associated value. For instance, if the value is the max of the distribution, a logical key would be 'max'. The key only needs to represent the reduction; the original distribution name is handled automatically.

Return type:

dict[str, float]
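
A minimal subclass sketch following the documented compute contract; MinMaxReducer is an illustrative name, not a built-in reducer:

    from dupin.data.base import DataReducer

    class MinMaxReducer(DataReducer):
        # Hypothetical reducer returning the extrema of a distribution.

        def compute(self, distribution):
            # Reduce the (N,) array to named scalars; the original feature
            # name is combined with these keys automatically by the pipeline.
            return {"min": float(distribution.min()),
                    "max": float(distribution.max())}
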

class dupin.data.base.Generator[source]#

The abstract base class for generating signals used for event detection.

This defines a simple interface through __call__ in which signals are returned as name-value pairs in a dict.

abstract __call__(*args, **kwargs)[source]#

Return the output signal(s) for given inputs.

This method can have an arbitrary signature in subclasses.

Returns:

signals – Returns a mapping of signal names to floating point or array-like data. Array-like data must be reduced before use in detection.

Return type:

dict[str, Union[float, numpy.ndarray]]

__weakref__#

list of weak references to the object

attach_logger(logger)[source]#

Add a logger to this step in the data pipeline.

Parameters:

logger (dupin.data.logging.Logger) – A logger object to store data from the data pipeline for individual elements of the composed maps.

remove_logger()[source]#

Remove a logger from this step in the pipeline if it exists.
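
A sketch of a concrete generator, assuming only the documented __call__ contract; DensityGenerator and its call signature are illustrative:

    import numpy as np
    from dupin.data.base import Generator

    class DensityGenerator(Generator):
        # Hypothetical generator producing per-frame features.

        def __call__(self, positions, box_volume):
            # Arbitrary signature; the return value maps feature names to
            # scalars or arrays (arrays still need a reducing step).
            distances = np.linalg.norm(positions, axis=1)
            return {
                "number_density": positions.shape[0] / box_volume,
                "radial_distance": distances,
            }

    signals = DensityGenerator()(np.random.default_rng(2).random((128, 3)), 8.0)
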

class dupin.data.base.PipeComponent[source]#

Base class for piping methods for intermediate data pipeline elements.

Provides helper methods for defining pipeline steps in a left-to-right or top-to-bottom fashion.

__weakref__#

list of weak references to the object

map(map_)[source]#

Add a mapping step after the current step in the data pipeline.

Expects a custom callable or a DataMap instance.

Parameters:

map (dupin.data.base.DataMap or callable [numpy.ndarray, dict[str, numpy.ndarray]]) – The next step in the data pipeline. Can be a custom callable mapping function or any of dupin's built-in mapping operations.

Returns:

Returns a DataMap subclass instance based on the passed-in object.

Return type:

DataMap

pipe(next_)[source]#

Add a step after current one in the data pipeline.

Expects a dupin.data.base.DataModifier instance.

Parameters:

next (dupin.data.base.DataModifier) – The next step in the data pipeline.

Returns:

Returns either a DataMap or DataReducer object based on the input to the method.

Return type:

DataMap or DataReducer

reduce(reduce_)[source]#

Add a reducing step after the current step in the data pipeline.

Expects a custom callable or a DataReducer instance.

Parameters:

reduce (dupin.data.base.DataReducer or callable [numpy.ndarray, dict[str, float]]) – The next step in the data pipeline. Can be a custom callable reducing function or any of dupin's built-in reducing operations.

Returns:

Returns a DataReducer subclass instance based on the passed-in object.

Return type:

DataReducer
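
A chaining sketch, assuming generators expose these piping methods and that custom callables are accepted as documented for map and reduce; the radial_distances function, the lambdas, and the key names are illustrative, and the exact composition of output keys is handled by the pipeline:

    import numpy as np
    from dupin.data.base import CustomGenerator

    def radial_distances(positions):
        return {"radial_distance": np.linalg.norm(positions, axis=1)}

    # Build the pipeline left to right: generate, map, then reduce.
    pipeline = (
        CustomGenerator(radial_distances)
        .map(lambda arr: {"scaled": arr / arr.max()})
        .reduce(lambda arr: {"mean": float(arr.mean()),
                             "max": float(arr.max())})
    )

    features = pipeline(np.random.default_rng(1).random((50, 3)))
    # features now contains only scalar values suitable for event detection.
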

class dupin.data.base.GeneratorLike#

A type hint for objects that act like data generators for dupin.

The object can either be a Generator, DataMap, or callable with the appropriate return value.