{ "cells": [ { "cell_type": "markdown", "id": "0", "metadata": {}, "source": [ "# Setting up a Data Pipeline\n", "\n", "## Outline\n", "### Questions:\n", "- How can I set up a pipeline to generate, map, and reduce data from a point cloud?\n", "- What are some common reducers **dupin** provides?\n", "\n", "### Objectives:\n", "- Define what a generator is and the expected output.\n", "- Demonstrate the builder syntax for the creation of pipelines.\n", "- Show how to use multiple maps or reducers through teeing.\n", "\n", "## Imports" ] }, { "cell_type": "code", "execution_count": 1, "id": "1", "metadata": { "tags": [] }, "outputs": [], "source": [ "import freud\n", "\n", "import dupin as du" ] }, { "cell_type": "markdown", "id": "2", "metadata": {}, "source": [ "## The data module\n", "\n", "The data generation portion of **dupin** (generate to aggregate) can be found in the `dupin.data` submodule.\n", "\n", "## Generators\n", "\n", "The base of the data generation portion of **dupin** (generate to aggregate) is the generator.\n", "Generators are simply registered callables which, when called, return a dictionary of features.\n", "These dictionaries have feature names as keys and either floats or NumPy arrays as feature values.\n", "\n", "```python\n", "@du.data.CustomGenerator\n", "def eg_generator():\n", "    return {\"feat-1\": 1.2, \"feat-2\": 0.0}\n", "```\n", "\n", "In this tutorial we use a built-in generator class from **dupin** that wraps [freud](https://freud.readthedocs.io/en/stable), a Python package for analyzing molecular trajectories.\n", "The point cloud or trajectory we are using comes from a molecular dynamics simulation of thermostated Lennard-Jones particles in a fixed volume periodic box (NVT) run using [hoomd-blue](https://hoomd-blue.readthedocs.io/en/stable).\n", "\n", "Below we define our generator, which uses Steinhardt order parameters.\n", "While not necessary for understanding, we use the spherical harmonic numbers $l \\in 
\\{2,4,6,8,10,12\\}$.\n", "This requires that we specify multiple feature names in the `attrs` keyword argument below.\n", "`attrs` maps the attribute name in the **freud** compute object to feature names in **dupin**.\n", "For two-dimensional array quantities such as the one we have here, we map the attribute name `particle_order` to multiple names given by the $l$ value." ] }, { "cell_type": "code", "execution_count": 2, "id": "3", "metadata": { "tags": [] }, "outputs": [], "source": [ "ls = (2, 4, 6, 8, 10, 12)\n", "steinhardt = freud.order.Steinhardt(l=ls)\n", "generator = du.data.freud.FreudDescriptor(\n", "    compute=steinhardt, attrs={\"particle_order\": [f\"$Q_{{{l}}}$\" for l in ls]}\n", ")" ] }, { "cell_type": "markdown", "id": "4", "metadata": {}, "source": [ "## Builder syntax\n", "\n", "**dupin** has two ways of attaching steps to a given data generation pipeline for mapping or reducing: the builder syntax and the decorator syntax.\n", "This tutorial will only cover the builder syntax; for the decorator syntax, see the API documentation.\n", "\n", "The builder syntax involves calling special methods on an existing pipeline (generators and all objects derived from them are pipelines): `pipe`, `map`, and `reduce`.\n", "\n", "* `pipe`: Adds a new layer to the pipeline, either for the map or the reduce step. Objects passed to `pipe` must be known reducers or mappers. Piped operations execute from left to right, with the output of each operation used as the input to the next.\n", "* `map`: Adds a map layer to the pipeline. Objects passed to `map` can be either known mappers or a custom map function. Mappers transform a vector-like (usually per-particle) quantity into another vector-like per-particle quantity that describes the property of interest better than the original.\n", "* `reduce`: Adds a reduce layer to the pipeline. Objects passed to `reduce` can be either known reducers or a custom reduction function. 
Reducers take a vector-like (usually per-particle) quantity and reduce it to one or more scalars that can be used in detection.\n", "\n", "The builder syntax leads to a pipeline whose steps should be read from left to right; that is, `A.pipe(B).map(C).reduce(D)` goes from `A->B->C->D`.\n", "Below we showcase the builder syntax.\n", "Don't worry if you don't understand the specific mappers or reducers here.\n", "The rest of the tutorial will go over the commonly used ones." ] }, { "cell_type": "code", "execution_count": 3, "id": "5", "metadata": { "tags": [] }, "outputs": [], "source": [ "pipeline = generator.pipe(  # a map step\n", "    du.data.spatial.NeighborAveraging(\n", "        expected_kwarg=\"neighbors\", remove_kwarg=False\n", "    )\n", ").reduce(du.data.reduce.NthGreatest((-1, 1, 10, -10)))" ] }, { "cell_type": "markdown", "id": "6", "metadata": {}, "source": [ "## Reducers\n", "\n", "We are going to skip over maps here as they are less commonly useful than reducers.\n", "For more on maps, see the documentation for `dupin.data.spatial.NeighborAveraging`, which is used above.\n", "\n", "Reducers take an array and return one or more features associated with the array.\n", "For purposes of event detection, features which focus on the extrema or limits of a distribution tend to outperform others, as they can signal the transition earlier.\n", "**dupin** has two classes which perform this function: `NthGreatest` and `Percentile`.\n", "\n", "* `NthGreatest` takes the specified nth greatest or least values (least is indicated by negative indices).\n", "* `Percentile` takes the specified quantiles.\n", "\n", "The two classes perform similar functions, and the choice between them is mostly a matter of taste.\n", "If you prefer to specify the exact indices to take, use `NthGreatest`; if you'd rather think in terms of percentages, use `Percentile`.\n", "For this tutorial we will use `NthGreatest`.\n", "Below we create the final pipeline for this section of the tutorial, which will be used in the next 
section." ] }, { "cell_type": "code", "execution_count": 4, "id": "7", "metadata": { "tags": [] }, "outputs": [], "source": [ "pipeline = generator.pipe(du.data.reduce.NthGreatest((-1, 1, 10, -10)))" ] }, { "cell_type": "markdown", "id": "8", "metadata": { "nbsphinx": "hidden", "tags": [] }, "source": [ "[Previous section](01-basic-approach.ipynb) [Next section](03-collecting-data.ipynb)" ] } ], "metadata": { "kernelspec": { "display_name": "Python (dupin)", "language": "python", "name": "dupin" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.12" } }, "nbformat": 4, "nbformat_minor": 5 }