{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "0",
   "metadata": {},
   "source": [
    "# Setting up a Data Pipeline\n",
    "\n",
    "## Outline\n",
    "### Questions:\n",
    "- How can I set up a pipeline to generate, map, and reduce data from a point cloud?\n",
    "- What are some common reducers **dupin** provides?\n",
    "\n",
    "### Objectives:\n",
    "- Define what a generator is and the expected output.\n",
    "- Demonstrate the builder syntax for the creation of pipelines.\n",
    "- Show how to use multiple maps or reducer through teeing.\n",
    "\n",
    "## Imports"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "1",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "import freud\n",
    "\n",
    "import dupin as du"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2",
   "metadata": {},
   "source": [
    "## The data module\n",
    "\n",
    "The data generation portion of **dupin** (generate to aggregate) can be found in the `dupin.data` submodule.\n",
    "\n",
    "## Generators\n",
    "\n",
    "The base of the data generation portion of **dupin** (generate to aggregate) is the generator.\n",
    "Generators are simply registered callables which when called return a dictionary of features.\n",
    "These dictionaries have feature names as keys with either float or NumPy arrays feature values.\n",
    "\n",
    "```python\n",
    "@du.data.CustomGenerator\n",
    "def eg_generator():\n",
    "    return {\"feat-1\": 1.2, \"feat-2\": 0.0}\n",
    "```\n",
    "\n",
    "We will in this tutorial use a builtin generator class from **dupin** which uses [freud](https://freud.readthedocs.io/en/stable) a Python package for analyzing molecular trajectories as our generator.\n",
    "The point cloud or trajectory we are using comes from a molecular dynamics simulation of thermostated Lennard-Jones particles in a fixed volume periodic box (NVT) run using [hoomd-blue](https://hoomd-blue.readthedocs.io/en/stable).\n",
    "\n",
    "Below we define our generator which use Steinhardt order parameters.\n",
    "While not necessary for understanding, we use the spherical harmonic numbers $l \\in \\{2,4,6,8,10,12\\}$.\n",
    "This requires we specify multiple feature names in the `attrs` key-word argument below.\n",
    "`attrs` maps the attribute name in the **freud** compute object to feature names in **dupin**.\n",
    "For 2 dimensional array quantities such as we have hear, we map the attribute name `particle_order` to multiple names given by the $l$ value."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "3",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "ls = (2, 4, 6, 8, 10, 12)\n",
    "steinhardt = freud.order.Steinhardt(l=ls)\n",
    "generator = du.data.freud.FreudDescriptor(\n",
    "    compute=steinhardt, attrs={\"particle_order\": [f\"$Q_{{{l}}}$\" for l in ls]}\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4",
   "metadata": {},
   "source": [
    "## Builder syntax\n",
    "\n",
    "**dupin** has 2 ways of attaching steps to a given data generation pipeline for mapping or reducing: the builder syntax and the decorator syntax.\n",
    "This tutorial will only cover the builder syntax; for the decorator syntax, see the API documentation.\n",
    "\n",
    "The builder syntax involves calling special methods from a extent pipeline (generators and all derivative objects are pipelines): `pipe`, `map`, and `reduce`.\n",
    "\n",
    "* `pipe`: Adds a new layer to the pipeline either for the map or reduce step. Objects passed to `pipe` must be known reducers or mappers. When piping two operations they are executed in order from left to right in a way that output from the first one is used as input for the right one.\n",
    "* `map`: Add a map layer to the pipeline. Objects passed to `map` can either be known mappers or a custom map function. Mappers can be used to map a vector-like (usually per-particle) quantity into another vector-like per-particle quantity that describes the property of interest better compared to the original property.\n",
    "* `reduce`: Add a reduce layer to the pipeline. Objects passed to `reduce` can either be known reducers or a custom reduction function. Reducers take a vector-like (usually per-particle) quantity and reduce it to one or more scalars that can be used in detection.\n",
    "\n",
    "The builder syntax leads to a pipeline whose steps should be read from left to right that is `A.pipe(B).map(C).reduce(D)` goes from `A->B->C->D`.\n",
    "Below we showcase the builder syntax.\n",
    "Don't worry if you don't understand the specific mappers or reducers here.\n",
    "The rest of the tutorial will go over commonly used values."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "5",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "pipeline = generator.pipe(  # a map step\n",
    "    du.data.spatial.NeighborAveraging(\n",
    "        expected_kwarg=\"neighbors\", remove_kwarg=False\n",
    "    )\n",
    ").reduce(du.data.reduce.NthGreatest((-1, 1, 10, -10)))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6",
   "metadata": {},
   "source": [
    "## Reducers\n",
    "\n",
    "We are going to skip over maps here as they are less commonly useful than reducers.\n",
    "Feel free to look at the documentation for `dupin.data.spatial.NeighborAveraging` above.\n",
    "\n",
    "Reducers take an array and return one or more features associated with the array.\n",
    "For purposes of event detection, features which focus on the extrema or limits of a distribution tend to outperform other as they can signal the transition earlier than other features.\n",
    "**dupin** has two classes which perform this function: `NthGreatest` and `Percentile`.\n",
    "\n",
    "* `NthGreatest` take the specified nth greatest or least (indicated by negative indices).\n",
    "* `Percentile` takes the specific quantiles given.\n",
    "\n",
    "The two classes perform similar functions, and the chosen class is a matter of taste mostly.\n",
    "If you prefer to specify the exact indices to take use `NthGreatest` if you'd rather think in terms of percentages use `Percentile`.\n",
    "For this tutorial we will use `NthGreatest`.\n",
    "Below we create the final pipeline for this section of the tutorial which will be used in the next section."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "7",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "pipeline = generator.pipe(du.data.reduce.NthGreatest((-1, 1, 10, -10)))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8",
   "metadata": {
    "nbsphinx": "hidden",
    "tags": []
   },
   "source": [
    "[Previous section](01-basic-approach.ipynb) [Next section](03-collecting-data.ipynb)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python (dupin)",
   "language": "python",
   "name": "dupin"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}