{ "cells": [ { "cell_type": "markdown", "id": "0", "metadata": {}, "source": [ "# Collecting Data\n", "\n", "## Outline\n", "### Questions\n", "- How do I use a pipeline to transform a frame from a trajectory into features?\n", "- How do I collect the entire trajectory's frames into a feature signal?\n", "\n", "### Objectives\n", "- Explain how to use the `SignalAggregator` class to collect generator data into a pandas DataFrame object.\n", "- Show how to pass arguments to a `Generator` and a `SignalAggregator` object.\n", "\n", "## Import" ] }, { "cell_type": "code", "execution_count": 1, "id": "1", "metadata": { "tags": [] }, "outputs": [], "source": [ "import freud  # analysis toolkit\n", "import gsd.hoomd  # trajectory reader\n", "\n", "import dupin as du\n", "\n", "FILENAME = \"../data/lj-sim.gsd\"" ] }, { "cell_type": "code", "execution_count": 2, "id": "2", "metadata": { "tags": [] }, "outputs": [], "source": [ "def display_dataframe(df):\n", "    # Show the first five rows with a grey background and white text.\n", "    style = df.head().style\n", "    style.set_table_styles(\n", "        [\n", "            {\n", "                \"selector\": \"th\",\n", "                \"props\": \"background-color: #666666; color: #ffffff; border: 1px solid #222222;\",\n", "            },\n", "            {\n", "                \"selector\": \"td\",\n", "                \"props\": \"background-color: #666666; color: #ffffff; border: 1px solid #222222;\",\n", "            },\n", "        ]\n", "    )\n", "    display(style)" ] }, { "cell_type": "markdown", "id": "3", "metadata": {}, "source": [ "## Create the generator\n", "\n", "We reuse the generator pipeline from the previous section of the tutorial."
] }, { "cell_type": "code", "execution_count": 3, "id": "4", "metadata": { "tags": [] }, "outputs": [], "source": [ "# Steinhardt order parameters Q_l for l = 2, 4, ..., 12.\n", "ls = (2, 4, 6, 8, 10, 12)\n", "steinhardt = freud.order.Steinhardt(l=ls)\n", "\n", "# Expose the per-particle `particle_order` array as one feature per l, then reduce\n", "# each distribution to its greatest, 10th greatest, least, and 10th least values.\n", "pipeline = du.data.freud.FreudDescriptor(\n", "    compute=steinhardt, attrs={\"particle_order\": [f\"$Q_{{{i}}}$\" for i in ls]}\n", ").pipe(du.data.reduce.NthGreatest((-1, 1, 10, -10)))" ] }, { "cell_type": "markdown", "id": "5", "metadata": {}, "source": [ "## Calling the Generator\n", "\n", "### Passing Arguments\n", "\n", "Generators almost always need arguments.\n", "In practice, however, we usually call a multi-step pipeline rather than a bare generator, and its individual steps may take their own arguments or none at all.\n", "The full range of argument-handling options for pipelines is beyond the scope of this tutorial, but in general, adding elements to a pipeline does not change the call signature it expects.\n", "That is, we can simply pass the pipeline the arguments expected by the original generator.\n", "\n", "For our generator, these arguments are a *system*-like object, which is a [freud](https://github.com/glotzerlab/freud) concept, and neighbor query arguments that specify which neighbors of each point to consider.\n", "One such *system*-like object is a `gsd.hoomd.Frame`, which we use below to demonstrate calling the pipeline defined above."
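, "\n", "\n", "Before doing so, it may help to see why adding a step leaves the call signature intact.\n", "The following cell is a minimal, hypothetical pure-Python sketch of that idea; it is not dupin's implementation, and the names `toy_generator`, `toy_reducer`, and `toy_pipe` are invented for this illustration." ] },
{ "cell_type": "code", "execution_count": null, "id": "5a", "metadata": { "tags": [] }, "outputs": [], "source": [
"# Hypothetical sketch, not dupin's implementation: composing a reduction step onto\n",
"# a generator yields a callable that takes the same arguments as the generator.\n",
"def toy_generator(system, nlist_kwargs):\n",
"    # Stand-in feature generator: one list of per-particle values per feature.\n",
"    # nlist_kwargs mirrors the real signature but is unused in this toy.\n",
"    return {\"feature\": [float(x) for x in system]}\n",
"\n",
"\n",
"def toy_reducer(features):\n",
"    # Keep only the greatest and least value of each feature.\n",
"    reduced = {}\n",
"    for name, values in features.items():\n",
"        reduced[f\"1st_greatest_{name}\"] = max(values)\n",
"        reduced[f\"1st_least_{name}\"] = min(values)\n",
"    return reduced\n",
"\n",
"\n",
"def toy_pipe(generator, reducer):\n",
"    # The composed callable forwards every argument unchanged to the generator.\n",
"    def composed(*args, **kwargs):\n",
"        return reducer(generator(*args, **kwargs))\n",
"\n",
"    return composed\n",
"\n",
"\n",
"toy_pipeline = toy_pipe(toy_generator, toy_reducer)\n",
"# Called with exactly the arguments toy_generator expects.\n",
"print(toy_pipeline([0.5, 0.1, 0.9], {\"num_neighbors\": 12}))" ] },
{ "cell_type": "markdown", "id": "5b", "metadata": {}, "source": [ "Because `toy_pipeline` forwards its arguments untouched, it is called exactly like `toy_generator`.\n", "In the same way, we call our real pipeline below with a frame and the neighbor query arguments expected by the underlying `FreudDescriptor`."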
] }, { "cell_type": "code", "execution_count": 4, "id": "6", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "{'10th_greatest_$Q_{2}$': 0.20412414,\n", " '1st_greatest_$Q_{2}$': 0.25000003,\n", " '1st_least_$Q_{2}$': 2.1715324e-07,\n", " '10th_least_$Q_{2}$': 2.2129704e-07,\n", " '10th_greatest_$Q_{4}$': 0.36916766,\n", " '1st_greatest_$Q_{4}$': 0.3807429,\n", " '1st_least_$Q_{4}$': 0.28641072,\n", " '10th_least_$Q_{4}$': 0.28641072,\n", " '10th_greatest_$Q_{6}$': 0.22060224,\n", " '1st_greatest_$Q_{6}$': 0.25863975,\n", " '1st_least_$Q_{6}$': 0.110485375,\n", " '10th_least_$Q_{6}$': 0.110485375,\n", " '10th_greatest_$Q_{8}$': 0.59512764,\n", " '1st_greatest_$Q_{8}$': 0.6035463,\n", " '1st_least_$Q_{8}$': 0.5609913,\n", " '10th_least_$Q_{8}$': 0.5609913,\n", " '10th_greatest_$Q_{10}$': 0.2726314,\n", " '1st_greatest_$Q_{10}$': 0.30271238,\n", " '1st_least_$Q_{10}$': 0.1992841,\n", " '10th_least_$Q_{10}$': 0.19928412,\n", " '10th_greatest_$Q_{12}$': 0.43338743,\n", " '1st_greatest_$Q_{12}$': 0.44562435,\n", " '1st_least_$Q_{12}$': 0.4078147,\n", " '10th_least_$Q_{12}$': 0.40781474}" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Neighbor query arguments: each particle's 12 nearest neighbors, excluding itself.\n", "nlist_kwargs = {\"num_neighbors\": 12, \"exclude_ii\": True}\n", "with gsd.hoomd.open(FILENAME, \"r\") as traj:\n", "    display(pipeline(traj[0], nlist_kwargs))" ] }, { "cell_type": "markdown", "id": "7", "metadata": {}, "source": [ "## Data Collection\n", "\n", "To collect the data into a usable form, we can use the `SignalAggregator` class.\n", "This class wraps a generator or pipeline object and, through its `accumulate` or `compute` methods, stores the features computed for successive frames.\n", "The stored features can then be converted into a `pandas.DataFrame` object for further manipulation or event detection."
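, "\n", "\n", "Conceptually, this aggregation can be sketched in a few lines of plain Python and pandas.\n", "The `ToyAggregator` class in the next cell is hypothetical: it only mimics the behavior described above (one stored frame per `accumulate` call, converted to a DataFrame at the end), omits `compute` entirely, and is not dupin's implementation." ] },
{ "cell_type": "code", "execution_count": null, "id": "7a", "metadata": { "tags": [] }, "outputs": [], "source": [
"import pandas as pd\n",
"\n",
"\n",
"# Hypothetical sketch, not dupin's implementation: each accumulate call stores one\n",
"# frame of features, and to_dataframe turns the stored frames into table rows.\n",
"class ToyAggregator:\n",
"    def __init__(self, generator):\n",
"        self.generator = generator\n",
"        self.frames = []\n",
"\n",
"    def accumulate(self, *args, **kwargs):\n",
"        # One call appends one frame (one future DataFrame row) of features.\n",
"        self.frames.append(self.generator(*args, **kwargs))\n",
"\n",
"    def to_dataframe(self):\n",
"        # Dictionary keys become columns; accumulation order becomes the row index.\n",
"        return pd.DataFrame(self.frames)\n",
"\n",
"\n",
"toy_aggregator = ToyAggregator(\n",
"    lambda values: {\"mean\": sum(values) / len(values), \"max\": max(values)}\n",
")\n",
"for toy_frame in ([1.0, 2.0], [3.0, 5.0]):\n",
"    toy_aggregator.accumulate(toy_frame)\n",
"print(toy_aggregator.to_dataframe())" ] },
{ "cell_type": "markdown", "id": "7b", "metadata": {}, "source": [ "With that picture in mind, we now wrap our pipeline in dupin's actual `SignalAggregator`."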
] }, { "cell_type": "code", "execution_count": 5, "id": "8", "metadata": { "tags": [] }, "outputs": [], "source": [ "signal_aggregator = du.data.aggregate.SignalAggregator(pipeline)" ] }, { "cell_type": "markdown", "id": "9", "metadata": {}, "source": [ "For this tutorial, we use the `accumulate` method, which is somewhat simpler to use than `compute`.\n", "We call `accumulate` with the arguments expected by the composed pipeline, and each call stores a new *frame* of features in the `SignalAggregator` object.\n", "Because frames are stored in the order they are accumulated, we iterate over the trajectory frames in order.\n", "After computing the features for all frames, we call `to_dataframe` to get the DataFrame object.\n", "We then save the data to an HDF5 file for later use in the tutorial." ] }, { "cell_type": "code", "execution_count": 6, "id": "10", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "\n", "
| \n", " | 10th_greatest_$Q_{2}$ | \n", "1st_greatest_$Q_{2}$ | \n", "1st_least_$Q_{2}$ | \n", "10th_least_$Q_{2}$ | \n", "10th_greatest_$Q_{4}$ | \n", "1st_greatest_$Q_{4}$ | \n", "1st_least_$Q_{4}$ | \n", "10th_least_$Q_{4}$ | \n", "10th_greatest_$Q_{6}$ | \n", "1st_greatest_$Q_{6}$ | \n", "1st_least_$Q_{6}$ | \n", "10th_least_$Q_{6}$ | \n", "10th_greatest_$Q_{8}$ | \n", "1st_greatest_$Q_{8}$ | \n", "1st_least_$Q_{8}$ | \n", "10th_least_$Q_{8}$ | \n", "10th_greatest_$Q_{10}$ | \n", "1st_greatest_$Q_{10}$ | \n", "1st_least_$Q_{10}$ | \n", "10th_least_$Q_{10}$ | \n", "10th_greatest_$Q_{12}$ | \n", "1st_greatest_$Q_{12}$ | \n", "1st_least_$Q_{12}$ | \n", "10th_least_$Q_{12}$ | \n", "
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | \n", "0.204124 | \n", "0.250000 | \n", "0.000000 | \n", "0.000000 | \n", "0.369168 | \n", "0.380743 | \n", "0.286411 | \n", "0.286411 | \n", "0.220602 | \n", "0.258640 | \n", "0.110485 | \n", "0.110485 | \n", "0.595128 | \n", "0.603546 | \n", "0.560991 | \n", "0.560991 | \n", "0.272631 | \n", "0.302712 | \n", "0.199284 | \n", "0.199284 | \n", "0.433387 | \n", "0.445624 | \n", "0.407815 | \n", "0.407815 | \n", "
| 1 | \n", "0.153610 | \n", "0.167962 | \n", "0.023381 | \n", "0.033380 | \n", "0.214287 | \n", "0.235337 | \n", "0.056236 | \n", "0.073473 | \n", "0.525133 | \n", "0.567725 | \n", "0.251675 | \n", "0.287110 | \n", "0.370811 | \n", "0.404888 | \n", "0.136324 | \n", "0.168037 | \n", "0.302344 | \n", "0.343929 | \n", "0.120105 | \n", "0.152464 | \n", "0.397878 | \n", "0.438995 | \n", "0.194800 | \n", "0.217982 | \n", "
| 2 | \n", "0.162579 | \n", "0.187238 | \n", "0.014939 | \n", "0.030309 | \n", "0.210389 | \n", "0.239093 | \n", "0.053894 | \n", "0.075834 | \n", "0.528769 | \n", "0.587941 | \n", "0.196169 | \n", "0.279068 | \n", "0.378292 | \n", "0.422634 | \n", "0.129208 | \n", "0.175510 | \n", "0.310114 | \n", "0.333017 | \n", "0.134413 | \n", "0.154003 | \n", "0.393605 | \n", "0.439719 | \n", "0.170461 | \n", "0.216085 | \n", "
| 3 | \n", "0.154826 | \n", "0.187591 | \n", "0.023294 | \n", "0.035020 | \n", "0.209281 | \n", "0.226303 | \n", "0.048572 | \n", "0.077595 | \n", "0.543408 | \n", "0.573807 | \n", "0.246732 | \n", "0.284698 | \n", "0.374615 | \n", "0.394863 | \n", "0.137892 | \n", "0.166574 | \n", "0.308446 | \n", "0.343068 | \n", "0.121403 | \n", "0.157507 | \n", "0.396944 | \n", "0.433905 | \n", "0.209838 | \n", "0.227965 | \n", "
| 4 | \n", "0.145845 | \n", "0.169722 | \n", "0.016966 | \n", "0.032802 | \n", "0.216655 | \n", "0.285643 | \n", "0.033883 | \n", "0.072493 | \n", "0.548409 | \n", "0.578489 | \n", "0.214945 | \n", "0.278768 | \n", "0.371742 | \n", "0.399764 | \n", "0.139842 | \n", "0.162539 | \n", "0.309451 | \n", "0.350324 | \n", "0.118400 | \n", "0.158816 | \n", "0.405271 | \n", "0.443005 | \n", "0.194769 | \n", "0.227046 | \n", "