Collecting Data#

Outline#

Questions#

  • How do I use a pipeline to transform a frame from a trajectory into features?

  • How do I collect the entire trajectory’s frames into a feature signal?

Objectives#

  • Explain how to use the SignalAggregator class to collect generator data into a pandas DataFrame object.

  • Show passing arguments to a Generator and SignalAggregator object.

Import#

[1]:
import os

import freud  # analysis toolkit
import gsd.hoomd  # trajectory reader

import dupin as du

FILENAME = "lj-sim.gsd"
[2]:
def display_dataframe(df):
    style = df.head().style
    style.set_table_styles(
        [
            {
                "selector": "th",
                "props": "background-color: #666666; color: #ffffff; border: 1px solid #222222;",
            },
            {
                "selector": "td",
                "props": "background-color: #666666; color: #ffffff; border: 1px solid #222222;",
            },
        ]
    )
    display(style)

Create the generator#

We use the generator from the last section of the tutorial.

[3]:
ls = (2, 4, 6, 8, 10, 12)
steinhardt = freud.order.Steinhardt(l=ls)

pipeline = du.data.freud.FreudDescriptor(
    compute=steinhardt, attrs={"particle_order": [f"$Q_{{{i}}}$" for i in ls]}
).pipe(du.data.reduce.NthGreatest((-1, 1, 10, -10)))
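To build intuition for the NthGreatest reduction, here is a plain-NumPy sketch (not dupin's actual implementation): judging from the output keys below, a positive index n selects the nth greatest value of each per-particle feature array, and a negative index the nth least.

```python
import numpy as np

# Plain-NumPy sketch of the NthGreatest reduction idea (not dupin's
# implementation): for indices (-1, 1, 10, -10), positive n selects the
# nth greatest value and negative n the nth least.
rng = np.random.default_rng(7)
q6 = rng.random(100)  # stand-in for a per-particle order parameter
ordered = np.sort(q6)
reduced = {
    "1st_greatest": ordered[-1],
    "1st_least": ordered[0],
    "10th_greatest": ordered[-10],
    "10th_least": ordered[9],
}
```

This turns an N-particle array into a handful of scalars per feature, which is what makes the per-frame output below a flat dictionary.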

Calling the Generator#

Passing Arguments#

Most of the time, we will want to pass arguments to a generator object. However, we have usually also created a multi-step pipeline whose later elements may require arguments of their own or accept none at all. While a full discussion of argument handling in pipelines is beyond the scope of this tutorial, in general adding elements to a pipeline does not change the expected signature. That is, we can simply pass the arguments expected by the original generator.

For our generator, the arguments are a system-like object (a freud concept) and neighbor query arguments that specify which local point neighbors to consider. One such system-like object is a gsd.hoomd.Frame, which we use below to showcase the pipeline defined above.

[4]:
nlist_kwargs = {"num_neighbors": 12, "exclude_ii": True}
with gsd.hoomd.open(FILENAME, "r") as traj:
    display(pipeline(traj[0], nlist_kwargs))
{'10th_greatest_$Q_{2}$': 0.20412414,
 '1st_greatest_$Q_{2}$': 0.25000003,
 '1st_least_$Q_{2}$': 2.1715324e-07,
 '10th_least_$Q_{2}$': 2.2129704e-07,
 '10th_greatest_$Q_{4}$': 0.36916766,
 '1st_greatest_$Q_{4}$': 0.3807429,
 '1st_least_$Q_{4}$': 0.28641072,
 '10th_least_$Q_{4}$': 0.28641072,
 '10th_greatest_$Q_{6}$': 0.22060224,
 '1st_greatest_$Q_{6}$': 0.25863975,
 '1st_least_$Q_{6}$': 0.110485375,
 '10th_least_$Q_{6}$': 0.110485375,
 '10th_greatest_$Q_{8}$': 0.59512764,
 '1st_greatest_$Q_{8}$': 0.6035463,
 '1st_least_$Q_{8}$': 0.5609913,
 '10th_least_$Q_{8}$': 0.5609913,
 '10th_greatest_$Q_{10}$': 0.2726314,
 '1st_greatest_$Q_{10}$': 0.30271238,
 '1st_least_$Q_{10}$': 0.1992841,
 '10th_least_$Q_{10}$': 0.19928412,
 '10th_greatest_$Q_{12}$': 0.43338743,
 '1st_greatest_$Q_{12}$': 0.44562435,
 '1st_least_$Q_{12}$': 0.4078147,
 '10th_least_$Q_{12}$': 0.40781474}

Data Collection#

To collect the data into a usable state, we can use the SignalAggregator class. This class takes a generator/pipeline object and, through its accumulate or compute methods, stores sequential frames of features. These can then be turned into a pandas.DataFrame object for further manipulation or event detection.
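Conceptually, the aggregation step amounts to stacking one feature dictionary per frame (like the output shown above) into the rows of a DataFrame. A minimal pandas sketch of that idea, not SignalAggregator's actual internals:

```python
import pandas as pd

# Minimal sketch of the aggregation idea (not SignalAggregator's actual
# internals): each pipeline call yields one dict of scalar features, and
# successive frames become successive rows of a DataFrame.
per_frame_features = [
    {"1st_greatest_$Q_{6}$": 0.26, "1st_least_$Q_{6}$": 0.11},  # frame 0
    {"1st_greatest_$Q_{6}$": 0.57, "1st_least_$Q_{6}$": 0.25},  # frame 1
]
df = pd.DataFrame(per_frame_features)  # one row per frame, one column per feature
```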

[5]:
signal_aggregator = du.data.aggregate.SignalAggregator(pipeline)

For this tutorial, we will use the accumulate method, which is a bit simpler to use than compute. We call accumulate with the arguments expected by the composed pipeline; each call stores a new frame of features in the SignalAggregator object. Therefore, we iterate over the trajectory frames in order. After computing the features for all frames, we call to_dataframe to get the DataFrame object. We then save the data to an HDF5 file for future use in the tutorial.

[6]:
with gsd.hoomd.open(FILENAME, "r") as traj:
    for frame in traj:
        signal_aggregator.accumulate(frame, nlist_kwargs)
df = signal_aggregator.to_dataframe()
df.to_hdf("./lj-data.h5", key="data")
display_dataframe(df)
  10th_greatest_$Q_{2}$ 1st_greatest_$Q_{2}$ 1st_least_$Q_{2}$ 10th_least_$Q_{2}$ 10th_greatest_$Q_{4}$ 1st_greatest_$Q_{4}$ 1st_least_$Q_{4}$ 10th_least_$Q_{4}$ 10th_greatest_$Q_{6}$ 1st_greatest_$Q_{6}$ 1st_least_$Q_{6}$ 10th_least_$Q_{6}$ 10th_greatest_$Q_{8}$ 1st_greatest_$Q_{8}$ 1st_least_$Q_{8}$ 10th_least_$Q_{8}$ 10th_greatest_$Q_{10}$ 1st_greatest_$Q_{10}$ 1st_least_$Q_{10}$ 10th_least_$Q_{10}$ 10th_greatest_$Q_{12}$ 1st_greatest_$Q_{12}$ 1st_least_$Q_{12}$ 10th_least_$Q_{12}$
0 0.204124 0.250000 0.000000 0.000000 0.369168 0.380743 0.286411 0.286411 0.220602 0.258640 0.110485 0.110485 0.595128 0.603546 0.560991 0.560991 0.272631 0.302712 0.199284 0.199284 0.433387 0.445624 0.407815 0.407815
1 0.153610 0.167962 0.023381 0.033380 0.214287 0.235337 0.056236 0.073473 0.525133 0.567725 0.251675 0.287110 0.370811 0.404888 0.136324 0.168037 0.302344 0.343929 0.120105 0.152464 0.397878 0.438995 0.194800 0.217982
2 0.162579 0.187238 0.014939 0.030309 0.210389 0.239093 0.053894 0.075834 0.528769 0.587941 0.196169 0.279068 0.378292 0.422634 0.129208 0.175510 0.310114 0.333017 0.134413 0.154003 0.393605 0.439719 0.170461 0.216085
3 0.154826 0.187591 0.023294 0.035020 0.209281 0.226303 0.048572 0.077595 0.543408 0.573807 0.246732 0.284698 0.374615 0.394863 0.137892 0.166574 0.308446 0.343068 0.121403 0.157507 0.396944 0.433905 0.209838 0.227965
4 0.145845 0.169722 0.016966 0.032802 0.216655 0.285643 0.033883 0.072493 0.548409 0.578489 0.214945 0.278768 0.371742 0.399764 0.139842 0.162539 0.309451 0.350324 0.118400 0.158816 0.405271 0.443005 0.194769 0.227046
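With the features in a DataFrame, standard pandas tools apply for further manipulation. As one hedged example, selecting only the Q6-derived columns with DataFrame.filter; the values here are made up, and only the column-name pattern mirrors the output above:

```python
import pandas as pd

# Sketch: selecting a subset of feature columns from an aggregated
# DataFrame with pandas' substring filter. The values are made up for
# illustration; only the column-name pattern matches the output above.
df = pd.DataFrame(
    {
        "1st_greatest_$Q_{6}$": [0.26, 0.57],
        "1st_least_$Q_{6}$": [0.11, 0.25],
        "1st_greatest_$Q_{12}$": [0.45, 0.44],
    }
)
q6_features = df.filter(like="Q_{6}")  # keeps only columns containing "Q_{6}"
```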

With that, we have completed the fourth step of the event detection workflow: aggregate. Next, we move on to the transform and detect steps.