Collecting Data#
Outline#
Questions#
How do I use a pipeline to transform a frame from a trajectory into features?
How do I collect the entire trajectory’s frames into a feature signal?
Objectives#
Explain how to use the SignalAggregator class to collect generator data into a pandas DataFrame object.
Show passing arguments to a Generator and SignalAggregator object.
Import#
[1]:
import os
import freud # analysis toolkit
import gsd.hoomd # trajectory reader
import dupin as du
FILENAME = "lj-sim.gsd"
[2]:
def display_dataframe(df):
    style = df.head().style
    style.set_table_styles(
        [
            {
                "selector": "th",
                "props": "background-color: #666666; color: #ffffff; border: 1px solid #222222;",
            },
            {
                "selector": "td",
                "props": "background-color: #666666; color: #ffffff; border: 1px solid #222222;",
            },
        ]
    )
    display(style)
Create the generator#
We use the generator from the last section of the tutorial.
[3]:
ls = (2, 4, 6, 8, 10, 12)
steinhardt = freud.order.Steinhardt(l=ls)
pipeline = du.data.freud.FreudDescriptor(
    compute=steinhardt, attrs={"particle_order": [f"$Q_{{{i}}}$" for i in ls]}
).pipe(du.data.reduce.NthGreatest((-1, 1, 10, -10)))
Calling the Generator#
Passing Arguments#
Most of the time, we need to pass arguments to a generator object. We have usually also built a multi-step pipeline whose steps may require their own arguments or accept none. The full range of argument-handling options in a pipeline is beyond the scope of this tutorial, but in general adding elements to the pipeline does not change the expected signature. That is, we can simply pass the arguments that the original generator expects.
For our generator, the arguments are a system-like object (a freud concept) and neighbor query arguments that specify which local neighbors of each point to consider. One such system-like object is a gsd.hoomd.Frame object, which we use below to showcase the pipeline defined above.
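The idea that piping does not change the call signature can be illustrated with a toy sketch in plain Python. Everything here (ToyGenerator, ToyPipeline, toy_max, and the scale argument) is hypothetical and is not how dupin implements its pipeline; it only shows a wrapper that forwards whatever it receives to the first stage.

```python
class ToyGenerator:
    """Hypothetical stand-in for a generator; not part of dupin."""

    def __call__(self, values, scale=1.0):
        # Produce one named per-particle feature from the inputs.
        return {"scaled": [v * scale for v in values]}

    def pipe(self, step):
        # Adding a step returns a wrapper around this generator.
        return ToyPipeline(self, step)


class ToyPipeline:
    """Hypothetical wrapper that forwards call arguments to the first stage."""

    def __init__(self, generator, step):
        self.generator = generator
        self.step = step

    def __call__(self, *args, **kwargs):
        # The pipeline accepts exactly what the original generator accepts.
        features = self.generator(*args, **kwargs)
        return self.step(features)


def toy_max(features):
    # Reduce each per-particle list to a single number.
    return {f"max_{name}": max(vals) for name, vals in features.items()}


toy_pipeline = ToyGenerator().pipe(toy_max)
result = toy_pipeline([1.0, 3.0, 2.0], scale=2.0)
print(result)
```

The caller of toy_pipeline never needs to know that a reduction step was appended; it passes the same arguments it would pass to ToyGenerator directly.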
[4]:
nlist_kwargs = {"num_neighbors": 12, "exclude_ii": True}
with gsd.hoomd.open(FILENAME, "rb") as traj:
    display(pipeline(traj[0], nlist_kwargs))
{'10th_greatest_$Q_{2}$': 0.20412414,
'1st_greatest_$Q_{2}$': 0.25000003,
'1st_least_$Q_{2}$': 2.1715324e-07,
'10th_least_$Q_{2}$': 2.2129704e-07,
'10th_greatest_$Q_{4}$': 0.36916766,
'1st_greatest_$Q_{4}$': 0.3807429,
'1st_least_$Q_{4}$': 0.28641072,
'10th_least_$Q_{4}$': 0.28641072,
'10th_greatest_$Q_{6}$': 0.22060224,
'1st_greatest_$Q_{6}$': 0.25863975,
'1st_least_$Q_{6}$': 0.110485375,
'10th_least_$Q_{6}$': 0.110485375,
'10th_greatest_$Q_{8}$': 0.59512764,
'1st_greatest_$Q_{8}$': 0.6035463,
'1st_least_$Q_{8}$': 0.5609913,
'10th_least_$Q_{8}$': 0.5609913,
'10th_greatest_$Q_{10}$': 0.2726314,
'1st_greatest_$Q_{10}$': 0.30271238,
'1st_least_$Q_{10}$': 0.1992841,
'10th_least_$Q_{10}$': 0.19928412,
'10th_greatest_$Q_{12}$': 0.43338743,
'1st_greatest_$Q_{12}$': 0.44562435,
'1st_least_$Q_{12}$': 0.4078147,
'10th_least_$Q_{12}$': 0.40781474}
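The keys in the output above suggest how the reduction step turns a per-particle array into a handful of numbers. The following is a minimal NumPy sketch of that idea, assuming that a positive index n selects the n-th greatest value and a negative index selects the n-th least, which is consistent with the key names shown. The function toy_nth_greatest and its key format are hypothetical and are not dupin's implementation.

```python
import numpy as np


def toy_nth_greatest(values, indices):
    """Hypothetical sketch of an nth-greatest / nth-least reduction."""
    ordered = np.sort(values)  # ascending order
    out = {}
    for n in indices:
        if n > 0:
            # n-th greatest: count back from the top of the sorted array.
            out[f"{n}_greatest"] = float(ordered[-n])
        else:
            # n-th least: count forward from the bottom of the sorted array.
            out[f"{-n}_least"] = float(ordered[-n - 1])
    return out


per_particle = np.arange(1.0, 21.0)  # toy data: 1.0, 2.0, ..., 20.0
reduced = toy_nth_greatest(per_particle, (-1, 1, 10, -10))
print(reduced)
```

Each per-particle quantity therefore contributes four scalar features per frame, which is why the six Steinhardt order parameters above yield 24 entries.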
Data Collection#
To collect the data into a usable state, we can use the SignalAggregator class. This class takes a generator/pipeline object and, through its accumulate or compute methods, stores the features of sequential frames. These can then be turned into a pandas.DataFrame object for further manipulation or event detection.
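The accumulate-then-convert pattern can be sketched with plain pandas. ToyAggregator and toy_features below are hypothetical names used only for illustration; they mimic the pattern described here and are not dupin's SignalAggregator.

```python
import pandas as pd


class ToyAggregator:
    """Hypothetical; illustrates the pattern only, not dupin's implementation."""

    def __init__(self, generator):
        self.generator = generator
        self.rows = []  # one dict of features per frame

    def accumulate(self, *args, **kwargs):
        # Forward the arguments to the generator and store the result.
        self.rows.append(self.generator(*args, **kwargs))

    def to_dataframe(self):
        # Each stored dict becomes a row; dict keys become columns.
        return pd.DataFrame(self.rows)


def toy_features(frame):
    # Toy "generator": two scalar features per frame.
    return {"min": min(frame), "max": max(frame)}


toy_aggregator = ToyAggregator(toy_features)
for toy_frame in ([1.0, 2.0], [0.5, 3.0], [2.0, 2.5]):
    toy_aggregator.accumulate(toy_frame)
toy_df = toy_aggregator.to_dataframe()
print(toy_df)
```

The resulting frame has one row per accumulated frame and one column per feature, the same layout as the DataFrame produced later in this section.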
[5]:
signal_aggregator = du.data.aggregate.SignalAggregator(pipeline)
For this tutorial, we use the accumulate method, which is a bit simpler to use than compute. We call accumulate with the arguments expected by the composed pipeline, and each call stores a new frame of features in the SignalAggregator object. We therefore iterate over the trajectory frames in order. After computing the features for all frames, we call to_dataframe to get the DataFrame object. We then save the data to an HDF5 file (for future use in the tutorial).
[6]:
with gsd.hoomd.open(FILENAME, "rb") as traj:
    for frame in traj:
        signal_aggregator.accumulate(frame, nlist_kwargs)
df = signal_aggregator.to_dataframe()
# Pass the key by keyword; newer pandas versions no longer accept it positionally.
df.to_hdf("./lj-data.h5", key="data")
display_dataframe(df)
 | 10th_greatest_$Q_{2}$ | 1st_greatest_$Q_{2}$ | 1st_least_$Q_{2}$ | 10th_least_$Q_{2}$ | 10th_greatest_$Q_{4}$ | 1st_greatest_$Q_{4}$ | 1st_least_$Q_{4}$ | 10th_least_$Q_{4}$ | 10th_greatest_$Q_{6}$ | 1st_greatest_$Q_{6}$ | 1st_least_$Q_{6}$ | 10th_least_$Q_{6}$ | 10th_greatest_$Q_{8}$ | 1st_greatest_$Q_{8}$ | 1st_least_$Q_{8}$ | 10th_least_$Q_{8}$ | 10th_greatest_$Q_{10}$ | 1st_greatest_$Q_{10}$ | 1st_least_$Q_{10}$ | 10th_least_$Q_{10}$ | 10th_greatest_$Q_{12}$ | 1st_greatest_$Q_{12}$ | 1st_least_$Q_{12}$ | 10th_least_$Q_{12}$ |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0.204124 | 0.250000 | 0.000000 | 0.000000 | 0.369168 | 0.380743 | 0.286411 | 0.286411 | 0.220602 | 0.258640 | 0.110485 | 0.110485 | 0.595128 | 0.603546 | 0.560991 | 0.560991 | 0.272631 | 0.302712 | 0.199284 | 0.199284 | 0.433387 | 0.445624 | 0.407815 | 0.407815 |
1 | 0.153610 | 0.167962 | 0.023381 | 0.033380 | 0.214287 | 0.235337 | 0.056236 | 0.073473 | 0.525133 | 0.567725 | 0.251675 | 0.287110 | 0.370811 | 0.404888 | 0.136324 | 0.168037 | 0.302344 | 0.343929 | 0.120105 | 0.152464 | 0.397878 | 0.438995 | 0.194800 | 0.217982 |
2 | 0.162579 | 0.187238 | 0.014939 | 0.030309 | 0.210389 | 0.239093 | 0.053894 | 0.075834 | 0.528769 | 0.587941 | 0.196169 | 0.279068 | 0.378292 | 0.422634 | 0.129208 | 0.175510 | 0.310114 | 0.333017 | 0.134413 | 0.154003 | 0.393605 | 0.439719 | 0.170461 | 0.216085 |
3 | 0.154826 | 0.187591 | 0.023294 | 0.035020 | 0.209281 | 0.226303 | 0.048572 | 0.077595 | 0.543408 | 0.573807 | 0.246732 | 0.284698 | 0.374615 | 0.394863 | 0.137892 | 0.166574 | 0.308446 | 0.343068 | 0.121403 | 0.157507 | 0.396944 | 0.433905 | 0.209838 | 0.227965 |
4 | 0.145845 | 0.169722 | 0.016966 | 0.032802 | 0.216655 | 0.285643 | 0.033883 | 0.072493 | 0.548409 | 0.578489 | 0.214945 | 0.278768 | 0.371742 | 0.399764 | 0.139842 | 0.162539 | 0.309451 | 0.350324 | 0.118400 | 0.158816 | 0.405271 | 0.443005 | 0.194769 | 0.227046 |
We have now completed the event detection workflow through its fourth step, aggregate. Next, we move on to the transform and detect steps.