Machine-Learning Based Stacked Ensemble Model for Accurate

machine-learning (ML) based stacked ensemble model (SEM) with MD simulation, for the first time. As a proof-of-concept, two SEMs were developed to ana...
0 downloads 0 Views 2MB Size
Subscriber access provided by ALBRIGHT COLLEGE

Article

Machine-Learning Based Stacked Ensemble Model for Accurate Analysis of Molecular Dynamics Simulations Samrendra Kumar Singh, Karteek K. Bejagam, Yaxin An, and Sanket A Deshmukh J. Phys. Chem. A, Just Accepted Manuscript • DOI: 10.1021/acs.jpca.9b03420 • Publication Date (Web): 31 May 2019 Downloaded from http://pubs.acs.org on June 1, 2019

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 21 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

Machine-Learning Based Stacked Ensemble Model for Accurate Analysis of Molecular Dynamics Simulations Samrendra K. Singha, Karteek K. Bejagamb, Yaxin Anb, Sanket A. Deshmukhb* a

b

CNH Industrial, Burr Ridge, Illinois 60527, United States

Department of Chemical Engineering, Virginia Tech, Blacksburg, Virginia 24061, United States

AUTHOR INFORMATION Corresponding Author *[email protected], Phone: +1 540-231-8785

Abstract Accurate, faster, and on-the-fly analysis of the molecular dynamics (MD) simulation trajectory becomes very critical during the discovery of new materials or while developing force-field parameters due to automated nature of these processes. Here to overcome the drawbacks of algorithm based analysis approaches, we have developed and utilized an approach that integrates machine-learning (ML) based stacked ensemble model (SEM) with MD simulation, for the first time. As a proof-of-concept, two SEMs were developed to analyze two dynamical properties of a water droplet, its contact angle, and hydrogen bonds. The two SEMs consisted of two layered networks of random forest, artificial neural network, support vector regression, Kernel ridge regression, and k-Nearest Neighbors ML models. The root mean square error values, uncertainty quantification, and sensitivity analysis of both the SEMs suggested that the final result was more accurate as compared to that of the individual ML models. This new computational framework is very general, robust, and has a huge potential in analyzing large size MD simulation trajectories as it can capture critical information very accurately. 1

ACS Paragon Plus Environment

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

1. Introduction Accuracy of the results of molecular dynamics (MD) simulations, depends solely on the force-field (FF) parameters, and the precision of the results is determined by the algorithms and tools used to analyze MD simulation trajectories.1–3 In recent years, MD simulation have been integrated with optimization algorithms and machine-learning (ML) methods to accelerate the development of FF parameters and design of new materials.4–9 The ability of the algorithms/methods to perform accurate analysis becomes very critical during materials search or FF optimization because the input values for the next iteration of the optimization cycle are determined by evaluating the input sets and corresponding results in the current iterations. 7,10 Typically, these analyses are performed by using computer codes that are developed based on certain algorithms, which may have their own limitations. A wrong estimate of a desired property from MD simulations can sabotage the entire optimization process. Hence, it is important to develop methods/tools that are accurate, computationally efficient, and have the ability to perform on-the-fly analysis of the MD simulation trajectories during optimization process. Machine-learning (ML) methods have been used to predict and/or analyze molecular simulations as they are not limited by the accuracy of an algorithm.5,11–17 The ML methods heavily rely on the amount, diversity and integrity of the training data, features used to train the models, and hyperparameters used to construct the model.18–20 To develop, supervised ML models, identifying the unique and important features related to a specific property of materials is the most important and challenging task.21 Typically, 75 % to 90 % of the total data that represent the input features and related properties is used to train an ML model. While remaining data is used to test the predictability of a model. To the best of our knowledge, in the field of materials science, to predict the properties of a materials, in almost all the reported studies, only one type of ML model has been developed and used.18 Often relying on the predictions of only one ML model may not be ideal, especially during materials discovery or FF optimization. Hence, there is a pressing need to develop more robust, accurate, and reliable ML based approaches. Moreover, it has been predicted that by 2022 MD simulations may generate up to petabytes of data, which will constitute a valuable source of information.22 Hence,

2

ACS Paragon Plus Environment

Page 2 of 21

Page 3 of 21 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

developing ML approaches that can be trained with such big data to perform accurate predictions is very much desirable. Here, we present a novel computational framework that integrates an ML based stacked ensemble model (SEM) with MD simulation trajectories to perform accurate analysis.23,24 The SEM has several interconnected layers, where each layer consists of a number of different ML models. The ML models of the first layer receive the features and properties as input and output predictions, respectively. The predicted properties from the ML models in the first layer are combined and fed as input to train the ML models of the second layer, to enrich the predictions further.25 The predictions from every ML model have certain error associated with it, which depends on the inherent design of the ML model, hyper-parameters chosen for the model, and the complex relationship between the input features and the output. This process of stacking ML models in one or more layers lead to enrichment of the overall prediction accuracy.25 2. Methods Here, as a proof-of-concept two SEMs were developed to analyze an MD simulation trajectory for the following two dynamic properties of a water droplet: (i) its contact angle, and (ii) hydrogen-bond network in a water droplet. The SEMs developed in the present study consisted of two layers and following five types of ML models: random forest (RF),26 artificial neural network (ANN),7,27 support vector regression (SVR),28 Kernel ridge regression (KRR),29 and k-Nearest Neighbors (k-NN)30. The contact angle SEM was trained by using an ideal, equilibrated water droplet obtained from MD simulation. A novel set of input features describing the geometry of a droplet were created, and corresponding contact angle was used as an output feature. The hydrogen bond SEM was trained by using the MD simulation trajectory of a water droplet on a hexagonal boron nitride (BN) sheet. For this hydrogen bond SEM, the position of water molecules forming hydrogen bonds and its hydrogen bond status (i.e. presence/absence of a hydrogen bond) were used as the input and output features, respectively. To the best of our knowledge this contribution demonstrates the first attempt to analyze dynamic properties from MD simulation trajectories by using such an ML based SEM approach. The sensitivity analysis suggests that the predictions of the individual model and thus, the SEM depends on the size of the data. The uncertainty quantification to evaluate the performance of

3

ACS Paragon Plus Environment

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

the SEM shows that the final predictions of the SEM are more accurate as compared to individual ML models. Fig. 1 shows the architecture of the SEM developed in this present study, and overall process of the model training and result predictions. The first layer of SEM was constructed by combining two models of RF, ANN, SVE, KRR, and KNN, total of ten models. Details on the architecture and hyperparameters on each of these models are provided in the Supporting Information, Section S1, and Table S1. The second layer had two RF models, which used output from the first layer as an input. The final results were obtained by averaging the outputs from the second layer.

Fig. 1: The workflow of the stacked ensemble models (SEMs) developed in the present study. 3. Results and Discussion 3.1. SEM for Predicting Contact Angle 3.1.1. Data Generation for the Training of the SEM for Contact Angle In order to develop a robust SEM that is capable of maintaining high level of accuracy, under a wide range of inputs, it is important to train the ML models with a large set of high fidelity data.19 Specifically, this training data must cover a large range of feature landscape. Here

4

ACS Paragon Plus Environment

Page 4 of 21

Page 5 of 21 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

to create such a training data, a spherical water droplet with ~200 Å diameter (~150,000 SPC/Fw molecules) was created.31–33 This water droplet was further equilibrated by performing MD simulation for 1 ns. More details on the equilibration of the droplet are given in the Supporting Information Section S2. The last two equilibrated frames of the water droplet obtained from the simulation were used to create 15,762 feature-droplets. Specifically, a larger droplet from each frame, with ~200 Å diameter, was used to carve out 71 full spherical sub-droplets with radius ranging from 30 Å to 100 Å with 1 Å increments Fig. 2 (a). To mimic a water droplet on a planar surface, each of these full spherical sub-droplets were then sliced using a cut-off plane in Z-direction as shown in Fig. 2 (b - e), to create feature-droplets. The cut-off plane was moved in Z-direction by 1 Å steps to create new feature-droplets with different contact angles from each of the full spherical sub-droplets. To generate droplets that are observed in realistic hydrophilic and hydrophobic surfaces, we ensured that the cut-off plane position ranged from (10 Å - R Å) to (R Å - 10 Å) from the center of the sphere. Here, R represents the radius of the spherical subdroplet. As expected larger full spherical sub-droplet with 100 Å radius yielded 181 featuredroplets as compared to 41 feature-droplets generated from a smaller full spherical sub-droplet of 30 Å radius. Thus, by using all the 71 full spherical sub-droplets, we were able to generate a total of 7,881 feature-droplets for each frame (total of 15,762 feature-droplets). The features extracted from these feature-droplets and their corresponding contact angles were used to train and validate the ML models, which is sufficiently large dataset to train a high-fidelity model.

5

ACS Paragon Plus Environment

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Fig. 2: Feature generation and training of SEM from ideal water droplet (a) Original droplet with 200 Å diameter (~150,000 SPC/Fw water molecules). (b, c) Examples of two droplets created from 80 Å radius sphere. (d, e) Example of two droplets created from 50 Å radius spheres, for training of the SEM. (f) Schematic of input feature generation for the individual ML models in the SEM. Each row on the grid represent one feature, thus resulting in a feature set of 50 elements. We used only 19 features (3rd to 21st) as an input to our SEM out of 50, where the numbering index starts from the plane of the substrate. (g) 10 ML models of the Layer 1 were trained by using ten unique datasets from total of 13 datasets. (h) Training strategy for one of the two ML models, namely RF#3 model from Layer 2 is shown. RF#4 from Layer 2 was trained using twelfth dataset in similar manner. 3.1.2. Input Feature Engineering from a Feature-Droplet: Predictive power of an ML model strongly depends on the selected input features. Generation and selection of these input features is one of the most basic and challenging steps for applications of ML models. In the present study, we have developed a set of novel input features that are transferable to a water droplet placed on any hydrophilic or hydrophobic surface. Moreover, these new features can be used to represent a water droplet of any size, represented by

6

ACS Paragon Plus Environment

Page 6 of 21

Page 7 of 21 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

any type of commonly used atomistic water models e.g. SPC, SPC/E, TIP3P etc. 34 Fig. 2 (f) shows the schematic of the feature development process that involved the mapping of positions of atoms from a 3-D droplet on a 2-D grid, and calculating the number density of water molecules in every cell in the grid. Specifically, from a feature-droplet, only the oxygen of water molecules were used for feature creation. The 3-D coordinates (x, y, z) of the oxygen atoms of the water were reduced to a 2-D coordinate (r, Z) system, where r is the radial distance of an atom from the Z-axis. The Z-axis is a vector normal to the substrate plane passing through the center of the droplet as shown in the Fig. 2 (f). The (r, Z) plane was divided into 50 x 50 square grid, which could be mapped on all the feature-droplets generated in the present study. The 2 Å spacing in the grid generated cells with optimum oxygen number density. For every cell in the grid the number density of the oxygen atoms was calculated by dividing their number with the volume occupied by the corresponding water molecules. The projection of a 2-D cell in (r, Z) space looks like a torus shaped region in 3-D space as shown in Fig. 2 (f). The number density values across the rows were then added, yielding a feature vector of 50 elements, which was normalized by dividing every element with the maximum value. This allowed us to generate features that can be used to predict the contact angle of water on a given hydrophilic or hydrophobic surface, irrespective of its size. One can fit a circle using the points representing the water-air interface mapped on a 2D plane, with more than 4-5 points. The circle fit improves with the number of points before becoming insensitive to them. In other words, after a certain number of points, the additional points become redundant for a fitting a circle. For the training of the SEM model as discussed in detail in the following section, only 19 elements were utilized from this 50 element vector. Specifically, the elements in the first two rows of the grid near the water-surface interface were ignored. We observed that 19 elements were sufficient to represent the shape of the droplet. Thus, the 19 elements from 3rd to 21st were utilized as input features for the training of the SEM. 3.1.3. Calculations of Contact Angle of Feature-Droplets The contact angle of a feature-droplet was used as an output feature along with the corresponding input features to train individual ML models in the SEM. As shown in Fig. 2 (b e), the contact angle of a given feature-droplet was calculated based on the radius of the sphere encompassing the droplet (R), and the distance of the cut-off plane from the center of the sphere (H). Water molecules at the interface of the cut-off plane and the droplet were kept or removed 7

ACS Paragon Plus Environment

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 8 of 21

from the droplet by considering the position of an oxygen atom with respect to the cut-off plane. In a typical MD simulation of a water droplet on a hydrophilic or hydrophobic substrate, the distance between a water layer in direct contact with the substrate and top surface of the substrate could be of ~3.0 Å to ~4.0 Å. Here, to reproduce this, we assumed a substrate plane to be ~3.5 Å below the cut-off plane for contact angle calculations. The contact angle ϴ for each of the feature-droplets was calculated by using the equation below: ϴ = 90 - φ

……Eq. 1

φ = cos-1[(H - 3.5)/R]

……Eq. 2

H is the distance of the cut-off plane from the center of the sphere, and it is positive in the positive z-direction and negative in the negative z-direction. ϴ and φ are shown in Fig 2 (b - e), The angle calculated by using Eq. 1 is referred as an “Actual Angle” in Fig. 3 3.1.4. Training of the Contact Angle SEM To train the SEM, a water droplet from the MD simulation trajectory is converted into input and output features as discussed above. The number of water molecules and 19 features of a droplet are combined to create a feature vector of size 1 x 20. The feature vector is fed into every ML model in the Layer 1, and the contact angles predicted by each ML model is combined to form a 1 x 10 feature vector. Therefore, the output of the Layer 1 is a vector with 10 contact angle values as predicted by individual ML models. Individual ML models in the Layer 2 are trained to reduce the prediction errors made by the ML models in the Layer 1. This task is performed by using the already trained models in the Layer 1 to make predictions for a previously unused, unique training dataset. The contact angle predictions from the Layer 1 (10 contact angles, one from each ML model) are then used as the input for an ML model in Layer 2. A given ML model in Layer 2 is trained to make a better prediction for the contact angle based on the 10 predictions made by the first layer. This process is repeated for each ML model in the Layer 2. The two ML models in Layer 2 predict the two values of contact angles (one for each ML model). These two predictions were then averaged to get the final value of contact angle predicted by the SEM.

8

ACS Paragon Plus Environment

Page 9 of 21 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

In detail, the original dataset of 15,762 feature-droplets was split into typical 80:20 training and testing datasets. Thus, making it 12,000 data points for training and 3,762 data points for testing. The training dataset was further split into 13 equal size subsets, where 12 sets were used for training and 1 set for the validation. Each individual ML model in the SEM was trained on one of these independent training subsets of the original dataset as shown in Fig. 2 (g and h). This ensures the independence of the various ML models, which is essential for the high fidelity of the ensemble method. A droplet feature vector of 20 elements was used as input for the contact angle SEM. The first element of the input feature vector represents the number of water molecules in a feature-droplet. The remaining 19 elements were taken from the geometrical features of the droplet. The 19 droplet features were 3rd to 21st elements of the droplet features starting from the water-surface interface. In other words, the first two elements near the water-surface interface were ignored, as it is a common practice in contact angle calculations to ignore the region near water-surface interface (< 4 Å), due to inconsistent densities in this region.32,33,35 As shown in Fig. 2 (g), all ten ML models present in Layer 1 were trained with unique ten datasets of feature vectors of 20 elements each before training the two models in Layer 2. The ten trained ML models in Layer 1 were used to predict the contact angles for the eleventh dataset, and outputs from the ten ML models were combined to create an input vector, which was used to train the RF #3 model in Layer 2 (Fig. 2 (h)). Thus, the size of the input feature vector depends directly on the layer where a given ML model resides. Similarly, RF #4 model was trained using the twelfth dataset.

3.1.5. Results of Training of an SEM for Contact Angle Predictions Fig. 3 (a), and (b) show the prediction accuracy of the ten and two models used in Layer 1 and Layer 2 of the SEM, respectively. The averaged predictions of the two ML models in Layer 2 is also shown in Fig. 3 (c) as final predictions. We find a tight clustering of points about the main diagonal of the predicted angle vs actual angle plot. The accuracy is as follows: final prediction > Layer 2 > Layer 1. The increased tight clustering of points in final predictions as compared to the Layer 1 suggest that the ML models in the Layer 2 predict the contact angle more accurately. The root mean square error (RMSE) value for each model is shown in Section S3 of the Supporting Information (Fig. S6 (a), and Table S2), and it decreases for the final predictions as compared to the individual ML models in Layer 1 and Layer 2. The RMSE is

9

ACS Paragon Plus Environment

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

aggregation of the residuals, and can be used to evaluate the accuracy of the predictive model.36 A low RMSE value points to lower residuals, and hence higher prediction accuracy.

To further investigate the model performance in detail, the uncertainty quantification was performed using bootstrap sampling technique.37–39 In bootstrapping technique, a number of artificial datasets are generated by choosing points randomly from an existing dataset. It has been effectively used to study uncertainties in different ML models.40,41 The insets in Fig. 3 ( a - c) show the probability distribution histograms of the bootstrapping errors with 1000 measurements for each model. The black vertical line shows the mean value of the errors, and the region between two dotted red lines show the 95 % likelihood confidence interval. It can be seen that the mean of the model prediction error for the models in Layer 1 ranges from ~0.74 to ~2.06. Considering the mean of the bootstrap model prediction error, the 95 % confidence interval, and the RMSE, the final average predictions of the SEM performs better as compared to individual models in Layer 1 and Layer 2 (Fig. S6 (a), and (b) of the Supporting Information).

Fig. 3: Parity plot representing the training and testing of individual models in the SEM for contact angle predictions. Data shown in black dots on the diagonal of each model represents the

10

ACS Paragon Plus Environment

Page 10 of 21

Page 11 of 21 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

test data. The probability distributions of bootstrapping errors are shown in insets. The black lines in insets show the mean bootstrapping error and the region between two red lines show the 95 % confidence interval obtained from bootstrapping technique. (a) ML models in the Layer 1, (b) ML models in the Layer 2, (c) Final prediction of the SEM, and (d) Predictions of the contact angles. Contact angles estimated by the circular fit method vs the SEM method. The two angle values in the parenthesis, next to the inset droplet images, represent the angles calculated and predicted by the circular fit and SEM models, respectively. 3.1.6. Testing of the Contact Angle SEM on MD simulation Trajectories of a Water Droplet To further evaluate the reliability and accuracy of this new approach, we employed the contact angle SEM to estimate the contact angle of a water droplet on a hydrophilic and a hydrophobic surface. Here, we performed 93 MD simulations for a water droplet with 2000 molecules on a 2-D sheet. More details on the simulation methodology can be found in Section S4 of the Supporting Information. Interaction parameters between the 2-D sheet and water were tuned to generate hydrophilic and hydrophobic surfaces such that the contact angle ranged from ~30° to ~140° (Fig. S7 of the Supporting Information). Recently, we have developed an algorithm to calculate the contact angle of a water droplet on a surface, which was used to determine the values of contact angle of the water droplets generated in the present study.32,33 Note, we refer this value as angle calculated by “circular fit method”. The comparison of the angle calculated by circular fit method and the angle predicted by the SEM is shown in Fig. 3 (d). For angles predicted between ~25° to ~110°, both methods show similar values (difference < 10 %) of contact angles for 85 % of cases with average difference being ~5 %. We find that for angles > ~110° the average difference between circular fit methods and predicted contact angle from SEM further increases to ~11 %. To further identify the origin of this error and to evaluate which model is able to estimate the correct contact angle, we took a closer look at each snapshot. As shown in the insets in the Fig. 3 (d), we find that the contact angles predicted by the SEM are accurate as compared to the circular fit method. This further suggests that SEM can be used as a powerful tool to accurately analyze the MD simulation trajectories.

3.1.7 Sensitivity Analysis of the Contact Angle SEM To test the sensitivity of the SEM on the number of data points, we created eight different datasets with different sizes. Specifically, the SEM of contact angle was trained by using 1 %, 2 %, 5 %, 10 %, 20 %, 40 %, 60 %, and 80 % randomly selected feature-droplets from the total of 15,762. All the individual ML models in Layer 1 and Layer 2 of the SEM were trained by 11

ACS Paragon Plus Environment

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

using the approach described in Fig. 2 (g) and (h). The final predictions were obtained by averaging the results obtained from RF#3 and RF#4 in Layer 2. Fig. 4 (a - h) show the final predictions of the SEM for contact angle trained by using the datasets of different sizes. Results of the SEM trained by using 1 % to 80 % data are shown in Fig. S8 and Table S3 of the Supporting Information. For each of these instances, the training dataset was split into 13 equal parts, as explained earlier. Overall, we find that the accuracy of the final prediction improves significantly with the size of training dataset. As can be seen from Fig. 4 (i), the comparison of the results obtained for 10 % data (mean predictions = 0.66, RMSE = 1.044, and R 2= 0.9989) and with 100 % data (mean predictions = 0.64, RMSE = 0.94, and R2= 0.9991) shows very similar accuracy in predictions of the SEM. This suggests that the SEM is less sensitive when the size of the data is > 1200 training feature-droplets.

Fig. 4: Parity plot representing the training and testing of SEM for contact angle predictions with (a) 1 %, (b) 2 %, (c) 5 %, (d) 10 %, (e) 20 %, (f) 40 %, (g) 60 %, and (h) 80 % data. Data shown in black on each model represents the test data. Insets in Figures (a) to (h) show the uncertainty quantified by bootstrapping method. (i) the plot for the percentage (%) of data used for training and corresponding mean prediction error. It suggests the convergence of the mean error for the 10 % to 100 % dataset. The error bar on each data shows the 95 % confidence interval range. 3.2. SEM for Predicting Hydrogen Bonds 3.2.1. Data Generation for the Training of the SEM for Hydrogen Bonds To further validate the robustness of the SEM approach, we developed an SEM to analyze the hydrogen bonds present in water molecules in a droplet. Specifically, we performed MD simulation of an SPC/Fw water droplet with 2,000 molecules on a single BN sheet at 300 K

12

ACS Paragon Plus Environment

Page 12 of 21

Page 13 of 21 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

for ~2 ns.33 The BN sheet was represented by using the Tersoff potential.42 The interactions between SPC/Fw and BN sheet were modelled using the recently developed parameters to reproduce the macroscopic contact angle of water on hexagonal BN sheet.33 The simulation trajectory was stored after every 1 ps, which resulted in 1000 frames. All the MD simulation were performed using LAMMPS package.43 The BN sheet was infinite in X and Y-directions, which was implemented by periodic boundary conditions (PBC). In the Z-direction box size was 300 Å so as to avoid interactions of the droplet with its image and to create an air-water interface. Each water molecule’s hydrogen bond status (i.e. presence or absence of a hydrogen bond) was considered during this analysis. The hydrogen bond status for each water molecule was calculated using the conditions described in the Eq. 3. A total of 100,000 data points were used to train the SEM for hydrogen bonds. 3.2.2. Input Features for Hydrogen Bond SEM In the literature, various geometric and energetic criteria have been used to define a hydrogen bond between two water molecules.2,44,45 Here we have used the following geometric criteria to identify two hydrogen bonded water molecules: ROO ≤ 3.6 Å; ROH ≤ 2.45 Å; φ ≤ 30 Å

……Eq. 3

In Eq. 3, the ROO represent the distance between oxygen of a donor and acceptor atoms. The ROH, represents the distance between hydrogen atom of a donor molecule and oxygen atom of an acceptor molecule. The φ is the angle between oxygen of an acceptor molecule, oxygen of a donor molecule, and hydrogen of a donor molecule (Oacceptor….Odonor-Hdonor). Note, two water molecules that follow these geometric criteria are known to satisfy the hydrogen bond energetic criteria.2

In the present study, to generate the input features for the training of an SEM of hydrogen bonds, two water molecules with oxygen-oxygen distance within 3.6 Å were identified. Each water molecule present in the bulk had 4 to 8 neighboring molecules and had tetrahedron-like structure (Fig. 5 (b)). The molecule for which its neighbors were identified was placed at the center of a tetrahedron (referred as central molecule from here on), with its neighbors appearing at the vertices of the tetrahedron. The water molecules forming hydrogen bonds with each other, 13

ACS Paragon Plus Environment

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

closely followed the geometric criteria shown in Eq. 3. The individual ML models in Layer 1 and Layer 2 were trained by using the relative position of oxygen and hydrogen atoms (x, y, and z coordinates) of a neighboring water molecule that form hydrogen bonds with the central molecule. This relative position of a neighboring water molecule was calculated by placing the oxygen of the central molecule at the origin (0, 0, 0). Then the central atom was reoriented such that the vector connecting the oxygen atom to one of the hydrogen atoms was aligned with the xaxis. The normal vector to the water molecular plane was aligned with the y-axis. The neighboring water molecules were reoriented to retain their original position from the simulation trajectory with respect to the central water molecule. This relative position of the neighbor water molecule i.e. x, y, and z coordinates of oxygen and hydrogen atoms resulted in a feature with a vector of 9 elements. These features of a neighboring water molecule were used as input to train the SEM for hydrogen bond predictions. These features developed here are very general and can be used to define hydrogen bonds of a water droplet of any size, and on a hydrophilic or a hydrophobic surface. 3.2.3. Output Features for Hydrogen Bond SEM The hydrogen bond status i.e. presence or absence of a hydrogen bond between the central water molecule and its neighboring water molecules was used as the output feature to train the individual ML models in Layer 1 and Layer 2 of the SEM. The hydrogen bonds were identified by using the geometric criteria defined in Eq. 3. The output feature was set to 0 and 1 if the hydrogen bond was absent and present, respectively, between the two water molecules. 3.2.4. Results of Training of an SEM for Hydrogen Bond Predictions Here, to generate a high-quality diverse dataset, the 100,000 input features of SPC/Fw water molecules, and their corresponding hydrogen bond status were randomly divided into thirteen equal parts. The training of the individual models in SEM for hydrogen bonds was also trained by using the approach demonstrated in Fig. 2 (g), and (h). Briefly, 10 datasets were used to train individual models in Layer 1. The eleventh and twelfth dataset were used to train the RF#3, and RF#4 from Layer 2, respectively. The thirteenth dataset was used to test all the ML models in the SEM. The final predictions were estimated by averaging the results of RF#3, and RF#4 models. To evaluate the robustness of this SEM in predicting the accurate hydrogen bonds,

14

ACS Paragon Plus Environment

Page 14 of 21

Page 15 of 21 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

we utilized the water droplets with 2000 molecules of SPC/Fw, SPC, and SPC/E water models.33,46

Fig. 5 (d - f) shows the hydrogen bonds calculated by using the SEM (top panel), geometric criteria shown in Eq. 3 (middle panel), and difference between the results obtained from SEM and geometric criteria (lower panel). Overall, for all three water models, the SEM for hydrogen bond shows an excellent agreement with the results obtained from the geometric criteria shown in Eq. 3. The blue region in lower panel suggests that predictions of the ML model in the bulk region match very well with the geometric criteria. The hydrogen bonds predicted by SEM at the air-water and substrate-water interface are slightly higher (5 %) as compared to the geometric criteria. A possible explanation for the higher number of hydrogen bond predicted by the SEM in the interfacial region could be due to fact that the training set has significantly higher number of features corresponding to the water molecules in the bulk region. Hence, the SEM learning is slightly biased towards predicting the hydrogen bond for the water molecules in bulk region more accurately at the expense of interfacial water molecules.

15

ACS Paragon Plus Environment

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Fig. 5: Feature extraction for hydrogen bonds in water: (a) Two vectors O-H1 (black arrow) and the normal vector (yellow arrow) to the H1-O-H2 plane are shown, (b) Four possible hydrogen bonds (dotted lines) of a water molecule (at the center) with its neighboring water molecules, (c) The cartesian coordinates of the three atoms (9 features) of a hydrogen bonded water molecule used as input feature for the SEM are shown. Hydrogen bonds calculated in a water droplet represented by (d) SPC, (e) SPC/E, and (f) SPC/Fw models. Top and middle panel show the final predictions from SEM and calculated by using the geometric criteria shown in Eq. 3. The bottom panel shows the difference (Δ) between the hydrogen bonds predicted by SEM and calculated by geometric criteria. Fig. 5 (d - f) and Table S4 from Supporting Information show the results obtained from uncertainty quantification performed by using bootstrap sampling technique.39 The results were obtained by taking 1000 averages of 1000 re-sampled datasets from validation dataset. It can be seen that the average percentage error associated with the individual ML models is higher as

16

ACS Paragon Plus Environment

Page 16 of 21

Page 17 of 21 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

compared to that of the final output of the SEM. Moreover, the region that show the 95 % likelihood confidence interval also becomes narrow for the final predictions, suggesting that the accuracy of the SEM is further improved. We also evaluated the SEM with various statistical measures and training data size in supporting information Section S6 (Fig. S9 and S10, Table S4 and S5). These statistical measures suggest similar to SEM for contact angle, the prediction accuracy improved by stacking the ML models. The bootstrap average prediction error did not significantly change with increasing the training data size, but the 95 % confidence interval narrowed with a larger data size. The sensitivity analysis on the training data size showed that the mean error and 95 % confidence interval stabilized above ~40,000 training points (Fig. S11 and Table S6 in Supporting Information). 4. Conclusion A novel computational framework that employs stacked ensemble model (SEM), for the first time, to analyze the molecular dynamics (MD) simulation trajectories was developed. As a proof-of-concept, two SEMs were developed and used to analyze the contact angle of a water droplet, and hydrogen bonds in water molecules in a droplet. The SEMs had two layers of several different machine-learning (ML) models including, RF, ANN, SVR, k-NN, and KRR. To train the contact angle SEM, droplet size independent, unique features associated with the geometrical shape were developed. This SEM was trained to predict the contact angle of a water droplet for the corresponding input features. The uncertainty quantification using bootstrapping technique, and sensitivity of the SEM was also studied. The results of the mean of the bootstrap model prediction error, the 95 % confidence interval, and the RMSE suggests that the final predictions of the SEM are better as compared to individual models in the SEM. This SEM was also successfully used to predict the contact angle of a water droplet from MD simulation trajectory. The hydrogen bond SEM was trained by using the relative position of a water molecule forming a hydrogen bond with a given water molecule. Again, we find that the accuracy of the final prediction of the SEM was higher as compared to the predictions of the individual ML models in the SEM. Thus, the SEM approach proposed here is novel, general, and can be utilized to analyze other structural and dynamical properties of other solvents, materials, and biomolecules. Moreover, it can be extended to analyze trajectories/data obtained from other computational methods such as density functional theory, Monte Carlo simulations, and continuum calculations, to name a few. 17

ACS Paragon Plus Environment

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Supporting Information Details of machine-learning (ML) models, the method of generating 200 Å-diameter water droplet, evaluation of the performance of the stacked ensemble model (SEM), computational details of a water droplets on a 2-D sheet, sensitivity analysis of SEM.

Acknowledgments: Authors would like to acknowledge Advanced Research Computing (ARC), Virginia Tech for providing the computational resources. S.A.D. acknowledges the Start-up Funds from Virginia Tech. Authors are thankful to Prof. Luke Achenie for fruitful discussion on this work. References: (1) Bejagam, K. K.; Singh, S.; Deshmukh, S. A. Nanoparticle Activated and Directed Assembly of Graphene into a Nanoscroll. Carbon N. Y. 2018, 134, 43–52. (2) Deshmukh, S.; Mooney, D. A.; McDermott, T.; Kulkarni, S.; Don MacElroy, J. M. Molecular Modeling of Thermo-Responsive Hydrogels: Observation of Lower Critical Solution Temperature. Soft Matter 2009, 5 (7), 1514–1521. (3) Santiso, E.; Herdes, C.; Müller, E. On the Calculation of Solid-Fluid Contact Angles from Molecular Dynamics. Entropy 2013, 15 (9), 3734–3745. (4) An, Y.; Bejagam, K. K.; Deshmukh, S. A. Development of Transferable Nonbonded Interactions between Coarse-Grained Hydrocarbon and Water Models. J. Phys. Chem. B 2019. https://doi.org/10.1021/acs.jpcb.8b07990. (5) Bejagam, K. K.; An, Y.; Singh, S.; Deshmukh, S. A. Machine-Learning Enabled New Insights into the Coil-to-Globule Transition of Thermosensitive Polymers Using a CoarseGrained Model. J. Phys. Chem. Lett. 2018, 6480–6488. (6) An, Y.; Bejagam, K. K.; Deshmukh, S. A. Development of New Transferable CoarseGrained Models of Hydrocarbons. J. Phys. Chem. B 2018, 122 (28), 7143–7153. (7) Bejagam, K. K.; Singh, S.; An, Y.; Deshmukh, S. A. Machine-Learned Coarse-Grained Models. J. Phys. Chem. Lett. 2018, 9 (16), 4667–4672. (8) Patra, T. K.; Meenakshisundaram, V.; Hung, J.-H.; Simmons, D. S. Neural-Network-Biased Genetic Algorithms for Materials Design: Evolutionary Algorithms That Learn. ACS Comb. Sci. 2017, 19 (2), 96–107. (9) Bejagam, K. K.; Singh, S.; An, Y.; Berry, C.; Deshmukh, S. A. PSO-Assisted Development of New Transferable Coarse-Grained Water Models. J. Phys. Chem. B 2018, 123(4) 909-921 (10) Meenakshisundaram, V.; Hung, J.-H.; Patra, T. K.; Simmons, D. S. Designing SequenceSpecific Copolymer Compatibilizers Using a Molecular-Dynamics-Simulation-Based Genetic Algorithm. Macromolecules 2017, 50 (3), 1155–1166. (11) Gastegger, M.; Behler, J.; Marquetand, P. Machine Learning Molecular Dynamics for the Simulation of Infrared Spectra. Chem. Sci. 2017, 8 (10), 6924–6935. (12) Glazer, D. S.; Radmer, R. J.; Altman, R. B. Combining Molecular Dynamics and Machine Learning to Improve Protein Function Recognition. Pac. Symp. Biocomput. 2008, 332–343. 18

ACS Paragon Plus Environment

Page 18 of 21

Page 19 of 21 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

(13) Montavon, G.; Rupp, M.; Gobre, V.; Vazquez-Mayagoitia, A.; Hansen, K.; Tkatchenko, A.; Müller, K.-R.; von Lilienfeld, O. A. Machine Learning of Molecular Electronic Properties in Chemical Compound Space. New J. Phys. 2013, 15 (9), 095003. (14) Tribello, G. A.; Ceriotti, M.; Parrinello, M. Using Sketch-Map Coordinates to Analyze and Bias Molecular Dynamics Simulations. Proc. Natl. Acad. Sci. U. S. A. 2012, 109 (14), 5196– 5201. (15) Decherchi, S.; Berteotti, A.; Bottegoni, G.; Rocchia, W.; Cavalli, A. The Ligand Binding Mechanism to Purine Nucleoside Phosphorylase Elucidated via Molecular Dynamics and Machine Learning. Nat. Commun. 2015, 6, 6155. (16) Arhami, M.; Kamali, N.; Rajabi, M. M. Predicting Hourly Air Pollutant Levels Using Artificial Neural Networks Coupled with Uncertainty Analysis by Monte Carlo Simulations. Environ. Sci. Pollut. Res. Int. 2013, 20 (7), 4777–4789. (17) Jorge, M.; Fischer, M.; Gomes, J. R. B.; Siquet, C.; Santos, J. C.; Rodrigues, A. E. Accurate Model for Predicting Adsorption of Olefins and Paraffins on MOFs with Open Metal Sites. Ind. Eng. Chem. Res. 2014, 53 (40), 15475–15487. (18) Liu, Y.; Zhao, T.; Ju, W.; Shi, S. Materials Discovery and Design Using Machine Learning. Journal of Materiomics 2017, 3 (3), 159–177. (19) Millard, K.; Richardson, M. On the Importance of Training Data Sample Selection in Random Forest Image Classification: A Case Study in Peatland Ecosystem Mapping. Remote Sensing 2015, 7 (7), 8489–8515. (20) Singh, S.; Abbassi, H. 1D/3D Transient HVAC Thermal Modeling of an off-Highway Machinery Cabin Using CFD-ANN Hybrid Method. Appl. Therm. Eng. 2018, 135, 406–417. (21) Li, Z.; Ma, X.; Xin, H. Feature Engineering of Machine-Learning Chemisorption Models for Catalyst Design. Catal. Today 2017, 280, 232–238. (22) Pérez, A.; Martínez-Rosell, G.; De Fabritiis, G. Simulations Meet Machine Learning in Structural Biology. Curr. Opin. Struct. Biol. 2018, 49, 139–144. (23) Funda Güneş; Russ Wolfinger; Pei-Yi Tan Stacked Ensemble Models for Improved Prediction Accuracy, Static Analysis Symposium, 2017. (24) Sehgal, M. S. B.; Gondal, I.; Dooley, L. Stacked Regression Ensemble for Cancer Class Prediction. INDIN ’05. 2005 3rd IEEE International Conference on Industrial Informatics, 2005. (25) Wang, G.; Hao, J.; Ma, J.; Jiang, H. A Comparative Assessment of Ensemble Learning for Credit Scoring. Expert Syst. Appl. 2011, 38 (1), 223–230. (26) Breiman, L. Random Forests. Mach. Learn. 2001, 45 (1), 5–32. (27) Jain, A. K.; Mao, J.; Mohiuddin, K. M. Artificial Neural Networks: A Tutorial. Computer 1996, 29 (3), 31–44. (28) Suykens, J. A. K.; Vandewalle, J. Least Squares Support Vector Machine Classifiers. Neural Process. Letters 1999, 9 (3), 293–300. (29) An, S.; Liu, W.; Venkatesh, S. Face Recognition Using Kernel Ridge Regression. In 2007 IEEE Conference on Computer Vision and Pattern Recognition; 2007, 1–7. (30) Kramer, O. K-Nearest Neighbors. In Dimensionality Reduction with Unsupervised Nearest Neighbors; Kramer, O., Ed.; Springer Berlin Heidelberg: Berlin, Heidelberg, 2013, 13–23. (31) Wu, Y.; Tepper, H. L.; Voth, G. A. Flexible Simple Point-Charge Water Model with Improved Liquid-State Properties. J. Chem. Phys. 2006, 124 (2), 024503. (32) Bejagam, K. K.; Singh, S.; Deshmukh, S. A. Development of Non-Bonded Interaction Parameters between Graphene and Water Using Particle Swarm Optimization. J. Comput.

19

ACS Paragon Plus Environment

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Chem. 2017, 721-734 (33) Achari, P. F.; Bejagam, K. K.; Singh, S.; Deshmukh, S. A. Development of Non-Bonded Interaction Parameters between Hexagonal Boron-Nitride and Water. Comput. Mater. Sci. 2019, 161, 339–345. (34) Mark, P.; Nilsson, L. Structure and Dynamics of the TIP3P, SPC, and SPC/E Water Models at 298 K. J. Phys. Chem. A 2001, 105 (43), 9954–9960. (35) Werder, T.; Walther, J. H.; Jaffe, R. L.; Halicioglu, T.; Koumoutsakos, P. On the Water−Carbon Interaction for Use in Molecular Dynamics Simulations of Graphite and Carbon Nanotubes. J. Phys. Chem. B 2003, 107 (6), 1345–1352. (36) Géron, A. Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems; “O’Reilly Media, Inc.,” 2017. (37) Efron, B.; Tibshirani, R. J. An Introduction to the Bootstrap; CRC Press, 1994. (38) Davison, A. C.; Kuonen, D. An Introduction to the Bootstrap with Applications in R. Statistical computing & Statistical graphics newsletter 2002, 13 (1), 6–11. (39) Hirsch, R. M.; Archfield, S. A.; De Cicco, L. A. A Bootstrap Method for Estimating Uncertainty of Water Quality Trends. Environmental Modelling & Software 2015, 73, 148– 166. (40) Reich, Y.; Barai, S. V. Evaluating Machine Learning Models for Engineering Problems. Artificial Intelligence in Engineering 1999, 13 (3), 257–272. (41) Li, Z.; Omidvar, N.; Chin, W. S.; Robb, E.; Morris, A.; Achenie, L.; Xin, H. MachineLearning Energy Gaps of Porphyrins with Molecular Graph Representations. J. Phys. Chem. A 2018, 122 (18), 4571–4578. (42) Brenner, D. W.; Shenderova, O. A.; Harrison, J. A.; Stuart, S. J.; Ni, B.; Sinnott, S. B. A Second-Generation Reactive Empirical Bond Order (REBO) Potential Energy Expression for Hydrocarbons. J. Phys. Condens. Matter 2002, 14 (4), 783. (43) Plimpton, S.; Crozier, P.; Thompson, A. LAMMPS-Large-Scale Atomic/molecular Massively Parallel Simulator. Sandia National Laboratories 2007, 18. (44) Deshmukh, S. A.; Sankaranarayanan, S. K. R. S.; Suthar, K.; Mancini, D. C. Role of Solvation Dynamics and Local Ordering of Water in Inducing Conformational Transitions in poly(N-Isopropylacrylamide) Oligomers through the LCST. J. Phys. Chem. B 2012, 116 (9), 2651–2663. (45) Deshmukh, S.; Mooney, D. A.; MacElroy, J. Molecular Simulation Study of the Effect of Cross-Linker on the Properties of poly(N-Isopropyl Acrylamide) Hydrogel. Mol. Simul. 2011, 37 (10), 846–854. (46) Vega, C.; de Miguel, E. Surface Tension of the Most Popular Models of Water by Using the Test-Area Simulation Method. J. Chem. Phys. 2007, 126 (15), 154707.

20

ACS Paragon Plus Environment

Page 20 of 21

Page 21 of 21 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

TOC Graphics

21

ACS Paragon Plus Environment