
Molecular Mechanics

Configuration-Sampling-Based Surrogate Models for Rapid Parameterization of Non-bonded Interactions
Richard A. Messerly, Seyed Mostafa Razavi, and Michael R. Shirts
J. Chem. Theory Comput., Just Accepted Manuscript • DOI: 10.1021/acs.jctc.8b00223 • Publication Date (Web): 04 May 2018


Journal of Chemical Theory and Computation

Configuration-Sampling-Based Surrogate Models for Rapid Parameterization of Non-bonded Interactions

Richard A. Messerly,∗,† S. Mostafa Razavi,‡ and Michael R. Shirts¶

†Thermodynamics Research Center, National Institute of Standards and Technology, Boulder, Colorado, 80305
‡Department of Chemical and Biomolecular Engineering, The University of Akron, Akron, Ohio, 44325
¶Department of Chemical and Biological Engineering, University of Colorado, Boulder, Colorado, 80309

E-mail: [email protected]

Abstract

In this study, we present an approach for rapid force field parameterization and uncertainty quantification of the non-bonded interaction parameters for classical force fields. The accuracy of most thermophysical properties, and especially vapor-liquid equilibria (VLE), obtained from molecular simulation depends strongly on the non-bonded interactions. Traditionally, non-bonded interactions are parameterized to agree with macroscopic properties by performing large amounts of direct molecular simulation. Due to the computational cost of molecular simulation, surrogate models (i.e. efficient models that approximate direct molecular simulation results) are an essential tool for high-dimensional parameterization and uncertainty quantification of non-bonded interactions. The present study compares two different configuration-sampling-based surrogate models, namely, Multistate Bennett Acceptance Ratio (MBAR) and Pair Correlation Function Rescaling (PCFR). MBAR and PCFR are coupled with the Isothermal Isochoric (ITIC) thermodynamic integration method for estimating vapor-liquid saturation properties. We find that MBAR and PCFR are complementary in their roles. Specifically, PCFR is preferred when exploring distant regions of the parameter space while MBAR is better in the local domain.

Contribution of NIST, an agency of the United States government; not subject to copyright in the United States.

1 Introduction

Molecular simulation is an invaluable tool in many fields of science and engineering. One of its many purposes is the efficient prediction of thermophysical properties such as saturated liquid density ($\rho_l^{\rm sat}$), saturated vapor density ($\rho_v^{\rm sat}$), saturated vapor pressure ($P_v^{\rm sat}$), enthalpy of vaporization, surface tension, viscosity, etc. The quantitative reliability of the estimated property values depends almost entirely on the force field (i.e. the molecular model) employed in the molecular simulation. For this reason, force field development is an important area in molecular modeling. Since the intramolecular and electrostatic contributions to the force field are often parameterized with ab initio results, this study focuses on parameterizing the non-bonded (intermolecular) van der Waals interactions with vapor-liquid equilibria (VLE) properties.

Recently, several groups have developed united-atom (UA) based force fields for predicting vapor-liquid equilibria. Potoff and Bernard-Brunel 1 demonstrated that the Mie λ-6 potential (i.e. a three-parameter Lennard-Jones) provides considerable improvement at predicting both $\rho_l^{\rm sat}$ and $P_v^{\rm sat}$ as compared to UA LJ 12-6 models, such as TraPPE-UA 2 and NERD. 3 Subsequently, Hemmen and Gross 4 implemented the Mie λ-6 potential but introduced an additional fitting parameter by using anisotropic-united-atom (AUA) sites for terminal groups (TAMie). Although the Mie λ-6 potential has shown significant promise in providing highly accurate force fields for VLE, it should not be viewed as a panacea. In fact, the force fields developed by Shah et al. 5 (TraPPE-UA2) and Errington and Panagiotopoulos 6 for ethane appear to reproduce experimental VLE data just as reliably as the Potoff and TAMie models. Errington’s force field is a four-parameter AUA Exponential-6 model while TraPPE-UA2 is a three-parameter AUA LJ 12-6 model. In general, the increased accuracy of the Potoff, TAMie, Errington, and TraPPE-UA2 force fields compared to the TraPPE-UA and NERD models has come at the cost of additional model parameters.

Due to the increased accuracy and complexity (i.e. number of parameters) of modern force fields, sophisticated high-dimensional optimization, 7 multi-objective Pareto front, 8–10 and uncertainty quantification (UQ) methods 11–14 should play a key role in force field development. However, these methods are not tractable when molecular simulation is performed at each step of the algorithm, as this may necessitate $O(10^2)$ to $O(10^6)$ simulations. 15 For this reason, surrogate models are essential to render these methods computationally feasible.

In this context, “surrogate model” is an umbrella term that covers all methods that predict simulation results without requiring direct simulation using the given force field. The obvious benefit of a surrogate model is the reduction of computational cost. Typical surrogate models are a few orders of magnitude faster than direct simulation. For this reason, UQ and Pareto front studies rely heavily on surrogate models to replace molecular simulation. 8–14

The first type of surrogate model is a mathematical or statistical model (sometimes referred to as a meta-model 16–18 ). These require little or no understanding of the underlying physics or what each force field parameter means on a molecular level. Instead, they simply interpolate and smooth the simulation output in the corresponding parameter space. 11,14,19 Although meta-models are computationally cheap to evaluate, developing reliable meta-models can be an arduous task and may require a large number of molecular simulations. In addition, while these surrogate models are typically reliable for interpolation, extrapolation is dubious for certain model forms.

The second type of surrogate model is an analytic equation-of-state model. Equation-of-state models relate the force field parameters to different types of engineering equations-of-state. The simplest equation-of-state surrogate model is likely the corresponding states model. Typically, a corresponding states model is a correlation fit to reduced properties (simulation output scaled by the force field parameters). Examples of this type of surrogate model can be found for the single-site Lennard-Jones fluid (both with tail corrections and truncated 20–22 ) as well as the two-site Lennard-Jones plus point quadrupole. 23 More sophisticated equation-of-state surrogate models are PC-SAFT 24 and SAFT-γ. 25,26 The PC-SAFT surrogate model has been used extensively to develop the TAMie force field by relating the Mie parameters to parameters in the PC-SAFT equation-of-state. 4,27,28

In this study, we investigate a third type of surrogate model, namely, configuration-sampling-based surrogate models. Configuration-sampling-based surrogate models rely on atomic configurations that are sampled by simulating the reference force field(s) to predict observables for a non-simulated force field.
The primary assumption is that the reference force field(s) has a distribution of configurations similar to the desired force field. For this reason, the reference force field(s) plays a significant role in the accuracy of this class of surrogate models and must be chosen judiciously. A key advantage of configuration-sampling-based surrogate models is that they are compatible with any non-bonded functional form and can be used with both all-atom (AA) and coarse-grained (UA, AUA, etc.) force fields, whereas the SAFT-γ and PC-SAFT surrogate models are limited to coarse-grained models.

The two configuration-sampling-based surrogate models that we examine in this study are Multistate Bennett Acceptance Ratio (MBAR) and Pair Correlation Function Rescaling (PCFR). While MBAR is a well-established method, PCFR is a novel approach set forth in this study.

Section 2 discusses the methodology, starting with the force fields, the simulation conditions, data analysis, and surrogate model derivation. Section 3 compares the results for MBAR and PCFR. This comparison is made for simple systems, namely, united-atom representations of ethane, hexafluoroethane, propane, n-butane, and n-octane, but it is applicable to any compound and force field. Section 4 discusses recommendations and limitations regarding the implementation of these surrogate models. Section 4 also provides an algorithm for parameterization of non-bonded potentials, which is demonstrated in Section 5. Finally, Section 6 summarizes the conclusions from this work.

2 Methods

2.1 Force Field

We emphasize that the methodology proposed in Sections 2.2-2.3 is applicable to any force field. Specifically, the configuration-sampling-based surrogate models can be applied to united-atom (UA) or all-atom (AA) based force fields and to LJ 12-6, Mie λ-6, Exp-6, or any other non-bonded functional form. In this study, however, we focus on a specific subset of force fields, namely, UA Mie λ-6. This model type was selected as it has received significant attention in recent years for development of accurate hydrocarbon force fields. 1,4,29–31 The Mie λ-6 is a three-parameter non-bonded central potential of the form:

$$ u^{\rm vdw}(\epsilon, \sigma, \lambda; r) = \left(\frac{\lambda}{\lambda - 6}\right) \left(\frac{\lambda}{6}\right)^{\frac{6}{\lambda - 6}} \epsilon \left[ \left(\frac{\sigma}{r}\right)^{\lambda} - \left(\frac{\sigma}{r}\right)^{6} \right] \tag{1} $$

where $u^{\rm vdw}$ is the van der Waals interaction, $\sigma$ is the distance ($r$) where $u^{\rm vdw} = 0$, $-\epsilon$ is the energy of the potential at the minimum (i.e. $u^{\rm vdw} = -\epsilon$ and $\frac{\partial u^{\rm vdw}}{\partial r} = 0$ for $r = r_{\rm min}$), and $\lambda$ is the repulsive exponent. Note that the Mie potential reduces to the LJ 12-6 potential for $\lambda = 12$. Therefore, Equation 1 can be considered a generalized Lennard-Jones where the repulsive exponent is a parameter. Although an attractive exponent of 6 has a strong theoretical basis, $\lambda = 12$ is a historical artifact that was chosen primarily for computational purposes. 32 For the same reason (i.e. computational efficiency), a common practice is to use integer values of $\lambda$ in Equation 1.

Non-bonded interactions between two different site types (i.e. cross-interactions) are determined using Lorentz-Berthelot combining rules 32 for $\epsilon$ and $\sigma$ with an arithmetic mean for the repulsive exponent ($\lambda$) (as recommended by Potoff and Bernard-Brunel 1 ):

$$ \epsilon_{ij} = \sqrt{\epsilon_{ii} \epsilon_{jj}} \tag{2} $$

$$ \sigma_{ij} = \frac{\sigma_{ii} + \sigma_{jj}}{2} \tag{3} $$

$$ \lambda_{ij} = \frac{\lambda_{ii} + \lambda_{jj}}{2} \tag{4} $$

where the $ij$ subscript refers to cross-interactions and the subscripts $ii$ and $jj$ refer to same-site interactions. Section SI.I.3 of the Supporting Information provides the TraPPE-UA and Potoff non-bonded parameters for both same-site interactions and cross-interactions.

We use the same intramolecular potential as the TraPPE-UA and Potoff force fields, which was in large part adopted from the well-known Optimized Potential for Liquid Simulations (OPLS-UA) force field. 33,34 Specifically, the simulations performed in this study use fixed bond-lengths, a harmonic angular potential, and a Fourier series for the dihedral torsional interactions. To be consistent with these force fields, non-bonded interactions between united-atom sites in the same molecule are only included if they are separated by at least four neighboring sites (i.e. we exclude 1-2, 1-3, and 1-4 non-bonded intramolecular interactions). Section SI.I.3 of the Supporting Information provides the equations and parameters for intramolecular interactions.
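As an illustration of Equations 1-4, the following Python sketch evaluates the Mie λ-6 potential, the location of its minimum, and the combining rules. The function names and all numerical values in the usage lines are illustrative assumptions; they are not parameters from this study or its Supporting Information.

```python
import math


def mie_potential(r, epsilon, sigma, lam):
    """Mie lambda-6 pair energy (Equation 1); same energy units as epsilon."""
    prefactor = (lam / (lam - 6.0)) * (lam / 6.0) ** (6.0 / (lam - 6.0))
    return prefactor * epsilon * ((sigma / r) ** lam - (sigma / r) ** 6)


def mie_r_min(sigma, lam):
    """Distance where du/dr = 0, obtained by differentiating Equation 1."""
    return sigma * (lam / 6.0) ** (1.0 / (lam - 6.0))


def combine(eps_ii, sig_ii, lam_ii, eps_jj, sig_jj, lam_jj):
    """Cross-interaction parameters from Equations 2-4."""
    eps_ij = math.sqrt(eps_ii * eps_jj)   # geometric mean (Equation 2)
    sig_ij = 0.5 * (sig_ii + sig_jj)      # arithmetic mean (Equation 3)
    lam_ij = 0.5 * (lam_ii + lam_jj)      # arithmetic mean (Equation 4)
    return eps_ij, sig_ij, lam_ij


# Hypothetical site parameters (epsilon in K, sigma in nm), for illustration only.
eps_ij, sig_ij, lam_ij = combine(100.0, 0.30, 12.0, 64.0, 0.40, 16.0)
u_at_minimum = mie_potential(mie_r_min(sig_ij, lam_ij), eps_ij, sig_ij, lam_ij)
```

By construction, `u_at_minimum` should equal `-eps_ij` to within round-off, which is a convenient check that the prefactor in Equation 1 has been coded correctly.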

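2.2 Isothermal Isochoric Thermodynamic Integration

In this study, isothermal isochoric (ITIC) thermodynamic integration 35 is used to determine $\rho_l^{\rm sat}$, $\rho_v^{\rm sat}$, and $P_v^{\rm sat}$ for each force field and molecule. The equations for ITIC are:

$$ \frac{A^{\rm dep}}{R_g T^{\rm sat}} = \int_0^{\rho_l^{\rm sat}} \left. \frac{Z - 1}{\rho} \, \partial\rho \right|_{T = T^{\rm IT}} + \int_{T^{\rm IT}}^{T^{\rm sat}} \left. U^{\rm dep} \, \partial\!\left(\frac{1}{R_g T}\right) \right|_{\rho = \rho_l^{\rm sat}} \tag{5} $$

$$ \rho_v^{\rm sat} \approx \rho_l^{\rm sat} \exp\left( \frac{A^{\rm dep}}{R_g T^{\rm sat}} + Z_l^{\rm sat} - 1 - 2 B_2 \rho_v^{\rm sat} - \frac{3}{2} B_3 \left(\rho_v^{\rm sat}\right)^2 \right) \tag{6} $$

$$ P_v^{\rm sat} \approx \left( 1 + B_2 \rho_v^{\rm sat} + B_3 \left(\rho_v^{\rm sat}\right)^2 \right) \rho_v^{\rm sat} R_g T^{\rm sat} \tag{7} $$

$$ Z_l^{\rm sat} = \frac{P_v^{\rm sat}}{\rho_l^{\rm sat} R_g T^{\rm sat}} \tag{8} $$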

where $A^{\rm dep} \equiv A - A^{\rm ig}$ is the Helmholtz free energy departure from ideal gas for temperature ($T$) equal to the saturation temperature ($T^{\rm sat}$) and density ($\rho$) equal to the saturated liquid density ($\rho_l^{\rm sat}$), $U^{\rm dep} \equiv U - U^{\rm ig}$ is the internal energy departure, $Z_l^{\rm sat}$ is the saturated liquid compressibility factor ($Z$), $B_2$ is the second virial coefficient, $B_3$ is the third virial coefficient, $T^{\rm IT}$ is the isothermal temperature, and $R_g$ is the universal gas constant. (Note that our definitions for $A^{\rm dep}$ and $U^{\rm dep}$ are slightly different than those used by Razavi, 35 where the energy departures are dimensionless, but are consistent with Elliott and Lira. 36 )

The ITIC equations are solved iteratively to ensure self-consistency. Specifically, the $Z_l^{\rm sat}$ value calculated in Equation 8 is then used to compute a new value of $T^{\rm sat}$ by interpolating the simulation results for $Z$ along the corresponding isochore. The $T^{\rm sat}$ value in Equation 5 is updated and $\frac{A^{\rm dep}}{R_g T^{\rm sat}}$ is recomputed. The new values for $Z_l^{\rm sat}$ and $\frac{A^{\rm dep}}{R_g T^{\rm sat}}$ are then used in Equation 6 to solve for $\rho_v^{\rm sat}$. Likewise, $P_v^{\rm sat}$ and, subsequently, $Z_l^{\rm sat}$ are recalculated using Equations 7 and 8, respectively. This process is repeated until the value of $Z_l^{\rm sat}$ (or alternatively $T^{\rm sat}$, $\rho_v^{\rm sat}$, or $P_v^{\rm sat}$) has converged to within a predefined tolerance.
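The inner part of this iteration (Equations 6-8) can be sketched as a fixed-point loop. The sketch below is a simplification that holds $T^{\rm sat}$ and $A^{\rm dep}/(R_g T^{\rm sat})$ fixed, i.e. it omits the re-interpolation of $T^{\rm sat}$ along the isochore and the re-evaluation of Equation 5. The function name, the convergence test, and the input numbers are assumptions made for illustration, not values from this study.

```python
import math

R_G = 8.314462618  # universal gas constant, J/(mol K)


def solve_itic_vapor(a_dep_over_rt, rho_l, t_sat, b2, b3, tol=1e-10, max_iter=500):
    """Iterate Equations 6-8 at fixed T_sat and A_dep/(R_g T_sat).

    Units assumed here: densities in mol/m^3, B2 in m^3/mol, B3 in m^6/mol^2,
    temperature in K, pressure returned in Pa.
    """
    rho_v, z_l = 0.0, 0.0  # initial guesses: dilute vapor, Z_l close to zero
    for _ in range(max_iter):
        # Equation 6, with the previous rho_v and Z_l on the right-hand side
        exponent = a_dep_over_rt + z_l - 1.0 - 2.0 * b2 * rho_v - 1.5 * b3 * rho_v ** 2
        rho_v_new = rho_l * math.exp(exponent)
        # Equation 7: truncated virial equation for the vapor pressure
        p_v = (1.0 + b2 * rho_v_new + b3 * rho_v_new ** 2) * rho_v_new * R_G * t_sat
        # Equation 8: saturated liquid compressibility factor
        z_l_new = p_v / (rho_l * R_G * t_sat)
        done = abs(rho_v_new - rho_v) <= tol * rho_v_new and abs(z_l_new - z_l) <= tol
        rho_v, z_l = rho_v_new, z_l_new
        if done:
            return rho_v, p_v, z_l
    raise RuntimeError("ITIC vapor-branch iteration did not converge")


# Made-up, roughly ethane-like inputs, used only to exercise the loop.
rho_v, p_v, z_l = solve_itic_vapor(-3.96, 17400.0, 200.0, -4.0e-4, 2.0e-8)
```

Simple substitution is used here because the right-hand side of Equation 6 depends only weakly on $\rho_v^{\rm sat}$ at low vapor densities; a root finder could be substituted if the loop converges slowly closer to the critical point.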

The $B_2$ and $B_3$ values found in Equations 6-7 can be determined in several different ways: using experimental data or correlations fit to data (e.g. REFPROP, 37 ThermoData Engine (TDE), 38 etc.), calculated with Mayer-sampling Monte Carlo, 39,40 or obtained by extrapolating low density simulation results. 35 In this study, we utilize the $B_2$ and $B_3$ values from REFPROP, primarily for simplicity. This decision was made after first validating that the ITIC $\rho_l^{\rm sat}$ and $P_v^{\rm sat}$ values for n-alkanes (calculated using REFPROP $B_2$ and $B_3$) are consistent with the literature values reported using Gibbs Ensemble Monte Carlo (GEMC) and Grand Canonical Monte Carlo (GCMC) for the TraPPE-UA and Potoff force fields, respectively 1,41 (see Section SI.II.3 of the Supporting Information). We also verified that the ITIC results for n-octane do not differ significantly when using Mayer-sampling Monte Carlo $B_2$ and $B_3$ values reported by Schultz and Kofke 39 instead of the REFPROP $B_2$ and $B_3$ values (see Section SI.II.3 of the Supporting Information).

ITIC requires on the order of 10 NVT (constant number of molecules, constant volume, constant temperature) simulations along a supercritical isotherm, i.e. $T^{\rm IT} > T_c$ (where $T_c$ is the experimental critical temperature). As recommended by Razavi, 35 we use a value around 1.2 for the isotherm reduced temperature ($T_r^{\rm IT} \equiv T^{\rm IT}/T_c$), i.e. $T^{\rm IT} \approx 1.2 T_c$. Two or three additional NVT simulations are performed along different isochores that intersect the saturated liquid curve ($\rho = \rho_l^{\rm sat}$). Thus, each of the ITIC state points corresponds to either a liquid or supercritical fluid. The specific state points simulated in this study are provided in Section SI.I.1 of the Supporting Information. Section SI.II.3 of the Supporting Information also contains an example of the ITIC data analysis.

One advantage of ITIC is that all the simulations are performed in the NVT ensemble. In fact, although ITIC requires roughly three simulations for each saturation temperature, the total simulation time is typically comparable to the traditional VLE methods, i.e. GEMC or GCMC. This is primarily because NVT systems converge quickly, as they do not require expensive particle insertion/deletion or volume fluctuation moves. NVT simulations should also foster reproducibility, as there are fewer user decisions that can introduce error (although the choice of thermostat can be important 42 ). By contrast, it was demonstrated recently that simulation practitioners struggled to generate reproducible results in the NPT ensemble. 43

Furthermore, the NVT ensemble is amenable to both molecular dynamics (MD) and Monte Carlo (MC) simulations. Thus, practitioners can implement ITIC with their preferred simulation software. We use GROMACS, 44 as it is an extremely fast, parallelized, graphics processing unit (GPU) optimized, open-source MD code. (Sample input (.mdp, .top, .gro) files are provided in Section SI.I.4 of the Supporting Information.)

Obtaining VLE properties from MD simulations has some advantages. For example, MD methods are ideal for highly branched compounds whereas traditional GEMC or GCMC methods may struggle to reach equilibrium due to the low acceptance rate of particle insertions. 45 For the same reason, GEMC and GCMC are typically limited to $T_r^{\rm sat} > 0.7$ or 0.6 (depending on the molecular structure and $\rho_l^{\rm sat}$) whereas ITIC can provide accurate VLE estimates for $T_r^{\rm sat} \approx 0.45$.
However, while GEMC and GCMC can be used for $T_r^{\rm sat} \approx 0.95$, one disadvantage of ITIC is that Equations 6-7 require higher-order virial expansion terms ($B_4$, $B_5$, etc.) for $T_r^{\rm sat} > 0.85$. 35 In this study, we use ITIC to obtain $\rho_l^{\rm sat}$, $\rho_v^{\rm sat}$, and $P_v^{\rm sat}$ for $0.45 < T_r^{\rm sat} < 0.85$.

The primary reason we employ ITIC is to calculate saturated properties ($\rho_l^{\rm sat}$, $\rho_v^{\rm sat}$, and $P_v^{\rm sat}$) from $U^{\rm dep}$ and $Z$ at specified state points ($\rho$-$T$). Internal energies and pressures (compressibility factors) are intimately related to the non-bonded interactions and the atomic configurations. Therefore, $\rho_l^{\rm sat}$, $\rho_v^{\rm sat}$, and $P_v^{\rm sat}$ can be predicted for any non-bonded interactions by combining configuration-sampling-based surrogate models for predicting $U^{\rm dep}$ and $Z$ with the ITIC analysis, Equations 5-8. Converting $U^{\rm dep}$ and $Z$ to $\rho_l^{\rm sat}$, $\rho_v^{\rm sat}$, and $P_v^{\rm sat}$ is important because large amounts of evaluated experimental VLE data are available through databases, such as the Thermodynamics Research Center (TRC) source database, whereas experimental data for $U^{\rm dep}$ and $Z$ are scarce.
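To make the role of the simulated $U^{\rm dep}$ and $Z$ values in Equation 5 concrete, the sketch below evaluates the two integrals by trapezoidal quadrature. The quadrature rule, the use of $B_2$ as the zero-density limit of $(Z-1)/\rho$, the function names, and the synthetic inputs are all assumptions for illustration; the data analysis actually used in this study is described in the Supporting Information.

```python
def trapezoid(x, y):
    """Composite trapezoidal rule for samples y(x); x need not be uniform."""
    return sum(0.5 * (y[i] + y[i + 1]) * (x[i + 1] - x[i]) for i in range(len(x) - 1))


def a_dep_over_rt(rho_it, z_it, b2_it, t_ic, u_dep_over_rg_ic):
    """Estimate A_dep/(R_g T_sat) from Equation 5.

    rho_it           -- increasing densities (> 0) on the isotherm; last = rho_l_sat
    z_it             -- compressibility factors at those densities, at T_IT
    b2_it            -- second virial coefficient at T_IT (limit of (Z-1)/rho as rho -> 0)
    t_ic             -- temperatures on the isochore, ordered from T_IT to T_sat
    u_dep_over_rg_ic -- U_dep/R_g (units of K) at those temperatures
    """
    # First integral: (Z - 1)/rho from zero density up to rho_l_sat
    densities = [0.0] + list(rho_it)
    integrand = [b2_it] + [(z - 1.0) / rho for rho, z in zip(rho_it, z_it)]
    isotherm_term = trapezoid(densities, integrand)
    # Second integral: U_dep/R_g integrated over 1/T from 1/T_IT to 1/T_sat
    inverse_t = [1.0 / t for t in t_ic]
    isochore_term = trapezoid(inverse_t, list(u_dep_over_rg_ic))
    return isotherm_term + isochore_term


# Synthetic inputs: Z = 1 + B2*rho on the isotherm, constant U_dep on the isochore.
b2 = -1.0e-4
rho_points = [1000.0, 5000.0, 10000.0]
z_points = [1.0 + b2 * rho for rho in rho_points]
value = a_dep_over_rt(rho_points, z_points, b2, [360.0, 300.0, 240.0], [-1500.0] * 3)
```

For these synthetic inputs both integrals have closed forms ($B_2 \rho_l^{\rm sat}$ and $-1500\,(1/240 - 1/360)$), so the sketch can be checked by hand before it is applied to simulation output.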

2.3 Surrogate Models

ITIC requires $U^{\rm dep}$ and $Z$ for each state point ($\rho$-$T$) to calculate $\rho_l^{\rm sat}$, $\rho_v^{\rm sat}$, and $P_v^{\rm sat}$. Thus, predicting $\rho_l^{\rm sat}$, $\rho_v^{\rm sat}$, and $P_v^{\rm sat}$ for a given molecule and force field necessitates predicting $U^{\rm dep}$ and $Z$. Therefore, the aim of the surrogate models presented in this section is to predict $U^{\rm dep}$ and $Z$ for a given state point with as little direct simulation as possible. The configuration-sampling-based surrogate models (MBAR and PCFR) are well-suited for the task at hand as energies and pressures are calculated directly from the coordinates of interacting particles. In such surrogate models, we carry out a set of simulations at the ITIC conditions using the reference force field(s), and the configurations obtained from the reference force field(s) are used to estimate the VLE properties for a non-simulated force field.

2.3.1 Multistate Bennett Acceptance Ratio

Importance sampling is a commonly used statistical technique for computing averages of properties for one model by reweighting configurations sampled with another model based on the ratio of probabilities for the two models. In chemistry and chemical physics, importance sampling from one or a set of simulations to another set of simulation conditions can be implemented using the MBAR algorithm. 46,47 With MBAR the expectation $\langle O(\theta) \rangle$ for force field ($\theta$) of any given observable ($O$) can be expressed as:

$$ \langle O(\theta) \rangle = \sum_{n=1}^{N} O(x_n; \theta) W_n(\theta) \tag{9} $$

where $x_n$ are configurations sampled from one or more reference force fields ($\theta_{\rm ref}$), $O(x_n; \theta)$ is the observable value using force field $\theta$ with configurations $x_n$, and $W_n(\theta)$ is the weight of the $n$th configuration using force field $\theta$, calculated by using:

$$ W_n(\theta) = \frac{\exp[\hat{f}(\theta) - u(x_n; \theta)]}{\sum_{k=1}^{K} N_k \exp[\hat{f}(\theta_{{\rm ref},k}) - u(x_n; \theta_{{\rm ref},k})]} \tag{10} $$

where the reduced free energies ($\hat{f}(\theta)$) are calculated with:

$$ \hat{f}(\theta) = -\ln \sum_{n=1}^{N} \frac{\exp[-u(x_n; \theta)]}{\sum_{k=1}^{K} N_k \exp[\hat{f}(\theta_{{\rm ref},k}) - u(x_n; \theta_{{\rm ref},k})]} \tag{11} $$

where $K$ is the number of reference force fields, $N = \sum_k N_k$ is the total number of snapshots for all $K$ reference force fields, $N_k$ is the total number of snapshots from the $k$th reference force field, $\theta_{{\rm ref},k}$ is the $k$th reference (i.e. simulated) force field, and $u(x_n; \theta) = \beta U(x_n; \theta)$ is the reduced potential energy evaluated with $\theta$ for configuration $x_n$, where $\beta = \frac{1}{k_B T}$ and $k_B$ is the Boltzmann constant.
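Equations 9-11 can be sketched in a few lines of Python. The sketch assumes that the reduced free energies of the reference force fields, $\hat{f}(\theta_{{\rm ref},k})$, are already available (for a single reference force field they can simply be set to zero). It works in log space to avoid overflow; the function names and the toy energies are hypothetical, and a production analysis would more likely rely on an established MBAR implementation.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    largest = max(values)
    return largest + math.log(sum(math.exp(v - largest) for v in values))


def mbar_weights(u_new, u_ref, n_k, f_ref):
    """Weights W_n(theta) (Equation 10) and reduced free energy (Equation 11).

    u_new[n]    -- reduced energy of configuration n evaluated with theta
    u_ref[k][n] -- reduced energy of configuration n evaluated with reference k
    n_k[k]      -- number of snapshots sampled with reference k
    f_ref[k]    -- reduced free energy of reference k (assumed already known)
    """
    n_total = len(u_new)
    # log of the shared denominator in Equations 10 and 11, per configuration
    log_denominator = [
        log_sum_exp([math.log(n_k[k]) + f_ref[k] - u_ref[k][n] for k in range(len(n_k))])
        for n in range(n_total)
    ]
    log_terms = [-u_new[n] - log_denominator[n] for n in range(n_total)]
    f_hat = -log_sum_exp(log_terms)                       # Equation 11
    weights = [math.exp(f_hat + term) for term in log_terms]  # Equation 10
    return weights, f_hat


def mbar_expectation(observable, weights):
    """Reweighted average of an observable (Equation 9)."""
    return sum(o * w for o, w in zip(observable, weights))


# Toy data: one reference force field, four snapshots, target shifted by a constant.
u_reference = [[0.3, 1.2, 0.7, 2.0]]
u_target = [u + 0.5 for u in u_reference[0]]
weights, f_hat = mbar_weights(u_target, u_reference, [4], [0.0])
```

In this toy case the target energies differ from the reference energies only by a constant, so the weights are uniform and `f_hat` equals that constant; any non-trivial perturbation of the force field produces non-uniform weights.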

Note that $\hat{f}(\theta_{{\rm ref},k})$ is required to evaluate the denominator of Equations 10-11. The values for $\hat{f}(\theta_{{\rm ref},k})$ are obtained by solving a system of $K$ equations for self-consistency. Specifically, an initial guess for $\hat{f}(\theta_{{\rm ref},k})$ is used to evaluate Equation 11 with $\theta = \theta_{{\rm ref},k}$ to obtain updated values of $\hat{f}(\theta_{{\rm ref},k})$. This process is repeated until the values for $\hat{f}(\theta_{{\rm ref},k})$ converge to within a desired tolerance. Although solving the MBAR system of equations for self-consistency may require several iterations, once this process has been performed $\hat{f}(\theta)$ (for an arbitrary $\theta$) is evaluated without further iteration.

For the specific case of predicting $U^{\rm dep}$ and $Z$ for a non-sampled force field, expressed by the set of force field parameters $\theta$, the MBAR-based estimators for the departure internal energy and compressibility can be written as:

$$ \langle U^{\rm dep}(\theta) \rangle = \sum_{n=1}^{N} U^{\rm dep}(x_n; \theta) W_n(\theta) \tag{12} $$

$$ \langle Z(\theta) \rangle = \sum_{n=1}^{N} Z(x_n; \theta) W_n(\theta) \tag{13} $$

where the energies and forces are computed using force field $\theta$ for each configuration ($x_n$) to determine $U^{\rm dep}(x_n; \theta)$ and $Z(x_n; \theta)$ (from the virial pressure 32 ), respectively, while the weights ($W_n(\theta)$) are again calculated using Equations 10-11.

The performance of MBAR depends strongly on good phase space overlap, meaning that the configurations sampled by the reference force field(s) must represent a significant portion of the “true” configurations that the non-simulated force field would sample. 48 If the configurational overlap is small, the MBAR estimates are often dominated by a few configurations, which are likely not representative of the ensemble that would be generated by direct simulation of force field $\theta$. The amount of overlap can be quantified by the number of effective samples ($N_{\rm eff}$), 49 using Kish’s formula:

$$ N_{\rm eff} = \frac{\left( \sum_n W_n \right)^2}{\sum_n W_n^2} \tag{14} $$

which reduces to $N_{\rm eff} = \left( \sum_n W_n^2 \right)^{-1}$ when the weights are normalized. This has the property that $N_{\rm eff} = N$ when the weights are equal and $N_{\rm eff} \approx 1$ when all but one weight is negligible, and it behaves appropriately for intermediate cases. In the case of poor overlap ($N_{\rm eff} \approx 1$), the predicted values of MBAR will demonstrate a strong bias and the uncertainties will likely be underestimated by the MBAR covariance matrix.

We discuss two methods for overcoming poor phase space overlap, namely, pair correlation function rescaling (PCFR) and configuration mapping. Configuration mapping is an attempt to predict how the probabilities of each configuration would change when changing from the reference force field ($\theta_{\rm ref}$) to the non-simulated force field ($\theta$). The clearest example of this is the theory of corresponding states. 50 For the single-site Lennard-Jones fluid, the NVT simulation results of $\theta_{\rm ref}$ can be exactly reweighted to predict the simulation properties for a different value of $\sigma$ at a new volume, $V = V_{\rm ref} \frac{\sigma^3}{\sigma_{\rm ref}^3}$, where $V_{\rm ref}$ is the initial volume sampled using $\theta_{\rm ref}$. This is because the energies (and thus the Boltzmann weights) will be exactly the same if we simply scale all of the coordinates by $\frac{\sigma}{\sigma_{\rm ref}}$, so that the volume scales by $\frac{\sigma^3}{\sigma_{\rm ref}^3}$. Therefore, we can reuse (at least for that volume) all of the configurations to calculate new properties. If there is more than one length scale, however, the same corresponding states tactics do not work.

A general formalism for creating new, more suitable overlap from the old configurations is to use configuration mapping. 51,52 This approach can drastically increase the efficiency of calculating properties using configurations sampled from similar models, as in the case of similar rigid water models or dipolar molecule lengths. 51 One important key, of course, is finding a simple transformation rule. The coordinate transformation free energy equations are true for any transformation, but will only be useful if they increase the overlap between the non-sampled and reference configurations. In the case of rigid water models and dipolar molecules, a linear transformation between shapes which preserves the center of mass has been shown to work very well. 51

In the case of small molecule hydrocarbons, however, the simplest transformation would be one which scales the centers of mass while keeping the intramolecular distances the same. This, however, gives properties of the new model at a new volume, which is not useful if we are performing canonical simulations. It is not immediately clear if there is a simple remapping between simulations with different $\sigma$ at the same volume, but it seems unlikely, since the configurations must shift radial distributions in an inherently coupled way. We defer the development of a more general mapping formalism to a future study. We instead propose an alternative method, namely, PCFR. In essence, PCFR is an approximate way to map the coordinates for constant volume. We derive PCFR in the following section.
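Two of the ideas in this subsection lend themselves to a short numerical check: Kish's effective sample size (Equation 14) and the corresponding-states coordinate scaling for the single-site Lennard-Jones fluid. The sketch below uses a small open (non-periodic) cluster with made-up coordinates and parameters; in a periodic NVT simulation the box length would be scaled by the same factor as the coordinates. All names and numbers are illustrative assumptions.

```python
import math


def effective_samples(weights):
    """Kish's effective sample size (Equation 14); weights need not be normalized."""
    total = sum(weights)
    total_of_squares = sum(w * w for w in weights)
    return total * total / total_of_squares


def lj_energy(coords, epsilon, sigma):
    """Total LJ 12-6 energy of a small open cluster of single-site particles."""
    energy = 0.0
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            r = math.dist(coords[i], coords[j])
            sr6 = (sigma / r) ** 6
            energy += 4.0 * epsilon * (sr6 * sr6 - sr6)
    return energy


# Equal weights give N_eff = N; one dominant weight gives N_eff close to 1.
n_eff_uniform = effective_samples([0.25, 0.25, 0.25, 0.25])
n_eff_poor = effective_samples([1.0, 1e-12, 1e-12, 1e-12])

# Scaling every coordinate by sigma/sigma_ref leaves the LJ energy unchanged.
sigma_ref, sigma_new, epsilon = 0.375, 0.390, 98.0
cluster = [(0.0, 0.0, 0.0), (0.41, 0.0, 0.0), (0.1, 0.45, 0.05), (0.3, 0.2, 0.5)]
scale = sigma_new / sigma_ref
scaled_cluster = [tuple(scale * c for c in xyz) for xyz in cluster]
e_ref = lj_energy(cluster, epsilon, sigma_ref)
e_new = lj_energy(scaled_cluster, epsilon, sigma_new)
```

Because `e_ref` and `e_new` agree to round-off, every reference configuration keeps its Boltzmann weight after scaling, which is why the single-length-scale case can be reweighted exactly at the scaled volume.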

2.3.2 Pair Correlation Function Rescaling

PCFR is a method to predict $U^{\rm dep}$ and $Z$ (the two necessary quantities for ITIC) for a non-simulated force field. Similar to MBAR, PCFR makes use of the configurations sampled from direct simulation of a reference force field ($\theta_{\rm ref}$) to predict these properties without direct simulation of a modified force field ($\theta$).

To derive the PCFR equations, we first assume that $\theta_{\rm ref}$ and $\theta$ have the same intramolecular and Coulombic potentials. The departure internal energy and compressibility factors for a non-simulated force field can then be expressed as:

$$ \langle U^{\rm dep}(\theta) \rangle = \langle U^{\rm dep}(\theta_{\rm ref}) \rangle + U^{\rm vdw}(\theta) - U^{\rm vdw}(\theta_{\rm ref}) \tag{15} $$

$$ \langle Z(\theta) \rangle = \langle Z(\theta_{\rm ref}) \rangle + Z^{\rm vdw}(\theta) - Z^{\rm vdw}(\theta_{\rm ref}) \tag{16} $$

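where $\langle ... \rangle$ denotes an ensemble average and $U^{\rm vdw}$ and $Z^{\rm vdw}$ are the van der Waals (i.e. the non-bonded, non-Coulombic) contributions to $U$ and $Z$, respectively. Subsequently, for pair-wise additive central potentials we obtain the energy and compressibility equations that relate the pair correlation function (PCF) to $U^{\rm vdw}$ and $Z^{\rm vdw}$ for a polyatomic molecule: 32

$$ U^{\rm vdw}(\theta) = 2 \pi \rho R_g \sum_{i=1}^{N_S - 1} \sum_{j>i}^{N_S} \int_0^{\infty} u_{ij}^{\rm vdw}(\theta; r_{ij}) \, g_{ij}(\theta; r_{ij}) \, r_{ij}^2 \, \partial r_{ij} \tag{17} $$

$$ Z^{\rm vdw}(\theta) = -\frac{2 \pi \rho}{3 T} \sum_{i=1}^{N_S - 1} \sum_{j>i}^{N_S} \int_0^{\infty} \frac{\partial u_{ij}^{\rm vdw}(\theta; r_{ij})}{\partial r_{ij}} \, g_{ij}(\theta; r_{ij}) \, r_{ij}^3 \, \partial r_{ij} \tag{18} $$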

where $\rho$ is the number density (units of molecules per volume), $N_S$ is the number of sites in a molecule, $u^{\rm vdw}_{ij}(\theta; r_{ij})$ is the van der Waals potential (units of kelvin, i.e. energy divided by the Boltzmann constant) between sites $i$ and $j$ with force field $\theta$, and $g_{ij}(\theta; r_{ij})$ is the site-site radial distribution function (or “pair correlation function”) between sites $i$ and $j$ obtained with $\theta$. Substitution of Equations 17 and 18 into Equations 15 and 16, respectively, and combining the two integrals for $U^{\rm vdw}$ and $Z^{\rm vdw}$ gives:

$$\langle U^{\rm dep}(\theta)\rangle = \langle U^{\rm dep}(\theta_{\rm ref})\rangle + 2\pi\rho R_g \sum_{i=1}^{N_S-1}\sum_{j>i}^{N_S}\int_0^\infty \left[ u^{\rm vdw}_{ij}(\theta; r_{ij})\, g_{ij}(\theta; r_{ij}) - u^{\rm vdw}_{ij}(\theta_{\rm ref}; r_{ij})\, g_{ij}(\theta_{\rm ref}; r_{ij}) \right] r_{ij}^2\, \partial r_{ij} \tag{19}$$

$$\langle Z(\theta)\rangle = \langle Z(\theta_{\rm ref})\rangle - \frac{2\pi\rho}{3T}\sum_{i=1}^{N_S-1}\sum_{j>i}^{N_S}\int_0^\infty \left[ \frac{\partial u^{\rm vdw}_{ij}(\theta; r_{ij})}{\partial r_{ij}}\, g_{ij}(\theta; r_{ij}) - \frac{\partial u^{\rm vdw}_{ij}(\theta_{\rm ref}; r_{ij})}{\partial r_{ij}}\, g_{ij}(\theta_{\rm ref}; r_{ij}) \right] r_{ij}^3\, \partial r_{ij} \tag{20}$$
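To make Equations 17 and 18 concrete, the following minimal Python sketch evaluates the contribution of a single site-site pair to $U^{\rm vdw}$ and $Z^{\rm vdw}$ from a tabulated pair correlation function using trapezoidal quadrature. The function names, the unit choices (nm, molecules/nm^3, kelvin, kJ/mol), and the reading of $R_g$ as the molar gas constant are assumptions made for illustration and are not taken from the authors' implementation. Summation over site pairs, as written in Equations 17-18, is left to the caller.

```python
import numpy as np

R_GAS = 8.314462618e-3  # kJ/(mol K); assumed meaning of R_g in Eq. 17


def trapezoid(y, x):
    """Trapezoidal rule on a (possibly non-uniform) grid."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def pair_u_vdw(r, u, g, rho):
    """Single-pair term of Eq. 17: 2*pi*rho*R_g * int u(r) g(r) r^2 dr.

    r in nm, u in K, g dimensionless, rho in molecules/nm^3 -> kJ/mol.
    """
    r, u, g = (np.asarray(a, dtype=float) for a in (r, u, g))
    return 2.0 * np.pi * rho * R_GAS * trapezoid(u * g * r**2, r)


def pair_z_vdw(r, du_dr, g, rho, temperature):
    """Single-pair term of Eq. 18: -(2*pi*rho/(3T)) * int u'(r) g(r) r^3 dr.

    du_dr in K/nm, temperature in K; the result is dimensionless.
    """
    r, du_dr, g = (np.asarray(a, dtype=float) for a in (r, du_dr, g))
    prefactor = -2.0 * np.pi * rho / (3.0 * temperature)
    return prefactor * trapezoid(du_dr * g * r**3, r)
```

A convenient sanity check under these assumptions: if `g` is set to the low-density limit exp(-u/T), then `pair_z_vdw` divided by `rho` should reproduce the second virial coefficient obtained from the Mayer function.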

Equations 19 and 20 are exact. Unfortunately, while $g_{ij}(\theta_{\rm ref}; r_{ij})$ is obtained from direct simulation of $\theta_{\rm ref}$, the PCF for the non-simulated force field, $g_{ij}(\theta; r_{ij})$, is unknown a priori. PCFR attempts to estimate $g_{ij}(\theta; r_{ij})$ and, thereby, predict the properties $U^{\rm dep}$ and $Z$ for different non-bonded interactions. There are a number of possible approximations that could be made. The approach we recommend (and use in Section 3) is to perform a density expansion for the PCF:

$$g_{ij}(\theta; r, \rho, T) = g_{0,ij}(\theta; r, T) + \rho\, g_{1,ij}(\theta; r, T) + \rho^2 g_{2,ij}(\theta; r, T) + \ldots \tag{21}$$

where the zeroth order term ($g_0$) is known analytically to be:

$$g_{0,ij}(\theta; r_{ij}) \equiv \exp\left( \frac{-u_{ij}(\theta; r_{ij})}{T} \right) \tag{22}$$

By substituting Equation 21 into Equations 17-18 and separating the integration by each term of $g_{ij}$, we can express $U^{\rm vdw}$ and $Z^{\rm vdw}$ as a virial expansion:

$$U^{\rm vdw}(\theta) = \sum_{h=0}^{\infty} \rho^h U_h^{\rm vdw} \tag{23}$$

$$Z^{\rm vdw}(\theta) = \sum_{h=0}^{\infty} \rho^h Z_h^{\rm vdw} \tag{24}$$

where $U_h^{\rm vdw}$ and $Z_h^{\rm vdw}$ are obtained, in principle, by integrating Equations 17-18 after substituting $g_{h,ij}$ for $g_{ij}$. By assuming that the higher order ($h > 0$) contributions to $U^{\rm vdw}$ and $Z^{\rm vdw}$ for $\theta$ and $\theta_{\rm ref}$ are equal (or at least negligibly different), $U^{\rm dep}(\theta)$ and $Z(\theta)$ can be estimated using:

$$\langle U^{\rm dep}(\theta)\rangle \approx \langle U^{\rm dep}(\theta_{\rm ref})\rangle + 2\pi\rho R_g \sum_{i=1}^{N_S-1}\sum_{j>i}^{N_S}\int_0^\infty \left[ u^{\rm vdw}_{ij}(\theta; r_{ij})\, g_{0,ij}(\theta; r_{ij}) - u^{\rm vdw}_{ij}(\theta_{\rm ref}; r_{ij})\, g_{0,ij}(\theta_{\rm ref}; r_{ij}) \right] r_{ij}^2\, \partial r_{ij} \tag{25}$$

$$\langle Z(\theta)\rangle \approx \langle Z(\theta_{\rm ref})\rangle - \frac{2\pi\rho}{3T}\sum_{i=1}^{N_S-1}\sum_{j>i}^{N_S}\int_0^\infty \left[ \frac{\partial u^{\rm vdw}_{ij}(\theta; r_{ij})}{\partial r_{ij}}\, g_{0,ij}(\theta; r_{ij}) - \frac{\partial u^{\rm vdw}_{ij}(\theta_{\rm ref}; r_{ij})}{\partial r_{ij}}\, g_{0,ij}(\theta_{\rm ref}; r_{ij}) \right] r_{ij}^3\, \partial r_{ij} \tag{26}$$

The primary advantage of this assumption is that, because $g_{0,ij}(\theta; r_{ij})$ is calculated with Equation 22, the integrals in Equations 25-26 can be evaluated without performing any additional simulations. However, the PCFR results presented in Section 3 suggest that Equations 25-26 adequately approximate $U^{\rm dep}$ but not $Z$, i.e. $U_h^{\rm vdw}(\theta) \approx U_h^{\rm vdw}(\theta_{\rm ref})$ while $Z_h^{\rm vdw}(\theta) \not\approx Z_h^{\rm vdw}(\theta_{\rm ref})$ for $h > 0$ when $\theta \not\approx \theta_{\rm ref}$.

PCFR is orders of magnitude faster than MBAR since PCFR only requires a numerical integration of Equations 25-26 while MBAR requires energy and force “rerun” calculations. Although “rerun” calculations are typically orders of magnitude faster than direct simulation, MBAR may still be too costly if an optimization or uncertainty quantification (UQ) method requires on the order of $10^4$ to $10^6$ MBAR evaluations. However, the computational cost of MBAR is approximately the same as that of PCFR when basis functions are implemented. 48 Basis functions can be constructed if a linear relationship exists between the non-bonded parameters and the non-bonded energies and forces. For example, basis functions are amenable to Mie λ-6 potentials because the energies are linearly dependent on $r^{-\lambda}$ and $r^{-6}$ while the forces are linearly dependent on $r^{-\lambda-1}$ and $r^{-7}$ (see Section SI.IV in the Supporting Information). By contrast, basis functions are not compatible with the Exponential-6 function due to the non-linearity of the exponential term.

Alternative PCFR approaches that we investigated, but found less reliable than Equations 25-26, are included in the Appendix. One such method is the “constant PCF” approach, which assumes that the pair correlation function for the reference force field is equal to the PCF for the non-simulated force field. As demonstrated in the Appendix, this is mathematically identical to assuming the weights of each configuration are equal in Equations 12-13. Therefore, a comparison of PCFR and MBAR with “constant PCF” (or “equal weights”) quantifies the improvement due to rescaling and reweighting configurations, respectively. For this reason, we include the “constant PCF” results in Section 3 to provide a common basis for MBAR and the recommended PCFR approach (Equations 25-26). Furthermore, “constant PCF” can provide valuable insight due to its conceptual and mathematical simplicity.
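As an illustration of Equations 22, 25, and 26, the sketch below computes the zeroth-order PCFR corrections to $U^{\rm dep}$ and $Z$ for one site-site pair interacting through a Mie λ-6 potential. It is a minimal example under stated assumptions rather than the authors' code: the helper names (`mie`, `mie_derivative`, `pcfr_shift`), the integration grid, the units (nm, molecules/nm^3, K, kJ/mol), the reading of $R_g$ as the molar gas constant, and the density used in the usage lines are all hypothetical choices. The sum over site pairs in Equations 25-26 is left to the caller.

```python
import numpy as np

R_GAS = 8.314462618e-3  # kJ/(mol K); assumed meaning of R_g


def mie_prefactor(lam):
    """Mie lambda-6 prefactor; equals 4 for lambda = 12 (Lennard-Jones)."""
    return lam / (lam - 6.0) * (lam / 6.0) ** (6.0 / (lam - 6.0))


def mie(r, eps, sig, lam):
    """Mie lambda-6 potential in K (eps in K; sig and r in nm)."""
    return mie_prefactor(lam) * eps * ((sig / r) ** lam - (sig / r) ** 6)


def mie_derivative(r, eps, sig, lam):
    """Radial derivative of the Mie lambda-6 potential in K/nm."""
    c = mie_prefactor(lam)
    return c * eps * (-lam * (sig / r) ** lam + 6.0 * (sig / r) ** 6) / r


def zeroth_order_integrals(r, temperature, eps, sig, lam):
    """Integrals of u*g0*r^2 and u'*g0*r^3 with g0 = exp(-u/T) (Eq. 22)."""
    u = mie(r, eps, sig, lam)
    g0 = np.exp(-u / temperature)
    f_u = u * g0 * r**2
    f_z = mie_derivative(r, eps, sig, lam) * g0 * r**3
    dr = np.diff(r)
    i_u = np.sum(0.5 * (f_u[1:] + f_u[:-1]) * dr)
    i_z = np.sum(0.5 * (f_z[1:] + f_z[:-1]) * dr)
    return i_u, i_z


def pcfr_shift(theta, theta_ref, rho, temperature, r=None):
    """Per-site-pair corrections to U_dep (kJ/mol) and Z from Eqs. 25-26.

    theta and theta_ref are (eps, sig, lam) tuples.
    """
    if r is None:
        r = np.linspace(0.05, 5.0, 50001)  # nm; hypothetical grid
    i_u, i_z = zeroth_order_integrals(r, temperature, *theta)
    i_u_ref, i_z_ref = zeroth_order_integrals(r, temperature, *theta_ref)
    d_u = 2.0 * np.pi * rho * R_GAS * (i_u - i_u_ref)
    d_z = -2.0 * np.pi * rho / (3.0 * temperature) * (i_z - i_z_ref)
    return float(d_u), float(d_z)


# Usage: shift from a TraPPE-UA-like LJ 12-6 site to a Potoff-like Mie 16-6 site.
trappe = (98.0, 0.375, 12.0)
potoff = (121.25, 0.3783, 16.0)
delta_u, delta_z = pcfr_shift(potoff, trappe, rho=10.0, temperature=300.0)
```

Because only one-dimensional quadratures are involved, a scan over many (ε, σ, λ) combinations is inexpensive, which is consistent with the speed advantage described above.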

3 Results

In this section we compare MBAR and PCFR (Equations 25-26) based on their ability to predict the thermodynamic properties $U^{\rm dep}$, $Z$, $\rho_l^{\rm sat}$, and $P_v^{\rm sat}$. First, we compare the predicted $U^{\rm dep}$ and $Z$ values with direct molecular simulation results. Subsequently, we combine ITIC with MBAR and PCFR, and the predicted values for $\rho_l^{\rm sat}$ and $P_v^{\rm sat}$ are compared with direct molecular simulation results. The systems considered are the united-atom representations of ethane (C2H6), hexafluoroethane (C2F6), propane (C3H8), n-butane (C4H10), and n-octane (C8H18).

A detailed comparison of the surrogate models is only presented for ethane. In the case of ethane, we perform two types of test for MBAR and PCFR. For the first test, the non-simulated force field uses the same non-bonded function (i.e. the LJ 12-6) as the reference force field. We refer to this test as the “constant model” test. For the second test, the non-simulated force field uses a different non-bonded function (i.e. the Mie λ-6 or Exp-6) than the reference force field. We refer to this test as the “perturbed model” test. Since this study focuses on the Mie λ-6 potential, the “perturbed model” test corresponds to varying the value of λ. For both tests, we compare how well MBAR performs with only a single reference and with multiple references.

The discussion for hexafluoroethane and the longer n-alkanes in Section 3.2 is limited but complements the ethane discussion in Section 3.1. Specifically, Section 3.2 compares how well MBAR and PCFR predict the Potoff λ-6 potential when the TraPPE-UA LJ 12-6 potential is used as a single reference force field. Hexafluoroethane is included since it provides an example for the “perturbed model” test where $\lambda \gg 12$. The longer n-alkanes are included to demonstrate that the results are not specific to two-site molecules.

3.1 Ethane

In this section, we compare the accuracy of MBAR and PCFR when predicting $U^{\rm dep}$, $Z$, $\rho_l^{\rm sat}$, and $P_v^{\rm sat}$ for ethane. First, in Section 3.1.1 we investigate how well these surrogate models predict the quantities of interest when there is only a single reference force field, namely, the TraPPE-UA LJ 12-6 model. Section 3.1.1 allows for a fair comparison between PCFR and MBAR since PCFR is not compatible with multiple references. More importantly, our initial goal is to determine if the optimal force field parameters can be obtained by performing direct simulations with only a single reference force field. This would enable rapid reparameterization of the non-bonded interactions for any existing force field without performing additional molecular simulations. Unfortunately, the results in Section 3.1.1 demonstrate that this ideal scenario is not obtainable. For this reason, in Section 3.1.2 we investigate the improvement that is possible for MBAR when using multiple references. Then, in Section 3.1.3 we demonstrate how well MBAR and PCFR perform when used in connection with ITIC.

3.1.1 Single Reference

Figure 1 presents the results from the “constant model” single reference test. Specifically, we use the TraPPE-UA LJ 12-6 model for ethane as our reference force field from which we sample configurations. We predict the departure internal energy ($U^{\rm dep}$) and compressibility factor ($Z$) for a wide range of CH3 LJ parameters ($88 < \epsilon/{\rm K} < 108$, $0.365 < \sigma/{\rm nm} < 0.385$) using the various configuration-sampling-based surrogate models (MBAR, PCFR, and “constant PCF”). Note that the reference $\epsilon$, $\sigma$, and $\lambda$ are at the center of the investigated parameter space, i.e. $\epsilon_{\rm ref} = 98$ K, $\sigma_{\rm ref} = 0.375$ nm, and $\lambda_{\rm ref} = 12$.

Figure 1 Panels a)-b) and c)-d) compare the direct simulation results for the departure internal energy ($U^{\rm dep} \equiv U - U^{\rm ig}$) and compressibility factor ($Z$), respectively, with those predicted from the various surrogate models. Figure 1 includes parity plots as well as embedded deviation plots. Panels a) and c) compare each surrogate model while Panels b) and d) present the results only for MBAR. Note that the MBAR and PCFR deviations are smaller than the “constant PCF” deviations. This shows the improvement due to reweighting (MBAR) or “rescaling” (PCFR) the reference configurations.

Panel a) demonstrates that PCFR is superior to MBAR for estimating $U^{\rm dep}$, while Panel c) shows that neither method is particularly robust for estimating $Z$. However, Panels b) and d) demonstrate that with a sufficient number of effective samples ($N_{\rm eff}$) the MBAR estimates for $U^{\rm dep}$ and $Z$ are typically reliable. For example, note that in Panels b) and d) the MBAR estimates follow the parity line very closely for larger values of $N_{\rm eff}$. This makes intuitive sense because $N_{\rm eff}$ is intimately related to the degree of “overlap”, i.e. the probability that configurations sampled by the reference force field would be sampled by the non-simulated force field.
To quantify this observation, the insets in Panels b) and d) suggest that for $N_{\rm eff} > 50$ (or $\log_{10}(N_{\rm eff}) > 1.7$) the percent deviation in $U^{\rm dep}$ and the deviation in $Z$ are less than 1 % and 0.3, respectively.

[Figure 1: four parity plots, Panels a)-d), titled “Constant Model: LJ 12-6” ($\epsilon$/K from 88 to 108, $\sigma$/nm from 0.365 to 0.385) with $\theta_{\rm ref}$ = TraPPE-UA. Axes: $U^{\rm dep}$ (kJ/mol) and $Z$, predicted versus direct simulation, with embedded percent deviation ($U^{\rm dep}$) and deviation ($Z$) insets. Legend: Constant PCF, MBAR, PCFR, Parity. Color map in Panels b) and d): $\log_{10}(N_{\rm eff})$.]

Figure 1: PCFR performs well for $U^{\rm dep}$ but not for $Z$ while MBAR accurately predicts both properties when $N_{\rm eff} > 50$ for “constant model” test of ethane. The parity plots compare direct simulation values with those predicted using a constant PCF (equal weights), MBAR, and PCFR for $\lambda_{\rm ref} = \lambda = 12$. Panels a) and b) correspond to departure internal energy ($U^{\rm dep}$) while Panels c) and d) correspond to compressibility factor ($Z$). Deviation plots are embedded, where percent deviation and deviation are plotted for $U^{\rm dep}$ and $Z$, respectively. Panels b) and d) focus on the MBAR results where the color map corresponds to the number of effective samples (on a $\log_{10}$ scale).

Figure 2 presents the “perturbed model” single reference test results for ethane. Figure 2 is analogous to Figure 1, where Figures 1 and 2 present the “constant model” (i.e. $\lambda_{\rm ref} = \lambda$) and “perturbed model” (i.e. $\lambda_{\rm ref} \neq \lambda = 16$) test results, respectively. In both tests we use the TraPPE-UA LJ 12-6 model as our reference force field from which we sample configurations. For the “perturbed model” test, the parameter space over which the comparison is performed is shifted to higher values of $\epsilon$ ($108 < \epsilon/{\rm K} < 128$) since Potoff and

Bernard-Brunel 1 demonstrated that for ethane the optimal $\epsilon_{\rm CH3}$ increases with increasing $\lambda_{\rm CH3}$. (Note that the Potoff CH3 parameter set of $\epsilon = 121.25$ K, $\sigma = 0.3783$ nm, and $\lambda = 16$ is near the center of the investigated parameter space.) We again compare the surrogate model estimates of $U^{\rm dep}$ and $Z$ with those obtained by performing direct simulations with each Mie 16-6 parameter set.

[Figure 2: four parity plots, Panels a)-d), titled “Perturbed Model: Mie 16-6” ($\epsilon$/K from 108 to 128, $\sigma$/nm from 0.365 to 0.385) with $\theta_{\rm ref}$ = TraPPE-UA. Axes: $U^{\rm dep}$ (kJ/mol) and $Z$, predicted versus direct simulation, with embedded percent deviation ($U^{\rm dep}$) and deviation ($Z$) insets. Legend: Constant PCF, MBAR, PCFR, Parity. Color map in Panels b) and d): $\log_{10}(N_{\rm eff})$.]

Figure 2: Similar to the “constant model” results presented in Figure 1, PCFR accurately predicts $U^{\rm dep}$ but not $Z$ for the “perturbed model” test of ethane. MBAR performs poorly because very few state points have $N_{\rm eff} > 50$ for $\lambda_{\rm ref} \neq \lambda = 16$. The panels and symbols are the same as Figure 1.

Panel a) of Figure 2 demonstrates that, similar to the LJ 12-6 results, the PCFR method is superior to MBAR at predicting $U^{\rm dep}$, while Panel c) shows that neither PCFR nor MBAR is particularly accurate at predicting $Z$. Panels b) and d) help explain the poor performance of MBAR. Note that the maximum number of effective samples for this entire parameter space is less than 100, while for most systems $N_{\rm eff} \ll 50$. This poor “overlap” causes a single (slightly more favorable) configuration to have a weight equal to one while all other configurations have a weight of zero. When $N_{\rm eff} \approx 1$, typically even the non-zero weight configuration is still very unfavorable.

The “constant PCF” results in Figure 2 Panels a) and c) provide further insight into why MBAR performs poorly when $\lambda_{\rm ref} \neq \lambda = 16$. Note the strong positive bias relative to the parity line for the $U^{\rm dep}$ and $Z$ estimates from both “constant PCF” and MBAR. This is because the “softer” LJ 12-6 potential samples close-range configurations that result in extremely high energies and forces when recomputed with the “harder” Mie 16-6 potential. Therefore, MBAR reweighting cannot produce accurate estimates because few (if any) of the reference configurations represent a reasonable state that would be sampled from the Mie 16-6 potential. By contrast, PCFR rescales these close-range interactions to avoid unreasonable energies and forces. This is an important advantage of PCFR when $\lambda_{\rm ref} \not\approx \lambda$.

The data depicted in Figure 3 help quantify over what range of the $\epsilon$, $\sigma$, and $\lambda$ parameter space MBAR is reliable (i.e. $N_{\rm eff} > 50$) when using a single reference parameter set. The different color contours correspond to the “constant model” ($\lambda = 12$) and the “perturbed model” ($\lambda = 13$ to 18) tests. The contours in Figure 3 represent the average number of effective samples ($\bar{N}_{\rm eff}$), i.e. the mean $N_{\rm eff}$ for the 19 different ITIC state points. Multiple contours are included for $\lambda = 12$ while only a single contour is provided for $\lambda = 13$ to 18 for visual clarity. Figure 3 demonstrates that $N_{\rm eff}$ (and, thus, the “overlap” between force fields) depends strongly on $\sigma$ and the repulsive exponent ($\lambda$) with much less dependence on $\epsilon$.
Specifically, the average $N_{\rm eff}$ is greater than 50 for $\sigma = \sigma_{\rm ref} = 0.375$ nm, $\lambda = \lambda_{\rm ref} = 12$, and $88 < \epsilon/{\rm K} < 113$, covering a range of ±15 % $\epsilon_{\rm ref}$. By contrast, $\bar{N}_{\rm eff} > 50$ for $\epsilon = \epsilon_{\rm ref} = 98$ K, $\lambda = \lambda_{\rm ref} = 12$, and $0.3675 < \sigma/{\rm nm} < 0.3725$, a range of only ±0.7 % $\sigma_{\rm ref}$. In addition, the $\bar{N}_{\rm eff} > 50$ region for $\lambda = 13, 14$ is smaller than for $\lambda = 12$ and is shifted to $\sigma < \sigma_{\rm ref}$ and $\epsilon < \epsilon_{\rm ref}$. This is especially troubling, since several studies have demonstrated that with increasing $\lambda$ the optimal $\epsilon$ increases significantly and the optimal $\sigma$ rarely decreases. 1,29,30 Therefore, the $\bar{N}_{\rm eff} > 50$ regions for $\lambda = 13, 14$ are not particularly useful as they do not overlap with the expected optimal $\epsilon$ and $\sigma$. Even more concerning, $\bar{N}_{\rm eff} \ll 50$ for the entire joint $\epsilon$ and $\sigma$ parameter space for $\lambda = 15$ to 18. For example, $\bar{N}_{\rm eff} \approx 1$ for the Potoff model ($\epsilon = 121.25$ K, $\sigma = 0.3783$ nm, $\lambda = 16$). These results suggest that the TraPPE-UA LJ 12-6 configurations cannot be reweighted to optimize $\epsilon$ and $\sigma$ for $\lambda > 12$.

[Figure 3: contour plot titled “Average Number of Effective Samples” with $\theta_{\rm ref}$ = TraPPE-UA. Axes: $\sigma_{\rm CH3}$ (nm), 0.365 to 0.385, and $\epsilon_{\rm CH3}$ (K), 88 to 128. One contour color for each of $\lambda = 12, 13, 14, 15, 16, 17, 18$.]

Figure 3: Average number of effective samples depends strongly on $\sigma$ and $\lambda$ for ethane. Specifically, large differences in $\sigma$ from the reference lead to $N_{\rm eff} < 50$ while the entire $\lambda = 15$ to 18 parameter space has $N_{\rm eff} \ll 50$. Contours represent the average $N_{\rm eff}$ for $\lambda = 12$ to 18. A single reference parameter set was used (TraPPE-UA, i.e. $\lambda_{\rm ref} = 12$).

These observations make some intuitive sense as the configurations sampled from $\sigma = 0.375$ nm should be significantly different than those sampled from $\sigma \ll 0.375$ nm or $\sigma \gg 0.375$ nm. Likewise, larger values of $\lambda$ have a steeper, more repulsive barrier which increases the “effective hard-sphere diameter” and significantly reduces the probability of close-range interactions. These results are convincing evidence that MBAR requires more than a single reference for large changes in $\sigma$ (greater than 0.0025 nm) and when $\lambda \neq \lambda_{\rm ref}$.
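The connection between overlap, configuration weights, and $N_{\rm eff}$ discussed above can be sketched for the single-reference case as follows. This is a minimal illustration, not the authors' implementation: it assumes that $N_{\rm eff}$ is computed with Kish's formula (a common choice for reweighting estimators; the definition adopted in this work appears in an earlier section), that energies are expressed in kelvin, and that the helper names are hypothetical.

```python
import numpy as np


def reweighting_weights(u_ref, u_new, temperature):
    """Normalized weights of reference configurations under a new force field.

    u_ref, u_new: total configurational energy (in K) of every sampled
    configuration, evaluated with theta_ref and with theta, respectively.
    """
    delta = np.asarray(u_new, dtype=float) - np.asarray(u_ref, dtype=float)
    log_w = -delta / temperature
    log_w -= log_w.max()  # shift to avoid overflow in exp
    w = np.exp(log_w)
    return w / w.sum()


def effective_samples(weights):
    """Kish effective sample size, (sum w)^2 / sum(w^2) (assumed definition)."""
    w = np.asarray(weights, dtype=float)
    return float(w.sum() ** 2 / np.sum(w**2))


def reweighted_average(observable, weights):
    """Weighted ensemble average of an observable over the configurations."""
    return float(np.sum(np.asarray(observable, dtype=float) * weights))
```

With equal weights the effective sample size equals the number of configurations, which corresponds to the “constant PCF” limit; when one configuration dominates, it approaches one, which is the regime described above for the Mie 16-6 target.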

3.1.2 Multiple References

MBAR is intrinsically designed for multiple reference force fields, which is one of its primary advantages. With additional (and judiciously chosen) references, the reliability of MBAR estimates improves dramatically, as demonstrated in the following discussion. By contrast, PCFR has no obvious way to combine the predictions of multiple references in a meaningful manner. We attempted to incorporate multiple references into PCFR by performing a weighted average of independent PCFR estimates where the weights depend on the difference between $\sigma$ and $\sigma_{\rm ref}$. However, we did not find this approach to be beneficial. Therefore, we do not discuss PCFR in this section and focus solely on MBAR.

The results presented in Section 3.1.1 demonstrate that MBAR is limited in how far it can extrapolate in parameter space. As discussed previously, MBAR does not provide accurate $U^{\rm dep}$ and $Z$ estimates when $N_{\rm eff} \ll 50$, which is typically the case when $\sigma$ and/or $\lambda$ are significantly different from $\sigma_{\rm ref}$ and $\lambda_{\rm ref}$. One obvious solution to increase $N_{\rm eff}$ and, thereby, improve MBAR is to sample additional configurations from multiple references. These references should be chosen to cover a wide range of parameter space while using as few as possible to limit the computational cost of additional molecular simulations.

Figure 4 presents the results when multiple reference force fields are utilized with MBAR. Specifically, we performed direct simulations for nine equally spaced values of $\sigma$ from 0.365-0.385 nm for the “constant model” $\lambda = 12$ with $\epsilon = 98$ K. The spacing corresponds to one reference $\sigma$ every 0.0025 nm since we observed that $N_{\rm eff} > 50$ when $\sigma = \sigma_{\rm ref} \pm 0.0025$ nm (recall Figure 1). For the “perturbed model” $\lambda = 16$ test, we use two different $\epsilon$ values of 98 and 118 K. We use the same range of $\sigma$ as the “constant model” test for $\epsilon_{\rm ref} = 98$ K to provide a fair comparison between the two tests for multiple references. For $\epsilon_{\rm ref} = 118$ K (the center value of the Mie 16-6 parameter space), we modify the $\sigma$ range slightly by including three additional larger $\sigma_{\rm ref}$ values (0.3875, 0.3900, and 0.3925 nm). Recall that in Figure 3 the maximum $N_{\rm eff}$ for $\lambda = 16$ was observed when $\sigma < \sigma_{\rm ref}$. Thus, the results for the expanded $\sigma_{\rm ref}$ range with $\epsilon_{\rm ref} = 118$ K help elucidate whether MBAR can adequately predict the Mie 16-6 parameter space if the LJ 12-6 references are chosen judiciously.

In Figure 4, Panels a) and d), b) and e), and c) and f) provide the results for the “constant model”, “perturbed model” with $\epsilon_{\rm ref} = 98$ K, and “perturbed model” with $\epsilon_{\rm ref} = 118$ K, respectively. For the “constant model” test (Panels a) and d)), due to the increased number of references most of the LJ 12-6 parameter sets have good overlap and, thus, $N_{\rm eff} > 50$. However, it should be noted that there are a few exceptions where $N_{\rm eff} < 50$ in Panels a) and d). In each case, these correspond to high density, saturated liquids with $\sigma = 0.385$ nm and $\epsilon > 107$ K. A comparison of Figure 4 Panels a) and d) with Figure 1 Panels b) and d), respectively, shows that additional $\sigma_{\rm ref}$ values significantly improve the MBAR “constant model” results.

[Figure 4: six MBAR parity plots, Panels a)-f). Panels a) and d): “Multiple ref: $\epsilon$ = 98 K, $\sigma$/nm from 0.365 to 0.385; Constant Model: LJ 12-6” ($\epsilon$/K from 88 to 108, $\sigma$/nm from 0.365 to 0.385). Panels b) and e): “Multiple ref: $\epsilon$ = 98 K, $\sigma$/nm from 0.365 to 0.385; Perturbed Model: Mie 16-6” ($\epsilon$/K from 108 to 128, $\sigma$/nm from 0.365 to 0.385). Panels c) and f): “Multiple ref: $\epsilon$ = 118 K, $\sigma$/nm from 0.365 to 0.3925; Perturbed Model: Mie 16-6” ($\epsilon$/K from 108 to 128, $\sigma$/nm from 0.365 to 0.385). Axes: $U^{\rm dep}$ (kJ/mol) and $Z$, predicted with MBAR versus direct simulation, with embedded percent deviation and deviation insets versus $N_{\rm eff}$. Color map: $\log_{10}(N_{\rm eff})$.]

Figure 4: Multiple $\sigma_{\rm ref}$ values significantly improve the “constant model” (i.e. $\lambda_{\rm ref} = \lambda = 12$) results for ethane (see Panels a) and d)). The “perturbed model” (i.e. $\lambda_{\rm ref} \neq \lambda = 16$) results are only moderately improved when $\epsilon_{\rm ref} = 98$ K (see Panels b) and e)) while $\epsilon_{\rm ref} = 118$ K provides substantial improvement (see Panels c) and f)), although large deviations still exist at several state points where $N_{\rm eff} \ll 50$. Panels a)-c) and d)-f) correspond to $U^{\rm dep}$ and $Z$, respectively. The parity plots, embedded deviation plots, and color map are the same as Figure 1 Panels b) and d).

By contrast, MBAR still performs quite poorly for the “perturbed model” (Mie 16-6) potential despite the increased number of references (see Figure 4 Panels b), c), e) and f)). In Panels b) and e) this is somewhat expected since $\epsilon_{\rm ref} = 98$ K, while the non-simulated $\epsilon$ values range from 108-128 K. As seen in Panels c) and f), one way to improve the performance of MBAR for the perturbed model is to use $\epsilon_{\rm ref} = 118$ K. However, notice that the $N_{\rm eff}$ values in Panels c) and f) are still much lower than those for the constant model results (Panels a) and d)). Therefore, sampling directly from $\lambda_{\rm ref} = 16$ with multiple $\sigma_{\rm ref}$ (and a reasonable value of $\epsilon_{\rm ref}$) is the only way to ensure that $N_{\rm eff} > 50$. In Section 4 we propose how PCFR can be used to assist in choosing $\epsilon_{\rm ref}$ and $\sigma_{\rm ref}$ for a given $\lambda_{\rm ref}$.
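For readers who want to see how configurations from several references are pooled, the following sketch implements the standard MBAR self-consistent equations and the resulting weights for a non-simulated state. It is a generic textbook-style illustration written for this discussion, not the code used in this work; the function names are hypothetical, energies are assumed to be reduced (dimensionless, i.e. energy in kelvin divided by temperature), and the plain fixed-point iteration is chosen for brevity rather than efficiency.

```python
import numpy as np


def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    a_max = np.max(a, axis=axis, keepdims=True)
    s = a_max + np.log(np.sum(np.exp(a - a_max), axis=axis, keepdims=True))
    return np.squeeze(s, axis=axis)


def log_mixture_denominator(u_kn, n_k, f_k):
    """log sum_k N_k exp(f_k - u_k(x_n)) for every pooled configuration n."""
    return logsumexp(np.log(n_k)[:, None] + f_k[:, None] - u_kn, axis=0)


def solve_free_energies(u_kn, n_k, tol=1e-10, max_iter=10000):
    """Fixed-point iteration of the MBAR equations for the K references.

    u_kn[k, n]: reduced energy of pooled configuration n in reference k.
    n_k[k]: number of configurations sampled from reference k.
    """
    u_kn = np.asarray(u_kn, dtype=float)
    n_k = np.asarray(n_k, dtype=float)
    f_k = np.zeros(u_kn.shape[0])
    for _ in range(max_iter):
        log_den = log_mixture_denominator(u_kn, n_k, f_k)
        f_new = -logsumexp(-u_kn - log_den[None, :], axis=1)
        f_new -= f_new[0]  # fix the arbitrary additive constant
        if np.max(np.abs(f_new - f_k)) < tol:
            return f_new
        f_k = f_new
    return f_k


def mbar_weights(u_new_n, u_kn, n_k, f_k):
    """Normalized weights of the pooled configurations for a new state."""
    u_kn = np.asarray(u_kn, dtype=float)
    n_k = np.asarray(n_k, dtype=float)
    log_den = log_mixture_denominator(u_kn, n_k, f_k)
    log_w = -np.asarray(u_new_n, dtype=float) - log_den
    log_w -= logsumexp(log_w, axis=0)
    return np.exp(log_w)
```

Given the weights `w`, an observable is estimated as `np.sum(w * observable)`, and an effective sample size can be obtained as `1.0 / np.sum(w**2)` if Kish's formula is assumed.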

3.1.3 Vapor-Liquid Equilibria

The comparison between MBAR and PCFR in Sections 3.1.1-3.1.2 has focused on $U^{\rm dep}$ and $Z$. However, for the purpose of force field optimization, the properties of interest are most likely $\rho_l^{\rm sat}$ and $P_v^{\rm sat}$ ($U^{\rm dep}$ could also be of interest as a substitute for enthalpy of vaporization and/or heat capacity). For this reason, in this section we compare how well MBAR and PCFR predict saturation properties when used in conjunction with ITIC.

Figure 5 plots the root-mean-square (RMS) deviation with respect to REFPROP values for the “constant model” test. Panels a) and b) plot the RMS for $\rho_l^{\rm sat}$ and $\log_{10}(P_v^{\rm sat})$, respectively. $\rho_l^{\rm sat}$ and $P_v^{\rm sat}$ are computed with ITIC using the $U^{\rm dep}$ and $Z$ values obtained from either direct simulation or the surrogate models. Both the single and multiple reference results are included for MBAR. For visual clarity, only the first two contours are included for the single reference ($\theta_{\rm ref} = \theta_{\rm TraPPE}$) results of MBAR and PCFR.

[Figure 5: contour plots of a) RMS of $\rho_l^{\rm sat}$ (kg/m^3) and b) RMS of $\log_{10}(P_v^{\rm sat})$ versus $\sigma_{\rm CH3}$ (nm), 0.365 to 0.385, and $\epsilon_{\rm CH3}$ (K), 88 to 108. Legend: Direct Simulation, MBAR multiple references, MBAR single reference, PCFR single reference, References.]

Figure 5: MBAR with multiple references predicts ethane $\rho_l^{\rm sat}$ and $P_v^{\rm sat}$ more accurately than PCFR and MBAR with a single reference for $\lambda_{\rm ref} = \lambda = 12$. Contours of the root-mean-square (RMS) relative to REFPROP values are plotted for each method and compared with direct simulation. Panels a)-b) show RMS for $\rho_l^{\rm sat}$ and $\log_{10}(P_v^{\rm sat})$, respectively. Note that the “single reference” is at the center of the plot, i.e. $\epsilon = 98$ K, $\sigma = 0.375$ nm. Direct simulations were performed on a 21x21 grid equally spaced between 88-108 K and 0.365-0.385 nm.

The primary conclusion from Figure 5 is that the “MBAR with multiple references” RMS contours for $\rho_l^{\rm sat}$ and $\log_{10}(P_v^{\rm sat})$ are nearly identical to those obtained from direct simulation. By contrast, “MBAR with a single reference” shows good agreement for RMS $\rho_l^{\rm sat}$ and $\log_{10}(P_v^{\rm sat})$ for only a small range of $\sigma$ values. PCFR has the wrong shape for the RMS $\rho_l^{\rm sat}$ contours while the RMS $\log_{10}(P_v^{\rm sat})$ contours for PCFR are in satisfactory agreement. The incorrect shape of the RMS $\rho_l^{\rm sat}$ contours for PCFR is especially concerning for Bayesian and other UQ methods that depend on having the correct local behavior near the optimum.

Figure 6 plots the “perturbed model” test results, i.e. $\lambda_{\rm ref} = 12$ while $\lambda = 16$, using the same format as the “constant model” test results in Figure 5. The references are not depicted in Figure 6 because $\lambda_{\rm ref} \neq \lambda = 16$. Only the “MBAR multiple references” results are presented for $\epsilon_{\rm ref} = 118$ K since $\epsilon_{\rm ref} = 98$ K with multiple $\sigma_{\rm ref}$ values showed very little improvement over a single reference when $\lambda_{\rm ref} = 12$.

[Figure 6: contour plots of a) RMS of $\rho_l^{\rm sat}$ (kg/m^3) and b) RMS of $\log_{10}(P_v^{\rm sat})$ versus $\sigma_{\rm CH3}$ (nm), 0.365 to 0.385, and $\epsilon_{\rm CH3}$ (K), 108 to 128. Legend: Direct Simulation; MBAR multiple references, $\epsilon_{\rm ref}$ = 118 K; MBAR single reference, $\theta_{\rm ref}$ = TraPPE-UA; PCFR single reference, $\theta_{\rm ref}$ = TraPPE-UA; Potoff, $\lambda_{\rm CH3}$ = 16.]

Figure 6: MBAR with multiple references provides reasonable (although quite noisy) estimates for ethane $\rho_l^{\rm sat}$ and $\log_{10}(P_v^{\rm sat})$ for $\lambda_{\rm ref} \neq \lambda = 16$. Similar to the “constant model” results presented in Figure 5, PCFR produces an incorrect trend for $\rho_l^{\rm sat}$ while the $\log_{10}(P_v^{\rm sat})$ contours are reasonably accurate and follow the correct trend. MBAR with a single reference produces inaccurate estimates of $\rho_l^{\rm sat}$ and $P_v^{\rm sat}$. Contours of the root-mean-square (RMS) relative to REFPROP values are plotted for each method and compared with direct simulation. Panels a)-b) show RMS for $\rho_l^{\rm sat}$ and $\log_{10}(P_v^{\rm sat})$, respectively. The “multiple references” are the same as Figure 4 Panels c) and f), namely, $\epsilon_{\rm ref} = 118$ K, $\lambda_{\rm ref} = 12$, with 11 $\sigma_{\rm ref}$ values evenly spaced between 0.365-0.3925 nm. The “single reference” is the TraPPE-UA force field ($\lambda_{\rm ref} = 12$). The Potoff parameters are included only as a visual reference. Direct simulations were performed on a 21x21 grid equally spaced between 108-128 K and 0.365-0.385 nm with $\lambda = 16$.

Figure 6 demonstrates that the $\rho_l^{\rm sat}$ and $\log_{10}(P_v^{\rm sat})$ RMS contours for “MBAR multiple references” are similar to those from direct simulation while the “MBAR single reference” contours are completely unreliable for the “perturbed model” test. However, the “MBAR multiple references” contours are extremely noisy, which would render parameterization quite difficult. More importantly, the moderate improvement (compared to “MBAR single reference”) was primarily achieved by using $\epsilon_{\rm ref} = 118$ K instead of the TraPPE-UA value of 98 K. Unfortunately, the optimal $\epsilon_{\rm ref}$ value for a different $\lambda_{\rm ref}$ is not known a priori. For these reasons, regardless of what values are used for $\epsilon_{\rm ref}$ and $\sigma_{\rm ref}$, sampling multiple LJ 12-6 references is not a recommended approach for predicting $\rho_l^{\rm sat}$ and $P_v^{\rm sat}$ when $\lambda \neq \lambda_{\rm ref}$.

Although the reliability of MBAR is greatly diminished when λ ≠ λ_ref, the performance of PCFR is similar for both the “perturbed model” and the “constant model” tests. For example, the PCFR RMS contours in Figure 6 are very similar to the PCFR RMS contours in Figure 5. Specifically, although the ρ_l^sat RMS contours have an incorrect trend with respect to ε and σ, the log10(P_v^sat) RMS contours are in close agreement with direct simulation, especially when σ ≈ σ_ref. Furthermore, the PCFR contours are smooth (i.e. not noisy) with respect to ε and σ. This suggests that PCFR can be a useful tool for parameterizing different Mie λ-6 potentials from a single LJ 12-6 reference. In fact, one of our key recommendations provided in Section 4 is that the RMS of log10(P_v^sat) (or, alternatively, U^dep) predicted by PCFR be used as an initial objective function when perturbing the value of λ.

The poor prediction of ρ_l^sat for the “perturbed model” is expected since ITIC ρ_l^sat depends primarily on the isochore Z values (see discussion in Section 2.2). Thus, neither “MBAR single reference” nor “PCFR single reference” provides reliable ρ_l^sat contours because neither method accurately predicts Z for the “perturbed model” and single reference test (see Figure 2, Panels c)-d)). The large fluctuations in the “MBAR multiple references” contours are likely the result of the inherent randomness in Z when N_eff ≈ 1 (see Figure 4 Panel f)). The near-saturation, isochore state points that directly impact T^sat are more likely to have N_eff ≈ 1 because N_eff decreases with increasing density and decreasing temperature.

Elucidating the reason why “MBAR multiple references” and “PCFR single reference” provide reasonable estimates of P_v^sat for the “perturbed model” is more complicated, because ITIC P_v^sat depends strongly on both Z and U^dep. This is due to the exponential relationship between A^dep and P_v^sat (see Equations 6-7) where A^dep is computed from the isotherm Z and isochore U^dep values (see Equation 5). In addition, the isochore Z values can have a significant impact on P_v^sat through T^sat in Equations 5-7. However, because “PCFR single reference” does not provide accurate estimates of Z, the accuracy of P_v^sat appears to depend primarily on the reliability of U^dep (see Figure 2 Panels a) and b)). Since “MBAR multiple references” predicts U^dep to within 1-2 %, the noise in the corresponding log10(P_v^sat) contours can likely be attributed to large (random) deviations for Z at certain state points (see Figure 4 Panels c) and f)). By contrast, the smooth “PCFR single reference” log10(P_v^sat) contours suggest that the “PCFR single reference” deviations in Z are more systematic (i.e. less random) than those from “MBAR multiple references.”

3.2 Additional Compounds

In this section we perform the “perturbed model” and single reference test for additional compounds. Specifically, we verify that MBAR struggles to predict U^dep, while PCFR can provide reasonable estimates of U^dep when λ > λ_ref. The Z results are not included because we have already concluded that neither MBAR nor PCFR is capable of predicting Z in this case.

The “perturbed model” test is performed by sampling from the TraPPE-UA LJ 12-6 model (θ_ref = θ_TraPPE) while the non-simulated force field is the Potoff Mie λ-6 model (θ = θ_Potoff). We emphasize the “perturbed model” results because a plethora of LJ 12-6 parameters exist in the literature. Improved force field accuracy likely necessitates a systematic conversion of these two-parameter LJ models to a model with additional parameters (i.e. the three-parameter Mie λ-6, Exp-6, extended Lennard-Jones [35], etc.).

There are at least two primary reasons to focus on the single reference test. First, the number of direct simulations is minimized by using a single reference. Second, our goal is to verify the reliability of PCFR, which is designed only for a single reference.


The additional compounds considered in this section are hexafluoroethane, propane, n-butane, and n-octane. Hexafluoroethane (C2F6) provides an extreme test case where the perturbed repulsive barrier is significantly different from the LJ 12-6. Specifically, the Potoff model utilizes a Mie 36-6 potential for CF3 sites. The n-alkanes provide insight into how the surrogate models scale with additional sites (note that λ = 16 for the Potoff CH3, CH2, and cross-interactions). The parameters for each force field and the simulation conditions for each compound are provided in Sections SI.I.1-SI.I.3 of the Supporting Information.

Figure 7 Panels a) and b) provide parity plots of U^dep for hexafluoroethane and the n-alkanes, respectively. MBAR and PCFR estimates are compared with direct simulation results. Panel a) also includes “constant PCF” for comparison. Percent deviation plots are included as insets for PCFR in Panel a) and for MBAR and PCFR in Panel b).

Panel a) demonstrates that MBAR (and “constant PCF”) is completely inadequate for providing reasonable estimates of U^dep for hexafluoroethane. There are two primary reasons for the poor performance of MBAR. First, the ε value for Potoff (155.75 K) is nearly twice that of TraPPE-UA (87 K). The second, and more important, reason is the extreme difference in λ. The LJ 12-6 potential is much softer than the Mie 36-6 potential. Therefore, none of the configurations sampled with TraPPE-UA represents a reasonable configuration for the Potoff model. This is manifested by the fact that N_eff ≈ 1 for every state point (not depicted). By contrast, PCFR provides relatively accurate estimates of U^dep for hexafluoroethane. As discussed in Section 3.1.1, this is because PCFR effectively “rescales” the configurations to avoid infeasible energies and forces. Panel b) demonstrates that the MBAR single reference and “perturbed model” results observed for ethane are similar to those for larger n-alkanes.
Specifically, the MBAR deviations from direct simulation are significantly positive for each compound. By contrast, the PCFR percent deviations increase in magnitude with increasing chain-length. There are at least two likely explanations for the poor performance of PCFR for n-octane. The first reason is that the error for each interaction site accumulates and, therefore, the overall error increases with the number of sites. The second reason is that PCFR assumes that the PCF of all site-types, regardless of molecular topology, can be scaled based on the non-bonded interactions. However, it makes intuitive sense that the PCF for CH3 sites (which are located at a terminal position) would be more sensitive to the non-bonded interactions than the PCF for a CH2 site with several neighboring sites. Therefore, PCFR should primarily be utilized for smaller compounds and/or when non-bonded parameters are modified for fewer site-types.

[Figure 7 image. Panel a): Hexafluoroethane (C2F6); panel b): n-Alkanes. Axes: U^dep (kJ/mol), Predicted versus U^dep (kJ/mol), Direct Simulation; insets: Percent Deviation versus U^dep, Direct Simulation. θ_ref = TraPPE-UA, θ = Potoff. Legend a): Constant PCF; MBAR; PCFR; Parity. Legend b): Propane, n-Butane, and n-Octane for MBAR and for PCFR; Parity.]

Figure 7: MBAR performs poorly for large changes in λ (λ = 36 for hexafluoroethane) while PCFR provides reasonable estimates of U^dep, although the reliability of PCFR diminishes with increasing chain-length of n-alkanes. Panels a) and b) correspond to hexafluoroethane and n-alkanes, respectively. Panel a) compares PCFR with the “constant PCF” and MBAR results while Panel b) only includes the MBAR and PCFR results. The parity plots compare direct simulation values with those predicted using the different surrogate models. Percent deviation plots are included as insets for PCFR in Panel a) and for MBAR and PCFR in Panel b).
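The statement in this section that the LJ 12-6 potential is much softer than the Mie 36-6 potential can be illustrated with a short sketch. It assumes the standard Mie λ-6 functional form with its usual prefactor and works in reduced units (u/ε versus r/σ); the function names and the probe distance of 0.95σ are illustrative choices and are not taken from the original study.

```python
def mie_reduced(x, lam, m=6.0):
    """Reduced Mie lam-m energy u/eps at reduced distance x = r/sigma."""
    # Standard Mie prefactor; it equals 4 for lam = 12, m = 6 (Lennard-Jones).
    c = lam / (lam - m) * (lam / m) ** (m / (lam - m))
    return c * (x ** (-lam) - x ** (-m))


def r_min_reduced(lam, m=6.0):
    """Location of the potential minimum, r_min/sigma."""
    return (lam / m) ** (1.0 / (lam - m))


if __name__ == "__main__":
    x = 0.95  # hypothetical probe distance slightly inside sigma
    for lam in (12.0, 16.0, 36.0):
        # Print the exponent, r_min/sigma, and the reduced repulsive energy at x.
        print(lam, r_min_reduced(lam), mie_reduced(x, lam))
```

At the same reduced distance inside σ, the λ = 36 energy is several times larger than the λ = 12 energy, which is consistent with the observation that configurations sampled from a 12-6 reference carry very little weight for a 36-6 target.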

4 Recommendations and Limitations

MBAR provides accurate estimates of U^dep and Z (and, thereby, ρ_l^sat and P_v^sat) when there is sufficient configurational overlap between the reference force field(s) and the non-simulated force field. We recommend using the number of effective samples (N_eff) to quantify the overlap. Specifically, we recommend that MBAR be utilized if N_eff > 50. Multiple reference parameter sets are necessary to ensure that N_eff > 50 over a large region of parameter space.

The key limitations to implementing MBAR are the need to determine the best reference parameters and to minimize the number of references required. More reference force fields necessitate more direct simulations. Therefore, the reference force fields should be chosen judiciously such that adequate, but not excessive, sampling of the parameter space is achieved in the region of most importance. For example, an adaptive sampling algorithm can determine which additional parameters will reduce the MBAR uncertainties [48]. However, this approach attempts to reduce the uncertainty in the entire parameter space while we are only interested in the parameter space near the optimum. Rather than sampling hundreds of ε_ref and σ_ref sets for λ_ref = 12, we recommend sampling a few different ε_ref and σ_ref sets for each value of λ. Unfortunately, the optimal ε_ref for a given λ is not known a priori and increases with increasing λ.

For this reason, a key recommendation from this study is that PCFR be utilized to determine the reference parameters for MBAR when the non-bonded potential form (particularly the repulsive barrier, i.e. λ) is modified. Specifically, in the case of converting from a single LJ 12-6 reference force field to a Mie λ-6 potential, the optimal ε_ref for different values of λ is determined by minimizing the RMS of log10(P_v^sat) with σ = σ_ref. We also found that minimizing the RMS of U^dep using PCFR could be a reliable approach, as shown in Section SI.III of the Supporting Information. We recommend maintaining σ constant during this preliminary optimization since the PCFR estimates for P_v^sat (and U^dep) are most reliable when σ ≈ σ_ref. Subsequently, multiple references are sampled for the desired λ using this “PCFR-optimal” ε_ref.

Since P_v^sat (and U^dep) is fairly insensitive to σ (or r_min), we recommend that a constraint be applied to σ_ref (or r_min,ref) to reduce the number of reference parameters. For example, the TraPPE-UA LJ 12-6 model has been well optimized to match ρ_l^sat (which depends strongly on σ); therefore, we could constrain the reference σ (and r_min) values to be within a certain “trust region”, say ±1 %. Alternatively, we recommend that σ_ref ≥ 0.99 × σ_TraPPE and r_min,ref ≤ 1.01 × r_min,TraPPE for λ > 12 and vice versa for λ < 12. This empirical recommendation is based on the fact that most united-atom sites follow the trend that σ_Potoff > σ_TraPPE and r_min,Potoff < r_min,TraPPE where λ_Potoff > 12.

To facilitate future implementation, we recommend the following algorithm (referred to as PCFR-MBAR-ITIC) for rapid parameterization of non-bonded interactions aimed at accurate prediction of vapor-liquid equilibria:

1. Perform molecular simulations using:
   (a) NVT ensemble (either MD or MC)
   (b) ITIC conditions (19 ρ–T state points: nine for isotherm, two for each of the five isochores)
   (c) Reference force field(s) (θ_ref); the initial reference is taken from literature (important to have reasonably optimized σ_ref)
2. Store independent configurations or basis functions (see Section SI.IV in the Supporting Information)
3. Determine additional references:
   (a) σ_ref: evenly spaced within “trust region” (reasonable range for optimal σ based on σ_ref)
   (b) ε_ref: minimize the RMS of log10(P_v^sat) (or U^dep) predicted by PCFR and ITIC
4. Repeat Steps 1-2 using additional references found in Step 3
5. Optimize force field parameters:
   (a) Predict U^dep and Z with MBAR
   (b) Calculate ρ_l^sat, ρ_v^sat and P_v^sat with ITIC
   (c) Define an objective function that depends on ρ_l^sat, ρ_v^sat, P_v^sat, U^dep and/or Z
6. Determine if additional references are needed based on N_eff near optimum
7. Repeat Steps 5-6 until parameters converge to within a desired tolerance
8. Quantify uncertainty in non-bonded parameters

The computational bottleneck for configuration-sampling-based, statistical [11,14,16–19], and equation-of-state [4,20–28] surrogate models is the molecular simulation step. Specifically, the “real time to solution” of the PCFR-MBAR-ITIC algorithm depends primarily on Step 1, namely, the real time to perform the direct molecular simulations. The post-simulation optimization time, i.e. the real time required to complete Steps 3 and 5, is negligible in comparison. By performing the additional reference simulations in parallel, i.e. Step 4, the “real time to solution” is approximately the same as the real time to perform a single NVT ensemble simulation, which depends on the computer hardware, simulation software, force field complexity, cut-off distance, number of molecules, size of compound, etc. For example, the results presented in Section 5 were obtained in approximately one week of real time.

Note that the PC-SAFT algorithm proposed in Reference 24 determines additional reference parameter sets sequentially by iteratively finding a new proposed optimum and, therefore, molecular simulations must be performed in serial. This is because the PC-SAFT equation-of-state surrogate model is limited to a single reference parameter set, similar to PCFR. Information from previous iterations is effectively lost. By contrast, MBAR and meta-models [16–18] are able to utilize information from several reference parameter sets and, thus, molecular simulations can be performed in parallel. Although this would appear to be a significant algorithmic advantage for PCFR-MBAR-ITIC, the PC-SAFT approach typically finds the optimal parameter set within 2 to 3 iterations of direct simulation. Therefore, the “real time to solution” is comparable for all three surrogate model classes. Furthermore, the PCFR-MBAR-ITIC algorithm relies on starting with a reasonable reference parameter set (see Step 1c), whereas the PC-SAFT approach demonstrates rapid convergence even with a poor initial guess [24].
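The N_eff > 50 criterion used in Steps 5-6 can be sketched as follows. This is a minimal illustration that assumes Kish's definition of the effective sample size, N_eff = (Σ W_n)^2 / Σ W_n^2, applied to per-configuration weights; the definition used for the results in this study may differ in detail. The function names, the default threshold argument, and the synthetic weights are hypothetical.

```python
import numpy as np


def n_effective(weights):
    """Kish effective sample size: (sum of w)^2 / sum of w^2."""
    w = np.asarray(weights, dtype=float)
    return w.sum() ** 2 / np.sum(w ** 2)


def needs_more_references(weights, threshold=50.0):
    """True when N_eff does not exceed the threshold (Section 4 uses N_eff > 50)."""
    return n_effective(weights) <= threshold


if __name__ == "__main__":
    uniform = np.ones(1000)                     # perfect overlap: all weights equal
    spiky = np.array([1.0] + [1e-6] * 999)      # one configuration dominates
    print(n_effective(uniform), needs_more_references(uniform))
    print(n_effective(spiky), needs_more_references(spiky))
```

With equal weights N_eff equals the number of stored configurations, whereas a single dominant configuration drives N_eff toward 1, which is the situation described for large changes in λ.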

5 Algorithm Application

We apply the PCFR-MBAR-ITIC algorithm from Section 4 to optimize transferable CH3 and CH2 Mie λ-6 parameters for n-alkanes. Although we do not explicitly define an objective function, we provide contours of the RMS deviations from the REFPROP ρ_l^sat and log10(P_v^sat) values to help visualize the optimal region. We perform this analysis sequentially by assuming the CH3 parameters obtained for ethane are transferable to larger n-alkanes. The CH2 parameter analysis is performed independently for propane, n-butane, and n-octane to investigate the transferability of the CH2 parameters.

For Step 1, we simulate ethane using the TraPPE-UA LJ 12-6 force field as our initial reference. Additional reference points (i.e. Step 3) are determined by minimizing the log10(P_v^sat) RMS predicted by PCFR. Figure 8 depicts the results from this analysis. The lines in Figure 8 represent the “PCFR-optimal” ε_CH3 values for the corresponding σ_CH3 and λ_CH3. The points in Figure 8 are the ε_ref,CH3 and σ_ref,CH3 values determined in Step 3 for integer values of λ_ref,CH3 = 13 to 18. Specifically, ε_ref,CH3 for a given λ_ref,CH3 corresponds to the “PCFR-optimal” ε_CH3 value for σ_CH3 = σ_TraPPE,CH3. The σ_ref,CH3 values are evenly spaced such that σ_ref,CH3 ≥ 0.99 × σ_TraPPE,CH3 and r_min,ref,CH3 ≤ 1.01 × r_min,TraPPE,CH3. The spacing between neighboring σ_ref,CH3 values is no more than 0.0025 nm since this is the range over which we found MBAR to be reliable (i.e. N̄_eff > 50, recall Figure 3). Note that the “Potoff” points are included in Figure 8 for comparison with the “PCFR-optimal” curves for λ_CH3 = 14, 16, and 18. “Potoff” points for λ_CH3 = 14, 18 are approximations extracted from Figure 2 of Reference [1].

The “PCFR-optimal” lines in Figure 8 follow a reasonable, smooth trend and are in good quantitative agreement with the “Potoff” values for a given λ_CH3. This suggests that ε_ref,CH3 and σ_ref,CH3 obtained in Step 3 are near the optimum for the respective λ_CH3 values and, therefore, should adequately sample the relevant region of parameter space. To substantiate this statement we perform Steps 4-5 for the λ_CH3 = 16 reference values plotted in Figure 8. The configurations sampled from these additional references are then reweighted using MBAR to predict U^dep and Z. The ITIC equations convert the estimated U^dep and Z values to ρ_l^sat and P_v^sat.
Figure 9 shows the high level of agreement between direct simulation and MBAR RMS values relative to REFPROP for the same range of ε_CH3 and σ_CH3 as plotted previously in Figure 6.

[Figure 8 image. Axes: ε_CH3 (K) versus σ_CH3 (nm). Legend: PCFR-optimal (minimized RMS of log10(P_v^sat) for θ_ref = TraPPE-UA) for λ = 13, 14, 15, 16, 17, 18; Potoff, λ = 14; Potoff, λ = 16; Potoff, λ = 18.]

Figure 8: Ethane “PCFR-optimal” ε_ref values for CH3 follow a reasonable trend and are in good quantitative agreement with “Potoff” points. The dashed lines are obtained by minimizing RMS log10(P_v^sat) over a range of σ for a given λ. The different symbols correspond to the recommended ε and σ reference sets for the respective values of λ.

This process is then repeated for propane, n-butane, and n-octane to examine the CH2 parameter space. For simplicity and visualization purposes, we only investigate λ_CH2 = 16 and use the Potoff CH3 parameter values, since these appear to be near optimal (see Figure 9). The TraPPE-UA force field is again used as the single reference for PCFR. The PCFR-optimal ε_ref,CH2 values for propane and n-butane are obtained by minimizing their respective RMS of log10(P_v^sat) with σ_CH2 = σ_TraPPE,CH2 = 0.395 nm, λ_CH2 = 16, ε_CH3 = 121.25 K, σ_CH3 = 0.3783 nm, and λ_CH3 = 16. Since PCFR is less reliable for larger molecules, we use the optimal ε_CH2 from n-butane for the ε_ref,CH2 of n-octane. We simulate eight evenly spaced σ_ref,CH2 values within the trust region, i.e. σ_ref,CH2 ≥ 0.99 × σ_TraPPE,CH2 and r_min,ref,CH2 ≤ 1.01 × r_min,TraPPE,CH2, with constant ε_ref,CH2. Finally, MBAR and ITIC are implemented to predict ρ_l^sat and P_v^sat over a wide range of ε_CH2 and σ_CH2 with λ_CH2 = 16 and using the Potoff CH3 parameter values. Figure 10 compares the direct simulation and MBAR RMS values relative to REFPROP for the joint ε_CH2 and σ_CH2 parameter space. For clarity, only the propane direct simulation results are plotted and only one contour level is included for n-butane and n-octane.
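The σ_ref trust region described above can be sketched as a small helper. This is an illustrative reading of the constraint for λ > 12 (lower bound from 0.99 × σ of the literature model, upper bound from 1.01 × r_min of the literature model), assuming the standard Mie relation r_min = σ (λ/6)^(1/(λ-6)). The function names and arguments are hypothetical and the case λ < 12 is not handled.

```python
def r_min_over_sigma(lam, m=6.0):
    """r_min / sigma for a Mie lam-m potential."""
    return (lam / m) ** (1.0 / (lam - m))


def sigma_ref_grid(sigma_lit, lam, n_points, lam_lit=12.0, tol=0.01):
    """Evenly spaced sigma_ref values (same units as sigma_lit) for lam > lam_lit.

    Lower bound: sigma_ref >= (1 - tol) * sigma_lit.
    Upper bound: r_min_ref <= (1 + tol) * r_min_lit, converted back to sigma.
    """
    if lam <= lam_lit:
        raise ValueError("this sketch only covers lam > lam_lit")
    if n_points < 2:
        raise ValueError("at least two grid points are required")
    sigma_lo = (1.0 - tol) * sigma_lit
    r_min_hi = (1.0 + tol) * sigma_lit * r_min_over_sigma(lam_lit)
    sigma_hi = r_min_hi / r_min_over_sigma(lam)
    step = (sigma_hi - sigma_lo) / (n_points - 1)
    return [sigma_lo + k * step for k in range(n_points)]


if __name__ == "__main__":
    # CH2 case from the text: sigma_TraPPE = 0.395 nm, lambda = 16, eight references.
    for sigma in sigma_ref_grid(0.395, 16.0, 8):
        print(round(sigma, 5))
```

Under these assumptions the eight CH2 references span approximately 0.391 nm to 0.406 nm, so neighboring values are separated by less than the 0.0025 nm spacing quoted for the CH3 references.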

[Figure 9 image. Panel a): RMS of ρ_l^sat (kg/m3); panel b): RMS of log10(P_v^sat). Axes: ε_CH3 (K) versus σ_CH3 (nm). Legend: Direct Simulation; MBAR; References (PCFR-optimal); Potoff, λ_CH3 = 16.]

Figure 9: Contours of RMS deviation from REFPROP ethane values predicted by MBAR using PCFR-optimal references are in excellent agreement with direct simulation over a wide region of the CH3 parameter space. Panels a) and b) plot the RMS for ρ_l^sat and log10(P_v^sat), respectively. “References (PCFR-optimal)” are obtained as described in Step 3 for λ = 16 as found in Figure 8. The Potoff CH3 parameters are included only as a visual reference. Direct simulations were performed on a 21x21 grid equally spaced between 108-128 K and 0.365-0.385 nm.

The MBAR contours for propane in Figure 10 appear to be in excellent agreement with direct simulation over the relevant region of parameter space. Note that the MBAR contours in Figures 9-10 are smoother than direct simulation because MBAR effectively includes uncorrelated replicate simulations from each reference, whereas the direct simulation contours were obtained without any replicate simulations. The good agreement between the MBAR and direct simulation contours (and the smoothness) near the optimum is important for quantifying parameter uncertainties in Step 8 (although the “optimal” ε and σ values depend on how the objective function is defined). Although Bayesian inference methods (such as Markov Chain Monte Carlo) are beyond the scope of this article, it is worth mentioning that these UQ methods are feasible because MBAR (with basis functions) is several orders of magnitude less expensive than direct simulation. Also, by utilizing basis functions the computational cost to perform a UQ analysis, and to generate Figure 10, is the same for n-octane and propane (see Section SI.IV in the Supporting Information).

[Figure 10 image. Panel a): RMS of ρ_l^sat (kg/m3); panel b): RMS of log10(P_v^sat). Axes: ε_CH2 (K) versus σ_CH2 (nm). Legend: Direct Simulation, propane; MBAR, propane; MBAR, n-butane; MBAR, n-octane; References, propane; References, n-butane; References, n-octane; Potoff, λ_CH2 = 16.]

Figure 10: Contours of RMS deviation from REFPROP propane values predicted by MBAR using PCFR-optimal references are in excellent agreement with direct simulation over a wide region of the CH2 parameter space. RMS contours for propane, n-butane, and n-octane are slightly different. Panels a) and b) plot the RMS for ρ_l^sat and log10(P_v^sat), respectively. Reference values are obtained as described in Step 3 for λ_CH2 = 16. The CH3 parameters are from the Potoff force field, i.e. ε_CH3 = 121.25 K, σ_CH3 = 0.3783 nm, and λ_CH3 = 16. CH3-CH2 cross-interactions are calculated using Lorentz-Berthelot combining rules for ε and σ and an arithmetic mean for λ. The Potoff CH2 parameters are included only as a visual reference. Direct simulations were performed on a 21x31 grid equally spaced between 50-70 K and 0.385-0.415 nm.

Although not depicted here, we observe that the average number of effective samples (N̄_eff) ≫ 50 for most of the parameter space investigated (Step 6). Therefore, we conclude that no additional references are needed near the optimum.
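The RMS deviations contoured in Figures 9-10 can be sketched as below. This is a minimal illustration that assumes the RMS is the root-mean-square of absolute deviations over a set of state points (in kg/m3 for ρ_l^sat and in log10 units for P_v^sat), which is inferred from the axis labels rather than stated explicitly; the function names and the numbers in the usage example are hypothetical and are not REFPROP data.

```python
import math


def rms_deviation(predicted, reference):
    """Root-mean-square of (predicted - reference) over matching state points."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum((p - r) ** 2 for p, r in zip(predicted, reference))
    return math.sqrt(total / len(predicted))


def rms_log10_pressure(p_predicted, p_reference):
    """RMS deviation of log10(P_v^sat); pressures must be positive."""
    log_pred = [math.log10(p) for p in p_predicted]
    log_ref = [math.log10(p) for p in p_reference]
    return rms_deviation(log_pred, log_ref)


if __name__ == "__main__":
    # Hypothetical saturated liquid densities (kg/m3) and vapor pressures (MPa).
    print(rms_deviation([540.0, 500.0, 450.0], [544.0, 503.0, 447.0]))
    print(rms_log10_pressure([0.11, 0.52, 1.6], [0.10, 0.50, 1.7]))
```

Evaluating such a function on a grid of ε and σ values, with MBAR and ITIC supplying the predicted properties, would give the kind of contour map shown in the figures.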

6 Conclusions

The development of accurate and reliable surrogate models is essential for rapid, high-dimensional parameterization and uncertainty quantification of force fields. The non-bonded potential has been the focus of this study since it is typically parameterized to reproduce vapor-liquid equilibria data. Configuration-sampling-based surrogate models, such as Multistate Bennett Acceptance Ratio (MBAR) and Pair Correlation Function Rescaling (PCFR), are well suited for estimating internal energies and pressures (compressibility) for a non-simulated non-bonded potential. Isothermal Isochoric (ITIC) thermodynamic integration converts the MBAR and PCFR outputs into saturation properties (ρ_l^sat, ρ_v^sat, and P_v^sat).

We performed several tests to determine the range of reliability of MBAR and PCFR for parameterizing a Mie λ-6 potential. MBAR provides accurate estimates of U^dep, Z, ρ_l^sat, and P_v^sat when N_eff > 50. Unfortunately, N_eff ≪ 50 when λ ≉ λ_ref and/or σ ≉ σ_ref. Therefore, MBAR should primarily be used when several σ_ref values are sampled from λ = λ_ref. By contrast, PCFR accurately predicts U^dep and P_v^sat from a single reference force field for a wide range of λ values (and a moderate range of σ values). However, PCFR has the wrong trend for Z and ρ_l^sat with respect to ε and σ.

Therefore, MBAR and PCFR are complementary tools. PCFR is a useful exploratory tool for proposing optimal regions of parameter space while MBAR is for final optimization and parameter uncertainty quantification. We developed a method that utilizes PCFR to rapidly determine which reference parameters should be sampled to improve the MBAR results. We demonstrated that this algorithm is a reliable and efficient means for parameterizing a Mie λ-6 potential for n-alkanes. We expect the same to be true in general for other potentials, such as the Exponential-6. Future studies will utilize these surrogate models (MBAR and PCFR) to rapidly optimize accurate force fields with meaningful estimates of parameter uncertainty.

Acknowledgments

The authors would like to acknowledge J. Richard Elliott for his collaboration and guidance for implementing the ITIC approach. This research was performed while R.A.M. held a National Research Council (NRC) Postdoctoral Research Associateship at the National Institute of Standards and Technology (NIST).

Supporting Information

Supporting Information provides simulation set-up specifications in Section SI.I, details for the ITIC analysis in Section SI.II, an alternative PCFR-optimal approach in Section SI.III, and a thorough discussion of basis functions in Section SI.IV. This information is available free of charge via the Internet at http://pubs.acs.org. The MBAR-PCFR-ITIC code is available at https://github.com/ramess101/MBAR_ITIC.

Appendix

We now provide a brief summary of some other PCFR methods to estimate g_ij(θ; r_ij) that we have investigated. First, the simplest PCFR approach is to assume that the pair correlation functions for the two force fields are equal (referred to as “constant PCF” in Section 3):

$$ g_{ij}(\theta; r_{ij}) = g_{ij}(\theta_{\mathrm{ref}}; r_{ij}) \tag{27} $$


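For very small differences between θ and θ_ref this assumption may be a good approximation. Applying Equation 27 to Equations 19 and 20 yields the expressions:

$$ \langle U^{\mathrm{dep}}(\theta) \rangle \approx \langle U^{\mathrm{dep}}(\theta_{\mathrm{ref}}) \rangle + 2\pi\rho R_g \sum_{i=1}^{N_S-1} \sum_{j>i}^{N_S} \int_0^\infty \left[ u_{ij}^{\mathrm{vdw}}(\theta; r_{ij}) - u_{ij}^{\mathrm{vdw}}(\theta_{\mathrm{ref}}; r_{ij}) \right] g_{ij}(\theta_{\mathrm{ref}}; r_{ij}) \, r_{ij}^2 \, \partial r_{ij} \tag{28} $$

$$ \langle Z(\theta) \rangle \approx \langle Z(\theta_{\mathrm{ref}}) \rangle - \frac{2\pi\rho}{3T} \sum_{i=1}^{N_S-1} \sum_{j>i}^{N_S} \int_0^\infty \left[ \frac{\partial u_{ij}^{\mathrm{vdw}}(\theta; r_{ij})}{\partial r_{ij}} - \frac{\partial u_{ij}^{\mathrm{vdw}}(\theta_{\mathrm{ref}}; r_{ij})}{\partial r_{ij}} \right] g_{ij}(\theta_{\mathrm{ref}}; r_{ij}) \, r_{ij}^3 \, \partial r_{ij} \tag{29} $$

A connection exists between Equations 28-29 and the MBAR equations for calculating expectation values of U^dep and Z (Equations 12-13). The assumption that the pair correlation functions (i.e. configurations) for the two force fields are equal (Equation 27) is mathematically equivalent to assigning each sampled configuration an equal probability in the MBAR equations, i.e. W_n(θ) = 1/N for all x_n. With the weights equal, Equations 10-11 are no longer needed to calculate U^dep and Z as Equations 12-13 reduce to:

$$ \langle U^{\mathrm{dep}}(\theta) \rangle = \frac{1}{N} \sum_{n=1}^{N} U^{\mathrm{dep}}(x_n; \theta) \tag{30} $$

$$ \langle Z(\theta) \rangle = \frac{1}{N} \sum_{n=1}^{N} Z(x_n; \theta) \tag{31} $$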
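The contrast between the “equal weights” estimate of Equations 30-31 and a reweighted estimate can be sketched as follows. For the reweighted case the sketch assumes a single reference, for which the weights are proportional to exp(-(U(x_n; θ) - U(x_n; θ_ref))/T) with energies expressed in K; this is standard single-state exponential reweighting and is not the full multi-reference MBAR solution. The function names and the synthetic energies are hypothetical.

```python
import numpy as np


def equal_weight_mean(obs):
    """Equations 30-31: every stored configuration has weight 1/N."""
    return float(np.mean(obs))


def reweighted_mean(obs, u_ref, u_new, temperature):
    """Single-reference reweighting with W_n proportional to exp(-(u_new - u_ref)/T).

    Energies are assumed to be in units of K (Boltzmann constant of 1).
    """
    obs = np.asarray(obs, dtype=float)
    delta = np.asarray(u_new, dtype=float) - np.asarray(u_ref, dtype=float)
    log_w = -delta / temperature
    w = np.exp(log_w - log_w.max())  # shift by the maximum for numerical stability
    w /= w.sum()
    return float(np.dot(w, obs))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    u_ref = rng.normal(-5000.0, 50.0, size=1000)        # synthetic reference energies (K)
    u_new = u_ref + rng.normal(30.0, 20.0, size=1000)   # synthetic perturbed energies (K)
    print(equal_weight_mean(u_new))
    print(reweighted_mean(u_new, u_ref, u_new, 300.0))
```

When the two force fields assign identical energies to every configuration, the two estimates coincide; as the energy differences grow, the reweighted estimate is dominated by fewer configurations while the equal-weight estimate becomes biased.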

Although Equations 28-29 (“constant PCF”) and 30-31 (“equal weights”) are equivalent, the use of pair correlation functions in Equations 28-29 significantly reduces the bookkeeping compared to MBAR, where $N$ configurations (i.e. the coordinates of every interaction site for each molecule) are stored. As shown in Figures 1, 2, and 7, “constant PCF” (i.e. Equations 28-29 or 30-31) has a strong bias in $U^{\mathrm{dep}}$ and $Z$.

Rather than assuming the PCFs for the two force fields are equal with respect to the absolute distance, $r_{ij}$ (Equation 27), we can assume $g_{ij}$ is constant with respect to a reduced distance, $r_{ij}^*$:

$$
g_{ij}(\theta; r_{ij}^*) \approx g_{ij}(\theta_{\mathrm{ref}}; r_{ij}^*) \tag{32}
$$

where the reduced distance ($r_{ij}^*$) may be defined as $r/\sigma$ or $r/r_{\mathrm{min}}$, for example. This is equivalent to the MBAR configuration mapping example discussed in Section 2.3.1 for the single-site LJ system. The advantage of Equation 32 is that it accounts for the strong dependence of the peak positions in $g_{ij}$ on $\sigma$ (and $r_{\mathrm{min}}$). However, this approach is only rigorous if the system volume is also scaled, which is not straightforward for a multi-site molecule. For a constant volume, the location of the first peak and the peak heights depend strongly on $\sigma$. In addition, Equation 32 does not account for changes in the well depth ($\epsilon$) or the steepness of the repulsive barrier ($\lambda$ for the Mie $\lambda$-6 potential). The value of $\epsilon$ changes the height of the first peak for $g_{ij}$, while $\lambda$ changes the slope of the initial ascent for $g_{ij}$. In order to account for changes in $\lambda$ and/or $\epsilon$, we can rescale the PCF by estimating the potential of mean force ($w$, or PMF) as:

$$
w_{ij}(\theta; r_{ij}) \approx w_{ij}(\theta_{\mathrm{ref}}; r_{ij}) + \left[ u_{ij}^{\mathrm{vdw}}(\theta; r_{ij}) - u_{ij}^{\mathrm{vdw}}(\theta_{\mathrm{ref}}; r_{ij}) \right] \tag{33}
$$

where $w$ (or PMF) is defined as:

$$
w_{ij}(\theta; r_{ij}) \equiv -T \ln\left( g_{ij}(\theta; r_{ij}) \right) \tag{34}
$$

Equation 34 is rearranged to yield:

$$
g_{ij}(\theta; r_{ij}) \equiv \exp\left( \frac{-w_{ij}(\theta; r_{ij})}{T} \right) \tag{35}
$$

and an estimation of the PCF is obtained by substituting Equation 33 into Equation 35:

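$$
g_{ij}(\theta; r_{ij}) \approx g_{ij}(\theta_{\mathrm{ref}}; r_{ij}) \exp\left( \frac{-\Delta u_{ij}^{\mathrm{vdw}}(\theta, \theta_{\mathrm{ref}}; r_{ij})}{T} \right) \tag{36}
$$

where $\Delta u_{ij}^{\mathrm{vdw}}(\theta, \theta_{\mathrm{ref}}; r_{ij}) = u_{ij}^{\mathrm{vdw}}(\theta; r_{ij}) - u_{ij}^{\mathrm{vdw}}(\theta_{\mathrm{ref}}; r_{ij})$. Note that $r_{ij}$ does not need to be a scaled distance ($r_{ij}^*$), because $\Delta u_{ij}^{\mathrm{vdw}}$ already accounts for the difference in $\sigma$ (or $r_{\mathrm{min}}$).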
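The rescaling in Equation 36 can be sketched in a few lines of Python. The Mie $\lambda$-6 form below uses the standard Mie prefactor, temperature is expressed in the same units as $\epsilon$ (K), and the parameter values, grid, and function names are illustrative assumptions. The reference PCF is a stand-in taken as the low-density limit $\exp(-u/T)$, chosen only because the rescaled result can then be checked exactly.

```python
import numpy as np

def mie_potential(r, epsilon, sigma, lam, m=6.0):
    """Mie lambda-m pair potential (epsilon in K; r and sigma in nm)."""
    prefactor = (lam / (lam - m)) * (lam / m) ** (m / (lam - m))
    return prefactor * epsilon * ((sigma / r) ** lam - (sigma / r) ** m)

def rescale_pcf(r, g_ref, theta, theta_ref, temperature):
    """Equation 36: g(theta; r) ~ g(theta_ref; r) * exp(-delta_u / T)."""
    delta_u = mie_potential(r, *theta) - mie_potential(r, *theta_ref)
    return g_ref * np.exp(-delta_u / temperature)

# Illustrative parameter sets, ordered as (epsilon / K, sigma / nm, lambda).
theta_ref = (98.0, 0.375, 12.0)
theta = (121.25, 0.3783, 16.0)
temperature = 200.0                      # K, hypothetical state point
r = np.linspace(0.30, 1.40, 1101)        # nm, same grid as the g_ref histogram

# Stand-in reference PCF: the low-density limit exp(-u_ref / T).
g_ref = np.exp(-mie_potential(r, *theta_ref) / temperature)
g_new = rescale_pcf(r, g_ref, theta, theta_ref, temperature)
```

For this stand-in, the rescaled PCF equals $\exp(-u(\theta)/T)$, the low-density PCF of the new parameter set. For a PCF taken from a dense-fluid simulation, the rescaling remains an approximation.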

Equation 36 is then substituted into Equations 19-20. In our experience, Equation 36 performs better than Equations 27 and 32, although the rescaled PCF still suffers from some deficiencies. Development of additional PCFR methods is ongoing. In many instances, Equations 25-26 yield similar results to Equations 19-20 combined with Equation 36. This is not surprising, since Equation 36 can also be viewed as multiplying the PCF by the ratio of the zeroth-order radial distribution functions ($g_0$) of $\theta$ and $\theta_{\mathrm{ref}}$. However, Equations 25-26 are simpler and have some computational benefits. For example, Equations 25-26 are numerically stable since they do not depend on the bin widths of the $g_{ij}(\theta_{\mathrm{ref}}; r_{ij})$ histogram. Therefore, in this study the PCFR results are obtained using Equations 25-26.
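To show how a tabulated PCF enters a quadrature of this kind, the following Python sketch evaluates the integral term of Equation 28 for a single site-site pair type with the trapezoidal rule. It is a sketch under stated assumptions: the step-function PCF, the number density, and the function names are hypothetical, and any unit-conversion prefactor is left out, so the result is in the units of $\epsilon$ (K).

```python
import numpy as np

def mie_potential(r, epsilon, sigma, lam, m=6.0):
    """Mie lambda-m pair potential (epsilon in K; r and sigma in nm)."""
    prefactor = (lam / (lam - m)) * (lam / m) ** (m / (lam - m))
    return prefactor * epsilon * ((sigma / r) ** lam - (sigma / r) ** m)

def trapezoid(y, x):
    """Composite trapezoidal rule on a (possibly non-uniform) grid."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def energy_shift(r, g_ref, theta, theta_ref, number_density):
    """2*pi*rho * integral of [u(theta) - u(theta_ref)] * g_ref * r^2 dr."""
    delta_u = mie_potential(r, *theta) - mie_potential(r, *theta_ref)
    return 2.0 * np.pi * number_density * trapezoid(delta_u * g_ref * r**2, r)

# Hypothetical inputs: a crude step-function PCF standing in for a histogram.
r = np.linspace(0.30, 1.40, 1101)              # bin centres, nm
theta_ref = (98.0, 0.375, 12.0)                # (epsilon / K, sigma / nm, lambda)
g_ref = np.where(r < theta_ref[1], 0.0, 1.0)
rho = 10.0                                     # number density, nm^-3 (illustrative)

shift_same = energy_shift(r, g_ref, theta_ref, theta_ref, rho)
shift_108 = energy_shift(r, g_ref, (108.0, 0.375, 12.0), theta_ref, rho)
shift_118 = energy_shift(r, g_ref, (118.0, 0.375, 12.0), theta_ref, rho)
```

Because the integrand is built directly from the histogram values, the accuracy of such a quadrature depends on the grid spacing of $g_{ij}$, which is the bin-width sensitivity mentioned above. Passing a rescaled PCF instead of `g_ref` would give the corresponding estimate with Equation 36.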

References

(1) Potoff, J. J.; Bernard-Brunel, D. A. Mie Potentials for Phase Equilibria Calculations: Applications to Alkanes and Perfluoroalkanes. J. Phys. Chem. B 2009, 113, 14725–14731.

(2) Martin, M. G.; Siepmann, J. I. Transferable potentials for phase equilibria. 1. United-atom description of n-alkanes. J. Phys. Chem. B 1998, 102, 2569–2577.

(3) Nath, S. K.; Escobedo, F. A.; de Pablo, J. J. On the simulation of vapor-liquid equilibria for alkanes. J. Chem. Phys. 1998, 108, 9905–9911.

(4) Hemmen, A.; Gross, J. Transferable Anisotropic United-Atom Force Field Based on the Mie Potential for Phase Equilibrium Calculations: n-Alkanes and n-Olefins. J. Phys. Chem. B 2015, 119, 11695–11707.

(5) Shah, M. S.; Siepmann, J. I.; Tsapatsis, M. Transferable potentials for phase equilibria. Improved united-atom description of ethane and ethylene. AIChE J. 2017, 63, 5098–5110.

(6) Errington, J. R.; Panagiotopoulos, A. Z. A new intermolecular potential model for the n-alkane homologous series. J. Phys. Chem. B 1999, 103, 6314–6322.

(7) Rhinehart, R. R.; Su, M.; Manimegalai-Sridhar, U. Leapfrogging and synoptic Leapfrogging: A new optimization approach. Comput. Chem. Eng. 2012, 40, 67–81.

(8) Stöbener, K.; Klein, P.; Reiser, S.; Horsch, M.; Kufer, K.-H.; Hasse, H. Multicriteria optimization of molecular force fields by Pareto approach. Fluid Ph. Equilibria 2014, 373, 100–108.

(9) Werth, S.; Stöbener, K.; Klein, P.; Kufer, K.-H.; Horsch, M.; Hasse, H. Molecular modelling and simulation of the surface tension of real quadrupolar fluids. Chem. Eng. Sci. 2015, 121, 110–117, 2013 Danckwerts Special Issue on Molecular Modelling in Chemical Engineering.

(10) Stöbener, K.; Klein, P.; Horsch, M.; Kufer, K.; Hasse, H. Parametrization of two-center Lennard-Jones plus point-quadrupole force field models by multicriteria optimization. Fluid Ph. Equilibria 2016, 411, 33–42.

(11) Cailliez, F.; Pernot, P. Statistical approaches to forcefield calibration and prediction uncertainty in Molecular Simulation. J. Chem. Phys. 2011, 134, 054124.

(12) Rizzi, F.; Najm, H. N.; Debusschere, B. J.; Sargsyan, K.; Salloum, M.; Adalsteinsson, H.; Knio, O. M. Uncertainty Quantification in MD Simulations. Part II: Bayesian Inference of Force-Field Parameters. Multiscale Model. Sim. 2012, 10, 1460–1492.

(13) Angelikopoulos, P.; Papadimitriou, C.; Koumoutsakos, P. Bayesian uncertainty quantification and propagation in molecular dynamics simulations: A high performance computing framework. J. Chem. Phys. 2012, 137, 144103.

(14) Messerly, R. A.; Knotts IV, T. A.; Wilding, W. V. Uncertainty quantification and propagation of errors of the Lennard-Jones 12-6 parameters for n-alkanes. J. Chem. Phys. 2017, 146, 194110.

(15) Hülsmann, M.; Vrabec, J.; Maaß, A.; Reith, D. Assessment of numerical optimization algorithms for the development of molecular models. Comput. Phys. Commun. 2010, 181, 887–905.

(16) Laurent, L.; Le Riche, R.; Soulier, B.; Boucard, P.-A. An Overview of Gradient-Enhanced Metamodels with Applications. Archives of Computational Methods in Engineering 2017, 24, 1–46.

(17) Hülsmann, M.; Kirschner, K. N.; Krämer, A.; Heinrich, D. D.; Krämer-Fuhrmann, O.; Reith, D. In Foundations of Molecular Modeling and Simulation: Select Papers from FOMMS 2015; Snurr, R. Q., Adjiman, C. S., Kofke, D. A., Eds.; Springer Singapore: Singapore, 2016; pp 53–77.

(18) Hülsmann, M.; Reith, D. SpaGrOW - A Derivative-Free Optimization Scheme for Intermolecular Force Field Parameters Based on Sparse Grid Methods. Entropy 2013, 15, 3640.

(19) Cailliez, F.; Bourasseau, A.; Pernot, P. Calibration of forcefields for Molecular Simulation: Sequential design of computer experiments for building cost-efficient kriging metamodels. J. Comput. Chem. 2014, 35, 130–149.

(20) Lotfi, A.; Vrabec, J.; Fischer, J. Vapour liquid equilibria of the Lennard-Jones fluid from the NpT plus test particle method. Mol. Phys. 1992, 76, 1319–1333.

(21) Thol, M.; Rutkai, G.; Span, R.; Vrabec, J.; Lustig, R. Equation of State for the Lennard-Jones Truncated and Shifted Model Fluid. Int. J. Thermophys. 2015, 36, 25–43.

(22) Thol, M.; Rutkai, G.; Köster, A.; Lustig, R.; Span, R.; Vrabec, J. Equation of State for the Lennard-Jones Fluid. J. Phys. Chem. Ref. Data 2016, 45, 023101.

(23) Stoll, J.; Vrabec, J.; Hasse, H.; Fischer, J. Comprehensive study of the vapour-liquid equilibria of the pure two-centre Lennard-Jones plus point-quadrupole fluid. Fluid Ph. Equilibria 2001, 179, 339–362.

(24) van Westen, T.; Vlugt, T. J. H.; Gross, J. Determining Force Field Parameters Using a Physically Based Equation of State. J. Phys. Chem. B 2011, 115, 7872–7880.

(25) Papaioannou, V.; Calado, F.; Lafitte, T.; Dufal, S.; Sadeqzadeh, M.; Jackson, G.; Adjiman, C. S.; Galindo, A. Application of the SAFT-γ Mie group contribution equation of state to fluids of relevance to the oil and gas industry. Fluid Ph. Equilibria 2016, 416, 104–119, Special Issue: SAFT 2015.

(26) Avendaño, C.; Lafitte, T.; Adjiman, C. S.; Galindo, A.; Müller, E. A.; Jackson, G. SAFT-γ Force Field for the Simulation of Molecular Fluids: 2. Coarse-Grained Models of Greenhouse Gases, Refrigerants, and Long Alkanes. J. Phys. Chem. B 2013, 117, 2717–2733.

(27) Hemmen, A.; Panagiotopoulos, A. Z.; Gross, J. Grand Canonical Monte Carlo Simulations Guided by an Analytic Equation of State-Transferable Anisotropic Mie Potentials for Ethers. J. Phys. Chem. B 2015, 119, 7087–7099.

(28) Weidler, D.; Gross, J. Transferable Anisotropic United-Atom Force Field Based on the Mie Potential for Phase Equilibria: Aldehydes, Ketones, and Small Cyclic Alkanes. Ind. Eng. Chem. Res. 2016, 55, 12123–12132.

(29) Mick, J. R.; Soroush Barhaghi, M.; Jackman, B.; Rushaidat, K.; Schwiebert, L.; Potoff, J. J. Optimized Mie potentials for phase equilibria: Application to noble gases and their mixtures with n-alkanes. J. Chem. Phys. 2015, 143, 114504.

(30) Mick, J. R. Force Field Development with GOMC, A Fast New Monte Carlo Molecular Simulation Code. Ph.D. thesis, Wayne State University, 2016.

(31) Hoang, H.; Delage-Santacreu, S.; Galliero, G. Simultaneous Description of Equilibrium, Interfacial, and Transport Properties of Fluids Using a Mie Chain Coarse-Grained Force Field. Ind. Eng. Chem. Res. 2017, 56, 9213–9226.

(32) Allen, M. P.; Tildesley, D. J. Computer simulation of liquids; Clarendon Press; Oxford University Press: Oxford, England; New York, 1987; pp xix, 385.

(33) Jorgensen, W. L.; Madura, J. D.; Swenson, C. J. Optimized Intermolecular Potential Functions for Liquid Hydrocarbons. J. Am. Chem. Soc. 1984, 106, 6638–6646.

(34) Jorgensen, W. L.; Maxwell, D. S.; Tirado-Rives, J. Development and Testing of the OPLS All-Atom Force Field on Conformational Energetics and Properties of Organic Liquids. J. Am. Chem. Soc. 1996, 118, 11225–11236.

(35) Razavi, S. M. Optimization of a Transferable Shifted Force Field for Interfaces and Inhomogenous Fluids using Thermodynamic Integration. M.Sc. thesis, The University of Akron, 2016.

(36) Elliott, J. R.; Lira, C. T. Introductory Chemical Engineering Thermodynamics, Second Edition; Prentice Hall: Upper Saddle River, New Jersey, 2012.

(37) Lemmon, E. W.; Huber, M. L.; McLinden, M. O. NIST Standard Reference Database 23: Reference Fluid Thermodynamic and Transport Properties-REFPROP, Version 9.1, National Institute of Standards and Technology. 2013; https://www.nist.gov/srd/refprop.

(38) Frenkel, M.; Chirico, R. D.; Diky, V.; Yan, X.; Dong, Q.; Muzny, C. ThermoData Engine (TDE): Software Implementation of the Dynamic Data Evaluation Concept. J. Chem. Inf. Model. 2005, 45, 816–838.

(39) Schultz, A. J.; Kofke, D. A. Virial coefficients of model alkanes. J. Chem. Phys. 2010, 133, 104101.

(40) Schultz, A. J.; Barlow, N. S.; Chaudhary, V.; Kofke, D. A. Mayer Sampling Monte Carlo calculation of virial coefficients on graphics processors. Mol. Phys. 2013, 111, 535–543.

(41) Eggimann, B.; Bai, P.; Bliss, A.; Chen, Q.; Chen, T.; Corest-Morales, A.; Fetisov, E.; Haldoupis, E.; Harwood, D.; Lindsey, R.; Arachchi, T.; Shah, M.; Stern, H.; Struk, K.; Sung, J.; Sunnarborg, A.; Xue, B.; Siepmann, J. I. T-UA No. 2 ethane. TraPPE Validation Database, University of Minnesota: Minneapolis, MN. http://www.chem.umn.edu/groups/siepmann/trappe/ (accessed 2015 June 11).

(42) Shirts, M. R. Simple Quantitative Tests to Validate Sampling from Thermodynamic Ensembles. J. Chem. Theory Comput. 2013, 9, 909–926.

(43) Schappals, M.; Mecklenfeld, A.; Kröger, L.; Botan, V.; Köster, A.; Stephan, S.; García, E. J.; Rutkai, G.; Raabe, G.; Klein, P.; Leonhard, K.; Glass, C. W.; Lenhard, J.; Vrabec, J.; Hasse, H. Round Robin Study: Molecular Simulation of Thermodynamic Properties from Models with Internal Degrees of Freedom. J. Chem. Theory Comput. 2017, 13, 4270–4280.

(44) Abraham, M.; van der Spoel, D.; Lindahl, E.; Hess, B.; the GROMACS development team. GROMACS User Manual version 2016.3, www.gromacs.org (2017).

(45) Morales, A. D. C.; Economou, I. G.; Peters, C. J.; Siepmann, J. I. Influence of simulation protocols on the efficiency of Gibbs ensemble Monte Carlo simulations. Mol. Simul. 2013, 39, 1135–1142.

(46) Shirts, M. R.; Chodera, J. D. Statistically optimal analysis of samples from multiple equilibrium states. J. Chem. Phys. 2008, 129, 124105.

(47) Shirts, M. R. Reweighting from the mixture distribution as a better way to describe the Multistate Bennett Acceptance Ratio. https://arxiv.org/abs/1704.00891

(48) Naden, L. N.; Shirts, M. R. Rapid Computation of Thermodynamic Properties Over Multidimensional Nonbonded Parameter Spaces using Adaptive Multistate Reweighting. J. Chem. Theory Comput. 2016, 12, 1806–1823.

(49) Dybeck, E. C.; König, G.; Brooks, B. R.; Shirts, M. R. Comparison of Methods To Reweight from Classical Molecular Simulations to QM/MM Potentials. J. Chem. Theory Comput. 2016, 12, 1466–1480.

(50) Streett, W. B.; Staveley, L. A. K. Calculation on a Corresponding States Basis of the Volume Change on Mixing Simple Liquids. J. Chem. Phys. 1967, 47, 2449–2454.

(51) Paliwal, H.; Shirts, M. R. Multistate reweighting and configuration mapping together accelerate the efficiency of thermodynamic calculations as a function of molecular geometry by orders of magnitude. J. Chem. Phys. 2013, 138, 154108.

(52) Moustafa, S. G.; Schultz, A. J.; Kofke, D. A. Very fast averaging of thermal properties of crystals by Molecular Simulation. Phys. Rev. E 2015, 92, 043303.

[Figure (manuscript page 55). Two panels titled "Lennard-Jones 12-6" and "Mie 16-6"; axes: ε and σ.]

[Figure (manuscript page 56), panels a) to d). Title: Constant Model: LJ 12-6, 88 ≤ ε/K ≤ 108, 0.365 ≤ σ/nm ≤ 0.385; θref = TraPPE UA. Axes: U^dep (kJ/mol), Predicted vs. U^dep (kJ/mol), Direct Simulation; Z, Predicted vs. Z, Direct Simulation; U^dep (kJ/mol), Predicted with MBAR and Z, Predicted with MBAR vs. Direct Simulation; Deviation and Percent Deviation. Legend: Constant PCF, MBAR, PCFR, Parity. Additional labels: Neff, log10(Neff).]

[Figure (manuscript page 57), panels a) to d). Title: Perturbed Model: Mie 16-6, 108 ≤ ε/K ≤ 128, 0.365 ≤ σ/nm ≤ 0.385; θref = TraPPE UA. Axes: U^dep (kJ/mol), Predicted vs. U^dep (kJ/mol), Direct Simulation; Z, Predicted vs. Z, Direct Simulation; U^dep (kJ/mol), Predicted with MBAR and Z, Predicted with MBAR vs. Direct Simulation; Deviation and Percent Deviation. Legend: Constant PCF, MBAR, PCFR, Parity. Additional labels: Neff, log10(Neff).]

[Figure (manuscript page 58). Title: Average Number of Effective Samples, θref = TraPPE UA. Axes: ε_CH3 (K) vs. σ_CH3 (nm). Legend: λ = 12, 13, 14, 15, 16, 17, 18.]

[Figure (manuscript page 59), panels a) to f). Panel titles: a), d) Multiple θref: ε = 98 K, 0.365 ≤ σ/nm ≤ 0.385; Constant Model: LJ 12-6, 88 ≤ ε/K ≤ 108, 0.365 ≤ σ/nm ≤ 0.385. b), e) Multiple θref: ε = 98 K, 0.365 ≤ σ/nm ≤ 0.385; Perturbed Model: Mie 16-6, 108 ≤ ε/K ≤ 128, 0.365 ≤ σ/nm ≤ 0.385. c), f) Multiple θref: ε = 118 K, 0.365 ≤ σ/nm ≤ 0.3925; Perturbed Model: Mie 16-6, 108 ≤ ε/K ≤ 128, 0.365 ≤ σ/nm ≤ 0.385. Axes: U^dep (kJ/mol), Predicted with MBAR vs. U^dep (kJ/mol), Direct Simulation; Z, Predicted with MBAR vs. Z, Direct Simulation; Deviation and Percent Deviation. Additional labels: Neff, log10(Neff).]

[Figure (manuscript page 60), panels a) and b). Contours of RMS of ρ_l^sat (kg/m3) and RMS of log10 Pv^sat. Axes: ε_CH3 (K) vs. σ_CH3 (nm). Legend: Direct Simulation; MBAR multiple references; MBAR single reference; PCFR single reference; References.]

[Figure (manuscript page 61), panels a) and b). Contours of RMS of ρ_l^sat (kg/m3) and RMS of log10 Pv^sat. Axes: ε_CH3 (K) vs. σ_CH3 (nm). Legend: Direct Simulation; MBAR multiple references, ε_ref = 118 K; MBAR single reference, θref = TraPPE UA; PCFR single reference, θref = TraPPE UA; Potoff, λ_CH3 = 16.]

[Figure (manuscript page 62), panels a) Hexafluoroethane (C2F6) and b) n-Alkanes. θref = TraPPE UA, θ = Potoff. Axes: U^dep (kJ/mol), Predicted vs. U^dep (kJ/mol), Direct Simulation; Percent Deviation. Legend, panel a): Constant PCF, MBAR, PCFR, Parity. Legend, panel b): Propane, MBAR; n-Butane, MBAR; n-Octane, MBAR; Propane, PCFR; n-Butane, PCFR; n-Octane, PCFR; Parity.]

[Figure (manuscript page 63). Title fragments: PCFR-optimal (minimized RMS of log10 Pv^sat), θref = TraPPE UA. Axes: ε_CH3 (K) vs. σ_CH3 (nm). Legend: λ = 13, 14, 15, 16, 17, 18; Potoff, λ = 14; Potoff, λ = 16; Potoff, λ = 18.]

[Figure (manuscript page 64), panels a) and b). Contours of RMS of ρ_l^sat (kg/m3) and RMS of log10 Pv^sat. Axes: ε_CH3 (K) vs. σ_CH3 (nm). Legend: Direct Simulation; MBAR; References (PCFR-optimal); Potoff, λ_CH3 = 16.]

[Figure (manuscript page 65), panels a) and b). Contours of RMS of ρ_l^sat (kg/m3) and RMS of log10 Pv^sat. Axes: ε_CH2 (K) vs. σ_CH2 (nm). Legend: Direct Simulation, propane; MBAR, propane; MBAR, n-butane; MBAR, n-octane; References, propane; References, n-butane; References, n-octane; Potoff, λ_CH2 = 16.]