Molecular Dynamics Force-Field Refinement ... - ACS Publications

Nov 23, 2015 - Neutron Data Analysis and Visualization Division, Oak Ridge National Laboratory (ORNL), Oak Ridge, Tennessee, United States. ABSTRACT: ...
1 downloads 4 Views 1MB Size
Article pubs.acs.org/JCTC

Molecular Dynamics Force-Field Refinement against Quasi-Elastic Neutron Scattering Data Jose M. Borreguero* and Vickie E. Lynch Neutron Data Analysis and Visualization Division, Oak Ridge National Laboratory (ORNL), Oak Ridge, Tennessee, United States ABSTRACT: Quasi-elastic neutron scattering (QENS) is one of the experimental techniques of choice for probing the dynamics at length and time scales that are also in the realm of full-atom molecular dynamics (MD) simulations. This overlap enables extension of current fitting methods that use timeindependent equilibrium measurements to new methods fitting against dynamics data. We present an algorithm that fits simulation-derived incoherent dynamical structure factors against QENS data probing the diffusive dynamics of the system. We showcase the difficulties inherent to this type of fitting problem, namely, the disparity between simulation and experiment environment, as well as limitations in the simulation due to incomplete sampling of phase space. We discuss a methodology to overcome these difficulties and apply it to a set of full-atom MD simulations for the purpose of refining the force-field parameter governing the activation energy of methyl rotation in the octa-methyl polyhedral oligomeric silsesquioxane molecule. Our optimal simulated activation energy agrees with the experimentally derived value up to a 5% difference, well within experimental error. We believe the method will find applicability to other types of diffusive motions and other representation of the systems such as coarse-grain models where empirical fitting is essential. Also, the refinement method can be extended to the coherent dynamic structure factor with no additional effort.



ensure the transferability of the final potential to a set of systems and environmental conditions, differing substantially from the experimental conditions used for refinement. Quantitative comparison to the experiment is the necessary condition upon which we can trust the simulation to a level where we can exploit its atomistic details in order to produce confident observations that are not accessible to the experiment. The method here introduced allows for optimization of one force-field parameter; thus, care must be taken, with regard to which parameter one wishes to optimize. While this requirement may be limiting, there are many scenarios when diffusion properties are governed by a single force-field parameter, and in this paper, we will present, in detail, one such instance. On the other hand, when diffusion is governed by a set of uncorrelated force-field parameters and we insist on optimizing only one parameter, the procedure will likely yield unrealistic values, because of overfitting. For instance, consider a molecule in a solvent endowed with internal diffusivity motions governed by one parameter characterizing solute−solute interactions, and center of mass diffusion governed by a second parameter characterizing solute−solvent interactions. Typical QENS datasets are fitted with analytical composite models comprehending, at most, a model describing center-of-mass diffusion, a model describing rotational diffusion, a model describing

INTRODUCTION Molecular dynamics (MD) simulations have become one of the de facto standards for computational studies of dynamics employing diverse representations of the system, ranging from full atom details to coarse-grained representations of groups of atoms. This, despite its intrinsic disadvantage in that the associated force fields have a limited validity to reproduce the experiment over a broad range of systems and environment conditions (pressure, temperature, pH, or some other external generalized force) accurately. This lack of transferability has been tackled in several ways. One approach is to enrich the model with a more complex potential function, as long as one can tolerate the associated decrease in computational speed. The TIPxP water model series (TIP3P,1 TIP4P,2 TIP5P3) is a representative example of this approach. Another approach is to let the a priori constant parameters in the force field vary with environment conditions. This approach requires refinement of the parameters every time the system is to be simulated in an environment deviating significantly from the environment at which the parameter values were first determined. An implicit assumption of this approach is the validity of the chosen functional form for the potential. Otherwise, the refinement procedure will, in all likelihood, yield optimal parameters with unphysical values, which is a strong indication that the validity of the selected potential function to describe the physics of the system is questionable. Our objective in this work is to reproduce the quasi-elastic neutron scattering (QENS) data under the set environment conditions quantitatively, rather than © XXXX American Chemical Society

Received: September 12, 2015

A

DOI: 10.1021/acs.jctc.5b00878 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Theory and Computation

sampling. Our approach is to accept the occurrence of errors in the simulations, generating an ensemble of such “faulty” simulations and from this ensemble extricate the signal amid the noise, with the help of a variant of the smoothing algorithm.16 An ingredient of this smoothing procedure is a cubic-interpolation scheme to render an analytic dependence of the structure factor with the force-field parameter. This type of interpolation has been successfully employed by Luksic et al.17 to derive potentials of mean force for solvated ions of any size from a set of simulations for pairs of solvated ions of specific sizes. The resulting computed dynamical structure factor allows the optimization algorithm to evolve toward the global minimum. We selected a powder microcrystalline sample of octa-methyl polyhedral oligomeric silsesquioxane (mPOSS) molecules as the first system to apply our optimization procedure. The mPOSS molecule is composed of a cubic cage where Si atoms occupy the cube vertices, and O atoms are located in the cube edges (Figure 1). Thus, each Si atom has a tetrahedral

internal diffusive motions, and a model describing internal vibrations (usually a Debye−Waller factor). These composite models can feature up to five or six fitting parameters. In addition, one must also include a background model and, in some cases, an additional elastic component to take into account sizable coherent elastic contributions from the sample. These additions may bring the total number of fitting parameters close to the double-digit figure. In this scenario, it is not uncommon to impose constraints on the parameters to prevent overfitting. From these considerations, QENS datasets will allow for a maximum of five or six force-field parameters to be optimized, provided they are uncorrelated in the sense that they govern diffusive processes occurring in the different time and length scales that are accessible to the QENS technique. There are several well-established methods4−12 to optimize force-field parameters against time-independent (as opposed to dynamic, or time-dependent), equilibrium measurements such as scalars (density, vapor pressures) and multivariate (static structure factor) system properties. One advantage of optimizing against time-independent data is the relative simplicity of phase-space sampling: Any equilibrium property can be calculated by evaluating the partition function, which requires evaluation of the Boltzmann factor for each of the N conformations that the system may adopt. Of course, only those conformations maximizing the Boltzmann factor will contribute significantly to the partition function; however, in principle, we do not know what they are, because the force-field is yet to be optimized. An analogous strategy when calculating dynamical properties relevant in a time scale t = MΔt (where Δt is some microscopic time step) would require enumeration of all trajectories of duration t in phase space. The number of such trajectories is approximately given as NM, many orders of magnitude above N. Only those trajectories minimizing Jacobi’s action will describe the dynamics of the system accurately, but, again, we ignore who they are because the force-field is yet to be optimized. At any rate, the search in phase space with correlated conformations (via the equations of motion) will require many more computational resources than the uncorrelated search that is performed when computing structure properties. It is not surprising then that force fields optimized to faithfully reproduce time-independent data typically show a qualitative-degree performance when compared against dynamical data. For instance, a fit of the interparticle potential for a liquid against the static structure factor will incur in errors for the long-range part of potential, unless very-high-accuracy structure factors are obtained.4 The reason lies in the fact that the structure of a liquid is mainly determined by the short-range order imposed by the core repulsive part of the potential, and is fairly insensitive to the exact shape of the long-range part. By way of contrast, the dynamics structure factor is sensitive to both the short and long-range features of the interparticle potential. Comparison of experimental data against a model derived from simulations, as opposed to an analytical model, spurs a series of difficulties having their root cause in an incomplete sampling of phase space. Simulations, being finite in space and time, have associated errors that must be addressed if high accuracy is desired in the final force-field parameters. There exist mature algorithms13−15 for the sampling of phase space when comparing against time-independent equilibrium data. However, these algorithms do not reproduce the dynamics of the system, because their key ingredient is precisely the modification of the inherent dynamics in order to enhance

Figure 1. mPOSS molecule composed of Si (yellow), O (red), C (cyan), and H (gray) atoms. Nine different chains of consecutive O− Si−C−H covalent bonds can be constructed for each methyl group, because of the three different O and H atoms that can be selected at the extremes of the chain.

coordination to three O atoms and one methyl group. Methyl substitution by other chemical species makes POSS molecules highly versatile, with applications as organic solvents, polymer dispersants, catalysts, nanocomposites, diodes, and many other uses.18 In particular, mPOSS has found application as a coating for carbon fibers and low-dielectric films.19 QENS experiments have determined an activation energy for methyl rotations in mPOSS of 1.22 kcal/mol. In our simulations methyl rotations are described by a dihedral potential function. Obtaining the dihedral potential energy barrier yielding the best comparison between simulated and experimental dynamical structure factor produces a computed activation energy of 1.15 kcal/mol. This is to be compared with the value obtained before the optimization procedure is applied: 1.97 kcal/mol.



METHODS

The crystal structure for mPOSS was obtained from the Cambridge Structural Database20 (entry OCMSIO03) and the initial conformaB

DOI: 10.1021/acs.jctc.5b00878 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Theory and Computation tion, consisting of a single molecule, was prepared with Materials Studio.21 Topology and force field are specified with the AMBER99 force field.22 Of particular interest to our optimization procedure is the potential governing methyl rotations, described by a four-body term correlating atoms O, Si, C, and H connected in a chain of consecutive covalent bonds O−Si−C-H: V = K[1 + cos(3ϕ)]

were calculated with no cutoff, which is possible because of the small system size. Once the simulation was finished, translations and rotations of the entire molecule were removed by root-mean-square (RMS) superposition to the first conformation in the trajectory, so that only internal motions were retained. The processed trajectory was passed onto the Sassena software25 for calculation of the simulated incoherent intermediate structure factor:

(1)

where V is the potential, 2K the energy barrier, and ϕ the dihedral angle. There are nine of these terms for a given methyl group, because we can create nine different chains of consecutive O−Si−C−H bonds (selecting three different O and three different H at the ends of the chain; see Figure 1). Thus, the activation energy to methyl rotation becomes Eact = 9 × 2K. In what follows, K is the force-field parameter targeted for optimization. The methyl dihedral potential energy barrier is defined under entry FLAG DIHEDRAL_FORCE_CONSTANT in the AMBER topology file, which we simply edit to the desired value prior to running an MD simulation. The refinement of a singleparameter has the advantage that a moderate amount of MD simulations are sufficient to probe the parameter space of interest. Typically, 10−20 simulations are sufficient; this is a number that makes parallel submission a feasible strategy. Thus, all required sampling is done in the time required for a single simulation to run, typically a matter of hours. This simple grid search scheme becomes prohibitive if two or more force-field parameters are required to describe the diffusive properties of the system in the time scales probed by the experiment. In this scenario, a scheme performing a type of smart sampling in parameter space becomes necessary. A set of 26 MD simulations spanning K-values with sampling in the range [Kmin, Kmax] ≡ [0.02, 0.12] kcal/mol were carried out with the NAMD engine.23 In the experiment, derivation of the activation energy, Eact = 1.22 kcal/mol (Kexp = 0.0678 kcal/mol), was obtained by fitting an Arrhenius plot of the relaxation time describing the main relaxation of intramolecule motions, τ(T) ∝ eEact/kBT. Thus, a temperature scan in the range of 150−300 K was necessary to obtain Eact. By contrast, all simulations here were done at the same temperature (200 K), while the scan was performed in K-parameter space. A temperature of 200 K allows for the existence of abundant methyl rotations within the time scales (10−100 ps) where the quasi-elastic decay probed in experiment is most sensitive, thus optimizing the signal to be compared against. Simulations were carried out in the presence of a heat bath (NVT ensemble). The addition of a heat bath was deemed necessary due to small system size. In the powder crystal, every atom can transfer energy to/from atoms of neighboring molecules, while this is not possible in our single-molecule simulation. As a result, whole-molecule excitations will not relax in the simulation if the constant energy (NVE) ensemble is chosen, leading to very different internal dynamics, compared to the powder crystal. Here, the NVT was implemented through a Langevin dynamics term24 with a damping coefficient of 5 ps−1. The chosen value has been shown to reproduce the experimental diffusion coefficient of water for equilibrium simulations of the TIP3 water model1 under ambient conditions. Also, it is small enough to consider it as a nonsignificant perturbation to the main energy transfer pathway through bonded atoms. Furthermore, the heat bath was not coupled to the hydrogen atoms, in order to prevent whatever perturbations the heat bath may cause in the dynamics of those atoms generating most of the experimental signal. All these considerations bear little relevance when computing equilibrium properties, but they must be careful weighted when computing dynamical quantities. The span (tmax) of the equilibrium MD simulations was selected to be 20 ns. The simulation span should encompass the time resolution of the experiment to which one wishes to make comparison. In our example, data were taken at the BASIS beamline, capable of sampling diffusive motions with relaxation times up to τmax ≈ 1 ns. Although our tmax extends well beyond τmax, it is the case that the long-time tails of the structure factor must be clipped, because of insufficient statistical significance. A simulation time an order of magnitude above τmax has proven to be a satisfactory rule of thumb. Conformations were recorded every 1 ps. Both van der Waals and electrostatic interactions

Isim(Q , t , K ) = ⟨Isim(Q⃗ , t , K )⟩ΩQ =

1 Nat

Nat

∑ (biinc)2 eiQ⃗ [ri(⃗ t + t0) − ri(⃗ t)] i=1

t0 , ΩQ

(2)

The orientation average ΩQ is carried out with the calculation of Isim(Q⃗ , t, K) for a set of 103 Q⃗ vectors of the same magnitude and randomly oriented. An average initial time t0 is determined by selecting all conformations with times t + t0 < tmax for a given time t. Only incoherent scattering lengths (biinc) were considered, since the experimental signal is dominated by the incoherent scattering of neutrons by hydrogen atoms, in the range of energies and momentum transfers considered. Although the signal from the heavy atoms amounts to 0.004% of the hydrogen signal, they were incorporated into the calculation, since they posed no significant computational overhead. Isim was computed for a set of nine Q values spanning the range [0.3 Å−1, 1.9 Å−1], using same values as reported in the experiment. For every Q value, the experimental Iexp(Q, t) is actually an average of the signal over the range [Q − ΔQ,Q + ΔQ] with ΔQ = 0.1 Å−1, which is performed to gain statistical significance. A similar procedure can be carried out by calculating Isim(Q, t, K) on a finer grid of Q-points if computational cost is not a concern. Later, a re-bin step in the Q-axis is necessary so that the Q-grid used for simulations will conform to the experimental one. A valid indicator for the necessity of a finer Q-grid is a significant deviation from linearity in the decay of the elastic line Ssim(E = 0, Q, K) = ∫ Isim(Q, t, K) dt within each [Q − ΔQ,Q + ΔQ] window. Once Isim is calculated, the LoadSassena algorithm of the Mantid framework26 was employed to read this quantity. Currently, Mantid has loading algorithms that are able to import the output from the two most popular software packages that calculate Isim from MD trajectories, namely nMoldyn27 and Sassena. However, any other software can be used if the output is saved in formats readable by Mantid (ASCII and Nexus). The SassenaFFT algorithm was used to perform the Fourier transformation from time to energy transfer E, and incorporate the detailed balance condition correction.28 A re-bin step in the energy axis is carried out to conform to the range and spacing of experimental energies. The resulting structure factor, Ssim(Q, E, K), could be directly compared to experiments, if not for the fact that one must account for experimental resolution as well as background, environment disparities, and insufficient simulation sampling. Overcoming these difficulties is the main focus of this work. Because of different sources of errors in the experiment, the recorded number of neutrons scattered with nominal changes in momentum and energy (Q⃗ , E) will contain some neutrons that scattered with values (Q⃗ ′, E′) close to the nominal ones; thus, they were incorrectly assigned. As a result, the reported signal at (Q⃗ , E) contains contributions from the true signal, as well as other (Q⃗ ′, E′) values. The weight function that quantifies these contributions is usually termed as the resolution f unction, and the reported signal is written as a convolution product between the resolution and the true signal: ⎯→ Sreported(Q⃗ , E) = Strue(Q⃗ ′, E′)R(Q⃗ ′, E′|Q⃗ , E) dQ ′ dE′ (3)



Fortunately, this complex four-dimensional (4D) integral has enormous simplifications in the case of QENS. First, the resolution in momentum transfer is much higher than the Q-spacing, so that we can ignore contributions from values of Q different than the nominal Q value. Second, the range of energy transfers is so small that we can consider the resolution being of the same shape for all nominal E C

DOI: 10.1021/acs.jctc.5b00878 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Theory and Computation values. The reported signal can then be written as a one-dimensional (1D) convolution product:

χ 2 (K , {pi }) =



(Sexp(Q i , Ej) − Smodel(Q i , Ej , K , {pi }))2 σ(Q i , Ej)2

i,j

Sreported(Q , E) =

(7) where {pi} represents the instrument fit parameters (I, x, a, b) and σ is the estimation of the experimental error at every scattering channel (Q, E). It is implicit in this formula that the model is “exact”; that is, there is no error associated with the model. Minimization of χ2 with analytic methods requires its variation with respect to the fitting parameters. Variation of Smodel, with respect to the beamline parameters, can be obtained by taking the analytic derivative of eq 6. On the other hand, variation with respect to the force-field parameter K can only be calculated numerically:

∫ Strue(Q , E′)R(E − E′|Q ) dE

≡ R(Q , E) ⊗ Strue(Q , E)

(4)

If we want to faithfully reproduce the reported signal with our simulations, we must convolve our Ssim with the resolution function. It should be noted that the simulation itself also has a resolution. The simulation cannot report on dynamics happening on a time scale above tmax, which corresponds to an energy scale of approximately h/ tmax (0.2 μeV in our simulations). This energy resolution is well above that of the BASIS instrument (3.5 μeV), so it can safely be ignored. Analytical functions representing the instrument resolution, such as a Gaussian or Lorentzian, have been used29−32 in lieu of the experimental resolution. On the other hand, the resolution can be obtained from the experiment as the reported signal at very low temperatures, when all scattering is considered to be elastic and, thus, Strue(Q, E′) ∝ δ(E′). Using the resolution derived from experiment yields a more accurate quantitative comparison to experiment. The main reasons are that (i) the resolution function is not an even function in the E-axis, which is especially true in the case of time-offlight instruments33 such as BASIS; and (ii) changes in the resolution function may occur with successive instrument updates. Fortunately, obtaining the resolution function in a QENS experiment is a matter of a few hours of measuring time. In addition to resolution, the experiment features a background noise that is usually modeled by a constant plus linear term in energy. Putting everything together, our model structure factor now reads as

Smodel = R(Q , E) ⊗ Ssim(Q , E) + [aE + b]

′ ∂Ssim S′ (Q , E , K + ΔK ) − Ssim(Q , E , K ) ≈ sim ∂K ΔK

(8)

The calculation entails comparison between a simulation carried out at K and another simulation carried out at K + ΔK. The sampling problem arises when ΔK is so small that simulations report no change in S′sim, or worse, a change opposite to what is expected. Because of finite simulation time, as well as finite system size, conformational sampling during the simulation may not yield statistically significant ′ upon small changes in K. This is a real problem if the changes in Ssim dynamics, as probed by QENS, is highly sensitive to parameter K. These indeterminacies produce a rugged χ2(K), which results in a multitude of spurious local minima that ruin all chances of finding the global minimum. A brute force approach consists of running an ensemble of longer simulations and bigger systems, if resources are not ′ along a concern. Instead, we propose a smoothing procedure of Ssim the K-axis for all scattering channels (Q, E) consisting of two steps: (i) a running local fit and (ii) cubic spline interpolation. The recipe for the first step (running local fit) is as follows: (i) select a value in the K-axis (K = K0); (ii) consider a K-window of width 2ΔK, [K0 − ΔK, K0 + ΔK], and perform a linear or quadratic fit of S′sim(K|Q, E) to yield a locally LF fitted SLF sim(K0|Q, E) as well as an estimation of the error εsim(K0| Q, E). The error is computed as the average of the fit residuals over the window. The fitting takes into account all simulations that were performed with force-field values within the Kwindow, extracting the signal amid the noise (see Figure 2); and (iii) proceed to the next K value K0 → K0 + ΔK and repeat the recipe. In the second step, a cubic interpolation of SLF sim taking into account error εLF sim is carried out to yield Ssmooth(Q, E, K), which is continuously derivable in K. In addition, a cubic interpolation of εLF sim yields εsmooth(Q, E, K). The smoothing procedure is independently performed for all scattering channels. The algorithm has been implemented as a Python module.34 In the final model, Ssmooth replaces S′sim:

(5)

In our example, experiments were performed on a powder sample of microcrystals. By contrast, computations were carried out on a single molecule (gas phase) where global translations and rotations were removed, leaving only intramolecule motions. Thus, whole-molecule motions that are present in the experiment and are incorporated into the experimental structure factor are absent in the computed counterpart. Two reasonable approximations are made to deal with this disparity: (1) diffusive motions of the entire molecule in the probed length scales (the Q-range) are very minor, compared with intramolecule diffusive motions; and (2) dihedral barriers to methyl rotations are unaffected by the presence of neighboring molecules in the time scale resolved by the instrument. As a consequence of these approximations, differences between experimental and model structure factors due to whole-molecule diffusive motions are only significant within the energy range where the elastic line part of Sreported dominates the signal intensity. Also, coherent effects in the powder may contribute significantly to the elastic line intensity, especially at low Q. Thus, we removed the elastic line from Ssim and then included a fitting parameter in order to account for the experimental intensity of the elastic line. Our model structure factor now can be given as

Smodel(Q , E , K , {pi }) = I ·R(Q , E) ⊗ [(1 − x)δ(E) + xSsmooth(Q , E , K )] + [aE + b] (9)

Smodel(Q , E , K , {pi })

Since our model is not exact, we can opt to include the estimation of the simulation error in the goodness-of-fit measure. The model error can be written as

′ (Q , E , K )] + [aE + b] = I · R(Q , E) ⊗ [(1 − x)δ(E) + xSsim (6)

εmodel(Q , E , K , x) = I ·R(Q , E) ⊗ xεsmooth(Q , E , K )

′ includes only the quasi-elastic part of Ssim, and our fit where Ssim parameters are I, x, K, a, and b. Parameter x with 0 < x < 1 characterizes the strength of the simulated quasi-elastic signal, compared to the elastic line R(Q, E) ⊗ δ(E). While the above considerations are necessary to avoid those disparities related to the experiment, we still must address the shortcomings arising solely from the simulations. Optimization of the model structure factor is achieved with the minimization of the goodness of fit:

(10)

and the goodness of fit becomes χ 2 (K , {pi }) =

⎡ S (Q , E ) j exp i

∑⎢ i,j

+

D

⎢⎣ σ(Q i , Ej)2



(Smodel(Q i , Ej , K , {pi }))2

εmodel(Q i , Ej , K , x)2 ⎤ ⎥ ⎥⎦ σ(Q i , Ej)2

σ(Q i , Ej)2

(11)

DOI: 10.1021/acs.jctc.5b00878 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Theory and Computation

amplitude of the methyl group librations. The plateau in Isim indicates a lack of free diffusion, which is the result of the removal of global translations in the simulations. This plateau produces a sharp spike in the Fourier transform Ssim (see Figure 4).

Figure 2. Smoothing of the structure factor for scattering channel (Q, ′ E) = (0.9 Å−1, −0.13 meV). The structure factor from simulations Ssim (red dots) show fluctuations as we vary K due to finite sampling of phase space. An instance of the running local quadratic fit is shown for K0 = 0.06 kcal/mol. The five data points contained within the dashed blue circle are employed in the fit, yielding a locally fitted structure LF factor SLF sim (blue dot) and error estimation εsim (blue error bar). The final Ssmooth and εsmooth are shown as the black line and black error bars, respectively.

Figure 4. Structure factor Ssim with Q = 1.9 Å−1 and Q = 0.06 kcal/ mol. The removal of global rotations and translations of the molecule has produced a sharp elastic line at E = 0.



RESULTS AND DISCUSSION The shape of Isim is typical of caged or localized diffusive motions, with an initial decay due to diffusion within the cage that evolves into a plateau when enough time has elapsed for the particle to fully explore the cage (see Figure 3).

This spike is the elastic line of the simulation, which bears little similarity to the experimental elastic line. One could try to reproduce the experimental elastic line by mimicking the experimental sample as close as possible in our simulations.35 In our example, it would amount to simulate a supercell of mPOSS molecules, instead of the single molecule. Although possible, it is more than likely that the resulting model and experimental elastic lines would differ; therefore, we ought to include the coherent scattering contribution and an additional force-field parameter governing whole-molecule diffusivity motions in the fitting procedure, in order to reconcile the disparities. Neglecting to include an extra parameter would result in overfit of the force-field parameter K governing the local methyl rotations. Since we are restricted in this study to a single force-field parameter optimization, we opted to insert the experimental elastic line in the model via the convolution of the experimental resolution with a delta Dirac function. It is also possible that intermolecular interactions may be modulating the barrier to methyl rotations. On the other hand, we expected this effect to be of perturbative order. The mPOSS molecule lacks solid−solid phase transitions, in contrast with other POSS molecules functionalized with more complex aliphatic side chains (e.g., isobutyl-POSS). This observation points to a simple energy landscape governing intermolecular interactions, most likely resulting from comparatively weak Van deer Waals forces. Another supporting observation is the value of the activation temperature of methyl rotations in hydrated proteins, which is largely independent of hydration level and suggests weak interactions with the hydration shell. The evolution of Ssim with K in the region of 0.10−0.12 kcal/ mol, where intensive K-sampling was performed, shows the typical noisy pattern that is due to insufficient sampling of the simulation phase space (see Figure 5a). This noise results in spurious local minima for χ2(K), where the optimization routine will become trapped. Application of the smoothing

Figure 3. Computed intermediate structure factor for K = 0.06 kcal/ mol and Q = 1.9 Å−1. A steep decorrelation decay is followed by a nonzero constant baseline; all of these features are indicative of caged diffusion. The decorrelation relaxation time is on the order of 50 ps.

In our example, motions from all atoms are taken into consideration when calculating Isim, but the signal is dominated by contributions from hydrogens, because of their much stronger incoherent scattering length. The shape of the cage for a hydrogen atom in a methyl is a toroid with three favored positions, corresponding to the three possible rotamers. The magnitude of the toroid cross section is determined by the E

DOI: 10.1021/acs.jctc.5b00878 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Theory and Computation

Figure 5. Simulated structure factor before and after application of the smoothing procedure. (Left) A Ssim(K|Q, E) plot is shown for Q = 0.9 Å−1 and energy transfers from E = −0.13 meV (red) up to E = 0.10 meV (cyan) in increments of 0.1 meV. Noisy patterns for K > 0.10 kcal/mol are evident, because of insufficient simulation sampling. (Right) Resulting Ssmooth(K|Q, E) plot.

procedure with a K-window encompassing five consecutive Kvalues and a quadratic fit results in Ssmooth that is derivable, with respect to K in the sampled range of K-values (see Figure 5b), allowing the optimization to find the global minima. In principle, smoothing can also be achieved by alternative methods such as filtering out the high-frequency tails in the Fourier spectra of S′sim(K|Q, E), although this method would not reproduce a K-dependent error such as the one derived from the local running fit (see eq 11). Other optimization schemes that avoid calculation of partial derivatives, such as genetic algorithms, will not solve the problems arising from insufficient sampling unless they also incorporate a smoothing routine. The fit of Smodel against Sexp was carried out separately for each value of Q; thus, the sum in eq 11 is only restricted to energy values, that is, we have χ2(Q, K). By fitting each Q separately, we fit to the line shape but we leave the area under the Smodel curve as a free parameter. It is expected that the beamline parameters in eq 9 will be Q-dependent. However, optimal K* should be independent of Q if the force field is able to reproduce the dynamics of methyl rotations at all probed length scales. At any rate, since most of the reported QENS signal arises from localized methyl rotations, this test is not as demanding as in the case of a problem where the force-field parameter governs free diffusion. Prior to the optimization, and starting with the highest Q (Qmax = 1.9 Å−1), we calculated the K-dependence of χ2(Qmax, K) with the help of the Mantid algorithm DSFinterp, which performs the same type of smoothing as the optimization function DSFinterp1DFit. Algorithm DSFinterp returns a smooth structure factor Ssmooth(Q, E, K) that can be evaluated for any value of K in the [Kmin, Kmax] range, not just those values for which we performed the simulations. For each of these evaluations, we calculate χ2(Qmax, K) and plot it in Figure 6, which features a sharp minimum at ∼0.065 kcal/mol and shows that the range of K-values with low χ2 is rather narrow. This is the strongest indication that force-field parameter K is indeed governing the dynamics of the system, as reported by the QENS data. Once we obtained K* at Qmax, we proceed to determine K* for the next lower Q value (Q = 1.7 Å−1) using, as an initial guess, the value of K* at Qmax. We continue in a sequence

Figure 6. K-dependence of the goodness of fit (χ2) at the highest measured Q, Qmax = 1.9 Å−1. The measure shows a minimum at ∼0.065 kcal/mol. Also, the K-dependence at Qmin is shown. The shallower minimum indicates that the fit is less sensitive at large length scales (∼2π/Qmin).

fashion, finding K* for the current Q, using the optimal value from the previous round as the initial guess, until all Q-values have been exhausted. This procedure was later found to be not necessary since the smoothness of the structure factor guaranteed convergence to the absolute minimum independent of where we started the minimization, and we did test this assertion starting the minimization from various initial guesses spanning the [Kmin, Kmax] range. The sequential optimization merely accelerated the entire fitting routine, as we always started close to the minimum. The series of K*-values shown in Table 1 confirm that K* can be considered as independent of Q, and is within 4%−6% from Kexp = 0.0678 kcal/mol. The best estimate of K* is obtained at Qmax, with (Kexp − K*(Qmax))/Kexp ≈ 4%, and then a mild worsening trend toward smaller Q-values, with a difference of 6%. This is so because the length scale Lmin = 2π/Qmax ∼ 3Å is most similar to the diameter of the methyl rotations. Thus, the QENS signal at Qmax is most apt to describe the self-correlation of these F

DOI: 10.1021/acs.jctc.5b00878 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Theory and Computation

Note: An online tutorial to reproduce a fitting similar to the one presented in this study is available at http://camm.ornl. gov/users/dsfinterp1dfit.

Table 1. Optimal Barrier as a Function of the Q-Value or Length Scale Q (Å−1)

L (Å)

K* (kcal/mol)

goodness of fit, χ2

0.3 0.5 0.7 0.9 1.1 1.3 1.5 1.7 1.9

20.9 12.6 9.0 7.0 5.7 4.8 4.2 3.7 3.3

0.06354 0.06350 0.06313 0.06398 0.06472 0.06495 0.06508 0.06517 0.06493

1.83 2.44 2.20 2.69 2.43 2.54 2.89 3.55 4.07



SUMMARY AND CONCLUSIONS We have developed a method to refine a force-field parameter governing the diffusion in the system as probed by quasi-elastic neutron scattering experiments. We tested the method on a full atom representation of the mPOSS molecule in order to refine the dihedral barrier to methyl rotations, obtaining an optimal value for the activation energy that differs by 5% from the experimentally determined value, which is well within experimental error. The method we presented allows the optimization of one force-field parameter; however, QENS data of more-complex systems are rich enough to allow for a greater number of parameters to be optimized. We are exploring the extension of the algorithm to more than one parameter, which will require smart sampling of parameter space to avoid submission of a prohibitive number of simulations.37 Also, since it is not uncommon to combine QENS experiments with other spectroscopy techniques (dielectric relaxation,38 Raman and infrared methods39), differential scanning calorimetry,40 and structural measurements (X-ray and neutron diffraction), there is the opportunity to insert this optimization method into modular optimization toolkits41,42 that would enable robust optimization against an expanded experimental dataset. We expect this optimization method will find wide applicability in both coarse-grained and full atom models of samples that are studied at QENS beamlines (ionic solutions and ionic polymers, functionalized nanoparticles, diffusion of hydrogenous species in solid matrices, and protein dynamics, among others). Future plans are to integrate this type of refinement into the experimental chain to empower researchers to compare simulated and experimental results in almost real time.

motions. Therefore, we expected the best estimate of K* at Qmax with subsequent estimates worsening as Q decreased (L increased). This was the reason that led us to begin the sequential fitting routine at Qmax. In Figure 7, we show two of the fits spanning the experimental Q-range. All K* values are consistently smaller than the experimental ones. A plausible explanation is that we ignored the effect of neighboring molecules in the rotation of the methyl groups, which we hypothesize as hampering rotation due to mild steric clashes. The role of neighboring molecules is likely to become significant for POSS molecules with a larger side-group than the methyl one of mPOSS. The role of neighboring molecules was also invoked as the most likely cause for the discrepancy found in first-principles (density functional theory, DFT) calculations of the methyl rotational barrier.36 In this study, the barrier was calculated to be 1.49 kcal/mol, which is 22% higher than the QENS result. The goodness of fit (χ2) shown in Table 1 also features a trend: it decreases as Q decreases. We now recall that we explicitly inserted the experimental elastic line in our model via the convolution of the experimental resolution with a delta Dirac. At high Q, the quasi-elastic component originating in our simulations dominates Smodel, but at low Q, it is the elastic component originating in the experiment that dominates Smodel. Therefore, it is only natural that, in the low-Q regime, Smodel will compare better to the experimental signal, when the elastic component dominates the signal.



AUTHOR INFORMATION

Corresponding Author

*Tel.: (678) 793-6463. E-mail: [email protected]. Notes

The authors declare no competing financial interest.

Figure 7. Optimal fits Qmin = 0.3 Å−1 and Qmax = 1.9 Å−1. The elastic line dominates the fit in the first case, and the quasi-elastic signal dominates the fit at Qmax. G

DOI: 10.1021/acs.jctc.5b00878 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Theory and Computation



C., Kremer, K., Eds.; Advances in Polymer Science, Vol. 221; Springer−Verlag: Berlin, Germany, 2009; p 167. (16) Andricioaei, I.; Straub, J. E. Global optimization using bad derivatives: Derivative-free method for molecular energy minimization. J. Comput. Chem. 1998, 19 (13), 1445. (17) Lukšič, M.; Fennell, C. J.; Dill, K. A. Using interpolation for fast and accurate calculation of ion−ion interactions. J. Phys. Chem. B 2014, 118, 8017. (18) DeArmitt, C. In Applications of Polyhedral Oligomeric Silsesquioxanes, 1st Edition; Hartmann-Thompson, C., Ed.; Advances in Silicon Science, Vol. 3; Springer: Amsterdam, 2011; p 210. (19) Cordes, D. B.; Lickiss, P. D.; Rataboul, F. Recent developments in the chemistry of cubic polyhedral oligosilsesquioxanes. Chem. Rev. 2010, 110 (4), 2081. (20) Allen, F. The Cambridge structural database: a quarter of a million crystal structures and rising. Acta Crystallogr., Sect. B: Struct. Sci. 2002, 58 (3), 380. (21) Discovery studio modeling environment, 4.0; Accelrys Software, Inc.: San Diego, CA, 2013. (22) Pearlman, D. A.; Case, D. A.; Caldwell, J. W.; Ross, W. S.; Cheatham, T. E.; DeBolt, S.; Ferguson, D.; Seibel, G.; Kollman, P. AMBER, a package of computer programs for applying molecular mechanics, normal mode analysis, molecular dynamics and free energy calculations to simulate the structural and energetic properties of molecules. Comput. Phys. Commun. 1995, 91 (1−3), 1. (23) Phillips, J. C.; Braun, R.; Wang, W.; Gumbart, J.; Tajkhorshid, E.; Villa, E.; Chipot, C.; Skeel, R. D.; Kalé, L.; Schulten, K. Scalable molecular dynamics with NAMD. J. Comput. Chem. 2005, 26 (16), 1781. (24) Leach, A. R. Molecular Modelling: Principles and Applications, 2nd Edition; Prentice Hall: Dorchester, England, 2001; p 744. (25) Lindner, B.; Smith, J. C. SassenaX-ray and neutron scattering calculated from molecular dynamics trajectories using massively parallel computers. Comput. Phys. Commun. 2012, 183 (7), 1491. (26) Arnold, O.; Bilheux, J. C.; Borreguero, J. M.; Buts, A.; Campbell, S. I.; Chapon, L.; Doucet, M.; Draper, N.; Ferraz Leal, R.; Gigg, M. A.; Lynch, V. E.; Markvardsen, A.; Mikkelson, D. J.; Mikkelson, R. L.; Miller, R.; Palmen, K.; Parker, P.; Passos, G.; Perring, T. G.; Peterson, P. F.; Ren, S.; Reuter, M. A.; Savici, A. T.; Taylor, J. W.; Taylor, R. J.; Tolchenov, R.; Zhou, W.; Zikovsky, J. MantidData analysis and visualization package for neutron scattering and SR experiments. Nucl. Instrum. Methods Phys. Res., Sect. A 2014, 764 (0), 156−166. (27) Hinsen, K.; Pellegrini, E.; Stachura, S.; Kneller, G. R. nMoldyn 3: Using task farming for a parallel spectroscopy-oriented analysis of molecular dynamics simulations. J. Comput. Chem. 2012, 33 (25), 2043. (28) Schofield, P. Space-time correlation function formalism for slow neutron scattering. Phys. Rev. Lett. 1960, 4 (5), 239. (29) Borreguero, J. M.; He, J.; Meilleur, F.; Weiss, K. L.; Brown, C. M.; Myles, D. A.; Herwig, K. W.; Agarwal, P. K. Redox-promoting protein motions in rubredoxin. J. Phys. Chem. B 2011, 115 (28), 8925. (30) Calandrini, V. H.; Hamon, V.; Hinsen, K.; Calligari, P.; Bellissent-Funel, M.-C.; Kneller, G. R. Relaxation dynamics of lysozyme in solution under pressure: Combining molecular dynamics simulations and quasielastic neutron scattering. Chem. Phys. 2008, 345 (2−3), 289. (31) Tarek, M.; Tobias, D. J. The dynamics of protein hydration water: A quantitative comparison of molecular dynamics simulations and neutron-scattering experiments. Biophys. J. 2000, 79 (6), 3244. (32) Roh, J. H.; Curtis, J. E.; Azzam, S.; Novikov, V. N.; Peral, I.; Chowdhuri, Z.; Gregory, R. B.; Sokolov, A. P. Influence of hydration on the dynamics of lysozyme. Biophys. J. 2006, 91 (7), 2573. (33) Vondreele, R. B.; Jorgensen, J. D.; Windsor, C. G. Rietveld refinement with spallation neutron powder diffraction data. J. Appl. Crystallogr. 1982, 15 (Dec), 581. (34) Borreguero, J. M. The dsfinterp python module at the python package index. https://pypi.python.org/pypi/dsfinterp (accessed Jan. 21, 2015).

ACKNOWLEDGMENTS Authors would like to thank S. E. Anderson, for initial topology and coordinates files; N. Jalarvo, M. K. Crawford, and E. Mamontov, for providing the experimental structure factors and fruitful discussions; and K. W. Herwig and T. Proffen, for careful review of the manuscript and providing valuable comments and suggestions. J.M.B. and V.E.L. are supported by the Center for Accelerating Materials Modeling (CAMM), which is funded by the U.S. Department of Energy, Basic Energy Sciences, Materials Sciences and Engineering Division Division, under FWP-3ERKCSNL. Research at the Spallation Neutron Source was sponsored by the Division of Scientific User Facilities, Office of Basic Energy Sciences, U.S. Department of Energy, under Contract No. DE-AC05-00OR22725 with UT-Battelle, LLC.



ABBREVIATIONS



REFERENCES

MD, molecular dynamics; mPOSS, octa-methyl polyhedral oligomeric silsesquioxane; QENS, quasi-elastic neutron scattering; RMS, root-mean-square

(1) Jorgensen, W. L.; Chandrasekhar, J.; Madura, J. D.; Impey, R. W.; Klein, M. L. Comparison of Simple Potential Functions for Simulating Liquid Water. J. Chem. Phys. 1983, 79 (2), 926−935. (2) Jorgensen, W. L.; Madura, J. D. Temperature and size dependence for Monte Carlo simulations of TIP4P water. Mol. Phys. 1985, 56 (6), 1381−1392. (3) Mahoney, M. W.; Jorgensen, W. L. A five-site model for liquid water and the reproduction of the density anomaly by rigid, nonpolarizable potential functions. J. Chem. Phys. 2000, 112 (20), 8910−8922. (4) Levesque, D.; Weis, J. J.; Reatto, L. Pair interaction from structural data for dense classical liquids. Phys. Rev. Lett. 1985, 54 (5), 451. (5) Soper, A. K. Empirical potential Monte Carlo simulation of fluid structure. Chem. Phys. 1996, 202 (2−3), 295. (6) Nerenberg, P. S.; Head-Gordon, T. Optimizing protein−solvent force fields to reproduce intrinsic conformational preferences of model peptides. J. Chem. Theory Comput. 2011, 7 (4), 1220. (7) Chapman, D. E.; Steck, J. K.; Nerenberg, P. S. Optimizing protein−protein van der Waals interactions for the AMBER ff9x/ff12 force field. J. Chem. Theory Comput. 2014, 10 (1), 273. (8) Best, R. B.; Hummer, G. Optimized molecular dynamics force fields applied to the helix-coil transition of polypeptides. J. Phys. Chem. B 2009, 113 (26), 9004. (9) Jiang, F.; Zhou, C. Y.; Wu, Y. D. Residue-specific force field based on the protein coil library. RSFF1: modification of OPLS-AA/L. J. Phys. Chem. B 2014, 118 (25), 6983. (10) Di Pierro, M.; Elber, R. Automated optimization of potential parameters. J. Chem. Theory Comput. 2013, 9 (8), 3311. (11) Mechelke, M.; Habeck, M. Estimation of interaction potentials through the configurational temperature formalism. J. Chem. Theory Comput. 2013, 9 (12), 5685. (12) Ni, B.; Baumketner, A. Reduced atomic pair-interaction design (RAPID) model for simulations of proteins. J. Chem. Phys. 2013, 138 (6), 064102. (13) Barducci, A.; Bonomi, M.; Parrinello, M. Metadynamics. WIREs Comput. Mol. Sci. 2011, 1 (5), 826. (14) Nymeyer, H.; Gnanakaran, S.; Garcia, A. E. Atomic simulations of protein folding, using the replica exchange algorithm. Methods Enzymol. 2004, 383, 119. (15) Dellago, C.; Bolhuis, P. G., Transition path sampling and other advanced simulation techniques for rare events. In Advanced Computer Simulation Approaches for Soft Matter Sciences III, 1st Edition; Holm, H

DOI: 10.1021/acs.jctc.5b00878 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Theory and Computation (35) Tarek, M.; Martyna, G. J.; Tobias, D. J. Amplitudes and frequencies of protein dynamics: Analysis of discrepancies between neutron scattering and molecular dynamics simulations. J. Am. Chem. Soc. 2000, 122 (42), 10450. (36) Jalarvo, N.; Gourdon, O.; Ehlers, G.; Tyagi, M.; Kumar, S. K.; Dobbs, K. D.; Smalley, R. J.; Guise, W. E.; Ramirez-Cuesta, A.; Wildgruber, C.; Crawford, M. K. Structure and Dynamics of Octamethyl-POSS Nanoparticles. J. Phys. Chem. C 2014, 118 (10), 5579. (37) Hulsmann, M.; Reith, D. SpaGrOWA Derivative-Free Optimization Scheme for Intermolecular Force Field Parameters Based on Sparse Grid Methods. Entropy 2013, 15 (9), 3640. (38) Richert, R., Supercooled liquids and glasses by dielectic relaxation spectroscopy. In Advances in Chemical Physics, Vol. 156; Rice, S. A., Dinner, A. R., Eds.; John Wiley & Sons, Inc: Hoboken, NJ, 2015; p 101. (39) Wilson, E. B., Jr.; Decius, J. C.; Cross, P. C. Molecular Vibrations: The Theory of Infrared and Raman Vibrational Spectra, 1st Edition; Dover Publications: New York, 1980; p 1. (40) Gill, P.; Moghadam, T. T.; Ranjbar, B. Differential scanning calorimetry techniques: Applications in biology and nanoscience. J. Biomol. Technol. 2010, 21 (4), 167. (41) Hulsmann, M.; Müller, T. J.; Koddermann, T.; Reith, D. Automated force field optimization of small molecules using a gradient-based workflow package. Mol. Simul. 2010, 36 (14), 1182. (42) Wang, L. P.; Martinez, T. J.; Pande, V. S. Building Force Fields: An Automatic, Systematic, and Reproducible Approach. J. Phys. Chem. Lett. 2014, 5 (11), 1885.

I

DOI: 10.1021/acs.jctc.5b00878 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX