On Obtaining Boltzmann-Distributed Configurational Ensembles from

5 days ago - Discover the Most-Read Physical Chemistry Articles of March 2019. There are lots of different ways to look at the reach of an article...
0 downloads 0 Views 2MB Size
Letter pubs.acs.org/JCTC

Cite This: J. Chem. Theory Comput. XXXX, XXX, XXX−XXX

On Obtaining Boltzmann-Distributed Configurational Ensembles from Expanded Ensemble Simulations with Fast State Mixing Sebastian Wingbermühle* and Lars V. Schäfer* Theoretical Chemistry, Ruhr University Bochum, D-44780 Bochum, Germany

Downloaded via UNIV OF MELBOURNE on April 23, 2019 at 03:14:12 (UTC). See https://pubs.acs.org/sharingguidelines for options on how to legitimately share published articles.

S Supporting Information *

ABSTRACT: In Expanded Ensemble (EXE) or Simulated Tempering simulations, the system’s (effective) temperature is frequently updated to enhance configurational sampling. We investigated how short the EXE state update interval τ can become before too frequent updates impede Boltzmann sampling. Simulating alanine dipeptide in explicit water, we show that a hybrid MC/MD integrator reliably yields Boltzmann-distributed configurations regardless of τ. However, in MD-driven EXE simulations with short τ, configurational ensembles depend on the thermostat settings.

I

which the state is sampled. To ensure equal sampling of all states,7 all gi are chosen to be the relative free energy of their state in this study. With this setup, the EXE simulation walker performs a random walk in state space, during which it successively samples configurations in all states and frequently returns to the target state in between.8 The random walk in state space is performed in an efficient way by the Metropolized-Gibbs sampler9 that suggests a new state with probability

n molecular dynamics (MD) simulations, sampling the entire relevant configurational space of large and flexible molecules such as proteins is a time-consuming challenge. A large number of important minima on the energy surface can be separated by high barriers that, with the thermal energy available, are overcome only very rarely on the limited simulation time scale. To enhance configurational sampling, it is helpful to introduce additional states in which barrier crossing is facilitated1−either by elevating the temperature to increase the kinetic energy of the system or by scaling the potential energy to lower the energy barrier.2,3 As downscaling the potential energy changes the Boltzmann weight of a configuration in the same way as elevating the temperature, potential energy scaling corresponds to increasing the effective temperature of the system. For most thermodynamic properties, however, one is only interested in structures sampled in the unmodified target state (usually the lowest (effective) temperature); states at higher (effective) temperatures should only be sampled intermittently to speed up barrier crossing. This is the idea underlying Expanded Ensemble (EXE)4 or Simulated Tempering.5 After a (usually) fixed number of configurational updates in state i sampled from the conditional distribution6 π (x | i ) =

(1b)

Here, ui(x) denotes the reduced potential energy (i.e., in units of β−1 = kBT) of the configuration x in state i, and gi represents the biasing weight of state i that determines the frequency at © XXXX American Chemical Society

(2b)

where τ denotes the time interval between two subsequent EXE state updates, and λ2 is the second largest eigenvalue of the empirical state transition matrix. As λ2 is largely unaffected by the choice for τ, the time interval τ is crucial for the sampling efficiency of EXE simulations, and it would be desirable to choose it as short as possible.10−12 The lower bound for τ is given by the requirement that configurational sampling must not be distorted by the history of state space

(1a)

exp[gi − ui(x)] ∑i′ exp[gi′ − ui′(x)]

| l 1 − π (i|x) o o o 1, Paccept(j|x , i) = mino m } o o o o 1 ( ) − | j x π n ~

For the Markovian walk in state space, a lower bound for the correlation time τ2 of the EXE simulation can be estimated as6 τ τ2 = 1 − |λ 2 | (3)

a new state for the resulting configuration x is drawn from the conditional distribution π (i|x) =

(2a)

The new state is accepted or rejected with probability

exp[−ui(x)]

∫Ω dx exp[−ui(x)]

l π (j| x ) o o o j≠i o π (i|x) − 1 α (j| x , i ) = o m o o o o o j=i 0 n

Received: January 31, 2019 Published: April 17, 2019 A

DOI: 10.1021/acs.jctc.9b00100 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX

Letter

Journal of Chemical Theory and Computation

Figure 1. Representative Ramachandran ϕ, ψ maps (free energy contour plots) of alanine dipeptide from EXE simulations with the hybrid MC/ MD sampler and the state update interval set to τ = 0.2 ps (a-c) are compared to the map sampled in the 10 μs standard MD simulation (d). The Ramachandran maps obtained from the EXE simulations show the dihedral distribution sampled in a) the lowest (unscaled) state of one single EXE run, b) the lowest state of all five EXE runs, or c) the higher (scaled) states of all five EXE runs (i.e., excluding the lowest state). The dihedral distribution shown in c) is therefore not Boltzmann-weighted. The black boxes indicate the definitions used to assign configurations to the four different secondary structure elements.

Boltzmann weights but also frequently randomizes the velocities. To compare EXE simulations driven by the hybrid MC/MD sampler to EXE runs conducted with a velocity Verlet MD integrator, we investigate the configurational ensembles of alanine dipeptide in water in terms of the relative populations of different secondary structures, i.e., different regions in the Ramachandran (ϕ, ψ backbone dihedral angle) map. All simulations were carried out with the MD program package GROMACS,15 which was extended with the implementation of the hybrid MC/MD sampler described in this work. For the MD integrator, the influence of the thermostat settings on the populations of the different secondary structure elements was examined.

sampling. Irrespective of the previous state and the resulting initial configuration, the integrator−assisted by a thermostat in the case of MD−has to sample the final configuration in state i with probability π(x|i) after the lag-time τ.10,13 In this work, pure MD integration (velocity Verlet) is compared in terms of relaxation efficiency to a hybrid MC/MD scheme in which the configuration proposed by MD is evaluated by an additional Metropolis Monte Carlo (MC) step. The hybrid MC/MD sampling scheme employed here was originally proposed by Mehlig et al.14 and includes three steps: after random initial velocities have been drawn from a Maxwell−Boltzmann distribution, the configuration xold is propagated by MD integration in the NVE ensemble. The resulting configuration xnew is accepted or rejected based on the probability Paccept (xnew|xold) = min{1, exp[−βδH(x , p)]}

1. METHODS 1.1. Potential Energy Scaling in EXE Simulations. In three states, corresponding to the effective temperatures 300 K, 400 K, and 600 K, the internal potential energy of alanine dipeptide was scaled2,16 by factors of 1 (i.e., no scaling), 0.75, and 0.5, respectively. The interaction potential between alanine dipeptide and the water molecules and ions was scaled by factors 1, 0.75 , and 0.5 , respectively, whereas the potential energy of water and ions was not scaled. Moves in state space

(4)

where β is the inverse temperature, and δH(x, p) denotes the discretization error in the total energy, i.e., the energy difference between the beginning and the end of a short NVE simulation. This acceptance criterion ensures detailed balance.14 To sum up, this hybrid MC/MD scheme not only ensures that configurations enter the ensemble with their B

DOI: 10.1021/acs.jctc.9b00100 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX

Letter

Journal of Chemical Theory and Computation

Figure 2. Relative populations of four regions in the Ramachandran ϕ, ψ map of alanine dipeptide, corresponding to the indicated four secondary structure elements, as sampled in EXE simulations. The time interval τ between two subsequent EXE state updates was varied for four integrator and thermostat settings: a) hybrid MC/MD sampler, b) velocity Verlet and Andersen-massive thermostat, c) velocity Verlet and velocity-rescaling thermostat with a stochastic term, and d) velocity Verlet and Nosé−Hoover chain (with ten thermostats). Reference values (solid lines) were obtained from a 10 μs standard MD simulation in which the velocity Verlet MD integrator and one Nosé−Hoover chain with ten thermostats were employed. The statistical errors of the reference values (shaded areas) were calculated from the difference between the results obtained for the first 5 μs and the last 5 μs.

were mediated by Metropolized-Gibbs sampling among all effective temperatures. 1.2. Simulation Sets. In the EXE simulations conducted here, different time intervals τ between subsequent EXE state updates were tested. For each time interval, 20 sets of three 50 ns EXE simulations were run. In five sets, the hybrid MC/MD sampler was employed. In 15 sets, pure velocity Verlet integration was used together with different thermostats (5x Andersen-massive,17 5x velocity-rescaling with a stochastic term,18 5x Nosé−Hoover chain19,20). In each set, one EXE run was launched at each effective temperature. Configurations that were sampled in the same set and belonged to the lowest effective temperature were collected before the relative populations of different regions in the Ramachandran map, corresponding to the secondary structure elements polyproline (PP)-helix, β-sheet, and αr- and αl-helix were analyzed. Results were averaged among the five sets with identical simulation parameters, and statistical error estimates were obtained from the corresponding standard deviations. The time intervals τ between EXE state updates were 5 ps (2 ps), 2 ps (1 ps), 1 ps (1 ps), 0.6 ps (0.2 ps), 0.4 ps (0.2 ps), and 0.2 ps (0.2 ps).

Values in brackets indicate the time interval between two subsequent Metropolis steps of the MC/MD sampler or the time constant τT of the Andersen-massive thermostat in the MD-driven EXE runs. The EXE results are compared to reference values from a 10 μs standard MD simulation that was driven by the velocity Verlet MD integrator and thermostated by one Nosé−Hoover chain (including ten thermostats).

2. RESULTS AND DISCUSSION Before analyzing the relative populations of the different secondary structure elements, four selected Ramachandran ϕ, ψ maps are discussed to illustrate how the EXE algorithm samples the dihedral space of alanine dipeptide (Figure 1). In contrast to the 10 μs standard MD simulation (Figure 1d), configurational sampling in the lowest (unscaled) state of an EXE run is focused on the minima of the free energy surface, i.e., on the four selected regions of the ϕ, ψ map (Figure 1a). The transition regions are sampled very rarely, except for the transitional configurations in the vicinity of the αr-helix. This sampling pattern of the EXE algorithm is confirmed by the pooled distribution of the dihedral angles sampled in the C

DOI: 10.1021/acs.jctc.9b00100 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX

Letter

Journal of Chemical Theory and Computation lowest state of all five EXE simulations (Figure 1b). Due to the potential energy scaling applied, barriers are crossed preferentially in the higher (scaled) states so that the transition regions of the Ramachandran map are mainly populated by simulation walkers in these excited states (Figure 1c). By design, the lowest state of the EXE simulations performed here thus rapidly yields locally equilibrated configurational distributions such that the free energy minima are sampled according to their Boltzmann weight. The relative populations of the four selected regions of the ϕ, ψ map of alanine dipeptide obtained from the EXE simulations are compared to the reference values from a converged 10 μs standard MD simulation in Figure 2 (data listed in Table S1). Interestingly, in the EXE simulations driven by a velocity Verlet MD integrator combined with a chain of ten Nosé−Hoover thermostats, the populations of the PP- and αr-helix systematically deviate from the reference values at short time intervals τ between subsequent EXE state updates (Figure 2d). To overcome this problem, we employed a hybrid MC/MD integrator that rigorously ensures sampling the desired thermodynamic ensemble by the Metropolis step (Figure 2a). In the following, we will first discuss the EXE simulations driven by the hybrid MC/MD sampler and then return to purely MD-driven schemes with different thermostat settings that require increasingly more time to adjust the kinetic energy and thus lead to longer velocity correlation times. Our systematic analysis shows that the thermostat’s time constant τT has to be chosen with care to ensure the desired Boltzmann sampling. We also suggest a mechanistic explanation as to why large values of τT lead to non-Boltzmann sampling in the EXE framework. At the lowest effective temperature, the EXE runs with the hybrid MC/MD integrator yield the reference populations within one standard deviation for all values of τ investigated (Figure 2a). Thus, the hybrid MC/MD scheme reliably and robustly samples the desired configurational distribution regardless of the time interval τ between subsequent EXE state updates. Next, we investigated whether the desired Boltzmann distribution will still be sampled if the additional Metropolis acceptance step included in the hybrid MC/MD scheme is omitted and only the velocities are randomized frequently. To that end, we combined the velocity Verlet MD integrator with the Andersen-massive thermostat, which randomly assigns new velocities to all particles every τT. These EXE runs also yield the reference populations for all values of τ (Figure 2b), showing that randomizing only the velocities suffices to ensure Boltzmann sampling in EXE simulations. In MD integration, the current velocities depend on accelerations at previous time steps and thus contain information about the system’s history. By randomizing the velocities, the above two schemes therefore frequently clear the system’s history. Consequently, the time constant of the Metropolis step or the Andersen-massive thermostat is an upper bound for the time span during which the dynamics can be influenced by the history of the trajectory. Thus, as long as the Metropolis step or the Andersen-massive thermostat is applied at least as frequently as the EXE state update, the configurational distribution sampled in the current state does not depend on the state that has been sampled before. Velocity-rescaling with a stochastic term and the Nosé− Hoover thermostat, in contrast, do not adjust the system’s kinetic energy instantaneously but relax it to a target value over

several coupling steps. The speed with which these thermostats add or withdraw kinetic energy is controlled by τT; i.e., for greater values of τT, the kinetic energy is adjusted more slowly and can convey information from more previous sampling points in the trajectory. In Figure 2c, velocity Verlet MD integration with the velocity-rescaling thermostat with τT = 0.1 ps is used. The populations of all secondary structure elements are in good agreement with the reference populations regardless of the time interval τ between two subsequent EXE state updates. However, with a thermostat time constant of τT = 0.5 ps, population density is transferred from the PPhelix to the αr-helix at τ ≤ 0.6 ps (Figure S1). A similar systematic shift of population density is observed for the EXE runs in which the velocity Verlet MD integrator was combined with a Nosé−Hoover chain with τT = 0.5 ps (Figure 2d). Only if τ ≥ 2 ps, the desired configurational distribution of the reference simulation is sampled. As τ becomes shorter, the relative population of the PP-helix systematically decreases while that of the αr-helix increases. These deviations from the reference values are small but statistically significant. According to the AMBER99SB* force field21,22 used here, the β-sheet and PP-helix have the lowest potential energy (with nearly identical potential energy distributions), followed by the αr-helix, and the αl-helix has the highest potential energy (Figure S2a). If only the internal potential energy of alanine dipeptide is considered and the interactions with the water molecules are omitted (Figure S2b), the potential energy distribution obtained for the β-sheet is shifted clearly toward lower energies compared to the PP-helix. In the EXE runs with τ < 2 ps, population density is thus (artificially) transferred between two energetically neighboring regions, from configurations with lower potential energies to those with higher potential energies. In addition, the distribution of potential energies is monotonically shifted toward higher potential energies as the effective temperature of the state increases (Figure S3). As the potential energy of the configurations in all states was calculated using the potential energy function corresponding to the lowest effective temperature, this shift is not an artifact of the potential energy scaling applied. Instead, states with higher effective temperatures sample more energetically unfavorable configurations. The configuration that returns from a state with higher effective temperature to the state with the lowest effective temperature is thus likely to have a high potential energy at the lowest effective temperature, and the MD integrator and thermostat need to relax it to lower potential energies before the next state update. Consequently, the population shifts observed in the EXE runs indicate that the velocity Verlet integrator together with Nosé−Hoover or velocity-rescaling thermostats with a time constant τT = 0.5 ps cannot relax the incoming configuration fast enough for very frequent EXE state updates (small values of τ). Hence, the configurational distribution sampled in the new state still deviates from the Boltzmann distribution when the state is updated again. Taken together, the data show that, to obtain the desired Boltzmann-distributed configurational ensembles in EXE simulations, the probability for a configuration to be sampled must not depend on the configuration sampled one time interval τ earlier. If the temperature itself is updated during the simulation, this can be achieved by rescaling the velocities immediately after changing to a new temperature.23 In the EXE simulations performed here, the potential energy is scaled, and D

DOI: 10.1021/acs.jctc.9b00100 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX

Letter

Journal of Chemical Theory and Computation ORCID

the temperature is kept constant such that the velocities cannot be rescaled instantaneously after changing the state. Instead, Boltzmann-distributed configurational ensembles can be achieved by the MD integrator together with a thermostat that adjusts the kinetic energy rapidly enough to the new potential energy surface and thus removes information about the sampling history that is contained in the velocities. The combination of the EXE algorithm with the potential energy scaling applied here is expected to be particularly useful to enhance the sampling of larger systems like proteins in explicit water because it allows to use a very small number of states to span a given range of the effective temperature. Given appropriate biasing weights, the EXE algorithm has been shown to be more efficient than replica exchange if the states are spaced widely.12 The potential energy scaling applied here greatly reduces the number of states by heating only a selected part of the system, e.g. only the protein and not the surrounding water molecules, and has been shown to be particularly efficient at sampling large-scale conformational transitions of proteins.3 Alanine dipeptide was used as a model system to identify suitable integrator and thermostat settings: the hybrid MC/MD sampler and the velocity Verlet integrator combined with the Andersen-massive thermostat. With these two settings, Boltzmann-distributed configurational ensembles are robustly sampled, regardless of the EXE state update interval. In addition, appropriate values of the simulation parameters, e.g. of the time constant τT of the Andersenmassive thermostat, are easily identified.

Lars V. Schäfer: 0000-0002-8498-3061 Funding

This work was supported by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy − EXC-2033 − project number 390677874 and an Emmy Noether grant to L.V.S. (SCHA 1574/3-1). S.W. thanks the Fonds der Chemischen Industrie (FCI) for a Chemiefonds stipend. Notes

The authors declare no competing financial interest.



ACKNOWLEDGMENTS We thank Michael R. Shirts and Pascal Merz for insightful discussions.



3. CONCLUSIONS To maximize the sampling efficiency of Expanded Ensemble or Simulated Tempering simulations, it is desirable that the time interval τ between two EXE state updates is as short as possible. In EXE simulations driven by MD integration, too small values of τ can lead to distorted configurational ensembles. Thus, to ensure that a properly Boltzmannweighted configurational ensemble is sampled, the MD thermostat settings, especially the thermostat’s time constant, have to be chosen with care. If a Nosé−Hoover chain or velocity-rescaling with a stochastic term is applied as the thermostat of an unknown system, the MD thermostat settings might need to be validated for a particular value of τ beforehand, which is tedious, especially for large systems. The hybrid MC/MD sampler as well as the velocity Verlet integrator combined with the Andersen-massive thermostat reliably yield the correct configurational distribution as long as the velocities are randomized at least as frequently as the EXE state is updated.



ASSOCIATED CONTENT

S Supporting Information *

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.jctc.9b00100.



REFERENCES

(1) Pan, A. C.; Weinreich, T. M.; Piana, S.; Shaw, D. E. Demonstrating an Order-of-Magnitude Sampling Enhancement in Molecular Dynamics Simulations of Complex Protein Systems. J. Chem. Theory Comput. 2016, 12, 1360−1367. (2) Liu, P.; Kim, B.; Friesner, R. A.; Berne, B. J. Replica exchange with solute tempering: A method for sampling biological systems in explicit water. Proc. Natl. Acad. Sci. U. S. A. 2005, 102, 13749−13754. (3) Wang, L.; Friesner, R. A.; Berne, B. J. Replica Exchange with Solute Scaling: A More Efficient Version of Replica Exchange with Solute Tempering (REST2). J. Phys. Chem. B 2011, 115, 9431−9438. (4) Lyubartsev, A. P.; Martsinovski, A. A.; Shevkunov, S. V.; Vorontsov-Velyaminov, P. N. New approach to Monte Carlo calculation of the free energy: Method of expanded ensembles. J. Chem. Phys. 1992, 96, 1776−1783. (5) Marinari, E.; Parisi, G. Simulated Tempering: A New Monte Carlo Scheme. Europhys. Lett. 1992, 19, 451−458. (6) Chodera, J. D.; Shirts, M. R. Replica exchange and expanded ensemble simulations as Gibbs sampling: Simple improvements for enhanced mixing. J. Chem. Phys. 2011, 135, 194110. (7) Martínez-Veracoechea, F. J.; Escobedo, F. A. Variance Minimization of Free Energy Estimates from Optimized Expanded Ensembles. J. Phys. Chem. B 2008, 112, 8120−8128. (8) Park, S. Comparison of the serial and parallel algorithms of generalized ensemble simulations: An analytical approach. Phys. Rev. E 2008, 77, 016709. (9) Liu, J. S. Peskun’s theorem and a modified discrete-state Gibbs sampler. Biometrika 1996, 83, 681−682. (10) Rosta, E.; Hummer, G. Error and efficiency of simulated tempering simulations. J. Chem. Phys. 2010, 132, 034102. (11) Rauscher, S.; Neale, C.; Pomès, R. Simulated Tempering Distributed Replica Sampling, Virtual Replica Exchange, and Other Generalized-Ensemble Methods for Conformational Sampling. J. Chem. Theory Comput. 2009, 5, 2640−2662. (12) Zhang, C.; Ma, J. Comparison of sampling efficiency between simulated tempering and replica exchange. J. Chem. Phys. 2008, 129, 134112. (13) Rosta, E.; Buchete, N.-V.; Hummer, G. Thermostat Artifacts in Replica Exchange Molecular Dynamics Simulations. J. Chem. Theory Comput. 2009, 5, 1393−1399. (14) Mehlig, B.; Heermann, D. W.; Forrest, B. M. Hybrid Monte Carlo method for condensed-matter systems. Phys. Rev. B: Condens. Matter Mater. Phys. 1992, 45, 679−685. (15) Abraham, M. J.; Murtola, T.; Schulz, R.; Páll, S.; Smith, J. C.; Hess, B.; Lindahl, E. GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers. SoftwareX 2015, 1−2, 19−25. (16) Terakawa, T.; Kameda, T.; Takada, S. On Easy Implementation of a Variant of the Replica Exchange with Solute Tempering in GROMACS. J. Comput. Chem. 2011, 32, 1228−1234.

Details on potential energy scaling applied and simulation parameters as well as additional data on Ramachandran ϕ, ψ maps (PDF)

AUTHOR INFORMATION

Corresponding Authors

*E-mail: [email protected]. *E-mail: [email protected]. E

DOI: 10.1021/acs.jctc.9b00100 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX

Letter

Journal of Chemical Theory and Computation (17) Andersen, H. C. Molecular dynamics simulations at constant pressure and/or temperature. J. Chem. Phys. 1980, 72, 2384−2393. (18) Bussi, G.; Donadio, D.; Parrinello, M. Canonical sampling through velocity rescaling. J. Chem. Phys. 2007, 126, 014101. (19) Nosé, S. A molecular dynamics method for simulations in the canonical ensemble. Mol. Phys. 1984, 52, 255−268. (20) Hoover, W. G. Canonical dynamics: Equilibrium phase-space distributions. Phys. Rev. A: At., Mol., Opt. Phys. 1985, 31, 1695−1697. (21) Hornak, V.; Abel, R.; Okur, A.; Strockbine, B.; Roitberg, A.; Simmerling, C. Comparison of Multiple Amber Force Fields and Development of Improved Protein Backbone Parameters. Proteins: Struct., Funct., Genet. 2006, 65, 712−725. (22) Best, R. B.; Hummer, G. Optimized Molecular Dynamics Force Fields Applied to the Helix-Coil Transition of Polypeptides. J. Phys. Chem. B 2009, 113, 9004−9015. (23) Sindhikara, D. J.; Emerson, D. J.; Roitberg, A. E. Exchange Often and Properly in Replica Exchange Molecular Dynamics. J. Chem. Theory Comput. 2010, 6, 2804−2808.

F

DOI: 10.1021/acs.jctc.9b00100 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX