Free Energy Reconstruction from Logarithmic Mean-Force Dynamics

Jun 12, 2017 - (2, 3) The free energy (or potential of mean force) as a function of X is ..... lth replica and (21)which is nothing but the Jarzynski ...
0 downloads 0 Views 3MB Size
Article pubs.acs.org/JCTC

Free Energy Reconstruction from Logarithmic Mean-Force Dynamics Using Multiple Nonequilibrium Trajectories Tetsuya Morishita,*,†,‡ Yasushige Yonezawa,§ and Atsushi M. Ito∥,⊥ †

Research Center for Computational Design of Advanced Functional Materials (CD-FMat), National Institute of Advanced Industrial Science and Technology (AIST), 1-1-1 Umezono, Tsukuba 305-8568, Japan ‡ Mathematics for Advanced Materials Open Innovation Laboratory (MathAM-OIL), National Institute of Advanced Industrial Science and Technology (AIST), c/o AIMR, Tohoku University, 2-1-1 Katahira, Aoba-ku, Sendai 980-8577, Japan § High Pressure Protein Research Center, Institute of Advanced Technology, Kindai University, 930 Nishimitani, Kinokawa, Wakayama 649-6493, Japan ∥ National Institute of Fusion Science, 322-6 Oroshi-cho, Toki 509-5292, Japan ⊥ The Graduate University for Advanced Studies, 322-6 Oroshi-cho, Toki 509-5292, Japan S Supporting Information *

ABSTRACT: Efficient and reliable estimation of the mean force (MF), the derivatives of the free energy with respect to a set of collective variables (CVs), has been a challenging problem because free energy differences are often computed by integrating the MF. Among various methods for computing free energy differences, logarithmic mean-force dynamics (LogMFD) [Morishita et al., Phys. Rev. E 2012, 85, 066702] invokes the conservation law in classical mechanics to integrate the MF, which allows us to estimate the free energy profile along the CVs on-the-fly. Here, we present a method called parallel dynamics, which improves the estimation of the MF by employing multiple replicas of the system and is straightforwardly incorporated in LogMFD or a related method. In the parallel dynamics, the MF is evaluated by a nonequilibrium pathensemble using the multiple replicas based on the Crooks−Jarzynski nonequilibrium work relation. Thanks to the Crooks relation, realizing full-equilibrium states is no longer mandatory for estimating the MF. Additionally, sampling in the hidden subspace orthogonal to the CV space is highly improved with appropriate weights for each metastable state (if any), which is hardly achievable by typical free energy computational methods. We illustrate how to implement parallel dynamics by combining it with LogMFD, which we call logarithmic parallel dynamics (LogPD). Biosystems of alanine dipeptide and adenylate kinase in explicit water are employed as benchmark systems to which LogPD is applied to demonstrate the effect of multiple replicas on the accuracy and efficiency in estimating the free energy profiles using parallel dynamics.

1. INTRODUCTION Reliable reconstruction of free energy profiles by overcoming the issue of sampling rare events has been one of the central subjects in computational chemistry and biochemistry. Over the past decades, a variety of computational methods to tackle this problem have been proposed.1−3 Thermodynamic integration (TI)4 and free energy perturbation (FEP)5 methods are “traditional” techniques to compute free energy differences, which have been widely used and applied to various chemical and physical problems. These methods are, however, no longer suitable for dealing with recently treated complex biosystems because enormous computations are often required to estimate their free energy profiles by TI or FEP.1,6−8 Such complex systems have become our targets only recently thanks to the rapid development of high-performance parallel computers. Developing methods that estimate free energy more efficiently than “traditional” methods is, therefore, an urgent issue, © 2017 American Chemical Society

considering that the development of parallel computers has been continuously promoted. When a chemical or physical system we are interested in is well-described by a set of N collective variables (CVs), X̂ (q) = {X̂ 1, X̂ 2,...,X̂ N}, where q indicates the positions of atoms in the system, the free energy profile with respect to X̂ will provide key information for understanding the system. Therefore, methods for evaluating the free energy at a particular CV value (X̂ (q) = X) are in high demand and have been extensively developed.2,3 The free energy (or potential of mean force) as a function of X is defined as F(X) = −kBT log Z(X)

(1)

where Received: March 10, 2017 Published: June 12, 2017 3106

DOI: 10.1021/acs.jctc.7b00252 J. Chem. Theory Comput. 2017, 13, 3106−3119

Article

Journal of Chemical Theory and Computation N

Z(X) =

∫ ∏ δ(X̂i(q) − Xi)exp(−β Φ) dq i=1

The advantages of LogMFD mentioned above are, however, sometimes degraded by a heavy computational load for evaluating the MF under the near perfect adiabatic conditions. Obtaining an accurate MF, albeit dependent on the system, generally requires a long q-trajectory (i.e., many q-configurations) that samples the canonical distribution in phase space. LogMFD simulations for complex systems, therefore, would sometimes be time-consuming calculations. Furthermore, a qtrajectory that samples all (meta)stable states (conformations) with appropriate weights is crucial for reliable reconstruction of F(X), especially when these metastable states are in the hidden subspace orthogonal to the X-subspace. None of the MFDbased methods (including MTD, TAMD, and LogMFD) are, however, able to handle this situation without difficulty; the chosen CVs are, in this case, often considered to be inappropriate for free energy reconstruction.1 In this work, we develop a new technique called “parallel dynamics” to remedy these problems, which can be straightforwardly incorporated in LogMFD. We introduce multiple replicas of the physical system and run canonical molecular dynamics (MD) calculations in parallel each for each replicated system to generate many of the q-configurations, which are then collected to estimate the MF. Although parallel computing (multiple cores or CPUs) is necessary, a long single q-trajectory needs not be generated, indicating that the elapsed computing time will be considerably reduced (note that multiple replicas are introduced in the q-space but not in the X-subspace, which is in stark contrast to the “multiple walker” scheme in MTD28). The notable point is the use of the nonequilibrium work relation being the key to the following features of the parallel dynamics: (1) the condition of the high adiabaticity can be relaxed by invoking a nonequilibrium pathensemble and (2) metastable states in the hidden subspace orthogonal to the X-subspace are taken into account with appropriate weights. To our knowledge, the proposed method here is the first attempt to realize these two features in the framework of MFD. The concept of parallel dynamics is quite general and can be incorporated, in principle, in MTD and TAMD as well. This paper is organized as follows: In section 2, we review the theory of LogMFD, which will be of help to our understanding of the parallel dynamics. The basic idea of the parallel dynamics is presented in section 3, and the combination of the parallel dynamics and LogMFD, called logarithmic parallel dynamics (LogPD), is illustrated through its application to the alanine dipeptide and adenylate kinase systems in section 4. Finally, concluding remarks are drawn in section 5. Some technical details on the implementation of LogMFD/PD are given in the Appendices and the Supporting Information (SI).

(2)

where Φ is the potential energy function, kB is Boltzmann constant, T is the temperature of the physical system, and β = 1/kBT. Although finding appropriate CVs itself is, in some cases, a challenging task, F(X) gives significant information such as the free energy barrier (leading to an estimate of the transition rate, etc.) once such CVs are established. In this paper, although various techniques have been introduced to estimate F(X), particular focus will be given to the methods based on mean-force dynamics (MFD).9,10 The basis for MFD was established through the introduction of adiabatic free energy dynamics (AFED or d-AFED) by Tuckerman et al.,9,11,12 which incorporates the idea of adiabatic decoupling between the two dynamical variables in the Car− Parrinello dynamics13 (a similar concept is invoked in “coarse molecular dynamics”14,15). In MFD, X is treated as fictitious dynamical variables to enhance the sampling in the X-subspace, and their dynamics are driven by the mean force (MF), −∂F(X)/∂X, which is treated as (a part of) a mechanical force acting on X. Accordingly, most of the MFD-based simulations are governed by a fictitious extended-Lagrangian in which the dynamical variables X and the other variables q are assumed to be adiabatically decoupled. 3,9−11,16−18 Metadynamics (MTD)16,17 and temperature-accelerated molecular dynamics (TAMD)2,3,18 are representatives of the methods based on the extended Lagrangian that have been used for free energy reconstruction for various systems from biomolecules to bulk molecular crystals.19−22 The equations of motion (EOM) for X and q in MTD and TAMD (equivalent to d-AFED) are integrated simultaneously with the same time step δt under the condition that the dynamics of X are much slower than those of q to maintain the adiabatic decoupling between them. Practically, this condition needs not be perfectly met, but instead, a long X-trajectory that repeatedly visits the same X point (or its vicinity) should be generated for accurately reconstructing F(X). This, however, often makes it difficult to reconstruct multidimensional free energy surfaces. Although based on a similar formulation, logarithmic meanforce dynamics (LogMFD),10 which has been recently proposed as an MFD-based method, is not in the framework of the extended Lagrangian, but instead, it deals with X and q separately with formally independent EOM under the assumption that their adiabatic decoupling is almost perfect. (Note that EOM for X and q from the extended Lagrangian can also be formally written as independent EOM in the limit of the adiabatic separation.) Thanks to this non-Lagrangian formulation, the mechanical force acting on X can be more flexibly designed than in the extended Lagrangian formulation; thus, more efficient sampling in the X-subspace is possible. Furthermore, the near perfect adiabatic separation (within numerical calculations) allows us to reconstruct F(X) on-the-fly along a single sweep of X. That is, a long X-trajectory that repeatedly visits the same point is not necessary, which may be of help to multidimensional free energy calculations. LogMFD has been applied to a biomolecule10,23,24 [described by CHARM22 force field25 and density functional theory (DFT) with the GGA functional26] and a two-dimensional silicon material,27 demonstrating its efficiency and applicability. In the latter case, an unexpected structural change was discovered in the course of the enhanced sampling of the atomic configuration by LogMFD.

2. LOGARITHMIC MEAN-FORCE DYNAMICS (LOGMFD) We consider reconstructing F(X) [eq 1] for a physical system of Nq particles, which is described by the Hamiltonian Nq

HMD(Γ) =

∑ j=1

p2j 2mj

+ Φ(q) (3)

where the position, momentum, and mass of the jth particle are given as qj, pj, and mj [Γ = (q, p)], respectively. Φ is the potential energy of q, whereas the free energy surface F(X) [or a functional of F(X)] is postulated as the potential energy of X 3107

DOI: 10.1021/acs.jctc.7b00252 J. Chem. Theory Comput. 2017, 13, 3106−3119

Article

Journal of Chemical Theory and Computation in MFD-based simulations. This means that the MF [slope of F(X) with respect to X] acts as a mechanical force on the variable X in MFD-based simulations. The blue moon approach,29 for example, can be employed to evaluate the MF, which however needs explicit constraints on X̂ (q). In most of the methods based on MFD, instead, the harmonic restraining potential is used,3,12 which comes from the approximation F(X) ≈ − kBT log





N





i=1

∫ exp⎢⎢−β⎜⎜HMD + ∑

mjqj̈ = −

K →∞

⎞⎤ Ki (X̂i (q) − Xi)2 ⎟⎟⎥ dΓ 2 ⎠⎥⎦ (4)

⎤ ⎡ βK βK exp⎢ − (X̂ (q) − X )2 ⎥ ⎦ ⎣ 2 2π

Thanks to this approximation, the MF is calculated as 1 ∂F = Z ∂Xi





N





i=1

∫ Ki(X̂i(q) − Xi)exp⎢⎢−β⎜⎜HMD + ∑

⎞⎤ Ki (X̂i (q) − Xi)2 ⎟⎟⎥ dΓ 2 ⎠⎥⎦

= ⟨K i(X̂i (q) − Xi)⟩X

(6)

where ⟨···⟩X indicates a canonical average under the condition that X is fixed at a certain value. The simplest dynamical equation of motion (EOM) for Xi is thus introduced as MiẌi = K i⟨X̂i (q) − Xi⟩X

(7)

where Mi is the fictitious mass for Xi. Eq 7 is the basis for MFD-based calculations. This simplest form, however, may not be useful unless the kinetic energy of X is kept sufficiently high because F(X) could have high energy barriers along X, which may limit the range to be sampled by the dynamical variables X. Several attempts to modify the potential energy function for X have thus been made to effectively reduce the energy barriers on F(X). The generalized potential for X, Ψ[F(X)], which is a functional of F(X), is now introduced in the EOM for X as, M i Ẍ i = −

∂Ψ[F(X)] ∂Xi

(8)

If Ψ is flatter than F(X), more efficient sampling in the Xsubspace is expected by the fictitious dynamics of X. This is exactly what is achieved in LogMFD.10,23 The following logarithmic form is employed as Ψ in LogMFD Ψlog = γ log[αF(X) + 1]



⎞ ∂X̂i (q) ⎟ ⎟ + thermostat ⎝ ∂qj ⎠

i=1

(11)

(5)



N

∑ K i(X̂i(q) − Xi)⎜⎜

is therefore also introduced to estimate the MF. We do not specify the details of the thermostat here, but any type of thermostat, e.g., Nosé−Hoover thermostat, 30,31 can be employed as long as the resultant q-trajectory samples the canonical distribution. The potential energy surface for X, Ψlog, can be much flatter than the original surface of F(X) by appropriately choosing α and γ. Efficient barrier crossing and efficient sampling in the Xsubspace by the X-trajectories from eq 10 are thus highly expected. In many cases, γ can be taken as 1/α for simplicity [eq 10] (we hereafter assume γ = 1/α). It is shown in ref 10 that Ψlog becomes flatter as α increases with γ = 1/α. Note that Ψ in TAMD and MTD can be regarded as F(X) and F(X) + Vg(X), respectively, where Vg(X) is the sum of small Gaussian hills. Because the MF on X is obtained by canonically averaging Ki(X̂ i(q) − Xi) in phase space (q-space), the adiabatic decoupling between X and q is inevitable.3,9,12 To meet this condition, one can perform an Nm-step canonical MD run every time X is updated [this algorithm corresponds to “Double loop” in Figure S1(b); the details of the LogMFD algorithm are discussed in section S1]. With Nm → ∞, the adiabatic separation becomes perfect; thus, large Nm is generally desirable, which however increases the computational cost for each MFD step [each step to numerically update X by eq 10]. In the meantime, because the MF becomes more accurate with larger Nm, the interval between the X points, ΔX, can be taken to be relatively large as in typical TI calculations, which decreases the number of MFD steps to yield the X-trajectory for F(X). Thus, the overall computational cost may not significantly increase even with large Nm. We compare, in section 4.3, the F(X) for alanine dipeptide obtained using small or large Nm, both of which require similar computational costs as a whole. We will demonstrate that they exhibit essentially the same profile. When a small Nm is employed, the displacement of X at each MFD step should be kept very small to maintain the adiabatic decoupling and to reduce deviation from a quasi-equilibrium state. Nm can be taken to be even one [this algorithm corresponds to “Single loop” in Figure S1(a)] as in TAMD and MTD, in which the same time step is used to numerically integrate the EOM for both X and q [eqs 10 and 11]. X should be, in this case, almost “immobile” in a typical time scale of the dynamical variables q. The mass ratio, mj/Mi, is therefore taken to be extremely small to meet this condition, which was originally introduced in the Car−Parrinello dynamics.13 (Note that some remarks on the parameters such as Mi and α are given in Appendix A and section S4.) Once the adiabatic separation is effectively achieved, the following Hamiltonian, which generates the EOM for X in eq 10, can be seen as a constant of motion

with large Ki because δ(X̂ (q) − X ) ≈ lim

∂Φ − ∂qj

(9)

where α and γ are positive parameters that can be tuned to effectively reduce the energy barriers on F(X). The EOM for X in LogMFD becomes ⎞ ⎛ ⎞⎛ ∂F(X) ⎞ ⎛ αγ αγ Mi Ẍ i = − ⎜ ⎟=⎜ ⎟K i⟨X̂i (q) − Xi⟩X ⎟⎜ ⎝ αF(X) + 1 ⎠⎝ ∂Xi ⎠ ⎝ αF(X) + 1 ⎠

N

Hlog =

(10)

∑ i=1

It is recognized that the net force on Xi is reduced as F(X) becomes larger, which results from the logarithmic form of Ψlog. The canonical average on the r.h.s. can be evaluated by running a short canonical MD calculation. The following EOM for qj

2 pXi

2Mi

+γ log[αF(X) + 1] (12)

where pXi is the momentum of Xi and is given as pXi = MiVi, where Vi is the velocity of Xi. This immediately indicates that F(X) can be obtained every time X is updated as 3108

DOI: 10.1021/acs.jctc.7b00252 J. Chem. Theory Comput. 2017, 13, 3106−3119

Article

Journal of Chemical Theory and Computation ⎧ ⎡ ⎛ 1 ⎪ ⎢1⎜ F(X) = ⎨exp ⎜Hlog − ⎢ α⎪ ⎩ ⎣γ ⎝

N

∑ i=1

⎫ 2 ⎞⎤ pXi ⎪ ⎟⎥ − 1⎬ ⎪ 2Mi ⎟⎠⎥⎦ ⎭

3. LOGARITHMIC PARALLEL DYNAMICS (LOGPD) 3.1. Parallel Dynamics. In LogMFD, X does not need to visit the same point repeatedly, and thus, only a single sweep of X is sufficient to reconstruct free energy profiles. This means that the accuracy of the MF at each MFD step is particularly crucial in reconstructing the free energy profiles using LogMFD. In this section, we introduce “parallel dynamics”, which enables us to obtain highly accurate MF in LogMFD. It is straightforward to increase Nm for improving the accuracy in estimating the MF. Because the error in the MF typically decreases only by Nm−1/2,36 however, it often results in a time-consuming calculation (see also the discussion in section 4.2). Instead of increasing Nm, here, we introduce multiple replicas of the MD system that run in parallel as schematically shown in Figure 1. As in the standard LogMFD, the CVs

(13)

Hlog in eq 13 is now treated as a parameter being set to always N

p2

satisfy Hlog ≥ ∑i = 1 2MXi (when no thermostat is attached to X) i

to ensure F(X) ≥ 0 throughout the LogMFD calculation. This condition, F(X) ≥ 0, is mandatory to yield a smooth potential surface of Ψlog for efficient sampling in the X-subspace. Eq 13 is the key to the on-the-fly reconstruction of F(X), which is based on the assumption that Hlog is constant throughout the LogMFD run. As long as a conserved quantity such as Hlog exists, therefore, the on-the-fly reconstruction of F(X) is feasible irrespective of the form of Ψ[F(X)]. It is easily recognized that TAMD, for example, also enables us to estimate F(X) on-the-fly, where Ψ is equal to F(X). One should keep in mind, however, that the MF at each MFD step should be very accurate for the on-the-fly reconstruction using any form of Ψ[F(X)]. Finally, we address the importance of thermostatting the variable X in LogMFD. Large displacement of X per MFD step (or high velocity of X) tends to break the adiabatic separation between X and q unless Nm is very large. Thermostatting is thus of help to obviate such a high velocity and to maintain the adiabaticity throughout the LogMFD run. Because LogMFD makes use of the conservation law to estimate F(X) on-the-fly [eqs 12 and 13], Nosé−Hoover-type thermostats can be employed. With a single Nosé−Hoover thermostat, Hlog is replaced with HNH as N

HNH =

∑ i=1

2 pXi

2Mi

+ γ log[αF(X) + 1] +

pη2 2Q

Figure 1. Schematic diagram of the protocol to update Xi in LogPD. The MF (indicated by the red rectangle) is evaluated using the MD trajectories from Nr replicas (indicated by the yellow rectangle) with the weight of Wl. Nx is the total number of MFD steps (see also for comparison the protocol in LogMFD illustrated in Figure S1).

+ NkBTXη (14)

where η and pη are the thermostat variable and its momentum, respectively, Q is the mass for η, and TX is the temperature of X, which can be practically set to be equal to T. F(X) is thus obtained10 by slightly modifying eq 13 as F(X) =

⎧ ⎡ ⎛ 1 ⎪ ⎢1⎜ ⎨exp HNH − α ⎪ ⎢⎣ γ ⎜⎝ ⎩

N

∑ i=1

2 pXi

2Mi



[X̂ (q)] in each replica is restrained to the same value of X(t). A canonical MD run with Nm steps is performed in each replica; then, the MF on Xi, ⟨Ki(X̂ i(q) − Xi(t))⟩X, is evaluated using the MD trajectories from all replicas (namely, N m × N r instantaneous values of X̂ i(q), where Nr is the number of replicas). Although the number of processing units (CPU or core) increases as Nr increases, the elapsed computing time is almost unaffected, ignoring the communication process. Thus, we consider this is an efficient way to accurately estimate the MF. We refer to this algorithm as parallel dynamics (PD). LogMFD incorporating this algorithm is thus called logarithmic parallel dynamics (LogPD). As can be easily seen, the basic idea of PD can be combined not only with LogMFD but also with other MFD-based methods such as TAMD and MTD. It should be remarked that PD is different from the scheme called “multiple walkers” (MW) in MTD.28 In the MW scheme, X̂ i(q) in each replica is restrained to a different value of X (in the case of the extended-Lagrangian MTD16), and the bias constructed in each replica is collected and combined to accelerate the sampling in the X-subspace. In contrast, X̂ i(q) in each replica in PD is restrained to the same value of X, i.e., the same restraint is imposed in each replica. A different qconfiguration is, however, sampled in each replica and collected and used to evaluate the MF on X. We thus see that PD is a kind of MW scheme in phase space (but not in the X-

⎫ ⎞⎤ ⎪ − NkBTXη⎟⎟⎥ − 1⎬ ⎥ 2Q ⎪ ⎠⎦ ⎭ (15) pη2

This ensures that F(X) can be obtained on-the-fly in Nosé− Hoover LogMFD. Multiple thermostat algorithms (such as Nosé− Hoover chain, 32 Nosé− Hoover network, 33 and GGMT34) can also be employed, but a single thermostat is generally sufficient because the X-trajectory needs not sample the canonical distribution, in contrast to TAMD.9,11,12 It is worth noting that, because of the inertia of η, the kinetic energy of X could fluctuate considerably and become very large, which is undesirable for maintaining the adiabaticity. The mass Q can therefore be reduced as long as the numerical integration is stably performed with the employed time step of Δt, preventing large fluctuations of the kinetic energy of X. In this line, isokinetic thermostatting (velocity scaling35) is also suitable as a method for controlling the kinetic energy of X. In fact, we have recently found that isokinetic LogMFD is feasible with on-thefly estimation of F(X), which will be reported elsewhere. 3109

DOI: 10.1021/acs.jctc.7b00252 J. Chem. Theory Comput. 2017, 13, 3106−3119

Article

Journal of Chemical Theory and Computation

number of necessary MFD steps for reconstructing F(X), thus resulting in efficient LogPD calculations. It should also be stressed that, with sufficiently large Nr, the MF can be correctly estimated even when more than one stable state exists in the hidden subspace orthogonal to the Xsubspace. This situation often leads to insufficient sampling in the hidden subspace, especially when these states are separated by a high energy barrier (recall that such insufficient sampling results in a hysteresis in the forward and backward integration of ∂F/∂X in TI calculations8). Eq 22 ensures that all the contributions from such hidden states are incorporated with appropriate weights in the limit of Nr → ∞.37,38 We thus consider that an accurate MF can be obtained from the parallel dynamics using eq 22, which will be demonstrated later (see also section S2). Note that this problem is also related with how to choose the CVs. In an X-subspace defined by inappropriately chosen CVs, we often suffer from insufficient sampling in the hidden subspace. The work wl in the weight Wl is practically calculated as

subspace). The point is that the replicated systems in the original MW are for enhancing the sampling in the X-subspace, whereas they are for improving the accuracy of the MF in PD. This means that even the original MW scheme can employ PD in which each walker in the X-subspace is accompanied by multiple walkers in phase space. The advantage of PD will be discussed in more detail in the next subsection. 3.2. Evaluation of the Mean Force. The MF in LogPD is calculated as −

Nr

∂F = ∂Xi

Nm

1 K i[X̂i (q nl) − Xi] Nm

∑ Wl ∑ l=1

n=1

(16)

where qnl indicates the q-configuration at time step n in the Nmstep MD run in the lth replica. If F(X) is dominated by a single free energy basin and the q-configurations therein are sufficiently sampled at each MFD step in all replicas, Wl can be simply taken as 1/Nr. This situation, however, cannot always be expected, because there often exist metastable states in the hidden subspace orthogonal to the X-subspace, and some replicas could be trapped in such a metastable state when the sampling in the hidden subspace is not sufficient. To handle this issue, Wl is taken as Wl =

e − βw

Nμ l

w =

N

l

(17)



wl =

X N

∂H ∫X ∑ ∂XMD 0

l

dX i

(18) N

HMDl(Γ, X) = HMD(Γ l) +

∑ i=1

Ki l (X̂i − Xi)2 2



⟨A(q)⟩eq =

l=1



(19)

=





∂F = ∂Xi

Nr

∑ e − βw

Nr l=1

∂HMDl(X (ν ,2)) ∂X

+

∂HMDl(X (ν , Nm)) ⎫ ⎪ ⎬ · dX ⎪ ∂X ⎭

1 Nm

Nm



∂HMDl(X (ν , n))

n=1

∂X

∂HMDl(X (ν))

·NmdX ·ΔX (ν)

∂X Nm , X(ν)

(24)

where NmdX is regarded as ΔX(ν) = X(ν) − X(ν−1). Because X is invariant in the Nm-step MD run, X(ν,n) at any n are actually the same as X(ν), i.e., X(ν) = X(ν,1) = X(ν,2) = ··· = X(ν,Nm). This leads to the approximation in the last term on the right-hand side of eq 24. ⟨ ⟩Nm,X(ν) indicates an average over Nm q-configurations from a canonical MD run at X = X(ν). In our calculations presented in section 4, we have employed eq 24 to evaluate wl. Eq 24 simply shows that, as Nm increases, wl approaches the free energy difference calculated by TI as in Jarzynski’s equality.38 That is, all Wl are expected to approach the same value of 1/Nr with sufficiently large Nm. Note that, with any Nm, LogPD calculations should always be initiated with an equilibrium q-configuration in each replica because the Crooks−Jarzynski nonequilibrium work relation is invoked as in eq 22. Now, we demonstrate, using a toy model, that metastable states separated by a high energy barrier in the hidden space can be appropriately sampled using parallel dynamics. Panels a and b in Figure 2 show the two-dimensional energy landscape of the model, having a single basin (minimum) at X ≈ 0 and two basins (minima) with different depths at X > 0.7, which are separated by a high energy barrier (∼3 kcal mol−1). This model

(20)

l

(21)

l=1



∂X



+

l

e − βw A(q l) Nre−βFJ

which is nothing but the Jarzynski nonequilibrium work relation38 (i.e., FJ is the free energy difference, FJ = F(X) − F(X0), by Jarzynski’s equality). The MF is thus calculated as a path-ensemble average using eqs 16−21 as −

∑ ν=1

where A(q ) is an arbitrary function of the q-configuration in the lth replica and 1 Nr

∑ ν=1

l

e−βFJ =

(23)

l MD (X (ν ,1))



··· +

and X̂ il stands for X̂ i(ql) (the ith CV in the lth replica). Wl comes from the Crooks relation37 by which we can evaluate an equilibrium ensemble average from a set of nonequilibrium trajectories as Nr

· dX

⎧ ⎪ ∂H

∑⎨ ν=1

i

i=1

∂X

where X(μ) indicates X at the μth MFD step. In the case of Nm > 1, we relabel the subscript of X by introducing Nv, which satisfies Nμ = Nv × Nm. wl can then be rewritten as

where wl =

∑ μ=1

l

∑l =r 1 e−βw

∂HMDl(X (μ))

l

e − βw Nre−βFJ

Nm

∑ n=1

1 K i(X̂i (q ln) − Xi) Nm

(22)

This can also be obtained by differentiating FJ in eq 21 with respect to Xi [eq 24 should be invoked when Nm > 1 (see below)]. It is assured, using eq 22, that the MF can be accurately estimated even when the adiabatic separation is not strictly maintained, particularly with large Nr.37,38 This indicates that the velocity of Xi can be slightly increased to reduce the 3110

DOI: 10.1021/acs.jctc.7b00252 J. Chem. Theory Comput. 2017, 13, 3106−3119

Article

Journal of Chemical Theory and Computation

basins separated by high energy barriers. For more details on this calculation, see section S2. Overall, the parallel dynamics is considered to provide an accurate mean force by incorporating the nonequilibrium pathensemble, which in principle obviates the issue regarding insufficient sampling in the hidden space and sampling with inappropriate weights (section S2). In the next section, we apply LogPD to the biosystems, alanine dipeptide and adenylate kinase (AdK) in explicit water, to demonstrate how to implement LogPD. AdK is a ubiquitous enzyme and plays a significant role in cellular energy homeostasis.39,40 Because AdK exhibits a large conformational change in catalyzing the phosphoryl transfer reaction,41−43 it is considered as a good benchmark system, together with alanine dipeptide, for LogPD.

4. APPLICATION OF LOGPD 4.1. Implementation. Logarithmic parallel dynamics can be performed essentially in the same line as discussed in section S1 for LogMFD. The only difference is the protocol used to estimate the MF at each MFD step (see Figure 1 and Figure S1). As discussed in section 3.2, the MF in LogPD is estimated using eq 22 with wl given in eq 24. In the present simulations, we employ a slightly modified version of eq 22 −

∂F(X (t )) ∂Xi

Nr

=



l ⎞ N e−βws (X (t)) ⎟ m 1 ∑ Ki −βw l(X ) ⎟ ⎝ ∑l e s (t) ⎠ n = 1 Nm

∑ ⎜⎜ l=1

×(X̂i (q nl) − Xi(t ))

(25)

where Figure 2. (a, b) Two-dimensional energy surface Φ(q, X) of the toy model and (c) the free energy profile F(X) with respect to X. “Run S4” indicates the benchmark result (see section S2 for details).

t

ws l(X (t )) =



∂HMDl(X (ν)) ∂X

ν=1

t−1

·ΔX (ν) −

∑ F′J(X (k)) k=0

(26)

is thus suitable to examine the ability of the nonequilibrium work relation (eq 22) to sample local minima in the space orthogonal to the X-space during the variation of X. We evaluated the free energy with respect to X, F(X), with or without PD. X was here assumed to grow linearly from 0 to 1 with time to mimic the dynamics of X in LogMFD/PD. Every time X was updated, the q-space (which is regarded as the hidden space orthogonal to X) was sampled by the trajectory from the 300 K Nosé−Hoover chain dynamics, which generated Nm q-configurations at each X point. (Note that the sampling in the q-space is not enhanced by the LogMFD dynamics.) F(X) was then obtained by integrating the MF (evaluated at each X point using the Nm q-configurations) and is presented in Figure 2(c). Using 10 replicas (Run S1; Nm = 100), each starting with a different q position but sharing the same X value throughout the run, we found that the two basins (X > 0.7) in the q-space were sampled with appropriate weights and that the resultant F(X) agrees well with the benchmark profile [labeled “Run 4” in Figure 2(c), see also Figure S2]. In contrast, single-replica runs (Runs S2 and S3; Nm = 1000) yielded (X, q)-trajectories trapped in either basin near X ≈ 1 depending on the initial position of q. The resultant F(X) significantly deviate from the benchmark F(X), especially at X > 0.7. Because the number of the q-configurations generated for evaluating F(X) is the same in all the runs, we conclude that the nonequilibrium work relation (eq 22) indeed improves the sampling in the space orthogonal to the X-space, especially when there exist multiple

exp[−βF′J (X (t ))] = =

1 Nr

Nr

1 Nr

Nr

∑ exp[−βwsl(X (t))] l=1 t−1

∑ exp[−β{wl(X (t)) − ∑ F′J(X (k))}] l=1

k=0

(27)

and F′J (X (0)) ≡ 0

(28)

“Nν” in eq 24 is hereafter relabeled “t” with wsl(X(1)) = wl(X(1)) and wsl(X(0)) = wl(X(0)) = 0. Because exp(−βwl) tends to rapidly decrease in nonequilibrium simulations, it is more convenient to use the rescaled quantity as t−1

exp[−βws l(X (t ))] = exp[−βwl(X (t ))]exp[β ∑ F′J (X (k))] k=0

(29)

to reduce possible underflow in the numerical computations. It t−1 should be stressed that exp[β ∑k = 0 F′J (X (k))]is independent of l; thus, eq 25 is equivalent to eq 22. We found that the F(X) estimated using the MF from eq 22 or 25 is better than FJ(X) (eq 21) unless ΔX is extremely small. This is likely because FJ is an exponential average, whereas the F(X) in LogPD is not. Similar arguments have also been made in previous studies.37,44 We will demonstrate the difference 3111

DOI: 10.1021/acs.jctc.7b00252 J. Chem. Theory Comput. 2017, 13, 3106−3119

Article

Journal of Chemical Theory and Computation between F(X) and FJ(X) reconstructed for the aqueous alanine dipeptide in the next subsection. 4.2. Computational Details and the Effect of the Number of Replicas. 4.2.1. Computational Models. LogPD simulations were performed for alanine dipeptide and AdK in explicit water. The double- or triple-loop algorithm (see section S1) was employed in the present calculations, wherein the MD part is handled by the COSGENE MD module45 and the MFD part is handled by our in-house code. The computational model for the alanine dipeptide system consists of an alanine dipeptide molecule solvated with 448 water molecules in a cubic box of length 24 Å. The alanine dipeptide was parametrized with the AMBER96 force field,46 and the TIP3P water model47 was used as a solvent with the SHAKE algorithm48 to keep the molecules rigid. The Gaussian isokinetic dynamics49 was employed to sample the canonical distribution at 300 K in each MD replica. The thermostatted EOM for the molecules were integrated with a time step of 1 fs (δt = 1 fs) under periodic boundary conditions. We consider the free energy profile as a function of the Ramachandran angles, φ and ψ, both of which were restrained to the dynamical variables X throughout the simulations with the harmonic coupling constants Ki of 656.56 kcal mol−1 rad −2. The computational model for the AdK system consists of a 214 residue AdK molecule from Escherichia coli and 9763 TIP3P water molecules (we employed the structure of PDB ID 4AKE50). The AdK molecule comprises the following three domains: core domain consisting of residues 1−29, 68−117, and 168−214 (CORE), nucleoside monophosphate-binding domain consisting of residues 30−59 (NMPbD), and lipshaped adenosine triphosphate-binding domain consisting of residues 118−167 (LID). In the present calculation, the distance between the LID and CORE domains is chosen as a CV to describe the transformation between the open and closed forms of AdK. The distance between the two domains, R, is defined as that between the centers of mass of Cα in the LID and CORE domains and is restrained with a harmonic coupling constant of 5,000 kcal mol−1 Å−2 during the LogPD calculation. The ligand-binding effect is not considered in this study; thus, the present model does not include the ligand (apo-AdK). A cubic box of length 69.184 Å containing the AdK and water molecules was employed with periodic boundary conditions with some water molecules being replaced with sodium or chloride ions to preserve charge neutrality of the system. The AMBER96 force field was used for the AdK molecule. Both the AdK and water molecules were treated as rigid body molecules, and their EOM were integrated with a time step of 1 fs (δt = 1 fs). The Nosé−Hoover dynamics for 300 K was employed to sample the canonical distribution in each MD replica. The number of MD replicas for estimating the MF using parallel dynamics was set to 12 and 8 for the alanine dipeptide and AdK system, respectively. Although it is desirable, in general, to employ as many replicas as possible, we judged from our preliminary calculations discussed below that 12 replicas are minimal but sufficient for small biomolecules such as the alanine dipeptide under the quasi-adiabatic conditions in typical LogMFD calculations. In the meantime, more replicas would be desirable for the AdK system, as it is much larger than the alanine dipeptide system and thus may need much more qconfigurations for evaluating the MF. With the computational resources presently available to us, however, only 8 replicas were able to be employed for the AdK system. The present

LogPD calculations for the AdK is thus for demonstrating the feasibility of LogPD for a system larger than an alanine dipeptide but not for closely investigating the free energy profile of the AdK itself. In both systems, all LogPD runs were started with 12 or 8 replicas, each having different initial q-configurations (in equilibrium) but under the same restraint of X̂ (q) = X(0). The initial atomic coordinates (position and momentum) for each replica were sampled from a preliminary canonical MD run in equilibrium with the interval of 100 ps under the restraint of X̂ (q) = X(0). Then, each replica ran independently with X̂ (q) being restrained to the same value of X(t) in all replicas. 4.2.2. Effects of Multiple Replicas. One of the advantages of PD (LogPD) is the enhanced sampling of multiple energy basins in the hidden subspace as demonstrated in section 3.2. Even with a single energy basin in the hidden subspace, however, one still gains several advantages from PD. Here, the following effects are demonstrated using the alanine dipeptide system; (1) reduction of statistical errors by increasing the number of the q-configurations and (2) relaxation of the adiabatic condition by increasing the number of nonequilibrium trajectories. To see the effect of (1), a test LogPD run for the alanine dipeptide was started with the dihedral angles, φ and ψ [X̂ = (φ(q), ψ(q))], being restrained to the same values of X(0) = (−2.618, 2.443) in each replica. The initial positions and momenta of the peptide and water molecules were sampled independently for each replica from a preliminary canonical MD run under the restraint of X̂ (q) = X(0) at 300 K. X[=(Xφ, Xψ)] was then updated from X(0) = (−2.618, 2.443) to X(1) = (−2.617, 2.444) and was kept unchanged while the MD run in each replica lasted for 500,000 steps (500 ps). The cumulative running averages of K(X̂ − X), ⟨K(X̂ − X)⟩l, for φ and ψ in each replica are presented in Figure 3(a and b) (as measured from the converged value ⟨K(X̂ − X)⟩∞, which was obtained by averaging ⟨K(X̂ − X)⟩l at time step 500,000 over the 12 replicas). It is seen that ΔMF[=⟨K(X̂ − X)⟩l − ⟨K(X̂ − X)⟩∞] ranges from −15 to 15 (kcal mol−1 rad−1) up to time step ∼100 (i.e., Nm < ∼100), indicating that the accuracy of the MF strongly depends on the initial condition of the MD runs when Nm < ∼100. The deviation is, however, substantially reduced by invoking multiple replicas. Panels c and d in Figure 3 show ΔMF averaged over 4, 8, and 12 replicas. It is clearly seen that the deviation is reduced using multiple replicas. This, in turn, indicates that the convergence speed increases with the number of replica. Log−log plots of

(ΔMF)2 as a function of time

(Nm) in Figure 3(e and f) again demonstrates that (ΔMF)2 decreases as the number of replicas increases. Also seen is that 36 (ΔMF)2 decreases with Nm by roughly N−1/2 These m . findings clearly indicate that, with a larger number of replica, a smaller Nm can be employed to maintain the same accuracy. Next, the effect of (2), the number of nonequilibrium paths, is demonstrated in one-dimensional LogPD calculations for the alanine dipeptide system. The free energy profiles F(φ) for the aqueous alanine dipeptide at 300 K were reconstructed using 4 and 12 replicas, which are compared in Figure 4. In these calculations, the following parameter set was used: α = 2, Nm = 20000, Nf = 100, and Mφ = 5 × 105 (note that the MF was evaluated every Nf MFD steps only, see section S1 for details). F(φ) was estimated as a function of φ(t) [Xφ (t)] while fixing ψ

3112

DOI: 10.1021/acs.jctc.7b00252 J. Chem. Theory Comput. 2017, 13, 3106−3119

Article

Journal of Chemical Theory and Computation

Substantial improvement in accuracy of F(φ) by increasing the number of replicas is thus demonstrated. In other words, the adiabatic condition can be relaxed by increasing the number of nonequilibrium paths. It should be stressed that F2 > F3 > F1 in both LogPD runs and that the drift is clearly seen in F2 even in the 12-replica calculation. The MF estimator using the nonequilibrium work relation [eq 22 or 25] is thus crucial in the LogPD calculation, especially when the adiabatic separation easily breaks down. Note that, although the difference between F1 and F3 tends to become small as ΔXφ decreases, F3 (from Jarzynski’s equality) is always larger than F1, which becomes more visible with time. This difference is likely to come from the approximation in eq 23 [or 24], which becomes worse as dX (or ΔX) becomes larger. Although the exponential average will be sensitive to this approximation, the MF estimator [eq 22 or 25] is less sensitive due to possible error cancellation.44 It is thus reconfirmed that obviating the exponential average is significant to reduce the numerical error in estimating the MF and the corresponding free energy profile. 4.3. Free-Energy Profiles of Aqueous Alanine Dipeptide and AdK. First, we present the two-dimensional free energy profile F(φ, ψ) of the aqueous alanine dipeptide from single-replica (LogMFD) calculations. We performed three LogMFD runs using different sets of the parameters, demonstrating that the resultant free energy does not depend on the details of the parameter set. In all the runs for the aqueous alanine dipeptide, X was coupled with a single Nosé− Hoover thermostat of 300 K, and the thermostatted EOM for X was integrated with a time step of 1 fs (i.e., Δt = δt) using the integrator introduced in Appendix B (eqs 44−52). α of 1−3 was found to be sufficient for the X-trajectory to cover the entire X(φ, ψ) subspace, and HNH was set to 1.5−2.0. The parameter values for Nm, Nf, and Mi, along with the total simulation time t, are summarized in Table 1.

Figure 3. ΔMF (cumulative running average of the MF measured from ⟨K(X̂ − X)⟩∞) in 12 MD replicas for (a) Xφ and (b) XΨ and ΔMF averaged over 4, 8, and 12 replicas for (c) Xφ and (d) XΨ. Log− log plots of (ΔMF)2 for Xφ and XΨ are shown in (e) and (f), respectively. The dashed line has a slope of −1/2, indicating that

(ΔMF)2 decreases with Nm by N−1/2 m .

Table 1. Parameters for the Alanine Dipeptide LogMFD Runs run ta (ns) 1 2 3

40 60 550

Nm

Nf b

Mic (kcal mol−1 fs2 rad−2)

σF (kcal mol−1)

100 50,000 50,000

1 100 30

5 × 106 5 × 105 5 × 105

0.323 0.258 0.181

This amounts to the Nm × Nx MD steps with time step of 1 fs, where Nx is the number of MFD steps. bSee section S1. cThe same mass parameter was employed for both Xφ and Xψ. Note that 5 × 106 kcal mol−1 (fs2 rad−2) = 2092 amu (Å/rad)2. a

Figure 4. F(φ(t)) for the aqueous alanine dipeptide obtained by the three different procedures (F1−F3) using 4 (red lines) and 12 (blue lines) replicas.

The overall free energy surface was finally reconstructed by interpolating the “on-the-fly” free energy [green line in Figure 5(a)]. To this end, the on-the-fly free energy trajectory was binned into 0.3 rad × 0.3 rad (or 0.1 rad × 0.1 rad) bins and averaged at each bin, which was then fitted by Gaussian functions. This binning process is actually not necessary, but it enables us to estimate the uncertainty of the free energy obtained directly from the LogMFD/PD runs (which will be discussed later). In fact, fitting the Gaussians directly to the raw data [the green line in Figure 5(a)] was also found to give a similar free energy surface as in Figure 5. Figure 5 compares F(φ, ψ) from the three LogMFD runs. Although the different parameter sets were used in each of the LogMFD runs, we see that the overall profile of the three F(φ, ψ) are almost the same. F(φ, ψ) in Figure 5(a) [and (b)] was

at 2.44 rad. To realize the same positive displacement of Xφ (ΔXφ > 0) at each MFD step, we performed isokinetic (300 K) LogPD runs, which enable us to directly compare the resultant F(φ(t)) using a different number of replicas as a function of the MFD time step. A periodic profile is seen in F(φ(t)) as a function of the MFD step because φ(t) sweeps 3 times from −π to π under the periodic boundary conditions. F(φ) in Figure 4 is reconstructed by the following three different ways: (1) using the MF from eq 25 [F1], (2) using the MF from eq 25 but with Wl = 1/Nr [F2], and (3) using Jarzynski’s equality (eq 21) [F3]. It is clearly seen that, although all types of F(φ) from the 4-replica calculation show a drift reflecting insufficient adiabatic separation, the drift is almost suppressed in F1 and F3 in the 12-replica calculation. 3113

DOI: 10.1021/acs.jctc.7b00252 J. Chem. Theory Comput. 2017, 13, 3106−3119

Article

Journal of Chemical Theory and Computation

Figure 6. F(φ, ψ) for the alanine dipeptide in explicit water from runs (a) 4 and (b) 5.

from two LogPD runs using 12 replicas, each generating 7.3 ns [Figure 6(a), run 4] or 15 ns [Figure 6(b), run 5] MD trajectories. As in the single-replica calculations (run 1−3), we employed different parameter sets for the two LogPD runs, which are summarized in Table 2. Figure 5. F(φ, ψ) for the alanine dipeptide in explicit water from runs (a, b) 1, (c) 2, and (d) 3. The green lines in (a) indicate the on-the-fly free energy trajectory from run 1.

Table 2. Parameters for the Alanine Dipeptide LogPD Simulations

reconstructed from run 1 in which X moves relatively slowly with a very large mass. Thanks to this slow dynamics, Nm can be reduced to 100 in run 1 (corresponding to the “Double loop” in Figure S1). F(φ, ψ) in Figure 5(c), in contrast, was reconstructed from run 2 using large Nm (50,000, Table 1). Because the adiabatic decoupling can be reasonably recovered at each MFD step with large Nm, a smaller M (leading to a larger displacement of X at each MFD step) and a larger Nf (i.e., Nf > 1) can be employed to reduce the number of the X points at which the MF is evaluated [“Triple loop” in Figure S1(c)]. As a result, both runs 1 (40 ns) and 2 (60 ns) produced similar F(φ, ψ) using a similar number of computations. F(φ, ψ) in Figure 5(d) was reconstructed from run 3, which was obtained using the 550 ns MD trajectory and is almost the same as F(φ, ψ) from a 1.19 μs umbrella sampling calculation (not shown) being a benchmark for runs 1 and 2. It is clearly seen that all the runs reproduce the same characteristics of the free energy profile (Figure 5), and thus, the LogMFD results from runs 1 and 2 are highly reliable. Although both runs 1 and 2 produce excellent free energy profiles, the parameter set in run 2 is recommended when the MD part is handled by an existent MD code and the MFD part by another distinct code, which may be compiled and built separately. This is because, as discussed in section S1, the data transfer between these codes should be smaller in run 2 than in run 1, which effectively reduces the elapsed computing time. The uncertainty of the resultant F(φ, ψ), σF, for each run is given in Table 1. This is the standard deviation of F(φ, ψ) averaged over 400−1800 points of the 0.1 rad × 0.1 rad grid, which were sampled more than one time by the X-trajectory. This quantity is particularly sensitive to the degree of adiabatic decoupling, being larger when the decoupling becomes weaker during the LogMFD run. It is seen that the uncertainty is ∼0.3 kcal mol−1 or less for all the LogMFD runs. This indicates that the employed parameter sets are indeed appropriate to maintain the adiabaticity in the LogMFD runs and thus to yield reasonable F(φ, ψ). Next, we discuss the multiple-replica (LogPD) results for the aqueous alanine dipeptide. Figure 6 shows F(φ, ψ) obtained

run

ta (ns)

Nm

Nf b

Mic (kcal mol−1 fs2 rad−2)

σF (kcal mol−1)

4 5

7.3 × 12 15 × 12

100 5000

4 30

5 × 106 5 × 105

0.280 0.183

a This amounts to the Nm × Nx MD steps with time step of 1 fs, where Nx is the number of MFD steps. bSee section S1. cThe same mass parameter was employed for both Xφ and Xψ. Note that 5 × 106 kcal mol−1 (fs2 rad−2) = 2092 amu (Å/rad)2.

F(φ, ψ) in Figure 6(a and b) agree well with the F(φ, ψ) obtained by the 550 ns LogMFD run [Figure 5(d)] and 1.19 μs umbrella sampling calculation. Although the total CPU time required for run 5 is 12-times longer, the elapsed computing time is almost the same as that for a single 15 ns LogMFD run, indicating that the effective time to reconstruct the free energy profile can be substantially reduced. Figure 6(a) demonstrates that a rough outline of F(φ, ψ) can be obtained even in half that time. The uncertainties of F(φ, ψ) are 0.28 and 0.18 kcal mol−1 for runs 4 and 5, respectively (Table 2), indicating that the adiabatic decoupling is well maintained in these LogPD calculations. It is also interesting to compare the LogPD results with those reconstructed by a method other than LogMFD. In section S3, F(φ, ψ) computed by TAMD is compared with the LogPD and LogMFD results with detailed analyses on the (Xφ, Xψ)-space dependence of the free energy error. We found that the 40 ns TAMD run presented in section S3 yielded slightly worse F(φ, ψ) than those of run 1 (by LogMFD) and run 5 (by LogPD). This is considered to come from the less homogeneous sampling in the (Xφ, Xψ)-space by TAMD, which in turn highlights the advantage of the logarithmic form of LogMFD/ PD to flatten the free energy landscape. See section S3 for more details. Figure 7 shows the time evolution of wl along with the F(φ(t), ψ(t)) (constructed on-the-fly using eq 15) in run 5. It is seen that a lower wl contributes to F(φ, ψ) more substantially as was demonstrated in previous steered MD calculations using the nonequilibrium work relation.51 It is also confirmed that more than one replica alternately yields the most significant state for constructing F(φ, ψ) [i.e., wl ≈ F(φ, ψ)], indicating 3114

DOI: 10.1021/acs.jctc.7b00252 J. Chem. Theory Comput. 2017, 13, 3106−3119

Article

Journal of Chemical Theory and Computation

We tested, in the AdK-LogPD run, an adaptive reinitialization scheme in which all MD replicas were initialized in the middle of the LogPD run (at R = 25.17 Å) in the same way as in the adaptive steered MD method,51,52 assuming that F(R) is dominated by a single free energy basin. In the present protocol, the Log PD run was stopped at R = 25.17 Å and was then restarted with the 8 replicas, each being initiated with a new set of atomic positions and momenta sampled from an independent equilibrium MD trajectory. More specifically, four canonical MD runs with the restraint of R = 25.17 Å were initiated with the final configurations from the four replicas, which yielded the four lowest wl at R = 25.17 Å. Two sets of atomic configurations and momenta were then extracted from each of the four canonical MD trajectories with an interval of 100 ps, giving 8 sets of configurations and momenta in total, which were then used as the initial coordinates for the 8 replicas in the subsequent Log PD run. Because the free energy profile is reasonably reconstructed as in Figure 8, we consider that the present protocol of adaptive reinitialization works well in the framework of LogPD. The reinitialization scheme has been examined in previous steered MD calculations using the nonequilibrium work relation and is known to yield better free energy profiles than those without reinitialization.51,52 LogPD makes use of the nonequilibrium work relation; thus, such reinitialization schemes should be able to be straightforwardly incorporated in LogPD. Although several other protocols for reinitialization have also been proposed in ref 52, a thorough investigation of their efficiency is beyond the scope of this paper. It is, however, expected that various schemes including reinitialization can be incorporated into LogPD to further improve its efficiency, which should merit more detailed investigation.

Figure 7. Time evolution of wl in the 12 MD replicas along with the LogPD result of F(φ(t), ψ(t)) indicated by the red circles.

that the weight (Wl) for estimating the MF (eq 17) works well in the LogPD calculation. Finally, we demonstrate the LogPD result for the aqueous AdK. Figure 8 shows the free energy profile, F(R), with respect

5. CONCLUSIONS We have introduced parallel dynamics that give a reliable mean force on the basis of the nonequilibrium work relation. By implementing LogPD that combines parallel dynamics and LogMFD, we have shown that F(X) can be reconstructed with a shorter elapsed computing time than in standard LogMFD through its application to benchmark biosystems. The key features of LogPD are the following: (1) the strict adiabatic condition in LogMFD can be loosened and (2) metastable states in the hidden subspace orthogonal to the X-subspace are taken into account with appropriate weights. These two features, being effective particularly with a large number of the replicas, improve the efficiency and reliability in free energy reconstruction. Parallel dynamics can be incorporated not only in LogMFD but also in TAMD or MTD (or related methods); thus, it can be employed in any situation where such an MFD-based method works for obtaining free energy profiles. LogMFD, however, particularly benefits from PD because it is designed to use a highly accurate MF to reconstruct a free energy profile “on-the-fly” from only a single sweep of X. LogPD shares the virtues of LogMFD; thus, free energy profiles can also be reconstructed “on-the-fly” in LogPD. Because the nonequilibrium work relation is invoked in estimating the MF, several extensions previously proposed to improve the nonequilibrium path-ensemble averages can be incorporated in LogPD. One of them is the adaptive reinitialization scheme51,52 (or the path-breaking scheme53) as demonstrated in the AdK-LogPD simulation. Another possible technique is the escorted fast-switching method,54 whose

Figure 8. Free-energy profile F(R) with respect to the distance between the LID and CORE domains in the aqueous AdK and its representative structures for open (right) and closed (left) forms.

to the distance between the LID and CORE domains in the aqueous AdK. The AdK-LogPD run was started at R ≈ 30 Å with a negative velocity of 300 K for the dynamical variable XR. The standard EOM (eq 10) (without thermostat) was employed to govern the dynamics of XR, and the velocity was reset to 300 K at R = 25.17 Å with reinitialization of all MD replicas (see below). F(R) in Figure 8 was reconstructed using 25.6 ns ×8 MD trajectories with the following parameter set; α = 3, M = 6 × 105 (kcal mol−1 fs2 Å−2), Nm = 3500, and Nf = 1. The free energy minimum is found around R = 28 Å, and F(R) increases as the distance decreases with a hump around R = 23.5 Å. Although the details are slightly different, similar profiles are also reported in previous studies, where F(R) is estimated for apo-AdK using the multistate Bennett acceptance ratio42 or umbrella sampling41 [note that F(R) decreases as the distance decreases, in contrast, in holo-AdK]. It is thus considered that, even with 8 replicas, the present LogPD calculation succeeded in characterizing the overall conformational change associated with the LID and CORE domains in apo-AdK. 3115

DOI: 10.1021/acs.jctc.7b00252 J. Chem. Theory Comput. 2017, 13, 3106−3119

Article

Journal of Chemical Theory and Computation

be practically taken to be the same as the time step used in the canonical MD run, δt (e.g., Δt = δt = 1 fs). Eq 30 can thus be simplified if Δt is taken to be 1 fs as

feasibility is now under investigation. There should also be other possible techniques to be incorporated in PD or LogPD. More extensive investigations on such attempts thus deserve future work.



⎛ 20000 ⎞2 ⎟⎟ Mi ∼ kBTX ⎜⎜ ⎝ X range ⎠

APPENDIX A: PARAMETERS IN LOGMFD/PD The parameters that mainly affect the accuracy and efficiency of the free energy construction in LogMFD/PD are α (= 1/γ), Nm, and Mi. Their effects on the resultant F(X) are, in part, demonstrated in ref 23. Here, some additional comments on α, Mi, and also on Hlog are given. α can take any value as long as the barriers on the free energy landscape are significantly reduced, and the resultant Xtrajectory sufficiently covers the range we focus on in the Xsubspace. If one can roughly estimate the height of the highest energy barrier, ΔF, α can be chosen such that Ψlog = log[αΔF + 1]/α ≈ kBTX, where TX is normally the same as T. This relation can be rewritten as log(α′f + 1) ≈ α′ by introducing the following scaled parameters, α = α′/kBTX and ΔF = f kBTX. A suitable α (α′) that satisfies this relation can be easily estimated and is presented in section S4. In fact, α′ can be roughly estimated as α′ ≈ 1.5log f [or α ≈ (1.5/kBTX)log(ΔF/kBTX)] as long as f < ∼200 (see section S4 for details). It is worth noting that, thanks to the logarithmic form, readjustment of α according to the system is not always necessary. That is, it is likely that the same α can often be employed to survey free energy landscapes having different barrier heights; for example, Ψlog with α = 4 is ∼0.9 and ∼1.1 kcal mol−1 when ΔF is 10 and 20 kcal mol−1, respectively, indicating that Ψlog is rather insensitive to ΔF. The fictitious mass, Mi, for Xi is crucial to maintain the adiabatic separation as discussed in previous studies using TAMD and MTD as well as LogMFD.17,23,55,56 A suitable range for Mi is empirically evaluated for TAMD simulations for alanine dipeptide in vacuum.56 We, however, found that this range may not be applied to LogMFD simulations because the recommended range is not suitable to ensure the adiabatic separation at each MFD step, which is necessary for LogMFD (but is actually not necessary for TAMD). We found that, in our LogMFD simulations for the aqueous alanine dipeptide, the following intervals between the X points, ΔXi = Xi(t) − Xi(t − Δt), are sufficient to maintain the adiabaticity [we assume here that the MF is evaluated at each MFD step as in Figure S1(b)]; ΔX ≈ Xrange/200 with Nm = 50000 (50 ps) or ΔX ≈ Xrange/ 20000 with Nm = 100 (100 fs), where “Xrange” indicates the range in the X-subspace over which one tries to reconstruct F(X) (2π in this case, see for example Figure 5). It can be easily recognized that the interval can be increased as Nm is increased. Following this finding, we may tentatively determine Mi so that the displacement of Xi at each MFD step roughly equals to Xrange/20000 with a relatively small Nm (∼100). To this end, the initial kinetic energy of Xi is equated to kBTX/2 as ⎛ ΔXi ⎞2 ⎟ = kBTX Mi ⎜ ⎝ Δt ⎠

(31)

This gives Mi ≈ 6 × 106 kcal mol−1 fs2 rad−2 with kBTX = 0.6 (kcal mol−1) (∼300 K) for the aqueous alanine dipeptide system. Note that, if the MF is evaluated only every Nf MFD steps [Figure S1(c)], ΔXi discussed so far should be read as NfΔXi. Although typical ΔX values for maintaining the adiabaticity are available only for aqueous alanine dipeptide at present, we believe this can still be used as a reference to roughly estimate Mi for other systems. In fact, eq 31 was invoked to select an initial trial value of Mi in our calculation of the AdK system presented in section 4. Hlog can, in principle, take any high value in the same way as α. F(X) is, however, obtained by eq 13 (or 15); thus, extremely large Hlog [or αHlog] should not be used to avoid possible overflow. If a LogMFD run starts from the global minimum of the free energy surface, Hlog, can be equal to the initial kinetic energy of X. However, one cannot be sure that a LogMFD calculation always starts from the minimum of F(X) even when X is initially at a state sufficiently equilibrated in a preceding preliminary MD run. It is thus recommended that Hlog is set to be larger than the initial kinetic energy of X (for example, twice the initial kinetic energy). It should be remarked that the resultant F(X) does not depend on the nominal value of Hlog except for the origin in the energy scale. (The discussion here is also applied to HNH.) Although F(X) does not depend on the nominal values of α and Hlog, the dynamics of X does depend on these parameters because the EOM for X depends on them (eq 10). Assuming that the global minimum of F(X) takes c with a preset Hlog, the coefficient of the MF on the r.h.s. of eq 10 (with αγ = 1) can be rewritten as 1 1 1 = = α(F0 + c) + 1 αF0 + αc + 1 (αc + 1)(α′F0 + 1) (32) α , αc + 1

where α′ = F0(X) = F(X) − c, and F0(X) takes 0 at its global minimum. The effective potential Ψlog that gives the EOM for X (eq 10) can also be rewritten as Ψlog = log[α(F0 + c) + 1]/α = log[α′F0 + 1]/α + c′ (33)

where c′ = log(αc + 1)/α. Because c increases as Hlog increases, Ψlog and the coefficient of the MF (eq 32) depends on Hlog. This means that the dynamics of X indeed depends on Hlog as well as α. Recall, however, that the dynamics of X are physically irrelevant as mentioned before, and the resultant F(X) is not affected by the choice of α or Hlog.



(30)

APPENDIX B: INTEGRATORS FOR LOGMFD/PD EQUATIONS The EOM for Xi in LogMFD/PD (without thermostat) is rewritten as

Taking ΔXi to be Xrange/20000, eq 30 gives a rough estimate for Mi according to Δt. Eq 30 does not mean ΔXi is always Xrange/ 20000. However, if the initial position of Xi is near the global minimum of the F(X), or if the thermostat of kBTX is attached to Xi, then the upper bound of ΔXi is well controlled, and ΔXi is ∼Xrange/20000, on average, in the thermostatted LogMFD run. Because the dynamics of Xi are physically irrelevant, Δt can

⎛α MiẌi = −C exp⎜⎜ ⎝2 3116

⎞ ∂F

∑ MiVi2⎟⎟ i

⎠ ∂Xi

(34)

DOI: 10.1021/acs.jctc.7b00252 J. Chem. Theory Comput. 2017, 13, 3106−3119

Article

Journal of Chemical Theory and Computation where C = exp(−αHlog). In general, dealing with the force depending on the velocity as well as the position is not straightforward in Verlet-type algorithms36 (the second-order symplectic integrator57 for the standard microcanonical MD). In the meantime, the calculated force (MF) is not exactly the derivative of F(X) due to statistical errors, indicating that a sophisticated higher-order integrator is unlikely to be necessary, as the error in the MF is expected to be larger than the error associated with the integrator. In fact, an algorithm based on a Euler-like method (the asymmetrical Euler method57 corresponding to the firstorder symplectic integrator for the standard microcanonical MD) is found to be sufficient to integrate eq 34 and to reconstruct reliable F(X). The algorithm is explicitly given as ⎛α Vi (t + Δt ) = Vi (t ) + ΔtC exp⎜⎜ ⎝2

reversibility is not maintained because of the iterative process, this algorithm retains the second-order accuracy with respect to Δt as in the Verlet-type algorithm. Note that ∼5 iterations are sufficient to have a converged Ṽ i(t); thus, the computational cost for the iteration is negligible. It should be kept in mind, however, that the accuracy of the numerical integration (i.e., the conservation of Hlog) depends on the accuracy of the MF much more than the details of the integrator employed in the LogMFD/PD run. We confirmed that both algorithms mentioned above worked well in reconstructing F(X) (section S5). To perform thermostatted LogMFD/PD (with a Nosé− Hoover-type thermostat), we can employ either an integrator based on eqs 35 and 36 or on the combination of the iterative Verlet integrator (eqs 38−41) and the integrator obtained by decomposing the time propagators for the Nosé−Hoover equations.59,60 Here, we consider the latter case for the LogMFD/PD dynamics coupled with a single Nosé−Hoover thermostat as

⎞ Fo (t )

∑ MiVi (t )2 ⎟⎟

i

⎠ Mi

i

(35)

Xi(t + Δt ) = Xi(t ) + ΔtVi (t + Δt )

(36)

M i Ẍ i =

where

Foi = −

∂F ∂Xi

The updated Xi(t + Δt) and Vi(t + Δt) are, in this algorithm, obtained from Xi(t) and Vi(t). Although this algorithm is not directly obtained from the decomposition of the time propagator because of the velocity-dependent force, it enables us to perform a stable LogMFD/PD calculation as long as the MF is accurately estimated. In section S5, we demonstrate the stability of this algorithm using a model system where the MF (eq 37) can be analytically evaluated. It is also possible to employ a Verlet-type algorithm with an iterative process for integrating eq 34. The iterative Verlet algorithm was introduced to handle the velocity-dependent force in the MTK equations for isothermal−isobaric (NPT) MD calculations,58 which can also be employed in LogMFD/ PD as follows: First, Vi at t is estimated as

(

Vi t −

(

Δt 2

∂F

(39)

V(1) i (t),

(

i

(44)

⎞ Fo (t ) ∑ MiVĩ (t )2 ⎟⎠ i Mi

Δt 2

)

(45)

in eq 44 with

F(t) can be recalculated as

⎡ ⎛ αF (2)(t ) + 1 = exp⎢α⎜HNH − ⎣ ⎝ −

∑ 1 MiVi(1)2(t ) 2

⎞⎤ 1 2 Qvη (t ) − NkBTXη(t )⎟⎥ ⎠⎦ 2

(46)

The velocity at t is also recalculated as

Xi(t) is updated as ⎛ Δt ⎞⎟ Xi(t + Δt ) = Xi(t ) + ΔtVi ⎜t + ⎝ 2 ⎠

(

The updated Xi(t + Δt) and Vi t +

(

⎞⎤ 1 2 Qvη (t ) − NkBTXη(t )⎟⎥ ⎠⎦ 2

where Foi = − ∂X . By replacing Vi t −

(40)

X i (t) and Vi t −

2

Δt ⎞⎟ 2 ⎠

⎛ Δt ⎞⎟ ⎛⎜ Δt ⎞ V i(1)(t ) = Vi ⎜t − exp −vη(t ) ⎟ ⎝ ⎠ ⎝ 2 2 ⎠ Foi(t ) ⎞ ⎛ Δt ⎛ Δt ⎞ ⎜⎜ ⎟⎟exp⎜ −vη(t ) ⎟ + (1) ⎝ 2Mi ⎝ αF (t ) + 1 ⎠ 4 ⎠

) is then obtained as

⎛ ⎛ ⎛α Δt ⎞⎟ Δt ⎞⎟ = Vi ⎜t − + ΔtC exp⎜ Vi ⎜t + ⎝2 ⎝ ⎝ 2 ⎠ 2 ⎠



∑ 1 MiVi2⎝⎜t −

Using this F(1)(t), the velocity at t is tentatively calculated as

By repeating this process ∼5 more times, the velocity at t, Ṽ i(t), can be determined. Vi t +

) as



and V′i (t) is again used to evaluate V″i (t) as, ⎞ Foi(t ) ⎟ Mi

Δt 2

⎡ ⎛ αF (1)(t ) + 1 = exp⎢α⎜HNH − ⎣ ⎝

2 ⎛ Δt ⎞ ⎞ Fo (t ) ∑ MiVi ⎜⎝t − ⎟⎠ ⎟ i 2 ⎠ Mi

∑ MiVi′(t )2 ⎠

(43)

where vη is the velocity of η. These EOM can be numerically integrated by the following integration algorithm. First, the free energy at t is tentatively obtained using

(38)

⎛ ⎛α Δt ⎞⎟ Δt + Vi″(t ) = Vi ⎜t − C exp⎜ ⎝2 ⎝ 2 ⎠ 2

(42)

⎡ ⎤ Qη̈ = ⎢∑ MiVi2 − NkBTX ⎥ ⎢⎣ i ⎥⎦

(37)

⎛α ⎛ Δt ⎞⎟ Δt + Vi′(t ) = Vi ⎜t − C exp⎜ ⎝ ⎠ 2 2 ⎝2

⎛ 1 ⎞ ⎜ ⎟K ⟨X̂ (q) − X ⟩ − M Vv i X i i η ⎝ αF + 1 ⎠ i i

Δt 2

)

Δt 2

⎛ Δt ⎞⎟ ⎛⎜ Δt ⎞ V i(2)(t ) = Vi ⎜t − exp −vη(t ) ⎟ ⎝ 2 ⎠ ⎝ 2 ⎠ Foi(t ) ⎞ ⎛ Δt ⎛ Δt ⎞ ⎜⎜ ⎟⎟exp⎜ −vη(t ) ⎟ + 2Mi ⎝ αF (2)(t ) + 1 ⎠ ⎝ 4 ⎠

(41)

) are thus obtained from

in this algorithm. Although the 3117

(47)

DOI: 10.1021/acs.jctc.7b00252 J. Chem. Theory Comput. 2017, 13, 3106−3119

Article

Journal of Chemical Theory and Computation By iterating eqs 46 and 47 ∼5 more times, we obtain V(n) ≈ i (n) V(n−1) . F(t) is thus finally obtained from eq 46 using V (t) (n i i ≈ 5 is normally sufficient). Once the converged Vi(t) is obtained, the other dynamical variables are updated as

Institute for Information Technology, Kyushu University, and at the Research Center for Computational Science, NINS, Japan.



⎛ ⎛ Δt ⎞⎟ Δt ⎞ Δt ⎛ Foi(t ) ⎞ = Vi (t )exp⎜ −vη(t ) ⎟ + Vi ⎜t + ⎟ ⎜ ⎝ ⎝ 2 ⎠ 2 ⎠ 2Mi ⎝ αF(t ) + 1 ⎠ ⎛ Δt ⎞ × exp⎜ −vη(t ) ⎟ ⎝ 4 ⎠

(49)

⎛ Δt ⎞⎟ Δt vη(t ) η⎜t + = η (t ) + ⎝ 2 ⎠ 2

(50)

vη(t + Δt ) = vη(t ) +

(1) Free energy calculations; theory and applications in chemistry and biology; Chipot, C., Pohorille, A., Eds.; Springer: Heidelberg, Germany, 2007. (2) Vanden-Eijnden, E. Some recent techniques for free energy calculations. J. Comput. Chem. 2009, 30, 1737−1747. (3) Fujisaki, H.; Moritsugu, K.; Matsunaga, Y.; Morishita, T.; Maragliano, L. Extended phase-space methods for enhanced sampling in molecular simulations: a review. Front. Bioeng. Biotechnol. 2015, 3, 125. (4) Kirkwood, J. G. Statistical Mechanics of Fluid Mixtures. J. Chem. Phys. 1935, 3, 300−313. (5) Zwanzig, R. W. High-Temperature Equation of State by a Perturbation Method. I. Nonpolar Gases. J. Chem. Phys. 1954, 22, 1420−1426. (6) Pohorille, A.; Jarzynski, C.; Chipot, C. Good Practices in FreeEnergy Calculations. J. Phys. Chem. B 2010, 114, 10235−10253. (7) Shinoda, K.; Shinoda, W.; Mikami, M. Efficient free energy calculation of water across lipid membranes. J. Comput. Chem. 2008, 29, 1912−1918. (8) Garate, J. A.; Oostenbrink, C. Free-energy differences between states with different conformational ensembles. J. Comput. Chem. 2013, 34, 1398−1408. (9) Rosso, L.; Minary, P.; Zhu, Z.; Tuckerman, M. E. On the use of the adiabatic molecular dynamics technique in the calculation of free energy profiles. J. Chem. Phys. 2002, 116, 4389. (10) Morishita, T.; Itoh, S. G.; Okumura, H.; Mikami, M. Freeenergy calculation via mean-force dynamics using a logarithmic energy landscape. Phys. Rev. E 2012, 85, 066702. (11) Abrams, J. B.; Tuckerman, M. E. Efficient and Direct Generation of Multidimensional Free Energy Surfaces via Adiabatic Dynamics without Coordinate Transformations. J. Phys. Chem. B 2008, 112, 15742−15757. (12) Tuckerman, M. E. Statistical Mechanics: Theory and Molecular Simulation; Oxford University Press: New York, NY, 2010. (13) Car, R.; Parrinello, M. Unified approach for molecular dynamics and density-functional theory. Phys. Rev. Lett. 1985, 55, 2471−2474. (14) Hummer, G.; Kevrekidis, I. G. Coarse molecular dynamics of a peptide fragment: Free energy, kinetics, and long-time dynamics computations. J. Chem. Phys. 2003, 118, 10762−10773. (15) Gear, C. W.; Kevrekidis, I. G.; Theodoropoulos, C. ‘Coarse’ integration/bifurcation analysis via microscopic simulators: microGalerkin methods. Comput. Chem. Eng. 2002, 26, 941−963. (16) Laio, A.; Gervasio, F. L. Metadynamics: a method to simulate rare events and reconstruct the free energy in biophysics, chemistry and material science. Rep. Prog. Phys. 2008, 71, 126601. (17) Iannuzzi, M.; Laio, A.; Parrinello, M. Efficient Exploration of Reactive Potential Energy Surfaces Using Car-Parrinello Molecular Dynamics. Phys. Rev. Lett. 2003, 90, 238302. (18) Maragliano, L.; Vanden-Eijnden, E. A temperature accelerated method for sampling free energy and determining reaction pathways in rare events simulations. Chem. Phys. Lett. 2006, 426, 168−175. (19) Selwa, E.; Huynh, T.; Ciccotti, G.; Maragliano, L.; Malliavin, T. E. Temperature-accelerated molecular dynamics gives insights into globular conformations sampled in the free state of the AC catalytic domain. Proteins: Struct., Funct., Genet. 2014, 107, 2483−2496. (20) Bernardi, R. C.; Melo, M. C. R.; Schulten, K. Enhanced sampling techniques in molecular dynamics simulations of biological systems. Biochim. Biophys. Acta, Gen. Subj. 2015, 1850, 872−877. (21) Gervasio, F. L.; Laio, A.; Parrinello, M. Flexible docking in solution using metadynamics. J. Am. Chem. Soc. 2005, 127, 2600− 2607.

(48)

⎛ Δt ⎞⎟ Xi(t + Δt ) = Xi(t ) + ΔtVi ⎜t + ⎝ 2 ⎠

⎡ Δt ⎣∑ MiVi2 t +

(

Δt 2

) − NkBTX ⎤⎦

Q (51)

⎛ Δt ⎞⎟ Δt vη(t + Δt ) η(t + Δt ) = η⎜t + + ⎝ 2 ⎠ 2

(

(

+ Δt) are obtained from Xi(t), Vi t −

(52)

Δt , η(t + Δt), and vη(t 2 Δt , η(t), and vη(t) in this 2

)

Overall, the updated Xi(t + Δt), Vi t +

)

algorithm, respectively.



ASSOCIATED CONTENT

S Supporting Information *

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.jctc.7b00252. Details of the algorithms to perform a LogMFD/PD calculation (S1, Figure S1); demonstration of the effect of the nonequilibrium path ensemble on the mean force evaluation using a toy model system (S2, Figure S2); free energy profiles for the aqueous alanine dipeptide system obtained from a TAMD calculation (S3, Figure S3); estimation of a suitable parameter value for α (S4, Figure S4); and examination of the stability of the numerical integrators given in eqs 35, 36, and 44−52 (S5, Figure S5 (PDF)



REFERENCES

AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected]. ORCID

Tetsuya Morishita: 0000-0002-8308-6211 Notes

The authors declare no competing financial interest.



ACKNOWLEDGMENTS T.M. thanks Kinya Kumagai for preparing a python script to implement LogPD calculations on parallel computers at AIST, and Michelle J. S. Spencer for careful reading of the manuscript. This work was supported in part by a Grant-in-Aid for Scientific Research from JSPS KAKENHI (Grants: 17K05620, 15K07037, 15H01360, and 26105012). The computations were performed on the computational facilities at the Research 3118

DOI: 10.1021/acs.jctc.7b00252 J. Chem. Theory Comput. 2017, 13, 3106−3119

Article

Journal of Chemical Theory and Computation (22) Yu, T. Q.; Tuckerman, M. E. Temperature-Accelerated Method for Exploring Polymorphism in Molecular Crystals Based on Free Energy. Phys. Rev. Lett. 2011, 107, 015701. (23) Morishita, T.; Itoh, S. G.; Okumura, H.; Mikami, M. On-the-Fly Reconstruction of Free-Energy Profiles Using Logarithmic MeanForce Dynamics. J. Comput. Chem. 2013, 34, 1375−1384. (24) Nakamura, M.; Obata, M.; Morishita, T.; Oda, T. An ab initio approach to free-energy reconstruction using logarithmic mean force dynamics. J. Chem. Phys. 2014, 140, 184110. (25) MacKerell, A. D., Jr.; Bashford, D.; Bellott, M.; Dunbrack, R. L., Jr.; Evanseck, J. D.; Field, M. J.; Fischer, S.; Gao, J.; Guo, H.; Ha, S.; Joseph-McCarthy, D.; Kuchnir, L.; Kuczera, K.; Lau, F. T. K.; Mattos, C.; Michnick, S.; Ngo, T.; Nguyen, D. T.; Prodhom, B.; Reiher, W. E., III; Roux, B.; Schlenkrich, M.; Smith, J. C.; Stote, R.; Straub, J.; Watanabe, M.; Wiorkiewicz-Kuczera, J.; Yin, D.; Karplus, M. All-Atom Empirical Potential for Molecular Modeling and Dynamics Studies of Proteins. J. Phys. Chem. B 1998, 102, 3586−3616. (26) Perdew, J. P.; Chevary, J. A.; Vosko, S. H.; Jackson, K. A.; Pederson, M. R.; Singh, D. J.; Fiolhais, C. Atoms, molecules, solids, and surfaces: Applications of the generalized gradient approximation for exchange and correlation. Phys. Rev. B: Condens. Matter Mater. Phys. 1992, 46, 6671−6687. (27) Morishita, T.; Spencer, M. J. S.; Kawamoto, S.; Snook, I. K. A new surface and structure for silicene: polygonal silicene formation on the Al (111) surface. J. Phys. Chem. C 2013, 117, 22142−22148. (28) Raiteri, P.; Laio, A.; Gervasio, F. L.; Micheletti, C.; Parrinello, M. Efficient Reconstruction of Complex Free Energy Landscapes by Multiple Walkers Metadynamics. J. Phys. Chem. B 2006, 110, 3533− 3539. (29) Carter, E. A.; Ciccotti, G.; Hynes, J. T.; Kapral, R. Constrained reaction coordinate dynamics for the simulation of rare events. Chem. Phys. Lett. 1989, 156, 472−477. (30) Nosé, S. A molecular dynamics method for simulations in the canonical ensemble. Mol. Phys. 1984, 52, 255−268. (31) Hoover, W. G. Canonical dynamics: Equilibrium phase-space distributions. Phys. Rev. A: At., Mol., Opt. Phys. 1985, 31, 1695−1697. (32) Martyna, G. J.; Klein, M. L.; Tuckerman, M. Nosé-Hoover chains: The canonical ensemble via continuous dynamics. J. Chem. Phys. 1992, 97, 2635−2643. (33) Morishita, T. From Nosé-Hoover chain to Nosé-Hoover network: design of non-Hamiltonian equations of motion for molecular-dynamics with multiple thermostats. Mol. Phys. 2010, 108, 1337−1347. (34) Liu, Y.; Tuckerman, M. E. Generalized Gaussian moment thermostatting: A new continuous dynamical approach to the canonical ensemble. J. Chem. Phys. 2000, 112, 1685−1700. (35) Nosé, S. Constant Temperature Molecular Dynamics Methods. Prog. Theor. Phys. Suppl. 1991, 103, 1−46. (36) Allen, M. P.; Tildesley, D. J. Computer Simulation of Liquids; Oxford University Press: Oxford, UK, 1987. (37) Crooks, G. E. Path-ensemble averages in systems driven far from equilibrium. Phys. Rev. E: Stat. Phys., Plasmas, Fluids, Relat. Interdiscip. Top. 2000, 61, 2361−2366. (38) Jarzynski, C. Nonequilibrium Equality for Free Energy Differences. Phys. Rev. Lett. 1997, 78, 2690−2693. (39) Shapiro, Y. E.; Sinev, M. A.; Sineva, E. V.; Tugarinov, V.; Meirovitch, E. Backbone Dynamics of Escherichia coli Adenylate Kinase at the Extreme Stages of the Catalytic Cycle Studied by 15N NMR Relaxation. Biochemistry 2000, 39, 6634−6644. (40) Pontiggia, F.; Zen, A.; Micheletti, C. Small- and Large-Scale Conformational Changes of Adenylate Kinase: A Molecular Dynamics Study of the Subdomain Motion and Mechanics. Biophys. J. 2008, 95, 5901−5912. (41) Jana, B.; Adkar, B. V.; Bagchi, B. Dynamic coupling between the LID and NMP domain motions in the catalytic conversion of ATP and AMP to ADP by adenylate kinase. J. Chem. Phys. 2011, 134, 035101. (42) Matsunaga, Y.; Fujisaki, H.; Terada, T.; Furuta, T.; Moritsugu, K.; Kidera, A. Minimum Free Energy Path of Ligand-Induced Transition in Adenylate Kinase. PLoS Comput. Biol. 2012, 8, e1002555.

(43) Yonezawa, Y. A Method for Predicting Protein Conformational Pathways by Using Molecular Dynamics Simulations Guided by Difference Distance Matrices. J. Comput. Chem. 2016, 37, 1139−1146. (44) Evans, D. J.; Searles, D. J. Steady states, invariant measures, and response theory. Phys. Rev. E: Stat. Phys., Plasmas, Fluids, Relat. Interdiscip. Top. 1995, 52, 5839−5848. (45) Fukunishi, Y.; Mikami, Y.; Nakamura, H. The Filling Potential Method: A Method for Estimating the Free Energy Surface for Protein-Ligand Docking. J. Phys. Chem. B 2003, 107, 13201−13210. (46) Cornell, W. D.; Cieplak, P.; Bayly, C. I.; Gould, I. R.; Merz, K. M., Jr.; Ferguson, D. M.; Spellmeyer, D. C.; Fox, T.; Caldwell, J. W.; Kollman, P. A. A Second Generation Force Field for the Simulation of Proteins, Nucleic Acids, and Organic Molecules. J. Am. Chem. Soc. 1995, 117, 5179−5197. (47) Jorgensen, W. L.; Chandrasekhar, J.; Madura, J. D.; Impey, R. W.; Klein, M. L. Comparison of simple potential functions for simulating liquid water. J. Chem. Phys. 1983, 79, 926−935. (48) Ryckaert, J. P.; Ciccotti, G.; Berendsen, H. J. C. Numerical integration of the cartesian equations of motion of a system with constraints: molecular dynamics of n-alkanes. J. Comput. Phys. 1977, 23, 327−341. (49) Evans, D. J.; Hoover, W. G.; Failor, B. H.; Moran, B.; Ladd, A. J. C. Nonequilibrium molecular dynamics via Gauss’s principle of least constraint. Phys. Rev. A: At., Mol., Opt. Phys. 1983, 28, 1016−1021. (50) Muller, C. W.; Schlauderer, G. J.; Reinstein, J.; Schulz, G. E. Adenylate kinase motions during catalysis: an energetic counterweight balancing substrate binding. Structure 1996, 4, 147−156. (51) Ozer, G.; Valeev, E. F.; Quirk, S.; Hernandez, R. Adaptive Steered Molecular Dynamics of the Long-Distance Unfolding of Neuropeptide Y. J. Chem. Theory Comput. 2010, 6, 3026−3038. (52) Ozer, G.; Keyes, T.; Quirk, S.; Hernandez, R. Multiple branched adaptive steered molecular dynamics. J. Chem. Phys. 2014, 141, 064101. (53) Chelli, R.; Gellini, C.; Pietraperzia, G.; Giovannelli, E.; Cardini, G. Path-breaking schemes for nonequilibrium free energy calculations. J. Chem. Phys. 2013, 138, 214109. (54) Vaikuntanathan, S.; Jarzynski, C. Escorted Free Energy Simulations: Improving Convergence by Reducing Dissipation. Phys. Rev. Lett. 2008, 100, 190601. (55) Ensing, B.; Laio, A.; Parrinello, M.; Klein, M. L. A Recipe for the Computation of the Free Energy Barrier and the Lowest Free Energy Path of Concerted Reactions. J. Phys. Chem. B 2005, 109, 6676−6687. (56) Cuendet, M. A.; Tuckerman, M. E. Free Energy Reconstruction from Metadynamics or Adiabatic Free Energy Dynamics Simulations. J. Chem. Theory Comput. 2014, 10, 2975−2986. (57) Leimkuhler, B.; Reich, S. Simulating Hamiltonian Dynamics; Cambridge University Press: Cambridge, UK, 2004. (58) Martyna, G. J.; Tobias, D. J.; Klein, M. L. Constant pressure molecular dynamics algorithms. J. Chem. Phys. 1994, 101, 4177−4189. (59) Martyna, G. J.; Tuckerman, M. E.; Tobias, D. J.; Klein, M. L. Explicit reversible integrators for extended systems dynamics. Mol. Phys. 1996, 87, 1117−1157. (60) Yamamoto, T. Comment on “Comment on ‘Simple reversible molecular dynamics algorithms for Nosé-Hoover chain dynamics’. J. Chem. Phys. 2006, 124, 217101.

3119

DOI: 10.1021/acs.jctc.7b00252 J. Chem. Theory Comput. 2017, 13, 3106−3119