Article pubs.acs.org/JCTC
TaBoo SeArch Algorithm with a Modified Inverse Histogram for Reproducing Biologically Relevant Rare Events of Proteins Ryuhei Harada,*,†,‡,§ Yu Takano,*,∥ and Yasuteru Shigeta*,†,‡ †
Department of Physics, Graduate School of Pure and Applied Sciences, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki 305-8571, Japan ‡ Center for Computational Sciences, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki 305-8577, Japan § Computational Engineering Applications Unit, RIKEN, Advanced Center for Computing and Communication, 2-1, Hirosawa, Wako-shi, Saitama 351-0198, Japan ∥ Department of Biomedical Information Sciences, Hiroshima City University, 3-4-1 Ozuka-Higashi, Asa-Minami-Ku, Hiroshima 731-3194, Japan ABSTRACT: The TaBoo SeArch (TBSA) algorithm [Harada et al. J. Comput. Chem. 2015, 36, 763−773 and Harada et al. Chem. Phys. Lett. 2015, 630, 68−75] was recently proposed as an enhanced conformational sampling method for reproducing biologically relevant rare events of a given protein. In TBSA, an inverse histogram of the original distribution, mapped onto a set of reaction coordinates, is constructed from trajectories obtained by multiple short-time molecular dynamics (MD) simulations. Rarely occurring states of a given protein are statistically selected as new initial states based on the inverse histogram, and resampling is performed by restarting the MD simulations from the new initial states to promote the conformational transition. In this process, the definition of the inverse histogram, which characterizes the rarely occurring states, is crucial for the efficiency of TBSA. In this study, we propose a simple modification of the inverse histogram to further accelerate the convergence of TBSA. As demonstrations of the modified TBSA, we applied it to (a) hydrogen bonding rearrangements of Met-enkephalin, (b) large-amplitude domain motions of Glutamine-Binding Protein, and (c) folding processes of the B domain of Staphylococcus aureus Protein A. All demonstrations numerically proved that the modified TBSA reproduced these biologically relevant rare events with nanosecond-order simulation times, although a set of microsecond-order, canonical MD simulations failed to reproduce the rare events, indicating the high efficiency of the modified TBSA.
I. INTRODUCTION Structural transitions of proteins are strongly related to their biological functions. Hence, clarifying the underlying mechanisms of the structural transitions of proteins is indispensable for understanding their functions. Structural transitions rarely occur over long time scales and are regarded as biologically relevant rare events. These rare events include protein folding,1−4 molecular recognition upon ligand/drug binding,5,6 molecular associations, and large-amplitude structural transitions of proteins.7 Moreover, slow biochemical processes include structural transitions induced in simple processes, as well as more complicated activation processes, such as oxidative damage of DNA via radical cation formation,8 the hydrolysis reaction of Hsc70 ATPase,9 and the formation of a covalent glycosyl-enzyme species.10 To theoretically reproduce the chemical reactions and structural transitions of biomolecules, quantum mechanics (QM), quantum mechanics and molecular mechanics (QM/MM), and molecular dynamics (MD) simulations have been widely used. In particular, from a dynamic aspect, the resultant atomic trajectories of the MD simulations are used directly to analyze the intrinsic dynamics of a protein.11−13 From a statistical point of view, the atomic © XXXX American Chemical Society
trajectories are used to calculate the physical properties of a given protein as ensemble averages. If sufficiently long-time MD simulations can be performed, then the atomic trajectories can be utilized to analyze the biological functions and to calculate several physical properties on an equal footing. Herein, a time average over a sufficiently long-time MD simulation is ideally equal to a statistical average based on a conformational ensemble, because the long-time trajectory includes structural transitions relevant to the biological functions and can be regarded as a statistically reliable ensemble to estimate the physical quantities of a given protein. However, for the sake of clarity, we note that the sufficiently long-time MD simulation does not hold with the ergodic hypothesis, which has been shown to be false, especially for NVT simulations.14−16 Unfortunately, the time scales of biological functions are generally in the range of milliseconds to seconds, far from the accessible time scales of conventional MD simulations, which are on the order of nanoseconds to microseconds. Thus, the Received: January 22, 2016
A
DOI: 10.1021/acs.jctc.6b00082 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX
Article
Journal of Chemical Theory and Computation
coordinates (RCs). Then, multiple short-time MD simulations are performed simultaneously, starting from the rarely occurring states with renewed velocities. As reported in previous studies, restarting the MD simulations with renewed initial velocities efficiently promotes structural transitions,50 where the regeneration of the initial velocities might perturb the temperatures of biological systems and stochastically provide sufficient kinetic energy to overcome the free energy barriers upon structural transition. Herein, the conformational resampling of TBSA is summarized as follows: (I) selection of initial structures with the inverse histograms and (II) conformational resampling with renewed initial velocities. Cycles of (I) and (II) might eventually promote structural transitions, enhancing the conformational sampling of proteins. As motivation for modifying the inverse histogram, we next focused on applications to large biological systems. In our previous studies,48,49 we only considered small systems (for instance, chignolin in an implicit solvent environment). In such simple systems, the original inverse histogram effectively characterizes the rarely occurring states and efficiently distributes initial structures onto relatively broad conformational spaces without any modification of the inverse histogram. However, considering the challenging application to large biological systems, it might be difficult to distribute the initial structures in the same way, because the initial structures selected by the original inverse histogram are insufficient to cover the broad conformational subspace due to the large number of degrees of freedom. Therefore, we need to determine how to efficiently distribute the initial structures onto the broad conformational subspaces by modifying the inverse histogram with a weight function. In this study, we propose an efficient inverse histogram with a simple modification and apply the modified TBSA to several proteins. As typical examples of biologically relevant rare events, we consider the (I) hydrogen bonding rearrangement of Met-enkephalin (Menk), (II) large-amplitude domain motion of Glutamine-Binding Protein (GlnBP), and (III) protein folding of the B domain of Staphylococcus aureus Protein A. The paper is organized as follows. In Section II, a modification of the inverse histogram and a brief summary of TBSA are described. The numerical results are discussed in Section III. Issues related to the proposed method are discussed in Section IV. Finally, a summary is provided in Section V.
issue of time scale in the conventional MD simulations should be adequately considered when analyzing biological functions with MD simulations. As a strategy for addressing the issue of time scale, special purpose machines, such as the Anton series, enable extremely long-time MD simulations to be performed,13,17−19 and these machines typically extend the accessible time scales of conventional MD simulations to milliseconds. In addition to the specialized hardware, generalpurpose graphics processing units (GPUs) have recently been used with MD programs. Many important algorithms and models have been implemented on GPU-based MD packages. For instance, myPresto/psygene-G,20−22 a recently developed GPU-based MD program, has accelerated the calculations of electrostatic interactions using zero dipole summation methods.23−26 The current popular MD packages, including AMBER,27 GROMACS,28 NAMD,29 and OpenMM,30 have been implemented on GPUs. If one can perform extremely long-time MD simulations, there is no question that they will be the best choice, because they convey the time-dependent information that is lacking in other types of biased simulations. However, the long-time MD simulations may not capture the essential conformational transitions during every MD simulation, because the conformational transitions relevant to the biological functions are stochastic processes. Therefore, distinct long-time MD simulations starting from different initial structures should be performed to remove the dependence on the initial conditions. As alternative ways to reproduce biologically relevant rare events, several enhanced sampling methods have been proposed. To enhance the conformational sampling, external perturbations are artificially imposed. As well-established biased dynamics, targeted MD (TMD)31 and steered MD (SMD)32,33 provide sets of candidates of transition pathways connecting the reactant to the product by imposing external restraints or forces with respect to the product of a given protein. Metadynamics34 is one of the most popular enhanced conformational sampling methods based on external bias potentials. Recently, variants35−37 of metadynamics have also been proposed for reproducing more complicated biological reactions.8,38 The biological reactions of a given protein can be described by a set of collective variables (CVs). Once the CVs are specified, the positive history-dependent Gaussian potentials as functions of the CVs are added to the original potential energy to enhance the conformational sampling. Similarly to metadynamics, the conformational flooding method39 promotes conformational sampling based on the flooding potentials that are imposed on the conformational subspaces. As other strategies based on Boltzmann and non-Boltzmann ensembles, replica exchange MD (REMD),40 multicanonical MD (McMD)41 simulations, and their variants42−46 have been proposed. These methods are referred to as the generalized ensemble methods47 and have been widely used to find the global minima and metastable structures in protein folding problems. Based on these developments, it is desirable to establish enhanced sampling methods to reproduce biologically relevant rare events. To evade the conformational sampling problem arising from the limited time scale, we recently proposed a simple, yet powerful conformational sampling method called the TaBoo SeArch (TBSA) algorithm.48,49 In this method, to promote structural transitions, rarely occurring states of biomolecules are identified with a weight proportional to an inverse histogram, where the inverse histogram is derived from the original distribution mapped onto a set of reaction
II. MATERIALS AND METHODS In TBSA, rarely occurring states of a given protein are identified by the inverse histogram. When applying this method to a given protein, the definition of the inverse histogram is crucial for reproducing the biologically relevant rare events. In our previous study,49 the inverse histogram at the k-th cycle, Inv(k)(E), was defined from the energy density, D(k)(E), as follows (k) (k) (k) Inv(k)(E) = (Dmax − D(k)(E))θ(Emax − E)θ(E − Emin )
(1)
D(k) max
where is the peak of the energy density at the k-th cycle. The step function, θ(E), is employed to identify explicitly unvisited energy regions that are distributed outside of E(k) min < E < E(k) max at the k-th cycle. To further enhance the applicability of TBSA, a multidimensional form of the inverse histogram at the k-th cycle, Inv(k)(ξ ⃗), is introduced as an extension of eq 1, which is a function of a set of RCs, ξ⃗, as follows48 B
DOI: 10.1021/acs.jctc.6b00082 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX
Article
Journal of Chemical Theory and Computation N
large number of grids, then the total number of initial structures must be increased to cover the conformational subspace sufficiently. In contrast, if the conformational subspace is roughly divided into a small number of grids, then it is difficult to perform a sufficient conformational search. Therefore, the total number of grids for constructing the inverse histogram should be neither too large nor too small. The optimized number of grids depends on both the available computational resources and the desired accuracy in the conformational sampling. Hereafter, the TBSAs with and without the newly proposed inverse histogram defined by eq 3 are referred to as the modified and the original TBSA, respectively, for simplicity. The modified TBSA is summarized as follows: first, a set of RCs is specified to generate a histogram, and then the shorttime MD simulations are started from the initial structure as preliminary runs to generate an initial modified inverse histogram for the first cycle. The cutoff value of the inverse histogram in eq 3 is also set to identify rarely occurring states of a given protein. Next, cycles of the following steps are repeated until the conformational sampling is sufficiently converged. The flowchart of TBSA was provided in our previous report (for instance, see Figure 1 of ref 49). (I) All trajectories of the conformational resampling, ranging from the initial to (k-1)-th cycle, are mapped onto a subspace spanned by a set of RCs (ξ⃗), providing both the k-th distribution function and the modified inverse histogram, Inv′(k)(ξ ⃗), by eq 3. Based on Inv′(k)(ξ ⃗), rarely occurring states are identified. Snapshots are then selected from all of the trajectories with weights proportional to Inv′(k)(ξ ⃗) and are regarded as initial structures for the conformational resampling at the k-th cycle. (II) The conformational resampling is performed by restarting multiple short-time MD simulations from the selected initial structures. To promote the structural transitions efficiently, the initial velocity of each structure is renewed within the Maxwell−Boltzmann distribution when restarting the multiple short-time MD simulations.
(k) k) k) Inv(k)(ξ ⃗) = (Histmax − Hist (k)(ξ ⃗)) ∏ θi(ξi(,max − ξi)θi(ξi − ξi(,min ) i=1
(2) (k)
(k) where Hist (ξ ⃗) and Histmax are the multidimensional distribution function (histogram) mapped onto the given RCs and its maximum value, respectively, and N is the total → ⎯ number of RCs. The step function, θi(ξi ), defines the maximum ≤ ξi ≤ ξi(k) . and minimum range for the i-th RC, ξi(k) min max In this study, a multidimensional inverse histogram at the kth cycle, Inv′(k)(ξ ⃗), is proposed as a simple modification, as follows
Inv′(k)(ξ ⃗) = w(k)(ξ ⃗)Inv(k)(ξ ⃗)
(3)
⎧ 1 Inv(k)(ξ ⃗) ≥ cutoff ⎪ w(k)(ξ ⃗) = ⎨ ⎪ 0 Inv(k)(ξ ⃗) ≤ cutoff ⎩
(4)
where a weight function at the k-th cycle, w(k)(ξ ⃗), limits the rarely occurring states in terms of a cutof f defined in eq 3, i.e., the states with low populations are more intensively selected as the initial structures via the weight function. The weight function plays a similar role to the acceptance criterion in Monte Carlo simulations, because the values of the weight function are either 1 or 0 depending on whether the inverse histogram is larger or smaller than the given cutof f. The cutoff parameters in the weight function are used to select states with low populations intensively. The inverse histogram is modified by the cutoff parameters. Due to the introduction of these cutoff parameters, the probability of selecting the states with low populations rapidly increases. In all of the demonstrations, we tested the most extreme case (cutof f = the highest value of the inverse histogram), which means that the initial structures would only be selected from the states with the lowest population of the original distribution in each cycle. This extreme selection for the cutoff rapidly expands the distributions toward unvisited conformational subspaces. In actual applications, the ranges of the cutoff parameters should be carefully selected. One option is to set the range of the cutoff parameters as a ratio with respect to the peak of the inverse histogram. The details of the optimization of the simulation length of the short-time MD simulation were discussed previously.51 In our previous analysis, we found that most of the initial structures were selected from the first tens of picoseconds, i.e., within a 50 ps transitional time, through tests in several biological systems.52 Therefore, in this study, a 100 ps MD simulation was selected for all of the demonstrations, because this simulation time is sufficiently longer than the transitional times. To determine the length of the short-time MD simulation, an optimum simulation time is estimated by trial and error, because the optimum simulation time is dependent on each biological target. If possible, it is better to perform short-time MD simulations with several simulation times during the first few cycles while monitoring which snapshots are frequently selected as initial structures. Based on the trial cycles, the optimum simulation length can be determined. The total number of initial structures selected in each cycle is determined based on the total number of grids when constructing the inverse histogram. If a conformational subspace spanned by a set of RCs is finely divided into a
III. RESULTS In this section, we analyzed three different structural transitions that are characteristic of protein functions: (a) local hydrogen bonding rearrangements, (b) global large-amplitude domain motions due to ligand binding, and (c) protein folding processes from a fully extended to a native state. The degree of difficulty in these conformational searches generally follows this order: (a) < (b) < (c). First, we applied both the modified TBSA and the long-time canonical MD simulations to (a) to prove the efficiency of the conformational search of the former method. Next, we also applied this method to more realistic systems (b) and (c) and compared the conformational sampling efficiency with the efficiency obtained by the longtime canonical MD simulations. The numerical details and the obtained results for each problem are thoroughly described. (a) Local Hydrogen Bonding Rearrangements of Menk. As the first demonstration, the hydrogen bonding rearrangement of Menk in a vacuum was addressed. Menk is a 5-residue peptide consisting of 84 atoms, with several characteristic hydrogen bonds (HBs). The C- and N-termini are capped with acetyl- and N methyl-groups, respectively (Ace-Tyr-Gly-Gly-Phe-Met-Nme). As reported in previous studies,52−54 Menk has two metastable states (LM1 and C
DOI: 10.1021/acs.jctc.6b00082 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX
Article
Journal of Chemical Theory and Computation LM2) at 300 K in a vacuum, where LM2 is the global minimum. Menk transitions between LM1 and LM2 by rearranging the backbone hydrogen bonds (HBs). Herein, the free energy barrier for the rearrangement of the HBs is several times kBT (∼5 kBT, where kBT ∼ 0.59 kcal/mol at 300 K) according to the multiscale free energy calculation.53 This free energy difference is relatively high; therefore, it is usually difficult to reproduce the structural transition between LM1 and LM2 with long-time canonical MD simulations at 300 K. Therefore, Menk can be regarded as a suitable target to confirm the conformational sampling efficiency, although this system is relatively simple and small compared to real proteins. The Cα root-mean-square deviations (RMSDs) measured from LM1 and LM2 were employed as the set of RCs. Based on the set of Cα RMSDs, a two-dimensional inverse histogram was constructed using eq 3. The cutoff in eq 3 was set to the maximum value of Inv(k)(ξ ⃗) at the k-th cycle, i.e., the states with extremely low populations (the states with the minimum population in the original histogram) were identified as the initial states for the next conformational resampling. In this demonstration, 100 initial structures were employed for the conformational resampling in each cycle. The total computational time of the conformational resampling per cycle was 10 ns (100 initial structures × 100 ps). Starting from LM1 and LM2, the cycles of detection of the initial structures and their conformational resampling were continued until the 50th cycle, amounting to a total of 500 ns for each trial. The conformational resampling was performed by restarting the short-time MD simulations from the 100 initial structures. All of the MD simulations were performed with the SANDER module of AMBER1427 under the NVT ensemble (T = 300 K). The temperature was controlled using a Berendsen thermostat55 with a relaxation time of 1 ps. The AMBER ff96 force field56 was used as the energy function of Menk in a vacuum. To check the convergence of the conformational sampling, we focused on the conformational space that was searched by the modified TBSA. To quantify the conformational subspaces, grids were defined as finite areas in the subspace. By tracing the accumulated number of grids in the subspace, ng(cycle), it is possible to confirm how the conformational resampling was quickly achieved by the modified TBSA. A set of Cα RMSDs measured from LM1 and LM2 was employed as axes to construct grids in the subspace with a finite area (0.1 Å × 0.1 Å). Based on the definitions of the grids, ng(cycle) was monitored as the cycles progressed. Figure 1 shows the convergence of the conformational areas for Menk in the MD simulations starting from the metastable states LM1 (Figure 1(a), red) and LM2 (Figure 1(b), red). In both cases, ng(cycle) sufficiently converged to more than 1,000 areas within 50 cycles, although the convergence speeds were slightly different for LM1 and LM2. The convergence speed of LM1 was faster than that of LM2, because it took a longer time to escape from the global minimum (LM2) than the local minimum (LM1) due to the difference in the free energy barriers measured from both minima. Based on the red curves in Figure 1(a-b), 40 cycles are sufficient to search the overall conformational subspaces of Menk in a vacuum. In contrast, the final conformational areas of the original TBSA at the 50th cycle were much smaller than the modified areas, with more than 400 for LM1 (Figure 1(a), blue) and 350 for LM2 (Figure 1(b), blue), indicating the slow convergence of the original TBSA. These results show the power of the modified TBSA for
Figure 1. Convergence of the conformational areas of the TBSAs for Menk, starting from LM1 (a) and LM2 (b), where the numbers of grids, ng(cycle), are shown during 50 cycles. The blue and red curves correspond to the original and modified TBSAs, respectively. The dashed lines represent the convergence of the number of grids after the 40th cycle.
the conformational search. Therefore, we need an alternative approach to efficiently distribute the initial seeds onto the highdimensional conformational areas using the modified inverse histogram with lower computational costs, which is why the proposal of a modified inverse histogram is meaningful. The red points in Figure 2 show projections of the trajectories of the conformational resampling, where the subspace was spanned by a set of Cα RMSDs measured from LM1 and LM2. The left (a-i) and right (b-j) panels of Figure 2 correspond to the projections of the conformational resampling starting from LM1 and LM2, respectively. As observed in Figure 2(c-d), the conformational transitions between LM1 and LM2 were reproduced after the 10th cycle but were not yet reproduced at the first cycle in Figure 2(a-b). According to Figure 2(e-f) and Figure 2(g-h), the projections at the 25th and 50th cycles were similar, indicating the convergence of the conformational resampling. The blue points represent the initial structures of the conformational resampling and were selected with the weighted inverse histogram defined by eq 3. The initial structures were mainly distributed at the edges or transitional regions of the subspace. In each cycle, the conformational resampling was initiated from these marginal regions with renewed initial velocities, which efficiently promoted the conformational transitions and is why the modified TBSA exhibits faster convergence than the original TBSA. Thus, the detection of the marginal region plays a key role in the efficient conformational sampling. To compare the conformational sampling efficiency with canonical MD simulations, a set of long-time (10 μs) canonical MD simulations was started from LM1 and LM2 under the NVT (T = 300 K) ensemble. Figure 2(i-j) shows the projections of the 10-μs trajectories. As observed in Figure 2(i-j), the long-time canonical MD simulations successfully reproduced the structural transitions between LM1 and LM2. However, the high-energy regions far from LM1 and LM2 were not sufficiently sampled. In contrast, the modified TBSA sampled broad conformational areas with lower computational costs (500 ns). These results also support the high conformational sampling efficiency of this approach. D
DOI: 10.1021/acs.jctc.6b00082 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX
Article
Journal of Chemical Theory and Computation
Figure 3. Projections of the trajectories generated by the original TBSA mapped onto the subspace spanned by a set of Cα RMSDs measured from LM1 and LM2. The left (a-g) and right (b-h) panels correspond to the projections of trajectories starting from LM1 and LM2 at the 1st, 10th, 25th, and 50th cycles. The blue points represent the initial structures selected by the normal inverse histogram in each cycle.
Figure 2. Projections of trajectories generated by the modified TBSA mapped onto the subspace spanned by a set of Cα RMSDs measured from the metastable states, LM1 and LM2. The left (a-g) and right (bh) panels correspond to the projections of the trajectories starting from LM1 and LM2 at the 1st, 10th, 25th, and 50th cycles. The bottom panels (i-j) correspond to the projections of trajectories generated by canonical 10-μs MD simulations at 300 K starting from LM1 and LM2. The backbone representations of LM1 and LM2 are shown in the bottom panels. LM1 has a γ-turn structure that is stabilized by two backbone hydrogen bonds (HBs) between Gly2 and Met5, in which one HB is formed between the acceptor CO of Gly2 and the donor NH of Phe4. LM2 has a β-turn structure with two HBs between Gly2 and Phe4.
movements from the open to closed forms are induced upon ligand (glutamine) binding to a cleft between the two domains in the open form. The structures of the open-apo (PDB ID: 1GGG)57 and the closed-holo (PDB ID: 1WDN)58 forms have been determined by X-ray crystallography. Note here that the structural transition of the apo-type (without the ligand) was addressed for simplicity. For the initial structures, the open-apo and closed-apo forms were constructed from the X-ray structures. The ligand was removed from the closed-holo form when modeling the apo-closed form, because the X-ray structure of the closed-apo form is unavailable. An explicit water environment was modeled with the TIP3P model,59 and one chloride ion was added to neutralize the systems, finally amounting to 62,666 atoms for the solvated model systems. After short energy minimizations, 100 ps MD simulations were performed for both the open-apo and the closed-apo systems under NPT (P = 1 bar, T = 300 K) to adjust their volumes. All of the MD simulations were performed with the PMEMD module of AMBER14.27 Subsequently, 1 ns NVT (T = 300 K) simulations were performed for both the systems to relax them. The pressure and the temperature were controlled by a Berendsen barostat and thermostat56 for the NPT and NVT simulations, respectively. The final snapshots of the relaxation processes were used as the set of initial structures. The conformational resampling from these initial structures was performed by the same procedures used for Menk and were continued until the 50th cycle starting from the apo-open and apo-closed forms. The total simulation time was 500 ns (100 initial structures × 100 ps × 50 cycles) for each trial. The Cα RMSDs measured from the open-apo and closed-apo forms were employed as the set of RCs. Figure 4 shows the projections of the trajectories generated by the modified TBSA mapped onto the subspace spanned by a set of Cα RMSDs. The left and right panels of Figure 4 correspond to the projections of the trajectories generated by
The conformational sampling efficiency was also compared with the original TBSA. Figure 3 shows the projections of the trajectories generated by the original TBSA. As observed in Figure 3(c-d), the structural transitions between LM1 and LM2 were reproduced after the 10th cycle, even when the original TBSA was employed. However, the conformational areas were not expanded after the 10th cycle, making it difficult to search the conformational areas far from LM1 and LM2 with the original TBSA. In the original TBSA, the speed of convergence for the conformational resampling was slow, as observed in the blue curves in Figure 1(a-b). Moreover, comparing Figure 3 with Figure 2, the initial structures detected by the original TBSA are sometimes similar and located on one side. In contrast, the structures detected by the modified TBSA are located at the edges of the distributions. Therefore, the modified TBSA efficiently selected the initial structures, occasionally promoted the structural transition from LM1 to LM2, and gradually broadened the distribution in the conformational areas after the global minima were completely searched. (b) Global Large-Amplitude Domain Motions of GlnBP. As an application to a realistic system, the open-closed structural transition of GlnBP in explicit water was addressed with the all-atom model. GlnBP is a periplasmic protein with 226 residues (3,535 atoms). In GlnBP, large hinge-binding E
DOI: 10.1021/acs.jctc.6b00082 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX
Article
Journal of Chemical Theory and Computation
used as the starting structure. To save computational costs, the Generalized Born with Surface Area (GBSA) model was employed to include the solvent effects implicitly. The salt concentration was set to 0.2 M, and the surface tension was 0.005 kcal/mol/Å2 (IGB = 5 in AMBER 14).27 All of the MD simulations were performed with the SANDER module of AMBER 14. The AMBER ff03 force field61 was used as the parameter for the energy function. The temperature of the system was controlled at 300 K using a Berendsen thermostat.55 A set of partial Cα RMSDs was selected as RCs based on the secondary structures of the native state, in which the segments were defined by the three-bundle helices of Protein A, i.e., the first helix (residues 10−24), the second helix (residues 25−37), and the third helix (residues 38−55), respectively. Based on these secondary structures, we adopted the following two segments, A (the first and second helices: residues 10−37) and B (the second and third helices: residues 25−55), to define a set of partial Cα RMSDs measured from the native state. The overlapping region (the second helix) ensures that the overall folding is described correctly when both segments fold into the native state. In each cycle, 100 initial structures were selected with the weighted inverse histogram. For the conformational resampling, a 10 ns computation was required (100 initial structures × 100 ps) per cycle. Figure 5 shows the projections of the trajectories generated by the modified TBSA. Starting from a fully extended structure, the initial structures gradually approached the native structure according to the cycles (see the distributions of blue points in each cycle). As observed in Figure 5(d), the modified TBSA successfully sampled the native structure within 2.0 Å of the Cα RMSD at the 30th cycle. After 50 cycles, the closest snapshot to
Figure 4. Projections of the trajectories generated by the modified TBSA mapped onto the subspace spanned by a set of Cα RMSDs measured from the open-apo (OPA) and closed-apo forms (CLA). The left (a-g) and right (b-h) panels correspond to the projections of trajectories starting from OPA and CLA at the 1st, 10th, 25th, and 50th cycles, respectively. The bottom panels (i-j) correspond to the projections of trajectories generated by the canonical 500 ns MD simulations at 300 K starting from OPA and CLA. In the bottom panels, the X-ray crystal structures of OPA and CLA are shown.
the conformational resampling from the open-apo and closedapo forms, respectively. As observed in Figure 4(e-f), the modified TBSA successfully reproduced each conformational transition within 25 cycles, whereas the 10th cycle was not always sufficient to reproduce the transition. The projections broadened after the 25 cycles, indicating that the modified TBSA gradually covered the high-energy areas that were far from the starting structures. For comparison, the trajectories obtained by canonical 500 ns MD simulations (T = 300 K) starting from both the open-apo and closed-apo forms (with the same simulation time as the 50 cycles of the modified TBSA) were mapped onto the subspace, as depicted in Figure 4(i-j). Each canonical MD simulation failed to reproduce the structural transition within 500 ns. The conformational subspaces searched by the canonical MD simulations were much narrower than those searched by the modified TBSA, again indicating the remarkable efficiency of this approach. As shown in this example, the modified TBSA may be applicable to normal size proteins over a few hundred residues, and one might reproduce these types of large-amplitude domain motions upon structural transitions with this approach. (c) Folding Processes of Protein A. As an additional application, the folding processes of protein A (the B domain of Staphylococcus aureus Protein A) (PDB ID: 1BDD)60 were analyzed by the modified TBSA. Some of the terminal residues were omitted, and only residues 10−55 of the original sequence were considered in the model system (46 residues, 725 atoms). A fully extended structure was modeled with the tLeap module of AMBER 14, according to the amino-acid sequence, and was
Figure 5. Projections of the trajectories generated by the modified TBSA mapped onto the subspace spanned by a set of partial Cα RMSDs measured from each segment defined by the secondary structures of the native state. The panels (a-f) correspond to the projections of trajectories starting from a fully extended structure at the 1st, 10th, 20th, 30th, 40th, and 50th cycles. The initial state is indicated by an arrow in part (a). The native state is also indicated by an arrow in part (f). A snapshot sampled by TBSA (red) is superimposed on the X-ray structure (blue), and the Cα RMSD between these structures was 1.3 Å. F
DOI: 10.1021/acs.jctc.6b00082 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX
Article
Journal of Chemical Theory and Computation the native state was superimposed with the experimental structure (Figure 5(f)), and its minimum Cα RMSD was approximately 1.3 Å, showing good correspondence with the experimental data. Generally speaking, these types of protein folding processes are difficult to reproduce with canonical MD simulations. In a previous study,18 microsecond- to millisecond-order canonical MD simulation was needed to reproduce the protein folding. As a reference, microsecond-order computational time was required to reproduce the folding processes of Protein B in an explicit solvent environment.18 Protein B is comparable to Protein A, because Protein B is also a protein with three-bundle helices and is similar to Protein A in terms of its system size. It is difficult to directly compare the computational costs of this demonstration with those of Anton, because we used information about the native structure when constructing the set of RCs. In this sense, our simulations are not a blind prediction of the native structure, although the modified TBSA successfully reproduced the folding pathways of Protein A with a nanosecond-order simulation time. As a summary of these demonstrations, we would like to focus on the computational costs to reproduce the biologically rare events. In all of the systems, the modified TBSA reproduced the individual structural transitions with relatively the same computational costs (nanosecond-order simulation times). Each type of protein dynamics has different time scales to induce individual structural transitions. The reason why our approach reproduced structural transitions with comparable time scales might be attributed to the conformational resampling from the rarely occurring states of proteins. The conformational resampling from states with extremely low populations promotes structural transitions by rapidly expanding the edges of the conformational spaces. Therefore, the modified TBSA can reproduce biologically relevant rare events with almost the same computational costs, and each system might have a different free energy between the reactants and products.
An advantage of TBSA is that its implementation is easier than metadynamics, because the program codes do not require modification. TBSA can be implemented with simple scripts that are easily handled together with several MD programs. However, metadynamics requires several modifications of the program codes to implement the external biases, although it has already been implemented in several MD packages. As a common disadvantage of both TBSA and metadynamics, their conformational sampling efficiency strongly depends on the selections of RCs. Thus, the best way to select the RCs remains an issue to be addressed. Ordinarily, RCs are selected by trial and error. The set of appropriate RCs is carefully selected based on each biological target. For this issue, we have recently proposed a method using a clustering algorithm to automatically detect the relevant RCs.62 To further enhance the efficiency of the modified TBSA, it is desirable to combine with the automatic-detection method. This extension will be presented in our future work.62 (b) Free Energy Calculations Based on the Modified TBSA. For the further development of TBSA, reconstruction of the free energy is one of the most important issues. In the current status, umbrella samplings63,64 are performed to reconstruct the free energies in a postprocessing manner, as follows. (1) Snapshots are first selected from the trajectories of TBSA and are used as references of the umbrella samplings along the RCs. (2) The umbrella samplings are performed based on the references. (3) Each snapshot of the umbrella samplings is analyzed by the weighted histogram analysis method (WHAM).65−70 As a demonstration, the free energy calculation of Menk was considered. In the previous study,53 the free energy barrier for hydrogen bonding rearrangement between the global minimum (LM2) and the transitional states (TS) was on the order of approximately five kBT units. To evaluate the activation barrier, we performed free energy calculations using the trajectories obtained by the modified TBSA, in which umbrella sampling was used to calculate the free energy landscape for the conformational transition between LM1 and LM2. First, the Cα RMSD measured from LM1 was used as the RC. Then, multiple snapshots were selected from the trajectories generated by the modified TBSA as references of the umbrella samplings. The multiple references were uniformly distributed along the RC. As the umbrella potentials, positional restraints were imposed around the references, as follows: ki( r ⃗ − rref ⃗ i), where ki, r,⃗ and rref ⃗ i are the force constant of the positional restraint for the ith umbrella potential, the coordinates of the system, and those of the ith reference, respectively. In this free energy calculation, 100 snapshots were selected as references and {ki} (i = 1, 2, ..., 100) were set to 0.1 kcal/mol/Å2. After the umbrella samplings, the snapshots were summed up with WHAM to estimate the free energy landscape. Figure 6 shows the one-dimensional free energy landscape of Menk in a vacuum along the RC. Based on Figure 6, the activation barrier between LM2 and TS was on the order of approximately four kBT units, showing relatively good correspondence with the previous study.53 This evidence proves that the free energy calculations based on these umbrella samplings were valid and also supports the reliability of the conformational sampling by the modified TBSA. In contrast to the free energy calculation via the postprocessing treatments, the weighted ensemble (WE)71−74 directly estimates the free energies, because WE conserves the
IV. DISCUSSION In the following paragraphs, we discuss several issues related to our present method and applications in more detail. First, we focus on the similarity between the modified TBSA and other conformational sampling methods. As a future perspective, free energy calculations based on the modified TBSA are then discussed. Finally, we address the issue of the selection of RCs to confirm how the RCs specified in this study affect the conformational sampling efficiency. (a) Similarity between the Modified TBSA and Metadynamics. As mentioned in the Introduction, TBSA shares some similarities with metadynamics.34 Both methods use a self-avoiding scheme to discourage a given system from revisiting conformational subspaces. To avoid revisiting the same places, TBSA employs an inverse histogram constructed from profiles of the conformational resampling, whereas metadynamics employs history-dependent bias potentials. For the convergence of the conformational sampling, the speed is proportional to the total number of initial structures selected in each cycle because the conformational resampling from multiple points accelerates the self-avoiding sampling. Herein, the idea of conformational resampling from multiple points resembles that of multiple-walker metadynamics35 to improve the slow convergence of the original (single-walker) metadynamics.34 G
DOI: 10.1021/acs.jctc.6b00082 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX
Article
Journal of Chemical Theory and Computation
Figure 6. Free energy profile mapped onto the Cα RMSD measured from LM1. The local energy minimum state (LM1), the global energy minimum state (LM2), and the transitional states (TS) are indicated with arrows.
total weights of distinct trajectories in the conformational resampling, providing a statistical ensemble. Therefore, the direct sum of distinct trajectories of the WE can reconstruct the free energies without any postprocessing treatments. As an alternative means to reconstruct free energies, we are developing an improved methodology by modifying the current version of TBSA. For instance, a Markov state model (MSM)75−78 might be a good approach to reconstruct free energies in combination with TBSA. Based on the MSM, the transitional rates among microstates can be estimated using the trajectories of TBSA, thus providing free energies by solving an eigenvalue problem for the transitional matrix defined by the transitional rates. However, the accuracy of MSM depends on the selection of the RCs when defining microstates with the specified RCs. Therefore, a set of appropriate RCs that can correctly describe the biological reaction of interest must be specified. (c) Selections of Appropriate RCs and Limitation of TBSA. To address the dependency of the selection of RCs, it is important to confirm whether the selected RCs (Cα RMSDs measured from LM1 and LM2) can sufficiently capture the details of the rearrangement of the hydrogen bonds upon the transition. We focused on the demonstration of Menk and confirmed the structural transitions reproduced by the modified TBSA along the set of Cα RMSDs. To directly confirm the structural transitions reproduced by the modified TBSA, we addressed the rearrangement of the hydrogen bonds of the backbone. As a previous study reported,53 the two hydrogen bond distances, HB1 (the distance between the nitrogen atom of Gly2 and the oxygen atom of Met5) and HB2 (the distance between the nitrogen atom of Gly2 and the oxygen atom of Phe4), are switched upon rearrangement. To address the transitions, we joined the distinct trajectories of each cycle so that they connected the metastable states, LM1 and LM2, where the joined trajectories are referred to as reactive trajectories and correspond to the quasi-transitional pathways. To address the details, we monitored the profiles of the hydrogen bond distances. Figure 7 shows the profiles of the hydrogen bond distances of a reactive trajectory. As shown in Figure 7(a), HB1 was broken and HB2 was formed upon the structural transition from LM1 to LM2. Figure 7(b) shows the profile of the Cα RMSDs measured from LM1 and LM2, in which the values of the Cα RMSDs drastically changed upon transition. According to Figure 7(a-b), the switch of the hydrogen bond distances and the change of the Cα RMSD were well correlated. The specified RCs in this study (Cα RMSDs measured from LM1 and LM2) sufficiently captured the fundamental processes, such as hydrogen bond disruption, formation, and switching of Menk. For the sake of safety, we
Figure 7. Profiles of one of the reactive trajectories of Menk. (a) The time series of the hydrogen bond distances of the backbone. HB1 (red): distance between the nitrogen atom of Gly2 and the oxygen atom of Met5. HB2 (green): distance between the nitrogen atom of Gly2 and the oxygen atom of Phe4. (b) The time series of the Cα RMSDs measured from LM2. The dashed line indicates the structural transition from LM1 to LM2.
should check the validities of the reactive trajectories along several RCs. In addition to the problem of the RCs, there exists a limitation in actual application. TBSA is limited to the case of diffusive and slow biologically relevant events, where the potential energy surfaces are rather flat with ranges of a few kBT because TBSA does not employ any external biases. Therefore, TBSA is not suitable to investigate rare events that are high in energy, such as chemical bond breaking processes. Therefore, TBSA might be applicable for reproducing structural transitions that occur in thermal fluctuations. For these problems, hybrid methods of TBSA and other conformational sampling methods are important. Finally, we would like to mention a particular concern for applications to large biological systems. To analyze more complex biological reactions, a problem might arise in the modified TBSA when the RC dimensions increase. According to the increase in the dimensions, the number of unvisited conformational subspaces will grow quickly. As a result, the total number of initial structures will increase to sufficiently cover the conformational subspaces in each cycle, requiring additional computational costs. Therefore, the modification alone with a cutoff might not be efficient. To prevent this problem, a set of suitable RCs should be carefully selected to describe the biological reactions in a low-dimensional conformational subspace.
V. CONCLUSION In this study, we proposed an efficient inverse histogram of the original TaBoo SeArch (TBSA) algorithm to accelerate the convergence of the conformational sampling. Based on the newly proposed inverse histogram, rarely occurring states of a given protein are more intensively identified as sparse distributions. The efficient detection of sparse distributions enables us to further enhance the conformational sampling, H
DOI: 10.1021/acs.jctc.6b00082 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX
Article
Journal of Chemical Theory and Computation
(18) Lindorff-Larsen, K.; Piana, S.; Dror, R. O.; Shaw, D. E. Science 2011, 334, 517−520. (19) Piana, S.; Lindorff-Larsen, K.; Shaw, D. E. Proc. Natl. Acad. Sci. U. S. A. 2013, 110, 5915−5920. (20) Arakawa, T.; Kamiya, N.; Nakamura, H.; Fukuda, I. PLoS One 2013, 8, e76606. (21) Kamiya, N.; Fukuda, I.; Nakamura, H. Chem. Phys. Lett. 2013, 568-569, 26−32. (22) Mashimo, T.; Fukunishi, Y.; Kamiya, N.; Takano, Y.; Fukuda, I.; Nakamura, H. J. Chem. Theory Comput. 2013, 9, 5599−5609. (23) Fukuda, I.; Yonezawa, Y.; Nakamura, H. J. Chem. Phys. 2011, 134, 164107. (24) Fukuda, I.; Kamiya, N.; Yonezawa, Y.; Nakamura, H. J. Chem. Phys. 2012, 137, 054314. (25) Fukuda, I. J. Chem. Phys. 2013, 139, 174107. (26) Fukuda, I.; Kamiya, N.; Nakamura, H. J. Chem. Phys. 2014, 140, 194307. (27) Case, D. A.; Babin, V.; Berryman, J. T.; Betz, R. M.; Cai, Q.; Cerutti, D. S.; Cheatham, T. E., III; Darden, T. A.; Duke, R. E.; Gohlke, H.; Goetz, A. W.; Gusarov, S.; Homeyer, N.; Janowski, P.; Kaus, J.; Kolossváry, I.; Kovalenko, A.; Lee, T. S.; LeGrand, S.; Luchko, T.; Luo, R.; Madej, B.; Merz, K. M.; Paesani, F.; Roe, D. R.; Roitberg, A.; Sagui, C.; Salomon-Ferrer, R.; Seabra, G.; Simmerling, C. L.; Smith, W.; Swails, J.; Walker, R. C.; Wang, J.; Wolf, R. M.; Wu, X.; Kollman, P. A. AMBER14; 2014. (28) Pronk, S.; Pall, S.; Schulz, R.; Larsson, P.; Bjelkmar, P.; Apostolov, R.; Shirts, M. R.; Smith, J. C.; Kasson, P. M.; van der Spoel, D.; Hess, B.; Lindahl, E. Bioinformatics 2013, 29, 845−854. (29) Phillips, J. C.; Braun, R.; Wang, W.; Gumbart, J.; Tajkhorshid, E.; Villa, E.; Chipot, C.; Skeel, R. D.; Kale, L.; Schulten, K. J. Comput. Chem. 2005, 26, 1781−1802. (30) Eastman, P.; Friedrichs, M. S.; Chodera, J. D.; Radmer, R. J.; Bruns, C. M.; Ku, J. P.; Beauchamp, K. A.; Lane, T. J.; Wang, L. P.; Shukla, D.; Tye, T.; Houston, M.; Stich, T.; Klein, C.; Shirts, M. R.; Pande, V. S. J. Chem. Theory Comput. 2013, 9, 461−469. (31) Schlitter, J.; Engels, M.; Kruger, P.; Jacoby, E.; Wollmer, A. Mol. Simul. 1993, 10, 291−308. (32) Isralewitz, B.; Baudry, J.; Gullingsrud, J.; Kosztin, D.; Schulten, K. J. Mol. Graphics Modell. 2001, 19, 13−25. (33) Isralewitz, B.; Gao, M.; Schulten, K. Curr. Opin. Struct. Biol. 2001, 11, 224−230. (34) Laio, A.; Parrinello, M. Proc. Natl. Acad. Sci. U. S. A. 2002, 99, 12562−12566. (35) Raiteri, P.; Laio, A.; Gervasio, F. L.; Micheletti, C.; Parrinello, M. J. Phys. Chem. B 2006, 110, 3533−3539. (36) Piana, S.; Laio, A. J. Phys. Chem. B 2007, 111, 4553−4559. (37) Laio, A.; Gervasio, F. L. Rep. Prog. Phys. 2008, 71, 126601. (38) Gervasio, F. L.; Laio, A.; Parrinello, M. J. Am. Chem. Soc. 2005, 127, 2600−2607. (39) Grubmuller, H. Phys. Rev. E: Stat. Phys., Plasmas, Fluids, Relat. Interdiscip. Top. 1995, 52, 2893−2906. (40) Sugita, Y.; Okamoto, Y. Chem. Phys. Lett. 1999, 314, 141−151. (41) Nakajima, N.; Nakamura, H.; Kidera, A. J. Phys. Chem. B 1997, 101, 817−824. (42) Sugita, Y.; Kitao, A.; Okamoto, Y. J. Chem. Phys. 2000, 113, 6042−6051. (43) Fukunishi, H.; Watanabe, O.; Takada, S. J. Chem. Phys. 2002, 116, 9058−9067. (44) Liu, P.; Kim, B.; Friesner, R. A.; Berne, B. J. Proc. Natl. Acad. Sci. U. S. A. 2005, 102, 13749−13754. (45) Okumura, H. J. Chem. Phys. 2008, 129, 124116. (46) Higo, J.; Umezawa, K.; Nakamura, H. J. Chem. Phys. 2013, 138, 184106. (47) Okamoto, Y. J. Mol. Graphics Modell. 2004, 22, 425−439. (48) Harada, R.; Takano, Y.; Shigeta, Y. Chem. Phys. Lett. 2015, 630, 68−75. (49) Harada, R.; Takano, Y.; Shigeta, Y. J. Comput. Chem. 2015, 36, 763−772.
leading to the easier reproduction of biologically relevant rare events with lower computational costs. To prove the effectiveness of the modified TBSA, we applied this algorithm to three characteristic structural transitions, including the (a) rearrangements of hydrogen bonds in Menk, (b) domain motions of GlnBP, and (c) protein folding of Protein A. All of the demonstrations demonstrated the higher efficiency of the modified TBSA. Therefore, the modified TBSA is a remarkably practical tool to reproduce biologically important rare events.
■
AUTHOR INFORMATION
Corresponding Authors
*E-mail:
[email protected] (R.H.). *E-mail:
[email protected] (Y.T.). *E-mail:
[email protected] (Y.S.). Notes
The authors declare no competing financial interest.
■
ACKNOWLEDGMENTS This work was mainly supported by a Japan Society for the Promotion Science (JSPS) Research Fellowship for Young Scientists (No. 15J03797) and a JSPS KAKENHI Grant-in-Aid for Young Scientists (B), No. 26840056 and was partially supported by Grants-in-Aid for Scientific Research of the Innovative Areas “Photosynergetics” and “3D Active-Site Science” (Nos. 26107004 and 26105012) from JSPS and JST, CREST. The computations were performed with the Integrated Cluster of Clusters (RICC), the HOKUSAI-greatwave system at RIKEN, and HA-PACS/COMA at the Computer Center for Sciences (CCS), University of Tsukuba.
■
REFERENCES
(1) Bryngelson, J. D.; Onuchic, J. N.; Socci, N. D.; Wolynes, P. G. Proteins: Struct., Funct., Genet. 1995, 21, 167−195. (2) Becker, O. M. Proteins: Struct., Funct., Genet. 1997, 27, 213−226. (3) Simmerling, C.; Strockbine, B.; Roitberg, A. E. J. Am. Chem. Soc. 2002, 124, 11258−11259. (4) Bhattacharya, D.; Cheng, J. L. Sci. Rep. 2015, 5, 16332. (5) Carlson, H. A.; McCammon, J. A. Mol. Pharmacol. 2000, 57, 213−218. (6) Wang, W.; Donini, O.; Reyes, C. M.; Kollman, P. A. Annu. Rev. Biophys. Biomol. Struct. 2001, 30, 211−243. (7) Abrams, C. F.; Vanden-Eijnden, E. Proc. Natl. Acad. Sci. U. S. A. 2010, 107, 4961−4966. (8) Gervasio, F. L.; Laio, A.; Iannuzzi, M.; Parrinello, M. Chem. - Eur. J. 2004, 10, 4846−4852. (9) Boero, M.; Ikeda, T.; Ito, E.; Terakura, K. J. Am. Chem. Soc. 2006, 128, 16798−16807. (10) Rojas-Cervellera, V.; Ardevol, A.; Boero, M.; Planas, A.; Rovira, C. Chem. - Eur. J. 2013, 19, 14018−14023. (11) Karplus, M.; Petsko, G. A. Nature 1990, 347, 631−639. (12) Karplus, M.; McCammon, J. A. Nat. Struct. Biol. 2002, 9, 646− 652. (13) Klepeis, J. L.; Lindorff-Larsen, K.; Dror, R. O.; Shaw, D. E. Curr. Opin. Struct. Biol. 2009, 19, 120−127. (14) Calvo, F.; Galindez, J.; Gadea, F. X. J. Phys. Chem. A 2002, 106, 4145−4152. (15) Legoll, F.; Luskin, M.; Moeckel, R. Arch. Ration. Mech. Anal. 2007, 184, 449−463. (16) Legoll, F.; Luskin, M.; Moeckel, R. Nonlinearity 2009, 22, 1673− 1694. (17) Shaw, D. E.; Maragakis, P.; Lindorff-Larsen, K.; Piana, S.; Dror, R. O.; Eastwood, M. P.; Bank, J. A.; Jumper, J. M.; Salmon, J. K.; Shan, Y. B.; Wriggers, W. Science 2010, 330, 341−346. I
DOI: 10.1021/acs.jctc.6b00082 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX
Article
Journal of Chemical Theory and Computation (50) Caves, L. S. D.; Evanseck, J. D.; Karplus, M. Protein Sci. 1998, 7, 649−666. (51) Harada, R.; Nakamura, T.; Shigeta, Y. J. Comput. Chem. 2016, 37, 724−738. (52) Itoh, S. G.; Okamoto, Y. Chem. Phys. Lett. 2004, 400, 308−313. (53) Harada, R.; Kitao, A. Chem. Phys. Lett. 2011, 503, 145−152. (54) Yamamori, Y.; Kitao, A. J. Chem. Phys. 2013, 139, 145105. (55) Berendsen, H. J. C.; Postma, J. P. M.; Vangunsteren, W. F.; Dinola, A.; Haak, J. R. J. Chem. Phys. 1984, 81, 3684−3690. (56) Kollman, P.; Dixon, R.; Cornell, W.; Fox, T.; Chipot, C.; Pohorille, A. Elsevier 1997, 83−96. (57) Hsiao, C. D.; Sun, Y. J.; Rose, J.; Wang, B. C. J. Mol. Biol. 1996, 262, 225−242. (58) Sun, Y. J.; Rose, J.; Wang, B. C.; Hsiao, C. D. J. Mol. Biol. 1998, 278, 219−229. (59) Jorgensen, W. L.; Chandrasekhar, J.; Madura, J. D.; Impey, R. W.; Klein, M. L. J. Chem. Phys. 1983, 79, 926−935. (60) Gouda, H.; Torigoe, H.; Saito, A.; Sato, M.; Arata, Y.; Shimada, I. Biochemistry 1992, 31, 9665−9672. (61) Duan, Y.; Wu, C.; Chowdhury, S.; Lee, M. C.; Xiong, G. M.; Zhang, W.; Yang, R.; Cieplak, P.; Luo, R.; Lee, T.; Caldwell, J.; Wang, J. M.; Kollman, P. J. Comput. Chem. 2003, 24, 1999−2012. (62) Harada, R.; Nakamura, T.; Shigeta, Y. Chem. Phys. Lett. 2015, 639, 269−274. (63) Torrie, G. M.; Valleau, J. P. J. Comput. Phys. 1977, 23, 187−199. (64) Torrie, G. M.; Valleau, J. P. J. Chem. Phys. 1977, 66, 1402−1408. (65) Ferrenberg, A. M.; Swendsen, R. H. Phys. Rev. Lett. 1989, 63, 1195−1198. (66) Kumar, S.; Bouzida, D.; Swendsen, R. H.; Kollman, P. A.; Rosenberg, J. M. J. Comput. Chem. 1992, 13, 1011−1021. (67) Souaille, M.; Roux, B. Comput. Phys. Commun. 2001, 135, 40− 57. (68) Wang, J. Y.; Deng, Y. Q.; Roux, B. Biophys. J. 2006, 91, 2798− 2814. (69) Shirts, M. R.; Chodera, J. D. J. Chem. Phys. 2008, 129, 124105. (70) Hub, J. S.; de Groot, B. L.; van der Spoel, D. J. Chem. Theory Comput. 2010, 6, 3713−3720. (71) Huber, G. A.; Kim, S. Biophys. J. 1996, 70, 97−110. (72) Bhatt, D.; Zhang, B. W.; Zuckerman, D. M. J. Chem. Phys. 2010, 133, 014110. (73) Zhang, B. W.; Jasnow, D.; Zuckerman, D. M. J. Chem. Phys. 2010, 132, 054107. (74) Zwier, M. C.; Adelman, J. L.; Kaus, J. W.; Pratt, A. J.; Wong, K. F.; Rego, N. B.; Suarez, E.; Lettieri, S.; Wang, D. W.; Grabe, M.; Zuckerman, D. M.; Chong, L. T. J. Chem. Theory Comput. 2015, 11, 800−809. (75) Bowman, G. R.; Huang, X. H.; Pande, V. S. Methods 2009, 49, 197−201. (76) Pande, V. S.; Beauchamp, K.; Bowman, G. R. Methods 2010, 52, 99−105. (77) Lane, T. J.; Bowman, G. R.; Beauchamp, K.; Voelz, V. A.; Pande, V. S. J. Am. Chem. Soc. 2011, 133, 18413−18419. (78) Weber, J. K.; Pande, V. S. J. Chem. Theory Comput. 2011, 7, 3405−3411.
J
DOI: 10.1021/acs.jctc.6b00082 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX