Temperature-Shuffled Structural Dissimilarity Sampling Based on a

Jun 8, 2018 - Structural dissimilarity sampling (SDS) has been proposed as an enhanced conformational sampling method for finding neighboring ...
1 downloads 0 Views 2MB Size
Article pubs.acs.org/jcim

Cite This: J. Chem. Inf. Model. XXXX, XXX, XXX−XXX

Temperature-Shuffled Structural Dissimilarity Sampling Based on a Root-Mean-Square Deviation Ryuhei Harada* and Yasuteru Shigeta* Center for Computational Sciences, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki 305-8577, Japan

Downloaded via NEW MEXICO STATE UNIV on July 6, 2018 at 13:17:46 (UTC). See https://pubs.acs.org/sharingguidelines for options on how to legitimately share published articles.

S Supporting Information *

ABSTRACT: Structural dissimilarity sampling (SDS) has been proposed as an enhanced conformational sampling method for finding neighboring metastable states of a given reactant or generating transition pathways starting from the reactant. SDS repeats a cycle of two steps: (1) selections of initial structures based on structural dissimilarities by referring to a measure and (2) conformational resampling by restarting short-time molecular dynamics (MD) simulations from the initial structures. In the present study, the measure was defined as the root-mean-square deviation (RMSD) among the resampled snapshots to characterize their structural dissimilarities. Additionally, the temperatures in restarting the short-time MD simulations were randomly shuffled at the beginning of each cycle to further promote the conformational transitions. We call this approach temperature-shuffled SDS (TSF-SDS). As a demonstration, TSF-SDS was applied to promote the open− closed transition of T4 lysozyme (T4L) in explicit water. TSF-SDS successfully reproduced the relevant domain motion with nanosecond-order simulation time, whereas conventional SDS without shuffling of the temperatures failed to promote the transition of T4L, indicating the high conformational sampling efficiency of TSF-SDS for promoting essential conformational transitions of proteins. Furthermore, as a wide-range application, TSF-SDS efficiently identified the native state of trp-cage and a dissociation process of ubiquitin dimer in explicit water.

1. INTRODUCTION Molecular dynamics (MD) simulations are regarded as one of the general conformational sampling methods in studying conformational transitions of proteins. However, the proteins often have rough free energy landscapes with many local minima separated by free energy barriers that are difficult to traverse at room temperature. Furthermore, it is easy to fall into a nonfunctional state that is hard to escape with most conventional MD (CMD) simulations. Recent studies have demonstrated that proteins get trapped in irrelevant conformations in their long-time MD simulations without going back to the original relevant conformations. Actually, many free energy minima can trap the proteins for a long time and slow down the conformational sampling process for them, leading to a poor characterization of the protein dynamics. In general, conformational transitions with large-amplitude motions are essential for protein activities. For instance, catalysis with large-amplitude movements is frequently required for transport through membranes or channels. In addition, transporters have to undergo large conformational transitions upon gating their substrates. These complicated and time-consuming processes are commonly beyond the accessible time scales of typical CMD simulations. Therefore, enhanced conformational sampling methods are needed. To overcome the issue of conformational sampling inefficiency, in the past few decades methods such as targeted MD,1 steered MD,2,3 metadynamics,4−7 multicanonical MD (McMD),8 replica-exchange MD (REMD),9 and their variants10−17 have © XXXX American Chemical Society

been implemented in the well-established MD packages to elucidate the biological functions induced as long-time protein dynamics. To promote structural transitions of proteins, external constraints or biases are often imposed in the several enhanced conformational sampling methods. In contrast, as an externalbias-free approach, we have proposed structural dissimilarity sampling (SDS) for finding neighboring metastable states of a given reactant and generating transition pathways from the reactant. On the basis of a simple strategy of conformational searching, SDS repeats a cycle of two steps: (1) selections of structurally uncorrelated snapshots (initial structures) by referring to structural dissimilarity as characterized by a chosen measure and (2) conformational resampling by restarting short-time molecular dynamics (MD) simulations from the initial structures with regenerated initial velocities. As a result of conformational resampling from the structurally uncorrelated snapshots, transition probabilities to neighboring metastable states might be increased despite the simple computational procedure because rarely occurring states of the biological target are intensively resampled, promoting the conformational transitions relevant to the biological functions. In the present study, the measure to characterize the structural dissimilarity was simply defined as the root-meansquare deviation (RMSD) among the resampled snapshots. Received: February 18, 2018 Published: June 8, 2018 A

DOI: 10.1021/acs.jcim.8b00095 J. Chem. Inf. Model. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Information and Modeling

inner product (IP) is utilized to measure the structural dissimilarity among snapshots. In the present study, the RMSD among snapshots is simply utilized to measure the structural dissimilarity instead of an IP. In terms of RMSD among snapshots, the initial structures at the ith cycle are selected from the snapshots at the (i − 1)th cycle as follows: (1) an average structure is calculated using all of the snapshots sampled at the (i − 1)th cycle. Then the snapshot with the largest RMSD measured from the average is selected as the first candidate (I1(α)), where it is assumed that the αth snapshot is selected as the first initial structure. (2) For all of the snapshots except for the αth snapshot, the RMSD between the αth and kth snapshots, RMSD(α, k) (k = 1, 2, ..., Ni−1, k ≠ α) is calculated, where Ni−1 is the total number of snapshots at the (i − 1)th cycle. Then the snapshot with the largest RMSD(α, β) is selected as the second candidate (I2(β)), where it is assumed that the βth snapshot is selected as the second initial structure. (3) For the third initial candidate, RMSD(α, k) and RMSD(β, k) (k = 1, 2, ..., Ni−1, k ≠ α, β) are calculated and summed. Then, the snapshot with the largest value of the sum, RMSD(α, γ) + RMSD(β, γ), is selected as the third candidate (I3(γ)), where the γth snapshot is selected as the third initial structure. (4) For the fourth initial candidate and beyond, the RMSD values with respect to all of the selected initial structures (α, β, ..., λ) are calculated and summed, and the μth snapshot with the largest value (RMSDmax) is selected as the initial structure (eq 1):

Furthermore, the temperatures assigned in restarting the shorttime MD simulations were randomly shuffled at the beginning of each cycle to promote conformational transitions to newly found states. We hereafter refer to this method as temperatureshuffled SDS (TSF-SDS). As a demonstration, the conformational sampling efficiencies for T4 lysozyme (T4L) in conventional and TSF-SDS simulations are compared. Furthermore, as a wide-range application, TSF-SDS efficiently identified the native state of trp-cage and a dissociation process of ubiquitin dimer in explicit water.

2. MATERIALS AND METHODS 2.1. Structural Dissimilarity Sampling Based on the Root-Mean-Square Deviation. SDS has been proposed as an enhanced conformational sampling method to generate conformational transition pathways from a starting structure (reactant). In principle, SDS does not need target structures (products) in the applications, i.e., only a given reactant is required. The strategy of conformational sampling is based on conformational resampling from structurally uncorrelated snapshots characterized by structural dissimilarity. In SDS, to promote conformational transitions to neighboring metastable states, cycles of conformational resampling are repeated, where each cycle consists of (1) selections of structurally uncorrelated snapshots (initial structures) and (2) conformational resampling by restarting short-time MD simulations from the initial structures with regenerated initial velocities. The cycles of SDS are repeated so that the searched conformational subspaces are sufficiently expanded. To terminate the cycles of conformational resampling, it is generally confirmed whether the explored conformational distributions converge. For example, minimum and maximum energy ranges are addressed to confirm whether conformational resampling based on SDS has sampled a variety of configurations of a given system. The concept of selecting initial structures in SDS is illustrated in Figure 1. To efficiently promote conformational transitions by SDS, it is essential to specify the structurally uncorrelated snapshots correctly. In the original SDS,18 high-dimensional conformational subspaces are spanned by principal axes (modes), and an

max

k = 1,2,..., Ni − 1(k ≠ α , β ,..., λ)

RMSD(α , k) + RMSD(β , k) +

... + RMSD(λ , k) ≤ RMSDmax (k = μ)

(1)

By repeating the selections of initial structures in terms of eq 1, a total of ninitial initial structures are specified at the beginning of every cycle. Then the initial structures are intensively resampled by restarting the short-time MD simulations with regenerated initial velocities. 2.2. Temperature-Shuffled SDS. To further enhance the conformational sampling efficiency, the temperatures of the short-time MD simulations are randomly shuffled at the beginning of conformational resampling. This idea has been already adopted in parallel cascade selection MD (PaCSMD)19 as temperature-shuffled PaCS-MD (TSF-PaCS-MD).20 As a result of the temperature shuffling among the distinct short-time MD simulations, the conformational sampling efficiency might be improved. By a strategy similar to TSFPaCS-MD, in the present method, the temperatures are randomly shuffled before the short-time MD simulations in SDS are restarted. This approach is called TSF-SDS. Hereafter, details of the parameters utilized in TSF-SDS are described. To shuffle a set of temperatures, ninitial temperatures are randomly generated in a temperature range (Tmin ≤ T ≤ Tmax) at the beginning of each cycle in terms of random seeds, i.e., the shuffled temperatures should be uniformly distributed in this range. Then one of these temperatures {Titarget, i = 1, 2, ..., ninitial} is assigned to each system as its target temperature. Before resampling of the ninitial short-time MD simulations, their initial velocities are newly generated on the basis of the Maxwell−Boltzmann distribution at the corresponding target temperatures {Titarget, i = 1, 2, ..., ninitial}. A flowchart of TSFSDS is shown in Figure 2. 2.3. System Setup and Computational Details for TSF-SDS. For a demonstration of TSF-SDS, T4L was adopted

Figure 1. Concept of selecting initial structures in SDS. In the present example, five structurally uncorrelated initial structures are selected from the eight snapshots. First, the αth snapshot with the largest RMSD measured from the structural center (average) is selected as the first candidate. Next, the βth snapshot with the largest RMSD from the αth snapshot (RMSD(α,β)) is selected as the second candidate. Subsequently, the γth snapshot with the largest sum of RMSDs from the αth and βth snapshots (RMSD(α,γ) + RMSD(β,γ)) is selected as the third candidate. Subsequently, the fourth and fifth candidates are selected as the δth and the εth snapshots by evaluating the sum of RMSDs defined by eq 1. B

DOI: 10.1021/acs.jcim.8b00095 J. Chem. Inf. Model. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Information and Modeling

To address how the initial temperature setting affects the conformational sampling efficiency, different TSF-SDS simulations were performed using the following initial temperature settings: Set1, ninitial temperatures uniformly distributed between 300 and 310 K; Set2, ninitial temperatures uniformly distributed between 300 and 320 K; Set3, ninitial temperatures uniformly distributed between 300 and 340 K. At the beginning of each cycle, the temperatures of the systems were randomly regenerated within these temperature ranges, corresponding to the temperature shuffling in TSF-SDS. After shuffling of the temperatures, 100 ps MD simulations were independently restarted from the initial structures using the shuffled temperatures, {Tji}, where Tji represents the temperature of the ith initial structure (i = 1, 2, ..., ninitial) shuffled at the beginning of the jth cycle. As a reference for the TSF-SDS simulations, conventional SDS simulations were performed at a constant temperature (300 K). To maintain repeatability, three trials were independently considered for all of the conventional and TSF-SDS simulations. As an additional demonstration of a blind prediction, a protein folding process of a mini-protein (trp-cage)32 in explicit water was simulated by conventional SDS and TSFSDS simulations to compare their conformational sampling efficiencies in sampling the native structure. To construct a starting structure, a denatured structure was taken from a 100 ns MD simulation of trp-cage at 500 K. Starting from the denatured structure, both SDS simulations were performed for 50 cycles to sample the native structure of trp-cage. As a practical application to a biological system, a dissociation process of ubiquitin dimer (UBQD)33 was sampled by TSF-SDS. In the present application, UBQD was solvated in explicit water to construct a full atomic system (38 447 atoms). After the solvated system was relaxed with 200 ps MD simulations under NVT (T = 300 K) and NPT (T = 300 K, P = 1 bar), an equilibrated snapshot was employed as a reactant for the TSF-SDS simulation. Starting from the reactant, the TSF-SDS simulation repeated 40 cycles to induce the dissociation process. In the TSF-SDS simulations for the trp-cage and UBQD systems, the number of initial structures was set to 10 (ninitial = 10). The initial temperatures of these systems were randomly generated between 300 and 310 K. As a reference for TSFSDS, a conventional SDS simulation at 300 K was performed without temperature shuffling for the same systems. The conditions for the MD simulations, such as temperature and pressure controls, were the same as for the TSF-SDS simulations of T4L described before.

Figure 2. Flowchart of TSF-SDS. As a preparation, an initial set of temperatures {T1, T2, ..., Tn} is specified. Then n short-time MD simulations are restarted from a given reactant (starting structure) using the n temperatures, and the final snapshots obtained from the n short-time MD simulations are specified as the first initial structures, i.e., {I1, I2, ..., In}. As the first conformational resampling, n short-time MD simulations are restarted from the initial candidates {I1, I2, ..., In}. After the conformational resampling, n structurally uncorrelated snapshots are selected from all of the trajectories on the basis of eq 1, which are specified as the initial structures at the next cycle. If the cycle of conformational resampling is enough, TSF-SDS can be terminated. Otherwise, the set of temperatures at the current cycle is randomly regenerated in terms of a random seed, corresponding to the shuffling of temperatures, and the newly updated temperatures are utilized at the next cycle.

as a biological target. In the present study, TSF-SDS was applied to generate the transition pathways from the open state to the closed state of T4L. As the starting structure (reactant), the X-ray crystal structure of a mutant open form of T4L (PDB ID 150L)21 was mutated back to the wild type (chain A was utilized as the initial model). To construct a full atomic model, the mutated-back open form of T4L was solvated with the TIP3P water model22 in a rectangular simulation box. The solvated systems included three Na+ for neutralization. The chemical bonds of T4L were treated as rigid bodies with the LINCS algorithm,23 and the water molecules were treated as rigid bodies with the SETTLE algorithm.24 To equilibrate the full atomic system, a 100 ps NVT simulation was started from the solvated system with the modified Berendsen thermostat at 300 K25 and followed by a 100 ps NPT simulation with the Parrinello−Rahman method26,27 (1 bar and 300 K) to adjust the volume. For nonlocal interactions, the electrostatic interactions were calculated with the particle mesh Ewald method28 (with a real-space cutoff of 10 Å and the cutoff value for van der Waals interactions set to 10 Å). The final snapshot taken from the 100 ps NPT simulation was adopted as the starting structure of the open form of T4L. All of the MD simulations were performed using the GPU version of the GROMACS 2018 package29 with the AMBER ff03 force field.30 To treat the protonation states of each system, PROPKA31 was utilized. In the present TSF-SDS simulations, the number of initial structures was fixed at 10 (ninitial = 10). At the beginning of each cycle, ninitial initial structures were selected by referring to the structural dissimilarity defined by eq 1 and were independently resampled with short-time MD simulations.

3. RESULTS AND DISCUSSION 3.1. Conformational Sampling Efficiency of TSF-SDS (T4L and Trp-Cage). To confirm the conformational subspaces searched by TSF-SDS, the Cα RMSD measured from an already known metastable state (the closed form of T4L, PDB ID 2LZM)34 was defined as Cα RMSDclosed, and their profiles were monitored (see Figure 3). For the present demonstrations, it was judged that the TSF-SDS simulations sampled the closed form of T4L when the value of Cα RMSDclosed was less than or equal to 1.0 Å. Judging from the profiles, the minimum values of Cα RMSDclosed were less than or equal to 1.0 Å in all the TSF-SDS simulations (see Table 1), meaning that TSF-SDS could find the closed state as a blind prediction. Generally, the open−closed conformational transition of T4L is regarded as a biologically rare event induced in C

DOI: 10.1021/acs.jcim.8b00095 J. Chem. Inf. Model. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Information and Modeling

(300 K) failed to rigorously sample the closed state of T4L because the minimum value of Cα RMSDclosed among all of the trials of conventional SDS was ∼1.2 Å, which was still larger than the threshold (1.0 Å). For the TSF-SDS simulations, the profiles of minimum average Cα RMSDclosed are shown by the blue lines in Figure 3, and the profile for conventional SDS is shown by the magenta lines. Judging from Figure 3, all of the TSF-SDS simulations showed higher conformational sampling efficiencies than the conventional SDS simulation. The present demonstrations confirmed that the quite simple procedure of temperature shuffling in SDS provided a higher conformational sampling efficiency. As a property of TSF-SDS, the initial structures are selected as structurally uncorrelated snapshots at every cycle. Therefore, TSF-SDS began to search different kinds of configurations once the closed state of T4L was sampled. Actually, the profiles of Cα RMSDclosed began to increase after passing through the dashed line (see the blue lines in Figure 3), indicating that TSF-SDS might have begun to search neighboring metastable states around the closed state of T4L. Finally, the conformational sampling efficiency was compared in terms of another demonstration: a protein folding process of trp-cage. To confirm whether both SDS simulations sampled the native structure of trp-cage, the Cα RMSD measured from the X-ray crystal structure (C α RMSDnative) was monitored, as shown in Figure S1. For trpcage, a Cα RMSDnative value of 2.0 Å was used as a threshold to judge when the native structure was sampled. Judging from Figure S1a, the conventional SDS simulation failed to sample the native structure for the 50 cycles because the Cα RMSDnative value was always larger than 2.0 Å. In contrast, the TSF-SDS simulation frequently identified the native structure (Cα RMSDnative < 2.0 Å) during the 50 cycles (Figure S1b), also validating the high conformational sampling efficiency of TSF-SDS. 3.2. Dissociation Process of UBQD Sampled by TSFSDS. To confirm whether TSF-SDS successfully sampled the dissociation process, the profiles of (1) the Cα RMSD measured from a dimer X-ray structure (PDB ID 1AAR)33 (Cα RMSDdimer) and (2) the center of mass (COM) distance between the two monomers were monitored (Figure 4a,b). Judging from these profiles, Cα RMSDdimer and the COM distance reached ∼20.0 Å and ∼50.0 Å at the 40th cycle, meaning that UBQD was completely dissociated to the monomers. A representative dissociated snapshot at the 40th cycle is shown in Figure 5c. To understand the dissociation process in more detail, we divided it into three stages based on the profile of the COM distance (Figure 4b) as follows: (1) 1−10 cycles, (2) 11−25 cycles, and (3) 26−40 cycles. As shown in Figure 4b, the COM distance was almost unchanged until the 10th cycle and then gradually increased during the second stage, with the variance of the COM distance becoming larger as the cycles progressed. Finally, the COM distance rapidly increased during the third stage until the 40th cycle. In contrast, the profile of Cα RMSDdimer increased monotonically for the 40 cycles, indicating that the COM distance between two monomers might be a more reasonable reaction coordinate (RC) to characterize the dissociation process of UBQD than Cα RMSDdimer. During the first stage, a characteristic intermolecular interaction between Ile44 in monomer A and Ile44 in monomer B remained, corresponding to the surface interaction

Figure 3. Profiles of minimum Cα RMSD measured from the closed form of T4L, Cα RMSDclosed, for 50 cycles for TSF-SDS using (a) the first (Set1), (b) the second (Set2), and (c) the third (Set3) initial temperature setting. The averages of Cα RMSDclosed are plotted in blue and their standard deviations in red. In each panel, average of Cα RMSDclosed for conventional SDS is plotted in magenta and its standard deviation in green. The dashed lines represent the criterion for confirming whether the TSF-SDS and conventional SDS simulations sampled the closed state of T4L.

Table 1. Minimum Simulation Times To Sample the Closed State of T4L (Cα RMSDclosed ≤ 1.0 Å) and Minimum Cα RMSDclosed among Three Independent Runs of the TSFSDS Simulation Using the First, Second, and Third Initial Temperature Settings (Set1−Set3) for 50 Cycles setting

run

minimum number of cycles

minimum simulation time [ns]

mininum Cα RMSDclosed [Å]

Set1

1 2 3 1 2 3 1 2 3

5 11 4 6 4 5 2 15 13

5 11 4 6 4 5 2 15 13

1.00 0.85 0.72 1.00 0.87 1.00 0.85 0.84 0.95

Set2

Set3

microsecond-order simulation time. Therefore, it might be of value that TSF-SDS successfully reproduced the essential transition of T4L with nanosecond-order simulation time as shown in Table 1. In particular, the minimum computational cost was ∼2 ns to promote the conformational transition to the closed state in the first trial in TSF-SDS using the third initial temperature setting (Set3), also indicating the extremely high conformational sampling efficiency of TSF-SDS. In contrast, the conventional SDS simulations at constant temperature D

DOI: 10.1021/acs.jcim.8b00095 J. Chem. Inf. Model. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Information and Modeling

Figure 5. (a) UBQD X-ray structure (PDB ID 1AAR).33 (b) Representative snapshot of UBQD sampled at the 25th cycle in the transition state before the complete dissociation to the monomers. (c) Representative snapshot of the completely dissociated state. In each structure, the key residue (Ile44) for the intermolecular interaction is highlighted by a VDW representation.

Figure 4. (a) Profile of Cα RMSD of UBQD measured from the X-ray dimer (PDB ID 1AAR).33 (b) Profile of the COM distance between the monomers. (c) Cα RMSDs of each monomer measured from the monomer X-ray structure (PDB ID 1UBQ).35 The values were averaged over all of the snapshots sampled at every cycle (solid lines) and plotted with their standard deviations. The dashed lines in (b) represent the dissociation stages identified by the TSF-SDS simulation.

the 25th cycle, i.e., the hydrophobic interaction between the Ile44 residues of the two monomers was destroyed. Furthermore, the C-terminus of one monomer was partially unfolded to contact with the other monomer, which might be a transition state of the dissociation (Figure 5b). This partial unfolding in the C-terminus corresponds to the rapid increase of Cα RMSDdimer until the 25th cycle (blue line in Figure 4c). These results imply that one monomer might recognize its partner with this flexible terminus as a tag in the backward reaction, i.e., the C-terminus of one monomer weakly contacted with the other monomer. This contact with the Cterminus disappeared in the late second stage to achieve two isolated monomers, as shown in Figure 5c. To confirm how each monomer fluctuates in the equilibrium state, three types of 100 ns MD simulations were independently performed, started from (1) one of the monomers in the dimer X-ray structure, (2) the other monomer in the dimer, and (3) the monomer X-ray structure. Figure 6a−c shows the corresponding profiles of Cα RMSD of the monomer measured from the monomer X-ray structure. Judging from these profiles, each monomer fluctuated between 1.0 and 1.5 Å in Cα RMSDmonomer, while their values instantaneously fluctuated over 2.0 Å during the 100 ns. These large fluctuations correspond to the partial unfolding of the C-terminus. As described before, each monomer might utilize this flexibility of the C-terminus to initiate the dimerization. As evidence of this, the fluctuations of Cα RMSD in the dissociation process (Figure 4c) were much smaller than those of Cα RMSD in the monomer (Figure 6), i.e., the fluctuations of the C-termini of the monomers were suppressed by the contact with each other, indicating a crucial

between the monomers observed in the dimer X-ray structure (Figure 5a). Actually, the mean deviation of the COM distance was about 0.68 Å at the 10th cycle. Therefore, the tight interaction between the two monomers seems to be maintained during the first stage. On the other hand, this characteristic interaction was destroyed during the second stage. As shown in Figure 4a,b, the mean deviations of the COM distance and Cα RMSD became gradually larger than those during the first stage, indicating that more conformations were allowed in the second stage. In the third stage, the profiles of the COM distance and Cα RMSDdimer rapidly increased. Finally, UBQD was completely dissociated to the monomers, judging from both the average Cα RMSDdimer and the COM distance at the 40th cycle. To address how the monomer structurally changed for the cycles, the Cα RMSD measured from the monomer X-ray structure (PDB ID 1UBQ) 35 (C α RMSDmonomer ) was monitored. As a cycle went by, Cα RMSDmonomer for one of the monomers monotonically increased (blue line in Figure 4c), while that of the other monomer was moderately changed from the initial value (magenta line in Figure 4c). Interestingly, the former Cα RMSDmonomer rapidly increased until the 25th cycle and gradually decreased after the 25th cycle, indicating that one of the monomers might be partially unfolded to initiate the dissociation. In particular, an intermolecular interaction at the central hydrophobic part was destroyed at E

DOI: 10.1021/acs.jcim.8b00095 J. Chem. Inf. Model. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Information and Modeling

Figure 7. Histograms of the initial structures for TSF-SD simulations using the first (Set1, red), second (Set2, green), and third (Set3, blue) initial temperature settings. The temperatures of the initial structures were recorded at the beginning of each cycle, and their values were accumulated over the cycles. The values of the histograms were normalized. The dashed lines represent the upper bounds of temperature for the three types of TSF-SDS simulations.

tional transitions to metastable states separated by higher free energy barriers cannot be promoted when only low temperatures are specified. Analogous to the strategy of conformational sampling in REMD, a set of initial temperatures should be reasonably specified at the beginning of a TSF-SDS simulation. Actually, the histogram of the TSF-SDS simulation using the third initial temperature setting (Set3) had a higher population of snapshots in the lower half of the temperature range from 300 to 320 K than in the higher half of the range from 320 to 340 K, indicating that the higher half of the temperature range might be unreasonable for enhancing the conformational sampling of the current system. For an upper bound on the initial temperature, too high values might cause inefficiencies in conformational sampling because conformational resampling at extremely high temperatures would sample unphysical configurations irrelevant to the conformational transitions of interest. For the current system, the initial structures were intensively selected from the temperature range of 300 to 320 K in all of the TSF-SDS simulations, as shown in Figure 7, i.e., this might be the most reasonable temperature range to promote the conformational transitions of T4L between the open and closed states. In summary, one might perform several trials of TSF-SDS simulations with different sets of initial temperatures to determine a reasonable temperature range depending on the biological target. 3.4. Comparisons of Conformational Areas Searched by TSF-SDS (T4L). To identify the locations of metastable states sampled with the TSF-SDS simulations, negative logarithm plots of populations for T4L were constructed. To construct the population maps, all of the snapshots were projected onto a conformational subspace spanned by the set of RMSDs (Cα RMSDopen, Cα RMSDclosed) (see the projected population maps in Figure 8). Judging from these projections, the closed state of T4L was identified as a metastable state on the population maps in all of the TSF-SDS simulations. As shown in the population maps, TSF-SDS tends to sample configurations that are remarkably deviated from the metastable states, which might correspond to structures irrelevant to the biological functions and to unphysical structures with extremely high energies existing at the edges of the free energy landscapes (FELs). Therefore, one might have to carefully treat these configurations with large deviations from a given starting structure by checking their energies or locations on the population maps.

Figure 6. Profiles of Cα RMSD measured from the monomer X-ray structure (PDB ID 1UBQ)35 for simulations started from (a) one monomer in the dimer X-ray structure (PDB ID 1AAR)33 (b) the other monomer in the dimer, and (c) the monomer X-ray structure.

role of the C-terminus for initiating dimerization/dissociation of UBQD. As a comparison of the conformational sampling efficiencies of the conventional and extended simulations, the conventional SDS simulation was performed at 300 K without temperature shuffling. Figure S2 presents the obtained profiles of Cα RMSDdimer and the COM distance, which show that the conventional SDS simulation failed to promote the dissociation of UBQD within the 50 cycles because both profiles were almost unchanged for the 50 cycles. In contrast, the TSF-SDS simulation successfully promoted the dissociation of UBQD, as shown above. This result clearly might validate the high conformational sampling efficiency of TSF-SDS compared with conventional SDS. 3.3. How the Initial Temperature Setting Affects the Conformational Sampling Efficiency (T4L). For the conformational sampling efficiencies of TSF-SDS with different ranges of initial temperatures, histograms of the initial structures were evaluated for T4L (Figure 7). To construct the histograms, the temperatures of the initial structures were recorded at the beginning of each cycle. Judging from Figure 7, the histograms for the TSF-SDS simulations using the first and second initial temperature settings (Set1 and Set2) had double peaks in the lower and higher temperature ranges. Judging from the double peaks, the higher temperatures might have been utilized to promote the conformational transitions, and the lower temperatures might have been utilized to intensively sample the metastable states of T4L. Herein, the strategy of conformational sampling in TSF-SDS might be similar to that in REMD.9 In REMD, high and low temperatures are prepared to enhance the conformational sampling by exchanging the temperatures among neighboring biological systems (replicas) for cycles of parallel MD simulations with different temperatures. As a characteristic property of REMD, low-free-energy states such as the native states of proteins cannot be sampled when only high temperatures are specified. Also, conformaF

DOI: 10.1021/acs.jcim.8b00095 J. Chem. Inf. Model. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Information and Modeling

Figure 9. Profiles of (a) minimum and (b) maximum potential energies for the TSF-SDS simulations in terms of the first (Set1, red), second (Set2, green), and third (Set3, blue) initial temperature settings.

of the initial temperature setting should be carefully specified in the applications of TSF-SDS. 3.5. Evaluations of Trajectories Generated by TSFSDS. In TSF-SDS, the obtained conformations might be irrelevant to the biological functions at room temperature because all of the snapshots are generated at different temperatures (rather than 300 K) in each temperature group. Therefore, it might be difficult to directly estimate the FEL and elucidate transition pathways without a postprocessing treatment. To evaluate sampled structures at room temperature, a free energy calculation might be convenient. To calculate the FEL at room temperature, TSF-SDS is switched to the non-temperature-shuffled (conventional) SDS after a wide conformational subspace is achieved with TSFSDS. After the switching, conventional SDS is performed for long cycles at room temperature such as 300 K. Finally, the structures sampled with SDS at room temperature are used to evaluate the FEL. To calculate the FEL, a Markov state model (MSM) is constructed from the SDS trajectories at room temperature. Herein it might be easy to estimate the FEL by means of the MSM as our preceding studies reported,36−38 i.e., an equilibrium distribution π is obtained from the MSM by solving the eigenvalue problem π = Tπ, where T is a transition matrix presumed by the trajectories. In summary, the following three steps might be required to evaluate the configurations on the FEL: (1) TSF-SDS quickly/ roughly searches a wide conformational space; (2) conventional SDS at room temperature is used to refine the conformational space searched with TSF-SDS; (3) a free energy calculation based on an MSM in terms of the trajectories of the conventional SDS at room temperature is performed. For the refinement of the coarse-grained information on metastable states, one might need a postprocessing treatment such as the free energy calculation

Figure 8. Negative logarithm plots of populations of snapshots generated by the TSF-SDS simulations using (a) the first (Set1), (b) the second (Set2), and (c) the third (Set3) initial temperature settings. All of the snapshots generated by each TSF-SDS simulation were projected onto a conformational subspace spanned by the RMSD values (Cα RMSDopen, Cα RMSDclosed).

To check the convergence of conformational sampling, the minimum and maximum potential energies (lower and upper bounds) were monitored versus cycle number. Figure 9 shows the minimum and maximum potential energies during the 50 cycles for all types of TSF-SDS simulations. Judging from these profiles, the maximum potential energy kept increasing in all cases (Figure 9b), whereas the minimum potential energy converged to almost the same value in two cases (red line (Set1) and green line (Set2) in Figure 9a). In contrast, the minimum potential energy of Set3 was quite higher than the others (blue line in Figure 9a). Therefore, it might take a longer simulation time to sample essential states with lower energies sufficiently when higher temperatures are specified in the initial temperature setting. The reason why the upper bounds of the potential energies kept increasing in all three cases was that TSF-SDS tends to sample configurations with higher energies once possible metastable states with lower energies have been sampled in the early cycles. Therefore, it might be reasonable to monitor the lower bound of the potential energy to determine whether the TSF-SDS simulation has sampled essential metastable states and to use this information to judge whether to end the TSF-SDS simulation. In summary, a too-high temperature setting might cause a slowdown of conformational sampling for the essential states with low energies. Therefore, it is noted that the upper bound G

DOI: 10.1021/acs.jcim.8b00095 J. Chem. Inf. Model. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Information and Modeling

basis set. Therefore, IP might identify initial structures with structural dissimilarity more efficiently than RMSD when a reasonable basis set is known a priori. Generally, SDS with RMSD tends to identify similar initial structures because of simple square root sums of structural differences, which might cause inefficiency of conformational sampling in SDS. Therefore, to remedy the weak point of selections of initial structures using RMSD, the conventional SDS should be extended as TSF-SDS by shuffling the temperatures in restarting short-time MD simulations as perturbations, which might be one of the different impacts on TSF-SDS/SDS compared with the choice of the two measures (IP and RMSD).

based on the SDS at room temperature combined with an MSM. 3.6. Several Dependencies on Actual Simulations. As an additional computational cost of the present method, one has to consider the calculations of RMSDs for structures taken from trajectories. When selecting a number of initial structures (ninitial), it takes a vast computational cost because RMSDs for all of the pairs among the structures should be evaluated. Therefore, ninitial should be as small as possible to save the computational cost in calculating RMSDs. To set a reasonable value of ninitial, it might better to confirm how the initial structures are distributed on a conformational subspace. As an example, for the present target (T4L), the 10 initial structures (ninitial = 10) were distributed over the subspace spanned by RMSDs as shown in Figure S3, meaning that they were distributed at the boundaries of the distribution. Because of the appropriate arrangement of initial structures, TSF-SDS successfully promoted the conformational transition from the open state to the closed state at around the sixth cycle (Figure S3d). Therefore, 10 initial structures were enough to efficiently perform TSF-SDS for T4L. Actually, the reasonable value of ninitial might more or less depend on the target system of interest. Thus, one has to carefully specify ninitial when performing TSF-SDS. The conformational sampling efficiency of TSF-SDS might depend on the set of initial temperatures. As shown in Table 1, the minimum simulation times to sample the closed state of T4L are different for the three sets of initial temperatures, ranging from 2 to 15 ns. Therefore, one might address the conformational sampling efficiency in terms of a different set of initial temperatures as preliminary runs. For the temperature range, we prepared Set1 (from 300 to 310 K), Set2 (from 300 to 320 K), and Set3 (from 300 to 340 K). Herein, Set1 is a bit higher than room temperature (300 K). This range might be appropriate for promoting conformational transitions of proteins often found at room temperature because the higher temperatures work as perturbations to promote the essential transitions. Set2 might be appropriate for inducing relatively large structural changes such as the folding of mini-proteins. By reference to the folding temperature of trp-cage (∼311 K),39 Set1 was adopted to investigate the folding process of trp-cage. For UBQD, we also utilized the first temperature range (Set1) to efficiently promote the normal conformational transition (the dissociation process of UBQD). On the other hand, Set3 was utilized as an inefficient control experiment for Set1 and Set2. Actually, too-high temperatures might decrease the conformational sampling efficiency because short-time MD simulations under too-high temperatures might sample structures irrelevant to essential conformational transitions. In the original (non-temperature-shuffled) SDS, the inner product (IP) was utilized to measure the structural dissimilarity among the structures. In contrast, RMSD was employed to measure the structural dissimilarity in the present study (TSF-SDS). The choice of the measure depends strongly on the system of interest and brings both advantages and drawbacks. RMSD might be more general than IP because a reasonable basis set must be specified to define the IP. Furthermore, how to set the basis set is nontrivial and remains an issue to be tackled. On the contrary, RMSD is easy to calculate once a set of structures is given. However, RMSD tends to fail to treat high dimensionality of structures, which IP can do since the high-dimensional subspace is defined by the

4. CONCLUSION In the present study, TSF-SDS has been proposed as an extension of SDS for finding neighboring metastable states of a given reactant and generating transition pathways starting from the reactant. In TSF-SDS, the sum of RMSDs among configurations defined by eq 1 was adopted to characterize the structural dissimilarity and worked to specify the structurally uncorrelated configurations as the initial structures of SDS. Additionally, to further promote conformational transitions, the temperatures of the initial structures were randomly shuffled at the beginning of each conformational cycle. It was concluded that the simple temperature shuffling in TSF-SDS might enhance the conformational sampling efficiency of conventional SDS, but a reasonable temperature range should be found and specified a priori. As a future perspective, the application of the present SDS extension to a large protein system will be reported in a subsequent paper.



ASSOCIATED CONTENT

S Supporting Information *

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.jcim.8b00095. RMSD profile for trp-cage (Figure S1); RMSD and COM distance profiles for UBQD (Figure S2); projections of the trajectories for T4L (Figure S3) (PDF)



AUTHOR INFORMATION

Corresponding Authors

*E-mail: [email protected]. *E-mail: [email protected]. ORCID

Ryuhei Harada: 0000-0001-7987-0540 Yasuteru Shigeta: 0000-0002-3219-6007 Notes

The authors declare no competing financial interest.



ACKNOWLEDGMENTS The present work was supported by a Japan Society for the Promotion of Science (JSPS) Research Fellowship for Young Scientists (15J03797), a JSPS KAKENHI Grant-in-Aid for Young Scientists (A) (16H06164), and Grants-in-Aid for Scientific Research on Innovative Areas “Dynamic Ordering and Biofunction”, “Photosynergetics”, and “3D Active-Site Science” (26102525, 26107004, and 26105012) from JSPS. The computations were performed using the COMA/HAPACS at the Center for Computational Sciences (CCS) of the H

DOI: 10.1021/acs.jcim.8b00095 J. Chem. Inf. Model. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Information and Modeling

Structural Transitions of Proteins. J. Chem. Theory Comput. 2017, 13, 1411−1423. (19) Harada, R.; Kitao, A. Parallel Cascade Selection Molecular Dynamics (Pacs-Md) to Generate Conformational Transition Pathway. J. Chem. Phys. 2013, 139, 035103−1−10. (20) Harada, R.; Shigeta, Y. Temperature-Shuffled Parallel Cascade Selection Molecular Dynamics Accelerates the Structural Transitions of Proteins. J. Comput. Chem. 2017, 38, 2671−2674. (21) Zhang, X. J.; Matthews, B. W. Conservation of Solvent-Binding Sites in 10 Crystal Forms of T4-Lysozyme. Protein Sci. 1994, 3, 1031− 1039. (22) Jorgensen, W. L.; Chandrasekhar, J.; Madura, J. D.; Impey, R. W.; Klein, M. L. Comparison of Simple Potential Functions for Simulating Liquid Water. J. Chem. Phys. 1983, 79, 926−935. (23) Hess, B.; Bekker, H.; Berendsen, H. J. C.; Fraaije, J. G. E. M. LINCS: A Linear Constraint Solver for Molecular Simulations. J. Comput. Chem. 1997, 18, 1463−1472. (24) Miyamoto, S.; Kollman, P. A. SETTLE - an Analytical Version of the Shake and Rattle Algorithm for Rigid Water Models. J. Comput. Chem. 1992, 13, 952−962. (25) Bussi, G.; Donadio, D.; Parrinello, M. Canonical Sampling through Velocity Rescaling. J. Chem. Phys. 2007, 126, 014101−1−7. (26) Parrinello, M.; Rahman, A. Polymorphic Transitions in SingleCrystals - a New Molecular-Dynamics Method. J. Appl. Phys. 1981, 52, 7182−7190. (27) Nose, S.; Klein, M. L. Constant Pressure Molecular-Dynamics for Molecular-Systems. Mol. Phys. 1983, 50, 1055−1076. (28) Essmann, U.; Perera, L.; Berkowitz, M. L.; Darden, T.; Lee, H.; Pedersen, L. G. A Smooth Particle Mesh Ewald Method. J. Chem. Phys. 1995, 103, 8577−8593. (29) Abraham, M. J.; van der Spoel, D.; Lindahl, E.; Hess, B.; The GROMACS Development Team. GROMACS User Manual, version 2018; www.gromacs.org. (30) Duan, Y.; Wu, C.; Chowdhury, S.; Lee, M. C.; Xiong, G. M.; Zhang, W.; Yang, R.; Cieplak, P.; Luo, R.; Lee, T.; Caldwell, J.; Wang, J. M.; Kollman, P. A Point-Charge Force Field for Molecular Mechanics Simulations of Proteins Based on Condensed-Phase Quantum Mechanical Calculations. J. Comput. Chem. 2003, 24, 1999−2012. (31) Olsson, M. H. M.; Sondergaard, C. R.; Rostkowski, M.; Jensen, J. H. PROPKA3: Consistent Treatment of Internal and Surface Residues in Empirical pKa Predictions. J. Chem. Theory Comput. 2011, 7, 525−537. (32) Neidigh, J. W.; Fesinmeyer, R. M.; Andersen, N. H. Designing a 20-Residue Protein. Nat. Struct. Biol. 2002, 9, 425−430. (33) Cook, W. J.; Jeffrey, L. C.; Carson, M.; Chen, Z.; Pickart, C. M. Structure of a Diubiquitin Conjugate and a Model for Interaction with Ubiquitin Conjugating Enzyme (E2). J. Biol. Chem. 1992, 267, 16467−16471. (34) Weaver, L. H.; Matthews, B. W. Structure of Bacteriophage-T4 Lysozyme Refined at 1.7 Å Resolution. J. Mol. Biol. 1987, 193, 189− 199. (35) Vijaykumar, S.; Bugg, C. E.; Cook, W. J. Structure of Ubiquitin Refined at 1.8 Å Resolution. J. Mol. Biol. 1987, 194, 531−544. (36) Harada, R.; Nishihara, Y.; Wakai, N.; Kitao, A. Conformational Transition Pathway and Free Energy Analyses of Proteins by Parallel Cascade Selection Molecular Dynamics (PaCS-MD). AIP Conf. Proc. 2014, 1618, 86−89. (37) Tran, D. P.; Takemura, K.; Kuwata, K.; Kitao, A. ProteinLigand Dissociation Simulated by Parallel Cascade Selection Molecular Dynamics. J. Chem. Theory Comput. 2018, 14, 404−417. (38) Kitao, A.; Harada, R.; Nishihara, Y.; Tran, D. P. Parallel Cascade Selection Molecular Dynamics for Efficient Conformational Sampling and Free Energy Calculation of Proteins. AIP Conf. Proc. 2016, 1790, 020013. (39) Halabis, A.; Zmudzinska, W.; Liwo, A.; Oldziej, S. Conformational Dynamics of the Trp-Cage Miniprotein at Its Folding Temperature. J. Phys. Chem. B 2012, 116, 6898−6907.

University of Tsukuba. This research was partially supported by the Initiative on Promotion of Supercomputing for Young or Women Researchers, Information Technology Center, The University of Tokyo. This research partly used computational resources under the Collaborative Research Program for Young/Women Scientists provided by the Academic Center for Computing and Media Studies, Kyoto University. This work was supported by the TSUBAME Encouragement Program for Young/Female Users of Global Scientific Information and the Computing Center at Tokyo Institute of Technology.



REFERENCES

(1) Schlitter, J.; Engels, M.; Kruger, P.; Jacoby, E.; Wollmer, A. Targeted Molecular-Dynamics Simulation of Conformational Change - Application to the T-R Transition in Insulin. Mol. Simul. 1993, 10, 291−308. (2) Isralewitz, B.; Baudry, J.; Gullingsrud, J.; Kosztin, D.; Schulten, K. Steered Molecular Dynamics Investigations of Protein Function. J. Mol. Graphics Modell. 2001, 19, 13−25. (3) Isralewitz, B.; Gao, M.; Schulten, K. Steered Molecular Dynamics and Mechanical Functions of Proteins. Curr. Opin. Struct. Biol. 2001, 11, 224−230. (4) Laio, A.; Parrinello, M. Escaping Free-Energy Minima. Proc. Natl. Acad. Sci. U. S. A. 2002, 99, 12562−12566. (5) Raiteri, P.; Laio, A.; Gervasio, F. L.; Micheletti, C.; Parrinello, M. Efficient Reconstruction of Complex Free Energy Landscapes by Multiple Walkers Metadynamics. J. Phys. Chem. B 2006, 110, 3533− 3539. (6) Piana, S.; Laio, A. A Bias-Exchange Approach to Protein Folding. J. Phys. Chem. B 2007, 111, 4553−4559. (7) Laio, A.; Gervasio, F. L. Metadynamics: A Method to Simulate Rare Events and Reconstruct the Free Energy in Biophysics, Chemistry and Material Science. Rep. Prog. Phys. 2008, 71, 126601−1−22. (8) Nakajima, N.; Nakamura, H.; Kidera, A. Multicanonical Ensemble Generated by Molecular Dynamics Simulation for Enhanced Conformational Sampling of Peptides. J. Phys. Chem. B 1997, 101, 817−824. (9) Sugita, Y.; Okamoto, Y. Replica-Exchange Molecular Dynamics Method for Protein Folding. Chem. Phys. Lett. 1999, 314, 141−151. (10) Sugita, Y.; Kitao, A.; Okamoto, Y. Multidimensional ReplicaExchange Method for Free-Energy Calculations. J. Chem. Phys. 2000, 113, 6042−6051. (11) Fukunishi, H.; Watanabe, O.; Takada, S. On the Hamiltonian Replica Exchange Method for Efficient Sampling of Biomolecular Systems: Application to Protein Structure Prediction. J. Chem. Phys. 2002, 116, 9058−9067. (12) Liu, P.; Kim, B.; Friesner, R. A.; Berne, B. J. Replica Exchange with Solute Tempering: A Method for Sampling Biological Systems in Explicit Water. Proc. Natl. Acad. Sci. U. S. A. 2005, 102, 13749−13754. (13) Okumura, H. Partial Multicanonical Algorithm for Molecular Dynamics and Monte Carlo Simulations. J. Chem. Phys. 2008, 129, 124116−1−9. (14) Wang, L.; Friesner, R. A.; Berne, B. J. Replica Exchange with Solute Scaling: A More Efficient Version of Replica Exchange with Solute Tempering (Rest2). J. Phys. Chem. B 2011, 115, 9431−9438. (15) Itoh, S. G.; Okumura, H. Hamiltonian Replica-Permutation Method and Its Applications to an Alanine Dipeptide and Amyloidβ(29−42) Peptides. J. Comput. Chem. 2013, 34, 2493−2497. (16) Itoh, S. G.; Okumura, H. Replica-Permutation Method with the Suwa-Todo Algorithm Beyond the Replica-Exchange Method. J. Chem. Theory Comput. 2013, 9, 570−581. (17) Yamamori, Y.; Kitao, A. Mustar Md: Multi-Scale Sampling Using Temperature Accelerated and Replica Exchange Molecular Dynamics. J. Chem. Phys. 2013, 139, 145105−1−11. (18) Harada, R.; Shigeta, Y. Efficient Conformational Search Based on Structural Dissimilarity Sampling: Applications for Reproducing I

DOI: 10.1021/acs.jcim.8b00095 J. Chem. Inf. Model. XXXX, XXX, XXX−XXX