Applications for Reproducing Structural Transitions - ACS Publications

Feb 7, 2017 - Conformational resampling is performed as follows: (I) arrangement ... promote structural transitions because conformational resampling ...
1 downloads 0 Views 2MB Size
Subscriber access provided by UB + Fachbibliothek Chemie | (FU-Bibliothekssystem)

Article

Efficient Conformational Search Based on Structural Dissimilarity Sampling: Applications for Reproducing Structural Transitions of Proteins Ryuhei Harada, and Yasuteru Shigeta J. Chem. Theory Comput., Just Accepted Manuscript • DOI: 10.1021/acs.jctc.6b01112 • Publication Date (Web): 07 Feb 2017 Downloaded from http://pubs.acs.org on February 10, 2017

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a free service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are accessible to all readers and citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

Journal of Chemical Theory and Computation is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 31

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Efficient Conformational Search Based on Structural Dissimilarity Sampling: Applications for Reproducing Structural Transitions of Proteins Ryuhei Harada*a and Yasuteru Shigeta*a a

Center for Computational Sciences, University of Tsukuba, Tennodai, 1-1-1, Tsukuba, Ibaraki, 305-8577, Japan Corresponding authors: Ryuhei Harada, Yasuteru Shigeta E-mails: [email protected], [email protected] Abstract Structural Dissimilarity Sampling (SDS) is proposed as an efficient conformational search method to promote structural transitions essential for the biological functions of proteins. In SDS, initial structures are selected based on structural dissimilarity, and conformational resampling is repeated. Conformational resampling is performed as follows: (I) arrangement of initial structures for a diverse distribution at the edge of a conformational subspace, and (II) promotion of the structural transitions with multiple short-time molecular dynamics (MD) simulations restarting from the diversely distributed initial structures. Cycles of (I) and (II) are repeated to intensively promote structural transitions because conformational resampling from the initial structures would quickly expand conformational distributions toward unvisited conformational subspaces. As a demonstration, SDS was first applied to maltodextrin binding protein (MBP) in explicit water to reproduce structural transitions between the open and closed states of MBP. Structural transitions of MBP were successfully reproduced with SDS in nanosecond-order simulation times. Starting from both the open and closed forms, SDS successfully reproduced the structural transitions within 25 cycles (a total 250 ns of simulation time). For reference, a conventional long-time (500 ns) MD simulation under NPT (300 K and 1 bar) starting from the open form failed to reproduce the structural transition. In addition to the open-closed motions of MBP, SDS was applied to folding processes of the fast-folding proteins (chignolin, trp-cage, and villin) and successfully sampled their native states. To confirm how the selections of initial structures affected conformational sampling efficiency, numbers of base sets for characterizing structural dissimilarity of initial structures were addressed in distinct trials of SDS. The parameter searches showed that conformational sampling efficiency was relatively insensitive with respect to the numbers of base sets, indicating the robustness of SDS for actual applications.

1

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 2 of 31

I. Introduction Proteins induce structural transitions to activate biological functions such as ligand binding and release, catalysis, and signal transduction. Therefore, clarifying the underlying mechanisms of structural transitions is indispensable for understanding their functions. These structural transitions rarely occur over long timescales and are thus regarded as biologically relevant rare events. Quantum mechanics (QM), quantum mechanics and molecular mechanics (QM/MM), and molecular dynamics (MD) simulations are widely used to reproduce chemical reactions and the structural transitions of proteins. In particular, the atomic trajectories of MD simulations are used to analyze the intrinsic dynamics of proteins.

The atomic

trajectories generated by sufficiently long-time MD simulations could be used to analyze biological functions and calculate several physical properties. However, the timescales of biological functions generally range from milliseconds to seconds, which are far from accessible timescales with conventional MD simulations. The issue of timescale should be adequately resolved to understand biological functions with MD simulations. To address this issue, special-purpose machines have been developed to perform extremely lengthy MD simulations.1,2 General-purpose graphics processing units (GPUs) have also been recently used with MD programs. Many important algorithms and models have been implemented on GPU-based MD packages. These MD packages can now perform extremely long-time protein simulations.3 However, because biologically relevant transitions are stochastic processes, these long-time MD simulations may not capture essential structural transitions during every run. Several other enhanced sampling methods have been proposed to reproduce biologically relevant rare events. For example, metadynamics4,5 is a popular enhanced conformational sampling method. This method describes the biological reactions of a given protein by a set of collective variables (CVs). Once CVs are specified, positive history-dependent Gaussian potentials as functions of the CVs are added to the original potential energy as an artificial bias to enhance conformational sampling. Recently, several variants6 of metadynamics have been proposed to reproduce more complicated biological reactions. In particular, the adaptively biased MD (ABMD)7 replaces many Gaussian functions with cubic B-spline functions to save computational cost. Instead of flattening free energy surfaces using the external potentials, the adaptive biasing force (ABF)8 directly uses inverse gradients of original ones to promote structural transitions efficiently. Another alternative is the temperature-accelerated MD method (TAMD).9 In the TAMD, a set of additional CVs is coupled to the original CVs of a biological system at a high temperature. Previous studies of TAMD have shown that, if a set of suitable CVs is specified to describe biological reactions, the conformational sampling of system can be greatly accelerated thorough coupling with additional CVs. Essential dynamic sampling (EDS),10 amplified collective motions,11 and accelerated MD12 are recognized as enhanced conformational sampling methods based on CVs. However, it is generally difficult to find a set of essential CVs and nontrivial to specify suitable CVs for describing the biological reactions of interest.

2

ACS Paragon Plus Environment

Page 3 of 31

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Other strategies based on Boltzmann and non-Boltzmann ensembles, such as the replica exchange MD (REMD),13 the multicanonical MD (McMD)14, and their variants, have also been proposed.15-19 These methods are known as generalized ensemble methods20 and have been widely used to find global minimum and metastable states existing in complicated free energy landscapes. The above studies indicate that it is desirable to develop enhanced sampling methods for reproducing biologically relevant rare events. To evade the conformational sampling problem arising from the limited timescale of conventional MD simulations, we have recently proposed a number of simple yet powerful conformational sampling methods. A series of the proposed methods has been summarized as a review.21 The proposed methods have been applied to several biological systems to investigate their biological functions.

22,23

In addition to the classical MD simulations, a reactive quantum mechanics/molecular

mechanics (QM/MM) simulation was coupled with the metadynamics for investigating enzyme reaction,24 indicating that a combination of different types of methods might be effective for reproducing the biologically relevant rare events as a multi-scale approach. Common to these methods is conformational resampling, which is performed by restarting multiple short-time MD simulations from different initial structures. Multiple short-time MD simulations are launched from a set of initial structures with renewed velocities at the same time. To promote structural transitions efficiently, conformational resampling is simply repeated. The key to efficiently perform these methods is to appropriately select initial structures, i.e., the initial structures should be selected as candidates with high potential to transit from current states to neighboring metastable states. Generally, the initial structures are characterized by a set of reaction coordinates (RCs). Therefore, a set of RCs should be appropriately specified in actual applications. For the review of our methods, see our feature article.21 In this study, we developed a new, efficient conformational sampling method, the structural dissimilarity sampling (SDS) algorithm, based on the same strategy proposed in our previous studies. In SDS, initial structures are selected to be structurally uncorrelated with each other and they were intensively resampled with multiple short-time MD simulations, which enables searching of unvisited conformational subspaces according to the cycles of conformational resampling. As the first demonstration of SDS, we examined the open-closed structural transitions of maltodextrin binding protein (MBP) in explicit water. Additionally, the present method was applied to protein folding of mini-proteins (chignolin, trp-cage, and villin) in implicit water. The present paper is outlined as follows. In section II, we provide the basic theory of SDS. We then discuss numerical results in sections III and IV. Finally, we summarize our findings in section V. II. Material and Methods A. Structural Dissimilarity Sampling (SDS) SDS employs conformational resampling from initial structures that are diversely distributed at the edge of a conformational subspace. Conformational resampling is defined as a cycle of short-time MD simulations restarting from the initial structures. In the previous work,25 we have numerically proven that 3

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 4 of 31

conformational resampling based on a cycle of restarting multiple short-time MD simulations might be powerful for promoting structural transitions. The key to this process is the selection of suitable initial structures for reproducing biologically relevant structural transitions of proteins. In SDS, the initial structures are diversely distributed at the edge of conformational subspace and selected from the diversely distribution. Due to the selections, conformational resampling from the initial structures might promote frequent structural transitions, enabling the search of unvisited conformational subspaces through the cycles of conformational resampling. At every cycle, the initial structures are updated by refereeing the accumulated trajectories, and the newly updated initial structures are broadly distributed onto the conformational subspace. SDS might therefore provide a broad conformational distribution once a set of starting structures for a given protein is specified at the first cycle. To perform SDS, a criterion must be defined for selecting a set of initial structures. One candidate criterion, principal component analysis (PCA),26,27 might be suitable for defining structural dissimilarity in a high-dimensional conformational subspace. Generally, the PCA characterizes the structural fluctuations of a given protein based on its vibrational modes. In terms of the mass-weighted coordinates, , the PCA is performed by diagonalizing a variance-corvariance matrix, , as follows: C ≡ 〈 − 〈〉  − 〈〉 〉 =   ,

(1)

 = √  , √  , √  , . .. ,   ,   ,   ,

(2)



where  =  − 〈〉 stands for the transposition of  ≡  − 〈〉, and  stands for the coordinates of a reference that determines the origin of a conformational distribution. 〈〉 is defined as an average over the snapshots generated by conformational resampling. As a result of diagonalization, principal modes (PMs)  ! " = 1, 2, … , 3' − 6 , where ' is the total are obtained as the following set of eigenvectors: PM number of degrees of freedom (DOF). Herein, six DOF for the external motions (translation and rotation) are removed when performing the PCA. In SDS, the origins of the conformational subspaces and PMs are simultaneously updated, i.e. the PCA is performed using the accumulated trajectories at every cycle. In addition to PMs, the PCA provides magnitudes of each mode as a set of eigenvalues of the variance-corvariance matrix, defined by Eq. (1) as follows: )* + " = 1, 2, … , 3' − 6 . By summing up the eigenvalues until the kth PM, the contribution of the accumulated PMs with respect to the total modes, ,- , can be defined as follows: ,- = ∑-/ * ⁄∑ / * .

(3)

This contribution also means that a partial set of PMs can cover the total fluctuations of a given protein in a  , PM 1 , …, PM - . If reduced low-dimensional conformational subspace spanned by a set of the PMs, (PM the value of contribution is sufficiently large, the subset of PMs can be a basis set sufficient for constructing a reduced conformational subspace. Therefore, the reduced conformational subspace can be defined by 4

ACS Paragon Plus Environment

Page 5 of 31

Journal of Chemical Theory and Computation -

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

678 setting an upper bound for PMs, 2345 , which is the lowest k that satisfies ,-678 = ∑ / * 9∑ / * >

cutoff, where cutoff is a threshold for the contribution.  , may be To select diverse initial structures, the inner products among principal coordinates (PCs), @  are defined as projections of the mass-weighted coordinates, , onto the appropriate criteria. Herein, @ PMs as follows:  = A , A1 , … , A- , @ 678 -B ∙  2 = 1, 2, … , 2345 , A- = PM

(4)

 projected onto the kth PM. Based on the PCs, the inner product (IPEF ) where A- is the kth element of @  E and @ F ) is defined as follows: between the Gth and the H th structures (@

-678 E F E @ F = ∑-/ IPEF = @ A- A- ,

(5)

F E and @ F , respectively. As the first step to select a set of where A-E and A- stand for the kth elements of @ initial (N) structures (I , I1 , … , I ), a snapshot should be specified as the first candidate. In this

demonstration, a set of multiple shot-time MD simulations is launched from a starting structure of a given protein to define an origin. Herein, an averaged structure is calculated in terms of the trajectories generated by the multiple short-time MD simulations and utilized as the origin. After determining the origin, norms are measured for all (M) snapshots from the multiple short-time MD simulations, i.e., √IP , √IP11 , … , and  E . After that step, √IPJJ . The snapshot with the largest norm is then specified as the first candidate, I = @ IPEK L = 1, 2, 3, … , M: L ≠ G are calculated, and the snapshot with the minimum value of inner products F . Inner products for all of the snapshots are then calculated, i.e., is selected as the second candidate, I1 = @ IPEK and IPFK L = 1, 2, 3, … , M: L ≠ G, H , and the sums of the inner products are considered as follows: IPEK + IPFK . The snapshot with the minimum value of this sum is selected as the third candidate,  W . For candidates for the kth initial structure, the above minKTE,F IP EK + IPFK ≥ MinL = V , i.e., I = @ procedure is simply repeated as follows: minKTE,F,…,X IPEK + IPFK + ⋯ IP XK ≥ MinL = Z ,

[ . I- = @

(6)

Conformational resampling is then launched with multiple short-time MD simulations starting from these initial structures, where initial velocities are reassigned based on the Maxwell-Boltzmann distribution. In SDS, the above initial structure selections and their conformational resampling are repeated as cycles. The flowchart of SDS is given in Figure 1(a). Figure 1(b) represents a conceptual diagram for selecting initial structures based on the structural dissimilarity defined by Eqs. (1-6).

5

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 6 of 31

B. System setup for demonstrations and computational details To demonstrate the conformational sampling efficiency, SDS was applied to maltodextrin binding protein (MBP) in explicit water to reproduce the structural transitions. MBP is a protein consisting of 370 residues and induces large-amplitude domain motions upon ligand binding. In this demonstration, apo-type simulations were considered without the ligand for simplicity. As the starting structures, the experimentally determined open (1OMP)28 and closed (1ANF)29 forms were taken from the Protein Data Bank (PDB). Both forms were solvated with the SPC/E water model30 and immersed in rectangular boxes. The minimum distances between the proteins and box boundaries were set to 10 Å. Eight sodium ions were replaced by water molecules to prepare electrically neutral systems. After solvation, the total numbers of atoms were 82,338 and 65,844 for the open and closed forms of MBP, respectively. MD simulations were performed under three-dimensional periodic boundary conditions using the AMBER ff03 force field.31 Water molecules were treated as rigid bodies with the SETTLE algorithm.32 The chemical bonds of the proteins were treated as rigid bodies with the LINCS algorithm.33 To model the equilibrated systems, NVT simulations were first performed and followed by NPT (1 bar and 300 K) simulations. NVT simulations were conducted with the modified Berendsen thermostat at 300 K.34 NPT simulations were then conducted with the Parrinello-Rahman method.35,36 The equations of motion were integrated by the leapfrog method. The time step length was set to 2 fs. Electrostatic interactions were calculated with the particle mesh Ewald method37 for a real space cutoff of 10 Å. The cutoff value for van der Walls interactions was set to 10 Å. MD simulations were performed with the GPU version of the Gromacs 5.0.7 package.38 To select the initial structures at the 1st cycle, short-time (100-ps) MD simulations were independently launched from the equilibrated snapshots of open and closed forms, where initial velocities were randomly assigned based on the Maxwell Boltzmann distribution. During conformational sampling, a hundred structures were selected as initial structures at every cycle, i.e. the total simulation time per cycle was 10 ns (100 initial structures × 100 ps MD simulations). The cycles of conformational resampling were continued until the 50th cycle, amounting to a total 500 ns of simulation time. In the present demonstration, the top thirty PMs were utilized to calculate the contribution defined by Eq. (3). The PMs were obtained by diagonalizing a variance-corvariance matrix constructed by the Cα coordinates, which could cover the total fluctuations of MBP with ~70% of the contributions. To evaluate conformational sampling efficiency, a set of conventional MD simulations under NPT (1 bar and 300 K) were launched from the open and closed forms and continued until 500 ns. In addition to the open-closed domain motions of MBP, SDS was applied to folding processes of fast-folding proteins, i.e. chignolin (10 residues and 138 atoms, PDBid 1UAO),39 trp-cage (20 residues and 304 atoms, PDBid 1L2Y),40 and villin (35 residues and 582 atoms, PDBid 1YRF).41 To include the solvent effect implicitly, the generalized born with surface area (GBSA) model was employed. Herein, the surface tension was 0.005 kcal/mol/Å2. MD simulations were performed with the AMBER ff03 force field31 under NVT (T = 300 K). The temperature of the systems was controlled at 300 K using the Berendsen thermostat.42 6

ACS Paragon Plus Environment

Page 7 of 31

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

To start the trials of SDS, fully extended configurations were modeled with tLeap module in the AMBER tools.43 Then, to select the initial structures at the 1st cycle, a hundred short-time (100-ps) MD simulations were launched from the modeled configurations with different initial velocities. During the cycles of conformational resampling, a hundred initial structures were selected at each cycle and resampled by 100-ps MD simulations. The cycles of conformational resampling were continued to the 10th cycle for chignolin and the 50th cycle for trap-cage and villin. Through all the demonstrations, the SDS trials were executed with an original interface developed by us. III. Numerical Results A. Structural transition from the open to closed states of MBP To illustrate the diffusions of conformational distributions, the trajectories were projected onto a conformational subspace spanned by the PMs. To characterize the structural transitions between the open and closed states, temporal PMs were defined and used for projecting the trajectories. To determine the temporal PMs, a set of trajectories generated by MD simulations starting from the open and closed states of MBP were joined, and a total of 20 ns (2 × 10 ns) of trajectories were utilized for the PCA. By this definition, the transition state between the open and closed states was located around the origin of a conformational subspace. To confirm whether the structural transition from the open to closed states was induced or not, the trajectories were projected onto a one-dimensional conformational subspace spanned by the 1st PM. Figure 2(a) shows the projections of trajectories at the 1st (red), 25th (green), and 50th (blue) cycles. Judging from Figure 2(a), the projection at the 1st cycle had a single peak around the open state. As the cycles went by, several peaks appeared in the conformational subspace at the 25th cycle, and the distribution had a sufficient overlap with the conformational subspace around the closed state, indicating that the structural transition from the open to closed states was successfully reproduced with SDS within the 25 cycles. After the 50th cycle, a broad conformational subspace covering the open and closed states was sampled (the blue boxes of Figure 2(a)). For a reference, the trajectory generated by the conventional long-time (500 ns) MD simulation was also projected onto the conformational subspace spanned by the 1st PM as shown in Figure 2(b). The same computational cost (500 ns) as the 50 cycles of SDS was employed to compare the conformational sampling efficiency. Judging from Figure 2(b), the conformational subspace searched by the 500-ns MD simulation starting from the open state was much narrower than that sampled by SDS during the 50 cycles. Herein, the 500-ns MD simulation clearly failed to reproduce the structural transition from the open to closed states because of trapping into the initial state, indicating the high conformational sampling efficiency of SDS than a conventional long-time MD simulation. To address how SDS sampled the closed state, initial structures selected at every cycle were also projected onto a two-dimensional subspace spanned by a set of Cα RMSDs measured from the open and closed states. Figures 3(a-f) show the projections of the initial structures and they were diffused as the cycles went by. For the reason why the conformations from the shorter MD simulations are more widely distributed than those of the longer MD simulation (for instance, Figure 3(a)), there might be several possibilities. 7

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 8 of 31

Actually, the initial structures at the 1st cycle were selected from multiple (100) trajectories generated by preliminary short-time (100-ps) MD simulations starting from the open/closed states. Therefore, several initial structures might be selected so that they were widely distributed in the conformational subspace at the begging, if the multiple trajectories fortunately included snapshots relevant to structural transitions. For detecting the target structure, the projection of the initial structures selected at the 20th cycle (Figure 3(c)) had sufficient overlap with the projection of the trajectory generated by a 10-ns MD simulation starting from the closed state, indicating that SDS successfully sampled the closed state without referring to any target structure. The minimum value of Cα RMSD measured from the X-ray crystal structure of the closed form (PDBid: 1ANF)36 sampled by SDS was 0.78 Å. The closest snapshot to the X-ray crystal structure was superimposed in Figure 4(a). As shown in Figures 3(d-f), the projections of the initial structures at the 30th, 40th, and 50th cycles were broadly distributed in the conformational subspace after reproducing the structural transition from the open to closed states within the 20th cycle. Judging from the projections of initial structures, SDS could search the broad conformational subspaces separated from the starting structures. Generally, it might be difficult to sample these high-energy conformational subspaces including transition states withe conventional MD simulations, showing the high conformational sampling efficiency of SDS. To confirm how SDS covered the conformational subspace during the 50 cycles, all the trajectories were projected (see Supporting Information, Figure S1(a)). Judging from the projection, SDS sufficiently covered the broad conformational subspace including the transitional area between the open/closed states. B. Structural transition from the closed to open states of MBP SDS was applied to reproduce the inverse structural transition from the closed to open states. A total of 50 cycles were repeated starting from the closed state. Figure 2(c) shows the projections of trajectories onto the 1st PM at the 1st (red), 25th (green), and 50th (blue) cycles. At the 25th cycle, SDS sufficiently covered the conformational subspace of the open state, since the distribution at the 50th cycle (the blue boxes of Figure 2(c)) had sufficient overlap with the distribution of trajectory generated by the 500-ns MD simulation starting from the open state (see Figure 2(b)), where the minimum value of Cα RMSD measured from the X-ray crystal structure of the open form (PDBid: 1OMP)35 was 0.89 Å. The closest snapshot to the X-ray crystal structure sampled by SDS was superimposed in Figure 4(b). As shown in Figure 2(d), the 500-ns MD simulation reproduced the structural transition from the closed to open states. Herein, the closed state tends to be unstable without the ligand, leading to the stabilization of the open state. However, the transition state was not yet sufficiently sampled, as shown in Figure 2(d), and a single structural transition from the closed to open states was reproduced. In contrast, SDS could sufficiently sample the conformational subspace including the transition state during the cycles. To address the diffusions of initial structures, these structures were projected onto the conformational subspace spanned by a set of Cα RMSDs measured from the open and closed states, as shown in Figures 3(g-l). Similarly to the structural transition from the open to closed states, the projections of the initial 8

ACS Paragon Plus Environment

Page 9 of 31

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

structures diffused according to the cycles. In particular, the projection of initial structures at the 50th cycle widely diffused from the initial state. The computational results from the inverse structural transition suggest that SDS could reproduce both structural transitions between the open and closed states, also indicating that the present method might be valuable for efficiently promoting the structural transitions of a given protein with relatively low (nanosecond-order) computational cost. For the conformational subspace searched by the 50 cycles, Supporting Information (Figure S1(b)) shows projections of all the trajectories onto the conformational subspace spanned by the set of Cα RMSD measured from the open and closed forms. Judging from the projections, SDS also sufficiently covered the transitional area between the closed and open states, indicating the high conformational sampling efficiency of SDS. C. Protein folding of chignolin In the application to protein folding of chignolin, SDS selected a hundred initial structures at every cycle and the cycle of conformational resampling was continued to the 10th cycle, amounting to a total 100 ns of simulation time. To confirm whether SDS sampled the native state or not, distributions for Cα RMSD measured from the PDB structure were monitored. Figure 7(a) shows the profiles of distributions for Cα RMSD versus the cycles. Judging from the profiles, the populations around the native state (Cα RMSD ~ 1.0 Å) gradually increased as the cycles went by. Finally, the minimum value of Cα RMSD was 0.20 Å after the 10th cycle. Herein, the closest snapshot to the PDB structure was superimposed in Figure 4(c). In the present demonstration, SDS successfully sampled the native state during the 10 cycles. To compare the conformational sampling efficiency, we referred to an enhanced conformational sampling method called “multiscale essential sampling (MSES)”.44 In the MSES, a model of full DOF (protein atoms and explicit solvent) are coupled with the accelerated dynamics of the essential subsystem represented by the structural dynamics using a coarse-grained model. To enhance conformational sampling of the full-dimensional DOF, the different DOFs are coupled with a different set of Hamiltonians. Finally, the Hamiltonian exchange method15,16 was applied to remove the biased conformational sampling from the coupling term. In the preceding study,44 the MSES was applied to chignolin to estimate the folding energy landscape. In the actual demonstration, six replicas were prepared to couple a different set of Hamiltonians with the both implicit and explicit solvent for accelerating protein folding of chignolin. As the computational cost, 100 ns simulations were performed for each replica, amounting to a total 600 ns of simulation time.44 Compared to the results reported in the refrence,44 SDS took the almost same computational cost (submicrosecond-order simulation time, i.e. 100 ns in SDS versus 600 ns in the MSES) to sample the native state. Herein, it should be noted that SDS needs post-processing treatments for free energy calculations in terms of the generated trajectories, which would be done by the umbrella sampling 45,46 or the Markov state model,47 although the MSES method can directly provide free energies based on reweighting through the Hamiltonian exchange method. However, the post-processing treatment might be relatively valid to calculate 9

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 10 of 31

free energies, since a multi-scale free energy calculation method combined with the umbrella sampling successfully estimated the folding free energy landscape on chignolin with 200 ns simulation time in our previous study,48 indicating that the total computational time for estimating free energies with the post-processing treatment such as the umbrella sampling might be the almost same as the MSES (600 ns). To confirm the conformational area searched by SDS, all the snapshots were projected onto a two-dimensional subspace spanned by a set of hydrogen bond distances of the backbone (Asp3N-Gly7O, Asp3N-Thr8O in Figure 7(b)). The hydrogen bond distances characterize protein folding of chignolin.44,48,49 Judging from the projections, both SDS and the MSES sampled the almost same conformational subspace including the native and mis-folded states, indicating the validity of these methods (compare Figure 7(b) and Figure 2(b) in the preceding study).44 In particular, the highly populated states of Figure 7(b) well correspond to the local and global minima in the free energy landscape under the implicit solvent estimated by the MSES.44 D. Protein folding of trp-cage and villin In contrast to protein folding of chignolin, the applications to larger systems (trp-cage and villin) might be difficult because it is not sure that if the collective modes identified by the PCA represents local formations of secondary structures in protein folding, although the PCs might be sufficient for representing a simple formation of β-sheet for chignolin. Therefore, an alternative set of reaction coordinates defined in the preceding study50 was adopted, i.e. a set of partial Cα RMSDs measured from a native structure that can represent local formations of secondary structures. To define the partial Cα RMSDs, the native structures of trp-cage and villin were divided into two segments by referring to the secondary structures of native states. For trp-cage, the native structure was divided into segment1 (the helical region, the 2-11 residues) and segmetn2 (the remaining region, the 8-19 residues) by overlapping the intermediate loop region (the 8-11 residues). For villin, the native structure was divided into segment1 (the first and second helices, the 3-21 residues) and segment2 (the second and third helices, the 15-33 residues) by overlapping the second helix (the 15-21 residues). The overlapped regions between segment1 and segment2 ensure the complete folding when the both segments fold. The partial Cα RMSDs were defined by Cα atoms of each segment and adopted as RCs. For the divisions of segments, refer to Supporting Information (Figure S8). In terms of the partial Cα RMSDs, distributions on the two-dimensional conformational subspace were generated and initial structures were selected from the two-dimensional distributions. To select initial structures, a similar way in the PC subspace was adopted. In the present applications, the high-dimensional distribution in the PC subspace as used in the previous demonstrations were replaced with the two-dimensional distribution, (RMSD1, RMSD2), where RMSD1 and RMSD2 mean the partial Cα aaaaaaaaaa, RMSD for segmetn1 and segment2. In terms of the two-dimensional distributions, an average, (RMSD1 aaaaaaaaaa), was calculated and utilized as an origin of the two-dimensional subspace, i.e. (RMSD1 − RMSD1 aaaaaaaaaa, RMSD2 aaaaaaaaaa). Then, a norm was calculated for each snapshot, and a snapshot with the largest norm RMSD2 − RMSD2 was selected as the 1st initial structure. For the remaining initial structures, subsequent snapshots were 10

ACS Paragon Plus Environment

Page 11 of 31

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

selected based on the sum of inner products defined by Eq. (6) using the two-dimensional distribution, aaaaaaaaaa, RMSD2 − RMSD2 aaaaaaaaaa). (RMSD1 − RMSD1 Figure 8 shows the profiles of distributions in the suibspace spanned by the set of partial Cα RMSDs with the projections of initial structures selected at each cycle, i.e. Figures 8(a-f) for trp-cage and 8(g-l) for villin. Judging from the projections, the edges of distributions were selected as the initial structures in the subspace. The distributions were expanded to the conformational area of the native states according to the cycles, and SDS successfully sampled the native structures. The simulation time to sample the native states (Cα RMSD < 1.0 Å) were 70 ns (the 7th cycle) for the trp-cage and 230 ns (the 23th cycle) for villin. The snapshots with the minimum Cα RMSDs were superimposed with the PDB structures (Figures 4(d-e)). Herein, we would like to discuss the conformational sampling efficiencies for the applications to protein folding. For trp-cage, in our previous study, a conventional (500-ns) MD simulation at 300 K was started from the fully extend structure51 and it was demonstrated that the relatively long-time MD simulation failed to sample the native state in implicit solvent, indicating that it might be difficult to find the native state of trp-cage with the conventional MD simulation. In contrast, SDS successfully sampled the native state and the broad conformational subspace with the same computational cost (50 cycles, i.e. 500 ns), as shown in Figures 8(a-f). For villin, the conformational area searched by SDS during the 50 cycles was almost consistent with the computational result from the µs-order (an aggregated 8.0 µs) replica exchange MD in implicit solvent.50 In the present applications, the geometry of secondary structures of native states were utilized to define the partial Cα RMSDs for segments, meaning that our simulations did not rigorously correspond to blind predictions of the native states. Therefore, toward the blind predictions, one might need to propose an appropriate set of RCs for representing protein folding, especially the local formations of secondary structures. Generally, it might be difficult to characterize protein folding with the PCA, since locally correlated motions relevant for the formations of secondary structures might not be identified by the PCA. However, α helix and β sheet, and loop regions are empirically deuced from amino-acid sequences. By incorporating the secondary structure information to the local formation of the component might be efficient for protein folding processes of unknown structures. This extension will be performed in our future work. IV. Discussion A. Conformational subspaces sampled by SDS after reproducing structural transitions After reproducing the structural transitions between the open and closed states, conformational resampling was initiated from outer conformational subspaces separated from the metastable states (see Figures 3(c-f) and 3(i-l)). These initial structures might be energetically unfavorable states. Generally, conformational resampling from initial structures with high energies tends to converge to one of the neighboring local minima. This result occurs because the conformational subspaces searched by SDS do not keep expanding toward outer conformational subspaces. Finally, conformational sampling by SDS tends to

11

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 12 of 31

converge to relatively valid conformational subspaces after reproducing biologically relevant structural transitions as the cycles continue. SDS does not continuously sample conformational subspaces with extremely high energies after reproducing structural transitions, because conformational resampling is based on regenerations of initial velocities in restarting multiple short-time MD simulations at 300 K, i.e., conformational sampling in SDS can be performed at room temperature (300 K), leading to more rapid convergence than other enhanced sampling methods, such as the REMD13 and the McMD14, that are based on MD simulations at high temperatures. These enhanced sampling methods might needlessly oversample conformational subspaces with high energies, resulting in slow convergence upon actual application. B. Conformational sampling efficiency depending on dimensionality of principal modes To address the dependence of conformational sampling efficiency on the dimensionality of PMs, the upper bound of the contribution defined by Eq. (3) was changed to 10 PMs from 30 PMs tested in section III. According to the reduced (10) PMs, additional trials were performed starting from the open and closed forms of MBP. To determine the speeds for reproducing structural transitions between the open and closed states, profiles of minimum values of Cα RMSD measured from the open and closed forms were monitored (Figures 5(a-b)). These figures indicate that the minimum values of Cα RMSD gradually converged within 1.0 Å during the first 25 cycles for the open-closed transitions. These profiles indicate that the speeds for reproducing the structural transitions were nearly identical between the both trials using the different sets of PMs (30 PMs and 10 PMs), concluding that differences in the upper bound of the contribution defined by Eq. (3) do not significantly affect the conformational sampling efficiency. Next, we address how the contributions of PMs defined by Eq. (3) changed according to the cycles. To perform SDS efficiently, it might be meaningful to confirm if a set of PMs is sufficient for describing the overall conformational subspace. Figures 5(c-d) show the profiles of the contributions of PMs defined by Eq. (3) for the trials using the different sets of PMs, where the red and blue lines correspond to the trials using the 30 PMs and 10 PMs. As shown in Figures 5(c-d), the contributions of PMs defined by Eq. (3) rapidly increased during the first few cycles starting from low values and converged to around one, showing that an initial set of PMs was sufficient for describing the overall conformational subspaces. The contributions of PMs rapidly increase according to the cycles, indicating that the upper bound of the contribution defined by Eq. (3) does not significantly affect the conformational sampling efficiency. Instead of PMs, one might simply go with the inner products in the configuration space without the operation of the PC transformation. However, the computational cost to calculate the inner products of all possible pairs of structures in trajectories increase, if the configuration space was considered in terms of all the atomic coordinates. Judging from the computational cost, the dimensional reduction by the PC transformation might be convenient. Furthermore, structurally uncorrelated initial structures are efficiently selected by the inner product defined by PCs, since an accumulation of structural changes due to small

12

ACS Paragon Plus Environment

Page 13 of 31

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

vibrational motions exceeds that of large amplitude motions resulting in a decrease of sampling efficiency for the structural transition. To confirm a conformational sampling efficiency using inner products between atomic configurations (Cα coordinates), an additional trial of SDS (Ninitial = 100, DOFs = 3 × 370) was performed starting from the open form of MBP. Supporting Information (Figure S2) shows profiles of the minimum RMSD from the closed form (RMSDclosed) during the 20 cycles. As shown in Figure S2, the additional trial in terms of the inner products between atomic configurations failed to sample the closed form of MBP (green), while the original SDS in terms of the inner products of the PCs successfully sampled the closed form of MBP within RMSDclosed < 1.0 Å (red). Therefore, it might be better to utilize a set of PCs, although it is possible to employ the atomic coordinates. Generally, it might be difficult to specify a comprehensive set of CVs. In the present study, the 30 PMs might be sufficient for reproducing the structural transitions of MBP. However, it does not ensure that if the set of 30 PMs was an optimized set of CVs. Compared to the full-atomic coordinates, the PCs reduced by the PCA are surely convenient for saving the computational costs. For a more efficient conformational sampling, a combination of CV search algorithm with SDS might be feasible. As an automatic specification of CVs, we recently proposed an efficient algorithm based on clustering of distributions projected onto a conformational subspace.52 In the proposed algorithm, an initial set of CVs is dynamically changed during conformational resampling to search an optimized set of CVs automatically, resulting in finding degrees of freedom (DOFs) for slow dynamics. To specify an exhaustive and comprehensive set of CVs, a combination of the efficient CV search algorithm with SDS might be valuable. C. How many initial structures are necessary for the cycles of conformational resampling To define how many initial structures can be considered to be sufficient for conformational sampling, average structures are constructed from different numbers of initial structures, and then the values of RMSDs were monitored. Supporting Information (Figure S3) shows a profile of RMSD between the subsequent average structures according to the numbers of initial structures used for the constructions. As shown in S3, the value of RMSD rapidly decreased within the 20 initial structures and well converged even if 50 initial structures were considered, indicating the small number of initial structures might be sufficient for performing SDS. To confirm the conformational sampling efficiency compared to the original trial (Ninitial = 100 and DOFs = 30 PCs), an additional trial (Ninitial = 50 and DOFs = 30 PCs) was started from the open form of MBP and continued to the 50th cycle. Supporting Information (Figure S4(a)) shows a profile of the minimum Cα RMSD measure from the closed form (RMSDclosed) during the 50 cycles, indicating that the small number of initial structures might be sufficient for promoting the structural transitions from the open to closed states (RMSDclosed ~ 1.0 Å). Supporting Information (Figure S4(b)) shows the projection of accumulated trajectories during the 50 cycles. Compared to the projections of accumulated trajectories from the original trial (Ninitial = 100 and DOFs = 30 PCs), the additional trial (Ninitial = 50 and DOFs = 30 PCs) 13

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 14 of 31

covered almost the same conformational area (Figures S1(a) and S4(b)). Therefore, the minimum number of initial structures might be determined by estimating RMSDs among average structures constructed from a different number of initial structures selected at each cycle. D. Comparisons of conformational sampling efficiency between SDS and the previous methods In the previous methods, initial structures were selected by several measures. For instance, the non-targeted Parallel Cascade Selection MD (nt-PaCS-MD)53 utilizes fluctuations of proteins to select initial structures in terms of deviations from an average structure. As a common issue for selecting initial structures, our previous methods tend to select similar snapshots because it might be difficult to reflect structural information rigorously. Thus, SDS is essentially different from our previous methods because the current method focuses on structural dissimilarity in selecting initial structures. Therefore, initial structures might be widely distributed onto a conformational subspace than our previous methods, which promotes structural transitions of proteins through conformational resampling from the widely distributed initial structures. To quantitatively show a difference of conformational sampling efficiency between SDS and the nt-PaCS-MD, it was confirmed that how initial structures were widely distributed around their average structures by defining structural dissimilarity for selected initial structures, which was defined as follows: An average structure was constructed from selected initial structures at every cycle. Then, RMSDs measured from the average structure were calculated for each initial structure, and the values of RMSD were averaged. Herein, the average RMSD was defined as structural dissimilarity for characterizing how the selected initial structures were widely distributed around the average structures. Figure 6(a) shows a profile of the structural dissimilarities for the trials of SDS (the red, green, and blue lines) and nt-PaCS-MD (the magenta line) starting from the open form of MBP. Judging from the profiles, the former rapidly increased. In contrast, the latter failed to increase, indicating that SDS widely distributed initial structures around the average structures than the nt-PaCS-MD. Therefore, SDS might efficiently promote structural transitions based on the algorithm proposed in the present study. As shown in Figure 6(a), structural dissimilarity increased in the trials of SDS using a large number of initial structures. In particular, structural dissimilarity of the trial (Ninitial = 100) rapidly increased at the early cycles and continued to increase for 50 cycles (the red line). In contrast, that of the trial (Ninitial = 10) almost linearly increased according to the cycles (the blue line). Interestingly, that of the trial (Ninitial = 50) first overlapped with blue line (Ninitial = 10), and then it caught up with the red line (Ninitial = 100) at the 30th cycle and showed the same tendency after there. For the normalized computational cost (50 ns) of the trials using 10, 50, 100 initial structures, the RMSD values and their standard deviations were as follows: 2.5±0.8 Å (at the 50th cycle), 1.4±0.4 Å (at the 10th cycle) and 1.3±0.4 Å (at the 5th cycle), respectively. Herein, the trials using the small number of initial structures (Ninitial = 10) had the larger RMSD among all the accumulated times as shown in Figure 6(b), and those using large numbers of initial structures (Ninitial = 50 and Ninitial = 100) showed almost the same tendency. Therefore, for an efficient conformational sampling, the trial (Ninitial = 10) is performed during a first few tens of cycles (for example, 20 ns) and then switched to the 14

ACS Paragon Plus Environment

Page 15 of 31

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

trial (Ninitial = 50 or 100) because the both trials showed the same conformational sampling efficiency at the same computational cost. As a possible strategy, a number of initial structures might be dynamically changed depending on cycles. For instance, one launches SDS with a small number of initial structures and switches the number of initial structures to a large number at appropriate cycles by referring to structural dissimilarity among initial structures. E. Convergence of the conformational sampling It is difficult to judge if conformational resampling sufficiently covered accessible phase spaces rigorously. As a way to confirm the statistical convergence, it might be convenient to count finite cells in a conformational subspace searched by SDS. To define the finite cells, the conformational subspace is discretized into grids. If the number of grids converged to a constant, the statistical converge is attained, and one can terminate the cycle of conformational resampling. Otherwise, one should repeat the cycle of conformational resampling until the number of grids sufficiently converged. For checking the statistical convergence, the grids were counted for the trials of SDS (Ninitial = 100 and DOFs = 30 PCs) starting from the open and closed forms of MBP. Supporting Information (Figure S5(a)) shows profiles of the number of grids during the 50 cycles, indicating that the conformational sampling well converged at the 50th cycle because the numbers of grids became constants. To further judge convergence of the conformational sampling, the minimum and maximum potential energies of initial structures might also be one of useful indicators. Herein, Supporting Information (Figure S5(b)) shows the profiles of minimum and maximum values of the initial structures selected at each cycle for the trial of SDS starting from the open form of MBP (Ninitial = 100 and DOF = 30 PCs). Judging from the profiles, the minimum potential energy converged after the 5th cycle and almost unchanged afterward, indicating that SDS sufficiently sampled structures with low potential energies. For the maximum potential energy, the profile gradually increased, indicating that SDS might oversample structures with high potential energies after the target structure was sampled. Therefore, monitoring the minimum and maximum potential energies in addition to the conformational area might be valuable to judge if the cycle of conformational sampling is continued. In SDS, the conformational sampling is basically based on short-time MD simulations restarting from selected initial structures, meaning that some unselected structures might not be sampled at later cycles, i.e. the present method might be biased to some structures in the selection process. In principal, SDS selects rarely occurring states as initial structures based on structural dissimilarity indices. Therefore, the conformational sampling efficiency might depend on the arrangement of selected initial structures. To validate the conformational search, one might need to launch distinct trials of SDS initiating from different sets of initial arrangements and check both conformational areas and energy profiles as described before. F. Dependency of the number of principal components

15

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 16 of 31

In SDS, the number of PCs, kmax, is also an important parameter for performing the efficient conformational sampling. Generally, the PCs corresponding to large eigenvalues represent large-amplitude of transitions as collective motions. In actual applications, one has to set a large number of PCs that includes the large-amplitude collective motions by referring to the contribution of PCs (Figures 5(c-d)). However, some PCs associated with small amplitudes sometimes play key roles in initiating structural transitions. In the present demonstration, the open-closed transitions of MBP might correspond to large-amplitude motions, and fortunately, the PCs chosen in this work well represented the collective motions. However, kmax might be extremely large for systems with drastic conformational changes that are not described by the PCA, since that is merely a harmonic approximation to the conformational changes around an averaged structure. Large kmax may result in an increase of conformational space volume to be explored that makes the convergence of SDS slow. Thus, alternative analytical and/or numerical methods might be considered instead of the PCA to identify the locally correlated motions that are difficult to be represented with the PMs such as local folding of secondary structures in a given protein. To identify the locally correlated motions, several analytical methods have been proposed so far. For instance, the independent component analysis (ICA)54-56 and the relaxation mode analysis (RMA)57,58 might be suitable for identifying essential modes for representing the small amplitudes related to the locally correlated motions. In relation to the secondary structure information as noted in the previous section, the partial PCA for each segment (α helix and β sheet, and loop regions) might be also helpful to find local conformational changes at the early stage of protein-folding process, i.e. the secondary structure formations. Therefore, the essential modes identified by the ICA, the RMA, and the partial PCA might be utilized to define relevant conformational subspaces to include the locally correlated motions. In the present demonstrations for Trp-cage and villin, we have adopted the partial RMSDs in SDS as RCs and succeeded in finding the native structures. Since the partial RMSDs are not always known and not optimal for arbitrary proteins, one might learn by trial and error for the optimal choice of CVs deduced from these methodologies and their combinations. As numerical methods, machine learning techniques might be powerful for finding an optimal choice of CVs.59-61 To realize a highly accurate blind prediction of possible structures of proteins, SDS with dynamically variable CVs using some of above methods are desirable. G. The signal-to-noise ratio for confirming plausible structural transitions In the present demonstrations, a target structure of transition was already known, i.e. it is possible to judge if the trajectories in the cascading short-time MD simulations included the successful transition. However, the target structure is generally unknown, and thus it is necessary to confirm if plausible structural transitions were promoted in the cascading MD trajectories. In this regard, the signal-to-noise ratio (S/N) of structural transitions in the multiple cascading trajectories would be a critical property. Therefore, it might be meaningful to provide a profile of percentage of the target structure over all the snapshots, which could be defined as snapshots within Cα RMSD measured from the target structure (RMSDtarget < 2.0 Å) in the cascading MD trajectories. Supporting Information (Figure S6) shows profiles of the S/N for (red) the open 16

ACS Paragon Plus Environment

Page 17 of 31

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

to closed and (green) the closed to open transitions of the trails of SDS (Ninitial = 100 and DOFs = 30 PCs). As shown in S6, the values of S/N increased during the first few cycles, indicating that SDS successfully produced the plausible structural transitions to the target structures. Herein, we would like to discuss the non-converging behavior of the decreasing S/N rations. The non-converging behavior indicates that one may encounter a difficulty in identifying whether the target structure is sampled or not where it is unknown a priori, since there is no way to judge if the plausible structures corresponds to the target structure or just an intermediate one. For this issue, clustering of trajectories might characterize highly populated structures as candidates of the plausible structures. Additionally, unbiased MD simulations are expected to restart from the candidates for evaluating their structural stabilities as identifications of the target structures. One has to note that the non-converging behavior of S/N rations might hold an inherent feature that SDS tries to sample wider conformational spaces after the conformational sampling has achieved the target state. Actually, in our demonstrations, SDS sampled the wider conformational subspaces after the target structures were already sampled (see Figures 3(e-f), 3(k-l)). Therefore, one needs to confirm convergence of conformational distributions (Figure S5) in addition to the checking of highly populated structures during the cycles. As a more quantitatively approach, free energy calculations might be reasonable to judge if the highly populated structures correspond to the target structure. H. How reassignments of initial velocities affect thermal distributions It is important to mention about effects caused by the reassigned initial velocities in restarting short-time MD simulations. In SDS, initial structures might be selected from non-equilibrium trajectories due to the reassignments of initial velocites, which would be related to the physical nature of the (de-) accelerating a process of the conformational sampling. Supporting Information (Figure S7) shows a distribution of time at which initial structures were selected from the shot-time (100-ps) trajectories of the trial for MBP starting from the open form (Ninitial = 100 and DOFs = 30 PCs). Judging from the distribution, initial structures were intensively selected from the first 20 ps, indicating that the reassignments of initial velocities might impose perturbation to the thermal distribution. However, average temperatures during the MD simulations were well controlled. Actually, the average temperature was 299.8 K (±1.3 K) for the trial, meaning that the reassignments of initial velocities might not significantly destroy the original distribution. However, to obtain a thermal distribution, one might combine the Markov State Model with SDS because the present method does not rigorously provide a thermal distribution. To obtain the thermal distribution, it might be noted that the first few picosecond snapshots including non-equilibrium processes might be excluded in constructing a transitional matrix of the Markov State Model. Finally, the thermal distribution might be estimated by solving an eigenvalue problem through diagonalization of the transition matrix. In addition to the reassignments of initial velocities, we would like to discuss a simulation length of MD simulation. For stable proteins, a short-time MD simulation may only sample a small area around the 17

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 18 of 31

free energy minima, finding structures that are highly similar with each other and also similar to the starting structure. This is because one has to set an appropriately long simulation length in conformational resampling. To set the optimal simulation length for SDS, one might address histograms of initial structures for time, i.e. they were selected from which timescales. In the present demonstrations, the 100-ps MD simulations were performed as conformational resampling. Judging from the histogram (Figure S7), most of initial structures were selected from the first 20 ps, indicating that the 100-ps MD simulations might be sufficiently long for promoting the structural transitions in the simulated systems. However, one needs to take longer time than 100 ps depending on biological targets, i.e. one has to carefully address an optimal simulation length. Ideally, one needs to perform preliminary MD simulations with different simulation lengths before production runs to determine optimal simulation lengths. For this issue, we have already discussed in our previous study (see the reference62 in detail). V. Conclusion In the present study, SDS was proposed as an efficient conformational search method for promoting structural transitions of proteins. In SDS, the following conformational resampling is repeated as follows: (I) initial structures are selected for diverse distributions at the edges of a conformational subspace, and (II) structural transitions are promoted through the restarting of multiple short-time MD simulations from them. Together, cycles of (I) and (II) lead to the intensive promotions of structural transitions. As a demonstration, SDS was applied to MBP in explicit solvent to reproduce the large-amplitude domain motions between the open and closed states. SDS successfully reproduced these biologically relevant rare events within nanosecond-order simulation times. In more detail, starting from the open and closed forms, SDS successfully reproduced the structural transitions between these metastable states within 25 cycles (a total 250 ns of simulation time). For reference, conventional long-time (500 ns) MD simulations under NPT (300 K and 1 bar) were started from the open and closed forms, and their conformational areas were compared. For the conformational sampling efficiency, the subspaces searched by SDS were broader than those searched by the conventional 500-ns MD simulations, indicating that SDS can reproduce the structural transitions of a given protein with reasonable computational costs without referring to any target structure. In particular, the 500-ns MD simulations starting from the open form failed to reproduce the structural transition within 500 ns. In contrast, SDS could successfully reproduce the structural transition with less simulation time (250 ns). In addition to the open-closed motions of MBP, SDS was applied to folding processes of the fast-folding proteins (chignolin, trp-cage, and villin) and successfully sampled their native states within nanosecond-order simulation time (500 ns). Acknowledgement This work was supported by Japan Society for the Promotion of Science (JSPS) Research Fellowship for Young Scientist (No. 15J03797) and JSPS KAKENHI Grant-in-Aid for Young Scientists (A) (No. 18

ACS Paragon Plus Environment

Page 19 of 31

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

16H06164), and by Grant-in-Aid for Scientific Research of the Innovative Areas “Dynamic Ordering and Biofunction”, “Photosynergetics”, and “3D Active-Site Science” (Nos. 26102525, 26107004, and 26105012) from JSPS. The computations were performed using the COMA / HP-PACS at the CCS, University of Tsukuba. Supporting Information Assessments of the conformational sampling efficiencies for the trials of SDS with different computational conditions, including Figures S1−S8. This information is available free of charge via the Internet at http://pubs.acs.org

19

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 20 of 31

Figure 1 (a)

Flowchart of SDS. Initial structures (seeds) for the 1st cycle (seed-1, seed-2, … , and seed-N) are selected from initial trajectories based on structural dissimilarity. Multiple short-time MD simulations are started from the seeds via renewed initial velocities. Multiple trajectories generated by conformational resampling, (Traj-1, Traj2, … , Traj-N), are gathered and used to distribute (N) seeds for the next cycle. To promote structural transitions for a given protein, cycles of conformational resampling are repeated. (b)

Conceptual diagram for selecting (N) initial structures in SDS. A case for selecting five initial structures (N = 5) among all structures (M = 8) is shown. After defining the origin as an average structure, a snapshot with the maximum norm (I1) is selected as the 1st initial structure. The following initial structures (I2, I3, I4, I5) are sequentially selected based on structural dissimilarity. The sum of inner products between PCs, defined by Eqs. (1-6), are used for the sequential selections of initial structures.

20

ACS Paragon Plus Environment

Page 21 of 31

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Figure 2

(a) Histograms of trajectories starting from the open form of MBP as a function of the 1st PC (PC1) at the 1st (red), 25th (green), and 50th (blue) cycles. (b) Histogram of the trajectory generated by the 500-ns MD simulation starting from the open form of MBP. (c) Histograms of trajectories starting from the closed form of MBP at the 1st (red), 25th (green), and 50th (blue) cycles. (d) Histogram of trajectory generated by the 500-ns MD simulation starting from the closed form of MBP.

21

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 22 of 31

Figure 3

(a-f) Projections of initial structures selected at each cycle starting from the open form of MBP. A set of Cα RMSDs measured from the open and closed forms was utilized for defining the two-dimensional conformational subspace. The blue dots correspond to the projections of initial structures selected at the (a) 1st, (b) 10th, (c) 20th, (d) 30th, (e) 40th, and (f) 50th cycles. The red and green dots correspond to projections of the trajectories generated by the 500-ns and 10-ns MD simulations, respectively, starting from the open and closed forms of MBP. (g-l) Projections of seeds selected at each cycle starting from the closed form of MBP. The blue dots correspond to projections of seeds selected at the (a) 1st, (b) 10th, (c) 20th, (d) 30th, (e) 40th, and (f) 50th cycles. The red and green dots correspond to projections of the trajectories generated by the 10-ns and 500-ns MD simulations, respectively, starting from the open and closed forms of MBP.

22

ACS Paragon Plus Environment

Page 23 of 31

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Figure 4

(red) Snapshots with the minimum values of Cα RMSD measured from (a) the closed form of MBP (0.78 Å), (b) the open form of MBP (0.89 Å), (c) the native structure of chignolin (0.20 Å), (d) the native structure of trp-cage (0.38 Å), and (e) the native structure of villin (0.64 Å), respectively. (blue) Each snapshot was the PDB structures. All the figures were created with VMD.63

23

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 24 of 31

Figure 5

Profiles of the minimum Cα RMSDs starting from the (a) open and (b) closed forms of MBP. (c) Profiles of the contributions of PMs defined by Eq. (3) starting from the (c) open and (d) closed forms of MBP. The red and green lines in (a-d) represent the profiles of distinct trials performed in the different conformational subspaces spanned by the top 30 and 10 PMs, respectively.

24

ACS Paragon Plus Environment

Page 25 of 31

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Figure 6

(a) Profiles of structural dissimilarity of initial structures for the trials of SDS starting from the open form of MBP in the 30 PCs subspace using (red) 100, (green) 50, and (blue) 10 initial structures. (magenta) Structural dissimilarity of initial structures for the nt-PaCS-MD using 50 initial structures. (b) Profiles of structural dissimilarity of initial structures versus the accumulated simulation time until 50 ns. The trials of SDS starting from the open form of MBP in the 30 PCs subspace using (blue) 10, (green) 50, and (red) 100 initial structures.

25

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 26 of 31

Figure 7

(a) Profiles of histograms projected onto Cα RMSD measured from the native structure of chignolin, calculated by the accumulated trajectories until the 6th (blue), 8th (green), and 10th (red) cycles. (b) Minus logarithmic plot of the population of accumulated trajectories until the 10th cycle projected onto the two-dimensional conformational subspace spanned by the hydrogen bond distances of backbone, (Asp3N-Gly7O, Asp3N-Thr8O). The locations of native and mis-folded states of chignolin are pointed with arrows.

26

ACS Paragon Plus Environment

Page 27 of 31

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Figure 8

(red) Projections of accumulated trajectories of (a-f) trp-cage and (g-l) villin onto the two-dimensional subspace spanned by the set of partial Cα RMSDs for each segment, until the 5th (a, g), the 10th (b, h), the 15th (c, i), the 20th (d, j), the 25th (e, k), and the 50th (f, l) cycles. (blue) Projections of initial structures selected at each cycle.

27

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 28 of 31

References (1)

Shaw, D. E.; Maragakis, P.; Lindorff-Larsen, K.; Piana, S.; Dror, R. O.; Eastwood, M. P.; Bank, J. A.; Jumper, J. M.; Salmon, J. K.; Shan, Y. B.; Wriggers, W. Science 2010, 330, 341-346.

(2)

Lindorff-Larsen, K.; Piana, S.; Dror, R. O.; Shaw, D. E. Science 2011, 334, 517-520.

(3)

Mashimo, T.; Fukunishi, Y.; Kamiya, N.; Takano, Y.; Fukuda, I.; Nakamura, H. J. Chem. Theory Comput. 2013, 9, 5599-5609.

(4)

Laio, A.; Parrinello, M. Proc. Natl. Acad. Sci. U. S. A. 2002, 99, 12562-12566.

(5)

Laio, A.; Gervasio, F. L. Rep. Prog. Phys. 2008, 71.

(6)

Raiteri, P.; Laio, A.; Gervasio, F. L.; Micheletti, C.; Parrinello, M. J. Phys. Chem. B 2006, 110, 3533-3539.

(7)

Babin, V.; Roland, C.; Sagui, C. J. Chem. Phys. 2008, 128, 134101.

(8)

Darve, E.; Rodriguez-Gomez, D.; Pohorille, A. J. Chem. Phys. 2008, 128, 144120.

(9)

Abrams, C. F.; Vanden-Eijnden, E. Proc. Natl. Acad. Sci. U. S. A. 2010, 107, 4961-4966.

(10)

Amadei, A.; Linssen, A. B. M.; deGroot, B. L.; vanAalten, D. M. F.; Berendsen, H. J. C. J. Biomol. Struct. Dyn. 1996, 13, 615-625.

(11)

Zhang, Z. Y.; Shi, Y. Y.; Liu, H. Y. Biophys. J 2003, 84, 3583-3593.

(12)

Hamelberg, D.; Mongan, J.; McCammon, J. A. J. Chem. Phys. 2004, 120, 11919-11929.

(13)

Sugita, Y.; Okamoto, Y. Chem. Phys. Lett. 1999, 314, 141-151.

(14)

Nakajima, N.; Nakamura, H.; Kidera, A. J. Phys. Chem. B 1997, 101, 817-824.

(15)

Sugita, Y.; Kitao, A.; Okamoto, Y. J. Chem. Phys. 2000, 113, 6042-6051.

(16)

Fukunishi, H.; Watanabe, O.; Takada, S. J. Chem. Phys. 2002, 116, 9058-9067.

(17)

Liu, P.; Kim, B.; Friesner, R. A.; Berne, B. J. Proc. Natl. Acad. Sci. U. S. A. 2005, 102, 13749-13754.

(18)

Okumura, H. J. Chem. Phys. 2008, 129, 124116.

(19)

Yamamori, Y.; Kitao, A. J. Chem. Phys. 2013, 139, 145105.

(20)

Okamoto, Y. J. Mol. Graph. Model. 2004, 22, 425-439.

(21)

Harada, R.; Takano, Y.; Baba, T.; Shigeta, Y. Phys. Chem. Chem. Phys. 2015, 17, 6155-6173.

(22)

Baba, T.; Boero, M.; Kamiya, K.; Ando, H.; Negoro, S.; Nakano, M.; Shigeta, Y. Phys. Chem. Chem. Phys. 2015, 17, 4492-4504.

(23)

Baba, T.; Harada, R.; Nakano, M.; Shigeta, Y. J. Comput. Chem. 2014, 35, 1240-1247.

(24)

Kamiya, K.; Baba, T.; Boero, M.; Matsui, T.; Negoro, S.; Shigeta, Y. J. Phys. Chem. Lett. 2014, 5, 1210-1216.

(25)

Caves, L. S. D.; Evanseck, J. D.; Karplus, M. Protein Sci. 1998, 7, 649-666.

(26)

Kitao, A.; Hirata, F.; Go, N. J. Phys. Chem. 1993, 97, 10223-10230.

(27)

Kitao, A.; Go, N. Curr. Opin. Struct. Biol. 1999, 9, 164-169. 28

ACS Paragon Plus Environment

Page 29 of 31

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

(28)

Sharff, A. J.; Rodseth, L. E.; Spurlino, J. C.; Quiocho, F. A. Biochemistry 1992, 31, 10657-10663.

(29)

Quiocho, F. A.; Spurlino, J. C.; Rodseth, L. E. Structure 1997, 5, 997-1015.

(30)

Berendsen, H. J. C.; Grigera, J. R.; Straatsma, T. P. J. Phys. Chem. 1987, 91, 6269-6271.

(31)

Duan, Y.; Wu, C.; Chowdhury, S.; Lee, M. C.; Xiong, G. M.; Zhang, W.; Yang, R.; Cieplak, P.; Luo, R.; Lee, T.; Caldwell, J.; Wang, J. M.; Kollman, P. J. Comput. Chem. 2003, 24, 1999-2012.

(32)

Miyamoto, S.; Kollman, P. A. J. Comput. Chem. 1992, 13, 952-962.

(33)

Hess, B.; Bekker, H.; Berendsen, H. J. C.; Fraaije, J. G. E. M. J. Comput. Chem. 1997, 18, 1463-1472.

(34)

Bussi, G.; Donadio, D.; Parrinello, M. J. Chem. Phys. 2007, 126, 014101.

(35)

Nose, S.; Klein, M. L. Mol. Phys. 1983, 50, 1055-1076.

(36)

Parrinello, M.; Rahman, A. J. Appl. Phys. 1981, 52, 7182-7190.

(37)

Essmann, U.; Perera, L.; Berkowitz, M. L.; Darden, T.; Lee, H.; Pedersen, L. G. J. Chem. Phys. 1995, 103, 8577-8593.

(38)

Abraham, M. J.; van der Spoel, D.; Lindahl, E.; Hess, B,and the GROMACS development team, GROMACS User Mannual version 5.0.7, www.gromacs.org (2015).

(39)

Honda, S.; Yamasaki, K.; Sawada, Y.; Morii, H. Structure 2004, 12, 1507-1518.

(40)

Neidigh, J. W.; Fesinmeyer, R. M.; Andersen, N. H. Nat. Struct. Biol. 2002, 9, 425-430.

(41)

Chiu, T. K.; Kubelka, J.; Herbst-Irmer, R.; Eaton, W. A.; Hofrichter, J.; Davies, D. R. Proc. Natl. Acad. Sci. U. S. A. 2005, 102, 7517-7522.

(42)

Berendsen, H. J. C.; Postma, J. P. M.; Vangunsteren, W. F.; Dinola, A.; Haak, J. R. J. Chem. Phys. 1984, 81, 3684-3690.

(43)

Case, D. A.; Babin, V.; Berryman, J. T.; Betz, R. M.; Cai, Q.; Cerutti, D. S.; Cheatham III, T. E.; Darden, T. A.; Duke, R. E.; Gohlke, H.; Goetz, A. W.; Gusarov, S.; Homeyer, N.; Janowski, P.; Kaus, J.; Kolossváry, I.; Kovalenko, A.; Lee, T. S.; LeGrand, S.; Luchko, T.; Luo, R.; Madej, B.; Merz, K. M.; Paesani, F.; Roe, D. R.; Roitberg, A.; Sagui, C.; Salomon-Ferrer, R.; Seabra, G.; Simmerling, C. L.; Smith, W.; Swails, J.; Walker, R. C.; Wang, J.; Wolf, R. M.; Wu, X.; Kollman, (2014), AMBER 14, University of California, San Francisco.

(44)

Moritsugu, K.; Terada, T.; Kidera, A. J. Chem. Phys. 2010, 133, 224105.

(45)

Torrie, G. M.; Valleau, J. P. J. Chem. Phys. 1977, 66, 1402-1408.

(46)

Torrie, G. M.; Valleau, J. P. J. Comput. Phys. 1977, 23, 187-199.

(47)

Kitao, A.; Harada, R.; Nishihara, Y.; Tran, D. P. Aip Conf. Proc. 2016, 1790, 020013.

(48)

Harada, R.; Kitao, A. J. Phys. Chem. B 2011, 115, 8806-8812.

(49)

Satoh, D.; Shimizu, K.; Nakamura, S.; Terada, T. FEBS Lett. 2006, 580, 3422-3426.

(50)

Lei, H. X.; Wu, C.; Liu, H. G.; Duan, Y. Proc. Natl. Acad. Sci. U. S. A. 2007, 104, 29

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

4925-4930. (51)

Harada, R.; Nakamura, T.; Shigeta, Y. Bull. Chem. Soc. Jpn. 2016, 89, 1361-1367.

(52)

Harada, R.; Nakamura, T.; Shigeta, Y. Chem. Phys. Lett. 2015, 639, 269-274.

(53)

Harada, R.; Kitao, A. J. Chem. Theory Comput. 2015, 11, 5493-5502.

(54)

Naritomi, Y.; Fuchigami, S. J. Chem. Phys. 2013, 139, 215102.

(55)

Naritomi, Y.; Fuchigami, S. J. Chem. Phys. 2011, 134, 065101.

(56)

Sakuraba, S.; Joti, Y.; Kitao, A. J. Chem. Phys. 2010, 133, 185102.

(57)

Mitsutake, A.; Takano, H. J. Chem. Phys. 2015, 143, 124111.

(58)

Mitsutake, A.; Iijima, H.; Takano, H. J. Chem. Phys. 2011, 135, 164102.

(59)

Jo, T.; Hou, J.; Eickholt, J.; Cheng, J. L. Sci. Rep. 2015, 5, 17573.

(60)

Gromiha, M. M.; Huang, L. T. Curr Protein Pept. Sc. 2011, 12, 490-502.

(61)

Blazewicz, J.; Lukasiak, P.; Wilk, S. Control. Cybern. 2007, 36, 183-201.

(62)

Harada, R.; Nakamura, T.; Shigeta, Y. J. Comput. Chem. 2016, 37, 724-738.

(63)

Humphrey, W.; Dalke, A.; Schulten, K. J. Mol. Graph. Model. 1996, 14, 33-38.

30

ACS Paragon Plus Environment

Page 30 of 31

Page 31 of 31

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

For Table of Contents Only

Structural Dissimilarity Sampling (SDS) is proposed as an efficient conformational search method to promote structural transitions essential for the biological functions. In SDS, initial structures are selected based on structural dissimilarity, and conformational resampling is repeated. Conformational resampling from the diversely distributed initial structures would rapidly expand edges of conformational distributions, promoting structural transitions of proteins intensively.

31

ACS Paragon Plus Environment