Characterizing Conformational Dynamics of Proteins Using Evolutionary Couplings


Article

Characterizing Conformational Dynamics of Proteins Using Evolutionary Couplings Jiangyan Feng, and Diwakar Shukla J. Phys. Chem. B, Just Accepted Manuscript • DOI: 10.1021/acs.jpcb.7b07529 • Publication Date (Web): 02 Jan 2018 Downloaded from http://pubs.acs.org on January 3, 2018



The Journal of Physical Chemistry

Jiangyan Feng† and Diwakar Shukla∗,†,‡,¶,§

†Department of Chemical and Biomolecular Engineering, University of Illinois, Urbana, IL, 61801, USA
‡Center for Biophysics and Quantitative Biology, University of Illinois, Urbana, IL, 61801, USA
¶Department of Plant Biology, University of Illinois, Urbana, IL, 61801, USA
§National Center for Supercomputing Applications, University of Illinois, Urbana, IL, 61801, USA

E-mail: [email protected]


ACS Paragon Plus Environment


Abstract

Understanding protein conformational dynamics is essential for elucidating the molecular origins of protein structure-function relationships. Traditionally, reaction coordinates, i.e., functions of protein atom positions and velocities, have been used to interpret the complex dynamics of proteins obtained from experimental and computational approaches such as molecular dynamics simulations. However, it is nontrivial to identify reaction coordinates a priori, even for small proteins. Here, we evaluate the power of evolutionary couplings (ECs) to capture protein dynamics by exploring their use as reaction coordinates, which can efficiently guide the sampling of a conformational free energy landscape. We have analyzed ten diverse proteins and show that a few ECs are sufficient to characterize the complex conformational dynamics of proteins involved in folding and conformational change processes. With the rapid strides in sequencing technology, we expect that ECs could help identify reaction coordinates a priori and enhance the sampling of the slow dynamical processes associated with protein folding and conformational change.

Introduction

Proteins are dynamic entities that orchestrate diverse cellular functions by either acquiring a specific folded structure or adapting their conformation in response to an external stimulus. 1–6 For example, G-protein-coupled receptors (GPCRs), the largest group of drug targets, modulate their transitions from inactive to active conformational states in response to drugs. 7 Therefore, a full understanding of protein function requires an interpretable description of the complex dynamics between the different functional states of a protein. Molecular dynamics (MD) simulations have been widely used to study protein dynamics in silico, allowing us to examine folding and conformational change mechanisms in atomistic detail. 8 One major problem with MD simulations of proteins is the high dimensionality of the conformational ensemble. Therefore, reaction coordinates have been introduced to overcome


this limit by acting as one-dimensional coordinates that quantify the progress between key conformational states of the system. Reaction coordinates provide information about the functionally relevant motions of the protein and eliminate noise due to functionally irrelevant motions. Ideally, reaction coordinates coupled with exhaustive sampling in MD simulations could facilitate physically meaningful insights into the reaction mechanism. 9,10 Reaction coordinates are also used in MD simulations to efficiently guide the sampling of protein conformational landscapes. 11–14 They are also invaluable in the construction of Markov State Models (MSMs), a powerful analytical tool parameterized by MD data for building kinetic models of protein conformational dynamics. MSMs discretize the high-dimensional protein state space into many microstates and then calculate the transition probabilities between these states. Protein dynamics is therefore described as the conversion between states, and information about the reaction mechanism can be inferred from the transition probabilities. 8 Reaction coordinates help define the state space by providing suitable metrics for the dimensionality reduction of raw MD trajectories.

However, identification of optimal reaction coordinates remains challenging for complex systems because of the large number of degrees of freedom associated with protein dynamics and the limited availability of experimental structural and dynamic information. 15 For simple or well-studied systems, reaction coordinates constructed from physical intuition (e.g., the root-mean-square deviation (RMSD) from crystal structures, the fraction of native contacts Q, and the backbone dihedral angles) could provide an interpretation of the reaction mechanism. 16,17 Aided by recent advances in computer hardware and software, theoretically robust methods have been developed to identify and validate reaction coordinates a posteriori, after an exhaustive sampling of the complex system dynamics has been achieved. Time-structure based independent component analysis (tICA), a dimensionality-reduction algorithm, can be used to extract reaction coordinates as high-autocorrelation linear combinations of a large set of observables. 18,19 Despite its popularity, tICA-derived reaction coordinates are hard to interpret because they are abstract linear combinations of a large number of observables, which may be computationally


intractable for large sets of observables. Another drawback of tICA is that it can only provide a reliable estimate of a reaction coordinate once detailed sampling of conformational space is available. Therefore, it would be ideal to identify reaction coordinates a priori in order to guide the sampling of conformational space in both biased and unbiased MD simulations.

Evolutionary couplings (ECs) refer to the correlations between coevolving residue pairs constrained to preserve the three-dimensional (3D) structure and biological functions of proteins. Recent advances in methods for estimating evolutionary couplings 20–23 have provided an exciting possibility of inferring relevant reaction coordinates from ECs a priori. 24 Owing to the explosive growth in genome sequencing and statistical analysis, ECs can now be extracted from multiple sequence alignments (MSAs) using a maximum entropy global probability model (known as the Potts model). Unlike local models, global probability models effectively reduce confounding correlations such as transitive correlations, i.e., the strong indirect correlation introduced between two residues by the direct correlation of each residue with the same third residue. 21 ECs can be divided into structural ECs, which provide information about physical contacts in the 3D structure, and functional ECs, which indicate functional interactions in proteins. Previous studies have demonstrated that ECs are sufficient for computing protein 3D structures from amino acid sequences. 22,23,25–29 In addition, ECs are predictive of functional features including multimeric contacts, protein-substrate interactions, and alternate conformations. 21,30 The biological premise is that residues coevolve to maintain structural and functional integrity. 31

Recently, Shamsi and Moffett et al. have shown that inferring reaction coordinates from the top 800 ranked evolutionarily coupled residues accelerates the sampling of the activation pathways of β2AR, the folding of the WW domain, and the dimerization of the subunits of the E. coli molybdopterin synthase (MoaD and MoaE) (in press). Although the relationship between ECs and protein dynamics has been mentioned in their work and in a few papers in the literature, 24,30,32–34 direct evidence of the ability of ECs to capture the slowest conformational dynamics of proteins has not been provided. Furthermore, it is not clear how many ECs are required to capture protein dynamics, since taking all the ECs as metrics could


hamper computational efficiency due to excessive statistical noise. Here, we aim to capture the conformational dynamics of proteins using an optimal number of ECs. We analyze a set of ten diverse proteins ranging in size from 35 to 505 residues. 7,35,36 Seven proteins are small folding proteins, whereas three proteins (calmodulin, β2AR and PepTSo) serve as examples of conformational change between different folded states of a protein. For each protein, we construct MSMs using different numbers of top-ranked ECs as features. The ability of ECs to capture slow dynamical processes is quantified using the variational principle introduced by Noé and Nüske, which enables us to rank sets of ECs in terms of their ability to capture the kinetic processes observed in the protein dynamics datasets. 37,38 We report that a few ECs (1-10% of all residue pairs) suffice to capture the complex dynamics associated with protein folding and conformational changes, and that the number of ECs required is related to the correlation between ECs. We also elucidate the specific dynamic processes captured by ECs and identify functionally and structurally important ECs based on the distance change between the evolutionarily coupled residues during the conformational change process.

Methods

Protein dataset. The MD datasets of seven fast-folding proteins ranging from 35 to 80 amino acid residues in length were generated by Lindorff-Larsen et al. 35 using MD simulations in explicit solvent. The MD dataset for apo calmodulin (CaM) was obtained by Shukla et al. 36 via MD simulations in explicit water at a constant pressure of 1 atm and a constant temperature of 298 K. The MD dataset for the β2-adrenergic receptor (β2AR) was generated by Dror et al. 7 via MD simulations in explicit lipids (1 bar, 310 K). The MD dataset for PepTSo was generated by Selvam et al. via MD simulations under constant NPT conditions at 1 atm and 300 K (unpublished results; simulation details for PepTSo are summarized in the Supporting Information). 39 The protein and trajectory information is detailed in Table S1.


For this paper, we retain trajectory frames every 2 ns, 0.5 ns, 1.8 ns and 0.7 ns for the seven fast-folding proteins, apo CaM, β2AR and PepTSo, respectively.

Evolutionary couplings. ECs refer to coevolving residue pairs conserved across evolution owing to functional and structural constraints; in this work, the distances between these residue pairs are used as features. Two challenges of mining evolutionary information from MSAs are the availability of enormous sequence information and the interference of transitive correlations. In recent years, the global probability method (Potts model) has been replacing local statistical methods because it minimizes transitive effects by computing the dependencies of all residue pairs simultaneously instead of assuming independence between residue pairs. Aided by the explosive accumulation of sequences and global statistical models, the detection of EC information is becoming efficient and accurate. In this work, ECs were extracted using the pseudolikelihood maximization (PLM) method 40 on the EVCouplings web server (http://evfold.org) 22,41 with default settings.

Markov State Models (MSMs). The basic approach for building MSMs is summarized below. The detailed theoretical framework underlying MSMs can be found in recent reviews on the topic. 8 An MSM provides a network of different conformational states with a transition matrix (T) representing the probability of a memoryless jump between states in a short time interval (lag time τ). 42 If P(t0) represents the vector of state populations at time t = t0, then the evolution of the system is determined by Eq. 1. MSM construction begins with a large set of MD trajectories. Suitable metrics (e.g., dihedral angles) are chosen to featurize the raw trajectories, converting Cartesian coordinates into features. Optionally, the dimensionality of the datasets can be further reduced to a few slow collective variables using principal component analysis (PCA) or time-structure based independent component analysis (tICA), a variant of PCA. 18 Both PCs and tICs are linear combinations of the features, with the weights representing the relevance of each feature. The difference between PCA and tICA is that PCA finds high-variance linear combinations of features, while tICA maximizes the autocorrelation time. Next, every conformation is assigned to a microstate using clustering algorithms such as Mini-Batch K-Means and K-centers. 43 In the end, a functioning


MSM is generated. The thermodynamic and kinetic information of protein folding and conformational changes can then be obtained from the equilibrium populations and the transition matrix, respectively. To satisfy the Markov assumption, MSM lag times in this work are taken from refs 7,35,36 such that the probability of being in state j at time t + τ depends only on the state at time t and is independent of the earlier history. All the MSM analyses in this work were conducted using MSMBuilder 3.8, where the models are constrained to satisfy detailed balance. 44
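As a rough illustration of Eq. 1, the sketch below estimates a transition matrix from a discretized trajectory and propagates a population vector. It is a minimal NumPy sketch, not the MSMBuilder implementation: the function names and the toy trajectory are hypothetical, and detailed balance is imposed by simply symmetrizing the count matrix, a simplification assumed here in place of a maximum-likelihood reversible estimator.

```python
import numpy as np

def estimate_transition_matrix(dtraj, n_states, lag):
    """Return a column-stochastic T with T[j, i] = P(i -> j) at the given lag."""
    dtraj = np.asarray(dtraj)
    counts = np.zeros((n_states, n_states))
    # counts[i, j] = number of observed i -> j jumps separated by `lag` frames
    np.add.at(counts, (dtraj[:-lag], dtraj[lag:]), 1.0)
    # Assumption: symmetrizing the counts is a simple way to enforce
    # detailed balance (not a maximum-likelihood reversible estimate).
    counts = counts + counts.T
    row_sums = counts.sum(axis=1, keepdims=True)
    if np.any(row_sums == 0):
        raise ValueError("every state must be visited at least once")
    # Row-normalize, then transpose so that columns sum to 1 as in Eq. 1.
    return (counts / row_sums).T

def propagate(T, p0, n_steps):
    """Eq. 1: P(t0 + n*tau) = [T(tau)]^n P(t0)."""
    return np.linalg.matrix_power(T, n_steps) @ np.asarray(p0, dtype=float)

# Toy three-state trajectory (hypothetical data, one frame per lag time).
dtraj = [0, 0, 1, 1, 0, 1, 2, 2, 1, 0]
T = estimate_transition_matrix(dtraj, n_states=3, lag=1)
p_late = propagate(T, [1.0, 0.0, 0.0], n_steps=50)
```

After many lag times the propagated populations approach the stationary distribution of the model, which is where the thermodynamic information mentioned above comes from.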

$$P(t_0 + n\tau) = [T(\tau)]^n P(t_0) \qquad (1)$$

where $T_{ji}$ is the transition probability from state i to state j, $\tau$ is the lag time of the model, $T$ is the transition matrix, $P(t_0 + n\tau)$ is the vector of state populations at time $t_0 + n\tau$, and $P(t_0)$ is the initial vector of state populations at time $t_0$.

Variational principle. The transition matrix T in Eq. 1 can be decomposed into a complete set of eigenvectors and eigenvalues (Eq. 2). However, identification of the true eigenvectors is highly nontrivial, and they must be approximated starting from a trial set of eigenvectors. The generalized matrix Rayleigh quotient (GMRQ) has been introduced to measure the quality of MSMs, based on the fact that the sum of the estimated eigenvalues is bounded from above by the sum of the true eigenvalues (Eq. 3). 37,38 This enables us to optimize MSMs by maximizing the GMRQ score through a systematic search of internal parameters. In this paper, we denote the best MSM as the one with the highest GMRQ score. For all the analyses, GMRQ scores are based on the slowest 2 timescales of the MSMs because the process associated with protein folding or conformational change is generally much slower than other processes in the system.
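The variational bound described above can be checked numerically on a small example. The sketch below is illustrative only: the three-state count matrix, the function name `rayleigh_score` and the random trial vectors are assumptions made for this example, and the code uses a row-stochastic transition matrix (the transpose of the convention in Eq. 1). It evaluates the generalized Rayleigh quotient for the exact leading eigenvectors and for an arbitrary trial set.

```python
import numpy as np

def rayleigh_score(V, T, pi):
    """Generalized Rayleigh quotient of trial vectors V (one per column) for a
    row-stochastic, reversible T with stationary distribution pi."""
    S = np.diag(pi)   # overlap matrix of the state basis
    C = S @ T         # time-lagged correlation matrix (symmetric if reversible)
    return np.trace(V.T @ C @ V @ np.linalg.inv(V.T @ S @ V))

# Hypothetical symmetric count matrix, which yields a reversible model.
counts = np.array([[2.0, 4.0, 0.0],
                   [4.0, 2.0, 2.0],
                   [0.0, 2.0, 2.0]])
pi = counts.sum(axis=1) / counts.sum()
T = counts / counts.sum(axis=1, keepdims=True)

# Exact eigenpairs via the symmetrized matrix D^(1/2) T D^(-1/2).
d = np.sqrt(pi)
vals, vecs = np.linalg.eigh(d[:, None] * T / d[None, :])
order = np.argsort(vals)[::-1]
vals = vals[order]
V_exact = (vecs / d[:, None])[:, order]   # right eigenvectors of T

m = 2  # number of slow processes scored, as in the text
best = rayleigh_score(V_exact[:, :m], T, pi)
rng = np.random.default_rng(0)
trial = rayleigh_score(rng.normal(size=(3, m)), T, pi)
```

Here `best` coincides with the sum of the two largest eigenvalues, while any other trial set, such as the random one, cannot score higher; this is the property that makes the score usable as an optimization target.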

$$T(\tau)\,\psi_i = \psi_i \lambda_i \qquad (2)$$

where $\lambda_i$ and $\psi_i$ are the ith largest eigenvalue and the corresponding eigenvector of the transition matrix $T(\tau)$. The eigenvalues $\lambda_i$ are real and indexed in decreasing order. The corresponding eigenvectors $\psi_i$ describe the specific transitions between states on the time scales


converted from λi . The largest eigenvalue λ1 is and the corresponding eigenvector ψ1 is related to the stationary distribution of the system. The first m eigenvectors are representative of the m slowest dynamic processes.

GM RQ =

m X i=1

ˆi < λ

m X

λi

(3)

i=1

ˆ i and λi stands for the estimated where GMRQ represents the generalized Rayleigh quotient, λ and real eigenvalues, respectively. GMRQ therefore represents the score of MSMs. Cross-validation. We employed cross-validation to avoid the statistical noise added by overfitting. The dataset is split into a training set and a test set. MSMs are constructed from the training set and then evaluated based on the test set. This ensures that the evaluated dynamics are corresponding to the true dynamics since these dynamics are present both in training set and test set. Procedure. First, we subsampled the MD trajectories and featurized the trajectories using the alpha carbon distances between evolutionarily coupled residues. Then, we further discretized phase space using tICA and clustered them into microstates with Mini-Batch K-Means clustering. In this work, we randomly searched around 800 models to optimize three internal parameters including tICA components, tICA lag time and number of clusters (Table S2, Table S3). The highest mean test scores of five cross-validation iterations are assumed as the best GMRQ score under given conditions. To measure the quality of MSMs featurized with increasing number of top ranked ECs, we assume the highest score obtained by featurizing MD datasets with alpha carbons distances of all residue pairs as upper bound score for each protein. And 95% of upper bound score is considered as acceptable score. We evaluate the ability of ECs to represent protein dynamics based on the deviation of the corresponding score from acceptable score.
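The dimensionality-reduction step in the procedure can be illustrated with a small NumPy sketch that contrasts PCA and tICA on synthetic features. This is not the MSMBuilder code used in the work: the function names, the two sinusoidal toy features and the lag of 10 frames are assumptions chosen so that one feature is slow with low variance and the other is fast with high variance.

```python
import numpy as np

def pca_components(X):
    """Eigen-decomposition of the covariance matrix, highest variance first."""
    Xc = X - X.mean(axis=0)
    vals, vecs = np.linalg.eigh(Xc.T @ Xc / len(Xc))
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]

def tica_components(X, lag):
    """Solve C(lag) v = lambda C(0) v by whitening, highest autocorrelation first."""
    Xc = X - X.mean(axis=0)
    c0 = Xc.T @ Xc / len(Xc)
    ct = Xc[:-lag].T @ Xc[lag:] / (len(Xc) - lag)
    ct = 0.5 * (ct + ct.T)                 # symmetrize the time-lagged covariance
    s, u = np.linalg.eigh(c0)
    whiten = u / np.sqrt(s)                # columns scaled by 1/sqrt(eigenvalue)
    vals, vecs = np.linalg.eigh(whiten.T @ ct @ whiten)
    order = np.argsort(vals)[::-1]
    return vals[order], (whiten @ vecs)[:, order]

# Hypothetical features: column 0 is slow and small, column 1 is fast and large.
t = np.arange(2000)
X = np.column_stack([np.sin(2 * np.pi * t / 200),
                     3.0 * np.sin(2 * np.pi * t / 7)])

pca_vals, pca_vecs = pca_components(X)
tica_vals, tica_vecs = tica_components(X, lag=10)
top_pca_feature = int(np.argmax(np.abs(pca_vecs[:, 0])))
top_tica_feature = int(np.argmax(np.abs(tica_vecs[:, 0])))
```

In this toy case the leading principal component is dominated by the high-variance fast feature, whereas the leading tIC picks out the slow feature, which is why tICA is the natural choice before building a kinetic model.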


Results and Discussion

Few ECs can capture the complex dynamics associated with protein folding and conformational change. To probe the role of ECs in characterizing protein dynamics, we assume that ECs capture protein dynamics if their highest GMRQ score reaches 95% of the GMRQ score achieved by using the distances between all the residue pairs as representative features. The GMRQ score represents the sum of the leading eigenvalues of the MSM transition probability matrix. 37,38 A higher GMRQ score for an MSM indicates that the MSM is better able to capture the slow timescale processes (i.e., folding or conformational change) in the underlying protein dynamics. For each protein, we selected top-ranked ECs, inferred from the EVfold PLM web server, as representative features to build MSMs and then searched for the highest-ranked MSMs among more than 800 models using the Osprey package. 45 Here, we show that a few ECs suffice to capture the protein folding and conformational change processes (Table 1, Figure 1, Figure S1, Figure S2). Furthermore, we find that for 6 of the 7 proteins involved in folding, MSMs built with a few ECs (less than 6% of all the residue pairs) are even better than those built with the fraction of native contacts Q (Table S4, Figure 1, Figure S1, Figure S2), which is considered an excellent reaction coordinate for protein folding. 46 Table 1 summarizes the number of top-ranked ECs needed to capture protein dynamics, which is a small fraction of the total number of residue pairs in a protein. For 9 of the 10 proteins, the number of ECs required is less than 10% of all the residue pairs. The finding that a few ECs are sufficient to capture slow dynamics also addresses the practical difficulty of using ECs caused by the rapid decay of the true positive rate. Since only the few ECs with high evolutionary coupling scores are included, false positive ECs with low evolutionary coupling scores are eliminated in the first place (Figure S10).
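The featurization used here, alpha carbon distances between the top-ranked coupled residue pairs, can be sketched as follows. This is a minimal NumPy example under stated assumptions: the function name, the coordinate array layout (frames × residues × 3, in nm) and the toy coordinates are hypothetical, and in practice the coordinates and the ranked pair list would come from the MD trajectories and the EC server output.

```python
import numpy as np

def ec_distance_features(ca_xyz, ranked_pairs, n_top):
    """Distances between alpha carbons of the n_top highest-ranked EC pairs.

    ca_xyz: array of shape (n_frames, n_residues, 3) with C-alpha positions.
    ranked_pairs: sequence of (i, j) residue indices, best-ranked first.
    Returns an array of shape (n_frames, n_top).
    """
    ca_xyz = np.asarray(ca_xyz, dtype=float)
    pairs = np.asarray(ranked_pairs[:n_top], dtype=int)
    diff = ca_xyz[:, pairs[:, 0], :] - ca_xyz[:, pairs[:, 1], :]
    return np.linalg.norm(diff, axis=-1)

# Toy input: 2 frames, 3 residues (hypothetical coordinates).
xyz = np.array([[[0.0, 0.0, 0.0], [3.0, 4.0, 0.0], [0.0, 0.0, 1.0]],
                [[0.0, 0.0, 0.0], [0.0, 0.0, 2.0], [1.0, 0.0, 0.0]]])
pairs = [(0, 1), (0, 2), (1, 2)]
features = ec_distance_features(xyz, pairs, n_top=2)
```

Increasing `n_top` adds one feature column per additional EC, which is how the number of top-ranked ECs is varied when building the series of MSMs.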
We find that the GMRQ scores of all ten proteins increase rapidly and then remain constant with an increasing number of top-ranked ECs (Figure 1, Figure S1, Figure S2). This is because sufficient information about the slow timescale processes observed during the simulations is already provided by the preceding top-ranked ECs, and little new information is provided by the addition of more ECs. Interestingly, the incorporation of some ECs reduced the GMRQ score (Figure 1, Figure S1, Figure S2), which is reasonable because such ECs may carry functionally important information, such as oligomerization and protein-substrate interactions, instead of physical contacts in the 3D structure. This result opens up the possibility that a few top-ranked ECs can capture the conformational dynamics of proteins and serve as a priori reaction coordinates.

Table 1: Number of ECs Needed to Reach 95% of Maximum Score

Protein            PDB ID   Residues   Number of ECs needed   Fraction of all residue pairs (%)
Villin             2F4K     35         73                     12.2689
WW domain          2F21     35         5                      0.8403
NTL9               2HBA     39         55                     7.4224
BBL                2WXC     47         17                     1.5726
Protein B          1PRB     47         29                     2.6827
Protein G          1MIO     56         14                     0.9091
Lambda-repressor   1LMB     80         14                     0.4430
Calmodulin         1CFD     67         27                     1.2212
β2AR               3P0G     284        17                     0.0423
PepTSo             4UVM     505        5                      0.0039
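The last column of Table 1 is consistent with dividing the number of ECs by the number of unordered residue pairs, N(N-1)/2. The short sketch below reproduces that arithmetic and shows one way to pick the smallest number of ECs that reaches the acceptable score; the function names and the example score dictionary are hypothetical and only illustrate the 95% criterion described in the text.

```python
def fraction_of_pairs(n_ecs, n_residues):
    """Percentage of all unordered residue pairs covered by n_ecs couplings."""
    n_pairs = n_residues * (n_residues - 1) // 2
    return 100.0 * n_ecs / n_pairs

def min_ecs_for_acceptable(scores_by_n, upper_bound, fraction=0.95):
    """Smallest number of ECs whose score reaches fraction * upper_bound.

    scores_by_n maps a number of top-ranked ECs to its best GMRQ score.
    Returns None if no entry reaches the acceptable score.
    """
    for n in sorted(scores_by_n):
        if scores_by_n[n] >= fraction * upper_bound:
            return n
    return None

# Table 1 rows: Villin (35 residues, 73 ECs) and PepTSo (505 residues, 5 ECs).
villin_pct = fraction_of_pairs(73, 35)
peptso_pct = fraction_of_pairs(5, 505)

# Hypothetical scores for illustration only.
example_scores = {1: 1.20, 5: 1.92, 10: 1.95}
n_needed = min_ecs_for_acceptable(example_scores, upper_bound=2.0)
```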

Elucidating the role of high-ranked ECs in protein dynamics. From Table 1, it can be seen that only the top 5 ECs are needed to capture the folding process of the WW domain and the conformational changes of the membrane peptide transporter PepTSo. Therefore, we analyzed the conformational dynamics datasets of these proteins to understand the slow dynamical processes on the underlying free energy landscape. To investigate the dynamics captured by ECs, we selected the best models featurized using the top 5 ECs for the WW domain and PepTSo, and projected the free energy landscapes along the first two tICs. Figure 2 illustrates that the first two tICs track the formation of the two hairpins during the folding process of the WW domain, which is consistent with the known folding pathways of the WW domain. 51 In Figure 2f, all of the top 5 ECs are found to constrain the 3D structure of the WW domain. EC2, which contributes most to tIC1, controls hairpin 1 formation, while EC5 that contributes


most to tIC2 is related to hairpin 2 formation. EC3 and EC4 do not contribute to the first and second tICs, indicating that these couplings are potentially associated with fast dynamical modes of the protein.

Figure 1: Exponents of the GMRQ score with increasing number of top-ranked ECs. (a) WW domain (PDB: 2F21 47), (b) Protein B (PDB: 1PRB 48), (c) Lambda-repressor (PDB: 1LMB 49), (d) BBL (PDB: 2WXC 50). The red dashed lines (maximum score) represent the highest score obtained using all the alpha carbon distances between residue pairs as featurization metrics, while the green dashed lines (acceptable score) represent 95% of the maximum score. The yellow dashed lines (Q score) represent the highest score obtained using the fraction of native contacts Q as the featurization metric. The yellow, blue and red stars mark the minimum number of top-ranked ECs needed to reach the Q score, the acceptable score and the maximum score, respectively. For these folding proteins, reaching the acceptable score requires only a few ECs. To amplify the differences between GMRQ scores, the exponents of the GMRQ scores are used as the y-axis.

Figure 3 provides an interpretation of the first two tICs for the conformational changes of PepTSo. Progress along the first two tICs leads to the formation of the inward-facing state of


PepTSo.

Figure 2: The folding process of the WW domain (PDB: 2F21 47) captured by the top 5 ECs. (a) Free energy landscape projected along the first two tICs. Three internal parameters for the best MSM were chosen via the variational principle: tICA components = 3, tICA lag time = 146 ns, and number of clusters = 485. (b) tIC weights of the top 5 ECs, representing the relevance of each of the top 5 ECs. (c) 3D structure of the folded state (cyan) superimposed on the folded crystal structure (PDB: 2F21 47) with RMSD = 0.445 nm. (d)-(e) 3D structures of unfolded state 1 and unfolded state 2, respectively. (f) Visualization of the top 5 ECs and the two hairpins related to the folding pathways in the folded state.

In Figure 3a, the conformational free energy landscape of PepTSo along the top two tICs shows multiple metastable states representing the inward-facing, outward-facing and occluded states of the transporter. The structure of the most populated state (Figure 3c) is compared to the crystal structure of the inward-facing state of PepTSo (PDB: 4UVM 52). The gap at the cytoplasmic side of the inward-facing state (state 1) is clearly larger than in the other two states, while the gap at the periplasmic side is smaller than in the other two states (Figure 3c-e). From Figure 3b, we find that EC1, EC2 and EC4, which contribute least to the tICs, are not informative of the slowest conformational change of PepTSo and are mainly associated with contacts required to fold the membrane protein. EC5 and EC3, which contribute largely to the tICs, represent the motion of the gating residues on the periplasmic side. This indicates that


ECs can help capture the slow conformational change dynamics, but the structural/folding ECs and the functional ECs should be distinguished to better capture the protein dynamics.

Figure 3: The conformational change process of PepTSo (PDB: 4UVM 52) captured by the top 5 ECs. (a) Free energy landscape projected along the first two tICs. Three internal parameters for the best MSM were chosen via the variational principle: tICA components = 2, tICA lag time = 8.4 ns, and number of clusters = 200. (b) tIC weights of the top 5 ECs, representing the relevance of each of the top 5 ECs. 3D structures of (c) state 1 (cyan), (d) state 2 (magenta) and (e) state 3 (green) superimposed on the inward-facing crystal structure (PDB: 4UVM 52).

The role of functional vs structural ECs in conformational change. The constraints of stable structure and protein function cause the co-variation of coupled residues. Previous studies indicate that ECs not only provide valuable information about the physical contacts in the 3D structure but can also reveal functionally important residues involved in alternate conformations, ligand binding and intermolecular interactions. 28 Examples of functional ECs include the residue pair interaction in the Rev nucleation site of HIV, 28 where


top ranked ECs infer the coupling between Rev Arg39 and RRE A102 on stem IIB, which is consistent with experimental results. In order to identify functional and structural ECs, we calculated the distances between all the evolutionarily coupled residues for the ten proteins and categorized them into three different groups: functional ECs, structural ECs and background noise (Figures 4-5, Figures S3-S5). We chose 0.8 nm as the cutoff between the structural ECs, which are close in the 3D structure, and the functional ECs, which are far apart in the 3D structure (Figure 4a, 4b and Figure 5a for PepTSo and the WW domain). The results for the other proteins are included in the Supporting Information. Clearly, a large number of ECs experience significant changes in distance from the IF to the OF conformational state of PepTSo; these are related to the gating residues on the periplasmic side or on the cytoplasmic side (Figure 4c), while the others may be informative of structural information. Regarding the folding process of the WW domain, several distances between evolutionarily coupled residues decrease, which may be informative of structural information, while a few of the distances increase, which is likely due to the choice of the reference unfolded structure. To further explain functional ECs, we plot contact maps of the different conformational states and visualize the ECs exclusive to each state for PepTSo, β2AR and calmodulin (Figures S7-S9). We notice that these ECs contain information about different conformational states of the protein and therefore may help capture conformational changes. Overall, our analysis of ECs provides a chance to identify structural residues that correspond to physical contacts in the 3D structure and functional residues that can be transformed into distance metrics to capture protein conformational changes for unresolved proteins of interest.

These results also indicate the need for the development of improved algorithms for distinguishing functional couplings from structural couplings. For example, a residue pair with a distance greater than 0.8 nm leads to a constraint violation in structure prediction when ECs are used as constraints. However, our results indicate that such residue pairs could play a role in the conformational dynamics of proteins.
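The categorization described above can be sketched in a few lines. The rule below is an assumption made for illustration, since the text gives only the 0.8 nm cutoff and the figure captions describe the noise band in terms of the evolutionary coupling score: couplings whose absolute score falls within a caller-supplied noise width are labeled as noise, and the remainder are split by their alpha carbon distance in a reference structure. The function names and the example values are hypothetical.

```python
import numpy as np

CUTOFF_NM = 0.8  # structural vs functional cutoff quoted in the text

def classify_ecs(ref_dist_nm, ec_scores, noise_width):
    """Label each EC as 'structural', 'functional' or 'noise'.

    Assumed rule: |score| <= noise_width -> noise; otherwise a reference
    distance <= 0.8 nm -> structural, and a larger distance -> functional.
    """
    ref_dist_nm = np.asarray(ref_dist_nm, dtype=float)
    ec_scores = np.asarray(ec_scores, dtype=float)
    labels = np.where(ref_dist_nm <= CUTOFF_NM, "structural", "functional")
    labels[np.abs(ec_scores) <= noise_width] = "noise"
    return labels

def distance_change(dist_a_nm, dist_b_nm):
    """Change in EC distances when going from state A to state B."""
    return np.asarray(dist_b_nm, dtype=float) - np.asarray(dist_a_nm, dtype=float)

# Hypothetical distances (nm) and coupling scores for three ECs.
labels = classify_ecs([0.5, 1.5, 0.6], [0.9, 0.7, 0.05], noise_width=0.1)
delta = distance_change([0.5, 1.5], [0.6, 0.9])
```

Pairs labeled functional under such a rule are the ones that would violate a contact constraint in structure prediction, yet, as argued above, they may still report on conformational change.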

Figure 4: Identification of structural ECs and functional ECs for PepTSo. (a) Alpha carbon distances between evolutionarily coupled residues in the inward-facing state (IF, PDB: 4UVM, ref 52), (b) alpha carbon distances between evolutionarily coupled residues in the outward-facing state (OF, MD structure, ref 39), (c) the change in alpha carbon distances between evolutionarily coupled residues from IF to OF, (d) visual representation of the top 5 ECs in the inward-facing state. The background noise is estimated as the symmetric range around 0 with a width given by the absolute value of the minimum evolutionary coupling score.

Figure 5: Identification of structural ECs and functional ECs for WW domain. (a) Alpha carbon distances between evolutionarily coupled residues in the folded state (PDB: 2F21, ref 47) and (b) the change in alpha carbon distances between evolutionarily coupled residues from the unfolded state to the folded state. ECs with a negative evolutionary coupling score are treated as background noise.

Autocorrelation between ECs determines the number of ECs required for capturing protein dynamics. As shown in Table 1, the number of ECs required to characterize the conformational ensemble of different proteins can vary from 0.1% to 10% of all residue pairs. For example, Villin, the smallest protein (35 residues), requires the highest number of ECs (73 ECs) to capture its folding dynamics, while WW domain, which has a similar number of residues, requires the fewest (5 ECs). This observation can be explained by estimating the degree of correlation between ECs for a particular protein (Figure 6, Figure S6). The finding that the top 100 ECs are much more connected in WW domain than in Villin (similar results are observed for proteins BBL and NTL9) suggests that a protein may need far fewer ECs if its ECs are highly correlated. This is reasonable because strong correlation between ECs indicates that they share similar information, so only a few top-ranked ECs are needed to capture the protein dynamics. In contrast, poor correlation between ECs indicates that they provide independent information, so a large number of ECs are required to capture the slow dynamics of protein folding or conformational changes.
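The redundancy argument above can be illustrated with a short NumPy sketch. It is not the analysis pipeline used in this work: the data are synthetic, the function names are hypothetical, and the Pearson correlation (a normalized covariance) stands in for the covariance matrix shown in Figure 6. The participation ratio of the eigenvalues is an illustrative summary chosen here, not a quantity reported in the paper.

```python
import numpy as np


def ec_correlation(distances):
    """Pearson correlation matrix of EC distances; input shape (n_frames, n_ecs)."""
    return np.corrcoef(np.asarray(distances, dtype=float), rowvar=False)


def mean_abs_offdiag(corr):
    """Average |correlation| over distinct EC pairs (a rough redundancy score)."""
    rows, cols = np.triu_indices(corr.shape[0], k=1)
    return float(np.abs(corr[rows, cols]).mean())


def participation_ratio(corr):
    """(sum of eigenvalues)^2 / (sum of squared eigenvalues).

    Close to 1 when one mode dominates, close to n_ecs when all ECs
    fluctuate independently.
    """
    eig = np.linalg.eigvalsh(corr)
    return float(eig.sum() ** 2 / (eig ** 2).sum())


# Synthetic "distance" time series (hypothetical, for illustration only).
rng = np.random.default_rng(0)
n_frames, n_ecs = 2000, 5
shared = rng.normal(size=(n_frames, 1))  # one collective motion
correlated = shared + 0.1 * rng.normal(size=(n_frames, n_ecs))
independent = rng.normal(size=(n_frames, n_ecs))

for name, data in [("correlated", correlated), ("independent", independent)]:
    corr = ec_correlation(data)
    print(name, round(mean_abs_offdiag(corr), 2), round(participation_ratio(corr), 2))
```

When all five synthetic ECs follow one shared motion, the off-diagonal correlations are high and the participation ratio stays near 1, so a single EC carries most of the information. When the ECs fluctuate independently, the ratio approaches the number of ECs, which mirrors the contrast drawn between WW domain and Villin.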

Figure 6: Covariance matrix of distances between the top 100 evolutionarily coupled residues. (a) Villin (top left, 73 ECs needed, PDB: 2F4K, ref 53) and WW domain (bottom right, 5 ECs needed, PDB: 2F21, ref 47), (b) NTL9 (top left, 55 ECs needed, PDB: 2HBA, ref 54) and BBL (bottom right, 17 ECs needed, PDB: 2WXC, ref 50). For proteins with highly connected ECs, fewer ECs are needed to capture dynamics.

Conclusions

We report here that ECs derived using the EVCoupling PLM method can capture the slow dynamical processes of protein folding and conformational change. We have shown that (1) a few ECs can capture the complex conformational dynamics of proteins with good accuracy, (2) ECs not only provide 3D structural information but can also reveal functional residues, such as the contacts that distinguish the inward-facing conformation from the outward-facing conformation of PepTSo, and (3) proteins with highly correlated ECs usually require fewer ECs to capture their dynamics; that is, the number of ECs required depends on the topology of the conformational landscape. This approach could be applied to predict reaction coordinates a priori and thereby enhance the sampling of the conformational dynamics of unresolved proteins. One key limitation of this work, and of evolutionary coupling in general, is the lack of distinction between ECs involved in different aspects of protein function, such as ligand binding and intermolecular interactions. Although a few top-ranked ECs have been found to characterize protein dynamics, the number of ECs can be further reduced by removing irrelevant ECs, such as the structural ECs in the conformational changes of PepTSo. Further improvements can be made in the number of MSM timescales chosen for proteins undergoing conformational changes, which is currently set to two for consistency with the folding proteins. Nevertheless, the approach of representing protein dynamics using ECs should be applicable to a large number of protein families with reliable sequence information. We anticipate that MD simulations of diverse proteins will benefit from the incorporation of evolutionary information inferred from the global maximum entropy model, which minimizes confounding correlations. Overall, the findings of this work highlight that the inclusion of ECs could help infer the reaction coordinates for protein dynamics a priori.

Supporting Information

Tables S1-S4 (protein and trajectory information, Osprey search range, best model hyperparameters, and number of ECs needed to reach Q score), Figures S1-S6 (raw GMRQ; exponents of GMRQ; identification of structural ECs and functional ECs for the remaining proteins involved in the folding process, for β2AR, and for Calmodulin; and the covariance matrix of distances between the top 100 evolutionarily coupled residues for Protein B, Protein G, and Lambda-repressor), simulation details for PepTSo, and the definition of the fraction of native contacts Q are available in the Supporting Information.

Acknowledgement

The authors thank the Blue Waters sustained-petascale computing project, which is supported by the National Science Foundation (awards OCI-0725070 and ACI-1238993) and the state of Illinois, for providing computing time for this study. JF was supported by the graduate student fellowship from the Department of Chemical & Biomolecular Engineering at the University of Illinois at Urbana-Champaign.

References

(1) Berg, J. M.; Tymoczko, J.; Gatto Jr, G. Stryer: Biochemistry; WH Freeman and Company: New York, 2002.
(2) Shukla, D.; Meng, Y.; Roux, B.; Pande, V. S. Activation Pathway of Src Kinase Reveals Intermediate States as Targets for Drug Design. Nat. Commun. 2014, 5.
(3) Moffett, A. S.; Bender, K. W.; Huber, S. C.; Shukla, D. Molecular Dynamics Simulations Reveal the Conformational Dynamics of Arabidopsis thaliana BRI1 and BAK1 Receptor-Like Kinases. J. Biol. Chem. 2017, jbc–M117.
(4) Shukla, D.; Lawrenz, M.; Pande, V. S. Elucidating Ligand-Modulated Conformational Landscape of GPCRs Using Cloud-Computing Approaches. Methods Enzymol. 2015, 557, 551–572.
(5) Kohlhoff, K.; Shukla, D.; Lawrenz, M.; Bowman, G.; Konerding, D. E.; Belov, D.; Altman, R. B.; Pande, V. S. Cloud-based Simulations on Google Exacycle Reveal Ligand Modulation of GPCR Activation Pathways. Nat. Chem. 2014, 6, 15–21.
(6) Vanatta, D. K.; Shukla, D.; Lawrenz, M.; Pande, V. S. A Network of Molecular Switches Controls the Activation of the Two-Component Response Regulator NtrC. Nat. Commun. 2015, 6, 7283.
(7) Dror, R. O.; Arlow, D. H.; Maragakis, P.; Mildorf, T. J.; Pan, A. C.; Xu, H.; Borhani, D. W.; Shaw, D. E. Activation Mechanism of the β2-adrenergic Receptor. Proc. Natl. Acad. Sci. USA 2011, 108, 18684–18689.
(8) Shukla, D.; Hernández, C. X.; Weber, J. K.; Pande, V. S. Markov State Models Provide Insights into Dynamic Modulation of Protein Function. Acc. Chem. Res. 2015, 48, 414–422.
(9) Peters, B. Common Features of Extraordinary Rate Theories. J. Phys. Chem. B 2015, 119, 6349–6356.
(10) McGibbon, R. T.; Husic, B. E.; Pande, V. S. Identification of Simple Reaction Coordinates from Complex Dynamics. J. Chem. Phys. 2017, 146, 044109.
(11) Beck, D. A.; Daggett, V. A one-dimensional Reaction Coordinate for Identification of Transition States from Explicit Solvent Pfold-Like Calculations. Biophys. J. 2007, 93, 3382–3391.
(12) Rohrdanz, M. A.; Zheng, W.; Clementi, C. Discovering Mountain Passes via Torchlight: Methods for the Definition of Reaction Coordinates and Pathways in Complex Macromolecular Reactions. Annu. Rev. Phys. Chem. 2013, 64, 295–316.
(13) Meshkin, H.; Zhu, F. Thermodynamics of Protein Folding Studied by Umbrella Sampling along a Reaction Coordinate of Native Contacts. J. Chem. Theory Comput. 2017, 13, 2086–2097.
(14) Domański, J.; Hedger, G.; Best, R. B.; Stansfeld, P. J.; Sansom, M. S. Convergence and Sampling in Determining Free Energy Landscapes for Membrane Protein Association. J. Phys. Chem. B 2016,
(15) Best, R. B.; Hummer, G. Reaction Coordinates and Rates from Transition Paths. Proc. Natl. Acad. Sci. USA 2005, 102, 6732–6737.
(16) Rohrdanz, M. A.; Zheng, W.; Maggioni, M.; Clementi, C. Determination of Reaction Coordinates via Locally Scaled Diffusion Map. J. Chem. Phys. 2011, 134, 03B624.
(17) Clementi, C.; Nymeyer, H.; Onuchic, J. N. Topological and Energetic Factors: What Determines the Structural Details of the Transition State Ensemble and “En-route” Intermediates for Protein Folding? An Investigation for Small Globular Proteins. J. Mol. Biol. 2000, 298, 937–953.
(18) Schwantes, C. R.; Pande, V. S. Improvements in Markov State Model Construction Reveal many Non-native Interactions in the Folding of NTL9. J. Chem. Theory Comput. 2013, 9, 2000–2009.
(19) Pérez-Hernández, G.; Paul, F.; Giorgino, T.; De Fabritiis, G.; Noé, F. Identification of Slow Molecular Order Parameters for Markov Model Construction. J. Chem. Phys. 2013, 139, 07B604_1.
(20) Hopf, T. A.; Schärfe, C. P.; Rodrigues, J. P.; Green, A. G.; Kohlbacher, O.; Sander, C.; Bonvin, A. M.; Marks, D. S. Sequence Co-evolution Gives 3D Contacts and Structures of Protein Complexes. Elife 2014, 3, e03430.
(21) Marks, D. S.; Hopf, T. A.; Sander, C. Protein Structure Prediction from Sequence Variation. Nat. Biotechnol. 2012, 30, 1072–1080.
(22) Hopf, T. A.; Colwell, L. J.; Sheridan, R.; Rost, B.; Sander, C.; Marks, D. S. Three-Dimensional Structures of Membrane Proteins From Genomic Sequencing. Cell 2012, 149, 1607–1621.
(23) Marks, D. S.; Colwell, L. J.; Sheridan, R.; Hopf, T. A.; Pagnani, A.; Zecchina, R.; Sander, C. Protein 3D Structure Computed from Evolutionary Sequence Variation. PLoS One 2011, 6, e28766.
(24) Morcos, F.; Jana, B.; Hwa, T.; Onuchic, J. N. Coevolutionary Signals across Protein Lineages Help Capture Multiple Protein Conformations. Proc. Natl. Acad. Sci. USA 2013, 110, 20533–20538.
(25) Sulkowska, J. I.; Morcos, F.; Weigt, M.; Hwa, T.; Onuchic, J. N. Genomics-Aided Structure Prediction. Proc. Natl. Acad. Sci. USA 2012, 109, 10340–10345.
(26) Nugent, T.; Jones, D. T. Accurate De Novo Structure Prediction of Large Transmembrane Protein Domains Using Fragment-Assembly and Correlated Mutation Analysis. Proc. Natl. Acad. Sci. USA 2012, 109, E1540–E1547.
(27) Taylor, W. R.; Jones, D. T.; Sadowski, M. I. Protein Topology from Predicted Residue Contacts. Protein Sci. 2012, 21, 299–305.
(28) Weinreb, C.; Riesselman, A. J.; Ingraham, J. B.; Gross, T.; Sander, C.; Marks, D. S. 3D RNA and Functional Interactions from Evolutionary Couplings. Cell 2016, 165, 963–975.
(29) Tang, Y.; Huang, Y. J.; Hopf, T. A.; Sander, C.; Marks, D. S.; Montelione, G. T. Protein Structure Determination by Combining Sparse NMR Data with Evolutionary Couplings. Nat. Methods 2015, 12, 751–754.
(30) Sfriso, P.; Duran-Frigola, M.; Mosca, R.; Emperador, A.; Aloy, P.; Orozco, M. Residues Coevolution Guides the Systematic Identification of Alternative Functional Conformations in Proteins. Structure 2016, 24, 116–126.
(31) Hayat, S.; Sander, C.; Marks, D. S.; Elofsson, A. All-Atom 3D Structure Prediction of Transmembrane β-Barrel Proteins from Sequences. Proc. Natl. Acad. Sci. USA 2015, 112, 5413–5418.
(32) Morcos, F.; Pagnani, A.; Lunt, B.; Bertolino, A.; Marks, D. S.; Sander, C.; Zecchina, R.; Onuchic, J. N.; Hwa, T.; Weigt, M. Direct-Coupling Analysis of Residue Coevolution Captures Native Contacts across many Protein Families. Proc. Natl. Acad. Sci. USA 2011, 108, E1293–E1301.
(33) Dago, A. E.; Schug, A.; Procaccini, A.; Hoch, J. A.; Weigt, M.; Szurmant, H. Structural Basis of Histidine Kinase Autophosphorylation Deduced by Integrating Genomics, Molecular Dynamics, and Mutagenesis. Proc. Natl. Acad. Sci. USA 2012, 109, E1733–E1742.
(34) Sutto, L.; Marsili, S.; Valencia, A.; Gervasio, F. L. From Residue Coevolution to Protein Conformational Ensembles and Functional Dynamics. Proc. Natl. Acad. Sci. USA 2015, 112, 13567–13572.
(35) Lindorff-Larsen, K.; Piana, S.; Dror, R. O.; Shaw, D. E. How Fast-Folding Proteins Fold. Science 2011, 334, 517–520.
(36) Shukla, D.; Peck, A.; Pande, V. S. Conformational Heterogeneity of the Calmodulin Binding Interface. Nat. Commun. 2016, 7.
(37) Noé, F.; Nüske, F. A Variational Approach to Modeling Slow Processes in Stochastic Dynamical Systems. Multiscale Model. Simul. 2013, 11, 635–655.
(38) McGibbon, R. T.; Pande, V. S. Variational Cross-validation of Slow Dynamical Modes in Molecular Kinetics. J. Chem. Phys. 2015, 142, 03B621_1.
(39) Selvam, B.; Shukla, D. Understanding the Conformational Diversity of Proton-Coupled Oligopeptide Transporter (POT) Family. Biophys. J. 2017, 112, 16a–17a.
(40) Ekeberg, M.; Lövkvist, C.; Lan, Y.; Weigt, M.; Aurell, E. Improved Contact Prediction in Proteins: Using Pseudolikelihoods to Infer Potts Models. Phys. Rev. E 2013, 87, 012707.
(41) Sheridan, R.; Fieldhouse, R. J.; Hayat, S.; Sun, Y.; Antipin, Y.; Yang, L.; Hopf, T.; Marks, D. S.; Sander, C. EVfold.org: Evolutionary Couplings and Protein 3D Structure Prediction. bioRxiv 2015, 021022.
(42) Bowman, G. R.; Pande, V. S.; Noé, F. An Introduction to Markov State Models and their Application to Long Timescale Molecular Simulation; Springer Science & Business Media: Heidelberg, Germany, 2013; Vol. 797.
(43) Husic, B. E.; McGibbon, R. T.; Sultan, M. M.; Pande, V. S. Optimized Parameter Selection Reveals Trends in Markov State Models for Protein Folding. J. Chem. Phys. 2016, 145, 194103.
(44) Beauchamp, K. A.; Bowman, G. R.; Lane, T. J.; Maibaum, L.; Haque, I. S.; Pande, V. S. MSMBuilder2: Modeling Conformational Dynamics on the Picosecond to Millisecond Scale. J. Chem. Theory Comput. 2011, 7, 3412–3419.
(45) McGibbon, R. T.; Hernández, C. X.; Harrigan, M. P.; Kearnes, S.; Sultan, M. M.; Jastrzebski, S.; Husic, B. E.; Pande, V. S. Osprey: Hyperparameter Optimization for Machine Learning. J. Open Source Software 2016, 1, 00034.
(46) Best, R. B.; Hummer, G.; Eaton, W. A. Native Contacts Determine Protein Folding Mechanisms in Atomistic Simulations. Proc. Natl. Acad. Sci. USA 2013, 110, 17874–17879.
(47) Jäger, M.; Zhang, Y.; Bieschke, J.; Nguyen, H.; Dendle, M.; Bowman, M. E.; Noel, J. P.; Gruebele, M.; Kelly, J. W. Structure-Function-Folding Relationship in a WW domain. Proc. Natl. Acad. Sci. USA 2006, 103, 10648–10653.
(48) Johansson, M. U.; de Château, M.; Wikström, M.; Forsén, S.; Drakenberg, T.; Björck, L. Solution Structure of the Albumin-Binding GA Module: A Versatile Bacterial Protein Domain. J. Mol. Biol. 1997, 266, 859–865.
(49) Beamer, L. J.; Pabo, C. O. Refined 1.8 Å Crystal Structure of the λ Repressor-Operator Complex. J. Mol. Biol. 1992, 227, 177–196.
(50) Neuweiler, H.; Sharpe, T. D.; Rutherford, T. J.; Johnson, C. M.; Allen, M. D.; Ferguson, N.; Fersht, A. R. The Folding Mechanism of BBL: Plasticity of Transition-State Structure Observed within an Ultrafast Folding Protein Family. J. Mol. Biol. 2009, 390, 1060–1073.
(51) Beccara, A. S.; Škrbić, T.; Covino, R.; Faccioli, P. Dominant Folding Pathways of a WW domain. Proc. Natl. Acad. Sci. USA 2012, 109, 2330–2335.
(52) Fowler, P. W.; Orwick-Rydmark, M.; Radestock, S.; Solcan, N.; Dijkman, P. M.; Lyons, J. A.; Kwok, J.; Caffrey, M.; Watts, A.; Forrest, L. R. et al. Gating Topology of the Proton-Coupled Oligopeptide Symporters. Structure 2015, 23, 290–301.
(53) Kubelka, J.; Chiu, T. K.; Davies, D. R.; Eaton, W. A.; Hofrichter, J. Sub-Microsecond Protein Folding. J. Mol. Biol. 2006, 359, 546–553.
(54) Cho, J.-H.; Meng, W.; Sato, S.; Kim, E. Y.; Schindelin, H.; Raleigh, D. P. Energetically Significant Networks of Coupled Interactions within an Unfolded Protein. Proc. Natl. Acad. Sci. USA 2014, 111, 12079–12084.