A Maximum-Caliber Approach to Predicting Perturbed Folding Kinetics

We present a maximum-caliber method for inferring transition rates of a Markov state model (MSM) with perturbed equilibrium populations given estimate...
0 downloads 0 Views 3MB Size
Article pubs.acs.org/JCTC

A Maximum-Caliber Approach to Predicting Perturbed Folding Kinetics Due to Mutations Hongbin Wan, Guangfeng Zhou, and Vincent A. Voelz* Department of Chemistry, Temple University, Philadelphia, Pennsylvania 19122, United States S Supporting Information *

ABSTRACT: We present a maximum-caliber method for inferring transition rates of a Markov state model (MSM) with perturbed equilibrium populations given estimates of state populations and rates for an unperturbed MSM. It is similar in spirit to previous approaches, but given the inclusion of prior information, it is more robust and simple to implement. We examine its performance in simple biased diffusion models of kinetics and then apply the method to predicting changes in folding rates for several highly nontrivial protein folding systems for which non-native interactions play a significant role, including (1) tryptophan variants of the GB1 hairpin, (2) salt-bridge mutations of the Fs peptide helix, and (3) MSMs built from ultralong folding trajectories of FiP35 and GTT variants of the WW domain. In all cases, the method correctly predicts changes in folding rates, suggesting the wide applicability of maximum-caliber approaches to efficiently predict how mutations perturb protein conformational dynamics.



INTRODUCTION Markov state models (MSMs) of conformational dynamics have significantly advanced our understanding of biomolecular function.1−3 In the MSM approach, conformational dynamics is modeled as a kinetic network of transitions between metastable states.4,5 This analysis works particularly well with large collections of molecular simulation trajectories obtained from parallel distributed computing,6,7 which enable the identification of relevant metastable states and efficient sampling of the rates of transitions between them.8 A complete description of conformational dynamics comes from the estimates of the discrete-time transition probability pij, i.e., the probability of transitioning between states i and j in some time interval τ. Recently, so-called maximum-caliber approaches9 have been used to infer transition rates pij given only the equilibrium populations πi and constraints based on dynamical averages across the set of transitions.10−12 This method works by maximizing the path entropy : = −∑ πipij ln pij

A natural application of the maximum-caliber method is to understand how the thermodynamic landscapes of proteins shape their kinetics. For example, if we have an MSM of a given protein sequence, we would like to apply the maximum-caliber approach to a mutant protein sequence to infer differences in the folding kinetics directly from predicted changes in thermodynamics. This is easily stated, but it is hard to achieve good results in practice. As described by Dixit et al.,10 current maximum-caliber approaches require constraints on average dynamical quantities on which the results sensitively depend. For processes like diffusion, a restraint on the mean jump rate leads to very good estimates of microscopic rates, but for systems like proteins, the microscopic rates between metastable states depend sensitively on highly local features, such as the kinetic barriers separating states and their local diffusivity. Selection of meaningful dynamical restraints thus depends on obtaining a projection of the system that allows a good estimation of the effective kinetic distance between metastable states. In the work of Dixit et al.,10 simple geometric distance metrics failed to produce good maximum-caliber models from simulation data; instead, metrics that better captured the effective kinetic distance prevailed.10 Similarly, the accuracy of

(1)

i,j

of a Markov state model in the presence of constraints to enforce the conservation of transition probabilities and dynamical averages. The result is obtained using a Lagrange multiplier approach. © XXXX American Chemical Society

Received: September 25, 2016 Published: November 15, 2016 A

DOI: 10.1021/acs.jctc.6b00938 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Theory and Computation an MSM depends strongly on our ability to find coordinate transformations that correctly capture the kinetic distance between states.13 Fortunately, recent advances have enabled the development of methods that can capture kinetic distance very well by exploiting the variational approach to conformational dynamics (VAC).14−16 These methods include time-lagged independent component analysis (tICA), which finds the linear combinations of structural order parameters that best capture long-time-scale dynamics,17−19 and related diffusion map methods.20 Instead of enforcing constraints on average dynamical quantities as done by Dixit et al.,10 here we consider an unperturbed Markov state model that has already been constructed for a given protein, and we next want to understand the kinetic behavior of a perturbed MSM, say, for a related sequence variant. In this case, the existing MSM (if well-constructed) already tells us a great deal about the local kinetic environment shaping the rate between any two metastable states i and j; we know the unperturbed equilibrium populations πi and πj as well as rate estimates pij and pji and hence how the thermodynamic gradient between states i and j is related to the rate of population flow between them. It would be advantageous to use this information as a starting point for making rate estimates for similar systems. In this paper, we present a maximum-caliber approach for inferring transition rates of an MSM with perturbed equilibrium populations that is robust and simple to implement. We first show how the method performs in a simple biased diffusion system. We then apply the method to predict changes in folding rates for several highly nontrivial mini-protein folding systems in which non-native interactions play a significant role, including tryptophan mutants of the GB1 hairpin and saltbridge mutations of the Fs peptide. Finally, we apply the method to MSMs built from ultralong folding trajectories of the GTT and FiP35 variants of the WW domain from Shaw et al.21,22 The overall accuracy of these predictions suggests the wide applicability of maximum-caliber approaches to predict perturbed kinetics.

distribution should be as difficult as possible to discriminate from the prior distribution. The information on a distribution p with respect to q is quantified by relative entropy, or Kullback− Leibler divergence, D(p|q) = ∑x p(x) ln[p(x)/q(x)]. For a Markov chain, the relative path entropy is

⎛p ⎞ ij + = ∑ πipij ln⎜⎜ ⎟⎟ * i,j ⎝ pij ⎠

so maximizing the caliber is equivalent to minimizing the relative path entropy of the perturbed MSM with respect to the unperturbed MSM.23 To maximize the caliber, we turn to a Lagrange multiplier approach to minimize + with constraints on probability conservation for all states i,

∑ pij

−1=0 (4)

j

and constraints on detailed balance for all pairs of states i and j,

πipij − πjpji = 0

(5)

With these constraints, we can now find new rates pij that will minimize the relative entropy through a Lagrange multiplier approach. The Lagrange function to be minimized is 3=++

∑ vi(∑ pij i

− 1) +

j

∑ λij(πipij i,j

− πjpji )

(6)

where vi and λij are Lagrange multipliers. First, we solve for the optimal pij in terms of the unknown Lagrange multipliers by ∂3 setting ∂p = 0: ij

⎛p ⎞ ∑ πipij ln⎜⎜ ij* ⎟⎟ + vi ∂ (∑ pij − 1) ∂pij j i,j ⎝ pij ⎠ ∂ + ∑ λij(πipij − πjpji ) = 0 ∂pij i , j

∂ ∂pij



THEORY Suppose an MSM, defined by transition probabilities p*ij and equilibrium populations πi*, is perturbed such that new equilibrium populations πi are known. Our goal is to infer the new values of the transition probabilities pij. To do this, we propose maximizing the path entropy, or caliber * , of the perturbed MSM with respect to the unperturbed MSM: ⎛p ⎞ ij * = ∑ −πipij ln⎜⎜ ⎟⎟ * p i,j ⎝ ij ⎠

(3)

(7)

This simplifies to ⎛p ⎞ ij πi ln⎜⎜ ⎟⎟ + πi + vi + πi(λij − λji) = 0 * ⎝ pij ⎠

(8)

from which we obtain

pij = pij* e−Δijwi

(9)

where, for convenience, we define Δij = λij − λji and wi = e−1−vi/πi. To determine the values of the Lagrange multipliers λij and vi, we insert our expression for pij into the restraint equations (eqs 4 and 5). Using the constraints πi pij = πj pji, we find that

(2)

The extremum principle of maximizing the path entropy is the dynamical equivalent of maximizing the entropy (MaxEnt) in statistical thermodynamics. It can also be derived from an information theory perspective, with the distribution p*ij acting as a Bayesian prior and pij acting as a posterior distribution that results from learning new information about the equilibrium populations πi. A full discussion of these derivations is beyond the scope of this paper, and we refer the reader to the excellent review by Presse et al.9 From an information theory perspective, maximum caliber can be seen as a “minimum information” principle: upon learning new equilibrium populations πi, the posterior

πipij* e−Δijwi = πjp*ji e+Δijwj

(10)

from which we obtain e−Δij =

πjp*ji wj πipij* wi

(11)

From the constraint ∑j pij = 1, we find that B

DOI: 10.1021/acs.jctc.6b00938 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Theory and Computation

Figure 1. A maximum-caliber method predicts changes in relaxation times for biased 2D diffusion. (A) Biasing potential applied to a 2D diffusion MSM with 25 states, where b is the bias in units of kT. (B) Comparison of estimated rates between MSM states for two independent Monte Carlo trajectories of 106 steps. Differences are due to finite sampling error. (C) Comparison of estimated rates between MSM states for unbiased diffusion (b = 0) and biased diffusion (b = 2). (D) Comparison of biased rates with maximum-caliber predictions. (E) Maximum-caliber predictions of the slowest relaxation time scale shown as a function of applied bias. Predicted time scales are accurate up to 2 orders of magnitude in rate differences. For biases larger than ±6kT, finite sampling (106 steps) ultimately limits the accuracy of the maximum-caliber predictions. Error bars were computed from the standard deviation across five independent trials. (F) Estimates of MSM implied time scale spectra shown for unbiased diffusion (black), biased diffusion (blue, b = 2), and maximum-caliber predictions from the unbiased MSM.

wi =

1 * ∑j pij e−Δij



RESULTS Biased Diffusion Models. We first explore the performance of the maximum-caliber approach in simple biased twodimensional diffusion models. As a test system, we consider the diffusive dynamics of a particle over x, y ∈ (0, 5) in the presence of a bias potential

(12)

These expressions do not result in a closed-form solution but rather a simple iterative scheme that numerically converges. The iterative procedure begins with an initial estimate of wi (we have found that wi = 1 works well in all practical cases we have tried). We next iteratively estimate pij ← pij*

πjp*ji wj πipij* wi

b Vb(x , y) = − (⌊x ⌋ − 2)(⌊y ⌋ − 2) 4

wi

where b is the bias in units of kT (Figure 1A). To construct a Markov state model of the dynamics, a Monte Carlo algorithm was first used to generate a single long dynamic trajectory of particle diffusion. Proposed moves in x and y were normally distributed translations taken from N(μ = 0, σ = 0.2), accepted according to the Metropolis criterion. MSMs were constructed using a lag time of τlag = 25 steps, for which the dynamics is Markovian. Discrete MSM states were defined as the set of 25 1 × 1 squares tiling the x, y domain. The matrix of transition

(13)

and then wi ←

wi ∑j pij

(15)

(14)

repeating until the values are converged within the desired tolerance (10−15 in all of the results we show below). C

DOI: 10.1021/acs.jctc.6b00938 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Theory and Computation probabilities pij was computed using a maximum-likelihood estimator given the observed transition counts.24−26 Given “wild-type” unbiased diffusion (b = 0kT) MSMs built from trajectories of 106 steps, we examined the ability of our maximum-caliber approach to predict perturbed transition probabilities and implied time scales for biased diffusion (i.e., the “mutant”). The implied time scales of an MSM are τn = −τlag/ln μn, where μn are the eigenvalues of the transition probability matrix. The unbiased MSMs yielded transition probabilities pij* and equilibrium populations πi*, from which the biased populations were computed as πi =

πi* exp( −Vi ) ∑i πi* exp( −Vi )

graining, to achieve a metastable state decomposition that could accurately represent all four hairpin designs. To test our maximum-caliber approach, we used the equilibrium populations π*i and transition probabilities p*ij from a wild-type MSM to infer perturbed transition probabilities and folding kinetics of a mutant MSM with equilibrium populations πi. The four sequences (gb1, trpzip4, trpzip5, and trpzip6) provide 12 different pairwise predictions to test the maximum-caliber method. As a specific example, we show results for predicting the folding kinetics of trpzip4 from the MSM of trpzip6 (Figure 2B). Our previous simulation results showed that this V14W mutation induces a dramatic increase in the folding time by almost an order of magnitude. The maximum-caliber method predicts the same result. As in the 2D diffusion example above, maximum-caliber estimates of the transition probabilities are more correlated with the trpzip4 MSM (R2 = 0.905, rmsd(ln pij) = 0.816) than with the wild-type MSM (R2 = 0.883, rmsd(ln pij) = 0.904), although in both cases there is increased variability due to the complexity of the models (150 macrostates). The implied time scale predictions agree very well with the actual MSM time scales (Figure 2F). Fs Peptide Helix and Salt-Bridge Mutants. MSMs for saltbridge mutants of the Fs peptide (AceA5AAARAAAARAAAARAA-Nme) (Figure 3 top) provide another valuable test of the maximum-caliber approach. As described by Zhou and Voelz,28 eight Fs peptide helix variants, in which the three arginine residues were mutated in all possible combinations with glutamic acid, were simulated on Folding@home in explicit solvent to produce about 130 μs of total trajectory data per sequence. The glutamic acid mutations were designed to introduce potential salt-bridge interactions that can only be formed in unfolded states, inducing highly nontrivial changes in folding kinetics from non-native interactions. Like the hairpin systems above, the Fs peptide MSMs were constructed using tICA-based clustering of the combined trajectories of all sequences, producing a 1200-microstate MSM with a lag time of 5 ns. Optimal MSM construction parameters such as the tICA lag time, number of tICA components, number of microstates, and clustering method were chosen systematically using GMRQ variational cross-validation.16 In our original study, microstate MSMs were lumped into combined 40-macrostate models using the BACE algorithm;31 here we present 30-macrostate MSMs lumped using the same method that retain nearly identical kinetics. The additional lumping helps improve the overlap in metastable states for the eight sequences. The eight Fs peptide sequences provide a total of 54 different wild-type versus mutant predictions to test the maximumcaliber method. MSM folding times versus maximum-caliber predictions show good agreement across this large number of comparisons (Figure 3). The Maximum-Caliber Approach Predicts Changes in Folding Kinetics for a WW Domain Mutation. The WW domain is a small three-stranded β-sheet protein whose folding kinetics has been studied extensively by experiment32−35 and molecular simulation.8,21,22,36 Protein engineering studies have discovered many fast-folding variants of the WW domain, most notably the FiP35 sequence, which has a folding time of 13 μs.33 An even faster-folding (3 μs) variant of the WW domain was subsequently discovered by Piana et al.37 through molecular-simulation-based computational prediction. In the

(16)

As a detailed example, we present the maximum-caliber results for a bias of b = 2kT, shown in Figure 1. In this case, the maximum-caliber method predicts estimated transition probabilities that agree more closely with the estimates from the biased MSM (a correlation coefficient of R2 = 0.892 and a rootmean-square deviation (rmsd) of 0.681 for the predicted ln pij values) than unbiased estimates (R2 = 0.793, rmsd(ln pij) = 0.909). Maximum-caliber predictions for a range of bias potentials 0 < b < 10kT in both the forward and backward directions (i.e., predictions of unbiased transition rates from the biased MSM) agree well with the actual MSMs up to about 2 orders of magnitude in perturbed rates before succumbing to finite sampling error with trajectories of 106 steps (Figure 1E). Moreover, the spectra of implied time scales predicted by the maximum-caliber method agree very well with spectra from biased MSMs (Figure 1F). The Maximum-Caliber Approach Predicts Changes in Mini-Protein Folding Rates Due to Mutations. To test the ability of our maximum-caliber approach to predict changes in folding kinetics due to mutations, we utilized MSMs built in previous studies for tryptophan variants of the GB1 hairpin27 and salt-bridge mutations of the Fs peptide helix.28 A key finding in both of those studies was the importance of nonnative interactions in shaping folding rates and mechanisms. For example, according to a simple two-state model of folding, the greater stability of the trpzip4 hairpin compared with wildtype GB129 should confer a higher folding rate. Instead, both experiment30 and simulation models27 show that trpzip4 folds at a slower rate because of non-native interactions in the unfolded state. Thus, the prediction of the perturbed folding kinetics due to mutations should be highly nontrivial and provide a stringent test of the maximum-caliber approach. GB1 Hairpin and Tryptophan Variants. Four 150-macrostate MSMs, for GB1, trpzip4, trpzip5, and trpzip6 (Table 1 and Figure 2A), were constructed from over 9 ms of explicitsolvent trajectories simulated on the Folding@home distributed computing platform according to the methods described by Razavi and Voelz.27 These MSMs were carefully constructed from the combined trajectory data of all four sequences, using tICA-based clustering and subsequent macrostate coarseTable 1. Sequences of GB1 Hairpin Variants abbreviation

sequence

GB1 trpzip4 trpzip5 trpzip6

GEWTYDDATKTFTVTE GEWTWDDATKTWTWTE GEWTYDDATKTFTWTE GEWTWDDATKTWTVTE D

DOI: 10.1021/acs.jctc.6b00938 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Theory and Computation

Figure 2. A maximum-caliber method predicts changes in folding kinetics for tryptophan variants of the GB1 hairpin. (A) Visualization of conformations taken from the native macrostate of MSMs constructed for GB1 hairpin variants, as described by Razavi and Voelz.27 (B) Comparison of rates between MSM states for trpzip6 (unperturbed) and a similar MSM built from a matrix of (multinomial) resampled transition counts, showing the estimated effects of finite sampling error. (C) Comparison of rates between MSM states for trpzip6 (unperturbed) and trpzip4 (perturbed). (D) Comparison of the trpzip4 rates with maximum-caliber predictions from perturbation of the trpzip6 MSM. (E) Comparisons of the actual and predicted differences in the folding relaxation time scales across all 12 possible sequence comparisons, which agree over a range of more than an order of magnitude. (F) Estimates of trpzip6 (black) and trpzip4 (blue) MSM implied time scale spectra, shown with maximum-caliber predictions of trpzip4 time scales.

Construction of MSMs for WW Domain Folding. Trajectory data for the GTT WW domain came from two independent simulation trajectories with lengths of 651 and 486 μs performed at 360 K using the CHARMM22* force field and four additional simulations with lengths of 83, 118, 124, and 272 μs at 395 K using the AMBER ff99SB-ildn force field.22,37 Trajectory data for the FiP35 WW domain came from six ∼100 μs trajectories performed at 395 K.21,37 In all of the MSMs we describe below, we used tICA17 with a lag time of 10 ns to find the best low-dimensional subspace projection to perform kcenters clustering. All of the pairwise distances between Cα and Cβ atoms were used as the input coordinates for tICA. The GMRQ method16 was used to determine that 1000 microstates and eight tICA components were optimal for accurately capturing folding dynamics (Figure S1). An MSM lag time of 100 ns was chosen for all of the models on the basis of the observation that the computed implied time scales plateau near this lag time (Figures S2 and S3).

fast-folding variant, the native sequence Asn-Ala-Ser (NAS) near loop 2 is replaced with Gly-Thr-Thr (GTT), a sequence whose backbone propensities favor the native conformation. The discovery of the GTT variant is significant because it was predicted using direct simulation of reversible folding on the Anton supercomputer,38 the first such example of this approach for protein design.37 Subsequent temperature-jump refolding experiments showed stabilities and folding rates in good agreement with the predictions, validating the computational design. We wanted to see whether our maximum-caliber approach could be used to make a similar prediction without the need to perform expensive brute-force folding simulations. Toward this end, we built Markov state models of the FiP35 and GTT variants of the WW domain using published trajectory data from Shaw et al. to see whether our maximum-caliber approach could correctly predict changes in folding rates given estimates of the differences in equilibrium probabilities. E

DOI: 10.1021/acs.jctc.6b00938 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Theory and Computation

to perform microstate clustering on the combined data. Separate MSMs for GTT and FiP35 were then built using these combined microstate generators such that the two shared metastable state definitions. We refer to these models as the “combined” MSMs for GTT and FiP35; because they share metastable states, these models are amenable to our maximumcaliber approach. Because of finite sampling, however, not all metastable states are sampled by each sequence. Since states with zero population present problems for the maximum-caliber estimator, we performed further coarse-graining of our 1000microstate MSM to one with a smaller number of macrostates using the BACE algorithm.31 This procedure afforded a set of metastable states that are sampled by each sequence, although some states are visited so rarely as to cause long-time-scale artifacts. These states j can easily be identified by estimated transition probabilities Tij < 10−20, which we removed by setting them to zero and performing ergodic trimming using Tarjan’s algortihm.39 We refer to these macrostate MSMs as the “lumped” models. MSM Predictions of WW Domain Folding Time Scales. To avoid the complications of comparing trajectory data simulated using different force fields and temperatures, we first built MSMs using the four GTT trajectories simulated at 395 K (totaling 597 μs) and six FiP35 trajectories (totaling 600 μs). The folding dynamics predicted by all of the constructed MSMs, whether from separate or combined trajectory data, recapitulate the time scales found in previous analyses of the data. Notably, all of our MSMs successfully predict faster folding time scales for GTT compared with FiP35, in agreement with experiment. The solo, projected, and combined MSMs of the FiP35 WW domain predict folding relaxation time scales of 4.1, 4.2, and 4.3 μs, respectively, with each successive 1000-state model suffering minimally from projection artifacts. As expected, the lumped 100-macrostate MSM of the FiP35 WW domain suffers more acutely from coarse-graining artifacts, with faster folding relaxation time scales of 2.1 μs (Figure S4). In similar fashion, the solo, projected, combined, and lumped MSMs of the GTT

Figure 3. The maximum-caliber method predicts changes in folding kinetics for salt-bridge mutations of the Fs peptide helix. Comparisons of the actual and predicted differences in the folding relaxation time scales across all 54 possible sequence comparisons correlate well over a range of about an order of magnitude.

To test the assumption that the relevant metastable states are conserved for both GTT and FiP35, several methods were used to build Markov state models. First, we built individual MSMs for each sequence, performing a separate tICA analysis and microstate clustering for each. We will refer to these as the “solo” models of GTT and FiP35. Second, we performed a tICA analysis on the combined data for GTT and FiP35 and then built separate MSMs for GTT and FiP35 using this tICA projection. We will refer to these as the “projected” models of GTT and FiP35. Finally, we built MSMs from the combined data for GTT and FiP35 using the combined tICA projection

Figure 4. Projection of GTT and FiP35 trajectory data onto the two largest tICA components, overlaid with the (shared) locations of the 1000 “combined” MSM microstates (red dots). The two folding landscapes are highly similar, although unlike FiP35, GTT shows significant population for a trap conformation at 395 K. F

DOI: 10.1021/acs.jctc.6b00938 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Theory and Computation WW domain predict folding relaxation time scales of 3.4, 3.4, 3.2, and 1.3 μs at 395 K, respectively. To put these predicted time scales into perspective, it is useful to compare them against previously published analyses. For the FiP35 WW domain, Shaw et al. reported predicted folding times of 10 ± 2 μs, based on the average waiting time in the unfolded state seen in the trajectory data (at 395 K). This value compares well to the experimental folding time of 14 μs measured at 340 K.40 Previous MSMs built by Beauchamp et al.41 from the same data (using a lag time of 50 ns) yield a predicted folding relaxation time of ∼2 μs. Beauchamp et al. also used a sliding constraint rate estimator (SCRE) to derive a folding relaxation time of ∼9 μs for a three-state model of FiP35 folding. This range was consistent with estimates of ∼5 μs (using a 100 ns lagtime) by Lane et al.42 and ∼3.5 μs (using a 75 ns lag time) by McGibbon and Pande,43 which tested improved MSM construction methods. For the GTT WW domain, Piana et al.37 predicted folding times of 6 ± 1 μs by calculating the average folding time from 27 folding events simulated at 395 K. This value compares well to the experimental folding time of 5.7 μs. MSM Predictions of the WW Domain Folding Mechanism. Projection of the 1000-microstate “combined” MSMs onto the two largest tICA components (tIC1 and tIC2) revealed similar folding mechanisms for the GTT and FiP35 variants. The tIC1 component corresponds to the slowest (folding) relaxation time scale; along this component a broad unfolded-state basin is separated from a narrow folded-state basin (Figure 4). The next-slowest relaxation corresponds to an off-pathway “trap” state, as further revealed by projections onto tIC1 and tIC3 (Figure S5). This trap state has a native-like structure in the first loop and longer third strand, but with the first β-strand inverted with misregistered hydrogen bonds. Previous MSMs built by McGibbon and Pande also found that the unusual first loop conformations resulted from the broken hydrogen bond between Ser13 and Asp15.43 In this sense, the trap state is a near-native “decoy” on the free energy landscape, i.e., a local, not global, free energy minimum. Maximum-Caliber Predictions of Folding Rates at 395 K. Given that our “solo” MSMs predicted the GTT and FiP35 WW domains to fold at 3.4 and 4.2 μs, respectively, we next applied our maximum-caliber approach to the 1000-microstate “combined” MSMs to predict the folding rates of the GTT WW domain from the FiP35 model and vice versa. Both of the maximum-caliber predictions of the folding relaxation time scales match the true values very closely, demonstrating that this approach can successfully discriminate how Gly-Thr-Thr versus Asn-Ala-Ser mutations perturb the slowest relaxation (Figure 5). Changes in the second-slowest relaxation, dominated by formation of the trap state, are less accurately predicted (for likely reasons we discuss below), but the direction of the change (faster vs slower) is correctly predicted. We then applied our maximum-caliber approach to predict the folding rates of the GTT WW domain given the 100macrostate “lumped” MSM populations of the FiP35 WW domain and vice versa. The computed folding relaxation time scales for these coarser-grained MSMs are less accurate, although again we find that MaxCal correctly predicts the correct directionality of the change for both the slowest and second-slowest time scales (Figure S6). To examine the mechanism by which our MSMs predict GTT to be stabilized and have faster folding compared with FiP35, we computed a free energy profile along an rmsd-to-

Figure 5. Maximum-caliber predictions of (left) the FiP35 folding time scales from the 1000-microstate MSM of GTT and (right) the GTT folding time scales from the 1000-microstate MSM of FiP35, which agree well with the actual values at 395 K.

native coordinate (the population-weighted average backbone rmsd of N, Cα, C, and O atoms) using kernel density estimation across the 1000-microstate equilibrium populations from the “combined” MSMs (Figure S7). The results show that the GTT mutations stabilize the folded basin by ∼0.5 kcal/mol and lower the energy barrier by about 0.3 kcal/mol, resulting in accelerated folding. These results are very similar to the analysis presented in Figure 5 of Piana et al.,37 showing stabilization of ∼0.65 kcal/mol (0.5 kcal/mol by over population changes) and a barrier lowered by ∼0.1 kcal/mol. Maximum-Caliber Predictions of Folding Rates at 360 K (GTT) and 395 K (FiP35). As a further test of our maximumcaliber approach, we built MSMs with the two long trajectories of the GTT WW domain at 360 K (651 and 486 μs) simulated using the CHARMM22** force field and two 100 μs trajectories of the FiP35 WW domain simulated using the AMBER ff99SB-ILDN force field at 395 K.21,22 Although the two systems have different temperatures and use different force fields, we wanted to see whether the MaxCal approach would be similarly predictive. The MSMs were constructed using the protocols described above, including GMRQ analysis (Figures S8−S10). The solo, projected, combined, and lumped (250-macrostate) MSMs of the GTT WW domain at 360 K predict folding relaxation time scales of 10.2, 9.6, 9.8, and 9.8 μs, respectively (Figure S11). This compares well to the folding time of 21 ± 6 μs calculated by Shaw et al. from the average unfolded state lifetime. Previous MSMs built by Beauchamp et al. from this data using a lag time of 50 ns yield a predicted folding relaxation time of ∼6 μs, and a SCRE estimate of ∼8 μs for a three-state model of GTT folding.41 Our predictions are larger than these values, likely reflecting improvements in MSM construction protocols since that work. The solo, projected, and combined MSMs of FiP35 WW domain predict folding relaxation time scales of 4.3, 4.0, 3.9, and 3.8 μs (Figure S11), respectively, with each successive model suffering only slightly from projection artifacts and lesser amounts of data. Importantly, we note that all of the MSMs we built from this set of data (as do the Beauchamp et al. MSMs) predict FiP35 to have a faster folding rate than GTT. This is of course the opposite result obtained from experiments and the simulation predictions presented in Piana et al.37 and is a consequence of the lower temperature at which GTT was simulated (360 K) as well as differences in the force field. When we apply our maximum-caliber approach, we find that the folding time scale for FiP35 at 395 K is well-predicted from the MSM of the GTT WW domain at 360 K, but not vice versa, G

DOI: 10.1021/acs.jctc.6b00938 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Theory and Computation

and later reweighted by our maximum-caliber approach to calculate unbiased dissociation rates. Similarly, the common problem of force field bias in molecular simulations could be remedied by using our maximum-caliber approach to reweight MSM transition rates to match experimental observables. Our maximum-caliber approach is similar in some ways to the dTRAM method,26 which uses a Lagrangian method to infer maximum-likelihood estimates of transition rates from transition counts observed in multiple thermodynamic ensembles. Perhaps deeper connections exist between these two methods that could be explored in future work, including the incorporation of multiple thermodynamic biases. Interestingly, the dTRAM algorithm is unable to make estimates of rates for ensembles in which no transition counts are observed. The maximum-caliber method we present here has no such limitation. Finally, we stress that the maximum-caliber approach we present here is completely general and can be used in any number of applications in which changes in kinetics need to be inferred from changes in equilibrium populations.

although the predictions are in the right direction (Figure S12). Similar to the 395 K results, the second-slowest relaxation time scale is found to be associated with the population of a misfolded “trap state”. In this case, however, the trap state is different: it has a flipped misregistered β2-strand (Figures S13 and S14). MaxCal predictions of the second-slowest time scales are less accurate than those from the 395 K MSMs, with the GTT predictions in the right direction but not those for FiP35. The poorer performance of MaxCal applied to the 360 K/ CHARMM22* (GTT) versus 395 K/AMBER (FiP35) systems is likely due to sampling issues. For the 395 K MSMs, the nonnative trap state is sampled by both the GTT and FiP35 sequences, although with much higher population for the GTT WW domain. For the 360 K GTT versus 395 K FiP35 systems, only the GTT system samples the trap state, while the FiP35 system does not. This could be a result of the different temperatures and/or force fields or could arise from the fact that only 200 μs of FiP35 trajectory data was used to built these models, compared to 1.1 ms of GTT trajectory data, as is evident from the larger sampled area of the tICA projections for GTT (Figures S13 and S14). Better sampling in the 395 K MSMs can be quantified by the overlap in the number of microstates sampled by each sequence: GTT sampled 994 of 1000 microstates, while FiP35 sampled 990 out of 1000. By contrast, in the 360 K/395 K MSMs, GTT sampled 951 out of 1000 microstates and FiP35 only 904 out of 1000. These comparisons raise questions about the usefulness of single long trajectories for constructing MSMs and/or performing maximum-caliber estimates. Even with more than a millisecond of folding trajectory data for the GTT WW domain, we note that one trajectory stays in the folded state for over 200 μs (Figure S15), an impediment to the good statistical sampling of state transitions necessary for constructing MSMs. We suspect that the use of many short trajectories to sample states relevant across multiple sequences may be a more useful approach, especially in conjunction with maximum-caliber approaches.



CONCLUSIONS In this work, we have presented a simple and robust maximumcaliber method for inferring transition rates of a Markov state model (MSM) with perturbed equilibrium populations given estimates of the state populations and rates for an unperturbed MSM. Applying this approach to several MSMs of protein folding sequence variants results in good predictions of perturbed folding rates directly from changes in equilibrium state populations.



ASSOCIATED CONTENT

S Supporting Information *

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.jctc.6b00938. Figures S1−S15 (PDF)





DISCUSSION The success of our maximum-caliber approach appears to be dependent on several factors, including (1) a metastable state decomposition that closely reflects the underlying kinetic distances, (2) small enough perturbations as to conserve the important metastable states, and (3) sufficient sampling to make accurate estimates of rate changes. Only recently have such considerations been quantitatively incorporated into standard MSM construction protocols for protein conformational dynamics, which means one can expect wider applicability of this method to more MSMs in the future. The potential applications of our maximum-caliber method are especially exciting. For one, with a sufficiently accurate method of predicting mutational free energy changes for each metastable state, the method could be used to elucidate how disease- or resistance-associated mutations perturbed protein dynamics. Similarly, the effects of many candidate mutations could be efficiently interrogated for the purposes of protein design. More generally, there are many situations where it may be desirable to remove thermodynamic bias from a given MSM, which now can be robustly performed using our maximumcaliber method. One application would be to construct MSM models of ligand dissociation; unbinding transitions otherwise too rare to observe could be sampled used a biasing potential

AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected]. ORCID

Vincent A. Voelz: 0000-0002-1054-2124 Funding

This research was supported in part by the National Science Foundation through major research instrumentation grant CNS-09-58854 and MCB-1412508. Notes

The authors declare no competing financial interest.



ACKNOWLEDGMENTS The authors thank the participants of Folding@home, without whom this work would not have been possible. We graciously acknowledge D. E. Shaw Research for providing access to the WW domain folding trajectory data.



REFERENCES

(1) Kohlhoff, K. J.; Shukla, D.; Lawrenz, M.; Bowman, G. R.; Konerding, D. E.; Belov, D.; Altman, R. B.; Pande, V. S. Cloud-Based Simulations on Google Exacycle Reveal Ligand Modulation of GPCR Activation Pathways. Nat. Chem. 2013, 6, 15−21.

H

DOI: 10.1021/acs.jctc.6b00938 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Theory and Computation (2) Plattner, N.; Noé, F. Protein Conformational Plasticity and Complex Ligand-Binding Kinetics Explored by Atomistic Simulations and Markov Models. Nat. Commun. 2015, 6, 7653. (3) Voelz, V. A.; Bowman, G. R.; Beauchamp, K.; Pande, V. S. Molecular Simulation of ab initio Protein Folding for a Millisecond Folder NTL9(1−39). J. Am. Chem. Soc. 2010, 132, 1526−1528. (4) Chodera, J. D.; Noé, F. Markov State Models of Biomolecular Conformational Dynamics. Curr. Opin. Struct. Biol. 2014, 25, 135−144. (5) Prinz, J.-H.; Wu, H.; Sarich, M.; Keller, B.; Senne, M.; Held, M.; Chodera, J. D.; Schütte, C.; Noé, F. Markov Models of Molecular Kinetics: Generation and Validation. J. Chem. Phys. 2011, 134, 174105. (6) Shirts, M.; Pande, V. S. Screen Savers of the World Unite! Science 2000, 290, 1903−1904. (7) Buch, I.; Harvey, M. J.; Giorgino, T.; Anderson, D. P.; De Fabritiis, G. High-throughput all-atom molecular dynamics simulations using distributed computing. J. Chem. Inf. Model. 2010, 50, 397−403. (8) Noé, F.; Schütte, C.; Vanden-Eijnden, E.; Reich, L.; Weikl, T. R. Constructing the Equilibrium Ensemble of Folding Pathways from Short off-Equilibrium Simulations. Proc. Natl. Acad. Sci. U. S. A. 2009, 106, 19011−19016. (9) Pressé, S.; Ghosh, K.; Lee, J.; Dill, K. A. Principles of maximum entropy and maximum caliber in statistical physics. Rev. Mod. Phys. 2013, 85, 1115−1141. (10) Dixit, P. D.; Jain, A.; Stock, G.; Dill, K. A. Inferring Transition Rates of Networks from Populations in Continuous-Time Markov Processes. J. Chem. Theory Comput. 2015, 11, 5464−5472. (11) Dixit, P. D.; Dill, K. A. Inferring Microscopic Kinetic Rates from Stationary State Distributions. J. Chem. Theory Comput. 2014, 10, 3002−3005. (12) Dixit, P. D. Stationary properties of maximum-entropy random walks. Phys. Rev. E 2015, 92, 042149. (13) Noé, F.; Clementi, C. Kinetic Distance and Kinetic Maps from Molecular Dynamics Simulation. J. Chem. Theory Comput. 2015, 11, 5002−5011. (14) Noé, F.; Nüske, F. A variational approach to modeling slow processes in stochastic dynamical systems. Multiscale Model. Simul. 2013, 11, 635−655. (15) Nüske, F.; Keller, B. G.; Perez-Hernandez, G.; Mey, A. S. J. S.; Noé, F. Variational Approach to Molecular Kinetics. J. Chem. Theory Comput. 2014, 10 (4), 1739−1752. (16) McGibbon, R. T.; Pande, V. S. Variational Cross-Validation of Slow Dynamical Modes in Molecular Kinetics. J. Chem. Phys. 2015, 142, 124105. (17) Schwantes, C. R.; Pande, V. S. Improvements in Markov State Model Construction Reveal Many Non-Native Interactions in the Folding of NTL9. J. Chem. Theory Comput. 2013, 9, 2000−2009. (18) Perez-Hernandez, G.; Paul, F.; Giorgino, T.; De Fabritiis, G.; Noé, F. Identification of Slow Molecular Order Parameters for Markov Model Construction. J. Chem. Phys. 2013, 139, 015102. (19) Schwantes, C. R.; Pande, V. S. Modeling Molecular Kinetics with tICA and the Kernel Trick. J. Chem. Theory Comput. 2015, 11, 600−608. (20) Boninsegna, L.; Gobbo, G.; Noé, F.; Clementi, C. Investigating Molecular Kinetics by Variationally Optimized Diffusion Maps. J. Chem. Theory Comput. 2015, 11, 5947−5960. (21) Shaw, D. E.; Maragakis, P.; Lindorff-Larsen, K.; Piana, S.; Dror, R. O.; Eastwood, M. P.; Bank, J. A.; Jumper, J. M.; Salmon, J. K.; Shan, Y.; Wriggers, W. Atomic-Level Characterization of the Structural Dynamics of Proteins. Science 2010, 330, 341−346. (22) Lindorff-Larsen, K.; Piana, S.; Dror, R. O.; Shaw, D. E. How Fast-Folding Proteins Fold. Science 2011, 334, 517−520. (23) Hazoglou, M. J.; Walther, V.; Dixit, P. D.; Dill, K. A. Communication: Maximum caliber is a general variational principle for nonequilibrium statistical mechanics. J. Chem. Phys. 2015, 143, 051104. (24) Bowman, G. R.; Beauchamp, K. A.; Boxer, G.; Pande, V. S. Progress and challenges in the automated construction of Markov state models for full protein systems. J. Chem. Phys. 2009, 131, 124101.

(25) Beauchamp, K. A.; Bowman, G. R.; Lane, T. J.; Maibaum, L.; Haque, I. S.; Pande, V. S. MSMBuilder2: Modeling Conformational Dynamics on the Picosecond to Millisecond Scale. J. Chem. Theory Comput. 2011, 7, 3412−3419. (26) Wu, H.; Mey, A. S. J. S.; Rosta, E.; Noé, F. Statistically optimal analysis of state-discretized trajectory data from multiple thermodynamic states. J. Chem. Phys. 2014, 141, 214106. (27) Razavi, A. M.; Voelz, V. A. Kinetic Network Models of Tryptophan Mutations in β-Hairpins Reveal the Importance of NonNative Interactions. J. Chem. Theory Comput. 2015, 11, 2801−2812. (28) Zhou, G.; Voelz, V. A. Using Kinetic Network Models To Probe Non-Native Salt-Bridge Effects on α-Helix Folding. J. Phys. Chem. B 2016, 120, 926−935. (29) Cochran, A. G.; Skelton, N.; Starovasnik, M. Tryptophan zippers: Stable, monomeric β-hairpins. Proc. Natl. Acad. Sci. U. S. A. 2001, 98, 5578. (30) Du, D.; Zhu, Y.; Huang, C.-Y.; Gai, F. Understanding the key factors that control the rate of beta-hairpin folding. Proc. Natl. Acad. Sci. U. S. A. 2004, 101, 15915−15920. (31) Bowman, G. R. Improved Coarse-Graining of Markov State Models via Explicit Consideration of Statistical Uncertainty. J. Chem. Phys. 2012, 137, 134111. (32) Nguyen, H.; Jäger, M.; Moretto, A.; Gruebele, M.; Kelly, J. W. Tuning the free-energy landscape of a WW domain by temperature, mutation, and truncation. Proc. Natl. Acad. Sci. U. S. A. 2003, 100, 3948−3953. (33) Nguyen, H.; Jäger, M.; Kelly, J. W.; Gruebele, M. Engineering a β-Sheet Protein toward the Folding Speed Limit. J. Phys. Chem. B 2005, 109, 15182−15186. (34) Jäger, M.; Zhang, Y.; Bieschke, J.; Nguyen, H.; Dendle, M.; Bowman, M. E.; Noel, J. P.; Gruebele, M.; Kelly, J. W. Structurefunction-folding relationship in a WW domain. Proc. Natl. Acad. Sci. U. S. A. 2006, 103, 10648−10653. (35) Liu, F.; Gruebele, M. Downhill dynamics and the molecular rate of protein folding. Chem. Phys. Lett. 2008, 461, 1−8. (36) Freddolino, P. L.; Liu, F.; Gruebele, M.; Schulten, K. TenMicrosecond Molecular Dynamics Simulation of a Fast-Folding WW Domain. Biophys. J. 2008, 94, L75−L77. (37) Piana, S.; Sarkar, K.; Lindorff-Larsen, K.; Guo, M.; Gruebele, M.; Shaw, D. E. Computational Design and Experimental Testing of the Fastest-Folding β-Sheet Protein. J. Mol. Biol. 2011, 405, 43−48. (38) Shaw, D. E.; Chao, J. C.; Eastwood, M. P.; Gagliardo, J.; Grossman, J. P.; Ho, C. R.; Lerardi, D. J.; Kolossváry, I.; Klepeis, J. L.; Layman, T.; McLeavey, C.; Deneroff, M. M.; Moraes, M. A.; Mueller, R.; Priest, E. C.; Shan, Y.; Spengler, J.; Theobald, M.; Towles, B.; Wang, S. C.; Dror, R. O.; Kuskin, J. S.; Larson, R. H.; Salmon, J. K.; Young, C.; Batson, B.; Bowers, K. J. Anton, a Special-Purpose Machine for Molecular Dynamics Simulation. Commun. ACM 2008, 51, 91−97. (39) Scalco, R.; Caflisch, A. Equilibrium Distribution from Distributed Computing (Simulations of Protein Folding). J. Phys. Chem. B 2011, 115, 6358−6365. (40) Liu, F.; Du, D.; Fuller, A. A.; Davoren, J. E.; Wipf, P.; Kelly, J. W.; Gruebele, M. An experimental survey of the transition between two-state and downhill protein folding scenarios. Proc. Natl. Acad. Sci. U. S. A. 2008, 105, 2369−2374. (41) Beauchamp, K. A.; McGibbon, R.; Lin, Y.-S.; Pande, V. S. Simple few-state models reveal hidden complexity in protein folding. Proc. Natl. Acad. Sci. U. S. A. 2012, 109, 17807−17813. (42) Lane, T. J.; Bowman, G. R.; Beauchamp, K.; Voelz, V. A.; Pande, V. S. Markov state model reveals folding and functional dynamics in ultra-long MD trajectories. J. Am. Chem. Soc. 2011, 133, 18413−18419. (43) McGibbon, R. T.; Pande, V. S. Learning Kinetic Distance Metrics for Markov State Models of Protein Conformational Dynamics. J. Chem. Theory Comput. 2013, 9, 2900−2906.

I

DOI: 10.1021/acs.jctc.6b00938 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX