Folding of a β-Sheet Miniprotein: Probability Fluxes, Streamlines, and

Dec 27, 2014 - particular, we calculate the potential for the driving force of folding in ... the folding reaction with reference to the conformation ...
0 downloads 0 Views 3MB Size
Article pubs.acs.org/JPCB

Folding of a β‑Sheet Miniprotein: Probability Fluxes, Streamlines, and the Potential for the Driving Force Igor V. Kalgin† and Sergei F. Chekmarev*,†,‡ †

Institute of Thermophysics, Siberian Branch of the Russian Academy of Sciences, 630090 Novosibirsk, Russia Department of Physics, Novosibirsk State University, 630090 Novosibirsk, Russia



ABSTRACT: In this work we continue the study of the first-passage folding of an antiparallel β-sheet miniprotein (beta3s) that was initiated in the previous work [Kalgin et al. J. Phys. Chem. B, 2014, 118, 4287]. We consider a larger ensemble of folding trajectories, which allows us to gain a closer insight into the folding dynamics. In particular, we calculate the potential for the driving force of folding in a reduced space of collective variables. The potential has two components. One component (Φ) is responsible for the source and sink of the folding flow, which are formed, respectively, in the regions of the unfolded and native states of the protein, and the other (Ψ) accounts for the flow vorticity inherently generated at the sides of the reaction channel and provides the canalization of the folding flow between the source and sink. We show that both components obey Poisson’s equations with the corresponding source/sink terms. The resulting components have a very simple form: the Φ-surface consists of two well-defined peaks of different signs, which correspond, respectively, to the source and sink of the folding flow, and the Ψsurface consists of two ridges of different signs that connect the source and sink of the flow.

principal component analysis (PCA)6 is typically employed, although the other methods are also possible (see, e.g., Kalgin et al.,7 where different reduction methods were tested to describe folding of beta3s). The FESs are very useful for a general view of the folding reaction; they reveal the entropic barrier that the system has to overcome to achieve the native state as well as some essential details of the reaction, such as onand off-pathway intermediates and multiple pathways. One general shortcoming of the FESs is that they determine only the probabilities for the system to be in different states but do not reveal the order in which these states follow each other; i.e., folding dynamics (and kinetics) mostly remain out of consideration. To gain a closer insight into folding dynamics, we have recently introduced a “hydrodynamic” description of

1. INTRODUCTION A widely used approach to display the essential properties of the folding reaction with reference to the conformation space is to construct a free energy surface (FES) of the reaction in a two-dimensional space of collective variables.1−4 The spectrum of possible collective variables is broad. In one group, the variables are selected on the physical basis. Typically, one variable is chosen to describe how the protein compacts as the contacts between the residues are formed, and the other is chosen to determine how close the protein conformation is to the native state. Most often, the combination of the radius of gyration and fraction of native contacts is used for this purpose, although other variables are also possible, e.g., the hydrogen bond distances and the root-mean-square deviation (RMSD) from the native state (see Qi et al.,5 where different variables were compared for folding of beta3s). In another group, representing an unbiased choice of the variables, the methods of dimensionality reduction are used. For this purpose, © 2014 American Chemical Society

Received: November 11, 2014 Revised: December 17, 2014 Published: December 27, 2014 1380

DOI: 10.1021/jp5112795 J. Phys. Chem. B 2015, 119, 1380−1387

Article

The Journal of Physical Chemistry B the folding process.8 In this approach, similar to the FESs, the folding process is considered in a reduced space of collective variables, but the calculated folding trajectories are used to determine the rates of transitions between the protein states, similar to the disconnectivity graphs9,10 and kinetic networks.11 These rates are organized in the form of local flows (probability fluxes) of transitions. In analogy with hydrodynamics,12 this makes it possible to view the process of folding as a motion of a “folding fluid”, whose density is proportional to the probability for the system to be in the current state. The present approach is particularly useful when the simulations are intended to mimic physiological conditions, i.e., when the native state is stable and unfolding events are improbable. In this case, the folding trajectories start at unfolded states of the protein and are terminated upon reaching the native state (“first-passage folding”13). Correspondingly, the process of folding can be considered as a steady flow of the folding fluid from the unfolded states of the protein to its native state. The hydrodynamic description has been successfully applied to study folding dynamics of several model proteins: an α-helical hairpin (a lattice model),8 an SH3 domain14,15 and a β-hairpin protein16 (a Cα-model), and beta3s (all-atom simulations).7,13 One general conclusion drawn from these studies is that the folding flows do not generally follow the FES landscape; even local vortices can be formed that do not leave clear fingerprints on the FES.8,14 It follows that the FES does not completely determine the local flows of transitions and thus cannot be considered as the potential that governs the folding reaction. At the same time, the knowledge of the probability fluxes offers a simple and rigorous way to determine such a potential.16 For this, using the Helmholtz decomposition theorem,17 the vector field of the probability fluxes is represented as a sum of divergence-free and curl-free fields. Each of these fields has its own potential that governs the field, so the total flow is determined by a combination of the potentials for the divergence-free and curl-free fields. Using a β-hairpin protein in a Cα-representation as a model system, it has been shown16 that one component of the potential, which corresponds to the curl-free field, is responsible for the source and sink of the folding flow that represent, respectively, the unfolded states and the native state of the protein, and the other component, which corresponds to the divergence-free field, accounts for the flow vorticity generated at the periphery of the flow field and provides the canalization of the flow between the source and sink within the reaction valley. In the present paper, we perform all-atom simulations of the beta3s miniprotein using the CHARMM program.18 In contrast to the equilibrium folding conditions, which have been studied in detail by Caflisch and various co-workers,5,7,19−23 including the authors of the present paper,7 we consider the first-passage folding, the general aspects of which we investigated previously.13 Based on an enlarged number of folding trajectories, we perform a more detailed analysis of the distribution of the probability fluxes with the reference to the FES landscape, introduce the two-component potential for the driving force of folding, and show that each of the components is determined by a diffusion (Poisson’s) equation, in which the divergence and vorticity of the flow play the role of the source/ sink terms for the curl-free and divergence-free flow fields, respectively. The paper is organized as follows. In section 2, we describe the methods we used to simulate and analyze folding of beta3s, including the calculation of the two-component potential for

the driving force. In section 3, we present the results of the simulations and their analysis. Section 4 contains a brief summary of the results and concluding remarks.

2. METHODS To simulate and characterize folding of beta3s, we used methods that are similar to those we previously employed to study folding of this protein;7,13 in sections 2.1−2.4 we give a brief survey of these methods. Section 2.5 contains the description and development of the method of calculation of the potential for the driving force.16 2.1. The System and Molecular Dynamics Simulations. The designed three-stranded antiparallel 20-residue peptide (Thr1-Trp2-Ile3-Gln4-Asn5-Gly6-Ser7-Thr8-Lys9Trp10-Tyr11-Gln12-Asn13-Gly14-Ser15-Thr16-Lys17-Ile18Tyr19-Thr20 with charged termini24), whose native structure is shown in Figure 1, was modeled with the CHARMM

Figure 1. Conformations of the beta3s miniprotein.

program.18 All heavy atoms and the hydrogen atoms bound to nitrogen or oxygen atoms were considered explicitly; PARAM19 force field25 and a default cutoff of 7.5 Å for the nonbonding interactions were used. To describe the main effects of the aqueous solvent, a mean field approximation based on the solvent-accessible surface26 was employed. The simulations were performed at T = 330 K, using the Berendsen thermostat with a coupling constant of 5 ps. For the present protein model, this temperature is slightly above the melting temperature.27 The SHAKE algorithm28 was used to fix the length of the covalent bonds involving hydrogen atoms, which allowed the integration time step of 2 fs. Molecular dynamics (MD) trajectories were initiated in unfolded states of the protein and terminated upon reaching the native state, with the atomic coordinates saved every 20 ps. To have the vector field of the probability fluxes reasonably smooth, which is required for the application of the Helmholtz decomposition, the ensemble of the simulated trajectories was increased more than twice in comparison to the previous work13 (from 200 trajectories to 450). To prepare unfolded conformations, the standard CHARMM18 protocol was used; i.e., an extended conformation of the protein was first minimized (200 steps of the steepest descent followed by 300 steps of the conjugate gradient algorithm) and then heated to T = 330 K and equilibrated for 5 × 103 time steps. To determine the final state where the trajectories were terminated, we employed the sidechain distances. Specifically, we assumed that a native contact was formed if the distance between the geometrical centers of 1381

DOI: 10.1021/jp5112795 J. Phys. Chem. B 2015, 119, 1380−1387

Article

The Journal of Physical Chemistry B

2.5. Potential for the Driving Force. To calculate the potential for the driving force, the Helmholtz decomposition theorem17 was used, similar to a previous work (Chekmarev16). According to this theorem, any smooth vector field, in the present case j(g), can be uniquely represented as

the side chains of two residues dnat was less than 7.5 Å, which was found to be more effective than the typically used threshold 6.5 Å.13 Excluding nearest neighbors, i.e., the pairs of residues for which |j − i| = 1 with i and j the residue numbers, the number of native contacts is Nnat = 23. 2.2. Conformation Space and Collective Variables. To characterize protein conformations, the hydrogen bond PCA (HB PCA) method7 was used. In this method, the original conformation space of the protein in the form of the hydrogen bond distances for the formed bonds is transformed to a space of orthogonal collective variables using principal component analysis (PCA).6 The first two modes corresponding to the largest eigenvalues were chosen to form a two-dimensional space of collective variables g = (g1,g2). This space represents a subspace of the three-dimensional space (g1,g2,g3) we previously employed to describe the first-passage folding of beta3s;13 it accounts for 20% of the data variation in comparison to 24% of that for the three-dimensional space. Although larger percentages might be desirable, the variables g1 and g2 grasp the essential features of the folding process. The variable g1 serves as a good reaction coordinate for the overall description of the process, in particular, taking into account helical conformations, which are formed in the initial stage of folding (the fraction of the hydrogen bonds between i and i + 4 residues is equal to 15%), and the variable g2 successfully discriminates between the Ns-or and Cs-or conformations.13 We also note the fact that a large number of small contributions are required to obtain significantly higher percentages (e.g., 36 modes are required to have 70%) suggests that their inclusion would not change the analysis significantly.7 Since the collective variables are linear combinations of the original variables, they are measured in the same units as the bond distances, i.e., in angstroms. 2.3. Secondary Structure Analysis. As in the previous studies,7,11,13,21,22 protein conformations were discriminated according to the secondary structure strings (SSSs) encoded with the DSSP alphabet,29 i.e., the letters “H”, “G”, “I”, “E”, “B”, “T”, and “S” and “−” stand for α-helix, 310-helix, π-helix, extended, isolated β-bridge, hydrogen bonded turn, bend, and unstructured segments, respectively. With this coding, the native state shown in Figure 1 is represented by the string “− EEEETTEEEEEETTEEEE−”. The other significant conformations, which are also shown in Figure 1, have the following, or close to them, strings: Cs-or (−EEEETTEEEE−SSS− − − − −), Ns-or (−EEEESSSEEEEETTEEEE−), and helical (− −HHHHHHHHHHS− − − − − − −). The program WORDOM30 was used to perform the analysis. 2.4. Probability Fluxes. Using the simulated folding trajectories, the probability fluxes of the transitions j(g) in the space of collective variables g = (g1,g2) were calculated as the 2-fold (time and ensemble) averages of the local transitions.8 Based on these fluxes, the process of the firstpassage folding can be considered as a steady flow of a folding fluid from the unfolded states to the native state, with the density of the fluid being proportional to the probability for the system to be found at the current point of the g space.8,13,16 Having the fluxes j(g), the streamlines of the folding flows, which are tangent to the local directions of the flux vectors j(g),12 are constructed. In the case of two dimensions, which is under consideration, the streamlines can also be calculated as the lines corresponding to constant values of the stream function.8 Every two adjacent streamlines form a stream tube that contains a certain portion of the total flow.

j = jcf + jdf

(1)

where ∇ × jcf = 0 and ∇·jdf = 0; i.e., jcf and jdf are curl-free and divergence-free fields, respectively. Due to the conditions the vectors jcf and jdf satisfy, they can be represented as jcf = −

∂Φ ∂Φ kg − kg ∂g1 1 ∂g2 2

∂Ψ ∂Ψ kg − kg ∂g1 2 ∂g2 1

jdf =

(2)

(3)

where Φ = Φ(g) is the potential for the curl-free component, Ψ = Ψ(g) is the potential for the divergence-free component, and kg1 and kg2 are the unit vectors for g1 and g2, respectively. The lines Φ(g) = const and Ψ(g) = const are qualitatively similar, respectively, to the isocommitor probability lines and the probability currents of the reactive trajectories which were introduced by E and Vanden-Eijnden31 to determine a reaction channel (the model Müller and Brown surface32 was used as an example). In contrast to the latter, the Ψ(g) = const and Φ(g) = const lines are not mutually orthogonal because the folding flow is not curl-free; the orthogonality of these lines would require the flow to be both divergence- and curl-free.16 Substituting the expressions for jcf and jdf from eqs 2 and 3 into eq 1 and regrouping the terms, we have j = jg k g + jg k g 1

1

2

2

where jg = − 1

∂Φ ∂Ψ + ∂g1 ∂g2

(4)

∂Φ ∂Ψ − ∂g2 ∂g1

(5)

and jg = − 2

The functions Φ(g) and Ψ(g) obey the corresponding Poisson equations. Taking into account eq 2 and the condition ∇·jdf = 0, the dot product of the gradient vector and the probability flux j determined by eq 1 gives ΔΦ(g) + q(g) = 0

(6)

where q(g) is the distribution of sources and sinks that is calculated from the simulated probability fluxes j(g) as q(g) = ∂jg /∂g1 + ∂jg /∂g2 1

2

(7)

The local regions of q(g) > 0 and q(g) < 0 represent the sources and sinks, respectively. Similarly, the cross product of the gradient vector and j, taking into account eq 3 and the condition ∇ × jcf = 0, gives ΔΨ(g) + ω(g) = 0

(8)

where ω(g) = ∂jg /∂g1 − ∂jg /∂g2 2

1382

1

(9) DOI: 10.1021/jp5112795 J. Phys. Chem. B 2015, 119, 1380−1387

Article

The Journal of Physical Chemistry B

or “cyclic balance”, which can be considered as a measure of deviation from detailed balance.35−37 We also note that since the gradients of the potentials are immediately related to the probability fluxes (eqs 2 and 3), the driving force determined by the functions Φ(g) and Ψ(g) is density weighted. That is, in contrast to the conventional definition of the fluxes j(g) ∼ F(g) p(g), as, e.g., in the Smoluchowski equation above, it is assumed that j(g) ∼ F(g). Correspondingly, the driving force is determined as F(g) = p(g) F̃(g), where F̃(g) is the actual driving force. The reason for such a deviation from the conventional definition of the driving force is that in the latter case the velocity field v(g) = j(g)/p(g) had to be considered instead of the flux field. However, in contrast to the flux field, the velocity field is not divergence-free due to a high “compressibility” of the folding fluid, i.e., because of the large changes of p(g), which are typical of protein folding. Thus, the replacement of j(g) with v(g) would lead to the appearance of artificial sources and sinks of the flow that complicated the interpretation of the Φ(g) and Ψ(g) functions.

is the simulated distribution of vorticity. The functions Φ(g) and Ψ(g) should satisfy eqs 6 and 8 within a region of the g space, in which q(g) ≠ 0 and ω(g) ≠ 0, and have, respectively, Φ(g) = 0 and Ψ(g) = 0 at its boundary (the Dirichlet problem; see, e.g., Sobolev33). To find the Φ(g) and Ψ(g), the Dirichlet problem was replaced with the initial-boundary-value problem, i.e., instead of eqs 6 and 8, there were considered, respectively ∂Φ(g , t )/∂t = ΔΦ(g , t ) + q(g)

(10)

and ∂Ψ(g , t )/∂t = ΔΨ(g , t ) + ω(g)

(11)

where t plays the role of time. In each case, starting with some initial distribution that satisfies the boundary conditions, the process was continued until a stationary distribution was achieved, which approximated the solution of the corresponding the Dirichlet problem. In the present paper, we integrated eqs 10 and 11 numerically, using a finite-difference scheme of the first-order approximation for the time derivative and the second-order approximation for the space derivatives. To avoid boundary effects, the region of integration, which was the same for Φ(g,t) and Ψ(g,t), was enlarged by 2 and 3 times in the direction of g1 and g2, respectively, compared to the region where the simulations were performed. At the boundaries of this, enlarged region there were Φ(g,t) = 0 and Ψ(g,t) = 0 for eqs 10 and 11, respectively, and the initial distributions were taken to be Φ(g,0) = 0 and Ψ(g,0) = 0. The stationary solutions of eqs 10 and 11 were considered to be achieved if, respectively, the conditions {∫ [ΔΦ(g,t)/q(g) + 1]2 dg1 dg2}1/2 ≤ 1 × 10−10 and {∫ [ΔΨ(g,t)/ω(g) + 1]2 dg1 dg2}1/2 ≤ 1 × 10−10 were satisfied for, at least, 1 × 107 last time steps, with the total number of steps of ≈6 × 107. We note that eqs 4 and 5 also allow determining the functions Φ(g) and Ψ(g) directly from the simulated probability fluxes j(g). For this purpose, the least-squares functional Q = ∫ [(jg1 + ∂Φ/∂g1 − ∂Ψ/∂g2)2 + (jg2 + ∂Φ/∂g2 + ∂Ψ/∂g1)2] dg1 dg2, where jg1 and jg2 are the simulated probability fluxes, can be minimized with respect to Φ(g) and Ψ(g).16 Both these approaches lead to essentially the same results. According to eqs 2 and 3, the probability fluxes are assumed to be proportional to the first-order space derivatives of the potential functions Φ(g) and Ψ(g); i.e., the fluxes are considered to be drift fluxes. In other words, although the inertia term was present in the MD simulations, the functions Φ(g) and Ψ(g) determined from the resulting simulated fluxes j(g) via eqs 1−3 account for an overdamped motion. The corresponding kinetic equation is the Smoluchowski equation ∂p/∂t + ∇·J = 0, where p(g,t) is the probability density, and J is the probability current, which can be written as J = −D(g) ∇p(g,t) + [D(g)/(kBT)]F(g) p(g,t), where −D(g) ∇p(g,t) is the diffusion flux, [D(g)/(kBT)]F(g) p(g,t) is the drift flux, D(g) is the diffusion coefficient, F(g) is the driving force, and kB is the Boltzmann constant. In the case of detailed balance (J = 0), the flow is curl-free. Correspondingly, the probability distribution is the Boltzmann distribution p(g) ∼ exp[−G(g)/ (kBT)], where G(g) is the free energy that exerts the driving force F(g) = −∇G(g). However, if J ≠ 0, as in the present case, when the steady flow from the unfolded state to the native state exists, the stationary solution is determined by the condition ∇· J = 0, so that J can have a curl component.34 Such flow is nonequilibrium and is characterized by “irreversible circulation”

3. RESULTS AND DISCUSSION As has been found in the previous studies of both the equilibrium7,11,21,22 and first-passage13 folding of beta3s, several protein conformations play the most significant role. Along with the native and native-like conformations, which have a form of double antiparallel hairpins, they are the conformations in which the C-terminal hairpin is formed and the N-terminal hairpin is unstructured (Ns-or, where “or” means “out of register”), those in which the N-terminal hairpin formed and the C-terminal unstructured (Cs-or), and the conformations which contain a helical region. These conformations are well grouped into clusters, based on the kinetic and conformation proximity;7,11,13,21,22 some characteristic images of them are shown in Figure 1. The conformations that have a curl-like structure with the C-terminal hairpin formed (Ch-curl), which are typical of the equilibrium folding,7,11,21,22 are rare in the first-passage folding13 because they are less stable than the antiparallel β-strands due to the distortion of the hydrogen bonds38 and more difficult to form dynamically because of the distant N- and C-terminal strands. Starting at an unfolded state, the protein proceeds to the native state either through the Ns-or and Cs-or conformations or directly. These pathways are effectively parallel, in which case folding kinetics are characteristically two-state,39 i.e., single exponential (Figure 2). At the same time, as shown in the previous paper,13 the probability flows through the Ns-or and

Figure 2. Distribution of the first-passage times. 1383

DOI: 10.1021/jp5112795 J. Phys. Chem. B 2015, 119, 1380−1387

Article

The Journal of Physical Chemistry B

or, Ns-or, and helical conformations in Figure 3 are closed. This indicates that the system dwells in these conformations executing small-scale rearrangements, which produce local flow vortices. One example of such rearrangements in the case of helical conformations is shown in Figure 5. In addition to the

Cs-or conformations are very small due to detailed balance established between these conformations and native-like states. Figure 3 shows the streamlines of the folding flows super-

Figure 5. Rearrangement of helical conformations. The conformations follow each other with an interval of 20 ps.

significant (native, Cs-or, Ns-or, and helical) conformations, in Figures 3 and 4 there are also indicated three others, whose examples are given in Figure 1. In contrast to the former, these conformations do not form well-defined clusters and just illustrate the diversity of the semicompact states of the protein that occur during the process of folding. Figure 6 shows the distribution of the flow divergence q(g) and vorticity ω(g) that are determined by eqs 7 and 9,

Figure 3. Streamlines of the folding flow superimposed on the free energy surface. The figure at the end of the streamline (0.01−0.99) indicates the fraction of the total flow restricted by the current streamline. Labels 1−7 mark the local regions of the conformation space that are related to the protein conformations shown in Figure 1: the native state (1), the Cs-or conformations (2), the Ns-or conformations (3), the helical conformations (4), and some others (5, 6, and 7).

imposed on the FES. It is seen that the fraction of the flow in the stream tubes which contain the Cs-or and Ns-or conformations is as small as ≈0.2 or smaller (≈0.1 for each stream tube), so that most of the flow (≈ 80% or larger) goes from the unfolding state to the native state directly. A similar picture is obtained from the vector field of the probability fluxes shown in Figure 4 (for the purpose of illustration the lengths of

Figure 4. Vector field of the probability fluxes. Labels 1−7 are as in Figure 3.

Figure 6. Distribution of (a) the divergence and (b) the vorticity of the flow. In (a), the positive values of the divergence correspond to the initiation of the folding trajectories in the unfolded states, and the negative values correspond to their termination in the native state.

the vectors are increased by a factor of 7.5 × 105). It is interesting to note that, according to the location of the basins on the FES, the minimum free energy path40 can be expected to go from the basin that contains the helical structures, which are readily formed in the initial stage of folding due to the shortrange contacts,13 to the basin that contains the Ns-or structures and native state. Such a path is in general agreement with the most probable (shortest) path through the kinetic network projected onto the (g1,g2) space,13 except that the latter goes close to the cluster of the Ns-or conformations rather than directly through it. We also note that the streamlines at the Cs-

respectively. The positive values of q(g) reveal the distributed sources of the folding flows, where the folding trajectories were initiated, and the negative values determine the sink of the flow, where the trajectories were terminated upon reaching the native state. In the rest of the g space, including the region between the sources and sink, q(g) = 0, i.e., the flow is divergence-free, because every trajectory initiated at the unfolded state ends in the native state and thus leaves any other local region of the g 1384

DOI: 10.1021/jp5112795 J. Phys. Chem. B 2015, 119, 1380−1387

Article

The Journal of Physical Chemistry B space after entering it. In contrast, the vorticity of the flow is nonzero in the entire flow field. As has been shown for a model β-hairpin,16 the vorticity inherently has different signs on the sides of the reaction channel, because the probability fluxes decrease toward the sides of the channel. The distribution of Figure 6b is however too noisy to see that because of low statistics (450 folding trajectories against 1 × 104 trajectories for the β-hairpin16), although the decrease of the probability fluxes toward the periphery of the flow field is well seen (Figure 4). The calculated components of the potential Φ(g) and Ψ(g) are shown in Figure 7. The Φ-component has two well-defined

It is evident that the substance ejected from the source will spread over the g plane in all directions. Therefore, to canalize the flow in the region between the source and the sink, where the Φ(g) surface is relatively flat, an addition force is required. This force is produced by the Ψ-component of the potential.16 In the present case, where the folding flow is mainly directed along the g1 axes (Figure 4), i.e., just the jg1-component of the flow is significant, it is clearly seen from eq 4. The Ψcomponent is determined by the vorticity of the folding flow via eq 8, in which, similar to the sources and sinks of the flow in eq 6, the local regions of ω(g) > 0 and ω(g) < 0 play the roles of the vorticity sources and sinks, respectively. Due to diffusion inherent to eqs 6 and 8, the flows produced by the discrete sources and sinks become smooth. For the Φ-component, this leads to the formation of a single peak in both the source and sink regions, though the distributions of the sources and sinks of the flow in these regions are not continuous (Figure 6a). A similar result is obtained for the Ψ-component, for which a very noisy distribution of the vorticity (Figure 6b) converts into a pair of well-defined ridges of different signs. It is significant that the Ψ-component is able to pick up the characteristic sign difference of the vorticity of the flow on the sides of the reaction channel16 even on the basis of a very noisy distribution of the vorticity, i.e., when the number of folding trajectories is limited. The Φ(g) and Ψ(g) distributions not only reveal general properties of the folding reaction but allow a deeper insight into details of the folding process. Figure 8 shows the contour plots

Figure 7. (a) Φ- and (b) Ψ-components of the potential.

peaks of different signs; the positive peak corresponds to the source of the folding flow, and the negative peak corresponds to the sink of the flow (cf. Figure 6a). The Ψ-component is also sign-changing; it consists of two ridges of different signs that connect the source and sink of the flow. In contrast to the Φcomponent, for which the presence of two peaks of different signs is easily explained by the distribution of the flow divergence (Figure 6a), the appearance of two well-defined ridges of different signs in the Ψ-component does not immediately follow from the distribution of the vorticity (Figure 6b), as it would be if the vorticity had different signs at the sides of the reaction channel.16 The role and formation of the Ψ-component become clear when the diffusion nature of eqs 6 and 8 is taken into account. Equation 6, in which q(g) > 0 in the source region, q(g) < 0 in the sink region, and q(g) = 0 in the rest of the g space, can be viewed as an equation that describes diffusion of some substance of density Φ(g) between the source and sink (the diffusion coefficient is equal to unity).

Figure 8. Contour plots of (a) Φ- and (b) Ψ-components of the potential. The labels 1−7 are as in Figures 3 and 4.

of the Φ- and Ψ-components with the characteristic protein conformations indicated as in Figure 3. From Figure 8a it is seen that the helical conformations (4), which are readily formed in the initial stage of folding,13 do not lie on the direct path from the source to sink of the folding flow; rather they require unfolding to allow the protein to form the characteristic 1385

DOI: 10.1021/jp5112795 J. Phys. Chem. B 2015, 119, 1380−1387

Article

The Journal of Physical Chemistry B

flow, i.e., to the unfolded states, where the folding trajectories were initiated, and its negative values correspond to the sink of the flow, i.e., to the native state, where the trajectories were terminated. Similarly, for the Ψ-component, the local regions of the positive and negative vorticity ω(g) play the role of the sources and sinks, respectively. In the context of the present study, Poisson’s equation can be considered as an equation that describes a process of diffusion. Therefore, the flows generated by the discrete sources and sinks become smooth, so that the resulting Φ(g) and Ψ(g) distributions acquire a simple form. The Φ-component is represented by a surface that has two well-defined peaks of different signs: one, positive peak for the source of the folding flow, and the other, negative peak for the sink of the flow. The Ψ-component is also sign-changing; it consists of the two ridges of different signs that connect the source and sink. Since the formation of such ridges is generally due to the different signs of vorticity on the sides of the reaction channel,16 it is significant that the Ψ-component is able to detect this difference even when it is “latent” as in Figure 6b. The Φ(g) and Ψ(g) surfaces are similar in shape to those for β-hairpin.16 This confirms the suggestion made in ref 16 that if the folding process is not complicated by the presence of essentially different pathways and long-living intermediates, the shape of the Φ(g) and Ψ(g) surfaces is universal. The considered potential for the driving force is not only important from the theoretical viewpoint, to indicate that it should have two components, but also gives valuable information about the folding process. In particular, examining the location of the characteristic protein conformations on the Φ(g) and Ψ(g) surfaces, we have found that the helical conformations do not lie on the direct path to the native state, and that the locally closed contours on the Ψ(g) surface indicate repeated local rearrangements of the protein.

β-structures. Figure 8b also shows that the closed contours of the Ψ(g) correspond to the closed streamlines of the flow (Figure 3), indicating that the local vortices of the flows are created due to the repeated local rearrangements of the protein conformations, as illustrated in Figure 5 for helical conformations.

4. CONCLUSION Using the hydrodynamic description of the folding reaction,8 we have studied the first-passage folding of a three-stranded antiparallel miniprotein (beta3s). The folding trajectories were initiated in the unfolded states of the protein and terminated in the native state; i.e., similar to the physiological conditions, the native state was assumed to be stable and unfolding events to be improbable.13 In this case, in accord with the hydrodynamic description, the folding reaction is represented by a steady flow of a “folding fluid” from the unfolded to native state; the density of the fluid is proportional to the probability for the protein to be in the current state. The simulations were performed with the CHARMM program,18 using an implicit solvent model. In comparison to the previous study,13 a larger number of folding trajectories (more than 2 times) were generated, which has allowed a more detailed characterization of the folding process. To describe the folding reaction, the multidimensional space of the hydrogen bond distances of the protein was reduced to a two-dimensional space of collective variables g = (g1,g2); for this, the previously developed hydrogen bond principal component analysis (HB PCA)7 was employed. The simulated folding trajectories were used to calculate the probability fluxes of the transitions between the states of the protein in this reduced space. The projection of the streamlines of the folding flows on the FES has shown that most of the folding flow (80% or larger) goes from the unfolded to native state directly rather than through the Cs-or and Ns-or conformations which are close to the native state (they have one hairpin formed and the other unstructured). This result is in agreement with our previous finding that these conformations are mostly in detailed balance with the nativelike states.13 The term “hydrodynamic” implies that a driving force of protein folding exists. As has been previously shown,7,8,13,14,16 and confirmed in the present study, the FES cannot play the role of the potential which produces the driving force of folding, because the folding flows do not follow the FES landscape. At the same time, the knowledge of these flows allows us to determine such a potential.16 For this, using the Helmholtz decomposition theorem,17 the vector field of the probability fluxes is represented as a sum of divergence-free and curl-free vector fields. Accordingly, the potential has two components. One component, Φ(g), corresponding to the curlfree field, is responsible for the source and sink of the folding flow, and the other, Ψ(g), corresponding to the divergence-free field, accounts for the overall vorticity of the flow and provides the canalization of the flow between the source and sink. In the present definition of the potential, the probability flux is proportional to the first-order space derivatives of the potential; i.e., the potential accounts for an overdamped motion, which is consistent with the characteristic conditions under which the folding reactions take place.41 Each of the components obeys Poisson’s equation with the corresponding source/sink terms. In the equation for the Φ-component, the source and sinks are determined by the distribution of the flow divergence q(g); the positive values of q(g) correspond to the sources of the folding



AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected]. Notes

The authors declare no competing financial interest.



ACKNOWLEDGMENTS We are grateful to Prof. Martin Karplus for valuable discussions. This work was performed under a grant from the Russian Science Foundation (No. 14-14-00325).



REFERENCES

(1) Socci, N. D.; Onuchic, J. N.; Wolynes, P. G. Protein Folding Mechanisms and the MultidimensionalFolding Funnel. Proteins: Struct., Funct., Genet. 1998, 32, 136−158. (2) Boczko, E. M.; Brooks, C. L., III. First-Principles Calculation of the Folding Free-Energy of a 3-helix Bundle Protein. Science 1995, 269, 393−396. (3) Shea, J.-E.; Brooks, C. L., III. From Folding Theories to Folding Proteins: A Review and Assessment of Simulation Studies of Protein Folding and Unfolding. Annu. Rev. Phys. Chem. 2001, 52, 499−535. (4) Chekmarev, S. F.; Krivov, S. V.; Karplus, M. Folding Time Distributions as an Approach to Protein Folding Kinetics. J. Phys. Chem. B 2005, 109, 5312−5330. (5) Qi, B.; Muff, S.; Caflisch, A.; Dinner, A. R. Extracting Physically Intuitive Reaction Coordinates from Transition Networks of a BetaSheet Miniprotein. J. Phys. Chem. B 2010, 114, 6979−6989. (6) Jolliffe, I. Principal Component Analysis; Springer: Berlin, 1986. (7) Kalgin, I. V.; Caflisch, A.; Chekmarev, S. F.; Karplus, M. New Insights into the Folding of a Beta-Sheet Miniprotein in a Reduced

1386

DOI: 10.1021/jp5112795 J. Phys. Chem. B 2015, 119, 1380−1387

Article

The Journal of Physical Chemistry B

Constraints: Molecular Dynamics of n-alkanes. J. Comput. Phys. 1977, 23, 327−341. (29) Andersen, C. A.; Palmer, A. G.; Brunak, S.; Rost, B. Continuum Secondary Structure Captures Protein Flexibility. Structure 2002, 10, 175−184. (30) Seeber, M.; Cecchini, M.; Rao, F.; Settanni, G.; Caflisch, A. Wordom: a Program for EfficientAnalysis of Molecular Dynamics Simulations. Bioinformatics 2007, 23, 2625−2627. (31) E, W.; Vanden-Eijnden, E. Transition-Path Theory and PathFinding Algorithms for the Study of Rare Events. Annu. Rev. Phys. Chem. 2010, 61, 391−420. (32) Müller, K.; Brown, L. D. Location of Saddle Points and Minimum Energy Paths by a ConstrainedSimplex Optimization Procedure. Theor. Chim. Acta 1979, 53, 75−93. (33) Sobolev, S. Partial Differential Equations of Mathematical Physics; Pergamon Press: Oxford, 1964. (34) Van Kampen, N. G. Stochastic Processes in Physics and Chemistry; North-Holland: Amsterdam, 1992; Vol. 1. (35) Tomita, K.; Tomita, H. Irreversible Circulation of Fluctuation. Prog. Theor. Phys. 1974, 51, 1731−1749. (36) Graham, R. Covariant Formulation of Non-Equilibrium Statistical Thermodynamics. Z. Phys. B 1977, 26, 397−405. (37) Eyink, G. L.; Lebowitz, J. L.; Spohn, H. Hydrodynamics and Fluctuations Outside of Local Equilibrium: Driven Diffusive Systems. J. Stat. Phys. 1996, 83, 385−472. (38) Voet, D.; Voet, J. G. Biochemistry, 4th ed.; Wiley: Hoboken, NJ, 2011. (39) Levy, R. M.; Dai, W.; Deng, N.-J.; Makarov, D. E. How Long Does It Take to Equilibrate the Unfolded State of a Protein? Protein Sci. 2013, 22, 1459−1465. (40) Maragliano, L.; Fischer, A.; Vanden-Eijnden, E.; Ciccotti, G. String Method in Collective Variables: Minimum Free Energy Paths and Isocommittor Surfaces. J. Chem. Phys. 2006, 125, 024106. (41) Rohrdanz, M. A.; Zheng, W.; Clementi, C. Discovering Mountain Passes via Torchlight: Methods for the Definition of Reaction Coordinates and Pathways in Complex Macromolecular Reactions. Annu. Rev. Phys. Chem. 2013, 64, 295−316.

Space of Collective Hydrogen Bond Variables: Application to a Hydrodynamic Analysis of the Folding Flow. J. Phys. Chem. B 2013, 117, 6092−6105. (8) Chekmarev, S. F.; Palyanov, A. Y.; Karplus, M. Hydrodynamic Description of Protein Folding. Phys. Rev. Lett. 2008, 100, 018107. (9) Krivov, S. V.; Karplus, M. Free Energy Disconnectivity Graphs: Application to Peptide Models. J. Chem. Phys. 2002, 117, 10894− 10903. (10) Gavrilov, A.; Chekmarev, S. Graphic Representation of Equilibrium and Kinetics in Oligopeptides: Time-Dependent Free Energy Disconnectivity Graphs. In Bioinformatics of Genome Regulation and Structure; Kolchanov, N., Hofestaedt, R., Eds.; Springer: Berlin, 2004; pp 171−178. (11) Rao, F.; Caflisch, A. The Protein Folding Network. J. Mol. Biol. 2004, 342, 299−306. (12) Landau, L. D.; Lifshitz, E. M. Fluid Mechanics; Pegramon: Oxford, 1987. (13) Kalgin, I. V.; Chekmarev, S. F.; Karplus, M. First Passage Analysis of the Folding of a Beta-Sheet Miniprotein: Is it More Realistic Than the Standard Equilibrium Approach? J. Phys. Chem. B 2014, 118, 4287−4299. (14) Kalgin, I. V.; Karplus, M.; Chekmarev, S. F. Folding of a SH3 Domain: Standard and “Hydrodynamic” Analyses. J. Phys. Chem. B 2009, 113, 12759−12772. (15) Kalgin, I. V.; Chekmarev, S. F. Turbulent Phenomena in Protein Folding. Phys. Rev. E 2011, 83, 011920. (16) Chekmarev, S. F. Protein Folding: Complex Potential for the Driving Force in a Two-Dimensional Space of Collective Variables. J. Chem. Phys. 2013, 139, 145103. (17) Weber, H. J.; Arfken, G. B. Essential Mathematical Methods for Physicists; Elsevier: San Diego, 2004. (18) Brooks, B. R.; Brooks, C. L., III; MacKerell, A. D., Jr.; Nilsson, L.; Petrella, R. J.; Roux, B.; Won, Y.; Archontis, G.; Bartels, C.; Boresch, S.; Caflisch, A.; Caves, L.; Cui, Q.; Dinner, A. R.; Feig, M.; Fischer, S.; Gao, J.; Hodoscek, M.; Im, W.; Kuczera, K.; Lazaridis, T.; Ma, J.; Ovchinnikov, V.; Paci, E.; Pastor, R. W.; Post, C. B.; Pu, J. Z.; Schaefer, M.; Tidor, B.; Venable, R. M.; Woodcock, H. L.; Wu, X.; Yang, W.; York, D. M.; Karplus, M. CHARMM: The Biomolecular Simulation Program. J. Comput. Chem. 2009, 30, 1545−1614. (19) Ferrara, P.; Caflisch, A. Folding Simulations of a Three-Stranded Antiparallel Beta-Sheet Peptide. Proc. Natl. Acad. Sci. U. S. A. 2000, 97, 10780−10785. (20) Settanni, G.; Rao, F.; Caflisch, A. Φ-Value Analysis by Molecular Dynamics Simulations ofReversible Folding. Proc. Natl. Acad. Sci. U. S. A. 2005, 102, 628−633. (21) Muff, S.; Caflisch, A. Kinetic Analysis of Molecular Dynamics Simulations Reveals Changes inthe Denatured State and Switch of Folding Pathways upon Single-Point Mutation of a Beta-Sheet Miniprotein. Proteins: Struct., Funct., Bioinf. 2008, 70, 1185−1195. (22) Krivov, S. V.; Muff, S.; Caflisch, A.; Karplus, M. OneDimensional Barrier-Preserving Free-EnergyProjections of a BetaSheet Miniprotein: New Insights into the Folding Process. J. Phys. Chem. B 2008, 112, 8701−8714. (23) Muff, S.; Caflisch, A. ETNA: Equilibrium Transitions Network and Arrhenius Equation forExtracting Folding Kinetics from REMD Simulations. J. Phys. Chem. B 2009, 113, 3218−3226. (24) De Alba, E.; Santoro, J.; Rico, M.; Jimenez, M. De novo Design of a Monomeric Three-StrandedAntiparallel Beta-Sheet. Protein Sci. 1999, 8, 854−865. (25) Neria, E.; Fischer, S.; Karplus, M. Simulation of Activation Free Energies in Molecular Systems. J. Chem. Phys. 1996, 105, 1902−1921. (26) Ferrara, P.; Apostolakis, J.; Caflisch, A. Evaluation of a Fast Implicit Solvent Model for Molecular Dynamics Simulations. Proteins: Struct., Funct., Bioinf. 2002, 46, 24−33. (27) Cavalli, A.; Ferrara, P.; Caflisch, A. Weak Temperature Dependence of the Free Energy Surface and Folding Pathways of Structured Peptides. Proteins: Struct., Funct., Bioinf. 2002, 47, 305−314. (28) Ryckaert, J.-P.; Ciccotti, G.; Berendsen, H. J. Numerical Integration of the Cartesian Equations of Motion of a System with 1387

DOI: 10.1021/jp5112795 J. Phys. Chem. B 2015, 119, 1380−1387