Unfolding at a Local Scale

Feb 15, 2018 - In the prism of the model, there is no stable structure at the local scale in this 20 K unfolding temperature range. ... In this work, ...
0 downloads 10 Views 1MB Size
Subscriber access provided by UNIV OF DURHAM

Article

A Statistical Model to Decipher Protein Folding/unfolding at a Local Scale Paul Grassein, Patrice Delarue, Harold Abraham Scheraga, Gia G. Maisuradze, and Patrick Senet J. Phys. Chem. B, Just Accepted Manuscript • DOI: 10.1021/acs.jpcb.7b10733 • Publication Date (Web): 15 Feb 2018 Downloaded from http://pubs.acs.org on February 22, 2018

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

The Journal of Physical Chemistry B is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 35 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

A Statistical Model to Decipher Protein Folding/Unfolding at a Local Scale Paul Grassein,1 Patrice Delarue,1 Harold A. Scheraga,2,* Gia G. Maisuradze2,* and Patrick Senet1,2,* 1

Laboratoire Interdisciplinaire Carnot de Bourgogne, UMR 6303 CNRS-Univ. de Bourgogne Franche-Comté, 9 Av. A. Savary, BP 47 870, F-21078 Dijon Cedex, France 2

Baker Laboratory of Chemistry and Chemical Biology, Cornell University Ithaca, New York 14853-1301, USA

*

To whom correspondence may be addressed.

[email protected] [email protected] [email protected]

Author contributions: P. G. and P. S. designed research; P. G., P. D., G. G. M., and P. S. performed research; P. G., P. D., G. G. M., and P. S. analysed data; and P. G., H. A. S., G. G. M., and P. S. wrote the paper.

ACS Paragon Plus Environment

1

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 2 of 35

Abstract Protein folding/unfolding can be analysed experimentally at a local scale by monitoring the physical properties of local probes as a function of the temperature, as for example the distance between fluorophores or the values of chemical shifts of backbone atoms. Here, the analytical Lifson-Roig model for the helix-coil transition is modified to analyse local thermal unfolding of the fast-folder W protein of bacteriophage lambda (gpW) simulated by all-atom molecular dynamics (MD) simulations in explicit solvent at fifteen different temperatures. The protein structure is described by the coarse-grained dihedral angles (γ) and bond angles (θ) built between successive Cα-Cα virtual bonds. Each (γ,θ) pair serves as a local probe of protein unfolding. Local native/non-native states are defined for each pair of (γ,θ) angles by analysing the free-energy landscapes

computed from MD trajectories. The three

local elementary equilibrium constants of the model are extracted for each (γ,θ) pair along the sequence from MD simulations and the model predictions are compared to the MD data. Using only the local equilibrium constants as an input, we show that the local denaturation curves computed from the model partition function fit their MD simulated counterparts in a satisfying manner without any adjustment. In the model and MD simulations, gpW unfolds gradually between 320 K and 340 K, with an average native percentage decreasing from 0.8 (320 K) to 0.2 (340 K). In the prism of the model, there is no stable structure at the local scale in this 20 K unfolding temperature range. The enthalpy change upon local unfolding computed from the model and from MD trajectories suggests that the unfolded state between 320 K and 340 K corresponds to a dynamical equilibrium between a large ensemble of constantly changing structures. The present results confirm the downhill unfolding of gpW which does not obey a two-state global folding/unfolding model and shed light on the interpretation of local denaturation curves.

ACS Paragon Plus Environment

2

Page 3 of 35 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

I. INTRODUCTION To perform their functions in living organisms, most proteins must fold into a unique three-dimensional structure. In spite of long, assiduous experimental and theoretical studies, the question of how proteins reach their biologically active conformations still remains to be answered. Protein folding is undoubtedly one of the most challenging and important problems in biophysics, the solution of which will have practical consequences in medicine, drug development and agriculture. Shortly after the discovery of the α-helix by Pauling in 1951, analytical statistical models were proposed to study the thermodynamics of the helix-coil transition, which was then thought to be the main process in protein folding1. Inspired by the Ising model, Zimm and Bragg2, Lifson and Roig (LR)3, Poland and Scheraga1 proposed different ways to render the protein chain as a sequence of native/unfolded residues obeying the laws of statistical physics. While it was proven that one-dimensional ferromagnetic models could not undergo a phase transition, they showed that an amino acid chain, in which the helix state propagates once a nucleation barrier has been overcome, actually produces a sharp helix-coil transition. Their models successfully predicted the formation of helices with good precision, especially for long homopolymers2. However, it was not the case for most proteins which have other kinds of secondary structures and a native structure stabilized by the formation of contacts between residues which are distant along the amino-acid sequence (non-local interactions)4. Models including non-local interactions were studied extensively by Poland and Scheraga1. More recently, Ising models based on contact maps (residues in contact in the native state) were improved to model quantitatively experimental folding data5,6. The parameters of these models are fitted directly on global experimental data (fraction of the native state or heat capacity changes) by using phenomenological relations, as for example the relation between the heat capacity and the number of native contacts5.

ACS Paragon Plus Environment

3

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 4 of 35

With the advance of all-atom molecular dynamics (MD) simulations and the increase of the computational power, there was a renewal of interest for analytical statistical models to analyse simulated MD data7-9. We show here that such a statistical model can be used to interpret MD trajectories of protein unfolding and to shed light on the complexity of protein unfolding at a local scale. Indeed, measurements of the evolution of the native character of a protein locally as a function of the temperature is possible by monitoring the distance spectroscopic signals related to specific residues9-12, distance between fluorescent probes13 or by measuring the chemical shifts of atoms of the labelled amino-acids as a function of temperature14. A detailed study of the local folding/unfolding of the W protein of bacteriophage lambda (gpW) protein15 was performed recently by monitoring the chemical-shifts of 13C and 15

N of the backbone between 273 K and 373 K14. The thermal unfolding of gpW is interpreted

as a downhill unfolding.16,17 It is hypothesized that gpW unfolds without crossing high freeenergy barriers and changes gradually from the native state to the unfolded state.14 This behavior, named downhill folding/unfolding, is common to many small proteins and is markedly different from the two-state paradigm which describes protein folding as an all-ornone transition in which the native and unfolded thermodynamical states are separated by a large free-energy barrier16,17. In this work, we investigate the thermal unfolding of the gpW protein (Fig. 1a) in water by performing 19 long (750 ns) all-atom molecular dynamics (MD) simulations in explicit water between 280 K and 380 K (rescaled temperature, see Methods section). We modified the LR model3 and applied it to the coarse-grained (γ,θ) angles (Fig. 1b) as an interpretation prism of the all-atom MD data of gpW unfolding at the local scale; i.e., we monitored the native character of each (γ,θ) pair as a function of the temperature. The coarsegrained angle coordinates are used to describe large changes of protein conformation and are

ACS Paragon Plus Environment

4

Page 5 of 35 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

part of coarse-grained models of proteins18,19. The present statistical model is complementary to Ising models based on native contacts1,5,6. Because (γ,θ) are defined all along the sequence, these variables can be more easily used to monitor the temperature-dependence of disordered parts of a protein. The present model is appropriate to analyse local folding curves derived from NMR chemical shifts along the sequence14. Indeed, the chemical shift is a property intimately related to the local conformation of the backbone as described by (γ,θ). We defined the local native character of the protein based on the computed twodimensional free-energy landscapes (FELs) ∆G(γ,θ)20-22. Each (γ,θ) pair thus acts as a local probe of the gpW unfolding, as the chemical shifts do in the NMR measurements of the local unfolding of gpW. Simulations of the chemical-shift variations from the present MD simulations of gpW is part of another work, which will be reported elsewhere. Here, the parameters of the modified LR model along the amino-acid sequence are computed after studying the conformational entropy differences between the coarse-grained angle states of gpW. The local denaturation curves of each (γ,θ) pair predicted by the model are compared to those computed from the MD data and to those derived from the variation of the chemical shifts of the backbone atoms as a function of the temperature14 to obtain insight in the local unfolding process. The combination of the modified LR model and MD simulations produces meaningful physical quantities (such as enthalpy, local folding temperature, etc.) which can be compared to similar quantities extracted from experimental local denaturation curves.

II. METHODS A. Molecular dynamics simulations of the thermal unfolding of gpW 1. Simulations Setup. Starting from the native structure of gpW (PDB ID: 2L6Q) (Figure 1a), 19 allatom MD simulations were performed with Amber99sb-ildn-q force field23, implemented in a

ACS Paragon Plus Environment

5

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 6 of 35

Gromacs package24, in explicit water using a TIP3P model25. Na+,Cl- ions were added at a concentration of 0.1M. A Berendsen thermostat26 was used with temperature ranging from 300K to 500K. A Parinello-Rahman barostat27 kept the pressure constant at 1 bar. Integration was performed with a time-step of 1 fs. Coordinates were recorded every 1 ps. Trajectories. 750 ns MD simulations were performed at 300K, 310K, 320K, 330K, 340K, 350K, 360K, 370K, 380K, 410K, 420K, 450K, 500K (one simulation at each temperature); plus, six 750 ns MD trajectories were performed at 390K and 400K (three simulations at each temperature). Temperature scaling. In order to compare with experiments, the temperature range used in simulations needs to be rescaled. Indeed, force fields of proteins are in general biased to stable native structures with strong interaction potentials which do not allow reproduction of the correct thermal evolution of the macromolecule quantitatively. The same issue is encountered for the force fields of water. Therefore, after the MD runs of gpW in explicit solvent were completed, temperatures were corrected by comparing the

13 α

C chemical shifts

computed from the structures generated in our simulations, with those measured as a function of the temperature, as will be reported in details elsewhere. The linear correlation, Texp = 0.5Ttheor + 130, provides the best adjustment for the NMR jump resulting from unfolding of the protein in the experimental denaturation curves based on the chemical shifts. Therefore, temperature was rescaled from [300K, 500K] to [280K, 380K]. Similar rescaling of the MD temperature was used by others in recent studies14.

2. Virtual-bond angles and local free energy landscapes To describe the unfolding of gpW at the local scale, we chose the coarse-grained dihedral angles γi and bond angles θi.18 The γi is the dihedral angle formed by the vectors (virtual bonds) joining four successive Cα atoms (i – 1, i, i + 1, and i + 2) along the amino-

ACS Paragon Plus Environment

6

Page 7 of 35 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

acid sequence (Fig. 1b). The first dihedral angle along the sequence is γ2 and the last is γN-2 (N is the total number of residues). The θi is the angle formed by the vectors (virtual bonds) joining three successive Cα atoms (i – 1, i, i + 1) along the amino-acid sequence (Fig. 1b). The first angle along the sequence is θ2 and the last is θN-1. The gpW is a 62-residue protein; therefore, 59 pairs of

(i = 2 - 60) were defined. The pair (γ,θ)60 is the last pair that we

can analyse because the γ61 dihedral angle does not exist. By sampling the MD trajectories, we computed a two-dimensional probability distribution function,

of the coarse-grained angles. Then we calculated the free-

energy landscapes associated to these variables at each temperature T: ∆‫(ܩ‬γ,θ)i = - ܴ݈ܶ݊[P(γ,θ)i /max(P)]

(1)

where R is the gas constant (R= 8.314 kJ/mol/K). Dividing by the maximum of Eq. (1) sets the zero of free energy at the most probable

in

conformation. As usual,20-22 in

the Results section, dimensionless free-energy landscapes, ∆G(γ,θ)i / RT, are traced.

3. Defining local states, local native fraction and native contacts At low temperature, 280K, the vast majority of the conformations explored by gpW are in the native state. At a local scale we define the native values of

as the pairs of

angles whose free energy at 280K is under a certain threshold defined by mRT, where m is an integer: (2) Therefore, the native values of

correspond to the conformations which have a

probability larger than Pmax exp(-m). All the values obeying Eq. (2) define the local native state Ni of the protein. All the values that do not obey Eq. (2) define the local unfolded state Ui

ACS Paragon Plus Environment

7

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 8 of 35

of the protein. In the Results section we will compare the model for thresholds with m = 2, 3 and 4. Once native and unfolded states are defined at a local scale, we classify time-series either in a native state Ni or an unfolded state Ui along the MD trajectories, building a two-value local state time-series

. A local measure of

the degree of denaturation of the protein at temperature T is the fraction of local native states (later called local native fraction): (3) The global degree of denaturation is represented by the protein native percentage at time t: (4) The average of %N(t) over time indicates the global native character of gpW at a given temperature T. In the Results section, we will report the probability distribution of %N. The native contacts were defined as follows. First, an average distance map of Cα pairs was computed for the trajectory at 280K. Then, in this map we selected the non-diagonal and non-adjacent (i, i+1) distances and applied a 5.5 Å filter: Cα pairs below this threshold were considered as contacts of the native state. After removal of symmetric duplicates, we obtained 17 different pairs of Cα in contact at 280K. Then for every trajectory, the percentage of native contacts is defined as the percentage of these contacts established in average along the trajectory. 4. Computing mean enthalpy change upon local folding For each value of i, using the local state time-series

as a filter, each trajectory is

divided in two sets of snapshots: the native set Ni (si = 1), and the unfolded set Ui (si = 0).

ACS Paragon Plus Environment

8

Page 9 of 35 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

At any time t, we know the total enthalpy of the system (protein + water + ions). For a given value of i, we compute the average enthalpies

and

over the two sets of

snapshots. Therefore, 〈H〉Ni - 〈H〉Ui is the average enthalpy change associated with the folding of

, provided the contributions of water and ions to enthalpy are equal in both sets. This

condition should be met as long as there is enough snapshots in both

and

ensembles to

cover up the fluctuations of H(t). The

time-series produced by MD simulations randomly fluctuates around

ranging from -2247000 to -1046000 kJ/mol as temperature increases from T=280 K to T=380 K, respectively. The standard deviation is different

numbers

n

of

snapshots,

4200 kJ/mol, which is at most 0.4%. For we

compute

the

95%

confidence

interval

. Results in kJ/mol are listed in the table below:

n

∆ [kJ/mol]

10000

126

50000

42

100000

33

500000

16

The average difference will be shown in the Results section. In the unfolded ensemble, n is always greater than 10000. For the native ensemble, n > 100000. B. Projecting gpW local states onto LR helix-coil transition model 1. Adapting the LR model to the thermal unfolding of backbone angles (γ,θ) The original LR model3 was developed to evaluate the configurational partition function of a polypeptide molecule, in order to treat the helix/coil transition with the Ramachandran rotation angles (φ,ψ)i as variables. In summary, a state of the polypeptide chain hhhhcchhchcc, where h and c stand for (φ,ψ)i in helix and coil conformations,

ACS Paragon Plus Environment

9

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 10 of 35

respectively, adds to the partition function the product vwwvuuvvuvuu, where u,v,w are statistical weights of (φ,ψ)i units obeying the following rules: 1. if a unit is in the coil state, its weight is always u; 2. if a unit is at the beginning or at the end of an uninterrupted sequence of helical states, its contributes with a weight v; 3. if a unit is at the interior of a sequence of at least three helical states, it contributes with a weight w. By analogy with the original model, we represent gpW by the sequence of its backbone angles

which can be either in native

or in unfolded state

the aforementioned rules, we associate statistical weights

. According to

to local

states.

Dependence of the parameters upon position in sequence was added to account for heterogeneity induced by the various amino-acid configurations. Following the LR model, the expression of the parameters developed on all the possible values of

reads as follows:

(5)

As in the original model, and constraints, and

is the potential energy associated to the

geometry

is the interaction potential between neighboring segments of the

protein, which takes nonzero values only if all respective native states. The

factor is necessary to integrate

are in their on the sphere (as the

coarse-grained angles can be interpreted as the spherical coordinates of a unit vector). 2. Estimation of To assign values to

parameters that are consistent with the MD results, we chose to start

from the physical meaning of the parameters and find a relation between them and quantities extracted from the simulations.

ACS Paragon Plus Environment

10

Page 11 of 35 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

On the one hand, when

without being inside a native section of at least

three units, we assume no intrinsic energetic preference for the conformation of Cα-Cα virtual bonds and set

in Equation 5. The native conformation is less probable than the

unfolded one only because it spreads only over a minority of the available angle space. Therefore, the barrier between the local native and local unfolded states is purely entropic in this case. As the entropy S is given by the Boltzmann law, of microscopic states, the statistical weights states in

and

and

, where Ω is the number

are related to the respective number of

, respectively, and hence, do not vary with temperature. Now

reduce to the portion of area of the sphere corresponding to

and

and

. The areas will be

integrated numerically over the local states defined from the analysis of the free-energy landscapes ∆G(γ,θ)i. On the other hand, if

and if it is at the interior of a native section of at

least three units, we assume an energy gain Vi, which does not depend on the particular values of the angles. In the original LR model, the gain corresponded to the formation of hydrogen bonds in the α-helix. In this case, the competition between the energy gain and the entropy makes the barrier between native and unfolded local states vary with temperature. Setting V=0 and V’=Vi in Eq. 5, the Boltzmann weight factor associated to

becomes: (6)

Therefore, the only dependence of the model upon temperature lies in remaining quantity to estimate is

and the

. It is the mean energy gain when secondary structure

forms locally and stabilizes the native conformation. Concretely it should correspond to an average number of hydrogen bonds formed in the conformation. Competition between unfolding and folding comes to a critical temperature

when

the energetic stabilization is compensated by the entropic advantage of the unfolded conformation. At that point we have:

ACS Paragon Plus Environment

11

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 12 of 35

(7) When T ≤ T0i = - Vi/[Rln(ui/vi)], the propagation of the local native states becomes favorable and must lead to an increase of the local native fraction. However, the stability of native

also requires

to be high enough to compensate for the two

the edges of the native fragment couple is never stable, and the native fraction

weights of

. For this reason an isolated native

must be higher than the actual melting temperature, at which . Therefore, we will estimate

at the right end of the local

native fraction jump in the native fraction curve as a function of the temperature (the local denaturation curve). Once

are estimated for every pair of angles, the corresponding

and

can be determined by using Eq. 7. 3. Computing the partition function and subsequent thermodynamic quantities As LR noticed3, the expression of the partition function

can be written with a 4x4

matrix, which will enable more comfortable developments than the original 3x3 matrix formalism. As we introduced heterogeneity along the sequence,

can be expressed as : (8)

where:

(9)

and: (10) The 4-coordinate vector configurations

contains the statistical weights of

in the

(state at the left is 2). The 4-coordinate vector

contains the

ACS Paragon Plus Environment

12

Page 13 of 35 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

statistical weights of of

where the state at the right is

. Non-zero elements

matrices contain the statistical weights for the eight possible configurations of ,

:

, etc. with state at the middle being the j-th.

Fraction of native local state: the fraction defined previously as the average of the local state time-series simply appears in the LR model as an equilibrium constant part of

in which

.

is the

is restrained to be in native state. The restraint is achieved by

inserting a matrix that nullifies the contribution of the statistical weights corresponding to unfolded

:

(11)

To obtain

, one only has to replace

parameters by 0 in

.

Mean enthalpy difference: we want to compute the average change of enthalpy

when

shifts from a local unfolded to a local native state. The relation applies to the whole system and to any subset of states of the system. For each i two complementary subsets are defined using the model: native Ni with native unfolded

, and Ui with

. The Ni and Ui local states defined in the model are similar to the

corresponding states defined by the threshold in ∆G(γ,θ) free-energy landscapes computed from the MD trajectories. The partition functions corresponding to the Ni and Ui ensembles are respectively

computed in Eq. 11 and

enthalpy by numerical differentiation of

. We obtain the average change of :

(12)

for each pair of coarse-grained angles.

ACS Paragon Plus Environment

13

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 14 of 35

III. RESULTS AND DISCUSSION A. Native coarse-grained angle states of gpW We have calculated the free-energy landscapes of

(i = 2…60) for all MD

trajectories. Panels a, b, c of Figure 2 illustrate FELs of i = 12 and i = 26, as typical examples of α-helix and β-sheet, respectively. At low temperature, the FEL of helix structure has a small extent (less than 10 degrees) (panel a), which indicates that both γ and θ angles pertaining to the helix structure explore very small regions of the conformational space indicating a rather rigid conformation. On the contrary, the conformational space explored by the

angle pertaining to the β-sheet is about 50 degrees wide (panel b), allowing a certain

freedom of flexibility in this kind of secondary structure.

values of the twenty 2L6Q

conformations were added on the landscape for comparison. Compared to the most probable areas of the landscape (in dark tones), the experimental

(panels a and b of Figure 2)

values spread in the same manner, near to each other in the helix and spread in the sheet, as computed

values. A

offset of 30 degree for β-sheet conformation in MD compared to

the NMR-derived values can be noticed. Increasing temperature provokes spreading of the FEL explored by the protein and subsequent unfolding of gpW. Unlike their low temperature counterparts, high temperature FELs feature a uniform aspect. For this reason only

was displayed in Figure 2. At

T=355K, gpW irreversibly unfolded in a few nanoseconds.

FEL shows the

exploration of all accessible values. Although there is still a free energy minimum near the helix conformation, this is not a memory effect accounting for the local native conformation, but rather the sign that helix

values [centered at (50°,90°)] generally have a high

probability to occur, as does the case in our work for the great majority of the free-energy landscapes of gpW, including non-helix native structures. The seeming paradox - of an unfolded protein with all minima

in helix conformation – is raised considering that

ACS Paragon Plus Environment

14

Page 15 of 35 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

the most stable value is still a minority, and that the possibility of having multiple angles folded in helix conformation at the same time is by far unlikely. However, we will see later that locally, the conformation of the helix native state (and in fact all the local native states) is still explored at high temperatures, and we should not be surprised that the probability of this region of the free-energy landscape does not converge to zero. Now we define a separation between native and non-native values of presented in the Methods section, we select in the FEL the most probable

. As at T=280K

by defining a threshold equal to mRT with m=2, 3 or 4 from the free-energy minimum of ∆G(γ,θ)i. The thresholds ∆G=mRT are plotted as white contours in the FELs shown in Figure 2. The close concentric circles indicate no qualitative difference between the three possible values of m. We chose a threshold of 4RT as it encompasses almost all values of angles found in the NMR-derived structure (PDB code: 2L6Q). On the FEL at T=355K only the 4RT threshold is reported. Inside the white contour the

values are being part of the local

native state Ni. Let us now describe the evolution of the FEL as temperature increases. Below the unfolding temperature, each FEL covers specific area, which extends with the increase of the temperature, but the minimum is still centered on the local native state. In a temperature range of about 20 K near 330 K, each FEL suddenly explores other ∆G(γ,θ) minima and the secondary structures are not longer stable. Above the unfolding temperature, each FEL tend to explore all accessible

values and the FEL converges to a similar rather flat FEL all

along the amino-acid sequence. At low temperature, interactions dominate and restrain the exploration of the FEL while at high temperatures, stochastic exploration prevails over interactions and the FEL is widely explored by the protein. In the limit of purely stochastic exploration (i.e. at infinite temperature), the probability of being locally native or unfolded would be proportional to the

ACS Paragon Plus Environment

15

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

area of each state on the estimate the and

Page 16 of 35

unit sphere. So by computing these areas for T=380K we can

parameters of the model by computing Equation 5 numerically. values along the sequence are shown in Figure 2 (d). The sum

constant mainly because we could not sample all the rarest

values of the unfolded state

in the present MD time-scale of 750 ns. Except for N and C-termini, we have is the expected behavior.

is not

, which

is lower in an α-helix section than in β-sheets or loops. The

smaller is the extent of the native state, the higher is the entropic barrier to the local folding. B. Thermal unfolding of gpW at local scale: asynchrony and limited cooperativeness Applying the local native state definition to MD trajectories [Eq. (2)], we computed the local native fraction [Eq. (3)] for denaturation curves

and for

. The local

are shown in Supporting Information (Figure S1) where they are

compared to the experimental NMR-derived denaturation curves14. In Figure 3 (a), we present the global percentage of protein denaturation computed from the sum of local native fractions [Eq. (4)]. The experimental global folding curve derived from the variation of the NMR chemical shifts14 compares very well with the folding curve based on (γ,θ) extracted from the MD simulations. For comparison, the more common denaturation curve based on native contacts is also represented in Figure 3 (a). As shown in Figure 3, the folding curve based on native (γ,θ) better describes the experimental denaturation curve below the folding temperature than the curve based on native contacts. The curve of the percentage of native (γ,θ) cannot converge to zero at high temperature by definition whereas the curve of the percentage of native contacts should be. Two-dimensional probability distribution of the percentage of native contacts and the percentage of native (γ,θ) were also computed and are shown as heat maps in Figure 3 (b) to (d). These maps show a

ACS Paragon Plus Environment

16

Page 17 of 35 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

good accordance between both definitions of the global nativeness of the protein, as the majority of events is concentrated near the diagonal of the maps. In Figure 4 (a), we present the evolution of the native fraction of

computed

from MD trajectories and averaged on secondary structures as a function of T. Local unfolding happens in a 20 K interval centered on approximately 330 K. This interval is actually the unfolding temperature domain of gpW. Unlike bulk phase transitions, denaturation of gpW at a local scale goes rather progressively with temperature. Also, the high temperature values of

is always above 0.1, meaning that the native

values are

not excluded from the exploration but a minority among values of the unfolded state. From

computed from MD trajectories, we extracted

necessary to compute the

and

sequence are shown in Figure 4 (b).

parameters of the model.

values that are and

values (in violet) are steady for

along the . Near N and

C-termini of the protein, disordered extremities tend to unfold neighboring sections and to lower

. This is especially the case in helix

where section

unfolds up to 10

degrees colder than the rest. Then we computed

along the sequence. Values in black in

Figure 4 (b) approach 0 at N and C-terminal ends, reflecting low stability of these parts. Helices appear to be the most stable parts of gpW with a minimum

varying from -12.5 to -

10.5 kJ/mol. In the original LR3 and Zimm&Bragg2 models, the enthalpy difference between the folded and unfolded local states is about ∆H ≅ +4.1 kJ/mol, which is not only a different number but also has an opposite sign compared to the present study [see Figure 5 (c)]. The relation between the elementary equilibrium constant and ∆H in the LR model is , which is the same as in the present work when and

by

. According to this relation, a positive

is replaced by

would mean that 1) s increases with

ACS Paragon Plus Environment

17

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 18 of 35

T, leading to unstable cold helix formation and stable hot helix formation, and 2) that helix formation is globally endothermic. Therefore, one expects a negative ∆H as shown in Figure 5. Now since all the parameters of the model have been estimated, we compute the native fractions of

in the matrix formalism. Secondary structure average fractions are shown

in Figure 4 (a), on the same plot as local fractions computed directly from MD data [Eq. (3)]. We obtained good agreement between MD data and the model values without automated fitting. Local unfolding computed from the analytical model spans over the same temperature interval [320K, 340K]. The transition first looks sharper in the model for average helix fractions (blue and red lines) than in MD simulations. The steeper slope is, however, to be related to the lower high temperature limit of the limit of

in the model, which tends to ~0.03 while

in MD is more ~0.16. This rather high limit of the helix conformation of

is not explained by the ratio

which cannot be that large. Therefore, there must be

another interacting potential that enhances the probability of helix

values without

relying on the formation of a three-unit helix segment. At that point it is still difficult to say whether the simulation length or the model has reached its limit. The folding temperatures of the secondary structures deduced from the model denaturation curves are lower than the denaturation curves extracted from the NMR chemical shifts variations14 as shown by the vertical lines in Figure 4 (a). It is worth noting that the parameters of the model were not fitted on the experimental data but on MD trajectories. As it can be seen in Figure 4 (c), the local folding temperatures extracted from the model varied over 3K whereas the folding temperatures of the secondary structures extracted from the model vary on about 1.2K. In the NMR experiment14, the meting temperatures of the secondary structures varies on 3K and are higher than those predicted by model fitted on MD data. The folding temperatures of the secondary structures are Tα2 < Tβ < Tα1 in the model and

ACS Paragon Plus Environment

18

Page 19 of 35 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

Tβ < Tα2 < Tα1 for the folding curves derived from the variation of the NMR chemical shifts. The fact that Tβ is smaller than Tα2 and Tα1 as measured by NMR is in agreement with the observation of an intermediate partially unfolded state (named X) at 274K where the β secondary structures were unfolded whereas the helix remained stable28. In the present MD simulations, the intermediate state X represents an extremely small minority of intermediate partially unfolded states. For example, in one trajectory at 325 K, we observed 24 snapshots corresponding to the X structure over 750000 snapshots. However, the experimental temperature at which the intermediate state was observed (274K) is very far from the conditions of the MD simulations.

C. Downhill (un)folding of gpW is seen as multiple-state model near melting temperature Slowly decreasing local unfolding curves recall the question of knowing whether gpW is a two-state or downhill folder. Recent works tend to prove the latter, especially Munoz et al who computed a marginal folding free-energy barrier and more recently tried to identify interaction networks by monitoring local NMR probes for

14

. Unlike a two-

state folder, a downhill folding protein progressively undergoes the unfolding transition as it approaches the melting temperature, instead of brutally switching from native to completely unfolded state at a critical temperature.17 This evolution is allowed by a small energy barrier between native and other states near the melting point, small enough to enable a continuous unfolding driven only by the increase of temperature. Using the local states definition applied on our MD simulation data, alongside a statistical folding/unfolding model attuned to statistics of gpW, we investigate how the protein’s native structure loses its stability in the unfolding temperature interval [320K, 340K].

ACS Paragon Plus Environment

19

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

First, in Figure 5 (a), the evolution of

along the

Page 20 of 35

sequence computed from

the MD data as temperature increases is described, drawing an “unfolding profile” in some manner. In the unfolding temperature domain, native fractions are heterogeneous, spreading across a 40%-wide interval.

pairs do not quit the native conformation altogether.

Cooperativeness appears to be limited in extent to certain groups of

. As temperature

increases, the unfolding profile decreases while keeping the same pattern of heterogeneity, which suggests that the groups of cooperative angles are reproduced in the independent trajectories. Represented in Figure 5 (b) is the homologous i.e.

profile calculated using the model,

. Heterogeneity along the sequence is much less marked. As a matter of fact,

the statistical model is strongly cooperative because the stability of folding fundamentally relies on the propagation of long native chains. Here, the model contradicts the information extracted from MD simulations: the rough aspect of the MD unfolding profile may result from too short simulations, and actually be flat at equilibrium. But even in that case it would mean that at least during the unfolding process, gpW goes through heterogeneously unfolded structures, so in any case we do not have all-ornone folding. On the contrary, in the model, the all-or-none folding scenario corresponding to a two-state model is still possible since the

flat profiles are compatible with 100% native

and unfolded states equilibrated in comparable proportions. Does the statistical model yield two global states or a collection of partially native states? This question can be answered only by knowing the general native percentage of gpW in the unfolding temperature interval. Dashed lines in Figure 5 (d) show the probability distribution of

at 325 K and 330

K computed from the model. The calculation was achieved by using the combinatorial expression for the number of states having k native units in l chains (see equation (1) in the Zimm&Bragg model)2, and by approximating the

ACS Paragon Plus Environment

parameters to an average triplet

20

Page 21 of 35 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

. The distribution is almost flat, reaching its maximum at 40 and 50% and decreasing at high and low values. According to the model, near the melting temperature, the absence of a net maximum of probability means that the protein must be constantly moving from one conformation to another, allowing large

fluctuations.

Figure 5 (d) features also the distribution of the native percentage %N computed from MD data in the unfolding interval, as well as at the two extreme temperatures 280 K (%N ≈ 100%) and 380 K (%N ≈ 18%) for reference of the two extreme states.

of the unfolded

state is not 0 because locally native conformations, though unstable, remain explored significantly as shown in Figure 2. In the unfolding domain of temperatures, extreme native and unfolded global states disappear. Most probable structures have intermediate %N values. Although the length of our trajectories does not enable to describe precisely the equilibrium structures of gpW at 325 K and 330 K, it appears that gpW has multiple temperature dependent metastable conformations. We identified some structures that lasted several hundreds of ns, although again we do not assert that they are equilibrium states or even majority. The fact that at temperatures close to the melting point, gpW explores various forms neither fully native nor unfolded, might be characteristic of the downhill folding. Close to the melting temperature, the low barrier of gpW makes possible the exploration of several conformations with free energies near dynamical equilibrium. Due to the limited length of the 325 K and 330 K trajectories (2.5 µs each) we do not know the number of structures which are part of the equilibrium. Perhaps the equilibrium distribution of %N would become flat on a vast majority of intermediate %N values. But in all cases it excludes a global two-state model for gpW as the extreme states are systematically unstable. In Figure 5 (c), the enthalpy change upon local folding

is

averaged on secondary structures and plotted as a function of temperature. At low temperature

ACS Paragon Plus Environment

21

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 22 of 35

MD data have typical fluctuations of 41.2 kJ/mol, which arises again from insufficient length of simulation, preventing the water contribution to disappear in the difference. As the fluctuations exceed the confidence interval, briefly calculated in Methods section, we can hardly comment the evolution at low or high temperature. For MD points as well as model predictions, the sign of

is always negative. The

negative sign was expected because the stability of native structures at low temperature is due to a lower enthalpy. The model shows no precise agreement with the data along the sequence. Qualitative agreement is met on the evolution of

as a function of temperature:

low negative values at low and high T, higher negative values in the unfolding domain. This result comes from the fact that at low and high temperatures, when one (γ,θ) pair undergoes a transition, the overall state of gpW is on average not changing, resulting in low enthalpy difference. In the unfolding temperature domain, on the contrary, the change of the i-th state is accompanied by the change of many other states, leading so to greater enthalpy differences. Near 330 K, gpW goes through multiple states featuring both long native chains and disordered parts, as shown in the

distribution in Figure 5 (d). In such a situation, the

enthalpy of the protein fluctuates a lot more than at extreme temperatures. The average enthalpy change associated to local folding is large because it corresponds, at the same time, to the global folding. For the same reason, in the model, ∆H is also more negative in the unfolding domain.

IV. CONCLUSIONS The thermal unfolding of gpW was modelled by MD simulations at fifteen temperatures between 280 K and 380 K. The simulations were analysed and local native states were defined in terms of local free-energy landscapes,

. It is worth noting that the

MD simulations cannot reproduce the thermodynamic equilibrium of folding/unfolding of

ACS Paragon Plus Environment

22

Page 23 of 35 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

gpW on 750 ns, so

are effective free-energy landscapes computed on a limited

time-scale. The MD simulations can be compared to experimental temperature-jump unfolding experiments, as we started always from the folded structure at the initial time and change the initial temperature. As shown elsewhere, despite this limitation, the chemical shifts denaturation curves computed from the 750 ns MD data compared well with the experimental data which means that the MD sampling is enough to analyse the main features of gpW thermal unfolding. Based on

space ratio between native and unfolded state, on one side, and on the

decrease of local native fractions as temperature increases, on the other side, we parameterized a modified heterogeneous LR model for

angles of gpW. In return, the

model proved its validity by producing satisfying denaturation results at both local and global scale. Therefore, the nucleation/propagation mechanism of helix formation is relevant to a certain extent for the description of the collective unfolding of segments of gpW. The Figure 6 illustrates the flow chart, in which all the important steps involved in building a modified LR model are demonstrated. In the prism of the modified LR model, downhill folding of gpW involves a temperature domain where the protein loses its local identity: there is no more stable structure. Near the melting temperature, structures constantly fluctuate in time. They still conserve some native parts but the distribution of the native

along the protein is itself fluctuating. The

distribution of native percentage %N of the protein in this temperature domain suggests that there may be a large ensemble of accessible structures. This temperature domain is characterized by a slow decrease of the native state fractions

, instead of sharp change from

1 to 0. Also enthalpy fluctuations strongly increase near the melting temperature of gpW, revealing a dynamical equilibrium involving partially unfolded structures at different degrees.

ACS Paragon Plus Environment

23

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 24 of 35

A limit of the LR model is that it is fully cooperative in essence at the scale of the entire protein. Even with parameters extracted from the dynamics of gpW, the cooperativeness was still involving all

. On the contrary, our simulations rather tend to

highlight cooperative groups (segments of the main chain). A more realistic model aimed at studying cooperativeness would certainly imply different mechanisms of folding at the local scale than the propagation of native states, in order to identify groups of

whose folding

is linked.

Supporting Information Plots of all local fractions of native state for residues 2 to 60.

ACKNOWLEDGMENTS Calculations were performed using HPC resources from PSIUN CCUB (Centre de Calcul de l'Université de Bourgogne). This project was supported by a grant from the National Institutes of Health [R01GM14312].

ACS Paragon Plus Environment

24

Page 25 of 35 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

REFERENCES: (1) Poland, D.; Scheraga, H. A. Theory of Helix-Coil Transitions in Biopolymers. Statistical Mechanical Theory of Order-Disorder Transitions in Biological Macromolecules; Academic Press: New York, 1970. (2) Zimm, B. H.; Bragg, J. K. Theory of the Phase Transition between Helix and Random Coil in Polypeptide Chains. J. Chem. Phys. 1959, 31 (2), 526–535. (3) Lifson, S.; Roig, A. On the Theory of Helix—Coil Transition in Polypeptides. J. Chem. Phys. 1961, 34 (6), 1963–1974. (4) Miyazawa, S.; Jernigan, R. L. Residue – Residue Potentials with a Favorable Contact Pair Term and an Unfavorable High Packing Density Term, for Simulation and Threading. J. Mol. Biol. 1996, 256 (3), 623–644. (5) Bruscolini, P.; Naganathan, A. N. Quantitative Predictions of Protein Folding Behaviors from a Simple Statistical Model. J. Am. Chem. Soc. 2011, 133, 5372-5379 (6) Gopi, S.; Singh A.; Suresh S.; Paul, S.; Ranu, S.; Naganathan, A. N. Toward a Quantitative Description of Microscopic Pathway Heterogeneity in Protein Folding. Phys. Chem. Chem. Phys. 2017, 19, 2089120903. (7) Muñoz, V.; Eaton, W. A. A Simple Model for Calculating the Kinetics of Protein Folding from ThreeDimensional Structures. Proc. Natl. Acad. Sci. U. S. A. 1999, 96 (20), 11311–11316. (8) Henry, E. R.; Best, R. B.; Eaton, W. A. Comparing a Simple Theoretical Model for Protein Folding with All-Atom Molecular Dynamics Simulations. Proc. Natl. Acad. Sci. U. S. A. 2013, 110 (44), 17880– 17885. (9) Lai, J. K.; Kubelka, G. S.; Kubelka, J. Sequence, Structure, and Cooperativity in Folding of Elementary Protein Structural Motifs. Proc. Natl. Acad. Sci. U. S. A. 2015, 112 (32), 9890–9895. (10) Naganathan, A. N.; Muñoz, V. Thermodynamics of Downhill Folding: Multi-Probe Analysis of PDD, a Protein That Folds Over a Marginal Free Energy Barrier. J. Phys. Chem. B 2014, 118 (30), 8982–8994. (11) Davis, C. M.; Cooper, A. K.; Dyer, R. B. Fast Helix Formation in the B Domain of Protein A Revealed by Site-Specific Infrared Probes. Biochemistry 2015, 54 (9), 1758–1766. (12) Ahmed, Z.; Beta, I. A.; Mikhonin, A. V.; Asher, S. A. UV−Resonance Raman Thermal Unfolding Study of Trp-Cage Shows That It Is Not a Simple Two-State Miniprotein. J. Am. Chem. Soc. 2005, 127 (31), 10943–10950. (13) Sukenik, S.; Pogorelov, T. V.; Gruebele, M. Can Local Probes Go Global? A Joint Experiment– Simulation Analysis of Λ6–85 Folding. J. Phys. Chem. Lett. 2016, 7 (11), 1960–1965 (14) Sborgi, L.; Verma, A.; Piana, S.; Lindorff-Larsen, K.; Cerminara, M.; Santiveri, C. M.; Shaw, D. E.; de Alba, E.; Muñoz, V. Interaction Networks in Protein Folding via Atomic-Resolution Experiments and Long-Time-Scale Molecular Dynamics Simulations. J. Am. Chem. Soc. 2015, 137 (20), 6506–6516. (15) Sborgi, L.; Verma, A.; Muñoz, V.; Alba, E. de. Revisiting the NMR Structure of the Ultrafast Downhill Folding Protein GpW from Bacteriophage λ. PLOS ONE 2011, 6 (11), e26409. (16) Muñoz, V.; Cerminara, M. When Fast Is Better: Protein Folding Fundamentals and Mechanisms from Ultrafast Approaches. Biochem. J. 2016, 473 (17), 2545–2559. (17) Muñoz, V.; Campos, L. A.; Sadqi, M. Limited Cooperativity in Protein Folding. Curr. Opin. Struct. Biol. 2016, 36 (Supplement C), 58–66. (18) Nishikawa, K.; Momany, F. A.; Scheraga, H. A. Low-Energy Structures of Two Dipeptides and Their Relationship to Bend Conformations. Macromolecules 1974, 7 (6), 797–806. (19) Liwo, A.; Khalili, M.; Czaplewski, C.; Kalinowski, S.; Ołdziej, S.; Wachucik, K.; Scheraga, H. A. Modification and Optimization of the United-Residue (UNRES) Potential Energy Function for Canonical Simulations. I. Temperature Dependence of the Effective Energy Function and Tests of the Optimization Method with Single Training Proteins. J. Phys. Chem. B 2007, 111 (1), 260–285. (20) Senet, P.; Maisuradze, G. G.; Foulie, C.; Delarue, P.; Scheraga, H. A. How Main-Chains of Proteins Explore the Free-Energy Landscape in Native States. Proc. Natl. Acad. Sci. U. S. A. 2008, 105 (50), 19708–19713. (21) Cote, Y.; Senet, P.; Delarue, P.; Maisuradze, G. G.; Scheraga, H. A. Nonexponential Decay of Internal Rotational Correlation Functions of Native Proteins and Self-Similar Structural Fluctuations. Proc. Natl. Acad. Sci. U. S. A. 2010, 107 (46), 19844–19849. (22) Cote, Y.; Senet, P.; Delarue, P.; Maisuradze, G. G.; Scheraga, H. A. Anomalous Diffusion and Dynamical Correlation between the Side Chains and the Main Chain of Proteins in Their Native State. Proc. Natl. Acad. Sci. U. S. A. 2012, 109 (26), 10346–10351. (23) Best, R. B.; de Sancho, D.; Mittal, J. Residue-Specific α-Helix Propensities from Molecular Simulation. Biophys. J. 2012, 102 (6), 1462-1467. (24) Hess, B.; Kutzner, C.; van der Spoel, D.; Lindahl, E. J. GROMACS 4: Algorithms for Highly Efficient, Load-Balanced, and Scalable Molecular Simulation. J. Chem. Theory Comput. 2008, 4, 435-447. (25) Jorgensen, W. L.; Chandrasekhar, J.; Madura, J. D.; Impey, R. W.; Klein, M. L. Comparison of Simple Potential Functions for Simulating Liquid Water. J. Chem. Phys. 1983, 79, 926-935. (26) Berendsen, H. J. C.; Postma, J. P. M.; van Gunsteren, W. F.; DiNola, A.; Haak, J. R. MolecularDynamics with Coupling to an External Bath. J. Chem. Phys. 1984, 81 (8), 3684–3690. (27) Parrinello, M.; Rahman, A. Polymorphic Transitions in Single Crystals: A New Molecular Dynamics Method. J. Appl. Phys. 1981, 52, 7182-7190.

ACS Paragon Plus Environment

25

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 26 of 35

(28) Sanchez-Medina, C.; Sekhar, A.; Vallurupalli, P.; Cerminara, M.; Munoz, V. ; Kay, L. E. Probing the Free Energy Landscape of the Fast-Folding gpW Protein by Relaxation Dispersion NMR. J. Am. Chem. Soc. 2014, 136, 7444-7451

ACS Paragon Plus Environment

26

Page 27 of 35 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

Figure captions: Figure 1: Panel (a): All-atom NMR-derived structure of gpW (PDB code: 2L6Q, first model), with secondary structure represented in new cartoon design. Secondary structures: α1 {4-19}, β1 {23-28}, β2 {31-36}, 3/10 helix {37-39} and α2 {40-54}. Panel (b): Representation of virtualbond angles on a 4-residue polypeptide chain. Virtual bonds are represented in solid red and orange lines. Figure 2: Panels (a) to (c): Free-energy landscapes

in color levels. Color levels are

shown on the right and are expressed in RT units. The green dots are the twenty extracted from the NMR-derived structures (2L6Q). Panel (a):

values

at T=280K featuring

local -helix conformation. Three concentric white circle show the native states defined by m=2, 3 ,4 RT thresholds. Panel (b): Panel (c):

at T=280K featuring local

-sheet conformation.

at T=380K showing exploration of nearly all accessible values of the

angles. The white circle encompasses the native state defined by a 4RT threshold. Panel (d): Red and blue:

and

parameters of the LR model along the sequence computed by numerical

integration of

over

and

respectively. Black: sum of the two

parameters along the amino-acid sequence. The sum is not perfectly constant which means that the exploration at T=380K does not cover exactly the same extended area for all

.

Secondary structures are represented on the X axis. Figure 3: Panel (a): global denaturation curve of gpW calculated in three ways: percentage of (γ,θ) in native state (purple solid line), percentage of native contacts (cyan solid line) and percentage of native chemical shifts from NMR14 (black circles). Panels (b), (c) and (d): probability distribution (X,Y) (heat maps) at three different temperatures, where X is the percentage of (γ, θ) in native state and Y is the percentage of native contacts found in the trajectory. (b) T=285K, (c) T=330K, (d) T=380K. Figure 4: Panel (a): Global denaturation curves of secondary structures, obtained by averaging over α-helix 1 (

), α-helix 2 (

) and the β -sheets ( ). Squares represent native

fractions derived from native state statistics of MD simulations. Solid lines represent the same

ACS Paragon Plus Environment

27

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

quantities computed from the LR model after the

Page 28 of 35

parameters have been defined. Solid vertical

lines represent the melting temperature of secondary structures derived from the denaturation curves of the model. Dashed vertical lines are the melting temperature of secondary structures, from experimental NMR measurements.14 Panel (b): Purple squares are temperatures along the amino-acid sequence estimated from local are deduced local folding enthalpy

local inversion

curves. Black squares

parameters of the model along the sequence. Secondary

structures are represented on the X axis. Red crosses represent the local melting temperature Tmi computed from the model denaturation curve of each pair (γ,θ). Panel (c): in red, the histogram of the local melting temperatures computed from the model. Solid and dashed vertical lines are the melting temperatures of secondary structures as in panel (a). Figure 5: Panel (a): plot of local fNi along the amino-acid sequence for T=280K (purple), 315K (gold), 325K (turquoise), 335K (red) and 380K (blue). Lines+squares represent fractions computed from MD simulations. Secondary structures are represented on the X axis. Panel (b): same as (a), solid lines represent local native fractions calculated in the LR model. Panel (c): plot of the enthalpy change upon folding of

as a function of temperature,

averaged on secondary structures -helix 1 (

, red),

helix 2 (

, blue) and the -sheets ( ,

green). Linepoints show the quantities computed from MD simulations. Solid lines show the results of Lifson-Roig model calculation. Panel (d): probability distribution of words the probability of having

native

, in other

angles at T=280K (red), 325K (green), 335K

(blue) and 380K (magenta). Solid lines are percentages computed from MD simulations. The dashed curves are the probabilities at 325K (green) and 330K (blue) computed in the model. The insert shows the free energy profiles of

in MD simulations.

Figure 5: The flow chart representing the building of the Lifson-Roig inspired model.

ACS Paragon Plus Environment

28

Page 29 of 35 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

Figure 1.

ACS Paragon Plus Environment

29

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 30 of 35

Figure 2.

ACS Paragon Plus Environment

30

Page 31 of 35 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

Figure 3.

ACS Paragon Plus Environment

31

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 32 of 35

Figure 4.

ACS Paragon Plus Environment

32

Page 33 of 35 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

Figure 5.

ACS Paragon Plus Environment

33

The Journal of Physical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 34 of 35

Figure 6.

ACS Paragon Plus Environment

34

Page 35 of 35 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

Table of Contents

ACS Paragon Plus Environment

35