All-atom simulations reveal protein charge decoration in the folded and unfolded ensemble is key in thermophilic adaptation Lucas Sawle, Jonathan Huihui, and Kingshuk Ghosh J. Chem. Theory Comput., Just Accepted Manuscript • DOI: 10.1021/acs.jctc.7b00545 • Publication Date (Web): 15 Sep 2017 Downloaded from http://pubs.acs.org on September 17, 2017


Journal of Chemical Theory and Computation is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


Journal of Chemical Theory and Computation

All-atom simulations reveal protein charge decoration in the folded and unfolded ensemble is key in thermophilic adaptation Lucas Sawle, Jonathan Huihui, and Kingshuk Ghosh∗ Department of Physics and Astronomy, University of Denver E-mail: [email protected]

∗To whom correspondence should be addressed

Abstract Thermophilic proteins denature at much higher temperatures than their mesophilic homologues, in spite of high structural and sequence similarity. Computational approaches to understanding this puzzle face three major challenges: i) unfolded ensembles are usually neglected, ii) simulation studies of the folded states are often too short, and iii) the majority of investigations focus on a few protein pairs, obscuring the prevalence of different strategies across multiple protein systems. We address these concerns by carrying out all-atom simulations to characterize physicochemical properties of both the folded and disordered ensembles in multiple (12) thermophilic-mesophilic homologous protein pairs. We notice two clear trends in most pairs (10 out of 12). First, the specific distribution of charges in the native basin, sampled from multi-microsecond-long Molecular Dynamics (MD) simulation trajectories, leads to more favorable electrostatic interaction energy in thermophiles compared to mesophiles. Next, thermophilic proteins have lowered (more favorable) electrostatic interaction energy in their unfolded state, generated using Monte Carlo (MC) simulation, compared to their mesophilic counterparts. The net contribution of interaction energy to folding stability, however, remains more favorable in thermophiles than in mesophiles. The overall contribution of electrostatics, quantified by combining the net interaction energy with the solvation penalty of folding (due to differential charge burial in the folded and unfolded ensembles), is also mostly favorable in thermophilic proteins compared to mesophiles. The systems that deviate from this trend provide interesting test cases for learning about alternate design strategies when modification of charges is not viable for functional reasons. The unequal contribution of the unfolded state to stability in thermophiles and mesophiles highlights the importance of modeling the disordered ensemble to understand thermophilic adaptation, as well as protein stability in general. Our integrated approach, combining finite-difference electrostatic calculations with MC and MD, can be useful in designing charge mutations to alter protein stability.

1

Introduction

Thermophilic proteins, extracted from organisms that live at high temperatures, denature at much higher temperatures than their mesophilic counterparts, which are found in organisms that live at or near room temperature. Homologous pairs of thermophilic and mesophilic proteins show a high degree of structural and sequence similarity, and yet they differ significantly in their thermal response. This has puzzled the community and raised the longstanding question: 1 how do thermophilic proteins withstand such high temperatures? Thermodynamic measurements 2 provide unambiguous validation that thermophilic proteins have higher melting temperatures than mesophiles. Temperature-dependent stability curves can be further analyzed to determine the changes in enthalpy, entropy, and specific heat upon folding, and to highlight their specific roles in enhancing thermostability. 2,3 Although insightful, these studies by themselves do not reveal the molecular underpinnings. To overcome this shortcoming, experiments have been carried out to systematically study several designed
mutants or sets of proteins to test different molecular hypotheses. 4–10 However, these studies are costly and time-consuming, and are consequently limited to only a few protein pairs and hard to generalize at a global scale. On the other hand, bioinformatics-based approaches 11 provide a comparative analysis of thermophilic and mesophilic protein sequences at a large scale. These studies are less expensive and focus on compositional or other statistical properties of the sequence, giving insights into evolutionary tendencies, but they often lack a first-principles physicochemical basis. Two recent works 12,13 have shown the importance of biophysical modeling in highlighting genomic and proteomic trends. First, a Monte Carlo study by Berezovsky 12 explained the increased occurrence of Lysine, but not Arginine, observed in the thermophilic proteome. More recent work, 13 based on a theoretical polymer physics model applied to a set of 540 pairs of thermophilic and mesophilic homologous sequences, showed that thermophilic proteins, on average, have a more compact denatured state ensemble than their mesophilic counterparts, consistent with protein-specific case studies 14–16 and a global analysis of thermodynamic data. 2,17

In parallel to thermodynamic and sequence-based analyses, structure-based studies have emerged from the expectation that, although the static structures of thermophilic and mesophilic proteins are very similar, the collective motions encoded in these structures must be different. This hypothesis originates from two observations: i) thermophilic enzymatic activity is optimal near the natural environment of the organism at high temperatures, and these enzymes are relatively inactive at low temperatures; 18 ii) function is directly related to structural dynamics. 19–23 This has led to the long-standing view that thermophilic protein structures are less flexible than those of mesophiles at room temperature. 24 Consistent with this view, NMR relaxation measurements have shown that slow lid opening is responsible for the lower turnover in thermophilic Adenylate Kinase, 25 and hydrogen exchange experiments on thermophiles indicate slow exchange rates. 26 The low protease susceptibility and slower unfolding rates of thermophiles also appear to support this view. 27–29 However, subsequent experimental investigations of the conformational flexibility of thermophilic and mesophilic homologs have shown otherwise. Hydrogen exchange experiments show sufficient conformational flexibility at room temperature in thermophilic rubredoxin 30,31 and RNaseH. 32 While experimental studies of reduced flexibility in thermophiles remain inconclusive, computational studies have also been indecisive. 24,33–41 Rigidity-theory-based analyses have shown the role of rigidity in the thermostability of rubredoxin and of different lipase mutants. 24,35 However, as clearly articulated by Karshikoff, 36 rigidity is often associated with a frozen structure and neglects structural fluctuations. Livesay and Jacobs have advanced ensemble-based sampling to include these fluctuations and explored stability-flexibility relations within a quantitative framework. 42,43 Molecular dynamics (MD) simulations provide another approach to modeling such fluctuations and dynamics, although at a much faster time scale than the protein domain dynamics probed by hydrogen exchange. MD simulations of protein native states at room temperature have pointed out that thermophiles are not necessarily associated with suppressed fluctuations. 38–41 Furthermore, high-temperature molecular dynamics studies can give further insights by following protein unfolding. 41,44 High-temperature studies have also shown that fluctuations in the protein structure have a much weaker temperature dependence in thermophiles than in mesophiles. 45,46 Quantifying structural dynamics can also provide information about the electrostatic free energy 47 and the internal dielectric constant. Brooks and colleagues have shown that thermophilic proteins reduce the desolvation penalty by achieving a higher dielectric constant. 48 These lines of investigation demonstrate the relative differences in structural dynamics between homologous pairs of thermophilic and mesophilic proteins, and the importance of simulation studies for gaining insight into these issues. 49

From the brief review above, three primary deficiencies of present approaches are apparent. First, the lack of consensus and the conflicting results highlight the need for a global approach: the majority of studies focus on particular protein pairs, making it difficult to gain insights into the universality of proposed mechanisms. In other words, to what extent a
mechanism conclusively proven to be responsible in one pair is also operative in another pair? Are there competing mechanisms, and if so, is there a primary mechanism? While the paucity of systems studied is a general challenge for both experimental and simulation approaches, computational efforts face their own challenges. Most computational approaches neglect the role of the unfolded state, in spite of its widely accepted importance in thermophilic adaptation. 2,13 This is primarily due to difficulties in modeling the unfolded state ensemble. As a result, simulations primarily focus on the folded state. However, with the exception of a few studies, 41,50,51 simulated trajectories are relatively short, with very little attention paid to the quality of sampling, leaving the reported observables potentially unreliable or even artifacts of inadequate sampling. Recent work has shown that ensuring convergence of molecular dynamics simulations, even for protein native states, can be challenging. 52 We address these concerns with an integrated approach that uses i) microsecond-long or longer (as needed for convergence) Molecular Dynamics (MD) simulations to explore the native state ensemble, ii) Monte Carlo (MC) simulations to generate the unfolded/disordered ensemble, and iii) a finite-difference scheme applied to both MD and MC frames to compute electrostatic properties of the folded and unfolded ensembles separately. Furthermore, this integrated approach was carried out at a global level by analyzing 12 homologous protein pairs (each consisting of a mesophilic and a corresponding thermophilic protein) to glean insights about different strategies of thermophilic adaptation and their relative usage in different protein systems, which is not possible with previous studies of a few selected pairs.
In the majority of pairs studied, thermophilic proteins have better-connected attractive electrostatic networks in their native states and are associated with lowered electrostatic interaction energies, in both the folded and unfolded ensembles, compared to their mesophilic counterparts. The destabilizing effect of the unfolded state does not outweigh, but reduces, the stabilizing effect of the folded state interaction. Alternate strategies of thermophilic adaptation were revealed by a few special protein pairs in which modifying electrostatics does not appear to be a viable strategy for functional reasons. Structural dynamics in the folded state is a consequence of these
different strategies, and consequently thermophilic proteins may have either enhanced or suppressed fluctuation profiles compared to mesophiles, contrary to the usual belief. Short simulations of the folded state (even several hundred nanoseconds) can yield qualitatively different results and trends compared to long, well-converged simulations, while results from simply analyzing PDB structures can also be misleading. These subtleties of folded state dynamics, together with the finding that the disordered ensembles of thermophiles and mesophiles contribute differently to their respective stabilities, highlight the importance of carefully investigating both the folded and unfolded ensembles to understand the origin of thermophilic adaptation, and protein stability in general.

2

Methods

2.1

Selection of protein pairs

For this comparative study, 12 homologous single-domain protein pairs from mesophilic and thermophilic organisms were selected. The following systems (with the abbreviations used throughout this manuscript) were chosen: Acylphosphatase (ACP), Chemotaxis Proteins W and Y (CheW and CheY, respectively), Cold Shock Protein (CSP), Glycine cleavage system H-protein (HGCS), Histidine-containing phosphocarrier protein (HPr), N utilization substance protein B (NusB), N-acetyltransferase (PaiA), Ribonuclease H and P (RNaseH and RNaseP, respectively), Anti-sigma factor antagonist (Anti-σ), and Thioredoxin. Details on the selection criteria are given in the supplemental methods section of the supporting information, and more information on the selected pairs is given in Table 1 of the supporting information.

2.2

Native state simulation protocol and convergence criteria

The protein pairs listed above were simulated for an initial minimum of 1 µs, and the resulting trajectories were tested for convergence. If convergence was not satisfied, the simulations were extended by an additional 500 ns and convergence was retested. This cycle of simulation
and convergence testing was repeated until both systems of a given mesophilic-thermophilic pair passed our criteria with identical simulation times. Convergence was assessed by monitoring the discovery of new clusters and the stability of the cluster distribution entropy. 52 Details of the simulation protocol and convergence criteria are given in the supplemental methods section of the supporting information.
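The extend-and-retest protocol and the entropy-based convergence check can be sketched as below. This is a hypothetical illustration, not the authors' implementation: it assumes every frame has already been assigned a cluster label (the clustering method and thresholds are specified in the supporting information and ref 52, not here), and the window size, tolerance, and `run_pair` callback are placeholders chosen for the example.

```python
import math
from collections import Counter


def cluster_entropy(labels):
    """Shannon entropy, S = -sum_i p_i ln p_i, of the cluster populations."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def is_converged(labels, window, tol):
    """Hypothetical criterion: no new cluster appears in the last `window`
    frames, and the entropy changes by less than `tol` over that window."""
    if len(labels) <= window:
        return False
    head = labels[:-window]
    no_new_clusters = set(labels[-window:]) <= set(head)
    drift = abs(cluster_entropy(labels) - cluster_entropy(head))
    return no_new_clusters and drift < tol


def extend_until_converged(run_pair, converged, start_ns=1000, step_ns=500, max_ns=10000):
    """Run both members of a pair for the same total time, extending by
    step_ns until both trajectories pass `converged`.

    run_pair(t_ns) -> (thermo_labels, meso_labels) is a hypothetical callback
    that clusters each trajectory after t_ns of simulation.
    """
    t = start_ns
    while t <= max_ns:
        thermo, meso = run_pair(t)
        if converged(thermo) and converged(meso):
            return t
        t += step_ns
    raise RuntimeError("pair did not converge within max_ns")


def fake_run(t_ns):
    # Stand-in for MD plus clustering: one frame per 10 ns; the thermophile
    # keeps discovering a new cluster until 2 us of simulation.
    base = ["A", "B"] * (t_ns // 20)
    return (base + ["new"] * 5 if t_ns < 2000 else base), base


print(extend_until_converged(fake_run, lambda labs: is_converged(labs, window=10, tol=0.05)))
```

With this toy callback the loop stops at 2000 ns, the first total time at which both members of the pair pass the check. Requiring identical simulation times for the two homologues, as in the text, is what the shared `t` enforces.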

2.3

Unfolded ensemble simulation protocol

The disordered state ensemble was approximated using all-atom Monte Carlo simulations performed with the CAMPARI package 53,54 for a total of 135 million post-equilibration steps. To minimize computation time and maximize sampling, the number of independent trajectories was chosen according to sequence length. Full details can be found in the supplemental methods section of the supporting information. As for the folded state, the cluster distribution entropy was tested to further ensure convergence.

3

Results and Discussion

3.1

Attractive ionic networks are better connected in thermophilic native states compared to their mesophilic counterparts

Several studies have indicated that thermophilic protein native states have more ion pairs than their mesophilic counterparts. 55 However, these studies are based primarily on static structures from the PDB and ignore structural dynamics. Contacts identified from PDB structures may be only transient: they may disappear, while new contacts emerge, over the course of a simulation trajectory. Furthermore, the total number of ion pairs gives no insight into the connectivity among them. The topology of the ion network can be important, because simply increasing the total number of attractive ion pairs does not necessarily enhance stability, due to the competing destabilizing effect
arising from the desolvation penalty of charged groups. 56 Since desolvation penalties are expected to increase with the total number of charged side chains, an efficient design strategy is to increase the number of attractive ionic interactions while keeping the total number of charges relatively unaltered. This can be achieved by sharing charged residues among ionic interactions, 56,57 leading to better-connected networks amongst charged groups. Network topology analyses of non-covalent connections have also emphasized the importance of hubs in thermophilic proteins, 58 in protein allostery and stabilization, 59 and in catalytic site prediction, 60 while network analysis has highlighted the role of the largest rigid cluster in thermophilic proteins. 24 To explore the role of connectivity quantitatively, a simple but novel metric ζ was defined as the ratio of the number of ion pair interactions (edges, each between two opposite charges) to the total number of charged residues (vertices). Higher values of ζ imply more ion pairs per charged residue, and hence better connectivity. The minimum value of ζ is zero, when no attractive ion pairs (edges) are formed, and the theoretical maximum, reached when every positive charge is paired with every negative charge, is $\zeta_{\max} = Q_+ Q_- /(Q_+ + Q_-)$, where $Q_+$ and $Q_-$ are the total numbers of positive and negative charges, respectively. As a simple reference, consider $Q_+ = Q_-$. In this case, when each charge forms one and only one ion pair (edge) with a charge of opposite sign, ζ = 0.5, while ζ = 1 indicates that each charge on average forms two edges with charges of opposite sign (see Figure 1 in the supporting information for an illustrative example with $Q_+ = Q_- = 3$). Furthermore, the dynamic nature of ion-pair formation was quantified by a probability p, determined by monitoring the fraction of simulation frames in which the distance d between representative side-chain atoms of two oppositely charged residues was within a defined critical distance $d_c$, i.e. $d \le d_c$ (see the supplemental methods section in the supporting information). A contact satisfying this criterion defines an edge. Quantifying contact probabilities in this manner has proven useful in other studies of protein folding mechanisms. 61 For a given p, all contacts (edges) between oppositely charged groups with a probability of at least p were considered. When p approaches unity, only the most stable and persistent contacts contributed, and in
the opposite limit of p = 0, any and all ion pair interactions, including the most short-lived, were expected to contribute. The resilience and connectedness of these networks are quantified by computing ζ as a function of p. For the majority of pairs (9 of 12), thermophilic proteins formed better-connected and more stable networks among oppositely charged amino acids than their mesophilic counterparts (see Figure 1). RNaseP, HGCS, and Thioredoxin are the only three pairs that do not show a significant difference between thermophile and mesophile. Additionally, a coarse-grained metric is provided by the average of ζ computed over the entire simulation trajectory (solid lines in Figure 1). By this metric, thermophilic proteins have better connectivity than their mesophilic counterparts in all protein pairs except RNaseP and Thioredoxin. Because PDB structures are static, PDB-based analysis can yield neither these probability-dependent patterns nor the trajectory averages (solid lines in Figure 1). Short simulations, on the other hand, may suffer from sampling issues that yield unreliable values of p and of the averages, highlighting the importance of careful MD simulations subjected to strict convergence tests, as presented here. Our findings are consistent with mutagenesis experiments that have shown that enhanced thermostability can be achieved by forming large networks of ionic interactions, 57 and that salt bridges can form at the cost of low activity at low temperatures. 62 Better-connected ion-pair networks may impart greater resilience to thermophilic proteins at high temperature. 63 A possible consequence of greater resilience is an increased unfolding barrier, and therefore a slower unfolding rate, which can give rise to higher stability if the folding rate remains unaltered; slower unfolding has been observed experimentally in multiple thermophilic proteins. 27–29
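The connectivity metric can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes per-frame distances between the representative side-chain atoms are already extracted into dictionaries keyed by (positive, negative) residue index (a hypothetical layout), it counts a pair as an edge at cut-off p_c when its contact probability is nonzero and at least p_c (our reading of the p = 0 limit, "any and all ion pair interactions"), and the toy distances and d_c are made up.

```python
from itertools import product


def contact_probabilities(frames, pos_idx, neg_idx, d_c):
    """Fraction of frames in which each (+, -) pair satisfies d <= d_c.

    `frames` is a list of dicts mapping (i, j) -> distance between the
    representative atoms of residues i (positive) and j (negative).
    """
    n = len(frames)
    return {
        (i, j): sum(1 for f in frames if f[(i, j)] <= d_c) / n
        for i, j in product(pos_idx, neg_idx)
    }


def zeta(probs, n_pos, n_neg, p_c):
    """Edges per vertex at cut-off probability p_c (vertices = all charges)."""
    edges = sum(1 for p in probs.values() if p > 0 and p >= p_c)
    return edges / (n_pos + n_neg)


def zeta_max(n_pos, n_neg):
    """Upper bound: every positive charge pairs with every negative one."""
    return n_pos * n_neg / (n_pos + n_neg)


def zeta_trajectory_average(frames, pos_idx, neg_idx, d_c):
    """Average over frames of the per-frame edges-per-vertex value."""
    n_vertices = len(pos_idx) + len(neg_idx)
    per_frame = [
        sum(1 for i, j in product(pos_idx, neg_idx) if f[(i, j)] <= d_c) / n_vertices
        for f in frames
    ]
    return sum(per_frame) / len(per_frame)


# Toy system: positive residues 0, 1; negative residues 2, 3; four frames.
frames = [
    {(0, 2): 3.0, (0, 3): 9.0, (1, 2): 9.0, (1, 3): 3.5},
    {(0, 2): 3.2, (0, 3): 9.0, (1, 2): 4.0, (1, 3): 3.9},
    {(0, 2): 3.1, (0, 3): 9.0, (1, 2): 9.0, (1, 3): 8.0},
    {(0, 2): 3.3, (0, 3): 9.0, (1, 2): 9.0, (1, 3): 3.8},
]
probs = contact_probabilities(frames, [0, 1], [2, 3], d_c=4.0)
for p_c in (0.0, 0.5, 1.0):
    print(p_c, zeta(probs, 2, 2, p_c))
print("max:", zeta_max(2, 2), "average:", zeta_trajectory_average(frames, [0, 1], [2, 3], 4.0))
```

In the toy example ζ falls from 0.75 at p_c = 0 to 0.25 at p_c = 1, as only the persistent (0, 2) contact survives. Because the per-frame value is linear in the edge count, the trajectory-averaged ζ equals the sum of all contact probabilities divided by the number of charged residues; both give 0.5 here.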

[Figure 1, panels A-L: edges per vertex, ζ (y axis), versus cut-off probability, p_c (x axis)]

Figure 1: Edges per vertex, ζ (y axis), as a function of cut-off probability, p_c (x axis), for 12 homologous protein pairs, each having a thermophile (red) and a mesophile (blue). (A) ACP, (B) CheW, (C) CheY, (D) CSP, (E) HGCS, (F) HPr, (G) NusB, (H) PaiA, (I) RNaseH, (J) RNaseP, (K) Anti-σ, and (L) Thioredoxin. Horizontal lines (red for thermophiles and blue for mesophiles) provide a reference and denote the average ζ computed over the entire trajectory.

3.2

Thermophilic proteins have more favorable electrostatic interaction energies in their native state

The results above indicate that attractive interactions between oppositely charged side chains are more favorable in thermophilic native states than in their mesophilic counterparts. But what is the contribution of the repulsive interactions between like-charged amino acids? Moreover, because electrostatics is long-ranged, the connectivity metric ζ alone does not give a precise measure of the attractive electrostatic interaction either. To quantify this effect further and provide a more comprehensive picture, the total electrostatic interaction energy
was computed from all charged residues in the folded state of the protein. Because of the complicated, non-spherical geometry of the folded state, DelPhi, 64 a finite-difference Poisson-Boltzmann solver, was used to compute the interaction energy. The electrostatic interaction energy in the folded state was defined as

$$G^{f}_{\mathrm{int}} = \frac{1}{2}\sum_{k} q_k \phi_k \qquad (1)$$

where $q_k$ is the $k$th test charge under consideration and $\phi_k$ is the potential at the location of the $k$th charge resulting from all other charges in the system. The factor of 1/2 prevents double counting, since each pair of charges appears twice in the sum. Note that the self-energy terms, related to the solvation penalty (discussed below), are not included in this calculation. For details of the calculation, see the supplemental methods section in the supporting information. Similar calculations were presented by Zhou 56 using spherical geometry, and by Brooks and colleagues for non-spherical shapes. 48 A protein-specific dielectric constant ($\epsilon_p$) for the protein interior was calculated from the fluctuation of the protein dipole moment accumulated over the entire simulation trajectory 48 (see supporting information for details). The electrostatic interaction energy was then computed with this protein-specific dielectric constant $\epsilon_p$ on structures taken from the molecular dynamics trajectories, thereby including native state dynamics (see supporting information for details of this calculation). Because $\epsilon_p$ itself reflects structural fluctuations, combining it with explicit MD frames could double count the dynamics. To check for this, we compared $G^{f}_{\mathrm{int}}(\epsilon = 2)$ with $G^{f}_{\mathrm{int}}(\epsilon = \epsilon_p)$. The difference between them is greater than the standard deviations of $G^{f}_{\mathrm{int}}(\epsilon_p)$ and $G^{f}_{\mathrm{int}}(\epsilon = 2)$ computed over the entire simulation trajectory (see Table 2 in the supporting information for the numbers and for details on the calculation of the standard deviations). Based on these small standard deviations, we conclude that the risk of double counting is minimal. This justifies our protocol of using the simulated dielectric constant in the calculation of the electrostatic interaction energy over simulated frames.
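Equation (1) and its factor of 1/2 can be illustrated with a toy calculation. DelPhi solves the Poisson-Boltzmann equation on a grid for the actual molecular shape; the sketch below instead assumes Coulomb's law in a single uniform dielectric, with made-up charges and coordinates. That simplification is only for illustration, but it is enough to show that one half of the sum of each charge times the potential from all *other* charges equals the sum over unique pairs, and that no self-energy term enters.

```python
import math

COULOMB_KCAL = 332.0637          # e^2/(4*pi*eps0) in kcal*angstrom/mol
KT_KCAL = 0.0019872041 * 298.0   # k_B T at 298 K in kcal/mol (about 0.592)


def potentials(charges, coords, eps):
    """phi_k (in kT/e) at each charge from all *other* charges.

    Assumes a uniform dielectric `eps` and plain Coulomb's law (not the
    Poisson-Boltzmann solution DelPhi computes). The j == k self term is
    skipped, so no self energy is included.
    """
    phi = []
    for k, r_k in enumerate(coords):
        s = sum(q_j / math.dist(r_k, r_j)
                for j, (q_j, r_j) in enumerate(zip(charges, coords)) if j != k)
        phi.append(COULOMB_KCAL * s / (eps * KT_KCAL))
    return phi


def interaction_energy(charges, coords, eps):
    """G_int = 1/2 * sum_k q_k * phi_k (eq. 1), in units of kT."""
    phi = potentials(charges, coords, eps)
    return 0.5 * sum(q * p for q, p in zip(charges, phi))


def pairwise_energy(charges, coords, eps):
    """The same quantity written as a sum over unique pairs i < j, in kT."""
    total = 0.0
    n = len(charges)
    for i in range(n):
        for j in range(i + 1, n):
            total += charges[i] * charges[j] / math.dist(coords[i], coords[j])
    return COULOMB_KCAL * total / (eps * KT_KCAL)


q = [1.0, -1.0, 1.0]                                        # toy charges (e)
xyz = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (4.0, 3.0, 0.0)]   # toy positions (angstrom)
print(interaction_energy(q, xyz, eps=4.0), pairwise_energy(q, xyz, eps=4.0))
```

Both printed values agree; dropping the factor 1/2 would count every pair twice. In this uniform-dielectric toy the energy scales as 1/ε, which is why the choice of interior dielectric (ε = 2 versus ε_p) matters in the comparison described above.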
The difference in folded state interaction energies between thermophile and mesophile (second column in Table 1) was defined as $\Delta G^{f}_{\mathrm{int}} = G^{f,\mathrm{thermo}}_{\mathrm{int}} - G^{f,\mathrm{meso}}_{\mathrm{int}}$, where $G^{f,\mathrm{thermo}}_{\mathrm{int}}$ is the electrostatic interaction energy of the folded state of the thermophilic protein, and $G^{f,\mathrm{meso}}_{\mathrm{int}}$
is that of the mesophilic protein. Electrostatic interaction energies in the native state were more favorable in thermophilic proteins than in their mesophilic homologues in 10 of 12 pairs, the only exceptions being RNaseP and Thioredoxin. The trend remains almost unaltered even after considering the errors, with the only exception of ACP. The errors were estimated by computing the standard deviation in $G^{f}_{\mathrm{int}}$ for thermophilic and mesophilic proteins separately over the entire simulation trajectory (see Table 3 in the supporting information for the error estimates and details of the calculation).

Table 1: Electrostatic energies calculated from structural ensembles taken from simulation trajectories.

System        ΔG^f_int(ε_p)  ΔG^u_int(ε_p)  ε_p^thermo  ε_p^meso  ΔΔG_int  ΔΔG_solv  ΔΔG_elec(ε_p)
ACP                -0.8           -3.0          16.9       20.7       2.2       5.2          7.4
CheW              -24.6          -16.4          58.1       38.7      -8.2      -8.1        -16.3
CheY              -17.4           -3.2          27.5       24.3     -14.2      -5.8        -20.0
CSP                -8.0           -2.0          40.6       23.8      -6.0      -0.5         -6.5
HGCS              -28.6          -17.1          18.1       24.9     -11.5      13.9          2.4
HPr                -6.6           -4.9          21.1       17.6      -1.7     -10.2        -11.9
NusB               -9.7           -5.6          46.0       27.9      -4.1     -10.2        -14.3
PaiA               -7.8           -5.3          46.7       25.1      -2.5     -12.4        -14.9
RNase H           -12.9            4.8          14.8       14.3     -17.7       0.8        -16.9
RNase P            23.6           19.1          46.6       37.5       4.5      -4.1          0.4
Anti-σ             -6.9           -1.8          38.4       36.5      -5.1       2.3         -2.8
Thioredoxin         7.9           -1.8          21.6       20.9       9.7      -2.9          6.8

The difference in electrostatic interaction energy between thermophilic and mesophilic homologues, $\Delta G_{\mathrm{int}} = G^{\mathrm{thermo}}_{\mathrm{int}} - G^{\mathrm{meso}}_{\mathrm{int}}$, in their folded state (column 2) and unfolded state (column 3) at the specified internal dielectric constants. Protein interior dielectric constants ($\epsilon_p$, columns 4 and 5) were calculated from protein dipole moment fluctuations in the MD simulations using equation 4 in the supporting information. All values of $\Delta G^{f}_{\mathrm{int}}$, $\Delta G^{u}_{\mathrm{int}}$, $\Delta\Delta G_{\mathrm{int}}$, $\Delta\Delta G_{\mathrm{solv}}$ and $\Delta\Delta G_{\mathrm{elec}}$ are in units of kT (T = 298 K).
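The ε_p values in Table 1 come from dipole-moment fluctuations through equation 4 of the supporting information, which is not reproduced here. For orientation only, the sketch below implements the textbook fluctuation formula for a sample with conducting ("tin-foil") boundaries, ε = 1 + (⟨M²⟩ − ⟨M⟩²)/(3ε₀Vk_BT). The authors' equation may differ, for example in the boundary condition or in how the protein volume V is defined, so numbers from this sketch should not be compared directly with Table 1; the dipole trajectory and volume in the example are made up.

```python
EPS0 = 8.8541878128e-12   # vacuum permittivity (F/m)
KB = 1.380649e-23         # Boltzmann constant (J/K)
DEBYE = 3.33564e-30       # 1 debye in C*m
ANG3 = 1e-30              # 1 cubic angstrom in m^3


def dielectric_from_dipoles(dipoles_debye, volume_ang3, temperature=298.0):
    """eps = 1 + (<M^2> - <M>^2) / (3 eps0 V k_B T), conducting boundaries.

    dipoles_debye: per-frame total dipole vectors (Mx, My, Mz) in debye.
    volume_ang3:   volume assigned to the protein, in cubic angstrom
                   (how to define it is left as an assumption).
    """
    n = len(dipoles_debye)
    mean = [sum(m[a] for m in dipoles_debye) / n for a in range(3)]
    mean_sq = sum(m[0] ** 2 + m[1] ** 2 + m[2] ** 2 for m in dipoles_debye) / n
    fluct = (mean_sq - sum(c * c for c in mean)) * DEBYE ** 2   # C^2 m^2
    return 1.0 + fluct / (3.0 * EPS0 * volume_ang3 * ANG3 * KB * temperature)


# Made-up trajectory: the dipole flips between +60 D and -60 D along x.
traj = [(60.0, 0.0, 0.0), (-60.0, 0.0, 0.0)] * 10
print(dielectric_from_dipoles(traj, volume_ang3=20000.0))
```

Only the fluctuation of M enters, so a rigid protein with a large but constant dipole gives ε = 1, while larger dipole fluctuations (more conformational flexibility of charged and polar groups) raise ε.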


3.3

Thermophilic proteins have more favorable electrostatic interaction energies in their disordered state

The majority of previous theoretical studies have neglected the role of the unfolded state because of the challenges in characterizing the disordered ensemble. However, based on experimental 15,16 and a few quantitative studies, 2,13 unfolded states are expected to retain residual structure in thermophilic proteins. We expect this residual structure to alter the enthalpy of the disordered ensemble of thermophilic proteins. This can be a destabilizing effect that competes with the stabilizing effect of electrostatics in the folded state described above. We quantify this effect by generating the disordered ensemble with Monte Carlo simulations using the ABSINTH implicit solvation model within the CAMPARI simulation package. 53,65 Pappu and colleagues have applied this approach to several problems in protein science, confirming its broad applicability to modeling disordered ensembles of proteins. 54,66–69 (See the supplemental methods in the supporting information for details of the simulations.) Following the same scheme used for the folded state contribution to electrostatics, we computed the unfolded state interaction energy (third column in Table 1). We define $\Delta G^{u}_{\mathrm{int}} = G^{u,\mathrm{thermo}}_{\mathrm{int}} - G^{u,\mathrm{meso}}_{\mathrm{int}}$, where $G^{u,\mathrm{thermo}}_{\mathrm{int}}$ and $G^{u,\mathrm{meso}}_{\mathrm{int}}$ are the disordered state (modeled as the unfolded state) interaction energies of the thermophile and mesophile, respectively. Thermophilic proteins in general, except for RNaseP and RNaseH, have more favorable electrostatic interaction energies in their disordered ensemble than their mesophilic counterparts. This difference highlights the importance of modeling the disordered ensemble to better understand the origin of high thermal stability in thermophiles.
Considering the errors (reported in Table 3 in the supporting information), thermophiles have a more favorable unfolded-state interaction energy than their mesophilic counterparts for five protein pairs (CheW, CheY, HGCS, HPr and NusB), while the reverse trend is observed for RNaseP. For the six other pairs (ACP, CSP, PaiA, RNaseH, Anti-σ and Thioredoxin) the results are inconclusive, because the difference of the means is smaller than or comparable to the standard deviations.
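The comparison above labels a pair inconclusive when the difference of the means is smaller than or comparable to the standard deviations. A minimal Python sketch of such a rule is shown below. The function name `classify_difference`, the factor `k`, and the choice to combine the two standard deviations in quadrature are assumptions made for illustration; they are not necessarily the exact criterion behind Table 3 of the supporting information.

```python
import math


def classify_difference(mean_thermo, sd_thermo, mean_meso, sd_meso, k=1.0):
    """Classify Delta G = G_thermo - G_meso for one thermophile-mesophile pair.

    Assumed rule (illustrative only): the comparison is "inconclusive" when
    |difference of means| <= k * combined standard deviation, where the two
    standard deviations are combined in quadrature. A negative difference
    means the thermophile has the more favorable (lower) energy.
    """
    diff = mean_thermo - mean_meso
    combined_sd = math.hypot(sd_thermo, sd_meso)
    if abs(diff) <= k * combined_sd:
        return diff, "inconclusive"
    if diff < 0:
        return diff, "thermophile more favorable"
    return diff, "mesophile more favorable"


# Hypothetical numbers in arbitrary but consistent energy units, not values from Table 1.
print(classify_difference(-50.0, 3.0, -40.0, 4.0))  # difference of -10 vs combined SD of 5
print(classify_difference(-42.0, 3.0, -40.0, 4.0))  # difference of -2, within the spread
```

Setting `k` above 1 makes the rule more conservative; the choice of `k` is exactly the judgment that the phrase "smaller than or comparable to" leaves open.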

3.4

Thermophilic proteins have more favorable net electrostatic interaction energy

The analysis above shows that electrostatic interactions are more favorable in thermophiles in both the folded and the unfolded state. Because these two effects compete (a more favorable unfolded state opposes folding), we now compute the net contribution of the electrostatic interaction to the overall folding stability by subtracting the interaction-energy difference of the unfolded ensemble from that of the folded ensemble, $\Delta\Delta G_{int} = \Delta G^{f}_{int} - \Delta G^{u}_{int}$ (see column 6 in Table 1). We conclude that the destabilizing effect of the lowered electrostatic energy of the disordered ensemble of thermophilic proteins does not outweigh the stabilizing effect of electrostatics in the folded state. However, it does reduce the overall stabilizing contribution of electrostatics to folding stability. The only exception is ACP, which will be discussed later. Upon considering the error estimates, we find that for six protein pairs (CheW, CheY, CSP, HGCS, RNaseH, Anti-σ) the overall effect of the electrostatic interaction remains more favorable for thermophiles than for mesophiles, while the opposite holds for Thioredoxin, and for the five other pairs (ACP, HPr, NusB, PaiA, RNaseP) the results are inconclusive (see columns 6 and 7 of Table 3 in the supporting information for the error estimates).
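As a sketch of how $\Delta\Delta G_{int}$ follows from the four simulated ensembles (folded and disordered, thermophile and mesophile), the Python function below averages per-frame interaction energies and forms the two differences. The function names and the quadrature error estimate, which treats the four ensembles as independent, are illustrative assumptions rather than the authors' code.

```python
import numpy as np


def mean_and_sd(per_frame_energies):
    """Ensemble average and sample standard deviation of per-frame energies."""
    x = np.asarray(per_frame_energies, dtype=float)
    return x.mean(), x.std(ddof=1)


def delta_delta_g_int(gf_thermo, gf_meso, gu_thermo, gu_meso):
    """Return (dGf_int, dGu_int, ddG_int, sd_ddG) from four per-frame energy series.

    dGf_int = <Gf_thermo> - <Gf_meso>   (folded ensembles)
    dGu_int = <Gu_thermo> - <Gu_meso>   (disordered ensembles)
    ddG_int = dGf_int - dGu_int
    The error estimate adds the four standard deviations in quadrature, which
    assumes the ensembles are independent (an illustrative assumption).
    """
    (mf_t, sf_t), (mf_m, sf_m) = mean_and_sd(gf_thermo), mean_and_sd(gf_meso)
    (mu_t, su_t), (mu_m, su_m) = mean_and_sd(gu_thermo), mean_and_sd(gu_meso)
    dgf = mf_t - mf_m
    dgu = mu_t - mu_m
    sd = float(np.sqrt(sf_t**2 + sf_m**2 + su_t**2 + su_m**2))
    return float(dgf), float(dgu), float(dgf - dgu), sd


# Hypothetical per-frame energies (two frames per ensemble), not simulation data.
print(delta_delta_g_int([-21, -19], [-11, -9], [-9, -7], [-5, -3]))
```

With these hypothetical inputs the thermophile is favored in both states ($\Delta G^{f}_{int} = -10$, $\Delta G^{u}_{int} = -4$), so the favorable disordered-state term reduces, but does not cancel, the folded-state advantage ($\Delta\Delta G_{int} = -6$), which is the situation described above for most pairs.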

3.5

Role of overall electrostatics including the solvation penalty

It is now clear that thermophilic proteins in general have more favorable electrostatic interactions than their mesophilic counterparts. However, an enhancement in favorable electrostatic interactions is often accompanied by the competing effect of an increased desolvation penalty. 48,56,70 The desolvation penalty, $\Delta G_{solv} = G^{f}_{solv} - G^{u}_{solv}$, is the difference between the solvation energies of the charged groups in the folded ($G^{f}_{solv}$) and the unfolded ($G^{u}_{solv}$) states. In general, charged groups are better solvated in the unfolded state than in the folded state, giving rise to positive $\Delta G_{solv}$ values. The desolvation penalty is expected to increase with the number of charges in the system; it is therefore of particular concern when favorable electrostatic interactions are achieved by increasing the total number of charges. Brooks and colleagues demonstrated that thermophilic proteins may lower the desolvation penalty by increasing the interior dielectric constant of the protein. 48 Elcock 70 noted that although the desolvation penalty of thermophiles can be higher than that of mesophiles at room temperature, it is significantly lower at high temperature, where the dielectric mismatch between the protein interior and the solvent is reduced. To quantify the net effect of electrostatics, we further computed the solvation energies of the folded and disordered states from the simulated ensembles (see the supporting information for details of the calculation). We compute $\Delta G_{elec}$ by combining the electrostatic interaction energy and the desolvation penalty:

$$\Delta G_{elec} = G^{f}_{int} - G^{u}_{int} + G^{f}_{solv} - G^{u}_{solv} \qquad (2)$$

We further define $\Delta\Delta G_{elec} = \Delta G^{thermo}_{elec} - \Delta G^{meso}_{elec}$, where the superscripts thermo and meso denote the thermophile and the mesophile, respectively. Comparing thermophilic and mesophilic pairs shows that $\Delta G^{thermo}_{elec} < \Delta G^{meso}_{elec}$ for the majority of the protein pairs (see column 8 of Table 1). We conclude that, in general, the beneficial effect of promoting favorable electrostatic interactions in thermophiles is not overcompensated by the possible destabilizing effect of solvation. The only exception is HGCS. RNaseP and Thioredoxin are two other systems where the overall effect of electrostatics is less favorable in thermophiles than in mesophiles (discussed later), primarily due to destabilizing interaction terms. Interestingly, HGCS, RNaseP and Thioredoxin are also the only three protein systems in which the network of oppositely charged residues did not show a marked difference between thermophile and mesophile, unlike the other pairs (Figure 1). For ACP, both the interaction and the solvation terms are less favorable in the thermophile. Our finding on the prominent role of electrostatics is consistent with several experimental studies demonstrating the importance of charged residues and improved electrostatic interactions in protein stability. 3–9,36,57,71–75 Makhatadze and colleagues have particularly pointed out the role of electrostatic interactions from surface charged residues. 5,8,9,76,77 Furthermore, in agreement with the work of Brooks and colleagues, 48 we find that 10 out of 12 protein pairs have a higher dielectric constant in thermophiles than in mesophiles. This trend can be rationalized by the fact that a higher dielectric constant leads to a lower solvation penalty of folding; a penalty lower by at least kT is observed in 7 of these 10 protein pairs (CheW, CheY, HPr, NusB, PaiA, RNaseP and Thioredoxin; see columns 4, 5 and 7 in Table 1). However, CSP and Anti-σ are exceptions, with solvation penalties that are comparable (CSP) or less favorable (Anti-σ) in thermophiles than in mesophiles, even though the thermophilic proteins have a higher dielectric constant than their mesophilic counterparts. The dielectric constants of RNaseH and Thioredoxin are comparable between thermophile and mesophile. For ACP and HGCS, the mesophilic protein has a higher dielectric constant than the thermophilic protein. Despite qualitative agreement with some of the rules of thumb from earlier studies, our work also identifies counterexamples by providing quantitative estimates of electrostatic energy and, more importantly, by separately computing the individual contributions of different factors, such as folded- and disordered-state interactions and the solvation penalty (incorporating the simulated ensemble of the unfolded state). These insights are novel, and the methodology presented here can be used to explore the contribution of specific amino acids to stability and to rationally design mutants with varying stabilities.
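Equation (2) and the definition of $\Delta\Delta G_{elec}$ amount to simple bookkeeping over ensemble-averaged energies. The Python sketch below spells this out; the dictionary layout and argument names are hypothetical, and the inputs stand in for averages over the MD (folded) and CAMPARI (disordered) ensembles.

```python
def delta_g_elec(g_int_f, g_int_u, g_solv_f, g_solv_u):
    """Equation (2): Delta G_elec = G^f_int - G^u_int + G^f_solv - G^u_solv."""
    interaction = g_int_f - g_int_u      # favorable to folding when negative
    desolvation = g_solv_f - g_solv_u    # usually positive: the desolvation penalty
    return interaction + desolvation


def delta_delta_g_elec(thermo, meso):
    """Delta Delta G_elec = Delta G^thermo_elec - Delta G^meso_elec.

    `thermo` and `meso` are dicts keyed by the argument names of delta_g_elec
    (a hypothetical layout holding ensemble-averaged energies).
    """
    return delta_g_elec(**thermo) - delta_g_elec(**meso)


# Hypothetical ensemble averages, not values from Table 1.
thermo = dict(g_int_f=-60.0, g_int_u=-20.0, g_solv_f=-100.0, g_solv_u=-130.0)
meso = dict(g_int_f=-45.0, g_int_u=-15.0, g_solv_f=-95.0, g_solv_u=-120.0)
print(delta_g_elec(**thermo), delta_g_elec(**meso), delta_delta_g_elec(thermo, meso))
```

In this toy case the thermophile pays a larger desolvation penalty (+30 versus +25) but gains more from interactions, so $\Delta\Delta G_{elec} = -5$ is negative, meaning electrostatics overall favors the thermophile.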
While the electrostatic contribution to the energetics at room temperature (300 K) is revealing, high temperature may be important for properly understanding enhanced thermostability, as pointed out by Elcock. 70 Therefore, for ACP, RNaseP, HGCS and Thioredoxin, which have positive $\Delta\Delta G_{elec}$, the combined effect of solvation and electrostatic interaction may become favorable at the elevated temperatures of the thermophile's natural environment. Finally, it is instructive to note that the errors in $\Delta\Delta G_{elec}$ are similar to those in $\Delta\Delta G_{int}$, owing to the negligible standard deviation of the solvation energy across simulation frames. Considering these errors, thermophilic proteins have an overall more favorable contribution from electrostatics than their mesophilic homologues for seven protein pairs (CheW, CheY, CSP, HPr, NusB, PaiA, RNaseH), the reverse is observed for Thioredoxin, and for the four remaining pairs (ACP, HGCS, RNaseP and Anti-σ) the differences are within errors.

3.5.1

Results generated using MD ensembles differ significantly from those based on static PDB structures

Furthermore, we note that the resulting calculations differ from calculations using only static PDB structures 47 (Table 4 in the supporting information). Significant quantitative discrepancies are apparent when comparing the interaction (columns 2 and 3), solvation (columns 4 and 5) and overall electrostatic (columns 6 and 7) energies between calculations based on MD trajectories and those based on static PDB structures. Interestingly, for some pairs even a qualitative difference is observed in the interaction energy (PaiA and RNaseH) and in the overall electrostatics (RNaseH). These findings reiterate the importance of incorporating the coupling between charge distribution and native-state dynamics in such calculations.

3.5.2

Results from the CAMPARI-generated disordered ensemble differ from an interaction-free model of the unfolded state

We further highlight the importance of explicitly simulating the disordered ensemble by contrasting our results with a simple model in which the unfolded state is assumed to be expanded and free of interactions. Under this simplifying assumption residual interactions are ignored, and consequently $\Delta G^{u}_{int} = 0$. Furthermore, the solvation energy of this highly expanded disordered state is calculated with a pentamer model, 73 in which each amino acid is flanked by its two nearest sequential neighbors on either side and the charges on these neighbors are turned off. Using the protein sequence, the pentamers were constructed in TLEAP 78 and energy-minimized with restraints on the backbone atoms, strictly to remove bad side-chain steric clashes. The electrostatic solvation energy of each constructed pentamer was calculated with DelPhi, 64 and the sum of these solvation energies represents the unfolded-state solvation energy. 73 The differences in solvation between the pentamer model ($\Delta\Delta G'_{solv}$) and the CAMPARI-generated ensemble model ($\Delta\Delta G_{solv}$) are clear (see columns 2 and 3 in Table 5 in the supporting information). This is primarily due to the lack of charge burial in the pentamer model. The difference in overall electrostatics (columns 4 and 5) is even more drastic. These differences arise from the combined effect of the inaccurate estimation of solvation in the pentamer model and the neglect of interactions in the disordered state. There is, however, one caveat to the unfolded-state model presented here. The simulated ensemble of the unfolded state represents the infinitely dilute limit, likely to occur under in-vitro conditions, and ignores the crowding and weak interactions expected inside cells. 79,80 Therefore, our results strictly apply to explaining the enhanced thermal tolerance of thermophilic proteins as observed in vitro. An intriguing question arises: are the driving forces behind the enhanced stability of thermophilic proteins in vitro the same as under the in-vivo conditions in which the organisms actually evolved? What are the differences in cellular conditions, for example salt and macromolecular concentrations, between thermophiles and mesophiles? Although limited studies have shown that in-vivo and in-vitro stabilities are roughly similar, 81–83 detailed studies under cellular conditions, similar to the work of McGuffee and Elcock, 79 are needed to answer these questions and to quantitatively re-evaluate the different contributions to stability in vivo.
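The pentamer bookkeeping described above can be sketched in Python as follows. Only the windowing and summation are shown; `solvation_energy` is a hypothetical callable standing in for the TLEAP construction, restrained minimization, and DelPhi calculation, and the truncated windows at the chain termini are an assumption.

```python
from typing import Callable, List, Tuple


def pentamer_windows(seq: str, flank: int = 2) -> List[Tuple[str, int]]:
    """For each residue i, return (fragment, position of residue i in the fragment).

    With flank=2 the fragment is residue i plus two neighbors on each side.
    Near the termini the fragment is simply truncated (an assumption; the
    original protocol may treat the chain ends differently).
    """
    windows = []
    for i in range(len(seq)):
        start = max(0, i - flank)
        stop = min(len(seq), i + flank + 1)
        windows.append((seq[start:stop], i - start))
    return windows


def unfolded_solvation(seq: str, solvation_energy: Callable[[str, int], float]) -> float:
    """Sum the per-fragment solvation energies over all residues.

    `solvation_energy(fragment, center)` is a hypothetical placeholder for the
    real pipeline: build the fragment, minimize it with backbone restraints,
    switch off every charge except the central residue's, and run a
    Poisson-Boltzmann solver such as DelPhi.
    """
    return sum(solvation_energy(frag, center) for frag, center in pentamer_windows(seq))


# Toy scorer for illustration only: -1 for each charged central residue.
toy = lambda frag, center: -1.0 if frag[center] in "DEKR" else 0.0
print(pentamer_windows("ACDEF"))
print(unfolded_solvation("ADEKG", toy))
```

Because each fragment carries only its central charge, no pair of charges can be buried together; this is the missing charge burial that distinguishes the pentamer estimate from the simulated ensemble.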

3.6

Insights gleaned from the sequence comparison between homologous proteins

Are there any general guidelines from sequences that can provide an intuitive understanding of our findings above? The detailed study of multiple pairs provides an opportunity to explore the relationship between biophysical observables and properties of the sequence.
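Two of the sequence properties compared in this section, the net charge and the sequence charge decoration (SCD), can be computed directly from a sequence. The Python sketch below assumes a simple charge model (Lys and Arg +1, Asp and Glu −1, everything else 0) and the SCD form of Ref. 13, $SCD = \frac{1}{N}\sum_{m=2}^{N}\sum_{n=1}^{m-1} q_m q_n (m-n)^{1/2}$; equation 6 in the supporting information remains the authoritative definition.

```python
import math

# Assumed charge model: Lys/Arg = +1, Asp/Glu = -1, all other residues 0.
# Histidine, the termini and pKa shifts are ignored (a simplification).
RESIDUE_CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def charges(seq):
    """Per-residue charges q_1..q_N under the assumed charge model."""
    return [RESIDUE_CHARGE.get(aa, 0) for aa in seq.upper()]


def net_charge(seq):
    """Signed net charge Q_net; the text compares |Q_net| = abs(net_charge(seq))."""
    return sum(charges(seq))


def scd(seq):
    """Sequence charge decoration, assuming the form given in Ref. 13:

        SCD = (1/N) * sum_{m=2..N} sum_{n=1..m-1} q_m * q_n * (m - n) ** 0.5
    """
    q = charges(seq)
    total = 0.0
    for m in range(1, len(q)):
        if q[m] == 0:
            continue  # uncharged residues contribute nothing
        for n in range(m):
            total += q[m] * q[n] * math.sqrt(m - n)
    return total / len(q)


# Same composition, different patterning (toy sequences).
for s in ("EKEKEK", "EEEKKK"):
    print(s, net_charge(s), round(scd(s), 3))
```

Both toy sequences have zero net charge, yet the segregated `EEEKKK` receives the lower SCD, which in this framework corresponds to a more compact disordered state than the alternating `EKEKEK`; net charge alone cannot tell them apart.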

3.6.1

Charge composition and sequence charge patterning play an important role

Defining $|Q^{thermo}_{net}|$ and $|Q^{meso}_{net}|$ as the magnitudes of the net charge of the thermophile and the mesophile, respectively, we find that $|Q^{thermo}_{net}|$ is less than or equal to $|Q^{meso}_{net}|$ for the protein pairs CheW, CheY, CSP, HGCS, HPr, NusB, PaiA, Anti-σ and Thioredoxin (see Table 6 in the supporting information). Interestingly, with the exception of Thioredoxin, all other protein pairs in this list have a more favorable electrostatic interaction energy in thermophiles than in mesophiles (column 6 in Table 1). On the other hand, for the ACP and RNaseP protein pairs, the thermophilic proteins have more net charge and a higher electrostatic interaction energy (positive $\Delta\Delta G_{int}$ in column 6 of Table 1) than their mesophilic counterparts. These results can be understood intuitively with the simple picture that too many unbalanced charges in the compact folded state can be destabilizing, a picture supported by models based on protein net charge only. 84–86 However, this simple mean-field argument based on protein net charge cannot explain our results for RNaseH and Thioredoxin, where the proteins with higher net charge have more favorable electrostatic interactions. An obvious strategy to achieve this would be to selectively place like charges far from each other. 36 Furthermore, for NusB and CSP, $|Q^{thermo}_{net}| = |Q^{meso}_{net}|$, yet both of these protein pairs have substantially more favorable electrostatic interaction energies in the thermophile than in the mesophile. These examples highlight the important role of preferential charge placement 5,87 beyond net charge composition, and the need for models that explicitly account for charge decoration and its coupling with the folded and unfolded ensembles, as described here. Quantitative models of pH-dependent stability have also shown the importance of considering the detailed charge distribution in the 3D structure, beyond just the net charge. 88
While the discussion above shows the importance of charge decoration in 3D space, recent work has also pointed out the explicit role of charge patterning in the protein sequence, beyond composition, in influencing protein and peptide conformations. 13,54 Techniques from theoretical polymer physics 13,89–93 have been used to introduce a novel sequence charge decoration (SCD) metric 13 that can discriminate between two sequences with the same charge composition but different patterning (see equation 6 in the supporting information for the definition of SCD). For two homologous proteins, considering only the difference in charge content and patterning and assuming everything else is equal, a lower value of SCD implies a more compact denatured state. 13,86 Using this metric on a dataset of 540 homologous pairs, it has been shown that SCD is, on average, lower in thermophiles than in mesophiles. This finding indicated that thermophiles, in general, have a more compact denatured state than mesophiles and pointed out the subtle role of charge patterning in thermophilic adaptation. 13,86 Consistent with this finding, we find that $SCD^{thermo}$ is lower than or comparable to $SCD^{meso}$ for the majority of the sequences (10 out of the 12 pairs explored in this work; see Table 6 in the supporting information). The only exceptions are RNaseH and RNaseP, for which $SCD^{thermo}$ is significantly higher than $SCD^{meso}$. Both of these protein pairs also show a significant charge imbalance in the thermophile. We further find that, for a given protein pair, a lower value of SCD is associated with a lower interaction energy in the unfolded state (compare column 3 in Table 1 of the main text with Table 6 in the supporting information). The only exceptions are HPr and ACP, where the thermophilic protein has a lower unfolded-state interaction energy than the mesophile despite having a comparable SCD. This also shows the difference between the coarse-grained SCD metric (based on protein charge only) and the all-atom simulations used to generate the disordered ensemble with all interactions present.

3.6.2

Alternate strategies other than electrostatics for thermophilic adaptation

It is illuminating to further investigate the sequences of ACP, HGCS, RNaseP and Thioredoxin, where the combined effect of solvation and electrostatic interaction energy is comparable (RNaseP) or less favorable (ACP, HGCS and Thioredoxin) in the thermophilic protein than in the mesophile. Note also that, considering the error estimates, the effect of overall electrostatics is inconclusive for ACP, HGCS and RNaseP. So what could lead to higher stability in thermophiles for these proteins? Interestingly, ACP, HGCS and RNaseP all show a significant enhancement in the fraction of hydrophobic residues in their thermophilic sequences. An increase in hydrophobicity can significantly enhance stability; for example, an increase in the hydrophobic composition fraction from 0.38 to 0.45 is sufficient to increase the melting temperature by almost 40 °C. 94 This presents the intriguing possibility that wherever altering the charge distribution is not an option for obtaining higher stability, proteins may rely on the alternate mechanism of significantly enhancing the hydrophobic fraction. Proteins with high charge and specific charge patterning have been associated with specific functions. 95,96 These functional restrictions may limit the use of electrostatics as a strategy to enhance stability in these proteins, a direction that needs further study. Finally, it is important to discuss Thioredoxin, which does not seem to follow the electrostatics strategy either. Thermophilic and mesophilic Thioredoxin show no marked difference in either charge content or degree of hydrophobicity. However, the thermophilic Thioredoxin sequence shows a significant proline enrichment compared to its mesophilic counterpart. Separate experiments have shown the role of proline substitutions in increasing the melting temperature. 97 Proline enrichment can also impart greater resilience towards unfolding, resulting in a slower unfolding rate, as noticed by the Makhatadze group. 29 Their work further reports that when the pH is lowered, the melting temperature of the thermophile decreases much less than that of the mesophile. This is consistent with our finding that electrostatics does not play a significant role in stabilizing thermophilic Thioredoxin.
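The composition comparisons used in this subsection (hydrophobic fraction and proline enrichment) reduce to residue counting. A minimal Python sketch follows; the hydrophobic set `HYDROPHOBIC` is an assumption, since the residue classification behind the comparison is not stated here, and the function names are hypothetical.

```python
# Assumed hydrophobic set for illustration; the residue classification used
# in the comparison above is not specified in this section.
HYDROPHOBIC = frozenset("AVILMFWC")


def fraction(seq, residues):
    """Fraction of positions in `seq` occupied by any residue in `residues`."""
    seq = seq.upper()
    return sum(aa in residues for aa in seq) / len(seq)


def composition_shift(thermo_seq, meso_seq):
    """Thermophile-minus-mesophile change in hydrophobic and proline fractions."""
    return {
        "hydrophobic": fraction(thermo_seq, HYDROPHOBIC) - fraction(meso_seq, HYDROPHOBIC),
        "proline": fraction(thermo_seq, {"P"}) - fraction(meso_seq, {"P"}),
    }


# Toy sequences for illustration, not real homologues.
print(composition_shift("AAPP", "AKKK"))
```

A positive hydrophobic shift would flag the alternate strategy discussed for ACP, HGCS and RNaseP, and a positive proline shift the one discussed for Thioredoxin; whether such a shift is significant still requires comparison against the variation across homologues.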

3.7

Thermophilic proteins do not necessarily have suppressed fluctuations in the native state

Thermophilic proteins are often believed to be more rigid than their mesophilic counterparts. While previous simulation studies 38–41 have already shown the lack of such a trend, further exceptions to this rule are provided here for several protein pairs not studied before. To provide a single metric per protein, we compute the average fluctuation from the mean structure (RMSF) as a function of time (details of the calculations are in the supplemental methods section of the supporting information). Figure 2 compares the average fluctuation (⟨RMSF⟩) of the thermophilic and mesophilic proteins for each homologous pair. These results show that thermophiles do not necessarily have suppressed fluctuations compared to mesophiles. This may seem to contradict the observation that thermophilic enzymes have lower activity at room temperature. However, low activity may be due to a high activation barrier of the chemical step, and not necessarily to structural rigidity. 41,98 It is also important to note that thermophilic proteins may have greater resilience at high temperature due to better-connected ion pairs, which is not synonymous with greater rigidity. 63 Finally, it should be noted that the time scales of the reported fluctuations are limited by the simulation time; our findings therefore do not capture large-scale motions or their relative differences between mesophilic and thermophilic proteins.
The time evolution of ⟨RMSF⟩ for each protein pair (Figure 2) makes another important point. For multiple pairs, comparing the fluctuations of thermophile and mesophile from short trajectories can lead to drastically different conclusions than long-time-scale simulations, as is evident from the multiple crossings of the thermophile and mesophile curves in these systems. The importance of long simulations and of sampling issues has been pointed out in the flexibility study of Trigger factor 99 as well. This once again highlights the importance of long simulations for drawing meaningful conclusions beyond sampling artifacts.

Figure 2: Cumulative ⟨RMSF⟩ (see the supporting information for details of the calculation) as a function of simulation time for 12 thermophile (red) and mesophile (blue) pairs: A) ACP, B) CheW, C) CheY, D) CSP, E) HGCS, F) HPr, G) NusB, H) PaiA, I) RNaseH, J) RNaseP, K) Anti-σ and L) Thioredoxin. The crossing of the curves shows the instability of the calculation at short time scales.
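The ⟨RMSF⟩ metric and its cumulative time evolution (Figure 2) can be sketched with NumPy as below. The sketch assumes the frames are already superposed on a common reference and that the chosen atom subset (for example Cα atoms) has been extracted; it recomputes the mean structure for every prefix of the trajectory, which is one plausible reading of a cumulative ⟨RMSF⟩ rather than the authors' exact protocol. The function names are hypothetical.

```python
import numpy as np


def mean_rmsf(coords):
    """<RMSF>: per-atom RMSF about the mean structure, averaged over atoms.

    coords has shape (n_frames, n_atoms, 3) and is assumed to be already
    superposed on a common reference (no alignment is done here).
    """
    x = np.asarray(coords, dtype=float)
    mean_structure = x.mean(axis=0)                   # (n_atoms, 3)
    sq_dev = ((x - mean_structure) ** 2).sum(axis=2)  # (n_frames, n_atoms)
    per_atom_rmsf = np.sqrt(sq_dev.mean(axis=0))      # (n_atoms,)
    return float(per_atom_rmsf.mean())


def cumulative_mean_rmsf(coords, stride=1):
    """<RMSF> recomputed from frames [0, t) for t = stride, 2*stride, ..."""
    x = np.asarray(coords, dtype=float)
    return np.array([mean_rmsf(x[:t]) for t in range(stride, len(x) + 1, stride)])


# Synthetic coordinates for illustration only, not MD data.
rng = np.random.default_rng(0)
traj = rng.normal(scale=0.5, size=(1000, 50, 3))
print(cumulative_mean_rmsf(traj, stride=100))
```

Plotting such curves for a thermophile and its mesophilic homologue on the same axes reproduces the kind of comparison in Figure 2, where an ordering read off at short times can reverse once the full trajectory is included.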

Conclusion

In summary, we present a detailed physicochemical analysis of 12 thermophilic-mesophilic protein pairs, from which we draw several key conclusions. First, the electrostatic interaction energy is more favorable in the native state of thermophilic proteins than in mesophiles, consistent with experimental studies. 4–9,76,77 Second, thermophilic proteins in general have more favorable attractive interactions in their disordered ensemble than their mesophilic counterparts, an effect not quantified before. This emphasizes the differential role of the unfolded state in the stability of thermophilic and mesophilic proteins and the need for further investigation. However, the net effect of the electrostatic interaction energy is, in general, more favorable in thermophiles than in mesophiles. Third, for the majority of systems, the overall effect of electrostatics on folding stability remains more favorable in thermophiles than in mesophiles even after considering the desolvation penalty of folding. Fourth, the four protein systems (ACP, HGCS, RNaseP and Thioredoxin) in which overall electrostatics does not seem to be more favorable to the folding stability of thermophiles compared to mesophiles highlight alternate mechanisms for thermophilic adaptation when modifying charges may not be possible, for example for functional reasons. Fifth, within the simulation time scale, the widely held view that thermophilic proteins have more suppressed motions than their mesophilic counterparts is not supported. Sixth, long simulation trajectories are necessary to avoid sampling artifacts, as is evident from the behavior observed at early simulation times (short time scales), which can conflict with the results of long simulations. This reiterates recent work showing that the convergence of native-state simulations can be challenging and may require simulations several microseconds long. 52 The importance of carefully converged MD simulations for modeling folded-state dynamics is further highlighted by the observation that electrostatic contributions computed solely from PDB structures can give incorrect qualitative and quantitative predictions. Finally, our integrated method, which includes both the folded and unfolded ensembles along with a finite-difference method to systematically compute the different contributions to electrostatics, will be useful for mutational studies and protein design. Although 12 protein pairs is still a limited set due to computational expense, this is a first step towards a global analysis revealing the abundance of different strategies, and it highlights the need for such detailed quantitative analysis at an even greater scale.

Acknowledgement

We acknowledge support from NSF (award number 1149992), RCSA (as a Cottrell Scholar) and University of Denver (for PROF grant). We also acknowledge the High Performance Computing (HPC) facility at the University of Denver for computing support.

Supporting Information Available

The supplemental methods section of the supporting information contains details about the protein pairs selected, the MD and MC protocols, clustering of MD and MC frames, convergence criteria, and the calculation of the dielectric constant, electrostatic energies (both interaction and solvation), RMSF and the Sequence Charge Decoration metric (SCD). The supporting information contains tables with the PDB codes of the proteins studied along with their source organisms. In addition, it contains multiple tables reporting the different contributions to electrostatic energies (averages, standard deviations and errors, as appropriate) referred to in the main text. A table containing the net charge and calculated SCD of each protein is also given. The supporting information also includes a figure introducing the concept of the edge vs vertex metric for a toy protein under two illustrative cases.

This material is available free of charge via the Internet at http://pubs.acs.org/.

References

(1) Kumar, S.; Nussinov, R. How do thermophilic proteins deal with heat? Cell. Mol. Life Sci. 2001, 58, 1216–1233.
(2) Sawle, L.; Ghosh, K. How Do Thermophilic Proteins and Proteomes Withstand High Temperature? Biophys. J. 2011, 101, 217–227.
(3) Kumar, S.; Tsai, C.; Nussinov, R. Thermodynamic Difference among Homologous Thermophilic and Mesophilic Proteins. Biochemistry 2001, 40, 14152–14165.
(4) Grimsley, G.; Shaw, K.; Fee, L.; Alston, R.; Huyghues-Despointes, B. M.; Thurlkill, R.; Scholtz, J.; Pace, C. Increasing protein stability by altering long-range coulombic interactions. Protein Sci. 1999, 8, 1843–1849.
(5) Loladze, V.; Ibarra-Molero, B.; Sanchez-Ruiz, J.; Makhatadze, G. Engineering a thermostable protein via optimization of charge-charge interactions on the protein surface. Biochemistry 1999, 38, 16419–16423.
(6) Perl, D.; Mueller, U.; Heinemann, U.; Schmid, F. Two exposed amino acid residues confer thermostability on a cold shock protein. Nat. Struct. Mol. Biol. 2000, 7, 380–383.

(7) Spector, S.; Wang, M.; Carp, S.; Robblee, J.; Hendsch, Z.; Fairman, R.; Tidor, B.; Raleigh, D. Rational modification of protein stability by the mutations of charged surface residues. Biochemistry 2000, 39, 872–879.
(8) Loladze, V.; Makhatadze, G. Removal of surface charge-charge interactions from ubiquitin leaves the protein folded and very stable. Protein Sci. 2002, 11, 174–177.
(9) Strickler, S.; Gribenko, A.; Gribenko, A.; Keiffer, T.; Tomlinson, J.; Reihle, T.; Loladze, V.; Makhatadze, G. Protein stability and surface electrostatics: a charged relationship. Biochemistry 2006, 45, 2761–2766.
(10) Hart, K. M.; Harms, M. J.; Schmidt, B. H.; Elya, C.; Thornton, J. W.; Marqusee, S. Thermodynamic system drift in protein evolution. PLoS Biol. 2014, 12, e1001994.
(11) Li, Y.; Middaugh, C.; Fang, J. A novel scoring function for discriminating hyperthermophilic and mesophilic proteins with application to predicting relative thermostability of protein mutants. BMC Bioinf. 2010, 11, 62.
(12) Berezovsky, I.; Chen, W.; Choi, P.; Shakhnovich, E. Entropic Stabilization of Proteins and Its Proteomic Consequences. PLoS Comp. Bio. 2005, 1, 0322–0332.
(13) Sawle, L.; Ghosh, K. A theoretical method to compute sequence dependent configurational properties in charged polymers and proteins. J. Chem. Phys. 2015, 143, 085101.
(14) Miyazawa, S.; Jernigan, R. L. Protein stability for single substitution mutants and the extent of local compactness in the denatured state. Protein Eng. 1994, 7, 1209–1220.
(15) Robic, S.; Guzman-Casado, M.; Sanchez-Ruiz, J. M.; Marqusee, S. Role of residual structure in the unfolded state of a thermophilic protein. Proc. Natl. Acad. Sci. 2003, 100, 11345–11349.

(16) Liu, C. C.; Licata, V. J. The stability of Taq DNA polymerase results from a reduced entropic folding penalty; identification of other thermophilic proteins with similar folding thermodynamics. Proteins 2014, 82, 785–793.
(17) Dill, K.; Ghosh, K.; Schmit, J. Physical limits of cells and proteomes. Proc. Natl. Acad. Sci. 2011, 108, 17876–17882.
(18) Vieille, C.; Zeikus, J. Thermozymes: Identifying molecular determinants of protein structural and functional stability. Trends in Biotechnol. 1996, 14, 183–190.
(19) Eisenmesser, E.; Millet, O.; Labeikovsky, W.; Korzhnev, D.; Wolf-Watz, M.; Bosco, D.; Skalicky, J.; Kay, L.; Kern, D. Intrinsic dynamics of an enzyme underlies catalysis. Nature 2005, 438, 117–121.
(20) Bahar, I.; Chennubhotla, C.; Tobi, D. Intrinsic dynamics of enzymes in the unbound state and relation to allosteric regulation. Curr. Opin. Struct. Biol. 2007, 17, 633–640.
(21) Bahar, I.; Lezon, T. R.; Yang, L.-W.; Eyal, E. Global dynamics of proteins: Bridging between structure and function. Annu. Rev. Biophys. 2010, 39, 23–42.
(22) Teilum, K.; Olsen, J. G.; Kragelund, B. B. Protein stability, flexibility and function. Biochim. Biophys. Acta, Proteins Proteomics 2011, 1814, 969–976.
(23) Nevin Gerek, Z.; Kumar, S.; Banu Ozkan, S. Structural dynamics flexibility informs function and evolution at a proteome scale. Evol. Appl. 2013, 6, 423–433.
(24) Rader, A. Thermostability in rubredoxin and its relationship to mechanical rigidity. Phys. Biol. 2010, 7, 016002.
(25) Wolf-Watz, M.; Thai, V.; Henzler-Wildman, K.; Hadjipavlou, G.; Eisenmesser, E.; Kern, D. Linkage between dynamics and catalysis in a thermophilic-mesophilic enzyme pair. Nat. Struct. Mol. Biol. 2004, 11, 945–949.


(26) Wagner, G.; Wuthrich, K. Correlation between the Amide Proton Exchange Rates and the Denaturation Temperatures in Globular Proteins Related to the Basic Pancreatic Trypsin Inhibitor. J. Mol. Biol. 1979, 130, 31–37.

(27) Cavagnero, S.; Debe, D.; Zhou, Z.; Adams, M.; Chan, S. Kinetic Role of Electrostatic Interactions in the Unfolding of Hyperthermophilic and Mesophilic Rubredoxins. Biochemistry 1998, 37, 3369–3376.

(28) Struvay, C.; Negro, S.; Matagne, A.; Feller, G. Energetics of Protein Stability at Extreme Environmental Temperatures in Bacterial Trigger Factors. Biochemistry 2013, 52, 2982–2990.

(29) Tzul, F.; Vasilchuk, D.; Makhatadze, G. Evidence for the Principle of Minimal Frustration in the Evolution of Protein Folding Landscapes. Proc. Natl. Acad. Sci. 2017, 114, 1627–1632.

(30) Hernandez, G.; Jenney, F.; Adams, M.; LeMaster, D. Millisecond time scale conformational flexibility in a hyperthermophile protein at ambient temperature. Proc. Natl. Acad. Sci. 2000, 97, 3166–3170.

(31) Jaenicke, R. Do ultrastable proteins from hyperthermophiles have high or low conformational rigidity? Proc. Natl. Acad. Sci. 2000, 97, 2962–2964.

(32) Hollien, J.; Marqusee, S. Structural distribution of stability in a thermophilic enzyme. Proc. Natl. Acad. Sci. 1999, 96, 13674–13678.

(33) Tang, K. E.; Dill, K. Native Protein Fluctuations: The conformational-motion temperature and the inverse correlation of protein flexibility with protein stability. J. Biomol. Struct. Dyn. 1998, 16, 397–411.

(34) Livesay, D.; Dallakyan, S.; Wood, G.; Jacobs, D. A flexible approach for understanding protein stability. FEBS Lett. 2004, 576, 468–476.


(35) Rathi, P. C.; Jaeger, K.-E.; Gohlke, H. Structural Rigidity and Protein Thermostability in Variants of Lipase A from Bacillus subtilis. PLoS One 2015, 10, e0130289.

(36) Karshikoff, A.; Nilsson, L.; Ladenstein, R. Rigidity versus flexibility: the dilemma of understanding protein thermal stability. FEBS J. 2015, 282, 3899–3917.

(37) Jacobs, D. Ensemble-based methods for describing protein dynamics. Curr. Opin. Pharmacol. 2010, 10, 760–769.

(38) Wintrode, P.; Zhang, D.; Vaidehi, N.; Arnold, F.; Goddard, W. Protein dynamics in a family of laboratory evolved thermophilic enzymes. J. Mol. Biol. 2003, 327, 745–757.

(39) Motono, C.; Gromiha, M.; Kumar, S. Thermodynamic and kinetic determinants of Thermotoga maritima cold shock protein stability: A structural and dynamic analysis. Proteins: Struct., Funct., Bioinf. 2007, 71, 655–669.

(40) Merkley, E.; Parson, W.; Daggett, V. Temperature dependence of the flexibility of thermophilic and mesophilic flavoenzymes of the nitroreductase fold. Protein Eng., Des. Sel. 2010, 23, 327–336.

(41) Kalimeri, M.; Rahaman, O.; Melchionna, S.; Sterpone, F. How Conformational Flexibility Stabilizes the Hyperthermophilic Elongation Factor G-domain. J. Phys. Chem. B 2013, 117, 13775–13785.

(42) Livesay, D.; Jacobs, D. Conserved quantitative stability/flexibility relationships (QSFR) in an orthologous RNase H pair. Proteins: Struct., Funct., Bioinf. 2006, 62, 130–143.

(43) Mottonen, J.; Xu, M.; Jacobs, D.; Livesay, D. Unifying mechanical and thermodynamic descriptions across the thioredoxin protein family. Proteins: Struct., Funct., Bioinf. 2008, 75, 610–627.


(44) Huang, X.; Zhou, H.-X. Similarity and difference in the unfolding of thermophilic and mesophilic cold shock proteins studied by molecular dynamics simulations. Biophys. J. 2006, 91, 2451–2463.

(45) Fitter, J.; Heberle, J. Structural equilibrium fluctuations in mesophilic and thermophilic α-amylase. Biophys. J. 2000, 79, 1629–1636.

(46) Fitter, J.; Herrmann, R.; Hauss, T.; Lechner, R.; Dencher, N. Dynamical properties of alpha-amylase in the folded and unfolded state: the role of thermal equilibrium fluctuations for conformational entropy and protein stabilization. Physica B 2001, 301, 1–7.

(47) Danciulescu, C.; Ladenstein, R.; Nilsson, L. Dynamic Arrangement of Ion Pairs and Individual Contributions to the Thermal Stability of the Cofactor-Binding Domain of Glutamate Dehydrogenase from Thermotoga maritima. Biochemistry 2007, 46, 8537–8549.

(48) Dominy, B. N.; Minoux, H.; Brooks, C. L. An electrostatic basis for the stability of thermophilic proteins. Proteins: Struct., Funct., Bioinf. 2004, 57, 128–141.

(49) Sterpone, F.; Melchionna, S. Thermophilic proteins: insights and perspective from in silico experiments. Chem. Soc. Rev. 2012, 41, 1665–1676.

(50) Singh, B.; Bulusu, G.; Mitra, A. Understanding the Thermostability and Activity of Bacillus subtilis Lipase Mutants: Insights from Molecular Dynamics Simulations. J. Phys. Chem. B 2015, 119, 392–409.

(51) Chen, L.; Li, X.; Wang, R.; Fang, F.; Yang, W.; Kan, W. Thermal stability and unfolding pathways of hyperthermophilic and mesophilic periplasmic binding proteins studied by molecular dynamics simulation. J. Biomol. Struct. Dyn. 2016, 0, 1–14.


(52) Sawle, L.; Ghosh, K. Convergence of Molecular Dynamics Simulation of Protein Native States: Feasibility vs Self-Consistency Dilemma. J. Chem. Theory Comput. 2016, 12, 861–869.

(53) Vitalis, A.; Pappu, R. ABSINTH: a new continuum solvation model for simulations of polypeptides in aqueous solutions. J. Comput. Chem. 2009, 30, 673–699.

(54) Das, R. K.; Pappu, R. V. Conformations of intrinsically disordered proteins are influenced by linear sequence distributions of oppositely charged residues. Proc. Natl. Acad. Sci. 2013, 110, 13392–13397.

(55) Kumar, S.; Nussinov, R. Relationship between ion pair geometries and electrostatic strengths in proteins. Biophys. J. 2002, 83, 1595–1612.

(56) Zhou, H.-X. Toward the physical basis of thermophilic proteins: Linking of enriched polar interactions and reduced heat capacity of unfolding. Biophys. J. 2002, 83, 3126–3133.

(57) Lebbink, J. H. G.; Knapp, S.; van der Oost, J.; Rice, D.; Ladenstein, R.; de Vos, W. M. Engineering Activity and Stability of Thermotoga maritima Glutamate Dehydrogenase. II: Construction of a 16-Residue Ion-pair Network at the Subunit Interface. J. Mol. Biol. 1999, 289, 357–369.

(58) Brinda, K.; Vishveshwara, S. A network representation of protein structures: Implications for protein stability. Biophys. J. 2005, 89, 4159–4170.

(59) Vijayabaskar, M.; Vishveshwara, S. Interaction energy based protein structure networks. Biophys. J. 2010, 99, 3704–3715.

(60) Chea, E.; Livesay, D. R. How accurate and statistically robust are catalytic site predictions based on closeness centrality? BMC Bioinf. 2007, 8.


(61) Chen, T.; Chan, H. S. Native Contact Density and Nonnative Hydrophobic Effects in the Folding of Bacterial Immunity Proteins. PLoS Comput. Biol. 2015, 11, e1004260.

(62) Lam, S.; Yeung, R.; Yu, T.-H.; Sze, K.-H.; Wong, K.-B. A Rigidifying Salt-Bridge Favors the Activity of Thermophilic Enzyme at High Temperatures at the Expense of Low-Temperature Activity. PLoS Biol. 2011, 9, e1001027.

(63) Aguilar, C. F.; Sanderson, I.; Moracci, M.; Ciaramella, M.; Nucci, R.; Rossi, M.; Pearl, L. H. Crystal Structure of the beta-Glycosidase from the Hyperthermophilic Archeon Sulfolobus solfataricus: Resilience as a Key Factor in Thermostability. J. Mol. Biol. 1997, 271, 789–802.

(64) Li, L.; Li, C.; Sarkar, S.; Zhang, J.; Witham, S.; Zhang, Z.; Wang, L.; Smith, N.; Petukh, M.; Alexov, E. DelPhi: a comprehensive suite for DelPhi software and associated resources. BMC Biophys. 2012, 5, 1–11.

(65) Mittal, A.; Das, R.; Vitalis, A.; Pappu, R. Computational Approaches to Protein Dynamics: From Quantum to Coarse-Grained Methods; CRC Press, Boca Raton, FL, 2015; Chapter ABSINTH Implicit Solvation Model and Force Field Paradigm for Use in Simulations of Intrinsically Disordered Proteins.

(66) Mao, A.; Crick, S.; Vitalis, A.; Chicoine, C.; Pappu, R. Net Charge Per Residue Modulates Conformational Ensembles of Intrinsically Disordered Proteins. Proc. Natl. Acad. Sci. 2010, 107, 8183–8188.

(67) Meng, W.; Luan, B.; Lyle, N.; Pappu, R.; Raleigh, D. The Denatured State Ensemble Contains Significant Local and Long-range Structure Under Native Condition: Analysis of the N-terminal Domain of the Ribosomal Protein L9. Biochemistry 2013, 52, 2662–2671.

(68) Meng, W.; Lyle, N.; Luan, B.; Raleigh, D.; Pappu, R. Experiments and Simulations Show How Long-Range Contacts Can Form in Expanded Unfolded Proteins with Negligible Secondary Structure. Proc. Natl. Acad. Sci. 2013, 110, 2123–2128.

(69) Luan, B.; Lyle, N.; Pappu, R.; Raleigh, D. Denatured State Ensembles with the same radii of gyration can form significantly different long-range contacts. Biochemistry 2014, 53, 39–47.

(70) Elcock, A. H. The stability of salt bridges at high temperatures: Implications for hyperthermophilic proteins. J. Mol. Biol. 1998, 284, 489–502.

(71) Pace, C. N.; Alston, R. W.; Shaw, K. L. Charge-charge interactions influence the denatured state ensemble and contribute to protein stability. Protein Sci. 2000, 9, 1395–1398.

(72) Kumar, S.; Ma, B.; Tsai, C.; Nussinov, R. Electrostatic strengths of salt bridges in thermophilic and mesophilic glutamate dehydrogenase monomers. Proteins: Struct., Funct., Genet. 2000, 38, 368–383.

(73) Dominy, B.; Perl, D.; Schmid, F.; Brooks, C. The effects of ionic strength on protein stability: The cold shock protein family. J. Mol. Biol. 2002, 319, 541–554.

(74) Torrez, M.; Schultehenrich, M.; Livesay, D. Conferring Thermostability to Mesophilic Proteins through Optimized Electrostatic Surfaces. Biophys. J. 2003, 85, 2845–2853.

(75) Alsop, E.; Silver, M.; Livesay, D. R. Optimized electrostatic surfaces parallel increased thermostability: a structural bioinformatic analysis. Protein Eng., Des. Sel. 2003, 16, 871–874.

(76) Gribenko, A.; Patel, M.; Liu, J.; McCallum, S.; Wang, C.; Makhatadze, G. Rational stabilization of enzyme by computational redesign of surface charge-charge interactions. Proc. Natl. Acad. Sci. 2009, 106, 2601–2606.


(77) Tzul, F.; Schweiker, K.; Makhatadze, G. Modulation of folding energy landscape by charge-charge interactions: Linking experiments with computational modeling. Proc. Natl. Acad. Sci. 2015, 112, E259–E266.

(78) Schafmeister, C. E. A. F.; Ross, W. S.; Romanovski, W. S. LEAP. 1995.

(79) McGuffee, S. R.; Elcock, A. H. Diffusion, crowding & protein stability in a dynamic molecular model of the bacterial cytoplasm. PLoS Comput. Biol. 2010, 6, e1000694.

(80) Sarkar, M.; Smith, A. E.; Pielak, G. J. Impact of reconstituted cytosol on protein stability. Proc. Natl. Acad. Sci. 2013, 110, 19342–19347.

(81) Ghaemmaghami, S.; Oas, T. G. Quantitative protein stability measurement in vivo. Nat. Struct. Biol. 2001, 8, 879–882.

(82) Ignatova, Z.; Gierasch, L. M. Monitoring protein stability and aggregation in vivo by real-time fluorescent labeling. Proc. Natl. Acad. Sci. 2004, 101, 523–528.

(83) Guo, M.; Xu, Y.; Gruebele, M. Temperature dependence of protein folding kinetics in living cells. Proc. Natl. Acad. Sci. 2012, 109, 17863–17867.

(84) Ghosh, K.; Dill, K. A. Computing protein stabilities from their chain lengths. Proc. Natl. Acad. Sci. 2009, 106, 10649–10654.

(85) De Graff, A.; Hazoglou, M.; Dill, K. Highly Charged Proteins: The Achilles' Heel of Aging Proteomes. Structure 2016, 24, 1–8.

(86) Ghosh, K.; De Graff, A.; Sawle, L.; Dill, K. The role of proteome physical chemistry in cell behavior. J. Phys. Chem. B 2016, 120, 9549–9563.

(87) Sanchez-Ruiz, J.; Makhatadze, G. To charge or not to charge? Trends Biotechnol. 2001, 19, 132–135.


(88) Alexov, E. Numerical calculations of the pH of maximal protein stability. The effect of the sequence composition and three-dimensional structure. Eur. J. Biochem. 2004, 271, 173–185.

(89) Edwards, S.; Singh, P. Size of a polymer molecule in solution. Part 1: excluded volume problem. J. Chem. Soc., Faraday Trans. 2 1979, 75, 1020–1029.

(90) Muthukumar, M.; Nickel, B. Perturbation theory for a polymer chain with excluded volume interaction. J. Chem. Phys. 1984, 80, 5839–5850.

(91) Ghosh, K.; Carri, G.; Muthukumar, M. Configurational properties of a single semiflexible polyelectrolyte. J. Chem. Phys. 2001, 115, 4367–4375.

(92) Ghosh, K.; Muthukumar, M. Scattering properties of a single semiflexible polyelectrolyte. J. Polym. Sci., Part B: Polym. Phys. 2001, 39, 2644–2652.

(93) Rustad, M.; Ghosh, K. Why and how does native topology dictate the folding speed of a protein? J. Chem. Phys. 2012, 137, 205104.

(94) Dill, K. A.; Alonso, D.; Hutchinson, K. Thermal Stabilities of Globular Proteins. Biochemistry 1989, 28, 5439–5449.

(95) Karlin, S.; Blaisdell, B.; Brendel, V. Identification of significant sequence patterns in proteins. Methods Enzymol. 1990, 183, 388–402.

(96) Karlin, S. Statistical significance of sequence patterns in proteins. Curr. Opin. Struct. Biol. 1995, 5, 360–371.

(97) Trevino, S. R.; Schaefer, S.; Scholtz, J. M.; Pace, C. N. Increasing protein conformational stability by optimizing beta-turn sequence. J. Mol. Biol. 2007, 373, 211–218.

(98) Roca, M.; Liu, H.; Messer, B.; Warshel, A. On the Relationship between Thermal Stability and Catalytic Power of Enzymes. Biochemistry 2007, 46, 15076–15088.


(99) Thomas, A. S.; Mao, S.; Elcock, A. H. Flexibility of the bacterial chaperone trigger factor in microsecond-timescale molecular dynamics simulations. Biophys. J. 2013, 105, 732–744.


[Figure, panels A–L: edges per vertex as a function of Pc.]

[Figure, panels A–L: 〈RMSF〉 as a function of simulation time (μs).]

[Graphic: nativeness, Fn, versus temperature (°C) for mesophilic and thermophilic proteins, with a schematic of two charges q1 and q2 separated by distance r12.]