Effect of Correlated Pair Mutations in Protein Misfolding

Article pubs.acs.org/JPCB

Cite This: J. Phys. Chem. B 2019, 123, 5069−5078

Adesh Kumar and Parbati Biswas*
Department of Chemistry, University of Delhi, Delhi 110007, India



ABSTRACT: A Monte Carlo simulation based sequence design method is proposed to explore the effect of correlated pair mutations in proteins. In the designed sequences, the most correlated residue pairs are identified and mutated with all possible amino acid pairs except those already present. The cumulative correlated pair mutations generated an array of mutated sequences. Results show a significant increase in the probability of misfolding for correlated pair mutations as compared to that of the random pair mutations. The pair mutations of correlated residues that are in contact record a higher probability of misfolding as compared to the correlated residues that are not in contact. The probability of misfolding increases on pair mutation of nonlocally correlated residue pairs as compared to that of the locally correlated residue pairs. The choice of a compact or expanded conformation does not depend on the type of correlated pair mutations. Pair mutation of the most correlated residue pairs at the surface with hydrophobic amino acids results in higher misfolding probability as compared to that in the core. An exactly opposite behavior is observed on pair mutation with hydrophilic and charged amino acid pairs. The neutral amino acid pairs do not differentiate between core and surface sites. This study may be used for targeted mutation experiments to predict complex mutation patterns, reengineer the existing proteins, and design new proteins with reduced misfolding propensity.



INTRODUCTION

The complex sequence−structure relationship in proteins is consistent with their respective three-dimensional conformations and optimized biological functions.1,2 Naturally evolved proteins are mutationally robust, which promotes diversity in sequences for a given target structure.3−5 The phenotypical impact of a mutation is crucial for assessing the naturally occurring mutations in proteins. Most amino acid residues located at their respective sites preserve the existing fold of a protein, and mutations at these sites only slightly affect the stability of the protein without altering its function; the exceptions are mutations of the active site or allied residues that are vital for the functional specificity of the protein.6,7 This fact may be exploited to understand and interpret correlated mutations in proteins. Correlated residues, either in physical contact or distant along the protein sequence, are considered as part of an interaction network, and they influence the protein structure and its folding properties.8 Mutations of the correlated residues may disrupt this network and, thus, enhance the probability of misfolding as compared to random mutations.9−11 The effect of site-directed point/pairwise mutations at specific sites is largely accommodated by the movement of the neighboring residues in the polypeptide chain to preserve the overall compactness of the protein. The folding pathway of a protein is unperturbed by residue substitutions due to point or pair mutations.12,13 However, mutations may stabilize or destabilize the partially folded/unfolded states in the folding pathway. The different stabilities of these intermediates may influence the rate of protein folding,12 depending on the mutated residue and the site of mutation. Pair mutations or triple mutations cause larger changes in the stability of a protein as compared to point mutations.14,15 A large decrease in the folding rate of a protein due to pair mutations corresponds to a higher stabilization of the folding intermediates, which results in a higher probability of misfolding.

Correlated mutations facilitate the evolution of proteins over geological time scales, which permits a detailed investigation of the amino acid substitutions with respect to protein stability, foldability, and function. Thus, it is important to associate sequence correlations with protein energetics. In this context, studies have attempted to rationalize mutational coevolution by relating the statistical free energies to the experimental free energies of folding.16−19 The results are contentious, and the existing methods are unable to predict the mutational correlation patterns and their link with protein energetics.

In this article, a mutual information based correlated pair mutation method is presented to analyze the effect of pair mutations in proteins. A coarse-grained Cα chain backbone model of the protein is used to probe the effect of correlated pair mutations in selected proteins with varying secondary structure content. A statistical potential, which comprises one body, two body, and hydrogen bonding interaction terms, is developed from a selected data set of proteins to compute the energy of a sequence in any conformation. Sequences are designed for each selected protein, and randomly chosen designed sequences are subjected to the correlated pair mutations. The most correlated residue pairs are identified for the selected sequences and mutated with all possible pairs of amino acids, except the amino acid pairs present at the mutated sites. The same set of sequences is also subjected to random pair mutations. Results show a significant increase in the probability of misfolding on incorporating correlated pair mutations as compared to the random pair mutations. The probability of misfolding increases due to correlated pair mutations if the mutated residue pairs are in contact and distant along the sequence. The choice of a compact or expanded conformation does not depend on the type of correlated pair mutation. Pair mutation of the most correlated residue pairs at the surface with hydrophobic amino acids results in higher misfolding probability as compared to that in the core. An exactly opposite behavior is observed on pair mutation with hydrophilic and charged amino acid pairs. The neutral amino acid pairs do not differentiate between core and surface sites. This method may be used to identify the evolutionarily conserved and coevolving residue pairs with optimal inter-residue interactions that are needed to preserve the functional structure of the protein. Determination of the correlated residue pairs may be effective for targeted mutation experiments to predict complex mutation patterns, reengineer existing proteins, and design new proteins with reduced misfolding propensity.

Received: April 15, 2019. Revised: May 17, 2019. Published: May 24, 2019.
© 2019 American Chemical Society
DOI: 10.1021/acs.jpcb.9b03533 J. Phys. Chem. B 2019, 123, 5069−5078



METHODS

The Cα chain backbones of the globular proteins S-nitroso thioredoxin (105-mer, PDB ID: 2IIY, SCOP α/β), human interleukin-8 (70-mer, PDB ID: 5D14, SCOP α + β), and alpha-amylase inhibitor HOE-467A (74-mer, PDB ID: 1HOE, SCOP all-β) are selected as the respective target structures for protein sequence design (Figure 1). The different secondary structural contexts are implicitly included in these coarse-grained model proteins of different SCOP classes. Monte Carlo simulations yield the respective non-native conformational ensemble for each protein, i.e., 85714, 140745, and 120482 conformations for 2IIY, 5D14, and 1HOE, respectively, with appropriate restrictions on the pseudo bond length (b) Cα,i−Cα,i+1, the bend angle (θ) Cα,i−Cα,i+1−Cα,i+2, and the torsion angle (ϕ) Cα,i−Cα,i+1−Cα,i+2−Cα,i+3 between the constituent amino acid residues in their respective target structures. The non-native conformational ensemble for each protein exhibits a broad range of RMSD and Rg, with RMSD = 1.16−23.49 Å for 2IIY, 1.00−29.12 Å for 5D14, and 2.07−28.00 Å for 1HOE, and Rg = 12.32−26.98 Å for 2IIY, 12.04−30.99 Å for 5D14, and 11.12−27.99 Å for 1HOE (refer to Figure S1 of the Supporting Information). The restrictions on the pseudo bond length, bend angle, and torsion angle are imposed so that the sampled conformations resemble realistic protein conformations. The allowed ranges of these parameters are taken from a data set of 338 monomeric globular proteins with X-ray resolution ≤3 Å and sequence identity ≤30%, compiled from the RCSB protein data bank on 09/02/2017.20 This data set comprises single polypeptide chains without any ligands or missing residues/atoms in their respective X-ray crystal structures.
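The three internal coordinates that restrict the sampled conformations, together with Rg, can be computed directly from the Cα coordinates. The following is a minimal sketch rather than the authors' code: the function names are hypothetical, the numerical ranges are those quoted in this section, and the torsion sign convention (0° for a cis arrangement, ±180° for trans) is an assumption, since the text does not state one.

```python
import math

# Ranges quoted in the Methods (Å for b, degrees for θ and ϕ).
BOND_RANGE = (3.65, 3.95)
BEND_RANGE = (70.0, 160.0)
TORSION_RANGES = [(10.0, 70.0), (160.0, 180.0), (-180.0, -100.0)]


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def bond_length(p, q):
    """Pseudo bond length b between consecutive C-alpha atoms (Å)."""
    return math.dist(p, q)


def bend_angle(p, q, r):
    """Bend angle θ at the middle atom q, in degrees."""
    u, v = _sub(p, q), _sub(r, q)
    cos_t = _dot(u, v) / (math.dist(p, q) * math.dist(r, q))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


def torsion_angle(p0, p1, p2, p3):
    """Torsion angle ϕ in degrees, in [-180, 180] (assumed convention)."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, _cross(b2, b3))
    x = _dot(_cross(b1, b2), _cross(b2, b3))
    return math.degrees(math.atan2(y, x))


def radius_of_gyration(coords):
    """Rg of a set of equally weighted C-alpha positions (Å)."""
    n = len(coords)
    com = tuple(sum(c[k] for c in coords) / n for k in range(3))
    return math.sqrt(sum(math.dist(c, com) ** 2 for c in coords) / n)


def torsion_allowed(phi):
    return any(lo <= phi <= hi for lo, hi in TORSION_RANGES)


def geometry_allowed(coords):
    """True if every b, θ, and ϕ along the chain lies in the quoted ranges."""
    n = len(coords)
    bonds = all(BOND_RANGE[0] <= bond_length(coords[i], coords[i + 1]) <= BOND_RANGE[1]
                for i in range(n - 1))
    bends = all(BEND_RANGE[0] <= bend_angle(*coords[i:i + 3]) <= BEND_RANGE[1]
                for i in range(n - 2))
    torsions = all(torsion_allowed(torsion_angle(*coords[i:i + 4]))
                   for i in range(n - 3))
    return bonds and bends and torsions
```

A trial conformation generated during sampling would be kept only if `geometry_allowed` returns `True`; how trial conformations are proposed is not specified here and is left out of the sketch.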
The values of pseudo bond lengths, bend angles, and torsion angles are extracted from their corresponding distributions in the selected data set of proteins, where their respective ranges are assigned based on the most populated region.21 The ranges are 3.65 Å ≤ b ≤ 3.95 Å for the pseudo bond length and 70° ≤ θ ≤ 160° for the bend angle (refer to Figures S2 and S3 of the Supporting Information of ref 22). The range of the torsion angle is chosen from the three most populated regions, 10° ≤ ϕ ≤ 70°, 160° ≤ ϕ ≤ 180°, and −180° ≤ ϕ ≤ −100°, in the torsion angle distribution plot (refer to Figure S4 of the Supporting Information of ref 22).

Figure 1. Cα chain backbone structures of selected globular proteins (a) 2IIY, (b) 5D14, and (c) 1HOE as target states for protein sequence design.

Potential. A statistical potential comprising one body, two body, and hydrogen bonding interaction terms is developed from the selected data set of proteins to compute the effective energy of a sequence in any conformation. While the one body potential accounts for the interaction of an amino acid with its surrounding environment, the two body potential measures the average strength of interaction between any two amino acid residues. One body and two body interaction potentials are derived from the selected data set of proteins based on the frequency of occurrence of the individual amino acid residues and pairwise amino acid residues in the crystal structures of the respective proteins. Hydrogen bonding interaction is modeled by the 10−12 Lennard-Jones (LJ) potential. All interactions are categorized as local and nonlocal based on the distance between the interacting amino acid residues along the sequence. The interaction of the ith amino acid residue with those up to the (i + 4)th position along the sequence is considered as local, while the interactions beyond the (i + 4)th residue are termed as nonlocal.

One Body Interaction Potential. This potential22 takes into account the structural context of a particular amino acid residue in any conformation, which depends on the identity of the specific amino acid residue and its position in the protein. The surrounding environment of an amino acid residue may be quantified in terms of its coordination number, which represents the number of nonbonded amino acid residues within a limiting spatial distance of 7.5 Å.23 Local and nonlocal coordination numbers may be assigned according to the definition of the local and nonlocal interactions.
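The local and nonlocal coordination numbers described above can be counted from the Cα coordinates as sketched below. This is an illustration under stated assumptions, not the authors' implementation: the function name is hypothetical, and "nonbonded" is taken to mean residues at least two positions apart along the chain, which the text does not spell out. The 7.5 Å cutoff and the (i + 4) boundary between local and nonlocal are the values given in this section.

```python
import math

CUTOFF = 7.5        # limiting spatial distance in Å, as quoted in the text
LOCAL_MAX_SEP = 4   # sequence separations up to 4 are local, beyond are nonlocal


def coordination_numbers(coords):
    """Return (local, nonlocal) coordination numbers for every residue.

    coords is a list of (x, y, z) C-alpha positions in chain order.
    Assumption: residues i and i+1 are bonded and therefore never counted.
    """
    n = len(coords)
    local = [0] * n
    nonlocal_ = [0] * n
    for i in range(n):
        for j in range(i + 2, n):            # skip the bonded neighbor i+1
            if math.dist(coords[i], coords[j]) <= CUTOFF:
                if j - i <= LOCAL_MAX_SEP:
                    local[i] += 1
                    local[j] += 1
                else:
                    nonlocal_[i] += 1
                    nonlocal_[j] += 1
    return local, nonlocal_
```

Histograms of these two numbers, collected per amino acid type over a data set of structures, would give the probabilities from which the one body potential is derived.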

For each of the 20 amino acids, the probability distribution of the local and nonlocal coordination numbers is calculated from the selected data set of 338 proteins. The interaction potential is derived from these amino acid residue probabilities through the Boltzmann inversion method.24

$$V(\alpha_i, c) = -k_B T \ln\left(\frac{p(\alpha_i, c)}{p_{\mathrm{avg},1B,c}}\right) \tag{1}$$

where $p(\alpha_i, c)$ is the probability of occurrence of the $\alpha_i$ type of amino acid residue at the ith site with a coordination number $c$ and $V(\alpha_i, c)$ is the required interaction potential. The average probability of occurrence of the 20 different types of amino acid residues at any site with the coordination number $c$ is denoted by $p_{\mathrm{avg},1B,c}$. $k_B$ is the Boltzmann constant, and $T$ is the absolute temperature. The value $k_B T = 1$ is assumed in this work. The one body potential favors hydrophobic residues at highly coordinated (buried) sites and hydrophilic residues at low coordinated (surface) sites. The presence of hydrophobic residues at the buried sites constitutes the hydrophobic core of the protein. Therefore, the one body potential implicitly accounts for the hydrophobic effect through the identity of the interacting residues for a particular structural context.

Two Body Interaction Potential. The two body interaction potential measures the inter-residue contact propensities between two nonbonded amino acid residues.22,25 For each of the 210 unique amino acid pairs, the number of possible contacts and the number of actual contacts are calculated in each protein of the selected data set. Local and nonlocal contacts are calculated separately. Thus, for each amino acid pair, the probability of the local and nonlocal contacts is expressed as the ratio of the total number of local/nonlocal actual contacts to the total number of local/nonlocal possible contacts. These contact probabilities are converted into the potential through the Boltzmann inversion method24

$$V(\alpha_i, \alpha_j) = -k_B T \ln\left(\frac{p(\alpha_i, \alpha_j)}{p_{\mathrm{avg},2B}}\right) \tag{2}$$

where $V(\alpha_i, \alpha_j)$ is the two body interaction potential between $\alpha_i$ and $\alpha_j$ type of amino acid residues with a pairwise residue probability of $p(\alpha_i, \alpha_j)$. For all 210 amino acid pairs, the average probability of formation of two body nonbonded contacts is denoted by $p_{\mathrm{avg},2B}$.

Hydrogen Bonding. The commonly used 10−12 LJ potential26,27 is used to represent the hydrogen bond.

$$E_{i,j} = \epsilon\left[5\left(\frac{r_{\min}}{r_{i,j}}\right)^{12} - 6\left(\frac{r_{\min}}{r_{i,j}}\right)^{10}\right] \tag{3}$$

where $E_{i,j}$ denotes the hydrogen bond energy between the ith and jth residues separated by a spatial distance $r_{i,j}$, and $r_{\min}$ is the spatial distance between the residue pair corresponding to the minimum of the hydrogen bond energy. The energy well depth is $\epsilon$, which is assumed to be equal to 1. The value of $r_{\min}$ (5.25 Å) is the Cα−Cα distance corresponding to the peak in the distribution of the spatial distance between the Cα atoms of the backbone−backbone hydrogen bond forming amino acid residue pairs in the selected data set of proteins. For the formation of a hydrogen bond, $r_{i,j}$ must satisfy a distance constraint, i.e., 5.0 Å ≤ $r_{i,j}$ ≤ 7.0 Å, corresponding to the most populated region of the Cα−Cα distance distribution plot (refer to Figure S7 of the Supporting Information of ref 22). The hydrogen bond energy is calculated separately for the local and nonlocal interactions.

Thus, the total energy of a sequence in a given conformation may be expressed as

$$E_T = E_{1B,L} + E_{1B,NL} + E_{2B,L} + E_{2B,NL} + E_{HB,L} + E_{HB,NL} \tag{4}$$

where $E_{1B,L}$, $E_{1B,NL}$, $E_{2B,L}$, $E_{2B,NL}$, $E_{HB,L}$, and $E_{HB,NL}$ represent the respective energetic contributions due to one body local, one body nonlocal, two body local, two body nonlocal, local hydrogen bonding, and nonlocal hydrogen bonding interactions to the total energy of a sequence, $E_T$, for a specified conformation.

Protein Sequence Design. Sequences are designed through Monte Carlo simulation by a combination of positive and negative design for the selected target states of 2IIY, 5D14, and 1HOE, respectively. The sequence is stabilized in the chosen target state in positive design, whereas it is destabilized in the non-native ensemble of states in negative design. Therefore, the energy of the sequence is minimized in the target state in positive design, while in negative design the energy of the sequence is maximized in the non-native state ensemble. Hence, with each accepted mutation the energy gap between the target state and the non-native states increases. A random sequence is selected as the starting sequence for the Monte Carlo simulation. A random point mutation is made at each MC step, and the energy of the mutated sequence is calculated in the target state and the non-native states. In the positive design, the Metropolis criterion28 is used to accept or reject the move. The mutated sequence accepted in the positive design is further checked for negative design. The stability gap, $\Delta$, which is the difference between the energy of the sequence in the target state, $E_{\mathrm{nat}}$, and the ensemble averaged energy of the non-native states, $\langle E_{\mathrm{non-nat}} \rangle$, is calculated for negative design.

$$\Delta = E_{\mathrm{nat}} - \langle E_{\mathrm{non-nat}} \rangle \tag{5}$$

After a mutation, if $\Delta$ is negative and more than 80% of the non-native states show an increase in energy, the mutation is accepted; otherwise, it is rejected. The negative value of $\Delta$ ensures that the energy gap increases between the target state and the non-native states after an acceptable mutation. This process is repeated until the mutated amino acid sequence selects the target state as its minimum energy state. This procedure is used to design 100000 sequences for each of the target states of the selected proteins.

Amino Acid Correlation. The designed sequences of each protein are used to calculate the site-specific residue probability $\omega_i(\alpha_i)$ and the pairwise residue probability $\omega_{i,j}(\alpha_i, \alpha_j)$. The site-specific residue probability, $\omega_i(\alpha_i)$, is calculated as the ratio of the frequency of occurrence of the $\alpha_i$ type of amino acid residue at the ith site of the designed sequences to the total number of such sequences. The pairwise residue probability, $\omega_{i,j}(\alpha_i, \alpha_j)$, is likewise calculated as the ratio of the frequency of occurrence of a pair of residues, $\alpha_i$ and $\alpha_j$ type of amino acids at the ith and jth sites of the designed sequences, respectively, to the total number of such sequences. The correlation between $\alpha_i$ and $\alpha_j$ type of amino acid residues residing at the ith and jth sites may be measured in terms of the tolerance defined as

$$C_{i,j}(\alpha_i, \alpha_j) = \frac{\omega_{i,j}(\alpha_i, \alpha_j)}{\omega_i(\alpha_i)\,\omega_j(\alpha_j)} \tag{6}$$

when […] > 1 […]

[…] 4 represents the nonlocal interactions between the highest correlated residue pairs. Some of the highest correlated residue pairs interact locally, while a large number of these residue pairs have nonlocal interactions. This highlights the significant role of nonlocal interactions in proteins. The probability distribution of the spatial distance between the highest correlated residue pairs is shown in Figure 2(b) for three designed sequences of 2IIY. Although the peak of this probability distribution occurs at 7.5 Å […]

The degree of interaction between any two residues is inversely proportional to the spatial distance between them.31 Thus, the correlated residues, either in contact or not in contact, may differentially stabilize a protein. Two different sets of mutated sequences are generated accordingly, and the number of misfolded sequences is computed for each set. The probability of misfolding may be evaluated as the ratio of the number of misfolded mutated sequences to the total number of mutated sequences in each set. It may be observed from Figure 5 that pair mutations of correlated residues that are in contact record a higher probability of misfolding than those of correlated residues that are not in contact. The residues in contact stabilize the protein to form compact structures. Thus, correlated mutations of such residue pairs make the protein more vulnerable to misfolding. Similar results are also obtained for the proteins 5D14 and 1HOE (refer to Figure S5 of the Supporting Information).
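The tolerance of eq 6 can be estimated from a set of designed sequences by simple counting. The sketch below is illustrative only: the function name and the representation of sequences as equal-length strings of one-letter codes are assumptions. By construction, a value of 1 means that the two residue types occur at sites i and j as often as expected if the sites were statistically independent, and a value above 1 means that they co-occur more often than that.

```python
from collections import Counter


def tolerance(sequences, i, j):
    """Estimate C_ij(a, b) of eq 6 for every residue pair observed at sites i, j.

    sequences: list of equal-length strings (one-letter amino acid codes).
    Returns a dict mapping (a, b) to w_ij(a, b) / (w_i(a) * w_j(b)).
    """
    n = len(sequences)
    site_i = Counter(s[i] for s in sequences)            # counts for w_i
    site_j = Counter(s[j] for s in sequences)            # counts for w_j
    pairs = Counter((s[i], s[j]) for s in sequences)     # counts for w_ij
    return {
        (a, b): (count / n) / ((site_i[a] / n) * (site_j[b] / n))
        for (a, b), count in pairs.items()
    }
```

For example, in the toy set `["AL", "AL", "GK", "GK"]` the pair (A, L) at sites 0 and 1 has a tolerance of 2, whereas in `["AL", "AK", "GL", "GK"]` every observed pair has a tolerance of 1.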

The numbers of folded and misfolded sequences are computed for both locally and nonlocally correlated mutated sequences. The probability of misfolding on local/nonlocal correlated pair mutations may be calculated as the ratio of the number of misfolded mutated sequences to the total number of mutated sequences in each set of mutated sequences. Figure 6 depicts the probability of misfolding on local and nonlocal correlated pair mutations of 10 randomly selected sequences. For each sequence, the probability of misfolding is higher for the nonlocal pair correlated mutations. A similar trend is observed for 5D14 and 1HOE (refer to Figure S6 of the Supporting Information). This may be due to the dominance of nonlocal interactions over local interactions in proteins. The nonlocal pair correlated mutations involve nonlocal interactions, which are the primary determinant of protein folding, while the local pair correlated mutations include local interactions which may be required but not necessary for folding.22 Thus, the probability of misfolding is higher for nonlocal pair correlated mutations. It may be concluded from Figures 4, 5, and 6 that, on correlated pair mutation, the probability of misfolding increases if the mutated residue pairs are in contact and distant along the sequence. The destabilizing correlated pair mutations lead to misfolding. The destabilization of a misfolded mutated sequence due to local or nonlocal interactions may be quantified as

$$D = (E_{L,\mathrm{wt}} - E_{L,\mathrm{mut}}) - (E_{NL,\mathrm{wt}} - E_{NL,\mathrm{mut}}) \tag{7}$$

where $E_{L,\mathrm{wt}}$ and $E_{NL,\mathrm{wt}}$ represent the energy of a wild type/designed sequence due to local and nonlocal interactions, while $E_{L,\mathrm{mut}}$ and $E_{NL,\mathrm{mut}}$ represent the energy of a mutated sequence due to local and nonlocal interactions.
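Eq 7 and the sign convention the text attaches to it (D > 0 for mainly nonlocal destabilization, D < 0 for mainly local, D = 0 for equal contributions) can be sketched as follows. The function names and the numerical tolerance used to test for D = 0 are assumptions introduced for illustration.

```python
def destabilization(e_l_wt, e_l_mut, e_nl_wt, e_nl_mut):
    """D of eq 7 from the local (L) and nonlocal (NL) energies of the
    wild type/designed (wt) and mutated (mut) sequences."""
    return (e_l_wt - e_l_mut) - (e_nl_wt - e_nl_mut)


def dominant_source(d, tol=1e-12):
    """Classify a misfolded sequence by the sign of D.

    tol is a hypothetical numerical tolerance for treating D as zero.
    """
    if d > tol:
        return "nonlocal"
    if d < -tol:
        return "local"
    return "equal"


def percent_by_source(d_values):
    """Percentage of misfolded sequences in each class for a list of D values."""
    labels = [dominant_source(d) for d in d_values]
    return {name: 100.0 * labels.count(name) / len(labels)
            for name in ("local", "nonlocal", "equal")}
```

With kBT = 1 energies, a mutation that raises the local energy by 1 and the nonlocal energy by 5 gives D = 4 and is classified as nonlocally destabilized.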


Figure 8. Energy landscape for (a) a designed sequence of 2IIY and its misfolded mutated sequences with (b) low RMSD and small Rg, and (c) high RMSD and large Rg conformations.

In eq 7, D > 0 indicates that a misfolded sequence is destabilized mainly by the nonlocal interactions, while D < 0 indicates that it is destabilized mainly by the local interactions. D = 0 corresponds to equal destabilization due to both local and nonlocal interactions. The RMSD of the conformations chosen by the misfolded sequences is calculated and plotted in Figures S7, S8, and S9 of the Supporting Information for a designed sequence of 2IIY, 5D14, and 1HOE, respectively. On the basis of the RMSD of the chosen conformations, the misfolded sequences are divided into two groups. In one group, all misfolded sequences choose conformations with RMSD > 8 Å, while the other group has misfolded sequences that select conformations with RMSD ≤ 8 Å. In each of these groups, some of the misfolded sequences are obtained by local correlated pair mutations and the remaining ones by nonlocal correlated pair mutations. The local/nonlocal destabilization energy of each misfolded sequence in both groups of randomly selected sequences is computed and used in eq 7 to obtain the respective number of such locally/nonlocally correlated pair mutated misfolded sequences. The percentage of locally/nonlocally correlated mutated sequences destabilized by local and nonlocal interactions is calculated and plotted in Figure 7 for 2IIY (refer to Figures S10 and S11 of the Supporting Information for the corresponding results of 5D14 and 1HOE). Figures 7(b,d), S10(b,d), and S11(b,d) show that when local and nonlocal correlated pair mutated misfolded sequences choose conformations with RMSD > 8 Å, the destabilization is mostly caused by the nonlocal interactions. However, when they choose conformations with RMSD ≤ 8 Å, the destabilization of most misfolded sequences is due to the local interactions, as evident from Figures 7(a,c) and S10(a,c) for 2IIY and 5D14. For 1HOE, the misfolded sequences of five of the designed sequences are mostly destabilized by the local interactions, whereas those of the other five are mostly destabilized by the nonlocal interactions (refer to Figure S11(a,c) of the Supporting Information). Since the misfolded sequences choose both low and high RMSD conformations on local as well as nonlocal correlated pair mutations, the choice of a compact or expanded conformation does not depend on the type of correlated pair mutation. Pair mutations affect not only the correlated residue pair interactions but also other interacting residue pairs that are in contact with the mutated residues. The correlated residue pair will be maximally affected, but the collective role of all other interacting pairs will decide the stability of the mutated sequence in a particular conformation. For 2IIY, 5D14, and 1HOE, the energy landscape of a designed sequence and its misfolded mutant sequence for conformations with RMSD ≤ 8 Å and RMSD > 8 Å are shown in Figures 8, S12, and S13, respectively (refer to the Supporting Information for Figures S12 and S13).

The natural amino acids may be classified as hydrophobic (I, V, L, F, C, M, A, W), neutral (G, T, S, Y, P), and hydrophilic (H, N, D, Q, E, K, R).32 There are 36 hydrophobic (II, IV, VV, IL, VL, LL, IF, IC, VF, VC, LF, IM, IA, LC, VM, VA, LM, LA, FF, FC, CC, FM, FA, CM, CA, MM, MA, IW, AA, VW, LW, FW, CW, MW, AW, WW), 15 neutral (GG, GT, GS, TT, TS, GY, SS, GP, TY, SY, TP, SP, YY, YP, PP), 28 hydrophilic (HH, HE, HQ, HD, HN, EE, EQ, ED, EN, QQ, QD, QN, DD, DN, NN, HK, EK, QK, DK, NK, HR, KK, ER, QR, DR, NR, KR, RR), and 10 charged (EE, ED, DD, EK, DK, KK, ER, DR, KR, RR) amino acid pairs. The highest correlated residue pairs are mutated with different types of amino acid pairs, and the solvent accessibility of each amino acid residue in the mutated sequences is calculated with DSSP.33 The relative solvent accessibility (RSA) is calculated as the ratio of the solvent accessibility of a residue to the maximum solvent accessibility of the same residue.34 The RSA of each amino acid residue is used to tag it as a core or surface residue. Amino acid residues occupying sites with RSA < 37% are considered as core residues, while those at sites with RSA ≥ 37% comprise the surface residues.35 The probability of misfolding due to pair mutation with these different types of amino acid pairs is calculated for a pair of residues in the core, at the surface, and with one in the core and the other at the surface, respectively. Figures 9, S14, and S15 depict the probability of misfolding due to such pair mutations for 2IIY, 5D14, and 1HOE (refer to the Supporting Information for Figures S14 and S15).

Figure 9. Probability of misfolding of a designed sequence of 2IIY on correlated pair mutations with (a) hydrophobic, (b) hydrophilic, (c) charged, and (d) neutral amino acid pairs.

Figures 9(a), S14(a), and S15(a) show that the probability of misfolding is lowest when both core residues of the protein are mutated with hydrophobic amino acids, while it is highest when both residues are at the surface. Similarly, Figures 9(b,c), S14(b,c), and S15(b,c) show that the probability of misfolding is highest when the core residues are pair mutated with charged and hydrophilic amino acids and lowest when the surface residues are mutated with charged and hydrophilic amino acids. For a given pair of residues, one in the core and the other at the surface, pair mutation with hydrophobic amino acids will stabilize the core and destabilize the surface. An exactly opposite behavior is observed on pair mutation with hydrophilic and charged amino acid pairs. For a mutating pair of hydrophobic, hydrophilic, and charged amino acids, the probability of misfolding of most amino acid pairs lies between those of the core and surface mutations. It is clear from Figures 9(d), S14(d), and S15(d) that pair mutation with neutral amino acid pairs in the core, at the surface, and with one residue in the core and the other at the surface does not appreciably change the probability of misfolding. Thus, the neutral amino acid pairs do not differentiate between core and surface sites. The results confirm that the stability of a protein is governed by specific interactions between individual residue pairs depending on their location.
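The pair counts quoted above (36, 15, 28, and 10) follow from enumerating unordered pairs with repetition within each class, and the core/surface tagging is a simple threshold on RSA. The sketch below illustrates both; the names are hypothetical, the charged set (E, D, K, R) is inferred from the listed charged pairs, and RSA values are assumed to be supplied as fractions already derived from the DSSP accessibilities.

```python
from itertools import combinations_with_replacement

HYDROPHOBIC = "IVLFCMAW"
NEUTRAL = "GTSYP"
HYDROPHILIC = "HNDQEKR"
CHARGED = "EDKR"          # inferred from the charged pairs listed in the text

RSA_THRESHOLD = 0.37      # RSA < 37% is core, RSA >= 37% is surface


def unordered_pairs(residues):
    """All unordered residue pairs, including identical pairs such as 'II'."""
    return ["".join(p) for p in combinations_with_replacement(residues, 2)]


def site_class(rsa):
    """Tag a site as 'core' or 'surface' from its RSA (a fraction)."""
    return "core" if rsa < RSA_THRESHOLD else "surface"


def pair_location(rsa_i, rsa_j):
    """Location class of a residue pair: core-core, surface-surface,
    or core-surface when the two sites differ."""
    classes = {site_class(rsa_i), site_class(rsa_j)}
    if classes == {"core"}:
        return "core-core"
    if classes == {"surface"}:
        return "surface-surface"
    return "core-surface"


if __name__ == "__main__":
    for name, group in [("hydrophobic", HYDROPHOBIC), ("neutral", NEUTRAL),
                        ("hydrophilic", HYDROPHILIC), ("charged", CHARGED)]:
        print(name, len(unordered_pairs(group)))
```

A class of n residue types yields n(n + 1)/2 unordered pairs, which reproduces the four counts given in the text.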



CONCLUSIONS

This work explores the effect of correlated pair mutations in globular proteins. A statistical potential, which comprises one body, two body, and hydrogen bonding interaction terms, is developed from the selected data set of proteins to compute the effective energy of a sequence in any conformation. Sequences are designed for the selected proteins, and randomly chosen designed sequences are subjected to correlated pair mutations. Results show a significant increase in the probability of misfolding on incorporating correlated pair mutations as compared to the random pair mutations. This confirms that the correlated residue pairs play an important role in controlling the stability and flexibility of the protein. The pair mutations of correlated residues that are in contact record a higher probability of misfolding as compared to the correlated residues that are not in contact. Correlated mutations of residue pairs that are in contact make the protein more vulnerable to misfolding. The probability of misfolding increases on pair mutation of nonlocally correlated residue pairs as compared to that of the locally correlated residue pairs. This may be due to the dominance of nonlocal interactions over local interactions in proteins. The nonlocal pair correlated mutations involve nonlocal interactions, which are the primary determinant of protein folding, while the local pair correlated mutations include local interactions which may be required but not necessary for folding. The probability of misfolding increases for the mutated residue pairs that are in contact and distant along the sequence. The choice of a compact or expanded conformation does not depend on the type of correlated pair mutation. Pair mutations affect not only the correlated residue pair interactions but also other interacting residue pairs that are in contact with the mutated residues. The correlated residue pair will be maximally affected, but the collective role of all other interacting pairs will decide the stability of the mutated sequence in a particular conformation. Pair mutation of the most correlated residue pairs at the surface with hydrophobic amino acids results in higher misfolding probability as compared to that in the core. However, pair mutation with hydrophilic and charged amino acids at the surface exhibits lower misfolding probability as compared to that in the core. For a given pair of residues, one in the core and the other at the surface, pair mutation with hydrophobic amino acids stabilizes the core and destabilizes the surface. An exactly opposite behavior is observed on pair mutation with hydrophilic and charged amino acid pairs. The neutral amino acid pairs do not differentiate between core and surface sites. The results confirm that the stability of a protein is governed by specific interactions between individual residue pairs depending on their location.

This method may be used to identify the evolutionarily conserved and coevolving residue pairs with optimal inter-residue interactions that are needed to preserve the functional structure of the protein. Determination of the correlated residue pairs may be effective for targeted mutation experiments to predict complex mutation patterns, reengineer existing proteins, and design new proteins with reduced misfolding propensity.




AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected].

ORCID

Parbati Biswas: 0000-0001-7709-2263

Notes

The authors declare no competing financial interest.



ACKNOWLEDGMENTS

The authors gratefully acknowledge DST-SERB, India (Project No. EMR/2016/006619), and DU-DST PURSE Grant Phase II (File No. CD/2018/1638) for financial assistance. A. Kumar acknowledges DST-SERB, India, for providing SRF (Project No. EMR/2016/006619).



ASSOCIATED CONTENT

Supporting Information

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.jpcb.9b03533. The RMSD and Rg of the non-native conformational ensembles, distance along the sequence between highest correlated residues, probability distribution of the spatial distance between highest correlated residues, probability of misfolding on incorporating correlated and random pair mutations, probability of misfolding on pair mutation of correlated residues when they are (i) in contact and (ii) not in contact, probability of misfolding on local and nonlocal correlated pair mutations, Cα RMSD of the most stable conformations of misfolded sequences, percentage of misfolded sequences destabilized by local and nonlocal interactions, energy landscape of a designed sequence and its misfolded sequences, and probability of misfolding on correlated pair mutation with hydrophobic, hydrophilic, charged, and neutral amino acid pairs. (PDF)