
Article Cite This: J. Phys. Chem. B 2017, 121, 9881-9885

pubs.acs.org/JPCB

Stability of Unfolded and Folded Protein Structures Using a 3D-RISM with the RMDFT

Yutaka Maruyama† and Ayori Mitsutake*,‡

†Co-Design Team, FLAGSHIP 2020 Project, RIKEN Advanced Institute for Computational Science, Kobe 650-0047, Japan
‡Department of Physics, Keio University, Yokohama, Kanagawa 223-8522, Japan




ABSTRACT: Protein stability is determined by the characteristics of the protein itself as well as the surrounding solvent. Herein, we discuss the stability of the folded and unfolded structures of proteins obtained from Anton’s long simulations (Lindorff-Larsen, K.; Piana, S.; Dror, R. O.; Shaw, D. E. Science 2011, 334, 517−520). Specifically, the stabilities of CLN025, the WW domain variant GTT, the triple mutant of the redesigned protein G variant NuG2, and the de novo-designed three-helix bundle protein are investigated. The solvation free energy of the structures is calculated using the three-dimensional reference interaction site model with the reference-modified density functional theory. The total energy is given by the sum of the conformational energy and the solvation free energy, and their balance stabilizes the protein structure: for all proteins, the structures with the lowest total energy correspond to the native structures. Overall, these findings indicate that the total energy function is appropriate for evaluating the stability of protein folding systems. Moreover, decomposing the energy terms reveals that proteins achieve their stabilities from the balance between the conformational energy and the solvation free energy. In particular, the solvation entropy is the main contributor to the process of folding from more extended structures to compact structures. The native structure is more stable than the compact structure owing to competition between intramolecular and intermolecular interactions.



INTRODUCTION
Molecular dynamics (MD) simulation is a popular and powerful method for investigating the structure and function of proteins in microscopic detail. Recent technological advances have allowed simulations to be carried out on time scales of the order of microseconds (see reviews in refs 1−3). Evaluating the stability of proteins is important for investigating the mechanism of protein folding. However, investigating the stability of proteins obtained by MD simulations remains a challenge because the large number of solvent molecules surrounding the proteins affects their stability. The solvation free energy (SFE) is one of the most important properties for investigating the thermodynamic stability of biomolecules, including protein folding. However, SFE calculations using MD simulations are computationally very costly, particularly for biomolecules. For example, the energy representation (ER) method, which is one of the techniques for SFE calculation, can avoid running MD simulations for the intermediate states between the initial and final states of the solute-insertion process.4−6 However, the ER method requires at least 5 ns of MD simulation with explicit water for SFE calculation.7 Alternatively, the three-dimensional reference interaction site model (3D-RISM),8,9 which is an integral equation theory developed for molecular liquids, can be used to calculate the SFE at a relatively low computational cost. Recently, Maruyama and Hirata10 proposed an algorithm of 3D-RISM theory for a graphics processing unit (GPU), which substantially reduced the calculation time. Moreover, the use of reference-modified density functional theory (RMDFT) was proposed to obtain a more accurate estimate of the SFE function. RMDFT also improved the absolute values of the SFE for neutral amino acid side-chain analog molecules and for 504 small organic molecules.11

In this Article, we investigate the stabilities of unfolded and folded protein structures by introducing the total energy Etot, which is given by the sum of the conformational energy Ec and the SFE Esol. Here, we use the SFE function calculated by the 3D-RISM with the RMDFT. The stabilities of the folded and unfolded structures of superchignolin CLN025 (a small protein),12 the WW domain variant GTT (a β-sheet protein, hereafter referred to as mGTT),13 the triple mutant of the redesigned protein G variant NuG2 (an α + β protein, hereafter mNuG2),14 and the de novo-designed three-helix bundle protein α3D (an α-helical protein, hereafter mα3D)15 obtained from Anton’s long simulations1 are investigated.

Received: August 25, 2017
Revised: October 1, 2017
Published: October 2, 2017

© 2017 American Chemical Society

DOI: 10.1021/acs.jpcb.7b08487 J. Phys. Chem. B 2017, 121, 9881−9885

Article

The Journal of Physical Chemistry B



COMPUTATIONAL DETAILS
CLN025, mGTT, mNuG2, and mα3D contain 10, 35, 56, and 73 amino acids, respectively. The details of the MD simulations of these proteins are provided in the Supporting Information of Lindorff-Larsen et al.1 Note that there are two types of chignolin: the original chignolin16 and CLN025.12 In this work, only CLN025, which contains mutations of the amino acids at both termini (G1Y and G10Y), was used for the folding simulation. To investigate the stability of these proteins, we introduce the total energy Etot, which is given by the sum of the conformational energy Ec and the SFE Esol:

Etot = Ec + Esol (1)

The force field for the conformational energy is based on CHARMM22*.17−19 We used GROMACS to calculate the conformational energy.20,21 The solvation free energy of the structures is calculated using the three-dimensional reference interaction site model with the reference-modified density functional theory. Note that although the folding simulations of Lindorff-Larsen et al.1 were performed near the transition temperatures to enhance sampling, we instead calculated the SFE at room temperature (298.15 K) by using 3D-RISM with RMDFT11 on a GPU.10 The diameter of the hard sphere was set to 2.88 Å for the RMDFT calculations. For the details of the calculation, see refs 10 and 11. An SFE calculation for one protein structure takes less than 1 min on an NVIDIA K20c GPU (256³ grid points, water solvent). To investigate the solvation effect, we also decompose the SFE into the solvation energy Es and the solvation entropy term −TΔSs as

Esol = Es − TΔSs (2)

where ΔSs is the solvation entropy and T is the temperature. Note that the estimation of −TΔSs requires two additional SFE calculations at different temperatures. Thus, three SFE calculations were performed for each structure in this study. For the details of the calculation, see refs 22 and 23. Structures obtained every 20 ns (every 100 samples) for CLN025 and every 200 ns (every 1000 samples) for the other proteins were extracted from Anton’s trajectories and used in the calculations. The total number of samples is 5348, 5686, 5780, and 3534 for CLN025, mGTT, mNuG2, and mα3D, respectively.

RESULTS AND DISCUSSION
Figure 1 shows the structures with the lowest total energy and the lowest conformational energy for all proteins. The Cα root-mean-square deviation (Cα-RMSD) values for the structures with the lowest total energy relative to the corresponding reference structures for CLN025, mGTT, mNuG2, and mα3D were determined to be 0.7, 1.6, 1.9, and 3.6 Å, respectively. The reference structure for CLN025 was taken from the Supporting Information of Honda et al.,12 and those for mGTT, mNuG2, and mα3D were 2F21.pdb, 1MIO.pdb, and 2A3D.pdb, respectively. (The reference structures correspond to native structures.) Judging from Figure 1 and the Cα-RMSD values, the structures with the lowest total energy correspond to their native structures. The Cα-RMSD values for the structures of CLN025, mGTT, mNuG2, and mα3D with the lowest conformational energy are 0.9, 8.8, 7.6, and 12.3 Å, respectively. Therefore, although the structures with the lowest total energy are nearly the same as those with the lowest conformational energy for CLN025, the two structures differ for the other proteins. Figure 2 shows the values of the total energy of CLN025 (Figure 2a), mGTT (Figure 2b), mNuG2 (Figure 2c), and mα3D (Figure 2d) as a function of Cα-RMSD. The ranges of the total energy of CLN025, mGTT, mNuG2, and mα3D were approximately 100, 180, 250, and 350 kcal/mol, respectively.

Figure 1. Backbone structures with the lowest total energy and the lowest conformational energy for CLN025 (a and b), mGTT (c and d), mNuG2 (e and f), and mα3D (g and h), respectively. Chimera was used to generate the figures.24
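As a minimal sketch of the bookkeeping behind eqs 1 and 2, the Python snippet below combines a conformational energy with SFE values computed at three temperatures. The central-difference estimate of −TΔSs, the temperature offset `delta_t`, and all function and variable names are assumptions made for illustration; the procedure actually used is described in refs 22 and 23.

```python
import numpy as np


def decompose_energies(e_conf, sfe_lo, sfe_mid, sfe_hi,
                       temperature=298.15, delta_t=5.0):
    """Return (E_tot, E_s, -T*dS_s) per structure, all in kcal/mol.

    e_conf         : conformational energy E_c
    sfe_mid        : SFE E_sol at `temperature`
    sfe_lo, sfe_hi : SFE at temperature - delta_t and temperature + delta_t
    delta_t        : hypothetical temperature offset in K (not from the paper)
    """
    e_conf, sfe_lo, sfe_mid, sfe_hi = (
        np.asarray(x, dtype=float) for x in (e_conf, sfe_lo, sfe_mid, sfe_hi)
    )
    # dS_s = -(dE_sol/dT); a central finite difference is assumed here.
    minus_t_ds = temperature * (sfe_hi - sfe_lo) / (2.0 * delta_t)
    # Eq 2 rearranged: E_s = E_sol - (-T dS_s).
    e_solv = sfe_mid - minus_t_ds
    # Eq 1: E_tot = E_c + E_sol.
    e_tot = e_conf + sfe_mid
    return e_tot, e_solv, minus_t_ds


def lowest_energy_index(e_tot):
    """Index of the structure with the lowest total energy."""
    return int(np.argmin(e_tot))
```

Given per-structure arrays, `lowest_energy_index(e_tot)` selects the candidate with the lowest total energy, and `np.ptp(e_tot)` gives an energy range of the kind quoted in the text.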

Figure 2. Total energy of CLN025 (a), mGTT (b), mNuG2 (c), and mα3D (d) as a function of Cα-RMSD. The purple, green, cyan, and yellow dots correspond to the structures for run 0, run 1, run 2, and run 4, respectively, of Lindorff-Larsen et al.1
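The Cα-RMSD used as the horizontal axis in Figure 2 can be sketched as follows. This is a generic Kabsch superposition in Python with NumPy, offered as an illustration under the assumption that both structures are supplied as matching (N, 3) arrays of Cα coordinates in Å. The source does not state which tool or fitting procedure was used, and the function name is hypothetical.

```python
import numpy as np


def kabsch_rmsd(coords, ref):
    """RMSD between two (N, 3) coordinate sets after optimal superposition."""
    p = np.asarray(coords, dtype=float)
    q = np.asarray(ref, dtype=float)
    if p.shape != q.shape or p.ndim != 2 or p.shape[1] != 3:
        raise ValueError("coords and ref must both have shape (N, 3)")

    # Remove the translation by centering both sets on their centroids.
    p = p - p.mean(axis=0)
    q = q - q.mean(axis=0)

    # Kabsch algorithm: SVD of the 3x3 covariance matrix.
    u, _, vt = np.linalg.svd(p.T @ q)
    # Force a proper rotation (determinant +1) so mirror images are not matched.
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0.0 else -1.0
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T

    # Rotate p onto q and compute the root-mean-square deviation.
    diff = p @ rot.T - q
    return float(np.sqrt((diff ** 2).sum() / len(p)))
```

Pairing each sampled structure's value from this function with its total energy would give the kind of scatter data plotted in Figure 2.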


Note that the energy range expands as the protein size increases. For all proteins, the structures with the lowest total energy clearly correspond to a small Cα-RMSD value (representing the native structure). Similarly, the unfolded structures have higher total energy for all proteins. The same tendency was observed in a previous study employing 3D-RISM theory, which used only 570 samples of protein G.23 However, in the previous work, the authors considered only nonbonded interactions, such as van der Waals and electrostatic interactions, as the conformational energy and used the Singer−Chandler formula25,26 as the SFE functional. In the present analysis, we include all of the terms of the conformational energy and use a more accurate SFE function based on the RMDFT. The total energy distributions as a function of the radius of gyration Rg are shown in Figure S1 of the Supporting Information. Moreover, as shown in Figure 2, the value of Cα-RMSD can distinguish structural differences in more detail, especially differences between the native structure and slightly compact structures. On the basis of investigations of the structures obtained from the unfolded and folded simulations of various proteins, the present results strongly support that the total energy is an appropriate parameter for evaluating the stabilities of many proteins.

Figure 3. Conformational energy and solvation free energy of CLN025 (a and b), mGTT (c and d), mNuG2 (e and f), and mα3D (g and h) as a function of Cα-RMSD. The different colors of dots correspond to the different runs of Lindorff-Larsen et al.1

Figure 3 shows the conformational energy and SFE of all proteins as a function of Cα-RMSD. The ranges of the conformational energy of CLN025, mGTT, mNuG2, and mα3D were approximately 250, 600, 700, and 1000 kcal/mol, respectively, whereas the ranges of the SFE were approximately 200, 450, 600, and 900 kcal/mol, respectively. Therefore, the ranges of the conformational energy are larger than those of the total energy. This indicates that the solvation effect reduces the difference in conformational energy between unfolded and folded structures. As shown in Figure 3a, c, e, and g, the native and unfolded structures are within a similar conformational energy range. For the small protein CLN025, the conformational energy and Cα-RMSD values showed a positive correlation, and structures with a lower conformational energy correspond to the native state. By contrast, no such correlations were observed for the larger proteins. For example, in mNuG2, slightly disordered structures with Cα-RMSD ≈ 10 Å showed lower conformational energy values than the native structure. When comparing the results among all proteins evaluated, mNuG2 clearly shows a characteristic folding structure different from the other proteins. These results suggest that it is difficult to evaluate the stability of proteins using only the conformational energy. As shown in Figure 3b, d, f, and h, the extended structures had lower SFE values. This may be due to the stability of the hydrogen bonds between atoms of the proteins and water. This suggests competition between the conformational energy and SFE. Therefore, when we introduce the total energy given by the sum of the conformational energy and SFE, the native structures have lower total energy values. Thus, the balance between conformational energy and SFE allows for stability of the native state. Nevertheless, the detailed mechanism for stabilization will differ for individual proteins.

Figure 4. Terms of the solvation energy Es (a) and solvation entropy −TΔSs (b). (c) The sum of Ec and Es of mGTT as a function of Cα-RMSD. (d) Correlation between the conformational energy and the solvation free energy.

Figure 4 shows the terms of the solvation energy Es (Figure 4a) and solvation entropy −TΔSs (Figure 4b) as a function of Cα-RMSD for mGTT. The ranges of the solvation energy and entropy terms are 600 and 60 kcal/mol, respectively. When comparing Figure 3d and Figure 4a, it appears that the tendency of the solvation energy is similar to that of the SFE, indicating that the solvation energy is the main contributor to the SFE. The same tendency was observed for the other proteins (see Figure S2 of the Supporting Information; the energy terms as a function of the radius of gyration Rg are also shown in Figure S3). To separate the contributions of “energy” and “entropy”, we divide the total energy into the two terms Ec + Es and −TΔSs as Etot = (Ec + Es) − TΔSs. The values of Ec + Es as a function of Cα-RMSD for mGTT are shown in Figure 4c. The range of Ec + Es is about 140 kcal/mol and is similar to that for the term of solvation entropy. The large changes of the conformational energy (about 600 kcal/mol) and solvation energy (about 600 kcal/mol) are thus canceled out. The conformational energy is affected by protein intramolecular interactions, especially the hydrogen bonds between the atoms of the protein itself. By contrast, the solvation energy is affected by protein−water intermolecular interactions, especially the hydrogen bonds between atoms of the solvent and those of the protein. Owing to this competition between intramolecular and intermolecular interactions, the native structure becomes slightly stabilized by the intramolecular interactions. By comparing Figure 4b and c, the factors that are most important for the folding process can be clarified. The term of the solvation entropy is reduced during the folding process from unfolded structures with Cα-RMSD ≈ 15 Å to compact structures with Cα-RMSD ≈ 8 Å. This means that the hydration entropy drives the folding process from more extended structures to more compact structures. However, the native and compact structures have a similar contribution from the solvation entropy. Conversely, there is a large difference in the energy term Ec + Es between the native structure and compact structure. This implies that after folding to the compact structures due to the hydration entropy,23,27,28 the native structure is more energetically stabilized than the other compact structures due to the balance between the intramolecular and intermolecular interactions. A similar tendency was observed for the other proteins except for CLN025 (see Figure S2). In CLN025, the stabilizations due to the hydration entropy and the energy occur simultaneously because of the small size (10 residues) of the protein. As shown in Figure 4d, an inverse correlation was observed between the conformational energy Ec and the SFE Esol. As shown in Figure S4, similar results were obtained for the other proteins (the correlations between Es and −TΔSs for all the proteins are also shown in Figure S4). This further supports that the proteins can achieve stability due to the balance between the conformational energy and SFE.

CONCLUSIONS
We demonstrate that it is possible to analyze protein stability by calculating the total energy of the unfolded and folded structures of proteins obtained by Anton’s long simulation. The total energy is given by the sum of the conformational energy and the SFE calculated using the 3D-RISM with the RMDFT. Across a huge number of calculations, the structures with the lowest total energy correspond to the respective native structures for a wide variety of proteins, including a small protein, a β-sheet protein, an α + β protein, and an α-helical protein. Overall, this analysis demonstrated that the total energy is an appropriate energy function for investigating the stabilities of various proteins. Moreover, decomposing the energy terms revealed that proteins achieve their stabilities from the balance between the conformational energy and SFE. In particular, the solvation entropy is the main contributor to the process of folding from more extended structures to compact structures. The native structure is more stable than the compact structure owing to competition between intramolecular and intermolecular interactions. Finally, our results highlight that the precise mechanisms underlying stabilization vary for individual proteins. The results also support the usefulness of hybrid simulations combining MD and 3D-RISM.29−31 In fact, the multiple time step MD simulation with 3D-RISM accelerated the folding process of the Trp-cage miniprotein.32 In future work, we will apply this strategy to further investigate the mechanism of the folding process and the forces driving the ligand binding of proteins.

ASSOCIATED CONTENT

Supporting Information

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.jpcb.7b08487. Figures showing the total energy as a function of radius of gyration, the solvation energy, solvation entropy term and energy term as a function of RMSD and as a function of radius of gyration, the correlation between the conformational energy and the solvation free energy, and the correlation between the solvation energy and solvation entropy. (PDF)

AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected].

ORCID

Yutaka Maruyama: 0000-0003-4035-4885
Ayori Mitsutake: 0000-0002-4194-7255

Notes

The authors declare no competing financial interest.

ACKNOWLEDGMENTS
This work was supported by PRESTO, JST (JPMJPR13LB). Numerical calculations were performed in part using HA-PACS at the Center for Computational Sciences (CCS), University of Tsukuba. We thank D.E. Shaw Research for providing access to the protein folding trajectory dataset. Molecular graphics and analyses were performed with the UCSF Chimera package. Chimera is developed by the Resource for Biocomputing, Visualization, and Informatics at the University of California, San Francisco (supported by NIGMS P41-GM103311).

REFERENCES

(1) Lindorff-Larsen, K.; Piana, S.; Dror, R. O.; Shaw, D. E. How fast-folding proteins fold. Science 2011, 334, 517−520.
(2) Dror, R. O.; Dirks, R. M.; Grossman, J. P.; Xu, H.; Shaw, D. E. Biomolecular simulation: a computational microscope for molecular biology. Annu. Rev. Biophys. 2012, 41, 429−452.
(3) Lane, T. J.; Shukla, D.; Beauchamp, K. A.; Pande, V. S. To milliseconds and beyond: challenges in the simulation of protein folding. Curr. Opin. Struct. Biol. 2013, 23, 58−65.
(4) Matubayasi, N.; Nakahara, M. Theory of solutions in the energetic representation. I. Formulation. J. Chem. Phys. 2000, 113, 6070−6081.
(5) Matubayasi, N.; Nakahara, M. Theory of solutions in the energy representation. II. Functional for the chemical potential. J. Chem. Phys. 2002, 117, 3605−3616.
(6) Matubayasi, N.; Nakahara, M. Theory of solutions in the energy representation. III. Treatment of the molecular flexibility. J. Chem. Phys. 2003, 119, 9686−9702.
(7) Sakuraba, S.; Matubayasi, N. ERmod: Fast and Versatile Computation Software for Solvation Free Energy with Approximate Theory of Solutions. J. Comput. Chem. 2014, 35, 1592−1608.
(8) Kovalenko, A.; Hirata, F. Three-dimensional density profiles of water in contact with a solute of arbitrary shape: a RISM approach. Chem. Phys. Lett. 1998, 290, 237−244.
(9) Kovalenko, A.; Hirata, F. Potential of mean force between two molecular ions in a polar molecular solvent: a study by the three-dimensional reference interaction site model. J. Phys. Chem. B 1999, 103, 7942−7957.
(10) Maruyama, Y.; Hirata, F. Modified Anderson method for accelerating 3D-RISM calculations using graphics processing unit. J. Chem. Theory Comput. 2012, 8, 3015−3021.
(11) Sumi, T.; Mitsutake, A.; Maruyama, Y. A Solvation-Free-Energy Functional: A Reference-Modified Density Functional Formulation. J. Comput. Chem. 2015, 36, 1359−1369.
(12) Honda, S.; Akiba, T.; Kato, Y. S.; Sawada, Y.; Sekijima, M.; Ishimura, M.; Ooishi, A.; Watanabe, H.; Odahara, T.; Harata, K. Crystal structure of a ten-amino acid protein. J. Am. Chem. Soc. 2008, 130, 15327−15331.
(13) Piana, S.; Sarkar, K.; Lindorff-Larsen, K.; Guo, M.; Gruebele, M.; Shaw, D. E. Computational design and experimental testing of the fastest-folding beta-sheet protein. J. Mol. Biol. 2011, 405, 43−48.
(14) Nauli, S.; Kuhlman, B.; Baker, D. Computer-based redesign of a protein folding pathway. Nat. Struct. Biol. 2001, 8, 602−605.
(15) Walsh, S. T. R.; Cheng, H.; Bryson, J. W.; Roder, H.; DeGrado, W. F. Solution structure and dynamics of a de novo designed three-helix bundle protein. Proc. Natl. Acad. Sci. U. S. A. 1999, 96, 5486−5491.
(16) Honda, S.; Yamasaki, K.; Sawada, Y.; Morii, H. 10 residue folded peptide designed by segment statistics. Structure 2004, 12, 1507−1518.
(17) MacKerell, A. D., Jr.; Feig, M.; Brooks, C. L., III Extending the treatment of backbone energetics in protein force fields: limitations of gas-phase quantum mechanics in reproducing protein conformational distributions in molecular dynamics simulations. J. Comput. Chem. 2004, 25, 1400−1415.
(18) MacKerell, A. D., Jr.; Bashford, D.; Bellott, M.; Dunbrack, R. L., Jr.; Evanseck, J. D.; Field, M. J.; Fischer, S.; Gao, J.; Guo, H.; Ha, S.; et al. All-atom empirical potential for molecular modeling and dynamics studies of proteins. J. Phys. Chem. B 1998, 102, 3586−3616.
(19) Piana, S.; Lindorff-Larsen, K.; Shaw, D. E. How robust are protein folding simulations with respect to force field parameterization? Biophys. J. 2011, 100, L47−L49.
(20) Hess, B.; Kutzner, C.; van der Spoel, D.; Lindahl, E. GROMACS 4: Algorithms for highly efficient, load-balanced, and scalable molecular simulation. J. Chem. Theory Comput. 2008, 4, 435−447.
(21) Pronk, S.; Pall, S.; Schulz, R.; Larsson, P.; Bjelkmar, P.; Apostolov, R.; Shirts, M. R.; Smith, J. C.; Kasson, P. M.; van der Spoel, D.; et al. GROMACS 4.5: a high-throughput and highly parallel open source molecular simulation toolkit. Bioinformatics 2013, 29, 845−854.
(22) Imai, T.; Harano, Y.; Kinoshita, M.; Kovalenko, A.; Hirata, F. Theoretical analysis on changes in thermodynamic quantities upon protein folding: Essential role of hydration. J. Chem. Phys. 2007, 126, 225102.
(23) Maruyama, Y.; Harano, Y. Does water drive protein folding? Chem. Phys. Lett. 2013, 581, 85−90.
(24) Pettersen, E. F.; Goddard, T. D.; Huang, C. C.; Couch, G. S.; Greenblatt, D. M.; Meng, E. C.; Ferrin, T. E. UCSF Chimera - a visualization system for exploratory research and analysis. J. Comput. Chem. 2004, 25, 1605−1612.
(25) Singer, S. J.; Chandler, D. Free energy functions in the extended RISM approximation. Mol. Phys. 1985, 55, 621−625.
(26) Kovalenko, A.; Hirata, F. Self-consistent description of a metal-water interface by the Kohn-Sham density functional theory and the three-dimensional reference interaction site model. J. Chem. Phys. 1999, 110, 10095.
(27) Harano, Y.; Roth, R.; Kinoshita, M. On the energetics of protein folding in aqueous solution. Chem. Phys. Lett. 2006, 432, 275−280.
(28) Yoshidome, T.; Ekimoto, T.; Matubayasi, N.; Harano, Y.; Kinoshita, M.; Ikeguchi, M. An accurate and efficient computation method of the hydration free energy of a large, complex molecule. J. Chem. Phys. 2015, 142, 175101.
(29) Luchko, T.; Gusarov, S.; Roe, D. R.; Simmerling, C.; Case, D. A.; Tuszynski, J.; Kovalenko, A. Three-Dimensional Molecular Theory of Solvation Coupled with Molecular Dynamics in Amber. J. Chem. Theory Comput. 2010, 6, 607−624.
(30) Omelyan, I.; Kovalenko, A. Generalised canonical isokinetic ensemble: speeding up multiscale molecular dynamics and coupling with 3D molecular theory of solvation. Mol. Simul. 2013, 39, 25−48.
(31) Omelyan, I.; Kovalenko, A. Multiple time step molecular dynamics in the optimized isokinetic ensemble steered with the molecular theory of solvation: Accelerating with advanced extrapolation of effective solvation forces. J. Chem. Phys. 2013, 139, 244106.
(32) Omelyan, I.; Kovalenko, A. MTS-MD of Biomolecules Steered with 3D-RISM-KH Mean Solvation Force Accelerated with Generalized Solvation Force Extrapolation. J. Chem. Theory Comput. 2015, 11, 1875−1895.
