Combining Statistical Potentials with Dynamics-Based Entropies

Article pubs.acs.org/JPCB

Combining Statistical Potentials with Dynamics-Based Entropies Improves Selection from Protein Decoys and Docking Poses

Michael T. Zimmermann,†,‡ Sumudu P. Leelananda,‡ Andrzej Kloczkowski,‡,§,∥ and Robert L. Jernigan*,†,‡,§

†Bioinformatics and Computational Biology Interdepartmental Graduate Program, ‡Department of Biochemistry, Biophysics, and Molecular Biology, and §LH Baker Center of Bioinformatics and Biological Statistics, Iowa State University, Ames, Iowa 50011, United States

ABSTRACT: Protein structure prediction and protein−protein docking are important and widely used tools, but methods to confidently evaluate the quality of a predicted structure or binding pose have had limited success. Typically, either knowledge-based or physics-based energy functions are employed to evaluate a set of predicted structures (termed “decoys” in structure prediction and “poses” in docking), with the lowest energy structure being assumed to be the one closest to the native state. While successful for many cases, failures are still common. Thus, improvements to structure evaluation methods are essential for future improvements. In this work, we combine multibody statistical potentials with dynamics models, evaluating fluctuation-based entropies that include contributions from the entire structure. This leads to enhanced selection of native-like structures for CASP9 decoys, refined ClusPro docking poses, as well as large sets of docking poses from the Benchmark 3.0 and Dockground data sets. The data used include both bound and unbound docking, and positive results are found for each type. Not only does this method yield improved average results, but for high quality docking poses, we often pick the best pose.



© 2012 American Chemical Society

INTRODUCTION

It is extremely difficult to describe interactions among residues or atoms in protein structures using first principles. As an alternative, statistical, empirical knowledge-based contact potentials have been developed as an approximate and convenient way to describe these interactions. By utilizing the information from the interactions in a representative set of known protein structures, it is possible to approximate interactions in other proteins. These are the knowledge-based potentials that have been developed by analyzing protein structures in the Protein Data Bank (PDB). Here, we give a short review of statistical contact potentials (another review can be found in ref 1).

The development of such statistical contact potentials originated from the discovery that all information required to specify the three-dimensional structure of a protein is contained in its amino acid sequence.2 This thermodynamic hypothesis was formulated after experimental observations, together with the postulate that, under physiological conditions, the native state of a protein corresponds to its global free energy minimum. This is the basis for the development of potential functions that are used to calculate the effective energy of a protein. Although the hypothesis has strong support, it is now known that there are some exceptions. Protein structures tend to change in response to local forces or local conditions, and the physiologically relevant states may not correspond to the global minimum conformation.

One of the very first derivations of a knowledge-based potential from statistical analysis of a protein structure database was carried out by Tanaka and Scheraga.3 Later, Miyazawa and Jernigan4 and Sippl5 further developed statistical contact potentials. Today, many statistical potentials have been developed and are being widely used. It has now become possible to derive a variety of improved, more specific potentials as the number of experimentally determined three-dimensional protein structures in the PDB has grown substantially. As a result, knowledge-based potential functions have become extremely useful in computational studies such as protein native structure prediction,6−9 analyzing protein−protein interactions,10−13 and designing new proteins.14−17

Types of Statistical Potentials. There are three types of statistical contact potentials: distance-independent, distance-dependent, and geometric. Distance-independent potentials are developed by assuming that interactions between amino acids are relatively short-range. One of the most popular and widely used distance-independent potential functions is the Miyazawa−Jernigan statistical contact potential.18 In distance-dependent potentials, the interactions depend not only upon the type of interaction but also on the distance between the amino acids or atoms. Distances are typically discretized into concentric shells where interactions within each shell are weighted separately from other shells. Many distance-dependent potential functions have been developed.5,19,20
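As a rough illustration of the distance-dependent idea, the sketch below bins a residue-pair distance into concentric shells and converts observed versus expected contact counts into one energy per shell with the inverse Boltzmann relation, E = −kT ln(N_obs/N_exp). The shell boundaries, the pseudocount, the function names, and the toy counts are all assumptions made for illustration; none of them is taken from the cited potentials.

```python
import math

# Hypothetical shell boundaries in angstroms (illustrative only).
SHELL_EDGES = [0.0, 4.0, 6.0, 8.0, 10.0]


def shell_index(distance, edges=SHELL_EDGES):
    """Return the index of the shell containing `distance`, or None beyond the last edge."""
    for k in range(len(edges) - 1):
        if edges[k] <= distance < edges[k + 1]:
            return k
    return None


def inverse_boltzmann(observed, expected, kT=1.0, pseudocount=1.0):
    """One energy per shell: E_k = -kT * ln(N_obs,k / N_exp,k).

    The pseudocount (an assumption) keeps the logarithm finite for empty shells.
    """
    return [-kT * math.log((o + pseudocount) / (e + pseudocount))
            for o, e in zip(observed, expected)]


# Toy counts for a single residue-type pair across the four shells.
observed = [30, 120, 90, 60]
expected = [60, 60, 90, 90]
energies = inverse_boltzmann(observed, expected)
```

Shells where a pair is seen more often than expected receive a negative (favorable) energy, and shells where it is seen less often receive a positive one. A distance-independent contact potential corresponds to the special case of a single shell.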

Special Issue: Harold A. Scheraga Festschrift
Received: December 13, 2011
Revised: April 9, 2012
Published: April 10, 2012

dx.doi.org/10.1021/jp2120143 | J. Phys. Chem. B 2012, 116, 6725−6731

The Journal of Physical Chemistry B


Figure 1. Scheme for selecting the four amino acids for four-body sequential potential generation. (A) The example shown here is the HIV-1 protease structure 1T3R. (B) Zooming in on four sequential residues (Q18 through E21), their geometric center is calculated and shown as a red sphere. Blue residues are the Cα atoms in close proximity to this geometric center (8 Å). (C) The same view as part B but with only alpha carbons shown for the sequential (black) and close nonbonded (blue) residues; the backbone trace is also shown. Each four-body case is a combination of three sequential residues and one close nonbonded residue. One four-body case consists of E21, K20, L19, and I13. This is an example of one nonsequential residue interacting with three sequential residues. A similar construction is carried out for nonsequential interactions. Details of the construction are given in ref 30.
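The construction in Figure 1 can be sketched in a few lines. The code below is one reading of the scheme, not the authors' implementation: for every window of four sequential points it computes the geometric center, collects the other points within 8 Å of that center, and assigns each of them to the tetrahedron (and hence the sequential triplet) whose three center-to-point vectors span the region containing it. The function names, the use of one generic point per residue, and the rule that only the four window members are excluded as neighbors are assumptions. The last helper checks the count quoted in the text, 120 unordered triplets from eight residue classes.

```python
import math
from itertools import combinations, combinations_with_replacement

CUTOFF = 8.0  # angstroms, the cutoff quoted for the four-body potential


def centroid(points):
    """Geometric center of a list of 3D points."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(3))


def det3(a, b, c):
    """Determinant of the 3x3 matrix built from vectors a, b, c."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - a[1] * (b[0] * c[2] - b[2] * c[0])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))


def in_cone(p, a, b, c, eps=1e-9):
    """True if p is a non-negative combination of a, b, c (Cramer's rule)."""
    d = det3(a, b, c)
    if abs(d) < eps:
        return False  # degenerate (coplanar) triple
    coeffs = (det3(p, b, c) / d, det3(a, p, c) / d, det3(a, b, p) / d)
    return all(x >= -eps for x in coeffs)


def four_body_cases(coords, cutoff=CUTOFF):
    """Return (sequential triplet, nonsequential neighbor) index pairs."""
    cases = []
    for i in range(len(coords) - 3):
        quartet = (i, i + 1, i + 2, i + 3)
        center = centroid([coords[j] for j in quartet])
        rel = {j: tuple(x - y for x, y in zip(coords[j], center)) for j in quartet}
        for j, point in enumerate(coords):
            if j in quartet or math.dist(point, center) > cutoff:
                continue
            p = tuple(x - y for x, y in zip(point, center))
            for triplet in combinations(quartet, 3):
                if in_cone(p, *(rel[k] for k in triplet)):
                    cases.append((triplet, j))
                    break  # assign each neighbor to a single tetrahedron
    return cases


def count_unordered_triplets(n_classes=8):
    """Number of triplets when the order of the three residue classes is ignored."""
    return len(list(combinations_with_replacement(range(n_classes), 3)))
```

For example, with four points at alternating corners of a cube and a fifth point beyond the face opposite the first one, the fifth point is paired with the triplet formed by the other three sequential points.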

Knowledge-based potential functions can further be divided into two types: atomic-level potentials19−21 and coarse-grained potentials.3,4,18 Atomic, physics-based potentials have been found to be promising for protein folding studies. However, they are computationally costly when applied to large proteins because of the large number of atoms involved and the even larger number of potential interactions. On the other hand, coarse-grained potentials can substantially reduce the computational cost of modeling native protein structures. Although coarse-grained potentials are considered insufficiently rigorous to reflect the entire landscape of a potential energy surface,22 we have shown that they exhibit a high level of performance, rivaling even that of the best atomic potentials.23

Multibody Potentials. Betancourt and Thirumalai reported that pairwise potentials are not sufficient for threading applications of proteins.24 Two-body potentials can represent only lower-order packing arrangements; they cannot recognize all native folds within large data sets of decoy structures, represent three-dimensional interactions, or properly represent three-dimensional cooperativity.25 Similarly, Scheraga’s research group26,27 found two-body potentials for methane in water to be insufficient for capturing the cooperativity and showed that a three-body term was required. They have subsequently explored up to six-body potentials and learned that a four-body term is sufficient to capture the cooperativity present in proteins. Many-body potentials have been of interest recently because they can better account for the three-dimensional structures of proteins and model the cooperative nature of protein interactions. There are several multibody potentials found in the literature,28,29 and they have shown improved results relative to two-body potentials. Munson and Singh29 found small gains in threading by using three-body potentials. Krishnamoorthy and Tropsha28 showed that the four-body potentials obtained by using Delaunay tessellation algorithms can discern correct sequences or structures and provide better z-scores in energy, in comparison with two-body statistical potentials. However, these four-body contact potentials derived by Delaunay tessellation, like most two-body potentials,4,7 neglect the information on the location of residues (distance dependencies).

Four-Body Contact Potentials. Four-body potentials developed by our group take into account the interactions between the backbone, side chains, and the extent of solvent exposure, thus providing a more detailed and cooperative representation of protein interaction energies than do pairwise potentials.30 We have found that these four-body contact potentials can discriminate well between native structures and partially unfolded or deliberately misfolded structures. We have also derived an optimized potential by combining these long-range four-body interactions with short-range interaction energies.23 This potential uses weights for the various types of terms that are optimized for the best overall performance.

The geometric construction of the four bodies is explained with the aid of Figure 1. For each set of four consecutive amino acids i, i+1, i+2, and i+3 (colored black) along the sequence, the geometrical center (red) of their four side chain centers (Cα for Gly) is calculated. Residues in close proximity to this geometric center are selected (blue). Six planes are defined by the combinations of three points taken from all combinations of black pairs and the red center point, and these planes subdivide the space surrounding the red point into four tetrahedra. Each tetrahedron has a common vertex, which is the geometrical center of the four side chain centers. Each of the four contacting bodies for the four-body potentials is obtained as follows. One triplet of amino acids from a tetrahedron is taken along with another amino acid which is not along the sequence but within a cutoff distance from the quartet’s geometrical center. This amino acid is considered to be in contact with the triplet if it is within a cutoff distance of 8 Å. This cutoff distance is selected because it yielded the best threading results. An example of a set of four bodies is shown by the four black sequential residues with the nonsequential blue residues. Among these, the three black residues form a sequence triplet, whose residue types are reduced to eight classes of amino acid types in order to ensure sufficient data for potential extraction. The single blue point within the quadruplet is not close in sequence. Here, the four bodies always have three sequential points and one nonsequential point in the quartet of interacting residues. There is no consideration of the specific sequence order of the three residues within each sequential backbone triplet in accumulating the information to construct the potential. As a result, we have only 120 different triplets instead of 512 (8³). In collecting data, we have included all specific types of residues (20 types) for the fourth nonsequential point, within a distance of 8 Å from the geometric center and assigned to one of the corresponding four tetrahedra defined by the vectors originating from the red center to the black points. This


residue is then counted in the specific tetrahedron, and the procedure is repeated for all quartets defining closely interacting residues over the entire set of proteins. We thus derive four-body conformational sets comprising the three sequential residue triplets and a single nonsequential nearby residue. Three levels of solvation were considered: exposed (surface of the protein), intermediate, or buried (inside the protein core), classified by relative solvent accessible surface area (RSA). Better results were obtained in discriminating native structures from a large number of decoys by using these four-body potentials categorized by RSA. A four-body contact potential energy was calculated according to the inverse Boltzmann principle.30 Four-body nonsequential potentials have also been developed by taking nonsequential triplets.31 As the four-body potential was parametrized using NACCESS RSA values,32 this program has also been used in the present work. One limitation of using NACCESS is that it requires an atomic model of the structure. The rest of the potential uses a coarse-grained representation. In the future, we plan to reparameterize the four-body potential so that an atomic model is not needed and a one-point-per-residue coarse-grained model can be used throughout.

Optimized Four-Body Potentials. Our group has combined the four-body sequential,30 the four-body nonsequential,31 and the short-range potentials33 with weights as follows:

$$V_{\text{optimized}} = w_{\text{4-body-sequential}} \cdot V_{\text{4-body-sequential}} + w_{\text{4-body-non-sequential}} \cdot V_{\text{4-body-non-sequential}} + w_{\text{short-range}} \cdot V_{\text{short-range}}$$

Here, w and V respectively represent the weight and the potential for the cases indicated by the respective subscripts. Optimization of the weight for each term was performed using the particle swarm optimization (PSO) technique34 to find an optimized potential. The weight of the first four-body sequential term is arbitrarily set to 1.0 (w4‑body‑sequential = 1), and the weight coefficients for the other two terms were varied by using PSO. The optimized weights obtained for the four-body nonsequential and short-range potentials were 0.28 and 0.22, respectively, for the homology modeling targets of CASP8. For template-free modeling targets in CASP8, the corresponding weights were different at 1.01 and 0.56. Details of the optimization can be found in ref 23. These potentials have been tested along with other two-body, four-body, and atomistic potentials using data from CASP8 and the Decoys ‘R’ Us database, and it was observed that after optimization the resulting potentials perform better than either of the four-body potentials do individually, better than all other coarse-grained potentials, and almost at the same level of performance as atomistic potentials.23

Evaluation of Structural Entropy. Entropy is a measure of how many microstates a system can sample. Proteins are not static structures but rather dynamic flexible entities that are constantly sampling different states or conformations. When considering a folded protein, this sampling is often referred to as the native state ensemble. A common model employed to describe the interplay between energy, entropy, and the structure’s conformation is an energy landscape funnel.35 This model describes high energy states as having high entropy (the funnel is wide at the top), but as the energy decreases, the number of accessible states decreases, and the funnel narrows. Thus, folded states including the native state correspond to energy minima and exhibit fewer accessible states. The motions or fluctuations of a structure inform us about the entropy of the structure. While the motions of proteins are complex, there are some simple geometric properties that have fundamental effects on the dynamics, including shape36 and packing density.37,38 Further, Shannon entropy follows packing density.39 Various forms of conformational entropy have been mathematically described, including those by Schlitter40 and by Andricioaei and Karplus41 that rely on the covariance between atom positions from a sampling of the structure’s motions, such as from molecular dynamics (MD). Several studies have shown agreement between the essential dynamics42 calculated from MD trajectories and the motions computed from normal mode analysis (NMA) of a representative structure.43,44 We employ an NMA procedure called the elastic network model (ENM)45,46 to compute motions of protein structures, and these motions are used to approximate the entropy of a conformation. Our previous work47 used vibrational entropies that were based on the frequency of the normal modes, but only small gains were seen. In this work, we take a different approach that leads to significant gains by utilizing the mean square fluctuations computed from the ENM as a direct measure of entropy:

$$\Delta S \propto \Gamma^{-1} = \sum_{i=2}^{N} \frac{1}{\lambda_i} \left( Q_i Q_i^{T} \right)$$

where Q_i is the ith normal mode, λ_i the corresponding square frequency, Γ the system’s Hessian, and Γ−1 its pseudoinverse. The equation is for an ENM using the Gaussian assumption (GNM) for nondirectional motions so that there is only one rigid body mode and N − 1 internal modes of motion, where N is the number of atoms. While ΔG = ΔE − TΔS, we will simply state ΔG = ΔE − ΔS to signify that we are combining the four-body potential with the ENM entropy. We have weighted ΔS to have the same average contribution (per decoy or pose) as ΔE. Many other weights were considered, but none of them improved performance relative to ΔE. Enhanced weighting schemes that perhaps account for secondary structure, exposure, or residue properties will be explored in the future.

Selecting Native-like Poses from CASP Structure Predictions. We evaluate the performance of our method using CASP9 decoys downloaded from the Prediction Center (http://predictioncenter.org).48 Each decoy for 111 CASP9 targets (54−887 decoys per target) is evaluated and ranked. Fewer than the full 129 targets were used because some targets do not have an available native structure for evaluation (the structure has not been determined experimentally), there were too many unresolved (by X-ray crystallography) residues in the native structure, or there were fewer than 50 decoys. Our results are shown in Figure 2. The average rmsd to the native structure is 13.4 Å for the best ΔE and 12.2 Å with ΔS included; this is a small improvement despite the poor performance of ΔS alone. One possible reason for the small gains here (relative to the larger gains in docking pose selection described below) is the observation that predicted structures tend to underestimate the packing density, which is one of the primary properties that ENMs model. With sufficient improvement in decoy selection, we may be able to use structure sampling procedures (ENM, Monte Carlo, molecular dynamics,


Figure 3. Selection of ClusPro poses using unconstrained docking is capable of picking the best available pose. For 6 out of 15 targets, we are able to select the best pose using our combined free energy approach. The best pose is the minimal L-rmsd pose. We also show selections based on choosing the largest cluster size, which is a commonly used basis for selection. We again normalize the L-rmsd so that all poses can reasonably fit on the plot.
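L-rmsd, the quality measure used in this figure, can be sketched as follows. The example makes a simplifying assumption: every pose is already expressed in the frame of the native receptor, so the receptor superposition step described in the text is taken as done and only the ligand coordinates are compared. The function names and the toy coordinates are hypothetical.

```python
import math


def ligand_rmsd(pose_ligand, native_ligand):
    """L-rmsd between matched ligand points of a pose and the native structure.

    Assumes the receptor of the pose has already been superimposed on the
    native receptor, so both ligands share one coordinate frame.
    """
    if len(pose_ligand) != len(native_ligand):
        raise ValueError("pose and native ligands must have the same number of points")
    total = sum(math.dist(p, q) ** 2 for p, q in zip(pose_ligand, native_ligand))
    return math.sqrt(total / len(native_ligand))


def best_pose(poses, native_ligand):
    """Index and L-rmsd of the pose with the minimal L-rmsd (the 'best pose')."""
    rmsds = [ligand_rmsd(pose, native_ligand) for pose in poses]
    index = min(range(len(rmsds)), key=rmsds.__getitem__)
    return index, rmsds[index]


# Toy example: a two-point ligand and two candidate poses.
native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
poses = [
    [(3.0, 0.0, 0.0), (4.0, 0.0, 0.0)],  # shifted by 3 along x
    [(0.0, 1.0, 0.0), (1.0, 1.0, 0.0)],  # shifted by 1 along y
]
index, value = best_pose(poses, native)
```

When poses are not delivered in the receptor frame, a least-squares superposition of the receptor (for example with the Kabsch algorithm) would have to be applied to the whole pose before calling `ligand_rmsd`.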

Figure 2. Combining four-body energy and ENM entropy for selecting native-like poses among CASP9 targets. Each decoy within 111 CASP9 targets is evaluated and ranked. Fewer than the full 129 targets are used because some targets do not have a native structure available for evaluation, or there are too many unresolved (by X-ray crystallography) residues in the native structure. We show the square root of rmsd for the best prediction (the most native-like decoy in the data set) and the lowest energy structures evaluated using four-body energy (ΔE), ENM-entropy (ΔS), and the combination. The second square root is applied to normalize the plot (so all the data is visible). The average rmsd to the native structure is 23.4 for ΔS alone, 13.4 Å for ΔE, and 12.2 Å for ΔG, an improvement despite the poor performance by ΔS alone. This indicates that the motions of decoy structures carry information that can be used to improve structure predictions.
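The combination of energy and entropy evaluated in this figure can be sketched under several stated assumptions. The code builds a GNM Kirchhoff matrix from one point per residue, obtains its pseudoinverse through the identity Γ⁺ = (Γ + J/N)⁻¹ − J/N (valid for a connected network, with J the all-ones matrix), and takes the sum of the mean-square fluctuations on the diagonal as the scalar entropy. The 7 Å contact cutoff, the use of the trace as the scalar, and the rescaling of ΔS by the ratio of mean absolute values are assumptions; the text only states that ΔS is weighted to have the same average contribution as ΔE. A numerical library would normally replace the hand-written inversion.

```python
import math


def kirchhoff(coords, cutoff=7.0):
    """GNM connectivity (Kirchhoff) matrix; the cutoff value is an assumption."""
    n = len(coords)
    gamma = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                gamma[i][j] = gamma[j][i] = -1.0
                gamma[i][i] += 1.0
                gamma[j][j] += 1.0
    return gamma


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting."""
    n = len(matrix)
    aug = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: the contact network may be disconnected")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def gnm_entropy(coords, cutoff=7.0):
    """Sum of GNM mean-square fluctuations, i.e. the trace of the pseudoinverse."""
    n = len(coords)
    gamma = kirchhoff(coords, cutoff)
    shifted = [[gamma[i][j] + 1.0 / n for j in range(n)] for i in range(n)]
    inverse = invert(shifted)
    return sum(inverse[i][i] - 1.0 / n for i in range(n))


def combined_free_energy(energies, entropies):
    """dG = dE - dS, with dS rescaled to the same mean magnitude as dE."""
    mean_e = sum(abs(e) for e in energies) / len(energies)
    mean_s = sum(abs(s) for s in entropies) / len(entropies)
    scale = mean_e / mean_s
    return [e - scale * s for e, s in zip(energies, entropies)]


def select_lowest(scores):
    """Index of the decoy or pose with the lowest score."""
    return min(range(len(scores)), key=scores.__getitem__)
```

With energies of −10, −12, and −11 and entropies of 2, 1, and 3, the energy alone selects the second structure, whereas the combined score selects the third, whose larger fluctuation entropy outweighs its slightly higher energy. This is the kind of re-ranking the figure reports.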

etc.) to refine a predicted structure toward the native state based on these free energies.

Selecting Native-like Poses in Protein−Protein Docking. Protein−protein docking is a field where better potentials have a high likelihood of improving results. With the large growth in genomic sequences, computational methods for determining the details of binding events become increasingly important, since all interacting pairs of structures would be impossible to determine. We apply the present free energy evaluation to sets of docked poses and find that the selection power is significantly increased compared to use of potentials alone. For reporting the quality of a binding pose, we use ligand rmsd (L-rmsd) and CAPRI classification.49 To calculate the L-rmsd of a pose, we first retrieve the experimentally determined native structure from the PDB. The structure is partitioned into a ligand and a receptor; the receptor is typically the larger of the two proteins. The receptor from the computed pose is superimposed onto the receptor in the native structure. The rmsd between the ligand in the pose and the ligand in the native structure is then reported as the L-rmsd. CAPRI classifies predictions as high, medium, acceptable, or incorrect based on the fraction of native contacts, the rmsd of interface atoms, and L-rmsd.49

Many docking procedures have been developed, and each has its own advantages and limitations. ClusPro50 is widely used and is employed in this work. The ClusPro algorithm postprocesses poses generated by ZDock51 or PIPER52 using energy evaluation and spatial clustering. Figure 3 shows our results for 15 arbitrarily chosen structures (they have not been chosen to achieve a special level of performance) from the Benchmark 3.0 data set.53 The structures used are 1ACB, 1AK4, 1ATN, 1AVX, 1BVN, 1CG1, 1D6R, 1DFJ, 1E6E, 1E96, 1EAW, 1EFN, 1EWY, 1F34, and 1FQ1. In this case, we have performed blind docking. That is, we did not give ClusPro any information about which residues should be in the interface. For 6 of the 15 targets, we pick the most native-like pose. Four-body potentials alone only outperform the free energy method in three cases and fail to identify the most native-like pose. Figure 4 shows the results for the same 15 protein pairs, but

Figure 4. Selection of ClusPro poses using constrained docking picks the best available pose for 9 out of 15 targets based on our new free energy approach. The key for the symbols is the same as for Figure 3.

now ClusPro was given a pair of residues involved in the binding site. Thus, the poses present are more focused around the correct binding orientation. For these more restricted cases, we choose the best pose 9 out of 15 times and are outperformed by energy alone only once. We show significantly better performance with the present method in comparison with structures selected by the largest cluster, which is a usual selection method and one of the pieces of information provided by ClusPro. For unconstrained docking, choosing the largest cluster yields better poses than ours five times but only picks the best available pose twice. For


Figure 5. Improved performance across Benchmark 3.0 and Dockground protein docking pairs. From Benchmark 3.0, bound poses were considered, and from Dockground, unbound poses. (A) L-rmsd of the lowest energy pose from 176 Benchmark 3.0 targets shows a strong improvement for the new method. (B) The number of poses that have a lower L-rmsd than the lowest energy pose for Benchmark 3.0. (C) L-rmsd for Dockground targets shows the discriminating power of ΔS. The combined method, ΔG, shows improved discrimination relative to energy alone but poorer performance compared to ΔS, perhaps suggesting the possibility of future improvements from varying the relative contribution from ΔS and ΔE. (D) The number of poses having lower L-rmsd than the lowest energy pose. For ΔS, we pick a medium or high quality structure, according to CAPRI criteria, for 31/61 targets, a significant improvement over ΔE alone which identifies medium or high quality targets only 6/61 times.
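The count plotted in panels B and D, the number of poses with a lower L-rmsd than the pose a scoring function ranks first, is straightforward to compute. The sketch below uses hypothetical names and toy values; a result of 0 means the scoring function picked the best available pose.

```python
def poses_better_than_selected(scores, lrmsds):
    """Count poses whose L-rmsd is lower than that of the lowest-scoring pose."""
    selected = min(range(len(scores)), key=scores.__getitem__)
    return sum(1 for value in lrmsds if value < lrmsds[selected])


# Toy example: three poses with their scores and L-rmsd values (angstroms).
scores = [-5.0, -9.0, -7.0]
lrmsds = [2.0, 6.0, 12.0]
missed = poses_better_than_selected(scores, lrmsds)
```

Here the lowest score belongs to the second pose (L-rmsd 6.0 Å), and one pose in the set is closer to the native structure, so the count is 1.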

constrained docking, using the largest cluster size gives a better pose five times but only picks the best pose once. The ClusPro algorithm generates many poses, clusters them, and returns the centroid of each to the user. As a more rigorous test of our method, we next evaluate 2000 ZDock51 poses for all 176 targets from the Benchmark 3.0 data set53 using the bound conformations. The results are summarized in Figure 5. For this data set, ΔE and ΔS both have similar distributions of scores. That is, they perform similarly, on average. However, the distribution from combining ΔE and ΔS is significantly shifted toward more native-like values. Performing a similar test on the Dockground data set,54 the combined approach exhibits smaller gains compared to energy potentials. Dockground contains exclusively poses generated from independently solved crystal structures. That is, all of these are unbound docking poses. However, the fluctuation-based entropies alone are very successful at identifying the native-like structures. Using the CAPRI criteria, we identify medium or high quality poses for 31 out of 61 targets, whereas the potentials alone only do so for 6 targets.

DISCUSSION

L-rmsd captures differences in the overall position of the ligand as well as its orientation but is insensitive to some details at the binding interface. Particularly for promiscuous or disordered proteins, many binding sites may exist and the crystal form is only one of them. Using the bound conformations as in this study represents a best-case scenario for rigid docking algorithms. Flexible docking is important but has not been explored here. Further, the optimized potentials may not be ideally parametrized for use with the entropic effects, and a new optimization could be performed to obtain a more self-consistent set of weights in the future.

The present method represents a coarse-grained approach where only single geometric points have been used for each amino acid. Certainly, this neglects some specific atomic considerations such as more specific atom pairs interacting that would likely add some additional stabilizing energy. In addition, atomic entropy considerations are absent, but the extent of such an effect is not fully clear, since that will depend significantly on the atomic packing densities at specific locations within a structure. Further, side-chain entropy is neglected by the coarse-grained procedure but is known to be important. These additional considerations are difficult to evaluate, but nonetheless, the present approach could be augmented with atomic interaction energies for the energy minimized structure in a hierarchical way based on the coarse-grained structure. Also, presumably, the atomic entropies could be evaluated on the basis of such an atomic structure. This would provide a hierarchical scheme to evaluate free energies: first for the coarse-grained models, followed by constructing energy minimized atomic structures and evaluating their atomic energies and atomic entropies.

One of the challenges in scoring protein structures is the development of reliable scoring functions that have a good correlation between proximity to the native structure and score. Poses where the L-rmsd is high are often regarded as random configurations of the receptor and ligand and are not well correlated with energy; poses far from the native position may have favorable energy. The introduction of entropic effects shows limited improvement at identifying better structure predictions but across three data sets can improve docking pose selection. Benchmark 3.0 and Dockground data sets represent “needle in a haystack” problems. The ZDock poses generated for Benchmark 3.0 contain between 0 and 4 acceptable or better (by CAPRI protocol) decoys out of 2000, while Dockground contains 2−4 out of about 100. When considering the very strict evaluation method of only considering the best pose for each target, entropy evaluation successfully identifies a good prediction for half of the Dockground targets.


(10) Vajda, S.; Kozakov, D. Curr. Opin. Struct. Biol. 2009, 19, 164−170.
(11) de Azevedo, W. F., Jr.; Dias, R. Curr. Drug Targets 2008, 9, 1031−1039.
(12) Vakser, I. A.; Kundrotas, P. Curr. Pharm. Biotechnol. 2008, 9, 57−66.
(13) Ritchie, D. W. Curr. Protein Pept. Sci. 2008, 9, 1−15.
(14) Bellows, M. L.; Floudas, C. A. Curr. Drug Targets 2010, 11, 264−278.
(15) Mandell, D. J.; Kortemme, T. Curr. Opin. Biotechnol. 2009, 20, 420−428.
(16) Mandell, D. J.; Kortemme, T. Nat. Chem. Biol. 2009, 5, 797−807.
(17) Gerlt, J. A.; Babbitt, P. C. Curr. Opin. Chem. Biol. 2009, 13, 10−18.
(18) Miyazawa, S.; Jernigan, R. L. J. Mol. Biol. 1996, 256, 623−644.
(19) Samudrala, R.; Moult, J. J. Mol. Biol. 1998, 275, 895−916.
(20) Lu, H.; Skolnick, J. Proteins 2001, 44, 223−232.
(21) Zhou, H.; Zhou, Y. Protein Sci. 2002, 11, 2714−2726.
(22) Skolnick, J. Curr. Opin. Struct. Biol. 2006, 16, 166−171.
(23) Gniewek, P.; Leelananda, S. P.; Kolinski, A.; Jernigan, R. L.; Kloczkowski, A. Proteins 2011, 79, 1923−1929.
(24) Betancourt, M.; Thirumalai, D. Protein Sci. 1999, 8, 361−369.
(25) Vendruscolo, M.; Najmanovich, R.; Domany, E. Proteins 2000, 38, 134−148.
(26) Czaplewski, C.; Rodziewicz-Motowidlo, S.; Liwo, A.; Ripoll, D. R.; Wawak, R. J.; Scheraga, H. A. Protein Sci. 2000, 9, 1235−1245.
(27) Czaplewski, C.; Rodziewicz-Motowidló, S.; Dabal, M.; Liwo, A.; Ripoll, D. R.; Scheraga, H. A. Biophys. Chem. 2003, 105, 339−359.
(28) Krishnamoorthy, B.; Tropsha, A. Bioinformatics 2003, 19, 1540−1548.
(29) Munson, P.; Singh, R. K. Protein Sci. 1997, 6, 1467−1481.
(30) Feng, Y.; Kloczkowski, A.; Jernigan, R. L. Proteins 2007, 68, 57−66.
(31) Feng, Y.; Kloczkowski, A.; Jernigan, R. L. BMC Bioinf. 2010, 11, 92.
(32) Hubbard, S. J.; Campbell, S. F.; Thornton, J. M. J. Mol. Biol. 1991, 220, 507−530.
(33) Bahar, I.; Kaplan, M.; Jernigan, R. L. Proteins 1997, 29, 292−308.
(34) Kennedy, J.; Eberhart, R. C. Proceedings of IEEE International Conference on Neural Networks 1995, 1942−1948.
(35) Cheung, M. S.; Chavez, L. L.; Onuchic, J. N. Polymer 2004, 45, 547−555.
(36) Lu, M.; Ma, J. Biophys. J. 2005, 89, 2395−2401.
(37) Pabuwal, V.; Li, Z. J. Protein Eng. 2009, 22, 67−73.
(38) Jernigan, R. L.; Kloczkowski, A. Methods Mol. Biol. 2007, 350, 251−276.
(39) Liao, H.; Yeh, W.; Chiang, D.; Jernigan, R. L.; Lustig, B. Protein Eng. 2005, 18, 59−64.
(40) Schlitter, J. Chem. Phys. Lett. 1993, 215, 617−621.
(41) Andricioaei, I.; Karplus, M. J. Chem. Phys. 2001, 115, 6289−6292.
(42) Hayward, S.; de Groot, B. L. Molecular Modeling of Proteins; Kukol, A., Ed.; Humana Press: Totowa, NJ, 2008; pp 89−106.
(43) Bakan, A.; Bahar, I. Pac. Symp. Biocomput. 2011, 181−192.
(44) Yang, L.; Song, G.; Carriquiry, A.; Jernigan, R. L. Structure 2008, 16, 321−330.
(45) Zimmermann, M. T.; Kloczkowski, A.; Jernigan, R. L. BMC Bioinf. 2011, 12, 264.
(46) Bahar, I.; Atilgan, A. R.; Erman, B. Folding Des. 1997, 2, 173−181.
(47) Zimmermann, M. T.; Leelananda, S. P.; Feng, Y.; Gniewek, P.; Jernigan, R. L.; Kloczkowski, A. J. Struct. Funct. Genomics 2011, 12, 137−147.
(48) MacCallum, J. L.; Perez, A.; Schnieders, M. J.; Hua, L.; Jacobson, M. P.; Dill, K. A. Proteins 2011, 79 (Suppl. 10), 74−90.
(49) Mendez, R.; Leplae, R.; Lensink, M. F.; Wodak, S. J. Proteins 2005, 60, 150−169.

While docking of known binding pairs (bound docking) has been shown to be explained to a significant extent by shape complementarities,55,56 this approach has not proven as successful for unbound docking. In addition, typical molecular computations also include the use of some type of potential function for evaluating various poses, which is a rather different evaluation basis than shape complementarity. The more difficult and relevant test for scoring is to evaluate unbound docking cases. For unbound docking, how to incorporate protein flexibility (flexible docking) is a current and advancing area of research. We point out that one of the main advantages of the present approach is its use of coarse-graining (both the potential function used here and the entropies from the ENM are coarse-grained), so that any small reconfiguration upon binding, such as side chain reconfiguration, will not much affect this evaluation. Our robust method leads to significant gains for the unbound docking cases extracted from Dockground where the precise detailed structures of the bound partners are unknown.



CONCLUSIONS By combining multibody statistical contact potentials with entropies derived from structure dynamics, we present an algorithm for classifying predictions from protein structure prediction and protein−protein docking which performs significantly better than other methods. The novel feature of our method is the inclusion of entropies based on the collective dynamics of the structure. As the statistical potential and dynamics models used are coarse-grained at one geometric point per residue, the evaluation of large sets of structures is possible. The gains demonstrated are particularly large for protein binding using either bound or unbound conformations. Since the ENMs depend substantially upon the overall shapes of the individual structures, they provide an important measure of the entropy changes upon binding. These entropy changes depend on the shape of the entire protein complex, and not just the interface.



AUTHOR INFORMATION

Corresponding Author

*Phone: 515-294-7278. E-mail: [email protected].

Present Address

∥Battelle Center for Mathematical Medicine, Department of Pediatrics, Nationwide Children’s Hospital, The Ohio State University, Columbus, OH 43205.

Notes

The authors declare no competing financial interest.



REFERENCES

(1) Leelananda, S. P.; Feng, Y.; Gniewek, P.; Kloczkowski, A.; Jernigan, R. L. In Multiscale approaches to protein modeling; Kolinski, A., Ed.; Springer: New York, 2011; pp 127−157.
(2) Anfinsen, C. B. Science 1973, 181, 223−230.
(3) Tanaka, S.; Scheraga, H. A. Macromolecules 1976, 9, 945−950.
(4) Miyazawa, S.; Jernigan, R. L. Macromolecules 1985, 18, 534−552.
(5) Sippl, M. J. Mol. Biol. 1990, 213, 859−883.
(6) Kihara, D.; Chen, H.; Yang, Y. D. Curr. Protein Pept. Sci. 2009, 10, 216−228.
(7) Skolnick, J.; Jaroszewski, L.; Kolinski, A.; Godzik. Protein Sci. 1997, 6, 676−688.
(8) Skolnick, J.; Brylinski, M. Briefings Bioinf. 2009, 10, 378−391.
(9) Kryshtafovych, A.; Fidelis, K. Drug Discovery Today 2009, 14, 386−393.


(50) Comeau, S. R.; Gatchell, D. W.; Vajda, S.; Camacho, C. J. Bioinformatics 2004, 20, 45−50.
(51) Chen, R.; Li, L.; Weng, Z. Proteins 2003, 52, 80−87.
(52) Kozakov, D.; Brenke, R.; Comeau, S. R.; Vajda, S. Proteins 2006, 65, 392−406.
(53) Hwang, H.; Pierce, B.; Mintseris, J.; Janin, J.; Weng, Z. Proteins 2008, 73, 705−709.
(54) Liu, S.; Gao, Y.; Vakser, I. A. Bioinformatics 2008, 24, 2634−2635.
(55) Norel, R.; Petrey, D.; Wolfson, H. J.; Nussinov, R. Proteins 1999, 36, 307−317.
(56) Chen, R.; Weng, Z. Proteins 2003, 51, 397−408.
