Biochemistry 2005, 44 (6), 2059-2071
Virtual Screening against Highly Charged Active Sites: Identifying Substrates of Alpha-Beta Barrel Enzymes†
Chakrapani Kalyanaraman, Katarzyna Bernacki, and Matthew P. Jacobson*
Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, California 94143-2240
Received August 31, 2004; Revised Manuscript Received November 26, 2004
ABSTRACT: We have developed a virtual ligand screening method designed to help assign enzymatic function for alpha-beta barrel proteins. We dock a library of ∼19,000 known metabolites against the active site and attempt to identify the relevant substrate based on predicted relative binding free energies. These energies are computed using a physics-based energy function based on an all-atom force field (OPLS-AA) and a generalized Born implicit solvent model. We evaluate the ability of this method to identify the known substrates of several members of the enolase superfamily of enzymes, including both holo and apo structures (11 total). The active sites of these enzymes contain numerous charged groups (lysines, carboxylates, histidines, and one or more metal ions) and thus provide a challenge for most docking scoring functions, which treat electrostatics and solvation in a highly approximate manner. Using the physics-based scoring procedure, the known substrate is ranked within the top 6% of the database in all cases, and in 8 of 11 cases, it is ranked within the top 1%. Moreover, the top-ranked ligands are strongly enriched in compounds with high chemical similarity to the substrate (e.g., different substitution patterns on a similar scaffold). These results suggest that our method can be used, in conjunction with other information including genomic context and known metabolic pathways, to suggest possible substrates or classes of substrates for experimental testing. More broadly, the physics-based scoring method performs well on highly charged binding sites and is likely to be useful in inhibitor docking against polar binding sites as well. The method is fast (

Table 2: Ranks (%) of the Known Substrates against Each Enzyme Structure after Docking and after Rescoring^a

Each column gives the rank of the known substrate of the enzyme named in the column header: MR, S-mandelate; GlucD, D-glucarate; OSBS, 2-succinyl-2-hydroxy-2,4-cyclohexadiene-1-carboxylate; enolase, 2-phosphoglycerate.

(a) Rank after Docking
                   known substrate of
enzyme structure   MR      GlucD   MLE     MAL     OSBS    enolase
MR (holo)          6.5     12.8    22.6    [?]     10.7    21.9
MR (apo)           12.1    22.9    >25     >25     12.6    19.8
GlucD (holo)       >25     0.03    >25     3.9     >25     8.6
MLE-I (apo)        3.8     0.69    19.4    23.7    16.0    23.9
MAL (holo)         4.9     0.22    10.6    9.1     5.1     22.6
MAL (apo)          7.1     11.6    >25     5.9     >25     >25
OSBS (holo)        12.9    11.5    13.3    25.7    6.1     12.3
OSBS (apo)         >25     >25     >25     >25     9.8     >25
enolase (holo)     3.7     5.7     >25     1.9     >25     0.16

(b) Rank after Rescoring
                   known substrate of
enzyme structure   MR      GlucD   MLE     MAL     OSBS    enolase
MR (holo)          0.41    4.5     21.1    9.3     18.8    6.5
MR (apo)           0.65    9.8     >25     >25     12.0    7.6
GlucD (holo)       >25     0.02    >25     5.0     >25     0.93
MLE-I (apo)        3.2     3.3     5.4     4.1     23.1    6.8
MAL (holo)         6.2     11.0    22.2    1.0     23.8    2.7
MAL (apo)          0.78    4.1     >25     0.05    >25     >25
OSBS (holo)        2.1     6.5     18.7    7.0     0.21    11.7
OSBS (apo)         >25     >25     >25     >25     5.2     >25
enolase (holo)     21.2    1.4     >25     1.1     >25     0.04

^a The entry “>25” signifies that the ligand ranked lower than the top 25% after the docking phase and thus was not subjected to rescoring. The Ala-Glu epimerase (AEE) is not included in these results because the amino acid dipeptides are not part of the standard KEGG LIGANDS library.
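The “>25” convention follows from the two-stage protocol: only ligands in the top 25% after docking are rescored, and the rest keep no rescored rank. A minimal Python sketch of that bookkeeping is given here. The function names, the toy scores, and the choice to express the rescored rank as a percentage of the full library (rather than of the rescored subset) are assumptions made for illustration, not details confirmed by the text.

```python
from typing import Callable, Dict, Optional


def percent_ranks(scores: Dict[str, float], library_size: int) -> Dict[str, float]:
    """Rank ligands by score (lower is better) as a percentage of the library."""
    ordered = sorted(scores, key=lambda name: scores[name])
    return {name: 100.0 * (i + 1) / library_size for i, name in enumerate(ordered)}


def two_stage_ranks(
    dock_scores: Dict[str, float],
    rescore: Callable[[str], float],
    keep_percent: float = 25.0,
) -> Dict[str, Optional[float]]:
    """Dock everything, rescore only the top `keep_percent`, and re-rank those.

    Ligands that miss the docking cutoff get None (printed as ">25").
    Assumption: the final rank is a percentage of the full library size.
    """
    n = len(dock_scores)
    dock_rank = percent_ranks(dock_scores, n)
    survivors = [name for name, rank in dock_rank.items() if rank <= keep_percent]
    rescored = {name: rescore(name) for name in survivors}
    final = percent_ranks(rescored, n)
    return {name: final.get(name) for name in dock_scores}


def format_rank(rank: Optional[float], keep_percent: float = 25.0) -> str:
    """Render a rank the way the table does, using ">25" for unrescored ligands."""
    return f">{keep_percent:g}" if rank is None else f"{rank:g}"


if __name__ == "__main__":
    # Toy library of 8 hypothetical ligands; lower docking score is better.
    dock = {f"lig{i}": float(i) for i in range(8)}
    # Hypothetical rescoring energies for the two ligands that survive docking.
    energies = {"lig0": -10.0, "lig1": -30.0}
    ranks = two_stage_ranks(dock, lambda name: energies[name])
    for name in dock:
        print(name, format_rank(ranks[name]))
```

In this toy run the rescoring step reorders the two surviving ligands, which mirrors how a substrate ranked poorly by docking can move up after rescoring, provided it survived the docking cutoff in the first place.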
enriched in the top few percent of the ranked ligand list, especially after rescoring, as shown in Figure 3. However, the method can also distinguish the correct ligand from others containing this substructure. In Table 2, we have gathered the ranks for each known substrate obtained after docking and rescoring using each of the enzyme structures. The columns of the table make it possible to assess whether a given ligand scores better against the enzyme for which it is the known substrate than against the other enzymes. The rows make it possible to assess whether the known substrate for a particular enzyme outranks the known substrates for other enzymes. By both criteria, the results after rescoring show strong evidence of capturing selectivity; that is, the right ligand ranks highest for the right enzyme. The only problematic case is the apo MLE enzyme. Although the known substrate scores better against MLE than against any of the other enzymes, the D-glucarate and S-mandelate ligands outscore it. The results before rescoring (i.e., using the docking scoring function) show very little ability to capture selectivity, with GlucD and enolase being the only exceptions.

CONCLUSION

We have developed a physics-based method for rescoring protein-ligand complexes generated by a docking program, and we have applied it to virtual metabolite screening against a diverse set of alpha-beta barrel enzymes in the enolase superfamily, which have highly charged binding sites. We conclude by briefly commenting on the strengths of, and possible improvements to, our approach.

In general, the rescoring method appears to be highly robust, improving the rank of the known substrates significantly in a large majority of cases; the only exceptions are cases where the docking program performs excellently to begin with. We attribute this success to the treatment of electrostatics and solvation in our energy function, which consists of the OPLS-AA force field and a generalized Born implicit solvent model. Thus, the rescoring method accounts for desolvation of both the protein and ligand upon binding, which would be very difficult to account for in grid-based scoring functions used in high-throughput docking. We believe this to be critical for studying the enolase and other alpha-beta barrel enzymes, in which the active sites generally contain a large number of charged groups (and the substrates are frequently charged as well). Initial tests of our method on virtual inhibitor screening against polar binding sites have also demonstrated significant improvements in enrichment (N. Huang, C. Kalyanaraman, J. Irwin, B. K. Shoichet, and M. P. Jacobson, in preparation).

The other major strength of the method described here is its speed. The average computational cost of rescoring a protein-ligand complex in this work was ∼45 s on a recent-generation single-processor PC. Thus, tens of thousands of complexes can be rescored on a small cluster with relatively modest computational expense, in contrast to more sophisticated but expensive physics-based methods such as MM-PBSA and FEP. The speed of the method is made possible by a highly efficient minimization algorithm, based on the truncated Newton method, in generalized Born solvent. In fact, the minimization itself requires only ∼15 s on average, with the remaining time associated with loading the protein, assigning parameters, etc. Further algorithmic optimization will reduce this computational overhead. This speed enables our method to be used with large ligand libraries, such as those used in most virtual inhibitor screening applications.

One requirement for the success of our method is correct assignment of protonation states on both the protein receptor and ligands.
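To make the size of the protonation-state assignment problem concrete, the following Python sketch enumerates every combination of states for a set of binding-site histidines. It assumes the three states commonly named HID (proton on Nδ1), HIE (proton on Nε2), and HIP (doubly protonated, +1 charge) in force-field residue libraries; the residue labels and function names are hypothetical, and the sketch only lists candidates without deciding among them.

```python
from itertools import product
from typing import Dict, Iterator, List

# Assumed naming convention for histidine protonation states.
HIS_STATES = ("HID", "HIE", "HIP")
# Formal side-chain charge carried by each state.
HIS_CHARGE = {"HID": 0, "HIE": 0, "HIP": 1}


def enumerate_assignments(histidines: List[str]) -> Iterator[Dict[str, str]]:
    """Yield every assignment of a protonation state to each histidine."""
    for states in product(HIS_STATES, repeat=len(histidines)):
        yield dict(zip(histidines, states))


def net_histidine_charge(assignment: Dict[str, str]) -> int:
    """Sum the formal charges contributed by the histidines in one assignment."""
    return sum(HIS_CHARGE[state] for state in assignment.values())


if __name__ == "__main__":
    # Hypothetical labels for two histidines in a binding site.
    site = ["HisA", "HisB"]
    for assignment in enumerate_assignments(site):
        print(assignment, "net charge:", net_histidine_charge(assignment))
```

The number of combinations grows as 3 to the power of the number of histidines, and each combination changes the net charge of the site, which is one reason the choice matters when full electrostatics are included in the energy function.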
In this work, we manually assigned protonation states to histidines in the binding site, based on their hydrogen bonding partners, but we can envision automated assignment of protonation states, for example, using algorithms based on continuum electrostatics. In tests where we set the histidine protonation states incorrectly, the results of the rescoring were almost invariably worse, sometimes
dramatically (data not shown). This is not surprising, because we include full electrostatics in the rescoring.

One major limitation of our method is the treatment of entropic losses associated with ligand binding. We crudely account for the loss of internal ligand entropy by using a simple penalty based on the number of rotatable bonds. Translational and rotational entropy losses are not accounted for at all. Although we attempt to reproduce only relative and not absolute binding free energies, we nonetheless expect that improved treatment of entropic losses would improve the enrichment of binders by our rescoring method.

Finally, all rescoring results presented here treated the receptor as entirely rigid. Relaxing this constraint could potentially improve results in cases where nontrivial conformational changes occur upon ligand binding. The simplest approximation would be to allow residues in the binding site to minimize along with the ligand (which requires only modest increases in computational expense); early results suggest that this strategy can improve results on docking to apo structures. More elaborate rescoring methods can include, for example, rotamer searches for side chains in the binding site, to deal with larger conformational changes.

ACKNOWLEDGMENT

We thank Patsy Babbitt (UCSF) and John Gerlt (UIUC) for introducing us to this problem and helping to guide our work; Brian Shoichet, John Irwin, Niu Huang, Elaine Meng, Brian Tuch, and Robert Rizzo (UCSF) for many helpful discussions and critical technical assistance with docking; and Schrödinger, Inc. for use of and assistance with Glide and QikProp. M.P.J. is a member of the Scientific Advisory Board of Schrödinger, Inc.