Combining Protein Modeling and 6D-QSAR ... - ACS Publications

May 3, 2005 - Yeti) for the identification of the binding mode(s) and 6D-. QSAR (software Quasar) for their quantification. The results obtained for 1...
0 downloads 0 Views 273KB Size
3700

J. Med. Chem. 2005, 48, 3700-3703

Combining Protein Modeling and 6D-QSAR. Simulating the Binding of Structurally Diverse Ligands to the Estrogen Receptor† Angelo Vedani,* Max Dobler, and Markus A. Lill Biographics Laboratory 3R, Friedensgasse 35, 4056 Basel, Switzerland Received February 28, 2005 Abstract: We present a concept for the in silico simulation of adverse effects triggered by drugs and chemicals. The underlying philosophy combines flexible docking (software Yeti) for the identification of the binding mode(s) and 6DQSAR (software Quasar) for their quantification. The results obtained for 106 diverse molecules binding to the estrogen receptor (q2 ) 0.903; p2 ) 0.885) suggest that our approach is suitable for the identification of an endocrine-disrupting potential associated with drugs and chemicals.

Nuclear receptors comprise a family of ligand-dependent transcription factors that transform extra- and intracellular signals into cellular responses by triggering the transcription of target genes. In particular, they mediate the effects of hormones and other endogenous ligands to regulate the expression of specific genes. Among other members, this family includes receptors for the various steroid hormones, e.g., the estrogen, androgen, progesterone, and glucocorticoid receptors. Unbalanced production or cell insensitivity to specific hormones may result in diseases associated with human endocrine dysfunction.1 The presence of hormonally active compounds (endocrine disruptors) in the biosphere has become a worldwide environmental concern. It was concluded that such compounds elicit a variety of adverse effects in humans and wildlife including promotion of hormone-dependent cancers, reproductive tract disorders, and a reduction in reproductive fitness. A number of receptor-mediated hormonal responses to toxicity are known, including xenobiotic effects on the thyroid hormone receptor, the epidermal growth factor receptor, the aryl hydrocarbon receptor, and effects mediated by the androgen and the estrogen receptor (ER), respectively. A variety of compounds in the environment were shown to display agonistic or antagonistic activity toward the ER, including natural products and synthetic compounds.2-8 The concern over xenobiotics binding to the ER has created a need to screen and monitor compounds that can modulate endocrine effects. This has been underscored by the U.S. legislation in 1995-1996 by mandating that chemicals and formulations must be screened for potential estrogenic activity before they are manufactured or used in certain processes.9,10 In Switzerland, the necessity for a coordinated interdisciplinary approach to the environmental and public health problems caused by endocrine disruption has been recognized and a National Research Pro* To whom correspondence should be addressed. Phone: +41-61261-4256. Fax: +41-61-261-4259. E-mail: [email protected]. † Dedicated to Professor Edgar F. Meyer, mentor and friend, in honor of his 70th birthday.

gramme on Endocrine Disruption (NRP50) has been launched in 2001.11 It is one of the objectives of the Biographics Laboratory 3R (an academically oriented, nonprofit organization) to establish a virtual laboratory accessible through the Internet and to allow for an in silico identification of drugs and chemicals for their potential to trigger harmful effects.12-14 In this account, we describe the development and validation of a three-dimensional model for the ER, aimed at the screening of drug candidates and environmental toxins for potential endocrine-disrupting activity. The underlying technology was developed over the past decade and continuously documented in this journal.15-18 Our study was based on the X-ray crystal structure of the human ER R ligand-binding domain with bound diethylstilbestrol, available at a resolution of 2.0 Å,19 which served as a template for identifying potential binding modes of the 106 investigated compounds. While the binding mode for the symmetric diethylstilbestrol molecule is unambiguous (the two phenolic hydroxyl groups engage in hydrogen bonds with Glu 353 and His 524, respectively; Figure 1), different orientations within the binding pocket are feasible for most of the asymmetric compounds used in this study. Here, an only moderate induced fit (involving the rearrangement of protein side chains) tolerates different ligand orientations (Figure 2). The 3D structures of all 106 compounds20-22 were generated and optimized in aqueous solution using the AMBER* force field23 as implemented in the MacroModel software;24 atomic partial charges and solvation energies were calculated using the program AMSOL.25 Subsequently, we sampled energetically feasible arrangements within the binding pocket using a Monte Carlo search protocol based on a Metropolis selection criterion (software Yeti).26,27 For each compound, 40 000 conformations and orientations within the ER binding site were randomly generated; 4000 thereof were fully minimized, including an extended binding pocket of 12 Å around the very ligand.28 For each ligand molecule, up to four energetically favorable arrangements (within 5.0 kcal/mol of the lowest-energy conformation) were retained and composed into a 4D data set (to be used as input to Quasar; see below) totaling in 344 representations for the 106 molecules. The ligands comprise six different substance classes and cover a range in activity (IC50) of 7 orders of magnitude (2.8 mM to 0.2 nM). Quasar, a receptor-modeling concept developed at our laboratory, is based on 6D-QSAR and explicitly allows for the simulation of induced fit.15,16,18,29 Quasar generates a family of quasi-atomistic receptor surrogates that are optimized by means of a genetic algorithm. The hypothetical receptor site is characterized by a threedimensional surface that surrounds the ligand molecules at van der Waals distance and which is populated with atomistic properties mapped onto it. The topology of this surface mimics the 3D shape of the binding site; the mapped properties represent other information of interest, such as hydrophobicity, electrostatic potential,

10.1021/jm050185q CCC: $30.25 © 2005 American Chemical Society Published on Web 05/03/2005

Letters

Journal of Medicinal Chemistry, 2005, Vol. 48, No. 11 3701

Figure 1. Stereoview of the binding mode of diethylstilbestrol to the ER at the true biological receptor.14 Figure 3. Stereoview of the ER surrogate (Quasar 5.0) with coumestrol bound (for clarity, the front part of the receptor model is clipped). The mapped quasi-atomistic properties are colored as follows: green ) H-bond donor; yellow ) H-bond acceptor; light-brown ) positively charged hydrophobic; darkbrown ) negatively charged hydrophobic; gray ) neutral hydrophobic; purple ) H-bond flip-flop. No positively or negatively charged salt bridges were selected in this model because the 106 agonist molecules used in this study lack charged groups.

Figure 2. Four orientations of coumestrol as identified by the Monte Carlo refinement protocol.

and hydrogen-bonding propensity (Figure 3). The fourth dimension refers to the possibility of representing each ligand molecule as an ensemble of conformations, orientations, and protonation states, thereby reducing the bias in identifying the bioactive conformation and orientation. Within this ensemble, the contribution of an individual entity to the total energy is determined by a normalized Boltzmann weight. Because the manifestation and magnitude of the induced fit may vary for different molecules binding to a target protein, the fifth dimension in Quasar allows for the simultaneous evaluation of up to six different induced-fit protocols. The binding energy is calculated as follows:16,29

Ebinding ) Eligand-receptor - Esolvation,ligand - T∆S Einternal strain - Einduced fit The most recent extension of the Quasar concept to six dimensions (f 6D-QSAR) allows for the simultaneous consideration of different solvation models. This can be achieved explicitly by mapping parts of the surface area with solvent properties16 (position and size are optimized by the genetic algorithm29) or implicitly. Here, the solvation terms (ligand desolvation and

solvent stripping) are independently scaled for each different model within the surrogate family, reflecting varying solvent accessibility. The associated weights are evolved throughout the simulation. Like for the fourth and fifth dimensions, a modest “evolutionary pressure” is applied to achieve convergence. Both approaches have been calibrated by using systems for which the 3D structure is available: the ER (see below), the androgen receptor,31 and carbonic anhydrase.26 While the explicit approach seems to bear more realism, it is with the implicit concept that better results are achieved. This might be explained by the fact that a solvent-accessible surface as a “large body” is more inert to alteration than individual scaling factors for solvation are. Because the ligand molecules used in this study differ substantially in their hydrophilic character and the binding pocket of the ER is partially solvent-accessible,19 our new approach allows the simulation of solvent effects associated with ligand binding in an unbiased fashion, a novelty in QSAR-based receptor modeling. Of the 106 agonist molecules used in our study, 88 defined the training set and the remaining 18 defined the test set. In Quasar, the surrogate family comprised 200 models that were evolved over 40 000 crossover cycles, corresponding to 200 generations. The simulation reached a cross-validated r2 of 0.903 and yielded a predictive r2 of 0.885. The receptor surrogate is depicted in Figure 3. Comparison with the binding site at the true biological receptor (Figure 1) shows that characteristic properties (H-bond acceptors mimicking Glu 353 and His 524, an H-bond donor mimicking Arg 394, and a larger hydrophobic pocket representing Leu 346, Ala 350, Trp 383, Leu 384, Phe 404, Ile 424, Phe 425, and Leu 540) are well-identified by the receptor surrogate. Experimental and calculated IC50 values are compared in Figure 4. The rms deviation for the 88 ligand molecules of the training set of 0.66 kcal/mol corresponds to a factor 2.1 of the experimental IC50 value; the maximal individual deviation is 1.4 kcal/mol (9.6 in

3702

Journal of Medicinal Chemistry, 2005, Vol. 48, No. 11

Letters

Table 1. Comparison of All Quasar Simulations from the 3D to 6D Levels and “Consensus Scoring” in Combination with Raptora training set (n ) 88)

test set (n ) 18)

QSAR level

cross- validated r2

rms deviation

max deviation

predictive r2

rms deviation

max deviation

3D 4D 5D 6D Raptor

0.821 0.810 0.872 0.903 0.902

3.6 4.4 2.9 2.1 1.9

53.5 222 74.6 9.6 10.2

0.563 0.788 0.790 0.885 0.846

10.4 4.5 4.5 2.4 2.4

388 19.7 45.4 5.9 25.6

a All quantities represent values averaged over the 200 receptor models comprising the surrogate family (rms and maximal individual deviation are given as factor of the experimental IC50 value). The deviation factor is calculated as IC50,exp/IC50,calc - 1.0 for IC50,exp/IC50,calc > 1.0 and as IC50,calc/IC50,exp - 1.0 otherwise.

Figure 4. Comparison of experimental and calculated IC50 values. The error bars represent the variations of the prediction over the 200 individual models. Dashed lines mark a factor 10 off the experimental IC50.

IC50). For the 18 test compounds, a predictive r2 of 0.885 was obtained. On the average, the predicted binding affinity of the test ligands deviates by 0.71 kcal/mol (2.4 in IC50) from the experiment; the maximal observed deviation is 1.1 kcal/mol (5.9 in IC50) from the experiment. Five scramble tests29 (mean predictive r2 ) -2.088) demonstrated the sensitivity of the model family toward the biological data. As shown in Table 1, these results cannot be obtained by 3D-QSAR because the docking study identified up to four energetically close binding modes for most of the 106 compounds (see above); choosing the bioactive entity therefrom (mandatory in 3D-QSAR) is hardly possible. In our study, this is due to the uncertainty associated with ligand-receptor interaction energies based on a single representation. Using Boltzmann sampled configurations27 instead of sampling low-energy arrangements would definitely improve the selectivity of the Monte Carlo protocol but would simultaneously lead to a burst of the 4D data set. In contrast, the fourth dimension in Quasar (4D-QSAR) allows for an unbiased identification of the binding mode; the genetic algorithm selected a single entity with a preference p g 0.99 for 32% of the 106 ligand molecules (p g 0.9 for 55%, p g 0.8 for 67%, p g 0.7 for 79%, p g 0.6 for 87%, p g 0.5 for 95%). Likewise, the magnitude and anisotropy of the induced fit cannot be assumed firsthand from proteinmodeling studies. The fifth dimension (5D-QSAR) allows for the simultaneous evaluation of different induced-fit scenarios. The scenario that emerged as the best solu-

tion in Quasar suggests a moderate flexibility of the ER binding pocket, which is in agreement with the Monte Carlo simulations (see above). Finally, 6D-QSAR probes different solvation scenarios. The obtained degree of ligand desolvation of 93.4% is in good agreement with the experimental structure19 where only a small portion around Glu 353 and Arg 394 is found to engage in hydrogen bonding with water molecules (Figure 1). For evaluating the new technology, we simulated the binding at the 3D-QSAR level (by selecting the lowestenergy conformer from the Monte Carlo simulations), 4D-QSAR level (by preselecting a single induced-fit mode), and 5D-QSAR level (by omitting ligand-dependent solvation effects). The results are compared in Table 1 and demonstrate the supremacy of 6D-QSAR. Finally, to allow for consensus scoring, we used an alternative approach for estimating binding affinities: Raptor, a dual-shell multidimensional QSAR concept explicitly allowing for induced fit and featuring a hydrophobicity term in the scoring function.17,30 The results (Table 1) suggest that our Monte Carlo sampling protocol was adequate and demonstrate the robustness of the model family. Because the test set comprising 18 ligand molecules represents a rather small fraction of the whole data, we are presently validating models with a more balanced training-to-test ratio. The first simulation using 80 training and 26 test compounds reached a crossvalidated r2 of 0.895 and yielded a predictive r2 of 0.892. The estrogen receptor model is presently being added to the database of our virtual laboratory, allowing for the identification of harmful effects triggered by drugs and chemicals. Other entries include the aryl hydrocarbon receptor (trained using 91 and tested using 30 compounds; q2 ) 0.816, p2 ) 0.763), the androgen receptor (91 + 26 compounds; q2 ) 0.867, p2 ) 0.754) and cytochrome P450 3A4 (38 + 10 compounds; q2 ) 0.888, p2 ) 0.746).31-33 Acknowledgment. This research was made possible through grants from the Swiss National Science Foundation (Grant No. 405040-104411) and the Jacques en Dolly Gazan Foundation, Zug, Switzerland. Supporting Information Available: Tabular listing of the systematic names and experimental and calculated IC50 values for the 106 ligands used in this study. This material is available free of charge via the Internet at http://pubs.acs.org.

Note Added after ASAP Publication. This manuscript was released ASAP on May 3, 2005, without the factor 10 in the caption of Figure 4 and without the dedication footnote on the title page. The correct version was posted on May 10, 2005.

Letters

References (1) Zubay, G. L.; Parson, W. W.; Vance, D. E. Principles of Biochemistry; Wm. C. Brown Communications, Inc.: Dubuque, IA, 1995. (2) Dibb, S. Swimming in a sea of estrogens, chemical hormone disrupters. Ecologist 1995, 25, 27-31. (3) McLachlan, J. A.; Arnold, S. F. Environmental estrogens. Am. Sci. 1996, 84, 452-461. (4) Guillette, L. J.; Crain, D. A.; Rooney, A. A.; Pickford, D. B. Organization versus activation: the role of endocrine disrupting contaminants EDCs during embryonic development in wildlife. Environ. Health Perspect. 1995, 103 (suppl. 7), 157-164. (5) Colborn, T. Environmental estrogens: health implications for humans and wildlife. Environ. Health Perspect. 1995, 103 (suppl. 7), 135-136. (6) Feldman, D.; Krishnan, A. Estrogens in unexpected places: Possible implications for researchers and consumers. Environ. Health Perspect. 1995, 103 (Suppl. 7), 129-133. (7) Hoare, S. A.; Jobling, S.; Parker, M. G.; Sumpter, J. P.; White, R. Environmental persistent alkylphenolic compounds are estrogenic. Endocrinology 1994, 135, 175-182. (8) Korach, K. S.; Levy, L. A.; Sarver, P. J. Estrogen receptor stereochemistry: receptor binding and hormonal responses. J. Steroid Biochem. 1987, 27, 281-290. (9) Safe Drinking Water Act Amendment; Public Law 104-182 (Section 136); U.S. Environmental Protection Agency, 1996; http://www.epa.gov.safewater/sdwa/ index.html. (10) Food Quality Protection Act; Public Law 104-170 (Section 408); U.S. Food and Drug Administration, 1996; http://www.fda.gov/ opacom/laws/ foodqual/ fqpatoc.htm. (11) URL: http://www.nrp50.ch. (12) Dobler, M.; Lill, M. A.; Vedani, A. From crystal structures and their analysis to the in silico prediction of toxic phenomena. Helv. Chim. Acta 2003, 86, 1554-1568. (13) Vedani, A.; Dobler, M.; Lill, M. A. Internet laboratory for predicting harmful effects triggered by drugs and chemicals. ALTEX 2003, 20, 85-91. (14) URL: http://www.biograf.ch/projects.html. (15) Vedani, A.; Briem, H.; Dobler, M.; Dollinger, H.; McMasters, D. R. Multiple comformation and protonation-state representation in 4D-QSAR: The NK-1 receptor system. J. Med. Chem. 2000, 43, 4416-4427. (16) Vedani, A.; Dobler, M. 5D-QSAR: The key for simulating induced fit? J. Med. Chem. 2002, 45, 2139-2149. (17) Lill, M. A.; Vedani, A.; Dobler, M. Raptor: Combining dual-shell representation, induced-fit simulation, and hydrophobicity scoring in receptor modeling: Application toward the simulation of structurally diverse ligand sets. J. Med. Chem. 2004, 47, 61746186. (18) Vedani, A.; Dollinger, H.; Hasselbach, K.-M.; Dobler, M.; Lill, M. A. Novel ligands for the chemokine receptor-3 (CCR3): A receptor-modeling study based on 5D-QSAR. J. Med. Chem. 2005, 48, 1515-1527.

Journal of Medicinal Chemistry, 2005, Vol. 48, No. 11 3703 (19) Shiau, A. K.; Barstad, D.; Loria, P. M.; Cheng, L.; Kushner, P. J.; Agard, D. A.; Greene, G. L. The structural basis of estrogen receptor/coactivator recognition and the antagonism of this interaction by tamoxifen endogenous estrogen 17β-estradiol (E2) and the synthetic nonsteroidal estrogen diethylstilbestrol (DES). Cell 1998, 95, 927-937. (20) Blair, R. M.; Fang, H.; Branham, W. S.; Hass, B. R.; Dial, S. L.; Moland, C. L.; Tong, W.; Shi, L.; Perkins, R.; Sheehan, D. M. The estrogen receptor binding affinities of 188 natural and xenochemicals: Structural diversity of ligands. Toxicol. Sci. 2000, 54, 138-153. (21) Shi, L. M.; Fang, H.; Tong, W.; Wu, J.; Perkins, R.; Blair, R. M.; Branham, W. S.; Dial, S. L.; Moland, C. L.; Sheehan, D. M. QSAR model using a large diverse set of estrogens. J. Chem. Inf. Comput. Sci. 2001, 41, 186-195. (22) Our study did not include the substantial body of nonbinders reported in refs 20 and 21. It also excluded all ER antagonist molecules. A combined data set (including agonists and antagonists) was reported earlier in this journal.30 (23) Weiner, S. J.; Kollmann, P. A.; Case, D. A.; Singh, U. C.; Ghio, C.; Alagona, G.; Profeta, S., Jr.; Weiner, P. A new force field for molecular-mechanical simulation of nucleic acids and proteins. J. Am. Chem. Soc. 1984, 106, 765-784. (24) Mohamadi, F.; Richards, N. G. J.; Guida, W. C.; Liskamp, R.; Lipton, M.; Caufield, C.; Chang, G.; Hendrickson, T.; Still, W. C. MacroModelsAn integrated software system for modeling organic and bioorganic molecules using molecular mechanics. J. Comput. Chem. 1990, 11, 440-467. (25) Cramer, C. J.; Truhlar, D. G. AM1-SM2 and PM3-SM3 parametrized SCF solvation models for free energies in aqueous solution. J. Comput.-Aided Mol. Des. 1992, 6, 629-666. (26) Vedani, A.; Huhta, D. W. A new force field for modeling metalloproteins. J. Am. Chem. Soc. 1990, 112, 4759-4767. (27) URL: http://www.biograf.ch/PDFS/Yeti.pdf. (28) Depending on the size of the ligand molecule, each of the 106 simulations required 8-12 h of CPU time on a 2.5 GHz Macintosh G5 computer. (29) URL: http://www.biograf.ch/PDFS/Quasar.pdf. (30) Lill, M. A.; Vedani, A.; Dobler, M. Raptor: Raptor: Combining dual-shell representation, induced-fit simulation, and hydrophobicity scoring in receptor modeling: Application toward the simulation of diverse ligand sets. J. Med. Chem. 2004, 47, 61746186. (31) Lill, M. A.; Dobler, M.; Vedani, A. In silico prediction of receptormediated environmental toxic phenomena. Application towards endocrine disruption. SAR QSAR Environ. Res. 2005, 16, 149169. (32) Vedani, A.; Dobler, M.; Lill, M. A. In silico prediction of harmful effects triggered by drugs and chemicals. Toxicol. Appl. Pharmacol., in press. (33) Lill, M. A.; Dobler, M.; Vedani, A. Prediction of small-molecule binding to cytochrome P450 3A4: Flexible docking combined with multidimensional QSAR. J. Am. Chem. Soc., submitted.

JM050185Q