Insight Analysis of Promiscuous Estrogen Receptor Alpha–Ligand

Jul 18, 2018 - Search; Citation; Subject .... of ERα using the pharmacophore ensemble/support vector machine (PhE/SVM) .... BUSINESS CONCENTRATES ...
0 downloads 0 Views 8MB Size
Article Cite This: Chem. Res. Toxicol. 2018, 31, 799−813

pubs.acs.org/crt

Insight Analysis of Promiscuous Estrogen Receptor α‑Ligand Binding by a Novel Machine Learning Scheme Tien-Yi Hou,† Ching-Feng Weng,*,‡ and Max K. Leong*,†,‡ †

Department of Chemistry and ‡Department of Life Science and Institute of Biotechnology, National Dong Hwa University, Shoufeng, Hualien 97401, Taiwan

Chem. Res. Toxicol. 2018.31:799-813. Downloaded from pubs.acs.org by UNIV OF SOUTH DAKOTA on 08/23/18. For personal use only.

S Supporting Information *

ABSTRACT: Estrogen receptor α (ERα) plays a significant role in occurrence of breast cancer and may cause various adverse side-effects when ERα is an offtarget protein. A theoretical model was derived to predict the binding affinity of ERα using the pharmacophore ensemble/support vector machine (PhE/SVM) scheme to consider the promiscuous characteristic of ERα. The estimations by PhE/SVM were discovered to be in good agreement with the observed values for those training molecules (n = 31, r2 = 0.80, q2CV = 0.77, RMSE = 0.57, s = 0.58), test molecules (n = 179, q2 = 0.91−0.96, RMSE = 0.33, s = 0.26) and outliers (n = 15, q2 = 0.80−0.86, RMSE = 0.56, s = 0.49). When subjected to various statistical validations, the PhE/SVM model consistently fulfilled the strictest criteria. A mock test also asserted its predictivity. When compared with crystal structures, the calculated results are consistent with the reported ERα-ligand co-complex structure, and the plasticity nature of ERα is also disclosed. Consequently, this precise, fast, and robust model can be adopted to predict ERαligand binding affinities and to design safer non-ERα-targeted pharmaceuticals in the process of drug discovery and development.



pubertal development, and reproductive behavior.15 As compared with ERβ, ERα profoundly contributes to the pathogenesis and progression of breast cancer,16−18 which is one of the leading forms of malignancy and one of highest causes of cancer-associated mortality among females in America.19 More importantly, ca. 65% of breast cancer patients are ER positive (ER+),20 of which approximately 40% will eventually relapse.21,22 As such, ERα is a vital therapeutic target for the treatment and prevention of breast cancer,23 whereas the role of ERβ in breast cancer is less clear.24,25 In fact, ERα is upregulated and ERβ is downregulated when normal breast tissue becomes cancerous,26 leading to a marked increase in the ERα/ERβ ratio27 that can be further manifested by the predominance of ERα in the breast cancer cell lines, including MCF-7, T47D, and ZRER-75-1.28 Accordingly, ERα is the major therapeutic target for treating ER+ breast cancer.29 More intriguingly, some compounds can play yin-and-yang roles in tumorigenicity and anti-tumorigenicity via different signaling pathways.30 On the other hand, various adverse side effects can be produced by those non-ERα-targeted molecules when they bind to ERα, regardless of their pharmacological nature, viz. agonism/antagonism.31 As such, ERα has been listed as an anti-target protein,32 and ERα-ligand binding is one of the critical absorption, distribution, metabolism, excretion, and toxicity (ADME/Tox) parameters that should be considered in the process of drug discovery and development. More specifically, estrogenicity/anti-estrogenicity of lead compounds

INTRODUCTION Estrogen receptor (ER), one member of the nuclear receptor superfamily of ligand-regulated transcription factors, can be activated by the hormone estrogen to affect a variety of physiological functions in animal and human body.1 It was assumed that ERα was the only ER protein expressed in all estrogen-responsive tissue since it was identified in the 1960s.2 It was not until the mid-1990s that the second subtype, viz. ERβ, was discovered in rats, humans, and mice.3−5 ERα and ERβ, encoded by ESR1 and ESR2 genes, respectively, share 47% and 59% sequence identity and homology, respectively.6,7 Physiologically, the tissue distributions of both subtypes are not exactly the same.8 For instance, both ERα and ERβ can be found in bone, uterus, and pituitary.9−11 Notably, ERα is predominantly found in liver, ovary (thecal cells), breast, prostate (stroma), mammary gland, male reproductive organs (testis and epididymis), and adipose tissue, whereas ERβ is primarily present in the colon, prostate (epithelium), bladder, lung, ovarian granulosa cells and bone marrow.10,12 Moreover, both subtypes are co-expressed in diverse regions of the central nervous system (CNS), namely ERα primarily in ventromedial hypothalamic nucleus, subfornical organ, ventrolateral thalamic nucleus, lateral and basolateral amygdaloid nuclei, dorsal endopiriform nucleus, and arcuate nucleus, whereas ERβ principally in hypothalamus, cerebellum, substantia nigra, pontine nuclei, ventral tegmental area, pineal gland, anterior olfactory nuclei, cerebral and piriform cortex, and hippocampus,13 resulting in different neurological functions.14 ERα is of critical importance to a number of physiological functions, including growth of mammary glands and uterus, © 2018 American Chemical Society

Received: May 20, 2018 Published: July 18, 2018 799

DOI: 10.1021/acs.chemrestox.8b00130 Chem. Res. Toxicol. 2018, 31, 799−813

Article

Chemical Research in Toxicology

vector machine (SVM). Additionally, the regression can also be carried out by other schemes such as average consensus and weighted consensus.50 The PhE/SVM architecture can be shown by Figure 3 of Chen et al.51 More significantly, PhE/ SVM has been extensively employed to accurately model various systems, whose target proteins are greatly promiscuous, including the case studies of human ether-á-go-go related gene (hERG) potassium channel (hERG),52 CYP450s,53,54 human pregnane X receptor (hPXR),51 P-glycoprotein (P-gp),55 as well as breast cancer resistance protein (BCRP).56 This study was aimed at developing an accurate and rapid in silico model using the PhE/SVM scheme to predict ERα-ligand binding affinity for identifying those ERα binders to eliminate the potential adverse side-effects as novel pharmaceuticals.

is of critical importance to prioritization of lead compounds.33,34 Generally, the in vitro assays to determine ERα-ligand binding are time-consuming and labor intensive.35 For instance, it demands a series of steps as well as long incubation time to use the isotope-labeled estradiol (E2) ligand.10 On the contrary, in silico approaches to predict the ERα-ligand binding affinities are swift and cost-efficient.35 In fact, a number of quantitative structure−activity relationship (QSAR) models have been proposed to estimate ERα-ligand binding affinity.36 Nevertheless, it is not unusual to observe that some predictive models were derived using the data based on the assay system with the choice of uterine cytosol,37,38 in which ERα is predominately but not exclusively expressed, leading to data heterogeneity consequently. As such, the integrity of constructed models is severely hampered. Apparently, a sound predictive model can be built only based on homogeneous sample data.39 In addition to QSAR models, a number of ERα specific pharmacophore hypotheses have been proposed.40−42 Nevertheless, it is believed that ERα is highly promiscuous per se43,44 as illustrated by Figure 1, in which the



MATERIALS AND METHODS

Data Compilation. A comprehensive literature search was carried out to retrieve all available IC50 values to maximize the structural diversity. To warrant data consistency and to reduce the assay variations among different research groups, only those molecules assayed by Merck Research Laboratories46,47,57−68 were enrolled in this study since they produced the largest quantity data. All molecules adopted in this investigation, references to the literature, and their corresponding negative logarithm IC50 values, viz. pIC50, are listed in Table S1. It has been suggested that three-dimensional conformations of ligands play a pivotal role in pharmacophore model development.69 As such, all of the molecules selected in this study were initially subjected to full geometry optimization by the density functional theory (DFT) B3LYP method and the basis set LanL2DZ using Gaussian 09 (Gaussian, Wallingford, CT), followed by the generation of stable conformations by use of the mixed Monte Carlo multiple minimum (MCMM)70/low mode71 in combination with the GB/SA hydration algorithm72 by MacroModel (Schrödinger, Portland, OR). During the conformational search, the energy minimization was carried out by the truncated-Newton conjugated gradient method (TNCG) using MMFFs force field,73 and the solvation effect was taken into consideration using the constant dielectric constant of water. No more than 255 distinct conformations were produced for each compound within the energy window of 20 kcal/mol since HypoGen (vide inf ra) imposes both constraints. Unlike any other QSAR schemes that only utilize the most stable conformation, HypoGen takes into account all conformations of each training sample to generate pharmacophore hypotheses. Pharmacophore Generation. Pharmacophore models were produced by the HypoGen module included in Discovery Studio (BIOVIA, San Diego, CA). Principally, HypoGen attempts to associate activities with the spatial arrangement of different chemical characteristics through construction, subtraction, and optimization phases in comparison with any other QSAR schemes,74 which establish the relationship between selected descriptors and activities by linear or nonlinear regression only. The chemical features common to those most active training samples will be recognized to develop pharmacophore hypotheses in the construction phase, and those constructed pharmacophore hypotheses, whose chemical features are shared by the most and the least active molecules, will be removed in the subtraction phase. In the final optimization phase, those remaining hypotheses are slightly perturbed, and their corresponding scores are calculated on the basis of predictive errors as well as complexity level. The detailed theory and algorithm of HypoGen can be found elsewhere.75 Accordingly, the selected chemical features and the activities associated with those training samples are of critical importance in determining the predictivity of a built pharmacophore hypothesis, suggesting that activity span as well as structural diversity should be taken into consideration. More specifically, it is of critical importance to “teach” the program new knowledge by inputting structurally similar compounds with greatly diverse biological activities or

Figure 1. Superposition of ERα proteins in various co-complex structures (PDB codes: chain A of 1X7R, 1XP1, and 2IOG), which are color-coded by purple, magenta, and yellow, respectively.

ERα crystal structures co-complexed with genistein (PDB code: 1X7R),45 (2s,3r)-2-(4-{2-[(3r,4r)-3,4-dimethylpyrrolidin-1-yl]ethoxy}phenyl)-3-(4-hydroxyphenyl)-2,3-dihydro-1,4benzoxathiin-6-ol (PDB code: 1XP1)46 and N-[(1r)-3-(4hydroxyphenyl)-1-methylpropyl]-2-[2-phenyl-6-(2-piperidin1-ylethoxy)-1h-indol-3-yl]acetamide (PDB code: 2IOG)47 are overlaid. These protein conformations display substantial structural distinctions, especially the residues Met343, Thr347, His524, Leu525, and Tyr526, which constitute the putative binding pocket along with other residues. Such a flexible system can be addressed by more sophisticated structure-based methods such as SVM-Pose/SVM-Score combinatorial ensemble docking48 or ensemble-based molecular dynamics (MD)49 that inevitably will decrease the throughput. Such a promiscuous system can be resolved using a novel machine learning-based molecular modeling scheme derived by Leong, in which the protein conformational flexibility as well as the multiple ligand orientations were encoded by a group of pharmacophore hypotheses, viz. pharmacophore ensemble (PhE), in conjunction with regression by support 800

DOI: 10.1021/acs.chemrestox.8b00130 Chem. Res. Toxicol. 2018, 31, 799−813

Chemical Research in Toxicology

ÄÅ ÅÅ 1 Å RMSE = ÅÅÅÅ ÅÅÅ n Ç

structurally distinct compounds with close biological activities. It is highly possible to produce overfitted or overtrained predictive models in case of the existence of any biological or chemical redundancy. As such, the most active, moderately active, and inactive training samples should be included to gain important knowledge on pharmacophore requirements. Ideally, at least 16 molecules should be included in an ideal training set with at least 4−5 orders of magnitude in biological activity, roughly equal sample number in each order of magnitude, and new knowledge regarding structure−activity relationship to warrant its statistical significance.75 After cautious examination of structure−activity relationship of selected compounds, 31 molecules, whose IC50 values extended over 5 orders of magnitude, were deliberately selected as the training samples to develop pharmacophore models. The remaining molecules, whose biological activities spanned over 5 orders of magnitude, were treated as the test samples to verify those produced pharmacophore hypotheses. A few test runs were initially executed to assess the selections of the chemical features, which have been employed by previously reported pharmacophore hypotheses.40−42,76 According to the preliminary calculations, chemical features, namely hydrophobic (HP), hydrophobic aromatic (HPA), ring aromatic (RA), and/or hydrogen-bond donor (HBD), were adopted with the options of various feature combinations and minimum, maximum, and total numbers for each employed chemical feature and total features. In addition, different combinations of different weight and different tolerance hypothesis generation options were also tested to maximize the hypothesis performance and diversity. The construction of pharmacophore ensemble (PhE) commenced with the selection of two pharmacophore models and the developed PhE, consequently, was further subjected to regression calculations by support vector machine (SVM) (vide inf ra) to give rise the final PhE/ SVM model. The PhE was continuously constructed until the PhE/ SVM model executed well. The three- or even four-member ensembles, otherwise, were produced by adopting one or two pharmacophore hypotheses, respectively, if all two-member ensembles revealed poor performance. The criterion of ensemble construction was primarily based on the principle of Occam’s razor by employing the minimal number of predictive models.77 SVM Computation. The pIC50 values estimated by those pharmacophore models in the ensemble were used as SVM inputs for the regression calculations using LIBSVM (software available at https://www.csie.ntu.edu.tw/~cjlin/libsvm/), which is comprised of svm-train and svm-predict modules for developing SVM models using the training samples and validating those generated models by the test samples, respectively. The runtime variables such as regression modes (ε-SVR and ν-SVR), their associated parameters (ε and ν), cost (C), and the width of the radial basis function (RBF) kernel (γ) were systemically computed in a parallel manner by a Perl script.78 Model Validation. The predictivity of generated models was evaluated by several statistic metrics. The correlation coefficients r2 and q2 between the estimated and experimental values in the training and external sets, respectively, for the linear least-squares regression were evaluated by eq 1: n

r 2 , q2 = 1 −

i=1

i=1

ÑÑÑ Ö

(3)

n

∑ |Δi|

(4)

i=1

rm2 = r 2(1 −

|r 2 − ro2| )

(5)

rm′ 2 = r 2(1 −

|r 2 − rm′ 2| )

(6)

1 2 (rm + rm′ 2) 2

⟨rm2 ⟩ =

(7)

Δrm2 = |rm2 − rm′ 2|

(8) (r2o )

and the slope of the regression where the correlation coefficient line (k) were computed from the regression line (estimated vs experimental values) through the origin, whereas ro′2 was computed from the regression line (experimental vs estimated values) through the origin. Additionally, the performance of generated SVM models in the external data set was evaluated by the parameters q2F1, q2F2, and q2F3 and concordance correlation coefficient (CCC) suggested by Shi et al.,38 Schüürmann et al.,81 Consonni et al.,82 and Chirico and Gramatica83 using the QSARINS package:84,85 2 qF1 =1−

2 qF2 =1−

nEXT

nEXT

i=1

i=1

nEXT

nEXT

∑ (yi − yi ̂ )2 ∑ (yi − ⟨yTR ⟩)2

∑ (yi − yi ̂ )2 ∑ (yi − ⟨yEXT ⟩)2

ÄÅ ÅÅ 1 Å 2 qF3 = 1 − ÅÅÅÅ ÅÅÅ nEXT Ç i=1

ÉÑ nEXT ÑÑ 2Ñ ∑ (yi − yi ̂ ) ÑÑÑÑ ÑÑÑ i=1 Ö i=1

ÄÅ ÅÅ 1 ÅÅ ÅÅ ÅÅ n ÅÅÇ TR

ÉÑ n TR ÑÑ 2Ñ ∑ (yi − ⟨yTR ⟩) ÑÑÑÑ ÑÑÑ i=1 Ö

(9)

(10)

(11) CCC n

=

2 ∑i =EXT (yi − ⟨yEXT ⟩)(yi ̂ − ⟨yEXT ̂ ⟩) 1 n

∑i =EXT (yi − ⟨yEXT ⟩)2 + (yi ̂ − ⟨yEXT ̂ ⟩)2 + nEXT(⟨yEXT ⟩ − ⟨yEXT ̂ ⟩)2 1

(12) where nTR and nEXT represent the training sample number and external sample number, respectively, ⟨ŷTR⟩ is the average estimated value in the training set, and ⟨ŷEXT⟩ ⟨ŷEXT⟩ denote the average experimental and estimated values in the external set, respectively. Various validation requirements have been suggested to gauge the predictive model performance. However, a theoretical model can be regarded as predictive if it can collectively fulfill the most stringent criteria suggested by Golbraikh et al.,86 Ojha et al.,80 Roy et al.,87 and Chirico and Gramatica.88

(1)

where yi and ŷi are the estimated and experimental values, respectively, ⟨ŷ⟩ is the average estimated value, and n is the number of samples in the data set. Furthermore, the residual Δi (i.e., the difference between yi and ŷi) was calculated:

Δi = yi − yi ̂

1 n



ÉÑ1/2 ÑÑ

Ñ Δi2ÑÑÑÑ

All generated regression models were further internally tested by 10-fold cross-validation in lieu of leave-one-out due to its better performance,79 producing the correlation coefficient of 10-fold cross validation correlation coefficient q2CV. Additionally, various modified versions of r2 suggested by Ojha et 80 al. were also evaluated:

n

∑ (yi ̂ − yi )2 /∑ (yi − ⟨y ̂⟩)2 i=1

MAE =

n

Article

2 r 2 , qCV , q2 , qF2n ≥ 0.70

(13)

2 |r 2 − qCV | < 0.10

(14)

r 2 − ro2

(2)

r2

The root-mean-square error (RMSE) as well as mean absolute error (MAE) for n samples were evaluated:

< 0.10 and 0.85 ≤ k ≤ 1.15

|ro2 − ro′ 2| < 0.30 801

(15) (16)

DOI: 10.1021/acs.chemrestox.8b00130 Chem. Res. Toxicol. 2018, 31, 799−813

Article

Chemical Research in Toxicology rm2 ≥ 0.65 ⟨rm2 ⟩

≥ 0.65 and

(17) Δrm2

< 0.20

(18) (19)

CCC ≥ 0.85

where r in eqs 14 and 15 symbolizes the parameters r (the training set) and q (the external set), and qFn in eq 13 represents qF1, qF2 and qF3.



RESULTS PhE. Of all built pharmacophore models using different combinations of chemical features as well as runtime parameters, two pharmacophore models, symbolized by Hypo A and Hypo B, were enlisted to build PhE based on their predictive performances (Table S1) and statistical evaluations in the training set and test set (Tables 1 and 2, Table 1. Statistic Parameters Correlation Coefficient (r2), Maximum Residual (Δmax), Mean Absolute Error (MAE), Standard Deviation (s), RMSE, and Cross-Validation Coefficient q2CV Evaluated by Hypo A, Hypo B, and PhE/ SVM in the Training Set r2 Δmax MAE s RMSE q2CV

Hypo A

Hypo B

PhE/SVM

0.77 1.26 0.52 0.62 0.61 N/Aa

0.73 1.13 0.59 0.68 0.67 N/Aa

0.80 1.07 0.48 0.58 0.57 0.77

Figure 2. Pharmacophore models in the ensemble. Generated pharmacophore models (A) Hypo A and (B) Hypo B, in which hydrophobic, hydrogen-bond donor, hydrophobic aromatic, and ring aromatic chemical features are represented by light blue, magenta, blue, and orange circles, respectively. The interfeature distances and angles among features, depicted in black, are measured in angstroms and degrees, respectively.

a

Not applicable.

Table 2. Statistic Parameters Correlation Coefficients q2, q2F1, q2F2, and q2F3, Concordance Correlation Coefficient (CCC), Maximum Residual (Δmax), Mean Absolute Error (MAE), Standard Deviation (s), and RMSE Evaluated by Hypo A, Hypo B, and PhE/SVM in the Test Set q2 q2F1 q2F2 q2F3 CCC Δmax MAE s RMSE

Hypo A

Hypo B

PhE/SVM

0.94 0.94 0.93 0.94 0.96 1.34 0.13 0.29 0.29

0.92 0.93 0.92 0.92 0.95 1.56 0.15 0.29 0.33

0.95 0.93 0.91 0.93 0.96 1.10 0.26 0.26 0.33

features can be demonstrated by the overlay of both models as displayed in Figure 3. When fitted to 4, which is a potent

respectively). Both candidate hypotheses in the PhE comprise of various combinations of chemical features, namely one HBD, two HPs, and one RA in Hypo A; and one HBD, one HPA, and two HPs in Hypo B. Tables S2 and S3 summarize the characteristics of both hypotheses, namely three-dimensional coordinates, weights, tolerances, and interfeature distances. Both pharmacophore models are spatially positioned unequally in addition to various combinations of chemical features as shown by Figure 2. However, one HBD and two HPs are common to both models. The shortest distance between HBD and HP in Hypo A is 4.82 Å, but that increases to 6.49 Å in Hypo B. The distinct differences between both pharmacophore hypotheses in space as well as chemical

Figure 3. Superposition of two pharmacophore models Hypo A and Hypo B, denoted in blue and red, respectively.

binder selected in this study (Table S1), the modest absolute residuals of 0.43 and 0.07 were yielded by Hypo A and Hypo B, respectively, whereas 4 used different orientations to produce the best fit in both pharmacophore models as displayed in parts A and B of Figure 4, proclaiming multiple orientations of 4 when laid in the ERα binding pocket that, consequently, can lead to different ligand−protein interactions. The discrepancies become more significant by the overlay of both conformations as illustrated in part C of Figure 4, 802

DOI: 10.1021/acs.chemrestox.8b00130 Chem. Res. Toxicol. 2018, 31, 799−813

Article

Chemical Research in Toxicology

prediction of 5 by Hypo B differed from the experimental value with an absolute error of 1.11, whose absolute residual was only 0.34 by Hypo A. Conversely, the absolute predictive errors of 7 by Hypo A and Hypo B were only 0.30 and 0.24, respectively. Such predictive discrepancies between both models in the ensemble indicate that no single pharmacophore hypotheses performed better or worse than the other for all training molecules. It can be discovered from Figure 5, which exhibits the scatter plot of experimental vs estimated pIC50 values for all training

Figure 5. Observed pIC50 vs the pIC50 predicted by Hypo A, Hypo B, and PhE/SVM for those molecules in the training set. The solid line, dashed lines, and dotted lines correspond to the SVM regression of the data, 95% confidence interval for the SVM regression, and 95% confidence interval for the prediction, respectively.

molecules, that the predictions by Hypo A and Hypo B are mostly in agreement with experimental values in the training set as further manifested by those little statistical parameters, namely RMSE, MAE, and s (Table 1). However, they generated correlation coefficients (r2) values of around 0.7, stating their modest predictivity. When applied to the 179 test samples, both models in the ensemble executed marginally better than they did in the training set as manifested by the statistical assessments RMSE, MAE, and s (Table 2). The RMSE values, for instance, were 0.61 and 0.67 by Hypo A and Hypo B in the training set (Table 1), respectively, and consistently dropped to 0.29 and 0.33 in the test set even though there were more molecules in the test set. These seemingly unconventional observations can be plausible due to the fact that the pIC50 values of ca. 70% of test samples were more than 8 (Figure 6) in comparison with their counterparts in the training set. However, their absolute maximum predictive errors increased from 1.26 and 1.13 in the training set to 1.34 and 1.56 in the test set, respectively, indicating their performance deterioration. Prediction discrepancies between both pharmacophore hypotheses discovered in the training set also can be discovered in the test set as exhibited in Figure 6. The discrepancies in prediction between both pharmacophore hypotheses discovered in the training set can also be discovered in the test set. The prediction of 146 by Hypo B, for instance, yielded the maximum absolute deviation of 1.56, but Hypo A only produced an absolute error of 0.77. Likewise, the absolute predictive deviation of 120 by Hypo B was only 0.88, whereas Hypo A yielded an absolute residual of 1.24. The predictions by Hypo A and Hypo B are also generally in agreement with experimental values for the test molecules

Figure 4. Pharmacophore models (A) Hypo A and (B) Hypo B fitted to 4 and (C) overlay of both models, which are color-coded by blue and red, respectively. The chemical features are described in Figure 2.

implying the necessity of constructing an ensemble to take into consideration the ERα conformational plasticity. The maximum predictive error (Δmax) produced by Hypo A in the training set was 1.26 when predicting 29, whereas Hypo B only yielded an absolute deviation of 0.76 (Table S1). The 803

DOI: 10.1021/acs.chemrestox.8b00130 Chem. Res. Toxicol. 2018, 31, 799−813

Article

Chemical Research in Toxicology

differences between r2 and q2 and between r2 and q2CV affirm that PhE/SVM is a well-trained predictive model because it will otherwise produce at least one considerable difference in the event of overtraining. Robustness Assessment. Generally, it is not trivial to identify those outliers from the sample pool and eliminate them from model development since they denote mathematical extrapolations per se.89 It has been pointed that the detection of outliers can be realized by analyzing the sample dispersions in the chemical space.89 Nevertheless, 15 molecules were intentionally adopted as the outlier set to further evaluate the extrapolation capacity of PhE/SVM since they are very dissimilar to the training samples as illustrated by Figure 7, Figure 6. Observed pIC50 vs the pIC50 predicted by Hypo A, Hypo B, and PhE/SVM for those molecules in the test set. The solid line, dashed lines, and dotted lines correspond to the SVM regression of the data, 95% confidence interval for the SVM regression, and 95% confidence interval for the prediction, respectively.

(Table S1). Thus, it can be assured that Hypo A and Hypo B are good candidate models to construct the ensemble based on their performances in the training set and test set as well as their statistical assessments (vide supra). PhE/SVM. The final PhE/SVM model was constructed by regression of both pharmacophore hypotheses selected in the ensemble. Table S4 details the runtime parameters for producing the optimal SVM, which was selected on the basis of its performance in the training set (Table S1) and crossvalidation (Table 1). It can be learned from Figure 5 that the PhE/SVM model predicted those molecules in the training set generally better than either one of two pharmacophore hypotheses in the ensemble and worse than the other, resulting in the fact that PhE/SVM yielded medium residuals as compared with its counterparts in the PhE for most of the molecules in the training set. However, PhE/SVM produced the minimal predictive deviations in some cases. For example, the prediction of 7 by PhE/SVM deviated from the experimental value by an absolute value of 0.05, but Hypo A and Hypo B generated absolute errors of 0.30 and 0.24, respectively. When compared with both models in the ensemble, PhE/ SVM yielded the greatest r2 and the least RMSE, Δmax, MAE, and s in the training set (Table 1), suggesting that PhE/SVM performed better than Hypo and Hypo (Figure 5). PhE/SVM gave rise to a q2CV of 0.77 as compared with its r2 of 0.80 in the training set when subjected to the 10-fold internal crossvalidation. The large values of r2 and q2CV as well as insignificant difference between both parameters assert the fact that PhE/ SVM is a statistically significant and authentic model. Similar to Hypo A and Hypo B, PhE/SVM also showed slightly better performance when applied to the test samples as manifested by q2, RMSE, MAE, and s (Table 2). Like those observations discovered in the training set, PhE/SVM produced the greatest r2 and least Δmax and s in the test set. In fact, PhE/SVM sometimes yielded the least absolute residuals in the test set. For instance, the estimations of 67 by Hypo A, Hypo B, and PhE/SVM generated the absolute residuals of 1.34, 1.20, and 1.10, respectively, despite it was the maximum absolute residual given rise by PhE/SVM in the test set. Hence, it can be assured that PhE/SVM executed better than either of predictive models in the PhE in the test set as illustrated by Figure 6. More prominently, these marginal

Figure 7. Molecular distribution for those samples in the training set (black circle), test set (white triangle), and outlier set (gray square) in the chemical space spanned by three principal components.

which exhibits the projection of all selected molecules in the chemical space, spanned by the first three principal components, which interpret 99.2% of the variance in the primitive data. It can be discovered that the outliers are totally positioned outside the boundary of the training set, indicating their high level of dissimilarity and yielding a good robustness measure of PhE/SVM.90 In fact, the structurally unique characteristics of those outliers can be shown by empirical observation that they have the smallest polar surface areas, the largest number of rotable bonds (nrot) when the number of hydrogen-bond acceptor (HBA) is two, or the largest number of ring when nrot is one in contrast to those training samples. The predictions in the outlier set and their related statistical assessments are list in Tables S1 and 3, respectively, and the resulting scatter plot is exhibited in Figure 8. The predictions by PhE/SVM are consistent with experimental values for all outliers as indicated by MAE (0.41), s (0.49), and RMSE (0.56), which are less than their equivalents in the training set. Additionally, the parameters q2, q2F1, q2F2, q2F3, and CCC are equal to or larger than r2, suggesting that the PhE/SVM model is very insusceptible to the outliers and PhE/SVM is a very robust, which is of critical significance to real applications when applied to novel compounds. Predictive Assessments. It can be discovered from Figure 9, which exhibits the scatter plots of the predictive errors vs the pIC50 values predicted by the PhE/SVM model for all molecules (i.e., training samples, test samples, and outliers) that the residuals are roughly evenly dispersed on both sides of x-axis. Accordingly, PhE/SVM yielded the average errors of 804

DOI: 10.1021/acs.chemrestox.8b00130 Chem. Res. Toxicol. 2018, 31, 799−813

Article

Chemical Research in Toxicology Table 3. Statistic Parameters Correlation Coefficients q2, q2F1, q2F2, and q2F3, Concordance Correlation Coefficient (CCC), Maximum Residual (Δmax), Mean Absolute Error (MAE), Standard Deviation (s), and RMSE Evaluated by PhE/SVM in the Outlier Set

Table 4. Validation Verification of PhE/SVM Based on Prediction Performance of Those Molecules in the Training Set, Test Set, and Outlier Set training set n r2o k ro′2 r2m r′m2 ⟨r2m⟩ Δr2m r2,q2CV, q2,q2Fn ≥ 0.70 |r2 − q2CV| < 0.10 (r2 − r2o )/r2 < 0.10 and 0.85 ≤ k ≤ 1.15 |r2o − qo′2| < 0.30 r2m ≥ 0.65 ⟨r2m⟩ ≥ 0.65 and Δr2m < 0.20 CCC > 0.85

PhE/SVM q2 q2F1 q2F2 q2F3 CCC Δmax MAE s RMSE

0.82 0.84 0.80 0.80 0.86 1.33 0.41 0.49 0.56

a

31 0.80 0.99 0.76 0.78 0.63 0.71 0.15 × × × × × × N/Aa

test set

outlier set

179 0.94 0.98 0.94 0.91 0.87 0.89 0.04 × N/Aa ×

15 0.82 1.05 0.77 0.80 0.64 0.72 0.16 × N/Aa ×

× × × ×

× × × ×

Not applicable.

fulfilled all validation requirements, denoting the high level of accuracy and predictivity in PhE/SVM. Mock Challenge. To mimic the realistic challenge, the developed PhE/SVM model was further examined by a number of molecules measured by Kuiper et al.,10 of which, three were also utilized in this study, providing a good means to calibrate the challenge system. Nevertheless, Kuiper et al. measured their relative binding affinity (RBA) values in contrast to the compounds selected in this study, whose IC50 values were assayed, suggesting that those compounds assayed by Kuiper et al. are not qualified as the second external set or test set since those validation criteria (vide supra) are not applicable to these compounds. To exclude the discrepancy between both measured systems (i.e. negative logarithm of observed RBA vs negative logarithm of observed IC50), the subsequent correlation was established based on those common three molecules.

Figure 8. Observed pIC50 vs the pIC50 predicted by Hypo A, Hypo B, and PhE/SVM for those molecules in the outlier set. The solid line, dashed lines, and dotted lines correspond to the SVM regression of the data, 95% confidence interval for the SVM regression, and 95% confidence interval for the prediction, respectively.

pRBA = − 0.7027 × pIC50 + 13.036 n = 3, r 2 = 0.76

(20)

The r2 value of 0.76 indicates that both assay systems were appropriately correlated with each other. Hence, it is plausible to contest PhE/SVM with those seven novel molecules. Figure 10 displays the tested results of seven novel marketed drugs. It can be discovered that the r2 value between experimental pRBA vs estimated pIC50 was 0.80. The small distinction between both numbers (0.76 vs 0.80) denotes that the estimations by PhE/SVM can nearly duplicate the experimental observations. Thus, this mock challenge by seven marketed drugs clearly affirmed the predictivity of PhE/SVM.

Figure 9. Residual vs the pIC50 predicted by PhE/SVM in the training set (solid circles), test set (open triangles), and outlier set (gray squares).

−0.05, −0.20, and 0.30 in the training set, test set, and outlier set, respectively (Table S1). These insignificant numbers suggest that there is no systematic error linked to PhE/SVM. The predictivity of derived PhE/SVM was further evaluated by those validation criteria collectively proposed by Golbraikh et al.,86 Ojha et al.,80 Roy et al.,87 and Chirico and Gramatica88 in the tree data sets (i.e., training set, test set, and outlier set) (eqs 13−19)). The results are listed in Table 4, from which it can be discovered that PhE/SVM exhibited very high levels of predictivity in those three data sets. Additionally, HSVR



DISCUSSION It is plausible to verify the quality of a generated pharmacophore by inspecting how well the mapped conformations can reproduce the crystal structures.41 When mapped to the bound ligand in the chain F of the co-complex structure (PDB code: 1ERE), the free E2 (8) conformation 805

DOI: 10.1021/acs.chemrestox.8b00130 Chem. Res. Toxicol. 2018, 31, 799−813

Article

Chemical Research in Toxicology

Figure 12. Alignment of the CM3 conformation mapped by Hypo A with bound CM3 in the ERα-ligand co-complex structure (PDB code: 1YIN). The protein and bound CM3 are represented in purple and green, respectively. The chemical features are described in Figure 2.

Figure 10. Observed pRBA vs the pIC50 predicted by PhE/SVM based on seven marketed drugs.

Table 5. Summary of Developed ERα Binding Affinity Quantitative Pharmacophore Hypotheses

mapped by Hypo A completely coincided with the bound one as displayed in Figure 11, indicating the high level of

n model type

training set

test set

external set

chemical features

15

1HBA, 2HPs, 1RA 1HBA, 1HBD, 2HPAs, 1HP 1HBA, 1HBD, 3HPs 1HBA, 1HBD, 3HPs 1HBD, 2HPs, 1RA 1HBD, 2HPs, 1HPA

HypoGen

25

9

HypoGen

24

29

HypoGen

23

51

HypoGen

22

10

HypoGen (Hypo A) HypoGen (Hypo B)

31

179

15

31

179

15

ref Mukherjee et al. 2007 Brogi et al. 2009 Fang et al. 2011 Islam et al. 2016 this study this study

Figure 11. Alignment of the E2 conformation mapped by Hypo A with bound E2 in the F chain of the ERα-ligand co-complex structure (PDB code: 1ERE). The bound E2, RA feature, and Phe404 are represented in green, orange, and purple, respectively.

reproducibility of ligand conformation. More importantly, such agreement between free and bound ligand conformations can be found for not only those small and rigid ligands but also larger and more flexible ones. For instance, little deviation can be observed between the free (2R,3R,4S)-5-fluoro-3-(4hydroxy-phenyl)-4-methyl-2-[4-(2-piperidin-1-yl-ethoxy)-phenyl]-chroman-6-ol or CM3 (208) conformation mapped by Hypo A and the bound one in the co-complex structure (PDB code: 1YIN) as shown in Figure 12. Accordingly, the little deviations between the mapped conformations and bound ones can be plausibly due to the better algorithm to generate ligand conformational ensembles in conjunction with a proper solvation treatment as well as more sophisticated conformation mapping methods implemented in the HypoGen module (vide supra). Both pharmacophore models in PhE are composed of distinct numbers and types of chemical features, indicating that not all ligands can interact with ERα in the same way. This coincides with the fact that different reported pharmacophore hypotheses included distinct combinations of chemical features (Table 5). Furthermore, it can be discovered from Figure 13,

Figure 13. Binding interactions between E2 and ERα based on the crystal structure of the chain A of the co-complex structure (PDB: 1ERE). Figure generated by LigPlot+.

which exhibits the interactions between ERα-E2 by LigPlot+ (available at http://www.ebi.ac.uk/thornton-srv/software/ LigPlus/) based on the ERα-E2 co-complex structure (PDB code: 1YIN), that both hydroxyl groups of E2 can bind with 806

DOI: 10.1021/acs.chemrestox.8b00130 Chem. Res. Toxicol. 2018, 31, 799−813

Article

Chemical Research in Toxicology ERα by hydrogen bonds, indicating the necessity of both interactions to exert the ERα-ligand interactions. Consequently, the predictive models proposed by Brogi et al.,41 Fang et al.,42 and Islam et al.76 unanimously adopted one HBA and one HBD chemical features. However, 5, 87, and 88, which only have one hydroxyl group, are strong ERα binders, suggesting that only one hydrogen-bond interaction is sufficient enough to establish strong binding with ERα that, actually, is totally consistent with the postulation made by Katzenellenbogen.91 As such, the predictive models proposed by Mukherjee et al.40 adopted one HBA chemical feature as well as Hypo A, and Hypo B derived in this study used one HBD chemical feature to take into account the only hydrogenbond interaction. Furthermore, the discrepancy in HBA/HBD selection can be resolved by the fact that both chemical features, in fact, are interchangeable as observed.92 It is seemingly unusual to observe that only the model derived by Mukherjee et al.40 and Hypo A developed in this study selected the chemical feature RA among all published models. This inconsistency can be manifested by Figure 11, which displays the alignment of the E2 conformation mapped by Hypo A with bound E2 in the F chain of the ERα-ligand cocomplex structure (PDB code: 1ERE). It can be observed that the aromatic ring of E2 can interact with that of Phe404 by π−π stacking, implying the necessity of selecting RA to take into consideration such protein−ligand interaction. The predictive hypothesis built by Brogi et al.41 selected 2 HPAs and 1 HP, whereas the ones proposed by Fang et al.42 and Islam et al.76 adopted 3HPs. Furthermore, Hypo A and Hypo B included 2 HPs and 1 RA as well as 2 HPs and 1 HPA, respectively. The differences in chemical feature type and number seemingly indicate the inconsistency among these five theoretical models that, nevertheless, can be eliminated once the chemical features RA and HPA are substituted by HP as proposed Wolber et al.,93 yielding totally three HP features. As such, these three other models can reproduce the one proposed by Fang et al.42 and Islam et al.,76 and, moreover, these five pharmacophore hypotheses are in qualitative agreement with one another in terms of hydrophobic interactions. Qualitatively, the model derived by Mukherjee et al.40 and Hypo A are alike once the chemical feature HBA in their model was substituted by HBD (vide supra).92 Conversely, the pharmacophore hypothesis proposed by Mukherjee et al.40 is not quantitatively consistent with either one of hypotheses in the PhE. Once those chemical features selected by both pharmacophore models in the PhE are taken into consideration individually, this quantitative discrepancy between PhE and the model built by Mukherjee et al.40 can be exonerated. For example, the length between HBA and RA in the model of Mukherjee et al.40 is 7.08 Å (Figure 14A), which is in agreement with the length between HBD and RA in Hypo A (8.36 ± 3.20 Å) (Figure 14B). The close difference between both distances is due to the tolerances related to each chemical feature (i.e. the radius of each sphere). Likewise, the distances between 1 HP and 1 HBA as well as that between that 1 RA and 1 HBA in their model are 8.50 and 7.08 Å, respectively, which are in reasonable ranges of the interfeature distances between 1 HBD and 1 HPA in Hypo B (8.53 ± 3.20 Å) and that between 1 HBD and 1 RA in Hypo A (8.36 ± 3.20 Å), respectively. The geometric constraints for the other interfeature distances are also very similar as illustrated in Figure 14. Accordingly, it can be asserted that neither of

Figure 14. Geometrical relationships in the pharmacophore models (A) proposed by Mukherjee et al. and (B) excerpted from the PhE in this study. The interfeature distances are measured in angstroms.

models in the PhE showed great quantitative similarity to the model derived by Mukherjee et al.40 when the chemical features related to each pharmacophore hypotheses were handled in an integrated manner, and, conversely, high degrees of quantitative similarity between the model of Mukherjee et al.40 and both models in the PhE were generated once those chemical features were handled individually. The similar inspections can also be discovered in the comparisons between the model reported by Fang et al.42 and both models in the PhE. For instance, the distances between 1 HBA and 3 HPs are 6.55, 2.62, and 8.46 Å in the model of Fang et al.42 (Figure 15A), which are similar to the interfeature distances between 1 HBD and 1 HP in Hypo A (8.36 ± 3.20 Å), between 1 HBD and 1 HP in Hypo A (4.82 ± 3.20 Å), and between 1 HBD and 1 HPA in Hypo B (8.53 ± 3.20 Å), respectively (Figure 15B). Additionally, the three interfeature lengths between 1 HBD and 3 HPs in the model of Fang et 807

DOI: 10.1021/acs.chemrestox.8b00130 Chem. Res. Toxicol. 2018, 31, 799−813

Article

Chemical Research in Toxicology

Figure 15. Geometrical relationships in the pharmacophore models (A) proposed by Fang et al. and (B) excerpted from the PhE in this study. The interfeature distances are measured in angstroms.

Figure 16. Geometrical relationships in the pharmacophore models (A) proposed by Islam et al. and (B) excerpted from the PhE in this study. The interfeature distances are measured in angstroms.

al.42 can also be mapped to their counterparts in the PhE within reasonable margins as displayed in Figure 15. Recently, Islam et al.76 have published a pharmacophore hypothesis, which is qualitatively identical to that proposed by Fang et al.42 (Table 5). Nevertheless, quantitative differences between both models can be observed (Figures 15A vs 16A). However, their models can be regenerated by reassembling some of chemical features from the ensemble. For instance, the lengths between 1 HBA and 3 HPs are 6.27, 6.90, and 7.93 Å in their model (Figure 16A), which correspond to the interfeature distances between 1 HBD and 1 HP in Hypo B (6.49 ± 3.20 Å), between 1 HBD and 1 RA in Hypo A (8.36 ± 3.20 Å), and between 1 HBD and 1 HPA in Hypo B (8.53 ± 3.20 Å), respectively. In addition to HBD and hydophobe constrains, the similar geometrical analogies can also be observed in the other interfeature distances. For instance, the interfeature distances between two pairs of HP groups are 4.78 and 3.17 Å in their model (Figure 16A), which only differ from the interfeatures between 1 HP and 1 RA in Hypo A (4.11 ±

3.20 Å) as well as 1 HP and 1 HPA in Hypo B (3.04 ± 3.20 Å), respectively, by small margins (Figure 16B). Thus, the predictive model of Islam et al.76 can be replicated by reassembling certain chemical features constructed in Hypo A or Hypo B. Thus, these inconsistencies among published in silico models qualitatively and quantitatively can be possibly due to the promiscuous nature of ERα, resulting in different protein− ligand interactions. As such, no single theoretical model can comprehensively cover the ample protein conformational space, otherwise marked predictive errors can be generated. However, such a complicated system can be suitably modeled by the PhE/SVM scheme since it is designated to address such critical factors that would be otherwise unsuccessfully addressed by any other analogue-based molecular modeling schemes.52 It can be argued that such a complicated system actually can also be accurately modeled by the most sophisticated structure-based SVM-pose/SVM-score combinatorial ensem808

DOI: 10.1021/acs.chemrestox.8b00130 Chem. Res. Toxicol. 2018, 31, 799−813

Article

Chemical Research in Toxicology ble docking,48 for instance. Such assertion is correct only if all published crystal structures can comprehensively cover the protein conformation space, otherwise predictive errors will be produced. For instance, 12, whose molecular volume was the biggest among 225 molecules selected in this study, was rigidly aligned against bound N-[(1R)-3-(4-hydroxyphenyl)-1-methylpropyl]-2-[2-phenyl-6-(2-piperidin-1-ylethoxy)-1h-indol-3yl]acetamide or IOG (129) in the co-complex structure (PDB code: 2IOG), whose binding pocket was bigger than any other published wild-type crystal structures,45−47,57,94−107 as exhibited in Figure 17. It can be discovered that the molecular

swiftly estimate the ERα binding can be especially valuable to drug discovery and development. A theoretical model was built to estimate the ERα-ligand binding using the pharmacophore ensemble/support vector machine scheme to address the promiscuous characteristic of ERα that otherwise cannot be properly rendered by any other analogue-based molecular modeling schemes when applied to structurally dissimilar ligands. This derived PhE/SVM model showed good predictive accuracy for those structurally diverse 31 training samples and 179 test samples, respectively, with exceptional predictivity and statistical significance. In addition, its robustness was further assessed by 15 outliers, which were structurally different from the training samples. When mock challenged by a number of marketed drugs to simulate the real application, PhE/SVM also exhibited good performance. In addition, this theoretical model can justify the discrepancies in hitherto published pharmacophore hypotheses and uncovered a possible novel protein conformation that was not addressed by any published co-complex crystal structures. Thus, it can be pledged that this accurate, predictive, and rapid PhE/SVM model can be utilized to predict ERα-ligand binding to facilitate and accelerate the drug discovery and development.



ASSOCIATED CONTENT

* Supporting Information S

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.chemrestox.8b00130. Table S1. Selected compounds for this study, their SMILES strings, names, observed pIC50 values and predicted values by Hypo A, Hypo B, and PhE/SVM, data partitions, and references. Table S2. Weights, tolerances, three-dimensional coordinates of chemical features, and interfeature distances of pharmacophore model Hypo A. Table S3. Weights, tolerances, threedimensional coordinates of chemical features, and interfeature distances of pharmacophore model Hypo B. Table S4. Optimal runtime parameters for the SVM model (XLSX)

Figure 17. Alignment of 12 with IOG in the ERα-ligand co-complex structures (PDB code: 2IOG). Thr347, Asp351, Cys530, and Lys531 are represented by purple, and IOG is depicted in green. The green meshed lobe shows the molecular volume of 12.

volume of aligned 12 bumps into Thr347, Asp351, Cys530, and Lys531, indicating the fact that the binding pocket of ERα-IOG is not large enough to accommodate 12 despite the fact that it has the biggest pocket and it is of necessity for ERα to adopt a new protein conformation with a more spacious binding pocket so that the more bulky 12 can be accommodated. In addition to crystallization, the more time-consuming MD calculations can be used to circumvent this difficulty. Nevertheless, MD cannot be carried out in a high-throughput fashion, making it impractical for routine applications. PhE/ SVM, conversely, can swiftly and accurately model such a promiscuous system without the a priori knowledge about protein conformation. It is of interest to observe that only the antagonism of those molecules synthesized by Merck Research Laboratories was explored. Agonism of ERα ligands is also critically important in toxicogenicity, and, more importantly, the conformation of antagonist-bound ERα is substantially different from that of agonist-bound one.108 Hence, those compounds synthesized and assayed by Nwachukwu et al.109 will be adopted to further investigate the interactions between agonists and ERα in conjunction with other novel machine-learning schemes, namely deep neural networks and extreme learning machines, for example.110



AUTHOR INFORMATION

Corresponding Authors

*E-mail: [email protected]. Phone: +886.3.890.3609. Fax: +886.3.890.0162. *E-mail: [email protected]. Phone: +886.3.890.3637. Fax: +886.3.890.0163. ORCID

Ching-Feng Weng: 0000-0002-9747-8697 Max K. Leong: 0000-0002-6927-1517 Notes

The authors declare no competing financial interest.



ACKNOWLEDGMENTS This work was supported by the Ministry of Science and Technology, Taiwan. Parts of calculations were performed at the National Center for High-Performance Computing, Taiwan. The authors are grateful to Prof. Paola Gramatica for providing a free license of QSARINS.



CONCLUSIONS ERα is a promising therapeutic target for treating ER+ breast cancer and an anti-target protein for reducing the potential adverse side-effects. An in silico model that can accurately and 809

DOI: 10.1021/acs.chemrestox.8b00130 Chem. Res. Toxicol. 2018, 31, 799−813

Article

Chemical Research in Toxicology



(20) Finn, R. S., Crown, J. P., Lang, I., Boer, K., Bondarenko, I. M., Kulyk, S. O., Ettl, J., Patel, R., Pinter, T., Schmidt, M., Shparyk, Y., Thummala, A. R., Voytko, N. L., Fowst, C., Huang, X., Kim, S. T., Randolph, S., and Slamon, D. J. (2015) The cyclin-dependent kinase 4/6 inhibitor palbociclib in combination with letrozole versus letrozole alone as first-line treatment of oestrogen receptor-positive, HER2-negative, advanced breast cancer (PALOMA-1/TRIO-18): a randomised phase 2 study. Lancet Oncol. 16, 25−35. (21) Ma, C. X., Sanchez, C. G., and Ellis, M. J. (2009) Predicting endocrine therapy responsiveness in breast cancer. Oncology 23, 133− 142. (22) Colleoni, M., Bagnardi, V., Rotmensz, N., Gelber, R. D., Viale, G., Pruneri, G., Veronesi, P., Torrisi, R., Cardillo, A., Montagna, E., Campagnoli, E., Luini, A., Intra, M., Galimberti, V., Scarano, E., Peruzzotti, G., and Goldhirsch, A. (2009) Increasing steroid hormone receptors expression defines breast cancer subtypes non responsive to preoperative chemotherapy. Breast Cancer Res. Treat. 116, 359−369. (23) Osborne, C. K., Schiff, R., Fuqua, S. A. W., and Shou, J. (2001) Estrogen Receptor: Current Understanding of Its Activation and Modulation. Clin. Cancer Res. 7, 4338s−4342s. (24) Palmieri, C., Cheng, G. J., Saji, S., Zelada-Hedman, M., Wärri, A., Weihua, Z., Van Noorden, S., Wahlstrom, T., Coombes, R. C., Warner, M., and Gustafsson, J.-A. (2002) Estrogen receptor beta in breast cancer. Endocr.-Relat. Cancer 9, 1−13. (25) Gustafsson, J.-Å., and Warner, M. (2000) Estrogen receptor β in the breast: role in estrogen responsiveness and development of breast cancer. J. Steroid Biochem. Mol. Biol. 74, 245−248. (26) Roger, P., Sahla, M. E., Mäkelä, S., Gustafsson, J. Å., Baldet, P., and Rochefort, H. (2001) Decreased Expression of Estrogen Receptor β Protein in Proliferative Preinvasive Mammary Tumors. Cancer Res. 61, 2537−2541. (27) Leygue, E., Dotzlaw, H., Watson, P. H., and Murphy, L. C. (1998) Altered Estrogen Receptor α and β Messenger RNA Expression during Human Breast Tumorigenesis. Cancer Res. 58, 3197−3201. (28) Mukhopadhyay, K. D., Liu, Z., Bandyopadhyay, A., Kirma, N. B., Tekmal, R. R., Wang, S., and Sun, L.-Z. (2015) Aromatase Expression Increases the Survival and Malignancy of Estrogen Receptor Positive Breast Cancer Cells. PLoS One 10, e0121136. (29) Zhu, Y., Wang, A., Liu, M. C., Zwart, A., Lee, R. Y., Gallagher, A., Wang, Y., Miller, W. R., Dixon, J. M., and Clarke, R. (2006) Estrogen receptor alpha positive breast tumors and breast cancer cell lines share similarities in their transcriptome data structures. Int. J. Oncol. 29, 1581−1590. (30) Gérard, C., Mestdagt, M., Tskitishvili, E., Communal, L., Gompel, A., Silva, E., Arnal, J.-F., Lenfant, F., Noel, A., Foidart, J.-M., and Péqueux, C. (2015) Combined estrogenic and anti-estrogenic properties of estetrol on breast cancer may provide a safe therapeutic window for the treatment of menopausal symptoms. Oncotarget 6, 17621−17636. (31) Zakharov, A., and Lagunin, A. (2014) Computational Toxicology in Drug Discovery: Opportunities and Limitations, In Application of Computational Techniques in Pharmacy and Medicine (Gorb, L., Kuz’min, V., and Muratov, E., Eds.) pp 325−367, Springer, Netherlands, Dordrecht. (32) Zakharov, A. V., Lagunin, A. A., Filimonov, D. A., and Poroikov, V. V. (2012) Quantitative Prediction of Antitarget Interaction Profiles for Chemical Compounds. Chem. Res. Toxicol. 25, 2378−2385. (33) Ivanov, S. M., Lagunin, A. A., and Poroikov, V. V. (2016) In silico assessment of adverse drug reactions and associated mechanisms. Drug Discovery Today 21, 58−71. (34) Schmidt, F., Matter, H., Hessler, G., and Czich, A. (2014) Predictive in silico off-target profiling in drug discovery. Future Med. Chem. 6, 295−317. (35) Brogi, S., Papazafiri, P., Roussis, V., and Tafi, A. (2013) 3DQSAR using pharmacophore-based alignment and virtual screening for discovery of novel MCF-7 cell line inhibitors. Eur. J. Med. Chem. 67, 344−351.

REFERENCES

(1) Nilsson, S., Koehler, K. F., and Gustafsson, J.-Å. (2011) Development of subtype-selective oestrogen receptor-based therapeutics. Nat. Rev. Drug Discovery 10, 778−792. (2) Herynk, M. H., and Fuqua, S. A. W. (2004) Estrogen Receptor Mutations in Human Disease. Endocr. Rev. 25, 869−898. (3) Kuiper, G. G., Enmark, E., Pelto-Huikko, M., Nilsson, S., and Gustafsson, J. A. (1996) Cloning of a novel receptor expressed in rat prostate and ovary. Proc. Natl. Acad. Sci. U. S. A. 93, 5925−5930. (4) Mosselman, S., Polman, J., and Dijkema, R. (1996) ERβ: Identification and characterization of a novel human estrogen receptor. FEBS Lett. 392, 49−53. (5) Tremblay, G. B., Tremblay, A., Copeland, N. G., Gilbert, D. J., Jenkins, N. A., Labrie, F., and Giguère, V. (1997) Cloning, Chromosomal Localization, and Functional Analysis of the Murine Estrogen Receptor β. Mol. Endocrinol. 11, 353−365. (6) Tsezou, A., Tzetis, M., Gennatas, C., Giannatou, E., Pampanos, A., Malamis, G., Kanavakis, E., and Kitsiou, S. (2008) Association of repeat polymorphisms in the estrogen receptors alpha, beta (ESR1, ESR2) and androgen receptor (AR) genes with the occurrence of breast cancer. Breast 17, 159−166. (7) Thomas, C., and Gustafsson, J.-Å. (2011) The different roles of ER subtypes in cancer biology and therapy. Nat. Rev. Cancer 11, 597− 608. (8) Ascenzi, P., Bocedi, A., and Marino, M. (2006) Structure− function relationship of estrogen receptor α and β: Impact on human health. Mol. Aspects Med. 27, 299−402. (9) Enmark, E., Pelto-Huikko, M., Grandien, K., Lagercrantz, S., Lagercrantz, J., Fried, G., Nordenskjöld, M., and Gustafsson, J.-Å. (1997) Human Estrogen Receptor β-Gene Structure, Chromosomal Localization, and Expression Pattern. J. Clin. Endocrinol. Metab. 82, 4258−4265. (10) Kuiper, G. G. J. M., Carlsson, B., Grandien, K., Enmark, E., Häggblad, J., Nilsson, S., and Gustafsson, J.-Å. (1997) Comparison of the Ligand Binding Specificity and Transcript Tissue Distribution of Estrogen Receptors α and β. Endocrinology 138, 863−870. (11) Shughrue, P. J., Lane, M. V., Scrimo, P. J., and Merchenthaler, I. (1998) Comparative distribution of estrogen receptor-α (ER-α) and β (ER-β) mRNA in the rat pituitary, gonad, and reproductive tract. Steroids 63, 498−504. (12) Couse, J. F., Lindzey, J., Grandien, K., Gustafsson, J.-Å., and Korach, K. S. (1997) Tissue Distribution and Quantitative Analysis of Estrogen Receptor-α (ERα) and Estrogen Receptor-β (ERβ) Messenger Ribonucleic Acid in the Wild-Type and ERα-Knockout Mouse. Endocrinology 138, 4613−4621. (13) Shughrue, P. J., Lane, M. V., and Merchenthaler, I. (1997) Comparative distribution of estrogen receptor-α and -β mRNA in the rat central nervous system. J. Comp. Neurol. 388, 507−525. (14) ter Horst, G. J. (2010) Estrogen in the Limbic System, In Vitamins and Hormones: Hormones of the Limbic System (Litwack, G., Ed.) pp 319−338, Academic Press, London. (15) Pearce, S. T., and Jordan, V. C. (2004) The biological role of estrogen receptors α and β in cancer. Crit. Rev. Oncol. Hematol. 50, 3− 22. (16) McDonnell, D. P., and Norris, J. D. (2002) Connections and Regulation of the Human Estrogen Receptor. Science 296, 1642− 1644. (17) Konduri, S. D., Medisetty, R., Liu, W., Kaipparettu, B. A., Srivastava, P., Brauch, H., Fritz, P., Swetzig, W. M., Gardner, A. E., Khan, S. A., and Das, G. M. (2010) Mechanisms of estrogen receptor antagonism toward p53 and its implications in breast cancer therapeutic response and stem cell regulation. Proc. Natl. Acad. Sci. U. S. A. 107, 15081−15086. (18) Weigel, M. T., and Dowsett, M. (2010) Current and emerging biomarkers in breast cancer: prognosis and prediction. Endocr.-Relat. Cancer 17, R245−R262. (19) Siegel, R. L., Miller, K. D., and Jemal, A. (2017) Cancer statistics, 2017. Ca-Cancer J. Clin. 67, 7−30. 810

DOI: 10.1021/acs.chemrestox.8b00130 Chem. Res. Toxicol. 2018, 31, 799−813

Article

Chemical Research in Toxicology (36) Tong, W., Perkins, R., Xing, L., Welsh, W. J., and Sheehan, D. M. (1997) QSAR Models for Binding of Estrogenic Compounds to Estrogen Receptor α and β Subtypes. Endocrinology 138, 4022−4025. (37) Sadler, B. R., Cho, S. J., Ishaq, K. S., Chae, K., and Korach, K. S. (1998) Three-Dimensional Quantitative Structure−Activity Relationship Study of Nonsteroidal Estrogen Receptor Ligands Using the Comparative Molecular Field Analysis/Cross-Validated r2-Guided Region Selection Approach. J. Med. Chem. 41, 2261−2267. (38) Shi, L. M., Fang, H., Tong, W., Wu, J., Perkins, R., Blair, R. M., Branham, W. S., Dial, S. L., Moland, C. L., and Sheehan, D. M. (2001) QSAR Models Using a Large Diverse Set of Estrogens. J. Chem. Inf. Comput. Sci. 41, 186−195. (39) Cherkasov, A., Muratov, E. N., Fourches, D., Varnek, A., Baskin, I. I., Cronin, M., Dearden, J., Gramatica, P., Martin, Y. C., Todeschini, R., Consonni, V., Kuz’min, V. E., Cramer, R., Benigni, R., Yang, C., Rathman, J., Terfloth, L., Gasteiger, J., Richard, A., and Tropsha, A. (2014) QSAR Modeling: Where Have You Been? Where Are You Going To? J. Med. Chem. 57, 4977−5010. (40) Mukherjee, S., Nagar, S., Mullick, S., Mukherjee, A., and Saha, A. (2007) Pharmacophore Mapping of Selective Binding Affinity of Estrogen Modulators through Classical and Space Modeling Approaches: Exploration of Bridged-Cyclic Compounds with Diarylethylene Linkage. J. Chem. Inf. Model. 47, 475−487. (41) Brogi, S., Kladi, M., Vagias, C., Papazafiri, P., Roussis, V., and Tafi, A. (2009) Pharmacophore Modeling for Qualitative Prediction of Antiestrogenic Activity. J. Chem. Inf. Model. 49, 2489−2497. (42) Fang, J., Shen, J., Cheng, F., Xu, Z., Liu, G., and Tang, Y. (2011) Computational Insights into Ligand Selectivity of Estrogen Receptors from Pharmacophore Modeling. Mol. Inf. 30, 539−549. (43) El Marzouk, S., Gahattamaneni, R., Joshi, S. R., and Scovell, W. M. (2008) The plasticity of estrogen receptor−DNA complexes: Binding affinity and specificity of estrogen receptors to estrogen response element half-sites separated by variant spacers. J. Steroid Biochem. Mol. Biol. 110, 186−195. (44) Spyrakis, F., and Cozzini, P. (2009) How Computational Methods Try to Disclose the Estrogen Receptor Secrecy-Modeling the Flexibility. Curr. Med. Chem. 16, 2987−3027. (45) Manas, E. S., Xu, Z. B., Unwalla, R. J., and Somers, W. S. (2004) Understanding the Selectivity of Genistein for Human Estrogen Receptor-β Using X-Ray Crystallography and Computational Methods. Structure 12, 2197−2207. (46) Blizzard, T. A., DiNinno, F., Morgan, J. D., II, Chen, H. Y., Wu, J. Y., Kim, S., Chan, W., Birzin, E. T., Yang, Y. T., Pai, L.-Y., Fitzgerald, P. M. D., Sharma, N., Li, Y., Zhang, Z., Hayes, E. C., DaSilva, C. A., Tang, W., Rohrer, S. P., Schaeffer, J. M., and Hammond, M. L. (2005) Estrogen receptor ligands. Part 9: Dihydrobenzoxathiin SERAMs with alkyl substituted pyrrolidine side chains and linkers. Bioorg. Med. Chem. Lett. 15, 107−113. (47) Dykstra, K. D., Guo, L., Birzin, E. T., Chan, W., Yang, Y. T., Hayes, E. C., DaSilva, C. A., Pai, L.-Y., Mosley, R. T., Kraker, B., Fitzgerald, P. M. D., DiNinno, F., Rohrer, S. P., Schaeffer, J. M., and Hammond, M. L. (2007) Estrogen receptor ligands. Part 16:2-Aryl indoles as highly subtype selective ligands for ERα. Bioorg. Med. Chem. Lett. 17, 2322−2328. (48) Leong, M. K., Syu, R.-G., Ding, Y.-L., and Weng, C.-F. (2017) Prediction of N-Methyl-D-Aspartate Receptor GluN1-Ligand Binding Affinity by a Novel SVM-Pose/SVM-Score Combinatorial Ensemble Docking Scheme. Sci. Rep. 7, 40053. (49) Williams-Noonan, B. J., Yuriev, E., and Chalmers, D. K. (2018) Free Energy Methods in Drug Design: Prospects of “Alchemical Perturbation” in Medicinal Chemistry. J. Med. Chem. 61, 638−649. (50) Li, J., Lei, B., Liu, H., Li, S., Yao, X., Liu, M., and Gramatica, P. (2008) QSAR study of malonyl-CoA decarboxylase inhibitors using GA-MLR and a new strategy of consensus modeling. J. Comput. Chem. 29, 2636−2647. (51) Chen, C.-N., Shih, Y.-H., Ding, Y.-L., and Leong, M. K. (2011) Predicting Activation of the Promiscuous Human Pregnane X Receptor by Pharmacophore Ensemble/Support Vector Machine Approach. Chem. Res. Toxicol. 24, 1765−1778.

(52) Leong, M. K. (2007) A Novel Approach Using Pharmacophore Ensemble/Support Vector Machine (PhE/SVM) for Prediction of hERG Liability. Chem. Res. Toxicol. 20, 217−226. (53) Leong, M. K., Chen, Y.-M., Chen, H.-B., and Chen, P.-H. (2009) Development of a New Predictive Model for Interactions with Human Cytochrome P450 2A6 Using Pharmacophore Ensemble/ Support Vector Machine (PhE/SVM) Approach. Pharm. Res. 26, 987−1000. (54) Leong, M. K., and Chen, T.-H. (2008) Prediction of cytochrome P450 2B6-substrate interactions using pharmacophore ensemble/support vector machine (PhE/SVM) approach. Med. Chem. 4, 396−406. (55) Leong, M. K., Chen, H.-B., and Shih, Y.-H. (2012) Prediction of Promiscuous P-Glycoprotein Inhibition Using a Novel Machine Learning Scheme. PLoS One 7, e33829. (56) Ding, Y.-L., Shih, Y.-H., Tsai, F.-Y., and Leong, M. K. (2014) In Silico Prediction of Inhibition of Promiscuous Breast Cancer Resistance Protein (BCRP/ABCG2). PLoS One 9, e90689. (57) Tan, Q., Blizzard, T. A., Morgan, J. D., II, Birzin, E. T., Chan, W., Yang, Y. T., Pai, L.-Y., Hayes, E. C., DaSilva, C. A., Warrier, S., Yudkovitz, J., Wilkinson, H. A., Sharma, N., Fitzgerald, P. M. D., Li, S., Colwell, L., Fisher, J. E., Adamski, S., Reszka, A. A., Kimmel, D., DiNinno, F., Rohrer, S. P., Freedman, L. P., Schaeffer, J. M., and Hammond, M. L. (2005) Estrogen receptor ligands. Part 10: Chromanes: old scaffolds for new SERAMs. Bioorg. Med. Chem. Lett. 15, 1675−1681. (58) Blizzard, T. A., DiNinno, F., Morgan, J. D., II, Chen, H. Y., Wu, J. Y., Gude, C., Kim, S., Chan, W., Birzin, E. T., Tien Yang, Y., Pai, L.Y., Zhang, Z., Hayes, E. C., DaSilva, C. A., Tang, W., Rohrer, S. P., Schaeffer, J. M., and Hammond, M. L. (2004) Estrogen receptor ligands. Part 7: Dihydrobenzoxathiin SERAMs with bicyclic amine side chains. Bioorg. Med. Chem. Lett. 14, 3861−3864. (59) Blizzard, T. A., DiNinno, F., Morgan, J. D., II, Wu, J. Y., Chen, H. Y., Kim, S., Chan, W., Birzin, E. T., Yang, Y. T., Pai, L.-Y., Zhang, Z., Hayes, E. C., DaSilva, C. A., Tang, W., Rohrer, S. P., Schaeffer, J. M., and Hammond, M. L. (2004) Estrogen receptor ligands. Part 8: Dihydrobenzoxathiin SERAMs with heteroatom-substituted side chains. Bioorg. Med. Chem. Lett. 14, 3865−3868. (60) Blizzard, T. A., DiNinno, F., Chen, H. Y., Kim, S., Wu, J. Y., Chan, W., Birzin, E. T., Yang, Y. T., Pai, L.-Y., Hayes, E. C., DaSilva, C. A., Rohrer, S. P., Schaeffer, J. M., and Hammond, M. L. (2005) Estrogen receptor ligands. Part 13: Dihydrobenzoxathiin SERAMs with an optimized antagonist side chain. Bioorg. Med. Chem. Lett. 15, 3912−3916. (61) Blizzard, T. A., Morgan, J. D., Mosley, R. T., Birzin, E. T., Frisch, K., Rohrer, S. P., and Hammond, M. L. (2003) 2Phenylspiroindenes: a novel class of selective estrogen receptor modulators (SERMs). Bioorg. Med. Chem. Lett. 13, 479−483. (62) Blizzard, T. A., Morgan, J. D., II, Chan, W., Birzin, E. T., Pai, L.Y., Hayes, E. C., DaSilva, C. A., Mosley, R. T., Yang, Y. T., Rohrer, S. P., DiNinno, F., and Hammond, M. L. (2005) Estrogen receptor ligands. Part 14: Application of novel antagonist side chains to existing platforms. Bioorg. Med. Chem. Lett. 15, 5124−5128. (63) Blizzard, T. A., Gude, C., Chan, W., Birzin, E. T., Mojena, M., Tudela, C., Chen, F., Knecht, K., Su, Q., Kraker, B., Holmes, M. A., Rohrer, S. P., and Hammond, M. L. (2007) Bridged androstenediol analogs as ER-β selective SERMs. Bioorg. Med. Chem. Lett. 17, 2944− 2948. (64) Blizzard, T. A., Gude, C., Morgan, J. D., II, Chan, W., Birzin, E. T., Mojena, M., Tudela, C., Chen, F., Knecht, K., Su, Q., Kraker, B., Mosley, R. T., Holmes, M. A., Rohrer, S. P., and Hammond, M. L. (2007) Androstene-3,5-dienes as ER-β selective SERMs. Bioorg. Med. Chem. Lett. 17, 6295−6298. (65) Chen, H. Y., Dykstra, K. D., Birzin, E. T., Frisch, K., Chan, W., Yang, Y. T., Mosley, R. T., DiNinno, F., Rohrer, S. P., Schaeffer, J. M., and Hammond, M. L. (2004) Estrogen receptor ligands. Part 1: The discovery of flavanoids with subtype selectivity. Bioorg. Med. Chem. Lett. 14, 1417−1421. 811

DOI: 10.1021/acs.chemrestox.8b00130 Chem. Res. Toxicol. 2018, 31, 799−813

Article

Chemical Research in Toxicology (66) Kim, S., Wu, J., Chen, H. Y., Birzin, E. T., Chan, W., Yang, Y. T., Colwell, L., Li, S., Dahllund, J., DiNinno, F., Rohrer, S. P., Schaeffer, J. M., and Hammond, M. L. (2004) Estrogen receptor ligands. Part 4: The SAR of the syn-dihydrobenzoxathiin SERAMs. Bioorg. Med. Chem. Lett. 14, 2741−2745. (67) Tan, Q., Birzin, E. T., Chan, W., Tien Yang, Y., Pai, L.-Y., Hayes, E. C., DaSilva, C. A., DiNinno, F., Rohrer, S. P., Schaeffer, J. M., and Hammond, M. L. (2004) Estrogen receptor ligands. Part 5: The SAR of dihydrobenzoxathiins containing modified basic side chains. Bioorg. Med. Chem. Lett. 14, 3747−3751. (68) Tan, Q., Birzin, E. T., Chan, W., Yang, Y. T., Pai, L.-Y., Hayes, E. C., DaSilva, C. A., DiNinno, F., Rohrer, S. P., Schaeffer, J. M., and Hammond, M. L. (2004) Estrogen receptor ligands. Part 6: Synthesis and binding affinity of dihydrobenzodithiins. Bioorg. Med. Chem. Lett. 14, 3753−3755. (69) Foloppe, N., and Chen, I.-J. (2009) Conformational Sampling and Energetics of Drug-Like Molecules. Curr. Med. Chem. 16, 3381− 3413. (70) Chang, G., Guida, W. C., and Still, W. C. (1989) An internalcoordinate Monte Carlo method for searching conformational space. J. Am. Chem. Soc. 111, 4379−4386. (71) Kolossvary, I., and Guida, W. C. (1996) Low mode search. An efficient, automated computational method for conformational analysis: Application to cyclic and acyclic alkanes and cyclic peptides. J. Am. Chem. Soc. 118, 5011−5019. (72) Still, W. C., Tempczyk, A., Hawley, R. C., and Hendrickson, T. (1990) Semianalytical treatment of solvation for molecular mechanics and dynamics. J. Am. Chem. Soc. 112, 6127−6129. (73) Halgren, T. A. (1996) Merck molecular force field. I. Basis, form, scope, parameterization, and performance of MMFF94. J. Comput. Chem. 17, 490−519. (74) Evans, D. A., Doman, T. N., Thorner, D. A., and Bodkin, M. J. (2007) 3D QSAR methods: Phase and Catalyst compared. J. Chem. Inf. Model. 47, 1248−1257. (75) Li, H., Sutter, J., and Hoffmann, R. (2000) HypoGen: An Automated System for Generating 3D Predictive Pharmacophore Models, In Pharmacophore Perception, Development, and Use in Drug Design (Güner, O. F., Ed.) pp 171−189, International University Line, La Jolla, CA. (76) Islam, M. A., Patel, D. A., Rathod, S. G., Chunarkar, P., and Pillay, T. S. (2016) Identification of structural requirements of estrogen receptor modulators using pharmacoinformatics techniques for application to estrogen therapy. Med. Chem. Res. 25, 407−421. (77) Dearden, J. C., Cronin, M. T. D., and Kaiser, K. L. E. (2009) How not to develop a quantitative structure−activity or structure− property relationship (QSAR/QSPR). SAR QSAR Environ. Res. 20, 241−266. (78) Leong, M. K., Lin, S.-W., Chen, H.-B., and Tsai, F.-Y. (2010) Predicting Mutagenicity of Aromatic Amines by Various Machine Learning Approaches. Toxicol. Sci. 116, 498−513. (79) Breiman, L., and Spector, P. (1992) Submodel Selection and Evaluation in Regression. The X-Random Case. Int. Stat. Rev. 60, 291−319. (80) Ojha, P. K., Mitra, I., Das, R. N., and Roy, K. (2011) Further exploring rm2 metrics for validation of QSPR models. Chemom. Intell. Lab. Syst. 107, 194−205. (81) Schüürmann, G., Ebert, R.-U., Chen, J., Wang, B., and Kühne, R. (2008) External Validation and Prediction Employing the Predictive Squared Correlation Coefficient-Test Set Activity Mean vs Training Set Activity Mean. J. Chem. Inf. Model. 48, 2140−2145. (82) Consonni, V., Ballabio, D., and Todeschini, R. (2009) Comments on the Definition of the Q2 Parameter for QSAR Validation. J. Chem. Inf. Model. 49, 1669−1678. (83) Chirico, N., and Gramatica, P. (2011) Real External Predictivity of QSAR Models: How To Evaluate It? Comparison of Different Validation Criteria and Proposal of Using the Concordance Correlation Coefficient. J. Chem. Inf. Model. 51, 2320−2335.

(84) Gramatica, P., Cassani, S., and Chirico, N. (2014) QSARINSchem: Insubria datasets and new QSAR/QSPR models for environmental pollutants in QSARINS. J. Comput. Chem. 35, 1036−1044. (85) Gramatica, P., Chirico, N., Papa, E., Cassani, S., and Kovarich, S. (2013) QSARINS: A new software for the development, analysis, and validation of QSAR MLR models. J. Comput. Chem. 34, 2121− 2132. (86) Golbraikh, A., Shen, M., Xiao, Z. Y., Xiao, Y. D., Lee, K. H., and Tropsha, A. (2003) Rational selection of training and test sets for the development of validated QSAR models. J. Comput.-Aided Mol. Des. 17, 241−253. (87) Roy, K., Mitra, I., Kar, S., Ojha, P. K., Das, R. N., and Kabir, H. (2012) Comparative Studies on Some Metrics for External Validation of QSPR Models. J. Chem. Inf. Model. 52, 396−408. (88) Chirico, N., and Gramatica, P. (2012) Real External Predictivity of QSAR Models. Part 2. New Intercomparable Thresholds for Different Validation Criteria and the Need for Scatter Plot Inspection. J. Chem. Inf. Model. 52, 2044−2058. (89) Casalegno, M., Sello, G., and Benfenati, E. (2008) Definition and Detection of Outliers in Chemical Space. J. Chem. Inf. Model. 48, 1592−1601. (90) Gnanadesikan, R., and Kettenring, J. R. (1972) Robust estimates, residuals, and outlier detection with multiresponse data. Biometrics 28, 81−124. (91) Katzenellenbogen, J. A. (2011) The 2010 Philip S. Portoghese Medicinal Chemistry Lectureship: Addressing the “Core Issue” in the Design of Estrogen Receptor Ligands. J. Med. Chem. 54, 5271−5282. (92) Gao, L., Tu, Y., Ågren, H., and Eriksson, L. A. (2012) Characterization of Agonist Binding to His524 in the Estrogen Receptor α Ligand Binding Domain. J. Phys. Chem. B 116, 4823− 4830. (93) Wolber, G., Seidel, T., Bendix, F., and Langer, T. (2008) Molecule-pharmacophore superpositioning and pattern matching in computational drug design. Drug Discovery Today 13, 23−29. (94) Bruning, J. B., Parent, A. A., Gil, G., Zhao, M., Nowak, J., Pace, M. C., Smith, C. L., Afonine, P. V., Adams, P. D., Katzenellenbogen, J. A., and Nettles, K. W. (2010) Coupling of receptor conformation and ligand orientation determine graded activity. Nat. Chem. Biol. 6, 837− 843. (95) Brzozowski, A. M., Pike, A. C. W., Dauter, Z., Hubbard, R. E., Bonn, T., Engstrom, O., Ohman, L., Greene, G. L., Gustafsson, J.-A., and Carlquist, M. (1997) Molecular basis of agonism and antagonism in the oestrogen receptor. Nature 389, 753−758. (96) Fang, J., Akwabi-Ameyaw, A., Britton, J. E., Katamreddy, S. R., Navas, F., Miller, A. B., Williams, S. P., Gray, D. W., Orband-Miller, L. A., Shearin, J., and Heyer, D. (2008) Synthesis of 3-alkyl naphthalenes as novel estrogen receptor ligands. Bioorg. Med. Chem. Lett. 18, 5075− 5077. (97) Heldring, N., Pawson, T., McDonnell, D., Treuter, E., Gustafsson, J.-Å., and Pike, A. C. W. (2007) Structural Insights into Corepressor Recognition by Antagonist-bound Estrogen Receptors. J. Biol. Chem. 282, 10449−10455. (98) Kim, S., Wu, J. Y., Birzin, E. T., Frisch, K., Chan, W., Pai, L.-Y., Yang, Y. T., Mosley, R. T., Fitzgerald, P. M. D., Sharma, N., Dahllund, J., Thorsell, A.-G., DiNinno, F., Rohrer, S. P., Schaeffer, J. M., and Hammond, M. L. (2004) Estrogen Receptor Ligands. II. Discovery of Benzoxathiins as Potent, Selective Estrogen Receptor α Modulators. J. Med. Chem. 47, 2171−2175. (99) Kong, E. H., Heldring, N., Gustafsson, J.-Å., Treuter, E., Hubbard, R. E., and Pike, A. C. W. (2005) Delineation of a unique protein−protein interaction site on the surface of the estrogen receptor. Proc. Natl. Acad. Sci. U. S. A. 102, 3593−3598. (100) Manas, E. S., Unwalla, R. J., Xu, Z. B., Malamas, M. S., Miller, C. P., Harris, H. A., Hsiao, C., Akopian, T., Hum, W.-T., Malakian, K., Wolfrom, S., Bapat, A., Bhat, R. A., Stahl, M. L., Somers, W. S., and Alvarez, J. C. (2004) Structure-Based Design of Estrogen Receptor-β Selective Ligands. J. Am. Chem. Soc. 126, 15106−15119. (101) Nettles, K. W., Bruning, J. B., Gil, G., O’Neill, E. E., Nowak, J., Hughs, A., Kim, Y., DeSombre, E. R., Dilis, R., Hanson, R. N., 812

DOI: 10.1021/acs.chemrestox.8b00130 Chem. Res. Toxicol. 2018, 31, 799−813

Article

Chemical Research in Toxicology Joachimiak, A., and Greene, G. L. (2007) Structural plasticity in the oestrogen receptor ligand-binding domain. EMBO Rep. 8, 563−568. (102) Osz, J., Brélivet, Y., Peluso-Iltis, C., Cura, V., Eiler, S., Ruff, M., Bourguet, W., Rochel, N., and Moras, D. (2012) Structural basis for a molecular allosteric control mechanism of cofactor binding to nuclear receptors. Proc. Natl. Acad. Sci. U. S. A. 109, E588−E594. (103) Phillips, C., Roberts, L. R., Schade, M., Bazin, R., Bent, A., Davies, N. L., Moore, R., Pannifer, A. D., Pickford, A. R., Prior, S. H., Read, C. M., Scott, A., Brown, D. G., Xu, B., and Irving, S. L. (2011) Design and Structure of Stapled Peptides Binding to Estrogen Receptors. J. Am. Chem. Soc. 133, 9696−9699. (104) Shiau, A. K., Barstad, D., Loria, P. M., Cheng, L., Kushner, P. J., Agard, D. A., and Greene, G. L. (1998) The Structural Basis of Estrogen Receptor/Coactivator Recognition and the Antagonism of This Interaction by Tamoxifen. Cell 95, 927−937. (105) Tanenbaum, D. M., Wang, Y., Williams, S. P., and Sigler, P. B. (1998) Crystallographic comparison of the estrogen and progesterone receptor’s ligand binding domains. Proc. Natl. Acad. Sci. U. S. A. 95, 5998−6003. (106) Vajdos, F. F., Hoth, L. R., Geoghegan, K. F., Simons, S. P., LeMotte, P. K., Danley, D. E., Ammirati, M. J., and Pandit, J. (2007) The 2.0 Å crystal structure of the ERα ligand-binding domain complexed with lasofoxifene. Protein Sci. 16, 897−905. (107) Wärnmark, A., Treuter, E., Gustafsson, J.-Å., Hubbard, R. E., Brzozowski, A. M., and Pike, A. C. W. (2002) Interaction of Transcriptional Intermediary Factor 2 Nuclear Receptor Box Peptides with the Coactivator Binding Site of Estrogen Receptor α. J. Biol. Chem. 277, 21862−21868. (108) Huang, P., Chandra, V., and Rastinejad, F. (2010) Structural overview of the nuclear receptor superfamily: insights into physiology and therapeutics. Annu. Rev. Physiol. 72, 247−272. (109) Nwachukwu, J. C., Srinivasan, S., Zheng, Y., Wang, S., Min, J., Dong, C., Liao, Z., Nowak, J., Wright, N. J., Houtman, R., Carlson, K. E., Josan, J. S., Elemento, O., Katzenellenbogen, J. A., Zhou, H. B., and Nettles, K. W. (2016) Predictive features of ligand-specific signaling through the estrogen receptor. Mol. Syst. Biol. 12, 864. (110) Chen, H., Engkvist, O., Wang, Y., Olivecrona, M., and Blaschke, T. (2018) The rise of deep learning in drug discovery. Drug Discovery Today 23, 1241−1250.

813

DOI: 10.1021/acs.chemrestox.8b00130 Chem. Res. Toxicol. 2018, 31, 799−813