Chem. Res. Toxicol. 2008, 21, 1970–1982
Structure-Activity Relationship Analysis of Rat Mammary Carcinogens
Albert R. Cunningham,*,† Shanna T. Moss,‡ Seena A. Iype,† Gefei Qian,† Shahid Qamar,† and Suzanne L. Cunningham
James Graham Brown Cancer Center and Departments of Medicine and Pharmacology and Toxicology, University of Louisville, 529 South Jackson Street, Louisville, Kentucky 40202, and U.S. Environmental Protection Agency, Atlanta, Georgia 30334
Received May 12, 2008
Structure-activity relationship (SAR) models are powerful tools to investigate the mechanisms of action of chemical carcinogens and to predict the potential carcinogenicity of untested compounds. We describe here the application of the cat-SAR (categorical-SAR) program to two learning sets of rat mammary carcinogens. One set of developed models was based on a comparison of rat mammary carcinogens to rat noncarcinogens (MC-NC), and the second set compared rat mammary carcinogens to rat nonmammary carcinogens (MC-NMC). On the basis of a leave-one-out validation, the best rat MC-NC model achieved a concordance between experimental and predicted values of 84%, a sensitivity of 79%, and a specificity of 89%. Likewise, the best rat MC-NMC model achieved a concordance of 78%, a sensitivity of 82%, and a specificity of 74%. The MC-NMC model was based on a learning set that contained carcinogens in both the active (i.e., mammary carcinogens) and the inactive (i.e., carcinogens to sites other than the mammary gland) categories and was able to distinguish between these different types of carcinogens (i.e., tissue specific), not simply between carcinogens and noncarcinogens. On the basis of a structural comparison between this model and one for Salmonella mutagens, there was, as expected, a significant relationship between the two phenomena since a high proportion of breast carcinogens are Salmonella mutagens. However, when analyzing the specific structural features derived from the MC-NC learning set, a dichotomy was observed between fragments associated with mammary carcinogenesis and mutagenicity and others that were associated with estrogenic activity. Overall, these findings suggest that the MC-NC and MC-NMC models are able to identify structural attributes that may in part address the question of “why do some carcinogens cause breast cancer”, which is a different question from “why do some chemicals cause cancer”.
* To whom correspondence should be addressed. Tel: 502-852-3346. E-mail: [email protected].
† University of Louisville.
‡ U.S. Environmental Protection Agency.
¹ Abbreviations: CPDB, Carcinogenic Potency Database; DSSTox, Distributed Structure-Searchable Toxicity; FDA, Food and Drug Administration; IQ, 2-amino-3-methyl-3H-imidazo(4,5-f)quinoline; LOO, leave-one-out; LMO, leave-many-out; MC-NC, mammary carcinogen-noncarcinogen; MC-NMC, mammary carcinogen-nonmammary carcinogen; NTP, National Toxicology Program; SAR, structure-activity relationship; cat-SAR, categorical-SAR.
10.1021/tx8001725 CCC: $40.75 © 2008 American Chemical Society. Published on Web 08/30/2008

Introduction
The identification of human carcinogens is a difficult and complex task. Only a limited number of high-quality epidemiological studies have been conducted that identify particular agents that induce cancer in humans. In lieu of such data, rodent cancer bioassays or short-term tests for genotoxicity have been used to estimate the likelihood that particular chemicals will be human carcinogens. However, there are approximately 75000 industrial chemicals on the Toxic Substances Control Act’s Chemical Substance Inventory (1), and the National Institute of Environmental Health Sciences estimates that there are over 80000 chemicals registered for use in the United States (2). A complete 2 year cancer bioassay as conducted by the National Toxicology Program (NTP)¹, including planning, evaluation, and review, takes about 5 years to complete, costs between $2 and $4 million, and uses 400 animals (3). There are currently 538 technical reports by the NTP for rodent carcinogenicity using its standardized 2 year rodent bioassays (4). On the other hand, the Carcinogenic Potency Database (CPDB) analyzes and consolidates into a single resource the world’s diverse literature and NTP Technical Reports of chronic long-term animal cancer bioassays (5). To date, analyses of 6540 experiments on 1547 chemicals are available on the CPDB’s Web site (6). Thus, it is evident that not all chemicals in use today will be tested in vivo for carcinogenesis since testing all chemicals in this manner is both time- and cost-prohibitive.

Structure-activity relationship (SAR) modeling and other predictive toxicological methods provide a means to estimate toxicological properties of chemicals based on information from previously tested compounds. The CPDB is now available on the Environmental Protection Agency’s Distributed Structure-Searchable Toxicity (DSSTox) Database Network (7). The consolidation, standardization, and analyses of cancer bioassay data by the CPDB and DSSTox provide a comprehensive resource for investigating chemical carcinogenesis, including analyses by SAR modeling and predictive toxicological methods.

We have reported predictive and mechanistically insightful SAR models for carcinogenesis based on CPDB analyses of mice (8) and rats (9) using the CASE/MULTICASE SAR expert system. The best rat and mouse SAR models from these studies, respectively, had a concordance between experimental and SAR-predicted values of 71 and 78%, sensitivity of 69 and 77%, and specificity of 73 and 78% (8, 9). MCASE MC4PC and MDL-QSAR models have also recently been developed for a set of 1540 compounds tested for rodent carcinogenicity as compiled by the Food and Drug Administration (FDA) (10). In this case, Contrera et al. reported a concordance of 66 and 69%, a sensitivity of 61 and 63%, and a specificity of 71 and 75%, respectively, for MCASE MC4PC and MDL-QSAR (10). Many others have also demonstrated varying degrees of success in modeling chemical carcinogens, and the utility and application of some important toxicologically focused predictive methods have been reviewed in depth (11-13).

The CASE/MULTICASE SAR models of rat and mouse carcinogens developed by us, while being predictive, also provided insight into the structural underpinnings for species-specific carcinogenesis. Many, although not all, of the readily explainable attributes of these models corresponded with the genotoxic or electrophilic paradigm of carcinogenesis (14). This is not surprising given the large numbers of electrophilic or proelectrophilic carcinogens used to build the models and the a priori acceptance of the electrophilic theory; it also demonstrated that the modeling process yielded justifiable and mechanistically insightful information.

Of note, even in light of the bias toward the electrophilic paradigm, we were able to glean an interesting relationship between estrogenicity and carcinogenicity. We identified a two-dimensional feature of rodent carcinogens that dichotomizes the so-called “beneficial” (e.g., phytoestrogens) from “harmful” (e.g., pesticides and industrial chemicals) xenoestrogens (15). Further investigation of this feature showed that differences in regional lipophilicity were evident between phytoestrogens and other man-made estrogenic compounds.
We speculated at the time that these differences in chemical features of estrogenic compounds could induce different biological responses (15-17). During this same time, the estrogen receptor alpha (ERα) ligand binding domain was crystallized, and its atomic coordinates were resolved with those of bound estradiol and raloxifene (18), genistein (19), and 4-hydroxytamoxifen and diethylstilbestrol (20). It was noted that the lipophilic cavity is nearly twice the size of estradiol, which may explain in part the ER’s promiscuity (21). Most importantly, it was observed by these authors that estrogen antagonists induce a different conformational change in the AF-2 region as compared to that for the natural ligand. Together, these analyses demonstrated the utility of SAR analysis not only to generate predictive models that are explainable by current knowledge but also to generate testable hypotheses regarding the mechanistic action of toxicants.

Environmental risk factors, including chemical exposure, may play a role in the development of breast cancer. Although many of these factors remain largely unknown, in an assessment of numerous study sources, it was reported that 216 chemicals are associated with breast cancer (22). Cohn et al. found that the age of exposure to DDT is an important risk factor for breast cancer, noting a 5-fold increased risk for breast cancer for women who were exposed mostly under age 20 (23). We note that one group of chemicals that has received considerable attention with regard to breast cancer is the environmental endocrine disruptors, with specific attention paid to the xenoestrogens (24, 25). For instance, many industrial chemicals (e.g., PCBs and pesticides), consumer products (e.g., plasticizers and phenols), and plant products (e.g., phytoestrogens such as genistein and coumestrol) have been shown to possess estrogenic activity in a number of in vitro and in vivo assays, although
most studies have been limited primarily to DDT, DDE, TCDD, and PCBs, with other compounds receiving little or no attention (26). As such, xenoestrogens warrant attention, especially in light of conflicting epidemiological data and expert opinions regarding their role in the development of this disease (27-30).

Moreover, when considering the specific role that genotoxicants play in carcinogenesis, results from the comet (i.e., alkaline single cell gel electrophoresis) assay are interesting since the method can detect tissue-specific in vivo chemical-induced genotoxicity. When applying this assay to a set of 208 chemicals previously tested for carcinogenicity, Sasaki et al. found that many of the tissues that displayed DNA damage were not necessarily targets for carcinogenicity, but nearly all tissues displaying carcinogenicity were also targets of genotoxicity (31). They concluded that although genotoxicity is generally necessary for carcinogenicity, it is not a sufficient predictor of organ-specific carcinogenicity (31, 32). In other words, although genotoxicity is a mechanistic link to cancer, DNA adducts can be found in similar levels between cancer target and nontarget organs (33). In fact, when considering the analysis of in vivo biomarkers, DNA from an often noncancer target tissue (i.e., peripheral blood lymphocytes) is used as a surrogate for other sites that might be the target for carcinogenesis (34).

Because whole-animal carcinogenicity data deal with many underlying and often competing mechanisms, the development of organ-specific carcinogenicity SAR models is appealing. The FDA’s National Center for Toxicological Research noted that FDA reviewers are interested in organ-specific carcinogenicity to aid in evaluating new chemicals (35). In their preliminary SAR analyses of liver carcinogens, they obtained an overall predictability of 63%, with a sensitivity of 30% and a specificity of 77% (35).
For the analyses described herein, we used the cat-SAR expert system developed by us to analyze the set of rat mammary carcinogens reported in the CPDB (36). The system is called cat-SAR for categorical-SAR. Basically, the cat-SAR approach is a computational SAR or in silico toxicity prediction “expert system” as described by Dearden (37). In a previous analysis of human respiratory sensitizers, the cat-SAR program was able to achieve a concordance between experimental and predicted values of 92% with sensitivities between 89 and 94% and specificities between 87 and 95% (38). The approach that we have taken in developing the cat-SAR program diverges from some existing commercial SAR expert systems that use defined and proprietary modeling technology since descriptor selection and mathematical model derivation are transparent and controllable. Thus, with cat-SAR, the control and selection of modeling parameters facilitate the ability to rigorously explore the relationships between chemical structure and biological activity without the use of “black box” technology. Ultimately, this rationale negates any a priori requirements that a given set of data must fit the attributes of a predefined and often proprietary modeling process. The cat-SAR models are built through a comparison of structural features found among categorized compounds in the model’s learning set. Generically, these categories are biologically active and inactive compounds. When just considering whole-animal carcinogenesis, the categories are simply carcinogens and noncarcinogens. However, when considering organ-specific carcinogenesis, the question arises as to the selection of the inactive or noncarcinogenic compounds. Should they be whole-animal noncarcinogens or carcinogens that are just not carcinogenic to the organ under consideration? For this exercise, we considered both options and developed predictive
SAR models comparing rat mammary carcinogens to noncarcinogens (MC-NC model) and rat mammary carcinogens to nonmammary carcinogens (MC-NMC model). Moreover, because the MC-NMC model considers carcinogens as both active (i.e., mammary carcinogens) and inactive (i.e., carcinogens to other sites), analysis of this model is not intended to distinguish carcinogens from noncarcinogens. Rather, the intent is to distinguish (and study) breast carcinogens from other carcinogenic compounds. By so doing, the MC-NMC model is intended to identify structural attributes that may in part address the question of “why do some carcinogens cause breast cancer”, which is a different question from “why do some chemicals cause cancer”. In essence, a tiered computational approach could be used to first identify carcinogens and then to identify those that might be active at the mammary gland.
Experimental Procedures
Mammary Gland Carcinogen Learning Sets. The CPDB standardizes the experimental results (whether positive or negative for carcinogenicity), including qualitative data on strain, sex, route of compound administration, target organ, histopathology, and the author’s opinion and reference to the published paper, as well as quantitative data on carcinogenic potency, statistical significance, tumor incidence, dose-response curve shape, length of experiment, duration of dosing, and dose rate (39). Moreover, a potency value for carcinogens, the TD50, is also available. The TD50 is defined as “that dose rate (in mg/kg body weight/day), which, if administered chronically for the standard lifespan of the species, will halve the probability of remaining tumorless throughout that period” (39).

Several mammary carcinogen learning sets were developed from the published CPDB Summary Table by Target Site (36, 40) for this study. The web-based version of the CPDB listed 107 rat mammary carcinogens for both males and females (40). Of the 107 compounds, only four were not clear female mammary carcinogens. Atrazine, 4,4′-methylene-bis(2-methylaniline), and methyleugenol were tested in both sexes and found to only induce mammary cancer in males; results for benzidine and norlestrin were not defined by sex; and 4-Bis(2-hydroxyethyl)amino-2-(5-nitro-2-thienyl)quinazoline has only been tested in male rats. On the other hand, of the 107 compounds, 22 were tested only in female rats, while 13 compounds were clearly male and female mammary carcinogens (Table S1 in the Supporting Information).

Cat-SAR models were built through a comparison of structural features found among two designated categories of compounds in the model’s learning set. As mentioned, for these analyses, the categories were MC-NC and MC-NMC. The cat-SAR learning set consisted of the chemical name, its structure as a .MOL2 file, and its categorical designation (e.g., one or zero).
Organic salts were included as the free base. Simple mixtures and technical grade preparations were included as the major or active component. Metals, metalloorganic compounds, polymers, and mixtures of unknown composition were not included. As such, we excluded norlestrin and dimethylaminoethylnitrosoethyl urea nitrite salt from the learning set. Also, because 2-amino-3-methyl-3H-imidazo(4,5-f)quinoline (IQ)·HCl and IQ would both be included as the free base, we excluded the IQ·HCl structure. Therefore, a total of 104 rat mammary carcinogens were included in the learning sets (Table S1 in the Supporting Information). Because we had a sufficient number of noncarcinogens and nonmammary carcinogens to populate the inactive data set, we
made triplicate inactive data sets designated MC-NC and MC-NMC. Models 1-3 each used 104 inactive chemicals (see Tables 1 and 2). By so doing, we were able to assess the stability of the derived models. This approach reduced the chance of selecting 104 inactive compounds that happened to produce a “good” model. For the MC-NC model, three sets of 104 noncarcinogens were randomly selected from the 449 rat noncarcinogens listed in the CPDB. Likewise, for the MC-NMC model, three random sets of 104 carcinogens were selected from the 395 rat carcinogens in the CPDB that did not induce mammary cancer (see Table S1 in the Supporting Information for a summary of MC-NC and MC-NMC model set 1 with data provided in the CPDB (40) and the results from the cat-SAR validation exercise). As mentioned, the final MC-NC and MC-NMC models are available as SD files (see the Supporting Information).

The cat-SAR program provides for a number of user-specified options, so there is no a priori determination of the parameters in the final model. As such, we have developed and reported herein several different cat-SAR MC-NC and MC-NMC models. Obviously, with the ability to vary modeling parameters, some can extend past the structural range of the learning sets. For example, the fragment length parameter for the models described herein was set from three to seven heavy atoms (described below). Thus, chemicals of only three heavy atoms contributed their entire chemical structure as one fragment. Likewise, compounds consisting of fewer than three heavy atoms contributed no fragments to the model.

In Silico Chemical Fragmentation and the Compound-Fragment Data Matrix. Using the Tripos Sybyl HQSAR module, each chemical was fragmented in silico into all possible fragments meeting user-specified criteria.
HQSAR allowed the user to select attributes for fragment determination including atom count (i.e., size of the fragment), bond types, atomic connections (i.e., the arrangement of atoms in the fragment), hydrogen atoms, chirality, and hydrogen bond donor and acceptor groups. Fragments can be linear, branched, or cyclic moieties. Models developed herein contained fragments between three and seven atoms in size and considered atoms, bond types, and atomic connections. Upon completion of the fragmentation routine, a Sybyl HQSAR add-on procedure produces the compound-fragment data matrix as a text file. In the matrix, the rows are intact chemicals, and the columns are molecular fragments. Thus, for each chemical, a tabulation of all of its fragments is recorded across the table rows, and for each fragment, all chemicals that contain it are tabulated in each column. The HQSAR module is not used for statistical analysis or model development. The compound-fragment matrix, which contained between 15360 and 18450 fragments for the MC-NC models (see Table 1) and between 13938 and 14537 fragments for the MC-NMC models (see Table 2), is then analyzed, using the cat-SAR programs that we have developed, to identify structural features associated with active and inactive compounds. The cat-SAR program and the compound-fragment matrix are available through the corresponding author, and the mammary carcinogen model is also available in the Supporting Information.

Table 1. Fragment Summary, Self-Fit, and Cross-Validation Results for the MC-NC SAR Models (a)

Fragments (identical for FragSum and FragAve)
model             total    model    active   inactive
1                 18450    1050     632      418
2                 16616    1052     500      552
3                 15360    1268     619      649
123 (0.90:0.90)   22264    2373     536      1837
123 (0.90:0.99)   22264    2373     536      1837

Self-fit
method   model             sensitivity       specificity       concordance
FragSum  1                 1.00 (68/68)      0.97 (71/73)      0.99 (139/141)
FragSum  2                 0.94 (59/63)      0.99 (72/73)      0.96 (131/136)
FragSum  3                 1.00 (67/67)      0.90 (61/68)      0.95 (128/135)
FragSum  123 (0.90:0.90)   0.87 (59/68)      0.98 (174/177)    0.95 (233/245)
FragSum  123 (0.90:0.99)   1.00 (59/59)      0.99 (165/167)    0.99 (224/226)
FragAve  1                 1.00 (68/68)      0.97 (71/73)      0.99 (139/141)
FragAve  2                 0.94 (59/63)      0.99 (72/73)      0.96 (131/136)
FragAve  3                 1.00 (67/67)      0.90 (61/68)      0.95 (128/135)
FragAve  123 (0.90:0.90)   0.87 (59/68)      0.99 (175/177)    0.96 (234/245)
FragAve  123 (0.90:0.99)   1.00 (59/59)      0.99 (165/167)    0.99 (224/226)

Drop-1
method   model             sensitivity       specificity       concordance
FragSum  1                 0.79 (62/78)      0.89 (54/61)      0.84 (116/139)
FragSum  2                 0.78 (52/67)      0.84 (63/75)      0.81 (115/142)
FragSum  3                 0.76 (54/71)      0.76 (48/63)      0.76 (102/134)
FragSum  123 (0.90:0.90)   0.68 (58/85)      0.92 (152/165)    0.84 (210/250)
FragSum  123 (0.90:0.99)   0.67 (54/81)      0.95 (147/155)    0.85 (201/236)
FragAve  1                 0.82 (64/78)      0.82 (50/61)      0.82 (114/139)
FragAve  2                 0.78 (52/67)      0.85 (64/75)      0.82 (116/142)
FragAve  3                 0.76 (54/71)      0.81 (51/63)      0.78 (105/134)
FragAve  123 (0.90:0.90)   0.66 (56/85)      0.95 (156/165)    0.85 (212/250)
FragAve  123 (0.90:0.99)   0.72 (58/81)      0.89 (138/155)    0.83 (196/236)

Drop-many
method   model             sensitivity       specificity       concordance
FragSum  1                 0.74 (4.9/6.6)    0.85 (4.7/5.5)    0.80 (9.7/12.1)
FragSum  2                 0.73 (4.7/6.4)    0.83 (5.5/6.6)    0.78 (10.2/13.0)
FragSum  3                 0.73 (4.7/6.4)    0.76 (4.3/5.5)    0.75 (8.9/11.9)
FragSum  123 (0.90:0.90)   0.63 (5.1/8.2)    0.93 (14.2/15.3)  0.82 (19.3/23.5)
FragSum  123 (0.90:0.99)   0.63 (5.2/8.2)    0.92 (14.1/15.3)  0.82 (19.2/23.5)
FragAve  1                 0.74 (4.9/6.6)    0.87 (4.8/5.5)    0.80 (9.7/12.1)
FragAve  2                 0.73 (4.7/6.4)    0.82 (5.4/6.6)    0.78 (10.1/13.0)
FragAve  3                 0.72 (4.6/6.4)    0.80 (4.4/5.5)    0.76 (9.0/11.9)
FragAve  123 (0.90:0.90)   0.60 (4.6/7.6)    0.94 (13.3/14.2)  0.82 (17.9/21.8)
FragAve  123 (0.90:0.99)   0.69 (5.7/8.2)    0.87 (13.3/15.3)  0.81 (18.9/23.5)

(a) Total fragments, number of fragments derived from the learning set; model fragments, number of fragments meeting specified rules of the model; active fragments, number of fragments meeting specified rules to be considered as active; inactive fragments, number of fragments meeting specified rules to be considered as inactive; sensitivity, number of correct positive predictions/total number of positive predictions; specificity, number of correct negative predictions/total number of negative predictions; concordance, number of correct predictions/total number of predictions; and FragSum and FragAve, see the Experimental Procedures describing the different ways to count fragments. Models 1-3 are composed of different random selections of inactive compounds. Model 123 is composed of a 1:2 active to inactive ratio of compounds, with an equal selection of important fragments (0.90:0.90) and one that controls for the larger number of inactive compounds (0.90:0.99).

Table 2. Fragment Summary, Self-Fit, and Cross-Validation Results for the MC-NMC SAR Models (a)

Fragments (identical for FragSum and FragAve)
model             total    model    active   inactive
1                 13938    1241     716      525
2                 14537    923      543      380
3                 14111    819      430      389
123 (0.90:0.90)   18738    1839     492      1347
123 (0.90:0.99)   18738    1839     492      1347

Self-fit
method   model             sensitivity       specificity       concordance
FragSum  1                 1.00 (69/69)      0.90 (60/67)      0.95 (129/136)
FragSum  2                 0.95 (60/63)      0.97 (61/63)      0.96 (121/126)
FragSum  3                 1.00 (58/58)      0.95 (59/62)      0.98 (117/120)
FragSum  123 (0.90:0.90)   0.89 (55/62)      0.97 (143/148)    0.94 (198/210)
FragSum  123 (0.90:0.99)   1.00 (53/53)      0.99 (140/142)    0.99 (193/195)
FragAve  1                 1.00 (69/69)      0.93 (62/67)      0.96 (131/136)
FragAve  2                 0.97 (61/63)      0.97 (61/63)      0.97 (122/126)
FragAve  3                 0.98 (57/58)      0.98 (61/62)      0.98 (118/120)
FragAve  123 (0.90:0.90)   1.00 (62/62)      0.93 (137/148)    0.95 (199/210)
FragAve  123 (0.90:0.99)   1.00 (53/53)      0.99 (140/142)    0.99 (193/195)

Drop-1
method   model             sensitivity       specificity       concordance
FragSum  1                 0.82 (64/78)      0.74 (49/66)      0.78 (113/144)
FragSum  2                 0.80 (55/69)      0.70 (45/64)      0.75 (100/133)
FragSum  3                 0.80 (55/69)      0.70 (42/60)      0.75 (97/129)
FragSum  123 (0.90:0.90)   0.65 (49/76)      0.90 (128/142)    0.81 (177/218)
FragSum  123 (0.90:0.99)   0.64 (47/73)      0.90 (120/134)    0.81 (167/207)
FragAve  1                 0.83 (65/78)      0.74 (49/66)      0.79 (114/144)
FragAve  2                 0.80 (55/69)      0.73 (47/64)      0.77 (102/133)
FragAve  3                 0.80 (55/69)      0.72 (43/60)      0.76 (98/129)
FragAve  123 (0.90:0.90)   0.65 (49/76)      0.89 (127/142)    0.81 (176/218)
FragAve  123 (0.90:0.99)   0.64 (47/73)      0.90 (120/134)    0.81 (167/207)

Drop-many
method   model             sensitivity       specificity       concordance
FragSum  1                 0.79 (5.8/7.3)    0.75 (4.6/6.1)    0.78 (10.4/13.4)
FragSum  2                 0.79 (5.0/6.3)    0.69 (4.0/5.8)    0.74 (9.0/12.1)
FragSum  3                 0.79 (4.6/5.8)    0.64 (3.7/5.8)    0.72 (8.3/11.6)
FragSum  123 (0.90:0.90)   0.65 (4.2/6.6)    0.81 (11.0/13.7)  0.75 (15.2/20.3)
FragSum  123 (0.90:0.99)   0.62 (3.8/6.3)    0.86 (11.1/12.9)  0.78 (14.9/19.2)
FragAve  1                 0.79 (5.8/7.3)    0.75 (4.6/6.1)    0.78 (10.4/13.4)
FragAve  2                 0.79 (5.0/6.3)    0.69 (4.0/5.8)    0.74 (9.0/12.1)
FragAve  3                 0.79 (4.6/5.8)    0.64 (3.7/5.8)    0.72 (8.3/11.6)
FragAve  123 (0.90:0.90)   0.60 (3.8/6.6)    0.87 (11.9/13.7)  0.78 (15.7/20.3)
FragAve  123 (0.90:0.99)   0.62 (3.8/6.3)    0.88 (11.3/12.9)  0.79 (15.0/19.2)

(a) See the footnote of Table 1.

Identifying “Important” Fragments of Activity and Inactivity. A measure of each fragment’s association with biological activity was next determined. To ascertain an association between each fragment and activity (or inactivity), a set of rules is established to choose “important” active and inactive fragments. The first selection rule is the number of times a fragment is identified in the learning set. For this exercise, it was set at three compounds (i.e., 1.4%). For this parameter, we reasoned that if a fragment was found in only one or two compounds in the learning set, it may be a chance occurrence. On the other hand, because the learning sets were composed of 104 active and 104 inactive compounds of diverse character, if we required fragments to be found in more than three compounds, then we would expect to miss important features.

The second rule relates to the proportion of active or inactive compounds that contribute to each fragment. We predominantly developed models in which at least 90% of the compounds contributing a particular fragment were active, or at least 90% were inactive. We reasoned that even if a particular fragment was associated with activity, there may be other reasons (i.e., fragments) why a compound containing it was inactive; thus, the compounds containing it would not be expected to be 100% active. A similar argument can be made for inactive fragments. Thus, if we considered only those fragments found exclusively in active or inactive compounds, we would rarefy the fragment pool to an unreasonable level and risk losing valuable information. Alternately, we expected that fragments found to be present approximately equally in the active and inactive fragment sets would not be associated with biological activity. Such fragments may serve as chemical scaffolds holding the biologically active features and are not directly related to activity or inactivity. In general, fragments were considered “significant” if they were found in at least three compounds in the learning set and at least 90% of those compounds were active or at least 90% were inactive.

Predicting Activity. The resulting list of fragments was then used for mechanistic analysis or to predict the activity of an unknown compound.
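The compound-fragment tabulation and the two selection rules described above (a fragment must occur in at least three compounds, and at least 90% of those compounds must fall in one category) can be sketched as follows. This is a minimal illustration, not the cat-SAR program itself: the function names, the toy compounds `c1` to `c6`, and the fragment labels `F1` to `F4` are all hypothetical, and exact fractions are used only to avoid floating-point edge cases at the 90% threshold.

```python
from fractions import Fraction


def tally_fragments(compound_fragments, labels):
    """Count how many active and inactive compounds contain each fragment.

    compound_fragments: dict of compound name -> set of fragment labels
    labels: dict of compound name -> 1 (active) or 0 (inactive)
    Returns a dict of fragment -> (n_active, n_inactive).
    """
    counts = {}
    for compound, fragments in compound_fragments.items():
        for frag in fragments:
            n_active, n_inactive = counts.get(frag, (0, 0))
            if labels[compound] == 1:
                counts[frag] = (n_active + 1, n_inactive)
            else:
                counts[frag] = (n_active, n_inactive + 1)
    return counts


def select_fragments(counts, min_compounds=3, min_proportion=Fraction(9, 10)):
    """Apply the two rules: minimum occurrence, then category proportion."""
    active_pool, inactive_pool = {}, {}
    for frag, (n_active, n_inactive) in counts.items():
        total = n_active + n_inactive
        if total < min_compounds:
            continue  # rule 1: too rare, possibly a chance occurrence
        if Fraction(n_active, total) >= min_proportion:
            active_pool[frag] = (n_active, n_inactive)
        elif Fraction(n_inactive, total) >= min_proportion:
            inactive_pool[frag] = (n_active, n_inactive)
        # otherwise the fragment is treated as a neutral scaffold
    return active_pool, inactive_pool


# Hypothetical toy learning set: three actives (1) and three inactives (0).
compound_fragments = {
    "c1": {"F1", "F3"}, "c2": {"F1", "F3"}, "c3": {"F1", "F4"},
    "c4": {"F2", "F3"}, "c5": {"F2", "F3"}, "c6": {"F2", "F4"},
}
labels = {"c1": 1, "c2": 1, "c3": 1, "c4": 0, "c5": 0, "c6": 0}

counts = tally_fragments(compound_fragments, labels)
active_pool, inactive_pool = select_fragments(counts)
print(active_pool, inactive_pool)
```

In this toy set, `F3` is split evenly between categories and `F4` occurs in only two compounds, so neither enters the model.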
For a prediction, the cat-SAR program determined which, if any, fragments from the model’s pool of significant fragments the test compound contained. If none were present, no prediction of activity was made for the compound. If one or more fragments were present, the number of active and inactive compounds containing each fragment was determined. The probability of activity or inactivity was then calculated based on the total number of active and inactive compounds that went into deriving each of the fragments. The probability of activity was calculated by two similar means. The fragment sum (FragSum) method calculated the average probability of the active and inactive fragments contained in the test compound, weighted by the number of active and inactive compounds that went into deriving each fragment. For example, if a compound contained two fragments, one found in 9/10 active compounds in the learning set (i.e., 90% active) and the other found in 3/3 inactive compounds (i.e., 0% active), the unknown compound was predicted to have a probability of activity of 69% (i.e., 9/10 actives + 0/3 actives = 9/13 actives, or a 69% chance of activity). Similarly, the fragment average (FragAve) method calculated the average probability of the active and inactive fragments contained in the test compound by simply averaging the probability of activity associated with each fragment. Using the above example, the two probabilities of activity, 90 and 0%, were averaged for an activity value of 45%.

Model Validation. A self-fit (i.e., leave-none-out) and two cross-validation routines [i.e., leave-one-out (LOO) and multiple leave-many-out (LMO)] were conducted for each model. For the LOO cross-validation, each chemical, one at a time, was removed from the total fragment set, and the n − 1 model was derived. Using the same criteria described above, the activity of the removed chemical was then predicted using the n − 1 model. Predicted vs experimental values for each chemical were
then compared, and the model’s concordance, sensitivity, and specificity were determined. For the LMO cross-validation, randomly selected sets of 10% of the chemicals (i.e., 20 chemicals) were removed from the total fragment set, and the n − 10% model was derived. Again, the activity of each of the removed chemicals was then predicted using the n − 10% model. Predicted vs experimental values for the chemicals in the left out sets were then compared, and the model’s concordance, sensitivity, and specificity were determined. Cat-SAR predictions were based on two separate fragment sets (i.e., the active fragments and the inactive ones), and the predicted activity of a chemical was based on the average probability of all of the active and inactive compounds contributing to its fragments. To best classify compounds back to an active or inactive category, we adapted a routine from our previous MultiCASE work in which we identified an optimal cutoff point that best separated the probabilistic prediction of active and inactive compounds based on the drop-one validations.

Chemical Diversity Approach (CDA). To analyze the potential relationship(s) between breast carcinogenicity and other toxicological phenomena, including general carcinogenicity to rodents, mutagenicity to Salmonella, and estrogenic activity, we used the CDA. This method is based upon the premise that the mechanistic relationship between biological phenomena can be ascertained by assessing the prevalence of chemicals that give identical responses in the different assays being investigated (41). However, because data compilations between various toxicological end points are often incomplete, we developed the CDA that uses SAR models that had been characterized and validated (42) to predict the activity of a set of 10000 compounds.
This set of data was derived from chemical structure libraries and from a random sample of chemical structures from the National Cancer Institute Repository of potential cancer chemotherapeutic agents. The data set contains natural products and synthetic chemicals and includes representatives of all major classes of environmental and toxicological concerns (e.g., xenoestrogens, solvents, alkylating agents, aliphatic and aromatic amines and halides, nitroarenes, phenols, dioxins, pesticides, halogenated biphenyls, polycyclic hydrocarbons, and therapeutics). While no SAR model is perfectly predictive, when applied to a population of 10000 chemicals, it is expected that the overall predicted prevalence of active and inactive molecules will be a reflection of the true distribution. In essence, the prevalence of chemicals that are predicted by SAR models to possess two toxicological properties simultaneously is then quantified and compared to the expected prevalence. If the two effects are assumed to be independent of one another (i.e., the null hypothesis), the observed and expected values should be nearly equal. A significantly greater observed than expected prevalence indicates a similarity in mechanism among the toxicological effects that are being studied, while a significantly lower observed than expected prevalence suggests a possible antagonism between the phenomena under investigation. The CDA has been validated on a number of occasions (43, 44) and has also been used to develop mechanistic hypotheses (45-48).
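The FragSum and FragAve calculations described under Predicting Activity can be sketched as below, using the worked example from the text (one fragment found in 9 of 10 active compounds and one found in 3 of 3 inactive compounds). The function names and the `(n_active, n_inactive)` tuple representation are assumptions made for illustration; the cutoff passed to `classify` is likewise a placeholder, since the optimal cutoff derived from the drop-one validations is not stated here.

```python
def frag_sum(fragments):
    """Pooled probability of activity, weighted by contributing compounds.

    fragments: list of (n_active, n_inactive) pairs, one per model fragment
    found in the test compound. Returns None when no fragment is present,
    mirroring the rule that no prediction is made in that case.
    """
    if not fragments:
        return None
    n_active = sum(a for a, _ in fragments)
    n_total = sum(a + i for a, i in fragments)
    return n_active / n_total


def frag_ave(fragments):
    """Unweighted mean of the per-fragment probabilities of activity."""
    if not fragments:
        return None
    return sum(a / (a + i) for a, i in fragments) / len(fragments)


def classify(probability, cutoff):
    """Map a probability to active (1) or inactive (0); cutoff is supplied by the caller."""
    return 1 if probability >= cutoff else 0


# Worked example from the text: 9/10 actives and 0/3 actives.
example = [(9, 1), (0, 3)]
p_sum = frag_sum(example)   # 9/13
p_ave = frag_ave(example)   # (0.90 + 0.00) / 2
print(f"FragSum: {p_sum:.2f}, FragAve: {p_ave:.2f}")
```

The two methods differ only in weighting: FragSum lets fragments supported by many compounds dominate, whereas FragAve gives every fragment an equal vote.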
Results and Discussion
Overview of Predictive Performance of the Cat-SAR Mammary Carcinogen Models. The self-fit analysis of all models yielded concordance between experimental and predicted results ranging from 94 to 100%. Considering the LOO
validations with the FragSum method to calculate the probabilities of activity, the best rat MC-NC model had a concordance of 84%, a sensitivity of 79%, and a specificity of 89% (model 1, Table 1). This model made predictions on 139 of the 208 chemicals in the learning set. The best rat MC-NMC model had a concordance of 78%, a sensitivity of 82%, and a specificity of 74% (model 1, Table 2) and made predictions on 144 of the 208 chemicals in the learning set. MC-NC model 1 and MC-NMC model 1 were also cross-validated with LMO. The MC-NC model had a concordance of 80%, a sensitivity of 74%, and a specificity of 85%, and the MC-NMC model had a concordance of 78%, a sensitivity of 22%, and a specificity of 86% (Tables 1 and 2). Similar results were obtained for the LOO validations using the FragAve method (Tables 1 and 2). In these analyses, therefore, neither method was superior to the other, and the FragSum models were arbitrarily selected for further analyses. To better judge how well these two models performed in general, we can consider the “accuracy” or reproducibility of in vivo or in vitro toxicological tests themselves. In general, surrogate tests and carcinogen bioassays are not reproducible with 100% concordance. For instance, the NTP’s Salmonella mutagenicity database, which is derived from a standardized protocol, has been estimated to be about 85% reproducible in vitro (49). Moreover, “near-replicate” experiments in the CPDB also showed a degree of nonreproducibility (5, 50). For example, 11 out of 54 chemicals tested in similar experiments for their ability to induce cancer in mice were discordant (i.e., 80% reproducible), and 16 out of 104 chemicals tested for cancer in rats were discordant (i.e., 85% reproducible) (5). Moreover, Gottman et al., using the CPDB, found only 57% concordance among 121 compounds tested by the NTP/NCI for which literature values were also available (51). 
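The concordance, sensitivity, and specificity figures quoted in this section, and the cutoff used to turn a probability into an active or inactive call, can be illustrated with a minimal Python sketch. It is not the cat-SAR implementation: the function names and the example data are hypothetical, and choosing the cutoff that maximizes concordance is an assumption about what “best separated” means.

```python
def classification_metrics(experimental, predicted):
    """Return (concordance, sensitivity, specificity) for paired boolean calls.
    Both activity classes must be present in `experimental`."""
    pairs = list(zip(experimental, predicted))
    tp = sum(1 for e, p in pairs if e and p)
    tn = sum(1 for e, p in pairs if not e and not p)
    fp = sum(1 for e, p in pairs if not e and p)
    fn = sum(1 for e, p in pairs if e and not p)
    concordance = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn)   # fraction of actives predicted active
    specificity = tn / (tn + fp)   # fraction of inactives predicted inactive
    return concordance, sensitivity, specificity

def best_cutoff(experimental, probabilities):
    """Scan each observed probability as a candidate cutoff and keep the first
    one with the highest concordance (ASSUMED criterion). A chemical is called
    active when its probability is >= the cutoff."""
    best = None
    for cutoff in sorted(set(probabilities)):
        calls = [p >= cutoff for p in probabilities]
        conc, sens, spec = classification_metrics(experimental, calls)
        if best is None or conc > best[1]:
            best = (cutoff, conc, sens, spec)
    return best

# Hypothetical left-out predictions: experimental call and predicted probability.
experimental = [True, True, True, True, False, False, False, False]
probabilities = [0.95, 0.80, 0.70, 0.40, 0.60, 0.30, 0.20, 0.10]
print(best_cutoff(experimental, probabilities))  # (0.4, 0.875, 1.0, 0.75)
```

In a leave-one-out setting, each probability would come from a model built without the chemical being predicted, so the cutoff is tuned only on out-of-sample predictions.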
On the basis of these findings of variability in these data, the cat-SAR mammary gland carcinogenesis models appear to be as predictive as the bioassays are reproducible and match or exceed the concordance values previously obtained by us (8, 9) or Contrera et al. (10) for rodent carcinogenesis using MCASE and MDL-QSAR.
Comparison of Models. Using the difference between two proportions test, analysis of each set of three models derived from the random selection of noncarcinogenic (MC-NC) or nonmammary carcinogenic compounds (MC-NMC) indicated that the models had approximately the same concordance. For example, there was no significant difference between the two MC-NC models with the greatest difference between concordance values. MC-NC model 1 correctly predicted 116 compounds out of 139 predictions (82%), and MC-NC model 3 correctly predicted 102 compounds out of 134 predictions (77%) (p = 0.61). Likewise, MC-NMC model 1 correctly predicted 113 compounds out of 144 predictions (78%), and MC-NMC model 3 correctly predicted 97 compounds out of 129 predictions (75%) (p = 0.82). This indicates that the accurate predictions made by the models were not spurious events based on a fortuitous selection of “good” compounds and thus provides assurance that the models are based on a sound foundation and are not providing arbitrary predictions or mechanistic assertions.
1:2 Active to Inactive Model. Because there were many more inactive compounds than active ones available for both the MC-NC and the MC-NMC models, we assessed the applicability of increasing the information content of the model by increasing the number of inactive compounds. To do this, we combined the inactive sets of models 1-3 and then pared the set back to 208 inactives. These models therefore had a 1:2 active to inactive
ratio and were designated MC-NC and MC-NMC Models 123 (see Tables 1 and 2). With the same model parameters described above (i.e., fragments derived from at least three chemicals and composed of 90% or more active or inactive chemicals) and using the LOO validation for the FragSum models, the derived 1:2 active to inactive models had concordance values of 84 and 81% for the MC-NC and MC-NMC models, respectively, but were skewed with sensitivities of 68 and 65% and specificities of 92 and 90% (see models 123, Tables 1 and 2). Similar results were obtained from the FragAve method and with the LMO validations. To offset this imbalance, we changed the value used to select important inactive fragments from 0.90 to 0.99, which in essence required more inactive chemicals to contribute to inactive fragments than active chemicals were needed to contribute to active fragments. These 0.90-0.99 adjusted models had LOO concordances of 85 and 81%, respectively, for the MC-NC and MC-NMC models and were similarly skewed with sensitivities of 67 and 64% and specificities of 95 and 90% (see models 123, Tables 1 and 2). The results indicate that balanced learning sets yielded better models than unbalanced ones for this set of data.
Examples of Cat-SAR Predictions. The two examples that follow are based on results from the LOO cross-validation exercise; therefore, the individual chemicals did not contribute to their own prediction. 4-Aminodiphenyl and 2-amino-1-methyl-6-phenylimidazo[4,5-b]pyridine (PhIP) were selected to illustrate cat-SAR predictions of mammary carcinogens based on MC-NC model 1 and MC-NMC model 1. Briefly, 4-aminodiphenyl·HCl is a Salmonella mutagen with a rat TD50 of 0.98 mg/kg/day and is only tumorigenic to the mammary gland in female rats with no assay results for male rats. 
PhIP, as reported in the CPDB, is also a Salmonella mutagen with a rat TD50 of 1.69 mg/kg/day and tumorigenic to the hematopoietic system, large and small intestine, and prostate in male rats and the mammary gland and small intestine in female rats. 4-Aminodiphenyl was predicted to be active by the MC-NC model as a mammary carcinogen where the classification categories were breast carcinogen or noncarcinogen. This was based on the occurrence of four similar fragments grouped together as MC-NC set A (Figure 1). These fragments were all derived from nine carcinogens and zero noncarcinogenic compounds in the model’s learning set (Figure 1). 4-Aminodiphenyl was also predicted to be active in the MC-NMC model as a mammary carcinogen when the classification categories were mammary carcinogen or nonmammary carcinogen. This was based on the reoccurrence of MC-NC set A and another set of fragments, MC-NMC set B, which was identified only in the MC-NMC model. MC-NMC set B was derived from two groups of compounds; one group was composed of 28 carcinogenic compounds, 26 of which were mammary carcinogens, and the other group was composed of 22 carcinogenic compounds, 20 of which were mammary carcinogens (Figure 1). Set A (derived from both the MC-NC and MC-NMC models) covers the aromatic amine moiety of 4-aminodiphenyl and is a structural alert for DNA-reactive carcinogens as presented by Ashby et al. (52-54). We selected one of the shared fragments from set A (MC-NC fragment 1798 and MC-NMC fragment 2188) and searched our NTP Salmonella mutagenicity learning set where 17 compounds were identified that contained it, 15 of which were classified as mutagenic and two as nonmutagenic (i.e., 88.2% mutagens). We conducted a similar search using set B fragment MC-NMC fragment 1860. This set of fragments
was only identified by the MC-NMC model and not MC-NC. Inspection of this fragment shows that it covers the para-substituted biphenyl moiety and does not suggest DNA reactivity. A search of the NTP Salmonella mutagenicity learning set found that it occurred in 57 compounds, of which 27 were mutagenic and 30 nonmutagenic (i.e., 47.4% mutagens). PhIP was predicted to be active in the MC-NC model as a mammary carcinogen where the classification categories were breast carcinogen or noncarcinogen. This prediction was based on the occurrence of 13 fragments that can be grouped into MC-NC sets C, D, and E (Figure 2). Set C was derived from nine carcinogens and zero noncarcinogens, set D was derived from 15 carcinogens and one noncarcinogen, and set E was derived from three carcinogens and zero noncarcinogens (Figure 2). PhIP was also predicted to be a mammary carcinogen by the MC-NMC model where the classification categories were breast carcinogen or nonmammary carcinogen. This prediction was based on the same set of fragments identified by the MC-NC model (i.e., sets C, D, and E) and, as with 4-aminodiphenyl, on the addition of MC-NMC sets B, D, F, and G (i.e., those only found in the MC-NMC model). MC-NMC set B was described above for 4-aminodiphenyl. Set D was derived from 15 carcinogens, all of which were mammary carcinogens. MC-NMC set F was derived from three groups of carcinogens: one group from four mammary carcinogens, one group from 21 carcinogens and two noncarcinogens, and one group from 20 carcinogens and two noncarcinogens. MC-NMC set G contained one fragment derived from four carcinogens, all of which were mammary carcinogens (Figure 2). MC-NMC set B, as mentioned, is not associated with mutagenicity and suggests that the MC-NMC model identifies structural attributes of chemicals that may differentiate mammary carcinogens from other carcinogens. This is separable from the MC-NC model, which differentiates carcinogens from noncarcinogens. 
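The fragment selection rule used for these models (a fragment must be derived from at least three chemicals, with 90% or more of them in one activity category) can be checked against the counts quoted above for PhIP with a short Python sketch. The function name and its parameters are hypothetical, and treating each set’s quoted counts as the support of a single fragment is a simplifying assumption.

```python
def fragment_category(n_active, n_inactive, min_compounds=3,
                      active_threshold=0.90, inactive_threshold=0.90):
    """Classify a fragment as 'active', 'inactive', or None (not significant)
    from the numbers of active and inactive learning-set chemicals containing it."""
    total = n_active + n_inactive
    if total < min_compounds:
        return None                      # too few supporting chemicals
    if n_active / total >= active_threshold:
        return "active"
    if n_inactive / total >= inactive_threshold:
        return "inactive"
    return None                          # mixed support, no clear category

# Counts quoted above for the MC-NC fragment sets of PhIP:
# (carcinogens, noncarcinogens) in the learning set.
phip_sets = {"C": (9, 0), "D": (15, 1), "E": (3, 0)}
for name, (n_act, n_inact) in phip_sets.items():
    print(name, fragment_category(n_act, n_inact))   # each prints 'active'

# Raising only the inactive threshold (0.90 -> 0.99), as in the adjusted
# 1:2 models, makes it harder for a fragment to qualify as inactive.
print(fragment_category(1, 20))                            # 'inactive'
print(fragment_category(1, 20, inactive_threshold=0.99))   # None
```

Set D passes the rule because 15 of its 16 supporting chemicals (93.75%) are carcinogens, just above the 90% requirement.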
It is conceivable that MC-NMC set B is related to estrogen receptor binding and the MC-NMC model. For example, PhIP has been shown to possess structural attributes associated with mutagenicity and interaction with the estrogen receptor (55). Fragment set B is similar to biophores that we previously identified with MultiCASE as being related to estrogenic activity as measured in the E-SCREEN assay (56), and in a number of scenarios, estrogens have been demonstrated to be metabolized to genotoxic agents (57, 58). Comparison to Other Toxicological SAR Models. For CDA analysis, validated SAR models were used to predict the activity of 10000 compounds (described above). Briefly, the prevalence of chemicals that are predicted by separate SAR models to possess two toxicological properties simultaneously is then quantified and compared to the expected prevalence. If the two effects are assumed to be independent of one another (i.e., the null hypothesis), the observed and expected values should be nearly equal. A significantly greater observed than expected prevalence indicates a similarity in mechanism among the toxicological effects that are being studied, while a significantly lower observed than expected prevalence suggests a possible antagonism between the phenomena under investigation. Comparisons of SAR predictions between the rat mammary carcinogen models and the three other cat-SAR models were conducted to assess the likelihood that these models might be related and have common underlying biological mechanism(s) of action using the CDA method. For these analyses, we considered models based on CPDB rat carcinogens, CPDB mammary carcinogens, Salmonella mutagens, and estrogens as
Figure 1. Illustration of significant fragments used to predict the activity of 4-aminodiphenyl by the rat MC-NC and MC-NMC models.
determined by the E-SCREEN assay. For these analyses, the level of significance was set at p = 0.05. The first set of CDA analyses considers the relationship between the MC-NC and MC-NMC models and the model from their parent learning set (i.e., the CPDB rat model that includes all compounds tested for carcinogenicity in rats). There was a large and significant overlap between the rat CPDB model and the MC-NC model, with 2.22 times more chemicals than expected jointly predicted to be CPDB carcinogens and MC-NC carcinogens (analysis 1, p < 0.001, Table 3). Likewise, when comparing the CPDB model and the MC-NMC model, there were 1.27 times more chemicals than expected jointly predicted as carcinogens by these two models (analysis 2, p < 0.001, Table 3). This indicates, as expected, that to a high degree, the rat CPDB, MC-NC, and MC-NMC models are related. Analyses 3, 4, and 5 (Table 3) indicate a strong and also expected overlap between mutagenicity and carcinogenicity.
Figure 2. Illustration of significant fragments used to predict the activity of PhIP by the rat MC-NC and MC-NMC models.
There were significantly more chemicals predicted to be Salmonella mutagens and CPDB, MC-NC, and MC-NMC carcinogens than expected by chance (1.10, 1.55, and 1.23 times more than expected, respectively; analyses 3-5, p < 0.001, Table 3). Of interest is the fact that the rat MC-NMC model contained carcinogens in both its “active” (i.e., mammary carcinogens) and “inactive” (i.e., carcinogens at sites other than the mammary gland) categories. As such, the model’s learning set was populated with mutagenic compounds as both active and inactive chemicals (i.e., as both mammary carcinogens and nonmammary carcinogens). Analysis of the Salmonella mutagenicity of the active compounds in the rat MC-NMC model (and MC-NC) showed that of the 104 compounds, 74 had accompanying mutagenicity data from the CPDB and 60 of these compounds were mutagens (81%). This is consistent with findings by Gold and colleagues, who reported, for chemicals tested in both mice and rats, that 79% of mutagens were carcinogens and only 49% of nonmutagens were carcinogens (5). Considering the 104 nonmammary carcinogens used to make the “inactive” category of the MC-NMC model, of the 75 compounds with mutagenicity data, 42 (56%) were mutagenic. On the other hand, for the MC-NC model, of the 66 noncarcinogens included in the “inactive” category that had mutagenicity data, only 22 were mutagens (33%). This is close to the value obtained by Gold et al., where
they report 25% of noncarcinogens as mutagens (5). Thus, even though the MC-NMC model had a high prevalence of mutagens as both active and inactive compounds, structural features associated with mutagenicity were still identified (Figures 1 and 2). Analyses 6-9 considered the potential relationships between estrogenicity of xenobiotics as measured by the E-SCREEN
Table 3. Mechanistic Relationship Analyses between Rat Mammary Carcinogens, Other Carcinogens, Salmonella Mutagens, and Estrogens^a

analysis         observed   expected   ∆      ∆/expected   p value
rat CPDB vs
  1. MC-NC       1002       311        691    2.22
  2. MC-NMC      989        435        554    1.27