
Analysis of the ToxCast Chemical-Assay Space using the Comparative Toxicogenomics Database Bingjie Hu, Eric Gifford, Huijun Wang, Wendy Bailey, and Timothy Johnson Chem. Res. Toxicol., Just Accepted Manuscript • DOI: 10.1021/acs.chemrestox.5b00369 • Publication Date (Web): 27 Oct 2015 Downloaded from http://pubs.acs.org on October 29, 2015


Chemical Research in Toxicology is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


Analysis of the ToxCast Chemical-Assay Space using the Comparative Toxicogenomics Database

Bingjie Hu*,†, Eric Gifford†, Huijun Wang‡, Wendy Bailey§, Timothy Johnson§

† Structural Chemistry, Merck Research Laboratories, Merck & Co., West Point, PA 19486, USA.

‡ Structural Chemistry, Merck Research Laboratories, Merck & Co., Kenilworth, NJ 07033, USA.

§ Safety Assessment and Laboratory Animal Resources, Merck Research Laboratories, Merck & Co., West Point, PA 19486, USA.

*Corresponding author: Email: [email protected]


Table of Contents Graphic


Abstract

Many studies have attempted to predict in vivo hazards from ToxCast in vitro assay results, with the goal of using these predictions to prioritize compounds for conventional toxicity testing. Most of these studies rely on in vivo endpoints observed in preclinical species (e.g., mice and rats). Although preclinical animal studies provide valuable insights, there can be significant disconnects between these studies and safety concerns in humans. One way to address these concerns, for an admittedly more limited set of compounds, is to explore relationships between in vitro data from human cell lines and observations from human-related studies. The Comparative Toxicogenomics Database (CTD; http://ctdbase.org) is a rich source of data linking chemicals to human diseases/adverse events and pathways. In this study we explored the relationships between ToxCast chemicals, their ToxCast in vitro test results and their annotations of human disease/adverse event endpoints as captured in CTD. We mined these associations to identify potentially interesting, statistically significant correlations between in vitro assays and in vivo toxicity. To the best of our knowledge, this is one of the first studies analyzing the relationships between ToxCast in vitro assay results and CTD disease/adverse event endpoint annotations. The in vitro profiles identified in this analysis may prove useful for prioritizing compounds for toxicity testing, suggesting mechanisms of toxicity, and forecasting potential in vivo human drug-induced injury.


INTRODUCTION

The identification of potential human adverse health effects of new compounds in early drug discovery is a major challenge for the pharmaceutical industry. Historically, estimation of human safe exposure levels and effective dosing has relied heavily on animal studies, and concordance between animal and human toxicities has been observed in many toxicity categories.1 However, some important toxicity categories, such as liver toxicity, still need improved models to reliably predict human toxicity.1 Moreover, animal studies provide limited information about the mechanisms of action underlying toxic events. To this end, many studies in recent decades have investigated the potential of in vitro assays for predicting human toxicity and understanding mechanisms of action.2-5

The ToxCast project, initiated by the U.S. Environmental Protection Agency (EPA) in 2007, aims to forecast the potential toxicity of chemical entities based on in vitro high-throughput screening (HTS) assays and in silico models.6-8 The project has been divided into several phases. In Phase I, 309 chemicals were profiled against 467 biochemical or cell-based assays from 9 assay platforms.8 Most of the Phase I chemicals are food-use pesticide active ingredients, and more than 90% of them have extensive in vivo animal testing data stored in EPA’s Toxicity Reference Database (ToxRefDB).9 By combining the ToxCast in vitro results with the ToxRefDB in vivo toxicity data, promising predictive models for hepatocellular cancer,8, 10 reproductive toxicity11 and prenatal developmental toxicity12 have been built, demonstrating the potential application of in vitro assays to risk assessment. After the proof-of-concept studies in Phase I, ToxCast expanded the chemical space with other chemical families such as phthalates, antimicrobials, food additives and pharmaceuticals. Most notably, 111 failed drugs donated by six major pharmaceutical companies were also included in Phase II of ToxCast.13 However, unlike Phase I chemicals, only about 30% of Phase II chemicals have existing in vivo toxicity information in ToxRefDB, which limits the study of correlations between the Phase II in vitro assays and in vivo toxicity endpoints.


We have used the collective knowledge of chemical-related adverse events from the published literature, made available in the Comparative Toxicogenomics Database (CTD),14, 15 to complement the ToxRefDB content. CTD provides information about chemical-gene, chemical-disease and gene-disease interactions manually curated from the literature, and approximately 50% of the chemical-disease associations were curated from studies in humans. Recently, over 88,000 research articles enriched for drugs of therapeutic interest have also been curated and made available in CTD.16 We mapped the ToxCast chemical-assay data (Phase I, Phase II and the endocrine activity E1K set) to the CTD chemical-gene-disease information via common chemical structures and genes related to assay endpoints. The objective was to investigate whether the aggregated information from multiple sources can be used to identify statistically meaningful associations between in vitro assay endpoints and in vivo toxicity findings. In this study, we analyzed the ToxCast chemical-assay endpoint response profiles in the context of the CTD chemical-gene and chemical-disease associations and identified associations between ToxCast in vitro assays and the diseases/toxicities curated in CTD.

METHODS

ToxCast dataset. The ToxCast release of December 2013 was used in this analysis (http://epa.gov/ncct/toxcast/data.html).7 The dataset consists of 1,852 chemicals tested in three different phases: i) Phase I consisted primarily of food-use pesticide active ingredients with well-characterized toxicity profiles; ii) Phase II included known drugs and failed pharmaceuticals; and iii) the E1K set consisted of chemicals tested only on the estrogen, androgen, and thyroid-related ToxCast assays for potential endocrine disruption. The 1,852 chemicals were tested against 821 assay endpoints from seven assay platforms: ACEA, Apredica, Attagene, BioSeek, NCGC (NIH Chemical Genomics Center), Novascreen and Odyssey Thera.7 The platforms include both biochemical and cell-based assays across multiple human primary cells, human or rodent cell lines, and rat primary hepatocytes.8 The assay endpoints cover a total of 351 unique genes spanning a wide range of molecular pathways.7 The AC50 (half-maximal activity


concentration) value for each chemical-assay endpoint pair was provided, and compounds with no effect in a specific assay endpoint were assigned an AC50 value of 10^6 as a placeholder.8

CTD dataset. The Comparative Toxicogenomics Database (http://ctdbase.org) provides chemical-gene, gene-disease and chemical-disease associations manually curated from the literature.15 The data are updated monthly, and the October 2014 release was used in this analysis. A total of 13,810 chemicals, 6,341 diseases and 36,798 genes were curated from 110,749 references in this release. There are 1,027,595 curated chemical-gene/protein interactions between 10,988 chemicals and 35,721 genes/proteins, and 192,669 curated chemical-disease associations formed by 8,610 unique chemicals and 3,049 unique diseases. The chemical-gene/protein interactions include both direct interactions (e.g., “chemical X binds to protein Y”) and indirect interactions (e.g., “chemical X results in increased phosphorylation of protein Y”). CTD uses the hierarchical “merged disease vocabulary” (MEDIC) for the disease terms.17 A chemical can have two types of disease associations: a ‘mechanistic/marker’ association (M-type) and/or a ‘therapeutic’ association (T-type). About 50% of the chemical-disease associations were curated from studies in humans. In this study, only the ‘mechanistic/marker’ chemical-disease associations (M-type) were used to analyze the associations between in vitro assays and toxic effects.

Mapping between ToxCast and CTD. The chemicals in ToxCast were annotated with CAS Registry Number (CASRN), InChIKey, chemical name and synonyms. Using CASRN, 1,147 ToxCast chemicals were found in CTD. For the ToxCast chemicals that were not found by CASRN in CTD, we mapped their InChIKey to a MeSH name through the PubChem service.18 Searching for compounds in CTD using the MeSH name returned another 207 ToxCast chemicals.
For the ToxCast chemicals that were not found by either CASRN or MeSH name, a final search using the ToxCast chemical names and synonyms was performed, and 84 additional chemicals were retrieved. In summary, 1438 ToxCast chemicals, corresponding to 1423 unique chemical structures, were found to be curated in CTD. For these 1438 ToxCast chemicals, 16,588 chemical-disease associations and 226,893 chemical-gene associations were retrieved for subsequent analysis.
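The tiered identifier matching described above can be sketched as follows. This is a minimal Python illustration, not the authors' code: the record fields, the lookup dictionaries, the toy identifiers and the case-insensitive name comparison are all assumptions, and the InChIKey-to-MeSH step (performed through the PubChem service in the study) is represented here by a precomputed dictionary.

```python
from collections import Counter


def map_to_ctd(toxcast_records, ctd_by_casrn, ctd_by_name, inchikey_to_mesh):
    """Match each ToxCast record to a CTD chemical, one tier at a time.

    Tier order follows the text: CASRN, then MeSH name (via InChIKey),
    then ToxCast chemical name and synonyms. All inputs are hypothetical
    in-memory lookups; `ctd_by_name` is keyed by lower-cased name.
    """
    matches = {}
    for rec in toxcast_records:
        hit = None
        # Tier 1: direct CAS Registry Number match.
        if rec["casrn"] in ctd_by_casrn:
            hit = (ctd_by_casrn[rec["casrn"]], "casrn")
        # Tier 2: InChIKey -> MeSH name -> CTD (only if tier 1 failed).
        if hit is None:
            mesh_name = inchikey_to_mesh.get(rec["inchikey"])
            if mesh_name and mesh_name.lower() in ctd_by_name:
                hit = (ctd_by_name[mesh_name.lower()], "mesh_name")
        # Tier 3: chemical name and synonyms (only if tiers 1 and 2 failed).
        if hit is None:
            for name in [rec["name"], *rec["synonyms"]]:
                if name.lower() in ctd_by_name:
                    hit = (ctd_by_name[name.lower()], "name_or_synonym")
                    break
        if hit is not None:
            matches[rec["id"]] = hit
    return matches


def tier_counts(matches):
    """Count how many chemicals were recovered at each tier."""
    return Counter(tier for _, tier in matches.values())


# Toy data with made-up identifiers (not real CASRNs or CTD IDs).
records = [
    {"id": "TX1", "casrn": "000-00-1", "inchikey": "KEY1", "name": "alpha", "synonyms": []},
    {"id": "TX2", "casrn": "000-00-2", "inchikey": "KEY2", "name": "beta", "synonyms": []},
    {"id": "TX3", "casrn": "000-00-3", "inchikey": "KEY3", "name": "gamma", "synonyms": ["Gamma salt"]},
    {"id": "TX4", "casrn": "000-00-4", "inchikey": "KEY4", "name": "delta", "synonyms": []},
]
ctd_by_casrn = {"000-00-1": "CTD_A"}
ctd_by_name = {"beta acid": "CTD_B", "gamma salt": "CTD_C"}
inchikey_to_mesh = {"KEY2": "Beta Acid"}

matches = map_to_ctd(records, ctd_by_casrn, ctd_by_name, inchikey_to_mesh)
```

Recording the tier alongside each match makes it easy to reproduce a per-tier tally of the kind reported in the text (1,147 by CASRN, 207 by MeSH name and 84 by name or synonym, 1438 in total).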


Defining toxicity categories. The parent terms “Cardiovascular Diseases”, “Liver Diseases”, “Nervous System Diseases” and “Kidney Diseases” were used to retrieve all the drug-induced events curated in CTD for the four physiological systems central to drug safety assessment.16 The chemical-disease associations with a ‘marker/mechanism’ relationship (M-type) were grouped into four toxicity categories (CardioTox, HepatoTox, NeuroTox and RenalTox) and used in the subsequent statistical analysis of ToxCast assay-toxic effect associations.

Statistical analysis of ToxCast in vitro assays and toxic effects associations. For a given ToxCast in vitro assay endpoint, a chemical with a reported AC50 was considered a ‘Hit’, and a chemical that produced no dose-response relationship in the assay endpoint (ToxCast tag AC50 of 10^6) was labeled a ‘non-Hit’. Based on the curated chemical-disease associations from CTD, each chemical was classified as either associated with a given toxicity category or lacking that toxicity annotation. Chemicals with no chemical-disease curations in CTD were classified as non-toxicity-related chemicals. The 1423 unique ToxCast chemicals were then placed into a 2x2 contingency table based on their activity in each ToxCast in vitro assay endpoint and their association with each of the four toxicity categories defined by the above-mentioned CTD disease terms (CardioTox, HepatoTox, NeuroTox and RenalTox) (Figure 1). Fisher’s exact test was performed to evaluate the significance of each assay endpoint-toxicity annotation pair. A two-sided p-value < 0.05 was considered significant. All calculations were performed using R (version 3.1.1).19 To account for the strength of the evidence behind each assay-toxic effect association, a lower bound of 3 was set on the number of chemicals that both ‘hit’ the given assay endpoint and are associated with the given toxic effect (a ≥ 3).20 The association between each assay endpoint and each toxicity annotation was evaluated by the Proportional Reporting Ratio (PRR), calculated from the 2x2 contingency table:

$$\mathrm{PRR} = \frac{a/(a+b)}{c/(c+d)}$$
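The contingency-table analysis can be illustrated with a short, self-contained Python sketch. It is not the authors' R code. The cell labels are an assumption based on the usual PRR convention (a: hit and toxic, b: hit and not toxic, c: non-hit and toxic, d: non-hit and not toxic); the text itself only defines a. The two-sided Fisher p-value sums the probabilities of all tables that are no more likely than the observed one, and the function names and toy inputs are hypothetical.

```python
from math import comb

INACTIVE_AC50 = 1e6  # placeholder AC50 that marks a 'non-Hit'


def contingency(ac50_by_chemical, toxic_chemicals):
    """Return (a, b, c, d) for one assay endpoint and one toxicity category.

    Assumed layout: a = hit & toxic, b = hit & not toxic,
    c = non-hit & toxic, d = non-hit & not toxic.
    """
    a = b = c = d = 0
    for chem, ac50 in ac50_by_chemical.items():
        hit = ac50 < INACTIVE_AC50
        toxic = chem in toxic_chemicals
        if hit and toxic:
            a += 1
        elif hit:
            b += 1
        elif toxic:
            c += 1
        else:
            d += 1
    return a, b, c, d


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for a 2x2 table."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x chemicals in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def prr(a, b, c, d):
    """Proportional Reporting Ratio: [a/(a+b)] / [c/(c+d)]."""
    if a + b == 0 or c + d == 0:
        return float("nan")
    if c == 0:
        return float("inf")
    return (a / (a + b)) / (c / (c + d))


def is_strong_positive(a, b, c, d, min_prr=2.0, min_a=3):
    """Strong positive association as defined in the text: PRR >= 2 and a >= 3."""
    return a >= min_a and prr(a, b, c, d) >= min_prr
```

For a toy table with a = 3, b = 1, c = 1, d = 3, the PRR is (3/4)/(1/4) = 3, which passes the PRR ≥ 2 and a ≥ 3 thresholds, while the two-sided Fisher p-value is 34/70 ≈ 0.49, which is not significant. The sketch reports the p-value separately from the PRR filter because the text does not state whether the two criteria were combined.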


A PRR > 1 indicates a positive association, i.e., chemicals that ‘hit’ a given assay endpoint are more likely to be associated with a given toxic effect than chemicals that do not. In this study we defined associations with PRR ≥ 2 as indicative of strong positive associations.

RESULTS

Gene associations of ToxCast chemicals

The search of the 1852 ToxCast chemicals in CTD led to an overlap of 1438 chemicals (Figure 2). A total of 190,239 unique chemical-gene/protein interactions between 905 ToxCast chemicals and 25,439 genes/proteins were found in CTD. The number of unique chemicals associated with each gene/protein follows a “long-tailed”, power-law-like distribution; that is, a small set of genes with many chemical associations accounts for a disproportionate share of the interactions. In fact, about 25% of the genes/proteins (6468) interact with only one chemical, and more than half of the genes/proteins (14381) interact with no more than five chemicals. At the other end of the spectrum, merely 0.8% of the genes/proteins (204) interact with more than 50 chemicals but contribute roughly 10% of the chemical-gene interactions (Figure 3). The top 16 genes/proteins interact with more than 150 ToxCast chemicals (Table 1). Not surprisingly, the nuclear receptors estrogen receptor 1 (ESR1), androgen receptor (AR) and pregnane X receptor (NR1I2) had the highest number of interactions with ToxCast chemicals. This can be explained by the promiscuous nature of these receptors, which are well known to interact with structurally diverse exogenous chemicals.21, 22 Three cytochrome P450 enzymes, CYP1A1, CYP1A2 and CYP3A4, which are highly inducible in animals and humans23, 24 and are also well-known promiscuous targets, were also at the top of the list. TNF, CASP3, BCL2, BAX and P53, which are involved in critical cell signaling pathways, also ranked high on the list. The majority of the ToxCast assay endpoints are annotated with their intended molecular target.
Overall, 351 unique genes/proteins were measured by a total of 722 ToxCast assay endpoints. To compare with the chemical-gene/protein interactions curated in CTD, we computed the number of


ToxCast chemicals that are ‘active’ against each ToxCast target: if a chemical is a ‘Hit’ in any of the assay endpoints that measure a given gene/protein, it is defined as ‘active’ against that gene/protein. Note that, to make the calculation comparable with the chemical-gene/protein interactions in CTD, only the 905 ToxCast chemicals that have CTD gene annotations were used in this computation. The number of ToxCast chemicals active against each target is also listed in Table 1. In general, the overlap between the chemical-gene/protein associations from CTD and ToxCast is very limited. Because not all of the ToxCast results are curated in CTD,25 the number of active chemicals measured by ToxCast assays can be significantly larger than the number of chemical-gene/protein interactions curated in CTD, as in the case of ESR1 and NR1I2. On the other hand, CTD contains information from over 110,000 references, which adds chemical-gene/protein interactions that were not detected by ToxCast assays. Notable examples are CASP3 and MAPK1, for which more than 150 ToxCast chemical-gene/protein interactions were curated in CTD, whereas only three chemicals are active in any of the ToxCast assays measuring these two targets. Overall, there are 28 ToxCast genes/proteins that have at least 50 more chemical-gene/protein interactions curated in CTD than measured by the ToxCast assays (Supplementary Table S1). Finally, there are also genes/proteins that are highly associated with ToxCast chemicals but are not measured by any of the ToxCast in vitro assay endpoints. For example, the catalase (CAT) gene encodes a key antioxidant enzyme in the body’s defense against oxidative stress. It has been found to be involved in drug-induced liver injury,26 kidney failure27 and other toxic events.28 However, none of the ToxCast in vitro assays directly measures interactions between the chemicals and CAT.
In fact, among the 48 genes/proteins that have interactions with more than 100 ToxCast chemicals, 21 are not measured by ToxCast in vitro assays. It would be interesting to explore whether some of these unmeasured genes are worth including in future ToxCast campaigns.
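The gene-level comparison described in this section can be sketched in Python as follows. This is only an illustration under stated assumptions: the data structures, function names and toy identifiers are hypothetical, a ‘Hit’ is taken to be any AC50 below the 10^6 placeholder, and a chemical counts as ‘active’ against a gene if it hits at least one endpoint annotated with that gene.

```python
from collections import defaultdict

INACTIVE_AC50 = 1e6  # placeholder AC50 that marks a 'non-Hit'


def active_chemicals_by_gene(ac50, endpoint_gene, chemicals_in_scope):
    """Collapse endpoint-level hits to gene-level activity.

    ac50: {(chemical, endpoint): AC50 value}
    endpoint_gene: {endpoint: gene symbol} for target-annotated endpoints
    chemicals_in_scope: chemicals to include (e.g. those with CTD gene annotations)
    """
    active = defaultdict(set)
    for (chem, endpoint), value in ac50.items():
        gene = endpoint_gene.get(endpoint)
        if gene is None or chem not in chemicals_in_scope:
            continue
        if value < INACTIVE_AC50:  # a hit in ANY endpoint for the gene suffices
            active[gene].add(chem)
    return dict(active)


def compare_with_ctd(active, ctd_pairs):
    """Per gene: chemicals active in ToxCast, curated in CTD, and found in both."""
    ctd_by_gene = defaultdict(set)
    for chem, gene in ctd_pairs:
        ctd_by_gene[gene].add(chem)
    summary = {}
    for gene in set(active) | set(ctd_by_gene):
        tox = active.get(gene, set())
        ctd = ctd_by_gene.get(gene, set())
        summary[gene] = {"toxcast": len(tox), "ctd": len(ctd), "shared": len(tox & ctd)}
    return summary


# Toy inputs with made-up chemical and endpoint identifiers.
ac50 = {
    ("c1", "e1"): 5.0, ("c1", "e2"): 1e6,
    ("c2", "e1"): 1e6, ("c2", "e2"): 1e6,
    ("c3", "e3"): 2.0,
    ("c4", "e1"): 1.0,  # c4 is outside the scope set and is ignored
}
endpoint_gene = {"e1": "ESR1", "e2": "ESR1", "e3": "AR"}
scope = {"c1", "c2", "c3"}
ctd_pairs = {("c1", "ESR1"), ("c2", "ESR1"), ("c2", "CAT")}

active = active_chemicals_by_gene(ac50, endpoint_gene, scope)
summary = compare_with_ctd(active, ctd_pairs)
```

Genes that appear only on the CTD side of the summary (CAT in the toy data) correspond to the "curated but not measured" case discussed above, and large gaps between the `toxcast` and `ctd` counts correspond to cases such as ESR1 or CASP3.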

Disease associations of ToxCast chemicals


A total of 16,588 unique chemical-disease associations were formed among 684 ToxCast chemicals and 2,128 diseases. Like the chemical-gene/protein interactions, the chemical-disease associations have a long-tailed distribution: a few diseases are associated with a large share of the chemicals and contribute significantly to the total number of associations. More specifically, among the 684 ToxCast chemicals with curated disease associations, more than 20% were found to have toxic and/or therapeutic effect associations with Drug-Induced Liver Injury (DILI), necrosis and seizures (Figure 4). In fact, DILI, necrosis and seizures are among the most prevalent disease terms in CTD. Each of these disease terms has more than 5,000 curated chemical associations from over 1000 references.

The hierarchical disease vocabulary MEDIC in CTD allows data annotated at the child level to be rolled up to the parent level. In a previous analysis, the parent terms “Cardiovascular Diseases”, “Nervous System Diseases”, “Kidney Diseases” and “Liver Diseases” were used to retrieve all the drug-induced events curated in CTD for four physiological systems central to drug safety assessment.16 The chemical-disease associations with a ‘marker/mechanism’ relationship (M-type) were then grouped into Cardio, Neuro, Renal or Hepato toxicity categories. Using the same parent terms, we assigned the ToxCast chemicals to the four toxicity categories based on their curated M-type chemical-disease associations (Table 2). In total, 429 unique ToxCast chemicals are associated with 639 unique diseases from the four toxicity categories. The top 10 diseases for each toxicity category, ranked by the number of associated ToxCast chemicals, are listed in Figure 5. In CardioTox, the diseases with the highest number of ToxCast chemical associations are abnormalities of blood pressure (hypertension and hypotension).
For HepatoTox, DILI is again the most prominent disease associated with the ToxCast chemicals. Seizures and kidney diseases have the highest number of chemical associations for NeuroTox and RenalTox, respectively. These findings are consistent with the overall number of chemical-disease associations curated in CTD for each toxicity category.16 It is worth noting that some diseases can belong to two toxicity categories (Figure 6A). For example, cerebral hemorrhage and stroke belong to both cardiovascular diseases and nervous system diseases.
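The roll-up of child disease terms to the four parent terms can be sketched as a walk up a parent map. The Python below is a minimal illustration: the hierarchy is a toy example, not the actual MEDIC tree, and the function names are hypothetical. Because a term may have more than one parent, a single disease (or chemical) can end up in several toxicity categories.

```python
# Parent terms named in the text, mapped to the four toxicity categories.
CATEGORY_ROOTS = {
    "Cardiovascular Diseases": "CardioTox",
    "Liver Diseases": "HepatoTox",
    "Nervous System Diseases": "NeuroTox",
    "Kidney Diseases": "RenalTox",
}


def categories_for(disease, parents):
    """Return every toxicity category reachable from `disease` via parent links."""
    found, stack, seen = set(), [disease], set()
    while stack:
        term = stack.pop()
        if term in seen:
            continue
        seen.add(term)
        if term in CATEGORY_ROOTS:
            found.add(CATEGORY_ROOTS[term])
        stack.extend(parents.get(term, ()))
    return found


def assign_chemicals(m_type_pairs, parents):
    """Map each chemical to the union of categories of its M-type diseases."""
    by_chemical = {}
    for chem, disease in m_type_pairs:
        by_chemical.setdefault(chem, set()).update(categories_for(disease, parents))
    return by_chemical


# Toy hierarchy for illustration only (not the actual MEDIC vocabulary).
parents = {
    "Stroke": ["Cerebrovascular Disorders"],
    "Cerebrovascular Disorders": ["Cardiovascular Diseases", "Nervous System Diseases"],
    "Seizures": ["Nervous System Diseases"],
    "Hypertension": ["Cardiovascular Diseases"],
    "Drug-Induced Liver Injury": ["Liver Diseases"],
}
pairs = [
    ("chemA", "Stroke"),
    ("chemA", "Drug-Induced Liver Injury"),
    ("chemB", "Seizures"),
    ("chemC", "Dermatitis"),  # outside the four categories
]
assignments = assign_chemicals(pairs, parents)
```

In the toy data, "Stroke" maps to both CardioTox and NeuroTox, mirroring the overlap noted in Figure 6A, and "chemA" accumulates three categories across its diseases.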


Similarly, many of the chemicals are associated with more than one toxicity category (Figure 6B). In fact, 83 of the ToxCast chemicals, such as nicotine, aspirin, and warfarin, were associated with all four toxicity categories. Interestingly, 69 of the 83 chemicals annotated with all four toxicity categories are pharmaceuticals, a much higher ratio than that of pharmaceuticals in the overall ToxCast chemical set (115/429) (Figure 7). Owing to the keen interest in understanding the toxicity of pharmaceuticals, a greater number of studies have been done on a few well-characterized drugs, such as tamoxifen and warfarin. These pharmaceuticals are designed to be biologically active and, as such, are potentially more enriched for disease-related activities than chemicals of other use categories in ToxCast. It is therefore not surprising that pharmaceuticals have a higher tendency to be associated with multiple toxicity categories.

Associations between in vitro assay and toxicity category

A major goal of the ToxCast project is to correlate the in vitro activity of chemicals with their in vivo toxic effects.6 To this end, many studies10-12, 29 have been performed to assess the associations between the ToxCast assay results and animal studies for over 400 chemicals captured in the EPA's Toxicity Reference Database (ToxRefDB).9, 30, 31 The curated chemical-disease associations from CTD provide an independent and complementary data source for further exploring associations between ToxCast in vitro assay results and in vivo toxicities of the chemicals. We evaluated the associations between each of the 821 assay endpoints and the four previously defined toxicity categories using the Proportional Reporting Ratio (PRR), a widely used measure for identifying signals of adverse drug reactions.20 A PRR > 1 indicates a positive association, i.e., chemicals that ‘hit’ a given assay endpoint are more likely to be associated with a given toxicity category than chemicals that do not.
In this study we defined associations with PRR ≥ 2 and a ≥ 3 as indicative of strong positive associations (see Methods section). A total of 88 assay endpoints from six assay platforms were identified as having strong positive associations with the four toxicity categories (Table 3). As discussed above, a given ToxCast chemical can have multiple toxicity category annotations, and a given disease term can also be placed into multiple toxicity categories. Not surprisingly, assay endpoints can also have strong positive associations with one or several toxicity categories (Figure 8). In fact, five of the assay endpoints were found to have strong positive associations with all four toxicity categories. Four of the five assay endpoints measure the binding or regulation of the androgen receptor (AR) or the glucocorticoid receptor (NR3C1), indicating again the importance of these nuclear receptors (NRs) in multiple biological functions. More specifically, the associations between AR and the four toxicity categories are inferred primarily from AR agonists, such as norethindrone and testosterone propionate, whereas the associations of NR3C1 are drawn from NR3C1 antagonists, such as corticosterone. It is so well recognized that agonism or antagonism of AR and NR3C1 can cause undesirable off-target effects, such as liver complications and arrhythmia, that these receptors are routinely included in in vitro safety pharmacological profiling panels for candidate drugs.32

Approximately 70% of the 88 assay endpoints with strong positive associations to any of the toxicity categories are from Novascreen. These assays measure ligand binding or enzyme inhibition for assay targets related to pathways of toxicity, cell signaling, and xenobiotic metabolism, such as kinases, nuclear receptors, GPCRs, CYPs and other enzymes. Another assay platform with a relatively high number of strong positive associations between assay endpoints and toxicity categories is the NCGC platform. NCGC contains cell-based assays measuring binding constants and enzyme inhibition of nuclear receptors. In contrast, although BioSeek contributes a total of 202 assay endpoints, only nine BioSeek assay endpoints were found to have strong associations with any of the toxicity categories.
Low enrichment of significantly associated assay endpoints was also observed for Apredica and Attagene. Because the identified assay-toxicity associations are based on univariate association analysis without considering the dependencies between the assay endpoints, it is more likely to identify significantly associated assay endpoints from platforms that contain independent measurements, such as Novascreen and NCGC. In a previous study using ToxCast in vitro assays to predict in vivo rat reproductive toxicity,


36 significantly associated assay endpoints were identified using univariate association analysis. Similar to our findings, 26 of the 36 assay endpoints are from the Novascreen and NCGC assay platforms.11 In contrast, for assay platforms such as Apredica, Attagene and BioSeek, where multiple related readouts are measured simultaneously from the same cell line or primary cell, correlation methods or predictive models based on multivariate analysis may be more suitable for establishing the relationships between the assay endpoints and the in vivo adverse effects. However, such multivariate analysis would require either well-characterized reference compounds with known mechanisms of action or compounds with well-established causal relationships to in vivo toxicities. For instance, by comparing the correlation of the BioSeek activity profiles between the ToxCast chemicals and reference compounds with known modes of toxicity, Houck et al. successfully grouped the ToxCast chemicals with modes of toxicity similar to those of the reference compounds.33 In our analysis, the curated chemical-disease associations provided in CTD do not necessarily establish causal relationships between the chemicals and the diseases. Therefore, only univariate association analysis was performed.

For ToxCast assays that measure enzyme inhibition, ligand binding or gene regulation, the target genes and gene family are provided. In total, the 88 ToxCast assay endpoints measured 63 genes from 16 different target families. The distribution of the target genes across gene families is shown in Figure 9 for each toxicity category. For cardio-toxicity, a total of 27 genes are measured by the assays, and more than 40% of them belong to the GPCR family (Table 4, Figure 9).
These include five different adrenergic receptors (α1A, α1B, α2A, α2B and β2) and two muscarinic acetylcholine receptors (M1 and M4), which are particularly important in the homeostatic regulation of the cardiovascular system.34, 35 For instance, the adrenergic receptor α1B (ADRA1B) has been found to be involved in the progression of cardiac hypertrophy36 and plays a key role in mediating the blood pressure response and cardiovascular structural adaptation to chronic adrenergic stimulation in mouse models.37 Besides regulating the cardiovascular system, GPCRs play important roles in a wide array of functions in the human body and are involved in various human diseases.38 Therefore, we also observed strong associations between GPCR


assays with other toxicity annotations in our analysis (Figure 9). In fact, the important role of GPCRs in regulating various functions in the human body is so well recognized that some of the better-characterized GPCRs are routinely tested in in vitro pharmacological assays to identify undesirable off-target activities for candidate drug molecules in the early phase of drug discovery.32

Blockade of the ion channel KCNH2 (hERG) can elicit potentially fatal cardiac arrhythmias following QT prolongation39 and therefore is required by regulatory authorities to be tested in in vitro pharmacological assays for potential off-target effects.32 The Novascreen assay endpoint measuring hERG binding was identified as having a strong association with CardioTox, indicating again the sensitivity of our method in identifying important associations between in vitro assays and in vivo toxicities. It is also worth mentioning that some ion channels with well-characterized cardio-toxicity associations, such as the voltage-gated calcium channel subunit α1C (Cav1.2),40 are not included in the ToxCast assay panels. Interestingly, however, our analysis identified the voltage-gated calcium channel subunit α1A (CACNA1A or Cav2.1), a close family member of Cav1.2, as having a strong association with CardioTox.

A large number of the assay endpoints that have strong associations with hepato-toxicity (Table 5) and neuro-toxicity (Table 6) target kinase binding or activation. The chemicals that lead to the associations between these kinases and hepato-toxicity and neuro-toxicity include mancozeb, a widely used fungicide that has been identified to cause hepatocarcinomas in rats,41 and tetracycline, which can induce fatty liver disease and may result in severe hepatic dysfunction when administered intravenously in high doses.42 However, the specificity of these compounds towards kinase inhibition is relatively low, as most of these compounds are active in more than 20% of the ToxCast kinase assays.
Due to the low hit rate against these kinase assays (SGK1, CSNK1A1, MAPKAPK5, MARK1 and PAK4), the statistical analysis suggests a stronger association between hepato-toxicity and these kinases than for the kinase assays with a higher hit rate. On the other hand, some of the compounds show very specific binding towards particular kinases and therefore strengthen the association between the kinase assays and the toxicities. For instance, the environmental contaminants perfluorooctanoic acid (PFOA) and


perfluorononanoic acid (PFNA) are very specific towards TEK inhibition and lead to the association between TEK assays and hepato-toxicity. The pesticide resmethrin and the teratogen cyclopamine have high specificity towards FGFR1, and therefore a strong association can be drawn between the FGFR1 assay and neuro-toxicity. Besides the compound-inferred gene-disease associations, there is also direct evidence associating these kinases with hepato- or neuro-toxicity. For instance, TEK has been found to be involved in rat hepatocarcinogenesis43 and FGFR1 was reported to have strong associations with major depression, schizophrenia and bipolar disorder.44

Protein kinases are known to regulate a majority of cellular activities, especially in signaling pathways. It is estimated that up to 30% of human proteins may be modified by protein kinases.45 Therefore, non-specific interactions with kinases can lead to dangerous side effects and other liabilities.46,47 However, due to the relative lack of understanding regarding the implications of modulating specific kinase activity, it is a challenge to define suitable in vitro kinase profiling assays to predict clinical adverse effects.32 The associations we found between the ToxCast kinase assays and the toxicity categories may be useful in generating hypotheses of off-target activity for potential biological follow-up.

Cytochrome P-450s (CYPs) are involved in the metabolism of drugs, chemicals and endogenous substrates. CYP-mediated activation of drugs to toxic metabolites can induce hepatotoxicity.48 In our analysis, three assay endpoints measuring CYP inhibition were found to have strong associations with HepatoTox. Unexpectedly, assays that measure some of the most common and promiscuous CYPs, such as CYP3As and CYP2C9 (which together account for over 60% of P450-mediated human drug metabolism49), were not selected by our analysis. A retrospective inspection found that Novascreen assays measuring CYP3A4 and CYP2C9 inhibition had PRRs of 1.47 and 1.34, respectively, indicating weak positive associations between these assay endpoints and HepatoTox. The fact that a relatively high proportion of the non-hits in these CYP assays are HepatoTox-related chemicals contributes to the weak positive associations between the assays and HepatoTox. This suggests that most of the HepatoTox-related ToxCast chemicals induce liver toxicity through mechanisms other than CYP3As or CYP2C9


inhibition. Considering that the majority of ToxCast chemicals are environmental compounds (>85%), which occupy different chemical and biological spaces from pharmaceuticals,50 it would be interesting to investigate whether most of the environmental compounds elicit hepato-toxicity through mechanisms of action different from those of pharmaceuticals.

Seven assay endpoints measuring phosphatase inhibition were found to have strong positive associations with NeuroTox. Six of the seven assays measure inhibition of protein tyrosine phosphatases (PTPs). These enzymes are key regulatory components in signal transduction pathways such as the MAP kinase pathway,51 and play an important role in the development and function of the central nervous system.52 Our finding that assays measuring inhibition of protein tyrosine kinases (CSF1R, FGFR1, FLT1, INSR, KDR, MET, TEK) and PTPs have strong associations with NeuroTox again suggests the importance of tyrosine phosphorylation in the regulation of neuronal function.53

Finally, for RenalTox, we observed strong associations with assays measuring regulation of gene expression for cytokines and cell adhesion molecules, all of which are from the Bioseek assay platform. In particular, six of the assay endpoints are from human primary vascular smooth muscle cells (SM3C) and measure genes involved in cytokine-cytokine receptor interaction (Table 7). These cytokines range from pro-inflammatory IL-6 to T-cell and monocyte chemoattractant types (CXCL9), which would be increased with most types of significant tissue damage. Therefore, it is not obvious why these assay endpoints were identified to have strong associations only with renal toxicity.
However, as discussed earlier, Bioseek systems are designed to model complex human disease and tissue biology by stimulating different types of human primary cells such that multiple disease- and tissue-relevant signaling pathways are simultaneously active.33 The endpoints measured in the same primary cell are essentially dependent variables and therefore are expected to be identified concomitantly. Correlation methods or predictive models based on multivariate analysis might be able to elucidate more relationships between complex assay platforms (such as Bioseek) and the four toxicity categories. Nevertheless, the fact that we are able


to identify endpoints from the same Bioseek primary cell suggests that the univariate association analysis is also useful in identifying signals for complex assay platforms.
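The selection rule applied throughout these results (PRR ≥ 2, a ≥ 3 and p-value < 0.05; see the footnote to Table 3) can be illustrated with a minimal sketch. It assumes a 2×2 table in which a and b count assay hits with and without the toxicity annotation, and c and d count non-hits with and without it. The one-sided Fisher exact test used here is an assumption for illustration only, since this section does not state which significance test was applied, and all function names and counts are hypothetical.

```python
from math import comb


def prr(a, b, c, d):
    """Proportional reporting ratio for a 2x2 table (assumed layout).

    a: assay hits annotated with the toxicity
    b: assay hits without the toxicity annotation
    c: non-hits annotated with the toxicity
    d: non-hits without the toxicity annotation
    """
    if a + b == 0 or c == 0:
        return float("nan")  # ratio undefined without hits or without toxic non-hits
    return (a / (a + b)) / (c / (c + d))


def fisher_one_sided(a, b, c, d):
    """P(X >= a) under the hypergeometric null with fixed margins.

    Assumption: a one-sided Fisher exact test stands in for whatever
    significance test was actually used in the study.
    """
    n_hits, n_tox, n = a + b, a + c, a + b + c + d
    upper = min(n_hits, n_tox)
    tail = sum(
        comb(n_tox, k) * comb(n - n_tox, n_hits - k) for k in range(a, upper + 1)
    )
    return tail / comb(n, n_hits)


def is_strong_association(a, b, c, d, min_prr=2.0, min_a=3, alpha=0.05):
    """Apply the three thresholds quoted in the Table 3 footnote."""
    return (
        a >= min_a
        and prr(a, b, c, d) >= min_prr
        and fisher_one_sided(a, b, c, d) < alpha
    )


# Hypothetical counts, not taken from the paper.
print(prr(12, 8, 30, 150), is_strong_association(12, 8, 30, 150))
```

Under this layout, a large share of toxicity-annotated chemicals among the non-hits (a large c relative to d) raises the denominator and lowers the PRR, which is the effect described above for the CYP3A4 and CYP2C9 assays.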

DISCUSSION

To investigate the potential of ToxCast in vitro assays for early detection of safety concerns, a comprehensive set of reference compounds with known in vivo adverse effects is critical for training and validation purposes. Databases such as ToxRefDB that store in vivo toxicity endpoints observed in preclinical species (e.g., mice and rats) are valuable sources for defining such a reference compound set. On the other hand, the chemical-gene and chemical-adverse effect associations curated from the published scientific literature in CTD provide additional information on the ToxCast chemicals.

By analyzing the difference between the chemical-gene associations from CTD and the chemical-gene activity profile in ToxCast, we found that for 28 ToxCast genes, there are significantly more (≥50) chemical-gene interactions curated in CTD than were identified in the ToxCast assays. Given that CTD chemical-gene/protein interactions include both direct (e.g., direct binding) and indirect (e.g., gene expression) interactions, it is not surprising that there are more chemical-gene interactions in CTD than were measured by ToxCast assays. However, it has been elegantly demonstrated that data completeness has a significant impact on the topology of drug-target networks.54 Therefore, models based only on the ToxCast in vitro assay results might not be accurate in predicting in vivo toxicities owing to these missing chemical-gene/protein interactions. The additional chemical-gene interaction information provided by CTD and/or other sources covering homogenous types of data,55 such as ChEMBL,56 can be particularly valuable for building network models to predict chemical-gene-disease associations.

We also analyzed the disease associations for the ToxCast chemicals. The chemical-disease associations provide a direct link between ToxCast chemicals and adverse effects documented in the literature.
Not surprisingly, drug-induced liver injury (DILI) has the largest number of associations with


ToxCast chemicals owing to the vast number of available references on DILI. Due to the long-tail distribution of the chemical-disease associations, the majority of the disease terms have only a few associated ToxCast chemicals. Therefore, aggregation of the child-level disease terms to the parent level provides a statistically meaningful way of utilizing the chemical-disease associations. To this end, we focused our analysis on four physiological systems central to drug safety assessment and analyzed the associations of ToxCast chemicals with each of these toxicity categories. This higher-level chemical-adverse effect association provides additional toxicity annotations for ToxCast chemicals and therefore enabled us to identify the associations between ToxCast in vitro assays and four major toxicity categories.

The majority of the assay endpoints that have significant associations with the four toxicity categories are from Novascreen and NCGC. Since the identified assay-toxicity associations are based on univariate analysis, it is possible that multivariate statistical models will yield more insights for assay platforms such as Apredica, Attagene and Bioseek, where multiple related readouts are measured simultaneously from the same cell line or primary cell. Nevertheless, our current univariate association analysis demonstrates sensitivity in identifying important associations between in vitro assays and in vivo toxicities (e.g., hERG assay and CardioTox) and usefulness in selecting assay endpoints from complex assay platforms (e.g., associations between endpoints from the Bioseek SM3C primary cell and RenalTox).

For ToxCast assays with target gene annotations, we analyzed the associations of the gene targets with each toxicity category (Figure 9, Tables 4-7). We were able to identify some of the well-known gene-toxicity associations, such as the strong associations of adrenergic receptors and the hERG ion channel with CardioTox.
In fact, many of the genes we identified are among the in vitro pharmacological profiling panels recommended by four major pharmaceutical companies to identify undesirable off-target activities.32 The overall consistency between the known gene-toxicity associations and our findings suggests the applicability of our statistical analysis in identifying significant gene-toxicity associations, despite the fact that most of the ToxCast chemicals mediating these associations are environmental chemicals and the curated compound-toxicity associations of CTD are from highly heterogeneous


resources. Based on this analysis, we have identified strong associations between hepato-toxicity and some kinase assays, as well as strong associations between neuro-toxicity and protein tyrosine phosphatase and kinase assays. These findings may be useful in prioritizing compounds for toxicity testing and suggesting mechanisms of toxicity.

In conclusion, CTD provides a rich data source for analyzing the chemical-assay space of ToxCast. By integrating the chemical-gene and chemical-disease associations from CTD with the ToxCast assay results, we were able to identify some of the missing chemical-gene interactions and accurately infer known associations between ToxCast assay endpoints and adverse events. Our analyses also suggest ToxCast assays that may prove useful for prioritizing compounds for toxicity testing and suggesting mechanisms of toxicity. However, we do recognize that finding robust relationships between the assays and the adverse effects can be difficult because the assay measurements as well as the associations curated from the literature can be noisy. One critical factor that could significantly influence our analysis is the definition of a ‘Hit’ based on the AC50 values reported by ToxCast. We have already observed that some of the important chemical-gene interactions are missing from the current ToxCast assay results. In the newest ToxCast release, a new data analysis pipeline with three different models for fitting the binding activity data was provided to users. It would be interesting to see whether the data from the new release identify more chemical-gene interactions from the ToxCast assay results and therefore change the assay-toxicity associations identified in this study.

Another aspect worth mentioning is the ToxCast chemical space. Currently, pharmaceuticals account for only 26.8% of the chemicals that have associations with any of the four toxicity categories (Figure 7).
It has been found that the pharmaceuticals and environmental chemicals in ToxCast differ in both chemical and biological space.50 It would be interesting to see whether the associations identified in this study change significantly if more drug-like molecules are included in future ToxCast datasets. Finally, it is also worth mentioning again that CTD provides associations among chemicals, genes, and diseases that in many cases are not sufficient to establish a causal relationship among the three entities. For instance, the exposure


level and ADME properties of the chemicals are not curated in the current CTD database, which limits the establishment of causal relationships between the chemicals and the toxic outcomes. However, fully elucidating the mechanisms of action behind the identified associations between ToxCast assays and toxicity categories is beyond the scope of the current study. To obtain a more complete understanding of these associations, further computational analyses and experimental validation might be needed. Associations between targets and toxicities can be probed in a variety of ways, not least of which would be to take compounds known to exhibit a given toxicity and compounds known not to exhibit it and test them in the in vitro assay of the target in question. An enrichment of hits in that assay for compounds that also exhibit that toxicity would lend evidence to that target's involvement in the mechanism of toxicity.

Acknowledgement. We thank Dr. Frank Sistare for critical reading and stimulating discussion of the manuscript. We also thank Dr. Kara Pearson, Dr. Keith Tanis and Dr. Douglas Thudium for helpful discussions and suggestions. We are also thankful for the support from the Postdoctoral Research Fellow Program at Merck Research Laboratory.

Supporting Information Available: Table of genes that have more than 50 additional chemical-gene/protein interactions curated in CTD compared with those measured by the ToxCast assays; percentage of missing data for assays identified to have strong associations with the four toxicity categories. This material is available free of charge via the Internet at http://pubs.acs.org.
Abbreviations: CTD, Comparative Toxicogenomics Database; ToxRefDB, Toxicity Reference Database; HTS, high-throughput screening; NCGC, NIH Chemical Genomics Center; AC50, half-maximal activity concentration; CASRN, CAS Registry Number; CardioTox, Cardio-Toxicity; HepatoTox, Hepato-Toxicity; NeuroTox, Neuro-Toxicity; RenalTox, Renal-Toxicity; PRR, Proportional Reporting Ratio; ESR1, nuclear receptor estrogen receptor 1; AR, androgen receptor; NR1I2, pregnane X receptor; NR, nuclear receptor; CYP, Cytochrome P450; CAT, catalase; DILI, Drug Induced Liver


Injury; GPCR, G-protein-coupled receptors; hERG, ion channel KCNH2; PFOA, perfluorooctanoic acid; PFNA, perfluorononanoic acid; PTPs, protein tyrosine phosphatases.

References

(1) Olson, H., Betton, G., Robinson, D., Thomas, K., Monro, A., Kolaja, G., Lilly, P., Sanders, J., Sipes, G., and Bracken, W. (2000) Concordance of the toxicity of pharmaceuticals in humans and in animals. Regul. Toxicol. Pharmacol. 32, 56-67.

(2) Huang, R., Southall, N., Cho, M.-H., Xia, M., Inglese, J., and Austin, C. P. (2008) Characterization of diversity in toxicity mechanism using in vitro cytotoxicity assays in quantitative high throughput screening. Chem. Res. Toxicol. 21, 659-667.

(3) Schmidt, C. W. (2009) TOX 21: new dimensions of toxicity testing. Environ. Health Perspect. 117, A348-A353.

(4) Dambach, D. M., Andrews, B. A., and Moulin, F. (2005) New technologies and screening strategies for hepatotoxicity: use of in vitro models. Toxicol. Pathol. 33, 17-26.

(5) Shukla, S. J., Huang, R., Austin, C. P., and Xia, M. (2010) The future of toxicity testing: a focus on in vitro methods using a quantitative high-throughput screening platform. Drug Discovery Today 15, 997-1007.

(6) Dix, D. J., Houck, K. A., Martin, M. T., Richard, A. M., Setzer, R. W., and Kavlock, R. J. (2007) The ToxCast program for prioritizing toxicity testing of environmental chemicals. Toxicol. Sci. 95, 5-12.

(7) Kavlock, R., Chandler, K., Houck, K., Hunter, S., Judson, R., Kleinstreuer, N., Knudsen, T., Martin, M., Padilla, S., and Reif, D. (2012) Update on EPA’s ToxCast program: providing high throughput decision support tools for chemical risk management. Chem. Res. Toxicol. 25, 1287-1302.

(8) Judson, R. S., Houck, K. A., Kavlock, R. J., Knudsen, T. B., Martin, M. T., Mortensen, H. M., Reif, D. M., Rotroff, D. M., Shah, I., Richard, A. M., and Dix, D. J. (2010) In vitro screening of environmental chemicals for targeted testing prioritization: the ToxCast project. Environ. Health Perspect. 118, 485-492.

(9) Martin, M. T., Judson, R. S., Reif, D. M., Kavlock, R. J., and Dix, D. J. (2009) Profiling chemicals based on chronic toxicity results from the US EPA ToxRef Database. Environ. Health Perspect. 117, 392-399.

(10) Shah, I., Houck, K., Judson, R. S., Kavlock, R. J., Martin, M. T., Reif, D. M., Wambaugh, J., and Dix, D. J. (2011) Using nuclear receptor activity to stratify hepatocarcinogens. PLoS One 6, e14584.

(11) Martin, M. T., Knudsen, T. B., Reif, D. M., Houck, K. A., Judson, R. S., Kavlock, R. J., and Dix, D. J. (2011) Predictive model of rat reproductive toxicity from ToxCast high throughput screening. Biol. Reprod. 85, 327-339.

(12) Sipes, N. S., Martin, M. T., Reif, D. M., Kleinstreuer, N. C., Judson, R. S., Singh, A. V., Chandler, K. J., Dix, D. J., Kavlock, R. J., and Knudsen, T. B. (2011) Predictive models of prenatal developmental toxicity from ToxCast high-throughput screening data. Toxicol. Sci. 124, 109-127.

(13) Sipes, N. S., Martin, M. T., Kothiya, P., Reif, D. M., Judson, R. S., Richard, A. M., Houck, K. A., Dix, D. J., Kavlock, R. J., and Knudsen, T. B. (2013) Profiling 976 ToxCast chemicals across 331 enzymatic and receptor signaling assays. Chem. Res. Toxicol. 26, 878-895.


(14) Davis, A. P., Murphy, C. G., Rosenstein, M. C., Wiegers, T. C., and Mattingly, C. J. (2008) The Comparative Toxicogenomics Database facilitates identification and understanding of chemical-gene-disease associations: arsenic as a case study. BMC Med. Genomics 1, 48.

(15) Davis, A. P., Murphy, C. G., Johnson, R., Lay, J. M., Lennon-Hopkins, K., Saraceni-Richards, C., Sciaky, D., King, B. L., Rosenstein, M. C., and Wiegers, T. C. (2012) The comparative toxicogenomics database: update 2013. Nucleic Acids Res. 41(Database issue), D1104-1114.

(16) Davis, A. P., Wiegers, T. C., Roberts, P. M., King, B. L., Lay, J. M., Lennon-Hopkins, K., Sciaky, D., Johnson, R., Keating, H., and Greene, N. (2013) A CTD–Pfizer collaboration: manual curation of 88 000 scientific articles text mined for drug–disease and drug–phenotype interactions. Database 2013, bat080.

(17) Davis, A. P., Wiegers, T. C., Rosenstein, M. C., and Mattingly, C. J. (2012) MEDIC: a practical disease vocabulary used at the Comparative Toxicogenomics Database. Database 2012, bar065.

(18) Bolton, E. E., Wang, Y., Thiessen, P. A., and Bryant, S. H. (2008) PubChem: integrated platform of small molecules and biological activities. Annu. Rep. Comput. Chem. 4, 217-241.

(19) R Core Team (2014) R: A language and environment for statistical computing. R Foundation for Statistical Computing.

(20) Evans, S., Waller, P. C., and Davis, S. (2001) Use of proportional reporting ratios (PRRs) for signal generation from spontaneous adverse drug reaction reports. Pharmacoepidemiol. Drug Saf. 10, 483-486.

(21) Ng, H. W., Perkins, R., Tong, W., and Hong, H. (2014) Versatility or Promiscuity: The Estrogen Receptors, Control of Ligand Selectivity and an Update on Subtype Selective Ligands. Int. J. Environ. Res. Public Health 11, 8709-8742.


(22) Krasowski, M. D., Ni, A., Hagey, L. R., and Ekins, S. (2011) Evolution of promiscuous nuclear hormone receptors: LXR, FXR, VDR, PXR, and CAR. Mol. Cell. Endocrinol. 334, 39-48.

(23) Gonzalez, F. J. (1990) Molecular genetics of the P-450 superfamily. Pharmacol. Ther. 45, 1-38.

(24) Pichard, L., Fabre, I., Fabre, G., Domergue, J., Saint Aubert, B., Mourad, G., and Maurel, P. (1990) Cyclosporin A drug interactions. Screening for inducers and inhibitors of cytochrome P450 (cyclosporin A oxidase) in primary cultures of human hepatocytes and in liver microsomes. Drug Metab. Dispos. 18, 595-606.

(25) Knudsen, T. B., Houck, K. A., Sipes, N. S., Singh, A. V., Judson, R. S., Martin, M. T., Weissman, A., Kleinstreuer, N. C., Mortensen, H. M., and Reif, D. M. (2011) Activity profiles of 309 ToxCast™ chemicals evaluated across 292 biochemical targets. Toxicology 282, 1-15.

(26) Huang, Q., Jin, X., Gaillard, E. T., Knight, B. L., Pack, F. D., Stoltz, J. H., Jayadev, S., and Blanchard, K. T. (2004) Gene expression profiling reveals multiple toxicity endpoints induced by hepatotoxicants. Mutat. Res., Fundam. Mol. Mech. Mutagen. 549, 147-167.

(27) Cho, K.-h., Kim, H.-j., Rodriguez-Iturbe, B., and Vaziri, N. D. (2009) Niacin ameliorates oxidative stress, inflammation, proteinuria, and hypertension in rats with chronic renal failure. Am. J. Physiol. 297, F106-F113.

(28) Sam, F., Kerstetter, D. L., Pimental, D. R., Mulukutla, S., Tabaee, A., Bristow, M. R., Colucci, W. S., and Sawyer, D. B. (2005) Increased reactive oxygen species production and functional alterations in antioxidant enzymes in human failing myocardium. J. Card. Fail. 11, 473-480.

(29) Thomas, R. S., Black, M., Li, L., Healy, E., Chu, T.-M., Bao, W., Andersen, M., and Wolfinger, R. (2012) A comprehensive statistical analysis of predicting in vivo hazard using high-throughput in vitro screening. Toxicol. Sci. 128, 398-417.


(30) Martin, M. T., Mendez, E., Corum, D. G., Judson, R. S., Kavlock, R. J., Rotroff, D. M., and Dix, D. J. (2009) Profiling the reproductive toxicity of chemicals from multigeneration studies in the toxicity reference database. Toxicol. Sci. 110, 181-190.

(31) Knudsen, T. B., Martin, M. T., Kavlock, R. J., Judson, R. S., Dix, D. J., and Singh, A. V. (2009) Profiling the activity of environmental chemicals in prenatal developmental toxicity studies using the US EPA's ToxRefDB. Reprod. Toxicol. 28, 209-219.

(32) Bowes, J., Brown, A. J., Hamon, J., Jarolimek, W., Sridhar, A., Waldron, G., and Whitebread, S. (2012) Reducing safety-related drug attrition: the use of in vitro pharmacological profiling. Nat. Rev. Drug Discov. 11, 909-922.

(33) Houck, K. A., Dix, D. J., Judson, R. S., Kavlock, R. J., Yang, J., and Berg, E. L. (2009) Profiling bioactivity of the ToxCast chemical library using BioMAP primary human cell systems. J. Biomol. Screening 14, 1054-1066.

(34) Rockman, H. A., Koch, W. J., and Lefkowitz, R. J. (2002) Seven-transmembrane-spanning receptors and heart function. Nature 415, 206-212.

(35) Tang, C.-M., and Insel, P. A. (2004) GPCR expression in the heart: “new” receptors in myocytes and fibroblasts. Trends Cardiovasc. Med. 14, 94-99.

(36) Zuscik, M. J., Chalothorn, D., Hellard, D., Deighan, C., McGee, A., Daly, C. J., Waugh, D. J., Ross, S. A., Gaivin, R. J., and Morehead, A. J. (2001) Hypotension, autonomic failure, and cardiac hypertrophy in transgenic mice overexpressing the α1B-adrenergic receptor. J. Biol. Chem. 276, 13738-13743.


(37) Vecchione, C., Fratta, L., Rizzoni, D., Notte, A., Poulet, R., Porteri, E., Frati, G., Guelfi, D., Trimarco, V., and Mulvany, M. J. (2002) Cardiovascular influences of α1b-adrenergic receptor defect in mice. Circulation 105, 1700-1707.

(38) Heng, B. C., Aubel, D., and Fussenegger, M. (2013) An overview of the diverse roles of G-protein coupled receptors (GPCRs) in the pathophysiology of various human diseases. Biotechnol. Adv. 31, 1676-1694.

(39) Curran, M. E., Splawski, I., Timothy, K. W., Vincent, G. M., Green, E. D., and Keating, M. T. (1995) A molecular basis for cardiac arrhythmia: HERG mutations cause long QT syndrome. Cell 80, 795-803.

(40) Splawski, I., Timothy, K. W., Sharpe, L. M., Decher, N., Kumar, P., Bloise, R., Napolitano, C., Schwartz, P. J., Joseph, R. M., and Condouris, K. (2004) CaV1.2 calcium channel dysfunction causes a multisystem disorder including arrhythmia and autism. Cell 119, 19-31.

(41) Belpoggi, F., Soffritti, M., Guarino, M., Lambertini, L., Cevolani, D., and Maltoni, C. (2002) Results of Long-Term Experimental Studies on the Carcinogenicity of Ethylene-bis-Dithiocarbamate (Mancozeb) in Rats. Ann. N. Y. Acad. Sci. 982, 123-136.

(42) LIVERTOX Database, http://livertox.nih.gov/TetracyclineAndOxytetracyline.htm. National Library of Medicine.

(43) Kuroda, H., Ohtsuru, A., Futakuchi, M., Kawashita, Y., Nagayama, Y., Fukuda, E., Namba, H., Shirai, T., Kanematsu, T., and Yamashita, S. (2002) Distinctive gene expression of receptor-type tyrosine kinase families during rat hepatocarcinogenesis. Int. J. Mol. Med. 9, 473-480.


(44) Gaughran, F., Payne, J., Sedgwick, P. M., Cotter, D., and Berry, M. (2006) Hippocampal FGF-2 and FGFR1 mRNA expression in major depression, schizophrenia and bipolar disorder. Brain Res. Bull. 70, 221-227.

(45) Cohen, P. (2000) The regulation of protein function by multisite phosphorylation–a 25 year update. Trends Biochem. Sci. 25, 596-601.

(46) Shah, R. R., Morganroth, J., and Shah, D. R. (2013) Hepatotoxicity of tyrosine kinase inhibitors: clinical and regulatory perspectives. Drug Saf. 36, 491-503.

(47) Force, T., and Kolaja, K. L. (2011) Cardiotoxicity of kinase inhibitors: the prediction and translation of preclinical models to clinical outcomes. Nat. Rev. Drug Discov. 10, 111-126.

(48) Villeneuve, J.-P., and Pichette, V. (2004) Cytochrome P450 and liver diseases. Curr. Drug Metab. 5, 273-282.

(49) Wienkers, L. C., and Heath, T. G. (2005) Predicting in vivo drug interactions from in vitro drug discovery data. Nat. Rev. Drug Discovery 4, 825-833.

(50) Shah, F., and Greene, N. (2013) Analysis of Pfizer Compounds in EPA’s ToxCast Chemicals-Assay Space. Chem. Res. Toxicol. 27, 86-98.

(51) Denu, J. M., and Dixon, J. E. (1998) Protein tyrosine phosphatases: mechanisms of catalysis and regulation. Curr. Opin. Chem. Biol. 2, 633-641.

(52) Paul, S., and Lombroso, P. (2003) Receptor and nonreceptor protein tyrosine phosphatases in the nervous system. Cell. Mol. Life Sci. 60, 2465-2482.

(53) Wagner, K. R., Mei, L., and Huganir, R. L. (1991) Protein tyrosine kinases and phosphatases in the nervous system. Curr. Opin. Neurobiol. 1, 65-73.


(54) Mestres, J., Gregori-Puigjané, E., Valverde, S., and Sole, R. V. (2008) Data completeness—the Achilles heel of drug-target networks. Nat. Biotechnol. 26, 983-984.

(55) Chen, B., Dong, X., Jiao, D., Wang, H., Zhu, Q., Ding, Y., and Wild, D. J. (2010) Chem2Bio2RDF: a semantic framework for linking and data mining chemogenomic and systems chemical biology data. BMC Bioinformatics 11, 255.

(56) Bento, A. P., Gaulton, A., Hersey, A., Bellis, L. J., Chambers, J., Davies, M., Krüger, F. A., Light, Y., Mak, L., and McGlinchey, S. (2014) The ChEMBL bioactivity database: an update. Nucleic Acids Res. 42, D1083-D1090.

Table 1. Genes with more than 150 ToxCast chemical interactions.

| Gene Symbol | Gene Name | No. of Chemicals, CTD Association^a | No. of Chemicals, ToxCast Assay^b | Overlap^c |
|---|---|---|---|---|
| ESR1 | estrogen receptor 1 | 241 | 556 | 191 |
| TNF | tumor necrosis factor | 227 | 180 | 48 |
| AR | androgen receptor | 221 | 313 | 118 |
| CYP1A1 | cytochrome P450, family 1, subfamily A, polypeptide 1 | 219 | 34 | 15 |
| CASP3 | caspase 3, apoptosis-related cysteine peptidase | 207 | 3 | 1 |
| CAT | catalase | 198 | NA | NA |
| NR1I2 | nuclear receptor subfamily 1, group I, member 2 | 175 | 465 | 146 |
| CYP1A2 | cytochrome P450, family 1, subfamily A, polypeptide 2 | 174 | 63 | 17 |
| IL6 | interleukin 6 | 173 | 92 | 19 |
| CYP3A4 | cytochrome P450, family 3, subfamily A, polypeptide 4 | 172 | 36 | 9 |
| MAPK1 | mitogen-activated protein kinase 1 | 161 | 3 | 2 |
| CCL2 | chemokine (C-C motif) ligand 2 | 160 | 277 | 82 |
| BCL2 | B-cell CLL/lymphoma 2 | 158 | NA | NA |
| HMOX1 | heme oxygenase (decycling) 1 | 154 | NA | NA |
| BAX | BCL2-associated X protein | 152 | NA | NA |
| TP53 | tumor protein p53 | 150 | 171 | 49 |


^a Number of ToxCast chemicals associated with the given gene/protein from the CTD curation.
^b Number of ToxCast chemicals active against the given gene/protein based on their ToxCast assay activity (only the 905 chemicals with chemical-gene/protein interaction curations in CTD were used in this calculation).
^c Overlap between the chemical-gene/protein associations from CTD and ToxCast.

Table 2. ToxCast chemical-disease profiles of ‘mechanistic/marker’ associations.

| Disease Term | No. Diseases | No. Chemicals | Toxicity Category |
|---|---|---|---|
| Nervous System Diseases | 362 | 272 | NeuroTox |
| Cardiovascular Diseases | 226 | 175 | CardioTox |
| Kidney Diseases | 52 | 167 | RenalTox |
| Liver Diseases | 44 | 279 | HepatoTox |
| Unique Entries | 639 | 429 | |
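The Overlap column of Table 1 amounts to a set intersection per gene. The following is a minimal sketch, assuming each source is available as a set of chemical identifiers. The function name, the placeholder identifiers, and the `curated_universe` argument (standing in for the 905 chemicals with CTD gene/protein curations mentioned in footnote b) are illustrative assumptions and are not taken from the authors' code.

```python
def overlap_row(ctd_chemicals, toxcast_active, curated_universe):
    """Return (n_ctd, n_toxcast, n_overlap) for one gene/protein."""
    ctd = set(ctd_chemicals)
    # Restrict assay actives to chemicals that have CTD gene curations
    active = set(toxcast_active) & set(curated_universe)
    return len(ctd), len(active), len(ctd & active)


# Hypothetical identifiers for one gene (placeholders, not real data)
curated = {"chem_A", "chem_B", "chem_C", "chem_D", "chem_E"}
ctd_for_gene = {"chem_A", "chem_B", "chem_C"}
toxcast_for_gene = {"chem_C", "chem_D", "chem_X"}  # chem_X lacks CTD curation

print(overlap_row(ctd_for_gene, toxcast_for_gene, curated))  # (3, 2, 1)
```

A gene with no mapped ToxCast assay would be reported as NA rather than 0; that distinction is left to the caller in this sketch.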

Table 3. Number of assay endpoints with significant associations with the four toxicity categories.

| Assay Platform | Cardio | Hepato | Neuro | Renal | Total^a |
|---|---|---|---|---|---|
| ACEA | 0 | 0 | 0 | 0 | 0 (2) |
| Apredica | 0 | 0 | 2 | 5 | 7 (60) |
| Attagene | 1 | 2 | 1 | 1 | 2 (82) |
| BioSeek | 2 | 0 | 0 | 8 | 9 (202) |
| NCGC | 5 | 5 | 4 | 3 | 11 (37) |
| Novascreen | 28 | 12 | 35 | 12 | 57 (420) |
| Odyssey Thera | 2 | 0 | 0 | 0 | 2 (18) |
| Total | 38 | 19 | 42 | 29 | 88 (821) |

^a The number in parentheses is the total number of endpoints from the assay platform; the number outside the parentheses is the number of assay endpoints that pass the statistical significance test (p-value < 0.05) with PRR ≥ 2 and a ≥ 3.

Table 4. Assays with significant associations with cardio-toxicity ranked by the PRR value.

| Assay name | PRR^a | a | Hit rate | Gene symbol^b | Gene family |
|---|---|---|---|---|---|
| NVS_NR_rMR | 3.84 | 12 | 1.88% | Nr3c2 | nuclear receptor |
| ATG_AR_TRANS | 3.61 | 6 | 0.97% | AR | nuclear receptor |
| Tox21_GR_BLA_Antagonist_ratio | 3.34 | 16 | 2.92% | NR3C1 | nuclear receptor |
| NVS_IC_hKhERGCh | 3.18 | 4 | 0.92% | KCNH2 | ion channel |
| NVS_NR_cAR | 3.11 | 23 | 12.67% | AR | nuclear receptor |
| NVS_GPCR_rOpiate_NonSelectiveNa | 3.09 | 9 | 2.17% | Oprm1 | gpcr |
| NVS_NR_hGR | 2.97 | 24 | 5.08% | NR3C1 | nuclear receptor |
| NVS_ENZ_rabI2C | 2.91 | 5 | 1.26% | NA | misc protein |
| BSK_4H_Pselectin_up | 2.82 | 4 | 1.03% | SELP | cell adhesion molecules |
| NVS_NR_hPR | 2.80 | 19 | 4.18% | PGR | nuclear receptor |
| Tox21_AR_LUC_MDAKB2_Agonist | 2.72 | 25 | 5.78% | AR | nuclear receptor |
| NVS_GPCR_rAdra1A | 2.66 | 9 | 2.52% | Adra1a | gpcr |
| NVS_TR_rNET | 2.66 | 5 | 1.37% | Slc6a2 | transporter |
| Tox21_TR_LUC_GH3_Agonist | 2.58 | 10 | 2.30% | THRB | nuclear receptor |
| NVS_NR_rAR | 2.56 | 24 | 5.85% | AR | nuclear receptor |
| NVS_ADME_rCYP2C11 | 2.45 | 9 | 19.65% | Cyp2c11 | Cyp |
| NVS_GPCR_h5HT6 | 2.43 | 17 | 5.38% | HTR6 | gpcr |
| NVS_GPCR_r5HT_NonSelective | 2.40 | 6 | 1.83% | Htr1a | gpcr |
| NVS_TR_hSERT | 2.37 | 7 | 2.17% | SLC6A4 | transporter |
| Tox21_Aromatase_Inhibition | 2.35 | 35 | 9.60% | CYP19A1 | Cyp |
| NVS_GPCR_hAdrb2 | 2.35 | 8 | 2.52% | ADRB2 | gpcr |
| NVS_GPCR_rAdra1B | 2.32 | 10 | 3.20% | Adra1b | gpcr |
| NVS_NR_bER | 2.32 | 13 | 3.34% | ESR1 | nuclear receptor |
| NVS_GPCR_rOpiate_NonSelective | 2.32 | 13 | 4.23% | Oprm1 | gpcr |
| OT_AR_ARSRC1_0960 | 2.27 | 36 | 10.31% | AR | nuclear receptor |
| NVS_NR_hER | 2.19 | 23 | 6.47% | ESR1 | nuclear receptor |
| NVS_GPCR_rmAdra2B | 2.17 | 11 | 3.78% | Adra2b | gpcr |
| NVS_GPCR_hAdra2A | 2.11 | 11 | 3.89% | ADRA2A | gpcr |
| NVS_GPCR_hM1 | 2.09 | 10 | 3.54% | CHRM1 | gpcr |
| NVS_ENZ_oCOX1 | 2.09 | 16 | 5.78% | PTGS1 | reductase |
| OT_AR_ARSRC1_0480 | 2.09 | 19 | 5.58% | AR | nuclear receptor |
| Tox21_AR_LUC_MDAKB2_Antagonist | 2.09 | 12 | 3.41% | AR | nuclear receptor |
| NVS_IC_rCaBTZCHL | 2.08 | 13 | 4.69% | Cacna1a | ion channel |
| NVS_NR_hPPARg | 2.05 | 22 | 6.54% | PPARG | nuclear receptor |
| NVS_GPCR_hM4 | 2.02 | 10 | 3.66% | CHRM4 | gpcr |
| NVS_GPCR_rAdra2_NonSelective | 2.02 | 10 | 3.66% | Adra2a | gpcr |
| BSK_SAg_IL8_up | 2.02 | 10 | 3.67% | IL8 | cytokine |
| NVS_GPCR_g5HT4 | 2.02 | 15 | 5.61% | Htr4 | gpcr |

^a The percentage of missing data for each assay is provided in Supplementary Table S2.
^b Targets in italics are among the in vitro pharmacological profiling panel suggested by four major pharmaceutical companies to identify undesirable off-target activities of drug candidates.32

Table 5. Assays with significant associations to hepato-toxicity ranked by the PRR value.

| Assay name | PRR^a | a | Hit rate | Gene symbol^b | Gene family |
|---|---|---|---|---|---|
| NVS_ENZ_hSGK1 | 4.84 | 3 | 1.09% | SGK1 | kinase |
| NVS_ENZ_hCK1a | 3.62 | 3 | 1.46% | CSNK1A1 | kinase |
| NVS_ENZ_hMAPKAPK5 | 3.62 | 3 | 1.46% | MAPKAPK5 | kinase |
| NVS_ENZ_hMARK1 | 3.62 | 3 | 1.46% | MARK1 | kinase |
| NVS_ENZ_hPAK4 | 3.12 | 4 | 0.57% | PAK4 | kinase |
| Tox21_AutoFluor_HEPG2_Media_blue | 3.00 | 8 | 0.97% | NA | background measurement |
| NVS_NR_cAR | 2.55 | 30 | 12.67% | AR | nuclear receptor |
| NVS_ENZ_hTie2 | 2.36 | 6 | 1.14% | TEK | kinase |
| NVS_TR_rNET | 2.30 | 7 | 1.37% | Slc6a2 | transporter |
| Tox21_GR_BLA_Antagonist_ratio | 2.29 | 18 | 2.92% | NR3C1 | nuclear receptor |
| Tox21_AutoFluor_HEPG2_Cell_blue | 2.29 | 7 | 1.11% | NA | background measurement |
| ATG_AR_TRANS | 2.24 | 6 | 0.97% | AR | nuclear receptor |
| NVS_ADME_rCYP2A2 | 2.16 | 15 | 11.38% | Cyp2a2 | cyp |
| NVS_ENZ_rabI2C | 2.14 | 6 | 1.26% | NA | misc protein |
| Tox21_TR_LUC_GH3_Antagonist | 2.09 | 46 | 8.63% | THRB | nuclear receptor |
| NVS_NR_hGR | 2.08 | 28 | 5.08% | NR3C1 | nuclear receptor |
| NVS_ADME_rCYP2C13 | 2.07 | 13 | 11.35% | Cyp2c13 | cyp |
| ATG_NURR1_TRANS | 2.04 | 12 | 2.16% | NR4A2 | nuclear receptor |
| Tox21_Aromatase_Inhibition | 2.01 | 49 | 9.60% | CYP19A1 | cyp |

^a The percentage of missing data for each assay is provided in Supplementary Table S2.
^b Targets in italics are among the in vitro pharmacological profiling panel suggested by four major pharmaceutical companies to identify undesirable off-target activities of drug candidates.32

Table 6. Assays with significant associations to neuro-toxicity ranked by the PRR value.

| Assay name | PRR^a | a | Hit rate | Gene symbol^b | Gene family |
|---|---|---|---|---|---|
| NVS_ENZ_hSGK1 | 4.59 | 3 | 1.09% | SGK1 | kinase |
| APR_OxidativeStress_72h_dn | 4.15 | 3 | 0.37% | H2AFX | cell cycle |
| NVS_ENZ_hInsR_Activator | 4.15 | 3 | 0.34% | INSR | kinase |
| NVS_ENZ_hCASP8 | 4.13 | 3 | 0.34% | CASP8 | protease |
| Tox21_AutoFluor_HEK293_Media_green | 4.00 | 3 | 0.28% | NA | background measurement |
| NVS_ENZ_hCK1a | 3.43 | 3 | 1.46% | CSNK1A1 | kinase |
| NVS_ENZ_hMAPKAPK5 | 3.43 | 3 | 1.46% | MAPKAPK5 | kinase |
| NVS_ENZ_hMARK1 | 3.43 | 3 | 1.46% | MARK1 | kinase |
| APR_CellCycleArrest_72h_up | 3.33 | 4 | 0.62% | NA | cell cycle |
| NVS_ENZ_hAKT2 | 3.33 | 4 | 0.57% | AKT2 | kinase |
| NVS_ENZ_hPTPN6 | 3.33 | 4 | 0.57% | PTPN6 | phosphatase |
| NVS_ENZ_hPTPRF | 3.33 | 4 | 0.57% | PTPRF | phosphatase |
| NVS_ENZ_hPAK4 | 3.31 | 4 | 0.57% | PAK4 | kinase |
| Tox21_AutoFluor_HEPG2_Cell_green | 3.19 | 3 | 0.35% | NA | background measurement |
| Tox21_AutoFluor_HEPG2_Media_green | 3.19 | 3 | 0.35% | NA | background measurement |
| NVS_ENZ_hACP1 | 3.11 | 3 | 0.46% | ACP1 | phosphatase |
| NVS_ENZ_hMsk1 | 3.11 | 3 | 0.46% | RPS6KA5 | kinase |
| NVS_IC_rKATPCh | 3.11 | 3 | 0.46% | Kcnj1 | ion channel |
| NVS_ENZ_hVEGFR1 | 3.08 | 3 | 0.46% | FLT1 | kinase |
| NVS_ENZ_hMet | 2.98 | 5 | 0.80% | MET | kinase |
| NVS_ENZ_hVEGFR2 | 2.96 | 5 | 0.80% | KDR | kinase |
| NVS_ENZ_rabI2C | 2.67 | 7 | 1.26% | NA | misc protein |
| NVS_ENZ_hCSF1R | 2.60 | 5 | 0.92% | CSF1R | kinase |
| NVS_ENZ_hPTPN1 | 2.60 | 5 | 0.92% | PTPN1 | phosphatase |
| NVS_ENZ_hPTPN4 | 2.60 | 5 | 0.92% | PTPN4 | phosphatase |
| NVS_ENZ_hPTPRM | 2.60 | 5 | 0.92% | PTPRM | phosphatase |
| NVS_ENZ_hTie2 | 2.50 | 6 | 1.14% | TEK | kinase |
| Tox21_GR_BLA_Antagonist_ratio | 2.49 | 19 | 2.92% | NR3C1 | nuclear receptor |
| NVS_ENZ_hFGFR1 | 2.47 | 10 | 1.94% | FGFR1 | kinase |
| NVS_GPCR_rOpiate_NonSelectiveNa | 2.45 | 11 | 2.17% | Oprm1 | gpcr |
| ATG_AR_TRANS | 2.30 | 6 | 0.97% | AR | nuclear receptor |
| NVS_GPCR_gLTD4 | 2.25 | 7 | 1.49% | Cysltr1 | gpcr |
| NVS_NR_cAR | 2.24 | 28 | 12.67% | AR | nuclear receptor |
| NVS_ENZ_hPTPN11 | 2.23 | 8 | 1.71% | PTPN11 | phosphatase |
| NVS_NR_hGR | 2.23 | 29 | 5.08% | NR3C1 | nuclear receptor |
| NVS_ENZ_rAChE | 2.20 | 14 | 3.35% | Ache | esterase |
| NVS_NR_rMR | 2.20 | 11 | 1.88% | Nr3c2 | nuclear receptor |
| NVS_GPCR_rAdra1A | 2.11 | 11 | 2.52% | Adra1a | gpcr |
| NVS_GPCR_mCCKAPeripheral | 2.10 | 10 | 2.29% | Cckar | gpcr |
| NVS_ENZ_hSIRT1 | 2.09 | 8 | 1.83% | SIRT1 | hydrolase |
| NVS_GPCR_r5HT_NonSelective | 2.09 | 8 | 1.83% | Htr1a | gpcr |
| NVS_NR_hPR | 2.02 | 22 | 4.18% | PGR | nuclear receptor |

^a The percentage of missing data for each assay is provided in Supplementary Table S2.
^b Targets in italics are among the in vitro pharmacological profiling panel suggested by four major pharmaceutical companies to identify undesirable off-target activities of drug candidates.32

Table 7. Assays with significant associations to renal-toxicity ranked by the PRR value.

| Assay name | PRR^a | a | Hit rate | Gene symbol^b | Target family |
|---|---|---|---|---|---|
| BSK_SM3C_MIG_down | 5.35 | 4 | 6.96% | CXCL9 | cytokine |
| BSK_SM3C_SAA_down | 4.84 | 9 | 27.11% | SAA1 | cell adhesion molecules |
| BSK_SM3C_IL6_down | 4.69 | 3 | 5.49% | IL6 | cytokine |
| BSK_SM3C_SRB_down | 4.35 | 4 | 8.42% | NA | cell cycle |
| BSK_SM3C_IL8_down | 4.11 | 3 | 6.23% | IL8 | cytokine |
| BSK_SM3C_MCSF_down | 3.80 | 4 | 9.52% | CSF1 | hematopoietic factor |
| NVS_ENZ_rabI2C | 3.22 | 5 | 1.26% | NA | misc protein |
| ATG_AR_TRANS | 3.15 | 5 | 0.97% | AR | nuclear receptor |
| BSK_4H_Pselectin_up | 3.15 | 4 | 1.03% | SELP | cell adhesion molecules |
| Tox21_AutoFluor_HEPG2_Media_blue | 3.14 | 5 | 0.97% | NA | background measurement |
| NVS_TR_rNET | 2.94 | 5 | 1.37% | Slc6a2 | transporter |
| NVS_NR_rMR | 2.63 | 8 | 1.88% | Nr3c2 | nuclear receptor |
| NVS_GPCR_rOpiate_NonSelectiveNa | 2.63 | 7 | 2.17% | Oprm1 | gpcr |
| Tox21_GR_BLA_Antagonist_ratio | 2.57 | 12 | 2.92% | NR3C1 | nuclear receptor |
| APR_CellCycleArrest_24h_up | 2.56 | 5 | 1.74% | NA | cell cycle |
| APR_MitoMass_24h_dn | 2.54 | 7 | 2.49% | NA | cell morphology |
| NVS_GPCR_rAdra2_NonSelective | 2.50 | 11 | 3.66% | Adra2a | gpcr |
| APR_MitoMass_72h_dn | 2.44 | 9 | 3.36% | NA | cell morphology |
| BSK_CASM3C_MCP1_up | 2.38 | 7 | 2.63% | CCL2 | cytokine |
| Tox21_AR_LUC_MDAKB2_Agonist | 2.35 | 21 | 5.78% | NA | nuclear receptor |
| APR_StressKinase_24h_up | 2.26 | 13 | 5.35% | JUN | cell cycle |
| NVS_NR_cAR | 2.22 | 18 | 12.67% | AR | nuclear receptor |
| NVS_ENZ_oCOX1 | 2.17 | 15 | 5.78% | PTGS1 | reductase |
| NVS_GPCR_rOpiate_NonSelective | 2.15 | 11 | 4.23% | Oprm1 | gpcr |
| NVS_NR_hGR | 2.12 | 17 | 5.08% | NR3C1 | nuclear receptor |
| NVS_GPCR_g5HT4 | 2.09 | 14 | 5.61% | Htr4 | gpcr |
| NVS_NR_rAR | 2.07 | 19 | 5.85% | Ar | nuclear receptor |
| APR_p53Act_24h_up | 2.01 | 30 | 14.93% | TP53 | dna binding |
| NVS_GPCR_h5HT6 | 2.01 | 13 | 5.38% | HTR6 | gpcr |

^a The percentage of missing data for each assay is provided in Supplementary Table S2.
^b Targets in italics are among the in vitro pharmacological profiling panel suggested by four major pharmaceutical companies to identify undesirable off-target activities of drug candidates.32
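The selection rule stated in the Table 3 footnote (p-value < 0.05, PRR ≥ 2, a ≥ 3) can be sketched as follows. This is a hedged illustration, not the authors' implementation: it assumes the conventional proportional reporting ratio on a 2×2 table, PRR = [a/(a+b)] / [c/(c+d)], with a taken to be the number of chemicals both active in the assay and associated with the toxicity category. The meanings of b, c, and d, the function names, and the example counts are assumptions, and the p-value is passed in rather than computed because the test used is not specified in this section.

```python
def prr(a, b, c, d):
    """Proportional reporting ratio for a 2x2 table (assumed layout).

    a: active in assay, associated with the toxicity category
    b: active in assay, not associated
    c: inactive in assay, associated
    d: inactive in assay, not associated
    """
    if a + b == 0 or c + d == 0:
        raise ValueError("each row of the 2x2 table needs at least one chemical")
    if c == 0:
        return float("inf")  # no associated chemicals among the inactives
    return (a / (a + b)) / (c / (c + d))


def passes_filter(a, b, c, d, p_value, min_prr=2.0, min_a=3, alpha=0.05):
    """Apply the thresholds from the Table 3 footnote."""
    if a < min_a or p_value >= alpha:
        return False
    return prr(a, b, c, d) >= min_prr


# Hypothetical counts, chosen only to illustrate the arithmetic
print(round(prr(6, 14, 50, 450), 2))                # 3.0
print(passes_filter(6, 14, 50, 450, p_value=0.01))  # True
print(passes_filter(2, 2, 50, 450, p_value=0.01))   # False, because a < 3
```

The a ≥ 3 threshold guards against large PRR values that rest on only one or two chemicals, which matters for the low hit-rate assays at the top of Tables 5 and 6.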

Figure Legend

Figure 1. Statistical measurement of the assay-toxicity category associations.

Figure 2. ToxCast and CTD Overlap. A) ToxCast chemicals were used to search against the CTD database using the CASRN, MeSH chemical name and synonyms. A total of 1,438 ToxCast chemicals were found in CTD. Among these, 684 chemicals were associated with 2,128 diseases through 16,588 associations, and 905 chemicals were associated with 25,439 genes through 226,893 interactions.

Figure 3. Distribution of chemical-gene/protein interactions. The number of unique chemical interactions with a given gene/protein is plotted. The inset shows the top 16 genes with more than 150 CTD curated chemical-gene interactions.

Figure 4. Diseases with the highest number of ToxCast chemical associations. Diseases associated with more than 10% of the ToxCast chemicals are listed.

Figure 5. The top 10 diseases ranked by the number of associated ToxCast chemicals for each toxicity profile.

Figure 6. Venn diagram of diseases, ToxCast chemicals and assay endpoints associated with each toxicity category.

Figure 7. Use categories of the ToxCast chemicals associated with the four toxicity categories. A. Use categories of the 429 chemicals associated with at least one toxicity category. B. Use categories of the 83 chemicals associated with all four toxicity categories.

Figure 8. Venn diagram of assay endpoints associated with each toxicity category.

Figure 9. Distribution of target genes into different gene families for each toxicity category.
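The lookup described in the Figure 2 legend, matching each ToxCast chemical to CTD by CASRN, then by chemical name, then by synonyms, might be sketched as below. The record layout, the field names, and the identifier `ctd-001` are hypothetical placeholders and do not reflect CTD's actual schema or download format; case-insensitive exact matching is likewise an assumption.

```python
def build_index(ctd_records):
    """Map every CASRN, name, and synonym (lowercased) to a CTD record id."""
    index = {}
    for rec in ctd_records:
        keys = [rec.get("casrn"), rec.get("name"), *rec.get("synonyms", [])]
        for key in keys:
            if key:
                # Keep the first record seen for a given key
                index.setdefault(key.strip().lower(), rec["id"])
    return index


def match_chemical(chem, index):
    """Try CASRN first, then name, then synonyms; return a record id or None."""
    candidates = [chem.get("casrn"), chem.get("name"), *chem.get("synonyms", [])]
    for key in candidates:
        if key and key.strip().lower() in index:
            return index[key.strip().lower()]
    return None


# Hypothetical CTD-like record; the id is a placeholder
ctd_records = [
    {"id": "ctd-001", "casrn": "80-05-7", "name": "bisphenol A", "synonyms": ["BPA"]},
]
index = build_index(ctd_records)

print(match_chemical({"casrn": "80-05-7", "name": "Bisphenol A"}, index))  # ctd-001
print(match_chemical({"casrn": None, "name": "bpa"}, index))               # ctd-001
print(match_chemical({"casrn": "0-00-0", "name": "unknown"}, index))       # None
```

Trying the CASRN before the name and synonyms puts the least ambiguous identifier first; free-text names are consulted only as a fallback.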

Figure 1.

Figure 2.

Figure 3.

Figure 4.

Figure 5.

Figure 6.

Figure 7.

Figure 8.

Figure 9.
