
Computational Models for Human and Animal Hepatotoxicity with a Global Application Scope
Denis Mulliner, Friedemann Schmidt, Manuela Stolte, Hans-Peter Spirkl, Andreas Czich, and Alexander Amberg
Chem. Res. Toxicol., Just Accepted Manuscript • DOI: 10.1021/acs.chemrestox.5b00465 • Publication Date (Web): 25 Feb 2016



Chemical Research in Toxicology

Computational Models for Human and Animal Hepatotoxicity with a Global Application Scope
Denis Mulliner, Friedemann Schmidt, Manuela Stolte, Hans-Peter Spirkl, Andreas Czich, Alexander Amberg
Sanofi-Aventis Deutschland GmbH, R&D DSAR/Preclinical Safety FF, Industriepark Hoechst, Building H831, D-65926 Frankfurt am Main, Germany
Corresponding Author: Denis Mulliner, [email protected]

Abstract
Hepatic toxicity is a key concern for novel pharmaceutical drugs, since it is difficult to anticipate in preclinical models, and it can originate from pharmacologically unrelated drug effects, such as pathway interference, metabolism, and drug accumulation. Because liver toxicity still ranks amongst the top reasons for drug attrition, the reliable prediction of adverse hepatic effects is a substantial challenge in drug discovery and development. To this end, more effort needs to be focused on the development of improved predictive in-vitro and in-silico approaches. Current computational models often lack applicability to novel pharmaceutical candidates, typically due to insufficient coverage of the chemical space of interest, which is limited by either the size or the diversity of the training data. Hence, there is an urgent need for better computational models to allow for the identification of safe drug candidates and to support experimental design. In this context a large data set comprising 3712 compounds with liver related toxicity findings in humans and animals was collected from various sources. The complex pathology was clustered into 21 preclinical and human hepatotoxicity endpoints, which were organized into three levels of detail. Support vector machine models were trained for each endpoint, using optimized descriptor sets from chemometrics software. The optimized global human hepatotoxicity model has high sensitivity (68%) and excellent specificity (95%) in an internal validation set of 221 compounds. Models for preclinical endpoints performed similarly. To allow for reliable prediction of ‘truly external’ novel compounds, all predictions are tagged with confidence parameters. These parameters are derived from a statistical analysis of the predictive probability densities. The whole approach was validated for an external validation set of 269 proprietary compounds. The models are fully integrated into our early safety in-silico workflow.
Introduction
Drug induced liver injury is of great concern for patient safety and a major cause for drug withdrawal from the market (more than 50 drugs worldwide).1,2 Adverse hepatic effects in clinical trials often lead to a late and costly termination of drug development programs.3 Thus, the early identification of a hepatotoxic potential is of great importance to all stakeholders. Nevertheless, currently employed animal and in-vitro models do not predict hepatic effects in humans reliably.4,5 This lack of translatability is often attributed to the complexity of the associated pathologies and toxicological pathways including transporter perturbation, mitochondrial dysfunction, oxidative stress, immunologically mediated responses, and reactive metabolite formation.6 Many of these factors are still not understood in detail. In this context a great number of approaches have been developed to assess hepatotoxicity employing, among others, toxicogenomics,7 high throughput screening,8 systems biology approaches,9 and in-silico models.10,11 In contrast to experimental methods, in-silico models are often faster, cheaper, highly reproducible, and can even be applied to hypothetical molecules in early drug discovery prior to synthesis.12 This makes them a desirable tool for academia and industry alike. In general, for computational models a distinction is drawn between statistical and expert knowledge systems. The latter typically cover well-described drug classes only, resulting in low false positive rates but at the same time insufficient sensitivities (

75%) while the ratio of true positives to total positive predictions was low (20 to 48%) but could be enhanced using a similarity index. Prediction of human hepatotoxicity using structural alerts is generally more limited. In an approach using Derek for Windows (Knowledge Base Version 10) and 22 additional custom alerts, Green et al.14 reported a sensitivity of 46% and an accuracy of 56% for 623 validation compounds. However, the reliability of a positive prediction was good with 33% of the alerted compounds being negative. Marchant et al.13 reported a sensitivity of 40% for 137 compounds using Derek for Windows (Version 10). In the same work, the prediction of hepatotoxicity for 300 compounds that tested positive in 28-day rat studies, using structural alerts developed for human hepatotoxicity, was of alarmingly low quality (sensitivity of 26%). This result is consistent with the low concordance between human and animal liver findings (between 40 and 55%),5,16 and it is questionable whether other models trained with human data could predict preclinical hepatotoxicity. While information on human toxicity is paramount in drug development, the reliability of in-silico models is challenged when predictions are not in agreement with follow-up animal study findings (even though this might not be the scope of such models). Good preclinical endpoint models can fill this gap by identifying a possible lack of animal to human translatability and could indicate where additional testing is required, making them a valuable tool in drug development and safety assessment. Among the many liver toxicity pathways and mediating targets, only a few have been subject to predictive modeling approaches, including inhibition of the bile salt export pump (BSEP),24 sodium taurocholate cotransporting polypeptide (NTCP),25 or oxidative stress.26 Most generalized computational liver toxicity models apply a simple binary hepatotoxicity classification scheme (active/inactive or positive/negative) as target endpoint.
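The performance figures quoted in this comparison (sensitivity, accuracy, and the ratio of true positives to total positive predictions) all derive from a 2×2 confusion matrix of a binary classifier. As a minimal sketch, using hypothetical counts that are not taken from any of the cited studies, they can be computed as follows:

```python
def classification_metrics(tp, fp, tn, fn):
    """Return common binary-classification metrics from confusion-matrix counts."""
    def ratio(numerator, denominator):
        # Avoid a ZeroDivisionError when a class is absent.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),             # true positive rate
        "specificity": ratio(tn, tn + fp),             # true negative rate
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "ppv": ratio(tp, tp + fp),                     # true positives / all positive predictions
    }


# Hypothetical counts, chosen only to illustrate the arithmetic.
metrics = classification_metrics(tp=40, fp=20, tn=120, fn=20)
print(metrics)
```

With imbalanced data a model can reach a high accuracy while its sensitivity or positive predictive value stays low, which is why the studies above report several of these numbers side by side.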
This kind of endpoint might not be a good choice because a) it does not reflect the complex background of hepatotoxicity and related pathologies adequately, b) a positive prediction is difficult to interpret or validate experimentally since the adverse effect could be manifold, and thus c) it is not in line with the OECD principles for quantitative structure-activity relationship (QSAR) model validation, which demand a well characterized endpoint.23 Other approaches were developed based on clustering related liver toxicity findings into multiple

parallel15,27,28 or hierarchical23 endpoints with different scopes and success. Even then, a single endpoint usually covers multiple different modes of toxicological action, so the modelling algorithm must be chosen carefully in accordance with the need to describe a multimechanistic dataset. Among the many different technologies used to develop classification and QSAR models, the support vector machine (SVM) approach introduced by Cortes and Vapnik29 is a flexible high-performance method, which is well suited for such complex data. Although SVM models are of limited interpretability this drawback is inherent to all methods using a large number of abstract chemical descriptors. SVM is widely employed for toxicity30 and hepatotoxicity16,31 prediction. Selection of relevant variables is paramount for these high-dimensional modeling approaches. For this purpose, a genetic algorithm (GA) is an efficient method to sample the parameter space.32 In combination with machine learning, GA has been previously applied to derive predictive models from target interaction data33,34 or complex ADME data, such as metabolic liability.35 The combination of GA and SVM (termed GA-SVM) is an emerging technique in biological activity modelling and drug design.36,37 In this work a large data set of in vivo hepatotoxicity data was compiled and classified into strictly defined, hierarchically organized endpoints, separately for humans and animal experiments. Using a machine-learning GA-SVM approach, classification models were developed and subsequently validated employing internal and external data sets. We further defined quality parameters to estimate the reliability of predictions that are based on the statistics of model validation. Intervals of high, medium and low confidence of prediction were defined this way. The models perform well and are now regularly applied to in-silico safety assessments of novel molecules during early research and drug development. 
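As a rough illustration of GA-driven variable selection, the sketch below evolves binary descriptor masks against a caller-supplied fitness function. The truncation selection, one-point crossover, elitism, mutation rate, population size, and the toy fitness are all assumptions chosen for brevity. They are not the settings used in this work, where the fitness of a mask comes from SVM models trained on the selected descriptors.

```python
import random


def evolve_feature_mask(n_features, fitness, pop_size=30, generations=40,
                        mutation_rate=0.05, seed=0):
    """Toy genetic algorithm over binary feature masks (1 = descriptor is used)."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]          # truncation selection
        children = [ranked[0][:]]                  # elitism: best mask survives unchanged
        while len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)     # one-point crossover
            child = mother[:cut] + father[cut:]
            # Flip each bit with a small probability.
            child = [1 - gene if rng.random() < mutation_rate else gene
                     for gene in child]
            children.append(child)
        population = children
    return max(population, key=fitness)


# Toy fitness (hypothetical): reward three "informative" descriptors and
# penalize every other selected descriptor, which favors slim masks.
INFORMATIVE = {0, 3, 5}


def toy_fitness(mask):
    return sum(1.0 if index in INFORMATIVE else -0.2
               for index, gene in enumerate(mask) if gene)


best_mask = evolve_feature_mask(12, toy_fitness)
print(best_mask, toy_fitness(best_mask))
```

Because the best mask of each generation is carried over unchanged, the maximum fitness never decreases from one generation to the next.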
Material and Methods
Data set. Data on hepatotoxicity of drugs and chemicals were extracted from publications.14,15,21,31 Compounds were also compiled from DrugBank.38,39 Clinical, post-market and preclinical data of marketed drugs was retrieved from PharmaPendium40 and pooled with high-level summarizing data from Leadscope,41 and combined with a proprietary internal repository with 14 to 28-day rat study

data. A high level of detail was retained during data collection, including information on species, findings, time-points and doses. Data collection was followed by a two-step curation process, firstly treating structures and secondly filtering associated findings. Data originating from different sources typically suffer from heterogeneous representation of molecular structures, isomers, tautomers, charge and other properties. This often leads to unwanted redundancies and creates numerical errors or artifacts when subjected to modelling algorithms. In order to normalize the data repository accordingly, we developed a Pipeline Pilot workflow for data processing, which is available as a protocol in the supplementary material to this publication. All compound mixtures, all entries without regular structure and compounds without any organic atoms were filtered out. Compounds with a total of three or fewer atoms, containing any transition metal atoms, or with a molecular weight exceeding the arbitrary cutoff value of 800 Da were discarded. Counter ions of salts were removed, all stereochemistry was regularized and molecule net charges of free bases were neutralized. All structures were converted to their most stable tautomer in water (using the tauthor tool from Moka 2.0 software).42 Finally, biological information and data were merged on the parent molecule. The in vivo studies were selected so as to represent studies that are typically performed during preclinical drug development and clinical trials. To this end, only findings for humans and all preclinical studies based on various strains and breeds of rats, mice, dogs, and monkeys were considered relevant. Findings were curated manually, selecting only those findings which would indicate hepatotoxicity if found in an animal study or in humans.
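The structure-level exclusion rules described above (mixtures, no organic atoms, three or fewer atoms, transition metals, molecular weight above 800 Da) can be sketched as a simple filter. This is not the Pipeline Pilot protocol from the supplementary material. The record layout (`atoms`, `mol_weight`, `is_mixture`), the reading of "organic atom" as carbon, and the example records are assumptions made for illustration, and salt stripping, charge neutralization, and tautomer handling are left out.

```python
# d-block elements of periods 4 to 6; counting group 12 (Zn, Cd, Hg) as
# transition metals is an assumption of this sketch.
TRANSITION_METALS = {
    "Sc", "Ti", "V", "Cr", "Mn", "Fe", "Co", "Ni", "Cu", "Zn",
    "Y", "Zr", "Nb", "Mo", "Tc", "Ru", "Rh", "Pd", "Ag", "Cd",
    "Hf", "Ta", "W", "Re", "Os", "Ir", "Pt", "Au", "Hg",
}


def passes_structure_filters(record, max_mol_weight=800.0):
    """Return True if a compound record survives the exclusion rules."""
    atoms = record["atoms"]                  # hypothetical field: element symbols
    if record.get("is_mixture", False):
        return False                         # compound mixtures are removed
    if "C" not in atoms:
        return False                         # assumption: "organic atom" read as carbon
    if len(atoms) <= 3:
        return False                         # three or fewer atoms in total
    if TRANSITION_METALS & set(atoms):
        return False                         # any transition metal atom
    return record["mol_weight"] <= max_mol_weight


# Hypothetical records, not compounds from the data set.
records = [
    {"name": "cpd_ok", "atoms": ["C", "C", "O", "N", "C", "C"], "mol_weight": 312.4},
    {"name": "cpd_tiny", "atoms": ["C", "O", "O"], "mol_weight": 44.0},
    {"name": "cpd_metal", "atoms": ["Pt", "N", "N", "Cl", "Cl", "C"], "mol_weight": 340.1},
    {"name": "cpd_heavy", "atoms": ["C"] * 60, "mol_weight": 1202.6},
    {"name": "cpd_mixture", "atoms": ["C", "C", "O", "C"], "mol_weight": 150.2,
     "is_mixture": True},
]
kept = [record["name"] for record in records if passes_structure_filters(record)]
print(kept)
```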
Post-market data from adverse event reports were integrated following the scheme proposed by Ursem and coworkers.27 We used the proportional reporting ratio (PRR)43 to account for patient exposure and the Yates’ χ² test to account for statistical independence. An adverse event or side effect finding was considered relevant for a compound if PRR ≥ 2, χ² ≥ 4 and a minimum of 2 reports were available. In a final step the data from different sources were merged on the parent compound level. Detailed study findings, where available, were generally considered of highest priority. In all other cases where only summarized data was available (hepatotoxicity: positive or negative) and two sources disagreed, the corresponding compounds were discarded from the dataset.
Endpoints. An adequate compromise between complexity and dataset coverage needs to be found to allow for machine-learning. Specific individual findings (e.g. “cytolytic hepatitis”) are heavily imbalanced, containing only few positive compounds, which renders them unsuitable for modelling. Therefore, we clustered closely related liver toxicity findings into several binary endpoints. In our approach, the endpoints are arranged into a hierarchical tree with 3 levels of detail: the level 0 “super endpoint” hepatotoxicity includes all relevant hepatotoxicity terms and overall classifications; level 1 distinguishes between clinical chemistry and morphological findings, yielding two endpoints; level 2 further separates hepatocellular and hepatobiliary injury related terms, yielding another 4 endpoints. The endpoint tree is shown in figure 1. In contrast to a statistical selection of findings reported elsewhere,15,27 the clustering used in this work is purely endpoint driven. A table mapping the collected findings to the 7 endpoints can be found in the supporting information to this publication. Human and preclinical findings were kept separate, resulting in two endpoint trees. In addition, a third tree was introduced using only preclinical findings with associated dose, while all findings obtained at very high doses above a threshold of 500 mg/kg were discarded. These dose limited preclinical endpoints focus on strong effects and reduce the leverage of high dose toxicants during model development. A compound was classified as positive when a finding associated with a particular endpoint, species and dose was reported or at least one associated endpoint in a lower level was positive. Accordingly, a compound was classified as negative if no associated findings were reported or all lower level endpoints were negative.
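The relevance criterion for post-market adverse event reports described in the data set section (PRR ≥ 2, χ² ≥ 4, at least 2 reports) can be illustrated with the usual 2×2 report table. The sketch below uses the standard textbook forms of the proportional reporting ratio and of the Yates-corrected χ² statistic. The function names and the report counts are hypothetical, and the sketch assumes that all marginal totals of the table are non-zero.

```python
def prr(a, b, c, d):
    """Proportional reporting ratio.

    a: reports of the event for the drug of interest
    b: reports of all other events for that drug
    c: reports of the event for all other drugs
    d: reports of all other events for all other drugs
    """
    return (a / (a + b)) / (c / (c + d))


def yates_chi2(a, b, c, d):
    """Chi-squared statistic of a 2x2 table with Yates' continuity correction."""
    n = a + b + c + d
    corrected = max(abs(a * d - b * c) - n / 2, 0.0)
    return n * corrected ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def is_relevant(a, b, c, d, min_prr=2.0, min_chi2=4.0, min_reports=2):
    """Apply the three thresholds stated in the text to one drug-event pair."""
    return (a >= min_reports
            and prr(a, b, c, d) >= min_prr
            and yates_chi2(a, b, c, d) >= min_chi2)


# Hypothetical counts: 20 of 1000 reports for the drug mention the event,
# against 100 of 99000 reports for all other drugs.
example = (20, 980, 100, 98900)
print(prr(*example), yates_chi2(*example), is_relevant(*example))
```

A single report never qualifies, whatever the ratio, because of the minimum report count.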
In figure 2 the classification of the serotonin receptor agonist frovatriptan for the three endpoint trees (H: human; PC: preclinical; PCdl: preclinical dose limit < 500 mg/kg) is exemplified. Frovatriptan is reported to increase alanine aminotransferase, aspartate aminotransferase and gamma-glutamyltransferase in humans. These findings are linked to the human level 2 endpoint clinical chemistry hepatocellular injury (H-CCHC) which is therefore classified as positive. Consequently, the connected level 1 endpoint clinical chemistry (H-CC) and the level 0 endpoint hepatotoxicity (H-HT) are also positive.
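The upward propagation of positive findings in the endpoint tree can be sketched for the frovatriptan example. The labels H-HT, H-CC, H-CCHC and H-CCHB are taken from the text. The labels used here for the morphological branch (`H-MF`, `H-MFHC`, `H-MFHB`) are placeholders assumed for this illustration. The sketch also simplifies the scheme: it does not distinguish "negative" from "not classifiable", and it does not cover compounds for which only summarized data exist.

```python
# Child -> parent links of the human endpoint tree (levels 2 -> 1 -> 0).
# The H-MF* names are hypothetical placeholders for the morphological branch.
PARENT = {
    "H-CCHC": "H-CC",
    "H-CCHB": "H-CC",
    "H-MFHC": "H-MF",
    "H-MFHB": "H-MF",
    "H-CC": "H-HT",
    "H-MF": "H-HT",
}
ALL_ENDPOINTS = set(PARENT) | set(PARENT.values())


def classify(directly_positive):
    """Mark an endpoint positive if it, or any endpoint below it, has a finding."""
    labels = {endpoint: False for endpoint in ALL_ENDPOINTS}
    for endpoint in directly_positive:
        node = endpoint
        while node is not None:          # walk up to the level 0 superendpoint
            labels[node] = True
            node = PARENT.get(node)
    return labels


# Frovatriptan: the enzyme elevations map to H-CCHC only.
frovatriptan = classify({"H-CCHC"})
print(sorted(name for name, positive in frovatriptan.items() if positive))
```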

If only summarized data was available for a compound (e.g. reported to be hepatotoxic in humans in literature without further details), the compound was only classified in the superendpoint, i.e. the original source classification was used. The data is available as supplementary material to this publication.
Model development. All compounds from public sources were split into training, test and internal validation (IV) sets for each endpoint tree, respectively. To retain the original balance of the data, 80% (10%; 10%) of positive compounds and 80% (10%; 10%) of negative compounds in the respective superendpoint were randomly assigned to the training (test; IV) sets. All unpublished compounds were assigned to an external validation (EV) set. For all compounds, 674 molecular 2D and 3D descriptors were calculated using the chemometrics software packages CATS (191 descriptors),44 MOE (192 descriptors),45 MDL public fingerprints (163 descriptors),46 and VolSurf+ (128 descriptors).47,48 To develop individual binary classification models, each of the 21 datasets was kept separate and subjected to GA-SVM. The support vector machine (SVM) learning method was combined with a genetic algorithm (GA) for feature selection similar to other QSAR and QSPR approaches.36,37 In the GA population, the individual candidate solutions were defined as chromosomes encoding descriptor contributions. For each individual, an SVM model was trained using the given global training set. The resulting model was subsequently applied to the test set. Due to the non-linearity of SVM the resulting numeric class membership between 0 and 1 (in this work, 1 means being positive in the respective endpoint) is not a probability value. For both training and test data, the areas under the receiver operating characteristic curves (AUC) were calculated and AUCs were used as measure for the discriminative power of the model.
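To make the role of the AUC concrete, the sketch below computes it with the rank-based (Mann-Whitney) formulation and combines training and test AUC into the GA fitness, taking the weights as they appear in Eq. 1 with a = 0.95. This is an illustration in Python, not the authors' R code, which uses the pROC package. The labels and scores are hypothetical.

```python
def auc(labels, scores):
    """Rank-based AUC: chance that a random positive outscores a random negative.

    Ties between a positive and a negative score count as one half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def fitness(auc_training, auc_test, a=0.95):
    """Weighted sum of training and test AUC, with weights as given in Eq. 1."""
    return a * auc_training + (2 - a) * auc_test


# Hypothetical class memberships between 0 and 1 for five compounds.
labels = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.3, 0.1]
print(auc(labels, scores))
print(fitness(0.9, 0.8))
```

Since the AUC depends only on the ranking of the scores, it is unaffected by the fact that the SVM output is not a calibrated probability.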
The fitness of an individual was accordingly defined as a weighted sum of training and test AUC:

\[ \mathrm{fitness} = a \cdot \mathrm{AUC}(\mathrm{training}) + (2 - a) \cdot \mathrm{AUC}(\mathrm{test}) \tag{1} \]

where the weighting factor $a = 0.95$. For feature selection a randomly selected first population of 1000 individuals was evolved until the change of maximum fitness was negligible and a maximum fitness was reached. This procedure also reduces the number of contributing descriptors, resulting in

slim and predictive models. The parameters used in model optimization can be found in the supporting information. The optimized models were applied to the IV and EV sets. An illustration of the total modeling workflow is shown in figure 3. The workflow was implemented in the statistical software R (version 3.0.2)49 using the SVM algorithm provided in the e1071 package (version 1.6-2),50 the genetic algorithm provided in the genalg package (version 0.1.1.1),51 and the pROC package (version 1.7.1)52 to calculate AUC values. The R source code is provided in the supporting information.
Confidence intervals and quality parameters. Characterization of the applicability domain (AD) solely based on descriptor distribution was not adopted since its interpretation can be ambiguous.53 Moreover, for hepatotoxicity models the benefit of an AD definition is not always visible.10,20,21,22 Instead, we developed a quality parameter from the IV set. While AUC is a good measure for the discriminative power of a binary classifier, it offers no way to determine an optimal threshold for separating the two classes. For the presented approach this is further complicated by the non-linearity of SVM and biased training data with an unequal class distribution. To address this issue, we have divided the range of predicted values arbitrarily into five distinct confidence intervals representing regions of high, medium, and low confidence of prediction. Thus, each predicted value can be attributed to a certain class. The proposed confidence intervals (CIs) were derived from IV set predictions to provide an estimate of prospective prediction quality. First the class dependent probability densities $D(x)$ of the predicted value $x$ of negative compounds, $D_0(x)$, and positive compounds, $D_1(x)$, were fitted using kernel density estimation (Gaussian functions with bandwidth of 0.025). With $D_0$ and $D_1$ the relative probability densities

\[ R_0(x) = \frac{D_0(x)}{D_0(x) + D_1(x)} \tag{2} \]

and

\[ R_1(x) = \frac{D_1(x)}{D_0(x) + D_1(x)} \tag{3} \]

were calculated. $R_0(x)$ and $R_1(x)$ are continuous functions approximating the negative and positive prediction probabilities, which were derived from a finite number of discrete predictions. Accordingly, $R(x)$ may take values from 0 to 1 and $R_0(x) + R_1(x) = 1$ for any arbitrary $x$. The thresholds for high and medium confidence in either of the classifications are arbitrarily set at $t'$ with $R(t') = 0.8$, representing 80% probability (4 out of 5 predictions are correct), and $t''$ with $R(t'') = 0.66$, representing a 66% probability (2 out of 3 predictions are correct). With the resulting thresholds $t_0'$ and $t_0''$ for a negative classification with high and medium confidence and $t_1'$ and $t_1''$ for a positive classification with high and medium confidence, the 5 CIs are defined as $[0; t_0')$, $[t_0'; t_0'')$, $[t_0''; t_1'')$, $[t_1''; t_1')$, and $[t_1'; 1]$. For illustration purposes, CIs are here denominated as A, B, C, D, and E. The size of any of the intervals A to E may vary with a different validation set and is therefore no intrinsic property of the model.
Results and Discussions
Data repository. The final data repository contains 3712 public domain and 269 proprietary compounds that could be classified in at least one of the 21 hepatotoxicity endpoints. While human toxicity data was available almost exclusively for pharmaceuticals, the repository also contains industrial chemicals with sparse animal toxicity data. A total of 1709 compounds were classified in human and preclinical endpoints while 1537 of these compounds were further characterized by dosing information for at least one animal study. The overall concordance of the human and animal hepatotoxicity is 77%. In previous studies concordance was reported to be even lower (between 40 and 55%),5,16 probably reflecting over-simplification of the toxicity data. One should keep in mind that such an analysis strongly depends on the classification scheme used and the underlying data set. Only 63 of the 349 compounds with non-matching classification show hepatotoxicity in animals but not in humans, while the majority is positive in humans and negative in animals. This is not surprising since a compound showing hepatotoxicity in animal studies will in most cases not be tested in humans. Doses in animal studies as well as units and formats were found to be diverse and vastly inconsistent in different data sources. For 1693 results, no dose could be obtained or a minimum dose could not be assigned and thus classification in the dose limited preclinical endpoints was not possible. 127

compounds (116 (7%) public and 11 (4%) proprietary) had only findings at a dose exceeding 500 mg/kg and were thus classified as negative to improve the model performance towards perception of clear hepatotoxicants at lower doses. The hierarchical assignment of compounds was often limited by data availability as well. 2171 compounds were classified for the human super endpoint hepatotoxicity, while only 1874 compounds could be fit into all endpoints. General preclinical study data was available for 3250 compounds of which only 1563 could be classified in all preclinical endpoints. The number of compounds in each endpoint data set is listed in table 1. A complete list of names, SMILES and classifications of the public domain compounds is available from the supporting information.
Hepatotoxicity models. A total of 21 models (7 human, 7 preclinical without and 7 with dose limit) were obtained with the GA-SVM approach, and an overview of model performance is given in table 2. Due to the tight optimization procedure all models perform well for the complete training and test sets (AUC > 0.8, not considering confidence intervals), but are of diverse quality for the 10% hold out IV data set. All human endpoints (M1 to M7, see table 2) are predicted reasonably well with AUC(IV) ranging between 0.71 and 0.75. For the preclinical endpoints without dose limit only the superendpoint hepatotoxicity yields a model with a significant discriminative power (AUC(IV) = 0.73, M12). The other 6 models (data not shown in table 2) show low predictivity (AUC(IV) < 0.67). Consequently, those models were not considered in any further analysis. The validation of the dose limited ( 1200 compounds) a much wider chemical diversity than any previously published animal hepatotoxicity models. Their performance for the EV set with sensitivities >72% and true positive rates of 35 to 47% and 50 to 71% when considering only high confidence predictions is similar.
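The confidence intervals A to E referred to here follow from the relative probability densities of Eqs. 2 and 3. A minimal sketch of that construction is given below, using a Gaussian kernel density estimate with bandwidth 0.025 as stated in the methods. Several points are assumptions of this sketch rather than statements about the authors' R code: each class density is normalized to unit area, thresholds are located by a grid scan that takes the first crossing of each level, and the predicted values are made-up numbers rather than IV set predictions.

```python
import math


def kde(values, bandwidth=0.025):
    """Gaussian kernel density estimate, returned as a function of x."""
    norm = len(values) * bandwidth * math.sqrt(2 * math.pi)

    def density(x):
        return sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values) / norm

    return density


def relative_positive_density(neg_scores, pos_scores, bandwidth=0.025):
    """R1(x) = D1(x) / (D0(x) + D1(x)); R0(x) is 1 - R1(x)."""
    d0, d1 = kde(neg_scores, bandwidth), kde(pos_scores, bandwidth)

    def r1(x):
        total = d0(x) + d1(x)
        return d1(x) / total if total > 0 else 0.5   # no data nearby: undecided

    return r1


def thresholds(r1, high=0.8, medium=0.66, steps=1000):
    """Grid scan for (t0', t0'', t1'', t1') using the first crossing of each level."""
    grid = [i / steps for i in range(steps + 1)]

    def first(condition):
        return next((x for x in grid if condition(x)), 1.0)

    return (first(lambda x: 1 - r1(x) < high),     # end of high-confidence negative
            first(lambda x: 1 - r1(x) < medium),   # end of medium-confidence negative
            first(lambda x: r1(x) >= medium),      # start of medium-confidence positive
            first(lambda x: r1(x) >= high))        # start of high-confidence positive


def interval(x, t):
    """Map a predicted value to one of the five confidence intervals A to E."""
    for label, upper in zip("ABCD", t):
        if x < upper:
            return label
    return "E"


# Hypothetical predicted values for negative and positive compounds.
neg = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.40, 0.50]
pos = [0.45, 0.55, 0.60, 0.70, 0.75, 0.80, 0.90, 0.95]
t = thresholds(relative_positive_density(neg, pos))
print(t, interval(0.10, t), interval(0.95, t))
```

With real predictions $R_1(x)$ need not be monotonic, so the first-crossing rule is only one possible way to place the thresholds. As noted in the methods, the interval sizes depend on the validation set used to derive them.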
The hierarchical endpoint structure, the developed confidence intervals and the indirect inclusion of dose information make them a unique set of tools in in-silico hepatotoxicity prediction. Interestingly, for all 7 preclinical models the performances in their IV and EV are not correlated in any way and performance of a model for EV can only be estimated a posteriori. This is surprising insofar as the models for dose limited preclinical endpoints (M13 to M18) all have roughly the same training, test and validation sets (only varying by a maximum of 15 compounds). The sets cover the same chemical space and thus might be expected to perform similarly when applied to the same set of EV compounds. Within the applied modelling strategy, the model applicability is primarily driven by the GA-driven choice of feature space rather than chemical space in general. For these models (but in our opinion not limited to them), forward prediction and chemical transferability should not be assessed in a single validation step or replaced by applicability domain definition but rather through rigorous external validation. Similar results have been discussed earlier on other datasets.16,55 Building global models with broad applicability is especially challenging for human toxicity endpoints, where validation data is scarce and systematic testing is out of the question. For the human endpoints not enough data was available for an external validation with proprietary, i.e. non-public domain, compounds. But considering the observed decrease of discriminative power for the preclinical models when turning from internal to external validation (AUC values decrease by approximately 0.1, accuracies decrease up to 30%), we expect a similar reduction of predictivity for human endpoints when applying the models to entirely novel compound types, e.g. to drug candidates in an early research or development phase. Since both the human and preclinical models presented in this work are solely descriptor based, they should be seen as complementary to expert knowledge driven structural alerts and they should be used accordingly. In this context, the large data collection presented in this work can be used for validation of known and identification of new structural alerts.
Mechanistic differentiation.
By selection of appropriate molecular descriptors, the models are able to capture and distinguish diverse pharmacological and human toxicological features as exemplified by a series of four triptan core derivatives. The chemical structures of frovatriptan, etodolac, tadalafil, and yohimbine are shown in figure 7. Frovatriptan is a serotonin receptor agonist used in the treatment of migraine (H-CCHC: positive, see figure 2 and table 4). The precise mechanism of its liver toxicity is unknown but may involve a reactive metabolite produced by a hepatic cytochrome, such as CYP1A2.56 For the non-steroidal anti-inflammatory drug etodolac (positive in all human endpoints) and the selective inhibitor of phosphodiesterase type 5 tadalafil (positive in all human level 2 endpoints but H-CCHB) the mechanisms driving the observed liver toxicity are most likely of idiosyncratic nature and are equally thought to involve metabolites or metabolic intermediates.56 The number of post-market reports differs significantly for the two compounds and suggests that they have different toxic potencies, with etodolac being of concern. The alpha 2 adrenergic receptor antagonist yohimbine is a herbal extract mostly used to treat erectile dysfunction and is not reported to induce acute liver injury. Yohimbine is also negative in all human endpoints. These four compounds differ in their respective pharmacophoric features. Net charge and polarity of compounds are key parameters that drive molecular receptor recognition and pharmacokinetic parameters of substances, so the aliphatic acid etodolac solely undergoes glucuronidation by human UGTs, which can lead to the formation of reactive metabolites in man. Etodolac is also a substrate of the liver cytochrome CYP2C9. In contrast, yohimbine and frovatriptan are positively charged, basic amines, making them poor substrates for UGT, but interaction partners for aminergic G-protein coupled receptors. In contrast to etodolac, frovatriptan is a substrate of CYP1A2, and yohimbine is a substrate of CYP3A4. Selection and incorporation of pharmacophorically diverse compounds in model training is key for an implicit training and differentiation of hepatic modes-of-toxicity. Technically, charged and partially charged surface area descriptors (prefix PEOE_VSA and Q_VSA, see supporting information table S7) are able to represent pharmacophoric properties.
Consequently, the developed model suite has been found to differentiate well between the toxicological profiles of the four drugs and correctly classifies all four human level 2 endpoints with two minor exception: for tadalafil (test set compound) the model M5 (H-CCHB) result is equivocal (correct: negative) and for yohimbine the model M4 (H-CCHC) result is equivocal (correct: negative). To composite model M11 correctly classifies all four compounds. Actual and predicted classes for the four level 2 human endpoints of the four compounds are listed in table 4. Due to a significant number of hepatic toxicity modes for chemicals, including pharmacophoric diversity in model development is important to supports the generalizability of the models. In a future perspective, as more and more adverse outcome pathways are uncovered target based modeling will become feasible even for complex organ specific toxicities. 16 ACS Paragon Plus Environment


Summary and Conclusion

We have developed a systematic approach to construct interrelated models for hepatotoxicity with a general applicability scope from a repository of 3712 compounds and associated human and animal hepatotoxicity data. The models were organized so as to reflect major classes of pathologies in preclinical and clinical development, seeking to support the translatability of data in pharmaceutical drug discovery. All pathology findings were clustered into 3 hierarchical endpoint trees with 7 endpoints each, and for every endpoint SVM models were constructed using a genetic algorithm for feature selection. Model performance was evaluated for an internal validation set (10% hold-out) and an external validation set (269 proprietary compounds) with only preclinical data. For human hepatotoxicity, 4 high-quality single endpoint models and one composite model were identified and implemented. For preclinical hepatotoxicity, only 2 single endpoint models and one composite model combining the 2 showed significant predictive power for the external validation set. A composite model based on lower level endpoints can additionally be used to anticipate more specific liver findings. Hepatotoxicity in humans and animals remains a key challenge for computational methods. It could be shown here that models based on the improved data sets can enhance the predictability and reliability of predictions. To this end, technically advanced modelling algorithms alone will not suffice. A combination of high-quality data, the right choice of endpoint or endpoints, an elaborate modelling technique, and a rigorous validation scheme is necessary to clear the hurdle posed by this complex organ-specific toxicity. Furthermore, a combination of the newly developed human and preclinical models can support a translational analysis and promote a better understanding of differences between human and animal in vivo results.

In the context of drug development, a combination with complementary in silico methods, PK/PD data, and in vitro methods may help to reduce animal testing, lead to improved toxicity profiles of novel molecules, and ultimately help to reduce late-stage attrition rates in pharmaceutical drug development.
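The feature-selection idea summarized above (a genetic algorithm searching over descriptor subsets, scored by a classifier) can be illustrated with a minimal sketch. This is not the authors' implementation, which used R: to stay self-contained it swaps the SVM for a nearest-centroid classifier, and the function names, population size, mutation rate, and toy data are all assumptions made for illustration.

```python
import random

def centroid_accuracy(X_fit, y_fit, X_eval, y_eval, mask):
    """Accuracy of a nearest-centroid classifier (stand-in for the SVM) on the masked features."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # an empty descriptor subset cannot classify anything
    def centroid(label):
        rows = [x for x, y in zip(X_fit, y_fit) if y == label]
        return [sum(r[j] for r in rows) / len(rows) for j in cols]
    c0, c1 = centroid(0), centroid(1)
    def dist(x, c):
        return sum((x[j] - cj) ** 2 for j, cj in zip(cols, c))
    hits = sum(int(dist(x, c1) < dist(x, c0)) == y for x, y in zip(X_eval, y_eval))
    return hits / len(y_eval)

def ga_select(X_fit, y_fit, X_eval, y_eval, n_features,
              pop_size=20, generations=30, p_mut=0.1, seed=0):
    """Evolve boolean feature masks; hypothetical settings, elitist selection."""
    rng = random.Random(seed)
    fitness = lambda m: centroid_accuracy(X_fit, y_fit, X_eval, y_eval, m)
    pop = [[rng.random() < 0.5 for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 2]            # best half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [(not g) if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = elite + children
    return max(pop, key=fitness)

# Toy data (assumption): descriptor 0 carries the signal, descriptors 1-5 are noise.
data_rng = random.Random(1)
def make(n):
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([label + data_rng.gauss(0, 0.3)] + [data_rng.gauss(0, 1) for _ in range(5)])
        y.append(label)
    return X, y

X_tr, y_tr = make(80)
X_te, y_te = make(20)
best_mask = ga_select(X_tr, y_tr, X_te, y_te, n_features=6)
print(best_mask, centroid_accuracy(X_tr, y_tr, X_te, y_te, best_mask))
```

Because the best half of each generation is carried over unchanged, the best fitness in the population can never decrease. Scoring masks on a separate evaluation set rather than on the fitting set mirrors the training/test division described for the models.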


Abbreviations

AUC, area under the receiver operator characteristic curve; CI, confidence interval; EV, external validation; GA-SVM, support vector machine model using genetic algorithm for feature selection; IV, internal validation; OECD, Organization for Economic Cooperation and Development; PRR, proportional reporting ratio; QSAR, quantitative structure-activity relationship.

Associated Content

Supporting Information. Entire public dataset of compounds (in SDF format) with binary endpoint data; table with endpoints and associated findings; tables with further statistical data on developed models; table with selected descriptors; R source code files (in plain text format); Pipeline Pilot workflow for compound preparation. This material is available free of charge via the Internet at http://pubs.acs.org.


References

(1) Fung, M., Thornton, A., Mybeck, K., Wu, J. H.-h., Hornbuckle, K., and Muniz, E. (2001) Evaluation of the Characteristics of Safety Withdrawals of Prescription Drugs from Worldwide Pharmaceutical Markets - 1960 to 1999. Drug Inf. J. 35, 293-317.

(2) Chen, M., Vijay, V., Shi, Q., Liu, Z., Fang, H., and Tong, W. (2011) FDA-approved drug labeling for the study of drug-induced liver injury. Drug Discov. Today 16, 697-703.

(3) Ballet, F. (1997) Hepatotoxicity in drug development: detection, significance and solutions. J. Hepatol. 26, 26-36.

(4) Kaplowitz, N. (2005) Idiosyncratic Drug Hepatotoxicity. Nat. Rev. Drug Discov. 4, 489-499.

(5) Olson, H., Betton, G., Robinson, D., Thomas, K., Monro, A., Kolaja, G., Lilly, P., Sanders, J., Sipes, G., Bracken, W., Dorato, M., Van Deun, K., Smith, P., Berger, B., and Heller, A. (2000) Concordance of Toxicity of Pharmaceuticals in Humans and in Animals. Regul. Toxicol. Pharmacol. 32, 56-67.

(6) Jaeschke, H., Gores, G. J., Cederbaum, A. I., Hinson, J. A., Pessayre, D., and Lemasters, J. J. (2002) Mechanisms of Hepatotoxicity. Toxicol. Sci. 65, 166-176.

(7) Blomme, E. A. G., Yang, Y., and Waring, J. F. (2009) Use of toxicogenomics to understand mechanisms of drug-induced hepatotoxicity during drug discovery and development. Toxicol. Lett. 186, 22-31.

(8) O'Brien, P. J., Irwin, W., Diaz, D., Howard-Cofield, E., Krejsa, C. M., Slaughter, M. R., Gao, B., Kaludercic, N., Angeline, A., Bernardi, P., Brain, P., and Hougham, C. (2006) High concordance of drug-induced human hepatotoxicity with in vitro cytotoxicity measured in a novel cell-based model using high content screening. Arch. Toxicol. 80, 580-604.

(9) Bhattacharya, S., Shoda, L. K. M., Zhang, Q., Woods, C. G., Howell, B. A., Siler, S. Q., Woodhead, J. L., Yang, Y., McMullen, P., Watkins, P. B., and Andersen, M. E. (2012) Modeling drug- and chemical-induced hepatotoxicity with systems biology approaches. Front. Physiol. 3, 1-18.

(10) Chen, M., Bisgin, H., Tong, L., Hong, H., Fang, H., Borlak, J., and Tong, W. (2014) Toward predictive models for drug-induced liver injury in humans: are we there yet? Biomarkers Med. 8, 201-213.

(11) Ekins, S. (2014) Progress in computational toxicology. J. Pharmacol. Toxicol. Meth. 69, 115-140.

(12) Cumming, J. G., Davis, A. M., Muresan, S., Haeberlein, M., and Chen, H. (2013) Chemical predictive modelling to improve compound quality. Nat. Rev. Drug Discov. 12, 948-962.

(13) Marchant, C. A., Fisk, L., Note, R. R., Patel, M. L., and Suárez, D. (2009) An Expert System Approach to the Assessment of Hepatotoxic Potential. Chem. Biodivers. 6, 2107-2144.

(14) Greene, N., Fisk, L., Naven, R. T., Note, R. R., Patel, M. L., and Pelletier, D. J. (2010) Developing Structure-Activity Relationships for the Prediction of Hepatotoxicity. Chem. Res. Toxicol. 23, 1215-1222.

(15) Liu, Z., Shi, Q., Ding, D., Kelly, R., Fang, H., and Tong, W. (2011) Translating Clinical Findings into Knowledge in Drug Safety Evaluation - Drug Induced Liver Injury Prediction System (DILIps). PLoS Comput. Biol. 7, e1002310.

(16) Fourches, D., Barnes, J. C., Day, N. C., Bradley, P., Reed, J. Z., and Tropsha, A. (2010) Cheminformatics Analysis of Assertions Mined from Literature that Describe Drug-Induced Liver Injury in Different Species. Chem. Res. Toxicol. 23, 171-183.

(17) Rodgers, A. D., Zhu, H., Fourches, D., Rusyn, I., and Tropsha, A. (2010) Modeling Liver-Related Adverse Effects of Drugs Using k Nearest Neighbor Quantitative Structure-Activity Relationship Method. Chem. Res. Toxicol. 23, 724-732.

(18) Cruz-Monteagudo, M., Cordeiro, M. N., and Borges, F. (2008) Computational Chemistry Approach for the Early Detection of Drug-Induced Idiosyncratic Liver Toxicity. J. Comput. Chem. 29, 532-549.

(19) Cheng, A., and Dixon, S. L. (2003) In silico models for the prediction of dose-dependent human hepatotoxicity. J. Comput. Aided Mol. Des. 17, 811-823.

(20) Liew, C. Y., Lim, Y. C., and Yap, C. W. (2011) Mixed learning algorithms and features ensemble in hepatotoxicity prediction. J. Comput. Aided Mol. Des. 25, 855-871.

(21) Ekins, S., Williams, A. J., and Xu, J. J. (2010) A Predictive Ligand-Based Bayesian Model for Human Drug-Induced Liver Injury. Drug Metab. Dispos. 38, 2302-2308.

(22) Chen, M., Hong, H., Fang, H., Kelly, R., Zhou, G., Borlak, J., and Tong, W. (2013) Quantitative Structure-Activity Relationship Models for Predicting Drug-Induced Liver Injury Based on FDA-Approved Drug Labeling Annotation and Using a Large Collection of Drugs. Toxicol. Sci. 136, 242-249.

(23) Myshkin, E., Brennan, R., Khasanova, T., Sitnik, T., Serebriyskaya, T., Litvinova, E., Guryanov, A., Nikolsky, Y., Nikolskaya, T., and Bureeva, S. (2012) Prediction of Organ Toxicity Endpoints by QSAR Modeling Based on Precise Chemical-Histopathology Annotations. Chem. Biol. Drug Des. 80, 406-416.

(24) Pinto, M., Digles, D., and Ecker, G. F. (2014) Computational models for predicting the interaction with ABC transporters. Drug Discov. Today Technol. 12, e69-e77.

(25) Dong, Z., Ekins, S., and Polli, J. E. (2013) Structure-Activity Relationship for FDA Approved Drugs As Inhibitors of the Human Sodium Taurocholate Cotransporting Polypeptide (NTCP). Mol. Pharm. 10, 1008-1019.

(26) Kim, M. T., Huang, R., Sedykh, A., Wang, W., Xia, M., and Zhu, H. (2015) Mechanism Profiling of Hepatotoxicity Caused by Oxidative Stress Using the Antioxidant Response Element Reporter Gene Assay Models and Big Data. Environ. Health Persp. [Online] http://ehp.niehs.nih.gov/1509763/ (accessed Jan 06, 2016).

(27) Ursem, C. J., Kruhlak, N. L., Contrera, J. F., MacLaughlin, P. M., Benz, D. R., and Matthews, E. J. (2009) Identification of structure-activity relationships for adverse effects of pharmaceuticals in humans. Part A: Use of FDA post-market reports to create a database of hepatobiliary and urinary tract toxicities. Regul. Toxicol. Pharmacol. 54, 1-22.

(28) Matthews, E. J., Ursem, C. J., Kruhlak, N. L., Benz, D. R., Aragonés Sabaté, D., Yang, C., Klopman, G., and Contrera, J. F. (2009) Identification of structure-activity relationships for adverse effects of pharmaceuticals in humans. Part B: Use of (Q)SAR systems for early detection of drug-induced hepatotoxicity and urinary tract toxicities. Regul. Toxicol. Pharmacol. 54, 23-42.

(29) Cortes, C., and Vapnik, V. (1995) Support-Vector Networks. Machine Learning 20, 273-297.

(30) Zhao, C. Y., Zhang, H. X., Zhang, X. Y., Liu, M. C., Hu, Z. D., and Fan, B. T. (2005) Application of support vector machine (SVM) for prediction toxic activity of different data sets. Toxicology 217, 105-119.

(31) Low, Y., Uehara, T., Minowa, Y., Yamada, H., Ohno, Y., Urushidani, T., Sedykh, A., Muratov, E., Kuz'min, V., Fourches, D., Zhu, H., Rusyn, I., and Tropsha, A. (2011) Predicting Drug-Induced Hepatotoxicity Using QSAR and Toxicogenomics Approaches. Chem. Res. Toxicol. 24, 1251-1262.

(32) Cho, S. J., and Hermsmeier, M. A. (2002) Genetic Algorithm Guided Selection: Variable Selection and Subset Selection. J. Chem. Inf. Comput. Sci. 42, 927-936.

(33) Matter, H., Anger, L., Giegerich, C., Güssregen, S., Hessler, G., and Baringhaus, K. (2012) Development of in silico filters to predict activation of the pregnane X receptor (PXR) by structurally diverse drug-like molecules. Bioorg. Med. Chem. 20, 5352-5365.

(34) Schmidt, F., Matter, H., Hessler, G., and Czich, A. (2014) Predictive in silico off-target profiling in drug discovery. Future Med. Chem. 6, 295-317.

(35) Baringhaus, K., Hessler, G., Matter, H., and Schmidt, F. (2013) 11. Development and Applications of Global ADMET Models: In Silico Prediction of Human Microsomal Lability. In Chemoinformatics for Drug Discovery, John Wiley & Sons, Inc, Hoboken, NJ, USA.

(36) Pourbasheer, E., Riahi, S., Ganjali, M. R., and Norouzi, P. (2009) Application of genetic algorithm-support vector machine (GA-SVM) for prediction of BK-channels activity. Eur. J. Med. Chem. 44, 5023-5028.

(37) Fernandez, M., Caballero, J., Fernandez, L., and Sarai, A. (2010) Genetic algorithm optimization in drug design QSAR: Bayesian-regularized genetic neural networks (BRGNN) and genetic-algorithm-optimized support vector machines (GA-SVM). Mol. Divers. 15, 269-289.

(38) Canadian Institute of Health Research (2013) DrugBank at http://www.DrugBank.ca (accessed Jan 3, 2013).

(39) Law, V., Knox, C., Djoumbou, Y., Jewison, T., Guo, A. C., Maciejewski, A., Arndt, D., Wilson, M., Neveu, V., Tang, A., Gabriel, G., Ly, C., Adamjee, S., Dame, Z., Han, B., Zhou, Y., and Wishart, D. S. (2014) DrugBank 4.0: shedding new light on drug metabolism. Nucleic Acids Res. 42, D1091-D1097.

(40) Reed Elsevier Properties SA (2013) PharmaPendium at http://www.pharmapendium.com (accessed Jan 5, 2013).

(41) Leadscope, Inc., Leadscope 3.0.8, Columbus, OH, USA.

(42) Milletti, F., Storchi, L., Sforna, G., Cross, S., and Cruciani, G. (2009) Tautomer enumeration and stability prediction for virtual screening on large chemical databases. J. Chem. Inf. Model. 49, 68-75.

(43) Evans, S. J. W., Waller, P. C., and Davis, S. (2001) Use of proportional reporting ratios (PRRs) for signal generation from spontaneous adverse drug reaction reports. Pharmacoepidemiol. Drug Saf. 10, 483-486.

(44) Schneider, G., Neidhart, W., Giller, T., and Schmid, G. (1999) "Scaffold-Hopping" by Topological Pharmacophore Search: A Contribution to Virtual Screening. Angew. Chem., Int. Ed. 38, 2894-2896.

(45) Chemical Computing Group (2013) Molecular Operating Environment Descriptors, Quebec, Canada.

(46) MDL Information Systems/Symyx (1984) MACCS keys, Santa Clara, CA, USA.

(47) Crivori, P., Cruciani, G., Carrupt, P.-A., and Testa, B. (2000) Predicting Blood-Brain Barrier Permeation from Three-Dimensional Molecular Structure. J. Med. Chem. 43, 2204-2216.

(48) Cruciani, G., Crivori, P., Carrupt, P.-A., and Testa, B. (2000) Molecular fields in quantitative structure-permeation relationships: the VolSurf approach. J. Mol. Struct.: THEOCHEM 503, 17-30.

(49) R Core Team (2014) R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing at http://www.R-project.org (accessed Nov 24, 2014).

(50) Meyer, D., Dimitriadou, E., Hornik, K., Weingessel, A., Leisch, F., Chang, C.-C., and Lin, C.-C. (2014) e1071: Misc Functions of the Department of Statistics (e1071) at http://cran.r-project.org/web/packages/e1071/index.html (accessed Nov 24, 2014).

(51) Willighagen, E. (2014) genalg: R Based Genetic Algorithm at http://cran.r-project.org/web/packages/genalg/index.html (accessed Nov 24, 2014).

(52) Robin, X., Turck, N., Hainard, A., Tiberti, N., Lisacek, F., Sanchez, J.-C., and Müller, M. (2014) pROC: display and analyze ROC curves at http://cran.r-project.org/web/packages/pROC/index.html (accessed Nov 24, 2014).

(53) Norinder, U., Carlsson, L., Boyer, S., and Eklund, M. (2014) Introducing Conformal Prediction in Predictive Modeling. A Transparent and Flexible Alternative to Applicability Domain Determination. J. Chem. Inf. Model. 54, 1596-1603.

(54) Zakharov, A. V., Peach, M. L., Sitzmann, M., and Nicklaus, M. C. (2014) QSAR Modeling of Imbalanced High-Throughput Screening Data in PubChem. J. Chem. Inf. Model. 54, 705-712.

(55) Tropsha, A., and Golbraikh, A. (2007) Predictive QSAR Modeling Workflow, Model Applicability Domains, and Virtual Screening. Curr. Pharm. Des. 13, 3494-3504.

(56) National Institutes of Health (2016) LiverTox at http://livertox.nlm.nih.gov/ (accessed Feb 2016).


Table 1. Compounds classified according to the endpoints in the repository used for development of the different models. For each endpoint 80% of the compounds were used as model training set, 10% as test set, and 10% as internal validation set.

| Level | Endpoint | N (positive/negative) (a), public domain | N (positive/negative) (a), proprietary |
|---|---|---|---|
| | Human | | |
| 0 | Hepatotoxicity | 2171 (1435/736) | - |
| 1 | Clinical Chemistry Findings | 1940 (987/953) | - |
| 1 | Morphological Findings | 1940 (1012/928) | - |
| 2 | Clinical Chemistry Findings Hepatocellular Injury | 1874 (891/983) | - |
| 2 | Clinical Chemistry Findings Hepatobiliary Injury | 1874 (584/1290) | - |
| 2 | Morphological Findings Hepatocellular Injury | 1891 (900/991) | - |
| 2 | Morphological Findings Hepatobiliary Injury | 1891 (613/1278) | - |
| | Preclinical with and without dose | | |
| 0 | Hepatotoxicity | 3250 (2464/786) | 269 (105/164) |
| | Preclinical with dose < 500 mg/kg | | |
| 0 | Hepatotoxicity | 1578 (668/910) | 269 (94/175) |
| 1 | Clinical Chemistry Findings | 1578 (435/1143) | 269 (60/209) |
| 1 | Morphological Findings | 1578 (615/963) | 269 (69/200) |
| 2 | Clinical Chemistry Findings Hepatocellular Injury | 1566 (404/1162) | 269 (56/213) |
| 2 | Clinical Chemistry Findings Hepatobiliary Injury | 1566 (118/1448) | 269 (18/251) |
| 2 | Morphological Findings Hepatocellular Injury | 1563 (598/965) | 269 (64/205) |
| 2 | Morphological Findings Hepatobiliary Injury | 1563 (53/1510) | 269 (20/249) |

(a) N: number of compounds. In brackets: the number of positive and negative compounds.
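The 80%/10%/10% division stated in the caption of table 1 can be sketched as a simple random split. This is only an illustration: the function name and seed are hypothetical, and the internal validation set sizes reported in table 2 (for example 221 compounds for H-HT) differ slightly from a plain 10% cut, so the authors' exact procedure was presumably somewhat different.

```python
import random

def split_80_10_10(compound_ids, seed=0):
    """Shuffle the compound identifiers and cut them into training, test, and internal validation sets."""
    ids = list(compound_ids)
    random.Random(seed).shuffle(ids)
    n_test = len(ids) // 10          # 10% test set
    n_val = len(ids) // 10           # 10% internal validation set
    n_train = len(ids) - n_test - n_val
    train = ids[:n_train]
    test = ids[n_train:n_train + n_test]
    internal_val = ids[n_train + n_test:]
    return train, test, internal_val

# 2171 public domain compounds for the human hepatotoxicity endpoint (table 1)
train, test, internal_val = split_80_10_10(range(2171))
print(len(train), len(test), len(internal_val))  # 1737 217 217
```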


Table 2. Performance of 18 models for human and preclinical hepatotoxicity endpoints for internal and external validation sets.

| Model No. | Model Description | N Descr. (a) | N (b) | Internal val. AUC (d) | Internal val. ACC (e) | External val. (c) AUC (d) | External val. (c) ACC (e) |
|---|---|---|---|---|---|---|---|
| M1 | H-HT | 83 | 221 | 0.73 | 75% | - | - |
| M2 | H-CC | 62 | 200 | 0.71 | 73% | - | - |
| M3 | H-MF | 76 | 200 | 0.74 | 74% | - | - |
| M4 | H-CCHC | 46 | 193 | 0.73 | 75% | - | - |
| M5 | H-CCHB | 69 | 193 | 0.73 | 77% | - | - |
| M6 | H-MFHC | 73 | 195 | 0.72 | 73% | - | - |
| M7 | H-MFHB | 64 | 195 | 0.75 | 78% | - | - |
| M8 | H-CC composite of H-CCHC and H-CCHB | - | 200 | 0.71 | 76% | - | - |
| M9 | H-MF composite of H-MFHC and H-MFHB | - | 200 | 0.76 | 77% | - | - |
| M10 | H-HT composite of H-CC and H-MF | - | 221 | 0.74 | 78% | - | - |
| M11 | H-HT composite of H-CCHC, H-CCHB, H-MFHC, and H-MFHB | - | 221 | 0.75 | 77% | - | - |
| M12 | PC-HT | 70 | 375 | 0.73 | 75% | 0.55 | 50% |
| M13 | PC-HTdl | 103 | 157 | 0.74 | 75% | 0.58 | 41% |
| M14 | PC-CCdl | 69 | 157 | 0.73 | 78% | 0.68 | 64% |
| M15 | PC-MFdl | 74 | 157 | 0.77 | 78% | 0.65 | 48% |
| M16 | PC-CCHCdl | 75 | 156 | 0.70 | 83% | 0.51 | 38% |
| M17 | PC-MFHCdl | 76 | 154 | 0.75 | 77% | 0.53 | 45% |
| M18 | PC-HTdl composite of PC-CCdl and PC-MFdl | - | 157 | 0.76 | 75% | 0.68 | 56% |

(a) N Descr.: number of descriptors selected by the genetic optimization algorithm; (b) N: number of compounds in the internal validation set (10% of the public domain compounds listed in table 1); (c) External validation sets contain 269 proprietary compounds with 28-day rat study data; (d) AUC: area under the receiver operator characteristic (ROC) curve; (e) ACC: accuracy. ACC = number of correct classifications divided by total number of compounds. Here ACC is calculated from classifications based on confidence intervals, and equivocal compounds are not included.
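The two statistics defined in the footnotes of table 2 can be illustrated with a small worked sketch. The AUC is computed here in its rank formulation (the probability that a randomly chosen positive compound scores higher than a randomly chosen negative one), and ACC skips equivocal predictions as the footnote describes. The function names and the toy scores are assumptions for illustration and are not taken from the authors' R code.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def accuracy_without_equivocal(predicted, actual):
    """ACC over compounds with a definite prediction; 'equivocal' predictions are left out."""
    pairs = [(p, a) for p, a in zip(predicted, actual) if p != "equivocal"]
    return sum(p == a for p, a in pairs) / len(pairs)

# Toy example (made-up values): six compounds with model scores and known classes
scores = [0.95, 0.80, 0.60, 0.50, 0.30, 0.10]
labels = [1, 1, 0, 1, 0, 0]
print(round(roc_auc(scores, labels), 3))  # 0.889 (8 of 9 pairs ranked correctly)

predicted = ["P", "P", "equivocal", "N", "N", "N"]
actual = ["P", "P", "N", "P", "N", "N"]
print(accuracy_without_equivocal(predicted, actual))  # 0.8 (4 of 5 definite predictions)
```

Excluding equivocal compounds means ACC and AUC are computed on different numbers of compounds, which is worth keeping in mind when comparing the two columns.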


Table 3. Definition of the confidence intervals A to E for models M1, M11, M14, M15, and M18 and information on the number of compounds within the intervals.

| Model No. | Model Description | Interval | Range | Internal validation (a): N (c) | Internal validation (a): positive ratio | External validation (b): N (c) | External validation (b): positive ratio |
|---|---|---|---|---|---|---|---|
| M1 | H-HT | A | [0.00, 0.47[ | 34 | 32% | - | - |
| | | B | [0.47, 0.55[ | 16 | 50% | - | - |
| | | C | [0.55, 0.73[ | 87 | 67% | - | - |
| | | D | [0.73, 0.91[ | 83 | 83% | - | - |
| | | E | [0.91, 1.00] | 1 | 100% | - | - |
| M11 | H-HT composite of H-CCHC, H-CCHB, H-MFHC, and H-MFHB | A | [0.00, 0.36[ | 4 | 0% | - | - |
| | | B | [0.36, 0.47[ | 50 | 40% | - | - |
| | | C | [0.47, 0.68[ | 125 | 71% | - | - |
| | | D | [0.68, 0.71[ | 11 | 91% | - | - |
| | | E | [0.71, 1.00] | 28 | 96% | - | - |
| M14 | PC-CCdl | A | [0.00, 0.14[ | 2 | 0% | 1 | 0% |
| | | B | [0.14, 0.26[ | 81 | 15% | 102 | 12% |
| | | C | [0.26, 0.31[ | 33 | 30% | 76 | 22% |
| | | D | [0.31, 0.46[ | 41 | 66% | 81 | 32% |
| | | E | [0.46, 1.00] | 0 | - | 7 | 71% |
| M15 | PC-MFdl | A | [0.00, 0.28[ | 13 | 0% | 6 | 17% |
| | | B | [0.28, 0.34[ | 43 | 16% | 27 | 7% |
| | | C | [0.34, 0.45[ | 52 | 42% | 127 | 23% |
| | | D | [0.45, 0.54[ | 30 | 67% | 63 | 24% |
| | | E | [0.54, 1.00] | 19 | 68% | 44 | 50% |
| M18 | PC-HTdl composite of PC-CCdl and PC-MFdl | A | [0.00, 0.28[ | 16 | 6% | 5 | 20% |
| | | B | [0.28, 0.35[ | 38 | 21% | 28 | 7% |
| | | C | [0.35, 0.43[ | 44 | 41% | 108 | 30% |
| | | D | [0.43, 0.53[ | 38 | 66% | 75 | 36% |
| | | E | [0.53, 1.00] | 21 | 71% | 51 | 63% |

(a) Internal validation sets contain 10% of the public domain compounds listed in table 1; (b) External validation sets contain 269 proprietary compounds with 28-day rat study data; (c) N: number of compounds with a predicted value in the given interval.
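As an illustration of how the half-open intervals of table 3 partition the predicted values, the following sketch assigns a prediction to one of the intervals A to E and tabulates N and the positive ratio per interval. The thresholds are those listed for model M1; the function names and the toy predictions are hypothetical and not part of the published workflow.

```python
from bisect import bisect_right

# Interval boundaries of model M1 (H-HT) as listed in table 3
M1_THRESHOLDS = [0.47, 0.55, 0.73, 0.91]

def interval(value, thresholds=M1_THRESHOLDS):
    """Return the confidence interval letter for a predicted value in [0, 1].

    bisect_right places a value equal to a threshold into the upper interval,
    which matches the half-open notation [a, b[ used in table 3.
    """
    return "ABCDE"[bisect_right(thresholds, value)]

def positive_ratio_by_interval(values, labels, thresholds=M1_THRESHOLDS):
    """Map each interval letter to (N, positive ratio); the ratio is None when N is 0."""
    counts = {k: [0, 0] for k in "ABCDE"}  # [number of compounds, number of positives]
    for v, y in zip(values, labels):
        c = counts[interval(v, thresholds)]
        c[0] += 1
        c[1] += y
    return {k: (n, pos / n if n else None) for k, (n, pos) in counts.items()}

print(interval(0.47), interval(0.60), interval(1.00))  # B C E

# Toy predictions (made-up values) with their actual classes (1 = positive)
summary = positive_ratio_by_interval([0.10, 0.20, 0.60, 0.95], [0, 1, 1, 1])
print(summary["A"], summary["B"], summary["E"])  # (2, 0.5) (0, None) (1, 1.0)
```

An interval without any compounds has no defined positive ratio, which is the situation of interval E of model M14 in the internal validation set.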

Table 4. Human hepatotoxicity classes for four triptan core derivatives including prediction results from models M4 to M7 (P = positive, N = negative).

| Compound name | H-CCHC actual | H-CCHC predicted | H-CCHB actual | H-CCHB predicted | H-MFHC actual | H-MFHC predicted | H-MFHB actual | H-MFHB predicted |
|---|---|---|---|---|---|---|---|---|
| frovatriptan | P | P | N | N | N | N | N | N |
| etodolac | P | P | P | P | P | P | P | P |
| tadalafil | P | P | N | equivocal (a) | P | P | P | P |
| yohimbine | N | equivocal | N | N | N | N | N | N |

(a) Tadalafil is in the test set of model M5 (endpoint H-CCHB).
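The entries of table 4 can be tallied in a few lines to check the statement that all level 2 endpoints are classified correctly apart from two equivocal results. The dictionary below is a transcription of the table; the variable names are chosen for illustration only.

```python
from collections import Counter

# (actual, predicted) per endpoint, transcribed from table 4
table4 = {
    "frovatriptan": {"H-CCHC": ("P", "P"), "H-CCHB": ("N", "N"),
                     "H-MFHC": ("N", "N"), "H-MFHB": ("N", "N")},
    "etodolac":     {"H-CCHC": ("P", "P"), "H-CCHB": ("P", "P"),
                     "H-MFHC": ("P", "P"), "H-MFHB": ("P", "P")},
    "tadalafil":    {"H-CCHC": ("P", "P"), "H-CCHB": ("N", "equivocal"),
                     "H-MFHC": ("P", "P"), "H-MFHB": ("P", "P")},
    "yohimbine":    {"H-CCHC": ("N", "equivocal"), "H-CCHB": ("N", "N"),
                     "H-MFHC": ("N", "N"), "H-MFHB": ("N", "N")},
}

tally = Counter()
for compound, endpoints in table4.items():
    for actual, predicted in endpoints.values():
        if predicted == "equivocal":
            tally["equivocal"] += 1
        elif predicted == actual:
            tally["correct"] += 1
        else:
            tally["wrong"] += 1

print(tally["correct"], tally["equivocal"], tally["wrong"])  # 14 2 0
```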


Figure 1. Schematic of the hierarchical endpoint tree and endpoint abbreviations. Association of terms to endpoints can be found in the supporting information (table S1).

Figure 2. Classification of frovatriptan for 21 endpoints based on reported toxicity findings and associated species and doses. Positive endpoints are colored red, negative endpoints are colored green. Endpoint name abbreviations can be found in figure 1.


Figure 3. Workflow used for developing and validating support vector machine classification models using a genetic algorithm for feature selection.


Figure 4. Probability densities $D_1$ and $D_0$ (top) and relative probability densities $R_1$ and $R_0$ (bottom) of positive (red) and negative (black) compounds for the internal validation set of the human hepatotoxicity model M1 (H-HT). Vertical lines separate the 5 confidence intervals A, B, C, D, and E at the thresholds $t'_0 = 0.47$, $t''_0 = 0.55$, $t''_1 = 0.73$, and $t'_1 = 0.91$.

Figure 5. Numbers and ratios of positive (red, top) and negative (green, bottom) compounds in the internal validation set of the human hepatotoxicity model M1 (H-HT) grouped according to the 5 confidence intervals A, B, C, D, and E (see table 3).


Figure 6. Plots of the probability densities (top) and relative probability densities (bottom) of the positive (red) and negative (black) compounds in the internal (left) and external (right) validation set for the composite model for preclinical hepatotoxicity < 500 mg/kg (M18) combining the models PC-CCdl and PC-MFdl. Vertical lines separate the 5 confidence intervals A, B, C, D, and E at the thresholds $t'_0 = 0.28$, $t''_0 = 0.35$, $t''_1 = 0.43$, and $t'_1 = 0.53$ derived from the internal validation set.


Figure 7. Chemical structures of four drugs in the data collection containing a triptan moiety.

TOC Graphic
