
Article

Conformal Prediction Classification of a Large Dataset of Environmental Chemicals from ToxCast and Tox21 Estrogen Receptor Assays Ulf Norinder, and Scott Boyer Chem. Res. Toxicol., Just Accepted Manuscript • DOI: 10.1021/acs.chemrestox.6b00037 • Publication Date (Web): 06 May 2016 Downloaded from http://pubs.acs.org on May 10, 2016

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a free service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are accessible to all readers and citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

Chemical Research in Toxicology is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


Chemical Research in Toxicology

Conformal Prediction Classification of a Large Dataset of Environmental Chemicals from ToxCast and Tox21 Estrogen Receptor Assays

Ulf Norinder and Scott Boyer
Swedish Toxicology Sciences Research Center, SE-151 36 Södertälje, Sweden

Corresponding author: Dr. Ulf Norinder, Swedish Toxicology Sciences Research Center, SE-151 36 Södertälje, Sweden e-mail: [email protected] phone: +46 8 524 885 14

The authors declare they have no actual or potential competing financial interests.

ACS Paragon Plus Environment


ABSTRACT

Quantitative structure-activity relationships (QSAR) are critical to the exploitation of the chemical information in toxicology databases. Such exploitation includes both the extraction of chemical knowledge from the data and the prediction of new chemicals based on quantitative analysis of past findings. In this study we analyzed the ToxCast and Tox21 estrogen receptor datasets using Conformal Prediction to make fuller use of the information they contain. We applied Aggregated Conformal Prediction (ACP) with Support Vector Machine classifiers to both datasets to compare the overall performance of the models but, more importantly, to explore the performance of ACP on datasets that are significantly enriched in one class without employing sampling strategies for the training set. ACP was also used to investigate the applicability domain problem on both datasets. Comparison of ACP with previous results obtained on the same datasets using traditional QSAR approaches indicated overall balanced performance similar to that of methods in which the training set was carefully selected (e.g., sensitivity and specificity for the external Tox21 dataset of 70-75%), and far superior results to those of traditional methods without training set sampling, for which the corresponding values showed a clear imbalance of 50% and 96%, respectively. Application of Conformal Prediction to imbalanced datasets facilitates an unambiguous analysis of all data, allows accurate predictive models to be built that display similar accuracy in external and internal validation and, most importantly, allows an unambiguous treatment of the applicability domain.

INTRODUCTION

Humans and other species are exposed to tens of thousands of man-made chemicals. Some of these chemicals can mimic natural hormones and disrupt the normal functions of the endocrine system.1-5 Most currently available protocols for testing such chemicals for biological activity and toxicity are both expensive and time-consuming, so only a small portion of compounds can be accurately evaluated using experimental in vivo methods. For tens of thousands of chemicals, experimental testing becomes impossible in terms of both cost and time. As a result, only a small fraction of such chemicals have been thoroughly characterized and assessed for potential risks to human and environmental health,6-9 and most chemicals have little or no data on their toxicological profile. The European legislation for the safe use of chemicals, REACH (Registration, Evaluation, Authorisation and Restriction of Chemicals), requires information on all chemicals currently on the European market in quantities above one tonne per year.10 A huge amount of data is therefore required for each compound, and computational methods have been identified as possible alternatives to direct experimentation for supplying this information. Among the more concerning effects of man-made chemicals is the disruption of reproduction. Endocrine-disrupting chemicals (EDCs), which can interact with endocrine hormone receptors and in particular the estrogen receptor (ER), have been the focus of many investigations. With the growth of datasets describing the activity of individual chemicals at the ER, predictive quantitative structure-activity relationship (QSAR) models can now be built. Many such studies have focused on the US National Center for Toxicological Research (NCTR) Endocrine Disruptor Knowledge Base (EDKB)11 with various smaller extensions,12-20 or on the combined EDKB and Japanese METI databases.21-22 A substantial amount of new data has recently been made available from the ToxCast and Tox21 programs.23,24 Zang and co-workers have recently reported an in silico binary classification study based on these resources using various machine learning methods and several training-test set selection techniques.25 They described the structures with QikProp descriptors (51 descriptors, Schrödinger version 3.2, http://www.schrodinger.com/) and 4328 structural fingerprints (FP3, FP4, MACCS) computed with OpenBabel,26 PADEL27 and PubChem,28 and delineated the structure-activity relationships with the machine learning methods linear discriminant analysis (LDA),29 classification and regression trees (CART)30 and support vector machines (SVM).31 Many combinations of molecular descriptors and machine learning methods can be applied to such datasets to arrive at models that perform acceptably in internal validation, and even on limited external validation datasets. However, if these datasets and models are to be truly useful in risk assessment, one must know under which conditions a given prediction is likely to be accurate and how likely it is to be accurate, and, when it is not, some diagnostic should explain the lack of confidence. A further benefit would be a method that produces accurate models and reliable results even when the training data are not evenly distributed, a real possibility for many of the datasets to which modeling must be applied. In this study we investigate such datasets using conformal prediction (CP) as an alternative to more traditional approaches in order to address some of these critical concerns.


The primary purpose of this study is not necessarily to derive better models than previous studies, but to demonstrate the usefulness of a more stringent mathematical framework, CP, for establishing the confidence of predictions and for defining the predictive boundaries of the derived models.

Materials and Methods

Dataset sources. The present investigation was conducted using data from the ToxCast and Tox21 chemical libraries,24,32 similar to the sources used by Zang and co-workers.25 Tox21 compounds were treated as active only if both of the two assays indicated activity.

Dataset chemical structure description. The ToxCast and Tox21 chemical structures were standardized using the IMI eTOX project standardizer developed by Francis Atkinson, EBI (https://pypi.python.org/pypi/standardiser, https://wwwdev.ebi.ac.uk/chembl/extra/francis/standardiser/). After removal of Tox21 chemicals whose structures duplicated ToxCast structures, the ToxCast dataset consisted of 319 active and 1482 inactive compounds (active/inactive = 1:4.7), while the Tox21 dataset consisted of 183 active and 4320 inactive compounds (active/inactive = 1:23.6). The structures were described using signature descriptors of heights zero to three (h = 0, 1, 2, 3).33 Signature descriptors are 2-D topological descriptors that describe the connectivity of each atom in a chemical structure with its neighboring atoms one (h = 1) or several (h > 1) bonds away. They thus capture both the local composition of atoms and their neighbors and the bonds between them.

Data analysis.


The signature-characterized datasets were analyzed using support vector machines (SVM) with the radial basis function (RBF) kernel.34 The C-SVM algorithm was used with default parameters except for gamma and cost, which were set to 0.002 and 50, respectively. An ensemble of 100 SVM models was used, derived within the conformal prediction framework described in more detail below. The ToxCast dataset was used as the training set for model building, and the models were subsequently used to predict the outcome for the external Tox21 test set. The conformal prediction framework was also used for internal validation of the ToxCast dataset (100 models, each with a randomly selected 20% test set) (Fig. 1). In the following section we describe conformal prediction informally, with the aim of explaining the idea behind the framework. A more formal description, including proofs of the mathematical theorems on which the conformal prediction framework is built, has been published by Vovk and co-workers.35 For some initial, more mathematically oriented work in the QSAR domain we refer to Eklund and co-workers,36-37 and for more cheminformatics-oriented descriptions to Norinder and co-workers.38-39 In this work we use aggregated conformal prediction (ACP),40 which produces median estimates of the p-values (see section "Conformal Prediction" for a description) based on the ensemble of 100 individual conformal prediction models developed using SVMs and signature descriptors for the ToxCast dataset (Fig. 2).

Conformal Prediction. A confidence predictor is a prediction algorithm that outputs a prediction region (Fig. 3), in contrast to the single-label prediction output by many standard prediction algorithms, e.g. support vector machines (SVM) or random forest (RF). The conformal predictor is a particular type of confidence predictor. Validity, similar but not identical to accuracy, is an important concept in conformal prediction: at a chosen confidence level 1−ε, a confidence predictor is valid if the fraction of errors it commits does not exceed ε. An attractive property of conformal predictors is that they are always valid under the assumption that compounds are drawn independently from the same distribution, for which a mathematical proof has been presented by Vovk and co-workers.35 The same assumption is made for most standard prediction algorithms used in QSAR, so CP does not introduce assumptions beyond those already generally made in QSAR modeling. To construct the prediction regions of a conformal predictor, a conformity measure (score) must also be defined; it measures how similar a new compound is to the existing compounds in the model. Relating and ranking the conformity scores of new compounds to be predicted against those of previously (experimentally tested) compounds in the model is the core of conformal predictors. This is done using a p-value: the number of existing compounds whose conformity scores are as small as or smaller than that of the new compound, divided by the total number of compounds. The new compound is very nonconforming if this fraction is small, i.e. its conformity score is lower than those of most of the existing compounds in the training set, so that it differs from them. Conversely, the new compound is very conforming if the fraction is large, i.e. it is very similar to most of the existing compounds in the training set.
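The p-value and validity definitions above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: it assumes conformity scores for which larger means more typical, and it uses the common (k + 1)/(n + 1) form of the p-value, which also counts the new compound itself, instead of the plain fraction described above.

```python
from typing import Sequence


def conformal_p_value(calibration_scores: Sequence[float], new_score: float) -> float:
    """p-value of a new compound against previously seen conformity scores.

    Counts the existing compounds whose conformity score is as small as or
    smaller than the new compound's; the +1 terms include the new compound itself.
    """
    n_le = sum(1 for s in calibration_scores if s <= new_score)
    return (n_le + 1) / (len(calibration_scores) + 1)


def is_valid(n_errors: int, n_predictions: int, epsilon: float) -> bool:
    """Validity at confidence 1 - epsilon: the error rate must not exceed epsilon."""
    return n_errors / n_predictions <= epsilon
```

With calibration scores 0.1, 0.4, 0.7 and 1.2, a new compound scoring 0.5 receives p = 3/5 = 0.6, while one scoring -1.0, below every existing score, receives the smallest possible value, 1/5 = 0.2, and is therefore very nonconforming.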
One set of conformity scores is generated for each experimental class to be modeled (Mondrian conformity scores). In this investigation binary (2-class) models are derived, in which a compound is experimentally determined to be either active or inactive, so two sets of conformity scores, one for each class, are created. A significance level (ε) appropriate for the modeling situation is set, e.g. 0.2 (corresponding to a confidence level of 0.8), and used to evaluate the predicted class(es) of new compounds. For each new compound, the fraction of existing compounds with conformity scores lower than that of the new compound is computed for each class of the model, in our case the two classes active and inactive (see Fig. 1, Conformal Predictors). This fraction must be greater than the chosen significance level (ε) for the new compound to be assigned to the class in question (see the example in part 2.3 of Reference 39). A new compound can thus be predicted to belong to the active class only, to the inactive class only, to both classes (i.e. the model cannot distinguish between the two classes), or to the empty category. The empty category means that the model cannot assign a class label to the new compound because the compound is too different from the existing compounds in the model for a reliable prediction to be made; in conventional terms, the new compound lies outside the applicability domain (AD) of the model. In CP, compounds predicted to belong to both classes are regarded as correctly predicted, and compounds assigned to the empty class as erroneously predicted. We use inductive conformal prediction (ICP) for the individual conformal predictors in this study. In ICP the training set is divided by random selection into a "proper" training set and a "calibration" set (Fig. 1); these are the standard terms in CP nomenclature.35 The model underlying the conformal predictor is constructed using the proper training set, while the calibration set is used to generate the sets of conformity scores against which new compounds are compared, through the conformity measure and the p-value.
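The class-assignment rule just described amounts to a simple set construction. In this hypothetical Python sketch (function names chosen for illustration), `p_values` maps each class label to the compound's p-value computed against that class's own calibration scores, and the prediction region keeps every class whose p-value exceeds ε.

```python
from typing import Dict, Set


def prediction_region(p_values: Dict[str, float], epsilon: float) -> Set[str]:
    """Mondrian rule: keep every class whose p-value exceeds the significance level."""
    return {label for label, p in p_values.items() if p > epsilon}


def region_category(region: Set[str]) -> str:
    """Name the outcome for a binary model: a single label, 'both' or 'empty'."""
    if not region:
        return "empty"  # outside the applicability domain in conventional terms
    if len(region) > 1:
        return "both"  # the model cannot distinguish between the classes
    return next(iter(region))


def cp_correct(region: Set[str], true_label: str) -> bool:
    """CP bookkeeping: 'both' always contains the true label, 'empty' never does."""
    return true_label in region
```

At ε = 0.2, p-values of 0.45 (active) and 0.05 (inactive) give the region {"active"}; 0.6 and 0.3 give both classes; 0.1 and 0.15 give the empty region.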
The conformity measure used is the distance of a compound to the decision boundary of the SVM model: the farther a compound lies from the boundary on the side of a given class, the more conforming it is to that class. For pseudocode further illustrating how to use conformal prediction, see Reference 39. Aggregated conformal prediction (ACP),40 based on all 100 models, is then used for the final classification of each compound.
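Putting these pieces together, the Python sketch below outlines one inductive conformal predictor with Mondrian calibration and its aggregation into an ACP by taking the median p-value over many random proper-training/calibration splits. It is a simplified illustration under stated assumptions, not the study's implementation: `trainer` is a hypothetical callable returning an SVM-like decision function (positive towards "active"), the calibration fraction of 0.3 is an arbitrary choice because the text does not give one, and `toy_trainer` is a one-dimensional stand-in for the RBF C-SVM. With scikit-learn, for example, a trainer could wrap `SVC(kernel="rbf", gamma=0.002, C=50)` and return its `decision_function`, oriented so that positive values point towards the active class; the software used by the authors is not stated here.

```python
import random
import statistics
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]
DecisionFn = Callable[[Vector], float]
# Hypothetical interface: fit on (X, y) and return a decision function whose
# positive values point towards "active" (like a signed SVM decision value).
Trainer = Callable[[List[Vector], List[str]], DecisionFn]
CLASSES = ("active", "inactive")


def conformity(decision: float, label: str) -> float:
    """Signed distance to the boundary on the side of `label`: larger = more conforming."""
    return decision if label == "active" else -decision


def icp_p_values(trainer: Trainer, X: List[Vector], y: List[str], x_new: Vector,
                 cal_fraction: float, rng: random.Random) -> Dict[str, float]:
    """One inductive conformal predictor with Mondrian (per-class) calibration."""
    idx = list(range(len(X)))
    rng.shuffle(idx)
    n_cal = max(1, int(len(idx) * cal_fraction))
    cal, proper = idx[:n_cal], idx[n_cal:]
    decide = trainer([X[i] for i in proper], [y[i] for i in proper])
    d_new = decide(x_new)
    p_values = {}
    for label in CLASSES:
        # Mondrian: compare only against calibration compounds of this class.
        scores = [conformity(decide(X[i]), label) for i in cal if y[i] == label]
        new_score = conformity(d_new, label)
        n_le = sum(1 for s in scores if s <= new_score)
        p_values[label] = (n_le + 1) / (len(scores) + 1)
    return p_values


def acp_p_values(trainer: Trainer, X: List[Vector], y: List[str], x_new: Vector,
                 n_models: int = 100, cal_fraction: float = 0.3,
                 seed: int = 0) -> Dict[str, float]:
    """Aggregate ICPs built on different random splits by taking median p-values."""
    rng = random.Random(seed)
    runs = [icp_p_values(trainer, X, y, x_new, cal_fraction, rng) for _ in range(n_models)]
    return {label: statistics.median(r[label] for r in runs) for label in CLASSES}


def toy_trainer(X: List[Vector], y: List[str]) -> DecisionFn:
    """1-D stand-in for the SVM: signed offset from the midpoint of the class means."""
    act = [x[0] for x, lab in zip(X, y) if lab == "active"]
    ina = [x[0] for x, lab in zip(X, y) if lab == "inactive"]
    mid = (statistics.mean(act) + statistics.mean(ina)) / 2
    return lambda x: x[0] - mid
```

For instance, with ten actives spread over [2, 3] and forty inactives over [-3, -2], a query at 2.5 receives a median inactive p-value below 0.2 and a median active p-value above it, so at ε = 0.2 it is assigned to the active class alone. Taking the median keeps a single unlucky split from dominating the aggregated p-values.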

Results and Discussion

The objective of this study was to expand the amount of usable information that can be obtained from a dataset undergoing QSAR modeling, and also to demonstrate that, even if a dataset is not evenly distributed, accurate and reliable models can be obtained without resorting to methods that artificially 'rebalance' an imbalanced dataset. Our results demonstrate not only that accurate models can be built, but also that additional information can be derived from the application of Conformal Prediction. The results with respect to validity (accuracy), G-mean, sensitivity and specificity for the ACPs, as well as for traditional ensemble predictions using the majority vote of the 100 SVM models for each compound, are collected in Table 1 and Figures 4-7. Table 1 and Figure 4 show that conformal prediction using SVM and signature descriptors gives median-aggregated conformal predictors that are valid at different confidence levels, depending on how compounds for which the model cannot distinguish between the two classes (both class) and compounds not assigned to any class (empty class) are treated. Approximately 5% of the compounds in each dataset fall in the both category. By the conformal prediction definition these compounds are treated as correct from a validity point of view. From a practical perspective, however, they are not uniquely classified, and it can therefore be argued that they could be left out when evaluating the accuracy of 'active' and 'inactive' predictions. On the other hand, the fact that the conformal predictor has assigned compounds to the both category does provide information. These compounds are similar to compounds previously investigated (the training set) and lie within the structural domain for which the present model can provide statistically reliable predictions. However, features of their structures lead the model to classify them as possibly belonging to both categories. The interpretation of a both prediction varies from model to model, but for models built on structural fingerprints, as in the present study, it could be due to the presence of significant features from both active and inactive structures. Several interesting aspects of ACP can be observed from the results of this study and are summarized here:

1. The desired level of validity of a CP model can be adjusted. The first ACP analysis was performed at a significance level of 0.2. However, the results of the internal validation procedure using the more pragmatic validity scheme, i.e. omitting the compounds designated as both, show that the conformal predictors derived from the ToxCast training set are not valid (72.3% validity achieved vs 80% required). This prompted a reinvestigation at the 0.3 significance level. Table 1 shows that this level of significance can be achieved for the training set: the conformal predictors are now valid (70.1% achieved vs 70% required). The need to adjust the significance level illustrates the flexibility of CP: these levels can be readily changed, and the consequences immediately inspected, in order to achieve valid, usable models.

2. Little or no decrease in model performance is seen on external validation. The predictions for the Tox21 external test set are also valid (Table 1), with a validity of 73.9% (70% required).
The ability of the derived models to make external predictions at a validity level similar to that of the training set is particularly important, since models often tend to underperform on external test sets compared with their internal validation statistics. Equivalent prediction results for training and test sets with CP classification models have been demonstrated before by Norinder and co-workers.39

3. Imbalanced datasets are modeled in an accurate and unambiguous way. The ToxCast and Tox21 datasets pose several challenges related to the distribution of the two classes. Firstly, the datasets are substantially imbalanced: the ToxCast dataset has an active-to-inactive ratio of 1:4.7, while the Tox21 set is even more imbalanced, with a corresponding ratio of 1:23.6. Not only is the more interesting class, i.e. the active class, the minority class, but the models are also built on one distribution of actives to inactives and then used to predict a test set with a completely different active:inactive ratio. This issue has been studied in the field of machine learning and is often called "covariate shift"41-42 and/or "sample selection bias".43 Traditional accuracy is a rather meaningless quality measure for such imbalanced datasets, particularly with respect to performance on the minority class. Since we are primarily interested in the active minority class, a far more informative measure in this case is sensitivity, i.e. the true positive rate or recall for the minority class. The investigated datasets concern environmental chemicals that affect biological systems such as the estrogen hormone system, so it is important that the derived models do not fail to detect active compounds. Table 1 reports two sets of sensitivity, specificity, validity and G-mean values, denoted xx_excl and xx_incl.
The former set is calculated omitting compounds classified as both or empty, while the latter includes these compounds in the calculations. Again, it should be noted that, by CP definition, compounds classified as both are treated as correctly predicted and compounds classified as empty as erroneously predicted. In the following discussion we use the statistics for the excl set, as this gives a fairer comparison with results from traditional predictions. The results for the incl set, which are more correct from a CP perspective, are always higher than the corresponding numbers for the excl set. Table 1 and Figure 5 show that the sensitivity obtained for the external test set compounds assigned a single class is 77.3%. This result compares favorably with the results obtained by Zang and co-workers,25 as does the corresponding G-mean of 75.5% (Fig. 5). The G-mean, the geometric mean of sensitivity and specificity (the true negative rate or recall for the majority class), provides a combined evaluation of how balanced the model predictions are, and its value suggests that the model performs well in identifying both the minority and the majority class. Comparing the ACP results with the corresponding traditional ensemble predictions from the 100 models, made without additional measures such as over- or under-sampling of the training set and without ACP and Mondrian predictions, shows clear advantages for ACP (Table 1 and Figs 5-7). Not only is the G-mean worse for the traditional approach but, more significantly, the imbalance between sensitivity and specificity is clearly evident for the datasets. While the specificity (majority class) is very high, as can be expected for strongly imbalanced datasets, the sensitivity (minority class) is low, which is of greater concern since the number of false negatives must be kept to a minimum. The separate ACP treatment of the imbalanced classes thus appears beneficial for obtaining more balanced predictions for both classes.

4. A separate applicability domain measure, with its ambiguous interpretation, is unnecessary. An important part of model development and deployment is to provide an estimate of the statistical reliability of new predictions and to give alerts when such reliability is not achieved. This concept is often referred to as the "Applicability Domain" (AD) of a model. Much work has been performed in this field over the years, and several methods and approaches have been published.44-56 Adequately defining the AD is one of the five principles of the OECD guidelines57-59 for deriving a validated QSAR model that can be used as an alternative to experimental testing to provide information on new compounds. CP has been proposed as a more consistent approach for defining statistically reliable predictions within a well-defined mathematical framework.38-39 CP is based on the concept of validity, defined in section "Conformal Prediction", and is a framework for instance-based prediction of reliability,60 i.e. estimating the individual reliability of each and every new compound to be predicted. The predictive boundaries of the model are defined by the significance level set for the model. This level, in turn, determines the conformity, and consequently the median p-value, that every new compound must reach for each class in the ACP. Compounds failing to reach the necessary median p-value for either set of conformity scores are assigned to the empty class, for which reliable predictions cannot be made. These are also the compounds that, in more traditional terms, are determined to be outside the AD of the models. In this way AD determination, from a CP perspective, is intimately linked to model construction and to the validity of the derived models.35 Consequently, internal validation of the training set makes it straightforward to determine whether or not valid models can be achieved at an acceptable level of significance. If not, the alternatives may be to raise the significance level (i.e. lower the confidence level, as was done in point 1) and/or to change the statistical methodology or chemical description, or to inject more information of another kind into the model. All of these modifications can be readily performed within the CP framework and their consequences immediately assessed.
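The statistics discussed under points 1, 3 and 4 follow directly from stored p-values and prediction regions. The hypothetical Python helpers below are a sketch, not the authors' code. `validity_at` re-derives regions at any ε without retraining, which is why the adjustment in point 1 can be inspected immediately; its second value omits only both predictions, our reading of the pragmatic scheme in point 1. `class_metrics` computes sensitivity, specificity and G-mean in the excl variant (single-label predictions only) and the incl variant (both counted as correct, empty as an error), plus the fraction of empty predictions, i.e. compounds outside the AD in conventional terms.

```python
import math
from typing import Dict, List, Set, Tuple


def region_at(p: Dict[str, float], epsilon: float) -> Set[str]:
    """Mondrian region: every class whose p-value exceeds epsilon."""
    return {c for c, pv in p.items() if pv > epsilon}


def validity_at(p_values: List[Dict[str, float]], labels: List[str],
                epsilon: float) -> Tuple[float, float]:
    """(validity with 'both' counted correct, validity with 'both' omitted).

    'empty' regions count as errors in both variants.
    """
    correct_all = kept = correct_kept = 0
    for p, y in zip(p_values, labels):
        region = region_at(p, epsilon)
        hit = int(y in region)
        correct_all += hit
        if len(region) < 2:  # omit 'both' predictions only
            kept += 1
            correct_kept += hit
    omitted = correct_kept / kept if kept else float("nan")
    return correct_all / len(labels), omitted


def class_metrics(regions: List[Set[str]], labels: List[str]) -> Dict[str, float]:
    """Sensitivity, specificity and G-mean (excl and incl) plus the empty fraction."""

    def recall(target: str, single_only: bool) -> float:
        hits = total = 0
        for region, y in zip(regions, labels):
            if y != target or (single_only and len(region) != 1):
                continue  # excl skips 'both' and 'empty' predictions
            total += 1
            hits += int(target in region)
        return hits / total if total else float("nan")

    out: Dict[str, float] = {}
    for scheme, single_only in (("excl", True), ("incl", False)):
        sens = recall("active", single_only)
        spec = recall("inactive", single_only)
        out[f"sensitivity_{scheme}"] = sens
        out[f"specificity_{scheme}"] = spec
        out[f"gmean_{scheme}"] = math.sqrt(sens * spec)  # geometric mean
    out["fraction_empty"] = sum(1 for r in regions if not r) / len(regions)
    return out
```

A model is then judged valid at level ε when the chosen validity figure from `validity_at` is at least 1 − ε, so the same stored p-values can be checked at ε = 0.2 and again at ε = 0.3.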


Finally, it is important to note that the particular choices made in this study of statistical method (SVM) and chemical structure description (signature fingerprints), although they resulted in good and balanced models, are perhaps not the most important finding of the investigation. Similar or even better models can most likely be derived using other combinations of statistical methods and descriptors. This study demonstrates that the ACP methodology provides a framework for developing predictive models and defines statistically reliable validity boundaries for these models. ACP also represents a reasonable alternative for handling imbalanced datasets (Mondrian classification) without the need for additional measures such as over- or under-sampling.

Conclusions

The results obtained in this investigation of a large set of environmental chemicals from the ToxCast and Tox21 projects using aggregated conformal prediction (ACP) show that the ACP approach enables the development of accurate models that are valid and well balanced between sensitivity and specificity. Models with adequate sensitivity are of particular interest in risk assessment for identifying compounds acting on endpoints such as hormone receptors, owing to the central role of these receptors in many toxicologically relevant pathways. Another relevant finding of this study is the close correspondence between the validity obtained for the training set and that obtained for the external test set, despite the quite different class distributions of the two sets. Furthermore, the ACP methodology provides an unambiguous framework for identifying the boundaries within which the predictions of new compounds by the model(s) are statistically reliable. With these features, we conclude that ACP is a useful method for reliably extracting as much information as possible from often unevenly distributed toxicology datasets. Finally, the results of this study indicate that accurate models can be produced with performance similar to that of previously published models, but with the added benefit of an unambiguous definition of the predictive boundaries of the model for each individual prediction.

ABBREVIATIONS
QSAR, quantitative structure-activity relationships; REACH, Registration, Evaluation, Authorisation and Restriction of Chemicals; EDCs, endocrine-disrupting chemicals; ER, estrogen receptor; NCTR, National Center for Toxicological Research; EDKB, Endocrine Disruptor Knowledge Base; LDA, linear discriminant analysis; CART, classification and regression trees; SVM, support vector machines; RBF, radial basis function; RF, random forest; CP, conformal prediction; ICP, inductive conformal prediction; ACP, aggregated conformal prediction.

References
1. Birnbaum, L. S., and Fenton, S. E. (2003) Cancer and developmental exposure to endocrine disruptors. Environ. Health Perspect. 111, 389–394.
2. Mahoney, M. M., and Padmanabhan, V. (2010) Developmental programming: impact of fetal exposure to endocrine-disrupting chemicals on gonadotropin-releasing hormone and estrogen receptor RNA in sheep hypothalamus. Toxicol. Appl. Pharmacol. 247, 98–104.
3. Reif, D. M., Martin, M. T., Tan, S., Houck, K. A., Judson, R. S., Richard, A. M., Knudsen, T. B., Dix, D. J., and Kavlock, R. J. (2010) Endocrine profiling and prioritization of environmental chemicals using ToxCast data. Environ. Health Perspect. 118, 1714–1720.
4. Soto, A. M., and Sonnenschein, C. (2010) Environmental causes of cancer: endocrine disruptors as carcinogens. Nat. Rev. Endocrinol. 6, 363–370.
5. Rotroff, D. M., Dix, D. J., Houck, K. A., Knudsen, T. B., Martin, M. T., McLaurin, K. W., Reif, D. M., Singh, A. V., Crofton, K. M., Xia, M., Huang, R., and Judson, R. S. (2013) Using in vitro high-throughput screening assays to identify potential endocrine disrupting chemicals. Environ. Health Perspect. 121, 7–14.
6. Cohen-Hubal, E. A., Richard, A. M., Aylward, L., Edwards, S. W., Gallagher, J., Goldsmith, J. M., Isukapalli, S., Tornero-Velez, R., Weber, E. J., and Kavlock, R. J. (2010) Advancing exposure characterization for chemical evaluation and risk assessment. J. Toxicol. Environ. Health B Crit. Rev. 13, 299–313.
7. Knudsen, T. B., Houck, K. A., Sipes, N., Singh, A. V., Judson, R. S., Martin, M. T., Weissman, A., Kleinstreuer, N., Mortensen, H. M., Reif, D. M., Rabinowitz, J. R., Setzer, W., Richard, A. M., Dix, D. J., and Kavlock, R. J. (2011) Activity profiles of 309 ToxCast™ chemicals evaluated across 292 biochemical targets. Toxicology 282, 1–15.

8. National Research Council (1984) Toxicity testing: strategies to determine needs and priorities, The National Academies Press, Washington DC.
9. Pease, W. (1997) Toxic ignorance: the continuing absence of basic health testing for top-selling chemicals in the United States. Environmental Defense Fund. Diane Pub Co, Washington DC.
10. Regulation (EC) No 1907/2006 of the European Parliament and of the Council of 18 December 2006 concerning the Registration, Evaluation, Authorization and Restriction of Chemicals (REACH), establishing a European Chemicals Agency, amending Directive 1999/45/EC and repealing Council Regulation (EEC) No 793/93 and Commission Regulation (EC) No 1488/94 as well as Council Directive 76/769/EEC and Commission Directives 91/155/EEC, 93/67/EEC, 93/105/EC and 2000/21/EC. Available: http://www.isopa.org/isopa/uploads/Documents/documents/l_39620061230en00010849.pdf [accessed 2 February 2015].
11. U.S. Food and Drug Administration (EDKB database) (2015). Available: http://www.fda.gov/ScienceResearch/BioinformaticsTools/EndocrineDisruptorKnowledgebase/ucm135074.htm [accessed 2 February 2015].
12. Klopman, G., and Chakravarti, S. K. (2003) Structure-activity relationship study of a diverse set of estrogen receptor ligands (I) using MultiCASE expert system. Chemosphere 51, 445–459.
13. Hong, H., Tong, W., Xie, Q., Fang, H., and Perkins, R. (2005) An in silico ensemble method for lead discovery: Decision forest. SAR QSAR Environ. Res. 16, 339–347.
14. Tong, W., Xie, Q., Hong, H., Shi, L., Fang, H., and Perkins, R. (2004) Assessment of prediction confidence and domain extrapolation of two structure-activity relationship models for predicting estrogen receptor binding activity. Environ. Health Perspect. 112, 1249–1250.

15. Korhonen, S.-P., Tuppurainen, K., Laatikainen, R., and Peraekylae, M. (2005) Comparing the Performance of FLUFF-BALL to SEAL-CoMFA with a Large Diverse Estrogen Data Set: From Relevant Superpositions to Solid Predictions. J. Chem. Inf. Model. 45, 1874–1883.
16. Ghafourian, T., and Cronin, M. T. D. (2006) The effect of variable selection on the non-linear modelling of oestrogen receptor binding. QSAR Comb. Sci. 25, 824–835.
17. Liu, H., Papa, E., and Gramatica, P. (2006) QSAR Prediction of Estrogen Activity for a Large Set of Diverse Chemicals under the Guidance of OECD Principles. Chem. Res. Toxicol. 19, 1540–1548.
18. Ji, L., Wang, X. D., Luo, S., Qin, L., Yang, X. S., Liu, S. S., and Wang, L. S. (2008) QSAR study on estrogenic activity of structurally diverse compounds using generalized regression neural network. Sci. China, Ser. B: Chem. 51, 677–683.
19. Ji, L., Wang, X. D., Yang, X. S., Liu, S. S., and Wang, L. S. (2008) Back-propagation network improved by conjugate gradient based on genetic algorithm in QSAR study on endocrine disrupting chemicals. Chin. Sci. Bull. 53, 33–39.
20. Stojic, N., Eric, S., and Kuzmanovski, I. (2010) Prediction of toxicity and data exploratory analysis of estrogen-active endocrine disruptors using counter-propagation artificial neural networks. J. Mol. Graph. Model. 29, 450–460.
21. Li, J., and Gramatica, P. (2010) QSAR classification of estrogen receptor binders and prescreening of potential pleiotropic EDCs. SAR QSAR Environ. Res. 21, 657–669.
22. METI, Ministry of Economy, Trade and Industry, Japan (2012) Current status of testing methods development for endocrine disrupters. 6th Meeting of the Task Force on Endocrine Disrupters Testing and Assessment (EDTA), 24–25 June 2002, Tokyo. Available: http://www.meti.go.jp/english/report/data/gEndoctexte.pdf [accessed 2 February 2015].

23. Judson, R. S., Houck, K. A., Kavlock, R. J., Knudsen, T. B., Martin, M. T., Mortensen, H. M., Reif, D. M., Rotroff, D. M., Shah, I., Richard, A. M., and Dix, D. J. (2010) In vitro screening of environmental chemicals for targeted testing prioritization: the ToxCast project. Environ. Health Perspect. 118, 485–492.
24. Huang, R., Sakamuru, S., Martin, M. T., Reif, D. M., Judson, R. S., Houck, K. A., Casey, W., Hsieh, J.-H., Shockley, K. R., Ceger, P., Fostel, J., Witt, K. L., Tong, W., Rotroff, D. M., Zhao, T., Shinn, P., Simeonov, A., Dix, D. J., Austin, C. P., Kavlock, R. J., Tice, R. R., and Xia, M. (2014) Profiling of the Tox21 10K compound library for agonists and antagonists of the estrogen receptor alpha signaling pathway. Sci. Rep. 4, 5664.
25. Zang, Q., Rotroff, D. M., and Judson, R. S. (2013) Binary Classification of a Large Collection of Environmental Chemicals from Estrogen Receptor Assays by Quantitative Structure-Activity Relationship and Machine Learning Methods. J. Chem. Inf. Model. 53, 3244–3261.
26. O’Boyle, N. M., Banck, M., James, C. A., Morley, C., Vandermeersch, T., and Hutchison, G. R. (2011) Open Babel: An open chemical toolbox. J. Cheminform. 3, 33.
27. Yap, C. W. (2011) PaDEL-descriptor: an open source software to calculate molecular descriptors and fingerprints. J. Comput. Chem. 32, 1466–1474.
28. PubChem (2015). Available: http://pubchem.ncbi.nlm.nih.gov/ [accessed 2 February 2015].
29. McLachlan, G. J. (2004) Discriminant Analysis and Statistical Pattern Recognition, Wiley Interscience, New Jersey.
30. Breiman, L. (1996) Bagging Predictors. Mach. Learn. 24, 123–140.
31. Cortes, C., and Vapnik, V. (1995) Support-vector networks. Mach. Learn. 20, 273–297.
32. Rotroff, D. M., Martin, T. M., Dix, D. J., Filer, D. L., Houck, K. A., Knudsen, T. B., Sipes, N. S., Reif, D. M., Xia, M., Huang, R., and Judson, R. S. (2014) Predictive Endocrine Testing in the 21st Century Using in Vitro Assays of Estrogen Receptor Signaling Responses. Environ. Sci. Technol. 48, 8706–8716.
33. Faulon, J. L., Collins, M., and Carr, R. D. (2004) The Signature Molecular Descriptor. 4. Canonizing Molecules Using Extended Valence Sequence. J. Chem. Inf. Comput. Sci. 44, 427–436.
34. Chang, C. C., and Lin, C. J. (2011) LIBSVM: A library for support vector machines. ACM Trans. Intell. Syst. Technol. 2, 1–27. Available: http://www.csie.ntu.edu.tw/~cjlin/libsvm [accessed 2 February 2015].
35. Vovk, V., Gammerman, A., and Shafer, G. (2005) Algorithmic learning in a random world, Springer, New York.
36. Eklund, M., Norinder, U., Boyer, S., and Carlsson, L. (2012) Conformal prediction in QSAR, in Artificial Intelligence Applications and Innovations: Proceedings, Part II of AIAI 2012 International Workshops: AIAB, AIeIA, CISE, COPA, IIVC, ISQL, MHDW, and WADTMB (Iliadis, L., Maglogiannis, I., Papadopoulos, H., Karatza, K., and Sioutas, S., Eds), pp 166–175, Springer, Berlin Heidelberg.
37. Eklund, M., Norinder, U., Boyer, S., and Carlsson, L. (2013) The application of conformal prediction to the drug discovery process. Ann. Math. Artif. Intell. 74, 117–132.
38. Norinder, U., Carlsson, L., Boyer, S., and Eklund, M. (2014) Introducing Conformal Prediction in Predictive Modeling. A Transparent and Flexible Alternative To Applicability Domain Determination. J. Chem. Inf. Model. 54, 1596–1603.
39. Norinder, U., Carlsson, L., Boyer, S., and Eklund, M. (2015) Introducing conformal prediction in predictive modeling for regulatory purposes. A transparent and flexible alternative to applicability domain determination. Regul. Toxicol. Pharmacol. 71, 279–284.

40. Carlsson, L., Eklund, M., and Norinder, U. (2014) Aggregated Conformal Prediction, in Artificial Intelligence Applications and Innovations, IFIP Advances in Information and Communication Technology (Iliadis, L., Maglogiannis, I., Papadopoulos, H., Sioutas, S., and Makris, C., Eds), pp 231–240, Springer, Berlin Heidelberg.
41. Yu, X., Yu, M., Xu, L-X., Yang, J., and Xie, Z-Q. (2015) Training Classifiers under Covariate Shift by Constructing the Maximum Consistent Distribution Subset. Math. Probl. Eng. 2015, Article ID 302815.
42. Bickel, S., Brückner, M., and Scheffer, T. (2007) Discriminative learning for differing training and test distributions, in ICML '07 Proceedings of the 24th International Conference on Machine Learning (Ghahramani, Z., Ed), pp 81–88, ACM, New York.
43. Cortes, C., Mohri, M., Riley, M., and Rostamizadeh, A. (2008) Sample Selection Bias Correction Theory, in Proceedings from 19th International Conference, ALT 2008 (Freund, Y., Györfi, L., Turán, G., and Zeugmann, T., Eds), pp 38–53, Springer, Berlin Heidelberg.
44. Eriksson, L., Jaworska, J. S., Worth, A. P., Cronin, M. T. D., McDowell, R. M., and Gramatica, P. (2003) Methods for reliability, uncertainty assessment, and applicability evaluations of classification and regression based QSARs. Environ. Health Perspect. 111, 1361–1375.
45. Dimitrov, S., Dimitrova, G., Pavlov, T., Dimitrova, N., Patlewicz, G., Niemela, J., and Mekenyan, O. (2005) A stepwise approach for defining the applicability domain of SAR and QSAR models. J. Chem. Inf. Model. 45, 839–849.
46. Netzeva, T. I., Worth, A. P., Aldenberg, T., Benigni, R., Cronin, M. T. D., Gramatica, P., Jaworska, J. S., Kahn, S., Klopman, G., Marchant, C. A., Myatt, G., Nikolova-Jeliazkova, N., Patlewicz, G. Y., Perkins, R., Roberts, D. W., Schultz, T. W., Stanton, D. T., van de Sandt, J. J. M., Tong, W., Veith, G., and Yang, C. (2005) Current status of methods for defining the applicability domain of (quantitative) structure-activity relationships. The report and recommendations of ECVAM Workshop 52. ATLA 33, 155–173.
47. Bassan, A., and Worth, A. P. (2007) Computational Tools for Regulatory Needs, in Computational Toxicology: Risk Assessment for Pharmaceutical and Environmental Chemicals (Ekins, S., Ed), pp 751–775, John Wiley & Sons, New York.
48. Schroeter, T. B., Schwaighofer, A., Mika, S., Laak, A. T., Suelzle, D., Ganzer, U., Heinrich, N., and Muller, K.-R. (2007) Estimating the domain of applicability for machine learning QSAR models: a study on aqueous solubility of drug discovery molecules. J. Comput.-Aided Mol. Des. 21, 651–664.
49. Weaver, S., and Gleeson, M. P. (2008) The importance of the domain of applicability in QSAR modeling. J. Mol. Graph. Model. 26, 1315–1326.
50. Dragos, H., Gilles, M., and Varnek, A. (2009) Predicting the predictability: a unified approach to the applicability domain problem of QSAR models. J. Chem. Inf. Model. 49, 1762–1776.
51. Sushko, I., Novotarskyi, S., Körner, R., Pandey, A. K., Cherkasov, A., Li, J., Gramatica, P., Hansen, K., Schroeter, T., Müller, K.-R., Xi, L., Liu, H., Yao, X., Öberg, T., Hormozdiari, F., Dao, P., Sahinalp, C., Todeschini, R., Polishchuk, P., Artemenko, A., Kuz’min, V., Martin, T. M., Young, D. M., Fourches, D., Muratov, E., Tropsha, A., Baskin, I., Horvath, D., Marcou, G., Muller, C., Varnek, A., Prokopenko, V. V., and Tetko, I. V. (2010) Applicability domain for classification problems: benchmarking of distance to models for Ames mutagenicity set. J. Chem. Inf. Model. 50, 2094–2111.
52. Sahigara, F., Mansouri, K., Ballabio, D., Mauri, A., Consonni, V., and Todeschini, R. (2012) Comparison of Different Approaches to Define the Applicability Domain of QSAR Models. Molecules 17, 4791–4810.

53. Sheridan, R. P. (2012) Three useful dimensions for domain applicability in QSAR models using random forest. J. Chem. Inf. Model. 52, 814–823.
54. Keefer, C. E., Kauffman, G. W., and Gupta, R. R. (2013) An interpretable, probability-based confidence metric for continuous QSAR models. J. Chem. Inf. Model. 53, 368–383.
55. Sheridan, R. P. (2013) Using Random Forest To Model the Domain Applicability of Another Random Forest Model. J. Chem. Inf. Model. 53, 2837–2850.
56. Wood, D. J., Carlsson, L., Eklund, M., Norinder, U., and Stålring, J. (2013) QSAR with experimental and predictive distributions: an information theoretic approach for assessing model quality. J. Comput.-Aided Mol. Des. 27, 203–219.
57. OECD (2004) http://www.oecd.org/env/ehs/risk-assessment/validationofqsarmodels.htm [accessed 2 February 2015].
58. Joint Meeting of the Chemicals Committee and Working Party on Chemicals, Pesticides and Biotechnology (2004) http://www.oecd.org/env/ehs/risk-assessment/37849783.pdf [accessed 2 February 2015], http://www.oecd.org/officialdocuments/displaydocument/?doclanguage=en&cote=env/jm/mono(2004)24 [accessed 2 February 2015].
59. Joint Meeting of the Chemicals Committee and Working Party on Chemicals, Pesticides and Biotechnology (2007) http://www.oecd.org/officialdocuments/displaydocument/?doclanguage=en&cote=env/jm/mono(2007)2 [accessed 2 February 2015].
60. Yang, F., Wang, H-Z., Mi, H., Lin, C-D., and Cai, W-W. (2009) Using random forest for reliable classification and cost-sensitive learning for medical diagnosis. BMC Bioinf. 10 (Suppl 1), S22.

Tables

Table 1. Statistics from predictions.

Description(a)  Set(b)              Sensitivity  Specificity  Validity(c)  G-mean  Study
Zang_2013       training set        77.30        80.30        79.40        78.80   ref 25
Zang_2013       external test set   68.40        70.90        70.80        69.60   ref 25
CP_excl         training set        71.92        69.68        70.09        70.79   this work
CP_excl         external test set   77.31        73.74        73.91        75.51   this work
CP_incl         training set        72.10        71.19        71.35        71.64   this work
CP_incl         external test set   77.82        75.10        75.23        76.45   this work
traditional     training set        30.41        92.98        81.90        53.17   this work
traditional     external test set   50.27        96.52        94.65        69.66   this work

a) CP_excl: compounds predicted as “both” or empty are excluded; CP_incl: compounds predicted as “both” or empty are included. Traditional: majority vote, for each compound, over the 100 SVM models.
b) Training set: internal validation results for CP_excl, CP_incl and traditional (see section Data Analysis for details).
c) Validity is the conformal prediction validity for CP_incl and the accuracy for CP_excl and for the traditional predictions (traditional and Zang_2013). The conformal prediction significance level is set at 0.3, i.e., a 70% confidence level.
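The statistics in Table 1 can be computed from prediction sets with a sketch like the one below. The scoring rules are assumptions inferred from the footnotes, not the authors' code: for CP_incl a “both” prediction counts as correct and an empty prediction as incorrect, for CP_excl only single-label predictions are scored, and the traditional label is a simple majority vote (with ties resolved as active, a hypothetical choice). The function names `g_mean`, `majority_vote` and `cp_statistics` are illustrative. The G-mean is the geometric mean of sensitivity and specificity, which agrees with the tabulated values; for example, `g_mean(77.30, 80.30)` gives about 78.79, consistent with the 78.80 reported for the Zang_2013 training set after rounding.

```python
import math


def g_mean(sensitivity, specificity):
    """Geometric mean of sensitivity and specificity (both in %)."""
    return math.sqrt(sensitivity * specificity)


def majority_vote(model_predictions):
    """'Traditional' label from per-model 0/1 predictions (ties -> active, an assumption)."""
    return 1 if 2 * sum(model_predictions) >= len(model_predictions) else 0


def cp_statistics(prediction_sets, true_labels, include_uncertain=True):
    """Sensitivity, specificity, validity and G-mean (in %) for conformal predictions.

    prediction_sets: iterable of sets drawn from {0, 1} (1 = active).
    include_uncertain=True  ~ CP_incl: {0, 1} counts as correct, empty set as wrong.
    include_uncertain=False ~ CP_excl: only single-label predictions are scored,
    so validity reduces to accuracy on those compounds.
    Assumes both classes are present among the scored compounds.
    """
    correct = {0: 0, 1: 0}
    total = {0: 0, 1: 0}
    for pred, y in zip(prediction_sets, true_labels):
        if not include_uncertain and len(pred) != 1:
            continue  # skip "both" and empty predictions
        total[y] += 1
        correct[y] += int(y in pred)
    sensitivity = 100 * correct[1] / total[1]
    specificity = 100 * correct[0] / total[0]
    validity = 100 * (correct[0] + correct[1]) / (total[0] + total[1])
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "validity": validity,
        "g_mean": g_mean(sensitivity, specificity),
    }
```

Traditional predictions fit the same function by wrapping each majority-vote label in a single-element set, for example `cp_statistics([{majority_vote(v)} for v in votes], labels)`, because a single-label set is correct exactly when it equals the true label.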

Figure Legends
Fig. 1. Conformal prediction framework
Fig. 2. Aggregated conformal predictor scheme
Fig. 3. Confidence predictor scheme
Fig. 4. Validity (accuracy) from model predictions
Fig. 5. G-mean from model predictions
Fig. 6. Sensitivity from model predictions
Fig. 7. Specificity from model predictions

Figure 1.

Figure 2.

Figure 3.

Figure 4.

Figure 5.

Figure 6.

Figure 7.

Table of Contents graphic

Fig. 1. Conformal prediction framework

Fig. 2. Aggregated conformal predictor scheme

Fig. 3. Confidence predictor scheme

Fig. 4. Validity (accuracy) from model predictions

Fig. 5. G-mean from model predictions

Fig. 6. Sensitivity from model predictions

Fig. 7. Specificity from model predictions
