Subscriber access provided by UNIV OF SOUTHERN QUEENSLAND
Article
Enhancing acute oral toxicity predictions by using consensus modeling and algebraic form-based 0D-to-2D molecular encodes Cesar Garcia-Jacas, yovani Marrero-Ponce, Fernando Cortes-Guzman, José Suárez-Lezcano, Felix O. Martinez-Rios, Luis A. García-González, Mario Pupo-Meriño, and Karina Martinez-Mayorga Chem. Res. Toxicol., Just Accepted Manuscript • DOI: 10.1021/acs.chemrestox.9b00011 • Publication Date (Web): 08 May 2019 Downloaded from http://pubs.acs.org on May 8, 2019
Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.
is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.
Page 1 of 45 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Chemical Research in Toxicology
1
Enhancing acute oral toxicity predictions
2
by using consensus modeling and algebraic
3
form-based 0D-to-2D molecular encodes
4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
César R. García-Jacas ( ),1 Yovani Marrero-Ponce,2-3 Fernando Cortés-Guzmán,4 José Suárez-Lezcano,5 Felix O. Martinez-Rios,6 Luis A. García-González,7 Mario Pupo-Meriño7 and Karina Martinez-Mayorga.4
1Departamento
de Ciencia de la Computación, Centro de Investigación Científica y de Educación Superior de Ensenada (CICESE), Ensenada, Baja California, México. 2Universidad San Francisco de Quito (USFQ), Grupo de Medicina Molecular y Traslacional (MeM&T), Colegio de Ciencias de la Salud (COCSA), Escuela de Medicina, Edificio de Especialidades Médicas, Quito, Pichincha, Ecuador. 3Grupo de Investigación Ambiental (GIA), Programas Ambientales, Facultad de Ingenierías, Fundacion Universitaria Tecnologico Comfenalco – Cartagena, Cr44 DN 30 A, 91, Cartagena, Bolívar, Colombia. 4Instituto de Química, Universidad Nacional Autónoma de México (UNAM), Ciudad de México, México. 5Pontificia Universidad Católica del Ecuador Sede Esmeraldas (PUCESE), Esmeraldas, Ecuador. 6Facultad de Ingeniería, Universidad Panamericana (UP), Ciudad de México, México. 7Grupo de Investigación de Bioinformática, Universidad de las Ciencias Informáticas (UCI), La Habana, Cuba.
Corresponding author (
):
33
César R. García-Jacas
34
[email protected],
[email protected],
[email protected] 1 ACS Paragon Plus Environment
Chemical Research in Toxicology 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
35
Page 2 of 45
Table of Contents (TOC) graphic
36
2 ACS Paragon Plus Environment
Page 3 of 45 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80
Chemical Research in Toxicology
Abstract Quantitative structure-activity relationships (QSAR) are introduced to predict acute oral toxicity (AOT), by using the QuBiLS-MAS framework for the molecular encoding. Three training sets were employed to build the models: EPA training set (5931 compounds), EPA-full training set (7413 compounds) and Zhu training set (10152 compounds). Additionally, the EPA test set (1482 compounds) was used for the validation of the QSAR models built on the EPA training set, while the ProTox (425 compounds) and T3DB (284 compounds) external sets were employed for the assessment of all the models. The k-nearest neighbor, multilayer perceptron, random forest and support vector machine procedures were employed to build several base (individual) models. The base models with 𝑅𝐸𝑃𝐴 ― 𝑡𝑟𝑎𝑖𝑛𝑖𝑛𝑔 ≥ 0.75 and 𝑀𝐴𝐸𝐸𝑃𝐴 ― 𝑡𝑟𝑎𝑖𝑛𝑖𝑛𝑔 ≤ 0.5 were retained to build consensus models. As a result, two consensus models based on the minimum operator and denoted as M19 and M22, as well as a consensus model based on the weighted average operator and denoted as M24, were selected as the best ones for each training set considered. According to the applicability domain (AD) analysis performed, model M19 (built on the EPA training set) has 𝑀𝐴𝐸𝑡𝑒𝑠𝑡 ― 𝐴𝐷 = 0.4044, 𝑀𝐴𝐸𝑃𝑟𝑜𝑇𝑜𝑥 ― 𝐴𝐷 = 0.4067 and 𝑀𝐴𝐸𝑇3𝐷𝐵 ― 𝐴𝐷 = 0.2586 on the EPA test set, ProTox external set and T3DB external set, respectively; whereas model M22 (built on the EPA-full set) and model M24 (built on the Zhu set) present 𝑀𝐴𝐸𝑃𝑟𝑜𝑇𝑜𝑥 ― 𝐴𝐷 = 0.3992 and 𝑀𝐴𝐸𝑇3𝐷𝐵 ― 𝐴𝐷 = 0.2286, and 𝑀𝐴𝐸𝑃𝑟𝑜𝑇𝑜𝑥 ― 𝐴𝐷 = 0.3773 and 𝑀𝐴𝐸𝑇3𝐷𝐵 ― 𝐴𝐷 = 0.2471 on the two external sets accounted for, respectively. These outcomes were compared and statistically validated with respect to 14 QSAR methods (e.g., admetSAR, ProTox-II) from the literature. As a result, model M22 presents the best overall performance. In addition, a retrospective study on 261 withdrawn drugs due to their toxic/side effects was carried out, in order to assess the usefulness of prospectively using the QSAR models proposed in the labeling of chemicals. A comparison with regard to the methods from the literature was also made. As a result, model M22 has the best ability of labelling a compound as toxic according to the Globally Harmonized System of Classification and Labelling of Chemicals (GHS). Therefore, it can be concluded that the models proposed, especially model M22, constitute prominent tools for studying AOT, at providing the best results among all the methods examined. A freely available software was also developed in order to be used in virtual screening tasks (http://tomocomd.com/apps/ptoxra).
Keywords: quantitative structure-activity relationships (QSAR), acute oral toxicity (AOT), machine learning, consensus model, ensemble model, applicability domain, algebraic forms, QuBiLS-MAS molecular descriptors. Running Head: Prediction of acute oral toxicity using the QuBiLS-MAS approach…
3 ACS Paragon Plus Environment
Chemical Research in Toxicology 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
81
Page 4 of 45
1. Introduction
82
Chemical toxicity is the ability of a substance to cause hazardous biological damages in
83
vulnerable sites (e.g., in organs as the liver – hepatoxicity) or cells (cytotoxicity)1 in the human
84
body, or in other living systems such as plants,2 animals3 or ecosystems.4 Acute oral toxicity (AOT)
85
constitutes one of the most important toxicological endpoints to be assessed. LD50, indicator used
86
to measure AOT, is the dose of a chemical which kills 50% of the animals treated, either
87
immediately or in short time, after administration of a single dose or multiple doses within 24
88
hours.5 In vivo assays of animal tests are required to obtain an exact measure of AOT. However,
89
these are complex, expensive and time-consuming, especially when a large number of chemicals
90
are studied. In addition, because of ethical reasons and animal rights, LD50 experiments are quite
91
controversial. Thus, QSAR (Quantitative Structure-Activity Relationship) models constitute one
92
of the alternatives to be used to predict AOT.
93
Since the last quarter of the past century, several QSAR models have been introduced to
94
estimate this endpoint. In this sense, Enslein et al.6, 7 built models based on the multiple linear
95
regression (MLR) technique using non-congeneric chemicals, but they lack of predictive power.
96
To tackle this drawback, Eldred et al.8 and Guo et al.9 developed models using congeneric
97
compounds. Not surprisingly these have limited application ranges. Posteriorly, Zhu et al.10, 11
98
proposed models based on k-NN, random forest, hierarchical clustering and other methods, as well
99
as consensus models, showing the latter better performances than the formers. Nonetheless, the
100
predictive ability achieved by the consensus models is from low to moderate (see Table 3 in ref.10).
101
In general, these results highlight not only the complexity of the biological mechanisms involved
102
in AOT, but also the diversity of modes of action and the lack of chemical space coverage. It is
103
also important to remark that LD50 measurements, as any similarly complex in vivo endpoint, are
4 ACS Paragon Plus Environment
Page 5 of 45 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Chemical Research in Toxicology
104
expected to contain rather high quantity of noise, which reflects biological diversity as well as
105
testing conditions.
106
More recently, with the purpose of obtaining better results in the AOT prediction, the QSAR
107
models for the admetSAR12, 13 and ProTox-II14, 15 web tools were developed by using training sets
108
comprised of 10207 and 38515 compounds, respectively, in order to consider a wider chemical
109
space in the learning process. Also, other modeling strategies have been applied, for instance: (1)
110
Lagunin et al.16 presented models based on the combination of QNA (Quantitative Neighborhoods
111
of Atoms) descriptors, PASS (Prediction of Activity Spectra for Substances) predictions and self-
112
consistent regression; (2) Lu et al.17 proposed local lazy learning (LLL)-based methods, which use
113
linear regression, arithmetic mean or weighted mean to determine the predictions from k nearest
114
neighbors to a chemical structure given; and (3) Xu et al.18 introduced models based on a deep
115
learning architecture using improved molecular graph encoding convolutional neural networks.
116
As it can be seen, all these models were built with classic (e.g., k-NN) and modern (e.g., deep
117
learning) learning methods. The majority of them were created using similarity-based strategies10-
118
17
119
similar activities/toxicities. In addition, as a common characteristic, these QSAR models perform
120
the molecular encoding using ‘traditional’ molecular descriptors (MDs) (e.g., DRAGON MDs)
121
and/or fingerprints (e.g., MACCS keys). In general, all these approaches may seem an indicative
122
of that all the efforts have been made to build good models to predict AOT. However, the QSAR
123
models mentioned above present poor predictive power when are used in external sets (see Section
124
3.3). In our opinion, this behavior could be due to the MDs used. Indeed, as it has been analyzed
125
elsewhere,19-21 it is more important to use the set of “right” predictors than the choice of the “right”
126
learning algorithm.
(e.g., FDA-method), under the assumption of that structurally similar compounds may have
5 ACS Paragon Plus Environment
Chemical Research in Toxicology 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 6 of 45
127
The traditional MDs (and fingerprints) included in the QSAR models reported to date to
128
predict AOT have as a characteristic that they are mostly chemically interpretable. However, an
129
easily interpretable descriptor is not necessarily a good descriptor to predict a determined endpoint.
130
In fact, a property of the MDs is that they present the ability of highlighting molecular features
131
that are not highlighted by a chemical interpretation. Consequently, the objective is often to
132
determine the ‘best’ predictive model, while the interpretation is an add-on. The latter is stated in
133
Principle 5 (Chapter 6) of the OECD Guidance Document on the Validation of QSAR Models:22
134
“a (Q)SAR should be associated with a mechanistic interpretation, if possible”. Therefore, the use
135
of more recent definitions of MDs, whose interpretation could be more difficult, may contribute
136
to develop better QSAR models to predict AOT.
137
Moreover, according to the complexity theory through the No Free Launch Theorem,23 it can
138
be said that no single QSAR model to predict AOT achieves superior results to all the others when
139
its performance is averaged over all possible chemicals. The reasoning for this opinion is that one
140
MDs family (e.g., DRAGON 0D-to-3D MDs) may not suffice to encode all the relevant structural
141
information for distinct compounds. In other words, the significance of the MDs depends on the
142
nature of the chemical structures under analysis. Thus, following the same line of thought, it is
143
necessary to use alternative approaches to encode new and orthogonal chemical information in
144
order to build more robust QSAR models. In this sense, several MDs both topological (2D) and
145
geometrical (3D) have recently been presented,24-29 being the QuBiLS-MAS (acronym for
146
Quadratic, Bilinear and N-Linear mapS based on graph-theoretic electronic-density Matrices and
147
Atomic weightingS) 2D-MDs29, 30 one of these novel definitions.
148
The QuBiLS-MAS 2D-MDs are the current definition of the former ToMoCoMD-CARDD
149
2D-MDs,31-36 at presenting novel generalizations and extensions.29 These MDs have been applied
6 ACS Paragon Plus Environment
Page 7 of 45 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Chemical Research in Toxicology
150
in a wide variety of applications.37-45 Indeed, the significant predictive power demonstrated by this
151
ligand-based method suggests that these 2D-MDs encode relevant chemical information of the
152
molecules. Therefore, all in all, this manuscript introduces a novel topological approach based on
153
the QuBiLS-MAS 2D-MDs to predict AOT. This manuscript is organized as follows. Section 2
154
describes the chemical datasets and the modeling methodology used. Section 3 analyzes the results
155
obtained and compares them with respect to several tools established in the literature. A statistical
156
assessment of the predictions obtained is also presented, as well as the outcomes achieved in a
157
retrospective virtual screening. Finally, Section 4 establishes the findings, conclusions and future
158
outlooks.
159 160
2. Material and Methods
161
2.1. Chemical datasets
162
The chemical dataset contained in the Toxicity Estimation Software Tool (T.E.S.T), provided
163
by the U.S. Environmental Protection Agency (EPA), was used to build the QSAR models.11 This
164
dataset is comprised of 5931 training compounds (hereafter denoted as EPA training set) and 1482
165
test compounds (hereafter denoted as EPA test set) annotated with oral rat LD50 values. Two other
166
training sets comprised of 7413 and 10207 compounds, respectively, were also accounted for. The
167
first one, denoted as EPA-full training set, was created by joining the EPA training/test sets
168
aforementioned; whereas the second one, widely used in the literature12-18 and denoted as Zhu
169
training set, was collected from Zhu et al.10 With these three training sets, comparability of results
170
among the QSAR models stated in the literature to those to be developed is guaranteed. It is
171
important to highlight that the EPA training set ⊆ EPA-full training set ⊆ Zhu training set (⊆
172
means subset).
7 ACS Paragon Plus Environment
Chemical Research in Toxicology 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 8 of 45
173
Moreover, two sets comprised of 500 and 428 chemical structures annotated with oral rat
174
LD50 values were downloaded from the ProTox-II website15 and Toxin and Toxin Target Database
175
(T3DB – also well-known as Toxic Exposome Database),46 respectively, in order to be considered
176
as external validation sets. Additionally, a dataset composed of 292 withdrawn drugs because of
177
their toxic/side effects was manually obtained from the ‘Toxicities’ section in the WITHDRAWN
178
database.47 In this database, it is considered as a withdrawn drug the one recalled from market in
179
at least one country. The WITHDRAWN dataset was used to perform a simulation of prospective
180
virtual screening with the best QSAR models built. In this way, according to the AOT values to be
181
predicted, the relevance of employing the novel predictive models in the labelling of compounds
182
was analyzed (see Sections 2.6 and 3.5).
183
All these datasets were obtained as Simplified Molecular Input Line Entries (SMILES) or
184
Structure Data Format (SDF), and then converted to 2D-representations without H-depleted by
185
using the OpenBabel software.48 Filters to remove inorganic and organometallic compounds, salts,
186
compound mixtures and repeated compounds with respect to the Zhu training set were applied.
187
After that, the EPA training/test sets and EPA-full training set kept the same number of structures;
188
whereas the Zhu training set, ProTox external set, T3DB external set and WITHDRAWN set
189
contain 10152, 425, 284 and 267 compounds, respectively. The LD50 values for the chemicals
190
considered were expressed as mol/kg, and then these were stated as log[1/(mol/kg)] according to
191
standard QSAR practices. Supplementary Information 1 shows the SDF files of the chemical sets
192
explained above.
193
2.2. QuBiLS-MAS topological molecular descriptors
194
The atom- and bond-based QuBiLS-MAS topological molecular descriptors (2D-MDs) were
195
recently presented as a novel framework to characterize compounds.29, 30 This framework is the
8 ACS Paragon Plus Environment
Page 9 of 45 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Chemical Research in Toxicology
196
current definition of the former ToMoCoMD-CARDD 2D-MDs,31-36 at including novel matrix
197
approaches (e.g., double-stochastic), novel molecular cutoffs to take into account particular inter-
198
atomic relations (see Eq. 7 in ref.29), novel local-fragments to consider chemical regions of interest,
199
diversity of operators to generalize the calculation from LOVIs (Local Vertex Invariants) or LOEIs
200
(Local Edge Invariants), among other features.29 These 2D-MDs have been used in several
201
applications, such as: antimalarials,37 trichomonacidals,38-40 paramphistomicides,41 tyrosinase
202
inhibitors42, 43 and others.44, 45 The QuBiLS-MAS 2D-MDs calculation is based on the bilinear,
203
quadratic and linear algebraic forms. Thus, the original approach to compute the atom- [𝑎𝑏𝑘(𝑥,𝑦)]
204
and bond-based [𝑒𝑏𝑘(𝑥,𝑦)] bilinear QuBiLS-MAS MDs is a bilinear mapping as shown below: 𝑛
205
𝑎𝑏
𝑘
(𝑥,𝑦) =
𝑛
∑∑𝑚 𝑥 𝑦 = [𝑋] ℳ [𝑌] 𝑘 𝑖 𝑗 𝑖𝑗
𝑇
𝑘
(𝟏)
𝑖 = 1𝑗 = 1 𝑚
206
𝑒𝑏
𝑘
(𝑥,𝑦) =
𝑚
∑∑𝑒 𝑥 𝑦 = [𝑋] ℰ [𝑌] 𝑘 𝑖 𝑗 𝑖𝑗
𝑇 𝑘
(𝟐)
𝑖 = 1𝑗 = 1
207
where n (or m) is the total number of atoms (or bonds); k 1, 2, , 15 is the power of the
208
matrix approaches; 𝑚𝑘𝑖𝑗 (or 𝑒𝑘𝑖𝑗) denotes the coefficients of the k-th atom-based electronic-density
209
matrix ℳ𝑘 – see ref.33 (or bond-based matrix ℰ𝑘 – see ref.35); and xi and yi are the components of
210
the x and y atom-based (or bond-based) property vectors, respectively. Note that if the x and y
211
vectors encode the same property x y , then Equations 1 and 2 define atom-
212
bond-based [𝑒𝑞𝑘(𝑥,𝑥)] quadratic QuBiLS-MAS 2D-MDs (quadratic mapping), respectively; and
213
if all the entries of the x vector are equal to 1 and the y vector encodes a property, then Equations
214
1 and 2 define atom- [𝑎𝑓𝑘(𝑥,𝑥)] and bond-based [𝑒𝑓𝑘(𝑥,𝑥)] linear QuBiLS-MAS 2D-MDs (linear
215
mapping), respectively.
[𝑎𝑞𝑘(𝑥,𝑥)] and
9 ACS Paragon Plus Environment
Chemical Research in Toxicology 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 10 of 45
216
The main contribution of the QuBiLS-MAS framework (formerly ToMoCoMD-CARDD) is
217
the generalization of the vector-matrix-vector procedure described in Equations 1 and 2. The ℳ𝑘
218
and ℰ𝑘 matrices are divided to atom-level (ℳ𝑎,𝑘) and bond-level (ℰ𝔢,𝑘), respectively, by using the
219
procedure to determine local-fragment matrices, but considering as fragment each atom “a” (see
220
Equation 6 in ref.31) or bond “𝔢” (see Equation 7 in ref.36) of the molecule. In this way, bilinear
221
(quadratic, linear) QuBiLS-MAS 2D-MDs for each atom [LOVI –
222
𝑏(𝑞,𝑓)ℒ𝑒]
223
tendency statistics) can be used on the LOVIs (or LOEIs) calculated to obtain diversity of global
224
molecular characterizations, just as confirmed elsewhere.49, 50 This generalization is unique in the
225
2D-MDs computation to date. Remark that if the summation operator is applied on the LOVIs (or
226
LOEIs) calculated, then the results exactly coincide with the original approach (Equations 1-2).
227
Schema 1 depicts a flowchart regarding the QuBiLS-MAS 2D-MDs calculation. 𝑏(𝑞,𝑓)ℒ𝑎
or bond [LOEI –
can be computed (see Equations 3-4). So, different aggregation operators (e.g., central
𝑛
228
𝑏(𝑞,𝑓)ℒ𝑎]
𝑎,𝑘
= 𝑏(𝑞,𝑓) (𝑥,𝑦) =
𝑛
∑ ∑𝑚
𝑎,𝑘 𝑖 𝑗 𝑖𝑗 𝑥 𝑦
= [𝑋]𝑇ℳ𝑎,𝑘[𝑌]
∀ 𝑎 = 1, 2, …, 𝑛
(𝟑)
𝑖 = 1𝑗 = 1 𝑚
229
𝑏(𝑞,𝑓)ℒ𝑒
= 𝑏(𝑞,𝑓)𝔢,𝑘(𝑥,𝑦) =
𝑚
∑ ∑𝑒
𝔢,𝑘 𝑖 𝑗 𝑖𝑗 𝑥 𝑦
= [𝑋]𝑇ℰ𝔢,𝑘[𝑌]
∀ 𝔢 = 1, 2, …, 𝑚
(𝟒)
𝑖 = 1𝑗 = 1
230 231
Schema 1 comes about here 2.3. Feature selection and modeling methodology
232
Firstly, 335 configurations of the bilinear, quadratic and linear QuBiLS-MAS 2D-MDs were
233
computed on the EPA training set. These configurations were created according to the outcomes
234
obtained in several cheminformatics studies (see Section 4 in ref.29), and they are available in the
235
software of the same name of the descriptors.30 Since numerous MDs are calculated with this
236
program, then the following workflow for data reduction was carried out on each descriptors
10 ACS Paragon Plus Environment
Page 11 of 45 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Chemical Research in Toxicology
237
matrix obtained: (1) the MDs with NaN (Not-a-Number) values and/or with values represented as
238
power of 10 were dropped; and (2) a supervised feature selection with regard to the LD50 values
239
corresponding to the EPA training compounds was performed using an in-house program. This
240
program retains the best 2D-MDs according to the Correlation Subset51 and Relief-F52 methods
241
available in the Weka software (v3.8).53 This supervised selection was repeated until obtaining an
242
only matrix of until 5000 QuBiLS-MAS 2D-MDs.
243
Once the previous workflow concluded, the two selection methods aforementioned and a
244
Shannon Entropy (SE)-based unsupervised method54 were applied on the MDs matrix obtained.
245
Thus, three subsets with the most suitable MDs were obtained, so that: (1) with the Correlation
246
Subset method51 a subset of features highly correlated with the activity and lowly correlated among
247
them was retained; (2) with the Relief-F method52 the features that best distinguish among
248
instances that are near each other were retained; and (3) with the SE-based method54 the features
249
that best yield distinct descriptions for structurally different molecules were retained. The SE-
250
based selection was performed with the IMMAN software,55 following a discretization scheme
251
equal to 5931 bins (size of the EPA training set). These three subsets were also shuffled among
252
themselves (ensemble feature selection56), obtaining four additional subsets. A wrapper-type
253
selection57 was also applied on all the previous subsets, by using the Best-First method as search
254
strategy, the Root Mean Square Error (RMSE) as evaluation measure, and the k-Nearest Neighbor
255
(k-NN),58 Multilayer Perceptron (MLP),59 Random Forest (RF)60 and Support Vector Machine
256
(SVM)61 learning methods.
257
Subsequently, several individual regression models (QSAR) were built using the learning
258
methods aforementioned, which are available in the Weka software (v3.8).53 In detail, the k-NN
259
method was configured with k-values (number of neighbors) between 3 and 6 (determined by cross
11 ACS Paragon Plus Environment
Chemical Research in Toxicology 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 12 of 45
260
validation), and the ratio 1/distance was used as distance weighting scheme. Moreover, the Puk
261
function62 was used as kernel in the SVM. The remaining learning methods were utilized with the
262
respective default configurations. These configurations were also employed in the wrapper-type
263
selection explained above. It is important to highlight that on the subsets of MDs obtained from
264
the wrapper-type selection, only the learning method used in the wrapper was employed to build
265
the model. The performance of the models was measured using the Correlation Coefficient (𝑅),
266
Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) statistical parameters. The
267
training assessment was performed through 10-fold cross-validation. The base (individual) models
268
with 𝑅𝐸𝑃𝐴 ― 𝑡𝑟𝑎𝑖𝑛𝑖𝑛𝑔 ≥ 0.75 and 𝑀𝐴𝐸𝐸𝑃𝐴 ― 𝑡𝑟𝑎𝑖𝑛𝑖𝑛𝑔 ≤ 0.5 were retained to build ensemble and
269
consensus models.
270
The ensemble models were built using the Bagging,63 Additive Regression64 and Stacking65
271
procedures; while the consensus models were developed using the Maximum (Max), Minimum
272
(Min), Average (AVE) and Weighted Average (WA) aggregation functions. The weightings for
273
the WA function were calculated according to Equation 9 described in ref.49, with 1 . The
274
ensemble models with a training performance better than the one achieved by the best individual
275
model, as well as all the consensus models built, were assessed on the EPA test set. In this way,
276
the best models based on the EPA training set were selected according to their training and test
277
results. Finally, the same QuBiLS-MAS 2D-MDs and learning method used in the best models
278
built on the EPA training set were employed to build the models on the EPA-full and Zhu training
279
sets, respectively. All these models were assessed on the external sets considered to measure their
280
predictive abilities. Lastly, a comparison with respect to several QSAR methodologies established
281
in the literature (e.g., admetSAR,12 ProTox14) was performed. Schema 2 depicts the workflow
282
described above.
12 ACS Paragon Plus Environment
Page 13 of 45 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Chemical Research in Toxicology
Schema 2 comes about here
283 284
2.4. Applicability domain
285
An applicability domain (AD) analysis was carried out to know the behavior of the models
286
to perform reliable predictions.66-68 This analysis is often performed using only one method to
287
determine the structures that fall within the AD of a model (e.g., Liew et al.69 only used an approach
288
based on range, and Aranda et al.70 only used the leverage method). However, there are several
289
AD methods that characterize of distinct way the interpolation space stated by the MDs used.67 So,
290
inspired on the idea of consensus-based decision, five approaches (i.e., City-block, Euclidean,
291
Mahalanobis, Range and Density – see refs.71, 72 for a detailed description) were considered in a
292
recent study to determine the reliability of the predictions.73 More specifically, if the prediction for
293
a molecule lies outside of the bounds in at least two AD methods, then it is considered as unreliable.
294
This approach will be the one used in this work. These AD methods are available in the Ambit
295
Discovery software74 and they were used with their respective default configurations.
296
2.5. Statistical analysis of the external predictive ability
297
A statistical analysis of the predictive ability achieved on the ProTox and T3DB external sets
298
was carried out, with the purpose of knowing if the results obtained by the best QSAR models
299
proposed are significantly different to the ones achieved by the QSAR methods from the literature
300
(e.g., admetSAR,12 ProTox,14 DL-AOT18). To this end, the predictions within the AD of the best
301
models built and QSAR methods from the literature were accounted for. Then, the absolute error
302
module was computed to be used in this study. An analysis to examine the data normality was
303
made, by using the Kolmogorov–Smirnov test corrected by Lilliefors, and the Shapiro-Wilk test.75
304
If no normality was indicated pvalue 0.05 , then non-parametric tests were applied. Thus, the
305
Friedman test76 was used to determine global differences, and if these exist, then the post-hoc
13 ACS Paragon Plus Environment
Chemical Research in Toxicology 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 14 of 45
306
Holm method77 was used for multiple comparisons. In this report, a significance level 0.05
307
was considered.
308
2.6. Retrospective virtual screening in a withdrawn drug dataset
309
With this retrospective screening, a study to know the ability of the best models proposed to
310
generate toxicological warnings was made. In this way, the usefulness of prospectively using the
311
QSAR models proposed in the early prediction of toxicity or in the labeling of chemicals was
312
assessed. An analysis with regard to the QSAR methods most commonly employed in the literature
313
for predicting AOT (i.e., TEST Consensus,10 admetSAR,12 ProTox14 and DL-AOT18) was also
314
performed. To this end, the WITHDRAWN dataset, composed of 267 drugs recalled from the
315
market in at least one country due to their toxic/side effects, was used (see Section 2.1). The
316
predictions obtained on the WITHDRAWN set were converted to mg/kg. Then, these predictions
317
were labeled (classified) according to the Globally Harmonized System of Classification and
318
Labelling of Chemicals (GHS):78 Class I (fatal if swallowed – LD50 ≤ 5), Class II (fatal if
319
swallowed – 5 < LD50 ≤ 50), Class III (toxic if swallowed – 50 < LD50 ≤ 300), Class IV (harmful
320
if swallowed – 300 < LD50 ≤ 2000), Class V (may be harmful if swallowed – 2000 < LD50 ≤ 5000)
321
and Class VI (non-toxic – LD50 > 5000). A labelling between Class I and Class V is considered
322
as toxicological warning.
323 324
3. Results and discussion
325
3.1. Performance of the best individual, ensemble and consensus models
326
From the data matrices built, 84 individual (or base) models (42 k-NN, 14 MLP, 14 RF and
327
14 SVM) were developed and examined. Only 12 of these regression models achieved the cutoff
328
of 𝑅𝐸𝑃𝐴 ― 𝑡𝑟𝑎𝑖𝑛𝑖𝑛𝑔 ≥ 0.75 and 𝑀𝐴𝐸𝐸𝑃𝐴 ― 𝑡𝑟𝑎𝑖𝑛𝑖𝑛𝑔 ≤ 0.5 in a 10-fold cross validation. Therefore, they
14 ACS Paragon Plus Environment
Page 15 of 45 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Chemical Research in Toxicology
329
were included in the pool of base models for the ensemble and consensus models building. Table
330
1 shows the performance of these models on the EPA training/test sets. As it can be observed, the
331
number of QuBiLS-MAS 2D-MDs included in the models range from 37 to 228, for a ratio of
332
cases-to-independent variable of 160:1 and 26:1 in the smallest and largest models, respectively.
333
It can also be observed that all the models present a good behavior on the test set, at presenting
334
values of 0.7671 ≤ 𝑅𝑡𝑒𝑠𝑡 ≤ 0.7991 and 0.4329 ≤ 𝑀𝐴𝐸𝑡𝑒𝑠𝑡 ≤ 0.4620. The best performances
335
belong to models M02 (𝑅𝑡𝑒𝑠𝑡 = 0.7991, 𝑀𝐴𝐸𝑡𝑒𝑠𝑡 = 0.4329), M01 (𝑅𝑡𝑒𝑠𝑡 = 0.7952, 𝑀𝐴𝐸𝑡𝑒𝑠𝑡
336
= 0.4355) and M03 (𝑅𝑡𝑒𝑠𝑡 = 0.7942 and 𝑀𝐴𝐸𝑡𝑒𝑠𝑡 = 0.4416), at being the only ones with 𝑅𝑡𝑒𝑠𝑡
337
≥ 0.79. Lastly, it can also be observed that the RF and SVM learning methods allow to achieve
338
the best correlations, except in models M11 and M12 that are based on the k-NN (𝑘 = 6) method.
339
Table 1 comes about here
340
Posteriorly, from the best individual (or base) models (see Table 1), ensemble models were
341
built using the Bagging,63 Additive Regression64 and Stacking65 procedures, as well as consensus
342
models based on the Max, Min, AVE and WA functions. Table 2 shows the statistics of the best
343
ensemble and consensus models built for the EPA training/test sets. As it can be observed, the
344
ensemble models present 0.7801 ≤ 𝑅𝐸𝑃𝐴 ― 𝑡𝑟𝑎𝑖𝑛𝑖𝑛𝑔 ≤ 0.7885 and 0.4224 ≤ 𝑀𝐴𝐸𝐸𝑃𝐴 ― 𝑡𝑟𝑎𝑖𝑛𝑖𝑛𝑔
345
≤ 0.4312, being both statistics better than the ones obtained by the best base model (see Table 1).
346
It indicates that these models may have a better behavior on the test set. In this sense, it can be
347
noted that all the ensemble and consensus models achieve 0.8024 ≤ 𝑅𝑡𝑒𝑠𝑡 ≤ 0.8103 and 0.4106
348
≤ 𝑀𝐴𝐸𝑡𝑒𝑠𝑡 ≤ 0.4267. In detail, consensus models M17 (𝑅𝑡𝑒𝑠𝑡 = 0.8103 and 𝑀𝐴𝐸𝑡𝑒𝑠𝑡 = 0.4136),
349
M18 (𝑅𝑡𝑒𝑠𝑡 = 0.8102 and 𝑀𝐴𝐸𝑡𝑒𝑠𝑡 = 0.4138) and M19 (𝑅𝑡𝑒𝑠𝑡 = 0.8095 and 𝑀𝐴𝐸𝑡𝑒𝑠𝑡 = 0.4106)
350
are those with the best performance. Model M19 presents the lowest 𝑀𝐴𝐸𝑡𝑒𝑠𝑡.
351
Table 2 comes about here 15 ACS Paragon Plus Environment
Chemical Research in Toxicology 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 16 of 45
352
Moreover, as it was explained above, regression models to predict AOT were also built on
353
both the EPA-full (7413 structures) and Zhu (10152 structures) training sets, in order to consider
354
a wider chemical space in the learning process and, in addition, to guarantee comparability of
355
results with regard to several models reported in the literature.12-17 Specifically, the QSAR models
356
developed are based on the same QuBiLS-MAS 2D-MDs and learning method used to build
357
models M13 and M15, which make up the consensus models from M17 to M19, that are the ones
358
with the best performances so far. Therefore, only consensus models based on the AVE, WA and
359
Min functions will be analyzed for the two training sets mentioned above. It is expected that the
360
new consensus models yield better predictions on external sets than the consensus models created
361
on the EPA training set (5931 structures).
362
In this sense, Table 3 shows the results obtained by the consensus models on the ProTox and
363
T3DB external sets. On one hand, if the performance of the models built on the EPA training set
364
(from M17 to M19), EPA-full training set (from M20 to M22) and Zhu training set (from M23 to
365
M25) is analyzed, it can be seen that, for the first two training sets, the best result is achieved by
366
the Min function-based consensus models M19 (𝑀𝐴𝐸𝑃𝑟𝑜𝑇𝑜𝑥 = 0.4073, 𝑀𝐴𝐸𝑇3𝐷𝐵 = 0.2968) and
367
M22 (𝑀𝐴𝐸𝑃𝑟𝑜𝑇𝑜𝑥 = 0.3971, 𝑀𝐴𝐸𝑇3𝐷𝐵 = 0.2796), respectively; whereas for the latter training set
368
mentioned, the AVE function-based model M23 (𝑀𝐴𝐸𝑃𝑟𝑜𝑇𝑜𝑥 = 0.3807, 𝑀𝐴𝐸𝑇3𝐷𝐵 = 0.3016) and
369
the WA function-based model M24 (𝑀𝐴𝐸𝑃𝑟𝑜𝑇𝑜𝑥 = 0.3805, 𝑀𝐴𝐸𝑇3𝐷𝐵 = 0.3049) are the best ones.
370
On the other hand, it can be observed that the performance of the models on the ProTox set
371
improves as long as the number of training compounds increases. A similar behavior can be noted
372
in the T3DB external set. However, in this external set, the models built on the Zhu set present a
373
slight decrease in its performance as compared to the ones built on the EPA-full training set. See
374
Supplementary Information 2 for the values predicted.
16 ACS Paragon Plus Environment
Page 17 of 45 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
375 376
Chemical Research in Toxicology
Table 3 comes about here 3.2. Applicability domain of the best models built
377
An applicability domain (AD) analysis (see Section 2.4) for each consensus model built was
378
done, with the purpose of determining how trustworthy are the predictions performed by them.
379
Table 4 shows the outcomes obtained in this study for the EPA test set, ProTox external set and
380
T3DB external set. If the coverage statistical parameter (i.e., ratio of compounds that is expected
381
to fall within the AD) is examined, it can be observed that for the EPA test set, the models present
382
a coverage equal to 98.25%; while for the ProTox and T3DB external sets, the models present a
383
coverage between 93.58% and 96.89% and between 93.66% and 95.42% of the predictions
384
performed, respectively. Therefore, in a general sense, it can be concluded that all the consensus
385
models built perform highly reliable predictions according to the corresponding AD. Nonetheless,
386
it is important to remark that the models built on the EPA-full training set (from M20 to M22) are
387
those with the best coverage, followed by the models built on the EPA training set (from M17 to
388
M19) and Zhu training set (from M23 to M25), respectively.
389
Table 4 comes about here
390
Moreover, if the statistics of the predictions performed are analyzed (see Table 4), it can be
391
observed that all the consensus models present comparable-to-superior results with respect to the
392
ones obtained when the AD is not accounted for (see Tables 2-3). Note that the best results were
393
achieved on the T3DB external set, where the models present 0.2286 ≤ 𝑀𝐴𝐸𝑇3𝐷𝐵 ― 𝐴𝐷 ≤ 0.2635
394
(without AD: 0.2796 ≤ 𝑀𝐴𝐸𝑇3𝐷𝐵 ≤ 0.3057) and 0.3859 ≤ 𝑅𝑀𝑆𝐸𝑇3𝐷𝐵 ― 𝐴𝐷 ≤ 0.4405 (without
395
AD: 0.5091 ≤ 𝑅𝑀𝑆𝐸𝑇3𝐷𝐵 ≤ 0.5272). Lastly, it can be noted that models M19 (𝑀𝐴𝐸𝑡𝑒𝑠𝑡 ― 𝐴𝐷
396
= 0.4044, 𝑀𝐴𝐸𝑃𝑟𝑜𝑇𝑜𝑥 ― 𝐴𝐷 = 0.4067, 𝑀𝐴𝐸𝑇3𝐷𝐵 ― 𝐴𝐷 = 0.2586), M22 (𝑀𝐴𝐸𝑃𝑟𝑜𝑇𝑜𝑥 ― 𝐴𝐷 = 0.3992
397
, 𝑀𝐴𝐸𝑇3𝐷𝐵 ― 𝐴𝐷 = 0.2286), M23 (𝑀𝐴𝐸𝑃𝑟𝑜𝑇𝑜𝑥 ― 𝐴𝐷 = 0.3774, 𝑀𝐴𝐸𝑇3𝐷𝐵 ― 𝐴𝐷 = 0.2434) and M24
17 ACS Paragon Plus Environment
Chemical Research in Toxicology 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 18 of 45
398
(𝑀𝐴𝐸𝑃𝑟𝑜𝑇𝑜𝑥 ― 𝐴𝐷 = 0.3773, 𝑀𝐴𝐸𝑇3𝐷𝐵 ― 𝐴𝐷 = 0.2471), just as in the previous analysis without AD,
399
are the ones with the best performances.
400
3.3. Comparison with respect to other QSAR models reported in the literature
401
So far, a description of the best models built, and an analysis of the outcomes achieved by
402
them have been carried out. However, it is important to perform a comparison with respect to other
403
procedures established in the literature. To this end, from the results previously described, it can
404
be stated that consensus models M19, M22 and M24 (see Supplementary Information 3 to choose
405
between models M23 and M24) are the best ones for each training set accounted for. Thus, they
406
will be used in the comparison. Several freely accessible or commercial QSAR/similarity-based
407
tools are also used for this study. These tools include the T.E.S.T software (version 4.2.1),10, 11
408
GUSAR software,16 admetSAR web server (v1.0),12, 13 DL-AOT web server18 and ProTox-II web
409
server.14, 15 Hereafter, the combinatorial use of the QNA (Quantitative Neighborhoods of Atoms)
410
and MNA (Multilevel Neighborhoods of Atoms) MDs is referred to as GUSAR consensus method
411
(see Supplementary Information 4 for the training results with this approach).
412
Study Case 1: Performance on the EPA test set
413
Figure 1 shows a comparative graphic of the performance (MAE) achieved by seven QSAR
414
methods reported in the literature with regard to the performance achieved by consensus model
415
M19 on the EPA test set (1482 compounds). All these methods were built on the EPA training set
416
(5931 compounds). Therefore, comparability of results is guaranteed. In detail, the methods from
417
the literature to be compared are the GUSAR consensus method considering nearest neighbors
418
(GUSAR wNN),16 the GUSAR consensus method without considering nearest neighbors (GUSAR
419
nNN),16 the GUSAR method using only QNA MDs,16 the GUSAR method using only MNA
420
MDs,16 the T.E.S.T FDA method (TEST FDA),10 the T.E.S.T Nearest Neighbors method (TEST
18 ACS Paragon Plus Environment
Page 19 of 45 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Chemical Research in Toxicology
421
NN)10 and the T.E.S.T Consensus method (TEST Consensus).10 The comparisons were performed
422
according to the AD of the GUSAR wNN method, the TEST Consensus method and model M19.
423
T.E.S.T Hierarchical Clustering method10 was not included in this comparison due to its poor
424
performance. Figure 1 comes about here
425 426
As it can be observed in Figure 1, consensus model M19, built with the QuBiLS-MAS 2D-
427
MDs, presents the best performance irrespectively of the AD considered, being the only model in
428
always achieving 𝑀𝐴𝐸𝑡𝑒𝑠𝑡 < 0.41. It can also be noted that the AD of model M19 is the second
429
best (𝑐𝑜𝑣𝑒𝑟𝑎𝑔𝑒 = 98.25%), which is only slightly overcome by the AD of the TEST Consensus
430
method (𝑐𝑜𝑣𝑒𝑟𝑎𝑔𝑒 = 98.38%). Moreover, in a general sense, the GUSAR wNN method achieves
431
the second-best results, only surpassed by the TEST Consensus method if the AD of this latter is
432
analyzed. Finally, it can be observed that the GUSAR nNN and TEST NN methods present similar
433
performances, and that the TEST FDA, GUSAR MNA and GUSAR QNA methods are the ones
434
with the worst outcomes. Supplementary Information 5 shows the statistical parameters obtained
435
according to all the ADs.
436
Study Case 2: Performance on the ProTox external set
437
Figure 2 depicts the performance achieved by the GUSAR wNN,16 GUSAR nNN,16 DL-
438
AOT18 and admetSAR12 methods, as well as the one obtained by consensus models M19, M22 and
439
M24, according to their ADs on the ProTox external set (425 compounds) (for more details see
440
Supplementary Information 6). The T.E.S.T software10 fails during the prediction (error in the
441
MDs computation) on the ProTox set and, thus, none of its methods were taken into account in the
442
current comparison. ProTox method14, 15 was not considered in this study because the external set
443
used presents a similar distribution to the training set (38515 compounds) of this method. Note
19 ACS Paragon Plus Environment
Chemical Research in Toxicology 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 20 of 45
444
that, excepting consensus models M19 and M22, all the remaining QSAR methods were built on
445
the Zhu training set (10152 compounds). Figure 2 comes about here
446 447
As a result, consensus model M24 is the one with the best performance for all the ADs
448
analyzed. This model is the only one in always achieving 𝑀𝐴𝐸𝑃𝑟𝑜𝑇𝑜𝑥 < 0.38, whereas the other
449
QSAR methods yield 𝑀𝐴𝐸𝑃𝑟𝑜𝑇𝑜𝑥 > 0.39. It can also be observed that consensus models M19
450
(0.3965 ≤ 𝑀𝐴𝐸𝑃𝑟𝑜𝑇𝑜𝑥 ≤ 0.4054) and M22 (0.3907 ≤ 𝑀𝐴𝐸𝑃𝑟𝑜𝑇𝑜𝑥 ≤ 0.3967), as well as the
451
admetSAR
452
(0.3915 ≤ 𝑀𝐴𝐸𝑃𝑟𝑜𝑇𝑜𝑥 ≤ 0.4029) methods, present comparable performances, even when models
453
M19 and M22 were built on smaller training sets. Note that the GUSAR nNN
454
(0.4164 ≤ 𝑀𝐴𝐸𝑃𝑟𝑜𝑇𝑜𝑥 ≤ 0.4269) and DL-AOT (0.4602 ≤ 𝑀𝐴𝐸𝑃𝑟𝑜𝑇𝑜𝑥 ≤ 0.4654) methods are
455
those with the poorest outcomes. DL-AOT method is based on a convolutional neural network,
456
which is a more complex procedure than the one used by the models proposed here; even then,
457
models M24, M22 and M19 achieve better results for the external set under study.
458
Study Case 3: Performance on the T3DB external set
(0.3905 ≤ 𝑀𝐴𝐸𝑃𝑟𝑜𝑇𝑜𝑥 ≤ 0.3999)
and
GUSAR
wNN
459
Finally, Figure 3 represents the performance obtained by the GUSAR wNN,16 GUSAR
460
nNN,16 TEST Consensus,10 DL-AOT,18 admetSAR12 and ProTox14, 15 methods, as well as the one
461
achieved by the best QSAR models proposed in this report (i.e., consensus models M19, M22 and
462
M24), according to their ADs on the T3DB external set (284 compounds). Other QSAR models
463
were also analyzed (e.g., TEST NN, GUSAR MNA), but as they yielded poor outcomes, then these
464
were not graphically represented. Supplementary Information 7 contains the results corresponding
465
to the 14 methods examined. Remark that the ProTox method was built on 38515 training
466
compounds, that the T.E.S.T software-based methods and model M19 were created on the EPA 20 ACS Paragon Plus Environment
Page 21 of 45 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Chemical Research in Toxicology
467
training set (5931 compounds), that model M22 was developed on the EPA-full training set (7413
468
compounds), and that the other QSAR methods were built on the Zhu training set (10152
469
compounds). Figure 3 comes about here
470
As it can be noted, consensus model M22 (0.2165 ≤ 𝑀𝐴𝐸𝑃𝑟𝑜𝑇𝑜𝑥 ≤ 0.2634), followed by
471
DL-AOT
method
(0.2374 ≤ 𝑀𝐴𝐸𝑃𝑟𝑜𝑇𝑜𝑥 ≤ 0.2752) and consensus models M19
472
the
473
(0.2368 ≤ 𝑀𝐴𝐸𝑃𝑟𝑜𝑇𝑜𝑥 ≤ 0.2798) and M24 (0.2451 ≤ 𝑀𝐴𝐸𝑃𝑟𝑜𝑇𝑜𝑥 ≤ 0.2870), are the ones with
474
the best outcomes for all the ADs analyzed. Specifically, consensus model M22 is the only one in
475
obtaining 𝑀𝐴𝐸𝑇3𝐷𝐵 < 0.23 (𝑀𝐴𝐸𝑇3𝐷𝐵 = 0.2165 and 𝑀𝐴𝐸𝑇3𝐷𝐵 = 0.2226 within the AD of the
476
TEST Consensus method and model M24, respectively); whereas the remaining procedures
477
achieve results superior to the threshold aforementioned. Moreover, it can also be seen that the
478
GUSAR
479
(0.3080 ≤ 𝑀𝐴𝐸𝑃𝑟𝑜𝑇𝑜𝑥 ≤ 0.3544) methods have comparable performances. Curiously, the
480
ProTox, admetSAR and TEST Consensus methods, that are those most commonly used in the
481
literature, obtain the poorest behaviors, at presenting 𝑀𝐴𝐸𝑇3𝐷𝐵 > 0.6.
nNN
(0.2754 ≤ 𝑀𝐴𝐸𝑃𝑟𝑜𝑇𝑜𝑥 ≤ 0.3113)
and
GUSAR
wNN
482
All in all, it can be concluded that the consensus models (M19, M22 and M24) proposed
483
constitute prominent alternatives to be used in the study of AOT, due to the fact that they achieve
484
results superior to the ones obtained by all the QSAR methods analyzed from the literature. In this
485
way, the hypothesis assumed in the present work is confirmed, which pose that the use of current
486
definitions of MDs may contribute to develop better QSAR models to predict AOT. It is important
487
to remark that the models built always presented the best performance regardless of the chemical
488
dataset used to assess them, to difference of the QSAR methods from the literature, which showed
489
different performances in the sets considered. For instance: DL-AOT method was the worst one in 21 ACS Paragon Plus Environment
Chemical Research in Toxicology 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 22 of 45
490
the ProTox external set, whereas it was second-best one in the T3DB external set. A statistical
491
assessment to confirm the results obtained is presented below.
492
3.4. Statistical analysis of the external predictive ability achieved by the best models built
493
This analysis was made to determine if the outcomes obtained by the best consensus models
494
built (i.e., M19, M22 and M24) are statistically different to the ones achieved by the QSAR
495
methods from the literature. To this end (see Section 2.5), the absolute error module was computed
496
for the 360 (out of the 425) and 250 (out of the 284) predictions within the AD of all the QSAR
497
methods (models) in the ProTox and T3DB external sets, respectively (see Supplementary
498
Information 8). For the analysis in the ProTox set, the admetSAR12 and DL-AOT18 methods, as
499
well as the GUSAR software-based methods16 (i.e., GUSAR wNN, GUSAR nNN, GUSAR MNA,
500
GUSAR QNA) were considered; whereas for the T3DB set, in addition of the QSAR methods
501
aforementioned, the ProTox14, 15 method and the T.E.S.T software-based methods10, 11 (i.e., TEST
502
HC, TEST FDA, TEST NN and TEST Consensus) were also accounted for.
503
According to the normality test made, the null hypothesis was rejected for all the methods
504
under study in both chemical datasets (Supplementary Information 9). Therefore, non-parametric
505
tests were applied. In this sense, global differences were determined according to the results
506
obtained from the Friedman test (Supplementary Information 10). According to this test, model
507
M24 has the rank of 1 (best performing model) in the ProTox set, followed by models GUSAR
508
wNN, M19, M22, admetSAR, DL-AOT, GUSAR nNN, GUSAR MNA and GUSAR QNA, in that
509
order; whereas in the T3DB set, model M22 is the one with the best performance, followed by
510
models M19, DL-AOT, M24, GUSAR MNA, GUSAR nNN, TEST FDA, GUSAR wNN, GUSAR
511
QNA, TEST NN, TEST Consensus, ProTox, admetSAR and TEST HC, in that order. These
512
outcomes confirm the descriptive assessment performed in the previous section, where was
22 ACS Paragon Plus Environment
Page 23 of 45 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Chemical Research in Toxicology
513
indicated that the QSAR models based on the QuBiLS-MAS 2D-MDs obtain the best overall
514
performances.
515
Finally, it can be observed that, according to the results obtained from the Holm post-hoc
516
procedure (Supplementary Information 10), model M24 is statistically superior to the DL-AOT,
517
GUSAR nNN, GUSAR MNA and GUSAR QNA methods in the ProTox set. In this dataset, model
518
M24 is not statistically better than the GUSAR wNN and admetSAR methods, neither with respect
519
to models M19 and M22. Moreover, in the T3DB set, model M22 presents a statistically superior
520
performance with regard to all the QSAR methods from the literature (Supplementary Information
521
10), excepting the DL-AOT method, as well as models M19 and M24. Therefore, it can be
522
statistically stated that the QuBiLS-MAS 2D-MDs allowed to build the best QSAR models to
523
predict AOT to date. Lastly, it can also be statistically concluded that model M22 is the best of all
524
by a short difference, followed by models M24 and M19 (Supplementary Information 11),
525
respectively.
526
3.5. Results of the retrospective virtual screening in a withdrawn drug dataset
527
This study was carried out to assess the usefulness of prospectively using the best QSAR
528
models created in the labeling of chemicals. To this end, the WIHTDRAWN dataset was used
529
because it contains drugs recalled from the market in at least one country, due to their toxic/side
530
effects. Figure 4A shows the labeling yielded by models M19, M22 and M24 according to the
531
GHS standard on the set aforementioned (see Supplementary Information 12). On one hand, it is
532
expected that the majority of these withdrawn drugs is not labeled as highly toxic (Class I-II – fatal
533
if swallowed), due to the fact that they were marketable drugs and, thus, they passed several pre-
534
clinical toxicological assays. Thereby, it can be analyzed that the models considered only label
535
four drugs as highly toxics: indomethacin (gastrointestinal), amphetamine (cardiovascular),
23 ACS Paragon Plus Environment
Chemical Research in Toxicology 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 24 of 45
536
aprobarbital (respiratory) and bithionol (dermatological). This labeling is in correspondence with
537
the toxic/side effects evidenced by these drugs.
538
Figure 4 comes about here
539
On the other hand, it is also expected that only a few of these drugs be in a Class VI (non-
540
toxic) labeling, due to the fact that, as it was aforementioned, they present toxic/side effects that
541
caused them to be recalled from the market. In this sense, it can be noted that only 7 (0.026%), 9
542
(0.033%) and 17 (0.063%) drugs were labeled in this class (e.g., gentamicin) by models M19, M22
543
and M24, respectively. Lastly, it can be observed that the majority of these compounds were
544
labeled between Class III (e.g., oxyphenisatin) and Class V (e.g., practolol), mainly Class IV (e.g.,
545
alosetron). That is, the majority of these withdrawn drugs were identified as toxic or harmful if
546
swallowed. In general, models M19, M22 and M24 yielded toxicological warnings between Class
547
I and Class IV for 226 (84.64%), 221 (82.77%) and 201 (75.28%) drugs, and between Class I and
548
Class V for 260 (97.38%), 258 (96.63%) and 250 (93.63%) drugs, respectively.
549
Moreover, Figure 4B shows the labeling obtained by model M22 and the TEST Consensus,10
550
DL-AOT,18 admetSAR12 and ProTox14 methods. Model M22 was selected because it presents the
551
best external predictive power (see Section 3.4). As it can be noted, similar results were obtained
552
by the methods from the literature, at labelling the majority of the withdrawn drugs between Class
553
III and Class V. In summary, model M22, TEST Consensus method, DL-AOT method, admetSAR
554
method and ProTox method generate toxicological warnings between Class I and Class IV for 221
555
(82.77%), 207 (77.53%), 197 (73.78%), 192 (71.91%) and 200 (74.91%) drugs, and between Class
556
I and Class V for 258 (96.63%), 250 (93.63%), 253 (94.76%), 248 (92.88) and 250 (93.63%) drugs,
557
respectively. As it can be analyzed, for both warning groups, model M22 is the one with the highest
558
ability of labelling a compound as toxic (or harmful). Note that model M19 also presents better
24 ACS Paragon Plus Environment
Page 25 of 45 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Chemical Research in Toxicology
559
ability than the methods from the literature to generate toxicological warnings, while model M24
560
has a comparable behavior.
561
To conclude, Table 5 shows the Pearson correlation-based similarity coefficients among the
562
consensus models proposed and the QSAR methods from the literature considered in this study.
563
The similarity coefficients were determined from the toxicity classes obtained according to the
564
GHS labelling. As it can be observed, models M19, M22 and M24 majorly present moderate-to-
565
low similarity coefficients 0.7 with respect to the TEST Consensus, DL-AOT, admetSAR and
566
ProTox methods. Only consensus models M22 and M24 have high similarities with regard to the
567
admetSAR method, with coefficients equal to 0.723 and 0.848, respectively. It can also be noted
568
that no similarity coefficient among the models proposed is greater than 0.9. All in all, it can be
569
stated that models M19, M22 and M24 yield non-redundant and better results than the other QSAR
570
methods studied in the labelling of chemicals, justifying in this way their use in prospective studies.
571
Tables 5 comes about here
572
3.6. Chemometric analysis of the QuBiLS-MAS molecular descriptors included in the best
573
QSAR models built
574
This section is devoted to assessing the usefulness of using the QuBiLS-MAS 2D-MDs in
575
the AOT modeling. A Shannon Entropy (SE)-based variability analysis (VA) was firstly carried
576
out in order to quantify the information content codified by MDs.54 In this way, relevant MDs for
577
cheminformatics studies can be identified according to the principle that high-entropy values
578
correspond to those MDs with good ability to discriminate among structurally distinct compounds,
579
while low-entropy values are indicative of the opposite.79 To this end, a dataset comprising by all
580
the compounds belonging to the Zhu training set and ProTox/T3DB external sets was used (see
581
Supplementary Information 13). The discretization scheme adopted is equal to 10861 bins, which
25 ACS Paragon Plus Environment
Chemical Research in Toxicology 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 26 of 45
582
is the number of compounds within the dataset mentioned above. The IMMAN software55 was
583
employed to perform this VA.
584
Figure 5 depicts the SE distribution corresponding to the 200 QuBiLS-MAS MDs integrating
585
the best QSAR models built, as well as the SE distributions corresponding to the best 200 0D-to-
586
2D MDs, according to the VA-criterion, computed by the DRAGON (v5.5) and PaDEL software,
587
respectively. If the most relevant 50 MDs for each distribution are analyzed, it can be stated that
588
the QuBiLS-MAS framework allows to compute MDs with superior information content than all
589
the universe of 0D-to-2D MDs obtained by the two other software considered. In this way, it is
590
demonstrated that the QSAR models proposed contain MDs that better characterize the compounds
591
considered for the AOT modeling in this report. Thus, the use of the QuBiLS-MAS 2D-MDs is
592
justified. It is important to highlight that in a previous study (see Figure 5 in ref.29) was proved that
593
the best QuBiLS-MAS 2D-MDs, according to the VA-criterion, always present a SE behavior
594
superior than all the 0D-to-2D MDs computed by several tools established in the literature (e.g.,
595
DRAGON, MOL2, CDK, BLUECAL).
596
Figure 5 comes about here
597
Moreover, a linear independence analysis was performed through the Principal Component
598
Analysis (PCA) method, by using the same dataset and the QuBiLS-MAS, DRAGON and PaDEL
599
MDs employed in the previous VA. Orthogonal MDs are loaded in different projections, while
600
correlated MDs are loaded in a same component. Supplementary Information 13 shows the
601
eigenvalues and the percentages of the explained variance by 10 principal components (PC)
602
obtained, which explain approximately 69.38% of the cumulative variance. As a result, the MDs
603
analyzed show collinearity in PC1 (34.16%), PC3 (6.41%), PC4 (4.36%), PC5 (3.98%) and PC7
604
(2.55%). Also note that the QuBiLS-MAS 2D-MDs present exclusive loadings in PC2 (9.29%)
26 ACS Paragon Plus Environment
Page 27 of 45 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Chemical Research in Toxicology
605
and PC8 (2.34%). That is, they contain chemical information not codified by the DRAGON and
606
PaDEL MDs considered, justifying in this way the use of the QuBiLS-MAS approach in the AOT
607
modeling.
608
Nonetheless, this PCA study evidences that exclusive loadings are also presented by the
609
DRAGON and PaDEL MDs in PC6 (2.97%) and PC9 (1.74%), and PC10 (1.56%), respectively.
610
This result suggests two alternatives: (1) that better QSAR models could be obtained by combining
611
the QuBiLS-MAS, DRAGON and PaDEL MDs; or (2) that better strategies for the filtering and
612
selection of QuBiLS-MAS 2D-MDs can be applied in order to use MDs encoding all the chemical
613
information captured by the DRAGON and PaDEL MDs. Finally, it is important to highlight that
614
according to the collinearity shown among the QuBiLS-MAS, DRAGON and PaDEL MDs, it can
615
be stated that the former codifies chemical information related to the molecular weight (MW), the
616
sum of Sanderson electronegativities (Se) and the sum of atomic polarizabilities (Sp) in PC1; to
617
the number of aromatic atoms (naAromAtom), the number of double bonds (nBondsD) and the
618
number of doubly bound carbon bounds to two other carbons (C2SP2) in PC3; to the sum of E-
619
States for (strong) hydrogen bond donors (SHBd) in PC4; to the sum of path lengths starting from
620
nitrogen atoms (WTPT-5) in PC5; and to the minimum atom-type E-State: >C< (minssssC) in
621
PC7.
622 623
4. Conclusions
624
Several QSAR models have been developed to predict AOT, by using the QuBiLS-MAS
625
framework for the molecular encoding. Three training sets comprised of 5931 (EPA training set),
626
7413 (EPA-full training set) and 10152 (Zhu training set) compounds were used for the models
627
building (EPA training set ⊆ EPA-full set ⊆ Zhu set). The models created on the EPA training set
27 ACS Paragon Plus Environment
Chemical Research in Toxicology 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 28 of 45
628
were validated on the EPA test set (1482 compounds), as well as on the ProTox (425 compounds)
629
and T3DB (284 compounds) external sets; whereas the models built on the other two training sets
630
were only assessed on the external sets mentioned above. The models developed ranging from 37
631
to 228 QuBiLS-MAS 2D-MDs, and they present good predictive ability and high coverage percent
632
according to the applicability domain (AD) analysis performed.
633
Specifically, two Min operator-based consensus models denoted as M19 and M22, as well
634
as a WA operator-based consensus model denoted as M24, were selected as the best ones for each
635
training set accounted for. On one hand, model M19, created on the EPA training set, presents
636
𝑀𝐴𝐸𝑡𝑒𝑠𝑡 ― 𝐴𝐷 = 0.4044, 𝑀𝐴𝐸𝑃𝑟𝑜𝑇𝑜𝑥 ― 𝐴𝐷 = 0.4067 and 𝑀𝐴𝐸𝑇3𝐷𝐵 ― 𝐴𝐷 = 0.2586 on the 1456, 402
637
and 271 compounds included within the AD for the EPA test set, ProTox external set and T3DB
638
external set, respectively. On the other hand, model M22, built on the EPA-full training set, has
639
𝑀𝐴𝐸𝑃𝑟𝑜𝑇𝑜𝑥 ― 𝐴𝐷 = 0.3992 and 𝑀𝐴𝐸𝑇3𝐷𝐵 ― 𝐴𝐷 = 0.2286 on the 414 and 271 compounds inside its
640
AD for the ProTox and T3DB external sets, respectively. In addition, for these two external sets,
641
model M24, built on the Zhu training set, presents 𝑀𝐴𝐸𝑃𝑟𝑜𝑇𝑜𝑥 ― 𝐴𝐷 = 0.3773 and 𝑀𝐴𝐸𝑇3𝐷𝐵 ― 𝐴𝐷
642
= 0.2471 on the 400 and 266 compounds within its AD, respectively.
643
These outcomes were compared and statistically validated with respect to the ones obtained
644
by 14 QSAR methods from the literature, which include the T.E.S.T software-based methods,10
645
the GUSAR software-based methods,16 and the admetSAR,12 ProTox14 and DL-AOT18 methods.
646
As a result, model M22 presents the best performance. It is important to remark that model M19
647
is the one with the best performance on the EPA test set among all the QSAR methods built on the
648
EPA training set (e.g., T.E.S.T software-based methods). Moreover, a retrospective study on 261
649
withdrawn drugs due to their toxic/side effects was performed, in order to assess the ability of the
650
models proposed in generating toxicological warnings. A comparison with regard to the methods
28 ACS Paragon Plus Environment
Page 29 of 45 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Chemical Research in Toxicology
651
from the literature was also made. As a result, model M22 has the highest ability of labelling a
652
compound as toxic (or harmful) according to the GHS scheme.78 All in all, it can be concluded
653
that the models proposed here, especially model M22, constitute prominent tools to predict AOT,
654
at providing the best results among all the methods examined. To this end, a software was built in
655
order to be used in virtual screening tasks (http://tomocomd.com/apps/ptoxra).
656 657
Acknowledgement: CRGJ acknowledges to the program “Cátedras CONACYT” from “Consejo
658
Nacional de Ciencia y Tecnología (CONACYT), México” by the support to the endowed chair
659
501/2018 at “Centro de Investigación Científica y de Educación Superior de Ensenada
660
(CICESE)”. YMP thanks to the program “Profesor Invitado” for a post-doctoral fellowship to
661
work at Valencia University in 2018-2019. YMP also acknowledges the support from USFQ
662
“Chancellor Grant 2017-2018 (Project ID11192)
663 664
Conflict of Interest: The authors declare no conflict of interest.
665 666
5. References
667 668 669 670 671 672 673 674 675 676 677 678 679 680 681
1. Boyerinas, B.; Jochems, C.; Fantini, M.; Heery, C. R.; Gulley, J. L.; Tsang, K. Y.; Schlom, J., AntibodyDependent Cellular Cytotoxicity Activity of a Novel Anti–PD-L1 Antibody Avelumab (MSB0010718C) on Human Tumor Cells. Cancer Immunol. Res. 2015, 3, 1148-1157. 2. Adrees, M.; Ali, S.; Rizwan, M.; Zia-ur-Rehman, M.; Ibrahim, M.; Abbas, F.; Farid, M.; Qayyum, M. F.; Irshad, M. K., Mechanisms of silicon-mediated alleviation of heavy metal toxicity in plants: A review. Ecotoxicol. Environ. Saf. 2015, 119, 186-197. 3. Mebs, D., Toxicity in animals. Trends in evolution? Toxicon 2001, 39, 87-96. 4. Saha, N. C.; Bhunia, F.; Kaviraj, A., Toxicity of Phenol to Fish and Aquatic Ecosystems. Bull. Environ. Contam. Toxicol. 1999, 63, 195-202. 5. Walum, E., Acute oral toxicity. Environ. Health Perspect. 1998, 106, 497-503. 6. Enslein, K., A toxicity estimation model. J. Environ. Pathol. Toxicol. 1978, 2, 115-121. 7. Enslein, K.; Lander, T. R.; Tomb, M. E.; Craig, P. N., A Predictive Model for Estimating Rat Oral Ld50 Values. Toxicol. Ind. Health 1989, 5, 265-265. 8. Eldred, D. V.; Jurs, P. C., Prediction of Acute Mammalian Toxicity of Organophosphorus Pesticide Compounds from Molecular Structure. SAR QSAR Environ. Res. 1999, 10, 75-99.
29 ACS Paragon Plus Environment
Chemical Research in Toxicology 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731
Page 30 of 45
9. Guo, J.-X.; Wu, J. J. Q.; Wright, J. B.; Lushington, G. H., Mechanistic Insight into Acetylcholinesterase Inhibition and Acute Toxicity of Organophosphorus Compounds: A Molecular Modeling Study. Chem. Res. Toxicol. 2006, 19, 209-216. 10. Zhu, H.; Martin, T. M.; Ye, L.; Sedykh, A.; Young, D. M.; Tropsha, A., Quantitative Structure−Activity Relationship Modeling of Rat Acute Toxicity by Oral Exposure. Chem. Res. Toxicol. 2009, 22, 19131921. 11. T.E.S.T software. https://www.epa.gov/chemical-research/toxicity-estimation-software-tool-test (April 2, 2019), 12. Cheng, F.; Li, W.; Zhou, Y.; Shen, J.; Wu, Z.; Liu, G.; Lee, P. W.; Tang, Y., admetSAR: A Comprehensive Source and Free Tool for Assessment of Chemical ADMET Properties. J. Chem. Inf. Model. 2012, 52, 3099-3105. 13. admetSAR software. http://lmmd.ecust.edu.cn/admetsar1/home/ (April 2, 2019), 14. Drwal, M. N.; Banerjee, P.; Dunkel, M.; Wettig, M. R.; Preissner, R., ProTox: a web server for the in silico prediction of rodent oral toxicity. Nucleic Acids Res. 2014, 42, W53-W58. 15. ProTox-II software. http://tox.charite.de/protox_II/ (April 2, 2019), 16. Alexey, L.; Alexey, Z.; Dmitry, F.; Vladimir, P., QSAR Modelling of Rat Acute Toxicity on the Basis of PASS Prediction. Mol. Inf. 2011, 30, 241-250. 17. Lu, J.; Peng, J.; Wang, J.; Shen, Q.; Bi, Y.; Gong, L.; Zheng, M.; Luo, X.; Zhu, W.; Jiang, H.; Chen, K., Estimation of acute oral toxicity in rat using local lazy learning. J. Cheminf. 2014, 6, 26. 18. Xu, Y.; Pei, J.; Lai, L., Deep Learning Based Regression and Multiclass Models for Acute Oral Toxicity Prediction with Automatic Chemical Feature Extraction. J. Chem. Inf. Model. 2017, 57, 2672-2685. 19. Hand, D. J., Classifier Technology and the Illusion of Progress. Stat. Sci. 2006, 21, 1-14. 20. Stanley, Y. S.; Fei, Y.; Mu, Z., Chemical Descriptors Are More Important Than Learning Algorithms for Modelling. Mol. Inf. 2012, 31, 707-710. 21. Skoraczyński, G.; Dittwald, P.; Miasojedow, B.; Szymkuć, S.; Gajewska, E. P.; Grzybowski, B. A.; Gambin, A., Predicting the outcomes of organic reactions via machine learning: are current descriptors sufficient? Sci. Rep. 2017, 7, 3582. 22. Validation of (Q)SAR Models. http://www.oecd.org/chemicalsafety/riskassessment/validationofqsarmodels.htm (April 2, 2019), 23. Whitley, D.; Watson, J. Complexity Theory and the No Free Lunch Theorem. In Search Methodologies, Burke, E.; Kendall, G., Eds.; Springer US: Boston, MA, 2005; Chapter 11, pp 317-339. 24. Barigye, S. J.; Marrero-Ponce, Y.; López, Y. M.; Santiago, O. M.; Torrens, F.; Domenech, R. G.; Galvez, J., Event-based criteria in GT-STAF information indices: theory, exploratory diversity analysis and QSPR applications. SAR QSAR Environ. Res. 2013, 24, 3-34. 25. Martínez-Santiago, O.; Millán-Cabrera, R.; Marrero-Ponce, Y.; Barigye, S. J.; Martínez-López, Y.; Torrens, F.; Pérez-Giménez, F., Discrete Derivatives for Atom-Pairs as a Novel Graph-Theoretical Invariant for Generating New Molecular Descriptors: Orthogonality, Interpretation and QSARs/QSPRs on Benchmark Databases. Mol. Inf. 2014, 33, 343-368. 26. Cubillán, N.; Marrero-Ponce, Y.; Ariza-Rico, H.; Barigye, S. J.; García-Jacas, C. R.; Valdes-Martini, J. R.; Alvarado, Y. J., Novel global and local 3D atom-based linear descriptors of the Minkowski distance matrix: theory, diversity–variability analysis and QSPR applications. J. Math. Chem. 2015, 53, 2028-2064. 27. Marrero-Ponce, Y.; García-Jacas, C. R.; Barigye, S. J.; Valdés-Martiní, J. R.; Rivera-Borroto, O. M.; Pino-Urias, R. W.; Cubillán, N.; Alvarado, Y. J., Optimum Search Strategies or Novel 3D Molecular Descriptors: is there a Stalemate? Curr. Bioinf. 2015, 10, 533-564. 28. García-Jacas, C. R.; Marrero-Ponce, Y.; Barigye, S. J.; Valdés-Martiní, J. R.; Rivera-Borroto, O. M.; Verbel, J. O., N-Linear Algebraic Maps to Codify Chemical Structures: is a suitable generalization to the atom-pairs approaches? Curr. Drug. Metab. 2014, 15, 441-469. 29. Valdés-Martiní, J. R.; Marrero-Ponce, Y.; García-Jacas, C. R.; Martinez-Mayorga, K.; Barigye, S. J.; Vaz d‘Almeida, Y. S.; Pham-The, H.; Pérez-Giménez, F.; Morell, C. A., QuBiLS-MAS, open source 30 ACS Paragon Plus Environment
Page 31 of 45 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782
Chemical Research in Toxicology
30. 31. 32. 33. 34. 35.
36. 37.
38.
39.
40.
41.
42.
43.
multi-platform software for atom- and bond-based topological (2D) and chiral (2.5D) algebraic molecular descriptors computations. J. Cheminf. 2017, 9, 1-26. QuBiLS-MAS software. http://tomocomd.com/qubils-mas (April 2, 2019), Marrero-Ponce, Y., Total and Local Quadratic Indices of the Molecular Pseudograph's Atom Adjacency Matrix: Applications to the Prediction of Physical Properties of Organic Compounds. Molecules 2003, 8, 687-726. Marrero-Ponce, Y., Total and local (atom and atom type) molecular quadratic indices: significance interpretation, comparison to other molecular descriptors, and QSPR/QSAR applications. Bioorg. Med. Chem. 2004, 12, 6351-6369. Castillo-Garit, J. A.; Martinez-Santiago, O.; Marrero-Ponce, Y.; Casañola-Martín, G. M.; Torrens, F., Atom-based non-stochastic and stochastic bilinear indices: Application to QSPR/QSAR studies of organic compounds. Chem. Phys. Lett. 2008, 464, 107-112. Marrero-Ponce, Y.; Torrens, F.; García-Domenech, R.; Ortega-Broche, S. E.; Zaldivar, V. R., Novel 2D TOMOCOMD-CARDD molecular descriptors: atom-based stochastic and non-stochastic bilinear indices and their QSPR applications. J. Math. Chem. 2008, 44, 650-673. Marrero-Ponce, Y.; Martínez-Albelo, E. R.; Casañola-Martín, G. M.; Castillo-Garit, J. A.; EcheveríaDíaz, Y.; Zaldivar, V. R.; Tygat, J.; Borges, J. E. R.; García-Domenech, R.; Torrens, F.; PérezGiménez, F., Bond-based linear indices of the non-stochastic and stochastic edge-adjacency matrix. 1. Theory and modeling of ChemPhys properties of organic molecules. Mol. Diversity 2010, 14, 731-753. Marrero-Ponce, Y.; Torrens, F.; Alvarado, Y. J.; Rotondo, R., Bond-based global and local (bond, group and bond-type) quadratic indices and their applications to computer-aided molecular design. 1. QSPR studies of diverse sets of organic chemicals. J Comput Aided Mol Des 2006, 20, 685-701. Marrero-Ponce, Y.; Iyarreta-Veitía, M.; Montero-Torres, A.; Romero-Zaldivar, C.; Brandt, C. A.; Ávila, P. E.; Kirchgatter, K.; Machado, Y., Ligand-Based Virtual Screening and in Silico Design of New Antimalarial Compounds Using Nonstochastic and Stochastic Total and Atom-Type Quadratic Maps. J. Chem. Inf. Model. 2005, 45, 1082-1100. Meneses-Marcel, A.; Marrero-Ponce, Y.; Machado-Tugores, Y.; Montero-Torres, A.; Pereira, D. M.; Escario, J. A.; Nogal-Ruiz, J. J.; Ochoa, C.; Arán, V. J.; Martínez-Fernández, A. R.; García Sánchez, R. N., A linear discrimination analysis based virtual screening of trichomonacidal lead-like compounds: Outcomes of in silico studies supported by experimental results. Bioorg. Med. Chem. Lett. 2005, 15, 3838-3843. Marrero-Ponce, Y.; Meneses-Marcel, A.; Castillo-Garit, J. A.; Machado-Tugores, Y.; Escario, J. A.; Barrio, A. G.; Pereira, D. M.; Nogal-Ruiz, J. J.; Arán, V. J.; Martínez-Fernández, A. R.; Torrens, F.; Rotondo, R.; Ibarra-Velarde, F.; Alvarado, Y. J., Predicting antitrichomonal activity: A computational screening using atom-based bilinear indices and experimental proofs. Bioorg. Med. Chem. 2006, 14, 6502-6524. Meneses-Marcel, A.; Marrero-Ponce, Y.; Ibáñez-Escribano, A.; Gómez-Barrio, A.; Escario, J. A.; Barigye, S. J.; Terán, E.; García-Jacas, C. R.; Machado-Tugores, Y.; Nogal-Ruiz, J. J.; Arán-Redó, V. J., Drug repositioning for novel antitrichomonas from known antiprotozoan drugs using hierarchical screening. Future Med. Chem. 2018, 10, 863-878. Marrero-Ponce, Y.; Huesca-Guillén, A.; Ibarra-Velarde, F., Quadratic indices of the ‘molecular pseudograph's atom adjacency matrix’ and their stochastic forms: a novel approach for virtual screening and in silico discovery of new lead paramphistomicide drugs-like compounds. J. Mol. Struct.: THEOCHEM 2005, 717, 67-79. Casañola-Martín, G. M.; Marrero-Ponce, Y.; Khan, M. T. H.; Ather, A.; Sultan, S.; Torrens, F.; Rotondo, R., TOMOCOMD-CARDD descriptors-based virtual screening of tyrosinase inhibitors: Evaluation of different classification model combinations using bond-based linear indices. Bioorg. Med. Chem. 2007, 15, 1483-1503. Casañola-Martín, G. M.; Khan, M. T. H.; Marrero-Ponce, Y.; Ather, A.; Sultankhodzhaev, M. N.; Torrens, F., New tyrosinase inhibitors selected by atomic linear indices-based classification models. Bioorg. Med. Chem. Lett. 2006, 16, 324-330. 31 ACS Paragon Plus Environment
Chemical Research in Toxicology 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833
Page 32 of 45
44. A., C. G. J.; Yovani, M. P.; Francisco, T.; Ramón, G. D., Estimation of ADME properties in drug discovery: Predicting Caco‐2 cell permeability using atom‐based stochastic and non‐stochastic linear indices. J. Pharm. Sci. 2008, 97, 1946-1976. 45. Medina Marrero, R.; Marrero-Ponce, Y.; Barigye, S. J.; Echeverría Díaz, Y.; Acevedo-Barrios, R.; Casañola-Martín, G. M.; García Bernal, M.; Torrens, F.; Pérez-Giménez, F., QuBiLs-MAS method in early drug discovery and rational drug identification of antifungal agents. SAR QSAR Environ. Res. 2015, 26, 943-958. 46. Toxin and Toxin Target Database (T3DB). http://www.t3db.ca/ (April 2, 2019), 47. WITHDRAWN: A Resource for Withdrawn and Discontinued Drugs. http://cheminfo.charite.de/withdrawn/index.html (April 2, 2019), 48. O'Boyle, N. M.; Banck, M.; James, C. A.; Morley, C.; Vandermeersch, T.; Hutchison, G. R., Open Babel: An open chemical toolbox. J. Cheminf. 2011, 3, 1-14. 49. García-Jacas, C. R.; Cabrera-Leyva, L.; Marrero-Ponce, Y.; Suárez-Lezcano, J.; Cortés-Guzmán, F.; García-González, L. A., GOWAWA Aggregation Operator-based Global Molecular Characterizations: Weighting Atom/bond Contributions (LOVIs/LOEIs) According to their Influence in the Molecular Encoding. Mol. Inf. 2018, 37, 1-14. 50. García-Jacas, C. R.; Cabrera-Leyva, L.; Marrero-Ponce, Y.; Suárez-Lezcano, J.; Cortés-Guzmán, F.; Pupo-Meriño, M.; Vivas-Reyes, R., Choquet integral-based fuzzy molecular characterizations: when global definitions are computed from the dependency among atom/bond contributions (LOVIs/LOEIs). J. Cheminf. 2018, 10, 1-17. 51. Hall, M. A. Correlation-based Feature Selection for Machine Learning. University of Waikato, Hamilton, NewZealand, 1999. 52. Robnik-Šikonja, M.; Kononenko, I., Theoretical and Empirical Analysis of ReliefF and RReliefF. Mach. Learn. 2003, 53, 23-69. 53. WEKA software. http://www.cs.waikato.ac.nz/ml/weka/ (April 2, 2019), 54. Godden, J. W.; Stahura, F. L.; Bajorath, J., Variability of Molecular Descriptors in Compound Databases Revealed by Shannon Entropy Calculations. J. Chem. Inf. Comput. Sci. 2000, 40, 796-800. 55. Urias, R. W. P.; Barigye, S. J.; Marrero-Ponce, Y.; García-Jacas, C. R.; Valdes-Martiní, J. R.; PerezGimenez, F., IMMAN: free software for information theory-based chemometric analysis. Mol. Divers. 2015, 19, 305-319. 56. Kuncheva, L. I. Ensemble Feature Selection. In Combining Pattern Classifiers: Methods and Algorithms; John Wiley & Sons: New Jersey, 2014. 57. Kohavi, R.; John, G. H., Wrappers for feature subset selection. Artif. Intell. 1997, 97, 273-324. 58. Altman, N. S., An Introduction to Kernel and Nearest-Neighbor Nonparametric Regression. Amer. Statist. 1992, 46, 175-185. 59. Murtagh, F., Multilayer perceptrons for classification and regression. Neurocomputing 1991, 2, 183197. 60. Breiman, L., Random Forests. Mach. Learn. 2001, 45, 5-32. 61. Shevade, S. K.; Keerthi, S. S.; Bhattacharyya, C.; Murthy, K. R. K., Improvements to the SMO algorithm for SVM regression. IEEE Trans. Neural Netw. Learn. Syst. 2000, 11, 1188-1193. 62. Üstün, B.; Melssen, W. J.; Buydens, L. M. C., Facilitating the application of Support Vector Regression by using a universal Pearson VII function based kernel. Chemom. Intell. Lab. Syst. 2006, 81, 29-40. 63. Breiman, L., Bagging predictors. Mach. Learn. 1996, 24, 123-140. 64. Friedman, J. H., Stochastic gradient boosting. Comput. Stat. Data. Anal. 2002, 38, 367-378. 65. Wolpert, D. H., Stacked generalization. Neural Netw. 1992, 5, 241-259. 66. Dragos, H.; Gilles, M.; Alexandre, V., Predicting the Predictability: A Unified Approach to the Applicability Domain Problem of QSAR Models. J. Chem. Inf. Model. 2009, 49, 1762-1776. 67. Sahigara, F.; Mansouri, K.; Ballabio, D.; Mauri, A.; Consonni, V.; Todeschini, R., Comparison of Different Approaches to Define the Applicability Domain of QSAR Models. Molecules 2012, 17, 4791. 68. Roy, K.; Kar, S.; Ambure, P., On a simple approach for determining applicability domain of QSAR models. Chemom. Intell. Lab. Syst. 2015, 145, 22-29. 32 ACS Paragon Plus Environment
Page 33 of 45 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858
Chemical Research in Toxicology
69. Liew, C. Y.; Lim, Y. C.; Yap, C. W., Mixed learning algorithms and features ensemble in hepatotoxicity prediction. J Comput Aided Mol Des 2011, 25, 855-871. 70. Aranda, J.; Garro Martinez, J.; Castro, E.; Duchowicz, P., Conformation-Independent QSPR Approach for the Soil Sorption Coefficient of Heterogeneous Compounds. Int. J. Mol. Sci. 2016, 17, 1247. 71. J, J.; N, N.-J.; T, A., QSAR applicabilty domain estimation by projection of the training set descriptor space: a review. Altern. Lab. Anim. 2005, 33, 445-459. 72. Netzeva, T. I.; Worth, A.; Aldenberg, T.; Benigni, R.; Cronin, M. T.; Gramatica, P.; Jaworska, J. S.; Kahn, S.; Klopman, G.; Marchant, C. A.; Myatt, G.; Nikolova-Jeliazkova, N.; Patlewicz, G. Y.; Perkins, R.; Roberts, D.; Schultz, T.; Stanton, D. W.; van de Sandt, J. J.; Tong, W.; Veith, G.; Yang, C., Current status of methods for defining the applicability domain of (quantitative) structure-activity relationships. The report and recommendations of ECVAM Workshop 52. Altern. Lab. Anim. 2005, 33, 155-73. 73. García-Jacas, C. R.; Martinez-Mayorga, K.; Marrero-Ponce, Y.; Medina-Franco, J. L., Conformationdependent QSAR approach for the prediction of inhibitory activity of bromodomain modulators. SAR QSAR Environ. Res. 2017, 28, 41-58. 74. Jaworska, J.; Nikolova-Jeliazkova, N. Ambit Discovery v0.0.4. http://ambit.sourceforge.net/download_ambitdiscovery.html (April 2, 2019), 75. Yap, B. W.; Sim, C. H., Comparisons of various types of normality tests. J. Stat. Comput. Sim. 2011, 81, 2141-2155. 76. Friedman, M., A Comparison of Alternative Tests of Significance for the Problem of $m$ Rankings. Ann. Math. Statist. 1940, 11, 86-92. 77. Holm, S., A Simple Sequentially Rejective Multiple Test Procedure. Scand. J. Stat 1979, 6, 65–70. 78. Winder, C.; Azzi, R.; Wagner, D., The development of the globally harmonized system (GHS) of classification and labelling of hazardous chemicals. J. Hazard. Mater. 2005, 125, 29-44. 79. Randić, M., Generalized molecular descriptors. J. Math. Chem. 1991, 7, 155-168.
33 ACS Paragon Plus Environment
Chemical Research in Toxicology 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 34 of 45
ANNEXES (Tables, Figures and Schemas to be inserted in the Main Text) Table 1. Statistical parameters of the best individual models built on the EPA training set (5931 structures). The performance achieved by these models on the EPA test set (1482 structures) is reported between parentheses. ID Size Model* Rtraining (test) MAEtraining (test) RMSEtraining (test) M01 166 (ReF-CFS)_RF 0.7790 (0.7952) 0.4382 (0.4355) 0.5986 (0.5899) M02 159 (ReF-CFSB)_RF 0.7770 (0.7991) 0.4411 (0.4329) 0.6013 (0.5865) M03 228 (ReF-CFS-SE)_RF 0.7753 (0.7942) 0.4420 (0.4416) 0.6034 (0.5927) M04 78 CFS_RF 0.7689 (0.7865) 0.4493 (0.4478) 0.6096 (0.6010) M05 95 ReF_RF 0.7681 (0.7869) 0.4473 (0.4400) 0.6109 (0.5997) M06 103 SE_SVM 0.7594 (0.7807) 0.4455 (0.4401) 0.6173 (0.6033) M07 37 SE_Wrapper(RF) 0.7552 (0.7687) 0.4602 (0.4620) 0.6254 (0.6223) M08 78 CFS_SVM 0.7543 (0.7671) 0.4582 (0.4529) 0.6246 (0.6203) M09 228 (ReF-CFS-SE)_SVM 0.7536 (0.7773) 0.4568 (0.4479) 0.6259 (0.6097) M10 166 (ReF-CFS)_SVM 0.7521 (0.7818) 0.4568 (0.4427) 0.6255 (0.6032) M11 166 (ReF-CFS)_kNN(6) 0.7510 (0.7789) 0.4617 (0.4475) 0.6347 (0.6054) M12 228 (ReF-CFS-SE)_kNN(6) 0.7500 (0.7724) 0.4580 (0.4532) 0.6305 (0.6140) Average performance 0.7620 (0.7824) 0.4513 (0.4453) 0.6173 (0.6040) * The name of the models is denoted as ‘base name’_‘learning method’, where ‘base name’ is denoted by the feature selection methods used: ReF (Relief-F), CFS (Correlation subset forward), CFSB (Correlation subset bidirectional) and SE (Shannon entropy-based method).
34 ACS Paragon Plus Environment
Page 35 of 45 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
Chemical Research in Toxicology
Table 2. Statistical parameters achieved by the best ensemble and consensus models built for the EPA training set (5931 structures) and the EPA test set (1482 structures), respectively. Model ID Size Rtraining MAEtraining RMSEtraining Rtest MAEtest RMSEtest Ensemble method Base models M13 166 Additive regression M01 0.7885 0.4224 0.5843 0.8059 0.4160 0.5725 M14 228 Additive regression M03 0.7864 0.4256 0.5875 0.8092 0.4179 0.5694 M15 159 Additive regression M02 0.7841 0.4266 0.5898 0.8093 0.4157 0.5690 M16 166 Stacking (linear regression) M01, M10, M11 0.7801 0.4312 0.5933 0.8024 0.4267 0.5755 M17 200 Consensus (average) M13, M15 0.8103 0.4136 0.5676 M18 200 Consensus (weighted average) M13, M15 0.8102 0.4138 0.5677 M19 200 Consensus (minimum) M13, M15 0.8095 0.4106 0.5693
35 ACS Paragon Plus Environment
Chemical Research in Toxicology 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
Page 36 of 45
Table 3. Statistical parameters achieved by the consensus models from M17 to M25. The models from M17 to M19, from M20 to M22 and from M23 to M25 were built on the EPA training set (5931 structures), EPA-full training set (7413 structures) and Zhu training set (10152 structures), respectively. ProTox external set T3DB external set (425 compounds) (284 compounds) ID Size Model MAEProTox RMSEProTox MAET3DB RMSET3DB M17 200 Consensus (average) 0.4106 0.5901 0.3004 0.5157 M18 200 Consensus (weighted average) 0.4108 0.5902 0.3006 0.5155 M19 200 Consensus (minimum) 0.4073 0.5890 0.2968 0.5221 M20 200 Consensus (average) 0.3975 0.5784 0.3019 0.5098 M21 200 Consensus (weighted average) 0.3976 0.5785 0.3029 0.5097 M22 200 Consensus (minimum) 0.3971 0.5781 0.2796 0.5161 M23 200 Consensus (average) 0.3807 0.5663 0.3016 0.5091 M24 200 Consensus (weighted average) 0.3805 0.5656 0.3049 0.5095 M25 200 Consensus (minimum) 0.3848 0.5756 0.3057 0.5272
36 ACS Paragon Plus Environment
Page 37 of 45 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
Chemical Research in Toxicology
Table 4. Performance of the best consensus models built on the EPA training set (from M17 to M19), EPA-full training set (from M20 to M22) and Zhu training set (from M23 to M25), respectively, according to their applicability domains. EPA test set (1482 compounds) ProTox external set (425 compounds) T3DB external set (284 compounds) ID % % % MAEtest-AD RMSEtest-AD MAEProTox-AD RMSEProTox-AD MAET3DB-AD RMSET3DB-AD coverage coverage coverage M17 98.25 0.4084 0.5611 94.59 0.4096 0.5925 95.42 0.2632 0.4402 M18 98.25 0.4087 0.5613 94.59 0.4098 0.5926 95.42 0.2635 0.4403 M19 98.25 0.4044 0.5613 94.59 0.4067 0.5916 95.42 0.2586 0.4405 M20 97.41 0.3992 0.5813 95.42 0.2547 0.4023 M21 97.41 0.3993 0.5814 95.42 0.2559 0.4030 M22 97.41 0.3992 0.5809 95.42 0.2286 0.3957 M23 94.12 0.3774 0.5688 93.66 0.2434 0.3859 M24 94.12 0.3773 0.5681 93.66 0.2471 0.3871 M25 94.12 0.3811 0.5779 93.66 0.2444 0.3979
37 ACS Paragon Plus Environment
Chemical Research in Toxicology 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 38 of 45
Table 5. Pearson correlation-based similarity coefficients among the models proposed and QSAR methods most commonly used in the literature to predict AOT. The similarities were calculated from the toxicity classes obtained by each model in the WITHDRAWN dataset according to the GHS labelling. TEST DL-AOT admetSAR ProTox M19 M22 M24 Consensus TEST Consensus 1.000 DL-AOT .496 1.000 admetSAR .433 .658 1.000 ProTox .351 .557 .612 1.000 M19 .447 .531 .648 .506 1.000 M22 .465 .599 .723 .548 .890 1.000 M24 .456 .666 .848 .648 .735 .814 1.000
38 ACS Paragon Plus Environment
Page 39 of 45 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Chemical Research in Toxicology
Schema 1. General workflow to compute the QuBiLS-MAS molecular descriptors. (1) Computation of the molecular vectors according to the properties selected; (2) Computation of the non-stochastic matrices for 𝐤 = 𝟏; (3) To consider atom-types or local-fragments (optional); (4) To apply molecular cutoffs (optional); (5) Computation of the simple-stochastic, double-stochastic and mutual probability matrices; (6) To perform the division in atom-level (or bond-level) matrices; (7) Computation of the atom-level (or bondlevel) indices (descriptors) using the property vectors calculated in the step (1); and (8) To apply one or several aggregation operators on the atom-level (or bond-level) descriptors vector.
39 ACS Paragon Plus Environment
Chemical Research in Toxicology 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 40 of 45
Schema 2. Workflow corresponding to the feature selection and modelling process.
40 ACS Paragon Plus Environment
Page 41 of 45 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Chemical Research in Toxicology
Figure 1. Comparative graphic of the performance achieved by consensus model M19 with respect to seven methods reported in the literature on the EPA test set (1482 compounds), according to the applicability domain (AD) of the GUSAR wNN method, TEST Consensus method and model M19.
41 ACS Paragon Plus Environment
Chemical Research in Toxicology 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 42 of 45
Figure 2. Comparative graphic of the performance achieved by consensus models M19, M22 and M24 with respect to four methods reported in the literature on the ProTox external set (425 compounds), according to the applicability domain (AD) of the GUSAR wNN method, GUSAR nNN method and model M24.
42 ACS Paragon Plus Environment
Page 43 of 45 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Chemical Research in Toxicology
Figure 3. Comparative graphic of the performance achieved by consensus models M19, M22 and M24 with respect to six methods reported in the literature on the T3DB external set (284 compounds), according to the applicability domain (AD) of the GUSAR wNN method, GUSAR nNN method, TEST Consensus method and model M24.
43 ACS Paragon Plus Environment
Chemical Research in Toxicology 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
Page 44 of 45
Figure 4. Graphics corresponding to the GHS labelling obtained in the WITHDRAWN set by: (A) Consensus models M19, M22 and M24 proposed in this work; and (B) Consensus model M22 (the best one in external prediction) and the TEST Consensus, DL-AOT, admetSAR and ProTox methods established in the literature.
44 ACS Paragon Plus Environment
Page 45 of 45 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Chemical Research in Toxicology
Figure 5. Shannon entropy distribution corresponding to the 200 QuBiLS-MAS 2D-MDs included in the best QSAR models proposed, as well as of the best 200 0D-to-2D MDs computed by the DRAGON and PaDEL software, respectively.
45 ACS Paragon Plus Environment