From classification to regression multi-tasking QSAR modelling using a novel modular neural network: Simultaneous prediction of anticonvulsant activity and neurotoxicity of succinimides

Davor Antanasijević, Jelena Antanasijević, Nemanja Trišović, Gordana Ušćumlić, and Viktor Pocajt

Mol. Pharmaceutics, Just Accepted Manuscript • DOI: 10.1021/acs.molpharmaceut.7b00582 • Publication Date (Web): 04 Nov 2017

Downloaded from http://pubs.acs.org on November 6, 2017


Molecular Pharmaceutics is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.



Davor Antanasijević a,*, Jelena Antanasijević b, Nemanja Trišović b, Gordana Ušćumlić b, Viktor Pocajt b

a University of Belgrade, Innovation Center of the Faculty of Technology and Metallurgy, Karnegijeva 4, 11120 Belgrade

b University of Belgrade, Faculty of Technology and Metallurgy, Karnegijeva 4, 11120 Belgrade

*E-mail: [email protected]; Tel: +381 11 3303 650


Abstract

Succinimides, which contain a pharmacophore responsible for anticonvulsant activity, are frequently used antiepileptic drugs, and the synthesis of new derivatives with improved efficacy and tolerability is an important task. Multi-target/multi-tasking methodologies based on quantitative structure–activity relationships (mt-QSAR/mtk-QSAR) now play an important role in the rational design of drugs, since they enable simultaneous prediction of several standard measures of biological activity under diverse experimental conditions and against different biological targets. However, the mt-QSAR/mtk-QSAR methodology can give only binary classification models; therefore, in this study a regression mtk-QSAR (rmtk-QSAR) model based on a novel modular neural network (MNN) is proposed. The MNN uses standard classification mtk-QSAR models as input modules, while the regression is performed by the output module. The rmtk-QSAR model has been successfully developed for the simultaneous prediction of anticonvulsant activity and neurotoxicity of succinimides, with satisfactory accuracy in testing (R2 = 0.87). Thus, the proposed mtk-QSAR regression method can be regarded as a viable alternative to the standard QSAR methodology.

Keywords: succinimides, multi-tasking, QSAR, regression, modular neural network


1. Introduction

Since 20–40% of patients with epilepsy are drug-resistant,1 a continuous search for novel active molecules with more favorable anticonvulsant properties, improved efficacy and tolerability is necessary. It is well known that the structural fragment crucial for anticonvulsant activity is a nitrogen heterocyclic system, usually an imide or lactam, with phenyl or alkyl groups attached to the ring.2,3 Since many studies have shown that differently substituted succinimides (pyrrolidine-2,5-diones) exhibit prominent anticonvulsant properties in animal models of epilepsy (maximal electroshock (MES) and pentylenetetrazole (PTZ) seizure tests),1,4–10 these compounds are regarded as promising for future research and were selected as the subject of this study.

Succinimides are frequently used anticonvulsants in the management of absence seizures. Although not completely understood, their mechanism of action involves a decrease in T-type calcium channel activity.11,12 Ethosuximide (3-ethyl-3-methylpyrrolidine-2,5-dione, Zarontin) is the most commonly used among them. This drug offers a wide range of protection against different kinds of absence seizures and could potentially be useful in conjunction with other anticonvulsants in the treatment of patients with mixed seizure types.13,14 Methsuximide (1,3-dimethyl-3-phenylpyrrolidine-2,5-dione, Celontin) is generally used when other drugs fail to effectively control absence seizures or partial seizures with complex symptomatology.13,14 Phensuximide (1-methyl-3-phenylpyrrolidine-2,5-dione, Milontin) is the least toxic, but is also regarded as less effective than other succinimide anticonvulsants.13,14 Considering their effect on experimental seizures, all three drugs prevent PTZ seizures at nontoxic doses in experimental animals. While methsuximide and phensuximide are capable of protecting mice against MES at non-neurotoxic and slightly
ataxic doses, respectively, ethosuximide has no effect on MES, except at a high anesthetic dose level.15

Having this in mind, significant efforts have been continually directed towards the design and synthesis of novel succinimides as potential central nervous system-active drug candidates. In this context, QSAR (quantitative structure–activity relationship) methodology has emerged as a valuable tool for identifying the most promising candidates among a large number of compounds, thus efficiently managing costly resources and significantly shortening the cycle of drug development.16 QSAR studies endeavor to associate the chemical structure of a compound with its activity, with the assumption that correlations exist between physicochemical properties and molecular structure.17 QSAR is a powerful tool for the design of new compounds, as well as for the optimization of structure.18 It has proven successful in many aspects of molecular design, particularly in the field of drug discovery.19

In recent years, researchers have focused on developing multi-target QSAR (mt-QSAR) models,20–23 i.e. the prediction of biological activities by considering different biological targets, as well as on developing multi-tasking QSAR (mtk-QSAR) models, where different measures of biological effects and diverse experimental conditions are considered.24,25 Such models can also be used for the prediction of many ADME (absorption, distribution, metabolism, and elimination) properties,26 and have been applied to the calculation of the quantitative contribution of diverse molecular fragments to the activity, toxicity or any ADME property, which allows the analysis and fast detection of 2D pharmacophores, toxicophores, etc.27,28 Hence, with the use of mt-QSAR or mtk-QSAR models, it is possible to
design new molecular entities with the desired properties, e.g. high inhibitory activity and, at the same time, low toxicity and good ADME properties.29,30

The mt-QSAR/mtk-QSAR models are based on the application of the moving average approach, whereby the inputs compare the similarity of the present case with the average case, for which activity/toxicity is above or below predefined cutoff values. Thus, this methodology is limited to producing only binary classification models. Some regression multi-target QSAR models, proposed for multi-target drugs such as HIV-1 inhibitors, can be found in the literature,31,32 but these examples have generally been developed using another approach, i.e. multi-task learning (MTL).33 In MTL, a set of triples (X, y, t) is used for each compound, where X is the matrix of descriptors, y is the measured output, and t indicates the target to which the triple belongs. Since X and y are defined as for a single-target QSAR, the QSAR model for a certain target is a separate learning task.33

Among other statistical methods, artificial neural networks (ANNs) are especially popular in the field of QSAR modeling owing to their capability to fit non-linear relationships.34 A special type of ANN is the modular neural network (MNN), which represents an ensemble of independently trained neural networks whose outputs are combined to reach a final prediction.35 Modular approaches have already been successfully applied in QSAR studies, e.g. ANNs trained without supervision have been used to group similar compounds, after which local QSAR models were generated for each group using back-propagation neural networks.36

In this paper, a novel MNN, specially designed to overcome the limitation of the multi-tasking QSAR methodology in its application to the simultaneous prediction of anticonvulsant activity and
neurotoxicity of succinimides, is reported. The input modules of the MNN are probabilistic neural networks (PNNs), while the output module is a generalized regression neural network (GRNN). Both neural network techniques are simple and powerful tools for QSAR studies.37 The proposed MNN transforms binary classification responses from several mtk-QSAR models into a regression response, allowing prediction of the actual values of different measures of biological effects. Thus, the standard mt-QSAR/mtk-QSAR methodology is extended to regression analysis.

2. Materials and methods

2.1. Data set and calculation of multi-tasking descriptors

The dataset consisted of 174 structurally diverse succinimides (see Table S1 in Supporting Information), which were taken from the literature (see references S1-S26 presented in Supplementary material). Considering that they had been examined for different biological effects (ED50 MES, ED50 PTZ and TD50) and in different biological targets (mice and rats), this dataset contained 327 cases. The anticonvulsant activity (MES, scPTZ tests) and the acute neurological toxicity of the compounds were expressed as their median effective (ED50) or toxic (TD50) doses in μmol/kg, whereby ED50 is defined as the dose of a drug protecting 50% of animals, while TD50 is the dose causing minimal neurological toxicity in 50% of animals. The molecular structures of the compounds were sketched using ChemDraw, and a variety of 1D and 2D descriptors was generated using the PaDEL Descriptor software.38

For each case in the dataset, one of two categories was assigned, depending on the predefined cutoff values (Table 1). In each model, all cases with ED50 > cutoff or TD50 > cutoff were assigned to category 1, i.e. low active/toxic; otherwise, the compounds were considered high active/toxic and assigned to category 0.


Table 1. Models and corresponding cut-offs

Model label   Cutoff ED50 (μmol/kg)   Cutoff TD50 (μmol/kg)   Cases 1 (yes)   Cases 0 (no)
I             > 100                   > 500                   279             48
II            > 200                   > 600                   227             100
III           > 300                   > 700                   164             163
IV            > 400                   > 800                   126             201
V             > 500                   > 900                   113             214
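The category assignment described above can be illustrated with a short Python sketch. The cut-off values are those of Table 1; the function names and the convention that a measure label beginning with "TD50" denotes toxicity are assumptions made only for this illustration.

```python
# Cut-off values from Table 1 (μmol/kg), keyed by model label.
CUTOFFS = {
    "I":   {"ED50": 100.0, "TD50": 500.0},
    "II":  {"ED50": 200.0, "TD50": 600.0},
    "III": {"ED50": 300.0, "TD50": 700.0},
    "IV":  {"ED50": 400.0, "TD50": 800.0},
    "V":   {"ED50": 500.0, "TD50": 900.0},
}


def assign_category(measure: str, value: float, model: str) -> int:
    """Return 1 (low active/toxic) if the dose exceeds the cut-off, else 0."""
    # Hypothetical labelling convention: "TD50" vs. "ED50 MES" / "ED50 PTZ".
    kind = "TD50" if measure.startswith("TD50") else "ED50"
    return 1 if value > CUTOFFS[model][kind] else 0


def label_case(measure: str, value: float) -> dict:
    """Binary outputs of one case for all five models (I to V)."""
    return {model: assign_category(measure, value, model) for model in CUTOFFS}
```

For instance, a case with ED50 = 250 μmol/kg would be labelled 1 by models I and II and 0 by models III to V.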

Following the multitasking QSAR methodology,24,25 the calculation of multitasking descriptors was based on the Box-Jenkins moving average approach:23

$$\overline{D}(bt, sm) = \frac{1}{n(bt, sm)} \sum_{i=1}^{n(bt, sm)} D_i \qquad (1)$$

where $\overline{D}(bt, sm)$ is the mean of the descriptor ($D$) for all the cases where a compound was tested against the same target ($bt$) using the same standard measure of biological effect ($sm$), whereby $sm$ > cutoff, and $n(bt, sm)$ is the number of cases that fulfil the condition above. Then, the multitasking descriptors ($\Delta D(bt, sm)$) are calculated as the deviation from the average value, and therefore they describe both the biological target and the measure of biological effect:

$$\Delta D_i(bt, sm) = D_i - \overline{D}(bt, sm) \qquad (2)$$
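As an illustration of Eqs. (1) and (2), the following Python sketch computes the deviation of a single descriptor from the group mean. The record layout (keys `D`, `bt`, `sm`, `value`) and the function name are hypothetical; the averaging over cases above the cut-off follows the definition given in the text.

```python
from collections import defaultdict


def multitasking_descriptors(cases, cutoff_for):
    """Deviation of one descriptor from its (bt, sm) group mean.

    cases      : list of dicts with hypothetical keys
                 'D' (descriptor value), 'bt' (target), 'sm' (measure),
                 'value' (measured ED50 or TD50).
    cutoff_for : dict mapping each measure 'sm' to its cut-off value.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    # Eq. (1): average D over the cases of each (bt, sm) with value > cutoff.
    for case in cases:
        if case["value"] > cutoff_for[case["sm"]]:
            key = (case["bt"], case["sm"])
            sums[key] += case["D"]
            counts[key] += 1
    deltas = []
    # Eq. (2): deviation of every case from the mean of its own group.
    for case in cases:
        key = (case["bt"], case["sm"])
        if counts[key] == 0:
            deltas.append(None)  # no reference cases above the cut-off
        else:
            deltas.append(case["D"] - sums[key] / counts[key])
    return deltas
```

With three mouse MES cases having D = 2, 4 and 1 and measured values of 150, 300 and 50 μmol/kg (cut-off 100 μmol/kg), the group mean is 3 and the deviations are -1, 1 and -2.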

The number of positive (1) and negative (0) cases for each combination of bt and sm, as well as for each model, is given in Table 2.


Table 2. Number of positive (1) and negative (0) cases per model depending on measure (sm) and biological target (bt)

sm         bt      Outcome   I     II    III   IV    V
ED50 MES   Mouse   1         127   109   78    57    48
                   0         11    29    60    81    90
           Rat     1         5     3     2     1     1
                   0         17    19    20    21    21
ED50 PTZ   Mouse   1         52    36    14    5     5
                   0         0     16    38    47    47
           Rat     1         7     7     5     5     5
                   0         0     0     2     2     2
TD50       Mouse   1         76    61    54    47    43
                   0         14    29    36    43    47
           Rat     1         12    11    11    11    11
                   0         6     7     7     7     7

2.2. Selection of descriptors

In this study, the selection of descriptors was performed in two consecutive steps. First, a Java implementation of the V-WSP algorithm39 was used to identify collinear descriptors (r > 0.95); this was followed by the Reshaped Sequential Replacement (RSR) algorithm, which yields the best subset of descriptors. The V-WSP algorithm is an adaptation of the WSP (Wootton, Sergent, Phan-Tan-Luu) algorithm, which was modified by Ballabio et al.40 with the aim of selecting a representative set of variables. The V-WSP algorithm served as a correlation filter and reduced the initial pool of descriptors from 609 to 242. In order to select the descriptors from the reduced pool of 242 descriptors, the RSR algorithm proposed by Cassotti et al.41 was applied. The RSR algorithm is based on Miller's Sequential Replacement (SR) method,42 but adds several functionalities which decrease the calculation time, increase the probability of convergence upon the optimal models, and identify models with drawbacks such as overfitting, chance correlation, variable
redundancy and collinearity.43 For classification problems, these functionalities are as follows:41

• Tabu list: Preliminary exclusion of variables not correlated with the output; they are re-included after the RSR algorithm reaches convergence, but only if they improve the model by more than a predefined threshold. The canonical measure of correlation (CMC) index, with a threshold value of 0.3, was used for screening the "tabu" descriptors.

• Roulette wheel: The initialization of the population was based on the CMC index calculated for each descriptor, whereby descriptors with a higher CMC index had a higher probability of being selected.

• Randomization test: The real model error is compared with the random classification error (Random Error Rate (RER) test), i.e. with the error rate obtained if the cases are randomly assigned to the classes, and the final model is accepted if the real error rate is smaller than the corresponding RER value.

• Nested models: The final population of models was checked in terms of complexity and performance. A model is rejected if its higher complexity is not balanced by higher performance.

• Model distance and correlation: The canonical measure of distance (CMD) and CMC indices were used to determine whether the final models with different variables are actually different in nature.43
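The randomization test in the list above can be sketched in Python as follows. The formula for the random error rate is an assumption: it is the error expected when cases are assigned to classes at random with probabilities proportional to the class sizes, which may differ in detail from the formula implemented in the RSR toolbox. The function names are illustrative only.

```python
def random_error_rate(class_counts):
    """Expected error of a random assignment with class-proportional
    probabilities: 1 - sum(p_g ** 2). Assumed definition, see lead-in."""
    n = sum(class_counts)
    if n == 0:
        raise ValueError("at least one case is required")
    return 1.0 - sum((count / n) ** 2 for count in class_counts)


def passes_rer_test(model_error_rate, class_counts):
    """Accept a model only if its error rate is below the random error rate."""
    return model_error_rate < random_error_rate(class_counts)
```

Using the class counts of model I from Table 1 (279 and 48 cases) purely as an example, this definition gives a random error rate of about 0.25, so a model would need an error rate below that value to be accepted.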

The RSR (Matlab) toolbox,44 which uses k-nearest neighbours (kNN) to calibrate classification models with the non-error rate in cross-validation (NERcv) as the fitness function, was used for the selection of subsets of descriptors. Before the RSR was applied, the dataset was split
into training (287 cases) and testing (40 cases) subsets using the Kennard-Stone algorithm.45 The obtained kNN mtk-QSAR models were used to benchmark the corresponding ANN mtk-QSAR models.

2.3. Proposed modular neural network

In this study, a two-layer modular neural network (MNN) was used (Figure 1) for the transformation of binary classification responses from several input modules into a regression response produced by a single (output) module. The modules in the input layer were independently trained probabilistic neural networks (PNNs),46 while the output module was a general regression neural network (GRNN).47 This type of MNN belongs to the group of loosely coupled models,35 and similar MNNs have been proposed for classification problems.48 The PNNs were selected for the input layer of the MNN since they are conceptually similar to kNN, which makes them suitable for combination with the RSR algorithm, while the GRNN was selected because it is built on the same learning paradigm as the PNN.

Figure 1. Modular neural network for regression mtk-QSAR (rmtk-QSAR) modeling

The number of input modules is not limited to the five used in the current study; it can be increased if needed. An MNN with a higher number of input modules can be expected to have somewhat enhanced performance, but essentially the performance of the MNN depends mainly on the accuracy of the input modules. An additional comparative study that may be needed to quantitatively determine the accuracy of the MNN depending on its architecture is beyond the scope of this research.

The PNN/GRNN modules are four-layered, one-pass, supervised networks. A PNN estimates the probability density function (PDF) of the features of each class from the available training samples using a Gaussian kernel, while the GRNN module provides an optimal estimation of continuous variables by implementing the statistical concept of conditional probability.49 In both networks, conditional probabilities were estimated using the Parzen window approach.50 The main advantage of a PNN/GRNN is extremely rapid training, which does not require an iterative procedure. It should be noted, however, that although PNN/GRNN training is not iterative, it requires the estimation of the Gaussian kernel bandwidth, which is described by the smoothing factor (σ). A small value of σ causes the estimated parent density function to have distinct modes corresponding to the locations of the training samples, while a larger value produces a greater degree of interpolation between points.46 Therefore, σ determines the predictive performance of the PNN/GRNN. During PNN/GRNN training, the learning dataset (80% of training data) was used to set the network weights, while the validation dataset (20% of training data) was utilized to determine the optimal smoothing factor by a genetic algorithm.51 The training of the PNN and GRNN is described in more detail in the literature.46,47
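The Parzen-window idea behind both network types can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the study: a single shared σ, equal class priors in the PNN, and a fixed σ instead of the genetic-algorithm search are all simplifications, and the function names are hypothetical.

```python
import math


def gaussian_kernel(x, xi, sigma):
    """Gaussian kernel of the squared Euclidean distance between x and xi."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, xi))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def pnn_predict(x, train_X, train_y, sigma):
    """PNN sketch: class probabilities from Parzen density estimates."""
    sums, counts = {}, {}
    for xi, yi in zip(train_X, train_y):      # pattern layer
        sums[yi] = sums.get(yi, 0.0) + gaussian_kernel(x, xi, sigma)
        counts[yi] = counts.get(yi, 0) + 1
    # Summation layer: mean kernel response per class (equal priors assumed).
    density = {c: sums[c] / counts[c] for c in sums}
    total = sum(density.values())
    if total == 0.0:                          # all kernels underflowed
        return {c: 1.0 / len(density) for c in density}
    return {c: d / total for c, d in density.items()}


def grnn_predict(x, train_X, train_y, sigma):
    """GRNN sketch: kernel-weighted average of the training outputs."""
    weights = [gaussian_kernel(x, xi, sigma) for xi in train_X]
    total = sum(weights)
    if total == 0.0:                          # fall back to the plain mean
        return sum(train_y) / len(train_y)
    return sum(w * y for w, y in zip(weights, train_y)) / total
```

A small σ makes both predictors follow the nearest training samples closely, while a large σ smooths the estimate towards the class prior (PNN) or the overall mean (GRNN), which is why σ has to be tuned on validation data.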


For the selected PNN/GRNN modules, the number of neurons in the input layer (6 to 8) corresponds to the number of descriptors/inputs used, while the number of neurons in the pattern (230), summation (2) and output (1) layers was the same for all modules. The training set for the PNN modules was built from the RSR-selected descriptors and the binary output that corresponds to the selected cut-off value. For the GRNN module, the training set was built from the outputs of the PNN modules and one dummy variable that distinguishes between ED50 and TD50, together with the measured ED50/TD50 values that served as the model output.

Besides its use in the MNN, each PNN module can be applied as a separate mtk-QSAR model for the classification of compounds as high or low active/toxic. The PNN modules can also be coupled into a cascade algorithm for the determination of the range to which ED50/TD50 belongs; details are provided in the section "ED50 and TD50 range prediction". Since the MNN mtk-QSAR is a regression model, it will be labeled as rmtk-QSAR.

3. Results and discussion

3.1. mtk-QSAR models

In this study, all classification mtk-QSAR models were evaluated based on the counts of true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN). In addition, performance indices such as sensitivity (SE) (Eq. (3)), specificity (SP) (Eq. (4)), predictive accuracy (Q) (Eq. (5)) and non-error rate (NER) (Eq. (6)) were utilized:

$$SE = \frac{TP}{TP + FN} \qquad (3)$$

$$SP = \frac{TN}{TN + FP} \qquad (4)$$

$$Q = \frac{TP + TN}{N} \qquad (5)$$

$$NER = \frac{SE + SP}{2} \qquad (6)$$

where $N$ is the total number of classified cases.
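Eqs. (3)-(6) translate directly into code. The short Python sketch below uses an illustrative function name and, as an example, the test-set counts reported for the kNN-I model in Table 4 (TP = 31, FN = 3, TN = 3, FP = 3).

```python
def classification_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, accuracy and non-error rate, Eqs. (3)-(6)."""
    se = tp / (tp + fn)                  # Eq. (3)
    sp = tn / (tn + fp)                  # Eq. (4)
    q = (tp + tn) / (tp + fn + tn + fp)  # Eq. (5), N = total number of cases
    ner = (se + sp) / 2.0                # Eq. (6)
    return {"SE": se, "SP": sp, "Q": q, "NER": ner}


# Counts of the kNN-I model on the 40 test cases (Table 4).
knn_i = classification_metrics(tp=31, fn=3, tn=3, fp=3)
```

Because NER averages sensitivity and specificity, it penalizes a model that performs well only on the majority class, which accuracy alone does not reveal: for these counts Q is 85.0% while NER is only 0.706.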

The RSR calculations were carried out by setting the model size from 6 to 8 and the number of models for each model size to 3. Each combination of descriptors had passed the RER test and the nested criterion, and was selected based on 5-fold cross-validation. The RSR parameters are presented in Table 3, while the results of the RSR algorithm are given in Table 4. The symbols of all selected descriptors, together with their descriptions, are listed in Table 5. The predictions of each kNN model for the test compounds are presented in Supporting Information (see Table S2). It should be noted that the reported ED50 and TD50 values often have a wide 95% confidence interval (see Table S2 in Supporting Information), which should also be taken into account when the prediction results are analysed.

Table 3. The RSR algorithm parameters

Parameter                                Value
Minimum model size                       6
Maximum model size                       8
Number of seeds                          3
Number of cross-validation groups        5
Maximum value of k for kNN               6
Type of probability for RER test         prop a
Threshold for nested models detection    0.005

a proportional to the number of objects of each class


Table 4. Performance of kNN mtk-QSAR models on testing

Performance index   kNN-I             kNN-II          kNN-III   kNN-IV          kNN-V
Size a              8                 6               7         6               8
NERcv               0.878             0.796           0.794     0.820           0.856
Test
NERtst              0.706 (0.887) b   0.727 (0.793)   0.866     0.907 (0.947)   0.854 (0.964)
Q (%)               85.0 (90.0)       77.5 (82.5)     87.5      90.0 (95.0)     87.5 (97.5)
SE (%)              91.2 (94.1)       92.0            95.5      93.3            78.6 (92.9)
SP (%)              50.0 (66.7)       53.3 (66.7)     77.8      88.0 (96.0)     92.3 (100)
TP                  31 (32)           23              21        14              11 (13)
FN                  3 (2)             2               1         1               3 (1)
TN                  3 (4)             8 (10)          14        22 (24)         24 (26)
FP                  3 (2)             7 (5)           4         3 (1)           2 (0)

a Number of inputs
b Values in parentheses apply if the 95% confidence interval is taken into account


Table 5. Descriptors selected by the RSR algorithm with description

Model    Name               Description (Deviation from …)
II       ΔALogP             Ghose-Crippen LogKow
V        ΔSM1_Dzs           Spectral moment of order 1 from Barysz matrix weighted by I-state
I        ΔSpMin3_Bhv        Smallest absolute eigenvalue of Burden modified matrix - n 3 weighted by relative van der Waals volumes
V        ΔSpMin3_Bhi        Smallest absolute eigenvalue of Burden modified matrix - n 3 weighted by relative first ionization potential
I        ΔSpMin5_Bhm        Smallest absolute eigenvalue of Burden modified matrix - n 5 weighted by relative mass
I        ΔSpMin5_Bhv        Smallest absolute eigenvalue of Burden modified matrix - n 5 weighted by relative van der Waals volumes
III      ΔSpMin7_Bhv        Smallest absolute eigenvalue of Burden modified matrix - n 7 weighted by relative van der Waals volumes
V        ΔSpMin8_Bhv        Smallest absolute eigenvalue of Burden modified matrix - n 8 weighted by relative van der Waals volumes
III      ΔSpMax2_Bhe        Largest absolute eigenvalue of Burden modified matrix - n 2 weighted by relative Sanderson electronegativities
IV, V    ΔSpMax4_Bhv        Largest absolute eigenvalue of Burden modified matrix - n 4 weighted by relative van der Waals volumes
III      ΔSpMax6_Bhv        Largest absolute eigenvalue of Burden modified matrix - n 6 weighted by relative van der Waals volumes
IV       ΔSpMax7_Bhs        Largest absolute eigenvalue of Burden modified matrix - n 7 weighted by relative I-state
IV       ΔSpMax8_Bhv        Largest absolute eigenvalue of Burden modified matrix - n 8 weighted by relative van der Waals volumes
II, IV   ΔnBondsS2          Total number of single bonds including bonds to hydrogens, excluding aromatic bonds
I        ΔnRotBt            Number of rotatable bonds, including terminal bonds
I        ΔnHBDon            Number of hydrogen bond donors using CDK H Bond Donor Count Descriptor algorithm
II       ΔnHBDon_Lipinski   Number of hydrogen bond donors using Lipinski's definition: any OH or NH
IV, V    ΔnHBAcc            Number of hydrogen bond acceptors
V        ΔVE3_Dt            Logarithmic coefficient sum of the last eigenvector from detour matrix
I        ΔMLFER_E           Excessive molar refraction
V        ΔMDEO-11           Molecular distance edge between all primary oxygens
I        ΔpiPC9             Conventional bond order ID number of order 9 (ln(1+x))
II       ΔpiPC10            Conventional bond order ID number of order 10 (ln(1+x))
III      ΔETA_Beta_ns       A measure of electron-richness of the molecule
I, III   ΔETA_EtaP_F        Functionality index EtaF relative to molecular size
IV       ΔETA_Eta_F         Functionality index EtaF
III      ΔtopoRadius        Topological radius (minimum atom eccentricity)
III      ΔGGI7              Topological charge index of order 7
II       ΔGGI9              Topological charge index of order 9
II       ΔGGI10             Topological charge index of order 10
V        ΔVR1_D             Randic-like eigenvector-based index from topological distance matrix


Additional uncertainty arises from the fact that in vivo activities were collected from multiple sources with non-uniform testing protocols, i.e. different TD tests (rotarod and chimney test), as well as diverse pretreatment times (30 min to 4 h). To compare the relative potencies of the investigated compounds, it is necessary to determine the pretreatment time required after administration for the peak anticonvulsant effect to be exhibited.52 The peak time depends on various factors, including the route of administration, the experimental animal (mouse vs. rat), and the pharmacokinetics of the compounds. However, in many reported studies, the investigated compounds were tested after the administration of a single dose and the anticonvulsant effect was then determined at one fixed time point (e.g., 30 min).53 Since this information was often not reported, it could not be included in the models.

Since the true predictive power of a model can be assessed only if it is applied to a set of compounds that the model has never seen before,19 each QSAR model was evaluated only on the basis of the results obtained for the test set. All kNN models have high sensitivity (SE>90%), while the specificity for the majority of models was low (SP90%). Conversely, the PNN models (Table 6 and Table S3 in Supporting Information), developed using the same descriptors, have high sensitivity, except for PNN-II, and also satisfactory specificity for all models, except PNN-III. Efficient separation was performed by the majority of PNN models (Q>90%), which makes them suitable modules for the MNN regression mtk-QSAR model.


Table 6. Performance of PNN mtk-QSAR models on testing

Performance index   PNN-I             PNN-II          PNN-III         PNN-IV          PNN-V
NERtst              0.843 (0.956) a   0.707 (0.827)   0.843 (0.866)   0.900 (0.940)   0.885 (0.942)
Q (%)               85.0 (92.5)       70.0 (80.0)     85.0 (87.5)     87.5 (92.5)     85.0 (92.5)
SE (%)              85.3 (91.2)       68.0 (72.0)     90.9 (95.5)     100             100
SP (%)              83.3 (100)        73.3 (93.3)     77.8            80.0 (88.0)     76.9 (88.5)
TP                  29 (31)           17 (18)         20 (21)         15              14
FN                  5 (3)             8 (7)           2 (1)           0               0
TN                  5 (6)             11 (14)         14              20 (22)         20 (23)
FP                  1 (0)             4 (1)           4               5 (3)           6 (3)

a Values in parentheses apply if the 95% confidence interval is taken into account

3.2. ED50 and TD50 range prediction

As mentioned before, the PNN modules can be used as separate mtk-QSAR models, but they can also be assembled into a cascade decision algorithm (CDA) that allows the determination of the range of values to which ED50 and TD50 belong (Figure 2). The CDA ends when the first model in the cascade gives a positive (1) prediction, and that model is labelled as the decisive one. An obvious feature of the presented CDA routine is that it reduces the risk of false assessment of high active/toxic compounds to a minimum, since such predictions must be verified by all models. The decisive kNN and PNN models are designated in Supporting Information (Tables S2 and S3) for each test compound. The predicted ranges for the test compounds obtained by the kNN models, with the corresponding 95% confidence intervals, are presented in Supporting Information (Figure S1), while the same plot for the PNN models is given in Figure 3.
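The cascade logic can be sketched in Python as follows. The evaluation order (model V first, model I last) is an assumption inferred from the statement that a high active/toxic prediction must be confirmed by all models; the cut-offs are those of Table 1, and the function and variable names are illustrative only.

```python
MODEL_LABELS = ["I", "II", "III", "IV", "V"]
ED50_CUTOFFS = [100, 200, 300, 400, 500]   # μmol/kg, Table 1
TD50_CUTOFFS = [500, 600, 700, 800, 900]   # μmol/kg, Table 1


def cascade_range(predictions, cutoffs):
    """Return (lower, upper, decisive_model) for one case.

    predictions : dict mapping model label to the binary prediction (0/1).
    cutoffs     : ED50_CUTOFFS or TD50_CUTOFFS.
    A bound of None means the range is open on that side.
    """
    # Assumed order: start with the highest cut-off (model V) and stop at
    # the first positive prediction, which becomes the decisive model.
    for idx in range(len(MODEL_LABELS) - 1, -1, -1):
        label = MODEL_LABELS[idx]
        if predictions[label] == 1:
            upper = cutoffs[idx + 1] if idx + 1 < len(cutoffs) else None
            return cutoffs[idx], upper, label
    # All models predicted 0: the case lies below the lowest cut-off.
    return None, cutoffs[0], None
```

Under this assumed order, a case predicted as 1 by models I and II only is placed in the 200-300 μmol/kg ED50 range with model II as the decisive model, whereas a single false positive from model V is enough to shift a case into the highest range.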


Figure 2. mtk-QSAR cascade decision algorithm. ED50 and TD50 values are in μmol/kg


Figure 3. CDA results based on PNN mtk-QSAR models: a) ED50 MES, b) ED50 PTZ, c) TD50

As can be observed in Figure 3, the predicted ranges correspond to the reported 95% confidence intervals. The accuracy of the CDA in predicting high and low active/toxic compounds is presented in Table 7. The CDA based on the PNN models outperformed the one based on the corresponding kNN models. In the case of compounds 15, 57, 101 and 102 (Table S3 in Supporting Information), the CDA gave an incorrect range although four of the five models produced accurate predictions. This is a drawback of the cascade algorithm, because its prediction depends on a single decisive model only.

Table 7. Accuracy of ED50 and TD50 range prediction

| Model | High active/toxic (a) | Low active/toxic (b) | Low and high active/toxic |
|---|---|---|---|
| kNN | 50% | 94% | 80% |
| PNN | 75% | 100% | 92% |

a ED50900 μmol/kg

3.3. rmtk-QSAR model

After all PNN modules had been created, the GRNN module was trained using known binary outputs and one dummy variable (Figure 1). Thus, the MNN converts the standard mtk-QSAR binary response into actual ED50 and TD50 values. Since PNN-V is used to predict whether ED50 exceeds 500 μmol/kg and TD50 exceeds 900 μmol/kg, reported ED50 values higher than 500 μmol/kg were standardised to 550 μmol/kg, while TD50 values higher than 900 μmol/kg were standardised to 950 μmol/kg. The GRNN module was tested using the same test compounds. The input values for the test compounds were obtained from the PNN modules in the form of category probabilities (see Table S4 in Supporting Information). The measured and MNN rmtk-QSAR predicted values of ED50 and TD50 are given in Figure 4. The performance of the MNN model was evaluated by the coefficient of determination (R2), mean absolute error (MAE), and root mean squared error (RMSE). In addition, for a better estimate of the external predictive potential of the model, a modified r2 ($r_m^2$) performance metric (Eq. (7)) was used, where $r^2$ is the squared correlation coefficient between the observed and predicted values of the test set compounds and $r_0^2$ is the same coefficient with the intercept set to zero.54 The value of $r_m^2$ should be greater than 0.5 for an acceptable model.

$$r_m^2 = r^2 \left(1 - \sqrt{\left| r^2 - r_0^2 \right|}\right) \qquad (7)$$
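Eq. (7) and the accompanying error metrics are simple to compute from paired observed and predicted values. The sketch below is one possible implementation, not the authors' code. It assumes that the zero-intercept coefficient is obtained by regressing the observed values on the predicted values through the origin; the text does not specify the direction of that regression, and all function names are illustrative.

```python
import math
from typing import Sequence


def mae(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def rmse(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def r_squared(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Squared Pearson correlation between observed and predicted values."""
    mo, mp = sum(obs) / len(obs), sum(pred) / len(pred)
    sxy = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    sxx = sum((o - mo) ** 2 for o in obs)
    syy = sum((p - mp) ** 2 for p in pred)
    return sxy ** 2 / (sxx * syy)


def r0_squared(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Same coefficient with the intercept forced to zero.

    Assumption: observed values are regressed on predicted values through
    the origin (obs ~ k * pred).
    """
    k = sum(o * p for o, p in zip(obs, pred)) / sum(p * p for p in pred)
    mo = sum(obs) / len(obs)
    ss_res = sum((o - k * p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mo) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def rm_squared(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Modified r2 of Eq. (7): r2 * (1 - sqrt(|r2 - r0_2|))."""
    r2 = r_squared(obs, pred)
    return r2 * (1.0 - math.sqrt(abs(r2 - r0_squared(obs, pred))))


# Toy data (not from the paper): a constant offset leaves r2 at 1 but is
# penalised by the zero-intercept term.
observed = [1.0, 2.0, 3.0, 4.0]
shifted = [11.0, 12.0, 13.0, 14.0]
print(r_squared(observed, shifted), rm_squared(observed, shifted))
```

The toy example shows why the metric is stricter than r2 alone: predictions that correlate perfectly with the observations but are systematically shifted still score well below the 0.5 acceptance level.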

The values of the calculated performance metrics (R2=0.87, MAE=62 μmol/kg, RMSE=100 μmol/kg and $r_m^2$=0.57) indicate that the predictions of ED50 and TD50 were made with satisfactory accuracy. This makes the proposed mtk-QSAR regression method a viable alternative to the standard QSAR regression methodology.
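The output module behind these predictions is a GRNN, which in Specht's formulation (ref 47) amounts to a Gaussian-kernel weighted average of the training targets. The sketch below illustrates that idea together with the target standardisation described in this section. The training patterns, the smoothing factor `sigma`, and the role of the dummy variable are assumptions made for the example and are not taken from the paper.

```python
import math
from typing import List, Sequence


def standardise_target(value: float, endpoint: str) -> float:
    """Cap values beyond the PNN-V thresholds, as described in the text."""
    if endpoint == "ED50" and value > 500.0:
        return 550.0
    if endpoint == "TD50" and value > 900.0:
        return 950.0
    return value


def grnn_predict(
    x: Sequence[float],
    train_x: Sequence[Sequence[float]],
    train_y: Sequence[float],
    sigma: float = 0.3,  # hypothetical smoothing factor
) -> float:
    """Kernel-weighted average of training targets (GRNN estimate)."""
    weights: List[float] = [
        math.exp(-sum((a - b) ** 2 for a, b in zip(x, xi)) / (2.0 * sigma ** 2))
        for xi in train_x
    ]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed; increase sigma")
    return sum(w * y for w, y in zip(weights, train_y)) / total


# Hypothetical training patterns: five binary PNN outputs (PNN-I to PNN-V)
# followed by one dummy variable whose meaning is assumed here.
train_x = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 0],
]
train_y = [standardise_target(v, "ED50") for v in (40.0, 250.0, 720.0)]

# A test compound enters as PNN category probabilities, not hard 0/1 labels.
query = [0.9, 0.8, 0.7, 0.2, 0.1, 0.0]
print(round(grnn_predict(query, train_x, train_y), 1))
```

Feeding probabilities rather than hard labels lets the estimate move smoothly between the training targets, which is how a set of binary classifiers can yield a continuous ED50 or TD50 value.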

Figure 4. Results of MNN prediction of ED50 and TD50 values for test compounds. The lines indicate the 95% confidence interval.

3.4. Applicability domain and outliers

The OECD (Organisation for Economic Co-operation and Development) recommends that validated QSAR models have a defined applicability domain (AD). The AD estimates the uncertainty in the prediction for a test compound based on how similar it is to the training compounds.55 The AD can be characterized in various ways, and it is determined by the descriptors used in the model.20 In this study, the AD and the presence of outliers were determined using a recently proposed standardisation approach.55 This analysis confirmed that the test compounds had been properly selected, since none of them was identified as an outlier for any of the PNN models. Only three compounds (85, 89, 104) were found to fall outside the AD, and only in the case of the PNN-IV model. If a molecule lies outside the AD, this does not automatically mean that the model's prediction will be incorrect, but it informs the user about a possibly high level of uncertainty.19 For those three compounds, the ED50 values were predicted with relatively low error and were in accordance with the 95% confidence interval (Table 8).

Table 8. Test compounds outside the applicability domain

| Comp. | bt (sm) | Measured ED50 (μmol/kg) | MNN ED50 (μmol/kg) |
|---|---|---|---|
| 85 | Rat (ED50 MES) | 46 (22-82)a | 55 |
| 89 | Rat (ED50 MES) | 50 (22-95) | 55 |
| 104 | Rat (ED50 MES) | 73 (50-106) | 54 |

a 95% confidence interval
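The standardisation approach cited above (ref 55) can be illustrated with a short sketch. It follows the way the method is commonly described: descriptors are standardised with the training-set mean and standard deviation, and a compound is judged by the size of its standardised values. The cut-off of 3 and the factor 1.28 are not stated in this text and are assumptions here, as are the function name and the use of the sample standard deviation.

```python
import statistics
from typing import Sequence


def inside_domain(
    compound: Sequence[float],
    train_means: Sequence[float],
    train_stds: Sequence[float],
    cutoff: float = 3.0,  # assumed cut-off on standardised descriptors
    z: float = 1.28,      # assumed factor applied to the spread
) -> bool:
    """Decide whether a compound lies inside the applicability domain."""
    # Absolute standardised value of every descriptor of the compound.
    s = [abs(x - m) / sd for x, m, sd in zip(compound, train_means, train_stds)]
    if max(s) <= cutoff:
        return True   # every descriptor is close to the training set
    if min(s) > cutoff:
        return False  # every descriptor is far from the training set
    # Mixed case: judge by the mean plus z times the spread of the values.
    s_new = statistics.mean(s) + z * statistics.stdev(s)
    return s_new <= cutoff


# Toy example with eight descriptors already centred and scaled (mean 0, sd 1).
means, stds = [0.0] * 8, [1.0] * 8
print(inside_domain([3.5, 0, 0, 0, 0, 0, 0, 0], means, stds))
```

With these assumptions, a single moderately extreme descriptor does not by itself place a compound outside the domain; what matters is how far the whole descriptor profile departs from the training set.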

4. Conclusions

Multi-target/tasking quantitative structure–activity relationship (mt-QSAR/mtk-QSAR) methodology is a promising alternative to the standard drug design procedures. Although mt-QSAR/mtk-QSAR models have demonstrated high performance in predicting various measures of biological activity under diverse experimental conditions and against different biological targets, they are limited to binary predictions, such as active/non-active or toxic/nontoxic. In this paper, the mt-QSAR/mtk-QSAR concept has been extended by creating a novel regression mtk-QSAR model based on a modular neural network (MNN). This MNN consists of two layers of modules, and it is designed to transform binary responses from several standard mtk-QSAR models (input modules) into a single regression response made by the output module. Each input module is trained independently, and can therefore serve as a standard classification mtk-QSAR model. The proposed MNN has been applied to a dataset of succinimides, with the aim of simultaneously predicting their anticonvulsant activity and their neurotoxicity. It was demonstrated that the MNN model can predict ED50 MES, ED50 PTZ and TD50 tested on mice and rats with satisfactory accuracy (R2=0.87). Furthermore, the applicability domain (AD) analysis demonstrated that similar accuracy was achieved even for compounds whose structure lies outside the AD. Further research is planned on establishing the relationship between the MNN parameters (number of input modules) and its accuracy. For that purpose, a dataset containing more than two thousand cases for one and a half thousand compounds tested on four biological targets is currently being gathered.

Acknowledgements

The authors are grateful to the Ministry of Education, Science and Technological Development of the Republic of Serbia, Project Nos. 172007 and 172013, for financial support.

Supporting Information

Chemical structures with related anticonvulsant activity and neurotoxicity data. mtk-QSAR kNN and PNN testing results. CDA results based on kNN mtk-QSAR models. Testing dataset for the GRNN module of the MNN model.


References

(1) Kamiński, K.; Rapacz, A.; Filipek, B.; Obniska, J. Design, Synthesis and Anticonvulsant Activity of New Hybrid Compounds Derived from N-Phenyl-2-(2,5-Dioxopyrrolidin-1-Yl)-Propanamides and -Butanamides. Bioorg. Med. Chem. 2016, 24 (13), 2938–2946.

(2) Perisic-Janjic, N.; Kaliszan, R.; Wiczling, P.; Milosevic, N.; Uscumlic, G.; Banjac, N. Reversed-Phase TLC and HPLC Retention Data in Correlation Studies with in Silico Molecular Descriptors and Druglikeness Properties of Newly Synthesized Anticonvulsant Succinimide Derivatives. Mol. Pharm. 2011, 8 (2), 555–563.

(3) Perisic-Janjic, N.; Kaliszan, R.; Milosevic, N.; Uscumlic, G.; Banjac, N. Chromatographic Retention Parameters in Correlation Analysis with in Silico Biological Descriptors of a Novel Series of N-Phenyl-3-Methyl Succinimide Derivatives. J. Pharm. Biomed. Anal. 2013, 72, 65–73.

(4) Rapacz, A.; Obniska, J.; Wiklik-Poudel, B.; Rybka, S.; Sałat, K.; Filipek, B. Anticonvulsant and Antinociceptive Activity of New Amides Derived from 3-Phenyl-2,5-Dioxo-Pyrrolidine-1-Yl-Acetic Acid in Mice. Eur. J. Pharmacol. 2016, 781, 239–249.

(5) Rybka, S.; Obniska, J.; Rapacz, A.; Furgała, A.; Filipek, B.; Żmudzki, P. Synthesis and Evaluation of Anticonvulsant Properties of New N-Mannich Bases Derived from 3-(1-Phenylethyl)- and 3-Benzyl-Pyrrolidine-2,5-Dione. Bioorg. Med. Chem. Lett. 2016, 26 (9), 2147–2151.

(6) Obniska, J.; Rapacz, A.; Rybka, S.; Góra, M.; Kamiński, K.; Sałat, K.; Żmudzki, P. Synthesis and Anticonvulsant Activity of New Amides Derived from 3-Methyl- or 3-Ethyl-3-Methyl-2,5-Dioxo-Pyrrolidin-1-Yl-Acetic Acids. Bioorg. Med. Chem. 2016, 24 (8), 1598–1607.

(7) Kamiński, K.; Zagaja, M.; Rapacz, A.; Łuszczki, J. J.; Andres-Mach, M.; Abram, M.; Obniska, J. New Hybrid Molecules with Anticonvulsant and Antinociceptive Activity Derived from 3-Methyl- or 3,3-Dimethyl-1-[1-Oxo-1-(4-Phenylpiperazin-1-Yl)propan-2-Yl]pyrrolidine-2,5-Diones. Bioorg. Med. Chem. 2016, 24 (4), 606–618.

(8) Obniska, J.; Rapacz, A.; Rybka, S.; Powroźnik, B.; Pekala, E.; Filipek, B.; Żmudzki, P.; Kamiński, K. Design, Synthesis and Biological Activity of New Amides Derived from 3-Methyl-3-Phenyl-2,5-Dioxo-Pyrrolidin-1-Yl-Acetic Acid. Eur. J. Med. Chem. 2015, 102, 14–25.

(9) Kamiński, K.; Rapacz, A.; Łuszczki, J. J.; Latacz, G.; Obniska, J.; Kieć-Kononowicz, K.; Filipek, B. Design, Synthesis and Biological Evaluation of New Hybrid Anticonvulsants Derived from N-Benzyl-2-(2,5-Dioxopyrrolidin-1-Yl)propanamide and 2-(2,5-Dioxopyrrolidin-1-Yl)butanamide Derivatives. Bioorg. Med. Chem. 2015, 23 (10), 2548–2561.

(10) Kaminski, K.; Zagaja, M.; Luszczki, J. J.; Rapacz, A.; Andres-Mach, M.; Latacz, G.; Kiec-Kononowicz, K. Design, Synthesis, and Anticonvulsant Activity of New Hybrid Compounds Derived from 2-(2,5-Dioxopyrrolidin-1-Yl)propanamides and 2-(2,5-Dioxopyrrolidin-1-Yl)butanamides. J. Med. Chem. 2015, 58 (13), 5274–5286.


(11) Huguenard, J. R. Block of T-Type Ca2+ Channels Is an Important Action of Succinimide Antiabsence Drugs. Epilepsy Curr. 2002, 2, 49–52.

(12) Gomora, J. C.; Daud, A. N.; Weiergräber, M.; Perez-Reyes, E. Block of Cloned Human T-Type Calcium Channels by Succinimide Antiepileptic Drugs. Mol. Pharmacol. 2001, 60, 1121–1132.

(13) McEvoy, G. K. AHFS Drug Information; American Society of Health-System Pharmacists: Bethesda, Maryland, 1995.

(14) LeDuc, B. Antiseizure Drugs. In Foye’s Principles of Medicinal Chemistry; Lemke, T. L., Williams, D. A., Eds.; Lippincott Williams and Wilkins: Philadelphia, PA, 2008.

(15) Chen, G.; Weston, J. K.; Bratton Jr., A. C. Anticonvulsant Activity and Toxicity of Phensuximide, Methsuximide and Ethosuximide. Epilepsia 1963, 4, 66–76.

(16) Manly, C. J.; Louise-May, S.; Hammer, J. D. The Impact of Informatics and Computational Chemistry on Synthesis and Screening. Drug Discov. Today 2001, 6 (21), 1101–1110.

(17) Hanrahan, G. Artificial Neural Networks in Biological and Environmental Analysis; CRC Press: Boca Raton, FL, 2011.

(18) Knight, N. J.; Hernando, E.; Haynes, C. J. E.; Busschaert, N.; Clarke, H. J.; Takimoto, K.; García-Valverde, M.; Frey, J. G.; Quesada, R.; Gale, P. A. QSAR Analysis of Substituent Effects on Tambjamine Anion Transporters. Chem. Sci. 2016, 7, 1600–1608.

(19) Gupta, J.; Adams, D. J.; Berry, N. G. Will It Gel? Successful Computational Prediction of Peptide Gelators Using Physicochemical Properties and Molecular Fingerprints. Chem. Sci. 2016, 7, 4713–4719.

(20) García, I.; Fall, Y.; Gómez, G.; González-Díaz, H. First Computational Chemistry Multi-Target Model for Anti-Alzheimer, Anti-Parasitic, Anti-Fungi, and Anti-Bacterial Activity of GSK-3 Inhibitors in Vitro, in Vivo, and in Different Cellular Lines. Mol. Divers. 2011, 15 (2), 561–567.

(21) Speck-Planche, A.; Kleandrova, V. V.; Luan, F.; Cordeiro, M. N. D. S. Rational Drug Design for Anti-Cancer Chemotherapy: Multi-Target QSAR Models for the in Silico Discovery of Anti-Colorectal Cancer Agents. Bioorg. Med. Chem. 2012, 20 (15), 4848–4855.

(22) Speck-Planche, A.; Kleandrova, V. V.; Luan, F.; Cordeiro, M. N. D. S. In Silico Discovery and Virtual Screening of Multi-Target Inhibitors for Proteins in Mycobacterium Tuberculosis. Comb. Chem. High Throughput Screen. 2012, 15 (8), 666–673.

(23) Speck-Planche, A.; Kleandrova, V. V.; Ruso, J. M.; Cordeiro, M. N. D. S. First Multitarget Chemo-Bioinformatic Model to Enable the Discovery of Antibacterial Peptides against Multiple Gram-Positive Pathogens. J. Chem. Inf. Model. 2016, 56 (3), 588–598.

(24) Speck-Planche, A.; Kleandrova, V. V.; Cordeiro, M. N. D. S. New Insights toward the Discovery of Antibacterial Agents: Multi-Tasking QSBER Model for the Simultaneous Prediction of Anti-Tuberculosis Activity and Toxicological Profiles of Drugs. Eur. J. Pharm. Sci. 2013, 48 (4–5), 812–818.

(25) Speck-Planche, A.; Kleandrova, V. V.; Cordeiro, M. N. D. S. Chemoinformatics for Rational Discovery of Safe Antibacterial Drugs: Simultaneous Predictions of Biological Activity against Streptococci and Toxicological Profiles in Laboratory Animals. Bioorg. Med. Chem. 2013, 21 (10), 2727–2732.

(26) Speck-Planche, A.; Cordeiro, M. N. D. S. Simultaneous Virtual Prediction of Anti-Escherichia Coli Activities and ADMET Profiles: A Chemoinformatic Complementary Approach for High-Throughput Screening. ACS Comb. Sci. 2014, 16 (2), 78–84.

(27) Speck-Planche, A.; Cordeiro, M. N. D. S. Simultaneous Modeling of Antimycobacterial Activities and ADMET Profiles: A Chemoinformatic Approach to Medicinal Chemistry. Curr. Top. Med. Chem. 2013, 13, 1656–1665.

(28) Speck-Planche, A.; Cordeiro, M. N. D. S. Chemoinformatics for Medicinal Chemistry: In Silico Model to Enable the Discovery of Potent and Safer Anti-Cocci Agents. Future Med. Chem. 2014, 6, 2013–2028.

(29) Speck-Planche, A.; Cordeiro, M. N. D. S. Fragment-Based in Silico Modeling of Multi-Target Inhibitors against Breast Cancer-Related Proteins. Mol. Divers. 2017, 21 (3), 511–513.

(30) Speck-Planche, A.; Cordeiro, M. N. D. S. Speeding up Early Drug Discovery in Antiviral Research: A Fragment-Based in Silico Approach for the Design of Virtual Anti-Hepatitis C Leads. ACS Comb. Sci. 2017, 19, 501–512.

(31) Liu, Q.; Zhou, H.; Liu, L.; Chen, X.; Zhu, R.; Cao, Z. Multi-Target QSAR Modelling in the Analysis and Design of HIV-HCV Co-Inhibitors: An in-Silico Study. BMC Bioinformatics 2011, 12, 294.

(32) Liu, Q.; Che, D.; Huang, Q.; Cao, Z.; Zhu, R. Multi-Target QSAR Study in the Analysis and Design of HIV-1. Chinese J. Chem. 2010, 28, 1587–1592.

(33) Rosenbaum, L.; Dörr, A.; Bauer, M. R.; Boeckler, F. M.; Zell, A. Inferring Multi-Target QSAR Models with Taxonomy-Based Multi-Task Learning. J. Cheminform. 2013, 5, 33.

(34) Häse, F.; Valleau, S.; Pyzer-Knapp, E.; Aspuru-Guzik, A. Machine Learning Exciton Dynamics. Chem. Sci. 2016, 7 (8), 5139–5147.

(35) Chen, K. Deep and Modular Neural Networks. In Springer Handbook of Computational Intelligence; Kacprzyk, J., Pedrycz, W., Eds.; Springer-Verlag: Berlin, Germany, 2015; pp 473–494.

(36) Crăciun, M. V.; Neagu, D. C.; König, C.; Bumbaru, S. A Study of Aquatic Toxicity Using Artificial Neural Networks. Lect. Notes Artif. Intell. 2003, 2774, 911–918.

(37) Mosier, P. D.; Jurs, P. C. QSAR/QSPR Studies Using Probabilistic Neural Networks and Generalized Regression Neural Networks. J. Chem. Inf. Comput. Sci. 2002, 42, 1460–1470.

(38) Yap, C. W. PaDEL-Descriptor: An Open Source Software to Calculate Molecular Descriptors and Fingerprints. J. Comput. Chem. 2011, 32, 1466–1474.


(39) Ambure, P.; Aher, R. B.; Gajewicz, A.; Puzyn, T.; Roy, K. “NanoBRIDGES” Software: Open Access Tools to Perform QSAR and Nano-QSAR Modeling. Chemom. Intell. Lab. Syst. 2015, 147, 1–13.

(40) Ballabio, D.; Consonni, V.; Mauri, A.; Claeys-Bruno, M.; Sergent, M.; Todeschini, R. A Novel Variable Reduction Method Adapted from Space-Filling Designs. Chemom. Intell. Lab. Syst. 2014, 136, 147–154.

(41) Cassotti, M.; Grisoni, F.; Todeschini, R. Reshaped Sequential Replacement Algorithm: An Efficient Approach to Variable Selection. Chemom. Intell. Lab. Syst. 2014, 133, 136–148.

(42) Miller, A. J. Selection of Subsets of Regression Variables. J. R. Stat. Soc. Ser. A 1984, 147 (3), 389–425.

(43) Grisoni, F.; Cassotti, M.; Todeschini, R. Reshaped Sequential Replacement for Variable Selection in QSPR: Comparison with Other Reference Methods. J. Chemom. 2014, 28 (4), 249–259.

(44) Milano Chemometrics and QSAR Research Group. Reshaped Sequential Replacement Toolbox http://michem.disat.unimib.it/chm/download/rsrinfo.htm (accessed Nov 1, 2016).

(45) Kennard, R. W.; Stone, L. A. Computer Aided Design of Experiments. Technometrics 1969, 11 (1), 137–148.

(46) Specht, D. F. Probabilistic Neural Networks. Neural Networks 1990, 3, 109–118.

(47) Specht, D. F. A General Regression Neural Network. IEEE Trans. Neural Networks 1991, 2 (6), 568–576.

(48) Schmidt, A.; Bandar, Z. Modularity - a Concept for New Neural Network Architectures. In Proceedings of the IASTED International Conference on Computer Systems and Applications (CSA’98); Irbid, Jordan, 1998.

(49) Singh, K. P.; Gupta, S.; Rai, P. Predicting Acute Aquatic Toxicity of Structurally Diverse Chemicals in Fish Using Artificial Intelligence Approaches. Ecotoxicol. Environ. Saf. 2013, 95, 221–233.

(50) Parzen, E. On Estimation of a Probability Density Function and Mode. Ann. Math. Stat. 1962, 33, 1065–1076.

(51) Antanasijević, J.; Antanasijević, D.; Pocajt, V.; Trišović, N.; Fodor-Csorba, K. A QSPR Study on the Liquid Crystallinity of Five-Ring Bent-Core Molecules Using Decision Trees, MARS and Artificial Neural Networks. RSC Adv. 2016, 6 (22), 18452–18464.

(52) Milichap, J. G. Anticonvulsant Drugs. In Physiological Pharmacology: A Comprehensive Treatise; Root, W. S., Hofmann, F. G., Eds.; Academic Press: New York and London, 1965; p 133.

(53) Löscher, W. Critical Review of Current Animal Models of Seizures and Epilepsy Used in the Discovery and Development of New Antiepileptic Drugs. Seizure 2011, 20 (5), 359–368.


(54) Roy, P. P.; Paul, S.; Mitra, I.; Roy, K. On Two Novel Parameters for Validation of Predictive QSAR Models. Molecules 2009, 14 (5), 1660–1701.

(55) Roy, K.; Kar, S.; Ambure, P. On a Simple Approach for Determining Applicability Domain of QSAR Models. Chemom. Intell. Lab. Syst. 2015, 145, 22–29.


Modular neural network for regression mtk-QSAR (rmtk-QSAR) modeling


From classification to regression multi-tasking QSAR modelling using a novel modular neural network: Simultaneous prediction of anticonvulsant activity and neurotoxicity of succinimides

Davor Antanasijević, Jelena Antanasijević, Nemanja Trišović, Gordana Ušćumlić, Viktor Pocajt

A regression multi-tasking QSAR MNN model has been proposed for the simultaneous prediction of anticonvulsant activity and neurotoxicity of succinimides.
