
Article pubs.acs.org/molecularpharmaceutics

From Classification to Regression Multitasking QSAR Modeling Using a Novel Modular Neural Network: Simultaneous Prediction of Anticonvulsant Activity and Neurotoxicity of Succinimides

Davor Antanasijević,*,† Jelena Antanasijević,‡ Nemanja Trišović,‡ Gordana Ušćumlić,‡ and Viktor Pocajt‡

†Innovation Center of the Faculty of Technology and Metallurgy and ‡Faculty of Technology and Metallurgy, University of Belgrade, Karnegijeva 4, Belgrade 11120, Serbia





ABSTRACT: Succinimides, which contain a pharmacophore responsible for anticonvulsant activity, are frequently used antiepileptic drugs, and the synthesis of new derivatives with improved efficacy and tolerability is an important task. Nowadays, multitarget/multitasking methodologies based on quantitative structure−activity relationships (mt-QSAR/mtk-QSAR) play an important role in rational drug design, since they enable the simultaneous prediction of several standard measures of biological activity under diverse experimental conditions and against different biological targets. However, the mt-QSAR/mtk-QSAR methodology yields only binary classification models; therefore, in this study a regression mtk-QSAR (rmtk-QSAR) model based on a novel modular neural network (MNN) is proposed. The MNN uses standard classification mtk-QSAR models as input modules, while the regression is performed by the output module. The rmtk-QSAR model was successfully developed for the simultaneous prediction of the anticonvulsant activity and neurotoxicity of succinimides, with satisfactory accuracy in testing (R² = 0.87). Thus, the proposed mtk-QSAR regression method can be regarded as a viable alternative to the standard QSAR methodology.

KEYWORDS: succinimides, multitasking, QSAR, regression, modular neural network

© 2017 American Chemical Society
Received: July 8, 2017; Revised: September 26, 2017; Accepted: November 4, 2017; Published: November 4, 2017
DOI: 10.1021/acs.molpharmaceut.7b00582. Mol. Pharmaceutics 2017, 14, 4476−4484

1. INTRODUCTION
Since 20−40% of patients with epilepsy have drug resistance,1 a continuous search for novel active molecules with more favorable anticonvulsant properties, improved efficacy, and better tolerability is necessary. It is well known that the structural fragment crucial for anticonvulsant activity is a nitrogen heterocyclic system, usually an imide or lactam, with phenyl or alkyl groups attached to the ring.2,3 Since many studies have shown that differently substituted succinimides (pyrrolidine-2,5-diones) exhibit prominent anticonvulsant properties in animal models of epilepsy (maximal electroshock (MES) and pentylenetetrazole (PTZ) seizure tests),1,4−10 these compounds are regarded as promising candidates for further research, and they were selected as the subject of this study.

Succinimides are frequently used anticonvulsants in the management of absence seizures. Although not completely understood, their mechanism of action involves a decrease in T-type calcium channel activity.11,12 Ethosuximide (3-ethyl-3-methylpyrrolidine-2,5-dione, Zarontin) is the most commonly used among them. This drug offers broad protection against different kinds of absence seizures and could potentially be useful in conjunction with other anticonvulsants in the treatment of patients with mixed seizure types.13,14 Methsuximide (1,3-dimethyl-3-phenylpyrrolidine-2,5-dione, Celontin) is generally used when other drugs fail to effectively control absence seizures or partial seizures with complex symptomatology.13,14 Phensuximide (1-methyl-3-phenylpyrrolidine-2,5-dione, Milontin) is the least toxic but is also regarded as less effective than the other succinimide anticonvulsants.13,14 Regarding their effect on experimental seizures, all three drugs prevent PTZ seizures at nontoxic doses in experimental animals. While methsuximide and phensuximide protect mice against MES at non-neurotoxic and slightly ataxic doses, respectively, ethosuximide has no effect on MES except at high, anesthetic dose levels.15 With this in mind, significant efforts have been continually directed toward the design and synthesis of novel succinimides as potential central nervous system-active drug candidates. In this context, QSAR (quantitative structure−activity relationship) methodology has emerged as a useful tool for identifying the most promising candidates among a large number of compounds, thus efficiently managing costly resources and significantly shortening the drug development cycle.16


QSAR studies endeavor to relate the chemical structure of a compound to its activity, on the assumption that correlations exist between physicochemical properties and molecular structure.17 QSAR is a powerful tool for the design of new compounds as well as for structure optimization.18 It has proven successful in many aspects of molecular design, particularly in drug discovery.19 In recent years, researchers have focused on developing multitarget QSAR (mt-QSAR) models,20−23 that is, models that predict biological activities against different biological targets, as well as multitasking QSAR (mtk-QSAR) models, in which different measures of biological effect and diverse experimental conditions are considered.24,25 Such models can also be used to predict many ADME (absorption, distribution, metabolism, and elimination) properties26 and have been applied to calculating the quantitative contributions of diverse molecular fragments to activity, toxicity, or any ADME property, which allows the analysis and fast detection of 2D pharmacophores, toxicophores, etc.27,28 Hence, mt-QSAR or mtk-QSAR models make it possible to design new molecular entities with the desired properties, for example, high inhibitory activity combined with low toxicity and good ADME properties.29,30 The mt-QSAR/mtk-QSAR models are based on the moving average approach, whereby the inputs express the similarity of the present case to the average case whose activity/toxicity lies above or below a predefined cutoff value. Consequently, this methodology is limited to producing binary classification models.

Some regression multitarget QSAR models, proposed for multitarget drugs such as HIV-1 inhibitors, can be found in the literature,31,32 but these have generally been developed using another approach, namely multitask learning (MTL).33 In MTL, each compound is represented by a set of triples (X, y, t), where X is the matrix of descriptors, y is the measured output, and t indicates the target to which the triple belongs. Since X and y are defined as in single-target QSAR, the QSAR model for each target is a separate learning task.33 Among statistical methods, artificial neural networks (ANNs) are especially popular in QSAR modeling owing to their ability to fit nonlinear relationships.34 A special type of ANN is the modular neural network (MNN), an ensemble of independently trained neural networks whose outputs are combined to reach a final prediction.35 Modular approaches have already been applied successfully in QSAR studies; for example, unsupervised ANNs have been used to group similar compounds, after which local QSAR models were generated for each group using back-propagation neural networks.36 In this paper, a novel MNN, specially designed to overcome this limitation of the multitasking QSAR methodology, is reported and applied to the simultaneous prediction of the anticonvulsant activity and neurotoxicity of succinimides. The input modules of the MNN are probabilistic neural networks (PNNs), while the output module is a general regression neural network (GRNN). Both techniques are simple and powerful tools for QSAR studies.37 The proposed MNN transforms the binary classification responses of several mtk-QSAR models into a regression response, allowing prediction of the actual values of different measures of biological effect. Thus, the standard mt-QSAR/mtk-QSAR methodology is extended to regression analysis.

2. MATERIALS AND METHODS
2.1. Data Set and Calculation of Multitasking Descriptors. The data set consisted of 174 structurally diverse succinimides (see Table S1 in Supporting Information), taken from the literature (see refs S1−S26 in Supporting Information). Since they had been examined for different biological effects (ED50 MES, ED50 PTZ, and TD50) in different biological targets (mice and rats), the data set contained 327 cases. The anticonvulsant activity (MES and scPTZ tests) and the acute neurological toxicity of the compounds were expressed as median effective (ED50) or toxic (TD50) doses in μmol/kg, where ED50 was defined as the dose protecting 50% of animals and TD50 as the dose causing minimal neurological toxicity in 50% of animals. The molecular structures of the compounds were sketched using ChemDraw, and a variety of 1D and 2D descriptors were generated using the PaDEL-Descriptor software.38 Each case in the data set was assigned to one of two categories depending on the predefined cutoff values (Table 1).

Table 1. Models and Corresponding Cutoffs
model label | cutoff ED50 (μmol/kg) | cutoff TD50 (μmol/kg) | number of cases 1 (yes) | number of cases 0 (no)
I | >100 | >500 | 279 | 48
II | >200 | >600 | 227 | 100
III | >300 | >700 | 164 | 163
IV | >400 | >800 | 126 | 201
V | >500 | >900 | 113 | 214

In each model, all cases whose ED50 (or, for toxicity cases, TD50) exceeded the corresponding cutoff were assigned to category 1, that is, low active/toxic; otherwise, the cases were considered high active/toxic and assigned to category 0. Following the multitasking QSAR methodology,24,25 the multitasking descriptors were calculated using the Box-Jenkins moving average approach:23

$$\bar{D}_{bt,sm} = \frac{1}{n_{bt,sm}} \sum_{i=1}^{n_{bt,sm}} D_i \qquad (1)$$

where $\bar{D}_{bt,sm}$ is the mean of descriptor D over all cases in which a compound was tested against the same target (bt) using the same standard measure of biological effect (sm) with sm > cutoff, and $n_{bt,sm}$ is the number of cases that fulfill this condition. The multitasking descriptors ($\Delta D_{bt,sm}$) are then calculated as deviations from this average, so they describe both the biological target and the measure of biological effect:

$$\Delta D_{bt,sm} = D - \bar{D}_{bt,sm} \qquad (2)$$
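Eqs 1 and 2 can be illustrated with a short NumPy sketch. It is our illustration, not the authors' code: the function names (reference_means, multitasking_descriptors) are hypothetical, and it assumes that, for a given model, the averages of eq 1 are computed from the category-1 cases of each (bt, sm) combination and then applied unchanged to every case, including test cases. Since each model (I−V) has its own cutoffs, each model gets its own set of ΔD descriptors.

```python
import numpy as np


def reference_means(D, bt, sm, label):
    """Eq 1: mean descriptor vector of the category-1 cases for each (bt, sm) pair."""
    D = np.asarray(D, dtype=float)
    bt, sm, label = np.asarray(bt), np.asarray(sm), np.asarray(label)
    means = {}
    for key in set(zip(bt.tolist(), sm.tolist())):
        # Reference cases: same target, same measure, value above the model's cutoff.
        ref = (bt == key[0]) & (sm == key[1]) & (label == 1)
        if ref.any():
            means[key] = D[ref].mean(axis=0)
    return means


def multitasking_descriptors(D, bt, sm, means):
    """Eq 2: deviation of each case from the mean of its (bt, sm) reference group."""
    D = np.asarray(D, dtype=float)
    return np.vstack([row - means[(b, s)] for row, b, s in zip(D, bt, sm)])


# Toy example (hypothetical values): 4 cases, 2 descriptors, labels from one model.
D = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
bt = ["mouse", "mouse", "rat", "rat"]
sm = ["TD50"] * 4
label = [1, 1, 1, 0]
means = reference_means(D, bt, sm, label)
print(multitasking_descriptors(D, bt, sm, means))
```

Here the mouse/TD50 reference mean is the average of the first two rows and the rat/TD50 mean is the third row alone, because the fourth case is in category 0 and therefore does not contribute to eq 1.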

The number of positive (1) and negative (0) cases for each combination of bt and sm, as well as for each model, is given in Table 2.

Table 2. Number of Positive (1) and Negative (0) Cases per Model Depending on Measure (sm) and Biological Target (bt)
sm | bt | outcome | I | II | III | IV | V
ED50 MES | mouse | 1 | 127 | 109 | 78 | 57 | 48
ED50 MES | mouse | 0 | 11 | 29 | 60 | 81 | 90
ED50 MES | rat | 1 | 5 | 3 | 2 | 1 | 1
ED50 MES | rat | 0 | 17 | 19 | 20 | 21 | 21
ED50 PTZ | mouse | 1 | 52 | 36 | 14 | 5 | 5
ED50 PTZ | mouse | 0 | 0 | 16 | 38 | 47 | 47
ED50 PTZ | rat | 1 | 7 | 7 | 5 | 5 | 5
ED50 PTZ | rat | 0 | 0 | 0 | 2 | 2 | 2
TD50 | mouse | 1 | 76 | 61 | 54 | 47 | 43
TD50 | mouse | 0 | 14 | 29 | 36 | 43 | 47
TD50 | rat | 1 | 12 | 11 | 11 | 11 | 11
TD50 | rat | 0 | 6 | 7 | 7 | 7 | 7

2.2. Selection of Descriptors. In this study, descriptors were selected in two consecutive steps. First, a Java implementation of the V-WSP algorithm39 was used to identify collinear descriptors (r > 0.95); this was followed by the reshaped sequential replacement (RSR) algorithm, which yielded the best subset of descriptors. The V-WSP algorithm is an adaptation of the WSP (Wootton, Sergent, Phan-Tan-Luu) algorithm, modified by Ballabio et al.40 to select a representative set of variables. The V-WSP algorithm served as a correlation filter and reduced the initial pool of descriptors from 609 to 242. To select descriptors from this reduced pool of 242, the RSR algorithm proposed by Cassotti et al.41 was applied. The RSR algorithm is based on Miller's sequential replacement (SR) method,42 but adds several functionalities that decrease the calculation time, increase the probability of converging on the optimal models, and identify models with drawbacks such as overfitting, chance correlation, variable redundancy, and collinearity.43 For classification problems, these functionalities are as follows:41
• Tabu list: variables not correlated with the output are excluded in advance; they are re-included after the RSR algorithm converges, but only if they improve the model by more than a predefined threshold. The canonical measure of correlation (CMC) index, with a threshold value of 0.3, was used to screen the "tabu" descriptors.
• Roulette wheel: the population was initialized based on the CMC index calculated for each descriptor, so that descriptors with a higher CMC index had a higher probability of being selected.
• Randomization test: the real model error is compared with the random classification error (random error rate (RER) test), that is, the error rate obtained if the cases are randomly assigned to the classes, and the final model is accepted only if its real error rate is smaller than the corresponding RER value.
• Nested models: the final population of models was checked in terms of complexity and performance. A model is rejected if its higher complexity is not balanced by higher performance.
• Model distance and correlation: the canonical measure of distance (CMD) and CMC indices were used to determine whether final models with different variables actually differ in nature.43
The RSR (MATLAB) toolbox,44 which uses k-nearest neighbors (kNN) to calibrate classification models with the nonerror rate in cross-validation (NERcv) as the fitness function, was used to select the descriptor subsets. Before the RSR was applied, the data set was split into training (287 cases) and test (40 cases) subsets using the Kennard-Stone algorithm.45 The resulting kNN mtk-QSAR models were used to benchmark the corresponding ANN mtk-QSAR models.

2.3. Proposed Modular Neural Network. In this study, a two-layer modular neural network (MNN) (Figure 1) was used to transform the binary classification responses of several input modules into a regression response produced by a single (output) module. The modules in the input layer were independently trained probabilistic neural networks (PNNs),46 while the output module was a general regression neural network (GRNN).47 This type of MNN belongs to the group of loosely coupled models,35 and similar MNNs have been proposed for classification problems.48 PNNs were selected for the input layer because they are conceptually similar to kNN, which makes them suitable for combination with the RSR algorithm, while the GRNN was selected because it is built on the same learning paradigm as the PNN. The number of input modules is not limited to the five used in the current study; it can be increased if needed. An MNN with more input modules can be expected to perform somewhat better, but its performance depends mainly on the accuracy of the input modules. A comparative study quantifying how MNN accuracy depends on its architecture is beyond the scope of this research.

Figure 1. Modular neural network for regression mtk-QSAR (rmtk-QSAR) modeling.

The PNN/GRNN modules are four-layer, one-pass, supervised networks. A PNN estimates the probability density function (PDF) of the features of each class from the available training samples using a Gaussian kernel, while a GRNN provides an optimal estimate of continuous variables by implementing the statistical concept of conditional probability.49 In both networks, conditional probabilities were estimated using the Parzen window approach.50 The main advantage of PNN/GRNN is extremely rapid training, which does not require an iterative procedure. It should be noted, however, that although PNN/GRNN training is not iterative, it requires the estimation of the Gaussian kernel bandwidth, described by the smoothing factor (σ). A small σ causes the estimated parent density function to have distinct modes corresponding to the locations of the training samples, while a larger σ produces a greater degree of interpolation between points.46 Therefore, σ governs the predictive performance of a PNN/GRNN. During PNN/GRNN training, the learning data set (80% of the training data) was used to set the network weights, while the validation data set (20% of the training data) was used to determine the optimal smoothing factor by a genetic algorithm.51 The training of PNN and GRNN is described in more detail in the literature.46,47 For the selected PNN/GRNN modules, the number of neurons in the input layer (6 to 8) corresponds to the number of descriptors/inputs used, while the numbers of neurons in the pattern (230), summation (2), and output (1) layers were the same for all modules. The training set for each PNN module was built from the RSR-selected descriptors and the binary output corresponding to the selected cutoff value. For the GRNN module, the training set was built from the outputs of the PNN modules and one dummy variable that distinguishes between ED50 and TD50, together with the measured ED50/TD50 values that served as the model output. Besides its use in the MNN, each PNN module can be applied as a separate mtk-QSAR model for classifying compounds as high or low active/toxic. The PNN modules can also be coupled into a cascade algorithm to determine the range to which ED50/TD50 belongs; details are provided in Section 3.2. Since the MNN mtk-QSAR model is a regression model, it is labeled rmtk-QSAR.
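The following NumPy sketch shows how PNN input modules and a GRNN output module can be wired together. It is a minimal illustration under stated assumptions, not the authors' implementation: the class names (PNN, GRNN, ModularNN), equal class priors in the PNN, fixed smoothing factors (0.5 and 0.3) in place of the genetic-algorithm search, and the synthetic data are all ours. The cutoffs come from Table 1, and the GRNN training (known binary outputs plus a dummy, with large values capped) follows Section 3.3.

```python
import numpy as np


def _sq_dists(A, B):
    """Pairwise squared Euclidean distances between the rows of A and B."""
    return ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)


class PNN:
    """Two-class probabilistic neural network (Gaussian Parzen kernel)."""

    def __init__(self, sigma=0.5):
        self.sigma = sigma

    def fit(self, X, labels):
        # One-pass "training": each training case becomes a pattern neuron.
        self.X = np.asarray(X, dtype=float)
        self.labels = np.asarray(labels, dtype=int)
        return self

    def predict_proba(self, X):
        k = np.exp(-_sq_dists(np.asarray(X, dtype=float), self.X) / (2 * self.sigma**2))
        # Summation layer: mean kernel activation per class (equal priors assumed).
        f0 = k[:, self.labels == 0].mean(axis=1)
        f1 = k[:, self.labels == 1].mean(axis=1)
        return f1 / np.maximum(f0 + f1, 1e-300)  # probability of category 1


class GRNN:
    """General regression neural network (kernel-weighted average of targets)."""

    def __init__(self, sigma=0.3):
        self.sigma = sigma

    def fit(self, X, y):
        self.X = np.asarray(X, dtype=float)
        self.y = np.asarray(y, dtype=float)
        return self

    def predict(self, X):
        k = np.exp(-_sq_dists(np.asarray(X, dtype=float), self.X) / (2 * self.sigma**2))
        # Two summation neurons: weighted sum of targets and sum of weights.
        return (k @ self.y) / np.maximum(k.sum(axis=1), 1e-300)


# Cutoffs of models I-V from Table 1, as (ED50, TD50) in umol/kg.
CUTOFFS = [(100, 500), (200, 600), (300, 700), (400, 800), (500, 900)]


class ModularNN:
    """PNN input modules -> GRNN output module (rmtk-QSAR sketch)."""

    def __init__(self, subsets, sigma_pnn=0.5, sigma_grnn=0.3):
        self.subsets = subsets  # descriptor columns used by each PNN module
        self.sigma_pnn, self.sigma_grnn = sigma_pnn, sigma_grnn

    @staticmethod
    def binary_labels(value, is_td):
        # Category 1 (low active/toxic) when the value exceeds the module's cutoff.
        return np.column_stack(
            [(value > np.where(is_td == 1, td, ed)).astype(int) for ed, td in CUTOFFS]
        )

    def fit(self, X, value, is_td):
        labels = self.binary_labels(value, is_td)
        self.modules = [PNN(self.sigma_pnn).fit(X[:, cols], labels[:, j])
                        for j, cols in enumerate(self.subsets)]
        # Cap as in Section 3.3: ED50 > 500 -> 550 and TD50 > 900 -> 950 umol/kg.
        cap = np.where(is_td == 1, 900.0, 500.0)
        target = np.where(value > cap, cap + 50.0, value)
        # Output module is trained on the known binary outputs plus the dummy.
        self.out = GRNN(self.sigma_grnn).fit(np.column_stack([labels, is_td]), target)
        return self

    def predict(self, X, is_td):
        # At prediction time the GRNN receives the PNN category probabilities.
        probs = [m.predict_proba(X[:, cols]) for m, cols in zip(self.modules, self.subsets)]
        return self.out.predict(np.column_stack(probs + [is_td]))


# Usage on synthetic data (NOT the succinimide data set).
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 10))
X[:, 0] = np.linspace(-2.5, 2.5, n)            # one informative descriptor
is_td = np.arange(n) % 2                        # dummy: 0 = ED50 case, 1 = TD50 case
value = np.maximum(300 + 150 * X[:, 0] + 300 * is_td, 10.0)
test = np.arange(n) % 5 == 0
subsets = [[0, 1, 2], [0, 3, 4], [0, 5, 6], [0, 7, 8], [0, 9, 1]]
mnn = ModularNN(subsets).fit(X[~test], value[~test], is_td[~test])
pred = mnn.predict(X[test], is_td[test])
```

Because the GRNN output is a kernel-weighted average of the (capped) training targets, its predictions always stay within the range of those targets; this is one reason the capping of ED50 > 500 and TD50 > 900 μmol/kg matters for the regression response.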

3. RESULTS AND DISCUSSION
3.1. mtk-QSAR Models. In this study, all classification mtk-QSAR models were evaluated based on the counts of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN). In addition, the following performance indices were used: sensitivity (SE, eq 3), specificity (SP, eq 4), predictive accuracy (Q, eq 5), and nonerror rate (NER, eq 6):

$$\mathrm{SE} = \frac{TP}{TP + FN} \qquad (3)$$

$$\mathrm{SP} = \frac{TN}{TN + FP} \qquad (4)$$

$$Q = \frac{TP + TN}{N} \qquad (5)$$

$$\mathrm{NER} = \frac{\mathrm{SE} + \mathrm{SP}}{2} \qquad (6)$$

where N = TP + FN + TN + FP is the total number of evaluated cases.

The RSR calculations were carried out with the model size set from 6 to 8 and the number of models for each model size set to 3. Each selected combination of descriptors passed the RER test and the nested-model criterion and was chosen based on five-fold cross-validation. The RSR parameters are presented in Table 3, while the results of the RSR algorithm are given in Table 4. The symbols of all selected descriptors, together with their descriptions, are listed in Table 5.

Table 3. RSR Algorithm Parameters
parameter | value
minimum model size | 6
maximum model size | 8
number of seeds | 3
number of cross-validation groups | 5
maximum value of k for kNN | 6
type of probability for RER test | propᵃ
threshold for nested models detection | 0.005
ᵃ Proportional to the number of objects of each class.

The predictions of each kNN model for the test compounds are presented in Supporting Information (Table S2). It should be noted that the reported ED50 and TD50 values often have wide 95% confidence intervals (see Table S2 in Supporting Information), which should be taken into account when the prediction results are analyzed. Additional uncertainty arises because the in vivo activities were collected from multiple sources with nonuniform testing protocols, that is, different neurotoxicity tests (rotarod and chimney test) and diverse pretreatment times (30 min to 4 h). To compare the relative potencies of the investigated compounds, it is necessary to determine the pretreatment time required after administration for the peak anticonvulsant effect to be exhibited.52 The peak time depends on various factors, including the route of administration, the experimental animal (mouse vs rat), and the pharmacokinetics of the compounds. However, in many reported studies, the investigated compounds were tested after administration of a single dose, and the anticonvulsant effect was determined at one fixed time point (e.g., 30 min).53 Since this information was often not reported, it could not be included in the models.

Since the true predictive power of a model can be assessed only when it is applied to compounds that the model has never seen before,19 each QSAR model was evaluated solely on the test set results. All kNN models except kNN-V (SE = 78.6%) have high sensitivity (SE > 90%), while the specificity of the majority of models is low (SP < 80%). The most efficient separation was achieved by the kNN-IV and kNN-V models (Q > 90% when the 95% confidence interval is taken into account). In contrast, the PNN models (Table 6 and Table S3 in Supporting Information), developed using the same descriptors, have high sensitivity, except for PNN-II, and satisfactory specificity, except for PNN-III. Efficient separation was achieved by the majority of PNN models (Q > 90% when the 95% confidence interval is taken into account), which makes them suitable modules for the MNN regression mtk-QSAR model.

3.2. ED50 and TD50 Range Prediction. As mentioned before, the PNN modules can be used as separate mtk-QSAR models, but they can also be assembled into a cascade decision algorithm (CDA) that determines the range of values to which ED50 and TD50 belong (Figure 2). The CDA ends when the first model in the cascade gives a positive (1) prediction, and that model is labeled as the decisive one. An important feature of this CDA routine is that it minimizes the risk of falsely assessing compounds as high active/toxic, since such predictions must be confirmed by all models. The decisive kNN and PNN models are designated in Supporting Information (Tables S2 and S3) for each test compound. The predicted ranges for the test compounds obtained by the kNN models, with the corresponding 95% confidence intervals, are presented in Supporting Information (Figure S1), while the same plot for the PNN models is given in Figure 3. As can be observed in Figure 3, the predicted ranges correspond to the reported 95% confidence intervals. The accuracy of the CDA in predicting high and low active/toxic compounds is presented in Table 7. The CDA based on the PNN models outperformed the one based on the kNN models. For compounds 15, 57, 101, and 102 (Table S3 in Supporting Information), the CDA gave an incorrect range although four of the five models produced accurate predictions. This is a drawback of the cascade algorithm, because its prediction depends on a single decisive model.
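The cascade logic of Section 3.2 can be sketched in a few lines. Figure 2 is not reproduced here, so the sketch assumes the cascade starts at model V (highest cutoffs) and moves down to model I; under that ordering a "high active/toxic" result is reached only when every model predicts category 0, which is the property described in the text. The names CUTOFFS, ORDER, and cascade_range are hypothetical.

```python
# Cutoffs of models I-V from Table 1, as (ED50, TD50) in umol/kg.
CUTOFFS = {"I": (100, 500), "II": (200, 600), "III": (300, 700),
           "IV": (400, 800), "V": (500, 900)}
# Assumed cascade order: from the highest cutoff (V) down to the lowest (I).
ORDER = ["V", "IV", "III", "II", "I"]


def cascade_range(predictions, measure):
    """Return (decisive_model, lower, upper) for one case.

    predictions: dict mapping model label -> predicted class (1 = above cutoff).
    measure: "ED50" or "TD50". None means the range is open on that side.
    """
    col = 0 if measure == "ED50" else 1
    upper = None
    for label in ORDER:
        cutoff = CUTOFFS[label][col]
        if predictions[label] == 1:       # first positive stops the cascade
            return label, cutoff, upper
        upper = cutoff                    # value is at most this cutoff
    # No positive at all: every model agrees on "high active/toxic".
    return ORDER[-1], None, upper


# True ED50 = 250 umol/kg: correct classes are V=0, IV=0, III=0, II=1, I=1.
print(cascade_range({"V": 0, "IV": 0, "III": 0, "II": 1, "I": 1}, "ED50"))  # ('II', 200, 300)
# One error in model III alone changes the range, although 4 of 5 models are right.
print(cascade_range({"V": 0, "IV": 0, "III": 1, "II": 1, "I": 1}, "ED50"))  # ('III', 300, 400)
```

The second call mirrors the situation reported for compounds 15, 57, 101, and 102: the range is decided by the first positive model only, so a single misclassification upstream is enough to shift the predicted interval.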


Table 4. Performance of kNN mtk-QSAR Models on Testing
model | sizeᵃ | NERcv | NERtst | Q (%) | SE (%) | SP (%) | TP | FN | TN | FP
kNN-I | 8 | 0.878 | 0.706 (0.887)ᵇ | 85.0 (90.0) | 91.2 (94.1) | 50.0 (66.7) | 31 (32) | 3 (2) | 3 (4) | 3 (2)
kNN-II | 6 | 0.796 | 0.727 (0.793) | 77.5 (82.5) | 92.0 | 53.3 (66.7) | 23 | 2 | 8 (10) | 7 (5)
kNN-III | 7 | 0.794 | 0.866 | 87.5 | 95.5 | 77.8 | 21 | 1 | 14 | 4
kNN-IV | 6 | 0.820 | 0.907 (0.947) | 90.0 (95.0) | 93.3 | 88.0 (96.0) | 14 | 1 | 22 (24) | 3 (1)
kNN-V | 8 | 0.856 | 0.854 (0.964) | 87.5 (97.5) | 78.6 (92.9) | 92.3 (100) | 11 (13) | 3 (1) | 24 (26) | 2 (0)
ᵃ Number of inputs. ᵇ Values in parentheses: if the 95% confidence interval is taken into account.
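As a quick check of eqs 3−6, the sketch below recomputes the kNN-III row of Table 4 from its confusion counts (TP = 21, FN = 1, TN = 14, FP = 4). The function name classification_metrics is ours.

```python
def classification_metrics(tp, fn, tn, fp):
    """Eqs 3-6: sensitivity, specificity, predictive accuracy, and nonerror rate."""
    se = tp / (tp + fn)                  # eq 3
    sp = tn / (tn + fp)                  # eq 4
    q = (tp + tn) / (tp + fn + tn + fp)  # eq 5, N = all evaluated cases
    ner = (se + sp) / 2                  # eq 6
    return {"SE": se, "SP": sp, "Q": q, "NER": ner}


m = classification_metrics(tp=21, fn=1, tn=14, fp=4)  # kNN-III row of Table 4
print({k: round(v, 3) for k, v in m.items()})
# {'SE': 0.955, 'SP': 0.778, 'Q': 0.875, 'NER': 0.866}
```

NER averages the per-class rates, so unlike Q it is not dominated by the larger class; this is why it is used as the RSR fitness function on data sets as unbalanced as those of models I and V (Table 1).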

Table 5. Descriptors Selected by RSR Algorithm with Description
model | name | description (deviation from...)
II | ΔALogP | Ghose-Crippen LogKow
V | ΔSM1_Dzs | Spectral moment of order 1 from Barysz matrix weighted by I-state
I | ΔSpMin3_Bhv | Smallest absolute eigenvalue of Burden modified matrix, n = 3, weighted by relative van der Waals volumes
V | ΔSpMin3_Bhi | Smallest absolute eigenvalue of Burden modified matrix, n = 3, weighted by relative first ionization potential
I | ΔSpMin5_Bhm | Smallest absolute eigenvalue of Burden modified matrix, n = 5, weighted by relative mass
I | ΔSpMin5_Bhv | Smallest absolute eigenvalue of Burden modified matrix, n = 5, weighted by relative van der Waals volumes
III | ΔSpMin7_Bhv | Smallest absolute eigenvalue of Burden modified matrix, n = 7, weighted by relative van der Waals volumes
V | ΔSpMin8_Bhv | Smallest absolute eigenvalue of Burden modified matrix, n = 8, weighted by relative van der Waals volumes
III | ΔSpMax2_Bhe | Largest absolute eigenvalue of Burden modified matrix, n = 2, weighted by relative Sanderson electronegativities
IV, V | ΔSpMax4_Bhv | Largest absolute eigenvalue of Burden modified matrix, n = 4, weighted by relative van der Waals volumes
III | ΔSpMax6_Bhv | Largest absolute eigenvalue of Burden modified matrix, n = 6, weighted by relative van der Waals volumes
IV | ΔSpMax7_Bhs | Largest absolute eigenvalue of Burden modified matrix, n = 7, weighted by relative I-state
IV | ΔSpMax8_Bhv | Largest absolute eigenvalue of Burden modified matrix, n = 8, weighted by relative van der Waals volumes
II, IV | ΔnBondsS2 | Total number of single bonds including bonds to hydrogens, excluding aromatic bonds
I | ΔnRotBt | Number of rotatable bonds, including terminal bonds
I | ΔnHBDon | Number of hydrogen bond donors using CDK H Bond Donor Count Descriptor algorithm
II | ΔnHBDon_Lipinski | Number of hydrogen bond donors using Lipinski's definition: any OH or NH
IV, V | ΔnHBAcc | Number of hydrogen bond acceptors
V | ΔVE3_Dt | Logarithmic coefficient sum of the last eigenvector from detour matrix
I | ΔMLFER_E | Excessive molar refraction
V | ΔMDEO-11 | Molecular distance edge between all primary oxygens
I | ΔpiPC9 | Conventional bond order ID number of order 9 (ln(1 + x))
II | ΔpiPC10 | Conventional bond order ID number of order 10 (ln(1 + x))
III | ΔETA_Beta_ns | Measure of electron-richness of the molecule
I, III | ΔETA_EtaP_F | Functionality index EtaF relative to molecular size
IV | ΔETA_Eta_F | Functionality index EtaF
III | ΔtopoRadius | Topological radius (minimum atom eccentricity)
III | ΔGGI7 | Topological charge index of order 7
II | ΔGGI9 | Topological charge index of order 9
II | ΔGGI10 | Topological charge index of order 10
V | ΔVR1_D | Randic-like eigenvector-based index from topological distance matrix

Table 6. Performance of PNN mtk-QSAR Models on Testing
model | NERtst | Q (%) | SE (%) | SP (%) | TP | FN | TN | FP
PNN-I | 0.843 (0.956)ᵃ | 85.0 (92.5) | 85.3 (91.2) | 83.3 (100) | 29 (31) | 5 (3) | 5 (6) | 1 (0)
PNN-II | 0.707 (0.827) | 70.0 (80.0) | 68.0 (72.0) | 73.3 (93.3) | 17 (18) | 8 (7) | 11 (14) | 4 (1)
PNN-III | 0.843 (0.866) | 85.0 (87.5) | 90.9 (95.5) | 77.8 | 20 (21) | 2 (1) | 14 | 4
PNN-IV | 0.900 (0.940) | 87.5 (92.5) | 100 | 80.0 (88.0) | 15 | 0 | 20 (22) | 5 (3)
PNN-V | 0.885 (0.942) | 85.0 (92.5) | 100 | 76.9 (88.5) | 14 | 0 | 20 (23) | 6 (3)
ᵃ Values in parentheses: if the 95% confidence interval is taken into account.

Figure 2. mtk-QSAR cascade decision algorithm. ED50 and TD50 values are in μmol/kg.

3.3. rmtk-QSAR Model. After all PNN modules are created, the GRNN module is trained using the known binary outputs and one dummy variable (Figure 1). Thus, the MNN converts the standard mtk-QSAR binary response into actual ED50 and TD50 values. Since PNN-V is used to predict whether ED50 exceeds 500 μmol/kg and TD50 exceeds 900 μmol/kg, reported ED50 values higher than 500 μmol/kg are standardized to 550 μmol/kg, while TD50 values higher than 900 μmol/kg are standardized to 950 μmol/kg. The GRNN module was tested on the same test compounds. The input values for the test compounds are obtained from the PNN modules in the form of category probabilities (see Table S4 in Supporting Information). The measured ED50 and TD50 values and those predicted by the MNN rmtk-QSAR model are given in Figure 4. The performance of the MNN model is evaluated by the coefficient of determination (R²), mean absolute error (MAE), and root mean squared error (RMSE). In addition, for a better estimation of the external predictive potential of the model, the modified r² ($r_m^2$) metric (eq 7) was used, where $r_0^2$ is the squared correlation coefficient between the observed and predicted values of the test set compounds with the intercept set to zero.54 The value of $r_m^2$ should be greater than 0.5 for an acceptable model:

$$r_m^2 = r^2\left(1 - \sqrt{r^2 - r_0^2}\right) \qquad (7)$$

The calculated performance metrics (R² = 0.87, MAE = 62 μmol/kg, RMSE = 100 μmol/kg, and $r_m^2$ = 0.57) indicate that ED50 and TD50 were predicted with satisfactory accuracy. This makes the proposed mtk-QSAR regression method a viable alternative to the standard QSAR regression methodology.

3.4. Applicability Domain and Outliers. The OECD (Organization for Economic Cooperation and Development) recommends that validated QSAR models have a defined applicability domain (AD). The AD estimates the uncertainty in the prediction for a test compound based on how similar it is to the training compounds.55 The AD can be characterized in various ways, and it is determined by the descriptors used in the model.20 In this study, the AD, as well as the presence of outliers, was determined using a recently proposed standardization approach.55 This analysis confirmed that the test compounds were properly selected, since none of the compounds was identified as an outlier for any of the PNN models. Only three compounds (85, 89, 104) were found to fall outside the AD of the PNN-IV model. A molecule lying outside the AD does not automatically mean that the model's prediction will be incorrect, but it informs the user of a high level of possible uncertainty.19 For these three compounds, the ED50 values were predicted with relatively low error and were in accordance with the 95% confidence interval (Table 8).
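The standardization approach of ref 55 is not spelled out in the text. The sketch below assumes the rule as it is commonly described in the QSAR literature: each descriptor of a query compound is standardized with the training-set mean and standard deviation, $S_{ki} = |x_{ki} - \bar{x}_i| / \sigma_i$; the compound is inside the AD if $\max_i S_{ki} \le 3$, outside if $\min_i S_{ki} > 3$, and otherwise inside only if $S_{new} = \bar{S}_k + 1.28\,\sigma_{S_k} \le 3$. Applied to training compounds, the same rule flags outliers. The function name inside_ad and the use of sample standard deviations (ddof=1) are our assumptions.

```python
import numpy as np


def inside_ad(X_train, X_query, limit=3.0):
    """Boolean mask: True where a query compound falls inside the AD (assumed rule)."""
    X_train = np.asarray(X_train, dtype=float)
    X_query = np.atleast_2d(np.asarray(X_query, dtype=float))
    mean = X_train.mean(axis=0)
    sd = X_train.std(axis=0, ddof=1)
    # Standardized absolute deviation of every descriptor: S_ki = |x_ki - mean_i| / sd_i
    S = np.abs(X_query - mean) / sd
    s_max, s_min = S.max(axis=1), S.min(axis=1)
    # Summary score for borderline compounds: mean + 1.28 * sd of S_k over descriptors
    s_new = S.mean(axis=1) + 1.28 * S.std(axis=1, ddof=1)
    inside = s_max <= limit                   # every descriptor within the limit
    borderline = ~inside & (s_min <= limit)   # some, but not all, descriptors beyond it
    return inside | (borderline & (s_new <= limit))


# Usage on a toy training set of two compounds with 10 descriptors.
X_train = np.vstack([-np.ones(10), np.ones(10)])
q_edge = np.zeros(10)
q_edge[0] = 3.5 * np.sqrt(2.0)   # one descriptor at about 3.5 standard deviations
queries = np.vstack([np.zeros(10), q_edge, np.full(10, 20.0)])
print(inside_ad(X_train, queries))   # [ True  True False]
```

The middle query shows why the third branch exists: a single extreme descriptor pushes $\max_i S_{ki}$ above 3, but the compound is still judged inside the AD because its descriptors are, taken together, close to the training data.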

4. CONCLUSIONS

Multitarget/tasking quantitative structure−activity relationship (mt-QSAR/mtk-QSAR) methodology is a promising alternative to standard drug design procedures. mt-QSAR/mtk-QSAR models have performed well in predicting various measures of biological activity under diverse experimental conditions and against different biological targets. However, they are limited to binary predictions such as active/nonactive or toxic/nontoxic. In this paper, the mt-QSAR/mtk-QSAR concept has been extended with a novel regression mtk-QSAR (rmtk-QSAR) model based on a modular neural network (MNN). The MNN consists of two layers of modules. It transforms the binary responses of several standard mtk-QSAR models (the input modules) into a single regression response produced by the output module. Each input module is trained independently, so it can also serve as a standalone classification mtk-QSAR model. The proposed MNN was applied to a data set of succinimides to predict their anticonvulsant activity and their neurotoxicity simultaneously. The MNN model predicted ED50 MES, ED50 PTZ, and TD50 values determined in mice and rats with satisfactory accuracy (R² = 0.87 in testing). The applicability domain (AD) analysis also showed that similar accuracy was achieved for compounds whose structures lie outside the AD (Table 8). Further research is planned on the relationship between the MNN parameters (the number of input modules) and its accuracy. For that purpose, a data set is currently being gathered that contains more than two thousand cases for about one and a half thousand compounds tested on four biological targets.

ASSOCIATED CONTENT

Supporting Information

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.molpharmaceut.7b00582. Chemical structures with related anticonvulsant activity and neurotoxicity data; mtk-QSAR kNN and PNN testing results; CDA results based on kNN mtk-QSAR models; testing data set for GRNN module of MNN model (PDF)

Figure 3. CDA results based on PNN mtk-QSAR models: (a) ED50 MES, (b) ED50 PTZ, (c) TD50.

Figure 4. Results of MNN prediction of ED50 and TD50 values for test compounds. The lines indicate the 95% confidence interval.

Table 7. Accuracy of ED50 and TD50 Range Prediction

model | high active/toxic (a) | low active/toxic (b) | low and high active/toxic
kNN | 50% | 94% | 80%
PNN | 75% | 100% | 92%

(a) ED50 < 100 μmol/kg and TD50 < 500 μmol/kg. (b) ED50 > 500 μmol/kg and TD50 > 900 μmol/kg.

Table 8. Test Compounds Outside the Applicability Domain

comp. | bt (sm) | measured ED50 (μmol/kg) | MNN ED50 (μmol/kg)
85 | rat (ED50 MES) | 46 (22−82) (a) | 55
89 | rat (ED50 MES) | 50 (22−95) | 55
104 | rat (ED50 MES) | 73 (50−106) | 54

(a) 95% confidence interval.
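The following is a minimal sketch of the two-layer MNN idea summarized in the Conclusions. It is an illustration under stated assumptions and does not reproduce the authors' implementation or data. The input modules are written as PNN classifiers, the classifier type named for Figure 3 and Table 7, and the output module is written as a GRNN, the module type named in the Supporting Information. Several parts are hypothetical: the names `ModularNN`, `pnn_predict`, and `grnn_predict`; the kernel widths `sigma_in` and `sigma_out`; the choice to make each input module a binary classifier at a different activity cut-off (`thresholds`); the assumption that the descriptor matrix already encodes the experimental conditions; and the synthetic toy data.

```python
import numpy as np


def pnn_predict(X_train, y_train, X, sigma=0.5):
    """PNN classifier: Parzen-window (Gaussian kernel) class scores, argmax wins."""
    classes = np.unique(y_train)
    # Squared Euclidean distances between query and training rows: (n_query, n_train)
    d2 = ((X[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    k = np.exp(-d2 / (2.0 * sigma ** 2))
    # Average kernel activation per class (the PNN summation layer, equal priors)
    scores = np.stack([k[:, y_train == c].mean(axis=1) for c in classes], axis=1)
    return classes[np.argmax(scores, axis=1)]


def grnn_predict(Z_train, t_train, Z, sigma=0.5):
    """GRNN regressor: Gaussian-kernel-weighted average of training targets."""
    d2 = ((Z[:, None, :] - Z_train[None, :, :]) ** 2).sum(axis=2)
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    return (w @ t_train) / w.sum(axis=1)


def r2(y_true, y_pred):
    """Coefficient of determination, R^2 = 1 - SS_res / SS_tot."""
    ss_res = ((y_true - y_pred) ** 2).sum()
    ss_tot = ((y_true - y_true.mean()) ** 2).sum()
    return 1.0 - ss_res / ss_tot


class ModularNN:
    """Hypothetical two-layer MNN: binary PNN input modules -> GRNN output module."""

    def __init__(self, thresholds, sigma_in=0.5, sigma_out=0.5):
        self.thresholds = thresholds  # hypothetical cut-off, one per input module
        self.sigma_in = sigma_in
        self.sigma_out = sigma_out

    def _module_outputs(self, X):
        # Layer 1: each input module emits a 0/1 response (e.g. active vs. nonactive)
        return np.column_stack(
            [pnn_predict(self.X, lab, X, self.sigma_in) for lab in self.labels]
        )

    def fit(self, X, y):
        self.X, self.y = X, y
        # Each input module is trained independently on its own binary labels
        self.labels = [(y < t).astype(int) for t in self.thresholds]
        # Layer 2 is fitted on the binary response vectors produced by layer 1
        self.Z = self._module_outputs(X)
        return self

    def predict(self, X):
        return grnn_predict(self.Z, self.y, self._module_outputs(X), self.sigma_out)


# Toy data only (not the succinimide data set): 3 descriptors, a log-scale response
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 3))
y = 2.0 + X @ np.array([0.8, -0.5, 0.3]) + 0.1 * rng.normal(size=120)

model = ModularNN(thresholds=np.quantile(y[:90], [0.25, 0.5, 0.75])).fit(X[:90], y[:90])
y_hat = model.predict(X[90:])
print(f"toy test R2 = {r2(y[90:], y_hat):.2f}")
```

The GRNN output is a kernel-weighted average of the training responses, so its predictions always stay within the range of those responses. The number of input modules limits how finely the binary codes can separate the response range, which is one reason the relationship between module count and accuracy is worth studying. The printed R² is the same statistic the Conclusions report for the test set, but here its value reflects only the toy data.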

DOI: 10.1021/acs.molpharmaceut.7b00582 Mol. Pharmaceutics 2017, 14, 4476−4484

Article

Molecular Pharmaceutics




AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected]. Phone: +381 11 3303 650.

ORCID

Davor Antanasijević: 0000-0002-0915-1281

Notes

The authors declare no competing financial interest.



ACKNOWLEDGMENTS

The authors are grateful to the Ministry of Education, Science and Technological Development of the Republic of Serbia, Project Nos. 172007 and 172013 for financial support.




