Article

Predictive Models for Fast and Effective Profiling of Kinase Inhibitors Alina Bora, Sorin Avram, Ionel Ciucanu, Marius Raica, and Stefana Avram J. Chem. Inf. Model., Just Accepted Manuscript • DOI: 10.1021/acs.jcim.5b00646 • Publication Date (Web): 11 Apr 2016 Downloaded from http://pubs.acs.org on April 13, 2016

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a free service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are accessible to all readers and citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

Journal of Chemical Information and Modeling is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Journal of Chemical Information and Modeling

Predictive Models for Fast and Effective Profiling of Kinase Inhibitors

Alina Bora,1,2,# Sorin Avram,2,#,* Ionel Ciucanu,1 Marius Raica,3 Stefana Avram4

1 Department of Chemistry, West University of Timisoara, Faculty of Chemistry-Biology-Geography, 16 Pestalozzi Str., 300115, Timisoara, Romania

2 Department of Computational Chemistry, Institute of Chemistry Timisoara of Romanian Academy, 24 Mihai Viteazu Avenue, Timisoara, 300223, Romania

3 Department of Microscopic Morphology/Histology, Angiogenesis Research Center, University of Medicine and Pharmacy “Victor Babes” Timisoara, 2 Eftimie Murgu, Timisoara, 300041, Romania

4 Department Pharmacy II, Discipline of Pharmacognosy, University of Medicine and Pharmacy “Victor Babes” Timisoara, Faculty of Pharmacy, 2 Eftimie Murgu, Timisoara, 300041, Romania

# These authors contributed equally to the article.

KEYWORDS: kinase inhibitors, human kinome, virtual screening, classification, prediction model, chemical library, drug discovery

ACS Paragon Plus Environment

ABSTRACT

In this study, we developed two-dimensional (2D) pharmacophore-based random forest models for the effective profiling of kinase inhibitors. One hundred and seven prediction models were developed to address distinct kinases spanning all kinase groups. Rigorous external validation demonstrates excellent virtual screening and classification potential of the predictors and, more importantly, the capacity to prioritize novel chemical scaffolds in large chemical libraries. The models built upon more diverse and more potent compounds tend to exhibit the highest predictive power. The analysis of ColBioS-FlavRC (Collection of Bioselective Flavonoids and Related Compounds) highlighted several potentially promiscuous derivatives with undesirable selectivity against kinases. The prediction models can be downloaded from the www.chembioinf.ro webpage.

INTRODUCTION Kinase inhibitors are widely known as the largest class of new drugs to treat cancers.1 Owing to their wide distribution in eukaryotic cells and their central roles in cellular signaling, metabolism, transcription, cell cycle progression, cytoskeletal rearrangement and cell movement, apoptosis, and differentiation, kinases also play a central role in numerous developmental and metabolic human disorders.2,3 More than 500 distinct protein kinases are currently known; they are encoded by ~2% of all human genes (the human kinome).2,4 Kinases share a conserved catalytic domain, which transfers a phosphate group from adenosine triphosphate (ATP) to a target protein and thereby modulates its activity. Most kinase inhibitors target the ATP site, which triggers cross-reactivity problems. In recent years, increasing efforts have been made to effectively address the selectivity problem across human kinases. For example, the development of screening platforms against multiple protein kinases led to the profiling of thousands of small molecules against hundreds of human protein kinases.5-7 The vast body of kinase-related activity data accumulated over the past years (e.g., ChEMBLdb,8,9 PubChem10,11) has fueled prediction models for the identification of novel kinase inhibitors.12-15 The comprehensive review by Ferre et al.16 describes published kinome-wide profiling screens, computational methods, and sources of kinase-related bioactivity data employed to model kinase-ligand interactions. To maximize success rates, computational approaches are intensively employed to design consistent subsets of molecules with desirable physicochemical and biological properties, reasonable chemical and structural diversity and/or potential activity against the target of interest.17-19 Virtual screening (VS) methods aim to prioritize potential hits based upon a set of known ligands (ligand-based methods) or the crystal structure of the biological target (structure-based methods). The top-scored (i.e., highest-ranked) compounds, according to the VS method, are further analyzed and submitted for biological testing, thereby increasing the rate of hit identification. Ligand-based VS methods comprise similarity search,20 pharmacophore mapping and machine-learning algorithms.21 The latter are extensively applied in cheminformatics and, more specifically, random forest22 modeling has proven to offer excellent results in predicting quantitative or categorical biological activity of compounds from a quantitative description of their molecular structure.21,23

The fast and effective evaluation of large chemical libraries, i.e., hundreds of thousands of molecules, also depends upon the type of molecular encoding. In such cases, one- or two-dimensional descriptors are often preferred over the more computationally expensive three-dimensional descriptors, which require additional computational time for the conformational analysis of the chemical structures.17,19,24 Ligand-based modeling tools can help to efficiently tackle the cross-reactivity problem of kinase inhibitors and to promote novel chemical scaffolds (scaffold hopping) with desirable properties into drug discovery programs. The need to find and model novel kinase inhibitors with adequate kinase selectivity is triggered and sustained by the numerous types of cancer that are able to bypass the drug target and activate resistance mechanisms.1 In the current study, we aimed to develop a framework of predictors that facilitates the efficient retrieval of kinase-specific inhibitors from large chemical libraries. The capacity of the proposed models is extensively tested for real-life VS, scaffold hopping and classification. Additionally, we explored the relationship between prediction performance and several properties of the data sets used for modeling: size, chemical diversity and biological activity. Finally, we use the predictors to evaluate ColBioS-FlavRC (Collection of Bioselective Flavonoids and Related Compounds).18

MATERIALS AND METHODS ChEMBL Kinase Sets. We downloaded the UniProt “Human and mouse protein kinases: classification and index” list (version 2015_02)25,26 and used the list of human kinases to extract bioactivity data from ChEMBLdb8,9 according to the specified UniProt identifiers (UniProt IDs).25 We retained 93967 bioactivity points of type IC50 (43353 unique compounds and 250 kinases) referring to single protein inhibition, and qualifiers of type equal (“=”) and lesser (“ 0.5), and, in the second case, compounds were ranked according to the predicted class-membership probability values. The classification results were analyzed using well-established metrics: sensitivity (Se, or true positive rate, i.e., the ratio of correctly predicted actives to the total number of actives available), specificity (Sp, i.e., the ratio of correctly predicted inactives to the total number of inactives available) and accuracy (Acc, i.e., the fraction of correctly classified compounds in the data set). In addition, based on the predicted probabilities, we computed the area under the receiver operating characteristic curve (AUC) to assess the capacity of the models to separate actives from inactives without imposing a class-membership threshold value.41,42
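The threshold-based metrics (Se, Sp, Acc) and the threshold-free AUC described above can be illustrated with a minimal Python sketch. It assumes binary labels (1 = active, 0 = inactive) and a predicted probability of belonging to the active class, with compounds scoring above 0.5 assigned to the active class (an assumed cut-off). The function name `classification_metrics` is hypothetical and is not part of the authors' in-house software. AUC is computed here by pairwise comparison of actives and inactives, which equals the area under the ROC curve.

```python
def classification_metrics(labels, probs, threshold=0.5):
    """Se, Sp and Acc at a probability threshold, plus threshold-free AUC.

    labels: 1 for active, 0 for inactive.
    probs: predicted probability of the active class for each compound.
    threshold: assumed class-membership cut-off (active if prob > threshold).
    """
    pairs = list(zip(labels, probs))
    pos = [p for y, p in pairs if y == 1]
    neg = [p for y, p in pairs if y == 0]
    a, i = len(pos), len(neg)                       # A actives, I inactives
    tp = sum(1 for p in pos if p > threshold)       # actives predicted active
    tn = sum(1 for p in neg if p <= threshold)      # inactives predicted inactive
    # AUC as the fraction of (active, inactive) pairs in which the active
    # receives the higher probability; ties count as half a correct pair.
    wins = sum(1.0 if pp > pn else 0.5 if pp == pn else 0.0
               for pp in pos for pn in neg)
    return {
        "Se": tp / a,
        "Sp": tn / i,
        "Acc": (tp + tn) / (a + i),
        "AUC": wins / (a * i),
    }


if __name__ == "__main__":
    m = classification_metrics([1, 1, 1, 0, 0, 0], [0.9, 0.7, 0.4, 0.6, 0.2, 0.1])
    print(m)
```

Because AUC only depends on the ordering of the probabilities, it stays unchanged if the 0.5 cut-off is moved, whereas Se, Sp and Acc do not.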

The virtual screening evaluation relies on metrics focusing on the early enrichment in actives: exponential receiver operating characteristic curve enrichment (eROCE, α = 20),18,40,43 ROC enrichment (ROCE) and the true positive rate at false positive percentages of 0.5%, 1% and 5% (TPR at x% FP).44 For all parameters, greater values indicate better performance of the models in identifying actives and inactives, separating the two classes, or achieving early enrichment, depending on the parameter. These parameters were computed as shown in Table 2 using our in-house program ETICIv1.6.

Table 2. Description of the evaluation parameters used to assess the classification and virtual screening capacities of the predictors.

| Evaluation type | Name | Equation¹ |
|---|---|---|
| Classification | Sensitivity | $Se = TP / A$ |
| Classification | Specificity | $Sp = TN / I$ |
| Classification | Accuracy | $Acc = (TP + TN) / (A + I)$ |
| Classification | Area under the receiver operating characteristic curve | $AUC = 1 - \frac{1}{A}\sum_{i=1}^{A} FPR_i$ |
| Virtual screening | Exponential ROC enrichment (α = 20) | $eROCE = \frac{1}{A}\sum_{i=1}^{A} e^{-\alpha\, FPR_i}$ |
| Virtual screening | ROC enrichment at x% FP | $ROCE_x = TPR / FPR_x$, x = 0.5%, 1%, 5% FPs |
| Virtual screening | TPR at x% FP | $TPR_x$, x = 0.5%, 1%, 5% FPs |

¹ A is the total number of actives (positives) in the database, I is the total number of inactives (negatives), TP is the number of correctly predicted actives (true positives), TN is the number of correctly predicted inactives (true negatives), FP is the number of inactives incorrectly predicted as actives (false positives), TPR is the ratio of the number of correctly predicted actives to the total number of actives, and FPR_i is the ratio of the number of inactives ranked before the ith active to the total number of inactives, i.e., the false positive rate at the point where the ith active is retrieved in the ranking list.

Statistical Significance Test. The two-sample Wilcoxon-Mann-Whitney test was used to identify statistically significant differences between the distributions of two evaluation results. Here, we applied the one-sided test to assess superior or inferior performance (at a significance level of 0.05) between pairs of evaluated instances. These calculations were performed using the function wilcox_test (with the exact conditional distribution approximated by 99999 Monte Carlo samples) in the package “coin”45,46 available in the R statistical software system.30

Workflow description. A schematic representation of the workflow is shown in Figure 1b. A total of 107 kinase-specific data sets, each containing at least 50 actives and a variable number of inactives, were extracted from ChEMBLdb as described above in ChEMBL Kinase Sets. For each kinase, the number of inactives was much smaller than the number of actives; thus, we turned to HTS outcomes publicly available in PubChem.10 For the purpose of this study, PubChemKinIna molecules were considered generally inactive against kinases. The ChEMBL sets of inactives and the PubChemKinIna set were merged with the ChEMBL sets of actives to generate, for each kinase, a training set and an external test set (for classification and VS purposes). Finally, all molecules were encoded as ChemAxon 2D pharmacophore fingerprints to train and test RF predictors.
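All ROC-based quantities in Table 2 can be derived from the list of FPR_i values, one per active. The Python sketch below is a hypothetical illustration, not the in-house ETICI program: it assumes the labels are already sorted by decreasing predicted probability without ties, and it counts an active as retrieved at x% FP when FPR_i ≤ x, which is one possible convention for "before the first x% FPs".

```python
import math


def fpr_per_active(ranked_labels):
    """FPR_i for each active: inactives ranked before it / all inactives.

    ranked_labels: 1 (active) or 0 (inactive), sorted by decreasing score.
    """
    n_inactive = sum(1 for y in ranked_labels if y == 0)
    fprs, false_pos = [], 0
    for y in ranked_labels:
        if y == 1:
            fprs.append(false_pos / n_inactive)
        else:
            false_pos += 1
    return fprs


def vs_metrics(ranked_labels, alpha=20.0, fp_levels=(0.005, 0.01, 0.05)):
    """AUC, eROCE, TPR_x and ROCE_x following the Table 2 definitions."""
    fprs = fpr_per_active(ranked_labels)
    a = len(fprs)                                    # A, number of actives
    auc = 1.0 - sum(fprs) / a
    eroce = sum(math.exp(-alpha * f) for f in fprs) / a
    # Assumed convention: an active counts at level x if FPR_i <= x.
    tpr = {x: sum(1 for f in fprs if f <= x) / a for x in fp_levels}
    roce = {x: tpr[x] / x for x in fp_levels}        # TPR_x / FPR_x
    return {"AUC": auc, "eROCE": eroce, "TPR": tpr, "ROCE": roce}


if __name__ == "__main__":
    print(vs_metrics([1, 1, 0, 1, 0, 0, 0, 1]))
```

The exponential weighting in eROCE makes actives retrieved at very low false positive rates dominate the score: with α = 20, an active found after 25% of the inactives contributes only e^-5 ≈ 0.0067.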

Figure 1. Full-length (i.e., 210 entries) ChemAxon molecular pharmacophore fingerprint encoding pairs of cations (Ci), anions (Ai), acceptors (A), donors (D), hydrophobic features (H) and aromatic rings (Ar) at path lengths between 1 and 10 (a); a schematic illustration of the workflow employed to assemble kinase data sets and to train and test predictors of kinase inhibitors (b).
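The fingerprint length quoted in the Figure 1 caption follows from simple combinatorics: six feature types give 6·7/2 = 21 unordered pairs (identical pairs such as A-A included), and each pair is binned at ten path lengths, i.e., 21 × 10 = 210 entries. The Python sketch below enumerates such bins and fills a count vector from pre-computed (feature, feature, distance) triples. It is a hypothetical reconstruction for illustration only: ChemAxon's actual feature perception, bin ordering and count/bit conventions are not described here, and the names `BINS` and `pharmacophore_fingerprint` are our own.

```python
from itertools import combinations_with_replacement

FEATURES = ["Ci", "Ai", "A", "D", "H", "Ar"]   # labels as in the Figure 1 caption
PATH_LENGTHS = range(1, 11)                     # topological distances 1..10

PAIRS = list(combinations_with_replacement(FEATURES, 2))  # 21 unordered pairs
BINS = [(f1, f2, d) for f1, f2 in PAIRS for d in PATH_LENGTHS]
BIN_INDEX = {b: k for k, b in enumerate(BINS)}


def pharmacophore_fingerprint(point_pairs):
    """Count vector over the 210 bins from (feature1, feature2, distance) triples."""
    fp = [0] * len(BINS)
    for f1, f2, d in point_pairs:
        # Order the two features canonically so (D, A) and (A, D) share a bin.
        first, second = sorted((f1, f2), key=FEATURES.index)
        key = (first, second, d)
        if key in BIN_INDEX:                    # distances outside 1..10 are skipped
            fp[BIN_INDEX[key]] += 1
    return fp


if __name__ == "__main__":
    fp = pharmacophore_fingerprint([("D", "A", 3), ("A", "D", 3), ("Ar", "Ar", 5)])
    print(len(BINS), sum(fp))
```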

RESULTS AND DISCUSSIONS Kinase sets analysis. In general, the prediction capacities of the RF models should reflect, at least to some extent, the quality and the properties of the data sets they were built on. Schurer & Muskal13 reported that the reliability of predictors varies with the number of actives in the modeling sets. Here, we extend the analysis by considering the chemical diversity and the biological activity span of kinase inhibitors. For this purpose, we attempted to group kinase sets with similar properties. For each of the 107 sets of actives, we computed five descriptors referring to chemical diversity (the number of BMFs, the average number of actives/BMF and the average Tanimoto distance) and biological activity (median IC50 and IC50 cutoff). Across the data sets, the number of actives and the number of corresponding unique BMFs are highly correlated (Pearson's correlation coefficient of 0.976), suggesting that data set size is implicitly described by the number of BMFs. We used the package “stats” available in the R statistical software system30 and applied ward.D2 hierarchical clustering based upon Euclidean distances of the scaled descriptor values. In a first clustering run, a distant four-membered cluster was retrieved as a final cluster, and the remaining sets were reclustered, resulting in three more groups (see Figure S2 in Supporting Information). Thereby, the 107 data sets were grouped into four clusters, k1, k2, k3 and k4, of sizes 49, 27, 27 and 4 (see Table 2). To help distinguish the features of the clusters, we assessed statistically significant distribution shifts for each of the five descriptors. The counts of significantly lower (encoded as “-”) and greater (encoded as “+”) shifts between each pair of clusters are reported in Table 3. For example, the distribution of average Tanimoto distances in k2 shows significantly higher values than one cluster (“+”) and significantly lower values than two other clusters (“--”).
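A minimal Python sketch of the descriptor clustering, assuming NumPy and SciPy are available. Descriptors are standardized column-wise with the sample standard deviation (as R's `scale()` does) and clustered with SciPy's Ward linkage on Euclidean distances, which, to our understanding, corresponds to R's `hclust(method = "ward.D2")`. The two-stage procedure used above (setting aside a distant cluster and reclustering the rest) is not reproduced. The helper for the average Tanimoto distance assumes fingerprints represented as sets of on-bit indices; which fingerprint underlies this descriptor is not specified in this section, and both function names are hypothetical.

```python
from itertools import combinations

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def mean_tanimoto_distance(fingerprints):
    """Mean pairwise Tanimoto distance, 1 - |a & b| / |a | b|, over on-bit sets."""
    dists = []
    for a, b in combinations(fingerprints, 2):
        union = len(a | b)
        similarity = len(a & b) / union if union else 1.0
        dists.append(1.0 - similarity)
    return sum(dists) / len(dists)


def ward_clusters(descriptors, n_clusters):
    """Cluster rows (kinase sets) of a descriptor matrix into n_clusters groups."""
    x = np.asarray(descriptors, dtype=float)
    # Column-wise standardization with the sample standard deviation.
    scaled = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    # Ward linkage on Euclidean distances between the scaled rows.
    tree = linkage(scaled, method="ward", metric="euclidean")
    # Cut the dendrogram so that at most n_clusters flat clusters remain.
    return fcluster(tree, t=n_clusters, criterion="maxclust")


if __name__ == "__main__":
    toy = [[1.0, 0.1], [1.1, 0.0], [0.9, 0.2], [10.0, 5.0], [10.2, 5.1], [9.8, 4.9]]
    print(ward_clusters(toy, 2))
    print(mean_tanimoto_distance([{1, 2, 3}, {1, 2}, {4}]))
```

Scaling before clustering matters here because the descriptors live on very different ranges (counts of BMFs versus Tanimoto distances between 0 and 1); without it, the largest-valued descriptor would dominate the Euclidean distances.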

Figure 2. Distribution of 107 kinase inhibitor sets (extracted from ChEMBLdb),8,9 grouped into four clusters (i.e., k1, k2, k3 and k4), in terms of number of actives (a), number of BMFs (b), average Tanimoto distance (c), number of actives/BMF (d), IC50 cutoff used to define actives (e) and median IC50 (f).

The differences between the cluster distributions encoded in Table 3 should be read in the context of the absolute descriptor values illustrated in Figures 2a to 2f. Thus, the following observations can be drawn: cluster k1 kinase sets contain the largest number of chemotypes (per data set), shared by numerous, closely related (as suggested by the shortest average Tanimoto distances and the relatively high number of actives/BMF), very potent kinase inhibitors (~60% of the sets show submicromolar IC50 cutoffs). Compared to k1, clusters k2 and k3 contain kinase sets with a smaller number of chemotypes (and actives), which is still higher than in k4. The data sets in k2 are characterized by weaker inhibitors (93% of the IC50 cutoff values are ≥ 1 µM). Cluster k3 sets contain fewer, more diverse and highly potent members per chemotype. Finally, the small kinase sets grouped in k4 can be described as containing many diverse, weaker actives per chemotype.

Table 3. Description of statistically significant differences between pairs of kinase clusters (“+” greater values, “-” lower values compared to another cluster) in terms of five descriptors computed per kinase set of actives: number of BMFs, average Tanimoto distance, number of actives/BMF, IC50 cutoff and median IC50.

| Cluster | Number of BMFs | Average Tanimoto distance | Number of actives/BMF | IC50 cutoff | Median IC50 |
|---|---|---|---|---|---|
| k1 | +++ | --- | ++ | -- | -- |
| k2 | +- | +-- | - | ++ | ++ |
| k3 | +- | ++ | -- | -- | -- |
| k4 | --- | ++ | ++ | ++ | ++ |
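Each sign in Table 3 reflects a one-sided Wilcoxon-Mann-Whitney comparison between two clusters (see Statistical Significance Test). The following pure-Python sketch approximates a one-sided p-value by Monte Carlo permutation of the pooled mid-ranks. It is an illustrative analogue of, not a substitute for, the `wilcox_test` function of the R package coin, whose standardized statistic and resampling details may differ; the function names and the (hits + 1)/(B + 1) p-value convention are our assumptions.

```python
import random


def midranks(values):
    """1-based ranks of values, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2.0 + 1.0   # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def wmw_one_sided_mc(x, y, n_resamples=99999, seed=1):
    """Monte Carlo p-value for H1: values in x tend to be larger than in y."""
    ranks = midranks(list(x) + list(y))
    n_x = len(x)
    observed = sum(ranks[:n_x])          # rank sum of the x sample
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(ranks)               # random relabelling of the pooled ranks
        if sum(ranks[:n_x]) >= observed - 1e-9:
            hits += 1
    return (hits + 1) / (n_resamples + 1)


if __name__ == "__main__":
    k_a = [0.95, 0.97, 0.96, 0.99, 0.98, 0.94]
    k_b = [0.80, 0.82, 0.85, 0.81, 0.83, 0.84]
    print(wmw_one_sided_mc(k_a, k_b, n_resamples=5000))
```

In this scheme, a “+” for cluster a against cluster b would correspond to `wmw_one_sided_mc(a_values, b_values) < 0.05`; swapping the two arguments tests for a “-”.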

Virtual screening evaluation. The VS test sets simulate real-life VS scenarios, with an average active-to-inactive ratio (across the 107 sets) of 1 to 1242. In this case, the results were evaluated using the ROC-based parameters (see Table 2), which have been demonstrated to be robust and reliable.40 The VS performances achieved by the individual models are shown in Figures 3a and 3b (the numerical description of these results and other evaluations are reported in Supporting Information). Except for a few cases (GSK3b, PIK3CG, MAPK9 in k1, and INSR, PRKCA, CSNK1D, MAPK11, IRAK4, CDK5, FYN, CDK7 in k2), the predictors ranked ≥ 70% of the actives before the first 0.5% inactives (the top ~250 compounds of the entire data set; Table S5 in Supporting Information). Furthermore, apart from the previously mentioned kinase models and MAPK8, PIM1, PIM2, GSK3a in k1, and DYRK1A, RET, MAPK1 in k2, ROCE at 0.5% FP indicates strong early enrichment (i.e., > 150) for 89 models. In Figures 3a and 3b, one can observe that most models in k3 and k4 offer superior enrichment compared to k1 and k2. This is confirmed by the pairwise statistical significance test of the eROCE values (see Table S6 in Supporting Information). Another statistically significant difference was observed at the TPR at 0.5% FP level: k1 values are higher than k2 values. Thus, according to the current VS test, the performance of the models throughout the four clusters is generally high and decreases as follows: k3, k4 > k1 > k2.

Scaffold-hopping potential. The real value of a VS campaign is reflected by the number of novel molecular scaffolds prioritized and subsequently confirmed in experimental determinations. Consequently, we assessed the capacity of the models to predict novel active scaffolds, i.e., kinase inhibitors containing chemical scaffolds (encoded as Bemis-Murcko frameworks, BMFs) absent from the training set. Figure 3c illustrates the capacity of the models to rank novel BMFs before the first 0.5% FPs, i.e., within the top 259 (± 78) out of ~38700 compounds. On average, 30.24% (± 13.16%) of the unique BMFs at 0.5% FPs are novel, indicating a relatively high potential of the models to prioritize novel active scaffolds. Furthermore, the chemical diversity among novel actives is substantial, given that almost every novel BMF per kinase set is found in a distinct active (see Table S5 in Supporting Information).
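The fraction of novel scaffolds reported above can be computed from a ranked list annotated with Bemis-Murcko framework identifiers. The Python sketch below assumes the frameworks have already been derived (e.g., as canonical SMILES strings) and that the cut-off is reached when the number of inactives retrieved equals 0.5% of all inactives, rounded to the nearest integer; both this convention and the function name `novel_scaffold_stats` are illustrative assumptions.

```python
def novel_scaffold_stats(ranked, training_bmfs, fp_fraction=0.005):
    """Unique and novel active BMFs ranked before the fp_fraction cut-off.

    ranked: (is_active, bmf) tuples sorted by decreasing predicted probability.
    training_bmfs: BMF identifiers present in the training set.
    Returns (number of unique active BMFs, number of novel BMFs, novel fraction).
    """
    n_inactive = sum(1 for is_active, _ in ranked if not is_active)
    max_fp = max(1, round(fp_fraction * n_inactive))  # inactives tolerated
    seen_fp = 0
    active_bmfs = set()
    for is_active, bmf in ranked:
        if is_active:
            active_bmfs.add(bmf)
        else:
            seen_fp += 1
            if seen_fp >= max_fp:       # stop at the cut-off inactive
                break
    novel = active_bmfs - set(training_bmfs)
    fraction = len(novel) / len(active_bmfs) if active_bmfs else 0.0
    return len(active_bmfs), len(novel), fraction


if __name__ == "__main__":
    ranked = ([(True, "c1ccccc1"), (True, "c1ccc2ncncc2c1"), (True, "c1ccccc1"),
               (False, None), (True, "c1ccc2[nH]ccc2c1")] + [(False, None)] * 199)
    print(novel_scaffold_stats(ranked, {"c1ccccc1"}))
```

Counting unique frameworks rather than compounds keeps a single well-predicted chemical series from inflating the apparent scaffold-hopping performance.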

Figure 3. Virtual screening evaluation results described by eROCE, TPR at 0.5% FP (a) and ROCE at 0.5%, 1% and 5% FP (b) (further details are available in Supporting Information); (c) the number of BMFs (chemotypes) and novel BMFs (not used for model training) encountered before the first 0.5% FPs along the ranking list (the top ~250 compounds); (d) the number of BMFs in the training and the test sets, and the number shared by both sets; kinases are illustrated with different colors and point shapes according to kinase-cluster membership: k1, k2, k3 and k4.

The analysis of the chemotypes resulting from the random splitting of compounds into training and test sets (see Figure 3d) revealed remarkable diversity in both groups. Of the total number of BMFs per kinase set, on average only 22.7% ± 6.7% are shared by the training and test sets. Given that, on average, 35.6% ± 12.2% of the test set scaffolds were novel (i.e., not used for training), the scaffold-hopping potential of these models is encouraging and might contribute to the effective extension of the chemical space of kinase inhibitors.

Classification evaluation. The random forest models were challenged to discriminate between actives and inactives in the external classification kinase test sets. With a few exceptions, i.e., CSNK1D (k2), MAPK11 (k2), CDK5 (k2), FYN (k2) and PRKCZ (k4), the prediction accuracy of the models exceeds 0.8. For 100 models, AUC values were > 0.9 (Figure 4a), suggesting a high capacity to separate actives from inactives. In terms of sensitivity and specificity (Figure 4b), the only statistically significant difference indicates superior k3 sensitivity compared to k1 and k2 (Table S6 in Supporting Information). The prediction of the training samples (internal evaluation) shows similar results (Figure S3 in Supporting Information). Thus, the classification capabilities of the models are generally high and decrease in the following order: k3 > k1 > k2, which is consistent with the VS evaluation results. Based on these results and on the properties of the data sets in the four clusters, we observe that the prediction potential of the models varies with the chemical diversity (reflected more by the average Tanimoto distance, i.e., the diversity of substituents, than by the number of chemical frameworks) and with the inhibitory potency of the compounds in the data set.

Figure 4. External validation of the individual models in terms of AUC and Acc (a), and Se and Sp (b); kinases are shown with different colors and point shapes according to kinase-cluster membership: k1, k2, k3 and k4.

In previous modeling studies of large-scale kinase-inhibitory data, molecular fragment-based and topological descriptors were combined with variables describing the amino acids of the kinase binding site. For example, Table 4 shows the performance of the classifiers reported by Cao et al.14 and Niijima et al.12 on independent (external) test sets, together with the results for the aggregated 15946 compound-kinase pairs predicted by the models established here. The results are fairly similar, demonstrating that random forest modeling of ChemAxon's molecular pharmacophore fingerprints can be efficiently applied in ligand-based drug discovery. During the revision of this work, ChemAxon's PFs were also successfully tested for active learning by Lang et al.31 In this study, for each kinase, the training set consists of an equal number of actives and inactives, thereby avoiding a series of problems generated by class imbalance in machine learning.47,48 The external classification sets were also assembled to avoid class skew, facilitating the use of the classification evaluation metrics shown in Table 2.42 The use of AUC in the evaluation of classifiers is encouraged over the more popular accuracy owing to, e.g., its independence of the decision threshold and its invariance to a priori class probability distributions.41,42 Here, because the external classification sets contain balanced class labels and share the same distribution as the training data, we have computed and reported both Acc and AUC values, along with well-established classification metrics unaffected by class skew, i.e., Se and Sp.42
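Why reporting both Acc and the pair Se/Sp is unproblematic for balanced sets follows directly from the definitions in Table 2: accuracy is a class-size-weighted average of sensitivity and specificity, which reduces to their plain mean when A = I.

$$
\mathrm{Acc} = \frac{TP + TN}{A + I}
= \frac{A}{A + I}\,\frac{TP}{A} + \frac{I}{A + I}\,\frac{TN}{I}
= \frac{A\,\mathrm{Se} + I\,\mathrm{Sp}}{A + I},
\qquad
A = I \;\Rightarrow\; \mathrm{Acc} = \frac{\mathrm{Se} + \mathrm{Sp}}{2}
$$

As a worked check, the first row of Table 4 (Se = 0.873, Sp = 0.910) gives (0.873 + 0.910)/2 = 0.8915, in agreement with the tabulated Acc of 0.892 up to rounding. On strongly imbalanced sets, such as the VS test sets with about 1 active per 1242 inactives, the same formula shows that Acc would be driven almost entirely by Sp.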

Table 4. Classification results for 15946 compound-kinase pairs of the 107 external classification test sets and results obtained in other studies.

| Independent test set | Se | Sp | Acc |
|---|---|---|---|
| Aggregated compound-kinase pairs | 0.873 | 0.910 | 0.892 |
| Cao set¹ | 0.895 | 0.916 | 0.907 |
| Niijima set² | 0.757 | 0.726 | 0.739 |

¹ Values extracted from Cao et al.14: level 3 evaluation of the independent data set (kinase appears in the training set, but the inhibitor does not). ² Values extracted from Niijima et al.12: evaluation conducted on independent data sets (activity cutoff < 1 µM).

Performance across kinase groups. We also compared the prediction capabilities of the models grouped according to the standard classification of the kinome, in which human kinases are classified into nine groups (134 families and 201 subfamilies).2,4 The 107 kinases modeled here span the nine groups in the following counts: TK 39, CMGC 17, AGC 16, Other 13, CAMK 8, TKL 5, Atypical (PI3/PI4-kinase) 5, STE 3, CK1 1 (see the phylogenetic tree in Figure S4 in Supporting Information).

We found that, in both VS and classification, the CK1 and CMGC kinase models performed slightly, but statistically significantly, worse than the models of the other groups. Still, CMGC kinase models ranked on average 74% (± 11%) of the actives before the first 0.5% FPs and achieved an average AUC of 0.936, suggesting high enrichment and classification potential, as shown in Figure S5 in Supporting Information.

Assessment of kinome-wide profiling panel results. The studies of Metz et al.5 and Anastassiadis et al.49 offer kinome-wide profiling panel results and contain mostly independent data. Such outcomes might serve as a reference for comparisons between different studies. The Metz data set comprises more than one hundred and fifty thousand kinase inhibitory values and more than three thousand eight hundred compounds tested against 172 different protein kinases. The authors provide bioactivity results for 1497 compounds in the supplementary data. Anastassiadis et al. assessed the activity of 178 known kinase inhibitors against a panel of 300 protein kinases. All compounds were tested at a concentration of 0.5 µM in the presence of 10 µM ATP. The compound-kinase pairs and the corresponding activity data were downloaded for both sets. All compounds were subjected to the preparation protocol used for the training data (see Materials and Methods, section Dataset Preparation and Molecular Descriptors). Moreover, compound-kinase pairs in the Metz and Anastassiadis sets that also occurred in the training sets were removed. The Anastassiadis activities were scaled, for each kinase set, between 0 and 1 before applying progressive activity cutoffs, i.e., ≤ 0.1, ≤ 0.2, ≤ 0.3, ≤ 0.4 and ≤ 0.5, to define the class of actives. Compounds with scaled values > 0.9 were considered inactive. For the Metz profiling data, we used the following activity cutoffs (as pKi) to define actives and inactives: ≥ 7, ≥ 6.3, ≥ 6 for actives and < 6, < 5.3, < 5 for inactives. In both cases, we kept only kinase sets with at least ten actives and ten inactives, predicted the corresponding kinase activities using the models established here, and evaluated the classification potential on the aggregated compound-kinase pairs.

Table 5. Evaluation results of Metz et al.5 and Anastassiadis et al.49 profiling panel data using progressive cutoff values to define actives and inactives.

Metz

Data sets

Anastassiadis

1

Activity cutoff1 Active

Inactive

Kinase sets

≥7

0.1 (which can be considered low, undesirable selectivity across kinases) as shown in Figure 5. Thus, almost half of the compounds comprised in ColBioS-FlavRC can be presumed to exert high bioselectivity against kinases. This finding is consistent with recent studies, reviewed by Hou and Kumamoto,51 indicating that flavonoids might exert beneficial effects in a multitude of disease states, including cancer, cardiovascular disease and neurodegenerative disorders, by acting as kinase-selective compounds.51 The predicted kinase-specific inhibitory probabilities of ColBioS-FlavRC are available in Supporting Information for further analysis.


Figure 5. Histogram of 3341 flavonoids in ColBioS-FlavRC (Collection of Bioselective Flavonoids and Related Compounds) according to the frequency of hits score (FoH) computed based upon the predicted kinome profile; ginkgetin (biflavonoid) and raloxifene are examples of derivatives with low predicted selectivity (FoH > 0.1) against kinases.

Half of the compounds with FoH scores higher than 0.1 are glycosylated flavonoids. Biflavonoids such as ginkgetin, didemethyl-ginkgetin, cupressuflavone and daphnodorin D2 might also act as non-selective kinase inhibitors (FoH > 0.6, see Figure 5). Another notable compound is raloxifene (Figure 5), a selective estrogen receptor modulator approved for the treatment and prevention of osteoporosis in postmenopausal women and associated with a reduced risk of invasive breast cancer in the same population.52 To our knowledge, there is little if any testing of biflavonoids and raloxifene specifically against kinases reported in the literature. Based on the results obtained here, kinome-wide profiling of these derivatives might therefore reveal important kinase inhibitory activity.

CONCLUSIONS
The prediction models developed in the current study are in line with previous efforts to identify compounds with desirable kinase selectivity in drug discovery.18,19 Here, we developed prediction models that enable the fast evaluation of numerous compounds against 107 kinase targets covering the most important kinase groups. The chemical space investigated in this study spans 13334 unique chemical scaffolds, comprising 6315 kinase-inhibitory BMFs in the ChEMBL sets, 7396 BMFs in PubChemKinIna and 432 BMFs in ColBioS-FlavRC. The overlap between the data sets is relatively small, as can be seen in Figure S6 in the Supporting Information. Nevertheless, at the level of individual kinase data sets, we have demonstrated that the predictors developed herein are able to prioritize novel active chemical scaffolds from a diverse pool of more than seven thousand BMFs. We propose that the ensemble of the 107 kinase models be employed in drug discovery to explore novel chemical areas of diverse scaffolds. The extensive validation of the models demonstrates high VS, scaffold-hopping and classification performance and provides an optimistic perspective for finding novel seeds for developing highly potent series of kinase inhibitors. In light of this study, several conclusions can be drawn: (1) random forest models built upon ChemAxon's 2D pharmacophore fingerprints are able to effectively identify novel active chemical scaffolds in large and diverse chemical libraries; (2) the predictive capacity of the models varies with the chemical diversity and the inhibitory potency of the compounds in the training set; (3) PubChemKinIna offers a consistent and chemically diverse set of compounds (38957 compounds and 6179 chemotypes) that can be used to develop and evaluate VS methods targeting kinases; (4) the evaluation of ColBioS-FlavRC indicates high predicted kinome-wide selectivity for many of the available flavonoids and related compounds. The prediction models can be downloaded for further evaluation and use from the www.chembioinf.ro website.
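One way to quantify the ability to prioritize novel active scaffolds is to count, in the top of a ranked screening list, the distinct active Bemis-Murcko frameworks that are absent from the training set. The sketch below is a hypothetical metric for illustration, not necessarily the evaluation used in the study; scaffold identifiers are assumed to be precomputed strings (for example, canonical SMILES of the BMFs), and the function names and the default top fraction are illustrative.

```python
from typing import List, Set, Tuple

# (scaffold_id, is_active) pairs, sorted by decreasing predicted probability
Ranked = List[Tuple[str, int]]


def novel_active_scaffolds_in_top(ranked: Ranked, training_scaffolds: Set[str],
                                  top_fraction: float = 0.01) -> Set[str]:
    """Distinct active scaffolds, absent from training, found in the top fraction of a ranking."""
    if not 0 < top_fraction <= 1:
        raise ValueError("top_fraction must be in (0, 1]")
    n_top = max(1, int(len(ranked) * top_fraction))
    return {s for s, active in ranked[:n_top]
            if active == 1 and s not in training_scaffolds}


def novel_scaffold_recovery(ranked: Ranked, training_scaffolds: Set[str],
                            top_fraction: float = 0.01) -> float:
    """Share of all novel active scaffolds in the library that are recovered in the top fraction."""
    all_novel = {s for s, active in ranked
                 if active == 1 and s not in training_scaffolds}
    if not all_novel:
        return 0.0
    found = novel_active_scaffolds_in_top(ranked, training_scaffolds, top_fraction)
    return len(found) / len(all_novel)


if __name__ == "__main__":
    # Hypothetical ranked library of 10 compounds, described by scaffold id and activity label
    ranked = [("S1", 1), ("S2", 0), ("S3", 1), ("S1", 1), ("S4", 1),
              ("S5", 0), ("S6", 0), ("S7", 1), ("S8", 0), ("S9", 0)]
    training = {"S3"}  # S3 is a training scaffold, so it does not count as novel
    print(sorted(novel_active_scaffolds_in_top(ranked, training, 0.5)))  # ['S1', 'S4']
```

Counting distinct scaffolds rather than compounds keeps a series of close analogs sharing one framework from inflating the score, which is the point of a scaffold-hopping evaluation.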

Corresponding Author
*E-mail: [email protected] or [email protected], Phone: +40-723-652409

Notes
The authors declare no competing financial interest.

ACKNOWLEDGMENT
The work of Alina Bora was supported by the strategic grant POSDRU/159/1.5/S/137750, Project "Doctoral and Postdoctoral programs support for increased competitiveness in Exact Science research", co-financed by the European Social Fund within the Sectoral Operational Programme Human Resources Development 2007-2013, and was performed at West University of Timisoara. This paper was also published within the framework of the European Social Fund, Human Resources Development Operational Program 2007-2013, project no. POSDRU/159/1.5/S/136893, through the contribution of Stefana Avram. Sorin Avram's work was supported by a grant of the Romanian National Authority for Scientific Research and Innovation, CNCS – UEFISCDI, project number PN-II-RU-TE-2014-4-0422. Sorin Avram and Alina Bora are indebted to ChemAxon Ltd for providing access to their software.


ABBREVIATIONS
VS, virtual screening; HTS, high-throughput screening; PubChemKinIna, PubChem kinase inactive compounds; CFs, ChemAxon fingerprints; PFs, pharmacophore fingerprints; BMFs, Bemis-Murcko molecular frameworks; RF, random forest; ColBioS-FlavRC, Collection of Bioselective Flavonoids and Related Compounds.

ASSOCIATED CONTENT
Supporting Information Available: tables and figures (PDF file) covering chemical data set standardization and random forest settings; a description of the kinase sets and of the training and test sets; VS and classification results; panel assay testing; cross-kinase-group evaluation and a phylogenetic depiction of the kinome; and an assessment of the molecular scaffolds in the data sets. The Excel file contains the PubChemKinIna molecules as SMILES strings, the numerical evaluation results of the models, and the predicted inhibition probabilities for the ColBioS-FlavRC derivatives. This material is available free of charge via the Internet at http://pubs.acs.org.

REFERENCES
(1) Knight, Z. A.; Lin, H.; Shokat, K. M. Targeting the Cancer Kinome Through Polypharmacology. Nat. Rev. Cancer 2010, 10, 130-137.
(2) Manning, G.; Whyte, D. B.; Martinez, R.; Hunter, T.; Sudarsanam, S. The Protein Kinase Complement of the Human Genome. Science 2002, 298, 1912-1934.


(3) Lahiry, P.; Torkamani, A.; Schork, N. J.; Hegele, R. A. Kinase Mutations in Human Disease: Interpreting Genotype-Phenotype Relationships. Nat. Rev. Genet. 2010, 11, 60-74.
(4) Miranda-Saavedra, D.; Barton, G. J. Classification and Functional Annotation of Eukaryotic Protein Kinases. Proteins 2007, 68, 893-914.
(5) Metz, J. T.; Johnson, E. F.; Soni, N. B.; Merta, P. J.; Kifle, L.; Hajduk, P. J. Navigating the Kinome. Nat. Chem. Biol. 2011, 7, 200-202.
(6) Posy, S. L.; Hermsmeier, M. A.; Vaccaro, W.; Ott, K. H.; Todderud, G.; Lippy, J. S.; Trainor, G. L.; Loughney, D. A.; Johnson, S. R. Trends in Kinase Selectivity: Insights for Target Class-Focused Library Screening. J. Med. Chem. 2011, 54, 54-66.
(7) Jacoby, E.; Tresadern, G.; Bembenek, S.; Wroblowski, B.; Buyck, C.; Neefs, J. M.; Rassokhin, D.; Poncelet, A.; Hunt, J.; van Vlijmen, H. Extending Kinome Coverage by Analysis of Kinase Inhibitor Broad Profiling Data. Drug Discovery Today 2015, 20, 652-658.
(8) Bento, A. P.; Gaulton, A.; Hersey, A.; Bellis, L. J.; Chambers, J.; Davies, M.; Kruger, F. A.; Light, Y.; Mak, L.; McGlinchey, S.; Nowotka, M.; Papadatos, G.; Santos, R.; Overington, J. P. The ChEMBL Bioactivity Database: an Update. Nucleic Acids Res. 2014, 42, D1083-1090.
(9) ChEMBLdb version 20. https://www.ebi.ac.uk/ChEMBL/ (accessed Feb 17, 2015).
(10) Wang, Y.; Xiao, J.; Suzek, T. O.; Zhang, J.; Wang, J.; Zhou, Z.; Han, L.; Karapetyan, K.; Dracheva, S.; Shoemaker, B. A.; Bolton, E.; Gindulyte, A.; Bryant, S. H. PubChem's BioAssay Database. Nucleic Acids Res. 2012, 40, D400-412.
(11) PubChem BioAssay. https://pubchem.ncbi.nlm.nih.gov/ (accessed Jan 11, 2015).


(12) Niijima, S.; Shiraishi, A.; Okuno, Y. Dissecting Kinase Profiling Data to Predict Activity and Understand Cross-Reactivity of Kinase Inhibitors. J. Chem. Inf. Model. 2012, 52, 901-912.
(13) Schurer, S. C.; Muskal, S. M. Kinome-wide Activity Modeling from Diverse Public High-Quality Data Sets. J. Chem. Inf. Model. 2013, 53, 27-38.
(14) Cao, D. S.; Zhou, G. H.; Liu, S.; Zhang, L. X.; Xu, Q. S.; He, M.; Liang, Y. Z. Large-scale Prediction of Human Kinase-Inhibitor Interactions Using Protein Sequences and Molecular Topological Structures. Anal. Chim. Acta 2013, 792, 10-18.
(15) Buchwald, F.; Richter, L.; Kramer, S. Predicting a Small Molecule-Kinase Interaction Map: A Machine Learning Approach. J. Cheminf. 2011, 3, 22.
(16) Ferre, F.; Palmeri, A.; Helmer-Citterich, M. Computational Methods for Analysis and Inference of Kinase/Inhibitor Relationships. Front. Genet. 2014, 5, 196.
(17) Avram, S.; Funar-Timofei, S.; Borota, A.; Chennamaneni, S. R.; Manchala, A. K.; Muresan, S. Quantitative Estimation of Pesticide-Likeness for Agrochemical Discovery. J. Cheminf. 2014, 6, 42.
(18) Avram, S. I.; Pacureanu, L. M.; Bora, A.; Crisan, L.; Avram, S.; Kurunczi, L. ColBioS-FlavRC: A Collection of Bioselective Flavonoids and Related Compounds Filtered from High-Throughput Screening Outcomes. J. Chem. Inf. Model. 2014, 54, 2360-2370.
(19) Curpan, R.; Avram, S.; Vianello, R.; Bologa, C. Exploring the Biological Promiscuity of High-Throughput Screening Hits Through DFT Calculations. Bioorg. Med. Chem. 2014, 22, 2461-2468.


(20) Willett, P. Similarity Methods in Chemoinformatics. Annu. Rev. Inf. Sci. Technol. 2009, 43, 1-117.
(21) Varnek, A.; Baskin, I. Machine Learning Methods for Property Prediction in Chemoinformatics: Quo Vadis? J. Chem. Inf. Model. 2012, 52, 1413-1437.
(22) Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5-32.
(23) Svetnik, V.; Liaw, A.; Tong, C.; Culberson, J. C.; Sheridan, R. P.; Feuston, B. P. Random Forest: a Classification and Regression Tool for Compound Classification and QSAR Modeling. J. Chem. Inf. Comput. Sci. 2003, 43, 1947-1958.
(24) Avram, S.; Avram, S.; Crisan, L.; Pacureanu, L.; Kurunczi, L.; Bora, A. Self-Organizing Map Classification Model for the Prediction of MEK1 Inhibitors. Rev. Roum. Chim. 2015, 60, 167-173.
(25) UniProt Consortium. Reorganizing the Protein Space at the Universal Protein Resource (UniProt). Nucleic Acids Res. 2012, 40, D71-75.
(26) UniProtKB. http://www.uniprot.org/ (accessed Feb 4, 2015).
(27) Briem, H.; Günther, J. Classifying “Kinase Inhibitor-Likeness” by Using Machine-Learning Methods. ChemBioChem 2005, 6, 558-566.
(28) Darrouzet-Nardi, A.; Bowman, W. D. Hot Spots of Inorganic Nitrogen Availability in an Alpine-Subalpine Ecosystem, Colorado Front Range. Ecosystems 2011, 14, 848-863.
(29) Darrouzet-Nardi, A. Hotspots, R package version 1.0.2; http://CRAN.R-project.org/package=hotspots.


(30) R Development Core Team. R: A Language and Environment for Statistical Computing, version 3.1.1; The R Foundation for Statistical Computing: Vienna, Austria, 2014; http://www.R-project.org/.
(31) Lang, T.; Flachsenberg, F.; von Luxburg, U.; Rarey, M. Feasibility of Active Machine Learning for Multiclass Compound Classification. J. Chem. Inf. Model. 2016, 56, 12-20.
(32) ChemAxon JChem API package, version 6.1.0; ChemAxon, 2013; http://www.chemaxon.com.
(33) Bemis, G. W.; Murcko, M. A. The Properties of Known Drugs. 1. Molecular Frameworks. J. Med. Chem. 1996, 39, 2887-2893.
(34) Strobl, C.; Malley, J.; Tutz, G. An Introduction to Recursive Partitioning: Rationale, Application, and Characteristics of Classification and Regression Trees, Bagging, and Random Forests. Psychol. Methods 2009, 14, 323-348.
(35) Strobl, C.; Hothorn, T.; Zeileis, A. Party on! A New, Conditional Variable Importance Measure for Random Forests Available in the party Package. The R Journal 2009, 1, 14-17.
(36) Strobl, C.; Boulesteix, A. L.; Zeileis, A.; Hothorn, T. Bias in Random Forest Variable Importance Measures: Illustrations, Sources and a Solution. BMC Bioinf. 2007, 8, 25.
(37) Hothorn, T.; Buhlmann, P.; Dudoit, S.; Molinaro, A.; van der Laan, M. J. Survival Ensembles. Biostatistics 2006, 7, 355-373.
(38) Martin, T. M.; Harten, P.; Young, D. M.; Muratov, E. N.; Golbraikh, A.; Zhu, H.; Tropsha, A. Does Rational Selection of Training and Test Sets Improve the Outcome of QSAR Modeling? J. Chem. Inf. Model. 2012, 52, 2570-2578.


(39) Kuhn, M.; Contributions from Wing, J.; Weston, S.; Williams, A.; Keefer, C.; Engelhardt, A.; Cooper, T.; Mayer, Z.; Kenkel, B.; the R Core Team; Benesty, M.; Lescarbeau, R.; Ziem, A.; Scrucca, L. caret: Classification and Regression Training, R package version 6.0-41; 2015; http://CRAN.R-project.org/package=caret.
(40) Avram, S. I.; Crisan, L.; Bora, A.; Pacureanu, L. M.; Avram, S.; Kurunczi, L. Retrospective Group Fusion Similarity Search Based on eROCE Evaluation Metric. Bioorg. Med. Chem. 2013, 21, 1268-1278.
(41) Huang, J.; Ling, C. X. Using AUC and Accuracy in Evaluating Learning Algorithms. IEEE Trans. Knowl. Data Eng. 2005, 17, 299-310.
(42) Fawcett, T. An Introduction to ROC Analysis. Pattern Recognit. Lett. 2006, 27, 861-874.
(43) Avram, S.; Pacureanu, L. M.; Seclaman, E.; Bora, A.; Kurunczi, L. PLS-DA - Docking Optimized Combined Energetic Terms (PLSDA-DOCET) Protocol: a Brief Evaluation. J. Chem. Inf. Model. 2011, 51, 3169-3179.
(44) Jain, A. N. Bias, Reporting, and Sharing: Computational Evaluations of Docking Methods. J. Comput.-Aided Mol. Des. 2008, 22, 201-212.
(45) Hothorn, T.; Hornik, K.; van de Wiel, M. A.; Zeileis, A. A Lego System for Conditional Inference. Am. Stat. 2006, 60, 257-263.
(46) Hothorn, T.; Hornik, K.; van de Wiel, M. A.; Zeileis, A. Implementing a Class of Permutation Tests: The coin Package. J. Stat. Softw. 2008, 28, 1-23.
(47) He, H.; Garcia, E. A. Learning from Imbalanced Data. IEEE Trans. Knowl. Data Eng. 2009, 21, 1263-1284.


(48) Janitza, S.; Strobl, C.; Boulesteix, A. L. An AUC-Based Permutation Variable Importance Measure for Random Forests. BMC Bioinf. 2013, 14, 119.
(49) Anastassiadis, T.; Deacon, S. W.; Devarajan, K.; Ma, H.; Peterson, J. R. Comprehensive Assay of Kinase Catalytic Activity Reveals Features of Kinase Inhibitor Selectivity. Nat. Biotechnol. 2011, 29, 1039-1045.
(50) Sutherland, J. J.; Gao, C.; Cahya, S.; Vieth, M. What General Conclusions Can We Draw from Kinase Profiling Data Sets? Biochim. Biophys. Acta 2013, 1834, 1425-1433.
(51) Hou, D. X.; Kumamoto, T. Flavonoids as Protein Kinase Inhibitors for Cancer Chemoprevention: Direct Binding and Molecular Modeling. Antioxid. Redox Signaling 2010, 13, 691-719.
(52) Muchmore, D. B. Raloxifene: A Selective Estrogen Receptor Modulator (SERM) with Multiple Target System Effects. Oncologist 2000, 5, 388-392.


Predictive Models for Fast and Effective Profiling of Kinase Inhibitors
Alina Bora,1,2,# Sorin Avram,2,#,* Ionel Ciucanu,1 Marius Raica,3 Stefana Avram4

1 Department of Chemistry, West University of Timisoara, Faculty of Chemistry-Biology-Geography, 16 Pestalozzi Str., 300115, Timisoara, Romania
2 Department of Computational Chemistry, Institute of Chemistry Timisoara of Romanian Academy, 24 Mihai Viteazu Avenue, Timisoara, 300223, Romania
3 Department of Microscopic Morphology/Histology, Angiogenesis Research Center, University of Medicine and Pharmacy “Victor Babes” Timisoara, 2 Eftimie Murgu, Timisoara, 300041, Romania
4 Department Pharmacy II, Discipline of Pharmacognosy, University of Medicine and Pharmacy “Victor Babes” Timisoara, Faculty of Pharmacy, 2 Eftimie Murgu, Timisoara, 300041, Romania


[Figures 1–7 (images)]