This is an open access article published under an ACS AuthorChoice License, which permits copying and redistribution of the article or any adaptations for non-commercial purposes.

Article. Cite This: ACS Omega 2019, 4, 6883−6890.
Received: February 21, 2019. Accepted: April 4, 2019. Published: April 16, 2019.

http://pubs.acs.org/journal/acsodf

Prediction of Different Classes of Promiscuous and Nonpromiscuous Compounds Using Machine Learning and Nearest Neighbor Analysis

Thomas Blaschke, Filip Miljković, and Jürgen Bajorath*


Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Endenicher Allee 19c, D-53115 Bonn, Germany

ABSTRACT: The ability of compounds to interact with multiple targets is also referred to as promiscuity. Multitarget activity of pharmaceutically relevant compounds provides the foundation of polypharmacology. Promiscuity cliffs (PCs) were introduced as a data structure to identify and organize similar compounds with large differences in promiscuity. Many PCs were obtained on the basis of biological screening data or compound activity data from medicinal chemistry. In this work, PCs were used as a source of different classes of promiscuous and nonpromiscuous compounds with close structural relationships. Various machine learning models were built to distinguish between promiscuous and nonpromiscuous compounds, yielding overall successful predictions. Analysis of nearest neighbor relationships between training and test compounds was found to rival machine learning, indicating the presence of promiscuity-relevant structural features, as further supported by feature weighting and mapping. Thus, although the origins of promiscuity remain to be fully understood at the molecular level of detail, our study provides evidence for the presence of structure−promiscuity relationships that can be detected computationally and further explored.

1. INTRODUCTION

Multitarget activities of compounds1,2 provide the basis of polypharmacology,3−6 an emerging theme in drug discovery, leading to increasing interest in the use of multitarget drugs.7,8 Promiscuity as the basis of polypharmacology must be clearly distinguished from undesired experimental artifacts,9−13 which cause apparent promiscuous behavior of compounds and yield erroneous activity annotations.10−14 The study of "true" compound promiscuity is complicated by several factors. First and foremost, the molecular origins of multitarget activity of small molecules are still poorly understood, and no general guidelines are available that help to identify promiscuous compounds and distinguish them from others. Furthermore, the experimental conditions under which compound activity is assessed and assay confidence criteria may vary greatly, and activity annotations are often not transferable from one assay system to another.15 Moreover, a data-driven assessment of compound promiscuity is inevitably affected by different test frequencies and ensuing data incompleteness.16 For example, for active compounds reported in the medicinal chemistry literature, it typically remains unclear how extensively these compounds might have been tested against different targets. Taking such complications into consideration, "promiscuity cliffs" (PCs) were introduced as a data structure to organize compounds with different promiscuity and emphasize large differences in promiscuity between structural analogs.17,18 Formally, a PC is defined as a pair of structural analogs having a large difference in the promiscuity degree (PD; i.e., the number of targets a compound is active against).18 Surprisingly, large PC populations have been identified

among extensively tested screening molecules and active compounds from medicinal chemistry sources.19−21 These PCs have revealed numerous instances of structurally nearly identical compounds with large differences in promiscuity, bringing to light puzzling structure−promiscuity relationships and again raising questions concerning the origins of molecular promiscuity. Among compounds from medicinal chemistry, adenosine triphosphate (ATP) site-directed kinase inhibitors have become a paradigm for polypharmacological compounds, especially in oncology.22,23 Since the ATP binding site is largely conserved across the kinome, these compounds are expected to be promiscuous.24 However, even among ATP site-directed kinase inhibitors, rather different selectivity or promiscuity tendencies have been detected,25−28 making these compounds a prime test case for PC analysis. In fact, large numbers of PCs have also been identified for such kinase inhibitors.21 Previous studies focused on the prediction of "frequent hitters" and potential assay interference compounds.29−31 In this work, we have attempted to put the study of compound promiscuity and PCs onto a new level, going beyond case-by-case investigation and compound activity data analysis. Here, PCs were used as a source of closely related highly promiscuous and nonpromiscuous compounds. Known compounds with potential for false-positive interactions12,13 were excluded.


Figure 1. Exemplary PCs. (a) A PC formed by two extensively assayed screening compounds that were tested in a large number of shared assays; this cliff involves a consistently inactive compound. (b) A PC formed by a highly promiscuous (40 targets) and a nonpromiscuous (single-target activity) kinase inhibitor. In addition, a section of a kinase inhibitor PC network is displayed (blue nodes: inhibitors with PD ≥ 10; gray nodes: inhibitors with PD = 1; edges: pairwise PCs), which contains these compounds and the PCs they form. In (a,b), substituents that distinguish highly promiscuous and nonpromiscuous PC partners are colored red.

On the basis of qualifying PC compounds, different test systems were designed for machine learning. Models were built to distinguish between promiscuous and nonpromiscuous compounds at different levels, starting from PCs with known test frequency and assay overlap and extending to others for which only activity annotations were available. Hence, these predictions also assessed the potential influence of data sparseness on promiscuity differences between structural analogs. In the following, our analysis and findings are presented and discussed.

2. RESULTS AND DISCUSSION

2.1. Study Rationale. Differences in promiscuity between closely related compounds are difficult to reconcile even if only high-confidence activity data are taken into consideration.19 Why structural analogs frequently display large differences in promiscuity remains difficult to understand and might also be influenced by a variety of experimental factors or errors. We reasoned that if detected promiscuity differences predominantly originate from compound structure and properties or, in other words, if defined structure−promiscuity relationships exist, they should be detectable by machine learning, even if they are difficult to discern by expert analysis.

Then, it should be possible to exploit such relationships and build models to predict promiscuous and nonpromiscuous compounds. On the other hand, if promiscuity differences between related compounds were strongly influenced by experimental inconsistencies or data sparseness, no such structure−promiscuity relationships would be detectable by machine learning. Thus, in this case, there would be no sound basis for machine learning and models should not be predictive. Therefore, we selected analogs from PCs and designed different test systems for modeling using a variety of machine learning methods.

2.2. Promiscuity Cliffs and Compound Selection. High-confidence PCs excluding potential assay interference compounds20,21 were taken from two different sources: experimentally validated PCs20 from PubChem screening assays32 and PCs formed by kinase inhibitors21 derived on the basis of activity annotations from the medicinal chemistry literature combining several data sources.21 The majority of these compounds originated from ChEMBL.33


Figure 2. Selection strategies for training and test sets. Three different selection strategies (A−C), as discussed in the text, are schematically illustrated with the aid of a model PC network (according to Figure 1b). Selection strategy A is compound-based, whereas strategies B and C are PC-based. The strategies were applied to PCs of screening compounds and kinase inhibitors.

For consistency, the same PC definition was applied in both cases. As a similarity criterion for cliff partners, the formation of a transformation size-restricted matched molecular pair34 was required, which limited qualifying compounds to typical structural analogs.18 In addition, a promiscuity difference criterion of ΔPD ≥ 9 was applied. For machine learning, only PCs were selected that involved nonpromiscuous active compounds (kinase inhibitors, PD = 1) or compounds consistently inactive in all assays in which they were tested (screening molecules, PD = 1 or PD = 0). Accordingly, highly promiscuous compounds were required to have a PD ≥ 10 (activity against 10 or more targets). For kinase inhibitors, only human kinases were considered as targets.

Figure 1a shows screening compounds forming an exemplary experimentally confirmed PC. These molecules were tested in 493 and 469 assays, respectively, including 466 shared assays. The compound on the left was found to be active against 20 different targets, whereas the analog on the right was consistently inactive. In addition, Figure 1b shows compounds from an exemplary kinase inhibitor PC, together with a section of the global kinase inhibitor PC network from which this PC originated. The PC network was built from all PCs formed by kinase inhibitors.21 Compounds were represented as nodes, and edges indicated pairwise PC relationships. In the PC network, the inhibitors forming the PC shown in Figure 1b were part of a pathway consisting of alternating highly promiscuous (blue nodes) and nonpromiscuous (gray) inhibitors. The compound on the left was active against 40 different kinases and represented a "promiscuity hub" in the network, which formed multiple PCs with nonpromiscuous analogs such as the compound on the right (PD = 1). For kinase inhibitor PCs from medicinal chemistry, no test frequency information was available. Hence, in this case, data sparseness might play a role because PD values might be underestimated if inhibitors were not extensively tested.

Why was compound selection for machine learning based on PCs, rather than on individually selected highly promiscuous and nonpromiscuous or inactive compounds? Importantly, selection on the basis of PCs ensured that for each highly promiscuous compound, one or more nonpromiscuous analogs were available. Therefore, predictions could not possibly be determined by structurally distinct sets of promiscuous and nonpromiscuous compounds. Moreover, for training and testing, PC-based compound selection strategies were applied to generate different test systems, which further challenged predictions (as discussed in detail in the next section).
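To make these selection criteria concrete, the following minimal Python sketch (not the authors' code) enumerates candidate PCs from PD-annotated compounds with RDKit. As a simplifying assumption, analog relationships are approximated here by an ECFP4 Tanimoto similarity threshold rather than the transformation size-restricted matched molecular pairs used in the study; the threshold value is illustrative.

```python
from itertools import combinations

from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.DataStructs import TanimotoSimilarity

def promiscuity_cliffs(pd_by_smiles, sim_threshold=0.7,
                       min_promiscuous_pd=10, max_nonpromiscuous_pd=1,
                       min_delta_pd=9):
    """Enumerate candidate PCs from {SMILES: PD} annotations."""
    # ECFP4 corresponds to a Morgan fingerprint with radius 2 (bond diameter 4).
    fps = {smi: AllChem.GetMorganFingerprintAsBitVect(
               Chem.MolFromSmiles(smi), radius=2, nBits=2048)
           for smi in pd_by_smiles}
    cliffs = []
    for smi_a, smi_b in combinations(pd_by_smiles, 2):
        hi, lo = ((smi_a, smi_b) if pd_by_smiles[smi_a] >= pd_by_smiles[smi_b]
                  else (smi_b, smi_a))
        # PC criteria from the text: PD >= 10 vs PD <= 1 and delta-PD >= 9,
        # plus a structural analog relationship (here: Tc threshold).
        if (pd_by_smiles[hi] >= min_promiscuous_pd
                and pd_by_smiles[lo] <= max_nonpromiscuous_pd
                and pd_by_smiles[hi] - pd_by_smiles[lo] >= min_delta_pd
                and TanimotoSimilarity(fps[hi], fps[lo]) >= sim_threshold):
            cliffs.append((hi, lo))
    return cliffs
```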

From PubChem compounds tested in at least 100 shared primary assays, 869 confirmed PCs were selected on the basis of the criteria detailed above, which contained a total of 920 compounds including 226 promiscuous ones with PD values ranging from 10 to 59. In addition, 5615 kinase inhibitor PCs were obtained that involved a total of 4187 compounds including 599 promiscuous inhibitors with PD values ranging from 10 to 295. As a control, we also randomly assembled data sets of promiscuous and nonpromiscuous compounds. Promiscuous (PD ≥ 10) and nonpromiscuous (PD = 1) kinase inhibitors were randomly selected without the requirement of any structural relationship. For screening hits, we randomly selected compounds tested in at least 100 primary assays with PD ≥ 10 and PD = 0 as promiscuous and consistently inactive compounds, respectively. On the basis of our modeling strategy, we expected that randomly selected promiscuous and nonpromiscuous compounds would provide easier test cases for machine learning than PC-based compound sets with close structural relationships between promiscuous and nonpromiscuous compounds.

2.3. Compound Test Systems. From selected PC compounds, balanced data sets were derived for learning and testing by applying the different selection strategies illustrated in Figure 2. A schematic PC network is shown in which highly promiscuous and nonpromiscuous compounds are represented by blue and gray nodes, respectively. First, highly promiscuous and nonpromiscuous training and test compounds were randomly selected, irrespective of PC membership (strategy A). Second, PC-based strategies (B and C) were applied. Following strategy B, PCs were randomly selected for training sets and test sets and divided into subsets of promiscuous and nonpromiscuous compounds. Hence, in this case, training and test sets contained structural analogs with different class labels. Following strategy C, PCs were randomly selected for training sets (in analogy to B); however, test compounds were then selected as nearest neighbors (NENEs) of training instances having different class labels. Thus, strategy C-based test sets also contained analogs of training compounds with different class labels. Accordingly, predictions for strategy B- and C-based data sets were anticipated to be more difficult than for A-based sets. Composition details of training and test sets are reported in Materials and Methods.
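The three strategies can be summarized in a short sketch (illustrative only, assuming `cliffs` is a list of (promiscuous, nonpromiscuous) SMILES pairs as produced above; set sizes follow Materials and Methods):

```python
import random

def split_strategy_a(cliffs, n_train, n_test):
    """Strategy A: random compound selection, irrespective of PC membership."""
    promiscuous = list({hi for hi, lo in cliffs})
    nonpromiscuous = list({lo for hi, lo in cliffs})
    random.shuffle(promiscuous)
    random.shuffle(nonpromiscuous)
    train = promiscuous[:n_train] + nonpromiscuous[:n_train]
    test = (promiscuous[n_train:n_train + n_test]
            + nonpromiscuous[n_train:n_train + n_test])
    return train, test

def split_strategy_b(cliffs, n_train, n_test):
    """Strategy B: random selection of whole PCs for training and test sets;
    analogs with different class labels occur in both sets."""
    pcs = random.sample(cliffs, n_train + n_test)
    train = [cpd for pc in pcs[:n_train] for cpd in pc]
    test = [cpd for pc in pcs[n_train:] for cpd in pc]
    return train, test

def split_strategy_c(cliffs, n_train):
    """Strategy C: PCs are drawn for training; test compounds are cliff
    partners (nearest neighbors with opposite class labels) of training
    compounds that were not themselves chosen for training."""
    train = {cpd for pc in random.sample(cliffs, n_train) for cpd in pc}
    test = {partner
            for hi, lo in cliffs
            for cpd, partner in ((hi, lo), (lo, hi))
            if cpd in train and partner not in train}
    return sorted(train), sorted(test)
```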


2.4. Computational Models. For training and testing on the basis of strategies A−C, classification models were initially built using a variety of machine learning methods including support vector machine (SVM), random forest (RF), deep neural network (DNN), and graph convolutional network (GCN) algorithms. In addition, a NENE classifier was used. Methodological details are provided in Materials and Methods. Predictive performance was assessed by calculating accuracy (ACC), Matthews correlation coefficient (MCC), and F1 values (as defined in Materials and Methods). ACC and F1 values range from 0 (completely incorrect classification) to 1 (fully accurate) and MCC values from −1 (incorrect) to 1 (accurate). Values were averaged over independent trials.

2.5. Predictions. The models were used to systematically predict highly promiscuous and nonpromiscuous screening compounds and kinase inhibitors using training and test sets of different composition. The results are summarized in Table 1.

Table 1. Performance of PC-Based Models^a

Screening compounds

metric  strategy   NENE   SVM    RF     DNN    GCN
ACC     A          0.64   0.72   0.71   0.66   0.69
ACC     B          0.68   0.69   0.65   0.64   0.70
ACC     C          0.62   0.67   0.66   0.60   0.62
MCC     A          0.27   0.43   0.41   0.31   0.38
MCC     B          0.35   0.38   0.31   0.27   0.40
MCC     C          0.24   0.34   0.32   0.19   0.25
F1      A          0.63   0.71   0.69   0.65   0.69
F1      B          0.69   0.70   0.65   0.61   0.68
F1      C          0.61   0.66   0.65   0.57   0.57

Kinase inhibitors

metric  strategy   NENE   SVM    RF     DNN    GCN
ACC     A          0.69   0.73   0.72   0.70   0.69
ACC     B          0.55   0.57   0.54   0.54   0.56
ACC     C          0.52   0.55   0.54   0.53   0.56
MCC     A          0.39   0.47   0.44   0.40   0.38
MCC     B          0.10   0.15   0.09   0.09   0.12
MCC     C          0.04   0.11   0.08   0.07   0.11
F1      A          0.72   0.73   0.71   0.69   0.66
F1      B          0.59   0.58   0.50   0.54   0.48
F1      C          0.59   0.61   0.56   0.57   0.52

^a Reported are the mean ACC, MCC, and F1 values over five independent trials per data set strategy using NENE, SVM, RF, DNN, and GCN models.

The results were surprising from several points of view. Initially, predictions of screening compounds from experimentally confirmed PCs were evaluated. All models were found to be predictive, with an overall accuracy approaching 70%, which we considered an encouraging finding given the rather challenging prediction conditions discussed above. Although the classification calculations were not highly accurate, there was a clear and consistent tendency to distinguish between promiscuous and nonpromiscuous compounds. However, differences between alternative machine learning approaches were only small, and there was no performance increase for the complex DNN and GCN models over RF and SVM. More surprisingly, the simple NENE classifier consistently approached or met the performance of the machine learning models, indicating that structural relationships between highly promiscuous or nonpromiscuous compounds were of critical relevance for the predictions. All MCC values were positive, and there were no systematic differences between predictions on compound sets of different design following strategy A, B, or C.

Next, predictions for kinase inhibitors were evaluated (Table 1). In contrast to screening compounds, no test frequencies were available for kinase inhibitors. Thus, in this case, it could not be excluded that data sparseness might lead to apparent PD values that were too low, possibly resulting in incorrect PC assignments, which would negatively affect the predictions. Hence, if promiscuity were systematically underestimated, lower prediction accuracy would be expected. However, although some differences were observed, the results for kinase inhibitors were overall readily comparable to those obtained for screening compounds. Again, performance differences between machine learning models were only small, and their performance was mostly met by the NENE classifier, also emphasizing the importance of structural similarity for the predictions. Different from screening compounds, however, predictions for kinase inhibitors were notably influenced by data set design strategies. For kinase inhibitors, the best predictions were obtained on the basis of random compound selection from PCs according to strategy A, regardless of the method, whereas PC-based selection (strategies B and C) yielded lower accuracy based on ACC and F1 values. In addition, MCC values were consistently lower for strategies B and C than for A, while differences between B and C were only small. On a relative scale, kinase inhibitor predictions on the basis of strategy A-based compound sets were slightly superior to screening compounds, whereas predictions on strategy B- and C-based sets reached higher accuracy for screening compounds. These observations were attributable to the presence of higher structural similarity between kinase inhibitors than screening compounds, which further increased the prediction challenges associated with strategy B- and C-based data sets. Importantly, for both promiscuous and nonpromiscuous screening compounds and kinase inhibitors, consistently predictive models were obtained. Furthermore, results of control calculations with sets of randomly selected promiscuous and nonpromiscuous compounds are reported in Table 2.

Table 2. Performance of Models for Randomly Selected Compounds^a

Screening compounds

metric   NENE   SVM    RF     DNN    GCN
ACC      0.70   0.77   0.76   0.73   0.78
MCC      0.40   0.54   0.53   0.45   0.55
F1       0.70   0.77   0.76   0.73   0.78

Kinase inhibitors

metric   NENE   SVM    RF     DNN    GCN
ACC      0.75   0.76   0.76   0.73   0.73
MCC      0.45   0.52   0.52   0.45   0.47
F1       0.73   0.76   0.76   0.73   0.74

^a Reported are the mean ACC, MCC, and F1 values over five independent trials per data set using NENE, SVM, RF, DNN, and GCN models.

As expected, all models reached higher performance levels than the PC-based models. For screening compounds, more than 70% accuracy was achieved, corresponding to an improvement in prediction accuracy of at least 5% compared to selection strategy A.


The complex GCN model showed considerable performance improvement over the NENE classifier and the DNN, but only a small improvement over SVM and RF. For all models, the MCC values were above 0.4; for the SVM, RF, and GCN models, the values were above 0.5. For kinase inhibitors, all models also yielded high accuracy of at least 73%. In this case, the complex DNN and GCN models performed worse than the simple NENE classifier and the SVM and RF models. However, as observed for screening compounds, the accuracy of all models was further improved compared to the PC-based models, consistent with our expectations.

2.6. Feature Relevance. We also analyzed the influence of individual fingerprint features on the predictions using an SVM-based feature weighting approach (see Materials and Methods). For strategy A-based SVM predictions of kinase inhibitors, which were overall most accurate, features were weighted according to their contributions to correct predictions of promiscuous and nonpromiscuous inhibitors and ranked on the basis of cumulative feature weights. Fingerprint features were clearly differentiated by weighting. For example, the 30 top-ranked features for correct predictions of promiscuous and nonpromiscuous kinase inhibitors covered a wide weight range from −27.7 ± 5.5 to 37.9 ± 10.6. Across the five independent trials, the 30 top-ranked features comprised 64 unique features contributing to the correct prediction of promiscuous inhibitors and 74 unique features contributing to the prediction of nonpromiscuous compounds. Thus, different features determined the prediction of promiscuous and nonpromiscuous inhibitors. Furthermore, among top-ranked features from independent trials, four consensus features were identified that consistently contributed to the correct prediction of promiscuous inhibitors and four other consensus features that consistently contributed to the prediction of nonpromiscuous compounds. Consensus features were mapped onto exemplary promiscuous and nonpromiscuous kinase inhibitors, as shown in Figure 3. These features formed coherent substructures that were distinct in promiscuous and nonpromiscuous inhibitors, which further rationalized the successful predictions at the structural level.

2.7. Concluding Remarks. In this study, we have attempted to systematically distinguish between promiscuous and nonpromiscuous compounds. We have reasoned that machine learning should be capable of correctly classifying these compounds only if detectable structure−promiscuity relationships existed. So far, such relationships have not been elucidated on the basis of expert analysis. Meaningful computational classification should only be feasible if apparent differences in promiscuity were not largely due to varying test frequencies and data sparseness. For our analysis, we have made use of the PC data structure to generate different compound test systems that contained many analog relationships between promiscuous and nonpromiscuous compounds. These structural relationships further challenged machine learning since they needed to be differentiated from other structural patterns that might influence or determine promiscuity. Two different compound classes were investigated, including extensively assayed screening molecules for which experimental test frequencies were available and kinase inhibitors from medicinal chemistry, whose activity annotations were likely to be influenced by data sparseness.
Different machine learning methods were applied to distinguish between highly promiscuous and nonpromiscuous compounds, yielding similar (moderate to good) prediction accuracy for both compound classes and revealing no detectable influence of potential data sparseness. Unexpectedly, simple NENE classification essentially met the performance of the various machine learning models, which strongly indicated that machine learning was dominated by nearest neighbor relationships between promiscuous and/or nonpromiscuous compounds, despite the presence of analogs with different class labels. Taken together, our findings indicated that structure−promiscuity relationships existed that could be explored computationally and translated into reasonable predictions. Control calculations using randomly selected promiscuous and nonpromiscuous compounds achieved further increased prediction accuracy compared to PC-based modeling, consistent with our study concept and expectations. Predictions on screening compounds with available positive and negative test data were important for establishing proof-of-concept, together with the prediction of promiscuous versus nonpromiscuous kinase inhibitors. For practical applications, the kinase inhibitor-based model should be particularly relevant. Feature weighting for SVM predictions provided further evidence that different features determined class label predictions and identified consensus features for promiscuous and nonpromiscuous compounds. Mapping of these features identified distinct substructures in test compounds that gave rise to correct predictions. Thus, our computational analysis has provided evidence for the presence of structural characteristics that influence or determine compound promiscuity. Furthermore, it has provided a basis for subsequent investigations to further explore promiscuity-conferring structural patterns from a medicinal chemistry perspective.

Figure 3. Feature mapping. Shown are two exemplary highly promiscuous (left) and nonpromiscuous (right) kinase inhibitors that were correctly predicted using SVM models. Highlighted substructures were identified by mapping of fingerprint features that consistently contributed to correct predictions in different trials.

3. MATERIALS AND METHODS

3.1. Data Set Composition. The different selection strategies for training and test sets (described in Section 2.3) were applied to assemble the following balanced training and test sets. In each case, five training and test sets were independently generated.


3.1.1. Strategy A. For training and testing, 250 promiscuous and 250 nonpromiscuous kinase inhibitors were randomly selected. Given the limited number of highly promiscuous screening hits, training sets with 165 promiscuous and 165 nonpromiscuous and test sets with 40 promiscuous and 40 nonpromiscuous screening compounds were generated.

3.1.2. Strategies B and C. For kinase inhibitors and screening compounds, 250 and 165 PCs, respectively, were randomly selected for training and divided into promiscuous and nonpromiscuous compounds (strategies B and C). For testing, 250 PCs (kinase inhibitors) and 40 PCs (screening compounds) were randomly selected and divided (strategy B). Alternatively, 250 and 40 promiscuous and nonpromiscuous kinase inhibitors and screening compounds, respectively, were selected that formed a PC with a compound chosen for training (strategy C).

3.1.3. Randomly Selected Compounds. For kinase inhibitors and screening compounds, 1000 promiscuous and 1000 nonpromiscuous compounds were randomly selected and divided into balanced data sets for training and testing (control calculations).

3.2. Molecular Representations. Compounds were represented using the extended connectivity fingerprint with bond diameter 4 (ECFP4).35 ECFP4 is a feature set fingerprint that enumerates layered atom environments and encodes them as integers using a hashing function. ECFP4 produces molecule-dependent feature sets of variable size but can be "folded" to generate a fingerprint with a constant number of bits. A folded version of ECFP4 comprising 2048 bits was obtained by a modulo operation. The RDKit toolkit36 and in-house Python scripts were used to generate all fingerprints. Furthermore, the graph-based GCN representation was evaluated as an alternative to ECFP4. GCN is a learnable representation inspired by the Morgan circular fingerprint that represents compounds as undirected graphs and employs convolutional layers to create graph-based features.37−39 The DeepChem (version 2.1.0)40 implementation of the GCN representation was used. The derivation of GCN-based models is further discussed below. Fingerprint similarity was quantified by calculating the Tanimoto coefficient (Tc).41
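As an illustration (a sketch, not the authors' in-house scripts), a folded 2048-bit ECFP4 and the Tc can be computed with RDKit as follows; a Morgan fingerprint with radius 2 corresponds to bond diameter 4:

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

def ecfp4(smiles, n_bits=2048):
    """Folded ECFP4: Morgan fingerprint with radius 2 (bond diameter 4);
    hashed feature identifiers are folded into a fixed-length bit vector."""
    mol = Chem.MolFromSmiles(smiles)
    return AllChem.GetMorganFingerprintAsBitVect(mol, radius=2, nBits=n_bits)

# Tc between two compounds (hypothetical example structures).
fp_a, fp_b = ecfp4("c1ccccc1O"), ecfp4("c1ccccc1N")
print(DataStructs.TanimotoSimilarity(fp_a, fp_b))
```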

3.3. Classification Models. NENE classification, two state-of-the-art machine learning methods, and two types of neural networks were used for classification. For generating predictive models, each training compound was represented as a feature vector x ∈ X and associated with a class label y ∈ {0,1} distinguishing nonpromiscuous and promiscuous compounds.

3.3.1. Nearest Neighbor Classifier. The NENE classifier is a nonparametric classification model. During training, the algorithm only stores the feature vectors and class labels of the training set. A test compound was classified by calculating all Tc values for the training set and returning the class label of the training compound with the largest Tc.

3.3.2. Support Vector Machine. SVM is a supervised learning algorithm aiming to identify a hyperplane H that best separates two classes using training data projected into the feature space X.42 This hyperplane is defined by a weight vector w and a bias b such that H = {x | ⟨w, x⟩ + b = 0} and maximizes the margin between the classes. For model generalization, slack variables are added to permit errors of training instances falling within the margin or on the incorrect side of H. The trade-off between training errors and margin size is controlled by the regularization hyperparameter C, which was optimized by 10-fold cross-validation using values of 1, 10, 50, 100, 200, 400, 500, 750, and 1000. As a central part of SVM modeling, training data are projected into a higher-dimensional space if linear separation is not possible in a given feature space X. The projection is facilitated through the use of kernel functions replacing the standard scalar product, the so-called "kernel trick",42 which circumvents an explicit mapping into the higher-dimensional space. The Tanimoto kernel43 was used. SVM models were generated using scikit-learn.44

3.3.3. Random Forest. RF consists of an ensemble of decision trees, each built from a random sample of the training data drawn with replacement,45 known as bootstrapping.46 For the construction of individual trees, a random subset of features is used during node splitting.47 The number of trees per ensemble was optimized by 10-fold cross-validation using values of 10, 100, 250, and 500. Furthermore, the number of randomly selected features available at each split (max_features) was set to the square root of the number of ECFP4 features, and the minimum number of samples required to reach a leaf node (min_samples_leaf) was set to 1. RF models were built using scikit-learn.
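The NENE classifier and a Tanimoto-kernel SVM can be sketched as follows (assuming 0/1 NumPy fingerprint matrices; hyperparameter values are illustrative, and this is not the authors' code):

```python
import numpy as np
from sklearn.svm import SVC

def tanimoto_kernel(X, Y):
    """Tanimoto kernel matrix for binary fingerprint arrays X (n x d) and
    Y (m x d): <x, y> / (<x, x> + <y, y> - <x, y>)."""
    dot = X @ Y.T
    return dot / (X.sum(axis=1)[:, None] + Y.sum(axis=1)[None, :] - dot)

def predict_nene(X_train, y_train, X_test):
    """NENE: return the class label of the training compound with the
    largest Tc for each test compound."""
    return y_train[np.argmax(tanimoto_kernel(X_test, X_train), axis=1)]

def predict_svm(X_train, y_train, X_test, C=100):
    """SVM with the Tanimoto kernel passed as a callable to scikit-learn;
    in the study, C was optimized by 10-fold cross-validation."""
    model = SVC(kernel=tanimoto_kernel, C=C)
    model.fit(X_train, y_train)
    return model.predict(X_test)
```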

3.3.4. Feedforward Deep Neural Network. A DNN derives a function that maps an input value x to a class y, y = f(x;w), and learns the values of the parameters w to achieve the best approximation.48 The DNN architecture consists of different layers of neurons: an input layer, multiple hidden layers, and an output layer.49 Each hidden or output neuron applies weights to its inputs, sums the weighted inputs, and passes the sum through a nonlinear activation function:

y_k = f( Σ_j w_kj x_j + b_k )

Here, y_k is the output of neuron k, f is the activation function, x_j is an input variable (the activation of neuron j in the previous layer), w_kj are the weights connecting neuron k with x_j, and b_k is the bias. The summation runs over all neurons j with connections to k.49 Accordingly, each input value is modified by a unique set of weights and biases. During the training phase, weights and biases are modified to obtain the correct output y, which is accomplished by following the gradient of the cost function (gradient descent), efficiently calculated using backpropagation.49 For training, data subsets (batches) are used, and weights and biases are updated accordingly. Implementations were based on PyTorch version 0.4.1.50 DNN hyperparameters were optimized by internal validation using 80 versus 20% data splits.51−53 For the learning rate (LR), values of 0.01 and 0.001 were evaluated; the number of epochs was set to 10 or 50; and for the drop-out rate (DO), values of 0, 25, 50, and 80% were tested. Investigated network architectures (numbers of output features in hidden layers) included [100,100], [250,250], [250,500], [500,250], [100,500], [500,100], [500,250,100], [100,250,500], and [250,100,250]. Thus, pyramidal, rectangular, and autoencoder-like architectures were considered during hyperparameter optimization. The Adam optimization algorithm54 was chosen as the optimization function, the rectified linear unit55 was used as the activation function, and the batch size was set to 50.
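A minimal PyTorch sketch of such a feedforward network (one configuration from the search space above; illustrative, not the authors' implementation):

```python
import torch
import torch.nn as nn

class FeedforwardDNN(nn.Module):
    """Feedforward network for binary classification of fingerprints,
    here with a pyramidal [500, 250, 100] hidden architecture and dropout."""
    def __init__(self, n_features=2048, hidden=(500, 250, 100), dropout=0.25):
        super().__init__()
        layers, d_in = [], n_features
        for d_out in hidden:
            layers += [nn.Linear(d_in, d_out), nn.ReLU(), nn.Dropout(dropout)]
            d_in = d_out
        layers.append(nn.Linear(d_in, 2))  # two classes: nonpromiscuous/promiscuous
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)  # logits for the two classes

model = FeedforwardDNN()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()

def train_step(x_batch, y_batch):
    """One mini-batch update (batch size 50 in the study)."""
    optimizer.zero_grad()
    loss = criterion(model(x_batch), y_batch)
    loss.backward()   # backpropagation
    optimizer.step()  # gradient-based update (Adam)
    return loss.item()
```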




3.3.5. Graph-Convolutional Neural Networks. The GCN representation described above was also used for DNN modeling. Initially, a set of atom features and a neighbor list are generated for each atom, and the sum of the neighbors' features is calculated to provide neighbor information. Learnable parameters include the weight matrices and biases used for subsequent transformations; the same weight matrices and bias vectors are used within a given layer. After updating the atom features, a pooling layer employs an activation function to generate a new set of feature values, which provide the output vector of one layer. This procedure is carried out iteratively, and all outputs are summed to obtain the final representation of the compound,56 which provides the input of a fully connected DNN. Hence, in this case, feature extraction and model building are combined. GCN models were generated with DeepChem version 2.1.0.40 Instead of summing the output of several layers, a dense layer followed by a cumulative graph layer was used; the latter layer sums the feature vectors of all atoms to obtain the final representation of a compound. For GCN models, internal validation (80 versus 20% splits) was also applied, as for the other DNNs. The batch size and number of epochs were set to 50, and the LR was set to 0.001. Parameters were optimized using two-fold cross-validation. DO values of 0, 20, and 50% were tested. In addition, the following numbers of output features in hidden graph convolutional layers were evaluated: [64,64], [128,128], [64,64,64], and [128,128,128]. For the dense layer dimension, feature values of 64 and 128 were tested.
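A GCN classification model along these lines can be sketched with DeepChem (shown with the current `GraphConvModel` API; class names in version 2.1.0, which was used in the study, differ slightly, and the SMILES and labels below are placeholders):

```python
import numpy as np
import deepchem as dc

# Hypothetical inputs: SMILES strings and binary class labels.
smiles = ["c1ccccc1O", "c1ccccc1N"]
labels = np.array([1, 0])

# Graph featurization: per-atom feature vectors plus neighbor lists.
featurizer = dc.feat.ConvMolFeaturizer()
X = featurizer.featurize(smiles)
dataset = dc.data.NumpyDataset(X=X, y=labels)

# Two graph convolutional layers with 64 output features each, followed
# by a dense layer (one of the evaluated configurations).
model = dc.models.GraphConvModel(
    n_tasks=1,
    graph_conv_layers=[64, 64],
    dense_layer_size=64,
    dropout=0.2,
    mode="classification",
    batch_size=50,
)
model.fit(dataset, nb_epoch=50)
```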

*E-mail: [email protected]. Phone: 49-228-7369-100 (J.B.). ORCID

3.5. Feature Weighting and Mapping. For SVM, a feature weighting method57 was introduced to evaluate the contributions of individual fingerprint features to predictions. Weights assigned to individual features correspond to the coefficients of the primal optimization. For the Tanimoto kernel (and other nonlinear kernels), feature weights cannot be calculated directly because an explicit mapping to the high-dimensional space is not available. However, the Tanimoto kernel can be expressed as a sum of feature contributions by using a normalization factor to obtain a constant denominator for each individual support vector. Hence, the contribution fc(x, d) of feature d to an individual SVM prediction is determined by the following equation:

fc_Tanimoto(x, d) = Σ_i y_i λ_i x_id x_d / (⟨x_i, x_i⟩ + ⟨x, x⟩ − ⟨x_i, x⟩)

where the sum runs over all support vectors i, x is the test instance, x_i is a support vector, and y_i and λ_i are the support vector coefficients of the dual solution. ECFP4 features were ranked on the basis of their weights, and the top 30 features making the largest contributions to correct predictions of highly promiscuous and nonpromiscuous compounds, respectively, were further analyzed, transformed into SMARTS strings, and mapped onto correctly predicted test compounds.
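A sketch of this decomposition for binary fingerprints (assuming a fitted scikit-learn SVC with the Tanimoto kernel, whose dual_coef_ attribute stores the products y_i λ_i; not the authors' implementation):

```python
import numpy as np

def feature_contributions(x, support_vectors, dual_coef):
    """Contribution fc(x, d) of each feature d to a Tanimoto-kernel SVM
    prediction for a binary test fingerprint x of shape (d,).

    support_vectors: binary matrix of shape (n_sv, d), e.g., svm.support_vectors_
    dual_coef:       y_i * lambda_i per support vector, e.g., svm.dual_coef_[0]
    """
    # Per-support-vector denominator: <x_i, x_i> + <x, x> - <x_i, x>
    denom = support_vectors.sum(axis=1) + x.sum() - support_vectors @ x
    # fc(x, d) = sum_i y_i lambda_i x_id x_d / denom_i; zero for features
    # absent from x. Summing over d recovers the kernel part of the
    # decision function: sum_i y_i lambda_i K_Tanimoto(x_i, x).
    return ((dual_coef / denom) @ support_vectors) * x
```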

AUTHOR INFORMATION

Corresponding Author
*E-mail: [email protected]. Phone: 49-228-7369-100 (J.B.).

ORCID
Jürgen Bajorath: 0000-0002-0557-5714

Author Contributions
The study was carried out, and the manuscript was written, with contributions of all authors. All authors have approved the final version of the manuscript.

Notes
The authors declare no competing financial interest.



ACKNOWLEDGMENTS

The project leading to this report has received funding (for T.B.) from the European Union's Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreement no. 676434, "Big Data in Chemistry" ("BIGCHEM", http://bigchem.eu). The article reflects only the authors' view, and neither the European Commission nor the Research Executive Agency (REA) is responsible for any use that may be made of the information it contains. We thank OpenEye Scientific Software, Inc., for providing a free academic license of the OpenEye toolkit.


ABBREVIATIONS

TP, true positives; TN, true negatives; FP, false positives; FN, false negatives

REFERENCES

(1) Hu, Y.; Bajorath, J. Compound promiscuity: what can we learn from current data? Drug Discovery Today 2013, 18, 644−650.
(2) Jasial, S.; Hu, Y.; Bajorath, J. Determining the Degree of Promiscuity of Extensively Assayed Compounds. PLoS One 2016, 11, No. e0153873.
(3) Hopkins, A. L. Network Pharmacology: The Next Paradigm in Drug Discovery. Nat. Chem. Biol. 2008, 4, 682−690.
(4) Anighoro, A.; Bajorath, J.; Rastelli, G. Polypharmacology: Challenges and Opportunities in Drug Discovery. J. Med. Chem. 2014, 57, 7874−7887.
(5) Bolognesi, M. L. Polypharmacology in a Single Drug: Multitarget Drugs. Curr. Med. Chem. 2013, 20, 1639−1645.
(6) Bolognesi, M. L.; Cavalli, A. Multitarget Drug Discovery and Polypharmacology. ChemMedChem 2016, 11, 1190−1192.
(7) Zimmermann, G. R.; Lehár, J.; Keith, C. T. Multi-Target Therapeutics: When the Whole is Greater than the Sum of the Parts. Drug Discovery Today 2007, 12, 34−42.
(8) Rosini, M. Polypharmacology: The Rise of Multitarget Drugs over Combination Therapies. Future Med. Chem. 2014, 6, 485−487.
(9) Gilberg, E.; Jasial, S.; Stumpfe, D.; Dimova, D.; Bajorath, J. Highly Promiscuous Small Molecules from Biological Screening Assays Include Many Pan-Assay Interference Compounds but also Candidates for Polypharmacology. J. Med. Chem. 2016, 59, 10285−10290.
(10) McGovern, S. L.; Caselli, E.; Grigorieff, N.; Shoichet, B. K. A Common Mechanism Underlying Promiscuous Inhibitors from Virtual and High-Throughput Screening. J. Med. Chem. 2002, 45, 1712−1722.
(11) Irwin, J. J.; Duan, D.; Torosyan, H.; Doak, A. K.; Ziebart, K. T.; Sterling, T.; Tumanian, G.; Shoichet, B. K. An Aggregation Advisor for Ligand Discovery. J. Med. Chem. 2015, 58, 7076−7087.
(12) Baell, J. B.; Holloway, G. A. New Substructure Filters for Removal of Pan Assay Interference Compounds (PAINS) from Screening Libraries and for Their Exclusion in Bioassays. J. Med. Chem. 2010, 53, 2719−2740.



(13) Baell, J.; Walters, M. A. Chemistry: Chemical Con Artists Foil Drug Discovery. Nature 2014, 513, 481−483.
(14) Aldrich, C.; Bertozzi, C.; Georg, G. I.; Kiessling, L.; Lindsley, C.; Liotta, D.; Merz, K. M., Jr.; Schepartz, A.; Wang, S. The Ecstasy and Agony of Assay Interference Compounds. J. Chem. Inf. Model. 2017, 57, 387−390.
(15) Stumpfe, D.; Tinivella, A.; Rastelli, G.; Bajorath, J. Promiscuity of Inhibitors of Human Protein Kinases at Varying Data Confidence Levels and Test Frequencies. RSC Adv. 2017, 7, 41265−41271.
(16) Mestres, J.; Gregori-Puigjané, E.; Valverde, S.; Solé, R. V. Data completeness-the Achilles heel of drug-target networks. Nat. Biotechnol. 2008, 26, 983−984.
(17) Dimova, D.; Hu, Y.; Bajorath, J. Matched Molecular Pair Analysis of Small Molecule Microarray Data Identifies Promiscuity Cliffs and Reveals Molecular Origins of Extreme Compound Promiscuity. J. Med. Chem. 2012, 55, 10220−10228.
(18) Dimova, D.; Bajorath, J. Rationalizing Promiscuity Cliffs. ChemMedChem 2018, 13, 490−494.
(19) Dimova, D.; Gilberg, E.; Bajorath, J. Identification and Analysis of Promiscuity Cliffs Formed by Bioactive Compounds and Experimental Implications. RSC Adv. 2017, 7, 58−66.
(20) Hu, Y.; Jasial, S.; Gilberg, E.; Bajorath, J. Structure-Promiscuity Relationship Puzzles-Extensively Assayed Analogs with Large Differences in Target Annotations. AAPS J. 2017, 19, 856−864.
(21) Miljković, F.; Bajorath, J. Computational Analysis of Kinase Inhibitors Identifies Promiscuity Cliffs across the Human Kinome. ACS Omega 2018, 3, 17295−17308.
(22) Knight, Z. A.; Lin, H.; Shokat, K. M. Targeting the Cancer Kinome through Polypharmacology. Nat. Rev. Cancer 2010, 10, 130−137.
(23) Gross, S.; Rahal, R.; Stransky, N.; Lengauer, C.; Hoeflich, K. P. Targeting Cancer with Kinase Inhibitors. J. Clin. Invest. 2015, 125, 1780−1789.
(24) Gavrin, L. K.; Saiah, E. Approaches to Discover Non-ATP Site Kinase Inhibitors. Med. Chem. Commun. 2013, 4, 41−51.
(25) Anastassiadis, T.; Deacon, S. W.; Devarajan, K.; Ma, H.; Peterson, J. R. Comprehensive Assay of Kinase Catalytic Activity Reveals Features of Kinase Inhibitor Selectivity. Nat. Biotechnol. 2011, 29, 1039−1045.
(26) Zhao, Z.; Wu, H.; Wang, L.; Liu, Y.; Knapp, S.; Liu, Q.; Gray, N. S. Exploration of Type II Binding Mode: A Privileged Approach for Kinase Inhibitor Focused Drug Discovery? ACS Chem. Biol. 2014, 9, 1230−1241.
(27) Müller, S.; Chaikuad, A.; Gray, N. S.; Knapp, S. The Ins and Outs of Selective Kinase Inhibitor Development. Nat. Chem. Biol. 2015, 11, 818−821.
(28) Miljković, F.; Bajorath, J. Reconciling Selectivity Trends from a Comprehensive Kinase Inhibitor Profiling Campaign with Known Activity Data. ACS Omega 2018, 3, 3113−3119.
(29) Jasial, S.; Gilberg, E.; Blaschke, T.; Bajorath, J. Machine Learning Distinguishes with High Accuracy between Pan-Assay Interference Compounds That Are Promiscuous or Represent Dark Chemical Matter. J. Med. Chem. 2018, 61, 10255−10264.
(30) Matlock, M. K.; Hughes, T. B.; Dahlin, J. L.; Swamidass, S. J. Modeling Small-Molecule Reactivity Identifies Promiscuous Bioactive Compounds. J. Chem. Inf. Model. 2018, 58, 1483−1500.
(31) Stork, C.; Chen, Y.; Šícho, M.; Kirchmair, J. Hit Dexter 2.0: Machine-Learning Models for the Prediction of Frequent Hitters. J. Chem. Inf. Model. 2019, 59, 1030−1043.
(32) Wang, Y.; Bryant, S. H.; Cheng, T.; Wang, J.; Gindulyte, A.; Shoemaker, B. A.; Thiessen, P. A.; He, S.; Zhang, J. PubChem BioAssay: 2017 Update. Nucleic Acids Res. 2017, 45, D955−D963.
(33) Gaulton, A.; Hersey, A.; Nowotka, M.; Bento, A. P.; Chambers, J.; Mendez, D.; Mutowo, P.; Atkinson, F.; Bellis, L. J.; Cibrián-Uhalte, E.; Davies, M.; Dedman, N.; Karlsson, A.; Magariños, M. P.; Overington, J. P.; Papadatos, G.; Smit, I.; Leach, A. R. The ChEMBL Database in 2017. Nucleic Acids Res. 2017, 45, D945−D954.

(34) Hu, X.; Hu, Y.; Vogt, M.; Stumpfe, D.; Bajorath, J. MMP-Cliffs: Systematic Identification of Activity Cliffs on the Basis of Matched Molecular Pairs. J. Chem. Inf. Model. 2012, 52, 1138−1145.
(35) Rogers, D.; Hahn, M. Extended-Connectivity Fingerprints. J. Chem. Inf. Model. 2010, 50, 742−754.
(36) RDKit: Cheminformatics and Machine Learning Software. 2013. http://www.rdkit.org (accessed Jan 17, 2019).
(37) Morgan, H. L. The Generation of a Unique Machine Description for Chemical Structures − a Technique Developed at Chemical Abstracts Service. J. Chem. Doc. 1965, 5, 107−113.
(38) Duvenaud, D.; Maclaurin, D.; Aguilera-Iparraguirre, J.; Gómez-Bombarelli, R.; Hirzel, T.; Aspuru-Guzik, A.; Adams, R. P. Convolutional Networks on Graphs for Learning Molecular Fingerprints. Neural Inf. Proc. Sys. 2015, 28, 2224−2232.
(39) Altae-Tran, H.; Ramsundar, B.; Pappu, A. S.; Pande, V. Low Data Drug Discovery with One-Shot Learning. ACS Cent. Sci. 2017, 3, 283−293.
(40) Ramsundar, B.; Eastman, P.; Leswing, K.; Walters, P.; Pande, V. Deep Learning for the Life Sciences; O'Reilly Media, 2019.
(41) Willett, P.; Barnard, J. M.; Downs, G. M. Chemical Similarity Searching. J. Chem. Inf. Comput. Sci. 1998, 38, 983−996.
(42) Joachims, T. Making Large-scale SVM Learning Practical. In Advances in Kernel Methods: Support Vector Learning; Schölkopf, B., Burges, C. J. C., Smola, A. J., Eds.; MIT Press: Cambridge, 1999; pp 169−184.
(43) Ralaivola, L.; Swamidass, S. J.; Saigo, H.; Baldi, P. Graph Kernels for Chemical Informatics. Neural Netw. 2005, 18, 1093−1110.
(44) Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; Vanderplas, J.; Passos, A.; Cournapeau, D.; Brucher, M.; Perrot, M.; Duchesnay, E. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825−2830.
(45) Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5−32.
(46) Efron, B. Bootstrap Methods: Another Look at the Jackknife. Ann. Stat. 1979, 7, 1−26.
(47) Alpaydin, E. Introduction to Machine Learning, 2nd ed.; MIT Press: Cambridge, 2010.
(48) Duda, R. O.; Hart, P. E.; Stork, D. G. Pattern Classification, 2nd ed.; Wiley-Interscience: New York, 2000.
(49) Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, 2016.
(50) Ketkar, N. Introduction to PyTorch. Deep Learning with Python; Apress: Berkeley, CA, 2017; pp 195−208.
(51) Ma, J.; Sheridan, R. P.; Liaw, A.; Dahl, G. E.; Svetnik, V. Deep Neural Nets as a Method for Quantitative Structure-Activity Relationships. J. Chem. Inf. Model. 2015, 55, 263−274.
(52) Ramsundar, B.; Liu, B.; Wu, Z.; Verras, A.; Tudor, M.; Sheridan, R. P.; Pande, V. Is Multitask Deep Learning Practical for Pharma? J. Chem. Inf. Model. 2017, 57, 2068−2076.
(53) Nielsen, M. A. Neural Networks and Deep Learning; Determination Press, 2015.
(54) Kingma, D. P.; Ba, J. Adam: A Method for Stochastic Optimization. Presented at the Third International Conference on Learning Representations, San Diego, CA, May 7−9, 2015. arXiv.org e-Print archive, arXiv:1412.6980. https://arxiv.org/abs/1412.6980 (accessed Feb 19, 2019).
(55) Nair, V.; Hinton, G. E. Rectified Linear Units Improve Restricted Boltzmann Machines. Proceedings of the 27th International Conference on Machine Learning; ICML'10; Omnipress: USA, 2010; pp 807−814.
(56) Chen, H.; Engkvist, O.; Wang, Y.; Olivecrona, M.; Blaschke, T. The Rise of Deep Learning in Drug Discovery. Drug Discovery Today 2018, 23, 1241−1250.
(57) Balfer, J.; Bajorath, J. Visualization and Interpretation of Support Vector Machine Activity Predictions. J. Chem. Inf. Model. 2015, 55, 1136−1147.
