J. Med. Chem. 2005, 48, 4031-4039
A Naive Bayes Classifier for Prediction of Multidrug Resistance Reversal Activity on the Basis of Atom Typing
Hongmao Sun†
Discovery Chemistry, Hoffmann-La Roche Inc., Nutley, New Jersey 07110
Received February 24, 2005
Multidrug resistance (MDR), the ability of cancer cells to become simultaneously resistant to different drugs, remains an unsolved challenge in cancer chemotherapy. The use of MDR reversal (MDRR) agents is a promising approach to overcome this problem. For the design and development of such agents, it would be desirable to have a reliable model to estimate the MDRR activity of compounds. Presented here is a naive Bayes classifier to categorize MDRR agents into active and inactive classes, which uses a universal, generic molecular-descriptor system.1 The naive Bayes classifier was built from a 424 compound training set, selected from 609 druglike compounds in the publicly available “Klopman set”. The model correctly predicted MDRR activities for 82.2% of 185 compounds in a testing set. The cumulative probabilities proved useful for prioritizing the compounds for testing. The impact of attribute dependences on the performance of the classifier was examined. As an unsupervised learner with no tuning parameters, a naive Bayes classifier is capable of providing an objective comparison of the effectiveness of different molecular descriptors. The relative performance of the classifiers constructed from either an atom-type-based molecular descriptor or the long-range functional-class fingerprint descriptors FCFP_6 or FCFP_2 was compared. Combining an atom-typing descriptor with naive Bayes classification makes the resulting model interpretable, which offers extra information for the rational design of MDRR agents.
Introduction
Self-protection from nonnatural compounds is an ability exhibited by many living creatures, from complex species such as human beings to the simplest viruses and microbes. This might explain why drug resistance is such a ubiquitous phenomenon. Given the wide distribution of drug resistance, it is not surprising that a variety of mechanisms are utilized. 
Populations of viruses, bacteria, and other microorganisms can sustain themselves by exploiting mutations occurring in the enzyme targeted by a particular drug molecule.2,3 The protective effect of the mutations is amplified under the “selection pressure” of chemotherapy. On the other hand, antitumor drug resistance is more complicated and mostly manifests as multidrug resistance (MDR). MDR is cellular resistance to multiple structurally and functionally divergent drugs. This is such a common response to cancer chemotherapy that it is a problem facing almost every effective drug, including the newest agents.4 Cells adopt multiple MDR strategies. MDR can be mediated by (i) reduced drug uptake, (ii) activation of coordinated detoxification systems, (iii) suppression of drug-induced apoptosis, (iv) increased DNA damage repair, or, most importantly and frequently, (v) expression of ATP-dependent efflux pumps with broad drug specificity to reduce intracellular drug concentrations.4 Despite the complexity of MDR resulting from the multiple mechanisms, many approaches have been focused solely on understanding the interaction mechanism of P-glycoprotein (P-gp, MDR1), a prototype of drug efflux ATPase.5-7 Molecular level approaches to P-gp are twofold: attempts to obtain structures of P-gp-related proteins by crystallography or through homology modeling, and computational approaches to correlate the structural features of P-gp substrates and inhibitors with their corresponding activities. P-gp is a 170 kDa efflux pump, which exports a diverse group of natural products, chemotherapeutic drugs, and hydrophobic peptides out of cells, driven by ATP hydrolysis. Recently, Seigneuret and Garnier-Suillerot8 proposed a homology model of P-gp, based on the Escherichia coli MsbA crystal structure,9 which revealed some interesting structural features of P-gp. However, more work needs to be done to understand the structural basis by which P-gp can recognize so many structurally divergent substrates and inhibitors. MDR represents a long-standing hurdle in anticancer drug development. Much effort has been expended in attempting to understand the structure-activity relationship (SAR) of P-gp substrates.10,11 Pharmacophore models12-15 have been constructed in attempts to identify P-gp substrates. Because of the huge structural diversity of P-gp substrates, there has so far been no pharmacophore model or SAR model that is general and conclusive enough to cover a reasonably large chemical space. At the same time, the development of MDR reversal agents, which combat the undesired multidrug resistance phenotype, has received considerable attention.16-18 Many such agents have been synthesized and tested.19-22 The basic theme as summarized from homology modeling, pharmacophore, and quantitative SAR (QSAR) studies is that active MDRR agents are hydrophobic compounds with multiple functional groups carrying
† To whom correspondence should be addressed: tel (973) 562 3870, e-mail hongmao.sun@roche.com.
10.1021/jm050180t CCC: $30.25 © 2005 American Chemical Society Published on Web 05/07/2005
hydrogen bond acceptors. Unfortunately, these structural features also belong to P-gp substrates, a situation that makes the separation of P-gp inhibitors and substrates a challenging task. Besides, the huge structural diversity of P-gp inhibitors represents another hurdle to construction of a general model applicable to various chemical families. To date, most pharmacophore or QSAR models were constructed from small data sets of structurally homogeneous compounds,20,23-25 and only a few studies were carried out across different chemical classes.11 Klopman et al.26 pioneered the QSAR approach to MDRR agents of different chemical classes. Later, Klopman and co-workers27 proposed a more robust QSAR model capable of correctly classifying 82% of external prediction set compounds by using the MULTI-CASE program, trained on a larger data set of 609 compounds (Klopman set). In their study, structural features, or biophores, relevant to MDRR activity were identified, opening the door to a rational design of MDRR agents. By using nine topological descriptors and linear discriminant analysis (LDA), Bakken and Jurs28 achieved a slightly better classification rate of 83.1% for a 177 compound prediction set selected from the same 609 compound data set, but at the cost of losing model interpretability. The success of their models indicated that molecular structures did encode the information needed for differentiating MDRR activities. Encouraged by their success, the application of an atom-type-based molecular descriptor system1 has been extended herein to construction of a naive Bayes classifier for MDRR activities.
Method
Data Set. The data set used in this study was the 609 compound set, generously provided by Dr. Klopman (“Klopman set”). This structurally diverse 609 compound set represents the largest data set used for MDR SAR development to date. The reported activities were from an assay carried out in P388/ADR cells by Ramu and Ramu.29,30 ED50 values were measured for the druglike compounds in P388 cells that were resistant to adriamycin (ADR). The MDRR activity, the ability to reverse MDR, was measured as a reversal factor (RF), where

$$\mathrm{RF} = \frac{\mathrm{ED}_{50}\ \text{with no ADR}}{\mathrm{ED}_{50}\ \text{with 200 nM ADR}}$$

An RF value of 1.0 indicates no MDRR activity at all, while larger RF values correspond to higher MDRR activity. RF values were not continuous in a strict sense, since inactive compounds, which were essential for model development, all shared an RF value of 1.0. As a result, it is more appropriate to represent the activities in a qualitative way. Out of the 609 compounds in the data set, 378 compounds, or 62.1%, were active, with an RF value of greater than 2.0 (as suggested by Klopman et al.27), and 231 were inactive.
Atom Type Classification. Atom types may represent the most straightforward and interpretable molecular descriptors. If appropriately assigned, atom types could be powerful in deriving predictive models for different properties.1 In the present study, the type for each atom was determined by its own chemical properties and the neighboring atoms and bonds, reflecting its chemical environment. A classification tree was designed to assign atom types to each atom. To become applicable, a classification tree needs to be optimized. The key steps toward a high-quality classification tree are to determine where to split, and where to stop splitting, the tree. A good atom-type classification tree represents a balance of the two extremes, that is, avoiding too many different atom types, where each atom is assigned a different type to reflect the most subtle difference to its neighbors, and avoiding too few types, where each element is represented by only one atom type. Ideally, the atom types should be as many as needed to summarize the common structural features across diverse molecules, but not more, and as few as to differentiate variance among the similar structures, but not too few. To optimize the classification tree, a primary tree, which was constructed on the basis of experience and chemical intuition, was trained by optimizing the log P predictions of the compounds in Starlist.1 Starlist is a structurally diverse data set, containing nearly 11 000 compounds. The optimization was achieved by analysis of the structures of the outliers, the variable importance in projection, and the standard errors of the coefficients after cross validation. Details on the method of atom-type classification are described in ref 1.
Naive Bayes Classifier. Each compound can be represented as a vector $\mathbf{a} = \langle a_1, a_2, \ldots, a_n \rangle$, where $a_1, a_2, \ldots, a_n$ are the occurrences of the attributes $A_1, A_2, \ldots, A_n$, which represent atom types in this study. The probability of a new compound, say methanol CH3OH, belonging to a certain class $C$, say being MDRR positive, can be expressed as $P(+ \mid \mathbf{a}_{\mathrm{methanol}})$, namely $P(C = + \mid A_1 = 0, A_2 = 0, \ldots, A_i = 1, \ldots, A_j = 1, \ldots, A_n = 0)$, assuming $A_i$ and $A_j$ correspond to the atom types of carbon and oxygen in CH3OH. When Bayes’s theorem is used31
$$P(+ \mid A_1, A_2, \ldots, A_n) = \frac{P(A_1, A_2, \ldots, A_n \mid +)\,P(+)}{P(A_1, A_2, \ldots, A_n)} \qquad (1)$$
where $P(A_1, A_2, \ldots, A_n \mid +)$ is the conditional probability of observing the attribute values of a particular compound, such as methanol, given that the compound is MDRR positive; $P(+)$ is the prior probability, a probability induced from a set of compounds preceding methanol, namely, the training set; and $P(A_1, A_2, \ldots, A_n)$ is the marginal probability of observing the compound methanol. The three probabilities on the right side of eq 1 can be learned from a training set containing a number of compounds with known MDRR activities. In practice, the prior probability $P(+)$ can be straightforwardly estimated from the training set, while the marginal probability $P(A_1, A_2, \ldots, A_n)$ can be ignored, since it is the same for all of the classes. Therefore, the task of determining the class membership of methanol is reduced to estimating $P(A_1, A_2, \ldots, A_n \mid +)$. Generally, $P(A_1, A_2, \ldots, A_n \mid +)$ is not immediately available, unless the same compound also exists in the training set. By applying the definition of conditional probability recursively (the chain rule), we get
$$P(A_1, A_2, \ldots, A_n \mid +) = P(A_1 = a_1 \mid A_2 = a_2, \ldots, A_n = a_n, +) \times P(A_2 = a_2 \mid A_3 = a_3, \ldots, A_n = a_n, +) \times \cdots \times P(A_n = a_n \mid +)$$

Here, if we “naively” assume that each feature, or atom type $A_i$, is conditionally independent of every other feature $A_j$, for $i \neq j$, then
$$P(A_1, A_2, \ldots, A_n \mid +) = P(A_1 = a_1 \mid +) \times P(A_2 = a_2 \mid +) \times \cdots \times P(A_n = a_n \mid +) \qquad (2)$$

Now each factor in the product can be estimated from a training set
$$P(A_i = a_i \mid +) = \frac{\mathrm{count}(A_i = a_i \cap C = +)}{\mathrm{count}(C = +)} \qquad (3)$$
Since attributes are seldom independent given the class in reality, the method is called naive Bayes learning. The following strategy is usually applied to avoid evaluating the marginal probability. Assume that there are only two classes, as in this case, MDRR positive and MDRR negative, and let $p_- = P(C = -)$ and $p_+ = P(C = +)$, and let $p_{i-} = P(A_i = a_i \mid C = -)$ and $p_{i+} = P(A_i = a_i \mid C = +)$; then
$$p = P(C = + \mid A_1 = a_1, A_2 = a_2, \ldots, A_n = a_n) = \frac{p_+ \prod_{i=1}^{n} p_{i+}}{z}$$

and

$$q = P(C = - \mid A_1 = a_1, A_2 = a_2, \ldots, A_n = a_n) = \frac{p_- \prod_{i=1}^{n} p_{i-}}{z}$$
where $z$ is the marginal probability, a constant. Since $p + q = 1$, then

$$\log\frac{p}{q} = \log\frac{p}{1-p} = \sum_{i=1}^{n}\left(\log p_{i+} - \log p_{i-}\right) + \left(\log p_{+} - \log p_{-}\right) \qquad (4)$$
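The inversion of eq 4 can be written out explicitly. In the short derivation below, $s$ is a symbol introduced here only as shorthand for the right-hand side of eq 4, and natural logarithms are assumed (for another base $b$, replace $e$ by $b$):

```latex
\log\frac{p}{1-p} = s
\;\Longrightarrow\;
\frac{p}{1-p} = e^{s}
\;\Longrightarrow\;
p = \frac{e^{s}}{1 + e^{s}} = \frac{1}{1 + e^{-s}}
```

A score of $s = 0$ therefore corresponds to $p = 0.5$, and the sign of $s$ alone decides which of the two classes is favored.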
In eq 4, the marginal probability is canceled, and $p$ can be evaluated by exponentiating both sides and rearranging the terms. There are still a couple of issues that must be addressed, that is, zero counts and missing values. Zero counts refer to the situation where a certain atom type never occurs for a given class in the training set. Zero counts will result in zero probability in eq 3 and, in turn, wipe out all the information presented in eq 2.32 One principal solution to the problem is the Laplace correction. For a two-class problem, the Laplace corrected $P(A_i = a_i \mid C = +)$ could be expressed as $(\mathrm{count}(A_i = a_i \cap C = +) + 0.5)/(\mathrm{count}(C = +) + 1.0)$, as adopted by the program Pipeline Pilot.33 A related issue is missing values, which can occur when classifying a test compound. For example, an atom type may occur in a compound from the test set but not in any compounds from the training set. In general, missing values, or missing atom types, are ignored to avoid introducing unproved information. SciTegic’s Pipeline Pilot (version 3.0) was used to perform the naive Bayes classification.33
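The counting scheme of eq 3, the Laplace correction, and the log-odds score of eq 4 can be sketched in a few lines of Python. This is only an illustrative sketch, not the Pipeline Pilot implementation: the function names, the toy compounds, and the choice to treat each (atom type, occurrence count) pair as one event are assumptions made here. The score follows eq 4 as written, so positive values favor the MDRR positive class.

```python
import math
from collections import Counter


def train_naive_bayes(compounds, labels):
    """Collect the counts needed for eq 3 from a labeled training set.

    compounds: list of dicts mapping atom type -> occurrence count
    labels:    list of "+" (MDRR positive) or "-" (MDRR negative)
    """
    features = set()
    for comp in compounds:
        features.update(comp)
    class_counts = Counter(labels)
    # value_counts[cls][(feature, value)] = number of class-cls training
    # compounds in which `feature` occurs exactly `value` times (0 if absent)
    value_counts = {"+": Counter(), "-": Counter()}
    for comp, cls in zip(compounds, labels):
        for f in features:
            value_counts[cls][(f, comp.get(f, 0))] += 1
    return features, class_counts, value_counts


def log_odds(model, compound):
    """Right-hand side of eq 4: log(p/q), positive when '+' is favored."""
    features, class_counts, value_counts = model
    total = sum(class_counts.values())
    # Prior term: log p+ - log p-
    score = math.log(class_counts["+"] / total) - math.log(class_counts["-"] / total)
    # Only atom types seen in training are visited, so atom types that occur
    # solely in the test compound are ignored (the "missing values" rule).
    for f in features:
        key = (f, compound.get(f, 0))
        # Laplace-corrected estimates with the +0.5 / +1.0 constants from the text
        p_pos = (value_counts["+"][key] + 0.5) / (class_counts["+"] + 1.0)
        p_neg = (value_counts["-"][key] + 0.5) / (class_counts["-"] + 1.0)
        score += math.log(p_pos) - math.log(p_neg)
    return score


# Hypothetical toy training set (atom-type names borrowed from the paper,
# counts invented for illustration)
toy_model = train_naive_bayes(
    [{"C3": 12, "N16": 1}, {"C3": 12}, {"C3": 2, "O7": 1}, {"C3": 2}],
    ["+", "+", "-", "-"],
)
score_a = log_odds(toy_model, {"C3": 12})            # resembles the positives
score_b = log_odds(toy_model, {"C3": 2, "Xx": 3})    # "Xx" is unseen and ignored
```

Without the Laplace correction, the second query would hit a zero count for `C3 = 2` in the positive class and the logarithm would be undefined, which is the failure mode described in the text.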
Results and Discussion
Atom Type Classification. Two hundred and eighteen atom types were identified by the atom-type classification tree, after being trained by log P.1 The 27 correction factors introduced in this study were the same as those used in ref 1. After atom typing, 63 atom types and 5 correction factors were absent in the Klopman set. In total, 177 descriptors (218 + 27 - 63 - 5) were included for model development.
Naive Bayes Classifier. MDRR activities were classified into two classes: active compounds with an RF greater than 2.0 and inactive compounds with an RF less than or equal to 2.0. To model qualitative data, recursive partitioning (RP) and discriminant analysis have proven effective. RP is powerful in constructing a decision tree on relatively low dimensional variable space, where each individual feature carries a reasonably high percentage of total variance.34 Similarly, LDA also requires that the features be few and independent. Although naive Bayes classification is also known to be optimal when attributes, or features, are independent given the class, it has been illustrated from analysis of a large amount of both artificial and real-world data that naive Bayes classifiers performed surprisingly well, even when they violated the independence assumption.32,35,36 The naive Bayes classifier has been proven to outperform other sophisticated classifiers in text characterization and antispam filtering and has recently started to find application in drug discovery.37-39
Figure 1. Enrichment (a) and ROC (b) plots for the whole data set of 609 compounds.
When the less populated MDRR negative compounds (37.9%) were set as “good” samples and the highly populated MDRR positives as background, the first classifier derived from the whole 609 compound data set gave a correct classification rate of 82.3%, when used to predict the original whole data set. The accuracy of the classifier was comparable to Klopman’s and Jurs’ results.27,28 Figure 1 depicts the enrichment plot and receiver operating characteristic (ROC) plot of the model. The enrichment plot illustrates how fast all of the MDRR negative compounds could be identified if the compounds were tested in a resorted order according to the model. An enrichment curve close to the perfect model is a good indication of the high-prioritization power of the model. In this model, half of the MDRR negative compounds would be found if only 20% of the compounds were tested, compared to 19% if the model is perfect. In other words, the first 20% of the compounds were almost all MDRR negative, as read from the plot. The ROC plot is another useful tool to evaluate the prioritizing power of a model. Since no learner can be perfect, each learner represents a compromise between sensitivity and specificity. Figure 1b demonstrates the tradeoff between sensitivity, the ability of the model to avoid false negatives, and specificity, the ability of the model to avoid false positives. The accuracy, as measured by the area under the ROC curve, is 0.89 for this model. MDRR activity is sensitive to both the MDR cell lines selected for testing and other experimental conditions.27
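The enrichment and ROC analyses described above can be reproduced from any list of scores and class labels. The sketch below is a minimal pure-Python illustration, not the procedure used by Pipeline Pilot; the function names and the toy scores are hypothetical. As in the text, the MDRR negative compounds are treated as the targets, and a higher score is assumed to mean "more likely negative".

```python
def roc_auc(scores, is_target):
    """Area under the ROC curve by pairwise comparison (ties count one half)."""
    targets = [s for s, t in zip(scores, is_target) if t]
    others = [s for s, t in zip(scores, is_target) if not t]
    wins = 0.0
    for ts in targets:
        for os_ in others:
            if ts > os_:
                wins += 1.0
            elif ts == os_:
                wins += 0.5
    return wins / (len(targets) * len(others))


def enrichment_curve(scores, is_target):
    """Fraction of all targets found after testing the top k compounds, k = 0..N."""
    ranked = sorted(zip(scores, is_target), key=lambda pair: -pair[0])
    total = sum(1 for t in is_target if t)
    found = 0
    curve = [0.0]
    for _, t in ranked:
        if t:
            found += 1
        curve.append(found / total)
    return curve


def fraction_tested_to_reach(curve, recall):
    """Smallest fraction of the ranked list needed to find `recall` of the targets."""
    n = len(curve) - 1
    for k, value in enumerate(curve):
        if value >= recall:
            return k / n
    return 1.0


# Hypothetical scores for five compounds; True marks an MDRR negative target
toy_scores = [3.0, 2.0, 1.0, 0.5, -1.0]
toy_targets = [True, True, False, True, False]
toy_auc = roc_auc(toy_scores, toy_targets)
toy_curve = enrichment_curve(toy_scores, toy_targets)
toy_half = fraction_tested_to_reach(toy_curve, 0.5)
```

For reference, with 37.9% negatives in the data set, a perfect ranking recovers half of them after testing 0.5 × 37.9% ≈ 19% of the compounds, which is the baseline quoted in the text.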
Figure 2. PCA t1/t2 plot. Open diamonds (◇) are training set compounds, and open squares (□) are testing set compounds.
The MDRR activities of 609 compounds in Klopman’s data set were determined under the same experimental conditions on the same MDR cell line, eliminating the potential errors resulting from the combination of data from different sources. However, MDRR activities were measured in vitro, and no in vitro experiment is completely free of experimental uncertainty. This uncertainty may cause misclassification of compounds, especially those near the threshold activity of 2.0. There were 77 compounds having an MDRR activity between 1.5 and 2.5, among which 26 compounds were misclassified by the classifier, a correct rate (66.2%) significantly lower than that of the whole data set. One more source of false positives worthy of notice is that a drug, if toxic, may be misinterpreted as a P-gp inhibitor, even though it does not interact with P-gp. To assess the robustness of the model, a fivefold cross validation was carried out. The original 609 compound data set was randomly split into five equal groups. While one-fifth of the data was kept out of model development, naive Bayes classifiers were built on the basis of the rest of the data points, then the MDRR activities of the compounds left out were predicted and compared with the actual activities. Each data point was left out only once. Cross-validation results showed that 479 compounds out of 609 were correctly classified, that is, a 78.7% success rate, indicating a robust model. A demanding and rigorous way of testing predictive performance of a model is to predict an independent external data set. In the current study, a testing set was split off from the original data set by selecting compounds from a t1/t2 score plot, where t1 and t2 were the first two principal components of a principal component analysis (PCA) of the atom-type matrix of the
original data set. The testing set was selected so that it could reflect the distribution of the original data set, such that each compound in the testing set had at least one close neighbor in the training set, as shown in Figure 2.40 Finally, 185 compounds were picked as a testing set, among which 81 compounds were MDRR negative, with a ratio of negative compounds close to that of the original data set. The classifier built from the 424 compound training set predicted the MDRR activities of the testing compounds with a success rate of 82.2%. The p-value of the model was less than 0.0001, indicating that the model was statistically significant. One advantage of a Bayes classifier is that it not only predicts the class membership of a compound but also quantitatively describes the confidence level of the prediction. The likelihood of a testing compound belonging to a certain class was expressed as the cumulative conditional probability of each atom in the compound. A positive number indicates that the compound is more likely to be MDRR negative in this case. Table 1 lists all of the testing compounds sorted in ascending order by their cumulative probabilities. The first 20 compounds were most likely to be MDRR positive, and all of them were correctly predicted; the last 20 compounds were most likely to be MDRR negative, and 19 out of 20 were correctly classified. If the first third of the compounds were selected for testing, 55 out of 62 compounds would be MDRR positive, while the last third of the compounds would have 52 out of 62 compounds negative. In other words, nearly two-thirds of MDRR negatives (52 of 81) would be filtered out if the last one-third of the compounds were excluded from the testing. Impact of Feature Dependences on the Performance of the Classifier. A major concern of utilizing
Table 1. The Cumulative Possibilities of the 185 Compounds in the Test Set, Together with Their Experimentally Determined MDRR Activities and Class Membership

ID    activity    class^a    prediction
107    6.7    G    -12.7177
167    10    G    -12.1454
24    10    G    -12.1105
33    16.7    G    -11.3487
296    44.4    G    -10.6484
537    40    G    -9.0850
315    6.6    G    -8.6373
541    100    G    -8.3062
112    7.1    G    -8.2443
130    65    G    -8.1936
110    30    G    -8.0827
108    13.3    G    -7.4586
28    5    G    -7.3355
59    13.3    G    -7.2540
542    100    G    -6.9962
290    26.6    G    -6.7338
287    100    G    -6.5237
253    7.5    G    -6.3609
484    2.3    G    -6.3339
342    125    G    -6.3235
16    1    P    -5.9855
15    10    G    -5.9280
300    66.7    G    -5.8809
183    4.4    G    -5.8532
521    13.3    G    -5.7775
119    6    G    -5.6979
286    44.4    G    -5.6763
550    17.8    G    -5.6579
317    3.2    G    -5.4881
316    19    G    -5.3434
244    4    G    -5.2827
303    1.3    P    -5.2021
122    5.6    G    -5.0604
329    20    G    -4.9698
527    10    G    -4.7923
145    8.3    G    -4.7731
494    1.5    P    -4.7349
529    1.7    P    -4.7314
533    1    P    -4.6980
12    6    G    -4.6750
127    4.4    G    -4.4292
380    1    P    -4.3849
241    16.7    G    -4.3035
27    16.7    G    -4.1915
227    44.4    G    -4.0595
257    12    G    -4.0541
111    6.3    G    -3.9540
376    4.4    G    -3.9494
568    7.5    G    -3.8681
362    7.5    G    -3.8493
175    10    G    -3.7763
49    6    G    -3.7133
509    1    P    -3.6427
590    34    G    -3.6378
251    16.7    G    -3.1437
292    200    G    -3.0849
353    7.5    G    -3.0357
43    45    G    -2.9861
281    5    G    -2.9654
309    5    G    -2.8295
69    5.6    G    -2.7820
228    44.4    G    -2.6494
423    1    P    -2.5324
593    8    G    -2.5073
84    3.3    G    -2.4140
307    10    G    -2.3623
37    10    G    -2.3047
328    22.5    G    -2.2906
41    8    G    -2.2065
250    22.2    G    -2.1883
427    1    P    -2.1710
83    5.6    G    -2.1656
132    2.5    G    -2.1438
258    13.3    G    -2.1339
375    1.3    P    -2.1244
200    2.5    G    -1.9327
71    5    G    -1.8971
99    4    G    -1.8610
510    7.7    G    -1.8544
478    1.6    P    -1.7729
413    1.3    P    -1.7228
254    15    G    -1.7045
581    1    P    -1.6832
472    4.4    G    -1.5278
214    7.5    G    -1.4670
318    6    G    -1.3964
585    10    G    -1.2640
236    4.2    G    -1.2426
564    1.6    P    -1.2373
282    1    P    -1.1941
98    10    G    -1.1931
174    10    G    -1.1077
272    50    G    -1.1002
341    16.7    G    -0.8843
475    1.3    P    -0.8149
11    2.7    G    -0.8075
332    2    P    -0.7875
265    40    G    -0.7739
471    1.3    P    -0.7645
278    2.5    G    -0.6602
606    1    P    -0.6363
166    5    G    -0.5733
305    10    G    -0.4328
553    4.4    G    -0.4137
218    4    G    -0.4079
165    1    P    -0.4058
333    2.5    G    -0.3731
219    1.5    P    -0.2964
117    2.3    G    -0.2099
570    10    G    -0.1195
164    5    G    -0.0843
320    60    G    -0.0788
267    100    G    -0.0121
582    1    P    0.0599
481    1.7    P    0.2044
368    1.7    P    0.2166
116    2    P    0.3325
433    3    G    0.3843
499    2.3    G    0.4704
503    1    P    0.5140
515    1.5    P    0.5839
607    1    P    0.8161
180    1    P    0.9364
400    1.2    P    0.9479
447    2    P    0.9493
370    1    P    0.9596
369    1    P    0.9780
259    1    P    1.1723
367    1    P    1.2580
366    1    P    1.2793
138    1    P    1.4206
518    1    P    1.4302
602    1    P    1.4302
421    1.8    P    1.4820
310    5    G    1.5032
202    1    P    1.5426
94    4    G    1.5678
501    5.8    G    1.6016
482    1    P    1.6041
459    15    G    1.6260
105    2.2    G    1.7112
500    1    P    1.7448
158    1    P    1.8442
157    3    G    1.8782
595    20    G    1.9145
93    1    P    1.9480
352    1    P    1.9612
137    1    P    2.1329
187    1    P    2.1802
154    1.7    P    2.2037
556    3    G    2.2725
466    1.3    P    2.3176
197    1    P    2.4842
464    1    P    2.5231
468    1    P    2.7726
193    1.7    P    2.7743
436    1    P    2.8783
492    1    P    2.8845
159    1    P    2.8946
456    1    P    3.0368
415    1    P    3.0376
392    1    P    3.0931
418    1.7    P    3.1480
283    7.5    G    3.1985
511    1    P    3.2215
95    6    G    3.3336
416    1    P    3.3638
506    1    P    3.4513
364    1    P    3.5473
406    1    P    3.6999
87    1    P    3.7329
398    1.3    P    3.7813
498    1    P    3.7939
355    2    P    3.9473
365    1.7    P    4.3122
458    1    P    4.3871
551    1    P    4.4975
347    1    P    4.5688
428    1    P    4.6837
435    1    P    4.7467
86    1    P    4.7936
596    1    P    4.9933
455    1    P    5.8801
384    1    P    7.1403
385    1    P    7.3105

^a “G” means MDRR active, “P” means MDRR inactive.
the naive Bayes classifier is the assumption of feature independency. After all, features are seldom independent of each other in the real world. However, it has been widely accepted in the statistical community that the feature dependency is not a good predictor of a naive Bayes classifier’s performance. The true region of optimal performance of a naive Bayes classifier is in fact far greater than that implied by the feature independence assumption. Even when a naive Bayes classifier
is not in its optimal region, it can still achieve higher accuracy in many domains than other more sophisticated learners.32 Therefore, feature dependence should not be a limiting concern to the application of naive Bayes classification. In this study, features used to describe a molecule were a vector of the counts of the atom types. Each atom type is assigned according to its neighboring atoms, so atom-type-based molecular descriptors are inevitably
interrelated.1 To investigate the impact of feature dependency on the performance of the classifier, pairwise atom-type correlation analysis was carried out, and the highly correlated atom types with over 80% correlation level were removed. The classifiers based on the reduced descriptors gave a correct rate of 82.4% for the whole data set and 81.1% for the testing set. The performance of the classifiers did not improve by reducing the dependency of the features. The results can be explained by the tolerance of feature dependence with respect to the optimal performance region of the naive Bayes classifier and by the observation that feature dependence has a negative influence on performance only when it causes information loss.35
Comparison with Fingerprint Descriptors. As mentioned earlier, naive Bayes classification is an unsupervised learner, with no fitting process and no tuning parameters. The process of Bayes learning is to search through each feature in an unbiased way for those with separation power. A set of features is considered effective if the occurrences of the features are uneven across the existing classes, so that a subset of the features is more relevant in determining the membership of the classes. Similarly, any chemical feature of a molecule can be selected as an input molecular descriptor for Bayes learning, but various combinations of different molecular descriptors will demonstrate different discriminating powers against a target property. Since FCFP_6 is a fingerprint-type molecular-descriptor system supplied by Pipeline Pilot, it is interesting to compare the performance of relatively low dimensional atom-type-based descriptors to that of fingerprint descriptors of high dimension. 
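The pairwise correlation filter used in the feature-dependence experiment above can be sketched as follows. The text does not specify the correlation measure or which member of a correlated pair was removed, so this sketch assumes an absolute Pearson correlation above 0.8 and a greedy rule that keeps the first descriptor encountered; the function names and the toy count columns are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def drop_correlated(columns, threshold=0.8):
    """Greedy filter over descriptor columns.

    columns: dict mapping descriptor name -> list of per-compound counts.
    A descriptor is kept only if its |r| with every descriptor kept so far
    does not exceed `threshold` (assumed reading of "over 80% correlation").
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical counts for four compounds; O7 and H4 always co-occur here
toy_columns = {
    "O7": [1, 0, 0, 2],
    "H4": [1, 0, 0, 2],
    "N16": [1, 0, 1, 1],
}
kept_descriptors = drop_correlated(toy_columns)
```

Because the greedy rule depends on column order, a different ordering can retain a different member of each correlated pair; the reported accuracies would need the original selection rule to be reproduced exactly.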
The classifier built from FCFP_6, together with other physicochemical properties, including A log P, molecular weight, number of hydrogen bond donors and acceptors, and number of rotatable bonds, correctly characterized 87.7% of the whole data set, which was better than the 82.3% success rate from the atom types. The success rate dropped to 71.9% when an FCFP_6 classifier constructed from the 424 compound training set was used to predict the activities of the 185 compound testing set. The higher accuracy of FCFP_6 descriptors in predicting the whole data set might be due to its coverage of more detailed structural features in the fingerprint bits, while the reduced accuracy of FCFP_6 descriptors in predicting the external data set might result from more missing values, which were ignored when predicting the membership of the testing compounds, because of the higher specificity of each fingerprint bit. These results imply that fingerprint descriptors, such as FCFP_6, will be more powerful for applications involving large training sets, such as high-throughput screening,41 while low-dimensional descriptors, such as atom typing, will be more suitable for small training sets, such as certain absorption, distribution, metabolism, excretion, and toxicity properties measured in vitro or in vivo. Reducing the fingerprint specificity by choosing FCFP_2, instead of FCFP_6 descriptors, in combination with other physicochemical properties, did not improve the predictive power of the model, which resulted in an identical success rate of 71.9% but reduced ROC accuracy from 0.93 to 0.84.
Model Interpretation. A good model should be not only accurate in prediction but also interpretable. The interpretability of a model is determined mostly by two factors: the molecular descriptors on which the model is based and the methodology adopted to derive the model. Those molecular descriptors that depict whole-molecule properties, such as log P, or topological descriptors, are more difficult to translate into simple guidance for synthetic chemists with respect to how to improve the property.42 On the other hand, atom types are simple and meaningful to chemists. As discussed earlier, atom typing is a method to depict molecular structure by analyzing each atom within the context of its neighbors; thus atom types are potentially good molecular descriptors for the construction of interpretable models. However, using atom types as descriptors has one major limitation; that is, atom types are large in number and likely to be intercorrelated.1 As a result, it is improper to manipulate these atom-type variables by multiple linear regression or LDA. As previously discussed, the naive Bayes classifier is not limited by feature dependency. Besides, Bayes learning scales linearly with respect to the features, so it is fast and efficient in processing large data sets with high-dimensional features. Unlike artificial intelligence approaches, such as artificial neural networks, a Bayes learner enables one to trace the contributions of each feature to the determination of its class membership. Since the atom types implicitly carry fragmental information, it was straightforward for Bayes learners to identify the chemical moieties that related to MDRR activity on the basis of atom types. At the same time, no fragments were explicitly predefined, which reduced the possibility of introducing bias at the beginning of the model construction, enabling learners to identify structural features that were not well defined by usual fragments. 
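Tracing the contribution of individual features can be illustrated with a small calculation. The sketch below applies the Laplace-corrected ratio from the Method section to a few feature bins, using class counts that are quoted elsewhere in this paper for the whole 609 compound set (378 active, 231 inactive). The bin definitions, the function name, and the idea of ranking by absolute log ratio are assumptions made for illustration and are not taken from the Pipeline Pilot statistics table. Positive values favor MDRR positive here.

```python
import math


def bin_contribution(count_pos, count_neg, n_pos, n_neg):
    """Laplace-corrected log P(bin | +) - log P(bin | -) for one feature bin."""
    p_pos = (count_pos + 0.5) / (n_pos + 1.0)
    p_neg = (count_neg + 0.5) / (n_neg + 1.0)
    return math.log(p_pos) - math.log(p_neg)


N_POS, N_NEG = 378, 231  # active and inactive compounds in the Klopman set

# (MDRR positive, MDRR negative) compound counts per bin, as quoted in the text
feature_bins = {
    "C3 occurs exactly twice": (1, 19),
    "C3 occurs more than 10 times": (73, 6),
    "acidic group present": (0, 11),
    "aromatic fluorine occurs twice": (15, 1),
}

contributions = {
    name: bin_contribution(pos, neg, N_POS, N_NEG)
    for name, (pos, neg) in feature_bins.items()
}
# Rank bins by how strongly they push the log odds in either direction
ranked_bins = sorted(contributions, key=lambda name: abs(contributions[name]), reverse=True)
```

Each value is the term that the corresponding bin would add to the summed log odds of a compound, which is what makes the classifier's decision traceable feature by feature.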
Pipeline Pilot generates a statistics table for Bayes classification, which contains the feature name, the feature counts in the whole data set and subset of a selected class, and an adjusted probability of the feature. Each feature was further binned into subgroups to reflect its occurrence. The statistics table is sorted according to the importance of the features in categorizing the training molecules. The most important descriptors were molecular weight (M1) and atom types C3, O7, H2, H4, F1, and C21. It became obvious, by analyzing the 609 compound data set, that small molecules were more likely to be MDRR negative, while large molecules tended to be positive. Indeed, 18 compounds in the data set with molecular weights less than 200 were all MDRR negative, while 41 out of 49 compounds with MW > 500 were positives. This observation was in agreement with conclusions obtained from both QSAR and homology modeling.8,25,43 The atom type with the strongest discerning power was C3, an unsubstituted aromatic carbon neighbored by two other aromatic carbons. When atom type C3 occurred exactly twice in a molecule, the molecule had a 95% (19/20) chance to be MDRR negative. On the other hand, 73 out of 79 compounds whose C3 occurrence was greater than 10 were MDRR positive, indicating that compounds with multiple phenyl rings have a high likelihood of being MDRR positive. This result can be
Figure 3. The structures of the 11 acidic compounds containing atom type O7.
Figure 4. The structures of the close analogues containing atom type C7.
explained by the observation of a high density of aromatic residues in the transmembrane ligand-binding segments of P-gp.44 The conclusion was reinforced by the occurrence of atom type H2, a proton attached to an aromatic carbon atom: it occurred more than 11 times in 77 compounds, and 72 of those were MDRR positive. H4, an acidic proton, and O7, a hydroxyl oxygen in an acidic group, were highly correlated. All of the 11 compounds containing an acidic group were MDRR negative (Figure 3). These observations indicated that introducing an acidic group would cause a compound to lose MDRR activity, while slightly basic anticancer drugs had a better chance of being MDRR agents by themselves. A carbon in an aliphatic ring bonded to an aromatic carbon atom, C21, occurred twice in only five MDRR negative compounds, implying an increased tendency to be MDRR negative for compounds with an aliphatic ring coupled with an aromatic ring. There was an interesting trend that fluorine substitution of an aromatic ring increased the chance of a compound being MDRR positive. Twenty-six out of thirty-four compounds were MDRR positive when aromatic fluorine occurred once, and the ratio increased to 15/16 when it occurred twice. The importance of nitrogen atoms in MDRR activity has been discussed by both Ecker and Klopman.24,27,45 Ecker et al.45 concluded, by investigating 12 nitrogen-containing compounds, that the interaction of the nitrogen atom with P-gp was nonionic and was determined by the sum of the hydrogen acceptor strengths
of the region. Klopman et al. found that compounds with the CH2-CH2-N-CH2-CH2 group were mostly active.27 In the present study, N16, a tertiary aliphatic nitrogen in a ring, was among the most important nitrogens for discriminating active compounds from inactive ones. More than 74% (220/296) of the compounds with N16 were active, and the active percentage increased to 84.4% (65/77) when N16 occurred twice in the molecule, as in piperazines. Eighteen out of nineteen compounds (94.7%) with N50, an aliphatic nitrile nitrogen, were found to be active, while only 1 of 11 compounds containing N52, an aromatic primary amine nitrogen, was active. The results differed from the conclusions of both Ecker and Klopman, illustrating that the charged nitrogen was important in the P-gp interaction but was not the only reason for MDRR activity. Bayes classification is a machine learning method in which a classifier cannot learn anything beyond the learning materials fed to the “machine”; moreover, no training set is perfect in terms of chemical space coverage. As a result, one needs to be very cautious in drawing any conclusion from the analysis of the statistics table, to avoid overinterpretation. For example, some atom types, such as C7, an aromatic carbonyl carbon neighbored by two other aromatic carbons, showed great discriminating power with a 1/7 active ratio, but because of the limitation of the data set itself, C7 occurred only in close analogues (Figure 4), weakening the conclusion that compounds containing
C7 tend to be inactive. Another example showing the limitation of compound selection in the training set is C52, a tertiary aliphatic carbon atom surrounded by all carbon neighbors. Twenty out of the twenty-one compounds containing two C52 atoms were MDRR positive, but all 21 compounds were structurally similar and all carried a piperazine ring. Similarly, there is a danger of deriving a misleading conclusion from an incomplete data set. Atom type F2, aliphatic fluorine, was mostly found in MDRR positive compounds, with an active ratio of 95% (19/20), but it would be improper to state that introducing aliphatic fluorine atoms increases MDRR activity. Detailed structural analysis indicated that all 19 MDRR active molecules actually contained a trifluoromethyl group (CF3), while the only inactive compound with F2 did not have a CF3 group. A more appropriate conclusion would therefore be that compounds with a trifluoromethyl group tend to be active.
Summary
The application of a universal molecular-descriptor system has been extended to the field of MDRR activity prediction. The naive Bayes classifiers built from the atom-type-based molecular descriptors were both predictive and robust, as indicated by cross validation. In the external validation test, the classifier correctly categorized 82.2% of a 185-compound testing set, while the cumulative probabilities were useful in prioritizing the compounds for testing. Reducing the feature dependency did not improve the performance of the Bayes classifier, supporting the observation that Bayes classifiers are optimal under a far broader range of conditions than previously thought.32 Model interpretation via a statistics table identified the specific atom types and fragments that contributed most significantly to MDRR activities, and this information can be used as a guideline for rational design of MDRR agents.
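The caution against overinterpreting the statistics table can be made concrete by pairing the active ratios quoted in the Model Interpretation discussion with an uncertainty estimate. The sketch below uses the Wilson score interval, which is not part of the original analysis and is swapped in here purely as an illustration; the counts are those reported in the text, while the function name, the row labels, and the 95% level (z = 1.96) are assumptions. An interval only reflects sample size; it cannot reveal that the compounds behind a ratio are close analogues, so structural inspection remains necessary.

```python
import math

def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    p = successes / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half, center + half

# (label, active compounds, compounds containing the feature), as quoted in the text
reported = [
    ("N16 present", 220, 296),
    ("N16 twice", 65, 77),
    ("N50 present", 18, 19),
    ("N52 present", 1, 11),
    ("C7 present", 1, 7),
    ("C52 twice", 20, 21),
    ("F2 present", 19, 20),
]

for name, actives, n in reported:
    low, high = wilson_interval(actives, n)
    print(f"{name:12s} {actives:3d}/{n:<3d} ratio={actives / n:.3f} "
          f"95% CI=({low:.3f}, {high:.3f})")
```

Ratios based on a handful of compounds, such as 1/7 for C7, come with much wider intervals than ratios based on hundreds of compounds, such as 220/296 for N16.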
Another potential application of the model would be to eliminate the undesirable multidrug resistance of anticancer drugs by directly introducing fragments that can increase MDRR activity.
Acknowledgment. The author would like to acknowledge Dr. Sung-Sau So and Dr. David Fry for critical reading of the manuscript and insightful suggestions. The author is also grateful to Dr. G. Klopman for providing compound structures and RF values.
Supporting Information Available: A list of the cumulative probabilities of the molecules in the whole Klopman set. This material is available free of charge via the Internet at http://pubs.acs.org.
References
(1) Sun, H. A universal molecular descriptor system for prediction of logP, logS, logBB, and absorption. J. Chem. Inf. Comput. Sci. 2004, 44, 748-757.
(2) Balzarini, J. Suppression of resistance to drugs targeted to human immunodeficiency virus reverse transcriptase by combination therapy. Biochem. Pharmacol. 1999, 58, 1-27.
(3) Wright, G. D. Mechanisms of resistance to antibiotics. Curr. Opin. Chem. Biol. 2003, 7, 563-569.
(4) Gottesman, M. M.; Fojo, T.; Bates, S. E. Multidrug resistance in cancer: role of ATP-dependent transporters. Nat. Rev. Cancer 2002, 2, 48-58.
(5) Ambudkar, S. V.; Dey, S.; Hrycyna, C. A.; Ramachandra, M.; Pastan, I.; et al. Biochemical, cellular, and pharmacological aspects of the multidrug transporter. Annu. Rev. Pharmacol. Toxicol. 1999, 39, 361-398.
(6) Hipfner, D. R.; Deeley, R. G.; Cole, S. P. Structural, mechanistic and clinical aspects of MRP1. Biochim. Biophys. Acta 1999, 1461, 359-376.
(7) Borst, P.; Evers, R.; Kool, M.; Wijnholds, J. The multidrug resistance protein family. Biochim. Biophys. Acta 1999, 1461, 347-357.
(8) Seigneuret, M.; Garnier-Suillerot, A. A structural model for the open conformation of the mdr1 P-glycoprotein based on the MsbA crystal structure. J. Biol. Chem. 2003, 278, 30115-30124.
(9) Chang, G.; Roth, C. B. Structure of MsbA from E. coli: a homolog of the multidrug resistance ATP binding cassette (ABC) transporters. Science 2001, 293, 1793-1800.
(10) Osterberg, T.; Norinder, U. Theoretical calculation and prediction of P-glycoprotein-interacting drugs using MolSurf parametrization and PLS statistics. Eur. J. Pharm. Sci. 2000, 10, 295-303.
(11) Stouch, T. R.; Gudmundsson, O. Progress in understanding the structure-activity relationships of P-glycoprotein. Adv. Drug Delivery Rev. 2002, 54, 315-328.
(12) Seelig, A. A general pattern for substrate recognition by P-glycoprotein. Eur. J. Biochem. 1998, 251, 252-261.
(13) Penzotti, J. E.; Lamb, M. L.; Evensen, E.; Grootenhuis, P. D. A computational ensemble pharmacophore model for identifying substrates of P-glycoprotein. J. Med. Chem. 2002, 45, 1737-1740.
(14) Ekins, S.; Kim, R. B.; Leake, B. F.; Dantzig, A. H.; Schuetz, E. G.; et al. Application of three-dimensional quantitative structure-activity relationships of P-glycoprotein inhibitors and substrates. Mol. Pharmacol. 2002, 61, 974-981.
(15) Garrigues, A.; Loiseau, N.; Delaforge, M.; Ferte, J.; Garrigos, M.; et al. Characterization of two pharmacophores on the multidrug transporter P-glycoprotein. Mol. Pharmacol. 2002, 62, 1288-1298.
(16) Suzuki, T.; Fukazawa, N.; San-nohe, K.; Sato, W.; Yano, O.; et al. Structure-activity relationship of newly synthesized quinoline derivatives for reversal of multidrug resistance in cancer. J. Med. Chem. 1997, 40, 2047-2052.
(17) Berger, D.; Citarella, R.; Dutia, M.; Greenberger, L.; Hallett, W.; et al. Novel multidrug resistance reversal agents. J. Med. Chem. 1999, 42, 2145-2161.
(18) Pajeva, I.; Wiese, M. Molecular modeling of phenothiazines and related drugs as multidrug resistance modifiers: a comparative molecular field analysis study. J. Med. Chem. 1998, 41, 1815-1826.
(19) Ford, J. M.; Hait, W. N. Pharmacology of drugs that alter multidrug resistance in cancer. Pharmacol. Rev. 1990, 42, 155-199.
(20) Robert, J.; Jarry, C. Multidrug resistance reversal agents. J. Med. Chem. 2003, 46, 4805-4817.
(21) Avendano, C.; Menendez, J. C. Inhibitors of multidrug resistance to antitumor agents (MDR). Curr. Med. Chem. 2002, 9, 159-193.
(22) Sonneveld, P.; Wiemer, E. Inhibitors of multidrug resistance. Curr. Opin. Oncol. 1997, 9, 543-548.
(23) Ekins, S.; Kim, R. B.; Leake, B. F.; Dantzig, A. H.; Schuetz, E. G.; et al. Three-dimensional quantitative structure-activity relationships of inhibitors of P-glycoprotein. Mol. Pharmacol. 2002, 61, 964-973.
(24) Klopman, G.; Zhu, H.; Ecker, G.; Chiba, P. MCASE study of the multidrug resistance reversal activity of propafenone analogs. J. Comput.-Aided Mol. Des. 2003, 17, 291-297.
(25) Wiese, M.; Pajeva, I. K. Structure-activity relationships of multidrug resistance reversers. Curr. Med. Chem. 2001, 8, 685-713.
(26) Klopman, G.; Srivastava, S.; Kolossvary, I.; Epand, R. F.; Ahmed, N.; et al. Structure-activity study and design of multidrug-resistant reversal compounds by a computer automated structure evaluation methodology. Cancer Res. 1992, 52, 4121-4129.
(27) Klopman, G.; Shi, L. M.; Ramu, A. Quantitative structure-activity relationship of multidrug resistance reversal agents. Mol. Pharmacol. 1997, 52, 323-334.
(28) Bakken, G. A.; Jurs, P. C. Classification of multidrug-resistance reversal agents using structure-based descriptors and linear discriminant analysis. J. Med. Chem. 2000, 43, 4534-4541.
(29) Ramu, A.; Ramu, N. Reversal of multidrug resistance by phenothiazines and structurally related compounds. Cancer Chemother. Pharmacol. 1992, 30, 165-173.
(30) Ramu, A.; Ramu, N. Reversal of multidrug resistance by bis(phenylalkyl)amines and structurally related compounds. Cancer Chemother. Pharmacol. 1994, 34, 423-430.
(31) Berger, J. O. Statistical Decision Theory and Bayesian Analysis, 2nd ed.; Springer: New York, 1993.
(32) Domingos, P.; Pazzani, M. J. On the optimality of the simple bayesian classifier under zero-one loss. Mach. Learn. 1997, 29, 103-130.
(33) Pipeline Pilot. http://www.scitegic.com.
(34) Young, S. S.; Hawkins, D. M. Using recursive partitioning analysis to evaluate compound selection methods. Methods Mol. Biol. 2004, 275, 317-334.
(35) Rish, I. An empirical study of the naive Bayes classifier; IBM T. J. Watson Research Center: New York, 2001; pp 41-46.
(36) Domingos, P.; Pazzani, M. J. Beyond independence: Conditions for the optimality of simple Bayesian classifier. 13th International Conference on Machine Learning; Bari, Italy, 1996; pp 105-112.
(37) Bender, A.; Mussa, H. Y.; Glen, R. C.; Reiling, S. Molecular Similarity Searching Using Atom Environments, Information-Based Feature Selection, and a Naive Bayesian Classifier. J. Chem. Inf. Comput. Sci. 2004, 44, 170-178.
(38) Xia, X.; Maliski, E. G.; Gallant, P.; Rogers, D. Classification of kinase inhibitors using a Bayesian model. J. Med. Chem. 2004, 47, 4463-4470.
(39) Klon, A. E.; Glick, M.; Thoma, M.; Acklin, P.; Davies, J. W. Finding more needles in the haystack: A simple and efficient method for improving high-throughput docking results. J. Med. Chem. 2004, 47, 2743-2749.
(40) Golbraikh, A.; Shen, M.; Xiao, Z.; Xiao, Y. D.; Lee, K. H.; et al. Rational selection of training and test sets for the development of validated QSAR models. J. Comput.-Aided Mol. Des. 2003, 17, 241-253.
(41) Glick, M.; Klon, A. E.; Acklin, P.; Davies, J. W. Enrichment of extremely noisy high-throughput screening data using a naive Bayes classifier. J. Biomol. Screening 2004, 9, 32-36.
(42) Beresford, A. P.; Segall, M.; Tarbit, M. H. In silico prediction of ADME properties: are we making progress? Curr. Opin. Drug Discovery Dev. 2004, 7, 36-42.
(43) Pajeva, I. K.; Globisch, C.; Wiese, M. Structure-function relationships of multidrug resistance P-glycoprotein. J. Med. Chem. 2004, 47, 2523-2533.
(44) Pawagi, A. B.; Wang, J.; Silverman, M.; Reithmeier, R. A.; Deber, C. M. Transmembrane aromatic amino acid distribution in P-glycoprotein. A functional role in broad substrate specificity. J. Mol. Biol. 1994, 235, 554-564.
(45) Ecker, G.; Huber, M.; Schmid, D.; Chiba, P. The importance of a nitrogen atom in modulators of multidrug resistance. Mol. Pharmacol. 1999, 56, 791-796.
JM050180T