Subscriber access provided by CORNELL UNIVERSITY LIBRARY
Article
Debunking the Idea that Ligand Efficiency Indices are Superior to pIC50 as QSAR Activties. Robert P. Sheridan J. Chem. Inf. Model., Just Accepted Manuscript • DOI: 10.1021/acs.jcim.6b00431 • Publication Date (Web): 21 Oct 2016 Downloaded from http://pubs.acs.org on October 22, 2016
Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a free service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are accessible to all readers and citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.
Journal of Chemical Information and Modeling is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.
Page 1 of 40
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Journal of Chemical Information and Modeling
Debunking the Idea that Ligand Efficiency Indices are Superior to pIC50 as QSAR Activities. Robert P. Sheridan
Modeling and Informatics Department RY800B-305 MRL Merck & Co. Inc. Rahway, NJ 07065 USA
[email protected] ACS Paragon Plus Environment
1
Journal of Chemical Information and Modeling
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 2 of 40
ABSTRACT Several papers have appeared in which a ligand efficiency index instead of pIC50 is used as the activity in QSAR. The claim is that better fits and predictions are obtained with ligand efficiency. We show on both public-domain and in-house data sets that the apparent superiority is a statistical artifact that occurs when ligand efficiency indices are correlated with the physical property included in their definition (number of non-hydrogens, ALOGP, TPSA, etc.), and when the property is easier to predict than the original pIC50.
INTRODUCTION
Ligand efficiency (LE) indices 1-4 are a popular method to normalize a binding measurement, usually pIC50 (e.g. –log(IC50), by a physical property against which the binding is thought to track. For example molecules with more atoms are thought, on the average, to have a higher pIC50 than molecules with fewer atoms. Therefore the LE index:
BEI=pIC50/NA,
where NA is the number of non-hydrogen atoms, is invented to normalize binding for the number of atoms. There are now over a dozen proposed LE indices using various measures of molecular size or polarity. While many workers feel LE indices are an important means to identify “efficient” compounds, others, notably Kenny, 4 have argued that there is little physicochemical or mathematical justification for them.
ACS Paragon Plus Environment
2
Page 3 of 40
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Journal of Chemical Information and Modeling
Given the popularity of LE indices, it was inevitable that they would be used as activities in QSAR problems. In 2013 Sugaya 5 introduced the idea of using LE indices as an alternative to pIC50. He showed that the apparent goodness of cross-validated prediction was improved for a number of GPCR and kinase data sets if the LE index BEI was used instead of pIC50 as the activity. This work used SVM as the QSAR method for classification and tested a variety of descriptors. A second paper from Sugaya 6 expanded the set of LE indices to BEI, LLE, and SIE and used SVR as a QSAR method for regression. A more recent paper from Li et al. 7 studied a series of AR antagonists and showed an improvement in CoMFA models as well as multiple linear regression using SEI instead of pIC50. Most recently Cortes-Ciriano8 examined 25 data sets, 11 LE indicies, and four QSAR methods: PLS, SVM, GBM and random forest. There were improvements with many combinations of LEs and QSAR methods on most data sets.
If LE indices are robustly more predictable than the pIC50, what is the reason for this? Sugaya suggests that LEs have “greater diversity of activity values”. Li et al. suggest that LEs can be a more “rational parameter to be optimized in the drug discovery project.” Authors note that some data sets are more improved than others by using LE instead of pIC50, but generally cannot explain why.
Our investigation examines 21 data sets from Cortes-Ciriano8 and 18 in-house data sets, using descriptors and QSAR methods commonly used at Merck. Our results suggest that the apparent superiority of four types of LE (BEI, LLE, SEI, LELP) compared to pIC50 is due to a statistical artifact. The improvement occurs when:
ACS Paragon Plus Environment
3
Journal of Chemical Information and Modeling
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 4 of 40
1. LE strongly correlates (positively or negatively) with the computable physical property included in its definition. 2. The property in the LE definition is easier to predict than pIC50.
We show these two factors quantitatively explain almost all the variation in the improvement of predictivity among data sets whatever the LE, the QSAR method, or QSAR descriptors.
METHODS
In testing the supposed superiority of LE metrics in QSAR, we first examine public domain data sets and use a random split to form training and test sets, which is the usual procedure in the literature on this topic. We try nine combinations of QSAR method and descriptors. Second, we examine in-house data sets in such a way as to be as realistic as possible for our environment. Our in-house data sets are large, and we use time-split to form training and test sets. This is much closer to the way QSAR models are actually used: predicting newly assayed molecules based on molecules assayed before the model was built. We have also found that the R2 from time-split validation is a much more realistic approximation of the R2 for true prospective prediction 9. Also we stay with the QSAR method (random forest) and descriptors (AP,DP) we would normally use.
Data sets The list of data sets used in this study is in Table 1. A good choice for public domain data sets of pIC50 would be those from Cortes-Ciriano8 since these were tested with more combinations of
ACS Paragon Plus Environment
4
Page 5 of 40
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Journal of Chemical Information and Modeling
LE indices and QSAR methods than any previous study. These are downloadable as Supporting Information from that paper. We looked the 21 data sets with > 500 unique molecules. The names in our Table 1 correspond to the file names in the download. Where the names of the files differ from the names in Table 1 of Cortes-Ciriano, the alternative names are shown in parentheses. Since most of the Cortes-Cirino data sets are from ChEMBL, where pIC50 is given in units of nM, it is necessary to transform them, as did Cortes-Ciriano, to units of M to avoid division by zero in the LE index LELP, where the pIC50 is in the denominator. If a molecules is measured more than once, its pIC50 is taken as the average over all determinations.
The 18 in-house data sets are also listed in Table 1. The data sets represent a mixture of on- and off-target activities of varying sizes, most much bigger than the public domain data sets. Some of these data sets have a substantial fraction of “qualified data,” for example, "IC50 >30µM". Most off-the-shelf QSAR methods do not treat qualified data as anything but a definite value, so here >30µM would be treated as 30µM, or –log(IC50)=4.5 in units of M.
Table 1. Data sets used in this study Name Description
N (training+test)
Mean + stdev pIC50 IC50 in molar
Data sets from Cortes-Ciriano aurora_A (O14965) cdk2 estrogen_alpha (P03372) estrogen_beta (Q29731) glucocorticoid
human protein kinase Aurora-A human cyclic-dependent kinase 2 human estrogen receptor α human estrogen receptor α human glucocorticoid
1651
6.30+1.30
1130
6.24+1.38
908
6.64+1.32
799
6.93+1.14
1182
6.94+1.15
ACS Paragon Plus Environment
5
Journal of Chemical Information and Modeling
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
(P04150) jak2 (O60674) kinase_src (P12931) map_vasc_endotel (P35968) mtor (P42345) pkc_alpha (P17252) progesterone (P06401) IL4 P11229 P19327 P21554 P24530 P28335 P41594 P47871 P61169 In-house data sets 2C8 2D6 3A4 A-II BACE CAV
CB1 DPP4
receptor human protein kinase jak2 human protein kinase src human vascular endothelial growth factor receptor 2 human FK506 binding protein 12 Human protein kinase C alpha human progesterone receptor human interleukin receptor human muscarinic M1 receptor rat 5-HT1A receptor human cannabinoid receptor human endothelin receptor human 5-HT2C receptor human metabolomic glutamate receptor 5 human glucagon receptor rat D2 receptor
inhibition of CYP 2C8 inhibition of CYP 2D6 inhibition of CYP 3A4 inhibition of the angiotensin-II receptor inhibition of betasecretase-1 inhibition of the CaV 1.2 Channel binding to the cannabinoid-1 receptor inhibition of
Page 6 of 40
869
6.82+1.21
2719
5.96+1.38
4662
6.48+1.29
1337
6.88+1.60
580
5.46+1.23
1233
6.67+1.14
665
6.64+0.56
718
6.27+1.33
919 1240
6.99+1.12 6.85+1.46
966
5.92+1.27
752
6.11+1.06
1318
6.15+1.11
615
6.69+1.05
1756
6.48+0.99
29958 50000 50000 2763
4.88+0.66 4.50+0.46 4.69+0.65 8.70+2.72
17469
6.07+1.40
50000
4.93+0.45
11637
7.13+1.21
6905
6.28+1.23
ACS Paragon Plus Environment
6
Page 7 of 40
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Journal of Chemical Information and Modeling
ERK2 FACTORXIA HERG HIV_INT HIV_PROT NAV NK1 OX1 OX2 THROMBIN
dipeptidylpeptidase-4 inhibition of the ERK2 kinase inhibition of factor XIa inhibition of the hERG channel inhibiting HIV integrase inhibiting HIV protease inhibiting the NaV 1.5 channel binding to substance P receptor inhibition of the Orexin1 receptor inhibition of the Orexin2 receptor inhibition of thrombin
12843
4.38+0.68
9536 50000
5.49+0.97 5.21+0.78
2413
6.32+0.56
4311 50000
7.30+1.71 4.77+0.40
13482
8.28+1.21
7135
6.16+1.22
14875
7.25+1.46
6800
6.67+2.02
Workflow The handling of data is shown in Figure 1. For the public domain data, each intial data set is split once at random into a training and test set in the proportion 75/25. For in-house data each intial data set is split 75/25 into a training and test set based on dates of assay for the compound, with the later-assayed compounds in the test set. A QSAR model is generated from the training set with pIC50 as an activity, and the model is used to predict the pIC50s in the test set. A LE transformation is then made from the pIC50 using the appropriate Property in the definition of the LE. A model is generated from the training set using the LE as the activity, and the model is used to predict the LE in the test set. The predicted LEs are transformed back into pIC50, so a comparison can be made of the accuracy of prediction of pIC50 from the LE model vs the original pIC50 model. Similarly, predicted pIC50s can be transformed into predicted LEs.
ACS Paragon Plus Environment
7
Journal of Chemical Information and Modeling
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 8 of 40
Figure 1. Scheme for generation of pIC50, LE, and Property QSAR models and predictions from them. All QSAR models use the same training set/test set split.
ACS Paragon Plus Environment
8
Page 9 of 40
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Journal of Chemical Information and Modeling
Finally, for each data set we generate a QSAR model from the training set using the Property in the definition of the LE as the activity and predict the Property of the test set.
In each case the QSAR model may be built using one of three QSAR methods and one of three descriptor types (see below). All QSAR models are regressions, i.e. both the observed activities and predictions are floating-point numbers.
Definitions of LE There are several dozen LE indices in the literature. For our purposes we need examine only a representative set. These are the four we have chosen:
1. BEI=pIC50/NA
where NA is the number of non-hydrogen atoms, which is the Property for BEI. Because of its reciprocal form, BEI increases without limit as NA becomes very small. We therefore ignored molecules with fewer than 10 atoms to avoid that issue. Such molecules are very rare in the data sets. LEs using molecular weight instead of NA would be expected to be very similar, since molecular weight is highly correlated with NA.
2. LLE=pIC50-ALOGP
ACS Paragon Plus Environment
9
Journal of Chemical Information and Modeling
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 10 of 40
where ALOGP, the Property for LLE, is calculated by Pipeline Pilot 10
3. SEI=100* pIC50/TPSA
where TPSA, the Property for SEI, is the solvent accessible polar surface area. Here we calculate the polar surface area of a single conformer of each molecule produced by CORINA 11. Again, because of its reciprocal nature, SEI can increase very fast as TPSA goes down. This makes it hard to make predictive models of SEI for the Cortes-Ciriano data sets that contain very nonpolar molecules, for which TPSA can be zero or near zero, because the range of SEI can be inconsistent between training and test sets. To avoid this, we ignored any molecule with TPSA < 20 square Angstroms.
4. LELP = CLOGP/BEI = CLOGP*NA/pIC50
Where CLOGP is from Pipeline Pilot 10. The Property for LELP would be CLOGP*NA.
Descriptors for QSAR models Our usual substructure descriptors for QSAR is the union of the Carhart atom pair descriptors (AP) 12 and a version of atom pairs called DP 13 where the atom types are replaced with physiochemical types (cation, anion, donor, acceptor, polar, hydrophobic, other).
ECFP4 is the circular fingerprint with a radius of 4 described by Rogers and Hahn14.
ACS Paragon Plus Environment
10
Page 11 of 40
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Journal of Chemical Information and Modeling
MOE_2D is a set of 186 computable physical properties provided by the MOE package15. One of the reasons for including MOE_2D is that at least some QSAR models for LE in the literature use computed physical properties as QSAR descriptors, and this makes it potentially likely that that the Property will be especially easy to predict, since some of the QSAR descriptors will be highly correlated with the Property. This, in turn, could make the potential artifact more problematic.
QSAR methods Random forest (RF): Our implementation of random forest 16 is a parallelized version of the original code of Brieman17. The parallelization allows us to deal with very large data sets. We typically use 100 trees and a node size of 5.
PLS: We use the R implementation of pls18. The number of components (up to 10) is optimized by cross-validation on the training set.
Deep neural networks (DNN): We use Python-based code obtained from the Kaggle contest and described in Ma et al. 19. We use parameters slightly different than the “standard set” described in that paper: Two intermediate layers of 1000 and 500 neurons with 25% dropout rate, and 75 training epochs. The above change is made for the purposes of more time-efficient calculation. Also we use the asinh (inverse hyperbolic sine) transformation on the descriptors instead of the log transformation. This allows us to handle the MOE_2D descriptors, which may have negative values. Log and asinh give equivalent prediction accuracies for the substructure descriptors APDP and ECFP4.
ACS Paragon Plus Environment
11
Journal of Chemical Information and Modeling
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 12 of 40
Metrics We measure goodness of prediction by Pearson’s R2 because we are usually interested in whether predictions linearly track with observations, rather than insisting on that predictions match observations numerically.
RESULTS
A table of R2 of prediction for pIC50 , LEs, and Properties for the Cortes-Ciriano data sets is included in Supporting Information. There is a similar table for the in-house data sets. We will discuss a number of aspects of the Cortes-Ciriano data sets, for which we tried all nine QSAR method/descriptor combinations. Then we will show how the inhouse data sets show similar behavior.
Are LEs easier to predict than pIC50 and are Properties easier to predict than pIC50? First we need to confirm in our data sets that LEs are indeed easier to predict than pIC50. Second, the thesis that the apparent superiority of LEs is due to an artifact requires that the Properties in the definition of the LEs are generally easier to predict than pIC50.
Figure 2 shows the R2 of prediction of the test set for pIC50, LE, and the corresponding Property for the Cortes-Ciriano data sets using the RF/APDP method/descriptor combination as an example. Most of the time, the black line (LE) is above the red line
ACS Paragon Plus Environment
12
Page 13 of 40
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Journal of Chemical Information and Modeling
BEI RF/APDP
LLE RF/APDP
ACS Paragon Plus Environment
13
Journal of Chemical Information and Modeling
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 14 of 40
SEI RF/APDP
LELP RF/APDP
Figure 2. For the Cortes-Ciriano data sets using the RF/APDP method/descriptor combination. R2 of prediction vs. data set for the pIC50, LE, and the corresponding Property (NA for BEI, ALOGP for LLE, TPSA for SEI and CLOGP*NA for LELP) .
ACS Paragon Plus Environment
14
Page 15 of 40
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Journal of Chemical Information and Modeling
(pIC50). That is, generally speaking, LEs are easier to predict than pIC50, although the amount of improvement depends on the data set and on the method/descriptor combination. It is also clear from Figure 2 that the Property is very easy to predict compared to pIC50, with the blue line above the red line with very few exceptions. Interestingly, NA, the Property for BEI, is uniformly easy to predict for all data sets. The general observations above are seen whatever the method/descriptor combination.
Critical factors Discussion of the hypothetical artifact requires three terms: 1. R2_LE_MINUS_PIC50. This is equivalent to how far the black line is above the red line in Figure 2, i.e. the improvement in predicting LE vs. predicting pIC50, which we ultimately want to explain. 2. R2_LE_VS_PROPERTY. This is the square of the correlation of the LE with its Property. The correlation is negative for BEI, LLE, and SEI and positive for LELP, but for the purposes of the artifact, only the magnitude of the correlation matters. 3. R2_PROPERTY_MINUS_PIC50. This is how much easier the Property is to predict than the pIC50, i.e. how far the blue line is above the red line in Figure 2.
Showing Property can dominate a model for LE It is useful to start with demonstrations how a model for LE can be dominated by the Property when R2_LE_VS_PROPERTY is moderate to high. We will look at BEI for the RF/APDP method/descriptor combination as an example. The first approach will be to see how well the LE model predicts its individual components pIC50 and Property (NA
ACS Paragon Plus Environment
15
Journal of Chemical Information and Modeling
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 16 of 40
in the case of BEI). The second approach will be to replace one of the components with noise.
The black, red, and blue lines in Figure 3 are the predictions of LE from the LE model, pIC50 from the pIC50 model, and Property from the Property model, respectively, as seen in Figure 2. These are baseline references. In Figure 3A we show, in addition, predictions of Property from the LE model (green) and predictions of pIC50 from the LE model (grey). The R2 for the LE model predicting the Property (green) is almost as high as the LE model predicting LE (black line). On the other thand, the R2 for the LE model predicting the pIC50 (grey line) is low, clearly lower than the ability of the pIC50 model to predict itself (red line). Clearly, the Property is dominating the LE model.
For Figure 3B, we create a permuted version of Property, where the Property is randomly assigned to the wrong molecules in the training set, and a similarly permuted version of pIC50. A model made from the permuted pIC50, not surprisingly, cannot predict pIC50 of the test set, i.e. the R2 for the orange line is very close to zero. Similarly, a model made with a permuted Property cannot predict the Property of the test set (light blue line). There are two additional LE models, one with the intact Property and permuted IC50 (green line), and one with the intact pIC50 and permuted Property (grey line). The fact that the grey line is far below the green line, and certainly below the original pIC50 model (red), again implies that the majority of the signal in the LE (BEI) comes from the Property (NA) instead of pIC50.
ACS Paragon Plus Environment
16
Page 17 of 40
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Journal of Chemical Information and Modeling
A
B
ACS Paragon Plus Environment
17
Journal of Chemical Information and Modeling
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 18 of 40
C
Figure 3. For the Cortes-Ciriano data sets and the RF/APDP method/descriptor combination. R2 of prediction vs. data set for the pIC50, LE, and the corresponding Property. A. In addition we show the prediction of Property by the LE model (green) and prediction of pIC50 by the LE model (grey). B. In addition we show predictions where LE is made from permuted pIC50 (green line) or permuted Property (grey line). Also shown are predictions of pIC50 from permuted IC50 (orange line) and predictions of Property from permuted Property (light blue line). C. R2_LE_VS_PROPERTY vs. data set.
It should be noted that the data set where the predictions of LE models are most dominated by Property in Figures 3A and 3B, not surprisingly, is the one where R2_LE_VS_PROPERTY is highest (estrogen_beta in Figure 3C), and the dataset that is least dominated by Property is the one where R2_LE_VS_PROPERTY is lowest (P21554).
Explaining the variation in improvement of LE over pIC50 among data sets
ACS Paragon Plus Environment
18
Page 19 of 40
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Journal of Chemical Information and Modeling
Figure 4 displays the improvement of LE over pIC50 for the Cortes-Ciriano data sets as a function of factors suggested by the hypothesized artifact. All LE/method/descriptor combinations are represented, with the LEs distinguished by color and the method/descriptor combinations as shapes. The y-axis in Figure 4A-C is R2_LE_MINUS_R2_ PIC50. In Figure 4A the x-axis is R2_LE_VS_PROPERTY. In Figure 4B the x-axis is R2_PROPERTY_MINUS_R2_PIC50. In Figure 4C the x-axis is the product of the quantities of the values in the x-axes of Figures 4A and 4B.
To demonstrate the statistical artifact mentioned in the INTRODUCTION, it is sufficient to demonstrate that data sets that have most improvement of LE over pIC50 should be the ones where R2_LE_VS_PROPERTY is highest and/or where R2_MINUS_R2_PIC50 is highest. We see some weak trends in Figures 4A and 4B, respectively, where there is some separation of LEs. The trends should be stronger if we consider the combination of both factors simultaneously, as in Figure 4C. All combinations of LE/method/descriptors seem to fall onto the same line, with an R2 of 0.82. That is, with few exceptions, we can predict the improvement of LE vs. pIC50 given only two factors suggested by the artifact hypothesis, regardless of the LE, QSAR method or descriptor.
ACS Paragon Plus Environment
19
Journal of Chemical Information and Modeling
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 20 of 40
A
B
ACS Paragon Plus Environment
20
Page 21 of 40
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Journal of Chemical Information and Modeling
C
Figure 4. For the Cortes-Ciriano data sets. Ligand efficiency types are represented by color and QSAR method/descriptor combinations represented as shapes. A. R2_LE_MINUS_R2_PIC50, the improvement of the prediction of LE compared to pIC50 vs. R2_LE_VS_PROPERTY, the correlation of LE with its Property. B. R2_LE_MINUS_R2_PIC50 vs. R2_PROPERTY_MINUS_R2_PIC50, the difference in R2 between Property and pIC50. C. R2_LE_MINUS_R2_PIC50 vs. the product of the xaxes in A and B.
What LE and what method/descriptor combination shows the highest improvement of LE over pIC50 and why? We address this question with boxplots in Figures 5 and 6. In Figure 5A, we see that LELP is the LE that shows the highest average improvement over pIC50. One can see the same trend in Figure 4A-C, where the highest symbols along the y-axis are green. CortesCiriano8 notes that among the 11 LEs he tested, LELP shows the highest improvement of LE over pIC50 for these data sets. Figure 5B shows that LELP has the highest R2_LE_VS_ PROPERTY. This is also seen with the separation of green symbols from
ACS Paragon Plus Environment
21
Journal of Chemical Information and Modeling
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 22 of 40
the rest along the x-axis in Figure 4A. A higher R2_LE_VS_PROPERTY would be associated with a higher improvement according to the artifact hypothesis. This is quite different than the explanation offered by Cortes-Ciriano8 that LELP shows the most improvement because there is something good about the combination of polarity and size factor.
In Figure 6A we see that, for any given QSAR method, the highest improvement averaged over the LEs occurs with the MOE_2D descriptors. In Figure 6B we see that that MOE_2D shows a higher R2_PROPERTY_MINUS_R2_PIC50. There are two ways of raising R2_PROPERTY_MINUS_R2_PIC50. One is to raise R2_PROPERTY. The other is to lower R2_PIC50. Our expectation was that MOE_2D descriptors would have the effect of making the Property easier to predict, i.e. raising R2_PROPERTY, since MOE_2D descriptors contain atom count, two favors of LOGP, topological polar surface area, etc. However, Figure 6C shows that R2_PROPERTY is not much higher for MOE_2D than the other descriptors, when averaged over four LEs (although it is quite high for BEI). Instead, in Figure 6D we see that the larger effect of MOE_2D is to lower R2_PIC50. In retrospect this makes sense because physical properties are often not enough to predict on-target activities, which we have in the Cortes-Ciriano data sets. More detailed chemical descriptors, such as with APDP and ECFP4 are usually needed.
ACS Paragon Plus Environment
22
Page 23 of 40
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Journal of Chemical Information and Modeling
A
B
Figure 5. For the Cortes-Ciriano data sets. Box plots averaged over all method/descriptor combinations and binned by LE. A. R2_LE_MINUS_R2_PIC50, the improvement of LE over pIC50. B. R2_LE_VS_PROPERTY.
ACS Paragon Plus Environment
23
Journal of Chemical Information and Modeling
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 24 of 40
A
B
ACS Paragon Plus Environment
24
Page 25 of 40
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Journal of Chemical Information and Modeling
C
D
Figure 6. For the Cortes-Ciriano data sets. Box plots averaged over all method/descriptor combinations and binned by method/descriptor combination. A.
ACS Paragon Plus Environment
25
Journal of Chemical Information and Modeling
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 26 of 40
R2_LE_MINUS_R2_PIC50, the improvement of LE over pIC50). B. R2_PROPERTY_MINUS_R2_PIC50. C. R2_PROPERTY. D. R2_PIC50.
Predictions of pIC50 from “back-calculated” predicted LE vs. direct predictions of pIC50 Even if the improvement in R2 for LE over pIC50 is due to an artifact, could it be possible that LEs could be a useful transformation to apply to pIC50? For that to be true, the predicted pIC50 “back-calculated” from a predicted LE from an QSAR model of LE will be more accurate than the direct prediction of pIC50 from QSAR model. The back-calculation is simply reversing the LE transformation, for example for BEI:
predicted pIC50 = predicted BEI * NA
In Figure 7A, R2 refers to the correlation of predicted pIC50 vs. actual pIC50. We are showing as an example the RF/APDP method/descriptor combination. The y-axis shows the R2 where the predicted pIC50 is back-calculated from the prediction of LE. The x-axis shows the R2 for the direct prediction of pIC50 from the pIC50 model. That the points are on or below the diagonal (dashed line) indicates that calculating pIC50 directly from a pIC50 model is preferable. Figure 7B shows the reverse. Here R2 means the agreement of a predicted LE vs. actual LE. This time we are comparing the predicted LE forward-calculated from a prediction of pIC50 to the LE directly predicted from the LE model. In this case the LE generated from a predicted pIC50 is the same or better (above the diagonal). Thus in neither case does it make sense to use a model for LE. Similar results are observed for every method/descriptor combination. In all cases, BEI seems to be closest to the diagonal, probably because NA is so easy to predict.
ACS Paragon Plus Environment
26
Page 27 of 40
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Journal of Chemical Information and Modeling
A
B
Figure 7. For the Cortes-Ciriano data sets using the RF/APDP method/descriptor combination. A. R2 for the prediction of pIC50 as back-calculated from the LE model vs. R2 for the prediction of pIC50 from the pIC50 model. B. R2 for the prediction of LE as forward- calculated from the pIC50 model vs. R2 for the prediction of LE from the LE model.
ACS Paragon Plus Environment
27
Journal of Chemical Information and Modeling
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 28 of 40
In-house data sets In this section we confirm the in-house data sets (all using RF/APDP) have similar behavior to the Cortes-Ciriano data sets. The biggest differences are that the in-house data sets are larger, are a mix of on-target and ADMET data sets, and use a time-split to generate the training vs. the test set.
Figure 8 shows the same type of plots as Figure 2. In those plots we see that the R2 tend to be lower than in Figure 2, especially for pIC50 and LE. This is not at all surprising since we expect time-split test sets to be less predictable than random-split test sets. Aside from that, we see the same types of trends: the black line is above the red line on the average, and with few exceptions, the blue line is above the red line. BEI has the bestpredicted Property.
Figure 9 shows the same type of plot as Figure 4 for the in-house data sets. Again we see weak trends with individual metrics R2_LE_VS_PROPERTY and R2_PROPERTY_MINUS_R2_PIC50, but a very strong linear trend and superposition of the LEs with the product of the metrics (R2 for the linear fit 0.85). For the in-house data sets all the LEs seem about equivalent in their improvement.
ACS Paragon Plus Environment
28
Page 29 of 40
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Journal of Chemical Information and Modeling
BEI RF/APDP
LLE RF/APDP
ACS Paragon Plus Environment
29
Journal of Chemical Information and Modeling
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 30 of 40
SEI RF/APDP
LELP RF/APDP
Figure 8. For the in-house data sets and the RF/APDP method/descriptor combination. R2 of prediction vs. for the pIC50, LE, and the corresponding Property for the in-house data sets.
ACS Paragon Plus Environment
30
Page 31 of 40
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Journal of Chemical Information and Modeling
A
B
Figure 9 (1 of 2)
ACS Paragon Plus Environment
31
Journal of Chemical Information and Modeling
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 32 of 40
C
Figure 9 (2 of 2) For the in-house data sets using the RF/APDP method/descriptor combination. Ligand efficiency types are represented by color and QSAR method/descriptor combinations represented as shapes. A. The improvement in R2 of LE vs. pIC50 vs. the R2 of the correlation of LE with its Property. B. The improvement in R2 of LE vs. pIC50 vs. the difference in R2 between Property and pIC50. C. The improvement in R2 of LE vs. pIC50 vs. the product of the x-axes in A and B.
Figure 10 shows the same type of plot as Figure 7. There is a shift of points toward the left compared to Figure 7 because, as previously noted, predictions of LE and pIC50 will be poorer with time-split data sets. Again we see that there is no advantage to making a LE model relative to making a pIC50 model.
ACS Paragon Plus Environment
32
Page 33 of 40
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Journal of Chemical Information and Modeling
A
B
Figure 10. For in-house data sets using the RF/APDP method/descriptor combination. A. R2 for the prediction of pIC50 as back-calculated from the LE model vs. R2 for the prediction of pIC50 from the pIC50 model. B. R2 for the prediction of LE as forward-calculated from the pIC50 model vs. R2 for the prediction of LE from the LE model.
ACS Paragon Plus Environment
33
Journal of Chemical Information and Modeling
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 34 of 40
DISCUSSION
Any transformation of the activity that will make pIC50s easier to predict in QSAR would be interesting, and we can regard LEs as a type of transformation of pIC50. However, the improvement in prediction must be meaningful instead of an illusion. Our studies strongly indicate that when there is an improvement of predictions of LE over pIC50, the improvement is due to an artifact which seems obvious once pointed out: The LE mixes in a Property that is much easier to predict than the pIC50.
Given only the product of two factors suggested by the hypothesized artifact, we are able to predict which data sets will show what level of improvement of LE over pIC50 for each individual LE. As far as we are aware, there is no other published qualitative explanation for why improvement occurs in some data set/LE combinations, but not others. The fact that, for a given collection of data sets, different LEs seem to follow the same quantitative trend provides further support, since there is no obvious reason other than the artifact this should be true.
What of the literature explanations of why LE shows an improvement over pIC50 in general? Suggestions of the type that LEs are “more natural metrics” cannot be taken seriously because the reasoning is circular; no reason is given other than the fact that they are apparently superior. Regarding Sugaya’s 6 suggestion, which is echoed by CortesCiriano8, that LEs are better than pIC50 because LEs have a “greater diversity of activity values”, it is not clear whether Sugaya is comparing the standard deviation of the activity
ACS Paragon Plus Environment
34
Page 35 of 40
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Journal of Chemical Information and Modeling
distribution or the kurtosis of the activity distribution, i.e. whether the distribution of LE is “heavier in the wings” relative to that of pIC50. It would not make mathematical sense to directly compare the standard deviations of an LE with that of a pIC50 because they have different units, but we do see a way distributions of activity can correlate with the improvement. There is a weak negative correlation between R2_LE_MINUS_R2_PIC50 and the standard deviation of the pIC50 for the in-house data sets. This probably works indirectly since there is a stronger negative correlation between R2_LE_VS_PROPERTY and the standard deviation in the pIC50, and a positive correlation between R2_LE_VS_PROPERTY and the improvement. However, the correlation in the CortesCiriano data set is very weak.
We investigated a limited number of LEs, QSAR methods, and descriptors. However, we would expect the artifact to be an issue whatever the descriptors or QSAR method, whether 2D or 3D. All that is needed for the artifact to be present is for the LE to correlate with the Property in its definition and for the Property to be easily predicted. For the first, correlation is almost inevitable. For the second, the Property in LE definitions is almost always some measure of size or polarity. These are usually easily calculated from substructure descriptors generated from the molecular structure. In fact, the Property is itself is sometimes the result of a prediction from a QSAR model based on substructure descriptors. ALOGP and CLOGP are examples. If the QSAR descriptors are computable physical properties (e.g. MOE_2D) that could make the Property especially easy to predict. This is not to say that there is no conceivable combination of LE, QSAR, descriptors, and QSAR method where the improvement of LE over pIC50 does not
ACS Paragon Plus Environment
35
Journal of Chemical Information and Modeling
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 36 of 40
involve the artifact of the type described here. However, those who claim the superiority of the LE have the burden of proof to show that there is no artifact.
ACKNOWLEGEMENTS
The author thanks Dr. Qiaolin Deng for encouraging this study. The implementation of random forest was written by Joseph Shpungin. The implementation of deep neural networks was written by George Dahl and modified by Junshui Ma.
ASSOCIATED CONTENT
Supporting Information Available: 1. Full table of metrics for predictions of Cortes-Ciriano data sets. 2. Full table of metrics for predictions of in-house datasets. This material is available free of charge via the Internet at http://pubs.acs.org
CONFLICT OF INTEREST STATEMENT The author declares no competing financial interest.
REFERENCES
1. Hopkins, A.L.; Groom, C.R.; Alex, A. Ligand Efficiency: A Useful Metric for Lead Selection. Drug. Disc. Today 2004, 9, 430-431.
ACS Paragon Plus Environment
36
Page 37 of 40
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Journal of Chemical Information and Modeling
2. Abad-Zapatero, C.; Metz, J.T. Ligand Efficiency Indices as Guideposts for Drug Discovery. Drug Disc. Today 2005, 10, 464-469.
3. Shultz, M.D. Setting Expectations in Molecular Optimizations: Strengths and Limitations of Commonly Used Composite Parameters. Biorg. Med. Chem. Lett. 2013, 23, 5980-5991.
4. Kenny, P.W.; Leitao, A.; Montanari, C.A. Ligand Efficiency Metrics Considered Harmful. J. Comput-Aided Mol. Des. 2014, 28, 699-710.
5. Sugaya, N. Training Based on Ligand Efficiency Improves Prediction of Bioactivities of Ligands and Drug Target Proteins. J. Chem. Inf. Model. 2013, 53, 2525-2537.
6. Sugaya, N. Ligand Efficiency-Based Support Vector Regression Models for Improvement of Regression Models for Predicting Bioactivities of Ligands to Drug Target Proteins. J. Chem. Inf. Model. 2014, 54, 2751-2763.
7. Li, J.; Bai, F.; Liu, H.; Gramatica, P. Ligand Efficiency Outperforms Pic50 on Both 2D MLR and 3D Comfa Models: a Case Study on AR Antagonists. Chem. Biol. Drug Des. 2015, 86, 1501-1517.
8. Cortes-Ciriano, I. Benchmarking the Predictive Power of Ligand Efficiency Indices In QSAR. J. Chem. Inf. Model. 2016, 56, 1576-1587.
ACS Paragon Plus Environment
37
Journal of Chemical Information and Modeling
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 38 of 40
9. Sheridan, R.P. Time-Split Cross-Validation as a Method for Estimating the Goodness of Prospective Prediction. J. Chem. Inf. Model. 2013, 53, 783-790
10. Dassault Systemes Biovia Pipeline Pilot Overview http://accelrys.com/products/collaborative-science/biovia-pipeline-pilot/ (accessed October 19, 2016)
11. Sadowski, J.; Gasteiger, J.; Klebe, G. Comparison of Automatic Three-Dimensional Model Builders Using 639 X-Ray Structures. J. Chem. Inf. Comput. Sci. 1994, 34, 10001008.
12. Carhart, R. E.; Smith, D. H.; Ventkataraghavan, R. Atom Pairs as Molecular Features in Structure-Activity Studies: Definition and Application. J. Chem. Inf. Comput. Sci., 1985, 25, 64-73.
13. Kearsley, S.K.; Sallamack, S.; Fluder, E.M.; Andose, J.D.; Mosley, R.T.; Sheridan, R.P. Chemical Similarity Using Physiochemical Property Descriptors. J. Chem. Inform. Comp. Sci. 1996, 36, 118-27.
14. Rogers, D.; Hahn, M. Extended-Connectivity Fingerprints. J. Chem. Inf. Model. 2010, 50, 742-754.
ACS Paragon Plus Environment
38
Page 39 of 40
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Journal of Chemical Information and Modeling
15. Chemical Computing Group QuaSAR descriptors https://www.chemcomp.com/journal/descr.htm (accessed October 19, 2016)
16. Svetnik, V.; Liaw, A.; Tong, C.; Culberson, J.C.; Sheridan, R.P.; Feuston, B.P. Random Forest: A Classification And Regression Tool for Compound Classification and QSAR Modeling. J. Chem. Inf. Comput. Sci. 2003, 43, 1947-1958.
17. Breiman, L. Random Forests. Machine Learning 2001, 45, 3-32.
18. Mevik, B.-H.; Wehrens, R. The pls Package: Principal Component and Partial Least Squares Regression in R. J. Stat. Software 2007, 18, issue2
19. Ma, J.; Sheridan, R.P.; Liaw, A.; Dahl, G.E.; Svetnik, V. Deep Neural Networks as a Method for Quantitative Structure-Activity Relationships. J. Chem. Inf. Model. 2015, 55, 263-274.
ACS Paragon Plus Environment
39
Journal of Chemical Information and Modeling
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Page 40 of 40
TOC Graphic
ACS Paragon Plus Environment
40