In Silico Categorization of in Vivo Intrinsic Clearance Using Machine

Feb 21, 2013 - In the work presented here, we have applied orthogonal partial least-squares (OPLS), principal component analysis (PCA), and random for...
0 downloads 9 Views 628KB Size
Article pubs.acs.org/molecularpharmaceutics

In Silico Categorization of in Vivo Intrinsic Clearance Using Machine Learning Ya-Wen Hsiao, Urban Fagerholm, and Ulf Norinder* AstraZeneca Research and Development Södertälje, SE-151 85 Södertälje, Sweden S Supporting Information *

ABSTRACT: Machine learning has recently become popular and much used within the life science research domain, e.g., for finding quantitative structure−activity relationships (QSARs) between molecular structures and different biological end points. In the work presented here, we have applied orthogonal partial least-squares (OPLS), principal component analysis (PCA), and random forests (RF) methods for classification as well as regression analysis to a publicly available in vivo data set in order to assess the intrinsic metabolic clearance (CLint) in humans. The derived classification models are able to identify compounds with CLint lower and higher than 1500 mL/min, respectively, with nearly 80% accuracy. The most relevant descriptors are of lipophilicity and charge/polarizability types. Furthermore, the accuracy from a classification model based on regression analysis, using the 1500 mL/min cutoff, is also around 80%. These results suggest the usefulness of machine learning techniques to derive robust and predictive models in the area of in vivo ADMET (absorption, distribution, metabolism, elimination, and toxicity) modeling. KEYWORDS: machine learning, OPLS, PCA, RF, in vivo CLint, hepatic CL



INTRODUCTION Hepatic metabolism is a major determinant for drug clearance (CL), bioavailability, and exposure. Accurate prediction of hepatic CL (CLH) and intrinsic CLH (CLint) is therefore of great importance in drug design and for pharmacodynamics and safety. Animal in vivo and in vitro data have generally little value when predicting CL and CLint in human. Ward and Smith1 found no apparent (weak) correlation between human and animal (rat, dog, and monkey) CL, and Clarke and Jeffrey2 demonstrated virtually no correlation between human and rat CLint. In a report by Bogaards et al.,3 CLint values in seven different species, including human, were inconsistent. Most commonly, in vivo CLint is predicted from in vitro CLint obtained using human liver microsomes and hepatocytes. Results span from poor/modest to reasonably good. Internal data show a correlation between log in vitro and in vivo CLint of approximately 0.3 and 0.6 with human microsomes and hepatocytes, respectively. Results are consistent with those produced by Stringer et al.4 A limitation with the in vitro assays is the low limit of quantification (LLOQ) and (as a consequence) the inability/difficulty to predict low (below the liver blood flow rate; 1500 mL/min) in vivo CLint values. LLOQ values vary from several hundred to several thousand mL/min depending on methodology, laboratory, and compounds. Attempts have been made to build and use an in silico methodology for prediction of in vitro and in vivo CLint. These have met with little success, which can, at least partly, be © 2013 American Chemical Society

explained by the complex nature of the metabolic processes. An in silico metabolic stability model built by Ekins5 produced moderate predictions of human in vivo CLint (R2 = 0.34; P < 0.0001; n = 41). Multivariate data analysis as well as machine learning are powerful approaches for deriving predictive in silico models for drug research.6 For a recent assessment of their utility in drug discovery, see ref 7. In this work we have employed multivariate methods of orthogonal partial least-squares (OPLS) type,8 principal component analysis (PCA), and ensembles of decision trees9 (random forests: RF) in order to develop models with good predictive performance.



COMPUTATIONAL DETAILS Data Set and Descriptors. Modeling was conducted on the CLint values derived from an extensive pharmacokinetic data set compiled by Varma et al.10 CLH was assumed to equal nonrenal CL (CL − renal CL), and CLint was calculated using the well-stirred model and a liver blood flow rate of 1500 mL/ min. After removing compounds excreted via bile and those with zero CLint, the data set consisted of 244 drugs. CLint values varied from 1 to 1400000 mL/min. 1500 mL/min was used as the cutoff to distinguish drug metabolic stability, i.e., those Received: Revised: Accepted: Published: 1318

August 31, 2012 February 8, 2013 February 21, 2013 February 21, 2013 dx.doi.org/10.1021/mp300484r | Mol. Pharmaceutics 2013, 10, 1318−1321

Molecular Pharmaceutics

Article

which with CLint < 1500 mL/min were considered “stable” while the rest were considered as “moderate−poor”. This cutoff resulted in 145 compounds within the stable category and 99 within the moderate−poor. Such skewed distribution is not unexpected from drugs that are available in the market. To balance the distribution so that the minority group might be properly modeled, we duplicated the drug entries from the moderate−poor category and made sure that these entries would end up in the same cross-validation fold for OPLS and PCA evaluations using SIMCA-P+ (version 12.0).11 The number of duplicates added was just sufficiently many to equalize the sizes of the two classes. A similar procedure for balancing binary classes has previously been employed by McGregor and Muskel.12 Also, within the machine learning domain, the difficulties in modeling highly unbalanced classes and methods to address this has been extensively studied where oversampling of the minority class, e.g., by duplication procedures for the smallest of the classes, is one investigated approach to handle the situation.13,14 For RF, we assigned a weight factor of 2 to these drug entries instead of duplication, which means that the probability for a compound to be selected during sampling is increased by a factor of 2. 93 descriptors describing molecular lipophilicity, hydrogen bonding, topology, size/shape, polarity, and drug ability were calculated using our in-house descriptor generation program Selma.15 We have, over time, found these descriptors very useful for ADMET modeling.6 11 3-D quantum chemical descriptors were also calculated on optimized geometries, including molecular volume, total and frontier molecular orbital energies, dipole moment, isotropic polarizability, and Mulliken charges of constituent atoms in a molecule. These calculations were performed using density functional theory with BP8616,17 functional and 6-31G*18,19 basis set implemented in the Gaussian 09 package.20 QSAR Modeling. The CLint data were negative-logtransformed prior to modeling, and the range of −log(CLint) became between −6.15 and −0.15. For regression analysis, we used both OPLS implemented in SIMCA-P+11 and RF.21 In the text below, we will refer to these two approaches as REGOPLS and REGRF, respectively. For classification, we used PCA class in SIMCA-P+, and RF. Similarly, they are referred to as CLAPCA and CLARF, respectively. Furthermore, we obtained more classification models using the Y-predicted values from regression analysis and categorized them according to the 1500 mL/min cutoff. These approaches are referred to as R2CXX, with XX being OPLS or RF. To build the REGOPLS model within SIMCA-P+, we initially included all of the descriptors and used the program’s autofit function to obtain the major components. We further increased the number of components to make sure that the component no longer gave a reduced Q2 value or the component was no longer significant according to the cross-validation rules. Then, according to SIMCA’s variable importance plot (VIP), we removed the descriptors with variable importance lower than 0.8, and built a new model, finding all the components the same way as the initial step. The default 7-fold cross-validation was used. The same collection of descriptors was then also used for building the CLAPCA model. In addition to the PCA-class method, we also used RF to build a classification model. For the RF approach, all the descriptors were included and we used 500 trees as well as 7fold cross-validation for building the models.

For comparison purposes we have also performed a stepwise multiple linear regression (MLR) analysis using 7-fold crossvalidation.



RESULTS AND DISCUSSION Statistics and Cross-Validation. The statistics of models generated are collected in Tables 1 and 2. Table 1 shows that,

Table 1. Statistics of Regression Models method 2

R Q2

REGOPLS

REGRF

REGMLR

0.59 0.48

0.96 0.48

0.43 0.35

Table 2. Accuracy of Classification Models from CLAPCA, CLARF, R2COPLS, R2CRF, and R2Cave for the Original Unbalanced Data Set class

CLAPCA

CLARF

R2COPLS

R2CRF

R2Cave

stable moderate−poor overall

0.89 0.59 0.75

0.84 0.64 0.76

0.75 0.82 0.78

0.83 0.74 0.79

1.00 0.00 0.59

for the regression models, a strong R2 = 0.59 is obtained for the REGOPLS model and the cross-validation Q2 = 0.48 is quite similar to R2. CV-ANOVA analysis in SIMCA gives P < 10−31, indicating a significant model. For REGRF, we obtain R2 = 0.96 and the cross-validation Q2 = 0.48 which indicates overfitting with respect to R2 but where the more important crossvalidation Q2 value shows a performance similar to that obtained using OPLS. The stepwise MLR analysis, however, shows less good statistical quality with R2 and Q2 values of 0.43 and 0.35, respectively, inferring that application of the somewhat more complex PLS and RF methods seems to merit their use in trying to develop more predictive in silico models for in vivo hepatic CLint. Statistics of classification models CLAPCA, CLARF, R2COPLS, R2CRF, and R2Cave are shown in Table 2. The overall accuracies in prediction (total number of correct classification/total number of drugs) are similar across the three models. However, the CLAPCA model clearly shows lower predictive performance for the “moderate−poor” class compared to the models derived from the respective regression analysis, and is compensated by the better predictive performance for the “stable” class. The same pattern is also present in the results from the CLARF analysis where, again, the accuracy for the “moderate−poor” class is somewhat worse compared with the accuracy for the “stable” class. It thus seems that the PCA classification is more influenced by the skewed data distribution for the two classes compared to classification models derived from regression analysis, i.e., the R2COPLS and R2CRF models. In addition to the results obtained by REGOPLS and REGRF regression analysis the classified predictions from these two approaches yield excellent models and the results suggest that a good categorization protocol can be obtained this way. Using the overall mean clearance as the prediction for all compounds (R2Cave) significantly decreases the classification accuracy. Also, the predictive performance for the “moderate− poor” class becomes very poor using this approach.The distribution of fold differences between experimental and predicted in vivo CLint values across the data set from the R2COPLS analysis is such that 29% of the data points are within 2-fold, 53% between 2- and 10-fold, and 19% greater than 10fold. Results are comparable to those obtained with micro1319

dx.doi.org/10.1021/mp300484r | Mol. Pharmaceutics 2013, 10, 1318−1321

Molecular Pharmaceutics

Article

Figure 1. The variable importance plot for the REGOPLS model.

From either OPLS or RF, regression or classification, the derived models unanimously indicate that lipophilicity is the most important factor, and then charge distribution, hence the polarizability or dipole moment is another key factor in describing the hepatic clearance.

somes and hepatocytes in house and by others. For example, Stringer et al.4 selected 110 compounds and reached 29% and 18% within 2-fold prediction errors with human microsomes (n = 41; CLint could not be measured for 69 compounds) and hepatocytes (n = 57; CLint could not be measured for 53 compounds), respectively. Vital Descriptors. Figure 1 is the plot of variable importance from the REGOPLS model by SIMCA-P+. The five most important descriptors are GClogP, NNlogP (see below), the number of nonpolar atoms and its molecular-weightnormalized version, and the average Gasteiger negative charge. QM descriptors that are among 30 most important ones include the energy of the highest occupied molecular orbital and the Mulliken charge range in a molecule. The top five most important descriptors from the REGRF model are NNlogP, GClogP, number of nonpolar atoms and its molecular-weightnormalized version, and the second smallest eigenvalue of the Burden molecular connectivity matrix (Min eV #2), while QM descriptors among the top 30 include isotropic polarizability, dipole moment, and the most negative Mulliken charge of the molecule. For model CLARF, the five most important descriptors are GClogP, NNlogP, carbon count, Min eV #2, and NPSA. The QM descriptors within the top 30 are dipole moment, energy of the lowest occupied molecular orbital, and the charge of the atom with maximum solvent accessible surface area. The GClogP and NNlogP are estimated logP values using Ghose−Crippen descriptors 22 and a neural net based estimation from other non-logP Selma descriptors,15 respectively. Many of the most important descriptors have a rather direct interpretation from a chemical point of view. The second smallest eigenvalue of the Burden molecular connectivity matrix is more difficult to assign a direct chemically interpretable meaning, but it is related to molecular size since the diagonal elements of matrix are atomic weights.



CONCLUSION We have built classification models for in vivo hepatic CLint data containing values as low as 1 mL/min. All models have good statistics and sound cross-validation performances. Although regression modeling on the CLint is probably still quite difficult, however, the results obtained in this work suggest that a promising approach for predicting in vivo CLint data seems to be the combination of regression analysis followed by classification of the predictions. Our models show good statistics and predictive qualities. The most relevant descriptors are of lipophilicity and charge/polarizability types.



ASSOCIATED CONTENT

S Supporting Information *

SMILES and clearance values for the investigated data set. This material is available free of charge via the Internet at http:// pubs.acs.org.



AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected]. Present Address

U.N.: Department of Computational Chemistry, H. Lundbeck A/S, Ottiliavej 9, 2500 Valby, Denmark. Notes

The authors declare no competing financial interest. 1320

dx.doi.org/10.1021/mp300484r | Mol. Pharmaceutics 2013, 10, 1318−1321

Molecular Pharmaceutics



Article

E.; Ogliaro, F.; Bearpark, M.; Heyd, J. J.; Brothers, E.; Kudin, K. N.; Staroverov, V. N.; Kobayashi, R.; Normand, J., Raghavachari, K.; Rendell, A.; Burant, J. C., Iyengar, S. S.; Tomasi, J., Cossi, M.; Rega, N.; Millam, J. M, Klene, M., Knox, J. E.; Cross, J. B.; Bakken, V.; Adamo, C., Jaramillo, J.; Gomperts, R.; Stratmann, R. E.; Yazyev, O., Austin, A. J.; Cammi, R.; Pomelli, C.; Ochterski, J. W.; Martin, R. L.; Morokuma, K.; Zakrzewski, V. G.; Voth, G. A.; Salvador, P.; Dannenberg, J. J.; Dapprich, S.; Daniels, A. D.; Farkas, Ö .; Foresman, J. B.; Ortiz, J. V.; Cioslowski, J.; Fox, D. J. Gaussian 09, Revision A.1; Gaussian, Inc.: Wallingford, CT, 2009. (21) Boström, H. Concurrent Learning of Large-Scale Random Forests. In Eleventh Scandinavian Conference on Artificial Intelligence: Scai 2011; Kofod-Petersen, A., Heintz, F., Langseth, H., Eds.; IOS Press: Amsterdam, 2011; pp 20−29. (22) Ghose, A. K.; Crippen, G. M. Atomic physicochemical parameters for three-dimensional structure-directed quantitative structure-activity relationships I. Partition coefficients as a measure of hydrophobicity. J. Comput. Chem. 1986, 7, 565−577.

REFERENCES

(1) Ward, K. W.; Smith, B. R. A comprehensive quantitative and qualitative evaluation of extrapolation of intravenous pharmacokinetic parameters from rat, dog, and monkey to humans. I. Clearance. Drug Metab. Dispos. 2004, 32, 603−611. (2) Clarke, S. E.; Jeffrey, P. Utility of metabolic stability screening: comparison of in vitro and in vivo clearance. Xenobiotica 2001, 31, 591−598. (3) Bogaards, J. J. P.; Bertrand, M.; Jackson, P.; Oudshoorn, M. J.; Weaver, R. J.; van Bladeren, P. J.; Walther, B. Determining the best animal model for human cytochrome P450 activities: A comparison of mouse, rat, rabbit, dog, micropig, monkey and man. Xenobiotica 2000, 30, 1131−1152. (4) Stringer, R.; Nicklin, P. L.; Houston, J. B. Reliability of human cryopreserved hepatocytes and liver microsomes as in vitro systems to predict metabolic clearance. Xenobiotica 2008, 38, 1313−1329. (5) Ekins, S. In silico approaches to predicting drug metabolism, toxicology and beyond. In Structural biology in drug metabolism and drug discovery; Biochemical Society Transactions: AstraZeneca R & D Charnwood: Loughborough, Leicestershire, U.K., 2003. (6) Norinder, U. Calculated Molecular Properties and Multivariate Statistical Analysis. In Drug Bioavailability: Estimation of Solubility, Permeability, Absorption and Bioavailability, Vol. 40, 2nd ed.; van de Waterbeemd, H., Testa, B., Eds.; Wiley-VCH Verlag GmbH & Co. KGaA: Weinheim, 2009; pp 375−408. (7) Gleeson, M. P.; Hersey, A.; Hannongbua, S. In-Silico ADME Models: A General Assessment of their Utility in Drug Discovery Applications. Curr. Top. Med. Chem. 2011, 11, 358−381. (8) Trygg, J.; Wold, S. Orthogonal projections to latent structures (O-PLS). J. Chemom. 2002, 16, 119−128. (9) Witten, I. H.; Frank, E. Data MiningPractical Machine Learning Tools and Techniques; Elsevier: San Francisco, 2005. (10) Varma, M.; Obach, R. S.; Rotter, C.; Miller, H. R.; Change, G.; Steyn, S. J.; El-Kattan, A.; Troutman, M. D. Physicochemical Space for Optimum Oral Bioavailability: Contribution of Human Intestinal Absorption and First-Pass Elimination. J. Med. Chem. 2010, 53, 1098− 1108. (11) Version 12.0 of the Simca-P software, Umetrics AB, Umeå, Sweden. (12) McGregor, M. J.; Muskal, S. Pharmacophore Fingerprinting. 1. Application to QSAR and Focused Library Design. J. Chem. Inf. Comput. Sci. 1999, 39, 569−574. (13) Chawla, N. V. Chapter 40: Data Mining for Imbalanced Datasets: An Overview. In Data mining and knowledge discovery handbook; Maimon, O. Z., Rokach, L., Eds.; Springer: New York, 2005; pp 853−867. (14) Chawla, N. V.; Bowyer, K. W.; Hall, L. O.; Kegelmeyer, W. P. SMOTE: Synthetic Minority Over-Sampling Technique. J. Artif. Intell. Res. 2002, 16, 321−357. (15) Olsson, T.; Sherbukhin, V. SELMA, Synthesis and Structure Administration (SaSA). AstraZeneca R&D Mölndal, Sweden. (16) Becke, A. D. Density-functional exchange-energy approximation with correct asymptotic-behavior. Phys. Rev. A 1988, 38, 3098−3100. (17) Perdew, J. P. Density-functional approximation for the correlation energy of the inhomogeneous electron gas. Phys. Rev. B 1986, 33, 8822−8824. (18) Hariharan, P. C.; Pople, J. A. Influence of Polarization Functions on Molecular-Orbital Hydrogenation Energies. Theor. Chem. Acc. 1973, 28, 213−222. (19) Hehre, W. J.; Ditchfield, R.; Pople, J. A. Self-Consistent Molecular Orbital Methods. 12. Further Extensions of Gaussian-Type Basis Sets for Use in Molecular-Orbital Studies of Organic-Molecules. J. Chem. Phys. 1972, 56, 2257−2261. (20) Frisch, M. J.; Schlegel, H. B.; Scuseria, G. E.; Robb, M. A.; Cheeseman, J. R.; Scalmani, G.; Barone, V.; Mennucci, B.; Petersson, G. A.; Nakatsuji, H.; Caricato, M.; Li, X.; Hratchian, H. P.; Izmaylov, A. F.; Bloino, J.; Zheng, J.; Sonnenberg, J. L.; Hada, M.; Ehara, M.; Toyota, K.; Fukuda, R.; Hasegawa, J.; Ishida, M.; Nakajima, T.; Honda, Y.; Kitao, O.; Nakai, H.; Vreven, T.; Montgomery, Jr., J. A.; Peralta, J. 1321

dx.doi.org/10.1021/mp300484r | Mol. Pharmaceutics 2013, 10, 1318−1321