Development of Simplified in Vitro P-Glycoprotein Substrate Assay


Article

Development of a Simplified in Vitro P-glycoprotein Substrate Assay and in Silico Prediction Models to Evaluate Transport Potential of P-glycoprotein
Rikiya Ohashi, Reiko Watanabe, Tsuyoshi Esaki, Tomomi Taniguchi, Nao Torimoto-Katori, Tomoko Watanabe, Yuko Ogasawara, Tsuyoshi Takahashi, Mikiko Tsukimoto, and Kenji Mizuguchi
Mol. Pharmaceutics, Just Accepted Manuscript • DOI: 10.1021/acs.molpharmaceut.8b01143 • Publication Date (Web): 01 Apr 2019
Downloaded from http://pubs.acs.org on April 3, 2019



Molecular Pharmaceutics

Title
Development of a Simplified in Vitro P-glycoprotein Substrate Assay and in Silico Prediction Models to Evaluate Transport Potential of P-glycoprotein

Rikiya Ohashi¹,³,‡,*, Reiko Watanabe³,‡, Tsuyoshi Esaki³, Tomomi Taniguchi¹, Nao Torimoto-Katori¹, Tomoko Watanabe², Yuko Ogasawara², Tsuyoshi Takahashi², Mikiko Tsukimoto¹, Kenji Mizuguchi³

1. Discovery Technology Laboratories, Mitsubishi Tanabe Pharma Corporation, 2-2-50 Kawagishi, Toda, Saitama 335-8505, Japan
2. DMPK Research Laboratories, Mitsubishi Tanabe Pharma Corporation, 2-2-50 Kawagishi, Toda, Saitama 335-8505, Japan
3. Laboratory of Bioinformatics, National Institutes of Biomedical Innovation, Health and Nutrition, 7-6-8 Saito-Asagi, Ibaraki, Osaka 567-0085, Japan



Author Information

Corresponding Author
Rikiya Ohashi

Laboratory of Bioinformatics, National Institutes of Biomedical Innovation, Health and Nutrition, 7-6-8 Saito-Asagi, Ibaraki, Osaka 567-0085, Japan
TEL: +81-72-639-7010. FAX: +81-72-641-9881. E-mail: [email protected]

Discovery Technology Laboratories, Mitsubishi Tanabe Pharma Corporation, 2-2-50 Kawagishi, Toda, Saitama 335-8505, Japan
TEL: +81 48-433-2762. FAX: +81 48-433-2849. E-mail: [email protected]



Author Contributions

‡These authors contributed equally.

ACS Paragon Plus Environment




Abbreviations

P-gp, P-glycoprotein; CsA, cyclosporin A; CNS, central nervous system; NCEs, new chemical entities; CL, membrane-permeable clearance; Kp,brain, brain-to-plasma concentration ratio; Kp,uu, unbound brain-to-blood concentration ratio; KO, knockout; WT, wild type; MTPC, Mitsubishi Tanabe Pharma Corporation; PS, permeability-surface area product; A-to-B, apical-to-basal; B-to-A, basal-to-apical; UFR, unidirectional flux ratio; CFR, corrected flux ratio; PCA, principal component analysis; MLR, Multiple Linear Regression; k-NN, k-Nearest Neighbors; RF, Random Forest; SVM, Support Vector Machine; TPSA, topological polar surface area; HBD, hydrogen bond donor; HBA, hydrogen bond acceptor

ORCID iD

Rikiya Ohashi: 0000-0002-6919-5665 Reiko Watanabe: 0000-0001-9359-8731 Tsuyoshi Esaki: 0000-0001-8780-6346 Kenji Mizuguchi: 0000-0003-3021-7078




Table of Contents Graphic




Abstract

For efficient drug discovery and screening, it is necessary to simplify P-glycoprotein (P-gp) substrate assays and to provide in silico models that predict the transport potential of P-gp. In this study, we developed a simplified in vitro screening method that evaluates P-gp substrates by unidirectional membrane transport in P-gp-overexpressing cells. The unidirectional flux ratio correlated positively with the corrected flux ratio obtained from the conventional bidirectional P-gp substrate assay (R² = 0.941) and with the in vivo Kp,brain ratio (mdr1a/1b KO/WT) in mice (R² = 0.800). Our in vitro P-gp substrate assay had high reproducibility and required approximately half the labor of the conventional method. We also constructed regression models to predict the value of P-gp-mediated flux and three-class classification models to predict P-gp substrate potential (low, medium, and high potential) using 2397 data entries, which to our knowledge constitute the largest dataset collected under the same experimental conditions. In the random forest regression model, most compounds in the test set fell within 2-fold and 3-fold errors (71.3% and 88.5%, respectively). Furthermore, the random forest three-class classification model showed a high balanced accuracy of 0.821 and a precision of 0.761 for the low-potential class in the test set. We concluded that the simplified in vitro P-gp substrate assay is suitable for compound screening in the early stages of drug discovery, and that the in silico regression and three-class classification models, which use only chemical structure information, can identify the transport potential of compounds, including P-gp-mediated flux ratios. Our proposed method is expected to be a practical tool for optimizing effective central nervous system (CNS) drugs, avoiding CNS side effects, and improving intestinal absorption.

Keywords: P-glycoprotein, substrate, non-substrate, in vitro screening, in silico prediction, physicochemical parameters, correlation, machine learning




Introduction

While pharmacotherapeutic satisfaction has improved in recent years, many central nervous system (CNS) diseases (e.g., schizophrenia, depression, and anxiety neurosis) and rare diseases remain unmet medical needs. In drug discovery strategies, drug penetration into the brain should be controlled to potentiate the pharmacological actions of CNS drugs and to avoid side effects of non-CNS drugs in the CNS. Therefore, controlling the brain penetration of new chemical entities (NCEs) is an important issue in the early stages of drug discovery. P-glycoprotein (P-gp), the product of the multidrug resistance protein 1 gene (MDR1, also known as ATP-binding cassette subfamily B member 1, ABCB1), is an ABC transporter that has an ATP-binding cassette and transports its substrates using the energy of ATP hydrolysis as a driving force. P-gp plays an important role in the pharmacokinetics of several drugs and has extremely broad substrate specificity. P-gp is highly expressed in intestinal epithelial cells, brain capillary endothelial cells, renal tubular epithelial cells, and hepatocytes, where it limits the membrane permeability of drugs. In particular, P-gp functions as a barrier in the endothelial cells of the brain capillaries, restricting the penetration of drugs and toxins into the brain.¹⁻² Furthermore, intestinal P-gp hinders the absorption of drugs, which results in low bioavailability and/or nonlinear absorption. Many CNS-active drugs are barely affected by P-gp efflux transport and have good brain penetration profiles.³ In contrast, second-generation H1 antagonists, such as bepotastine, fexofenadine, and cetirizine, are good P-gp substrates, and their brain distribution is limited by P-gp activity. By utilizing this property, the developers of these drugs have avoided side effects such as sedation.⁴⁻⁶ Thus, P-gp-mediated efflux transport has been an important issue in the discovery, optimization, and development strategy of NCEs.
Accordingly, the evaluation of P-gp transport is now an essential screening item for pharmaceutical companies during drug development. Generally, the impact of P-gp efflux transport on NCEs has been evaluated using MDR1 gene-transfected, P-gp-overexpressing cells that form highly polarized monolayers, such as LLC-PK1 and MDCKII cells. The potential of P-gp-mediated transport of a compound is evaluated by measuring permeability in the apical-to-basal (A-to-B) and basal-to-apical (B-to-A) directions across the P-gp-overexpressing cells in the presence and absence of a potent P-gp inhibitor.⁷ The corrected flux ratio is a generally accepted measure for evaluating P-gp substrates; it is assessed by measuring bidirectional membrane transport across both the P-gp-overexpressing cells and the host cells. It was reported previously that the corrected flux ratio correlates well with the corresponding in vivo Kp,brain ratio (mdr1a/1b KO/WT) in mice.⁸ However, measuring bidirectional transcellular transport is time-consuming and costly. Therefore, a simplified in vitro P-gp substrate assay is needed in the early stages of drug discovery. In addition, it would be very useful if the P-gp-mediated transport potential could be predicted in silico with high reliability, thereby reducing the number of P-gp assays required.


Predictions of P-gp substrate specificity by computational approaches have been reported in several studies using many statistical techniques and machine learning approaches, such as Multiple Linear Regression (MLR), k-Nearest Neighbors (k-NN), Random Forest (RF), and Support Vector Machine (SVM) algorithms. These methods have become major approaches for predicting, with binary classification models, whether a compound is a P-gp substrate or non-substrate, or a P-gp inhibitor or non-inhibitor. Wang et al. built computational models to predict whether a compound is a P-gp substrate by using an SVM with a dataset containing 332 compounds, which showed a prediction accuracy of 0.88.⁹ Poongavanam et al. used a set of fingerprints for k-NN-, RF-, and SVM-based binary classifications of 484 P-gp substrates/non-substrates, and the best model had an accuracy of 0.75.¹⁰ Most recently, Li et al. developed binary classification models based on simple molecular properties, topological descriptors, and fingerprints using the naïve Bayesian classification technique with a set of 822 compounds, which showed an accuracy of 0.84. Although these previous studies showed relatively high accuracy for binary classification, the number of compounds in their datasets was limited. In addition, the datasets were derived from several sources (publications, databases, and in-house experiments) with inconsistent assay conditions. Because P-gp substrate assays are sensitive to experimental conditions, a high-quality dataset is critical for building highly reliable theoretical models. Desai et al. reported in silico prediction tools based on structurally diverse data from more than 2000 compounds, generated using in-house P-gp substrate assays over several years under unified conditions, which showed an overall accuracy of 0.80.¹¹ However, the models in that study distinguished substrates from non-substrates by binary classification, as in previous studies, and no study has focused on predicting the transport potential of P-gp substrates using three-class classification. We believe that it is advantageous to be able to predict not only whether a compound is a substrate or non-substrate, but also its P-gp transport potential (e.g., low, medium, or high potential), especially in the drug optimization stages of drug discovery. Furthermore, it is desirable to develop regression models that can be used to predict P-gp-mediated efflux values. Gunaydin et al. uniquely presented a model to predict P-gp efflux ratios for 282 compounds, with a coefficient of determination (R²) and root mean square error (RMSE) of 0.64 and 0.29, respectively, in the test set. They demonstrated that the difference in computed solvation free energies of a compound between water and chloroform can be used as a de novo descriptor.¹² By using general descriptors with large amounts of high-quality data obtained under standardized conditions, prediction models can be made more convenient and applicable to larger classes of chemical compounds. To address these challenges, we have collected in-house in vitro assay data over several years, consisting of 2397 P-gp transport assays performed at various stages of drug discovery under the same conditions. To our knowledge, the dataset reported here is larger and more consistently collected than those used in previous reports.⁹⁻¹¹,¹³

In the present study, we established a simplified in vitro screening method to evaluate P-gp substrates by only unidirectional membrane transport in P-gp-overexpressing cells, and we proposed reference compounds for potential classification of P-gp-mediated transport. Furthermore, we constructed in silico regression models to predict P-gp-mediated flux ratios, as well as three-category classification models using dynamic thresholds that depend on the P-gp-mediated flux ratios of the reference compounds in each experiment.




Experimental Section

Chemicals

Antipyrine, atenolol, buspirone, caffeine, carbamazepine, chlorpromazine, clozapine, cyclosporin A (CsA), dexamethasone, haloperidol, hydroxyzine, ketoconazole, loperamide, metoclopramide, nifedipine, prazosin, propranolol, quinidine, reserpine, risperidone, sertraline, sulpiride, terfenadine, tolbutamide, venlafaxine, and verapamil were purchased from Sigma Chemical Co. (St. Louis, MO, USA), Tokyo Chemical Industry Co. (Tokyo, Japan), and FUJIFILM Wako Pure Chemical Corporation (Osaka, Japan). All other chemicals and reagents were reagent grade and were obtained from commercial sources.

LLC-PK1 cells, LLC-GA5-COL150 cells (P-gp-overexpressing cells), and culture conditions

LLC-PK1 cells were obtained from the Japan Health Science Research Resources Bank (Osaka, Japan). LLC-PK1 cells that had been stably transfected with the human MDR1 gene to overexpress P-gp (i.e., LLC-GA5-COL150 cells¹⁴⁻¹⁵) were purchased from the Riken Gene Bank (Tsukuba, Japan). These cells were maintained by serial passage in plastic culture dishes. LLC-PK1 cells were cultured in complete medium consisting of Medium 199 (Nissui Pharmaceutical, Tokyo, Japan) with 10% fetal bovine serum (Invitrogen, Carlsbad, CA, USA). For P-gp-overexpressing cells, 150 ng/mL colchicine was added to the same complete medium. LLC-PK1 and P-gp-overexpressing cells were grown as monolayer cultures at 37 °C in a 5% CO₂/95% air atmosphere in plastic culture dishes.

Simplified assay for P-gp substrates using P-gp-overexpressing cell monolayers

P-gp-overexpressing cells were seeded on microporous polycarbonate membrane Transwell® inserts (3-µm pore size, 6.5-mm diameter, 0.33 cm², 24-well culture plates; Corning Costar, Inc., Cambridge, MA, USA) at a cell density of 3.2–4.6 × 10⁵ cells/cm².
The cells were used for transport studies on the fifth or sixth day after seeding. For the 24 h before the experiments, they were cultured on the microporous membrane with 200 µL and 600 µL of complete medium without colchicine in the donor and receiver compartments, respectively. The transport of test compounds in the A-to-B direction was assessed in the presence or absence of 10 µM CsA (CsA completely inhibits P-gp function at 10 µM). Approximately 1 h before the initiation of the transport experiments, the medium in both the donor and receiver compartments was replaced with transport medium, which consisted of Hanks' Balanced Salt Solution (HBSS) supplemented with 20 mM HEPES (pH 7.4) with or without CsA. The transport experiments were initiated by replacing the apical fluid with medium containing 1 µM test compound, and the medium on the basal side was replaced again with fresh transport medium with or without inhibitor. After 2 h of incubation at 37 °C, aliquots of medium were sampled from the basal compartment. To confirm the integrity of the cell monolayer in each well, lucifer yellow was added with the test compound. The concentrations of test compounds in the transport buffer were measured by liquid chromatography tandem mass spectrometry (LC-MS/MS). The apparent permeability coefficient (Papp; in cm/s) was calculated as described previously.¹⁶ Papp in the absence (Papp,A-to-B(−CsA)) or presence (Papp,A-to-B(+CsA)) of the P-gp inhibitor CsA was calculated using Equations 1 and 2, respectively.

$$P_{\text{app},\,\text{A-to-B}(-\text{CsA})} = \frac{CL_{\text{A-to-B}(-\text{CsA})}}{A} \tag{1}$$

$$P_{\text{app},\,\text{A-to-B}(+\text{CsA})} = \frac{CL_{\text{A-to-B}(+\text{CsA})}}{A} \tag{2}$$

where CL is the membrane-permeable clearance in the P-gp-overexpressing cell monolayers, and A is the membrane surface area of the Transwell® insert used in the unidirectional transcellular transport assay. In P-gp-overexpressing cells, the unidirectional flux ratio (UFR) is defined by Equation 3.

$$\text{UFR} = \frac{P_{\text{app},\,\text{A-to-B}(+\text{CsA})}}{P_{\text{app},\,\text{A-to-B}(-\text{CsA})}} = \frac{CL_{\text{A-to-B}(+\text{CsA})}}{CL_{\text{A-to-B}(-\text{CsA})}} \tag{3}$$
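As a rough illustration of Equations 1 to 3, the following Python sketch converts clearance values into Papp and then into a UFR. The clearance unit (µL/min) and the two clearance values are assumptions made only for this example; the text does not state the unit of CL, and the numbers are not data from this study. Only the insert area of 0.33 cm² is taken from the text.

```python
# Minimal sketch of Equations 1-3 (illustrative values, not study data).

AREA_CM2 = 0.33  # membrane surface area of the Transwell insert, from the text


def apparent_permeability(cl_ul_per_min: float, area_cm2: float) -> float:
    """Papp in cm/s from a clearance in uL/min (assumed unit) and an area in cm^2."""
    cl_cm3_per_s = cl_ul_per_min * 1e-3 / 60.0  # 1 uL = 1e-3 cm^3, 1 min = 60 s
    return cl_cm3_per_s / area_cm2


def unidirectional_flux_ratio(papp_plus_csa: float, papp_minus_csa: float) -> float:
    """UFR = Papp(+CsA) / Papp(-CsA), as in Equation 3."""
    return papp_plus_csa / papp_minus_csa


# Hypothetical clearances for one compound, without and with 10 uM CsA.
cl_minus_csa = 0.5
cl_plus_csa = 2.0

papp_minus = apparent_permeability(cl_minus_csa, AREA_CM2)
papp_plus = apparent_permeability(cl_plus_csa, AREA_CM2)
ufr = unidirectional_flux_ratio(papp_plus, papp_minus)

print(f"Papp(-CsA) = {papp_minus:.3e} cm/s")
print(f"Papp(+CsA) = {papp_plus:.3e} cm/s")
print(f"UFR = {ufr:.2f}")
```

Because both Papp values are divided by the same area A, the UFR reduces to the ratio of the two clearances, which is why Equation 3 can be written in either form.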

The transport of test compounds in the A-to-B direction of P-gp-overexpressing cells was measured to mimic the flux of drugs across brain capillary endothelial cells from the blood or across intestinal epithelial cells from the intestinal luminal side. According to the pharmacokinetic model illustrated in Scheme 1, reported previously by Mizuno et al. and Adachi et al., the membrane-permeable clearance in the A-to-B direction (CLA-to-B(−CsA)) is described by Equation 4:⁸,¹⁷

$$CL_{\text{A-to-B}(-\text{CsA})} = \frac{PS_{\text{A,inf}} \times PS_{\text{B,eff}}}{PS_{\text{A,eff}} + PS_{\text{B,eff}} + PS_{\text{P-gp}}} \tag{4}$$

where PS_A,inf and PS_A,eff represent the permeability-surface area products (PS) for the influx and non-P-gp-mediated efflux, respectively, across the apical membrane of P-gp-overexpressing cell monolayers; PS_B,eff and PS_B,inf represent the PS values for the efflux and influx, respectively, across the basal membrane; and PS_P-gp represents the PS value for the P-gp-mediated efflux across the apical membrane. The membrane-permeable clearance in the A-to-B direction in the presence of CsA (CLA-to-B(+CsA)) is described by Equation 5, which follows from Equation 4 because PS_P-gp is approximately zero when P-gp is completely inhibited by CsA.

$$CL_{\text{A-to-B}(+\text{CsA})} = \frac{PS_{\text{A,inf}} \times PS_{\text{B,eff}}}{PS_{\text{A,eff}} + PS_{\text{B,eff}}} \tag{5}$$

To clarify the contribution of PS_P-gp to drug transport across the P-gp-overexpressing cell monolayers, UFR (Equation 3) was expressed using Equations 6 and 7, which were derived from Equations 4 and 5:

$$\text{UFR} = \frac{CL_{\text{A-to-B}(+\text{CsA})}}{CL_{\text{A-to-B}(-\text{CsA})}} = \frac{PS_{\text{A,eff}} + PS_{\text{B,eff}} + PS_{\text{P-gp}}}{PS_{\text{A,eff}} + PS_{\text{B,eff}}} \tag{6}$$

$$\phantom{\text{UFR}} = 1 + \frac{PS_{\text{P-gp}}}{PS_{\text{A,eff}} + PS_{\text{B,eff}}} \tag{7}$$
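The derivation of Equation 7 from Equations 4 and 5 can be checked numerically. The sketch below uses arbitrary, hypothetical PS values (they are not measured quantities from this study) and confirms that the ratio of the two clearances equals the closed form in Equation 7.

```python
# Numerical check that Equations 4 and 5 give the closed form of Equation 7.
# All PS values below are hypothetical and chosen only for illustration.


def cl_a_to_b(ps_a_inf, ps_a_eff, ps_b_eff, ps_pgp=0.0):
    """Equation 4; with ps_pgp = 0 (P-gp fully inhibited) it reduces to Equation 5."""
    return ps_a_inf * ps_b_eff / (ps_a_eff + ps_b_eff + ps_pgp)


def ufr_from_ps(ps_a_eff, ps_b_eff, ps_pgp):
    """Equation 7: UFR = 1 + PS_P-gp / (PS_A,eff + PS_B,eff)."""
    return 1.0 + ps_pgp / (ps_a_eff + ps_b_eff)


passive = dict(ps_a_inf=10.0, ps_a_eff=10.0, ps_b_eff=10.0)

for ps_pgp in (0.0, 10.0, 60.0):
    ratio = cl_a_to_b(**passive) / cl_a_to_b(**passive, ps_pgp=ps_pgp)
    closed_form = ufr_from_ps(passive["ps_a_eff"], passive["ps_b_eff"], ps_pgp)
    assert abs(ratio - closed_form) < 1e-12
    print(f"PS_P-gp = {ps_pgp:5.1f} -> UFR = {ratio:.2f}")
```

In this model PS_A,inf cancels in the ratio, so the UFR depends only on the P-gp-mediated efflux relative to the summed passive efflux across both membranes, and it equals 1 when there is no P-gp-mediated efflux.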

Conventional assay for P-gp substrates using P-gp-overexpressing and LLC-PK1 cell monolayers

Wild-type LLC-PK1 cells and P-gp-overexpressing cells were seeded on microporous polycarbonate membrane Transwell® inserts (3-µm pore size, 6.5-mm diameter, 0.33 cm², 24-well culture plate; Corning Costar, Inc.) at a cell density of 1.0 × 10⁵ and 3.2–4.6 × 10⁵ cells/cm², respectively. The cells were cultured with 200 µL and 600 µL of complete medium without colchicine in the donor and receiver compartments, respectively, for 24 h before experiments. The cells were supplemented with fresh medium every 2 days and used for the transport studies on the sixth day after seeding. Approximately 1 h before initiation of the transport experiments, the medium in both the donor and receiver compartments was replaced with transport medium. The transport experiments were initiated by replacing the medium with medium that either contained or did not contain test compound. After 2 h of incubation at 37 °C, aliquots of medium were sampled from the receiver compartment. The drug concentrations were measured with an LC-MS/MS system. To confirm the integrity of the cell monolayer in each well, lucifer yellow was added with the test compound. The flux ratio was calculated by the following equation:

$$\text{Flux ratio} = \frac{P_{\text{app},\,\text{B-to-A}}}{P_{\text{app},\,\text{A-to-B}}}$$

where Papp,B-to-A and Papp,A-to-B represent the apparent permeability coefficients in the B-to-A and A-to-B directions, respectively. Adachi et al. previously proposed the corrected flux ratio (CFR) as a parameter to evaluate the in vitro transport potential of P-gp, and CFR was evaluated by Equations 8 and 9:⁸

$$\text{CFR} = \frac{\text{Flux ratio in P-gp-overexpressing LLC-PK1 cells}}{\text{Flux ratio in LLC-PK1 cells}} \tag{8}$$

$$\phantom{\text{CFR}} = 1 + \frac{PS_{\text{P-gp}}}{PS_{\text{A,eff}}} \tag{9}$$
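Equations 7 and 9 share the term PS_P-gp, so within this kinetic model the two ratios can be related algebraically: eliminating PS_P-gp gives UFR = 1 + (CFR − 1) / (1 + PS_B,eff / PS_A,eff). The Python sketch below illustrates only this model-level consequence. The function name and the PS ratio values are hypothetical, and the result is not meant to reproduce the empirical regressions reported in the Results.

```python
# Model-level relation between CFR (Equation 9) and UFR (Equation 7).
# Eliminating PS_P-gp from the two equations gives:
#   UFR = 1 + (CFR - 1) / (1 + PS_B,eff / PS_A,eff)
# The PS ratios used below are hypothetical.


def ufr_from_cfr(cfr: float, ps_b_eff_over_ps_a_eff: float) -> float:
    """UFR implied by a given CFR under the kinetic model of Equations 7 and 9."""
    return 1.0 + (cfr - 1.0) / (1.0 + ps_b_eff_over_ps_a_eff)


for ratio in (0.5, 1.0, 2.0):
    values = ", ".join(
        f"CFR={cfr:g} -> UFR={ufr_from_cfr(cfr, ratio):.2f}" for cfr in (1.0, 2.0, 10.0)
    )
    print(f"PS_B,eff/PS_A,eff = {ratio:g}: {values}")
```

Under these assumptions the UFR is always less than or equal to the CFR and increases monotonically with it, which is consistent with using the UFR as a simpler ranking parameter for P-gp-mediated transport.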

Evaluation of simplified in vitro assay for P-gp-mediated transport using P-gp-overexpressing LLC-PK1 cells

To confirm and clarify the experimental reproducibility, propranolol, prazosin, and quinidine were used as reference compounds (Figure S1). Based on a previous report,⁷ propranolol, prazosin, and quinidine were defined as a non-substrate, a weak substrate, and a strong substrate of P-gp, respectively. The reproducibility of the unidirectional membrane transport experiments was evaluated using these compounds.


Correlation between UFR and CFR or in vivo Kp,brain ratio (mdr1a/1b KO/WT) in mice

To investigate the correlation between UFR and CFR, data were obtained using P-gp-overexpressing LLC-PK1 cells and LLC-PK1 cells for 12 compounds, including P-gp substrates (weak and strong substrates) and non-substrates. Furthermore, to investigate the correlation between UFR and the in vivo Kp,brain ratio (mdr1a/1b KO/WT) in mice, the in vivo Kp,brain ratios of 16 compounds in mice were collected from previous reports.³,¹⁸⁻²⁰

Analytical procedure

The LC-MS/MS instrument consisted of an Acquity ultra-performance liquid chromatography system (UPLC, Waters, Milford, MA, USA) equipped with an Acquity UPLC BEH C18 column (30 or 50 × 2.1 mm i.d., 1.7-µm particle size; Waters) coupled to a Quattro Premier tandem quadrupole mass spectrometer or a Xevo TQ triple quadrupole mass spectrometer (Waters) operated in the positive or negative electrospray ionization mode with an internal standard (erythromycin or carbamazepine). The LC was operated under gradient conditions with mobile phases (0.1% formic acid and acetonitrile) at a flow rate of 0.5 mL/min and a column temperature of 50 °C. The ionization source parameters were as follows: capillary voltage, 0.5–3.4 kV; source temperature, 120–150 °C; desolvation gas temperature, 350–600 °C at a flow rate of 800–1200 L/h (nitrogen); and cone gas flow rate, 50–100 L/h. Nitrogen (99.9% purity) and argon (99.9999% purity) were used as the cone and collision gases, respectively. The autosampler was maintained at 5–8 °C with an injection volume of 2–5 µL.
Potential classification of P-gp substrates based on reference compounds by in vitro assay

We propose a classification of P-gp-mediated transport based on reference compounds, defined as follows: test compounds with UFRs less than or equal to that of propranolol are categorized as non-substrates of P-gp (decision: low-potential class); those with UFRs greater than that of propranolol and up to that of prazosin are categorized as non-substrates or weak substrates of P-gp (decision: medium-potential class); and those with UFRs greater than that of prazosin are categorized as weak to strong substrates of P-gp (decision: high-potential class). An overview of this categorization is shown in Scheme 2.

Dataset preparation

The 2D structures of the 2397 compounds in the dataset (Mitsubishi Tanabe Pharma Compounds, MTPC dataset), mainly CNS-targeted compounds and/or compounds with low membrane permeability in Caco-2 cells, were obtained from the in-house Isentris database (Biovia) as structure data files (SDF). These structures were converted into 3D structures using Molecular Operating Environment (MOE) software and then locally minimized with the Merck molecular force field 94x (MMFF94x). For pKa and pKb, Calculator Plugins (Marvin 6.2.1, 2014, ChemAxon) were used. We are currently unable to share structure-related information in its entirety for proprietary reasons.

Descriptor calculation and data analysis

We calculated 206 2D descriptors using MOE2015 (Chemical Computing Group ULC, Montreal, QC, Canada). The list of these calculated descriptors is shown in Tables S1-1–S1-4. Extended, KlekotaRoth, and Atom Pairs 2D fingerprints were generated by PaDEL-Descriptor.²¹ Data analysis was performed with R Statistical Software, and the results were visualized with the ggplot2²² and ggfortify²³ packages. Eleven descriptors (i.e., Weight (molecular weight), SLogP (log octanol/water partition coefficient), TPSA (topological polar surface area), h_logD (octanol/water distribution coefficient (pH = 7)), h_pKa (acidity (pH = 7)), h_pKb (basicity (pH = 7)), a_acc (number of H-bond acceptor atoms), a_don (number of H-bond donor atoms), a_aro (number of aromatic atoms), b_ar (number of aromatic bonds), and b_rotN (number of rotatable bonds)) were used for the principal component analysis (PCA). As a control dataset for the chemical space analysis, we used 6162 approved drugs (organic compounds of less than 2000 Da, including organohalides) from KEGG (Kyoto Encyclopedia of Genes and Genomes²⁴) DRUG, which is a comprehensive drug information resource for approved drugs in Japan, the USA, and Europe.

Model construction

The caret²⁵ package in R was used to build the prediction models. The 206 features calculated by MOE were used for the model construction. The 2397-compound dataset was split by random selection into common training (1919 compounds) and test (478 compounds) sets at a ratio of 8:2.
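The reference-compound classification rule described in this section can be sketched in a few lines of Python. This is only an illustration of the stated thresholds; the function name and the reference UFR values are hypothetical, and in practice the propranolol and prazosin UFRs would be those measured in the same experiment as the test compound.

```python
# Sketch of the three-class rule based on reference compounds (Scheme 2).
# Reference UFRs below are hypothetical placeholders, not measured values.


def classify_ufr(ufr: float, ufr_propranolol: float, ufr_prazosin: float) -> str:
    """Return 'low', 'medium', or 'high' P-gp transport potential for a test compound."""
    if ufr_propranolol > ufr_prazosin:
        raise ValueError("expected UFR(propranolol) <= UFR(prazosin)")
    if ufr <= ufr_propranolol:
        return "low"      # non-substrate of P-gp
    if ufr <= ufr_prazosin:
        return "medium"   # non-substrate or weak substrate of P-gp
    return "high"         # weak to strong substrate of P-gp


# Hypothetical reference values from one experiment.
REF_PROPRANOLOL = 1.1
REF_PRAZOSIN = 2.5

for test_ufr in (0.9, 1.1, 1.8, 2.5, 6.0):
    print(test_ufr, classify_ufr(test_ufr, REF_PROPRANOLOL, REF_PRAZOSIN))
```

Passing the reference UFRs as arguments, rather than fixing them as constants, reflects the idea that the thresholds move with the reference compounds measured in each experiment.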
Descriptors that showed near-zero variance or absolute pairwise correlations above 0.90 were identified and excluded, using the nearZeroVar function (which calculates frequency ratios) and the findCorrelation function (which operates on a correlation matrix) in the caret package. Important descriptors were then selected with the Boruta²⁶ algorithm, which automatically ranks and omits descriptors based on Z-scores, using the training set. This algorithm is a wrapper built around the random forest classification algorithm implemented in the R package randomForest; it allows unbiased and stable selection of important and non-important attributes. To predict P-gp-mediated transport potential in P-gp-overexpressing cells, the classification models were constructed using various machine learning techniques, namely Random Forest (RF), Support Vector Machine (SVM, with radial functions), k-Nearest Neighbors (k-NN), Artificial Neural Network (ANN), and AdaBoost. To execute each technique, the train function in the caret package was called with the method parameter set to rf, svm, knn, nnet, and AdaBoost.M1, respectively. The RF, k-NN, and AdaBoost algorithms can naturally handle multiclass classification, whereas the all-versus-all approach was used for the multiclass SVM in the e1071 package²⁷ and the all-versus-rest approach was used for multinomial log-linear models via neural networks in the nnet package²⁸. We used the automatic grid search in the caret package with four values for each tuning parameter (tuneLength = 4; 16 models in total) to identify the optimal parameters for our predictions, and 10-fold cross-validation was implemented during training. Balanced accuracy, kappa, and positive predictive value (precision) obtained from the confusion matrix were used to evaluate the classification models on the common test set containing 478 compounds. Judgement error rates, defined as the rate of misclassification across two categories, were also calculated. The regression models were constructed with UFR values using four types of machine learning techniques (RF, SVM with radial functions, k-NN, and ANN) on the common training set of 1919 compounds. The technical processes were similar to those used for the classification models; RMSE was applied to evaluate the accuracy of the models. For validation, the common test set containing 478 compounds was used. A confusion matrix based on predicted UFR values was generated, and the balanced accuracy, precision, and error rate were compared between the three-class classification model and the regression model.
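To make the evaluation metrics concrete, the following Python sketch computes per-class balanced accuracy and precision from a three-class confusion matrix, together with a two-category error rate. It is not the caret implementation used in the study: it assumes one-versus-rest definitions for the per-class metrics and interprets the judgement error rate as the fraction of compounds whose observed and predicted classes are two categories apart (low predicted as high, or high predicted as low). The confusion matrix is hypothetical.

```python
# Per-class metrics from a 3-class confusion matrix (pure Python, illustrative).
# cm[i][j] = number of compounds with observed class i and predicted class j.
# The counts below are hypothetical, not results from this study.

CLASSES = ["low", "medium", "high"]


def per_class_metrics(cm, k):
    """One-vs-rest balanced accuracy and precision for class index k (assumed definitions)."""
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]
    fn = sum(cm[k]) - tp                  # observed k, predicted something else
    fp = sum(row[k] for row in cm) - tp   # predicted k, observed something else
    tn = total - tp - fn - fp
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "precision": tp / (tp + fp),
    }


def two_category_error_rate(cm):
    """Fraction misclassified across two categories (low <-> high); assumed interpretation."""
    total = sum(sum(row) for row in cm)
    return (cm[0][-1] + cm[-1][0]) / total


cm = [
    [50, 8, 2],    # observed low
    [10, 30, 10],  # observed medium
    [3, 7, 80],    # observed high
]

for k, name in enumerate(CLASSES):
    m = per_class_metrics(cm, k)
    print(f"{name}: balanced accuracy = {m['balanced_accuracy']:.3f}, "
          f"precision = {m['precision']:.3f}")
print(f"two-category error rate = {two_category_error_rate(cm):.3f}")
```

Reporting precision for the low-potential class is useful in screening because it indicates how often a compound predicted to have low P-gp transport potential is actually observed in that class.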

Results

Evaluation of simplified in vitro assay for P-gp-mediated transport using P-gp-overexpressing LLC-PK1 cells The permeability of the reference compounds, propranolol (non-substrate of P-gp), prazosin (weak substrate of P-gp), and quinidine (strong substrate of P-gp), was assessed in the A-to-B direction in the presence or absence of the P-gp inhibitor CsA (10 µM), using P-gp-overexpressing cells that had been stably transfected with the human MDR1 gene. The results for the unidirectional membrane transport of these reference compounds across the P-gp-overexpressing cell monolayers, along with the UFRs, are summarized in Table 1. The UFR method demonstrated good experimental reproducibility. The Papp, A-to-B (+CsA) of prazosin and quinidine as P-gp substrates increased compared with their corresponding Papp, A-to-B (-CsA) values. The UFR values increased depending on the potential of the P-gp substrate. The dataset that was constructed appeared to be high quality because the experimental reproducibility of the UFR was good. Correlation between UFR and CFR for 12 compounds Figure 1 shows the correlation between UFRs using only P-gp-overexpressing cells and CFRs using P-gp-overexpressing cells and LLC-PK1 cells for 12 compounds, including P-gp substrates and non-substrates. The UFRs of the test compounds are well-correlated with the corresponding


CFRs with high coefficients of determination (logarithmic plot: y = 1.218x^0.742, R^2 = 0.941; linear plot: y = 0.491x + 1.055, R^2 = 0.970). The linear plot is shown in Figure S2.

Correlation between Kp,brain ratio (mdr1a/1b KO/WT) and UFR for 16 compounds

To identify the correlation between the Kp,brain ratios (mdr1a/1b KO/WT) in mice and UFRs, compounds whose Kp,brain ratios (mdr1a/1b KO/WT) in mice have been previously reported were selected.3, 18-20 The correlation between the Kp,brain ratios (mdr1a/1b KO/WT) in mice and UFRs is shown in Figure 2. The UFRs of the test compounds are well-correlated with the corresponding Kp,brain ratios in mice with high coefficients of determination (logarithmic plot: y = 0.981x^0.659, R^2 = 0.800; linear plot: y = 0.385x + 0.527, R^2 = 0.968). The linear plot is shown in Figure S3.

Data collection and chemical space analysis

Figure 3A shows the distribution of UFRs in the MTPC dataset, with approximately one-half of the compounds having a UFR of less than 2. To visualize the chemical space of the dataset, we performed a PCA of 11 descriptors (i.e., Weight, SLogP, TPSA, h_logD, h_pKa, h_pKb, a_acc, a_don, a_aro, b_ar, and b_rotN), all of which are generally considered to be important parameters for synthetic expansion. As a control, a representative set of 6162 approved small-molecule drugs, taken from KEGG DRUG, was analyzed together with the MTPC dataset. Figure 3B shows a plot of the first two principal components (PC1 and PC2). These two principal components together explained approximately 64% of the variance, and they roughly correspond to lipophilicity/hydrophilicity and molecular weight, respectively. The MTPC compounds occupy a narrower chemical space than the approved drugs, especially with respect to lipophilicity and hydrophilicity, because the MTPC compounds are mainly CNS-targeted compounds and/or have low membrane permeability in Caco-2 cells.

In the guidelines for drug-drug interactions published by the U.S. Food and Drug Administration29, the European Medicines Agency30, and the Japanese Pharmaceuticals and Medical Devices Agency31, it is recommended that a test compound be considered a P-gp substrate if its net flux ratio is more than 2 in P-gp-expressing cells. For CFR = 2, the linear regression equation obtained from the linear plot (Figure S2) gives a UFR of approximately 2. Accordingly, compounds with UFR < 2 are defined as P-gp non-substrates and compounds with UFR ≥ 2 are defined as P-gp substrates.
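The cutoff derivation above can be checked with a short worked example. The sketch below assumes that x in the reported fits is the CFR and y is the UFR; this assignment is inferred from the statement that CFR = 2 corresponds to a UFR of approximately 2 and is not stated explicitly next to the equations. The function names are hypothetical.

```python
UFR_SUBSTRATE_CUTOFF = 2.0  # UFR >= 2 is treated as a P-gp substrate in the text


def ufr_from_cfr_linear(cfr: float) -> float:
    """Linear fit reported for Figure S2 (assumed: x = CFR, y = UFR)."""
    return 0.491 * cfr + 1.055


def ufr_from_cfr_power(cfr: float) -> float:
    """Power-law fit reported for the logarithmic plot (same assumption)."""
    return 1.218 * cfr ** 0.742


def is_pgp_substrate(ufr: float) -> bool:
    """Binary label used for the physicochemical property comparison."""
    return ufr >= UFR_SUBSTRATE_CUTOFF


if __name__ == "__main__":
    # Both fits give a UFR close to 2.04 at the regulatory cutoff CFR = 2.
    print(round(ufr_from_cfr_linear(2.0), 2), round(ufr_from_cfr_power(2.0), 2))
```

Under this assumption the two fits agree at CFR = 2, which is consistent with rounding the UFR cutoff to 2.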
Several physicochemical properties have been correlated with P-gp recognition, including cLogP (calculated partition coefficient), TPSA, and the number of hydrogen bond donors (HBD) and acceptors (HBA).32-33 To clarify the distributions of these properties, the numbers of P-gp substrates (UFR ≥ 2) and P-gp non-substrates (UFR < 2) are depicted in bar graphs for molecular weight, partition coefficient (LogP), TPSA, HBD, and HBA using the 2397 MTPC


dataset (Figure 4A–E). Furthermore, the UFR and Papp,A-to-B (-/+CsA) data and some fundamental molecular descriptors are shown in Supporting Information 2. The distributions of P-gp substrates (1102 compounds) and P-gp non-substrates (1295 compounds) overlap substantially with respect to these physicochemical properties. Figure 4F shows the chemical space of three MTPC compound classes (i.e., high-, medium-, and low-potential compounds) visualized by PCA. While the low class includes compounds exhibiting lower lipophilicity, the high class includes compounds having larger molecular weights than those of the other classes. The 95% normal confidence ellipse of the medium-potential class was completely overlapped by those of the high- and low-potential classes, and the 95% normal confidence ellipses of all three classes of MTPC compounds overlapped substantially.

Three-class classification models and regression models

Initially, the three-class classification models were constructed to predict P-gp-mediated transport potential in the P-gp-overexpressing cells. Compounds were classified into three classes, the low-, medium-, and high-potential classes, according to the UFR values of prazosin and propranolol in each experiment (Scheme 2; details are described in “Potential classification of P-gp substrates based on reference compounds by in vitro assay” in the Experimental Section). The dataset consisted of 667, 719, and 1011 compounds in the low, medium, and high classes, respectively. After elimination of descriptors that showed near-zero variance or an absolute correlation above 0.90, 164 descriptors were selected in the training set by the Boruta algorithm,26 which automatically ranks and omits descriptors. Five machine learning methods (i.e., RF, SVM with radial functions, k-NN, ANN, and AdaBoost) with the selected set of 164 descriptors were applied to the training set containing 1919 compounds; the models were validated using the common test set containing 478 compounds.
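The three-class assignment described above can be sketched as a small function. The boundary handling follows the definition given in the Discussion (at or below the propranolol UFR is low, above it and up to the prazosin UFR is medium, above the prazosin UFR is high). The function name and arguments are hypothetical, and the reference UFRs are passed in explicitly because the text states that they were taken from each experiment.

```python
def assign_potential_class(ufr: float, ufr_propranolol: float, ufr_prazosin: float) -> str:
    """Assign a test compound to the low-, medium-, or high-potential class.

    ufr_propranolol and ufr_prazosin are the reference UFRs measured in the
    same experiment as the test compound.
    """
    if ufr_propranolol > ufr_prazosin:
        raise ValueError("expected the propranolol UFR not to exceed the prazosin UFR")
    if ufr <= ufr_propranolol:
        return "low"      # non-substrate of P-gp
    if ufr <= ufr_prazosin:
        return "medium"   # non-substrate or weak substrate of P-gp
    return "high"         # weak to strong substrate of P-gp


if __name__ == "__main__":
    # Reference UFRs reported in the text: propranolol 1.0, prazosin 2.6, quinidine 15.0.
    for name, ufr in [("propranolol", 1.0), ("prazosin", 2.6), ("quinidine", 15.0)]:
        print(name, assign_potential_class(ufr, ufr_propranolol=1.0, ufr_prazosin=2.6))
```

With the reported reference values, the three reference compounds fall into the low, medium, and high classes, respectively.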
Statistical results, such as balanced accuracy, kappa, and positive predictive value (precision), are shown in Table 2. The kappa values were 0.537–0.603 in the training set and 0.576–0.592 in the test set, which were above the desired level of 0.4.34 The balanced accuracy was approximately 0.8 in the low and high classes for all methods (low class, 0.791–0.821; high class, 0.829–0.846), but it was slightly lower in the medium class (0.713–0.733). The RF model produced the highest kappa (0.592) among the five machine learning methods, with balanced accuracies of 0.821, 0.713, and 0.846 in the low, medium, and high classes, respectively. The RF parameters (ntree and mtry) were 500 and 110, respectively. The judgement errors that should be avoided most in the three-class classification model are misclassifications across two categories, specifically, a high-potential compound being predicted as a low-potential compound and a low-potential compound being predicted as a high-potential compound. These judgement error rates were 4.8–7.6% and 2.2–4.9% in our models, respectively.


For the regression models, the descriptor selection process was similar to that used in creating the classification models, with 128 descriptors finally selected by the Boruta package. Four machine learning methods (i.e., RF, SVM with radial functions, k-NN, and ANN) with the selected set of descriptors were applied to the training set. The models were validated using the common test set. Since the distribution of UFRs in the original linear scale was biased toward lower ranges (Figure 3A) and the R^2 value can be unreliable in evaluating a prediction model with highly biased data, as discussed in previous studies,35-36 RMSE was used as an evaluation parameter. The RMSE was 4.478–4.820 in the training set and 3.844–4.467 in the test set (Table 3). The RF model produced the lowest RMSE among the four machine learning methods; the RF parameters (ntree and mtry) were 500 and 2, respectively. The observed and predicted UFRs by the RF regression model are plotted in Figure 5, where 76.2% and 92.0% of compounds in the training set and 71.3% and 88.5% in the test set fall within a 2-fold and 3-fold error, respectively. To compare its accuracy with that of the three-class RF classification model, a confusion matrix was generated with UFRs predicted by the RF regression model, with the thresholds set to UFR = 1.0 for propranolol and UFR = 2.6 for prazosin, according to the validation results of the flux ratios in human P-gp-overexpressing cells (Table 1). Whereas the balanced accuracy and precision for the medium- and high-potential classes were comparable with those of the classification model, the balanced accuracy and precision in the low-potential class were lower than those of the classification model (0.821 versus 0.526 in balanced accuracy and 0.761 versus 0.161 in precision in the classification and regression models, respectively; Table 2 and Table 5).
Furthermore, the error rate for the low-potential compounds in the regression model was about 3.5 times that of the three-class classification model (19.4% versus 5.5%). The top-ranked descriptors and their descriptions according to their variable importance for the RF classification and regression models are shown in Table 4. Many of the important descriptors were common to the classification and regression models. Descriptors that influence lipophilicity, such as h_emd (sum of EHT donor strengths), a_don (number of H-bond donor atoms), PEOE_VSA_HYD (total hydrophobic vdw surface area), h_pstates (entropic state count (pH = 7)), and GCUT_SLOGP (LogP GCUT), those that influence basicity, such as h_pKb, and those that influence intermolecular force, such as h_pavgQ (average total charge (pH = 7)), BCUT_PEOE_3 (PEOE Charge BCUT), PEOE_VSA (total positive vdw surface area), and SMR_VSA6 (Bin 6 SMR), were selected as important descriptors.
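The regression evaluation reported above (RMSE and the share of compounds predicted within a 2-fold or 3-fold error) can be sketched as follows. This is an illustration in plain Python, not the code used in the study. "Within an n-fold error" is assumed to mean that the predicted UFR lies between the observed UFR divided by n and the observed UFR multiplied by n; the function names are hypothetical and the numbers in the example are invented.

```python
import math
from typing import Sequence


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error between observed and predicted UFRs."""
    squared = [(p - o) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def fraction_within_fold(observed: Sequence[float], predicted: Sequence[float], fold: float) -> float:
    """Share of compounds whose predicted UFR is within `fold`-fold of the observed UFR.

    Assumes all UFR values are strictly positive.
    """
    hits = sum(
        1 for o, p in zip(observed, predicted)
        if max(p / o, o / p) <= fold
    )
    return hits / len(observed)


if __name__ == "__main__":
    observed = [1.0, 2.0, 4.0, 10.0]   # invented values for illustration only
    predicted = [1.5, 5.0, 4.0, 3.0]
    print(rmse(observed, predicted))
    print(fraction_within_fold(observed, predicted, 2.0))
    print(fraction_within_fold(observed, predicted, 3.0))
```

Because RMSE is computed on the linear scale, a few large errors among high-UFR compounds can dominate it, which is one reason the fold-error fractions are a useful complement for a distribution skewed toward low UFRs.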

Discussion

The conventional methods to evaluate P-gp substrates are inefficient in the early stages of drug discovery because membrane transport measurements of test compounds are required in both A-to-B and B-to-A directions using both P-gp-overexpressing cells and host cells or using P-gp-overexpressing cells in the absence and presence of a potent P-gp inhibitor.7-8 Most previous studies have predicted whether a compound is a P-gp substrate/non-substrate or a P-gp substrate/inhibitor by binary classification models,13, 37-41 whereas, to the best of our knowledge, no study has predicted the transport potential of a compound as a P-gp substrate. In the present study, we proposed a simplified in vitro assay to evaluate P-gp substrate potential efficiently in the early stages of drug discovery and determined its correlation with the conventional in vitro method and the in vivo Kp,brain ratio. Furthermore, we constructed an in silico predictive model of P-gp transport potential based on the data obtained in our simplified in vitro assay.

To simplify the identification process of P-gp substrates, it is necessary to use a cell line in which human P-gp is overexpressed and endogenous P-gp expression is extremely low. Human MDR1 gene-transfected LLC-PK1 cells are derived from porcine renal tubular epithelial cells and are often used in P-gp transport assays along with other cell lines, such as human intestinal Caco-2 cells or human MDR1 gene-transfected canine renal MDCK cells. Although endogenous transporters are expressed in all three cell lines, little P-gp-mediated transport activity was observed in LLC-PK1 cells, compared to that of MDCK or MDCKII cells, in a transport study using the P-gp substrate vinblastine.42 Furthermore, P-gp activity was negligible in LLC-PK1 cells in a bidirectional membrane transport study using fexofenadine as a P-gp substrate.43 Therefore, we concluded that human MDR1-overexpressing LLC-PK1 cells are a suitable cell line to establish a simplified in vitro assay for P-gp substrates.
The unidirectional membrane transport of drugs in the A-to-B direction of P-gp-overexpressing cell monolayers mimics the flux of drugs across brain capillary endothelial cells from the blood or across intestinal epithelial cells from the intestinal luminal side. Propranolol, prazosin, and quinidine were selected to validate the reproducibility of our simplified in vitro assay using P-gp-overexpressing cells. The UFRs of the reference compounds showed low error and high reproducibility in this simplified in vitro assay for P-gp substrates because the measurements of the test compounds in the absence and presence of P-gp inhibitor were conducted using only unidirectional membrane transport from the A-to-B side in P-gp-overexpressing cells (Table 1). Since both UFR and CFR are calculated as ratios of P-gp-mediated flux to non-P-gp-mediated flux, including passive diffusion, these parameters are helpful to clarify the contribution of P-gp-mediated flux in the membrane permeation of drugs. From the results of propranolol (UFR of 1.0), prazosin (UFR of 2.6), and quinidine (UFR of 15.0), we determined that the contributions of P-gp-mediated flux to the membrane permeation of propranolol, prazosin, and quinidine are low, medium, and high, respectively. Propranolol, prazosin, and quinidine were used as reference compounds for P-gp non-substrates, weak substrates, and strong substrates, respectively. A previous study indicated that the human P-gp efflux ratio threshold corresponding to rat Kp,uu ≤ 0.2 lies between 2 and 3, based on the correlation between in vitro human P-gp efflux ratios and in vivo rat


Kp,uu values.44 Using these reference compounds, the classification of P-gp substrates was defined as follows. Test compounds with UFR values less than or equal to that of propranolol were categorized as non-substrates of P-gp (decision: low-potential class), those with UFR values above that of propranolol and up to that of prazosin were categorized as non-substrates or weak substrates of P-gp (decision: medium-potential class), and those with UFR values greater than that of prazosin were categorized as weak to strong substrates of P-gp (decision: high-potential class).

It was reported previously that a good linear correlation exists (r = 0.892) between the in vitro parameter, CFR, and the in vivo parameter, Kp,brain ratio (mdr1a/1b KO/WT) in mice.8 To compare this previous method (CFR) and our simplified method (UFR) quantitatively, each assessment was made using the same P-gp-overexpressing LLC-PK1 cell monolayers and host LLC-PK1 cell monolayers. Figure 1 clearly demonstrates a significant positive correlation between CFR and UFR. This simplified method must also be compared with previous methods from a theoretical standpoint. According to the pharmacokinetic model described in Scheme 1, UFR and CFR were calculated using Equations 7 and 9. If PSA,eff was much larger than PSB,eff (PSA,eff >> PSB,eff), equal to PSB,eff (PSA,eff = PSB,eff), or much smaller than PSB,eff (PSA,eff