Predicting passive permeability of drug-like molecules from chemical structure: where are we?
F. Broccatelli, L. Salphati, E. Plise, J. Cheong, A. Gobbi, M.-L. Lee, and I. Aliagas
Mol. Pharmaceutics, Just Accepted Manuscript • DOI: 10.1021/acs.molpharmaceut.6b00836 • Publication Date (Web): 02 Nov 2016
Downloaded from http://pubs.acs.org on November 6, 2016

[Table of Contents Graphic. Labels: Fa (human); Calculated MDCK; BDDCS Class 1, 2, 3, 4.]

Abstract
Intestinal absorption in humans is routinely predicted in drug discovery using in vitro assays such as permeability in the MDCK cell line. In silico models trained on these data are used in drug discovery efforts to prioritize novel chemical targets for synthesis; however, their proprietary nature and the limited validation available, which is usually restricted to predicting in vitro permeability, are barriers to widespread adoption. Due to the categorical nature of the in vitro permeability assay, intrinsic assay variability, and the challenges often encountered when translating in vitro data to an in vivo drug property, validation based solely on in vitro data might not be a good characterization of the usefulness of the in silico tool. In this work we analyze the performance of three different in silico models in predicting the in vitro and in vivo permeability of 300 marketed drugs and 86 discovery compounds. The models differ in their approach (mechanistic vs QSAR) and their degree of complexity; one of them is a linear equation based on seven simple physicochemical descriptors and is presented for the first time in this work. Results show that in silico models can be successfully used to complement the discovery toolbox for characterizing in vivo intestinal permeability, defined using the fraction of dose absorbed in humans (Fa) and human jejunal permeability (Peff). While the in vitro permeability models outperformed the in silico approach at predicting each of the in vivo endpoints explored, the gap in predictivity between the in vitro and the in vivo data was generally comparable to the gap between in silico and in vitro data.
The in vitro and in silico approaches shared many of the same outliers, which can often be explained by the route of drug absorption (paracellular vs transcellular, active vs passive). Data suggest that the discovery process can greatly benefit from an early adoption
of in silico models for predicting permeability as well as from a careful analysis of the in silico to in vivo disconnects.

Keywords
Intestinal permeability, Intestinal absorption, QSAR, In Silico, Madin-Darby canine kidney cells (MDCK), Biopharmaceutics Drug Disposition Classification System (BDDCS)

Introduction
Permeability and solubility, along with physiologically relevant parameters such as intestinal transit time, determine the oral absorption of small molecules. Due to significant advancements in the field of in vitro DMPK, project teams can rely on a variety of systems, such as the Caco-2 and Madin-Darby canine kidney (MDCK) cell lines and the parallel artificial membrane permeability assay (PAMPA), to predict intestinal permeability [1-5]. Resulting data are used to drive structure–activity relationship (SAR) analyses and are only limited by the time and resources required for in vitro testing and chemical synthesis. The use of in silico models to predict permeability of small molecules has been proposed to complement the discovery paradigm [6-8], offering quick and inexpensive predictions for virtual compounds that exist only on computer desktops. These models capitalize on the well-recognized relationships between physicochemical properties and different components of passive permeation, such as lipid solubility (LogP), energy of desolvation (polarity), conformational energy (number of rotatable bonds) and the tendency of neutral species to drive absorption (pKa). Routine use of such models can enrich the pool of pharmacokinetic (PK)-friendly molecules emerging from each cycle of synthesis. While a number of in silico models have been presented to predict in vitro permeability, the practical usefulness of these technologies is difficult to assess.
Models trained or calibrated on literature collections of molecules tend to suffer from poor chemical diversity and heterogeneity in the experimental setting used to test the permeability, resulting in limited applicability domains. Proprietary models are either not available to the public or, in the case of commercial models, trained on undisclosed datasets, thus posing a problem for unbiased external validation. Additional complications come from the assessment of the intrinsic in vitro assay variability that affects the quality of both the in silico model and the validation set [9]. We believe that much could be learned from a comparative study of in vitro and in silico predictions of in vivo endpoints. This work attempts to evaluate the suitability of in silico data as one of the three pillars (together with in vitro and in vivo) supporting the cycle of hypothesis generation, chemical design and synthesis, hypothesis testing, and learning. The aim of this study is not to measure the extent to which an in silico model can predict an in vitro number, but rather to understand if this information, available before compound synthesis, can be used to effectively prioritize compounds towards improved in vivo permeability. While state-of-the-art in silico predictions are not sufficiently accurate to replace in vitro measurements in biologically relevant systems, it is our experience at Genentech that they can successfully “push” the chemistry in the right direction, reducing the number of cycles of synthesis
necessary to solve ADME-related issues. An important prerequisite for this to happen is the establishment of trust in the tool through validation and definition of the applicability domain, which is primarily dictated by the route of absorption in the case of permeability predictions. We believe that model validation and applicability domain reporting are tools that drug discovery scientists will ultimately need in order to make informed decisions prior to synthesis in projects using computational ADME tools. The ability of three in silico models to predict passive permeability is tested in this work. We analyze data recently presented by Varma et al. [10], which characterize the ability of in vitro permeability to predict the extent of human intestinal absorption (Fa) and effective permeability (Peff) across the jejunum membrane. The biopharmaceutics drug disposition classification system (BDDCS) was used to direct the analysis towards compounds for which intestinal absorption is not limited by solubility [11]. An exhaustive evaluation of all the in silico technologies used to predict permeability is beyond the scope of this paper. We limited our evaluation to three different in silico technologies, and we deliberately did not explore models for which the computational time to run predictions approaches the time required for in vitro testing (e.g. ab initio or QM calculations). We believe that there is great value in exploring this area of science, given the rapid increase in computational power afforded by, for example, GPU technology, which will no doubt lead to faster calculations. The three models that were explored differ in their approach and complexity. The first, a literature model, is a mechanistic approach based on LogP and pKa [12].
The second is a simple linear equation stemming from a quantitative structure-activity relationship (QSAR) analysis using partial least squares (PLS) based on seven simple physicochemical descriptors and is presented for the first time in this work. The third is a more complex and less interpretable machine learning QSAR predictor based on 67 molecular descriptors that is routinely used at Genentech to inform decision-making during drug design. The present study aims to help understand whether the complexity of a model is justified by its accuracy, as this might be a point of discussion for those working in and around the in silico ADME field. The two QSAR models discussed in this work were trained on a Genentech proprietary dataset of 3818 in vitro permeability measurements from the MDCK cell line, which to our knowledge is unprecedented for a publicly available permeability predictor. Substrates for efflux transporters were excluded from the training set since the models were primarily designed to predict passive transcellular permeability. Molecules characterizing the training set were internal discovery compounds and deliberately did not include any marketed drugs that were used for external validation. To add granularity to the analysis, we included an additional external validation set composed of three series of closely related analogues synthesized during discovery campaigns not performed at Genentech. It can be argued that the real value of in silico models in discovery lies in the ability to differentiate between similar compounds, allowing chemists to make informed decisions on which molecule should be synthesized next. The external validation set of closely related chemical series was used in this study to assess the ability of the three in silico models to do so.

Methods
Literature Dataset: Twenty compounds with Peff data and 105 marketed drugs with in vitro MDCK data measured at pH 7.4 in a low efflux transporter MDCK (MDCK-LE) cell line were taken from a recent publication [10]. Additionally, 295 compounds with human Fa data were taken either from the same publication or from a different publication by the same group [13]. Fa data were averaged when more than one value was available. While the original work included a number of MDCK measurements performed at pH 6.5, we included only measurements at pH 7.4 to make the analyses directly comparable to the in silico models trained using our data obtained at the same pH. A total of 299 compounds containing BDDCS classifications along with either Fa or MDCK data were extracted from two different publications, with the highest precedence given to the most recent of the two when different classes were reported for the same molecule [14,15]. In agreement with the BDDCS criteria, compounds in BDDCS classes 1 and 3 have high solubility, while those in BDDCS classes 2 and 4 have low solubility, where the definition of solubility is based on the lowest thermodynamic solubility value in a pH range from 1 to 7.5 as well as on maximum therapeutic dose [11]. Compounds belonging to BDDCS classes 1 and 3 with available Fa data were used to test the ability of in vitro and in silico permeability models to predict the fraction absorbed in the human gut. Poorly soluble compounds were excluded from this set as their absorption may not be limited by permeability. Vancomycin and carboplatin were also excluded because some physicochemical properties could not be computed for these drugs. A set of 92 compounds was collected from ChEMBL [15-19], with the aim of finding three series of related analogues for which MDCK permeability data were available. Molecules in this dataset were not synthesized at Genentech.
All have relatively low molecular similarity to compounds in the training set of the QSAR models (Tanimoto similarity 0.6) and predict the permeability class to which they belong (AUC >0.85). Overall, the performance of the QSAR models in this set was superior to that of the mechanistic model in predicting the in vitro measurements. A common outlier for the three in silico models was fexofenadine. This compound is a well-known Pgp substrate and the ability of Pgp inhibitors to increase its absorption in rat has been well characterized [45].

Table 4. Statistical values describing the ability of different in vitro and in silico permeability models to predict Peff.

Statistic    MDCK-LE Papp-AB 7.4    cMDCK-SVM    cMDCK-PLS    Mechanistic
Rsp          0.77                   0.63         0.62         0.37
AUC(a)       0.95                   0.88         0.86         0.71
Accuracy     90%                    80%          90%          75%

(a) A cut-off value of 1 × 10⁻⁴ cm/s was adopted to calculate AUC. This value is close to the permeability of the reference compound metoprolol.

Figure 4. Plots showing the ability of the different in vitro and in silico permeability models to predict Peff in human for 20 marketed drugs. The cut-off value of 1 × 10⁻⁴ cm/s is represented as a dashed line.
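The rank correlation and AUC statistics reported for the Peff comparison can be illustrated with a minimal sketch. It assumes Rsp denotes the Spearman rank correlation and that AUC is the area under the ROC curve obtained by labelling compounds as high or low permeability at the 1 × 10⁻⁴ cm/s cut-off. The function names, the use of "greater than or equal to" at the cut-off, and all numerical values below are hypothetical and are not taken from the study.

```python
from typing import List, Sequence

PEFF_CUTOFF = 1e-4  # cm/s, the cut-off quoted in the Table 4 footnote


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def roc_auc(scores: Sequence[float], is_high: Sequence[bool]) -> float:
    """Chance that a high-permeability compound outscores a low one (ties count 0.5)."""
    pos = [s for s, h in zip(scores, is_high) if h]
    neg = [s for s, h in zip(scores, is_high) if not h]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical values for illustration only; not data from the study.
peff = [0.2e-4, 0.5e-4, 1.5e-4, 3.0e-4, 0.8e-4, 6.0e-4]  # measured Peff, cm/s
predicted = [1.0, 3.0, 8.0, 12.0, 9.0, 20.0]              # model output, arbitrary units
is_high = [value >= PEFF_CUTOFF for value in peff]        # assumed labelling rule

rsp = spearman(predicted, peff)
auc = roc_auc(predicted, is_high)
print(f"Rsp = {rsp:.2f}, AUC = {auc:.2f}")
```

Both statistics depend only on the ordering of the predictions, so a predictor does not need to be calibrated to Peff units to be assessed in this way.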

Discussion and Conclusions
In this study we assessed the usefulness of three in silico technologies designed to predict permeability in drug discovery. One of the three models is based on a mechanistic description of drug permeation presented by Zhang et al. [12]. Its only inputs are lipophilicity (LogP), charge and pKa. The biggest advantages of this model are the ability to i) perform sensitivity analyses around input values, ii) predict permeability at different pH levels, and iii) provide information on the intracellular and extracellular concentrations of the compounds. A major theoretical limitation is the inability to describe features influencing the kinetics of absorption, such as the conformational energy required to permeate and move across biological membranes. This is particularly important considering that the lipid composition of the PAMPA in vitro system can influence permeability [47]. Nevertheless, this
model offers an excellent mechanistic description of drug permeation across the intestinal epithelia and is a great starting point for investigators seeking to gather more information on drug permeation. The second model in order of increasing complexity (cMDCK-PLS) uses a simple linear equation derived by means of QSAR modeling. Just like the mechanistic permeability model, this approach can be entirely reproduced by investigators outside our company. The strength of this model lies in its simplicity. Of the seven descriptors used by the model, all except charge and fraction ionized are independent of pKa; thus, by recalculating the fraction ionized at a different pH (e.g. pH=6.5), the cMDCK-PLS model can be used to simulate permeability values that are more physiologically relevant for species such as acidic compounds [10]. Although this model provides no insight into the mechanism, it can be used by medicinal chemists to simulate “what if” scenarios and inspire strategies for altering compound permeability. The third model stems from a QSAR machine learning approach, effectively resulting in a black-box model. Due to the model’s non-linearity, degree of complexity, and high information content, this approach might capture recurring trends driving permeability that are not obvious when using a linear approach. All three models rely heavily on LogP and pKa calculations; hence, their performance can be expected to deteriorate if in silico tools with low accuracy in defining these descriptors are used. In vitro MDCK data are routinely used in discovery to predict the ability of compounds to cross the intestinal membrane, either through the cells (transcellular permeability) or between them via the tight junctions (paracellular permeability). MDCK cell lines express efflux transporters (such as MDR1), which are present in the human intestine and can modulate the absorption of small molecules.
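The pH adjustment mentioned above for the cMDCK-PLS model, in which the fraction ionized is recalculated at a different pH, can be sketched with the Henderson-Hasselbalch relationship. This is a minimal illustration that assumes a single ionizable group (monoprotic acid or base); the source does not state how the fraction ionized descriptor was computed, and the function name and the example pKa are hypothetical.

```python
def fraction_ionized(pka: float, ph: float, kind: str) -> float:
    """Ionized fraction of a monoprotic acid or base (Henderson-Hasselbalch).

    Assumes one ionizable group; multiprotic or zwitterionic compounds
    would need a fuller speciation model.
    """
    if kind == "acid":
        # An acid is ionized (deprotonated) when pH is above its pKa.
        return 1.0 / (1.0 + 10.0 ** (pka - ph))
    if kind == "base":
        # A base is ionized (protonated) when pH is below its pKa.
        return 1.0 / (1.0 + 10.0 ** (ph - pka))
    raise ValueError("kind must be 'acid' or 'base'")


example_pka = 4.5  # hypothetical carboxylic acid, for illustration only
for ph in (7.4, 6.5):
    ionized = fraction_ionized(example_pka, ph, "acid")
    print(f"pH {ph}: ionized {ionized:.4f}, neutral {1.0 - ionized:.4f}")
```

For this hypothetical acid, lowering the pH from 7.4 to 6.5 raises the neutral fraction roughly eight-fold, which is the kind of shift that would feed into a pKa-dependent descriptor.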
Other transporters that are important for absorption in human may not be expressed in MDCK cell lines. When interpreting in vitro to in vivo correlations, DMPK scientists are expected to be mindful of these characteristics, and we strongly suggest that similar expectations be applied when establishing in silico to in vivo (or in vitro) correlations. In this work we found that in vitro and in silico predictors for modeling intestinal absorption in human shared many outliers, which can be rationalized upon further analysis of the absorption pathway. The in silico models presented in this work are suitable for predicting the passive transcellular permeability of drug-like molecules and may need to be integrated with other predictors when this is not the dominant route of absorption. Typically, in silico models are evaluated exclusively for their ability to predict compounds in the same assay used for training and calibration; this is not the case in this work, which differs from others in the variety of in vitro and in vivo endpoints that were used to characterize the performance of in silico predictors. The ultimate objective of permeability optimization in drug design is to synthesize compounds that are bioavailable in vivo. While in vitro data have been consistently proven to be a valuable tool for this task [10], it is important to point out that the interpretation of in vitro discovery assays is largely categorical. Hence, by evaluating in silico models based solely on quantitative predictions of in vitro data, one might end up underestimating the usefulness of in silico models. We believe that this study may
help future investigators explore the relevance of in silico modeling beyond statistics. Statistical data can sometimes be deceiving due to the categorical nature of discovery assays as well as the intrinsic variability and heteroskedastic nature of statistical approaches [9]. For instance, a model able to correctly categorize most compounds might display a poor R² due to the tendency to under- or over-predict extremely low/high values, which might identify regions where the assay is more prone to high variability. This issue could partially be overcome by using a classification approach; however, the cost for this would be a loss of information due to categorization of borderline and moderate values.

Figure 5. Line plots summarizing the performance of the different in vitro and in silico predictors characterized in this study; A) is a comparison between the in silico models, B) is a comparison between in vitro and in silico models.
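The point that a model can categorize compounds correctly while still showing a poor R² can be made concrete with a small sketch. It assumes that R² is the coefficient of determination, 1 − SS_res/SS_tot (the text does not specify the definition), and it uses hypothetical log-scale permeability values and a hypothetical class cut-off in which the predictions are compressed toward the middle of the range.

```python
from typing import Sequence


def r_squared(measured: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot


def class_accuracy(measured: Sequence[float], predicted: Sequence[float],
                   cutoff: float) -> float:
    """Fraction of compounds placed on the same side of the cut-off as measured."""
    hits = sum((m >= cutoff) == (p >= cutoff) for m, p in zip(measured, predicted))
    return hits / len(measured)


# Hypothetical log10 permeability values, for illustration only.
# Predictions over-predict the low values and under-predict the high ones.
measured = [-0.5, -0.2, 0.1, 0.4, 1.2, 1.5, 1.8, 2.0]
predicted = [0.5, 0.55, 0.6, 0.65, 0.85, 0.9, 0.95, 1.0]
CUTOFF = 0.7  # hypothetical boundary between low and high permeability

print(f"R2 = {r_squared(measured, predicted):.2f}")
print(f"accuracy = {class_accuracy(measured, predicted, CUTOFF):.0%}")
```

In this constructed example every compound falls in the correct class, yet R² stays well below 0.5 because the errors are concentrated at the extremes of the measured range.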

When used to predict in vitro MDCK data, the two QSAR models performed significantly better than the mechanistic approach (Figure 5b). Interestingly, the permeability predictions for the QSAR models were within 3-fold of the measured value for the majority of the compounds (~70% of the time for the machine learning QSAR model). While it was virtually impossible to differentiate between the performance of the two QSAR predictors when tested on the set of marketed drugs, the machine learning approach significantly outperformed the simple linear model when rank ordering three series of analogues from the
literature. These analogues were not Genentech compounds and do not have a high structural resemblance to any compound in the training set (fingerprint Tanimoto similarity score