Physiologically Based Pharmacokinetic Modeling in Lead

Jan 16, 2018 - For example, adding a polar or charged group will generally increase solubility but decrease permeability. Identifying the 2 or 3 prope...
0 downloads 6 Views 1MB Size
Subscriber access provided by READING UNIV

Article

Physiologically-Based Pharmacokinetic Modeling in Lead Optimization II: “Rational Bioavailability Design” by Global Sensitivity Analysis to Identify Properties Affecting Bioavailability Pankaj R. Daga, Michael B. Bolger, Ian Haworth, Robert Daniel Clark, and Eric J. Martin Mol. Pharmaceutics, Just Accepted Manuscript • DOI: 10.1021/acs.molpharmaceut.7b00973 • Publication Date (Web): 16 Jan 2018 Downloaded from http://pubs.acs.org on January 24, 2018

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a free service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are accessible to all readers and citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

Molecular Pharmaceutics is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 28 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Molecular Pharmaceutics

Physiologically-Based Pharmacokinetic Modeling in Lead Optimization II: “Rational Bioavailability Design” by Global Sensitivity Analysis to Identify Properties Affecting Bioavailability Pankaj R. Daga1#, Michael B. Bolger2, Ian S. Haworth3, Robert D. Clark2, Eric J. Martin1*

1

Novartis Institute of Biomedical Research, Emeryville, CA. 94608 USA

2

Simulations Plus, Inc., 42505 10th Street West, Lancaster, CA. 93534 USA

3

Department of Pharmacology and Pharmaceutical Sciences, University of Southern California, Los Angeles, CA. 90089 USA.

* Corresponding author. email: [email protected], 510-759-4193 # Current affiliation: Simulations Plus, Inc., 42505 10th Street West, Lancaster, CA. 93534 USA

ACS Paragon Plus Environment

Molecular Pharmaceutics 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Abstract When medicinal chemists need to improve oral bioavailability (%F) during lead optimization, they systematically modify compound properties mainly based on their own experience and general rules of thumb. However, at least a dozen properties can influence %F, and the difficulty of multi-parameter optimization for such complex non-linear processes grows combinatorially with the number of variables. Furthermore, strategies can be in conflict. For example, adding a polar or charged group will generally increase solubility but decrease permeability. Identifying the 2 or 3 properties that most influence %F for a given compound series would make %F optimization much more efficient. We previously reported an adaptation of physiologicallybased pharmacokinetic (PBPK) simulations to predict %F for a lead series from purely computational inputs within a 2-fold average error. Here, we run thousands of such simulations to generate a comprehensive “bioavailability landscape” for the series. A key innovation was recognition that the large and variable number of pKas in drug molecules could be replaced by just the two straddling the isoelectric point. Another was use of the ZINC database to cull out chemically inaccessible regions of property space. A quadratic Partial Least Squares regression (PLS) accurately fits a continuous surface to these thousands of bioavailability predictions. The PLS coefficients indicate the globally sensitive compound properties. The PLS surface also displays the %F landscape in these sensitive properties locally around compounds of particular interest. Finally, being quick to calculate, the PLS equation can be combined with models for activity and other properties for multi-objective lead optimization.

ACS Paragon Plus Environment

Page 2 of 28

Page 3 of 28 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Molecular Pharmaceutics

Keywords PBPK, lead optimization, lead series, local model, global sensitivity analysis, PLS, multiobjective optimization

Introduction PBPK modeling in lead optimization The most important question faced by a medicinal chemist is what compound to make next. A medicinal chemist has many computer-aided drug design tools to help understand how to improve affinity,1 but there are no comparable tools for understanding how to improve bioavailability (%F). Medicinal chemists largely rely on their own experience, folk wisdom, rules of thumb,2, 3 luck, and preclinical in vitro and in vivo studies to optimize %F and other PK properties.4, 5 Physiologically-based pharmacokinetic (PBPK) models simulate the in vivo mechanisms of absorption, distribution, metabolism and excretion (ADME). Literature reports have mainly employed PBPK simulations to predict %F and other pharmacokinetic properties6, 7 for just a few advanced compounds from a chemical series, primarily to estimate first-in-human doses8, 9 from in vitro experiments. These methods have been called in vitro / in vivo extrapolation (IVIVE). In addition, PBPK simulations are extensively used for quality-by-design (QbD) formulation development and to develop generic substitutions with a minimal number of biostudies. In order to inform lead optimization, however, %F predictions must be computed for synthesis candidates from chemical structure alone, with no experimental inputs. We recently reported an adaptation of GastroPlus PBPK simulation that uses series-specific local QSAR models for an ideal fitted intrinsic clearance (CLloc), to predict %F from chemical structure alone. The method

ACS Paragon Plus Environment

Molecular Pharmaceutics 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

required 15 to 20 initial rat PK studies from each congeneric series to parameterize the calculation, but could then accurately predict %F for most compounds within about two-fold for that series.10

Global Sensitivity Analysis Medicinal chemists want more than a computer simulation to predict %F, however. They want guidance about what kind of compound to make next. When attempting to optimize %F, medicinal chemists might adjust any of over a dozen chemical properties: solubility, LogP, LogD, Peff, CLint, PPB, RBP, MW, efflux, PSA, flexibility, pKa’s, etc. While medicinal chemists do know how to adjust these properties, this many-parameter optimization is daunting. Efficient bioavailability optimization requires understanding which 2 or 3 compound properties most influence %F for each specific medchem series. These very properties that chemists know how to adjust serve as key inputs to PBPK models. To characterize the relative influence of each compound property input, PBPK programs often include local sensitivity analysis (LSA).11 LSA varies one input property at a time, usually over a limited range, and monitors the local change in the model output. This identifies which model inputs have high influence over the pharmacokinetics for that specific compound. Examples of LSA include reports of "solubility greatly limiting and permeability moderately affecting the absorption of a compound",12 and "no or very low sensitivity of oral bioavailability to LogD and solubility with higher influence by hepatic and intestinal clearance, permeability, and fraction unbound in plasma."13 Because LSA varies just one parameter at a time over a limited range it is rapid and simple to implement. However, it can give misleading results because there are substantial interactions among compound properties in these highly complex and non-linear PBPK models, and because LSA is insensitive to larger trends across the entire medchem series.

ACS Paragon Plus Environment

Page 4 of 28

Page 5 of 28 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Molecular Pharmaceutics

Global Sensitivity Analysis (GSA), in contrast, analyzes the influence of many properties in all compatible combinations over the full accessible range. Having developed procedures to accurately calculate %F from PBPK calculations for entire series trained on a few dozen experimental %F measurements,10 we can then perform many thousands of these simulations for hypothetical series members for which we have no experimental data to characterize that series’ full “bioavailability landscape”. GSA of this landscape reveals how to efficiently improve %F, i.e. provide rational bioavailability design.

Computational Methods Zinc database for GSA sampling The “All Clean” subset of the public domain ZINC DB version 12 (accessed 03/20/2014) containing ~16.5 million compounds was employed as a large sampling of chemically realistic combinations of inputs to GastroPlus. Duplicate compounds were removed. Required input physicochemical properties were computed using ADMET Predictor v7.2 for use in filtering and grid sampling.

Grid-based sampling of chemically realizable property space Seven input properties were selected as independent variables defining the %F landscape: S+LogP, S+Sw, S+Peff, S+Fup, S+RBP, ApKa and BpKa. Properties such as S+Peff, S+Sw, and S+PrUnbnd are generally described in terms of fold differences. These properties were therefore log transformed, to put them on a uniform scale. All seven input parameters were further zscaled to equalize the weights. For each medchem series, the subset of Zinc compounds with properties within the expected active range were selected. In each case, this left about 3 million

ACS Paragon Plus Environment

Molecular Pharmaceutics 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

cmpds. Five PCs were retained (4 in case of DPP4 dataset), explaining about 80% of the variance. Each principal component was divided into 5 levels. In addition, a 6th clearance dimension was added to the PC space by dividing the CLloc values from the known active compounds into 5 levels and sampling from there. The ZINC compound nearest the center of each cell was selected for simulation. All other Gastro Plus inputs were simply those calculated by ADMET Predictor for the selected compounds. The procedure was repeated at 6 levels to ensure sufficient sampling. The filtering, principal component analysis and grid selection were carried out using in-house scripts written in the R statistical language.

GastroPlus Simulations GastroPlus PBPK simulations were tuned for each series as previously described.10 GastroPlus simulations were performed on ~10000 property-combinations from sampling the ZINC database for each medchem series. While the standard user interface to GastroPlus is a versatile GUI ideally adapted to optimizing the performance of the simulations for individual compounds, performing thousands of calculations required a batch interface. GastroPlus databases were generated using a specialized macro enabled excel spreadsheet, now available from SimulationsPlus upon request. Standard GastroPlus inputs were used for all three series: IR-Solution, 2mpk dose in 0.5 ml solution, Rat Physiology and PBPK, ADSON model for paracellular permeability.

PLS And Neural Net regression analysis The regression analyses were carried out using two orthogonal methods, Partial Least Squares (PLS) and Artificial Neural Network (ANN), using the pls14 and nnet15 packages in the R statistical program. Models were trained for 2 GastroPlus outputs: bioavailability (%F) and

ACS Paragon Plus Environment

Page 6 of 28

Page 7 of 28 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Molecular Pharmaceutics

Fraction absorbed (%Fa). In both the cases, five-fold cross-validation was used with eight independent variables, including quadratic terms, for a total of 44 variables. Sensitive Coefficients The coefficients for each variable of the PLS models were taken as the sensitivity coefficients. For ANN, the weights connecting each input node to output nodes through the hidden nodes were aggregated using R. That single value for each input variable describes its contribution to the predicted %F. As the data had been Z-scaled before regression, they were already of equal weight. However, to simplify interpretation, relative sensitivity coefficients from both approaches were further normalized for each analysis as a fraction of the magnitude of the largest coefficient (CLint in all the three cases), keeping their signs. The coefficient’s possible values therefore varied from -1 to 1.The relative coefficients with absolute magnitude larger than 0.25 were selected for further interpretation. In HSD1, no other parameter was found to have relative coefficient larger than 0.25. Hence the threshold was reduced to 0.10 for that graph. Prediction of Bioavailability for series members using PLS Models %F was predicted for all compounds within each chemical series using the PLS models. Separate predictions were carried out using 3 sources of CLint values (CLfit, CLloc and CLglb) along with the ADMET predictor values for the 7 other selected physicochemical properties. The results were compared with those predicted using the full GastroPlus simulations. 2D Contour plots for specific compounds For each of 3 selected PIM Kinase inhibitors, 10,000 combinations of LogP and Peff within the active property ranges given in Table 1 were combined with the compound’s remaining 6 properties. %F was calculated using the PLS model for PIM Kinase inhibitors, and contour plots

ACS Paragon Plus Environment

Molecular Pharmaceutics 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

were graphed to show the 2D “slices” through the 8 dimensional bioavailability landscape containing these 3 compounds using “R”.

Results Medchem Series GSA of PBPK was applied to three medchem series with substantial rat %F data previously studied in our series-wide PBPK report: 49 DPP4 inhibitors published by Merck; 81 HSD-1 inhibitors published by Astra Zeneca; and 63 compounds from internal work on PIM Kinase inhibitors.10 Bioavailability was predicted within two-fold for more than 80% of the compounds using CLloc QSARs specific for each chemical series. We also obtained less accurate but still useful predictions using a global rat liver microsomal intrinsic clearance (RLM CLint) QSAR from an in-house neural network model trained on 6650 diverse CLint measurements.10

Property selection and ranges for GSA This study focused on 8 key compound properties. Seven properties were computed using ADMET Predictor™ (Simulations Plus, Inc. Lancaster, CA) : LogP(S+LogP), native solubility (S+Sw), human intestinal permeability Peff (S+Peff), fraction unbound in plasma (S+PrUnbnd), ratio of blood to plasma (S+RBP), acidic pKa (ApKa) and basic pKa (BpKa). The eighth was intrinsic clearance predicted using series-specific local QSARs (CLloc)10. Property distributions are shown in supplemental Figure S1. For the literature DPP4 and HSD1 inhibitor data sets, we did not have access to activity data from all compounds in the program. We therefore assumed that the active property ranges were those of the compounds in the published PK studies. For the internal PIM Kinase series, the active property space was taken as the range of these properties

ACS Paragon Plus Environment

Page 8 of 28

Page 9 of 28 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Molecular Pharmaceutics

covered by all synthesized compounds from the series with IC5080% of the total variance required: 5 PCs for compounds in the HSD1 and PIM property ranges, and 4 PCs for DPP-4. The respective PCA spaces were each divided into both 5 and 6level grids. The compound closest to the center of each occupied grid cell was used as a surrogate for the medchem series members with that property combination in the final GSA. This procedure ensured that only realistic property combinations were considered, reduced the number of PBPK calculations, and prevented bias due to the influence of properties like LogP that are correlated with other inputs like solubility and Peff. CLloc Since the CLloc QSARs are specific to a given series, they could not be applied to the ZINC compounds. CLloc was therefore sampled orthogonally to the other 7 properties, but in a way that enforced the actual CLloc distribution. CLloc was calculated for all active members of the medchem series. The CLloc range was binned into either 5 or 6 levels. For each occupied grid cell from the PC space of the other 7 properties, a value was selected at random from each CLloc bin and assigned as the CLint value to that cell’s ZINC surrogate to add the additional CLloc dimension. The number of occupied cells before and after addition of CLloc for each medchem series is shown in Table 1 for 6- and 5-level sampling. Table 1 also shows that the number of cells containing only one compound (before adding CLloc) is small.

Prediction of Absorption and Bioavailability The standard set of GastroPlus™ PBPK simulations for each series included Immediate Release Solutions of a 2 mpk dose in 0.5 ml solvent using rat. The paracellular permeability is estimated using equation published by Adson18 from drug properties (size, charge) and physiological parameters (pore radius, porosity) in each intestinal compartment

ACS Paragon Plus Environment

Page 12 of 28

Page 13 of 28 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Molecular Pharmaceutics

Simulations were performed on each sampled property combination described above, generating the landscape of %F (and other PK parameters) for all chemically feasible combinations of properties expected for that medchem series. Metaphorically, each ZINC surrogate represents an “acre of land” in the series’ active property landscape, where %F is the elevation. As shown in Table 1, for 5-level sampling this required about 4,000 simulations each for HSD1 and PIM, and about 10,000 simulations for 6-level sampling. For DPP-4 the numbers were about half as it only required 4 PCs rather than 5. Box plots in supplemental Figure S2 summarize the distribution of calculated %F and %Fa for each series. Median %Fa is >85% for all three series, but median %F is only 32%, 31%, and 64% for DPP4, PIM, and HSD1 respectively, indicating generally good physical properties with a large clearance component to %F in all cases. PLS Regression models for GastroPlus-predicted %F and %Fa Partial least squares (PLS) regression and artificial neural network (ANN) models for predicted %Fa and %F were trained using just the 8 selected compound properties as descriptors: S+LogP, S+Sw, S+Peff, S+PrUnbnd, S+RBP, CLint, ApKa and BpKa. Accurate PLS models required quadratic terms for a total of 44 independent variables. Five-fold cross-validated correlations (q2) for the PLS %F models (with 6-level sampling) were 0.92 for PIM, 0.98 for DPP4, and 0.98 for HSD1 (0.98, 0.99 and 0.99, respectively, for ANN models). Without quadratic terms, the correlations were only 0.75, 0.94 and 0.67 respectively, highlighting the non-linearity of the %F landscape. The number of latent variables and q2 for linear and quadratic PLS models of %Fa and %F are presented in supplemental Table S1.

ACS Paragon Plus Environment

Molecular Pharmaceutics 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Comparison of GastroPlus and PLS predictions The accuracy of the PLS predictions was compared to full GastroPlus simulations for the PIM series (which gave the worst PLS statistics). Predictions using the three sources of in silico CLint values were compared: CLfit, CLloc and the global QSAR CLglb. The remaining seven properties for the PLS, and the many additional GastroPlus inputs for the full PBPK simulations, were the ADMET Predictor values for the PIM compounds. The results are shown in Figure 3.

ACS Paragon Plus Environment

Page 14 of 28

Page 15 of 28 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Molecular Pharmaceutics

Figure 3: Comparison of PIM %F calculations from PBPK simulations with three sources of clearance, to those using the PLS regression model. The line of identity is red and the blue dotted lines indicate two-fold limits. %F values predicted using the simple PLS regression model are all but indistinguishable from those predicted using the full PBPK simulations.

Sensitivity coefficients from PLS regression The bar charts in Figure 4 plot the largest of the regression coefficients from the 8 properties, or combinations of properties, both for %F and %Fa. The coefficients are normalized to a maximum of 1.0 (or -1.0 for compound properties that reduce %F) by dividing by the magnitude of the largest coefficient. Coefficients with relative magnitude >0.25, indicated on the graphs with vertical blue lines, are somewhat arbitrarily taken as the sensitive, meaningful properties and included in the plots. While HSD1 had only 1 sensitive property for %F, some additional insensitive properties were also included to illustrate that these were also unaffected by statistical method and sampling density (see below).

ACS Paragon Plus Environment

Molecular Pharmaceutics 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Figure 4: Sensitive parameters affecting predicted %F (a, b, & c) identified using PLS with 5-level grid sampling, (orange), PLS with 6-levels (blue), and ANN with 6-levels (green). All methods identify the same properties as sensitive. The blue dotted lines indicate the boundary for a relative coefficient of 0.25. For the HSD1 inhibitors, no other coefficient exceeded 0.25 relative to CLint, although RBP just missed. Lower three charts, d, e & f, show sensitive parameters affecting %Fa (predicted) identified using PLS regression with 6-level grid sampling. As expected, gut absorption is affected by a range of physical properties but not by CLint.

PLS vs ANN Comparison Figure 4 also shows results for modeling %F with the completely independent ANN method. While ANN is a non-linear method, our goal was a comparison with PLS, not to build the most efficient and parsimonious model. We therefore included the same 44 quadratic terms. The relative ANN sensitivities were determined using methods in Garson19 and Goh.20 Bioavailability landscape around specific compounds The speed of the PLS model calculations makes it possible to plot high-resolution %F landscapes around particular compounds of interest. Figure 5 shows contour surface plots from the PLS %F landscape for 2D “slices” around three individual PIM kinase inhibitors, holding the other 6 properties constant. LogP x log(Peff) were chosen from among the sensitive properties.

Figure 5: PIM %F landscapes of 3 selected PIM Kinase inhibitors. The PLS regression models were used to predict %F varying LogP and Peff while keeping the other properties constant. %F of compound A is improved by increasing LogP alone, compound B is more sensitive to Peff and compound C is best improved by simultaneously increasing LogP and Peff.

ACS Paragon Plus Environment

Page 16 of 28

Page 17 of 28 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Molecular Pharmaceutics

Discussion Our goal is to identify, for medicinal chemists in lead optimization, the two or three most influential compound properties for achieving or maintaining good bioavailability. GSA evaluates %F for all chemically realistic combinations of properties over the expected active range, thus characterizing the entire %F landscape. The PLS coefficients indicate the relative sensitivities. Synthesizing compounds and measuring the requisite thousands of bioavailabilities would be impracticable. If PBPK simulations can be adapted to give reliable %F for entire medchem series, computed %F for thousands of property combinations can be obtained quickly and inexpensively. We previously reported10 methods to adapt GastroPlus PBPK simulations to accurately predict %F across entire chemical series. Here we apply GSA to 1000s of those simulations to identify the properties that most influence %F across the series in general, as well as the most sensitive for optimizing %F around individual compounds within the series. The overall approach is a sequence of simple well-defined steps: 1) Optimize GastroPlus simulations to reproduce measured %F from existing rat PK experiments. Capture all known PK in the simulations as far as possible. Beyond that, a key tool is to train a series-specific local QSAR model for the fitted ideal intrinsic clearance optimized to reproduce the experimental %F measurements in the simulation. 2) Identify the acid and base immediately above and below the isoelectric point to simplify the pKa treatment. 3) Identify the ranges of the GastroPlus input properties which the SAR indicates are compatible with adequate activity. 4) Run PBPK simulations over a large systematic sampling of ZINC surrogates within these property ranges to produce the %F landscape for the series.

ACS Paragon Plus Environment

Molecular Pharmaceutics 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

5) Fit the landscape as a quadratic PLS model. The normalized coefficients >0.25 indicate the sensitive properties. 6) For optimizing the overall series, focus compound design on improving these sensitive properties, compromising if necessary on less influential properties. 7) As the series advances and a few lead compounds emerge, focus on the PLS “landscape” around those compounds for guidance on their specific optimization.

Complications specific to GSA of chemical properties. Property Scaling For the PLS fit, the input properties must be scaled so they will be well distributed across the series and have comparable weights. S+Peff, S+Sw, S+Fup and CLint, which are described in terms of fold differences, were log transformed to put them on a uniform scale. All eight inputs were z-scaled to equalize the weights. pKa alignment pKa differs from other GastroPlus inputs in having a variable number of values per compound. The pKa’s must somehow be aligned, both to perform GSA and to give clear and simple feedback to the medicinal chemists on which pKa to adjust when pKa is a sensitive variable. Compounds with many pKa’s are especially problematic. Several approaches were compared to identify a minimum, fixed combination of pKa’s to accurately reproduce the pH solubility profile and %F calculated using all pKa’s in GastroPlus PBPK simulations. We compared results obtained using either two or four pKa’s straddling either the native pH or isoelectric pH. The best result was to use the 2 pKa’s from each compound immediately below and above the isoelectric pH. This simplification produced pH-solubility profiles nearly identical to using all

ACS Paragon Plus Environment

Page 18 of 28

Page 19 of 28 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Molecular Pharmaceutics

pKa’s. Ceftobiprole, with 7 pKa’s, is an extreme illustration. Figure S3 in supplemental data shows the close match for the computed pH-solubility curve for Ceftobiprole using all pKa’s vs. using just the two bracketing the isoelectric point. Figure 1 compares the correlations between predicted %F and %Fa using all pKa’s or just these 2 for 2040 small molecules from the Zinc Drug Database. The correlation coefficients of R2=0.97 and R2=0.99 respectively validate the high quality of this 2 pKa approximation. Restriction to realistic property combinations We focused on just the eight physicochemical properties detailed above. Even so, a few dozen experimental rat %F values are not sufficient to characterize an 8-dimensional %F landscape, nor are a few hundred GastroPlus computed %F values from members of the series. However, the input to GastroPlus simulations is a series of compound properties, either measured or computed from QSARs, rather than a structure per se. We could, in principle, simply sample thousands of points within the property ranges, without consideration of actual chemical structures. However, chemistry-naive sampling methods, such as random or design of experiments, would sample chemically incompatible property combinations such as high aqueous solubility paired with high hydrophobicity. Furthermore, these would not necessarily provide compatible values for the dozens of other required inputs that are not our focus. As described in the Results, chemical property combinations consistent with active members of each series were therefore sampled from actual compounds in the ZINC database, guaranteeing that only chemically feasible property combinations were used. The ZINC compound nearest the center of each occupied grid cell acts as a surrogate for the PK behavior of an unspecified hypothetical active member of the medchem series. 90% of the occupied grid cells contained at least two ZINC compounds,

ACS Paragon Plus Environment

Molecular Pharmaceutics 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

indicating that the 12.8 million compounds have enough redundancy to thoroughly sample the entire feasible space.

Sensitivity analysis PLS regression Frequently used GSA algorithms such as the Fourier Amplitude Sensitivity Testing (FAST),21, 22 extended FAST (eFAST),23 Sobol24 or Morris25 methods automatically sample from all regions of property space. Indeed, these approaches have been applied to biological systems modeling,26 and chemical-kinetic reaction mechanisms.27 For chemical property space, however, PLS regression was chosen because it can use an arbitrary, pre-defined collection of input property combinations. Furthermore it is robust with respect to correlated inputs, and has a straightforward mathematical basis that clarifies interpretation. The utility of the PLS predictions depends on how well they approximate the full GastroPlus simulations for %F and %Fa on the actual medchem series compounds. Recall that the quadratic PLS models for GastroPlus %F and %Fa were trained on abstract chemistry landscapes using physicochemical property combinations from arbitrary ZINC surrogates that sampled the ranges for 7 of the 8 key properties within each medchem series, along with a stratified statistical sampling of CLint, rather than on actual structures from the medchem series. Furthermore, the PLS descriptors include only the 8 selected properties from among the dozens of GastroPlus inputs computed for each AINC surrogate. The accuracy of the PLS predictions was compared to full GastroPlus simulations for the PIM series (which gave the worst PLS statistics) to ensure applicability back to the original medchem series compounds. Figure 3 compares results using all three sources of in silico CL values: CLfit, CLloc and a global QSAR CLglb. CLfit was optimized on GastroPlus simulations to exactly reproduce the

ACS Paragon Plus Environment

Page 20 of 28

Page 21 of 28 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Molecular Pharmaceutics

experimental %F, so the points in Figure 3a lie exactly on the line of identity. The modest scatter about the unity line in Figure 3b shows the small error due to using the much simpler PLS approximation rather than the full PBPK simulations. The pairs of plots using CLloc and CLglb as inputs to either the full simulation or the PLS model are all but indistinguishable, showing that the errors are almost entirely due to the estimates of CLint, Thus, replacing the full PBPK %F with the simpler PLS approximation does not introduce significant additional error. The most sensitive variables from each series are graphed in Figure 4. Positive coefficients indicate that %F increases with more positive values of that property. Positive cross-terms indicate more than additive interactions between pairs of properties. Positive squared terms indicate concave upward curvature. If the sensitivities are stable and unique, analysis by an orthogonal statistical method should produce similar results. The sensitivity coefficients determined using an ANN model, graphed in Figure 4, produced nearly identical results, and would indicate the same guidance for medicinal chemistry. The close similarity of PLS results sampling property space with a grid of 5 or 6 levels indicates that 5 levels are sufficient and cuts the required number of GastroPlus simulations by more than half. Interpretation of GSA results across three chemical series The purpose of the GSA analysis is to provide guidance for medicinal chemists. PLS regression coefficients in Figure 4 indicate the relative sensitivity of %F and %Fa to the most influential of the 8 selected properties. CLint is the most important factor in all three series, consistent with Figure S2 which shows that all compounds are well absorbed. Beyond that, the series differ. For HSD1, no other property is even ¼ as sensitive as CLint. The message for improving bioavailability in this case would be to focus on clearance. In fact, the chemist can safely lower

ACS Paragon Plus Environment

Molecular Pharmaceutics 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

LogP and Peff to reduce CL and improve %F. For both DPP4 and PIM datasets, LogP is important, so a more surgical strategy for blocking metabolic clearance is required. For PIM, Peff is important as well. The positive squared LogP terms in PIM and DPP4 indicates concave upwards curvature i.e. the %F landscape is relatively flat for low values of LogP, and then rises more steeply as the value is increased further. This is an unwelcome message since high LogP often raises other development issues. The ratio of blood to plasma concentration (RBP) is important for PIM and DPP-4 %F. The role of RBP in GastroPlus is complex. It increases the plasma clearance computed from CLint and it decreases Fup computed from PPB. Because it is used to estimate acidic phospholipid binding, it increases the liver/plasma partition coefficient (Kp) and consequently increases liver tissue concentration (as well as some other tissues, so also larger volume of distribution). The net result of these competing effects is that increasing RBP increases %F, as indicated by the positive PLS coefficient. The effect is largest for strong bases because the effect on binding acidic phospholipids is greater. The sensitivity coefficients for %Fa in Figure 4 show that intestinal absorption represents a complex interplay of many physicochemical properties, with noticeable differences among chemical series. Gut metabolism was not included in these simulations, so clearance is not a factor. All three cases are sensitive to Peff and LogP. HSD1 and PIM are limited by solubility, but DPP-4 is not. Absorption of HSD1 inhibitors is sensitive to the pKa below the isoelectric point (ApKa), while that of DPP4 inhibitors is sensitive the pKa above (BpKa). Only the HSD1 series is sensitive to Fup.

ACS Paragon Plus Environment

Page 22 of 28

Page 23 of 28 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Molecular Pharmaceutics

Compound-specific Recommendations While Peff, RBP, and LogP were important for the PIM series generally, the %F landscape is non-linear and complex and is shaped differently in the vicinity of different compounds. Figure 5 plots the PLS approximation of the 2D LogD x Peff slices through the %F landscape around some specific PIM compounds while the other 6 properties are held constant. Compound “A” is improved most by increasing LogP, compound “B” is more sensitive to Peff and compound “C” to both LogP and Peff. Given that these are the approximate PLS surfaces, it is prudent to confirm a projected result with a full simulation before embarking on synthesis.

Limitations Three examples were presented from 3 protein families: a peptidase, a reductase and a kinase. Still, three examples do not guarantee a method applicable to all disease areas or classes of drugs. The approach is limited mainly by the quality of the optimized PBPK models, which in turn are limited by the experimental data across the compound series for optimizing the models. E.g. we know that active renal clearance is important for the DPP4 series. However, data were not available, so all clearance was assigned to the liver. Similarly, we recently studied a compound series where %F was reduced by Pgp efflux. An additional local QSAR for Caco-2 efflux ratio successfully addressed this additional complication (unpublished). Continued use will undoubtedly require additional adaptations.

Future directions Sampling the ZINC drug-like database ensures chemically compatible structures, but might include property combinations inaccessible within a given medchem series. An improvement would be to enumerate large virtual libraries in the series and perform the sampling and simulations on these instead of the unrelated ZINC compounds. Of course, generating

ACS Paragon Plus Environment

Molecular Pharmaceutics 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

sufficiently complete virtual libraries to cover all property combinations accessible to the series might require considerable effort. This paper has described a way to identify sensitive compound properties that can help guide medicinal chemists in designing analogs with improved %F, but does not suggest specific compounds for synthesis. Having identified the most sensitive properties, it is up to the medicinal chemist to propose new chemical structures with more desirable values of the sensitive properties that will likely maintain activity. Those proposed structures can then be run through GastroPlus to confirm the improved predicted %F. However, the simple polynomial PLS model that accurately approximates the full PBPK simulation could be combined with predicted activity from ligand-based QSARs, structure-based docking scores, free energy perturbation calculations or other affinity prediction methods. Thus, a bias toward bioavailability could be incorporated directly and seamlessly into existing processes for rational drug design, either interactively or by processing large virtual libraries. Furthermore, the PLS equation can be added to the fitness functions of de novo design programs that employ genetic algorithms or other numerical optimizations to automatically design chemical structures with improved activity and %F. In either case, a cut of the best scoring compounds designed interactively or evolved programmatically would then be submitted to the full GastroPlus calculation for a more thorough %F prediction.

Conclusions Medicinal chemists are hungry for guidance on how to improve bioavailability. They know a dozen compound properties they might try adjusting, but not which ones will be most influential for any particular project. Gathering sufficient experimental PK data to find out directly is

ACS Paragon Plus Environment

Page 24 of 28

Page 25 of 28 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Molecular Pharmaceutics

impractical. However, accurate PBPK calculations can be adapted to a particular compound series with as few as 15 to 20 rat PK experiments. Thousands of simulations can then be run to characterize the entire bioavailability response surface. Global sensitivity analysis of this landscape identifies the most influential compound properties for the series overall, and in greater detail around specific lead compounds.

Supporting Information Figure S1: Distribution of various physicochemical properties for each medchem series used for filtering the Zinc database. Figure S2: Distribution of calculated %F (s) and %Fa (b) for the three series. Figure S3: Comparison of pH-log (solubility) curves for Ceftobiprole using all (seven) pKa’s (a) and using selected two pKa’s (b). Both the curves show nearly identical pH-Log (solubility) profile. Table S1: q2 and number of latent variables for linear and quadratic PLS models for GastroPlus %F and %Fa.

Competing financial interests EJM, ISH: None. PRD, MBB and RDC are employees of Simulations Plus, Inc., which distributes GastroPlus and ADMET Predictor programs; they own stock and options in the company.

References 1. Kuhn, B.; Guba, W.; Hert, J.; Banner, D.; Bissantz, C.; Ceccarelli, S.; Haap, W.; Korner, M.; Kuglstatter, A.; Lerner, C.; Mattei, P.; Neidhart, W.; Pinard, E.; Rudolph, M. G.; Schulz-

ACS Paragon Plus Environment

Molecular Pharmaceutics 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Gasch, T.; Woltering, T.; Stahl, M., A Real-World Perspective on Molecular Design. J Med Chem 2016, 59, 4087-102. 2. Zhang, M. Q., Working with small molecules: rules-of-thumb of "drug likeness". Methods Mol Biol 2012, 803, 297-307. 3. Lipinski, C. A.; Lombardo, F.; Dominy, B. W.; Feeney, P. J., Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv Drug Deliv Rev 2001, 46, 3-26. 4. Bolger, M. B.; Agoram, B.; Fraczkiewicz, R.; Steere, B. Simulation of Absorption, Metabolism, and Bioavailability. In Drug Bioavailability; Wiley-VCH Verlag GmbH & Co. KGaA: 2004, pp 420-443. 5. Bolger, M. B.; Fraczkiewicz, R.; Lukacova, V. Simulations of Absorption, Metabolism, and Bioavailability. In Drug Bioavailability; Wiley-VCH Verlag GmbH & Co. KGaA: 2009, pp 453-495. 6. Thomas, S., Physiologically-based simulation modelling for the reduction of animal use in the discovery of novel pharmaceuticals. Altern Lab Anim 2009, 37, 497-511. 7. Huang, Q.; Riviere, J. E., The application of allometric scaling principles to predict pharmacokinetic parameters across species. Expert Opin Drug Metab Toxicol 2014, 10, 1241-53. 8. Khan, M. T., Predictions of the ADMET properties of candidate drug molecules utilizing different QSAR/QSPR modelling approaches. Curr Drug Metab 2010, 11, 285-95. 9. Agoram, B.; Woltosz, W. S.; Bolger, M. B., Predicting the impact of physiological and biochemical processes on oral drug bioavailability. Adv Drug Deliv Rev 2001, 50 Suppl 1, S4167. 10. Pankaj R. Daga, M. B. B., Ian S. Haworth, Robert D. Clark, Eric Martin, PhysiologicallyBased Pharmacokinetic Modeling in Lead Optimization I: Evaluation and Adaptation of GastroPlus to Predict Bioavailability of Medchem Series. Mol Pharm 2017, (Submitted). 11. Mathias, N. R.; Crison, J., The use of modeling tools to drive efficient oral product design. AAPS J 2012, 14, 591-600. 12. Berry, P.; Parrott, N.; Reddy, M.; David-Pierson, P.; Lavé, T. Putting it All Together. In Hit and Lead Profiling; Wiley-VCH Verlag GmbH & Co. KGaA: 2010, pp 221-240. 13. Heikkinen, A. T.; Baneyx, G.; Caruso, A.; Parrott, N., Application of PBPK modeling to predict human intestinal metabolism of CYP3A substrates - an evaluation and case study using GastroPlus. Eur J Pharm Sci 2012, 47, 375-86. 14. Mevik, B.-H.; Wehrens, R., The pls Package: Principal Component and Partial Least Squares Regression in R. 2007 2007, 18, 23. 15. Venables, W. N.; Ripley, B. D. Modern Applied Statistics with S. In; Springer, New York.: 2002. 16. Levene, P. A.; Simms, H. S., Calculation of Isoelectric Point. Journal of Biological Chemistry 1923, 55, 801-813. 17. Irwin, J. J.; Shoichet, B. K., ZINC--a free database of commercially available compounds for virtual screening. J Chem Inf Model 2005, 45, 177-82. 18. Adson, A.; Raub, T. J.; Burton, P. S.; Barsuhn, C. L.; Hilgers, A. R.; Ho, N. F. H.; Audus, K. L., Quantitative approaches to delineate paracellular diffusion in cultured epithelial cell monolayers. Journal of Pharmaceutical Sciences 1994, 83, 1529-1536. 19. Garson, G. D., Interpreting neural-network connection weights. AI Expert 1991, 6, 46-51. 20. Goh, A. T. C., Back-propagation neural networks for modeling complex systems. Artif Intell Eng 1995, 9, 143-151.

ACS Paragon Plus Environment

Page 26 of 28

Page 27 of 28 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Molecular Pharmaceutics

21. Collins, D. C.; Avissar, R., An Evaluation with the Fourier Amplitude Sensitivity Test (FAST) of Which Land-Surface Parameters Are of Greatest Importance in Atmospheric Modeling. Journal of Climate 1994, 7, 681-703. 22. Cukier, R. I.; Fortuin, C. M.; Shuler, K. E.; Petschek, A. G.; Schaibly, J. H., Study of the sensitivity of coupled reaction systems to uncertainties in rate coefficients. I Theory. J Chem Phys 1973, 59, 3873-3878. 23. Saltelli, A.; Bolado, R., An alternative way to compute Fourier amplitude sensitivity test (FAST). Computational Statistics & Data Analysis 1998, 26, 445-460. 24. Sobol′, I. M., Global sensitivity indices for nonlinear mathematical models and their Monte Carlo estimates. Mathematics and Computers in Simulation 2001, 55, 271-280. 25. Morris, M. D., Factorial Sampling Plans for Preliminary Computational Experiments. Technometrics 1991, 33, 161-174. 26. Kiparissides, A.; Kucherenko, S. S.; Mantalaris, A.; Pistikopoulos, E. N., Global Sensitivity Analysis Challenges in Biological Systems Modeling. Ind Eng Chem Res 2009, 48, 7168-7180. 27. Davis, M. J.; Skodje, R. T.; Tomlin, A. S., Global Sensitivity Analysis of ChemicalKinetic Reaction Mechanisms: Construction and Deconstruction of the Probability Density Function. J Phys Chem A 2011, 115, 1556-1578.

ACS Paragon Plus Environment

Molecular Pharmaceutics 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

For Table of Contents Use Only

Physiologically-Based Pharmacokinetic Modeling in Lead Optimization II: “Rational Bioavailability Design” by Global Sensitivity Analysis to Identify Properties Affecting Bioavailability

Pankaj R. Daga1#, Michael B. Bolger2, Ian S. Haworth3, Robert D. Clark2, Eric J. Martin1*

ACS Paragon Plus Environment

Page 28 of 28