Perspective pubs.acs.org/molecularpharmaceutics

Physicochemical and DMPK In Silico Models: Facilitating Their Use by Medicinal Chemists

Daniel F. Ortwine* and Ignacio Aliagas

Discovery Chemistry, Genentech Inc., 1 DNA Way, South San Francisco, California 94080, United States

ABSTRACT: It is known that the developability of drugs is related to their physicochemical and DMPK properties. Given the time and expense involved in discovering and developing new drugs, it would seem to make sense to maximize the chance of success by calculating properties ahead of chemical synthesis and testing, and acting only on those candidates whose properties fall into a desired range. This paper provides an overview of calculable physicochemical and DMPK properties, an assessment of the relative difficulty and accuracy of their calculation, and available software. Methods companies have employed to communicate results will be discussed, including the use of composite scoring functions and ranking schemes. Calculations do no good if chemists will not use them to prioritize synthesis decisions. Strategies are presented for facilitating model usage. An approach adopted at Genentech for presenting results is described; it involves the close coupling of property calculations with 3D structure-based drug design.

KEYWORDS: ADMET, QSAR, physicochemical, modeling, liver microsome, hepatocyte, desktop modeling, review, MPO



INTRODUCTION. WHY CALCULATE PROPERTIES?

In modern drug discovery, it has become increasingly recognized that the developability of drugs is related to their physicochemical and DMPK profiles.1 For example, studies point to an optimal distribution of properties possessed by marketed oral drugs.2 Molecular properties have also been shown to be related to a number of ADMET end points.3 A recent analysis pointed out that marketed oral drugs are seldom the most potent analogues at the target receptor. Rather, they possess a balance of small size and desirable physicochemical and DMPK properties at the expense of high intrinsic potency.4

Consider the time and cost involved in discovering and developing new molecular entities (NMEs), together with the declining productivity of the drug industry.5 Given these, it would seem to make sense to calculate physicochemical and DMPK properties ahead of compound synthesis and to use the results to triage which compounds to make. One can explode (enumerate) virtual libraries, calculate their properties, and prioritize for synthesis the compounds that fall into an appropriate property space. Calculated properties can assist in interpreting potency and DMPK data. They can be combined with docking scores in a multiparameter optimization paradigm. Calculations can assist in HTS triage by facilitating selection of candidates with desirable properties for follow-up testing. Some DMPK QSAR models are highly predictive, such as the microsomal stability model described by Lee et al.6 For these, calculations can replace measurements, freeing up testing capacity. Calculated properties can help guide the growth or subsetting of compound collections. Filtering data sets into subsets with appropriate drug-like properties can reveal in vitro/cellular data correlations that were otherwise masked by compounds with poor solubility, stability, or permeability.

© 2013 American Chemical Society

CALCULABLE PHYSICOCHEMICAL AND DMPK PROPERTIES

Commonly calculated physicochemical descriptors7 and DMPK properties are shown in Table 1, categorized by descriptor type. Vendors offering software to calculate and display these properties appear in Table 2. Neither list is exhaustive. Properties range from simple counts of molecular features to calculated DMPK properties that usually require a QSAR model to predict. Physicochemical descriptors such as logD, pKa, and TPSA are generally readily calculable, and a number of commercial programs are available for this task (Table 2). Most methods use 2D structures for these calculations, so they generally do not take into account 3D interactions that can affect the value of the property. For example, an intramolecular hydrogen bond limits the solvent exposure of the interacting polar groups. This effectively increases the logD and reduces the TPSA relative to calculations based on a 2D structure, which assume these groups are fully and independently exposed to solvent. Hydrogen bonding can also affect permeability and P-glycoprotein transport.8

QSAR models to predict DMPK end points are typically more difficult to derive and necessarily deliver more approximate results. Companies such as ACDlabs, Optibrium, Molecular Discovery, and SimulationsPlus offer built-in DMPK QSAR models in addition to descriptive and physicochemical property calculators. Unfortunately, many of the commercially available calculators for end points such as solubility and logD suffer from poor predictivity because they were built using limited data sets. This is particularly true for DMPK QSAR models, since large data sets of consistently measured values are scarce, although recent efforts9 are addressing this issue. Because of this, many pharmaceutical companies have generated their own physicochemical and DMPK QSAR models6,10,11 using consistently measured internal data. Software to do this is readily available (Table 2). The area has been reviewed.12

Special Issue: Predictive DMPK: In Silico ADME Predictions in Drug Discovery
Received: October 27, 2012; Revised: February 6, 2013; Accepted: February 12, 2013; Published: February 12, 2013
dx.doi.org/10.1021/mp3006193 | Mol. Pharmaceutics 2013, 10, 1153−1161

Table 1. Calculable Physicochemical and DMPK Properties(a)

(a) Colored by difficulty of calculation. Green properties are easy to calculate or readily available from commercial software. Orange denotes properties for which QSAR models or docking protocols must be developed to obtain good predictivity. Red properties remain difficult to calculate accurately and continue to be subjects of active research.
(b) High accuracy denotes counts, or properties that can be precisely calculated (generally within 0.5 log unit). Moderate refers to properties with variable precision depending on the model and training set used, or to shape-based descriptors that rely on molecular modeling software. Low denotes descriptors that are normally categorical (yes/no). They are categorical either because they depend on multiple mechanisms that are difficult to model in a single QSAR equation, or because they require more approximate pharmacophore modeling when a 3D structure of the protein target is not available.
(c) In vitro/in vivo PK correlations: prediction of clearance in in vivo PK experiments from in vitro stability data and other properties.
(d) Also called “cell shift”; typically calculated as the cellular EC50/in vitro IC50 (or Ki). It denotes the reduction in potency on going from an in vitro to a cellular assay.

REPORTING RESULTS

The traditional method is to present the actual result of a calculation, with an uncertainty if one is known or can be calculated. This is acceptable for reasonably well predicted properties such as logD or pKa. Lower-accuracy QSAR models of ADMET end points, such as stability in liver microsomes or hepatocytes, often necessitate categorical output (e.g., high/medium/low) instead of an actual predicted value. For such models that deliver largely categorical results, we10 and others31 have chosen to report results as probabilities32,30 rather than as the actual predicted value. For example, for our internal human liver microsomal (HLM) stability QSAR model, we calculated the probability of a compound being stable using a univariate class-modeling technique.33 This method uses three inputs:
- the predicted HLM clearance value;
- the standard deviation of the prediction errors;
- a threshold clearance value that separates “stable” from “unstable” compounds.

The threshold is chosen by discovery scientists to separate desirable from undesirable property values, and it varies from model to model. For the HLM model, the stability threshold was set at 13 mL/min/kg, which is 63% of total human liver blood flow (21 mL/min/kg). Compounds with predicted clearances below and above 13 mL/min/kg were classified as stable and unstable, respectively. We assumed that prospective predictions will follow the observed distribution of prediction errors of the training set compounds, which had a standard deviation of approximately 4.1 mL/min/kg.10 The probability that the true clearance of a compound lies below the threshold is then given by the cumulative distribution function (CDF) of this error distribution, centered at the predicted value. Numerical integration of the distribution function from −∞ to the threshold yields the probability of the compound being stable (Figure 1).

Probabilities are easy to interpret: a single number conveys both the category (using a 0.5 cutoff) and the likelihood of being in that category (Figure 2). Teams can use different cutoffs as their project progresses. In the early stages, with few compounds, a team can decide to use a very low cutoff, rejecting few compounds for synthesis. As the project evolves and more compounds are prepared, a new model can be derived using an expanded training set containing additional compounds from that project. If this expanded model proves to be more predictive, the probability cutoff can be raised, improving the properties of subsequently synthesized molecules. This project-specific model validation is a critical exercise. It allows all team members to have input on desired property ranges, and it facilitates subsequent acceptance and use of the models. Normalized probabilities from 0 to 1 are easy to compare and weight across different models, and they are easy to use in plots, correlations, and ranking schemes. A downside of probabilities is that they are not directly interpretable in terms of the end point being forecast. Two kinds of data sets can pose challenges for predictability: unbalanced data sets, and data sets whose errors are not uniformly distributed on either side of the selected cutoff value. It can also be difficult to know when a predicted probability for a compound outside the training set is far enough from the cutoff value to warrant its use in synthesis prioritization.
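The class-modeling step described under REPORTING RESULTS can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation. The text says only that prospective errors follow the training-set error distribution, with a standard deviation of about 4.1 mL/min/kg; the sketch additionally assumes that this distribution is normal. The function names and the 0.8 project cutoff are hypothetical. The sketch computes the probability in two ways, which give the same result and mirror Figure 1:
- in closed form, using the CDF from Python's statistics.NormalDist;
- numerically, by trapezoidal integration of the density from a practical stand-in for −∞ up to the 13 mL/min/kg threshold.

```python
import math
from statistics import NormalDist

# Values quoted in the text for the HLM model (mL/min/kg).
THRESHOLD = 13.0   # cutoff separating "stable" from "unstable"
ERROR_SD = 4.1     # SD of training-set prediction errors

def prob_stable(pred_cl: float, threshold: float = THRESHOLD, sd: float = ERROR_SD) -> float:
    """P(true clearance < threshold): the CDF at the threshold of an error
    distribution centered on the predicted clearance (Gaussian assumed)."""
    return NormalDist(mu=pred_cl, sigma=sd).cdf(threshold)

def prob_stable_numeric(pred_cl: float, threshold: float = THRESHOLD,
                        sd: float = ERROR_SD, n: int = 20000) -> float:
    """Same quantity by trapezoidal integration of the Gaussian density.
    The lower limit pred_cl - 10*sd stands in for -infinity."""
    lo = pred_cl - 10.0 * sd
    if threshold <= lo:
        return 0.0
    h = (threshold - lo) / n
    norm = 1.0 / (sd * math.sqrt(2.0 * math.pi))

    def pdf(x: float) -> float:
        return norm * math.exp(-0.5 * ((x - pred_cl) / sd) ** 2)

    inner = sum(pdf(lo + i * h) for i in range(1, n))
    return h * (0.5 * pdf(lo) + inner + 0.5 * pdf(threshold))

# Hypothetical project cutoff; teams may raise it as their models improve.
PROJECT_CUTOFF = 0.8

for pred in (5.0, 11.0, 20.0):
    p = prob_stable(pred)
    category = "stable" if p >= 0.5 else "unstable"
    decision = "prioritize" if p >= PROJECT_CUTOFF else "deprioritize"
    print(f"predicted {pred:4.1f} mL/min/kg -> P(stable)={p:.2f}, {category}, {decision}")
```

A prediction of 11 mL/min/kg is classed as stable at the 0.5 cutoff but does not pass the stricter 0.8 project cutoff. This illustrates how raising the cutoff tightens triage without changing the underlying model.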


Table 2. Commercial Software for Calculating Physicochemical Descriptors and DMPK/ADMET Properties(a)

company name | software | available properties | url | comment
Accelrys | Pipeline Pilot | physchem + ADMET | accelrys.com | also has R statistics package for model generation
ACDlabs | Percepta Profilers | physchem + ADMET | acdlabs.com | a gold standard for pKa calculation
BioByte | cLogP | logP, MR | biobyte.com | original gold standard logP calculator
ChemAxon | Metabolizer | metabolic products | chemaxon.com |
Lhasa | Derek, Meteor | toxicology, metabolites | lhasalimited.org | identifies substructures associated with toxicity; predicts metabolites
Dotmatics | Vortex | physchem | dotmatics.com | visualization + compound ranking
Molecular Discovery | MoKa, VolSurf+, MetaSite | pKa, logD, physchem + ADMET, metabolic hotspots | moldiscovery.com | models can be customized29
Optibrium | Stardrop | physchem + ADMET | new.optibrium.com | contains a compound ranking scheme30
Schrodinger | QikProp | ADMET | schrodinger.com |
SimulationsPlus | ADMETpredictor, GastroPlus | physchem + ADMET | simulations-plus.com | allows QSAR model building
Strand Life Sciences | Sarchitect | ADMET | strandls.com | allows QSAR model building
OpenEye | OEChem Toolkit | physchem | eyesopen.com | for command line processing
SimCyp | ADME Prediction Toolbox | Fu, absorption, other PK properties | simcyp.com | in vivo PK/PD simulation
Tibco | Spotfire | physchem (with chemistry module) | tibco.com | for visualization

(a) Most commercial molecular modeling packages can also calculate physicochemical properties and generate plots and charts, although this is not typically their focus. Examples include MOE from CCG (chemcomp.com), Sybyl and Benchware 3D Explorer from Tripos (tripos.com), Maestro from Schrodinger (schrodinger.com), and Vida from OpenEye (eyesopen.com).

in human liver microsomes. There are quite a few compounds in this category, increasing confidence that predictions of unstable compounds will hold true. In contrast, a compound with a cHH probability >0.8 is deemed an attractive synthetic candidate because there is a 76% chance of the compound being stable. Reporting data as probabilities leaves the synthesis decision in the hands of the chemist. If there are other compelling reasons to make a molecule with a low predicted probability of being stable, the chemist may decide to accept the risk. However, other synthesis candidates in the same chemical series may have higher calculated probabilities and address the same compelling reasons. In that case, the chemist can make the analogues that are predicted to be more stable.

Given the inherent difficulties of deriving accurate models to predict physicochemical and DMPK properties, companies have developed empirical methods to classify compounds.34 Examples include ADMET rules of thumb,3 ADMET traffic lights,35 solubility forecast indices,28 desirability functions such as QED,36 cheminformatics toolkits,37 bioavailability scores,38 and multiparameter ranking schemes for CNS compounds.27,39 Because models differ in their prospective predictive power, we have found it useful to create customized project-specific scoring functions using software such as Stardrop30 and Vortex (dotmatics.com) (Table 2). These scoring functions can be used to rank-order existing compounds or to triage virtual compounds prior to synthesis. When triaging potential synthesis candidates, more approximate models can be assigned reduced importance relative to models whose end points are predicted more accurately; models that predict DMPK end points are one example. One such triage scheme is shown in Figure 4.
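A project-specific scoring and triage step of the kind described above might look like the following minimal sketch. It is not the Stardrop or Vortex implementation. It assumes two stages:
- hard property filters;
- a weighted geometric mean of per-property desirabilities, which is the aggregation form used by QED.

Every property name, threshold, weight, and compound ID is hypothetical. Giving the stability probability a smaller weight than calculated logD is one way to assign a less accurate DMPK model reduced importance.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List

def ramp_down(good: float, bad: float) -> Callable[[float], float]:
    """Hypothetical desirability: 1.0 at/below `good`, 0.0 at/above `bad`,
    linear in between."""
    def d(x: float) -> float:
        if x <= good:
            return 1.0
        if x >= bad:
            return 0.0
        return (bad - x) / (bad - good)
    return d

@dataclass
class Candidate:
    cid: str                  # hypothetical compound ID
    props: Dict[str, float]   # calculated properties and model probabilities

# Hard filters a team might agree on (illustrative thresholds only).
FILTERS: List[Callable[[Candidate], bool]] = [
    lambda c: c.props["mw"] <= 500.0,
    lambda c: c.props["tpsa"] <= 120.0,
]

# Desirability transform for each scored property.
DESIRABILITY: Dict[str, Callable[[float], float]] = {
    "clogd": ramp_down(3.0, 5.0),
    "p_hlm_stable": lambda p: p,  # already a probability in [0, 1]
}

# Relative importance: the less accurate DMPK model gets a smaller weight.
WEIGHTS: Dict[str, float] = {"clogd": 1.0, "p_hlm_stable": 0.5}

def score(c: Candidate) -> float:
    """Weighted geometric mean of desirabilities (QED-style aggregation)."""
    total_w = sum(WEIGHTS.values())
    log_sum = 0.0
    for name, w in WEIGHTS.items():
        d = DESIRABILITY[name](c.props[name])
        if d <= 0.0:
            return 0.0  # a fully undesirable property zeroes the score
        log_sum += w * math.log(d)
    return math.exp(log_sum / total_w)

def triage(library: List[Candidate], top_n: int) -> List[Candidate]:
    """Apply every hard filter, then keep the top_n survivors by score."""
    survivors = [c for c in library if all(f(c) for f in FILTERS)]
    return sorted(survivors, key=score, reverse=True)[:top_n]

library = [
    Candidate("VC-1", {"mw": 420.0, "tpsa": 85.0, "clogd": 2.5, "p_hlm_stable": 0.30}),
    Candidate("VC-2", {"mw": 455.0, "tpsa": 70.0, "clogd": 4.5, "p_hlm_stable": 0.90}),
    Candidate("VC-3", {"mw": 530.0, "tpsa": 90.0, "clogd": 2.0, "p_hlm_stable": 0.95}),
]
for c in triage(library, top_n=2):
    print(c.cid, round(score(c), 3))
```

With these illustrative numbers, VC-3 fails the molecular-weight filter and VC-1 ranks above VC-2. Because the weights are relative, lowering the weight on a model reduces its influence on the rank order without removing it entirely.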
Virtual combinatorial libraries were exploded and subjected to a set of filters agreed on by the team; only the highly scored virtual compounds were then considered for actual synthesis. Later in the project, a multiparameter scoring profile was agreed on and used to triage compounds. For example, systematic application of this paradigm resulted in a steady improvement of stability in hepatocytes without sacrificing cellular potency (Figure 5), leading to the discovery of clinical candidate quality

Figure 1. Calculation of the probability of a compound being stable. The probability density function is centered at the predicted clearance value (dotted line). The red line at 13 mL/min/kg is the threshold between stable and unstable compounds in human liver microsomes. This threshold varies from model to model. The shaded area under the curve from −∞ to the threshold is the probability of a compound being stable.

Figure 2. Use of normalized (0 → 1) probabilities to report the results of categorical model predictions. Instead of reporting a prediction from a liver microsome (LM) QSAR model as “Labile” or “Stable”, one can report a probability of being stable.10 A value of 0.5 would mean total uncertainty in assigning the result to either category.

An example of reporting probabilities from human liver microsome and hepatocyte stability QSAR models in an actual project is shown in Figure 3. The team may decide to not make any compounds with a cHLM probability 70%; moderate [M], 30−70%; stable [S],