
Article

Quantifying, Visualizing, and Monitoring Lead Optimization
Andrew T. Maynard
J. Med. Chem., Just Accepted Manuscript • DOI: 10.1021/acs.jmedchem.5b00948 • Publication Date (Web): 11 Aug 2015



Quantifying, Visualizing, and Monitoring Lead Optimization

Andrew T. Maynard* and Christopher D. Roberts♦

GlaxoSmithKline, 5 Moore Drive, Research Triangle Park, North Carolina 27709-3398, United States

*Corresponding author: [email protected]

♦Current address: Syros Pharmaceuticals, Boston, MA

KEYWORDS: lead optimization telemetry, standard risk, convergence efficiency, tractability, compound effort, design entropy

■ ABSTRACT

Although lead optimization (LO) is by definition a process, quantitative process-centric analysis and visualization of this important phase of pharmaceutical R&D has been lacking. Here we describe a simple statistical framework to quantify and visualize the progression of LO projects so that the vital signs of LO convergence can be monitored. We refer to the resulting visualizations generated by our methodology as the "LO telemetry" of a project. These visualizations can be automated to provide more objective, holistic, and instantaneous analysis and communication of LO progression. This enhances the ability of project teams to drive the LO process more effectively, while enabling management to better coordinate and prioritize LO projects. We present the LO telemetry of five LO projects comprising different biological targets and different project outcomes, including clinical compound selection, termination due to preclinical safety/tox, and termination due to lack of tractability. We demonstrate that LO progression is accurately captured by the telemetry. We also present metrics to quantify LO efficiency and tractability.


■ INTRODUCTION

Lead optimization (LO) is an iterative process of improving the pharmacology, DMPK, and physical properties of synthetic compounds to achieve composition of matter of sufficient quality to merit clinical testing. While desirable DMPK and physical properties are fairly consistent across projects for orally bioavailable compounds, all LO projects are unique with regard to the primary molecular target biology and pharmacology. Further, the manner in which a project team pursues LO is particular to the composition of the team (leadership, experience, knowledge, skills), which can also change in the course of an LO program. Veteran project leaders emphasize that LO is a team sport as much as it is a technical art1. In addition to the multivariate complexity of the optimization process, the duration of LO projects often spans several years, involving thousands of compounds, eventually ending in either selection of a clinical candidate compound or project termination.

Given the multivariate complexity, idiosyncrasies, and duration of LO projects, progression towards candidate selection often appears opaque and disjointed to an outsider. Even for a project team member, it can be difficult to view LO progression holistically and objectively. While the number and sophistication of biological and biochemical assays underpinning LO have dramatically increased in the past 40 years, the overall communication and management structure of LO projects has remained essentially unchanged. By its nature, traditional personal communication of LO progress (e.g., memos, .ppt slides, .xls tables) is incomplete and static. Generally, only periodic communication of project highlights is practical, featuring a subset of selected compounds. Assessing whether or not an LO project is progressing effectively towards the clinic inherently involves subjective judgment that needs to be balanced with objective metrics. Establishing quantitative and holistic visualizations of LO progression supports more objective and nuanced LO evaluation and management decisions. It is also valuable to create a framework that captures the longitudinal knowledge (properties, dynamics, endpoints) of past LO projects to help guide future programs. Our motivation for developing a framework to assess LO quality is to enhance, not diminish, traditional modes of LO management by providing more transparent, objective, and holistic views of LO progression that enable more responsive and efficient management decisions.

Although considerable time and resources can be expended in the lead optimization phase of R&D, process-centric analysis of LO in industrial settings has been lacking.

Given increasing pressures to enhance the efficiency of discovery and development of new therapeutics2, it is important to develop meaningful analytics to describe and visualize LO efficiency in the context of ongoing projects. In the context of LO portfolio management, discerning healthy, convergent LO projects also supports prioritization and coordination of projects to enhance productivity. To date, LO has been described primarily in static compound-centric terms, specifically the "compound quality" of the eventual clinical asset resulting from an LO program, in the context of Lipinski's seminal "Rule of 5" and subsequent studies3-6. While compound quality metrics are valuable for defining desirable endpoints, the dynamic qualities of LO projects remain ill-defined. A compound, whether a lead compound or the eventual clinical candidate, is only a single endpoint in a succession of many compounds, often thousands, synthesized in the course of an LO program. The temporal quality of LO project progression is missing. The dynamics of LO progression enable a holistic analysis of the quality of all compounds associated with an LO project, greatly enhancing understanding of both product and process.

Shifting to a project quality paradigm, we present a framework to quantify, visualize, and monitor the dynamics of LO projects in order to better inform and manage the LO process. We refer to the resulting project visualizations as the "LO telemetry" of a project. LO telemetry allows progression to be readily visualized and tracked by anyone interested in a project. Key LO data are remotely harvested and transformed to visualize the key dynamical vital signs of projects. In the context of a web interface, this enables anyone, anywhere, to quickly and interactively view LO progression. In addition to supporting individual LO projects, LO telemetry can be adapted to support portfolio management and coordination of LO programs.

Our efforts to discover and develop Hepatitis C Virus (HCV) replication inhibitors provided the initial impetus for this work. The field of HCV replication inhibitors, as recently chronicled in an HCV thematic issue of this journal7, has made dramatic progress toward effective cures for HCV. Four programs formed the core of our LO portfolio: NS4B, NS5A, NS5B, and PI4Ka, which have been previously described8-14. In an effort to better monitor and prioritize these programs, we developed the LO telemetry framework described herein. The resulting telemetry of these LO projects, which reached different outcomes, is presented. Two projects successfully achieved clinical candidate selection (NS5A, NS5B), while two were halted due to preclinical safety issues (NS4B, PI4Ka). For additional project diversity, we include the telemetry of an oncology project that was terminated due to lack of LO tractability. Analysis of the telemetry of these five projects enabled development of the LO efficiency metrics that are also presented.

■ Quantifying Lead Optimization

The conceptual framework we adopt for quantifying LO convergence is based on the minimization of risk, defined in statistical terms. For a given optimization variable (x), every LO project begins with a set of compounds with some distribution of SAR relative to a desired threshold, xT. In principle, each optimization step attempts to move the SAR closer to the threshold, eventually achieving convergence. From a statistical viewpoint, this corresponds to attempting to move the tail of the SAR distribution closer to xT, effectively shifting the mean (µ) of the distribution towards xT and expanding the dynamic range (variance) of the SAR. The "risk" of an optimization variable corresponds to its resistance to convergence, which can be quantified in terms of the standard deviate distance (σ) between µ and xT, Figure 1a. If a project team is unable to fully optimize a particular variable, compounds will carry a residual risk, reflected by their statistical distance from convergence. For a given compound and LO variable x, we define the standard risk (rx) as

r_x = h(x \to x_T)\,\frac{|x - x_T|}{\sigma} \qquad (1)

where h is a step function that switches from one to zero when x reaches or exceeds the desired threshold xT, Figure 1a. The step function also takes into account the direction of convergence. Thus, risk is measured in terms of σ units from convergence, approaching a minimum of zero upon convergence, with no extra credit for exceeding convergence. Importantly, transformation from {x} to {r} standardizes the LO variables so that the variance of risk can be compared between variables within a project. This also sets the stage for comparing the minimization of risk across projects. In essence, the functional form of Eq. 1 is analogous to a z-score or standard score, where µ is now replaced by xT and the function is biased towards the risk side of the threshold, becoming equivalent to a z-score when µ → xT. For multivariate optimization, the total risk (R) of the ith compound can therefore be expressed as the sum over all LO variables {x},

R_i = \sum_{x} r_{i,x} \qquad (2)

or as a weighted average (ρ),

\rho_i = \frac{\sum_{x} w_x\, r_{i,x}}{\sum_{x} w_x} \qquad (3)

where {w} are the variable weights. Use of a weighted average can offer project teams flexibility in weighting LO variables of particular importance. Prior to the linear transform of Eq. 1, a log transform is usually applied to the raw data to improve the normal distribution profiles of {x}, if not already done. In practice, σ is also computed from the median absolute deviation (MAD)15, which provides a more robust measure of variability: σ = 1.48 × MAD. In cases where there is little or "no" SAR (σ ≈ 0), σ is defined as the non-zero experimental assay precision; usually biochemical and biological assays have two- to three-fold variability. For discrete "pass/fail" LO variables, for example reactive (GSH) metabolites, σ ≡ 1 and r ∈ {0 (pass), 1 (fail)}. Thus, either continuous or discrete LO data can be treated.
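To make the transformation concrete, the following Python sketch shows one way Eqs. 1-3 could be computed as described above; the function names (robust_sigma, standard_risk, total_risk, weighted_risk) and the assay-precision fallback value are our own illustrative choices, not code from the original work.

```python
import numpy as np

MAD_SCALE = 1.48  # sigma ~ 1.48 * MAD for approximately normal data


def robust_sigma(values, assay_precision=0.5):
    """Robust spread of an LO variable: sigma = 1.48 * MAD.
    Falls back to a non-zero assay precision when there is essentially
    no SAR (sigma ~ 0); 0.5 log units is an illustrative default only."""
    values = np.asarray(values, dtype=float)
    mad = np.median(np.abs(values - np.median(values)))
    sigma = MAD_SCALE * mad
    return sigma if sigma > 0 else assay_precision


def standard_risk(x, x_threshold, sigma, direction=+1):
    """Eq. 1: r = h(x -> xT) * |x - xT| / sigma, in sigma units from
    convergence. direction=+1 when larger is better (e.g. pEC50),
    -1 when smaller is better; risk is zero at or beyond the threshold
    (no extra credit for exceeding convergence)."""
    converged = x >= x_threshold if direction > 0 else x <= x_threshold
    return 0.0 if converged else abs(x - x_threshold) / sigma


def total_risk(risks):
    """Eq. 2: additive total risk of a compound over its LO variables."""
    return sum(risks)


def weighted_risk(risks, weights):
    """Eq. 3: weighted-average risk, letting key variables be up-weighted."""
    return sum(w * r for r, w in zip(risks, weights)) / sum(weights)


# Worked example using the NS4B replicon numbers quoted in Figure 1b:
# pEC50 threshold 8.5, MAD = 1.0, so sigma = 1.48.
sigma = 1.48
print(standard_risk(6.0, 8.5, sigma))  # ~1.69 -> "orange" (1 < r <= 2)
print(standard_risk(8.7, 8.5, sigma))  # 0.0 -> converged ("green")
```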

Figure 1. a) The statistical framework for determining compound risk. The overall risk of an LO variable corresponds to its distribution relative to the desired convergence threshold xT, determined by the distance of the mean (µ) from the threshold in relation to the variance of the SAR (σ). The interplay between µ and σ, relative to xT, determines the likelihood of observing convergence. Increasing the statistical distance between µ and xT proportionately increases the risk of not achieving convergence, while increasing σ (broad SAR) proportionately decreases risk, as reflected in Eq. 1. If an observation x equals or exceeds the desired threshold, the risk is defined as zero. b) Color coding the histogram of HCV gt1a replicon inhibition data for the NS4B project by the distance r from the pEC50 threshold, xT ≡ 8.5, Eq. 1, where σ is based on the median absolute deviation (MAD); MAD replicon gt1a pEC50 = 1.0, σ = 1.48 × MAD = 1.48. Color coding by r is automatically determined in terms of σ units, Eq. 1: r = 0 (green), 1 ≥ r > 0 (yellow), 2 ≥ r > 1 (orange), r > 2 (red). Assay limit pEC50 < 4.3.


A byproduct of this data transformation is a natural statistical framework for color coding raw LO data by risk, i.e., the number of σ units from convergence. This is quite useful for compound progression tables consisting of multivariate LO data (examples to follow and in Supporting Information), since human cognition is much more adept at recognizing and translating colors than numbers. We adopt a convention of: green r = 0, yellow 0 < r ≤ 1, orange 1 < r ≤ 2, and red r > 2. This is shown in Figure 1b, where the distribution of HCV genotype 1a (gt1a) inhibition data for NS4B project compounds is color coded by risk. Color coding compound data also reinforces a statistical understanding of LO convergence.

Alternative data transformations have included the use of desirability functions16,17 (DF) for scoring and prioritizing compounds18. More recently, this approach has also been used to analyze the time course of compound quality in LO progression19. While DF approaches are well established for multi-objective optimization, shortcomings can arise due to neglect of the variance-covariance data structure of the parent variable distributions20-22. Typically, DF techniques involve the linear transform of an LO variable to [0,1], where maximum desirability is optimal. Both the DF location and scale of variance are proactively defined by the user, and since the mapping of a DF distribution may not necessarily coincide with the observed data distribution, careful parameterization is required. In contrast, we take a more passive approach, using a simpler transformation (Eq. 1) to map an LO variable to [0,∞), measured in terms of the variability (σ) of the observed data distribution, and where minimization of risk is optimal. Rather than impose desirability, we measure whether a process is conducive to minimization of risk, as reflected by the evolution of the observed distribution relative to convergence (xT), Figure 1.

In the projects presented, the number of LO variables ranges from 9 to 22, which is characteristic of many projects. For every project there is a core set of in vitro and in vivo DMPK variables which determine compound bioavailability, {x}DMPK. Augmenting these variables are target-based pharmacology variables {x}TARGET that are unique to an LO program, such as primary target potency (biochemical) and associated cell-based potency, as well as off-target selectivity. If desired, physical properties (MW, logD, solubility) can also be added, {x}PHYS. Because Eq. 2 is additive, contributions to the LO profile of a project can be easily partitioned into different components or telemetry channels, for example {x}DMPK+TARGET and {x}PHYS, to assess key or problematic risks.
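The color-coding convention and the additive channel partitioning described above can be illustrated with a short Python sketch; the helper names (risk_color, channel_risk), the variable labels, the channel groupings, and the risk values are hypothetical, chosen only to show the bookkeeping.

```python
def risk_color(r):
    """Traffic-light convention for compound progression tables:
    green r = 0, yellow 0 < r <= 1, orange 1 < r <= 2, red r > 2."""
    if r == 0:
        return "green"
    if r <= 1:
        return "yellow"
    if r <= 2:
        return "orange"
    return "red"


def channel_risk(compound_risks, channel_variables):
    """Because Eq. 2 is additive, a compound's total risk can be split
    into telemetry channels (e.g. TARGET, DMPK, PHYS) by summing the
    standard risks of the variables assigned to each channel."""
    return sum(compound_risks[v] for v in channel_variables)


# Hypothetical per-variable standard risks for one compound:
risks = {"pEC50_gt1a": 1.7, "CLint": 0.4, "MW": 0.0, "logD": 0.8}
channels = {"TARGET": ["pEC50_gt1a"], "DMPK": ["CLint"], "PHYS": ["MW", "logD"]}

for variable, r in risks.items():
    print(variable, risk_color(r))              # per-variable color coding
for name, members in channels.items():
    print(name, channel_risk(risks, members))   # per-channel risk contribution
```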


In the course of an LO campaign, risks evolve in accordance with the shifts of the variable distributions, reflecting the evolution of {µ} and {σ} relative to the endpoints {xT}. Not all variables are independent. Some variables usually prove relatively easy to converge, while others are more problematic. Covariance between a subset of {x} can promote either constructive (positively correlated) or divergent (anti-correlated) convergence, the latter leading to bottlenecks in the optimization process. In principle, a project team attempts to converge all variables simultaneously, but in practice there is an optimization cascade, usually prioritizing optimization of target-specific pharmacology and backfilling optimization of DMPK variables, with some measure of constraints in regard to desirable physical properties.

The time course of an LO program is naturally established by the registration stamp of project compounds. We work with the registration sequence of project compounds, or the compound progression sequence, to define a time course. For quantification and visualization purposes, this avoids potential gaps and unevenness in the time domain when working with compound registration dates (Julian calendar), due to project delays, holidays, etc.

No optimization problem can be defined without defining the desired endpoint. Operationally, all that is needed to quantify the convergence (efficiency) of an LO process is definition of the variables {x} and associated convergence thresholds {xT}. The LO project team drives the LO process, which is subsequently quantified and visualized. Some project teams may be reluctant to define LO convergence endpoints, but it is a necessary step towards defining and objectifying the LO process. Care should be taken to include only important LO variables and to define realistic endpoints that would sufficiently match the profile of a clinical candidate compound. As a rule of thumb, if a compound were to match the profile {xT}, it would be good enough to scale up for advanced animal safety/tox studies as a prelude to candidate selection. With the desired thresholds defined, statistical progress towards these endpoints can be visualized as a function of compound progression. In essence, LO is educated guesswork that the next compound made will be more potent, more selective, more stable, more bioavailable, more soluble, etc. As each compound is made and tested, we can now objectively quantify how well guesses are made in terms of minimizing the residual risks {r} relative to the thresholds {xT}. Here, we treat LO as a phenomenological process which we measure quantitatively within a simple statistical framework that enables a variety of visualizations.
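Because the progression sequence, rather than calendar time, defines the time axis, one simple way to monitor convergence is to smooth per-compound total risk over registration order. The sketch below assumes a rolling median with an arbitrary window and uses synthetic stand-in data; it illustrates the idea of a telemetry trace, not the specific visualization used in this work.

```python
import numpy as np


def telemetry_trend(total_risks, window=50):
    """Rolling median of per-compound total risk (Eq. 2) over the compound
    registration sequence, so that the convergence trend can be monitored
    without gaps from the calendar time domain."""
    total_risks = np.asarray(total_risks, dtype=float)
    trend = np.empty_like(total_risks)
    for i in range(len(total_risks)):
        start = max(0, i - window + 1)
        trend[i] = np.median(total_risks[start:i + 1])
    return trend


# Synthetic stand-in for per-compound total risks in registration order;
# a real project would supply Eq. 2 values for each registered compound.
rng = np.random.default_rng(0)
total_risks = np.clip(8.0 - 0.01 * np.arange(600) + rng.normal(0, 1.5, 600), 0, None)
trend = telemetry_trend(total_risks, window=50)
print(trend[0], trend[-1])  # the trend should drift downward as the project converges
```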


■ RESULTS AND DISCUSSION

HCV NS4B Lead Optimization Telemetry and Analytics

Hepatitis C virus (HCV) is a global health issue that is estimated to affect 170 million individuals worldwide23. As part of our efforts to discover and develop HCV therapeutics, targeting NS4B-mediated HCV replication was an important component of our strategy, offering the prospect of first-in-class therapeutics. These efforts have been previously described8,9. The LO variables of the NS4B project, including the desired LO convergence thresholds, which were defined in consultation with project leadership, are shown in Table 1. Table 1 also includes data metrics reflecting the associated number of data points and distributions of the data. The experimental cutoff for the pEC50 and pCC50 assays was <4.3.

[Table 1. LO variables, convergence thresholds {xT}, and data metrics for the NS4B project.]
a Color coding by r, Eq. 1: r = 0 (green), 1 ≥ r > 0 (yellow), 2 ≥ r > 1 (orange), r > 2 (red).
b {xT}, Eq. 1.
c Direction of convergence: decreasing = -1, increasing = 1.
d The experimental cutoff for all pEC50 and pCC50 assays was <4.3.