Toxicology Strategies for Drug Discovery: Present and Future


Eric A. G. Blomme,* Global Preclinical Safety, AbbVie Inc., 1 North Waukegan Road, North Chicago, Illinois 60064, United States

Yvonne Will, Drug Safety Research and Development, Pfizer, Eastern Point Road, Groton, Connecticut 06340, United States

ABSTRACT: Attrition due to nonclinical safety represents a major issue for the productivity of pharmaceutical research and development (R&D) organizations, especially during the compound optimization stages of drug discovery and the early stages of clinical development. Focusing on decreasing nonclinical safety-related attrition is not a new concept, and various approaches have been experimented with over the last two decades. Front-loading testing funnels in Discovery with in vitro toxicity assays designed to rapidly identify unfavorable molecules was the approach adopted by most pharmaceutical R&D organizations a few years ago. However, this approach also carries a non-negligible opportunity cost. Hence, significant refinements to the "fail early, fail often" paradigm have been proposed recently, reflecting the difficulty of accurately categorizing compounds on the basis of early data points without taking into account other important contextual aspects, in particular efficacious systemic and tissue exposures. This review provides an overview of toxicology approaches and models that can be used in pharmaceutical Discovery at the series/lead identification and lead optimization stages to guide and inform chemistry efforts, as well as a personal view on how to best use them to meet nonclinical safety-related attrition objectives consistent with a sustainable pharmaceutical R&D model. The scope of this review is limited to small molecules, as large molecules are associated with quite different challenges. Finally, a perspective on how several emerging technologies may impact toxicity evaluation is also provided.



CONTENTS

1. Introduction 474
2. Target Knowledge 476
3. General Comments about Compound Profiling and Early Data Points 477
4. Physicochemical Properties, Chemical Reactivity, and Nonclinical Safety-Related Attrition 478
5. In Silico Toxicological Evaluation 479
5.1. Computational Models in Toxicology 479
5.2. Specific In Silico Assessments 480
5.3. Improving Toxicity Prediction and TSAs through Protein−Ligand Interaction Network Analysis 480
6. In Vitro Toxicology 481
6.1. In Vitro Genetic Toxicology 482
6.2. Molecular Pharmacology Profiling 483
6.3. Kinome Profiling 483
6.4. In Vitro Cardiovascular Profiling 484
6.5. High-Throughput Cytotoxicity Assays and Early Identification of Problematic Chemical Matter 485
6.6. Mitochondrial Toxicity Screens 486
6.7. Other in Vitro Toxicity Assays 488
6.8. Zebrafish 489
7. Early in Vivo Toxicology Evaluation 490
7.1. Exploratory Toxicology Studies 490
7.2. In Vivo Cardiovascular Assessment 491
7.3. In Vivo Gene Mutation Tests 491
7.4. Nonstandard in Vivo Models 491
8. Utility of Biomarkers Including the Omics Technologies in Early in Vivo Toxicology Assessment 492
9. An Eye to The Future 493
9.1. Complex in Vitro Models 493
9.2. Induced Pluripotent Stem Cells 494
10. Conclusions 495
Author Information 496
Corresponding Author 496
Funding 496
Notes 496
Biographies 496
Abbreviations 496
References 497

Special Issue: Toxicology Strategies for Drug Discovery - Present and Future
Received: September 25, 2015
Published: November 20, 2015

DOI: 10.1021/acs.chemrestox.5b00407 Chem. Res. Toxicol. 2016, 29, 473−504


1. INTRODUCTION Pharmaceutical research and development (R&D) is a long, complex, and expensive process that results in more failures than successes with approximately only 10% of molecules entering phase 1 clinical trials being ultimately approved by the United States Food and Drug Administration (FDA).1,2 Failures at the Discovery stage are even more frequent.3 There are regularly published, quite variable estimates of the direct costs of bringing a new drug to the market, and irrespective of the estimates, these costs are largely over US$ 1 billion at the time of publication. Beyond challenging the sustainability of the pharmaceutical R&D model, this cost issue deters drug developers from focusing on several areas with significant unmet medical needs, and patients do not get access rapidly enough to new medicines with better efficacy and safety profiles. The increased investments in R&D over the past 30 years have not translated to meaningful increases in drug approvals, leading to what has been referred to as the pharmaceutical R&D productivity gap: increased investments without corresponding increased success rates. There are many factors that explain this productivity gap, and those have been reviewed elsewhere in more details.1,4 One of these factors is failure due to safety issues.3,5 In particular, attrition due to nonclinical safety represents a major issue for R&D productivity, especially during the compound optimization stages of drug discovery and at the earlier stages of clinical development.3,5 Reported attrition numbers vary according to company and reports.1 While these attrition numbers can be useful benchmarks, it is nonetheless important to analyze them with caution, as they depend on a large number of variables.1,5 First, these numbers vary according to the time periods evaluated. Second, although often reported by large pharmaceutical companies, these analyses are based on relatively small numbers of molecules, numbers often too small for any robust statistical analysis. Third, nonclinical safety-related attrition will vary according to therapeutic areas. For instance, nonclinical safety-related attrition is generally lower in oncology compared to indications where safety is a more critical component for success, such as in the areas of antiviral agents or metabolic diseases. Fourth, companies have comparable, yet not similar governing processes, which typically dictate the fate of a compound or project, as well as different working principles, risk tolerance, culture, and strategies.6 These organizational aspects impact when and how a project is labeled as terminated. Finally, a compound rarely fails due to a single reason, and nonclinical toxicity has often been used as a default category. For example, a specific toxicity may be acceptable if observed at a comfortable predicted safety margin in an investigational new drug (IND)-enabling good laboratory practice (GLP) toxicology study. However, if the human pharmacokinetic (PK) prediction is incorrect, the safety margin may shrink to an unacceptable level. In that situation, the termination will typically be labeled as related to nonclinical toxicity, while in fact one may argue that it was related to an incorrect human PK prediction. Conversely, a suboptimal toxicity profile may limit the ability to interrogate the pharmacodynamic (PD) effect or target engagement of an exploratory compound in a phase 1 clinical study. 
This may be categorized as an efficacy failure or as a nonclinical toxicity failure depending on the organization. The main toxicities driving nonclinical safety-related attrition are variable depending on the company. Most companies are

usually reporting liver and cardiac toxicities as two of the most common toxicities.7−9 However, as already stated, these data are heavily influenced by current and past practices in individual organizations, as well as the therapeutic areas of interest. For instance, in the past decade, front-loading cardiovascular assessment with a battery of assays and models has significantly impacted the incidence of unexpected cardiovascular findings in pivotal safety pharmacology GLP studies or in early clinical trials.10,11 In addition, other factors influence the types of development-limiting toxicities observed in an organization. For example, compounds targeting the central nervous system (CNS) will require particular physicochemical properties that are more frequently associated with some undesirable effects such as pharmacological promiscuity, while compounds targeting the kidney typically have quite different biodistributions. Finally, the chemical spaces vary depending on the company and likely influence the nature of the toxicities observed preclinically, although no good controlled studies have demonstrated that to be the case. At AbbVie, the main toxicities driving termination in nonclinical studies are quite variable and have evolved with time partly due to changes in therapeutic area focus, as well as integration of novel practices for nonclinical safety assessment. Focusing on decreasing nonclinical safety-related attrition is not a new concept, and various approaches have been experimented with over the last two decades. Fully understanding the impact that these approaches may have had is a difficult task, as it takes a few years to generate enough data to assess improvement, and these data are usually too limited for any robust assessment. A decade or so ago, the mantra was “kill early, kill often” or “fail early, fail often”, the concept being that terminating bad molecules as early as possible in Discovery would result in substantial productivity increases. This was implemented by front-loading testing funnels in Discovery with decision-enabling, often in vitro toxicity assays designed to rapidly identify these unfavorable molecules. While some may argue to the contrary, current evidence indicates that this approach has not resulted in meaningful improvements in R&D productivity in general and in nonclinical safety-related attrition in particular. One may even argue that it may have had a negative impact on productivity by limiting alternatives for chemistry (what one may consider as an opportunity cost). While technology advances have been considered a viable and potentially transforming solution for early safety prediction, it is fair to say that in the last two decades most technologies labeled as “new” or “emerging” have had limited impact on overall nonclinical safety-related attrition. Furthermore, the strategy adopted by the industry of front-loading with more toxicology assays to “kill early, kill often” has been recently challenged since most assays (even good ones) have associated false positive rates, and the combination of these early assays result in an overall low positive predictive value (PPV). This is partly explained by the fact that at this stage little information is available about efficacious exposures, such that data generated from these early assays can really only be used for hazard identification and not risk assessment. 
In other words, front-loading with more toxicology assays to "kill early, kill often" can result in discontinuation of too many "good" molecules (i.e., molecules with the potential to succeed) and in increased cycle times at the lead optimization (LO) stage. This is mostly because few data points can be accurately interpreted in isolation without taking into account other contextual aspects.
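To make the preceding argument concrete, the short calculation below (a sketch using assumed, illustrative numbers rather than data from any study cited here) shows how a battery of individually reasonable early assays can flag most acceptable molecules at least once and still yield only a modest positive predictive value.

```python
# Illustrative calculation: how individually "good" early assays combine into a
# low positive predictive value (PPV). All numbers are assumptions for illustration
# only, not values reported in this review.

def at_least_one_false_flag(false_positive_rate: float, n_assays: int) -> float:
    """Probability that a truly benign compound is flagged by >=1 of n independent assays."""
    return 1.0 - (1.0 - false_positive_rate) ** n_assays

def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value of a single binary 'toxic vs. clean' call."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)

if __name__ == "__main__":
    # A benign compound run through 8 independent assays, each with a 10% false
    # positive rate, is flagged at least once more often than not.
    print(f"P(>=1 false flag across 8 assays): {at_least_one_false_flag(0.10, 8):.2f}")  # ~0.57

    # Even a fairly good single assay (80% sensitivity, 90% specificity) has a modest
    # PPV when only ~20% of compounds entering the funnel carry the liability in question.
    print(f"PPV of one assay at 20% prevalence: {ppv(0.80, 0.90, 0.20):.2f}")  # ~0.67
```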


Figure 1. Overview of the main assays and studies conducted in Discovery toxicology.

Figure 2. Target safety assessments (TSAs) are conducted for all targets at the stage of target exploration. Multiple sources of information and tools are available to toxicologists to conduct these TSAs. The figure illustrates the various components used at AbbVie. Evaluation of this large collection of data is facilitated by text mining strategies powered by search tools developed internally, as well as in silico simulation of the pharmacological consequences associated with changes of activity of the target or pathways.

Others have also proposed that considering “what will not fail” may be a better way forward.8 This debate does not, however, negate the utility of focused, well-understood, and validated assays to better guide and inform chemistry efforts. However, rather than using data from these assays as decision criteria to discontinue compounds, these data are probably better used as “alerts” to influence the overall nonclinical safety testing strategy in a particular project. In other words, as nicely articulated by Hornberg et al., the

strategy should be more about "testing the right things at the right time... instead of testing everything early".8 This review provides an overview of the types of assays and approaches that have shown utility and a personal view on how to best use them to meet nonclinical safety-related attrition objectives consistent with a sustainable pharmaceutical R&D model. Figure 1 provides a high-level overview of the assays and studies that will be discussed in this review, as well as their timing during Discovery and early Development. The scope of this review is


bioinformatics platforms allowing for in silico simulation of the effects associated with changes in target activity or pathway perturbations can be quite useful in spite of their current limitations.12 Some of these platforms are commercially or publicly available.12 As described later, those developed internally possess the advantages of incorporating internal experimental data that may be quite useful to also identify chemical matter as a basis for the synthesis of tool compounds; those can then be used for in vitro or in vivo experiments to interrogate the potential liabilities. For compounds identified through public information as potential modulators of the target of interest, a chemical safety review, including profiling in a battery of in vitro tests (e.g., cytotoxicity, molecular pharmacology, and mitochondrial toxicity), may also be conducted. The cardiotoxicity associated with targeting ErbB2 is a good and well documented example of a target-related safety liability. ErbB2 is a receptor tyrosine kinase also called HER2 (for human epidermal growth factor receptor 2) or HER2/neu. It is a validated target for cancer treatment, with Herceptin (trastuzumab, a monoclonal antibody or mAb) representing an important therapeutic option for cancer treatment, especially breast cancer.13 Because of the role played by the ErbB2 pathway in cardiomyocyte differentiation, growth, repair, and survival, Herceptin, as well as several other molecules impacting ErbB2 signaling, has been associated with cardiac dysfunction in the clinic. This obviously impacts the benefit/risk ratio of this class of drugs and limits their use in patients with preexisting cardiac dysfunction. However, a deeper understanding of the mechanism of toxicity of these benchmark anti-ErbB2 molecules offers an opportunity to circumvent this undesirable toxic effect. For instance, recent data indicate that the cardiac liability of these molecules is related to their ability to prevent the assembly of the Neuregulin 1 b (NRG-1)/ErbB2/ErbB4 complex, which promotes cardiomyocyte survival.13 This mechanistic understanding provides a key starting point for the development of alternative antibodies targeting different epitopes to circumvent the cardiotoxic effect. The epidermal growth factor receptor (EGFR) is another member of the ErbB family and a target for cancer that has received a lot of attention for its potential in oncology. Multiple companies have been targeting EGFR over the years using either small molecule receptor tyrosine kinase inhibitors (TKIs, including gefitinib, erlotinib, afatinib, and dacomitinib) or mAbs (e.g., cetuximab and panitumumab). This approach has definite therapeutic benefits and EGFR inhibitors have become the standard of care for the treatment of advanced nonsmall-cell lung cancer. 
However, these agents are also associated with dermatologic side effects characterized by a severe papulopustular rash typically affecting the face, scalp, upper chest, and back.14,15 These target-related dermatological effects are due to the critical role of EGFR in the physiology and development of the epidermis, and interestingly, a positive correlation between skin rash and clinical response has been observed.14 This rash may lead to secondary bacterial or viral infections, interference with daily activities, dose reduction, or discontinuation.14 To address this on-target toxicity, AbbVie oncology discovery scientists and colleagues used a mAb that selectively targets a unique epitope of the EGFR, which is largely inaccessible when EGFR is expressed at normal physiological levels (i.e., levels observed in normal tissues) in contrast to levels in tumors.15 This tumor-specific binding property provides a scientific basis

limited to small molecules, as large molecules are associated with challenges that are quite different. Finally, a perspective on how several emerging technologies may impact toxicity evaluation is also provided.

2. TARGET KNOWLEDGE Pharmacological modulation of the activity of some interesting therapeutic targets may result in unacceptable safety issues, such that therapeutic benefits cannot be easily separated from toxic effects. With discovery units investigating more unprecedented targets playing central roles in cellular homeostasis, this situation is probably more common than before. Being aware of the potential safety liabilities of targets and interrogating achievable therapeutic windows for these liabilities at an early stage become paramount for prioritizing work on targets more likely to succeed, for thinking early about creative ways to increase the therapeutic window for those targets or for better predicting safety issues in humans. With this context in mind, most large pharmaceutical companies are routinely conducting an evaluation of the potential liabilities of new targets in Discovery. This exercise has different names depending on the company and may include different components, but the overall principle of trying to learn earlier and being proactive with regards to target derisking strategies is the same. At AbbVie and Pfizer, we conduct what we refer to as target safety assessments (TSAs) for all targets at the stage of target exploration. This assessment is based on a deep understanding of the biology of the target, the generation of a tissue expression map (mRNA as well as protein expression), the evaluation of a variety of data or information (such as human genetic data, phenotypes of available genetically engineered rodent models, ongoing or past clinical trials targeting similar targets or pathways, or data from the patent literature), interrogation of biomedical literature databases facilitated by text mining strategies, as well as in silico simulation of the pharmacological consequences associated with changes of activity of the target or pathways (Figure 2). This collective knowledge is then used to generate a “TSA dashboard” that summarizes the lists of potential liabilities, their probability of occurrence (low, mid, and high), their impact on development (low, mid, and high), the proposed derisking approaches (i.e., the activities that are recommended to rapidly interrogate the liability), and links to the most critical information. These TSAs are obviously living documents that require updates as new information (internal as well as external) becomes available. Ultimately, TSAs represent the basis for the design of target/pathway-specific safety testing plans for target/pathway validation, hit identification, and LO. Importantly, as they evolve and become more accurate during the life of a project, TSAs can serve as a basis for identifying populations that may be more sensitive to some adverse effects or alternatively patient populations that may benefit therapeutically from a specific pharmacological mechanism. For this approach to be successful, companies need to develop the appropriate infrastructure for scientists to access and mine the vast amount of information and data available to evaluate targets and pathways. At AbbVie, we worked with our knowledge management and information technology colleagues to develop customized tools for that purpose. Those include search engines that mine a vast array of internal and external databases, and provide information categorized in an appropriate manner and prioritized according to relevance for safety. In addition, chemoinformatics, pathway analysis, and 476


excretion (ADME) profiling has been associated with better in vivo PK properties.19 A major reason for the success of ADME is that a consensus was reached among scientists in academia and industry on the best ways to tackle and optimize ADME properties. This consensus has led to a commonly agreed set of approaches to address the problem. In contrast, there is no consensus on the best ways to conduct early profiling for toxicity. This lack of consensus has definitely slowed down progress, although more efforts have been put toward that objective in the past few years. However, it should also be acknowledged that this lack of consensus is partly due to the complexity, lack of full characterization or understanding, and diversity of mechanisms of toxicity. Here, we will focus on assays that can be used early in Discovery to profile compounds with the objective of decreasing their probability of causing development-limiting toxic effects in vivo. The key word here is "probability". Although some of those assays or filters have arguably attractive performance characteristics, all have imperfect negative and positive predictive values, and few can be interpreted in isolation and out of context. These individual data points need to be interpreted in an integrated manner to increase the probability of identifying compounds with better safety profiles. Furthermore, a significant proportion of profiling assays generate semiquantitative end points or binary outputs, which may facilitate decision-making in some situations, but also limit their usefulness in other instances. An important aspect when trying to front-load toxicity assays or guidelines early in the hope of identifying and eliminating "bad players" is the multiple testing problem or false-discovery rate (FDR). Simply stated, by trying to identify "bad" compounds using multiple parameters (even assays with good calculated performance characteristics), there is a high likelihood that most compounds will have at least one bad attribute by random chance alone. Since a single bad attribute, even if real, may not necessarily translate to a clinically significant safety issue, this strategy leads to the termination of a large number of compounds with likely acceptable safety profiles. In other words, there are significant limitations for early data points originating from in vitro assays or filters to predict in vivo toxicity. The bile salt export pump (BSEP) membrane vesicle assay is a good illustration of the limitations of in vitro data interpreted in isolation. BSEP is a member of the ATP-binding cassette (ABC) family of transporters and is involved in the secretion of bile acids into the bile.20 Inhibition of BSEP has been suggested as a cause of cholestasis and hepatotoxicity (especially idiosyncratic) for several drugs, such as nefazodone and troglitazone.21 However, multiple widely prescribed drugs are also BSEP inhibitors and do not cause cholestasis clinically.
For instance, the thiazolidinedione or glitazone class of compounds developed to treat type 2 diabetes mellitus are all BSEP inhibitors; yet, troglitazone is associated with idiosyncratic hepatotoxicity, while rosiglitazone has lower hepatotoxicity liabilities, and pioglitazone is not considered hepatotoxic.22 Clearly, when interpreting these data, the IC50 values for the inhibition of BSEP activity need to be interpreted in the context of exposure data (e.g., projected liver concentrations of the compounds achieved at the therapeutic doses), as well as effects on the activity of other transporters involved in bile acid homeostasis (i.e., the multidrug resistance-associated proteins or MRPs).21,23 The last aspect is particularly important since in addition to being a cell-free assay system and therefore lacking important cellular contents, the BSEP membrane vesicle assay

for eliminating or limiting the toxic skin rash observed with other anti-EGFR compounds. The transient receptor potential vanilloid-1 (TRPV1) channel is expressed at high levels in sensory ganglia, is activated by multiple noxious stimuli, and has been an attractive target for pain management. However, selective TRPV1 antagonists have been associated with increased core body temperature in preclinical models and humans, while the natural TRPV1 agonist capsaicin causes dose-dependent hypothermia in animals.16 Those data complemented by additional experimental evidence suggest that modulation of TRPV1 would invariably affect core body temperature in vivo. Given the potential antinociceptive benefit of TRPV1 antagonists, additional efforts have been made to better understand this on-target effect. These efforts have led to the identification of modality-specific TRPV1 antagonists, which, in contrast to first-generation antagonists that inhibit all modes of TRPV1 activation, do not elevate core body temperature in preclinical models and are acid-sparing (i.e., only partially blocking TRPV1 activation following acid treatment).17,18 The reason why acid-sparing TRPV1 antagonists are not associated with elevation in core body temperature is mostly unknown. The identification of these second-generation antagonists required adjustment to the testing plans in order to front-load evaluation of the effects on core body temperature using appropriate in vivo models. In summary, leveraging the constantly growing amount of target-specific information as well as in silico biology tools to generate TSAs has multiple benefits to a discovery-focused nonclinical safety organization. First, these TSAs allow for optimal prioritization of experiments such that data-driven decisions on targets can be made more rapidly. Second, this exercise results in more complete, biology-driven data packages that ultimately should result in more comprehensive risk assessments of molecules in development. Third, by being proactive and thinking of potential safety issues earlier, scientists from different functions can brainstorm to develop creative and innovative solutions to circumvent these issues. Fourth, as serendipity is still a key aspect of drug discovery, discrepancies between predictions and outcomes may sometimes result in the identification of off-target effects with either safety concerns or therapeutic benefits. Finally, a thorough biological understanding of targets and pathways is an important step to ensure the safety of subjects in clinical trials. However, it is important to acknowledge the challenge associated with generating meaningful metrics to understand the impact of target knowledge activities, especially on safety-related drug attrition and/or clinically concerning adverse effects. Numerous variables can affect safety-related attrition in Discovery and Development, including the nature of therapeutic indications and targets, concurrent changes in practices, as well as the overall proportion of small molecules versus large molecules in R&D portfolios.
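As a concrete illustration of the TSA dashboard concept described above, the sketch below models one dashboard entry as a simple record; the field names and the example content are illustrative assumptions, not the actual schema used at AbbVie or Pfizer.

```python
# Minimal sketch of a target safety assessment (TSA) dashboard entry.
# Field names and example values are illustrative assumptions only; they are not
# the actual schema or content used at AbbVie or Pfizer.
from dataclasses import dataclass, field
from typing import List

@dataclass
class TSALiability:
    liability: str                  # potential adverse effect linked to target modulation
    probability: str                # qualitative likelihood: "low", "mid", or "high"
    development_impact: str         # qualitative impact on development: "low", "mid", or "high"
    derisking_activities: List[str] = field(default_factory=list)  # recommended early experiments
    key_information: List[str] = field(default_factory=list)       # links to the most critical information

# Hypothetical example entry, based on the ErbB2 cardiotoxicity discussion in this review.
erbb2_cardiotox = TSALiability(
    liability="Cardiac dysfunction via impaired NRG-1/ErbB2/ErbB4 survival signaling",
    probability="high",
    development_impact="high",
    derisking_activities=[
        "Profile tool compounds in a cardiomyocyte-based in vitro assay",
        "Front-load in vivo cardiovascular assessment",
    ],
    key_information=["links to key publications and internal study reports"],
)
print(erbb2_cardiotox.liability, "-", erbb2_cardiotox.probability, "probability")
```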

3. GENERAL COMMENTS ABOUT COMPOUND PROFILING AND EARLY DATA POINTS Compound profiling offers the opportunity to triage compounds according to their inherent toxic properties but also to optimize compounds for improved overall drug-like properties and lower probability of inducing toxicity. Compound profiling is widely used to improve the PK properties of compounds, and although this aspect is beyond the scope of this review, it is noteworthy that absorption, distribution, metabolism, and


TPSA, suggesting that nonpolar, lipophilic compounds may be associated with more adverse outcomes due to off-target effects.26 This observation that lipophilicity and low polarity may be important determinants of pharmacological promiscuity is consistent with that of others.27−29 Following this report, we conducted in 2009 a similar analysis at AbbVie to determine if these reported associations between toxicity and ClogP/TPSA were also reflected in our chemical space. Importantly, we tried to follow Pfizer's approach as much as possible by using exclusively compounds for which well curated data from short-term exploratory toxicology studies were available, by exploring a set of 95 structurally and pharmacologically diverse compounds, and by using a similar definition of what is classified as "toxic" versus "clean" (with the exception that we used a 5 μM Cmax threshold compared to the 10 μM Cmax threshold used by Pfizer scientists as the practical desirable Cmax to achieve). The overall toxicity rate was 49% (47/95). Similar to that at Pfizer, compounds with ClogP > 3 and TPSA < 75 Å2 were more likely to fail due to toxicity: approximately 60% of compounds violating the 3/75 rule were considered toxic in these exploratory toxicology studies. Further reports have provided similar evidence.30 However, it is noteworthy that other reports did not validate these findings, although the nature of the data used was slightly different.31 Furthermore, other reports have suggested the influence of other molecular descriptors, such as molecular complexity.32 The inconsistencies between all these studies point to some of their limitations, which include, depending on the study: (1) a low number of well curated end points; (2) the variable nature of the data used (e.g., data from short-term exploratory studies versus data from robust, longer studies conducted under GLP conditions); (3) differences in the end points used or in the definition of these end points (e.g., differences in what is considered "toxic") in the analysis; (4) an unavoidable bias due to the nature of the projects and chemical spaces of different companies; and (5) the overall weak, even if statistically significant, correlations reported. The latter point is especially noteworthy as it relates to the concept of correlation inflation.33 The majority of these analyses partition data into bins prior to analysis in an effort to simplify their complexity, but this practice of binning continuous data and of averaging quantities by bin typically exaggerates the strength of any trend identified.33 In addition, most retrospective analyses do not control well for potential covariates (i.e., other attributes that may influence the same parameters) and therefore may generate spurious relationships.30 Given the relatively modest statistical significance reported in the analyses, it is important to be cautious when interpreting these results but also to use the proposed rules appropriately. These rules, if valid, have the advantage of raising awareness of potential issues and parameters that may be associated with higher probability of success.

To address some of these pitfalls, four companies (AstraZeneca, Eli Lilly, GlaxoSmithKline, and Pfizer) partnered to evaluate a larger data set composed of 812 oral compounds at various stages of Development.5 While this data set can clearly contribute to a robust analysis, it nonetheless was limited to compounds nominated for development: in other words, compounds that had already satisfied several relatively stringent criteria for toxicity properties (such as more extensive in vivo testing). With this data set, ClogP and TPSA did not influence in vivo toxicological outcomes. Collectively, these various analyses suggest that simple physicochemical rules are, at best, weak and context-dependent indicators of in vivo toxicity risk rather than reliable predictors.
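As an illustration of how such a physicochemical alert can be applied in practice, the sketch below computes a calculated logP and TPSA with RDKit and applies the 3/75 thresholds discussed above. RDKit's Crippen logP is used here as a convenient stand-in for ClogP, and, consistent with the caveats above, the output should be read as a probabilistic alert rather than a pass/fail criterion.

```python
# Sketch: applying the "3/75" physicochemical alert (ClogP > 3 and TPSA < 75 A^2)
# discussed above. RDKit's Crippen logP is used as a convenient stand-in for ClogP;
# the flag is a probabilistic alert, not a go/no-go criterion.
from rdkit import Chem
from rdkit.Chem import Crippen, Descriptors

def three_75_alert(smiles: str, logp_cutoff: float = 3.0, tpsa_cutoff: float = 75.0) -> dict:
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles}")
    logp = Crippen.MolLogP(mol)   # calculated octanol-water partition coefficient
    tpsa = Descriptors.TPSA(mol)  # topological polar surface area (A^2)
    return {
        "logP": round(logp, 2),
        "TPSA": round(tpsa, 1),
        "3/75_alert": logp > logp_cutoff and tpsa < tpsa_cutoff,
    }

# Example: caffeine (polar, low logP) should not trigger the alert.
print(three_75_alert("Cn1cnc2c1c(=O)n(C)c(=O)n2C"))
```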

does not interrogate other transporters, such as MRP2. This example also identifies an important gap for toxicology profiling to be successful: it is currently extremely difficult to estimate or simulate in vivo tissue exposure and the kinetics of compounds in tissues. To identify potential hazards associated with molecules, three general categories of profiling assays can be used: profiling for specific physicochemical attributes and chemical reactivity, in vitro toxicity assays, and molecular pharmacology assays. It is beyond the scope of this review to discuss in detail the technical aspects of these various assays, and those will only be discussed when they can have an impact on data interpretation.
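The BSEP example above ultimately comes down to comparing an in vitro potency with a projected tissue exposure. A minimal sketch of that comparison is shown below; the numbers and the 10-fold margin threshold are hypothetical, since, as emphasized above, such values cannot be interpreted out of context.

```python
# Sketch: putting an in vitro BSEP IC50 into exposure context. All numbers and the
# 10-fold margin threshold are hypothetical illustrations; the review's point is
# precisely that such values cannot be interpreted in isolation.

def exposure_margin(ic50_um: float, projected_conc_um: float) -> float:
    """Ratio of the in vitro IC50 to the projected (e.g., hepatic) drug concentration."""
    return ic50_um / projected_conc_um

def interpret_bsep_signal(ic50_um: float, projected_liver_conc_um: float,
                          min_margin: float = 10.0) -> str:
    margin = exposure_margin(ic50_um, projected_liver_conc_um)
    if margin >= min_margin:
        return f"margin {margin:.0f}x: monitor; likely manageable in context"
    return f"margin {margin:.1f}x: flag for follow-up (e.g., other bile acid transporters, mechanistic work)"

# Hypothetical compound: BSEP IC50 = 30 uM, projected liver concentration = 12 uM.
print(interpret_bsep_signal(30.0, 12.0))
```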

4. PHYSICOCHEMICAL PROPERTIES, CHEMICAL REACTIVITY, AND NONCLINICAL SAFETY-RELATED ATTRITION In the past decade, there have been multiple extensive efforts to understand the impact of various physicochemical properties on probability of success in general and toxicity in particular. This has been partly stimulated by the observation that current drug candidates have in general increased molecular weight and higher lipophilicity compared to historical benchmark data, a phenomenon that some have creatively labeled as "molecular obesity".24 These changing attributes have generated challenges related to formulation and absorption but also toxicity.19 More than a decade ago, Lipinski and colleagues conducted a large assessment of the physicochemical properties of 2,245 drugs to understand the attributes that drive oral bioavailability of small molecules.25 Their assessment led to the well-known Lipinski "Rule of Five" that can be used to predict the potential of a compound to exhibit good oral absorption. The Rule of Five states that to be orally bioavailable, a small molecule should violate no more than one of the following criteria: having no more than 5 hydrogen bond donors and 10 hydrogen bond acceptors; having a molecular mass below 500 Da; and having a calculated octanol−water partition coefficient (ClogP) no greater than 5. In the analysis of short-term exploratory toxicology data reported by Pfizer scientists, compounds with ClogP < 3 and topological polar surface area (TPSA) > 75 Å2 were approximately 2.5 times more likely to be "clean" (i.e., not be associated with adverse outcomes at Cmax >10 μM) than "toxic". In contrast, when both risk factors were present (i.e., ClogP > 3 and TPSA < 75 Å2), compounds were approximately 2.5 times more likely to be "toxic" than "clean" in these short-term exploratory toxicology studies at Cmax <10 μM. Physicochemical descriptors have also been used to build in silico models for narrowly defined end points such as phospholipidosis: compounds with a pKa > 8.0 and ClogP > 1.0 tend to be associated with phospholipidosis.48 These models have been further refined by improving database curation49 and by incorporating additional features in the models, such as volume of distribution (Vd) to capture tissue accumulation,50 or by using QSAR models.51

5.3. Improving Toxicity Prediction and TSAs through Protein−Ligand Interaction Network Analysis. Prediction of potential off-target effects has traditionally been based on the shared identity (or sequence homology) between the target of interest and other proteins. In other words, under that principle, high homology indicates a close relationship between two proteins, resulting in an increased likelihood that a compound with high affinity for a target will also exhibit affinity for proteins with some degree of homology. This conventional approach leads to the establishment of selectivity screens against isoforms of the target or other proteins with high homology to the target. These selectivity screens often include members of the same family as the target of interest, especially if those are known to be associated with safety liabilities. This concept does have validity and utility, and is routine practice in the industry. For instance, in the realm of protein kinases, a general guideline is that any two kinases with over 60% sequence identity are likely to exhibit comparable affinities for the same chemical inhibitors.52 However, the converse cannot be generalized, as even very distantly related kinases (from a sequence perspective) can exhibit surprisingly high pharmacological similarity.53 These pharmacological relationships (defined as the ability to bind with similar affinities a range of chemical entities regardless of sequence similarity) can only be determined by testing sufficient compounds against an array of protein targets.
The recent explosion of public, commercial, and internal databases coupled with the availability of appropriate information technology infrastructures and analysis tools has offered a unique opportunity to gain even more information on pharmacological relationships and chemotype activity profiles. This growing knowledge when appropriately leveraged enables better prediction of probability of success for a target and a chemotype of interest but also of key potential liabilities that could and should be interrogated early for specific targets. For targets for which a sufficiently large number of compounds have been tested internally with some reasonable experimental consistency, AbbVie Cheminformatics scientists have developed tools to rapidly generate measures of

achievable task. A more realistic approach is to focus on very specific end points, such as interactions with a specified target protein (e.g., a drug transporter or DNA) or a well characterized, narrowly defined mechanism of toxicity (e.g., phospholipidosis). Another limitation of computational toxicology models is related to the quality and quantity of the data points used as training sets. To develop a performant model, a large volume of robust data points is required, and most available data sets do not provide the necessary level of data quality/quantity and of chemical diversity. For instance, data sets often contain data points obtained under different, frequently inconsistent experimental conditions. Likewise, models are often built upon an irrelevant chemical space, for instance using pesticides or natural toxins. Models can also be developed with congeneric series or a narrow chemical space, such that their applicability domain is limited to structurally related compounds.40 Such models are termed local or specific (in contrast to the more comprehensive global models) and can be quite useful when customized for use with chemical series associated with a specific, not pharmacologically driven toxic liability. However, the applicability domain of published models is typically not clearly defined and when it is defined, it is often not useful for drug discovery. Finally, published models are often not sufficiently validated using solid experimental evidence. However, in spite of their limitations, computational models offer obvious advantages in terms of cost and speed compared to “wet” experiments, which can be expensive, laborious, and time-consuming without even considering the need to synthesize compounds. As such, there is growing interest to develop these models for discovery toxicology, especially local, fit-for-purpose models. 5.2. Specific In Silico Assessments. With this context in mind, some computational approaches have definite value in pharmaceutical R&D. In particular, in silico methods are widely used in genetic toxicology for the identification of structural alerts, and their relatively good performance is related to the fact that they focus on a very well-defined mechanism of toxicity (i.e., mutagenicity) that results from direct DNA interaction and depends partly on chemical electrophilicity. The performance of these systems is still variable, although a lot of the major reactive chemical structures associated with genotoxicity are known and incorporated into commercial or proprietary in silico models such as MultiCASE or DEREK.41−43 In addition, some models allow companies to integrate internal data to enhance their performance. Hence, for regulatory submissions during impurity qualification process for drug substances and products, the use of a rule-based expert system complemented by either expert knowledge or a second QSAR model is recommended.43 Computational approaches are also quite attractive to predict interactions with drug transporters. 
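To illustrate the structural-alert approach to mutagenicity described above (before the discussion turns to transporters in more detail), the sketch below flags two classic DNA-reactive motifs using SMARTS substructure searches. The two patterns are simplified, textbook-style examples and are not the rule sets actually implemented in DEREK, MultiCASE, or any other expert system.

```python
# Sketch: simple SMARTS-based structural alerts for potential mutagenicity.
# The two patterns below are simplified, textbook-style examples; they are NOT
# the proprietary rule sets implemented in DEREK, MultiCASE, or any vendor tool.
from rdkit import Chem

ALERTS = {
    "aromatic nitro": "[c][N+](=O)[O-]",      # nitroaromatics, a classic Ames-positive motif
    "aromatic primary amine": "[c][NX3;H2]",  # anilines, often flagged pending metabolism data
}

def structural_alerts(smiles: str) -> list:
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles}")
    hits = []
    for name, smarts in ALERTS.items():
        if mol.HasSubstructMatch(Chem.MolFromSmarts(smarts)):
            hits.append(name)
    return hits

# Example: 4-nitroaniline carries both motifs; paracetamol carries neither alert as written.
print(structural_alerts("Nc1ccc(cc1)[N+](=O)[O-]"))  # ['aromatic nitro', 'aromatic primary amine']
print(structural_alerts("CC(=O)Nc1ccc(O)cc1"))        # []
```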
Although crystal structures of some transporters, such as P-glycoprotein (P-gp), are available to enable structure-based homology models, the majority of in silico methodologies to predict interactions with transporters are based on common structural fragments, recognition elements, or physicochemical descriptors.40 Several methods are available to predict substrates, inhibitors, or modulators of P-gp.44 Since the crystal structure of P-gp has been deciphered, molecular docking models should also become feasible in the future, although it is known that compounds interact with transporters at several binding nodes (a feature referred to as polyspecificity).44 Models also have been reported for other transporters, such as BSEP and the MRPs.40 However, these


PXR) that can then rapidly be interrogated using appropriate in vitro assays; (2) the identification of chemical series with higher risk based on polypharmacology; (3) the selection of the most appropriate in vitro assays to guide LO; and (4) the formulation of data-driven hypotheses to investigate mechanisms of toxicity.

pharmacological similarity (i.e., similarity in binding profile independent of protein sequence or a measure of the relatedness of targets based upon experimental profiling of potential ligands; Figure 4).53 Simply stated, this analysis allows

6. IN VITRO TOXICOLOGY In vitro toxicity assays are attractive since they have high throughput, require minimal quantities of compounds, and reduce animal testing (in keeping with the 3R principles). In vitro toxicity tests can be useful in series selection and LO by guiding chemistry toward a chemical space with reduced toxic liabilities. However, to impact discovery productivity, in vitro methods need to generate data of a quality level sufficient to increase the probability of making a good molecular design decision. That implies data that can be used to infer the effects of compounds in humans and/or animal species that will be used in toxicology studies.54 While this statement seems obvious, it is noteworthy that a majority of current tests used in some organizations (and developed under significant imperatives to make an impact with in vitro technologies) have debatable utility, such that one wonders whether establishment of in vitro toxicity assays became an end objective, rather than the means to a more relevant end objective (i.e., predicting toxicity and increasing probability of success). Nevertheless, despite some recognized limitations, some assays have a definite utility, such that they can confidently be used routinely during LO. For instance, in vitro genetic toxicity and safety pharmacology assays (e.g., assays for interaction with the hERG channel or molecular pharmacology profiling) are routinely used during LO and candidate selection in most companies using similar, yet not identical paradigms. In contrast, other in vitro toxicology assays are probably more useful when used on an as-needed basis to support SAR work around specific toxic end points. In addition, albeit desirable, it is not practical to set up screens for all possible toxic effects that may be observed in animals or humans. Hence, most companies tend to focus on the most common toxic events observed especially in humans, including hepatotoxicity, cardiovascular toxicity, and CNS toxicity.8 Finally, previous institutional experience heavily (and maybe inappropriately) influences testing paradigms, resulting in the integration of unique assays in some companies. Consequently, the way in vitro toxicity assays are used in the industry varies quite a lot across companies, reflecting the complexity of successfully integrating in vitro toxicology assays during pharmaceutical Discovery and also the fact that most organizations cannot develop accurate metrics to understand the positive or negative impact of any of those assays. Understanding and refining experimental conditions for in vitro assays and screens are of utmost importance. For example, years of in vitro testing using rat and human hepatocytes have advanced our understanding of the changes in expression and activity of CYP450 enzymes and transporters over time in culture and how these changes may affect experimental outcomes. However, most toxicology laboratories still culture hepatocytes in glucose media, even though a more physiological substrate would be lactate. Likewise, the bioenergetic profile of induced pluripotent stem (iPS) cell-derived cardiomyocytes changes depending on the substrate used: cells grown in low glucose, galactose, and fatty acids demonstrate increased aerobic metabolism in comparison to cells cultured in high glucose media.55 As discussed later, this

Figure 4. Pharmacology similarity measures. Using interaction network analysis, several measures of pharmacological similarity can be generated. Illustrated here are the activity data (given in pKi units) of small molecules against two unrelated targets. The pharmacology interaction strength (Pij) represents the fraction of compounds tested against both targets that are within 10-fold in potency. The Tanimoto Tij is the fraction of compounds that exhibit submicromolar potency against both enzymes. The Pearson Rij is derived from a linear fit of the data and measures the absolute correlation of potency values against the two targets.53 In this example, the pharmacology metrics indicate that compounds within the applicability domain are likely to interact with both targets.
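A minimal numerical sketch of the three measures defined in the Figure 4 legend is given below, assuming matched pKi values for compounds tested against both targets; the toy data are invented for illustration, and the Tanimoto term is implemented literally as described in the legend.

```python
# Sketch of the three pharmacological similarity measures defined in the Figure 4 legend,
# computed from matched potency vectors (pKi) for compounds tested against both targets.
# The toy data below are invented for illustration; they are not values from the review.
import numpy as np

def pharmacology_similarity(pki_a, pki_b) -> dict:
    pki_a = np.asarray(pki_a, dtype=float)
    pki_b = np.asarray(pki_b, dtype=float)
    # P_ij: fraction of compounds tested against both targets whose potencies are within
    # 10-fold of each other (pKi values differing by no more than 1 log unit).
    p_ij = float(np.mean(np.abs(pki_a - pki_b) <= 1.0))
    # T_ij, as described in the legend: fraction of compounds with submicromolar potency
    # (pKi > 6) against both targets. (A fuller Tanimoto form would normalize by the
    # number of compounds submicromolar against either target.)
    t_ij = float(np.mean((pki_a > 6.0) & (pki_b > 6.0)))
    # R_ij: Pearson correlation of the potency values (equivalent to a linear fit).
    r_ij = float(np.corrcoef(pki_a, pki_b)[0, 1])
    return {"P_ij": p_ij, "T_ij": t_ij, "R_ij": r_ij}

# Toy example: eight compounds profiled against two targets.
pki_target_1 = [5.2, 6.1, 7.4, 8.0, 6.8, 5.9, 7.1, 8.3]
pki_target_2 = [5.0, 6.5, 7.0, 7.6, 5.2, 6.4, 7.5, 7.9]
print({k: round(v, 2) for k, v in pharmacology_similarity(pki_target_1, pki_target_2).items()})
```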

for pharmacological grouping of targets that may have little or no structural similarity. These tools offer an unprecedented opportunity to rapidly generate a list of potential off-targets as long as enough end points are available. This approach has limitations similar to what we described for other computational models. First, there is no a priori guidance as to what value constitutes a significant pharmacological link, and experience with the approach is required to understand what values are significant and under which conditions. Second, the strength of the interaction predictions is dependent on the size and diversity of the data training set: for an ongoing project, predictions may strengthen or weaken as new data become available, hence the need to ensure continuous incorporation of these new data points into the model. Third, simply observing a correlation of potency values between two target proteins does not in fact establish that a true causative pharmacological relationship exists. In other words, it is important to view the results of this analysis as probabilities. Fourth, if the pharmacological link is based on a limited chemical space, the applicability domain is likely to be limited to that chemistry space. Conversely, if multiple chemical series have been tested, in addition to strengthening the similarity measures, this information may identify chemical series with higher risk that may be either deprioritized or interrogated earlier. While it is premature to generalize on the utility of protein−ligand network analysis, one can envision several value-added applications, including (1) the identification of off-target effects with known safety or drug−drug interaction (DDI) concerns (i.e., hitting toxicity hubs, such as hERG, PPARγ, 5-HT2b, or


has some functional consequences as illustrated by the effect of glucose concentrations in culture media on the detection of mitochondrial toxicants: cells grown in high glucose concentrations are resistant to mitochondrial toxicity, while cells grown in the presence of galactose die upon the insult.56 As most cells used in in vitro toxicology screens are of tumorigenic origin, care must be taken to understand how metabolism is affected and what the optimal experimental conditions are. Large scale experiments to predict organ toxicity (e.g., liver, kidney, or heart) have clearly demonstrated that a cell line representative of the organ or tissue is not sufficient to predict organ toxicity, especially using general cell health parameters, such as ATP depletion. There are obvious reasons for that. First, tissues are usually composed of more than one cell type. Second, cell lines cultured in monolayers cannot recapitulate the complexity of a tissue and the large number of biologically important cell−cell and cell−extracellular matrix (ECM) interactions. Even cocultures of different cell types cannot reproduce the tissue architecture required for some cell responses. Third, tissue-specific end points may be required for predicting a tissue-specific response. For instance, multiparametric assessment by high content screening (to simultaneously monitor, for example, mitochondrial membrane potential, DNA damage, endoplasmic reticulum (ER) stress, and lipid accumulation) would not detect hepatotoxicants acting through BSEP inhibition or proarrhythmogenic agents acting through cardiac ion channels. Finally, it is likely that most compounds cause toxicity in vitro by affecting common functions and structures of the cells (i.e., what is known as the basal cytotoxicity concept) and that, in contrast to the in vivo situation, tissue-specific exposure cannot be recapitulated in vitro.57

6.1. In Vitro Genetic Toxicology. Genetic toxicology testing is designed to identify compounds that may interact with and change the native structure of DNA, either through gene mutations (i.e., mutagenicity) or through chromosomal damage (i.e., clastogenicity). Compound−DNA interactions are considered a good surrogate of carcinogenicity potential and are therefore testing requirements for compounds to progress in clinical trials. No single genetic toxicology test predicts all types of genetic damage. To support clinical trials, a battery of tests conducted to GLP standards is performed concurrently with other IND-enabling GLP toxicology studies. Typically, this battery consists of a bacterial mutation test (the Ames test) and an in vitro cytogenetic test for chromosome damage in mammalian cells (e.g., human lymphocytes) complemented by an in vivo cytogenetic test conducted in rodents (typically a bone marrow micronucleus test in rats). These assays are usually conducted at the same time as the other IND-enabling GLP studies, although in vivo genotoxicity studies are not required to support phase 1 clinical studies. Except for oncology indications, a positive (genotoxic) result in one or more of these GLP tests would significantly impact further development. Hence, early identification of positive compounds and enabling SAR analysis on problematic series is extremely important. Genotoxicity screening paradigms typically involve miniaturized genetic toxicology tests that are conducted during LO or earlier and that have high throughput, low cost, and minimal compound requirements.
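Returning briefly to the media effect described at the start of this passage: the glucose-versus-galactose comparison is commonly summarized as a simple IC50 ratio, as sketched below with hypothetical potency values and an assumed 3-fold flagging threshold.

```python
# Sketch: summarizing a glucose-vs-galactose cytotoxicity comparison as an IC50 ratio,
# the usual readout for the media/substrate effect described above. The example values
# and the 3-fold flagging threshold are hypothetical.

def glu_gal_ratio(ic50_glucose_um: float, ic50_galactose_um: float) -> float:
    """IC50 in high-glucose medium divided by IC50 in galactose medium."""
    return ic50_glucose_um / ic50_galactose_um

def mito_liability_flag(ic50_glucose_um: float, ic50_galactose_um: float,
                        ratio_cutoff: float = 3.0) -> bool:
    """Flag compounds that are markedly more cytotoxic when cells must rely on oxidative
    phosphorylation (galactose) than when they can rely on glycolysis (high glucose)."""
    return glu_gal_ratio(ic50_glucose_um, ic50_galactose_um) >= ratio_cutoff

# Hypothetical compound: IC50 = 120 uM in glucose medium but 8 uM in galactose medium.
print(mito_liability_flag(120.0, 8.0))  # True -> candidate mitochondrial toxicant
```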
The miniaturized versions of the Ames test (e.g., the micro-Ames and mini-Ames assays) have very good concordance with the regulatory GLP Ames assay and are consequently very reliable for decision-making.58,59

This is based on the fact that compounds positive in the Ames assay will face major, likely unsurmountable development hurdles for most therapeutic indications. However, it is important to recognize that the Ames assay has also some shortcomings. First, it is an assay based on bacteria and not mammalian cells. Second, the assay is used as a hazard identification test and does not estimate the relative risk for mutagenic events under physiological conditions that would occur in whole animals. Third, an exogenous metabolic activation system (most often an induced rodent microsome S9 fraction) is used to generate and assess metabolites, which may not be present under in vivo conditions (a consideration relevant to all in vitro test results). Therefore, more relevant in vivo approaches to evaluate mutagenicity would be useful to improve risk assessment. Several screening assays for clastogenicity are available. For example, the mouse lymphoma assay evaluates forward mutations in the thymidine kinase locus of mouse lymphoma cells, allowing these cells to form colonies. The sizes of the colonies vary depending on whether the changes are caused by mutation events or chromosome damage. The in vitro chromosome aberration test uses various types of cells arrested in metaphase to identify by microscopy chromosomal breaks or rearrangements. This test is labor intensive and technically challenging. Finally, the in vitro micronucleus assay is a simple option that utilizes various cell types to detect the presence of micronuclei (i.e., a small nucleus formed from a chromosome or a chromosome fragment).60 While easy to perform, traditional in vitro micronucleus assay protocols still require extended hands-on technical time, limiting their throughput. Adaptation to an automated image analysis or flow cytometry platform can significantly increase throughput and the numbers of cells evaluated per compound/dose combination, resulting in a more objective evaluation and decreased technician time.60−62 Other in vitro genotoxicity assays are available as early screens, such as the GreenScreen high-content screening assay, which has been shown to have robust performance characteristics.63−65 Our experience at AbbVie has been that the Ames test (in a format adapted to the Discovery stage) complemented with another assay in mammalian cells is sufficient to circumvent the vast majority of genotoxicity signals at later stages of R&D. In vitro tests for clastogenicity have a high rate of false positive results. Thus, in contrast to Ames results, a positive signal in an in vitro clastogenicity test may not justify termination, and mechanistic studies are occasionally warranted. Compounds that cause DNA damage in these assays may induce whole chromosome loss (aneugenicity) or chromosome breakage (clastogenicity). This distinction is important because compounds acting as aneugens cause damage by indirect mechanisms (deregulation of chromosome segregation), and the damage may be absent below a certain threshold and, thus, may have no carcinogenicity liability at therapeutic exposures. 
To identify aneugens, a kinetochore assay can be used as a second-tier assay to interrogate in vitro micronucleus assay positive compounds for projects where the target is suspected to induce aneuploidy.66 In addition, flow cytometry-based assays have shown the potential to distinguish aneugens from clastogens in a higher throughput, first-tier in vitro micronucleus assay.62,67 It is also important to underscore that an in vitro clastogenic signal may not translate to an in vivo positive signal. Therefore, for these compounds, it may be useful to rapidly interrogate in an early in vivo rat repeat-dose 482


toxicology study whether an in vivo correlate to the in vitro signal exists. This can be done by evaluating for the presence of micronuclei in bone marrow or peripheral blood.60 For these compounds, a negative in vivo result could support further advancement at a lower risk with a recommended early evaluation in the battery of GLP genetic toxicology tests supplemented by an additional in vivo genetic toxicology end point if the GLP in vitro cytogenetic test is positive, as indicated in ICH S2 (see: ICH harmonized tripartite guideline guidance on genotoxicity testing and data interpretation for pharmaceuticals intended for human use). 6.2. Molecular Pharmacology Profiling. The vast majority of pharmaceutical discovery organizations routinely screen compounds against a panel of safety relevant proteins (e.g., enzymes, receptors, and ion channels) with the objective of identifying activity against these nontherapeutic targets (also called antitargets) that are associated with undesirable adverse effects. These screens are conducted internally or in contract laboratories depending on the companies. Often, at least at the candidate selection stage, robust large panels composed of 50− 80 antitargets are used. These screens have traditionally been radioligand displacement assays, which do not distinguish activators from inhibitors and which are usually conducted using a single concentration of the test article (10 μM appears to be most widely used). Consequently, hits at a prespecified threshold (e.g., > 50% displacement at 10 μM) are commonly followed up with a functional assay in order to understand the activity better (inhibitor versus activator) and also to generate an IC50 or EC50 for better risk assessment. In addition to identifying specific antitargets, these assays, as discussed before, offer an opportunity to evaluate the pharmacological promiscuity of compounds. It should be stressed that there are no widely accepted criteria for what constitutes pharmacological promiscuity in these assays. Several criteria have been proposed in the literature. For example, Peters et al. in a recent analysis of the molecular properties associated with promiscuity used a hit rate >5% in the Bioprint panel, which is composed of 131 safety relevant antitargets.68 However, since there is no industry harmonization of panel compositions, such that each company uses a different set and a different number of antitargets, it is not possible to derive criteria, and these criteria will therefore be company-specific. US FDA scientists, who review submissions from a wide variety of companies, reported that indeed the panels of targets used by sponsors vary widely and that often the rationale for the selection of the targets is unclear.69 This lack of obvious rationale likely reflects the fact that most companies use a standardized panel across all therapeutic indications without necessarily tailoring it according to the therapeutic target being pursued. Finally, polypharmacology may be an important part of driving efficacy. On average, a drug is estimated to modulate six target proteins, and for some indications, some level of promiscuity may be required to achieve efficacy.70 Large, radioligand binding assays are costly and of relative low throughput, such that they are not really amenable for early testing. 
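The screening logic described above (single-concentration displacement, a hit threshold, follow-up of hits, and a panel-wide promiscuity index) can be summarized in a few lines of code; in the sketch below, the 50% displacement and 5% hit-rate cutoffs come from the text, while the panel composition and data are invented for illustration.

```python
# Sketch of the molecular pharmacology profiling logic described above: single-concentration
# radioligand displacement, a hit cutoff, and a panel-wide promiscuity index. The 50%
# displacement and 5% hit-rate cutoffs come from the text; the example data are invented.
from typing import Dict, List

def panel_hits(displacement_at_10uM: Dict[str, float], hit_cutoff: float = 50.0) -> List[str]:
    """Antitargets showing >= hit_cutoff % displacement at 10 uM; these would be followed up
    with functional assays to distinguish activators from inhibitors and generate IC50/EC50 values."""
    return [target for target, pct in displacement_at_10uM.items() if pct >= hit_cutoff]

def promiscuity_index(displacement_at_10uM: Dict[str, float], hit_cutoff: float = 50.0) -> float:
    """Fraction of the panel hit at the cutoff (compare with the >5% hit-rate criterion cited above)."""
    return len(panel_hits(displacement_at_10uM, hit_cutoff)) / len(displacement_at_10uM)

# Invented single-concentration results for a small illustrative panel.
panel = {"5-HT2B": 82.0, "hERG (binding)": 35.0, "PPARgamma": 12.0,
         "adenosine A1": 64.0, "muscarinic M1": 9.0, "dopamine D2": 48.0}
print(f"hits: {panel_hits(panel)}, promiscuity index: {promiscuity_index(panel):.2f}")
# hits: ['5-HT2B', 'adenosine A1'], promiscuity index: 0.33
```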
For that reason, a recent trend in the industry has been to complement these development candidate-stage evaluations with smaller sets of representative assays that can be used earlier in the Discovery process, both for the assessment of pharmacological promiscuity and for the detection of interactions with a few specific antitargets that have clear development-limiting implications.68,71 For instance, agonists of the 5-hydroxytryptamine 2B (5-HT2B) receptor are clearly undesirable due to their known cardiovascular liability (i.e., valvulopathy).72 Likewise, activators of the peroxisome proliferator-activated receptor gamma (PPARγ) are also typically avoided due to the safety liability associated with this pharmacological mechanism (i.e., cardiac toxicity due to plasma volume overload).73 At AbbVie, we have implemented a similar approach that we call Bioprofiling, which evaluates 21 antitargets selected according to their known link to well-characterized in vivo adverse events. The main objective of the Bioprofiling approach is to flag compound liabilities as early as post-high-throughput screening (HTS), to prioritize the best candidate clusters for hit-to-lead (HTL), to monitor through "spot checking" the profile of selected leads during LO, and, if applicable, to support SAR studies. In a study comparing the rationale, strategies, and methodologies for pharmacological profiling from four large pharmaceutical companies (AstraZeneca, GlaxoSmithKline, Novartis, and Pfizer), variations in panel sizes, tactics, and technologies were reported, but substantial overlap in the nature of the targets was also present.74 This analysis led to the recommendation of a 44-target panel for early assessment of potential hazard.74 There is a recent move in some R&D organizations toward focusing on smaller screening panels composed of functional assays rather than radioligand-based assays, even for advanced compounds. The goal is to focus exclusively on antitargets with good predictive value for adverse drug reactions, rather than to evaluate complete selectivity, which is arguably never completely feasible even with large panels.68,71,74 The rationale is based on the fact that many hits in radioligand-based assays correspond to inhibitors and frequently have no safety relevance. However, these hits need to be followed up with second-tier functional assays, which adds time and cost. For instance, across the industry, hits for the 5-HT2B receptor are frequent, and most of these compounds are inhibitors (unpublished personal observation).68,71 For example, in the Novartis assay panel, 5-HT2B was reported to be the most promiscuous receptor.71 Likewise, compounds with true safety issues due to the activation of an antitarget may not be easily detected in these radioligand displacement assays, which are notoriously insensitive when the radioligand is an inhibitor.68

6.3. Kinome Profiling. Because of the importance of protein kinases as therapeutic targets but also as key regulators of essential cellular functions, it has become routine practice in drug discovery to screen small molecules for their kinase inhibition profiles for both safety and efficacy reasons. These screens use various panels of kinases representative of the complete kinome, which is estimated to contain 518 protein kinases.75 Kinome profiling is typically conducted internally in larger R&D organizations, but commercial profiling services exist as well.
These high- to medium-throughput assays evaluate inhibition of enzyme activity (i.e., the capability of compounds to decrease the phosphorylation activity of kinases), inhibitor binding, or cellular activity.76 It is noteworthy that a retrospective evaluation of four large collections of kinase profiles demonstrated that the concordance between the assay panels from the different sources was relatively modest, indicating that assay conditions substantially influence results, especially for compounds with activity.30
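One common way to condense such panel data into a single ranking number is a selectivity score of the S(cutoff) type, i.e., the fraction of panel kinases inhibited beyond a chosen cutoff at a fixed test concentration. The sketch below is only a schematic of that idea; it is not the metric used by any particular platform cited here, and the panel, values, and cutoffs are hypothetical.

```python
# Minimal sketch of a panel-level selectivity summary for a kinase inhibitor.
# S(cutoff) here = fraction of panel kinases inhibited at or above a
# percent-inhibition cutoff at a single test concentration.
# The panel, values, and cutoffs are hypothetical.

def selectivity_score(percent_inhibition, cutoff=80.0):
    """Fraction of the panel inhibited >= cutoff; lower values = narrower profile."""
    hits = sum(1 for value in percent_inhibition.values() if value >= cutoff)
    return hits / len(percent_inhibition)

panel = {
    "ABL1": 95.0, "AURKA": 12.0, "CDK2": 8.0, "EGFR": 91.0, "VEGFR2": 88.0,
    "JAK2": 15.0, "MAPK14": 5.0, "SRC": 72.0, "BTK": 10.0, "FLT3": 30.0,
}

print("S(80) =", selectivity_score(panel))         # 0.3 for this toy profile
print("S(50) =", selectivity_score(panel, 50.0))   # looser cutoff counts more kinases
```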

The composition and size of the panels are variable across companies and reflect to some extent institutional history (e.g., past kinase projects, experience with specific kinases associated with undesirable outcomes, and literature reports linking a kinase to a specific pathology), such that there are no simple criteria for what is an acceptable kinome profile from a safety perspective. The general assumption is that the more selective the compound, the better. Hence, the objective is usually to find compounds with the cleanest profiles against unintended kinases in these panels. Whether this assumption is always correct is often unknown, but without additional data to prove the contrary, this is a reasonable way to proceed. Developing small molecule kinase inhibitors has been a very active area in medicinal chemistry in the last two decades. The majority of small molecule inhibitors are ATP-mimetic compounds that target the ATP binding site of the enzymes (those compounds are referred to as Type 1 and 2 kinase inhibitors). This site is located in a hydrophobic cleft between the two lobes of the kinase domain and is highly conserved. This structural conservation makes finding compounds with good selectivity for a specific kinase a challenge. Substantial progress in synthesizing kinase inhibitors with good selectivity profiles has been made, and the regulatory approval of several small molecule kinase inhibitors for various indications (dominated by oncology but also including immunology indications) is a testimony to these remarkable advances. However, one can argue that the vast majority of small molecule kinase inhibitors are not truly selective and that some degree of promiscuity exists, which may not necessarily translate into a safety issue. Because complex kinase−inhibitor relationships occur that cannot be inferred from kinase sequence similarity, the kinase selectivity of the majority of exploratory compounds against the entire kinome is generally not known and cannot be reliably predicted using a small subset of the kinome.77,78 Another complicating factor is that little is still understood about the polypharmacology of kinase inhibition, especially for safety concerns. Several on-target effects against specific kinases are well described. For example, inhibition of the vascular endothelial growth factor receptor (VEGFR), especially VEGFR2, is known to be associated with hypertension, hemorrhage, thrombosis, and proteinuria.79,80 However, it is usually unknown whether additional kinase interactions may worsen or ameliorate these on-target effects, given our current limited understanding of cellular signaling networks in various tissues. Furthermore, interactions with other less understood kinases may lead to effects that are not yet well understood. For example, cardiac toxicity (defined as drug-related deterioration in cardiac function and/or development of congestive heart failure) has been reported with kinase inhibitors. Although this is not surprising given that kinases play a critical role in cardiovascular energy and calcium homeostasis, it is not fully understood which combinations of kinase interactions really drive this side effect.81 Ideally, one would want to define the kinase inhibition profiles that are associated with specific toxicities. An example of this approach is the model published by Olaharski et al. to predict whether a compound will test positive for clastogenicity in the in vitro micronucleus assay based on the results of kinome profiling.82 Such safety-predictive kinome "signatures" could then be used as more reliable criteria for decisions on compounds but also possibly for SAR studies. As private and public data-rich databases linking phenotypic end points (i.e., undesirable effects) to large kinase interaction maps for a large number of diverse molecules become more robust and abundant, this approach may become feasible and may contribute to the knowledge base necessary for developing these safety alerts. Without this fundamental biological understanding generated through data mining, the composition of kinome profiling panels and the decisions made from kinome profiles will remain mostly empirical. However, given the lack of concordance in activity results and related quantities, such as pharmacological similarities and target promiscuity, this approach will have definite limitations and will have to go through an extensive validation exercise in each institution using internal data.30 Furthermore, these inconsistencies in results across platforms indicate that caution should be used at the current time when building or using in silico and chemoinformatics tools based on data sets from different sources.

6.4. In Vitro Cardiovascular Profiling. While compound profiling for cardiovascular effects belongs to the large category of molecular pharmacology profiling, it is useful to consider it separately because of its importance in the LO process but also because it is often conducted separately from other molecular profiling assays. In addition, rapid changes have been and will be experienced in terms of the evolution of technologies and testing paradigms in this area. A decade or so ago, the hERG (human Ether-à-go-go-related gene) channel (a repolarizing potassium channel) was the center, and often the only component, of in vitro cardiovascular safety testing conducted in Discovery, which included a radioligand binding assay (e.g., dofetilide binding assay) sometimes complemented by the evaluation of action potential duration (APD) using canine Purkinje fibers.83,84 These screens were then followed by a more thorough evaluation of hERG block to generate an accurate IC50 using manual in vitro electrophysiology patch clamp assays. The rationale for this approach was clear and sound: hERG block by drugs is known to produce delayed repolarization and prolongation of the QT interval. Since QT prolongation is used as a surrogate marker of torsade de pointes (TdP), a potentially lethal cardiac arrhythmia, hERG blockade was considered an unacceptable attribute for exploratory compounds. However, as knowledge has evolved, there are reasons to challenge this simplistic approach. First, hERG is a very promiscuous antitarget, meaning that a large proportion of compounds will be positive in a binding assay. Although theoretical safety margins are used (for example, the ratio between the IC50 for hERG and the EC50 for the therapeutic target being pursued), many potentially safe compounds may be mistakenly deprioritized at that stage. Second, QT prolongation is a very sensitive end point but a weak indicator of risk for TdP because of poor specificity (i.e., several clinically safe drugs are associated with QT prolongation).84,85 Third, not all hERG blockers are associated with QT prolongation. This can be explained by simultaneous effects on other ion channels in the heart that modulate the effects of hERG current block. For example, verapamil is the prototypical "false positive" in the hERG assay: it is a potent hERG blocker that is not torsadogenic because it also blocks calcium channels, thereby offsetting its effects on hERG.86,87 As new equipment and methods for in vitro electrophysiology have become available, a shift in testing paradigm has occurred, with most companies adopting automated electrophysiology patch clamp platforms and/or fluorescent dye-based assays (e.g., thallium influx evaluation with the FluxOR technology) to move away from binding assays.88 In our experience, in vitro electrophysiology assays adapted to different platforms yield generally comparable yet not identical results. This has also enabled the parallel evaluation of the effects of compounds on several relevant cardiac ion channels (Nav1.5, a sodium channel, and Cav1.2, an L-type calcium channel) to better identify hERG blockers without torsadogenic potential.
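The margin logic mentioned above (ratioing ion channel IC50s against potency at the therapeutic target, extended beyond hERG to Nav1.5 and Cav1.2) reduces to a simple calculation. The sketch below uses hypothetical potencies and a hypothetical 30-fold flagging threshold purely for illustration; real cutoffs are program- and exposure-specific.

```python
# Illustrative fold-margin calculation over multiple cardiac ion channels.
# The idea of ratioing channel IC50s against potency at the therapeutic target
# follows the text; the potencies and the 30-fold flag threshold are hypothetical.

TARGET_EC50_UM = 0.05   # potency at the intended therapeutic target
MARGIN_FLAG = 30.0      # example threshold only; real cutoffs are program-specific

channel_ic50_uM = {"hERG": 0.9, "Nav1.5": 30.0, "Cav1.2": 12.0}

for channel, ic50 in channel_ic50_uM.items():
    margin = ic50 / TARGET_EC50_UM
    status = "acceptable" if margin >= MARGIN_FLAG else "flag for follow-up"
    print(f"{channel}: {margin:.0f}-fold over target EC50 -> {status}")
```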

This approach is referred to as MICE (multiple ion channel effects).89 A precompetitive effort is also currently evaluating a new paradigm to assess the proarrhythmic risk of small molecules, and results from this collaboration will likely significantly impact screening paradigms for proarrhythmic risk in the future. This collaboration is assessing the value of nonclinical in vitro human models based on the current understanding of the mechanism of TdP, with the ultimate goal of shifting the current general approach, which relies strongly on QT prolongation. The proposed paradigm is known as the comprehensive in vitro proarrhythmia assay (CiPA) and proposes to evaluate concurrently the functional effects of compounds on multiple cardiac ion channels, with the data interpreted with the help of in silico cellular simulations.90,91 The computer electrophysiological simulations will determine effects on the cardiac action potential and the potential for development of aberrant rhythms. In addition, the paradigm would take advantage of newly developed iPS cell-derived cardiomyocytes to serve as the basis for an integrated electrophysiological drug response.90 Other integrated models are available to assess the proarrhythmic potential of compounds, including the traditional Langendorff heart model, as well as the use of cell impedance assays or microelectrode arrays (MEA) to measure the electrophysiological activities of cardiac myocytes or iPS-derived cardiomyocytes.92 The basis of MEA is the presence of substrate-integrated extracellular electrodes embedded in an array, allowing for the detection of field potentials and reconstruction of the shape and time course of the underlying action potential.87 These more integrated approaches also reflect the full range of mechanisms involved in cardiac action potential regulation and limit the number of false positive compounds like verapamil in the hERG assay.87,93 iPS-derived human cardiomyocytes have been proposed as the ideal cell type to conduct these in vitro electrophysiologic evaluations for two main reasons. First, they are believed to be a more relevant model system than animal-derived cells. Second, they enable the integrated evaluation of the total effects of compounds on all ion channels, in contrast to evaluating effects on each ion channel separately in cell lines with heterologous expression of human ion channels. A large number of publications have reported results with iPS-derived human cardiomyocytes, but it is noteworthy that, depending on the origin of the cells and the methods of preparation, results vary because of phenotypic and functional differences, and responses to reference agents may also differ from those obtained in established in vitro and in vivo models.92,94,95 No comprehensive effort to fully characterize these various models has been made, such that it is still unclear whether their performance characteristics will be superior to those of existing in vitro methods. In addition, iPS-derived cardiomyocytes have some recognized limitations, which complicate their use in in vitro systems. In particular, current culture protocols generate mostly immature cells with fetal-like morphology, ion channel expression, and electrophysiological function, resulting in spontaneous beating and slow depolarization−repolarization speeds compared to those of primary cardiomyocytes.96,97

6.5. High-Throughput Cytotoxicity Assays and Early Identification of Problematic Chemical Matter. The use of nonmechanism-specific cytotoxicity assays has traditionally been unrewarding and sometimes counterproductive by diverting limited resources toward efforts with no evident utility to guide medicinal chemistry decisions. This is mostly because of their lack of well-defined relationships with specific molecular mechanisms of toxicity and because of the complexity of most toxic events in whole animals, which can rarely be fully recapitulated in simple in vitro models. Several reports have suggested that high-content multiparametric cytotoxicity screening approaches can be used to predict specific toxicity in vitro, especially hepatotoxicity,98−100 but these approaches are also resource-intensive, often organ-specific, and may not provide additional value compared to that of simpler, higher throughput, plate reader-based cytotoxicity assays, such as the CyQUANT assay.101 One major challenge facing the in vitro cytotoxicity approach is that expectations have probably been too high for what can be extracted from these types of data.

Using a retrospective analysis of findings from rat exploratory toxicity studies, Pfizer has demonstrated that safety toleration can be roughly approximated using simple cytotoxicity values.102,103 More specifically, this retrospective analysis used safety findings from rat exploratory toxicity studies (4−14 days) and in vitro cytotoxicity values using THLE cells (an SV40 large T-antigen-immortalized hepatocellular cell line) generated in a 96-well plate ATP-based viability assay (Vialight) for 72 compounds that spanned a broad range of therapeutic targets (protease, transporter, G-protein-coupled receptor, and kinase inhibitors, and cGMP modulators). A composite safety score was calculated for each compound dose based on findings in each of the following categories: systemic toleration (mortality, food consumption, and adverse clinical signs), clinical chemistry/hematology parameters (deviations from normal ranges), and multiorgan pathology (necrosis or incidence/severity of histopathological change). A higher score was considered "more toxic". Binning compounds into potent (LC50 < 10 μM) and nonpotent (LC50 > 100 μM) in vitro toxicants showed that the potent in vitro toxicants were associated with higher overall severity scores at lower in vivo exposures. Correlating overall safety toleration for individual compounds was further refined using in vivo exposure data. When the average plasma exposure (Cave) for a compound exceeded its LC50 (Cave/LC50 > 1), higher overall safety scores were calculated compared to lower exposure margins (Cave/LC50 < 0.01).102 In the same year, Greene et al. expanded on the previous studies by showing how cytotoxicity data (using THLE cells and ATP as an end point) could be used to better predict in vivo adverse outcomes: compounds with an LC50 < 50 μM were 5 times more likely to be associated with toxicity findings in an exploratory rat study at a Cmax < 50 μM.103 This approach adopted by Pfizer offers several advantages. First, it is not limited to the evaluation of hepatotoxicity, which is probably realistic given that most liver-derived cell lines, including THLE cells, have limited hepatocellular characteristics and are probably more representative of poorly differentiated cells. Second, it provides a probability correlation to in vivo toleration (irrespective of its mechanism) at plasma concentrations representative of exposures achieved in toxicology studies, thereby providing a safety margin aspect. This probability measure can be used for compound or series discrimination, prioritization, or selection at early stages. Third, it established a robust probability-based cutoff value that allows for rapid analysis and decision-making in a Discovery environment. Others have also demonstrated that cytotoxicity data can be useful surrogates of in vivo toxicology outcomes.
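A compact way to see the binning just described is to code the two decision inputs explicitly: an in vitro LC50 bin and an exposure margin such as Cave/LC50. The sketch below uses the bin boundaries quoted in the text (LC50 < 10 μM vs > 100 μM; Cave/LC50 > 1 vs < 0.01) with hypothetical compound values, and is only a schematic of the published analysis, not a reimplementation of it.

```python
# Schematic of the LC50 binning and exposure-margin logic described above.
# Bin boundaries follow the text; the compounds and their values are hypothetical.

def lc50_bin(lc50_uM):
    if lc50_uM < 10:
        return "potent in vitro toxicant"
    if lc50_uM > 100:
        return "nonpotent in vitro toxicant"
    return "intermediate"

def exposure_comment(cave_uM, lc50_uM):
    ratio = cave_uM / lc50_uM
    if ratio > 1:
        return f"Cave/LC50 = {ratio:.2f} (exposure exceeds LC50; higher severity scores expected)"
    if ratio < 0.01:
        return f"Cave/LC50 = {ratio:.4f} (large exposure margin)"
    return f"Cave/LC50 = {ratio:.2f} (intermediate exposure margin)"

compounds = {"CPD-A": (4.0, 6.5), "CPD-B": (250.0, 1.2)}  # (LC50, Cave) in uM

for name, (lc50, cave) in compounds.items():
    print(f"{name}: {lc50_bin(lc50)}; {exposure_comment(cave, lc50)}")
```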

In a comprehensive evaluation, the combination of cytotoxicity data using rat primary hepatocytes and volume of distribution (Vd) was a good surrogate of in vivo toxicity, which, as pointed out by the authors, is intuitive given that the Vd represents a measure of tissue distribution, while cytotoxicity addresses the intrinsic toxicity of compounds.30 Obviously, like any retrospective analysis using a binning approach, such findings can suffer from the phenomenon of correlation inflation.33 Therefore, in order to confirm that Pfizer's observations would translate to another chemical space, AbbVie scientists conducted a similar retrospective study a few years ago (data not published). The high-throughput cytotoxicity assay used at that time within the Discovery profiling group was different from the one adopted by Pfizer: it was based on HepG2 cells, a 384-well plate format, and the CyQUANT assay (a cell proliferation assay). However, these differences were not considered sufficient to modify the existing internal approach since, as previously mentioned, THLE cells are (like HepG2 cells) poorly differentiated toward a hepatocellular phenotype and since the use of another assay would be unlikely to change the in vivo−in vitro probability correlation beyond affecting the cutoff value to be used for decisions. These initial assumptions may not be completely correct since recent data have shown that different cell lines have different sensitivities to compounds depending on their ionization state, with HepG2 cells being more sensitive to basic compounds, whereas THLE cells have higher sensitivity for acidic and neutral compounds.104 These different sensitivity levels suggest a potential advantage of using multiple cell lines compared to a single cell line.104 To conduct this forward validation exercise, a set of 34 structurally and pharmacologically diverse compounds (all previously evaluated in 5-day rat exploratory studies at sufficient plasma exposure levels) was tested in the HepG2-based high-throughput cytotoxicity assay in a dose−response format to determine an IC20 (defined as the concentration producing 20% inhibition of cell growth). Oncology compounds were eliminated from the analysis because they were expected to have low IC20 values. Compounds with clear in vivo toxic effects (based on in-life, clinical pathology, and histopathology findings) at Cmax < 5 μM were classified as "toxic", while compounds with no in vivo toxic effects at Cmax > 5 μM were classified as "non-toxic". Figure 5 summarizes the results of this validation exercise.

Figure 5. High-throughput cytotoxicity assay: correlation with in vivo outcomes. The retrospective study at AbbVie showed that in the 72 h ATP assay with HepG2 cells, compounds with an IC20 < 30 μM are 2.5 times more likely to be associated with tolerability issues in exploratory rat toxicology studies at Cmax < 5 μM.

This retrospective evaluation confirmed Pfizer's observation that compounds with cytotoxicity at low concentrations are more likely to be toxic in in vivo tolerability studies at relevant plasma exposures. Because AbbVie was using a different cell type, a different assay format, and a different end point (IC20), the cutoff value was different, and an IC20 of 30 μM appeared to be an appropriate cutoff value for decision-making. Integrating these cytotoxicity data with results from physicochemical filters and pharmacological profiling, and interpreting them in the context of other available data (e.g., ADME and potency data against the target of interest), can be useful for post-HTS and HTL prioritization decisions. In addition, compounds can be evaluated with this approach during LO for regular "spot-checking" and for compound optimization. However, low cytotoxicity values should just be viewed as a probability measure of in vivo tolerability issues. Hence, for interesting series and compounds with cytotoxicity at low concentrations, early evaluation in rodent exploratory toxicology studies may be justified to interrogate the in vivo translatability of this in vitro signal. If in vivo data confirm the presence of toxicity issues, then cytotoxicity data may prove extremely useful to guide chemistry toward higher quality chemical matter associated with increased in vivo safety margins in parallel with improvements in potency and ADME characteristics.

6.6. Mitochondrial Toxicity Screens. Whereas the role of mitochondria in toxic injuries caused by chemicals and environmental components (e.g., insecticides and herbicides) has been studied in academic settings for nearly half a century, drug-induced mitochondrial toxicity was not investigated until much later. Pharmaceutical companies did not pay much attention to this issue until the withdrawals of troglitazone due to liver injury and cerivastatin due to rhabdomyolysis. Both compounds were postulated to have caused toxicity in humans at least in part through mitochondrial impairment.105,106 Today, more than 200 drugs that cause severe liver injury (resulting in market withdrawal, black box warning, or restricted usage) have been shown to impair mitochondrial function.107,108 Furthermore, evidence strongly suggests that mitochondrial toxicity plays a particularly important role in initiating idiosyncratic toxicity.109

Table 1. Examples of Drugs and Their Mitochondrial Targets(a)

mitochondrial target | examples of drugs | potential clinical manifestation
electron transport chain complexes I−V | tamoxifen, nefazodone, alpidem, troglitazone, propofol, sorafenib, bupivacaine, cerivastatin, NSAIDs, amiodarone, acetaminophen | organ toxicity/failure, lactate accumulation
uncouplers of oxidative phosphorylation | nimesulide, diclofenac, usnic acid | thermogenesis, organ toxicity/failure
fatty acid oxidation | valproate, amineptine, amiodarone, pirprofen, tamoxifen, diclofenac, ibuprofen | acylcarnitine accumulation, impaired ketogenesis, hypoglycemia
mtDNA replication and mitochondrial protein synthesis | linezolid, zalcitabine, abacavir, stavudine, didanosine, tacrine | organ toxicity/failure, lactate accumulation
induction of mitochondrial permeability transition pore | dimebon, nimesulide, alpidem, amiodarone, diclofenac, acetaminophen, troglitazone, valproic acid | organ toxicity/failure, lactate accumulation
mitochondrial potassium channel | glibenclamide, paxilline, nicorandil | organ toxicity/failure
sodium−hydrogen exchanger | cariporide | organ toxicity/failure
carnitine palmitoyltransferase | perhexiline, etomoxir | organ toxicity/failure

(a) It is noteworthy that many drugs have more than one mitochondrial target. For example, troglitazone is known to inhibit the mitochondrial electron transport chain but also induces the mitochondrial permeability transition pore. In addition, many drugs cause organ toxicity through multiple mechanisms. For example, nefazodone and troglitazone, which both caused fatal liver injury, have been shown not only to inhibit mitochondrial function but also to inhibit BSEP and to form reactive metabolites.

Mitochondria generate most of the energy used by cells in the form of ATP, which is required to execute all essential cellular functions. When mitochondrial function is impaired, cellular function is impaired, which can lead to organ failure and, in extreme circumstances, to death. Mitochondrial function can be inhibited in numerous ways. For many hepatotoxic drugs, direct inhibition of one or more of the electron transport chain proteins (e.g., cerivastatin, chloroquine, fenofibrate, and nefazodone) and inhibition of fatty acid oxidation (e.g., salicylic acid, ibuprofen, and pirprofen) have been demonstrated. Small molecules can also cause mitochondrial toxicity through uncoupling, a mechanism by which electron transport is separated from ATP synthesis: the drugs cause inner membrane damage that makes protons pass back and forth in a futile cycle, producing heat instead of ATP. In animals, this toxic mechanism translates into clinical signs of impaired respiration, decreased activity, drowsiness, and elevation of rectal body temperature.110 Such compounds are typically weak acids and lipophilic cationic drugs. Tolcapone and nimesulide are two well-known examples of drugs that exhibit mitochondrial toxicity through uncoupling.110,111 Other mechanisms of drug-induced mitochondrial injury, which have been demonstrated for several antiviral agents and antibiotics, include the inhibition of mitochondrial DNA (mtDNA) replication and of mitochondrial protein synthesis, respectively.112 The deleterious consequences of these effects are not acute in nature but manifest over time.113 Table 1 summarizes the effects of selected drugs on the various mitochondrial targets. At Pfizer, during multiple mitochondrial studies conducted over the past decade, we have noted that many compounds inhibit more than one mitochondrial electron transport chain complex or mitochondrial protein target. For example, troglitazone was shown to potently inhibit complex IV but also, to a lesser extent, complexes II/III and V.114 We observed similar trends for many other drugs, reflecting the promiscuity of many pharmaceutical agents. Our in-house analysis of more than 500 marketed drugs exhibiting mitochondrial toxicity revealed that the majority of these drugs also had unfavorable physicochemical properties, such as high lipophilicity (clogP > 3), which can be expected to result in tissue accumulation and, as noted before, in promiscuity.115 Since mitochondrial dysfunction can result in significant adverse functional consequences, it is desirable to select candidate compounds that do not exhibit this adverse property.

This requires screening assays which ideally can be run in HTS format and can be used to define SAR. One HTS assay which is now used by many pharmaceutical companies is the "glucose−galactose" assay. This assay measures the ATP content of cells cultured with test compounds for 24 h in media containing high concentrations of either glucose or galactose as an energy source.56,116 Cells cultured in media containing high glucose concentrations are resistant to ATP depletion caused by mitochondrial toxicants since they can compensate for the loss of mitochondrial function by glycolysis, whereas cells cultured in media containing galactose cannot do so. This leads to a marked (≥3-fold) shift in the potency for cellular ATP depletion caused by mitochondrial toxicants in glucose vs galactose media. However, this shift will only occur with compounds for which mitochondrial toxicity is the dominant mechanism of toxicity. For molecules that cause toxicity through multiple mechanisms (e.g., BSEP inhibition, reactive oxygen species (ROS) formation, and ER stress) in addition to the mitochondrial toxicity, no difference will be detected between cells cultured in glucose and galactose media.56 In fact, only about 2−5% of mitochondrial toxicants can be detected using the glucose−galactose assay.117 A more precise HTS assay for mitochondrial toxicity that can be deployed in early drug discovery is the respiratory screening technology (RST), which uses soluble sensor molecules to measure oxygen consumption in isolated mitochondria in real time.117 Oxygen consumption measurements can be considered a surrogate readout for mitochondrial bioenergetic function. By modulating the substrate which fuels the electron transport chain (e.g., by selecting glutamate/malate or succinate), insight into the site of mitochondrial impairment can also be obtained. It is also possible to distinguish between uncouplers (which increase oxygen consumption) and inhibitors of the electron transport chain or of fatty acid oxidation (both of which decrease oxygen consumption). The RST assay is useful for SAR investigations and has also been used to derive an in silico tool for the prediction of uncouplers.35 However, no SAR model has yet been built for inhibitors of mitochondrial function, even though more than 200 different drug classes are known to inhibit complex I alone. Mehta et al. have published a valuable review of drug classes and their mitochondrial targets.118
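The ≥3-fold glucose-to-galactose potency shift described above lends itself to a very small calculation. The sketch below assumes hypothetical ATP-depletion IC50s from the two media conditions and uses the 3-fold shift criterion quoted in the text as the flag.

```python
# Sketch of the glucose/galactose ATP-assay interpretation described above.
# A >= 3-fold potency shift toward galactose medium is used as the flag, per the
# text; the IC50 values below are hypothetical.

SHIFT_CUTOFF = 3.0

def mito_flag(ic50_glucose_uM, ic50_galactose_uM):
    shift = ic50_glucose_uM / ic50_galactose_uM
    if shift >= SHIFT_CUTOFF:
        return f"{shift:.1f}-fold shift: mitochondrial toxicity likely the dominant mechanism"
    return (f"{shift:.1f}-fold shift: no clear mitochondrial signal "
            "(or toxicity also driven by other mechanisms)")

print("CPD-X:", mito_flag(120.0, 15.0))  # clear left shift in galactose medium
print("CPD-Y:", mito_flag(40.0, 35.0))   # similar potency in both media
```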

induction), and mitochondrial function to multiparametric imaging approaches using high-content screens.100,134,135 A large number of in vitro assays have been proposed or described. Here, selected in vitro assays are discussed because of their potential impact on the probability of success. Hepatic transporters are critical for the uptake and excretion of xenobiotics. Transporters regulate homeostasis, play an important role in bile formation and drug disposition, and their functionality can be critical for pharmacology and toxicology studies.136 Because interactions with hepatic drug transporters have been linked to toxicity (particularly cholestasis) as well as DDIs, various companies have integrated the early evaluation of hepatic transporter−drug interactions, especially for BSEP and MRP2.135 The jury is still out on whether these early evaluations provide value in addition to the evaluations required for the clinic. As discussed earlier, there are numerous examples of safe drugs interacting with hepatic transporters, especially BSEP, suggesting that this approach will result in the termination of otherwise viable candidates. However, when the data generated with these assays are correctly interpreted in the context of other data, especially the estimated exposure of hepatocytes or plasma exposures in humans, one may argue that such data can be used to reliably predict a safety concern in humans. Dawson et al. reported that, in their assay, 40 of 85 drugs evaluated showed BSEP inhibition, and of these 40 drugs, 17 with an IC50 < 100 μM and a free Cmax > 0.002 μM caused DILI.137 Likewise, the same group proposed an integrated approach using data generated from five assays (cytotoxicity using THLE cells with and without CYP450 3A4 activity, cytotoxicity using HepG2 cells in glucose and galactose media, BSEP inhibition, and MRP2 inhibition) combined with an estimation of covalent binding to select development candidates with a reduced propensity to cause idiosyncratic adverse drug reactions in humans.126 Their retrospective assessment of this approach using 36 drugs showed good specificity and sensitivity to detect drugs with a high idiosyncratic toxicity concern.126 However, the generation of covalent binding data requires radiolabeled test agents, which are not typically available early during LO, and the interpretation of these data requires some understanding of the daily dose in humans, which is often not available during compound optimization. Hence, the practical aspect of this approach is debatable.

A decade ago, gene expression profiling of cultured cells, especially hepatocytes, was proposed as a holistic approach to predict toxicity. Numerous literature reports have been published showing the potential utility for various end points, including toxicologically relevant nuclear receptor interactions (e.g., aryl hydrocarbon receptor (AhR) agonists and PPARα agonists) or phospholipidosis.138−140 However, the applications of in vitro toxicogenomics to prospectively evaluate compounds have been mostly limited to in vitro genomic biomarker end points clearly linked to a single specific mechanism.54 This does not negate the utility of in vitro toxicogenomics to understand mechanisms of toxicity, to formulate hypotheses related to the mechanism of a toxic effect, or to characterize in vitro systems. However, the use of the technology has definitely been narrower in scope than originally hoped for.
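The kind of cutoff reported by Dawson et al. (BSEP IC50 below 100 μM combined with an appreciable free Cmax) can be expressed as a simple rule. The sketch below is a hypothetical rendering of that logic, not the authors' published algorithm, and the compound values are invented; in practice such a flag would be weighed against the other exposure and cytotoxicity data discussed above.

```python
# Hypothetical rendering of a BSEP-inhibition flag of the type reported by
# Dawson et al. (IC50 < 100 uM together with a free Cmax > 0.002 uM); this is
# not the published algorithm, and the compound values are invented.

def bsep_concern(bsep_ic50_uM, free_cmax_uM):
    return bsep_ic50_uM < 100.0 and free_cmax_uM > 0.002

compounds = {
    "CPD-1": {"bsep_ic50_uM": 35.0, "free_cmax_uM": 0.15},
    "CPD-2": {"bsep_ic50_uM": 300.0, "free_cmax_uM": 0.40},
}

for name, values in compounds.items():
    flagged = bsep_concern(values["bsep_ic50_uM"], values["free_cmax_uM"])
    label = "elevated DILI concern" if flagged else "lower concern"
    print(f"{name}: {label} (BSEP IC50 {values['bsep_ic50_uM']} uM, "
          f"free Cmax {values['free_cmax_uM']} uM)")
```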
Formation of reactive metabolites is well recognized as a risk factor for toxicity, especially idiosyncratic toxicity. Consequently, screens evaluating the formation of reactive metabolites have been integrated or experimented with in most pharmaceutical R&D organizations.141 While this was routine

When chemists are unable to identify chemical matter devoid of mitochondrial liabilities through SAR studies, more complex assays that measure mitochondrial function in cells can be deployed to better assess the risk of toxicity that may arise in vivo. For example, both mitochondrial and glycolytic activity can be measured simultaneously using soluble or solid sensor technology.111,117 In addition, cells can be permeabilized, which allows for the testing of different substrates and provides insight into mechanisms of inhibition of bioenergetics and inhibition of fatty acid oxidation.119,120 For many years, mitochondrial toxicity was studied in isolation as an initiating mechanism that could lead to liver injury. For a variety of drug classes, excellent correlations were observed between the severity of human drug-induced liver injury (DILI) observed in the clinic and the potency of mitochondrial toxicity. Examples include the thiazolidinediones,114 biguanides,121 nonsteroidal anti-inflammatory drugs (NSAIDs),111 and antidepressants.122 Not surprisingly, these retrospective evaluations also showed that in vitro data interpretation becomes more meaningful when systemic exposure is taken into consideration.111,123 For example, Porceddu et al., using >200 drugs, showed a correlation with human DILI of >80% for drugs with an in vitro IC50 value