Perspective pubs.acs.org/jmc

Recent Progress in Understanding Activity Cliffs and Their Utility in Medicinal Chemistry

Miniperspective

Dagmar Stumpfe, Ye Hu, Dilyana Dimova, and Jürgen Bajorath*

Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Dahlmannstrasse 2, D-53113 Bonn, Germany

ABSTRACT: The activity cliff concept is of high relevance for medicinal chemistry. Recent studies are discussed that have further refined our understanding of activity cliffs and suggested different ways of exploiting activity cliff information. These include alternative approaches to define and classify activity cliffs in two and three dimensions, data mining investigations to systematically detect all possible activity cliffs, the introduction of computational methods to predict activity cliffs, and studies designed to explore activity cliff progression in medicinal chemistry. The discussion of these studies is complemented with new findings revealing the frequency of activity cliff formation when different molecular representations are used and the distribution of activity cliffs across different targets. Taken together, the results have a number of implications for the practice of medicinal chemistry.



© 2013 American Chemical Society

INTRODUCTION

Activity cliffs have been defined as pairs of structurally similar compounds having a significant difference in potency.1−4 As such, activity cliffs are directly associated with structure−activity relationship (SAR) information. Cliffs represent centers of SAR discontinuity in activity landscapes of compound data sets and are focal points of SAR exploration.4,5 Activity cliffs, their representation, and associated SAR features have been increasingly studied over the years. In medicinal chemistry, activity cliffs are often encountered in hit-to-lead or lead optimization projects and are usually considered on a case-by-case basis for a given compound series. Although activity cliffs often identify important sites and chemical modifications, the presence of multiple cliffs in a series might also complicate compound optimization, especially at later stages when molecular properties other than potency must be optimized. In such cases, frequent activity cliffs are indicative of highly discontinuous or steep SARs, which often leave little room for multiproperty optimization. Thus, although activity cliffs are associated with SAR information, this information might not be exploitable in the context of a given optimization effort. In addition to case-by-case exploration in practical medicinal chemistry, activity cliffs can also be systematically investigated through mining of compound activity data. In fact, much of our current knowledge of activity cliffs has originated from such data mining efforts. Activity cliff research was the subject of a perspective article in the Journal of Medicinal Chemistry4 reporting on the state of the art in this field at the end of 2011. Since then, a number of studies have become available that further advance our understanding of activity cliffs and their utility in medicinal chemistry. Herein, we present an update on these recent studies and complement the discussion with new results revealing the global distribution of cliff-forming compounds and the distribution of activity cliffs across different targets. To set the stage for this discussion, we first revisit the general activity cliff definition in greater detail and comment on a number of aspects that should generally be considered in activity cliff analysis. Then recent studies and new findings are presented, with emphasis on those that are particularly relevant for medicinal chemistry applications.



ACTIVITY CLIFF DEFINITION REVISITED

Key Criteria. The general definition of activity cliffs stated above leaves two criteria open, i.e., the similarity criterion and the potency difference criterion. When compound data are mined for activity cliffs, these criteria must be clearly specified and consistently applied. Even if one attempts to monitor activity cliffs as a continuum, e.g., as pairs of similar compounds with increasing potency differences, threshold values must ultimately be applied to capture activity cliff populations and compare them across different data sets. In the absence of clearly specified similarity and potency difference criteria, systematic activity cliff analysis is questionable at best. It should be emphasized that the assessment of compound similarity is far from trivial. Medicinal and computational chemists often have very different views of molecular similarity, an issue that is further addressed below.

Received: July 24, 2013 Published: August 27, 2013
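To make the two open criteria concrete, the following minimal sketch flags activity cliffs using a fingerprint Tanimoto similarity threshold (similarity criterion) and a potency difference threshold (potency difference criterion). It assumes that fingerprints are given as sets of on-bit positions and potencies as pKi values, so that a difference of 2 corresponds to a 100-fold potency difference. The compound IDs, bit sets, potency values, and the 0.85 threshold are hypothetical choices for illustration only.

```python
from itertools import combinations


def tanimoto(fp_a: frozenset[int], fp_b: frozenset[int]) -> float:
    """Tanimoto coefficient of two binary fingerprints given as sets of on-bit positions."""
    if not fp_a and not fp_b:
        return 0.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def find_activity_cliffs(compounds, sim_threshold, min_delta_p=2.0):
    """Return (id_a, id_b, similarity, potency difference) for every activity cliff.

    compounds maps a compound ID to (fingerprint bit set, pKi). A pair qualifies if its
    similarity is at least sim_threshold and its pKi values differ by at least min_delta_p.
    """
    cliffs = []
    for (id_a, (fp_a, p_a)), (id_b, (fp_b, p_b)) in combinations(sorted(compounds.items()), 2):
        sim = tanimoto(fp_a, fp_b)
        delta = abs(p_a - p_b)
        if sim >= sim_threshold and delta >= min_delta_p:
            cliffs.append((id_a, id_b, round(sim, 3), round(delta, 2)))
    return cliffs


# Hypothetical toy data: made-up fingerprints and pKi values.
core_bits = frozenset(range(1, 11))
compounds = {
    "cmpd_1": (core_bits, 8.5),
    "cmpd_2": (core_bits | {11}, 6.1),   # similar to cmpd_1, about 250-fold weaker
    "cmpd_3": (core_bits | {12}, 8.0),   # similar to cmpd_1, nearly equipotent
    "cmpd_4": (frozenset({20, 21, 22, 23}), 5.0),  # structurally unrelated
}
print(find_activity_cliffs(compounds, sim_threshold=0.85))
```

Only the cmpd_1/cmpd_2 pair meets both criteria here; cmpd_3 passes the similarity criterion but not the potency difference criterion. Because the result depends entirely on the chosen fingerprint and threshold, rerunning the same function with a different representation can yield a different set of cliffs.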

dx.doi.org/10.1021/jm401120g | J. Med. Chem. 2014, 57, 18−28

Journal of Medicinal Chemistry

Perspective

Figure 1. MMP-cliffs. A substructure-based definition of activity cliffs is illustrated. In (A), an exemplary MMP is shown. The exchanged substructures are highlighted in blue and define a chemical transformation. (B) Transformation-size restrictions are illustrated. Green arrows indicate examples of largest permitted substitutions of a hydrogen atom (top) and a benzene ring (bottom), whereas red arrows indicate substitutions exceeding the size limits that are not permitted. In (C), two exemplary (transformation size-restricted) MMP-cliffs are shown. Exchanged fragments are highlighted, and compound potency (pKi) values are provided.

Representation Dependence. The most popular way to quantify molecular similarity for activity cliff analysis has thus far been the calculation of whole-molecule Tanimoto similarity using various fingerprint descriptors as molecular representations.4,5 These calculations are subject to a strong representation dependence (stronger than their dependence on the similarity metric).4−7 This is because different fingerprints yield different similarity values for the same compounds, and generally applicable similarity thresholds do not exist.8 Hence, activity cliffs observed using a given representation/descriptor might not be detected using others.4,5 Given these uncertainties, the term consensus activity cliff has been introduced for cliffs that are consistently formed when different molecular representations are used.6

Activity Measurements. Experimental factors also affect the assessment of activity cliffs. For example, if activity measurements are flawed, artificial cliffs might be assigned (or true cliffs might remain undetected). Even in the absence of significant inaccuracies, the type of activity measurement used influences activity cliff distributions. For example, the alternative use of (assay-dependent) IC50 measurements or (assay-independent) equilibrium constants (Ki values) often leads to significant differences in activity cliff populations.9 Measurement errors tend to be larger for IC50 data, and typically fewer activity cliffs are detected in compound data sets when Ki values are used.9 Because activity cliffs are usually studied to identify SAR determinants, high-confidence activity data are generally preferred: Ki and IC50 values should be considered separately, and approximate measurements (such as % inhibition) should not be used. Thus, activity data must be carefully selected from compound repositories, and it must be ensured that the data have been carefully curated. For compound structure and activity information, public domain databases such as ChEMBL,10 BindingDB,11 and PubChem12 are critically important sources. Note that structure and activity data retrieval from these and other public databases is not always straightforward and might require the implementation of in-house infrastructures for data organization and curation.

Single vs Coordinated Activity Cliffs. In its original formulation, the activity cliff concept focuses on compound pairs forming isolated cliffs; i.e., no structural neighbors of cliff-forming compounds are considered. It has previously been shown that activity cliffs are often not formed in an isolated manner but rather as coordinated cliffs involving groups of compounds.13 To account for the formation of such coordinated activity cliffs, the concept of activity ridges has been introduced.13 Activity ridges consist of a series of highly and lowly potent compounds forming multiple cliffs.13 Recently, such activity ridges have also been detected in high-dimensional activity space; i.e., they were formed by compounds with activity against multiple targets.14 These compounds were often active against overlapping yet distinct sets of targets. Thus, multitarget activity ridges were identified that consisted of compounds forming activity cliffs against overlapping sets of targets. Accordingly, these compound subsets represent centers of multitarget SAR discontinuity in high-dimensional activity space. Multitarget activity ridges have been automatically extracted from high-dimensional compound data, have been shown to involve targets from a variety of families, and contain substantial amounts of SAR information.14

Figure 2. Categorization of 2D-cliffs. Shown are exemplary activity cliffs described on the basis of molecular graphs, i.e., 2D-cliffs. Three different categories are given including chirality and topology cliffs (category 1), R-group cliffs (category 2), and scaffold and scaffold/topology cliffs (category 3). Distinguishing R-groups and structural differences between scaffolds are highlighted in red and blue, respectively. Compound potency (pKi) values are provided.

Figure 3. Categorization of 3D-cliffs. Shown are representative 3D-cliffs, defined on the basis of binding mode and spatial chemical feature similarity. Five different categories are given that organize cliffs according to 3D interactions distinguishing highly and lowly potent cliff partners. For each 3D-cliff, the superimposed/aligned compound binding modes and their calculated (property density function-based) 3D similarity values (black) are provided. In addition, binding site views of individual complexes are shown (with the lowly and highly potent ligand on the left and right, respectively). On the right, regions to which notable interaction differences map are highlighted using red dashed circles. Compound potency (pKi) values are reported (blue).

ADDRESSING THE REPRESENTATION CAVEAT

Calculated fingerprint similarity values not only make activity cliff formation representation-dependent; an additional disadvantage is that Tanimoto similarity is often difficult to interpret in chemical terms.4,5 For example, a pair of compounds reaching a predefined similarity threshold might not be considered similar by a medicinal chemist (or chemists might view calculated similarity rather differently). To depart from Tanimoto similarity and further improve the interpretability of activity cliffs, chemical similarity has been assessed in different ways, leading to new categorizations of activity cliffs. A substructure-based definition of activity cliffs was introduced on the basis of the matched molecular pair (MMP) concept.15,16 As illustrated in Figure 1A, an MMP is defined as a pair of compounds that differ only at a single site15 and that can be interconverted through the exchange of a substructure, i.e., a chemical transformation.16 In MMPs, chemical transformations can involve substituents and parts of core structures. For activity cliff assessment, transformation size restrictions were introduced that limit permitted transformations to typical R-group replacements (or core fragment changes of comparable size).17 In Figure 1B, exemplary replacements of maximally permitted size are shown. The formation of a transformation size-restricted MMP was then used as the similarity criterion for activity cliffs, leading to the introduction of MMP-cliffs, illustrated in Figure 1C.17 MMP-cliffs provide a structurally more conservative representation than activity cliffs defined on the basis of fingerprint Tanimoto similarity and, as further discussed below, occur with lower frequency. In general, MMP-cliffs are chemically intuitive and easier for medicinal chemists to interpret than other cliff representations. Our currently preferred definition of an activity cliff is an MMP-cliff with at least a 100-fold difference in potency between the two compounds on the basis of Ki values.18 Applying this definition largely avoids structurally ambiguous cliff assignments.

Another conceptually different, compound structure-centric activity cliff categorization was introduced to distinguish cliffs on the basis of fine structural details,19 as illustrated in Figure 2. This classification scheme was based on hierarchically defined molecular scaffolds.20 Three cliff categories were defined; i.e., cliff-forming compounds were required to have (i) a conserved scaffold and conserved R-groups, (ii) a conserved scaffold and different R-groups, or (iii) different scaffolds and conserved R-groups. As a similarity criterion, scaffolds had to be topologically equivalent; i.e., they were only permitted to differ in bond orders and/or heteroatom content. As shown in Figure 2, this categorization yielded five different classes of activity cliffs, in which cliff partners were distinguished only by chirality, topology, R-groups, core structures (scaffolds), or core structures plus R-group topology. One of the five classes, R-group cliffs, essentially corresponds to the so-called R-cliffs that were previously introduced on the basis of R-group decomposition of active compounds.21 R-cliffs have recently also been applied to detect activity cliffs in combinatorial libraries.22

Among the five newly introduced classes of activity cliffs, R-group cliffs were most frequently found in public domain active compounds.19 Importantly, many of the identified activity cliffs, especially scaffold cliffs and R-group cliffs, remained undetected by Tanimoto similarity calculations. This cliff categorization and the underlying structural criteria thus provided another chemically intuitive representation of activity cliffs on the basis of molecular graphs, in addition to MMP-cliffs.
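The MMP-cliff criterion described above combines a structural test (a transformation size-restricted MMP) with a potency test (at least a 100-fold Ki difference). The sketch below checks both for MMPs that are assumed to have been generated beforehand by some fragmentation tool and stored as simple records of heavy-atom counts. The record layout, the function names, and the numeric size limits (at most 13 heavy atoms per exchanged fragment, at most 8 heavy atoms size difference, core at least twice as large as the larger fragment) are hypothetical, illustrative assumptions; the published criteria are given in ref 17.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class MatchedPair:
    """Hypothetical record for a precomputed MMP (fragmentation done elsewhere)."""
    id_a: str
    id_b: str
    frag_a_heavy: int  # heavy atoms in the fragment exchanged out of compound A (0 for H)
    frag_b_heavy: int  # heavy atoms in the fragment exchanged into compound B
    core_heavy: int    # heavy atoms in the shared core
    pki_a: float
    pki_b: float


# Assumed, illustrative transformation-size limits (see ref 17 for published values).
MAX_FRAGMENT_HEAVY = 13  # largest permitted exchanged fragment
MAX_FRAGMENT_DIFF = 8    # largest permitted size difference between exchanged fragments
MIN_CORE_RATIO = 2.0     # core must be at least this many times larger than the fragment


def is_size_restricted(mmp: MatchedPair) -> bool:
    """True if the chemical transformation stays within the assumed size limits."""
    larger = max(mmp.frag_a_heavy, mmp.frag_b_heavy)
    return (
        larger <= MAX_FRAGMENT_HEAVY
        and abs(mmp.frag_a_heavy - mmp.frag_b_heavy) <= MAX_FRAGMENT_DIFF
        and mmp.core_heavy >= MIN_CORE_RATIO * larger
    )


def is_mmp_cliff(mmp: MatchedPair, min_fold: float = 100.0) -> bool:
    """Size-restricted MMP whose Ki values differ by at least min_fold."""
    # pKi = -log10(Ki), so a 100-fold Ki difference equals a pKi difference of 2.
    return is_size_restricted(mmp) and abs(mmp.pki_a - mmp.pki_b) >= math.log10(min_fold)


pairs = [
    MatchedPair("A1", "A2", 0, 1, 20, 8.2, 5.9),   # H -> Cl, 2.3 log units: MMP-cliff
    MatchedPair("B1", "B2", 2, 12, 30, 9.0, 6.0),  # fragments differ by 10 heavy atoms
    MatchedPair("C1", "C2", 1, 3, 20, 7.0, 6.2),   # small transformation, < 100-fold
]
print([is_mmp_cliff(p) for p in pairs])
```

Separating the size test from the potency test keeps the structural similarity criterion independent of the potency threshold, so either can be varied without touching the other.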



2D VERSUS 3D CLIFFS

Activity cliffs reflect significant differences in the binding of closely related compounds, and direct receptor−ligand interactions are a major component of the binding process. Thus, it also makes sense to consider available structures of receptor−ligand complexes to study activity cliffs in three dimensions and explore differences in interactions that might trigger cliff formation. Given the limited availability of experimentally determined structures, only a fraction of the activity cliffs that are detectable at the ligand level can be explored in the context of 3D structures. Earlier studies made first attempts to describe, rationalize,23 and utilize24 3D-cliffs. Recent studies have compared 3D- and 2D-cliffs and systematically extracted 3D-cliffs from currently available X-ray data. In general, the identification of 3D-cliffs requires pairwise comparison of experimentally determined compound binding modes (modeled structures should be excluded because of inherent accuracy limitations) and hence the application of 3D similarity measures. Using an atomic property density function that takes conformational, positional, and atomic feature differences between ligand binding modes into account,25 3D-cliffs were identified for popular targets such as β-secretase 1 and factor Xa, for which many complex crystal structures were publicly available.25 Alternative 3D similarity and potency difference thresholds were evaluated such that between 40 and 70 3D-cliffs were ultimately obtained. These 3D-cliffs were then compared to 2D-cliffs calculated for the same ligand sets. Less than 40% of the 3D-cliffs were found among 2D-cliffs derived from fingerprint Tanimoto similarity calculations.25 Thus, 2D and 3D similarity assessment yielded rather different results, and many compound pairs with very similar binding modes were not detected by 2D similarity calculations. Furthermore, 3D-cliffs were systematically extracted from complex X-ray structures26 available in the Protein Data Bank (PDB).27 In this study, at least 80% property density function-based 3D similarity of bound ligands and a potency difference of at least 2 orders of magnitude were applied as 3D-cliff criteria. From all PDB X-ray structures, 216 3D-cliffs were extracted involving a total of 269 ligands, 255 of which had structures available at high crystallographic resolution. These compounds were active against 38 targets from 17 different families. The analysis revealed that only a limited number of high-confidence 3D-cliffs are currently available for further study. However, the structural information provided by these cliffs was sufficient to introduce a categorization of 3D-cliffs.

On the basis of detailed binding mode comparisons, most 3D-cliffs were assigned to five different classes, depending on the presence of well-defined interaction differences between cliff partners,26 as illustrated in Figure 3. Prominent interaction differences included direct ligand−receptor hydrogen bonds, water-mediated hydrogen bonds/charge interactions, (partial) occupancy of hydrophobic pockets, and interaction differences caused by changes in ligand stereochemistry. Importantly, not all 3D-cliffs could be categorized on the basis of interaction differences. In a number of instances, no notable differences were detected that could rationalize cliff formation. Thus, as expected, other components of complex binding processes such as entropic effects or desolvation penalties were also implicated in cliff formation. Short-range interactions seen in X-ray structures provide an incomplete account of binding processes, and in the absence of confirmatory experiments, it cannot be concluded with certainty that observed interaction differences are responsible for activity cliff formation. Nevertheless, the systematic evaluation and categorization of 3D-cliffs provides strong support for the optimization of compounds active against targets with available structural information.

COMPUTATIONAL PREDICTIONS

Machine Learning. First attempts have also been made to predict activity cliffs using different machine learning techniques. Here, a general challenge is that computational models need to be built on the basis of compound pairs and associated features/properties rather than individual compounds, which is the classical scenario for machine learning in medicinal chemistry. For example, activity cliff scores were predicted from compound pair-based descriptor combinations using random (decision tree) forests.28 Using these models, cliff-forming compound pairs could be prioritized. In addition, support vector machine models were derived for activity cliff prediction that utilized newly designed kernel functions capturing chemical transformation and core structure information for pairs of compounds.29 Using these models, MMP-cliffs were predicted in various compound data sets with high accuracy. Furthermore, the particle swarm optimization technique was successfully adapted to search for sets of compounds forming coordinated activity cliffs.30 Although intellectually stimulating, these machine learning models need to be further advanced for practical applications. For example, the activity cliff formalism might be utilized to search for highly potent analogues of a given weakly potent compound, which would require combining predictions at the level of individual compounds and pairs.

Feature Probabilities. On the basis of statistical analysis, a conceptually different approach has been introduced31 to determine conditional probabilities of activity landscape features5 including activity cliffs. On the basis of per-compound feature statistics, conditional probabilities were derived for individual compounds to form activity cliffs and other landscape features (or combinations of them).31 A unique feature of this approach is that probabilities of activity cliff formation are obtained for each compound in a data set rather than for pairs.

ACTIVITY CLIFF DISTRIBUTION

Aspects of activity cliff research that are particularly relevant for medicinal chemistry include the frequency of occurrence of activity cliffs and their distribution over different data sets and targets.

Cliff Frequency and Potency Ranges. In a detailed recent survey of activity cliffs in ChEMBL that exclusively considered high-confidence activity data, it was determined that ∼22−34% of all active compounds (depending on the molecular representations used) were involved in the formation of activity cliffs spanning at least 2 orders of magnitude in potency.18 Thus, activity cliffs occur with relatively high frequency in compound data from medicinal chemistry sources: on average, at least one in five active compounds forms one or more significant activity cliffs. These cliffs were formed by ∼5−7% of all compound pairs meeting a given similarity criterion.18 Furthermore, the potency range distribution of activity cliffs was determined. The majority of activity cliffs involved compounds with micromolar and nanomolar potency;18 i.e., they spanned the potency range relevant for medicinal chemistry. On the basis of potency value distribution analysis, it is advisable to consider activity cliffs formed across the entire potency range of compound data sets.18

Cliff-Forming Compounds. The distribution of cliff-forming compounds for alternative molecular representations has been systematically determined, and the overlap between representation-dependent populations has been quantified. The results are shown in Figure 4. For this analysis, 35 021 unique ChEMBL compounds with activity against 129 targets were selected for which equilibrium constants were available. As an additional requirement, each target-based data set had to contain at least 100 compounds. Activity cliffs were determined using MMPs and six different fingerprints as alternative representations. For fingerprints, Tanimoto similarity threshold values were adjusted such that the same proportion of all possible compound pairs met or exceeded the threshold
(applying a MACCS Tanimoto similarity of 0.85 as a reference point4,5). The potency difference criterion for cliff formation was set to at least 2 orders of magnitude. All activity cliffs were systematically identified, and the proportion of cliff-forming compounds was determined for each representation. The data in Figure 4 show that transformation size-restricted MMPs yielded a more conservative similarity assessment than fingerprint Tanimoto similarity, which consistently produced higher proportions of cliff-forming compounds. For the different fingerprints, the proportions ranged from ∼34% to 41% and were thus relatively similar. However, the consensus between fingerprints (i.e., consistently detected cliff-forming compounds) was low, at less than 15%. When MMP-cliffs were included in the assessment, the consensus was further reduced to ∼11%. Another key observation was that the union of cliff-forming compounds over all representations was large: ∼66% of all compounds participated at least once in the formation of an activity cliff when different representations were used. Thus, alternative representations not only lead to low consensus but also involve increasingly large numbers of compounds in activity cliff formation. These findings further support the application of structurally conservative and chemically interpretable similarity measures.

Figure 4. Activity cliff-forming compounds. The proportion of compounds with available high-confidence activity data for 129 different targets is reported that participate in the formation of activity cliffs when different molecular representations are utilized. These include transformation size-restricted MMPs and six fingerprints of different design and complexity for which corresponding Tanimoto similarity threshold values were determined (MACCS,35 0.85; ECFP4,36 0.56; FCFP4,36 0.65; TGD,37 0.85; TGT,37 0.84; GpiDAPH3,37 0.43). Further details are provided in the text.

Considering Inactive Compounds. In addition to the compound pair focus, another central aspect of the original activity cliff concept is that only active compounds are taken into account. When studying databases such as ChEMBL or BindingDB that do not contain confirmed inactive compounds for given targets, cliff analysis must inevitably concentrate on active compounds. However, from a practical medicinal chemistry and SAR perspective, this need not be the case. Rather, compounds confirmed to be inactive against a given target might also be considered. Pairs formed by similar active and inactive compounds are also SAR-informative, and the inclusion of such compound pairs would be expected to further increase the information content of activity cliff analysis. A possible complication is that inactive compounds might often be difficult to differentiate from borderline active ones. For a consistent representation of activity cliffs, this distinction needs to be made. In a pilot study, publicly available inactive compounds have recently been included in activity cliff analysis.32 PubChem confirmatory bioassays were used as a source of active and confirmed inactive compounds, and activity cliffs formed between pairs of active compounds and between active and inactive compounds were systematically identified for different targets. For the purpose of this analysis, PubChem compounds designated as inactive in confirmatory assays were also consistently classified as inactive (although they might include some compounds with very weak activity at assay detection limits). Importantly, for cliffs formed by active and inactive compounds, a potency difference criterion is not applicable. Rather, a potency threshold value must be set for the active compound. For the analysis of confirmatory screening data, active cliff partners were required to have a potency of at least 10 μM (i.e., an activity value of 10 μM or lower). The results of the PubChem analysis showed that activity cliff frequency was significantly increased when confirmed inactive compounds were taken into account. Depending on the molecular representation used, ∼11−15% of all compound pairs meeting the similarity criteria formed activity cliffs.32

Target Distribution. Another question of high relevance for medicinal chemistry is how activity cliffs are distributed across different targets. Different binding sites impose different constraints on ligand binding, and some sites are more adaptable than others, rendering them more permissive to structural variations of active compounds. Such factors would be expected to influence the frequency of activity cliff formation. Thus, one might anticipate that the propensity for activity cliff formation would differ across targets. The target distribution of activity cliffs was determined on the basis of the 218 target-based compound sets that contained MMP-cliffs and were used to explore global activity cliff frequencies,18 as discussed above. These compound sets varied substantially in size. In Figure 5A, the proportion of MMP-cliffs (relative to all MMPs) is reported in a heat map format for data sets of increasing size. For small data sets, large fluctuations in MMP-cliff rates were observed, as expected, and many data sets were too small to draw statistically sound conclusions. For target sets containing more than 200 compounds, MMP-cliff rates were comparable, within the ∼5−15% range. Thus, in large data sets, activity cliffs occurred with similar frequency for different targets, and no significant differences were detected.

Figure 5B reports the proportion of cliff-forming compounds in data sets of increasing size, and similar observations were made. For small data sets containing up to 200 compounds, the proportion of cliff-forming compounds fluctuated widely, and no statistically sound conclusions could be drawn. For 31 target sets with more than 200 compounds, the proportion of activity cliff-forming compounds was similar, within the ∼5−20% range. For 16 other target sets (populating the lower right section of the map in Figure 5B), larger proportions were observed. Table 1 reports these 16 targets with the largest proportions of cliff-forming compounds, ranging from ∼20% to 38%. These proportions did not correlate with the MMP-cliff rates for these targets, which ranged from ∼5% to 15%. These findings implied that the degree of activity cliff coordination often differed among these data sets. Ligands of various G protein coupled receptors were prominent among the target sets with the largest proportions of cliff-forming compounds. Taken together, the results showed that activity cliffs were similarly distributed over different targets when large numbers of active compounds were available, which was not necessarily anticipated, as discussed above.

Figure 5. Target set distribution of MMP-cliffs. Heat maps capture relationships between the number of compounds in a target set and the ratio of (A) MMP-cliffs (relative to all qualifying MMPs) or (B) cliff-forming compounds (relative to all data set compounds). Cells are colored according to the population density of target sets containing increasing numbers of compounds and increasing ratios of MMP-cliffs or cliff-forming compounds. In addition, target set statistics for four heat map sections are reported in schematic representations.

Table 1. Target Sets with Largest Proportion of Cliff-Forming Compounds(a)

target ID | target name | no. compds | no. MMPs | no. MMP-cliffs | no. cliff compds
194 | coagulation factor X | 1052 | 8220 | 904 (11.0%) | 403 (38.3%)
118 | gonadotropin-releasing hormone receptor | 239 | 1115 | 90 (8.1%) | 81 (33.9%)
11534 | cathepsin S | 247 | 1586 | 96 (6.1%) | 81 (32.8%)
124 | corticotropin releasing factor receptor 1 | 384 | 2712 | 261 (9.6%) | 116 (30.2%)
138 | nociceptin receptor | 557 | 4064 | 297 (7.3%) | 167 (30.0%)
219 | muscarinic acetylcholine receptor M3 | 253 | 1182 | 178 (15.1%) | 74 (29.2%)
11156 | bradykinin B1 receptor | 325 | 1746 | 103 (5.9%) | 93 (28.6%)
10142 | melanocortin receptor 4 | 1183 | 12687 | 459 (3.6%) | 299 (25.3%)
259 | cannabinoid CB2 receptor | 1130 | 5489 | 392 (7.1%) | 275 (24.3%)
11 | thrombin | 658 | 2247 | 229 (10.2%) | 159 (24.2%)
19905 | melanin-concentrating hormone receptor 1 | 805 | 4901 | 311 (6.3%) | 191 (23.7%)
137 | κ opioid receptor | 1102 | 6749 | 454 (6.7%) | 246 (22.3%)
134 | vasopressin V1a receptor | 273 | 1029 | 53 (5.2%) | 61 (22.3%)
130 | dopamine D3 receptor | 663 | 2362 | 137 (5.8%) | 137 (20.7%)
280 | adenosine A3 receptor | 1318 | 5698 | 297 (5.2%) | 272 (20.6%)
10193 | carbonic anhydrase I | 852 | 1667 | 184 (11.0%) | 173 (20.3%)

(a) Sixteen target sets containing more than 200 compounds are reported that have the largest proportions of activity cliff-forming compounds according to Figure 5B. Target sets are arranged in order of decreasing proportion of cliff-forming compounds (last column). For each set, the target ID, the target name, and the numbers of compounds (compds) and MMPs are provided. In addition, the numbers (and proportions) of MMP-cliffs and cliff-forming compounds (cliff compds) are reported.
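The two proportions in Table 1 are simple ratios: MMP-cliffs relative to all MMPs, and cliff-forming compounds relative to all compounds in a target set. The following sketch recomputes them for three rows of the table. It also reports the mean number of cliff partners per cliff-forming compound (2 × MMP-cliffs / cliff-forming compounds, since each cliff involves two compounds). This last quantity is an illustrative addition, offered as one possible rough indicator of how strongly cliffs are coordinated in a set; it is not a measure used in the study. The class and function names are hypothetical.

```python
from typing import NamedTuple


class TargetSet(NamedTuple):
    """One row of Table 1 (hypothetical container)."""
    name: str
    n_compounds: int
    n_mmps: int
    n_mmp_cliffs: int
    n_cliff_compounds: int


def cliff_statistics(ts: TargetSet) -> dict[str, float]:
    """Recompute the Table 1 percentages plus an illustrative coordination proxy."""
    return {
        "mmp_cliff_rate_pct": round(100 * ts.n_mmp_cliffs / ts.n_mmps, 1),
        "cliff_compound_pct": round(100 * ts.n_cliff_compounds / ts.n_compounds, 1),
        # Each cliff involves two compounds, so this is the mean number of cliff
        # partners per cliff-forming compound (illustrative, not from the study).
        "mean_cliff_partners": round(2 * ts.n_mmp_cliffs / ts.n_cliff_compounds, 2),
    }


# Counts copied from Table 1.
TABLE_1_ROWS = [
    TargetSet("coagulation factor X", 1052, 8220, 904, 403),
    TargetSet("muscarinic acetylcholine receptor M3", 253, 1182, 178, 74),
    TargetSet("melanocortin receptor 4", 1183, 12687, 459, 299),
]

for row in TABLE_1_ROWS:
    print(row.name, cliff_statistics(row))
```

The melanocortin receptor 4 set illustrates the lack of correlation noted in the text: it has the lowest MMP-cliff rate of the three rows but still a high proportion of cliff-forming compounds.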

dx.doi.org/10.1021/jm401120g | J. Med. Chem. 2014, 57, 18−28

Journal of Medicinal Chemistry

Perspective

Figure 6. Evidence for progression of activity cliffs. In (A), activity cliff progression is monitored by searching for analogues of potent activity cliff partners. In (B), a preferred activity cliff-dependent compound pathway is shown that leads via a sequence of analogues to highly potent data set compounds. In (C), this pathway (left) is compared to analogous activity cliff-independent compound pathways (middle) as well as two types of merging pathways (right) that combine cliff-dependent and -independent pathways. Highly and weakly potent activity cliff compounds are represented as green and red nodes, respectively. Compounds not involved in the formation of cliffs are represented as gray nodes with a black border, intermediate pathway analogues as gray nodes, and the most potent data set compounds as blue nodes. Merging pathways are counted as both cliff-dependent and -independent pathways.
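The two analyses summarized in Figure 6 can be sketched on a toy analogue network. The Python code below is a minimal illustration under stated assumptions, not the implementation used in the cited studies: compounds are nodes with a pPotency value and a hypothetical year of first report, MMP relationships are supplied as undirected edges, an MMP-cliff is assumed to require a potency difference of at least two log units (100-fold), a pathway step is assumed to move only to compounds reported no earlier than the current one, and success rates are counted per start compound rather than per pathway. Merging pathways are not modeled separately, and all function names are illustrative.

```python
import math
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Cpd:
    potency: float  # pPotency (-log10 molar); higher means more potent
    year: int       # year of first report (hypothetical field)


def build_graph(edges):
    """Undirected MMP graph: each edge links two compounds forming an MMP."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def mmp_cliffs(cpds, graph, min_diff=2.0):
    """Return MMP-cliffs as (weaker, stronger) pairs; min_diff=2.0 is assumed."""
    cliffs = set()
    for a, neighbours in graph.items():
        for b in neighbours:
            weak, strong = sorted((a, b), key=lambda n: cpds[n].potency)
            if cpds[strong].potency - cpds[weak].potency >= min_diff:
                cliffs.add((weak, strong))
    return cliffs


def progression(cliff, cpds, graph):
    """Figure 6A-style check: analogues of the more potent partner reported later."""
    weak, strong = cliff
    later = [n for n in graph.get(strong, ())
             if n != weak and cpds[n].year > cpds[strong].year]
    if not later:
        return "no analogues"
    if any(cpds[n].potency > cpds[strong].potency for n in later):
        return "improved analogues"
    return "analogues"


def top_compounds(cpds, fraction=0.1):
    """The most potent `fraction` of the data set (at least one compound)."""
    k = max(1, math.ceil(fraction * len(cpds)))
    return set(sorted(cpds, key=lambda n: cpds[n].potency, reverse=True)[:k])


def reaches_top(start, cpds, graph, top):
    """Breadth-first search for an MMP path from start to a top compound."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node in top and node != start:
            return True
        for nxt in graph.get(node, ()):
            # Assumed chronology rule: never step back to an earlier compound.
            if nxt not in seen and cpds[nxt].year >= cpds[node].year:
                seen.add(nxt)
                queue.append(nxt)
    return False


def success_rates(cpds, graph, cliffs):
    """Fraction of cliff and non-cliff start compounds with a path to the top set."""
    top = top_compounds(cpds)
    cliff_cpds = {n for pair in cliffs for n in pair}
    counts = {"cliff": [0, 0], "non-cliff": [0, 0]}  # [reached, total]
    for n in cpds:
        if n in top:
            continue  # top compounds are pathway end points, not start points
        key = "cliff" if n in cliff_cpds else "non-cliff"
        counts[key][1] += 1
        counts[key][0] += reaches_top(n, cpds, graph, top)
    return {k: (r / t if t else 0.0) for k, (r, t) in counts.items()}


# Toy data set, invented for illustration only.
CPDS = {
    "A": Cpd(5.0, 2005), "B": Cpd(7.5, 2006), "C": Cpd(8.5, 2008),
    "D": Cpd(6.0, 2007), "E": Cpd(6.3, 2009),
}
GRAPH = build_graph([("A", "B"), ("B", "C"), ("D", "E")])

if __name__ == "__main__":
    cliffs = mmp_cliffs(CPDS, GRAPH)
    print(cliffs)                                # {('A', 'B')}
    print(progression(("A", "B"), CPDS, GRAPH))  # improved analogues
    print(success_rates(CPDS, GRAPH, cliffs))    # {'cliff': 1.0, 'non-cliff': 0.0}
```

In this toy network, the cliff formed by A and B shows progression (the later analogue C is more potent than B), and both cliff compounds reach the top compound C, whereas the non-cliff compounds D and E do not. A breadth-first search is used only because it is a simple way to test reachability; a faithful reproduction would need the exact pathway rules of the original study.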



PROGRESSION AND UTILIZATION OF ACTIVITY CLIFFS

In addition to exploring the frequency of occurrence and target distribution of cliffs, two further questions are of high interest for the practice of medicinal chemistry: to what extent activity cliffs might be utilized in compound optimization, and whether or not cliff information might provide SAR advantages. However, these questions are difficult to analyze on a large scale beyond individual case studies, and first investigations have only recently attempted to provide some insights. To monitor activity cliff progression, the neighborhood of potent activity cliff compounds was systematically analyzed,33 as illustrated in Figure 6A. To enable this analysis, 56 compound data sets evolving over time were assembled from ChEMBL, all MMP-cliffs were identified, and the dates on which the cliff-forming compounds were reported in the literature were recorded. Then, applying the MMP formalism, a search was carried out for structural analogues of the more potent cliff partner that became available subsequently; such analogues provide evidence for activity cliff progression. Of course, it could not be determined with certainty whether or not such analogues were indeed generated on the basis of an activity cliff. By contrast, if no analogues were found, there was no evidence that the activity cliff information had been further explored. As reported in Figure 6A, no structural analogues of the more potent cliff partners were detected for 75% of all available MMP-cliffs, a surprisingly large proportion. For the remaining 25% of all cliffs, evidence for progression was available. For 15%, the identified analogues displayed further improvements in potency.33 Thus, on the basis of these findings, activity cliff information available in the public domain is likely underutilized, because for the majority of currently available MMP-cliffs, no evidence for further chemical exploration is detectable.

In addition, a compound pathway model was designed, illustrated in Figure 6B, to mimic compound optimization paths and to compare activity cliff-dependent and -independent pathways34 (Figure 6C). Compound pathways searched in evolving compound data sets consisted of pairwise overlapping MMP (analogue) sequences leading from an activity cliff compound or, alternatively, a compound not involved in a cliff to one of the 10% most potent compounds in a data set. All possible activity cliff-dependent, cliff-independent, and merging pathways according to Figure 6C were systematically identified.34 The potency range distributions of cliff and non-cliff compounds were overall comparable, but 5−6 times more non-cliff compounds were available in these data sets, which statistically favored the occurrence of cliff-independent pathways. Nevertheless, activity cliff-dependent pathways were found to reach highly potent compounds more frequently than cliff-independent ones. Pathways originating from 54% of all activity cliffs successfully led to the most potent data set compounds, whereas only 28% of pathways originating from compounds not involved in activity cliffs did so.34 Hence, if the pathway model detected activity cliff progression, i.e., if there was evidence for the utilization of activity cliff information, cliff-dependent pathways were associated with an SAR advantage over cliff-independent ones.

CONCLUDING REMARKS

Similar or analogous structures with large potency differences often reveal substitution sites, R-groups, and/or core structure features that strongly influence SARs. Therefore, activity cliffs are generally of high interest in medicinal chemistry, where they are typically explored in individual compound series as indicators of steep SARs or activity determinants. The activity cliff concept has also been generalized. In its original formulation, this concept has four key aspects: it is potency-centric, focuses on compound pairs, only considers active compounds, and strongly emphasizes SAR discontinuity. This has implications for medicinal chemistry. First, the activity cliff concept alone is insufficient to fully address the requirements of lead optimization, during which multiple molecular properties must be optimized by balancing different parameters such as potency, solubility, and metabolic stability. In addition to potency-centric activity cliffs, other types of property cliffs can certainly be considered. Furthermore, highly discontinuous SARs, which are reflected by activity cliff formation, are not always desirable when multiple properties need to be optimized and chemical changes must be made to further improve solubility or stability while retaining potency. It is also evident that focusing on isolated activity cliffs yields an incomplete view of SAR information, which is highly context-dependent. SARs are often rather heterogeneous in nature and combine discontinuous and continuous components that must be carefully considered. Some of these limitations have been addressed by extending the activity cliff concept, for example, by taking coordinated activity cliffs or chemical neighborhoods of cliffs into account.

Despite these conceptual limitations, the study of activity cliffs has experienced increasing interest in recent years. This is likely because the notion of activity cliffs is equally relevant for classical series-based compound optimization, where cliffs are evaluated on a case-by-case basis, and for systematic compound data mining. In fact, much of our current understanding of activity cliffs, their associated SAR information content, and their potential utility in medicinal chemistry has resulted from systematic exploration of compound activity data. A number of recent studies have further refined our views of activity cliffs, introduced alternative approaches to evaluate cliffs, and suggested different ways to exploit activity cliff information. For example, molecular representation and measurement dependencies of activity cliff distributions have been studied in detail, emphasizing the need for consistent activity cliff definitions. In addition, conservative substructure-based activity cliff representations have been introduced to improve the chemical interpretability of cliffs. Furthermore, we have learned that activity cliffs frequently occur in active compounds, even if structurally conservative similarity criteria are applied. A substantial amount of activity cliff information is currently available in the public domain, but for most of these activity cliffs, there is no evidence for further chemical exploration. Thus, there are significant opportunities to collect SAR information associated with available cliffs and utilize this information in compound optimization. If there is evidence for activity cliff progression, it is usually associated with an SAR advantage compared to optimization of other compounds, as indicated by systematic compound pathway analysis. Moreover, we have also learned that activity cliffs are surprisingly evenly distributed across different targets. Thus, for the majority of current pharmaceutical targets, at least partly unexplored activity cliff and SAR information is currently available, which should be carefully considered in medicinal chemistry and drug development.

As our understanding of activity cliffs and their distribution has matured in recent years, first attempts have also been made to computationally predict individual or coordinated activity cliffs. Computational exploitation of activity cliff information to identify or design potent compounds is anticipated to be one of the future growth areas of activity cliff research. In addition, it is expected that the activity cliff concept will be further extended, for example, by introducing other types of single- or multiproperty cliffs (going beyond compound potency). One of the grand challenges going forward will be to bridge data mining, knowledge extraction, and practical medicinal chemistry in order to further improve the knowledge base for compound optimization and develop advanced optimization strategies. Given the large amounts of little-explored activity cliff information that are currently available, there should be many opportunities to integrate compound data mining and optimization efforts in medicinal chemistry projects.



AUTHOR INFORMATION

Corresponding Author

*Phone: 49-228-2699-306. E-mail: [email protected].

Notes

The authors declare no competing financial interest.

Biographies

Dagmar Stumpfe studied Biology at the University of Bonn, Germany. In 2006, she joined the Department of Life Science Informatics at the University of Bonn headed by Prof. Jürgen Bajorath for her Ph.D. thesis, where she worked on methods for computer-aided chemical biology with a focus on the exploration of compound selectivity. Since 2009, Dagmar has been working as a postdoctoral fellow in the department, and her current research interests include computational chemical biology and large-scale structure−activity relationship analysis.

Ye Hu studied Clinical Medicine at Southeast University, China, from 1999 to 2004. In 2006, she joined the Life Science Informatics Master's program at the University of Bonn and obtained her Master's degree in 2008. In October 2008, she began her Ph.D. studies in the group of Prof. Jürgen Bajorath, focusing on systematic computational analysis of molecular scaffolds of bioactive compounds and associated characteristics. Since July 2011, she has been working as a postdoctoral fellow in the department. Her current research interests include large-scale mining of ligand−target interaction data and structure−activity relationship analysis.

Dilyana Dimova studied Computer Science at Saarland University, Germany. In 2010, she joined the Department of Life Science Informatics headed by Prof. Jürgen Bajorath for her Ph.D. studies. She initially investigated the development of graphical methods for systematic analysis of multitarget structure−activity relationships. Currently, Dilyana is in the third year of her Ph.D. studies and mainly focuses on large-scale exploration of activity cliffs and structure−activity relationships.

Jürgen Bajorath is Professor and Chair of Life Science Informatics at the University of Bonn. He is also an Affiliate Professor in the Department of Biological Structure at the University of Washington, Seattle. His research interests include drug discovery, computer-aided medicinal chemistry and chemical biology, and chemoinformatics. For further details, please see: http://www.lifescienceinformatics.uni-bonn.de.



ACKNOWLEDGMENTS

The authors thank Kathrin Heikamp for help with data set compilation and activity cliff calculations. D.S. is supported by Sonderforschungsbereich 704 of the Deutsche Forschungsgemeinschaft.



ABBREVIATIONS USED

MMP, matched molecular pair; PDB, Protein Data Bank; SAR, structure−activity relationship; 2D, two-dimensional; 3D, three-dimensional



REFERENCES

(1) Lajiness, M. Evaluation of the Performance of Dissimilarity Selection Methodology. In QSAR: Rational Approaches to the Design of Bioactive Compounds; Silipo, C., Vittoria, A., Eds.; Elsevier: Amsterdam, The Netherlands, 1991; pp 201−204.
(2) Shanmugasundaram, V.; Maggiora, G. M. Characterizing Property and Activity Landscapes Using an Information-Theoretic Approach. Abstracts of Papers, 222nd National Meeting of the American Chemical Society, Chicago, IL, August 26−30, 2001; American Chemical Society: Washington, DC, 2001; Division of Chemical Information, Abstract 77.
(3) Maggiora, G. M. On Outliers and Activity Cliffs: Why QSAR often Disappoints. J. Chem. Inf. Model. 2006, 46, 1535−1535.
(4) Stumpfe, D.; Bajorath, J. Exploring Activity Cliffs in Medicinal Chemistry. J. Med. Chem. 2012, 55, 2932−2942.
(5) Wassermann, A. M.; Wawer, M.; Bajorath, J. Activity Landscape Representations for Structure−Activity Relationship Analysis. J. Med. Chem. 2010, 53, 8209−8223.
(6) Medina-Franco, J. L.; Martínez-Mayorga, K.; Bender, A.; Marín, R. M.; Giulianotti, M. A.; Pinilla, C.; Houghten, R. A. Characterization of Activity Landscapes Using 2D and 3D Similarity Methods: Consensus Activity Cliffs. J. Chem. Inf. Model. 2009, 49, 477−491.
(7) Peltason, L.; Iyer, P.; Bajorath, J. Rationalizing Three-Dimensional Activity Landscapes and the Influence of Molecular Representations on Landscape Topology and the Formation of Activity Cliffs. J. Chem. Inf. Model. 2010, 50, 1021−1033.
(8) Stumpfe, D.; Bajorath, J. Similarity Searching. Wiley Interdiscip. Rev.: Comput. Mol. Sci. 2011, 1, 260−282.
(9) Stumpfe, D.; Bajorath, J. Assessing the Confidence Level of Public Domain Compound Activity Data and the Impact of Alternative Potency Measurements on SAR Analysis. J. Chem. Inf. Model. 2011, 51, 3131−3137.
(10) Gaulton, A.; Bellis, L. J.; Bento, A. P.; Chambers, J.; Davies, M.; Hersey, A.; Light, Y.; McGlinchey, S.; Michalovich, D.; Al-Lazikani, B.; Overington, J. P. ChEMBL: A Large-Scale Bioactivity Database for Drug Discovery. Nucleic Acids Res. 2011, 40, D1100−D1107.
(11) Liu, T.; Lin, Y.; Wen, X.; Jorissen, R. N.; Gilson, M. K. BindingDB: A Web-Accessible Database of Experimentally Determined Protein−Ligand Binding Affinities. Nucleic Acids Res. 2007, 35, D198−D201.
(12) Wang, Y.; Xiao, J.; Suzek, T. O.; Zhang, J.; Wang, J.; Zhou, Z.; Han, L.; Karapetyan, K.; Dracheva, S.; Shoemaker, B. A.; Bolton, E.; Gindulyte, A.; Bryant, S. H. PubChem’s BioAssay Database. Nucleic Acids Res. 2012, 40, D400−D412.
(13) Vogt, M.; Huang, Y.; Bajorath, J. From Activity Cliffs to Activity Ridges: Informative Data Structures for SAR Analysis. J. Chem. Inf. Model. 2011, 51, 1848−1856.
(14) Gupta-Ostermann, D.; Bajorath, J. Identification of Multitarget Activity Ridges in High-Dimensional Bioactivity Spaces. J. Chem. Inf. Model. 2012, 52, 2579−2586.
(15) Kenny, P. W.; Sadowski, J. Structure Modification in Chemical Databases. In Chemoinformatics in Drug Discovery; Oprea, T. I., Ed.; Wiley-VCH: Weinheim, Germany, 2005; pp 271−285.
(16) Hussain, J.; Rea, C. Computationally Efficient Algorithm To Identify Matched Molecular Pairs (MMPs) in Large Data Sets. J. Chem. Inf. Model. 2010, 50, 339−348.
(17) Hu, X.; Hu, Y.; Vogt, M.; Stumpfe, D.; Bajorath, J. MMP-Cliffs: Systematic Identification of Activity Cliffs on the Basis of Matched Molecular Pairs. J. Chem. Inf. Model. 2012, 52, 1138−1145.
(18) Stumpfe, D.; Bajorath, J. Frequency of Occurrence and Potency Range Distribution of Activity Cliffs in Bioactive Compounds. J. Chem. Inf. Model. 2012, 52, 2348−2353.
(19) Hu, Y.; Bajorath, J. Extending the Activity Cliff Concept: Structural Categorization of Activity Cliffs and Systematic Identification of Different Types of Cliffs in the ChEMBL Database. J. Chem. Inf. Model. 2012, 52, 1806−1811.
(20) Bemis, G. W.; Murcko, M. A. The Properties of Known Drugs. 1. Molecular Frameworks. J. Med. Chem. 1996, 39, 2887−2893.
(21) Agrafiotis, D. K.; Wiener, J. J. M.; Skalkin, A.; Kolpak, J. Single R-Group Polymorphisms (SRPs) and R-Cliffs: An Intuitive Framework for Analyzing and Visualizing Activity Cliffs in a Single Analog Series. J. Chem. Inf. Model. 2011, 51, 1122−1132.
(22) Medina-Franco, J. L.; Edwards, B. S.; Pinilla, C.; Appel, J. R.; Giulianotti, M. A.; Santos, R. G.; Yongye, A. B.; Sklar, L. A.; Houghten, R. A. Rapid Scanning Structure−Activity Relationships in Combinatorial Data Sets: Identification of Activity Switches. J. Chem. Inf. Model. 2013, 53, 1475−1485.
(23) Sisay, M. T.; Peltason, L.; Bajorath, J. Structural Interpretation of Activity Cliffs Revealed by Systematic Analysis of Structure−Activity Relationships in Analog Series. J. Chem. Inf. Model. 2009, 49, 2179−2189.
(24) Seebeck, B.; Wagener, M.; Rarey, M. From Activity Cliffs to Target-Specific Scoring Models and Pharmacophore Hypotheses. ChemMedChem 2011, 6, 1630−1639.
(25) Hu, Y.; Bajorath, J. Exploration of 3D Activity Cliffs on the Basis of Compound Binding Modes and Comparison of 2D and 3D Cliffs. J. Chem. Inf. Model. 2012, 52, 670−677.
(26) Hu, Y.; Furtmann, N.; Gütschow, M.; Bajorath, J. Systematic Identification and Classification of Three-Dimensional Activity Cliffs. J. Chem. Inf. Model. 2012, 52, 1490−1498.
(27) Berman, H. M.; Westbrook, J.; Feng, Z.; Gilliland, G.; Bhat, T. N.; Weissig, H.; Shindyalov, I. N.; Bourne, P. E. The Protein Data Bank. Nucleic Acids Res. 2000, 28, 235−242.
(28) Guha, R. Exploring Uncharted Territories: Predicting Activity Cliffs in Structure−Activity Landscapes. J. Chem. Inf. Model. 2012, 52, 2181−2191.
(29) Heikamp, K.; Hu, X.; Yan, A.; Bajorath, J. Prediction of Activity Cliffs Using Support Vector Machines. J. Chem. Inf. Model. 2012, 52, 2354−2365.
(30) Namasivayam, V.; Bajorath, J. Searching for Coordinated Activity Cliffs Using Particle Swarm Optimization. J. Chem. Inf. Model. 2012, 52, 927−934.
(31) Vogt, M.; Iyer, P.; Maggiora, G. M.; Bajorath, J. Conditional Probabilities of Activity Landscape Features for Individual Compounds. J. Chem. Inf. Model. 2013, 53, 1602−1612.
(32) Hu, Y.; Maggiora, G. M.; Bajorath, J. Activity Cliffs in PubChem Confirmatory Bioassays Taking Inactive Compounds into Account. J. Comput.-Aided Mol. Des. 2013, 27, 115−124.
(33) Dimova, D.; Heikamp, K.; Stumpfe, D.; Bajorath, J. Do Medicinal Chemists Learn from Activity Cliffs? A Systematic Evaluation of Cliff Progression in Evolving Compound Data Sets. J. Med. Chem. 2013, 56, 3339−3345.
(34) Stumpfe, D.; Dimova, D.; Heikamp, K.; Bajorath, J. Compound Pathway Model To Capture SAR Progression: Comparison of Activity Cliff-Dependent and -Independent Pathways. J. Chem. Inf. Model. 2013, 53, 1067−1072.
(35) MACCS Structural Keys; Accelrys: San Diego, CA.
(36) Rogers, D.; Hahn, M. Extended-Connectivity Fingerprints. J. Chem. Inf. Model. 2010, 50, 742−754.
(37) Molecular Operating Environment (MOE); Chemical Computing Group Inc.: Montreal, Quebec, Canada.
