
Feb 3, 2016 - Jürgen Bajorath is Professor and Chair of Life Science Informatics at the University of Bonn and also an Affiliate Professor in the Dep...
Perspective pubs.acs.org/jmc

Computational Exploration of Molecular Scaffolds in Medicinal Chemistry

Miniperspective

Ye Hu, Dagmar Stumpfe, and Jürgen Bajorath*

Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Dahlmannstrasse 2, D-53113 Bonn, Germany

ABSTRACT: The scaffold concept is widely applied in medicinal chemistry. Scaffolds are mostly used to represent core structures of bioactive compounds. Although the scaffold concept has limitations and is often viewed differently from a chemical and computational perspective, it has provided a basis for systematic investigations of molecular cores and building blocks, going far beyond the consideration of individual compound series. Over the past 2 decades, alternative scaffold definitions and organization schemes have been introduced and scaffolds have been studied in a variety of ways and increasingly on a large scale. Major applications of the scaffold concept include the generation of molecular hierarchies, structural classification, association of scaffolds with biological activities, and activity prediction. This contribution discusses computational approaches for scaffold generation and analysis, with emphasis on recent developments impacting medicinal chemistry. A variety of scaffold-based studies are discussed, and a perspective on scaffold methods is provided.



Journal of Medicinal Chemistry
Special Issue: Computational Methods for Medicinal Chemistry
Received: November 9, 2015
DOI: 10.1021/acs.jmedchem.5b01746 J. Med. Chem. XXXX, XXX, XXX−XXX
© XXXX American Chemical Society

SCAFFOLD CONCEPT

In the introductory section, key aspects of the scaffold concept are discussed and different scaffold definitions introduced.

Structural Rationale. In general terms, a “scaffold” is best understood as a molecular core to which functional groups are attached. Accordingly, scaffolds are predominantly used to represent core structures of compounds.1 Other terms such as, for example, “framework”, “substructure”, or “fragment” are often synonymously used to refer to scaffolds.1 However, these terms are also used to describe other structures. For example, a molecular fragment might contain a part of a core structure and one or more substituents.

Our interest in core structures is by and large a consequence of the compound series-centric view of medicinal chemistry efforts. Active compounds are subjected to chemical modifications to further improve optimization-relevant properties, derivatives are evaluated, and series of analogs are represented in R-group tables that organize them on the basis of an iteratively modified core. As long as one analyzes one compound series at a time, there is little need to formally define scaffolds. The situation changes if one compares different series, organizes heterogeneous compound data sets, or searches for alternative core structures. Then, it becomes important to consistently account for core structures, leading to the definition of scaffolds. On the basis of scaffolds, large compound decks or screening libraries can be structurally classified and organized.2 For computational analysis and classification, a formal, consistently applied, and reproducible definition of scaffolds is essential.1 Furthermore, the scaffold concept is also relevant for intellectual property issues. In medicinal chemistry patents, core structures are typically represented as Markush structures. For substitution sites in Markush structures, R-group or reagent lists are then provided to cover as many compounds as possible.

Link to Activity. The interest in the scaffold concept goes beyond structural analysis and classification. Another important aspect is the association of scaffolds with desired biological activities. Essentially, there are two major goals. One is the identification of structural prototypes that are preferentially active against targets of interest and ultimately yield highly and specifically active compounds; the other is the identification of different scaffolds that represent structurally distinct compounds having the same activity. The first goal relates scaffolds to “privileged substructures”;3−5 the second leads us to the idea of “scaffold hopping”,6,7 an important computational task in medicinal chemistry.

Privileged Substructures. In medicinal chemistry, there has been (and continues to be) high interest in core structures that preferentially interact with given target families. Such “privileged substructures”3 or “masterkeys”4 are thought to provide opportunities for the generation of specifically active compounds, for example, by designing chemical libraries that are focused on such core structures.5 While the existence of privileged structural motifs that might exclusively interact with a given target family has remained questionable1 (to our knowledge such structures are not available), preferential binding of prototypic structures to targets of interest has frequently been observed,4,5 beginning with the classical case of the benzodiazepine scaffold found in many ligands of G protein-coupled receptors (GPCRs) and ion channels.3 Figure 1 shows examples of privileged substructures from GPCR ligands and kinase inhibitors. For the assessment of privileged substructures as a starting point for chemical optimization and the search for such structural motifs, the scaffold concept is of central relevance (and alternative scaffold definitions might be considered, as discussed below).

Figure 1. Privileged substructures (red) for GPCRs and kinases.69−71 For each substructure, an exemplary compound is shown with reported target activity.

Scaffold Hopping. A major motivation for computational compound screening (virtual screening) is the identification of structurally distinct compounds having similar activity, a task typically referred to as “scaffold hopping”.6 Ligand-based virtual screening approaches start from known active compounds for a given target as search templates (reference molecules) and aim to identify structural classes (compounds with different core structures) that are also active against the same target. Such calculations, for which a variety of computational methods have been introduced,7,8 might be carried out to circumvent a competitor’s patent position or identify alternative chemical entities for optimization if a series hits a roadblock. Although these searches are typically (but not always) carried out at the level of compounds, the results are evaluated by comparing their scaffolds. In a successful case, scaffolds from reference compounds and confirmed screening hits differ. Whether differences between known and new scaffolds are substantial, or whether the scaffolds are chemically distinct, is typically answered subjectively in medicinal chemistry, and views might well differ.
In addition, as discussed below, there is at least one computational method available to quantify structural distances between scaffolds.

Definitions. Regardless of whether scaffolds are utilized for compound classification or activity assessment and prediction, the key question is how to best define them. Here, views and preferences might differ and a number of alternatives can be considered. There is no generally preferred definition of core structures or scaffolds. Ultimately, the choice of a scaffold definition depends on the specific requirements of applications, as further detailed below.

Synthetic Information. From a chemical viewpoint, core structures are often defined on the basis of chemical reactions that can be applied to them. Accordingly, core structures are typically understood as chemical building blocks, or combinations of building blocks, which can be chemically diversified. If small sets of compounds, reagents, or individual series are considered, they can be visually inspected and compared and core structures can be extracted or defined in a subjective manner. However, to formalize reaction-based scaffold design, retrosynthetic rules can be applied to isolate synthetically relevant scaffolds from compounds.9 This approach generalizes reaction-oriented scaffold design and makes it computationally feasible and applicable to increasingly large compound sets.

Graph-Based Methods. In addition, there are different ways to define scaffolds on the basis of molecular graphs, for example, by calculating the “maximum common substructure” (MCS) of a set of compounds. Furthermore, the increasingly popular concept of “matched molecular pairs” (MMPs)10,11 can be applied to systematically determine core structures, considering single or multiple substituents. An MMP is defined as a pair of compounds that only differ by a chemical change at a single site (i.e., the exchange of a substructure).10,11 MMPs are not only chemically intuitive, they can also be generated algorithmically in an efficient manner, for example, through systematic fragmentation of individual exocyclic bonds in compounds,12 permitting MMP mining of large compound sets.12,13 MMPs yield core structure representations.
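The fragment-and-index idea behind algorithmic MMP generation can be illustrated with a minimal Python sketch. It assumes that single-cut fragmentation has already been carried out (for example, by a cheminformatics toolkit) and that each compound is supplied as a list of (core, substituent) string pairs. The function name `find_mmps`, the compound labels, and the SMILES-like toy strings are hypothetical and serve only as an illustration; this is not the implementation of any of the cited methods.

```python
from collections import defaultdict
from itertools import combinations


def find_mmps(fragmentations):
    """Pair compounds that share a core and differ in one substituent.

    fragmentations: dict mapping a compound id to a list of
    (core, substituent) tuples obtained from single-bond cuts.
    Returns a sorted list of (id_a, id_b, core, sub_a, sub_b).
    """
    # Index every fragmentation by its core (the invariant part).
    index = defaultdict(list)
    for compound_id, cuts in fragmentations.items():
        for core, substituent in cuts:
            index[core].append((compound_id, substituent))

    pairs = []
    for core, members in index.items():
        # Any two distinct compounds under the same core key that carry
        # different substituents differ only at the cut site.
        for (a, sub_a), (b, sub_b) in combinations(sorted(members), 2):
            if a != b and sub_a != sub_b:
                pairs.append((a, b, core, sub_a, sub_b))
    return sorted(pairs)


# Hypothetical toy input; [*] marks the attachment point of the cut bond.
toy_cuts = {
    "A": [("[*]c1ccc(O)cc1", "[*]Cl")],
    "B": [("[*]c1ccc(O)cc1", "[*]Br")],
    "C": [("[*]c1ccc(N)cc1", "[*]Cl")],
}

mmps = find_mmps(toy_cuts)
```

In this toy input, compounds A and B share the same core and differ only in the halogen, so they form the one pair that is returned. Compound C has a different core and is not paired.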
However, the core structure obtained for an individual pair of analogs might contain substitution(s) shared by them but not other analogs within the same series, which might have substitutions at different sites. Hence, multiple MMP cores must be taken into consideration. Accordingly, MMP-based scaffold generation can be further refined by considering all pairwise single-site variations and removing corresponding substituents from the original compounds, which yields a “consensus scaffold”. This approach is straightforward but has not been reported thus far. Algorithmic MMP generation can also be combined with reaction information through the application of retrosynthetic rules.9 To this end, random bond fragmentation is replaced by knowledge-based fragmentation according to retrosynthetic rules, which produces “retrosynthetic MMPs”14 and corresponding cores. Other algorithms have also been introduced to systematically extract ring systems from compounds,15 as further discussed below.

Molecular Hierarchy. From molecular graphs, scaffolds can be systematically derived in a hierarchical manner, as first introduced by Bemis and Murcko 20 years ago.16 Following this approach, compounds are divided into R-groups, linkers, and ring systems and a scaffold (framework) is obtained from a compound by removal of all R-groups while retaining all ring systems and linkers between them.16 So-defined core structures that contain heteroatoms and bond orders are often termed “Bemis and Murcko (BM) scaffolds”. BM scaffolds can be further generalized by converting all heteroatoms to carbon and setting all bond orders to 1, yielding “graph frameworks”16 or “cyclic skeletons” (CSKs).17 Accordingly, each CSK represents a set of topologically equivalent BM scaffolds. From CSKs, a further level of chemical abstraction is obtained by unifying differences in ring size and linker length such that all rings are of unit size and all linkers of unit length, yielding “reduced cyclic skeletons”.17 Following the molecular hierarchy from compounds over BM scaffolds to CSKs has provided a generally applicable, reproducible, and computationally efficient approach to scaffold generation and organization. Although there is no generally accepted and consistently applied definition of scaffolds in the medicinal chemistry literature, given alternative ways to rationalize core structures and molecular building blocks, the hierarchical definition of scaffolds according to Bemis and Murcko is most widely applied in computational studies, despite inherent limitations. Figure 2 shows alternative scaffold representations.

Figure 2. Alternative scaffold representations. Three compounds (A−C) are shown that yield the same BM scaffold, CSK, MCS, and retrosynthetic MMP core. Scaffold representations are drawn in bold.
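The first two levels of the molecular hierarchy described above (compound, BM scaffold, CSK) can be sketched on a plain molecular graph. The following Python example is a simplified illustration under stated assumptions: the molecule is given as a hypothetical dictionary of heavy atoms and bonds, R-groups are removed by repeatedly pruning terminal atoms, and exocyclic double-bonded atoms are pruned as well (scaffold variants differ on this point). The function names `bm_scaffold` and `cyclic_skeleton` and the toy molecule are assumptions made for illustration and are not taken from the cited work.

```python
def bm_scaffold(atoms, bonds):
    """Prune terminal atoms until only rings and linkers remain.

    atoms: dict atom index -> element symbol (heavy atoms only).
    bonds: dict frozenset({i, j}) -> bond order.
    Returns (atoms, bonds) of the scaffold; empty for acyclic molecules.
    """
    atoms = dict(atoms)
    bonds = dict(bonds)
    while True:
        # Count the heavy-atom neighbors of every remaining atom.
        degree = {a: 0 for a in atoms}
        for pair in bonds:
            for a in pair:
                degree[a] += 1
        terminal = [a for a, d in degree.items() if d <= 1]
        if not terminal:
            break
        for a in terminal:
            del atoms[a]
        # Drop bonds that lost one of their atoms.
        bonds = {p: o for p, o in bonds.items() if all(a in atoms for a in p)}
    return atoms, bonds


def cyclic_skeleton(atoms, bonds):
    """Convert all atoms to carbon and all bond orders to 1 (CSK)."""
    return {a: "C" for a in atoms}, {p: 1 for p in bonds}


# Hypothetical toy molecule: an aziridine ring (atoms 0-2) joined through a
# carbonyl linker (atom 3, O at 9) to a cyclopropane ring (atoms 4-6) that
# carries a CH2OH substituent (atoms 7, 8).
toy_atoms = {0: "N", 1: "C", 2: "C", 3: "C", 4: "C",
             5: "C", 6: "C", 7: "C", 8: "O", 9: "O"}
toy_bond_list = [(0, 1, 1), (1, 2, 1), (0, 2, 1), (1, 3, 1), (3, 4, 1),
                 (4, 5, 1), (5, 6, 1), (4, 6, 1), (5, 7, 1), (7, 8, 1),
                 (3, 9, 2)]
toy_bonds = {frozenset((i, j)): order for i, j, order in toy_bond_list}

scaffold_atoms, scaffold_bonds = bm_scaffold(toy_atoms, toy_bonds)
csk_atoms, csk_bonds = cyclic_skeleton(scaffold_atoms, scaffold_bonds)
```

For the toy molecule, pruning removes the CH2OH group and the carbonyl oxygen, leaving the two rings and the one-atom linker. The scaffold still contains the ring nitrogen, whereas the CSK consists of carbon atoms and single bonds only. An acyclic input is pruned to an empty graph, in line with the requirement that a BM scaffold contains at least one ring.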



CAVEATS

Although scaffolds need to be unambiguously defined for computational analysis, the term scaffold is often used without clear definition in the scientific literature including virtual screening and scaffold hopping applications. This makes it often very difficult if not impossible to compare the results of different investigations. Furthermore, when associating scaffolds with biological activities, it is often not considered that the compounds are active from which scaffolds are derived but not necessarily the scaffolds themselves. Whether or not a scaffold has intrinsic activity (as hypothesized for privileged substructures) must be determined on a case-by-case basis. Therefore, a term such as “active scaffold” should be used with the explicit understanding that it primarily refers to actual compounds represented by a given scaffold. It should also be considered that scaffolds not generated on the basis of reaction information essentially represent a “molecular construct” that might or might not be chemically feasible, a point of critique frequently raised by practicing chemists when judging scaffold analysis.

A more specific recurrent concern of medicinal chemists directly relates to hierarchical scaffolds. By definition, the addition of a ring to a BM scaffold generates a new scaffold, although individual rings are often added as substituents during chemical optimization. Therefore, multiple BM scaffolds with differences in ring content (and/or length of linkers between rings) might represent a given analog series, which is artificial from a chemical perspective and complicates scaffold selection for follow-up chemistry or library design.18 Furthermore, by definition, a BM scaffold must contain at least one ring, and hence core structures without rings cannot be represented (although they certainly exist). By contrast, comprehensive coverage of core structures, irrespective of ring content, is achieved by applying the MCS formalism. By definition, the determination of an MCS requires at least two compounds, and this formalism is best applied to compound series.

Another feature of BM scaffolds is particularly critical for scaffold hopping analysis. On the basis of their definition, BM scaffolds can be structurally distinct or very similar. For example, two scaffolds might contain different ring systems and have different topology or, on the other hand, might only be distinguished by a single heteroatom replacement or bond order variation. Topology refers to the way in which atoms are bonded to each other in molecules. It follows that a “scaffold hop” might involve very similar or distinct structures sharing the same activity, and the degree of difficulty involved in “successful” scaffold hopping exercises is usually not considered, especially in benchmark calculations.8 To provide a measure for a consistent assessment of scaffold hopping, a mathematical function has been introduced to quantify the “chemical distance” between scaffolds (i.e., their degree of dissimilarity).19 Figure 3 shows compounds with increasing pairwise scaffold distances. Although the method provides a consistent quantitative readout, it has so far not been applied to evaluate virtual screening applications reported in the literature.

Figure 3. Quantifying scaffold dissimilarity. A compound (top) is compared to four others, which are increasingly dissimilar (from the left to the right). Scaffolds are colored blue. “Scaffold distances” are reported for pairwise comparisons.

Despite limitations that are (without doubt) associated with the scaffold concept, the introduction of scaffolds, especially hierarchical scaffolds, has provided a basis for systematic computational analysis and organization of core structures as well as large-scale exploration of structure−activity relationships (SARs), both at the level of compounds and scaffolds.

MILESTONES OF COMPUTATIONAL SCAFFOLD EXPLORATION

Prior to the introduction of graph-based hierarchical scaffolds, a seminal study reported a computational methodology for three-dimensional scaffold replacement in structures of active compounds.20 The underlying idea was to identify alternative scaffolds that would present substituents of given active compounds in their correct spatial arrangements. Therefore, bond vector representations were implemented. Databases of scaffold conformers were searched to identify replacement scaffolds for active compounds that matched attachment points and bonds of substituents on the basis of bond vectors.20

In their key contribution, Bemis and Murcko focused on a systematic analysis of “drug shapes”,16 approximated on the basis of topological variations among molecular graphs. For this purpose, BM scaffolds were originally introduced. A set of ∼5000 drugs was analyzed, and it was determined that ∼25% of these drugs were represented by the 42 most frequently occurring scaffolds and ∼50% by the 32 most frequent CSKs. Hence, taken together, these findings indicated that there was only limited diversity among drug topologies/shapes. Lipkus et al. applied the scaffold concept to synthetic compounds and showed that ∼25 million organic molecules were represented by ∼2.5 million BM scaffolds and ∼800 000 CSKs.21 Hence, CSK diversity, and thus topological diversity, was also limited among compounds from general organic chemistry.

The BM scaffold definition was adapted and further extended to systematically extract scaffolds from compound sets and generate scaffold hierarchies on the basis of structural relationships. Key developments included the HierS algorithm22 and the conceptually related scaffold tree (ST) methodology.23 In both cases, the original BM scaffold definition was modified by including exocyclic double bonds and double bonded substituents attached to the linkers. HierS systematically removes fused rings from scaffolds and generates all smaller ring fragments and their combinations from them. This gives rise to the formation of networks of scaffolds with decreasing size and directed edges from the original scaffold and the compound from which it originates (thereby establishing a scaffold hierarchy). ST also begins with scaffolds and decomposes them according to predefined chemical rules along tree branches (establishing structural relationships) until only an individual ring remains. In this case, no scaffold networks are generated. Figure 4 shows a prototypic ST for three compounds. HierS and ST have provided a basis for advanced structural classification by mapping of compounds to scaffold hierarchies. A treelike structure, from which ST evolved, was also used to generate and organize scaffolds from natural products,24 hence providing a knowledge base for natural product-based diversity-oriented synthesis.25

Figure 4. Scaffold tree. Shown is a prototypic ST for three compounds (A−C from Figure 2) sharing the same BM scaffold. Rings are iteratively removed from scaffolds (drawn in bold) by applying predefined rules until only a single ring remains.

For a systematic analysis of ring substructures in drugs, the BM scaffold definition was modified by retaining all exocyclic carbonyl, thiocarbonyl, imine, sulfonyl, and sulfinyl ring substituents,26 thereby further increasing the chemical relevance of generated ring components. These scaffolds were isolated from drugs and subjected to rule-based decomposition into fused ring systems and individual rings. From a drug set, 1197 unique scaffolds and 351 ring systems were obtained and 901 scaffolds and 204 ring systems were found to occur only once in a drug. Moreover, only less than 1% of newly approved drugs contained more than one previously unobserved ring system. By contrast, 83 of the top 100 most frequently observed ring systems originated from drugs released before 1983. It is also worth noting that ∼40% of all drugs did not contain any sp3 carbon in a ring system. The study revealed that ring systems were recurrent in drugs and that less than a third of new drugs contained previously unobserved rings.26

Scaffolds not only can be isolated from known compounds but also can be designed on the basis of chemical rules. For example, in another key investigation focusing on ring systems, nearly 25 000 heteroaromatic ring systems were enumerated and only less than 2000 of these rings were detected in known compounds.27 Synthetic feasibility of the remaining heteroaromatic rings was investigated, and it was estimated that ∼3000 of these rings could be synthesized,27 which further expanded synthetically accessible chemical space around heterocycles.

Going beyond structural investigations, the association of scaffolds with biological activities was explored. For example, the knowledge-based concept of privileged substructures was investigated systematically by generating a compound-based target pair network, isolating BM scaffolds, and associating scaffolds with compound activities.28 In the network, 259 human targets were found to form 18 target communities (i.e., separate clusters of targets) and more than 200 BM scaffolds were identified that were contained in at least five compounds with exclusive activity against targets in a given community. These so-called “community-selective scaffolds” were structurally diverse, yielding ∼150 different CSKs, and only 11 of them were detected in approved drugs,28 indicating that many candidate scaffolds were available for target family directed drug design.

Moreover, scaffold-based data structures can also be used for activity prediction. Since the ST structure is generated by systematic decomposition of BM scaffolds, it typically includes “virtual scaffolds” that are not present in source compounds. Scaffolds were generated from different compound activity classes, and virtual scaffolds were predicted to be associated with the same activity as their nearest neighbors in the scaffold hierarchy.29,30 Then, test compounds with unknown activity were mapped to these virtual scaffolds to identify candidates for experimental evaluation. Alternatively, new compounds might be synthesized on the basis of prioritized virtual scaffolds. Following this approach, new active compounds were successfully predicted in a number of instances.29−31

To assess the probability of successful compound activity predictions through scaffold hopping, the diversity of scaffolds from active compounds was systematically determined.32 Accordingly, nearly 500 target-based compound sets were assembled that contained a minimum of five compounds with at least 1 μM potency. BM scaffolds and CSKs were extracted. By definition, each CSK represented a set of one or more BM scaffolds that were topologically distinct from all others. For almost 400 target proteins, between 5 and 99 different scaffolds were identified, and for 28 targets, 100 or more were identified.32 Thus, for the majority of targets, available active compounds were characterized by a high degree of scaffold diversity, including many topologically distinct scaffolds. Thus, identifying additional “active scaffolds” for these targets through virtual screening might not be very difficult.

RECENT DEVELOPMENTS

In the following, recent advances of scaffold-based approaches are discussed that fall into different areas of computational medicinal chemistry.

Description of Activity Cliffs. The understanding of activity cliffs has been further refined on the basis of scaffolds and CSKs. Activity cliffs are generally defined as pairs or groups of structurally similar compounds or analogs with large potency variations.33 They are of interest in medicinal chemistry because they often reveal SAR determinants. A data structure termed “activity ridge” was introduced that consists of multiple and overlapping activity cliffs formed by a series of compounds in which each compound participates in cliff formation.33 To search for activity ridges, compounds yielding the same CSK were considered to meet the similarity criterion for cliff formation and CSK-based compound subsets were analyzed. A total of 125 activity ridges were identified in 71 target sets (compound activity classes),34 which provided a first view of coordinated activity cliffs, i.e., overlapping cliffs formed by series of compounds. The approach was further extended through the identification of multitarget activity ridges in kinase-inhibitor assay matrices (high-dimensional kinase inhibitor data sets).35 Multitarget activity ridges were represented using an annotated scaffold−target matrix in which cells contained subsets of compounds with activity against individual kinases.

In addition, a new structural classification scheme for activity cliffs was introduced on the basis of CSKs.36 Following this approach, two compounds yielding the same CSK were regarded as similar and BM scaffolds, chirality, R-group patterns, and molecular topology were compared. Accordingly, five different categories of activity cliffs were established including “chirality cliffs” (i.e., compounds with a large potency difference only distinguished by stereochemistry), “topology cliffs” (distinguished by varying positions of the same set of substituents in a conserved scaffold), “R-group cliffs” (different substituents of the same scaffold), “scaffold cliffs” (conserved substituents at corresponding positions in different scaffolds), and “scaffold/topology cliffs” (different scaffolds, substituents at varying positions). From a chemical perspective, this classification scheme was rather intuitive. In a systematic survey of active compounds, R-group cliffs were most frequently detected.36

In another study, scaffolds from active compounds were compared in a pairwise manner to analyze activity cliff formation and scaffold hopping.37 Compounds having the same specific activity involved in scaffold hopping were required to contain topologically distinct scaffolds but have potency values within the same order of magnitude, whereas compounds involved in activity cliffs were required to share the same scaffold but have an at least 100-fold difference in potency. A systematic search was carried out for compounds involved in scaffold hopping and/or activity cliff formation. Results obtained for compound data sets covering more than 300 human targets revealed clear trends. If scaffolds represented multiple but fewer than 10 active compounds, nearly 90% of all scaffolds were exclusively involved in hopping events. With increasing compound coverage, the fraction of scaffolds involved in both scaffold hopping and activity cliff formation significantly increased to more than 50%. However, 40% of scaffolds representing large numbers of active compounds continued to be exclusively involved in scaffold hopping. More than 200 scaffolds with broad target coverage were identified that consistently represented potent compounds and yielded an abundance of scaffold hops in the low-nanomolar range.37 Thus, these scaffolds provided attractive templates for compound design. Taken together, these findings also corroborated earlier results indicating that scaffold hops frequently occurred among ligands of many targets.32

Scaffold Hopping Approaches. A variety of scaffold hopping methods have been reported. A “fragment hopping” strategy was implemented for a prospective case study reporting the discovery of a novel chemical series of PIM-1 kinase inhibitors.38 From database compounds, so-called “Onion0” and “Onion1” fragment databases were generated as scaffold sources applying a fragmentation protocol. By use of bound conformations of known inhibitors as a reference, ligand-based virtual screening was performed to identify scaffolds from the Onion databases having high three-dimensional similarity to reference structures. In addition, scaffolds displaying predefined pharmacophore features were prioritized and docking calculations were carried out to identify scaffolds maintaining prominent interactions with the target. In this multilayered computational search, triazolopyridine was identified as a potential alternative of the imidazopyridazine scaffold of known inhibitors. Two compounds were designed and synthesized that replaced the imidazopyridazine with the triazolopyridine scaffold but retained the same substitution patterns as two known PIM-1 inhibitors. These compounds were found to be active and displayed improved druglike properties and target selectivity.38

A methodology for scaffold hopping by fragment replacement was also introduced.39 Scaffolds were generated by combinatorial fragmentation of acyclic bonds of known compounds according to predefined rules. For these graph-based scaffolds, conformations were sampled, leading to a searchable scaffold conformer database. A novel indexing scheme was developed on the basis of the relative geometry of attachment vectors, which enabled fast pruning of the search database. In addition, a scaffold shape descriptor was designed for querying scaffolds with a single attachment vector. The program was successfully applied to retrieve known bioisosteric replacement scaffolds from a large search database.39

Libraries of Scaffolds and Scaffold Diversity. Compound activity data were assembled from various database sources for 1654 human protein targets.40 On average, each target was associated with 964 compounds. Targets were ranked according to the number of compounds, and the top 278 targets were found to cover ∼90% of all compounds. Additional rankings were generated on the basis of scaffolds assigned to five different levels of chemical feasibility and diversity, yielding the currently most extensive ordering of human targets on the basis of compound activity data and classified scaffolds.40

Scaffolds were also classified according to target and target family relationships.41 For this purpose, BM scaffolds representing compounds active against single targets, multiple targets belonging to the same target family, or targets belonging to different families were systematically identified.
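The three scaffold categories named above (single-target, single-family, and multifamily) can be expressed as a short grouping step. The following Python sketch is illustrative only: it assumes that activity annotations are available as hypothetical (scaffold, target) records together with a target-to-family lookup. The function name `classify_scaffolds`, the category labels, and the toy records are assumptions and not data or code from the cited study.

```python
from collections import defaultdict


def classify_scaffolds(activity_records, target_family):
    """Assign each scaffold to one of three target-coverage categories.

    activity_records: iterable of (scaffold, target) tuples, one per
    compound activity annotation.
    target_family: dict mapping each target to its target family.
    """
    # Collect the set of targets for which each scaffold has active compounds.
    targets_by_scaffold = defaultdict(set)
    for scaffold, target in activity_records:
        targets_by_scaffold[scaffold].add(target)

    categories = {}
    for scaffold, targets in targets_by_scaffold.items():
        families = {target_family[t] for t in targets}
        if len(targets) == 1:
            categories[scaffold] = "single-target"
        elif len(families) == 1:
            categories[scaffold] = "single-family"
        else:
            categories[scaffold] = "multi-family"
    return categories


# Hypothetical toy annotations (scaffold labels and assignments are made up).
toy_family = {"CDK2": "kinase", "PIM1": "kinase", "ADRB2": "GPCR"}
toy_records = [
    ("scaffold_1", "CDK2"),
    ("scaffold_2", "CDK2"),
    ("scaffold_2", "PIM1"),
    ("scaffold_3", "CDK2"),
    ("scaffold_3", "ADRB2"),
]

toy_categories = classify_scaffolds(toy_records, toy_family)
```

In the toy data, `scaffold_1` is annotated with one target, `scaffold_2` with two kinases, and `scaffold_3` with a kinase and a GPCR, so they fall into the three categories in that order. In practice, the outcome depends on which potency measurements are admitted when the records are assembled.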
On the basis of Ki or IC50 values, largely nonoverlapping sets of scaffolds belonging to these three categories were obtained. However, compared to Ki data, the use of more abundant IC50 values resulted in many more single- and multifamily scaffolds.41 In a large-scale application of the ST data structure, more than 180 000 natural products were systematically decomposed.42 A series of filtering criteria was applied, yielding ∼110 000 natural product-derived fragments, which were subsequently clustered. Centroid compounds from 2000 clusters were assembled to generate a library of structurally diverse fragments from natural products that were rich in sp3 centers. Chemical space coverage of this library differed from reference fragment libraries derived from synthetic compounds. To demonstrate the utility of the natural product-derived fragment library, a subset of fragments that were commercially available or synthetically accessible was obtained and used for biochemical and crystallographic screening against p38α MAP kinase and different phosphatases. Several novel inhibitors were identified including a type III (allosteric) p38α inhibitor.42 New studies on drug scaffolds have also been reported. For example, 700 unique scaffolds were extracted from 1241 approved small molecule drugs. Of these scaffolds, 552 were found to represent only a single drug.43 Furthermore, a subset of 221 drug scaffolds was not detected in currently available bioactive compounds. These scaffolds were designated “drugunique scaffolds”. They were found to display only very limited structural relationships to scaffolds from bioactive compounds.43 Furthermore, various structural relationships were explored for scaffolds from approved small molecule drugs.44

Drug scaffolds frequently showed substructure relationships or shared the same topology. By contrast, only a small number of drug scaffolds were involved in retrosynthetic relationships. The majority of structurally related drug scaffolds represented drugs that had overlapping sets of targets but displayed clear differences in their degree of promiscuity. When structural and activity profile relationships of scaffolds from drugs and other bioactive compounds were compared, systematic differences were detected.44 In the context of this study, the generation of “consensus activity profiles” was introduced as an approach for a qualitative and quantitative assessment of activity similarity of structurally related drugs represented by the same scaffold. Other studies have focused on a comprehensive assessment of scaffolds across different targets. For example, BM scaffolds and CSKs were extracted from all currently available kinase inhibitors and organized taking activity data, structural relationships, and retrosynthetic criteria into account.45 Scaffold coverage was found to vary greatly across the human kinome, and many scaffolds representing compounds with distinct activity profiles were identified. Scaffolds exclusively representing highly potent inhibitors were detected as well as structurally very similar scaffolds of compounds with very different degrees of promiscuity.45 These findings, obtained on the basis of currently available data, have revised the view that most kinase inhibitors might be promiscuous. Moreover, a comprehensive analysis of scaffolds and CSKs from all currently available bioactive compounds was carried out, following the compound−scaffold−CSK hierarchy.46 The analysis was focused on scaffolds and CSKs that were distinct on the basis of topological criteria, lack of substructure relationships, and molecular size. For 315 targets, large numbers of unique scaffold−CSK combinations were identified. 
In many instances, such combinations represented highly potent compounds. A limited number of scaffolds were also detected that were contained in highly potent compounds with activity against multiple targets. These scaffolds were termed “promiscuous scaffolds”. Taken together, the results of this study provided firm insights that many pharmaceutically relevant proteins are excellent targets for small molecules, given that for many of these targets structurally distinct and highly potent compounds are already available. The analysis of promiscuous scaffolds also showed that many compounds with multitarget activity are highly potent against their targets. Finally, we note that the recent study by Taylor et al. on ring systems in drugs, as discussed above,26 also belongs to the spectrum of scaffold library and diversity investigations.

Assessment of Scaffold Similarity and Dissimilarity. Similarity relationships between scaffolds have been explored in various ways. As mentioned above, a function was introduced to quantify the “chemical distance” (dissimilarity) between any pair of scaffolds.19 The development of this function was primarily motivated by the need to evaluate the degree of difficulty associated with detecting alternative “active scaffolds” through virtual screening. Therefore, scaffolds of different composition and topology were subjected to molecular editing procedures that abstracted from original scaffold structures in a defined manner until compositional and topological equivalence was established. Pairs of scaffolds were then transformed into one-dimensional atom sequences that were aligned using approaches adapted from biological sequence comparison. From best scoring atom sequence alignments, interscaffold distances were derived (ranging from 0 to 1).19 On the basis of

DOI: 10.1021/acs.jmedchem.5b01746 J. Med. Chem. XXXX, XXX, XXX−XXX

Figure 5. Scaffold network. Shown is a scaffold network for compound A from Figure 2. The network is obtained by combining the ST and HierS organization schemes described in the text. The BM scaffold is assigned to level 4 and single-ring scaffolds are assigned to level 1 (bottom). Scaffolds comprising the ST are shown on a blue background. In addition, HierS generates all possible coherent scaffolds at each level through systematic ring decomposition, which results in a scaffold network. Scaffolds are drawn in bold.

systematic scaffold comparisons, distance threshold values for close (0.34, “similar”) and remote (0.74, “dissimilar”) structural relationships between scaffolds were determined. When analyzing substitution (“connection”) points of scaffolds from different sources, it was found that the majority of scaffolds contained only one or two connection points to which substituents were bonded (and removed during scaffold generation).47 Only 27.4% and 24.8% of scaffolds from drugs and other bioactive compounds, respectively, contained three to four connection points. A scaffold replacement database was generated from bioactive compounds including 4834 scaffolds with three connection points and 2516 scaffolds with four connection points. Similarity of query and database scaffolds was determined using molecular connectivity descriptors to identify bioisosteric scaffold replacements.47 An earlier grid-based methodology utilized both geometric and shape information for scaffold similarity searching (and scaffold hopping).48
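To make the idea of a sequence-based interscaffold distance with values between 0 and 1 more concrete, the following Python sketch uses a normalized Levenshtein (edit) distance between two atom strings. This is a deliberately simple stand-in and not the alignment and scoring scheme of ref 19; the function names, the normalization by the longer sequence, and the inclusive handling of the 0.34 and 0.74 thresholds are assumptions.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def scaffold_distance(seq_a, seq_b):
    """Edit distance normalized by the longer sequence, giving a value in [0, 1]."""
    longest = max(len(seq_a), len(seq_b))
    if longest == 0:
        return 0.0
    return edit_distance(seq_a, seq_b) / longest


def relationship(distance, close=0.34, remote=0.74):
    """Label a distance using the thresholds quoted in the text (assumed inclusive)."""
    if distance <= close:
        return "similar"
    if distance >= remote:
        return "dissimilar"
    return "intermediate"


# Hypothetical one-dimensional atom sequences of two edited scaffolds.
d = scaffold_distance("CCNCC", "CCOCC")
label = relationship(d)
```

With these toy sequences, one substitution over five positions gives a distance of 0.2, which falls below the threshold for close relationships.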

Scaffold-based virtual screening strategies have also been devised. For example, two-dimensional (fingerprint) and three-dimensional (shape) similarity searches using the Tanimoto coefficient were carried out for a query scaffold derived from a known potent TTK kinase inhibitor and large numbers of database scaffolds.49 In addition, scaffold-based similarity searching was compared to whole-molecule search calculations. In virtual screens for kinase inhibitors, hit rates of whole-molecule searching were higher than those of scaffold searching. However, whole-molecule searches preferentially identified structurally similar hits, whereas scaffold searches displayed the tendency to detect more scaffold hops. Screening hits identified by scaffold searching were experimentally confirmed in assays and studied by X-ray structure analysis, revealing compound binding modes.49 Scaffold similarity was also assessed on the basis of substructures and structural patterns derived from scaffolds, yielding “scaffold keys”,50 which define an additional layer of the molecular hierarchy. A set of 32 scaffold keys was


fingerprint similarity. Tree maps provide a view of “scaffold space” including highly populated scaffold regions as well as singleton scaffolds.55 Furthermore, the “molecule cloud” was introduced as a visualization of large collections of molecules.56 It generates dense mosaic-like or play-card-like views in which enlarged structure images indicate recurrent scaffolds (or substituents). Bioactivity information can be added to the molecule cloud through color-coding. The method provides immediate visual access to the most prominent structural features contained in a data set. Scaffold keys,50 as discussed above, can also be used to visualize large compound sets. In the “scaffold map”, scaffolds are ordered according to the first key (i.e., scaffold size) along the horizontal axis and then arranged along the vertical axis on the basis of all remaining keys.50 Scaffolds in the map can be color-coded according to different properties such as, for example, preferred target classes. Other visualization methods have focused on the exploration of SAR patterns in scaffold-based representations of compound data sets. For example, LASSO was introduced as a graphical method for canonical structural organization following the compound−scaffold−skeleton hierarchy.57 Three different structural levels were encoded including compounds represented by the same scaffold, scaffolds yielding the same CSK, and CSKs forming substructure relationships. The graph was organized in different layers accounting for CSKs with increasing numbers of rings and their substructure relationships. CSK and corresponding scaffold information was represented in embedded nodes color-coded according to the potency values of associated compounds. Navigating this structural hierarchy enabled “forward−backward” SAR exploration. 
The graph revealed compound series with interpretable SAR information and scaffolds whose pie-chart-like nodes represented compounds with characteristic SAR patterns.57 Figure 6 shows an exemplary LASSO graph representation that illustrates these features. Additional scaffold-based visualization methods have been developed to directly focus SAR analysis on analog series. For example, the “directed R-group combination graph” was designed to extract analog series from data sets on the basis of MCSs and to hierarchically organize series according to R-groups and their combinations.58 Nodes in a graph correspond to all R-group combinations derived from analogs and subsets of these R-group combinations, which are annotated with potency information. From the hierarchical organization of R-group combinations and corresponding analog subsets, characteristic subgraphs emerge that reveal SAR patterns.58 A different type of visualization tool was developed based upon ligand−receptor interaction types and termed “biologically relevant chemical space navigator”.59 The navigator was designed to explore series of compounds sharing a given scaffold (provided as input). R-groups of user-defined scaffolds were analyzed using a ligand−receptor interaction fingerprint. For visualization, an interactive Web-based tool was generated for scaffold-based compound series using heat maps representing different types of ligand−receptor interactions and substitution sites in scaffolds. For a given scaffold, R-group patterns at a single site or maximally two sites were analyzed in heat maps, enabling the selection of representative compounds from series or the generation of compounds with nonexplored substituent combinations.59 Finally, AnalogExplorer was introduced as an MCS-based method for graphical deconvolution of large series of analogs

generated on the basis of chemical knowledge and ranked by proposed medicinal chemistry relevance. The keys were used to order scaffold populations. This was accomplished by iterative sorting of scaffolds according to ranked keys they contained. Furthermore, scaffold keys were also encoded in a fingerprint format and used for similarity searching to identify bioisosteric replacements. When compared to conventional whole-molecule fingerprint similarity searching, scaffold keys further improved the accuracy of the search calculations.50 A different way of accounting for scaffold similarity or dissimilarity is the assessment of chemical changes that differentiate topologically equivalent scaffolds (i.e., scaffolds yielding the same CSK). A computational method was introduced to determine chemical changes that distinguished between drug and bioactive scaffolds with conserved topology.51 Chemical modifications were assigned to different categories such as different types of atom replacements or bond order modifications in rings or linker fragments of scaffolds, and the most frequently occurring changes were determined. Small chemical modifications were often found to substantially change scaffold activity profiles or render them completely distinct. Among drug scaffolds with conserved topology, small chemical changes were often accompanied by large differences in (target) promiscuity.51 Scaffolds can also be used to directly focus molecular similarity calculations on core structure contributions. For this purpose, a variant of the Tanimoto coefficient was introduced to quantify compound similarity on the basis of pairwise MCS calculations.52 Then the number of bonds contained in the MCS was related to the number of bonds present in two molecules under comparison. Focusing on bonds instead of non-hydrogen atoms for MCS-based Tanimoto similarity calculations resulted in a relative increase in similarity values for MCSs with high ring structure content. 
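As a worked illustration of the MCS-based Tanimoto variant described above, the Python sketch below applies the standard Tanimoto form to simple counts: the number of bonds (or atoms) in the MCS divided by the number in the union of the two molecules. The exact formulation in ref 52 may differ in detail; the function name and the example counts are assumptions chosen for illustration.

```python
def tanimoto_from_counts(count_a, count_b, count_common):
    """Tanimoto coefficient computed from feature counts.

    count_a, count_b: number of bonds (or atoms) in molecules A and B.
    count_common: number of bonds (or atoms) in their MCS.
    """
    if count_common > min(count_a, count_b):
        raise ValueError("the MCS cannot be larger than either molecule")
    union = count_a + count_b - count_common
    if union == 0:
        return 0.0
    return count_common / union


# Illustrative counts for two molecules sharing a fused bicyclic core with
# 10 non-hydrogen atoms and 11 bonds (a ring closure adds a bond but no atom).
atom_tc = tanimoto_from_counts(11, 12, 10)   # atom-based: 10 / 13
bond_tc = tanimoto_from_counts(12, 13, 11)   # bond-based: 11 / 14
```

Because each ring closure contributes a bond without adding an atom, the bond-based value is slightly larger than the atom-based value for this ring-rich MCS, which is consistent with the trend noted in the text.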
This similarity metric was applied to generate chemical space networks on the basis of pairwise compound similarity relationships as a representation of biologically relevant chemical space.52 Finally, reminiscent of the scaffold key approach discussed above, a graph-based (two-dimensional) scaffold fingerprint was designed to encode properties of ring systems including, among others, topology, shape, and chirality information as well as pharmacophore features and sp3 carbon content.53 This fingerprint representation was applied to characterize more than 157 000 unique scaffolds that were assembled from compounds at different pharmaceutical development stages. It was also used for scaffold similarity searching to identify scaffold hops and bioisosteric replacements. In test calculations, this scaffold property fingerprint was found to further increase the accuracy of conventional fingerprint similarity searching or shape searching.53

Visualization. Methodologies for the visualization of compound collections or SAR information are attracting increasing interest in medicinal chemistry. A variety of scaffold-based visualization approaches have been introduced in recent years. One way to visualize different types of relationships between scaffolds is the design of scaffold networks, which can be generated in various ways. For example, by combination of the HierS and ST data structures, directed scaffold networks have been constructed and used to analyze screening data in order to identify scaffolds that preferentially yield hits.54 Figure 5 shows a prototypic HierS/ST scaffold network. In addition, from STs, first-level scaffolds were isolated and represented in “tree maps” in which scaffolds were clustered on the basis of calculated


with complex substitution patterns and the prioritization of subsets of analogs with varying substituents at specific site(s).60 Following MCS-based identification of analog series, they were subjected to canonical R-group decomposition. AnalogExplorer was designed to consist of three graphical components including the (directed acyclic) “complete graph”, the “reduced graph”, and “R-group trees”. The “complete graph” represents all possible substitution sites and site combinations for a series following R-group decomposition including explored and unexplored site combinations. Thus, it essentially maps the available “analog space” around a given scaffold. In addition, the “reduced graph” only captures available analogs, their structural relationships, and SAR patterns. Furthermore, for a given scaffold (representing a series), “R-group trees” provide a detailed view of subsets of analogs with varying R-groups at defined substitution sites or site combinations.60 Figure 7 shows the AnalogExplorer graph components. The approach is equally suitable for the exploration of large individual analog series and the comparative analysis of structurally distinct or related series contained in heterogeneous compound data sets. AnalogExplorer and AnalogExplorer2,61 a stereochemistry-sensitive extension of the program, have been made freely available.61 In addition, there are other publicly available computational tools for the generation, analysis, and visualization of scaffolds including, for example, Scaffold Hunter29,30 and a program to generate scaffold networks.54
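The distinction between explored and unexplored site combinations, which underlies the “complete graph” described above, can be sketched in a few lines of Python. The sketch only enumerates nodes (site combinations) and omits edges, potency annotation, and R-group trees; it is not the AnalogExplorer implementation, and the function names, site labels, and toy analog data are assumptions.

```python
from itertools import combinations


def site_combinations(sites):
    """All non-empty combinations of substitution sites, smallest first."""
    ordered = sorted(sites)
    return [
        frozenset(combo)
        for size in range(1, len(ordered) + 1)
        for combo in combinations(ordered, size)
    ]


def split_by_coverage(sites, analog_sites):
    """Separate site combinations into explored and unexplored ones.

    analog_sites: for each analog, the set of sites at which it carries
    a substituent (hypothetical output of an R-group decomposition).
    """
    explored_sets = {frozenset(s) for s in analog_sites}
    explored, unexplored = [], []
    for combo in site_combinations(sites):
        (explored if combo in explored_sets else unexplored).append(combo)
    return explored, unexplored


# Toy series with three substitution sites and four analogs.
sites = ["R1", "R2", "R3"]
analog_sites = [{"R1"}, {"R1", "R2"}, {"R3"}, {"R1"}]
explored, unexplored = split_by_coverage(sites, analog_sites)
```

For three sites there are seven possible combinations; in this toy series three are covered by analogs and four remain unexplored, which corresponds to the kind of gap a complete graph is meant to expose.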



CURRENT SCAFFOLD KNOWLEDGE BASE Many of the recent studies discussed above used data from ChEMBL,62 the major public repository for compounds and activity data from the medicinal chemistry and patent literature. Applying different scaffold definitions discussed herein, we have isolated all currently available scaffolds from ChEMBL (release 20), providing a large knowledge base of scaffolds representing bioactive compounds. Only high-confidence activity data63 were selected, and (assay-dependent) IC50 measurements and (assay-independent) equilibrium constants (Ki values) were separately analyzed. A total of 49 038 and 117 122 compounds with Ki and IC50 values for human targets were selected, respectively, separated into two different data sets, and organized into different target-based sets of active compounds (target sets, activity classes). A total of 18 720 and 44 127 unique BM scaffolds were isolated from the Ki and IC50 value-based target sets, respectively, as reported in Table 1. If scaffolds originating from more than one activity class were separately counted, 35 872 (Ki) and 74 379 (IC50) “target-based” BM scaffolds were obtained. The BM scaffolds yielded 8844 and 17 393 unique CSKs as well as 23 056 and 49 216 target-based CSKs for the Ki and IC50 data sets, respectively. In addition, as alternative scaffold representations, two types of MMP cores were generated, as also reported in Table 1. Through fragmentation of a single exocyclic bond per iteration, only coherent MMP cores were obtained. A standard single-bond fragmentation scheme12 yielded 25 931 (Ki) and 55 420 (IC50) unique MMP cores, hence exceeding the number of BM scaffolds, as expected. In addition, retrosynthetic bond fragmentation14 produced 11 632 and 24 219 unique as well as 19 040 and 32 382 target-based retrosynthetic MMP cores for the Ki and IC50 data sets, respectively.
Thus, regardless of the scaffold definition, large numbers of scaffolds were obtained that represent compounds with high-confidence data for the current spectrum of targets.
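The difference between “unique” and “target-based” counts used in this section can be made explicit with a small Python sketch: a scaffold found in several target sets is counted once in the unique total but once per target set in the target-based total. The function name, the input format, and the toy data are assumptions for illustration; the sketch says nothing about how the scaffolds themselves were generated.

```python
def count_scaffolds(target_sets):
    """Return (unique, target_based) scaffold counts.

    target_sets: dict mapping a target to the scaffolds of its active
    compounds (duplicates within a target set are allowed).
    """
    # Collapse duplicates within each target set first.
    per_target = {target: set(scaffolds) for target, scaffolds in target_sets.items()}
    # Target-based: a scaffold is counted once for every target set it occurs in.
    target_based = sum(len(scaffolds) for scaffolds in per_target.values())
    # Unique: each scaffold is counted once across all target sets.
    unique = len(set().union(*per_target.values()))
    return unique, target_based


# Toy example: scaffold "s2" occurs in both target sets.
toy_sets = {"target_1": ["s1", "s2", "s2"], "target_2": ["s2", "s3"]}
unique, target_based = count_scaffolds(toy_sets)
```

The same counting applies to CSKs and MMP cores, and the target-based count can never be smaller than the unique count.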

Figure 6. LASSO graph. (a) Shown is an exemplary LASSO graph for a set of 246 serotonin receptor 7 ligands with available Ki values. Nodes in different layers represent CSKs with increasing numbers of rings and edges substructure relationships between them. Three SAR patterns are labeled (1−3). In (b), these patterns are depicted in detail with associated CSKs, BM scaffolds, and representative compounds. Scaffolds are shown on a blue background and compounds on a purple background. For each CSK, the number of corresponding BM scaffolds is reported in parentheses, and for each scaffold, the number of compounds it represents is reported. The pKi values of compounds are also given. For pattern 3, rings that distinguish CSKs are colored red. The figure was adapted from ref 57.


Figure 7. AnalogExplorer graphs. AnalogExplorer graph components are shown for a series of 25 compounds active against the serotonin 7 receptor. Graph components include the complete graph (top left), reduced graph (top right), and two exemplary R-group trees (bottom). For each R-group tree node, the structure of the substituent is provided and pKi values of the corresponding compounds (leaf nodes) are reported. Numbers next to nodes represent sites and site combinations and layers indicate site combinations of increasing size.

Table 1. Scaffold Statistics^a

ChEMBL, release 20
                                     Ki                        IC50
number of                   target-based   unique     target-based   unique
BM scaffolds                    35872       18720         74379       44127
CSKs                            23056        8844         49216       17393
MMP cores                       42104       25931         73616       55420
retrosynthetic MMP cores        19040       11632         32382       24219

^a The number of target-based and unique BM scaffolds and CSKs is reported for the Ki and IC50 value-based data sets, respectively. In addition, the number of target-based and unique MMP and retrosynthetic MMP cores is provided.

Figure 8a and Figure 8b report the distribution of BM scaffolds and CSKs for the 434 (Ki) and 912 (IC50) target sets with more than 10 compounds, respectively. The distributions were similar. The majority of target sets contained more than 10 scaffolds or CSKs. On average, a compound activity class yielded nearly 81 BM scaffolds and 52 CSKs. The median number of BM scaffolds was 24 and 31 for the Ki and IC50 data sets, respectively. In addition, the ratio of compounds to BM scaffolds was calculated for all activity classes. As reported in Figure 8c, the majority of Ki and IC50 value-based target sets yielded compound-to-scaffold ratios of 1−3. Thus, most scaffolds represented only limited numbers of compounds, indicating a high degree of scaffold diversity across current target sets. However, because structural differences between BM scaffolds might often be only small, as discussed above,


PERSPECTIVE The scaffold concept is intuitive and widely applied, in particular, to represent core structures of active compounds. Medicinal and computational chemists view scaffolds often differently. For analyzing individual compound series and designing analogs, no formal scaffold definitions are required and medicinal chemistry knowledge is typically applied. The situation changes when scaffolds are used to organize compound data sets or when scaffolds are systematically isolated and compared. For computational studies, formal and consistently applied scaffold definitions are essential. Especially the introduction of the compound−scaffold hierarchy, with BM scaffolds being the central component, has paved the way for many computational investigations. Moreover, different types of scaffold hierarchies have been created for structural organization. The ST structure is a prime example for a scaffold hierarchy, which has also been successfully adapted for activity prediction. Given their consistent and straightforward generation, BM scaffolds have for long dominated computational analysis. However, other emerging scaffold representations are equally intuitive and efficient such as MMP cores and consensus scaffolds. Regardless of whether scaffolds are represented as BM scaffolds, MCSs, or MMP cores, computationally generated scaffolds often have an artificial touch when viewed by medicinal chemists applying synthetic criteria. Accordingly, the inclusion of reaction information into scaffold formalisms will be an important step forward to further bridge between computational and medicinal chemistry and increase the attractiveness of computationally generated scaffolds for medicinal chemistry applications. First attempts in this direction have been made, and we currently favor MMP cores generated on the basis of retrosynthetic molecular fragmentation rules as a preferred basis for scaffold assessment. 
Major applications of the scaffold concept include structural classification of compound collections, compound data mining, attempts to associate scaffolds and molecular hierarchies with biological activities, and activity prediction. While structural classification of compound sets has been one of the original applications of scaffold methods, there have not been many new developments in this area in recent years, with the exception of visualization approaches. This is likely to change again before long, given the advent of the “big data” era in medicinal chemistry.66,67 Being confronted with unprecedentedly large (and rapidly growing) amounts of increasingly complex compound structure and activity data, there will be a need to further advance canonical data classification and organization schemes, especially for database implementation and utilization. More likely than not, this will lead to a renaissance of classification and data mining methods employing scaffolds and molecular hierarchies. It is also anticipated that even further increasing emphasis will be put on the development of new visualization techniques for large data sets and large-scale SAR exploration. Current challenges in chemical optimization already go much beyond the capacity of data display in R-group tables or similar data representations. For example, late-stage lead optimization sets used to narrow down and ultimately select preclinical or clinical candidates often contain thousands of analogs originating from multiple series. In such cases, subjective assessment quickly approaches its limits and computational methods to visualize and analyze data become essential. For these applications, graphical representations that abstract from structural data and

Figure 8. Scaffold and CSK distribution over target sets. The percentages of target sets with at least 10 compounds containing increasing numbers of (a) BM scaffolds and (b) CSKs are reported for the Ki (blue) and IC50 (orange) value-based data sets. In addition, the corresponding distributions of (c) compound-to-scaffold and (d) scaffold-to-CSK ratios are represented in box plots. Each box plot reports the smallest ratio (bottom line), lower quartile (lower boundary of the box), median (horizontal line), upper quartile (upper boundary of the box), and the largest ratio (top line).

scaffold diversity does not necessarily correlate with global structural diversity, as is occasionally proposed. Therefore, scaffold-to-CSK ratios were also determined, as reported in Figure 8d. Comparable distributions were observed for the Ki and IC50 data sets. Most of the target sets yielded very low scaffold-to-CSK ratios. Only 34 (Ki) and 57 (IC50) target sets showed scaffold-to-CSK ratios greater than 2. Thus, a high degree of topological diversity was detected for scaffolds, which is a more reliable indicator of global structural diversity than scaffold diversity alone (as assessed on the basis of compound-to-scaffold ratios). All MMPs, BM scaffolds, and CSKs were calculated using in-house implementations that utilize the OpenEye toolkit.64 The different scaffold sets reported herein are made freely available in an organized form via an open access deposition.65
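The compound-to-scaffold and scaffold-to-CSK ratios used as diversity indicators in this section are simple quotients of distinct counts per target set. The following Python sketch shows the arithmetic under the assumption that each compound has already been assigned a BM scaffold and a CSK; the function name, the triple-based input format, and the toy data are hypothetical and unrelated to the in-house implementations mentioned above.

```python
def diversity_ratios(compounds):
    """Return (compound-to-scaffold ratio, scaffold-to-CSK ratio) for one target set.

    compounds: iterable of (compound_id, bm_scaffold, csk) triples.
    """
    compounds = list(compounds)
    compound_ids = {c for c, _, _ in compounds}
    scaffolds = {s for _, s, _ in compounds}
    csks = {k for _, _, k in compounds}
    if not scaffolds:
        raise ValueError("empty target set")
    return len(compound_ids) / len(scaffolds), len(scaffolds) / len(csks)


# Toy target set: 6 compounds, 3 BM scaffolds, 2 CSKs.
toy_target_set = [
    ("c1", "s1", "k1"), ("c2", "s1", "k1"), ("c3", "s1", "k1"),
    ("c4", "s2", "k1"), ("c5", "s2", "k1"),
    ("c6", "s3", "k2"),
]
cpd_to_scaffold, scaffold_to_csk = diversity_ratios(toy_target_set)
```

A compound-to-scaffold ratio close to 1 indicates that nearly every compound has its own scaffold, whereas a scaffold-to-CSK ratio close to 1 indicates that the scaffolds are also topologically distinct.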




enable annotations at different levels are particularly relevant, with scaffold-based approaches being a prime example. A major attraction of such visualization techniques is that they are capable of presenting structural, activity, and SAR data in an intuitive form. A conundrum is that computational and medicinal chemists typically view results differently; what is intuitive from a computational viewpoint might be much less so from a chemical perspective and vice versa. Although a variety of scaffold-based visualization methods are already available, in our experience, their use in practical medicinal chemistry is still limited. Hence, going forward, a grand challenge will be to develop graphical methods for increasingly large compound data sets that are appealing to both computational and medicinal chemists and, equally important, easy to understand and use. The association of scaffolds with biological activities is perhaps best exemplified by knowledge-based attempts to identify privileged substructures on the one hand and by ligand-based virtual screening aiming at scaffold hopping on the other. In addition, different approaches exist to explore scaffold−activity relationships such as the adaptation of the ST data structure for activity prediction. Scaffold hopping will continue to be high on the agenda, not only as an intellectually stimulating exercise in the computational arena but also for very practical reasons. Given the rapid growth of compound databases and libraries, with millions of molecules already available at present, it is expected that computational screening will, and probably must, be increasingly used as a complement to high-throughput screening and beyond, especially for applications such as the identification of scaffold replacements for leads of interest.
However, as databases grow further, it is also anticipated that data mining for compound−scaffold−activity relationships across different targets will be of increasing interest to organize activity annotations and explore SARs on a large scale. This knowledge will be important to aid in new drug discovery efforts. A challenge for computational efforts will be to make it consistently available and readily accessible for practical medicinal chemistry applications. Last but not least, although the scaffold concept is ligand-centric, it has also been applied to automatically construct three-dimensional models of proteins in complex with structurally related ligands.68 This study points to the potential of exploring scaffolds systematically in structure-based design, an area with much room for further developments.


AUTHOR INFORMATION

Corresponding Author

*Phone: 49-228-2699-306. E-mail: [email protected].

Notes

The authors declare no competing financial interest.

Biographies

Ye Hu studied clinical medicine at the Southeast University, China, from 1999 to 2004. In 2006, she joined the Life Science Informatics Master program at the University of Bonn and obtained her Master’s degree in 2008. In October 2008, she began her Ph.D. studies in the group of Prof. Jürgen Bajorath, focusing on systematic computational analysis of molecular scaffolds of bioactive compounds and associated characteristics. Since July 2011, she has been working as a postdoctoral fellow in the department. Her current research interests include large-scale mining of ligand−target interaction data and structure−activity relationship analysis.

Dagmar Stumpfe studied biology at the University of Bonn, Germany. In 2006, she joined the Department of Life Science Informatics at the University of Bonn headed by Prof. Jürgen Bajorath for her Ph.D. thesis, where she worked on methods for computer-aided chemical biology with a focus on the exploration of compound selectivity. Since 2009, Dagmar has been working as a postdoctoral fellow in the department, and her current research interests include computational chemical biology and large-scale structure−activity relationship analysis.

Jürgen Bajorath is Professor and Chair of Life Science Informatics at the University of Bonn and also an Affiliate Professor in the Department of Biological Structure at the University of Washington, Seattle. His research interests include computational medicinal chemistry, chemoinformatics, chemical biology, and drug discovery. For further details, see http://www.limes-institut-bonn.de/forschung/arbeitsgruppen/unit-4/abteilung-bajorath/abt-bajorath-startseite/.



ACKNOWLEDGMENTS We thank OpenEye Scientific Software, Inc., for the free academic license of the OpenEye Toolkits. D.S. is supported by Sonderforschungsbereich 704 of the Deutsche Forschungsgemeinschaft.



ABBREVIATIONS USED BM, Bemis and Murcko; CSK, cyclic skeleton; GPCR, G-protein-coupled receptor; MCS, maximum common core structure; MMP, matched molecular pair; SAR, structure−activity relationship; ST, scaffold tree





CONCLUDING REMARKS In our review of and perspective on the scaffold concept and its applications, we have pointed out that scaffold analysis has greatly aided in structural classification, derivation of knowledge from structural relationships, and systematic SAR evaluation. Furthermore, on the basis of the scaffold concept, a variety of visualization methods for compound data sets and associated activity information have been developed. As also discussed herein, different applications might require alternative scaffold representations. Without doubt, scaffolds will continue to be investigated in computational medicinal chemistry, especially as data challenges further increase, and we anticipate seeing new methodological developments in coming years. It is also hoped that the up-to-date knowledge base of scaffolds65 generated and provided as a part of this review might spark further interest to explore scaffolds and associated activity information for medicinal chemistry applications.

REFERENCES

(1) Hu, Y.; Stumpfe, D.; Bajorath, J. Lessons Learned from Molecular Scaffold Analysis. J. Chem. Inf. Model. 2011, 51, 1742−1753. (2) Schuffenhauer, A.; Varin, T. Rule-based Classification of Chemical Structures by Scaffolds. Mol. Inf. 2011, 30, 646−664. (3) Evans, B. E.; Rittle, K. E.; Bock, M. G.; DiPardo, R. M.; Freidinger, R. M.; Whitter, W. L.; Lundell, G. F.; Veber, D. F.; Anderson, P. S. Methods for Drug Discovery: Development of Potent, Selective, Orally Effective Cholecystokinin Antagonists. J. Med. Chem. 1988, 31, 2235−2246. (4) Müller, G. Medicinal Chemistry of Target Family-Directed Masterkeys. Drug Discovery Today 2003, 8, 681−691. (5) Welsch, M. E.; Snyder, S. A.; Stockwell, B. R. Privileged Scaffolds for Library Design and Drug Discovery. Curr. Opin. Chem. Biol. 2010, 14, 347−361. (6) Schneider, G.; Neidhart, W.; Giller, T.; Schmid, G. “ScaffoldHopping” by Topological Pharmacophore Search: A Contribution to Virtual Screening. Angew. Chem., Int. Ed. 1999, 38 (19), 2894−2896.

DOI: 10.1021/acs.jmedchem.5b01746 J. Med. Chem. XXXX, XXX, XXX−XXX

(7) Schuffenhauer, A. Computational Methods for Scaffold Hopping. Wires Comput. Mol. Sci. 2012, 2, 842−867.
(8) Geppert, H.; Vogt, M.; Bajorath, J. Current Trends in Ligand-based Virtual Screening: Molecular Representations, Data Mining Methods, New Application Areas, and Performance Evaluation. J. Chem. Inf. Model. 2010, 50, 205−216.
(9) Lewell, X. Q.; Judd, D. B.; Watson, S. P.; Hann, M. M. RECAP–Retrosynthetic Combinatorial Analysis Procedure: A Powerful New Technique for Identifying Privileged Molecular Fragments with Useful Applications in Combinatorial Chemistry. J. Chem. Inf. Model. 1998, 38, 511−522.
(10) Kenny, P. W.; Sadowski, J. Structure Modification in Chemical Databases. In Chemoinformatics in Drug Discovery; Oprea, T. I., Ed.; Wiley-VCH: Weinheim, Germany, 2004; pp 271−285, DOI: 10.1002/3527603743.ch11.
(11) Griffen, E.; Leach, A. G.; Robb, G. R.; Warner, D. J. Matched Molecular Pairs as a Medicinal Chemistry Tool. J. Med. Chem. 2011, 54, 7739−7750.
(12) Hussain, J.; Rea, C. Computationally Efficient Algorithm to Identify Matched Molecular Pairs (MMPs) in Large Data Sets. J. Chem. Inf. Model. 2010, 50, 339−348.
(13) Warner, D. J.; Griffen, E. J.; St-Gallay, S. A. WizePairZ: A Novel Algorithm to Identify, Encode, and Exploit Matched Molecular Pairs with Unspecified Cores in Medicinal Chemistry. J. Chem. Inf. Model. 2010, 50, 1350−1357.
(14) de la Vega de León, A.; Bajorath, J. Matched Molecular Pairs Derived by Retrosynthetic Fragmentation. MedChemComm 2014, 5, 64−67.
(15) Broughton, H. B.; Watson, I. A. Selection of Heterocycles for Drug Design. J. Mol. Graphics Modell. 2004, 23, 51−58.
(16) Bemis, G. W.; Murcko, M. A. The Properties of Known Drugs. 1. Molecular Frameworks. J. Med. Chem. 1996, 39, 2887−2893.
(17) Xu, Y.-J.; Johnson, M. Algorithm for Naming Molecular Equivalence Classes Represented by Labeled Pseudographs. J. Chem. Inf. Model. 2001, 41, 181−185.
(18) Katritzky, A. R.; Kiely, J. S.; Hebert, N.; Chassaing, C. Definition of Templates within Combinatorial Libraries. J. Comb. Chem. 2000, 2, 2−5.
(19) Li, R.; Stumpfe, D.; Vogt, M.; Geppert, H.; Bajorath, J. Development of a Method to Consistently Quantify the Structural Distance Between Scaffolds and to Assess Scaffold Hopping Potential. J. Chem. Inf. Model. 2011, 51, 2507−2514.
(20) Lauri, B.; Bartlett, P. A. CAVEAT: A Program to Facilitate the Design of Organic Molecules. J. Comput.-Aided Mol. Des. 1994, 8, 51−66.
(21) Lipkus, A. H.; Yuan, Q.; Lucas, K. A.; Funk, S. A.; Bartelt, W. F., III; Schenck, R. J.; Trippe, A. J. Structural Diversity of Organic Chemistry. A Scaffold Analysis of the CAS Registry. J. Org. Chem. 2008, 73, 4443−4451.
(22) Wilkens, S. J.; Janes, J.; Su, A. I. HierS: Hierarchical Scaffold Clustering Using Topological Chemical Graphs. J. Med. Chem. 2005, 48, 3182−3193.
(23) Schuffenhauer, A.; Ertl, P.; Roggo, S.; Wetzel, S.; Koch, M. A.; Waldmann, H. The Scaffold Tree - Visualization of the Scaffold Universe by Hierarchical Scaffold Classification. J. Chem. Inf. Model. 2007, 47, 47−58.
(24) Koch, M. A.; Schuffenhauer, A.; Scheck, M.; Wetzel, S.; Casaulta, M.; Odermatt, A.; Ertl, P.; Waldmann, H. Charting Biologically Relevant Chemical Space: A Structural Classification of Natural Products (SCONP). Proc. Natl. Acad. Sci. U. S. A. 2005, 102, 17272−17277.
(25) Tan, D. S. Diversity-Oriented Synthesis: Exploring the Intersections between Chemistry and Biology. Nat. Chem. Biol. 2005, 1, 74−84.
(26) Taylor, R. D.; MacCoss, M.; Lawson, A. D. Rings in Drugs. J. Med. Chem. 2014, 57, 5845−5859.
(27) Pitt, W. R.; Parry, D. M.; Perry, B. G.; Groom, C. R. Heteroaromatic Rings of the Future. J. Med. Chem. 2009, 52, 2952−2963.
(28) Hu, Y.; Wassermann, A. M.; Lounkine, E.; Bajorath, J. Systematic Analysis of Public Domain Compound Potency Data Identifies Selective Molecular Scaffolds across Druggable Target Families. J. Med. Chem. 2010, 53, 752−758.
(29) Wetzel, S.; Klein, K.; Renner, S.; Rauh, D.; Oprea, T. I.; Mutzel, P.; Waldmann, H. Interactive Exploration of Chemical Space with Scaffold Hunter. Nat. Chem. Biol. 2009, 5, 581−583.
(30) Renner, S.; van Otterlo, W. A. L.; Seoane, M. D.; Möcklinghoff, S.; Hofmann, B.; Wetzel, S.; Schuffenhauer, A.; Ertl, P.; Oprea, T. I.; Steinhilber, D.; Brunsveld, L.; Rauh, D.; Waldmann, H. Bioactivity-guided Mapping and Navigation of Chemical Space. Nat. Chem. Biol. 2009, 5, 585−592.
(31) Wetzel, S.; Wilk, W.; Chammaa, S.; Sperl, B.; Roth, A. G.; Yektaoglu, A.; Renner, S.; Berg, T.; Arenz, A.; Giannis, A.; Oprea, T. I.; Rauh, D.; Kaiser, M.; Waldmann, H. A Scaffold-Tree-Merging Strategy for Prospective Bioactivity Annotation of γ-Pyrones. Angew. Chem. 2010, 122, 3748−3752.
(32) Hu, Y.; Bajorath, J. Global Assessment of Scaffold Hopping Potential for Current Pharmaceutical Targets. MedChemComm 2010, 1, 339−344.
(33) Stumpfe, D.; Hu, Y.; Dimova, D.; Bajorath, J. Recent Progress in Understanding Activity Cliffs and Their Utility in Medicinal Chemistry. J. Med. Chem. 2014, 57, 18−28.
(34) Vogt, M.; Huang, Y.; Bajorath, J. From Activity Cliffs to Activity Ridges: Informative Data Structures for SAR Analysis. J. Chem. Inf. Model. 2011, 51, 1848−1856.
(35) Gupta-Ostermann, D.; Bajorath, J. Identification of Multi-target Activity Ridges in High-dimensional Bioactivity Space. J. Chem. Inf. Model. 2012, 52, 2579−2586.
(36) Hu, Y.; Bajorath, J. Extending the Activity Cliff Concept: Structural Categorization of Activity Cliffs and Systematic Identification of Different Types of Cliffs in the ChEMBL Database. J. Chem. Inf. Model. 2012, 52, 1806−1811.
(37) Stumpfe, D.; Dimova, D.; Bajorath, J. Systematic Assessment of Scaffold Hopping versus Activity Cliff Formation across Bioactive Compound Classes Following a Molecular Hierarchy. Bioorg. Med. Chem. 2015, 23, 3183−3191.
(38) Saluste, G.; Albarran, M. I.; Alvarez, R. M.; Rabal, O.; Ortega, M. A.; Blanco, C.; Kurz, G.; Salgado, A.; Pevarello, P.; Bischoff, J. R.; Pastor, J.; Oyarzabal, J. Fragment-hopping-based Discovery of a Novel Chemical Series of Proto-oncogene PIM-1 Kinase Inhibitors. PLoS One 2012, 7, e45964.
(39) Vainio, M. J.; Kogej, T.; Raubacher, F.; Sadowski, J. Scaffold Hopping by Fragment Replacement. J. Chem. Inf. Model. 2013, 53, 1825−1835.
(40) Southan, C.; Boppana, K.; Jagarlapudi, S. A.; Muresan, S. Analysis of In Vitro Bioactivity Data Extracted from Drug Discovery Literature and Patents: Ranking 1654 Human Protein Targets by Assayed Compounds and Molecular Scaffolds. J. Cheminf. 2011, 3, 14.
(41) Hu, Y.; Bajorath, J. Systematic Identification of Scaffolds Representing Compounds Active against Individual Targets and Single or Multiple Target Families. J. Chem. Inf. Model. 2013, 53, 312−326.
(42) Over, B.; Wetzel, S.; Grütter, C.; Nakai, Y.; Renner, S.; Rauh, D.; Waldmann, H. Natural-product-derived Fragments for Fragment-based Ligand Discovery. Nat. Chem. 2013, 5, 21−28.
(43) Hu, Y.; Bajorath, J. Many Drugs Contain Unique Scaffolds with Varying Structural Relationships to Scaffolds of Currently Available Bioactive Compounds. Eur. J. Med. Chem. 2014, 76, 427−434.
(44) Hu, Y.; Bajorath, J. Structural and Activity Profile Relationships between Drug Scaffolds. AAPS J. 2015, 17, 609−619.
(45) Hu, Y.; Bajorath, J. Exploring the Scaffold Universe of Kinase Inhibitors. J. Med. Chem. 2015, 58, 315−332.
(46) Kayastha, S.; Dimova, D.; Stumpfe, D.; Bajorath, J. Structural Diversity and Potency Range Distribution of Scaffolds from Compounds Active against Current Pharmaceutical Targets. Future Med. Chem. 2015, 7, 111−122.
(47) Ertl, P. Database of Bioactive Ring Systems with Calculated Properties and its Use in Bioisosteric Design and Scaffold Hopping. Bioorg. Med. Chem. 2012, 20, 5436−5442.

(48) Bergmann, R.; Linusson, A.; Zamora, I. SHOP: Scaffold HOPping by GRID-based Similarity Searches. J. Med. Chem. 2007, 50, 2708−2717.
(49) Langdon, S. R.; Westwood, I. M.; van Montfort, R. L.; Brown, N.; Blagg, J. Scaffold-focused Virtual Screening: Prospective Application to the Discovery of TTK Inhibitors. J. Chem. Inf. Model. 2013, 53, 1100−1112.
(50) Ertl, P. Intuitive Ordering of Scaffolds and Scaffold Similarity Searching Using Scaffold Keys. J. Chem. Inf. Model. 2014, 54, 1617−1622.
(51) Hu, Y.; Zhang, B.; Bajorath, J. Method for Systematic Assessment of Chemical Changes in Molecular Scaffolds with Conserved Topology and Application to the Analysis of Scaffold-activity Relationships. Mol. Inf. 2015, 34, 531−549.
(52) Zhang, B.; Vogt, M.; Maggiora, G. M.; Bajorath, J. Design of Chemical Space Networks Using a Tanimoto Similarity Variant Based upon Maximum Common Substructures. J. Comput.-Aided Mol. Des. 2015, 29, 937−950.
(53) Rabal, O.; Amr, F. I.; Oyarzabal, J. Novel Scaffold FingerPrint (SFP): Applications in Scaffold Hopping and Scaffold-based Selection of Diverse Compounds. J. Chem. Inf. Model. 2015, 55, 1−18.
(54) Varin, T.; Schuffenhauer, A.; Ertl, P.; Renner, S. Mining for Bioactive Scaffolds with Scaffold Networks: Improved Compound Set Enrichment from Primary Screening Data. J. Chem. Inf. Model. 2011, 51, 1528−1538.
(55) Langdon, S. R.; Brown, N.; Blagg, J. Scaffold Diversity of Exemplified Medicinal Chemistry Space. J. Chem. Inf. Model. 2011, 51, 2174−2185.
(56) Ertl, P.; Rohde, B. The Molecule Cloud - Compact Visualization of Large Collections of Molecules. J. Cheminf. 2012, 4, 12.
(57) Gupta-Ostermann, D.; Hu, Y.; Bajorath, J. Introducing the LASSO Graph for Compound Data Set Representation and Structure-activity Relationship Analysis. J. Med. Chem. 2012, 55, 5546−5553.
(58) Wassermann, A. M.; Bajorath, J. Directed R-group Combination Graph: A Methodology to Uncover Structure-activity Relationship Patterns in Series of Analogs. J. Med. Chem. 2012, 55, 1215−1226.
(59) Rabal, O.; Oyarzabal, J. Biologically Relevant Chemical Space Navigator: From Patent and Structure-activity Relationship Analysis to Library Acquisition and Design. J. Chem. Inf. Model. 2012, 52, 3123−3137.
(60) Zhang, B.; Hu, Y.; Bajorath, J. AnalogExplorer − A New Method for Graphical Analysis of Analog Series and Associated with Structure-activity Relationship Information. J. Med. Chem. 2014, 57, 9184−9194.
(61) Hu, Y.; Zhang, B.; Vogt, M.; Bajorath, J. AnalogExplorer2 − Stereochemistry Sensitive Graphical Analysis of Large Analog Series. F1000Research 2015, 4 (Chem. Inf. Sci.), 1031.
(62) Gaulton, A.; Bellis, L. J.; Bento, A. P.; Chambers, J.; Davies, M.; Hersey, A.; Light, Y.; McGlinchey, S.; Michalovich, D.; Al-Lazikani, B.; Overington, J. P. ChEMBL: A Large-Scale Bioactivity Database for Drug Discovery. Nucleic Acids Res. 2012, 40, D1100−D1107.
(63) Hu, Y.; Bajorath, J. Influence of Search Parameters and Criteria on Compound Selection, Promiscuity, and Pan Assay Interference Characteristics. J. Chem. Inf. Model. 2014, 54, 3056−3066.
(64) OEChem, version 1.7.7; OpenEye Scientific Software, Inc.: Santa Fe, NM, U.S., 2012; http://www.eyesopen.com.
(65) Hu, Y.; Stumpfe, D.; Bajorath, J. Currently Available Scaffolds and MMP Cores in ChEMBL 20. ZENODO; DOI: 10.5281/zenodo.35375.
(66) Hu, Y.; Bajorath, J. Learning from “Big Data”: Compounds and Targets. Drug Discovery Today 2014, 19, 357−360.
(67) Lusher, S. J.; McGuire, R.; van Schaik, R. C.; Nicholson, C. D.; de Vlieg, J. Data-driven Medicinal Chemistry in the Era of Big Data. Drug Discovery Today 2014, 19, 859−868.
(68) Hare, B. J.; Walters, W. P.; Caron, P. R.; Bemis, G. W. CORES: An Automated Method for Generating Three-dimensional Models of Protein/Ligand Complexes. J. Med. Chem. 2004, 47, 4731−4740.
(69) Klabunde, T.; Hessler, G. Drug Design Strategies for Targeting G-protein-coupled Receptors. ChemBioChem 2002, 3, 928−944.
(70) Salaski, E. J.; Krishnamurthy, G.; Ding, W.-D.; Yu, K.; Insaf, S. S.; Eid, C.; Shim, J.; Levin, J. L.; Tabei, K.; Toral-Barza, L.; Zhang, W.-G.; McDonald, L. A.; Honores, E.; Hanna, C.; Yamashita, A.; Johnson, B.; Li, Z.; Laakso, L.; Powell, D.; Mansour, T. S. Pyranonaphthoquinone Lactones: A New Class of AKT Selective Kinase Inhibitors Alkylate a Regulatory Loop Cysteine. J. Med. Chem. 2009, 52, 2181−2184.
(71) Yan, A.; Wang, L.; Xu, S.; Xu, J. Aurora-A Kinase Inhibitor Scaffolds and Binding Modes. Drug Discovery Today 2011, 16 (5−6), 260−269.
