Chapter 4

Downloaded by UNIV OF CALIFORNIA SAN DIEGO on December 12, 2016 | http://pubs.acs.org Publication Date (Web): October 5, 2016 | doi: 10.1021/bk-2016-1222.ch004

Network Variants for Analyzing Target-Ligand Interactions

Ye Hu and Jürgen Bajorath*

Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Dahlmannstr. 2, D-53113 Bonn, Germany

*E-mail: [email protected]

The systematic exploration of target-ligand interactions is the central theme of chemogenomics and also a focal point of chemical informatics. Large-scale interaction analysis is predominantly carried out on the basis of compound activity data and target annotations, rather than three-dimensional structures of target-ligand complexes. This is the case because the structural knowledge base is still much smaller than the volume of available compound activity data. Rationalizing details of target-ligand interactions is an integral part of the drug design process. Compound activity data implicitly encode target-ligand interactions, from which single- or multi-target structure-activity relationships (SARs) can be deduced. The compound-centric approach to systematically mining target-ligand interactions, elucidating SAR patterns across different targets, and identifying key compounds is supported by graphical methods, in particular, molecular networks. This chapter focuses on a discussion of network variants that have been designed for specific applications in target-ligand interaction analysis.

© 2016 American Chemical Society. Bienstock et al.; Frontiers in Molecular Design and Chemical Information Science - Herman Skolnik Award Symposium 2015: ... ACS Symposium Series; American Chemical Society: Washington, DC, 2016.

Introduction

Given the vastness of theoretically possible chemical space (1), only small sections can, in principle, be explored. Our primary interest is in biologically relevant chemical space, which is predominantly populated with biologically active or potentially active compounds. How best to represent and explore this confined segment of chemical space continues to be a matter of debate. First, it must be recognized that any representation of biologically relevant chemical space remains incomplete as long as not all available compounds have been tested against all available targets, which represents the ultimate goal of chemogenomics (2, 3). For molecular modeling and design, chemical space representations are typically generated using different chemical descriptors, and molecular similarity relationships in such spaces are mostly assessed on the basis of calculated Tanimoto similarity (4). Currently populated biologically relevant chemical space can then be delineated by mapping compounds with biological activity annotations into chemical reference spaces. Furthermore, adding biological activities as a hyper-surface to similarity-based chemical space representations generates so-called activity landscapes (5), which have higher information content than the underlying chemical space projections. Moreover, there is a conceptually different way to represent biologically relevant space, i.e. the explicit generation of target-ligand spaces, which might also be referred to as pharmacological spaces (6). There are also different ways to generate such target-ligand spaces. For example, target proteins might be organized on the basis of sequence and/or structural similarity, and this target space might be complemented with active compounds linked to their targets (6). However, another intuitive and interpretable way to represent target-ligand spaces is the use of network representations, which have become popular given the network paradigm (7) for the systematic exploration of drug and compound polypharmacology (7–9).
Many drugs are known to specifically interact with different targets and their efficacy often depends on multiple interactions that result in the formation of polypharmacological target-ligand networks.

Target-Ligand Networks

Such networks can be conceptualized in different ways. For example, targets are often represented as nodes that are connected by edges if they share one or more active compounds. As a further refinement of such network views, target nodes might be connected if shared ligands reach a predefined level of structural similarity. Such networks are best understood as ligand-based target networks. Alternatively, target-ligand networks might contain both target and compound nodes, yielding a bipartite network that explicitly accounts for target-ligand interactions. Figure 1 shows such a bipartite drug-target network. In this network, two types of nodes represent 1226 approved small molecule drugs assembled from DrugBank 3.0 (10) and 881 drug targets, respectively. An edge is drawn between a drug and a target if they are known to interact. In total, the network contains 3776 drug-target interactions (11). For specific applications, many different variants of target-ligand networks can be designed, as discussed in the following. Figure 2 shows a prototypic variant of a target-ligand network. This network contains only one type of node representing targets. Nodes are connected if they share active compounds.
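As an illustration of the two network views described above, the following minimal Python sketch builds a bipartite drug-target representation and derives a ligand-based target network from it. The interaction records, the function names, and the `min_shared` parameter are hypothetical and serve only to make the idea concrete; they are not taken from the studies discussed here.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical drug-target interaction records (illustrative only).
interactions = [
    ("drug_A", "T1"), ("drug_A", "T2"),
    ("drug_B", "T2"), ("drug_B", "T3"),
    ("drug_C", "T4"),
]


def bipartite_adjacency(pairs):
    """Map each drug to its targets and each target to its drugs."""
    drug_to_targets = defaultdict(set)
    target_to_drugs = defaultdict(set)
    for drug, target in pairs:
        drug_to_targets[drug].add(target)
        target_to_drugs[target].add(drug)
    return drug_to_targets, target_to_drugs


def ligand_based_target_network(target_to_drugs, min_shared=1):
    """Connect two targets if they share at least `min_shared` ligands.

    Returns {(target1, target2): number of shared ligands}.
    """
    edges = {}
    for t1, t2 in combinations(sorted(target_to_drugs), 2):
        shared = target_to_drugs[t1] & target_to_drugs[t2]
        if len(shared) >= min_shared:
            edges[(t1, t2)] = len(shared)
    return edges


drug_to_targets, target_to_drugs = bipartite_adjacency(interactions)
target_edges = ligand_based_target_network(target_to_drugs)
```

In this toy data set, the bipartite view holds five drug-target edges, whereas the derived target network retains only the two target pairs that share a ligand; T4 remains an isolated node.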


Figure 1. Bipartite drug-target network in which red nodes represent approved drugs and blue nodes drug targets. Edges between red and blue nodes indicate drug-target interactions.

Figure 2. Prototypic ligand-based target network. Targets are represented as nodes that are connected by edges if they share active compounds.

Molecular Hierarchies and Compound Data


For mapping of target-ligand interactions, it is also useful to go beyond individual active compounds and consider the concept of compound-scaffold-skeleton hierarchies (12). Including molecular hierarchies in systematic interaction analysis often increases the amount of SAR information that is revealed. Figure 3 illustrates the compound-scaffold-skeleton hierarchy.

Figure 3. Compound-scaffold-skeleton hierarchy. In the compound at the top, R-groups are displayed in gray and the cyclic skeleton at the bottom is shown in bold.

Scaffolds (molecular frameworks) (13) are obtained from complete molecules by removal of R-groups from rings and linkers. Cyclic skeletons (CSKs) further abstract from scaffolds by transforming all heteroatoms to carbons and setting all bond orders to one. Thus, following the hierarchy so defined, multiple compounds can yield the same scaffold and multiple scaffolds the same CSK. Importantly, compounds and scaffolds sharing the same CSK are topologically equivalent. Hence, the hierarchy defines topological relationships between molecular entities, which can be complemented with substructure relationships.

Compounds and activity data discussed in the following were taken from BindingDB (14), ChEMBL (15), and PubChem (16). Many active compounds from medicinal chemistry resources have subsequently been tested against targets other than their primary targets, giving rise to additional biological annotations. For mapping of target-ligand interactions, compound optimization and secondary assay data are typically more informative than original screening data. In general, active compounds and target-ligand annotations are systematically extracted from the databases and either merged or analyzed in parallel, depending on the specific goals of the analysis. Often, additional selection criteria must be applied, such as the availability of well-defined potency measurements or the consistency of multiple potency records. Furthermore, target family information is often taken into consideration. A meaningful data mining effort should go beyond purely statistical data assessment and aim at extracting knowledge from the data and obtaining new insights. This also applies to compound-centric mapping of target-ligand interactions. In the following, selected interaction analyses will be discussed for which specific network variants were designed.
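The many-to-one relationships of the compound-scaffold-CSK hierarchy described in this section can be sketched in a few lines of Python. The sketch assumes that scaffold and CSK strings have already been computed with a cheminformatics toolkit; the records and identifiers below are hypothetical.

```python
from collections import defaultdict

# Hypothetical records: (compound id, scaffold SMILES, CSK SMILES).
# Scaffold and CSK strings are assumed to be precomputed elsewhere.
records = [
    ("cpd1", "c1ccccc1", "C1CCCCC1"),
    ("cpd2", "c1ccccc1", "C1CCCCC1"),
    ("cpd3", "c1ccncc1", "C1CCCCC1"),
    ("cpd4", "c1ccc2ccccc2c1", "C1CCC2CCCCC2C1"),
]


def build_hierarchy(rows):
    """Group compounds by scaffold and scaffolds by CSK."""
    scaffold_to_compounds = defaultdict(set)
    csk_to_scaffolds = defaultdict(set)
    for compound, scaffold, csk in rows:
        scaffold_to_compounds[scaffold].add(compound)
        csk_to_scaffolds[csk].add(scaffold)
    return scaffold_to_compounds, csk_to_scaffolds


scaffold_to_compounds, csk_to_scaffolds = build_hierarchy(records)
```

Here, two compounds map to the benzene scaffold, and the benzene and pyridine scaffolds map to the same six-membered CSK, which mirrors the topological equivalence discussed above.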

From Privileged Substructures to Target Community-Selective Scaffolds

In medicinal chemistry, the concept of privileged substructures has been heavily investigated since its introduction by Evans and colleagues in 1988 (17). In this seminal investigation, it was observed that cholecystokinin antagonists contained conserved scaffolds, illustrated in Figure 4, which were, at that time, not often found in other active compounds.

Figure 4. Exemplary privileged substructure (top). Three representative cholecystokinin antagonists are displayed (bottom) that contain this privileged substructure (top) according to Evans et al. (17).


The privileged substructure concept postulates the existence of core structures that yield compounds with selectivity for members of individual target families. The definition has been further refined over time (18), but the existence of privileged substructures has also been questioned (19). Typically, privileged substructures have been proposed on the basis of medicinal chemistry knowledge and comparison of series of active compounds. Such proposals have been retrospectively assessed by frequency-of-occurrence analysis, revealing that putative privileged structural motifs also appear with notable frequency in compounds active against other target families. However, the question of whether or not privileged substructures exist can also be addressed through systematic data mining, rather than re-evaluation of knowledge-based hypotheses. In 2010, we reported a study designed to investigate the privileged substructure concept from a different perspective (20). The analysis departed from frequency-of-occurrence assessment of pre-selected substructures. Instead, it involved a systematic compound data mining effort on the basis of target-ligand annotations known at that time. The focal point of the analysis was to determine whether molecular scaffolds existed that exclusively occurred in compounds active against individual target families. Accordingly, the study involved systematic mapping of target-ligand interactions on the basis of pre-selected compounds with reported activities against human targets and well-defined potency measurements taken from BindingDB. A key step to facilitate this analysis was the organization of all active compounds into so-called target pair sets. Each set consisted of all compounds active against a pair of targets. A qualifying target pair set contained at least five compounds. Depending on the number of available activity annotations, it was possible that a compound participated in multiple target pair sets.
From a pool of ~18,000 qualifying bioactivity records, 520 target pair sets were generated involving a total of 6,343 compounds active against 259 human targets (20). On the basis of these target pair sets, a compound-based target network was generated, shown in Figure 5. In this network, nodes represented targets and edges pair sets, i.e. an edge connected two nodes forming a set. The edge width was scaled by increasing numbers of compounds shared by targets. The network representation revealed well-defined communities of major therapeutic targets, a key finding of this study. Importantly, target relationships leading to community formation were exclusively established on the basis of shared ligands. A total of 18 communities with at least four targets were obtained. Then, ligands associated with each community were examined for the presence of community-selective scaffolds, i.e. scaffolds exclusively contained in compounds active against targets belonging to an individual target community. More than 200 community-selective scaffolds were identified that yielded 147 unique CSKs in this proof-of-concept investigation. The identification of many community-selective scaffolds provided substantial support for the privileged substructure concept on the basis of available compound activity. Exemplary community-selective scaffolds are also shown in Figure 5.
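The workflow described above (target pair sets, a compound-based target network, and community-selective scaffolds) might be sketched as follows. The source does not state how communities were delineated, so connected components are used here purely as an assumed stand-in; all function names and data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def target_pair_sets(compound_targets, min_compounds=5):
    """Return {(t1, t2): compounds active against both} for qualifying pairs."""
    pair_sets = defaultdict(set)
    for compound, targets in compound_targets.items():
        for pair in combinations(sorted(targets), 2):
            pair_sets[pair].add(compound)
    return {p: c for p, c in pair_sets.items() if len(c) >= min_compounds}


def communities(pair_sets):
    """Connected components of the target network (assumed community proxy)."""
    neighbors = defaultdict(set)
    for t1, t2 in pair_sets:
        neighbors[t1].add(t2)
        neighbors[t2].add(t1)
    seen, result = set(), []
    for start in sorted(neighbors):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(neighbors[node] - component)
        seen |= component
        result.append(component)
    return result


def community_selective_scaffolds(community, compound_targets, compound_scaffold):
    """Scaffolds whose compounds are active only against targets of `community`."""
    scaffold_targets = defaultdict(set)
    for compound, targets in compound_targets.items():
        scaffold_targets[compound_scaffold[compound]] |= set(targets)
    return {s for s, t in scaffold_targets.items() if t <= community}


# Hypothetical data: five compounds per target pair; scaffold S3 occurs in both groups.
compound_targets = {f"c{i}": {"T1", "T2"} for i in range(1, 6)}
compound_targets.update({f"d{i}": {"T3", "T4"} for i in range(1, 6)})
compound_scaffold = {f"c{i}": "S1" for i in range(1, 5)}
compound_scaffold.update({f"d{i}": "S2" for i in range(1, 5)})
compound_scaffold.update({"c5": "S3", "d5": "S3"})

pair_sets = target_pair_sets(compound_targets)
found = communities(pair_sets)
selective = community_selective_scaffolds({"T1", "T2"}, compound_targets, compound_scaffold)
```

In this toy example, scaffold S1 is community-selective for the T1/T2 community, whereas S3 is not because it also occurs in a compound active against T3 and T4.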


Figure 5. Compound-based target network used for the identification of target communities and community-selective scaffolds. Nodes represent targets. Two nodes are connected by an edge if the targets share at least five active compounds. Eighteen target communities are labeled that contain at least four targets. For four of these communities (1a, 1b, 3 and 8), two representative community-selective scaffolds are displayed.

Target Selectivity Patterns

The pool of originally identified community-selective scaffolds also provided an attractive basis for exploration of target selectivity. Accordingly, for each compound in a target pair set, its target selectivity (TS) was calculated as the logarithmic potency difference for the target pair. For each community-selective scaffold active against a given target, all compounds containing this scaffold were pooled, all possible TS values were calculated, and the median TS was determined. Median TS values were compared for different targets in a community. The comparison revealed that many scaffolds represented compounds having different selectivity against targets within a community. At the same time, different scaffolds displayed similar selectivity profiles (21). In this context, the issue of data sparseness must also be considered. Data sparseness refers to the fact that activity annotations of compounds continue to be incomplete because not all compounds have been tested against all targets (which represents the ultimate goal of chemogenomics). Thus, when more experimental measurements and activity annotations become available, the number of compounds with apparent selectivity might be reduced. Due to data sparseness, truly target-selective scaffolds could not be confirmed because those scaffolds that were selective for an individual target over one or more other targets were generally only contained in one or two compounds (21). However, more than half of the original pool of community-selective scaffolds were contained in at least five active compounds and displayed a clear tendency to produce target-selective compounds, i.e. compounds that were preferentially (but not exclusively) highly potent against one among several targets. Such selectivity patterns were captured in a scaffold-based target selectivity network, shown in Figure 6. This network variant was of critical importance for rationalizing the results of this follow-up investigation. In this network, nodes represented targets and directed edges "selective over" relationships. Such a relationship existed if a scaffold represented compounds that were consistently selective for one target A over another target B, resulting in a directed edge pointing from target A to B. In Figure 6, relationships are displayed at the 50-fold selectivity level, i.e. compounds had to be at least 50-fold more potent against target A than B. Edge width was scaled according to the number of scaffolds involved in a selectivity relationship. This network representation revealed a number of targets with scaffold selectivity over multiple others. These targets were termed "selectivity hubs" (21).
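The selectivity calculation and the directed "selective over" edges can be illustrated with the following sketch. It assumes that potencies are given on a logarithmic scale (e.g. pKi), so that a 50-fold potency difference corresponds to a TS value of log10(50), or about 1.7, and it interprets "consistently selective" as applying to every compound of a scaffold. Both choices, as well as all names and values, are assumptions made for illustration.

```python
import math
from statistics import median


def target_selectivity(p_a, p_b):
    """TS as the difference of logarithmic potencies for targets A and B."""
    return p_a - p_b


def median_ts(compounds, potencies, target_a, target_b):
    """Median TS over all compounds of a scaffold for the pair (A, B)."""
    return median(
        target_selectivity(potencies[c][target_a], potencies[c][target_b])
        for c in compounds
    )


def selective_over(compounds, potencies, target_a, target_b, fold=50.0):
    """True if every compound is at least `fold`-fold more potent on A than on B."""
    threshold = math.log10(fold)
    ts = [
        target_selectivity(potencies[c][target_a], potencies[c][target_b])
        for c in compounds
    ]
    return bool(ts) and all(value >= threshold for value in ts)


def selectivity_edges(scaffolds, potencies, targets, fold=50.0):
    """Directed edges (A, B) weighted by the number of scaffolds selective for A over B."""
    edges = {}
    for a in targets:
        for b in targets:
            if a == b:
                continue
            count = sum(
                selective_over(cpds, potencies, a, b, fold)
                for cpds in scaffolds.values()
            )
            if count:
                edges[(a, b)] = count
    return edges


# Hypothetical pKi values of three compounds sharing scaffold S1.
potencies = {
    "c1": {"A": 9.0, "B": 6.5},
    "c2": {"A": 8.5, "B": 6.0},
    "c3": {"A": 8.0, "B": 6.2},
}
scaffolds = {"S1": ["c1", "c2", "c3"]}
edges = selectivity_edges(scaffolds, potencies, ["A", "B"])
```

In this example, a single directed edge from A to B results, with weight 1. The sketch requires potency values for both targets of every compound; handling missing measurements, which the text identifies as a key issue, is left out.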

From Selectivity to Promiscuity

A logical extension of the study of scaffold-centric compound selectivity was to consider the other end of the binding spectrum, which ranges from single-target activity/selectivity to multi-target activity/promiscuity, and to search for intrinsically promiscuous chemotypes that would yield compounds with activity across different targets or target families. Such compound classes would be relevant for the study of polypharmacology. The term chemotype is used here to refer to CSKs and corresponding scaffolds. To search for promiscuous scaffolds and chemotypes, the analysis scheme was further extended. Instead of target pair sets, individual target sets were assembled, each of which had to contain at least 10 compounds with a potency of at least 1 μM. From BindingDB and ChEMBL, sets were obtained for a total of 458 different targets belonging to 19 families. These target sets comprised ~35,000 compounds that yielded 13,462 unique scaffolds. The target annotations of these compounds and the resulting scaffolds were analyzed and 435 scaffolds were identified that represented compounds with activity against targets belonging to two or more families (22). Of these multi-activity scaffolds, 83 represented compounds that were active against targets from three to 13 different target families and were thus designated promiscuous scaffolds. These scaffolds corresponded to 33 topologically distinct CSKs. Representative examples are shown in Figure 7. An important observation was that promiscuous CSKs were not always small and generic, as one might perhaps expect. Rather, they included chemotypes having different structural complexity and diverse topologies, as illustrated in Figure 7. The proof-of-concept search for promiscuous chemotypes provided an instructive example of the utility of molecular hierarchies for systematically capturing target-ligand interactions. For each promiscuous CSK, relationships between its scaffolds and their target families were determined and target profiles were generated. Figure 8 shows a representative bipartite scaffold-target family network for a given CSK. This network variant contained two types of nodes representing scaffolds and target families, respectively. An edge was drawn between a scaffold and a target family node if the scaffold represented compounds active against targets belonging to the family. The analysis revealed a variety of scaffold-family relationships with greatly varying degrees of promiscuity and partly overlapping but also distinct family profiles. Hence, there were no dominant promiscuity patterns detectable across different chemotypes and target families. In addition, target profiles of scaffolds of each promiscuous CSK were also analyzed. For this purpose, target-based scaffold networks were generated, as shown in Figure 9. In this case, nodes represented scaffolds annotated with varying numbers of targets and edges were drawn between nodes if the corresponding scaffolds shared activity against one or more of these targets. Unexpectedly, these network representations revealed different activity profiles of closely related scaffolds (22).
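A compact sketch of the promiscuity analysis described above is given below. It derives family profiles per scaffold (the edges of a bipartite scaffold-target family network), flags scaffolds active against at least three families, and builds a target-based scaffold network weighted by shared targets. The activity records, family assignments, and function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def scaffold_profiles(activities, target_family):
    """Collect targets and target families per scaffold from (scaffold, target) records."""
    targets = defaultdict(set)
    families = defaultdict(set)
    for scaffold, target in activities:
        targets[scaffold].add(target)
        families[scaffold].add(target_family[target])
    return targets, families


def promiscuous_scaffolds(families, min_families=3):
    """Scaffolds representing compounds active against at least `min_families` families."""
    return {s for s, f in families.items() if len(f) >= min_families}


def target_based_scaffold_network(targets):
    """Edges between scaffolds sharing targets, weighted by the number of shared targets."""
    edges = {}
    for s1, s2 in combinations(sorted(targets), 2):
        shared = targets[s1] & targets[s2]
        if shared:
            edges[(s1, s2)] = len(shared)
    return edges


# Hypothetical annotations.
target_family = {"T1": "kinase", "T2": "GPCR", "T3": "protease", "T4": "kinase"}
activities = [
    ("S1", "T1"), ("S1", "T2"), ("S1", "T3"),
    ("S2", "T1"), ("S2", "T4"),
    ("S3", "T2"),
]

targets, families = scaffold_profiles(activities, target_family)
promiscuous = promiscuous_scaffolds(families)
scaffold_edges = target_based_scaffold_network(targets)
```

The `families` mapping corresponds to the scaffold-to-family edges of the bipartite network, and `scaffold_edges` to the target-based scaffold network with shared-target counts as edge labels.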

Figure 6. Scaffold-based target selectivity network at the 50-fold selectivity level. Nodes represent targets and edges indicate target selectivity relationships. The width of edges is scaled according to the number of scaffolds involved in a relationship. Edges representing single scaffolds are colored gray. Selectivity hubs are indicated using thick black circles.


Figure 7. 14 examples of promiscuous CSKs. For each CSK, the number of target families it is active against and the number of promiscuous scaffolds it represents are reported. For example, the cyclohexane CSK represents five promiscuous scaffolds yielding compounds active against targets belonging to a total of 15 families.

Figure 8. Representative bipartite scaffold-target family network for the cyclohexane CSK. Circular nodes indicate scaffolds represented by this CSK and rectangular nodes target families. A scaffold node is connected to a target family node if compounds represented by the scaffold are active against targets belonging to the family.


In this case, the probable consequences of data sparseness contrast with those discussed above for community-selective scaffolds. Once more active compounds and measurements become available, it is likely that increasing numbers of promiscuous scaffolds will emerge and that target profiles of related scaffolds might be even more differentiated than observed in the study discussed above.

Figure 9. Target-based scaffold network. Activity profiles of five scaffolds represented by the given CSK (cyclohexane) are reported in bold. For example, the benzene scaffold represents compounds active against a total of 88 targets from 13 families. Nodes represent scaffolds and two nodes are connected by an edge if they share one or more targets. For each scaffold pair, the number of shared targets is reported.


Activity Cliffs

A conceptually different application of compound-centric interaction analysis is the search for scaffolds that represent compounds forming activity cliffs. Like the privileged substructure concept, the activity cliff concept is frequently applied in medicinal chemistry and chemical informatics. Activity cliffs are formed by structurally similar compounds that share the same activity but have large differences in potency (23). In another proof-of-concept investigation, activity cliff analysis was refocused from compounds to scaffolds. A systematic analysis of scaffolds carrying single- and multi-target activities was carried out and a scaffold discontinuity score was designed to quantify the ability of compounds sharing the same scaffold to form activity cliffs (24):
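Based on the symbol definitions that follow, the raw score can be written as the similarity-weighted potency difference averaged over all compound pairs of a scaffold. This form is inferred from those definitions and is therefore an assumption that should be checked against ref 24; the label disc_raw is likewise chosen here for illustration.

```latex
\[
\mathrm{disc}_{\mathrm{raw}}(s) \;=\;
\frac{\displaystyle\sum_{\{i,j \,\mid\, i \neq j\}} \lvert p_i - p_j \rvert \cdot \mathrm{sim}(i,j)}
     {\lvert ij \rvert}
\]
```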

Here |pi – pj| is the absolute potency difference of a compound pair i and j represented by scaffold s, sim(i, j) the Tanimoto similarity calculated using a fingerprint, and |ij| the number of all compound pairs sharing the given scaffold s. Hence, all compounds sharing the same scaffold were compared in a pairwise manner. Raw discontinuity scores were normalized with respect to the distribution of scores of all scaffolds to obtain final scores between 0 and 1. Accordingly, scores close to 1 indicated the presence of large-magnitude activity cliffs for a given scaffold. Scaffolds were systematically extracted from active compounds and pre-selected if they had a discontinuity score > 0.80, were represented by more than two compounds (i.e. at least three pairs), and were active against at least two targets. These requirements were only met by 212 scaffolds from a large pool of nearly 18,000. For each of these scaffolds, the discontinuity score was then recalculated on a per-target basis to identify scaffolds that formed significant activity cliffs for more than one target. Scaffolds were selected that produced a target-based discontinuity score > 0.8 for at least two different targets. A total of 103 pre-selected scaffolds met this criterion. These scaffolds also had significantly varying size and chemical complexity. Multi-target activity cliffs formed by these scaffolds were then analyzed in cliff-forming scaffold-based target networks, as shown in Figure 10. In this case, nodes corresponded to targets with a scaffold that represented compounds forming activity cliffs. Edges were drawn between targets if they shared active compounds containing this scaffold. Node coloring accounted for single shared compounds or the target-based discontinuity score produced by multiple active compounds. For each scaffold, a mini-network was generated. 
A variety of the 103 qualifying scaffolds represented compounds forming large-magnitude activity cliffs against multiple targets from one or more families (24), an unexpectedly large number revealed in a proof-of-concept investigation.
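The scaffold discontinuity score described above can be sketched as follows. The raw score is implemented as the mean of similarity-weighted absolute potency differences over all compound pairs of a scaffold, in line with the symbol definitions given in the text. The normalization step is an assumption: the text only states that raw scores were normalized with respect to their distribution, and a z-score mapped through the cumulative normal distribution is used here as one plausible choice. Fingerprints are represented as sets of on-bits, and all data are hypothetical.

```python
from itertools import combinations
from statistics import NormalDist, mean, pstdev


def tanimoto(fp_a, fp_b):
    """Tanimoto similarity of two fingerprints given as sets of on-bits."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0


def raw_discontinuity(compounds):
    """Mean of |p_i - p_j| * sim(i, j) over all pairs of (potency, fingerprint) entries."""
    pairs = list(combinations(compounds, 2))
    if not pairs:
        return 0.0
    total = sum(abs(p1 - p2) * tanimoto(f1, f2) for (p1, f1), (p2, f2) in pairs)
    return total / len(pairs)


def normalize(raw_scores):
    """Map raw scores into (0, 1) via z-scores and the normal CDF (assumed scheme)."""
    mu = mean(raw_scores.values())
    sigma = pstdev(raw_scores.values())
    if sigma == 0:
        return {s: 0.5 for s in raw_scores}
    return {s: NormalDist().cdf((v - mu) / sigma) for s, v in raw_scores.items()}


# Hypothetical scaffolds with three compounds each: (potency, fingerprint bits).
scaffold_compounds = {
    "S_cliff": [(9.0, {1, 2, 3, 4}), (5.0, {1, 2, 3, 5}), (8.8, {1, 2, 3, 4, 6})],
    "S_flat": [(7.0, {1, 2}), (7.1, {1, 2, 3}), (6.9, {1, 2, 4})],
}
raw = {s: raw_discontinuity(c) for s, c in scaffold_compounds.items()}
scores = normalize(raw)
```

A per-target variant would apply the same functions to the potency values measured against each individual target, which corresponds to the second selection step described in the text.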


Figure 10. Activity cliff-forming scaffold-based target network for a given scaffold (upper left corner). Nodes represent targets that are connected by an edge if they share compounds containing the cliff-forming scaffold. A node is colored white if only a single compound is active against the target, gray if the target yields a discontinuity score of at most 0.8, or black if the score exceeds 0.8. Two exemplary compounds containing the scaffold are shown and corresponding potency values (pKi) are reported for carbonic anhydrases (CA) 1, 2, and 9.

Scaffold Hopping Potential

The concept of scaffold hopping refers to the identification of compounds that are active against the same target but contain different core structures (25). Scaffold hopping is often considered an essential criterion for evaluation of virtual screening methods. For the assessment of scaffold hopping, the definition of scaffolds is of critical importance. Scaffold hopping potential was investigated for compounds active against a wide spectrum of pharmaceutical targets and it was analyzed how frequently scaffold hops occurred (26). A total of 795 different target sets were assembled from BindingDB and ChEMBL. The analysis was focused on topologically distinct scaffolds. Among scaffolds sharing the same CSK within a target set, only the scaffold representing the largest number of compounds was retained. If multiple scaffolds represented the same number of compounds, the scaffold yielding the highest median compound potency was selected for further analysis. Thus, scaffolds only distinguished by heteroatoms and/or bond orders were disregarded. In addition, each target set was required to contain at least five bioactive compounds with at least 1 μM potency and two topologically distinct scaffolds. On the basis of these criteria, 502 qualifying target sets belonging to 19 different families were obtained.
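The selection of one representative scaffold per CSK within a target set, as described above, can be sketched as follows. Potency values are assumed to be on a logarithmic scale where larger means more potent (e.g. pKi); the scaffold and CSK labels are hypothetical placeholders.

```python
from collections import defaultdict
from statistics import median


def representative_scaffolds(compounds):
    """Keep one scaffold per CSK from (scaffold, csk, potency) records.

    The scaffold with the most compounds wins; ties are broken by the
    higher median potency.
    """
    potencies = defaultdict(list)
    scaffold_csk = {}
    for scaffold, csk, potency in compounds:
        potencies[scaffold].append(potency)
        scaffold_csk[scaffold] = csk

    best = {}
    for scaffold, values in potencies.items():
        key = (len(values), median(values))  # compound count, then median potency
        csk = scaffold_csk[scaffold]
        if csk not in best or key > best[csk][0]:
            best[csk] = (key, scaffold)
    return {csk: scaffold for csk, (_, scaffold) in best.items()}


# Hypothetical target set: two scaffolds share CSK6 and have equal compound counts.
target_set = [
    ("benzene", "CSK6", 7.0),
    ("benzene", "CSK6", 8.0),
    ("pyridine", "CSK6", 9.0),
    ("pyridine", "CSK6", 8.5),
    ("naphthalene", "CSK10", 6.0),
]
selected = representative_scaffolds(target_set)
```

In this example, both CSK6 scaffolds represent two compounds, so the tie is resolved in favor of the pyridine scaffold with the higher median potency.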


For each target set, the number of topologically distinct scaffolds was determined. It was observed that the majority of target sets contained between five and 49 scaffolds, which represented the average scaffold diversity and hopping potential across different targets. In addition, 70 target sets (i.e. ~14%) consisted of compounds represented by 50 or more scaffolds. Most of these corresponding targets were well-known pharmaceutical targets including different subtypes of G-protein coupled receptors (GPCRs), protein kinases, and proteases. In addition, a “hopping score” was defined for which compound potency information was taken into consideration and calculated for individual scaffold pairs in a target set (26):

Here sim(i, j) is the Tanimoto similarity of MACCS keys of two scaffolds i and j, PCi and PCj are the potency values of compounds Ci and Cj represented by scaffolds i and j, respectively, and |Cij| is the number of compound pairs represented by the scaffold pair ij. Raw scores were normalized with respect to the distribution of all original scores to obtain final scores between 0 and 1. Accordingly, scaffold pairs that displayed low structural similarity and represented compounds with comparable potency values yielded high scores. For a given target set, its hopping score was determined as the median of all normalized scaffold pair scores.
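To make the qualitative behavior of such a score concrete, the following sketch uses one hypothetical functional form: (1 - sim(i, j)) divided by one plus the mean absolute potency difference over all compound pairs of the two scaffolds. This form is an assumption chosen only because it rewards low scaffold similarity and comparable potency; it is not claimed to be the score defined in ref 26. Normalization over the global score distribution is omitted, so the median is taken over raw pair scores.

```python
from itertools import combinations, product
from statistics import mean, median


def tanimoto(fp_a, fp_b):
    """Tanimoto similarity of two fingerprints given as sets of on-bits."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0


def raw_pair_score(fp_i, fp_j, potencies_i, potencies_j):
    """Assumed form: (1 - sim) / (1 + mean |P_Ci - P_Cj| over all compound pairs)."""
    diffs = [abs(pi - pj) for pi, pj in product(potencies_i, potencies_j)]
    return (1.0 - tanimoto(fp_i, fp_j)) / (1.0 + mean(diffs))


def pair_scores(scaffolds):
    """Raw scores for all scaffold pairs of one target set.

    scaffolds: {name: (fingerprint bits, [compound potencies])}
    """
    scores = {}
    for a, b in combinations(sorted(scaffolds), 2):
        (fp_a, pot_a), (fp_b, pot_b) = scaffolds[a], scaffolds[b]
        scores[(a, b)] = raw_pair_score(fp_a, fp_b, pot_a, pot_b)
    return scores


# Hypothetical target set with three scaffolds.
scaffolds = {
    "A": ({1, 2, 3, 4}, [8.0, 8.2]),
    "B": ({5, 6, 7, 8}, [8.1]),
    "C": ({1, 2, 3, 5}, [5.0]),
}
scores = pair_scores(scaffolds)
set_score = median(scores.values())  # median over raw (not normalized) pair scores
```

Scaffolds A and B are structurally unrelated but represent compounds of similar potency and therefore receive the highest pair score, whereas the similar scaffolds A and C with a large potency gap receive the lowest.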

Figure 11. Scaffold-based target network in which nodes represent targets that are connected by edges if they share at least one scaffold. Edges are scaled according to the number of shared scaffolds. Nodes are colored on the basis of target families they belong to and scaled by scaffold hopping scores.


Furthermore, scaffold overlaps between 70 target sets with highest scaffold hopping potential were determined and visualized in a scaffold-based target network, as shown in Figure 11. In this network, nodes represented targets that were connected by edges if they shared one or more scaffolds. Edges were scaled according to the number of shared scaffolds. Nodes were colored on the basis of target families and scaled by scaffold hopping scores. Among a total of 142 target relationships, 106 relationships were formed exclusively within individual target families (i.e. intra-family relationships) and the remaining 36 relationships across different families (inter-family relationships). In summary, for the majority of pharmaceutical targets, considerable scaffold hopping potential was detected.
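The scaffold-based target network and the distinction between intra- and inter-family relationships can be sketched as follows. The targets, scaffolds, and family assignments are hypothetical, and the function names are chosen for illustration only.

```python
from itertools import combinations


def scaffold_based_target_network(target_scaffolds):
    """Connect targets sharing scaffolds; edge weight is the number of shared scaffolds."""
    edges = {}
    for t1, t2 in combinations(sorted(target_scaffolds), 2):
        shared = target_scaffolds[t1] & target_scaffolds[t2]
        if shared:
            edges[(t1, t2)] = len(shared)
    return edges


def count_relationships(edges, target_family):
    """Return (intra-family, inter-family) edge counts."""
    intra = sum(1 for t1, t2 in edges if target_family[t1] == target_family[t2])
    return intra, len(edges) - intra


# Hypothetical target sets and family annotations.
target_scaffolds = {
    "T1": {"S1", "S2"},
    "T2": {"S2", "S3"},
    "T3": {"S3"},
    "T4": {"S9"},
}
target_family = {"T1": "kinase", "T2": "kinase", "T3": "GPCR", "T4": "protease"}

edges = scaffold_based_target_network(target_scaffolds)
intra, inter = count_relationships(edges, target_family)
```

In this toy example, the edge between the two kinases is an intra-family relationship and the edge between T2 and T3 an inter-family relationship.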

Concluding Remarks

In this chapter, we have introduced approaches for mining of compound activity data to systematically map target-ligand interactions. For compound-centric mapping of target-ligand interactions, the compound-scaffold-skeleton hierarchy is often employed and network representations play an important role. A variety of network variants have been discussed that represent target-ligand interaction patterns or different relationships between scaffolds and/or targets. For example, the idea of privileged substructures was revisited from a systematic data mining perspective and target community-selective scaffolds were introduced in support of the privileged substructure concept. In addition, the compound-scaffold-skeleton hierarchy was applied to explore promiscuity patterns. In this case, network variants also played an important role in rationalizing the results of data mining. Furthermore, activity cliffs and scaffold hopping potential were studied with the aid of network representations. Most of the studies discussed herein had proof-of-concept character, paving the way for medicinal chemistry applications guided by molecular hierarchies and network views.

References

1. Dobson, C. M. Chemical space and biology. Nature 2004, 432, 824–828.
2. Jacoby, E. Computational chemogenomics. WIREs Comput. Mol. Sci. 2011, 1, 57–67.
3. Rognan, D. Chemogenomics approaches to rational drug design. Br. J. Pharmacol. 2007, 152, 38–52.
4. Geppert, H.; Vogt, M.; Bajorath, J. Current trends in ligand-based virtual screening: Molecular representations, data mining methods, new application areas, and performance evaluation. J. Chem. Inf. Model. 2010, 50, 205–216.
5. Wassermann, A. M.; Wawer, M.; Bajorath, J. Activity landscape representations for structure-activity relationship analysis. J. Med. Chem. 2010, 53, 8209–8223.
6. Nisius, B.; Bajorath, J. Mapping of pharmacological space. Expert Opin. Drug Discovery 2011, 6, 1–7.
7. Hopkins, A. L. Network pharmacology: the next paradigm in drug discovery. Nat. Chem. Biol. 2008, 4, 682–690.
8. Keiser, M. J.; Roth, B. L.; Armbruster, B. N.; Ernsberger, P.; Irwin, J. J.; Shoichet, B. K. Relating protein pharmacology by ligand chemistry. Nat. Biotechnol. 2007, 25, 196–206.
9. Paolini, G. V.; Shapland, R. B. H.; van Hoorn, W. P.; Mason, J. S.; Hopkins, A. L. Global mapping of pharmacological space. Nat. Biotechnol. 2006, 24, 805–815.
10. Knox, C.; Law, V.; Jewison, T.; Liu, P.; Ly, S.; Frolkis, A.; Pon, A.; Banco, K.; Mak, C.; Neveu, V.; Djoumbou, Y.; Eisner, R.; Guo, A. C.; Wishart, D. S. DrugBank 3.0: a comprehensive resource for ‘omics’ research on drugs. Nucleic Acids Res. 2011, 39, D1035–D1041.
11. Hu, Y.; Gupta-Ostermann, D.; Bajorath, J. Exploring compound promiscuity patterns and multi-target activity space. Comput. Struct. Biotechnol. J. 2014, 9, e201401103.
12. Hu, Y.; Stumpfe, D.; Bajorath, J. Lessons learned from molecular scaffold analysis. J. Chem. Inf. Model. 2011, 51, 1742–1753.
13. Bemis, G. W.; Murcko, M. A. The properties of known drugs. 1. Molecular frameworks. J. Med. Chem. 1996, 39, 2887–2893.
14. Liu, T.; Lin, Y.; Wen, X.; Jorissen, R. N.; Gilson, M. K. BindingDB: A web-accessible database of experimentally determined protein-ligand binding affinities. Nucleic Acids Res. 2007, 35, D198–D201.
15. Gaulton, A.; Bellis, L. J.; Bento, A. P.; Chambers, J.; Davies, M.; Hersey, A.; Light, Y.; McGlinchey, S.; Michalovich, D.; Al-Lazikani, B.; Overington, J. P. ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res. 2012, 40, D1100–D1107.
16. Li, Q.; Cheng, T.; Wang, Y.; Bryant, S. H. PubChem as a public resource for drug discovery. Drug Discovery Today 2010, 15, 1052–1057.
17. Evans, B. E.; Rittle, K. E.; Bock, M. G.; Dipardo, R. M.; Freidinger, R. M.; Whitter, W. L.; Lundell, G. F.; Veber, D. F.; Anderson, P. S. Methods for drug discovery: Development of potent, selective, orally effective cholecystokinin antagonists. J. Med. Chem. 1988, 31, 2235–2246.
18. Müller, G. Medicinal chemistry of target family-directed masterkeys. Drug Discovery Today 2003, 8, 681–691.
19. Schnur, D. M.; Hermsmeier, M. A.; Tebben, A. J. Are target-family-privileged substructures truly privileged? J. Med. Chem. 2006, 49, 2000–2009.
20. Hu, Y.; Wassermann, A. M.; Lounkine, E.; Bajorath, J. Systematic analysis of public domain compound potency data identifies selective molecular scaffolds across druggable target families. J. Med. Chem. 2010, 53, 752–758.
21. Hu, Y.; Bajorath, J. Exploring target-selectivity patterns of molecular scaffolds. ACS Med. Chem. Lett. 2010, 1, 54–58.
22. Hu, Y.; Bajorath, J. Polypharmacology directed data mining: Identification of promiscuous chemotypes with different activity profiles and comparison to approved drugs. J. Chem. Inf. Model. 2010, 50, 2112–2118.
23. Stumpfe, D.; Bajorath, J. Exploring activity cliffs in medicinal chemistry. J. Med. Chem. 2012, 55, 2932–2942.
24. Hu, Y.; Bajorath, J. Molecular scaffolds with high propensity to form multitarget activity cliffs. J. Chem. Inf. Model. 2010, 50, 500–510.
25. Schneider, G.; Neidhart, W.; Giller, T.; Schmid, G. “Scaffold hopping” by topological pharmacophore search: a contribution to virtual screening. Angew. Chem., Int. Ed. 1999, 38, 2894–2896.
26. Hu, Y.; Bajorath, J. Global assessment of scaffold hopping potential for current pharmaceutical targets. MedChemComm 2010, 1, 339–344.