Article pubs.acs.org/jmc
Method for the Evaluation of Structure−Activity Relationship Information Associated with Coordinated Activity Cliffs
Dilyana Dimova, Dagmar Stumpfe, and Jürgen Bajorath*
Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Dahlmannstraße 2, D-53113 Bonn, Germany
ABSTRACT: Activity cliffs are generally defined as pairs of active compounds having a large difference in potency. Although this definition of activity cliffs focuses on compound pairs, the vast majority of cliffs are formed in a coordinated manner. This means that multiple highly and weakly potent compounds form series of activity cliffs, which often overlap. In activity cliff networks, coordinated cliffs emerge as disjoint activity cliff clusters. Recently, we have identified all cliff clusters from current bioactive compounds and analyzed their topologies. For structure−activity relationship (SAR) analysis, activity cliff clusters are of high interest, since they contain more SAR information than cliffs that are individually considered. For medicinal chemistry applications, a key question becomes how to best extract SAR information from activity cliff clusters. This represents a challenging problem, given the complexity of many activity cliff configurations. Herein we introduce a generally applicable methodology to organize activity cliff clusters on the basis of structural relationships, prioritize clusters, and systematically extract SAR information from them.
■
INTRODUCTION
Activity cliffs are of interest in medicinal chemistry because they reveal small chemical changes leading to large biological activity alterations.1 The general definition of activity cliffs1,2 requires a clear specification of structural similarity and potency difference criteria.1,3,4 Otherwise, it is not possible to describe activity cliffs in a consistent manner, search for them in compound activity data, and analyze their distribution across different compound classes and targets.3−5 Compound similarity relationships can be assessed in different ways, which represents an important variable for activity cliff exploration. A variety of molecular representations can be employed to calculate whole-molecule similarity.1,6 The representation dependence of similarity calculations and the limited chemical interpretability of calculated similarity values complicate activity cliff exploration and exploitation in the practice of medicinal chemistry.3,4 Alternatively, similarity might be assessed on the basis of substructure relationships, for example, utilizing molecular scaffolds,7,8 scaffold/R-group relationships,8 or matched molecular pairs (MMPs),9,10 which often increases the interpretability of activity cliff information.3,4 An MMP is generally defined as a pair of compounds that differ only by the exchange of two substructures9,11 representing a chemical transformation.11 We strongly favor a substructure-based definition of activity cliffs on the basis of MMPs.4,10 By introduction of MMP transformation size restrictions,10 structural differences between activity cliff forming compounds have generally been limited to small and chemically sound replacements,10 leading to the introduction of so-called MMP-cliffs.10 MMP-cliffs require the formation of a transformation size-restricted MMP and a potency difference of at least 2 orders of magnitude between cliff partners on the basis of equilibrium constants (Ki values). This represents our preferred activity cliff representation, which is also consistently applied herein.
Another fundamental aspect concerning the general definition of activity cliffs is the focus on compound pairs. However, this does not mean that activity cliffs are formed in isolation. Rather, the vast majority of activity cliffs, i.e., more than 95% across all compound activity classes,12 are formed in a coordinated manner.4,12,13 Coordinated activity cliffs have higher SAR information content than individually considered cliffs, which makes coordinated cliffs particularly attractive for medicinal chemistry. In order to extract coordinated cliffs from current bioactive compounds and characterize them, we have recently generated a global activity cliff network14 from the ChEMBL database.15 In the activity cliff network, coordinated cliffs emerged as disjoint activity cliff clusters of varying size, composition, and topology.14 Approximately 20 000 MMP-cliffs were extracted from bioactive compounds, and only 769 of these cliffs were formed in isolation. The global network was found to contain more than 2000 activity cliff clusters with 450 distinct topologies. For small to moderately sized clusters, three main topology categories termed the “star”, “chain”, and “rectangle” topology and variants of these topologies were recurrent.14
We have now asked the question of how SAR information might be systematically extracted from clusters of coordinated activity cliffs to make associated SAR information available for medicinal chemistry applications. Clearly, characterizing cluster topologies is insufficient for this purpose. Therefore, we have developed a method to organize and prioritize activity cliff clusters on the basis of SAR information content and extract SAR information in a consistent manner. This methodology is reported herein.
“# MMP-cores” is the number of different MMP-cores (# MMP-cores ≥ 1) involved in the formation of all MMP-cliffs in a cluster, and “# MMP-cliffs” is the total number of all MMP-cliffs per cluster (# MMP-cliffs ≥ 2). Hence, the Core_idx_raw is in the range
Received: April 14, 2014. Published: July 11, 2014. © 2014 American Chemical Society. dx.doi.org/10.1021/jm500577n | J. Med. Chem. 2014, 57, 6553−6563
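The MMP-cliff criterion and the notion of disjoint activity cliff clusters described in the Introduction can be illustrated with a minimal sketch. It assumes that transformation size-restricted MMPs have already been computed elsewhere and that potencies are given as pKi values (the negative decadic logarithm of Ki), so that a potency difference of 2 orders of magnitude corresponds to 2 log units. The function names, compound identifiers, and potency values are hypothetical and serve only as an illustration; the sketch is not the implementation used in this study.

```python
from collections import defaultdict


def find_mmp_cliffs(pki, mmps, min_delta=2.0):
    """Return the MMPs whose partners differ by at least `min_delta` log units."""
    return [(a, b) for a, b in mmps if abs(pki[a] - pki[b]) >= min_delta]


def cliff_clusters(cliffs):
    """Split cliffs into connected components of the cliff network.

    Nodes are compounds and edges are cliffs. Components with two or more
    cliffs are returned as clusters; single cliffs are returned as isolated.
    """
    neighbors = defaultdict(set)
    for a, b in cliffs:
        neighbors[a].add(b)
        neighbors[b].add(a)

    seen = set()
    clusters, isolated = [], []
    for start in sorted(neighbors):
        if start in seen:
            continue
        # Depth-first traversal collects one connected component.
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbors[node] - component)
        seen |= component
        edges = [(a, b) for a, b in cliffs if a in component]
        (clusters if len(edges) >= 2 else isolated).append(edges)
    return clusters, isolated


# Hypothetical pKi values and precomputed MMPs (illustrative data only).
pki = {"c1": 9.1, "c2": 6.8, "c3": 6.5, "c4": 8.0, "c5": 5.2, "c6": 7.9}
mmps = [("c1", "c2"), ("c1", "c3"), ("c2", "c3"), ("c4", "c5"), ("c4", "c6")]

cliffs = find_mmp_cliffs(pki, mmps)
clusters, isolated = cliff_clusters(cliffs)
print(clusters)  # coordinated cliffs sharing compound c1
print(isolated)  # the cliff between c4 and c5 has no overlapping partner
```

In this toy example, the highly potent compound c1 forms cliffs with both c2 and c3, yielding one cluster of two coordinated cliffs, whereas the cliff formed by c4 and c5 shares no compound with another cliff and is therefore isolated.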
■
MATERIALS AND METHODS
Compound Data Sets. Compounds and activity data were assembled from ChEMBL15 (version 17). Rigorous data selection criteria were applied. In our analysis, only compounds with precisely defined equilibrium constants (Ki values) for human targets at the highest confidence level (ChEMBL confidence score 9) were considered. Compounds with multiple activity annotations for the same target only qualified for further analysis if all values fell within 1 order of magnitude. In this case, the average potency value was used as the final activity annotation. Otherwise the compound was not further considered. Structural analysis was based on molecular graphs. If multiple stereoisomers of a compound with potency within 1 order of magnitude were reported in ChEMBL records, the compound was retained. A total of 72 494 compounds with activity against 661 targets were obtained (i.e., yielding 661 different target sets).
MMP-Cliffs. A matched molecular pair (MMP)9 is formed by two compounds that are only distinguished by a structural change at a single site, i.e., the exchange of a pair of substructures. The substructure exchange is termed a chemical transformation.11 The conserved core structure shared by compounds forming an MMP is termed the MMP-core. For all qualifying compounds, transformation size-restricted MMPs10 were systematically calculated using the OpenEye chemistry toolkit16 and an in-house implementation of the Hussain and Rea algorithm.11 Furthermore, an MMP-cliff10 is formed by two compounds forming a transformation size-restricted MMP having a potency difference of at least 2 orders of magnitude. An MMP-cliff is a substructure-based representation of an activity cliff.10 A total of 329 342 MMPs were obtained that originated from 467 target sets. These MMPs yielded 15 543 MMP-cliffs4,10 that were formed by 11 787 compounds in 282 target sets. Only ∼1.5% of all MMP-cliffs resulted from different assays.
Activity Cliff Network. A global target-based activity cliff network was generated based on MMP-cliffs. In this network, nodes represented cliff-forming compounds and edges MMP-cliffs. The network was generated using Cytoscape.17 Isolated activity cliffs were removed from the network prior to activity cliff cluster analysis.
Activity Cliff Cluster Indices. Two different cluster indices were designed. First, the MMP index (MMP_idx) accounts for structural similarity between cluster compounds: MMP_idx_raw =
0