Computational Method for the Systematic ... - ACS Publications

Aug 8, 2016 - ABSTRACT: A computational methodology is introduced for detecting all unique series of analogs in large compound data sets, regardless o...
Article pubs.acs.org/jmc

Computational Method for the Systematic Identification of Analog Series and Key Compounds Representing Series and Their Biological Activity Profiles

Dagmar Stumpfe,† Dilyana Dimova,† and Jürgen Bajorath*

Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Dahlmannstrasse 2, D-53113 Bonn, Germany

ABSTRACT: A computational methodology is introduced for detecting all unique series of analogs in large compound data sets, regardless of chemical relationships between analogs. No prior knowledge of core structures or R-groups is required, which are automatically determined. The approach is based upon the generation of retrosynthetic matched molecular pairs and analog networks from which distinct series are isolated. The methodology was applied to systematically extract more than 17 000 distinct series from the ChEMBL database. For comparison, analog series were also isolated from screening compounds and drugs. Known biological activities were mapped to series from ChEMBL, and in more than 13 000 of these series, key compounds were identified that represented substitution sites of all analogs within a series and its complete activity profile. The analog series, key compounds, and activity profiles are made freely available as a resource for medicinal chemistry applications.



INTRODUCTION

Series of structurally analogous compounds produced in the course of chemical optimization are a natural focal point of medicinal chemistry. From analogs, structure−activity relationships (SARs) are derived to guide further compound optimization.1 In the practice of medicinal chemistry, analog series are mostly generated on a case-by-case basis and conventional R-group tables continue to be the prevalent data format for monitoring evolving series and associated SAR information. Quantitative SAR (QSAR) analysis2 has traditionally been used as a computational approach to support analog design by predicting the activity of new analogs prior to synthesis. Other computational methods have been developed to further extend the R-group table format3 or hierarchically organize analog sets and visualize SAR patterns associated with large series.4,5 Furthermore, computational concepts have recently been introduced to monitor SAR progression during compound optimization.6,7

While series-centric efforts continue to dominate the generation of analogs in medicinal chemistry, only very few attempts have thus far been made to globally view accessible analog space and systematically search for analog series in compound repositories. In particular, no efficient computational approach is presently available to extract all available series from compound collections. From a computational perspective, a comprehensive search for analog series among structurally heterogeneous compounds is a nontrivial task. The situation differs for individual compounds. For example, a chemically intuitive way to detect analogs of a query compound is the calculation of matched molecular pairs (MMPs),8,9 which are defined as pairs of compounds that only differ by a structural change at a single site (i.e., the exchange of a pair of substructures). As an extension of the MMP concept, matching molecular series (MMSs) were introduced as series of compounds that are only distinguished by chemical modifications at a single site.10 Accordingly, MMSs can also be used to represent analog series.11 For the generation of MMPs and MMSs, elegant and computationally efficient molecular fragmentation algorithms are available.12 Furthermore, retrosynthetic combinatorial analysis procedure (RECAP) rules13 have been applied to generate MMPs.14 Following this approach, retrosynthetic rules replace random fragmentation of bonds. These RECAP-MMPs14 are synthetically accessible and thus further increase the relevance of MMPs for the exploration of chemical relationships.

An MMP-based analog search was carried out in the ChEMBL database,15 the major public repository for compounds and activity data from medicinal chemistry, for preselected query compounds.16 For each query, available analogs were identified. A surprising finding was that for more than a third of active compounds taken from ChEMBL, no analogs were detected.16 Furthermore, the promiscuity of analogs within a set was determined and compared. Compound promiscuity was defined as the number of targets a compound was reported to be active against.17,18 So-defined promiscuity represents the molecular basis of polypharmacology19−21 and is distinct from undesired promiscuity due to reactivity of compounds or other assay artifacts.17 For about three-quarters of all sets, identified analogs

Received: June 16, 2016


DOI: 10.1021/acs.jmedchem.6b00906 J. Med. Chem. XXXX, XXX, XXX−XXX

Journal of Medicinal Chemistry

Article

disjoint clusters were isolated. Each cluster represented a unique series of analogs covering all available substitution sites and R-groups. Figure 1 illustrates the methodology. Network representations were drawn with Cytoscape.27 Key Compounds. For compounds comprising each cluster, substitution sites were recorded and target annotations collected. A search was carried out for individual compounds that were substituted at all sites and active against all targets other cluster members were active against. The union of all target annotations represented the activity profile of a cluster. If available, such compounds were considered representatives (key compounds) of given analog series. All analog series and key compounds are made freely available via an open access deposition on the Zenodo platform.28
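The key compound search described above can be illustrated with a short sketch. This is not the authors' code: it assumes that each compound of a cluster has already been annotated with the set of substitution sites it carries and the set of targets it is active against, and the function name `find_key_compounds`, the site labels, and the toy cluster (loosely modeled on the matrix metalloprotease series) are hypothetical.

```python
def find_key_compounds(cluster):
    """Return (key compounds, activity profile) for one analog series.

    `cluster` maps a compound ID to a tuple
    (set of substitution sites, set of target annotations).
    """
    # All substitution sites explored anywhere in the series.
    all_sites = set().union(*(sites for sites, _ in cluster.values()))
    # The activity profile is the union of all target annotations.
    activity_profile = set().union(*(targets for _, targets in cluster.values()))
    # A key compound is substituted at every site and active against every target.
    keys = sorted(
        cpd
        for cpd, (sites, targets) in cluster.items()
        if sites == all_sites and targets == activity_profile
    )
    return keys, activity_profile


# Hypothetical toy series: three single-site analogs and one candidate key compound.
toy_cluster = {
    "key": ({"R1", "R2", "R3"}, {"MMP2", "MMP9", "MMP12", "MMP13", "MMP14"}),
    "a1": ({"R1"}, {"MMP12"}),
    "a2": ({"R2"}, {"MMP12", "MMP13"}),
    "a3": ({"R3"}, {"MMP12"}),
}
keys, profile = find_key_compounds(toy_cluster)
```

A series without any compound that satisfies both conditions simply yields an empty list, which corresponds to the "if available" qualification in the text.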

were found to have the same degree of promiscuity as the query compound.16 While an MMP-based analog search is useful to detect available analogs for given compounds that differ by modifications at a single site, it does not identify and distinguish between unique series of analogs taking all chemical modifications and compound relationships into account. This task goes much beyond individual analog searches. Herein we present, to our knowledge, a first generally applicable methodology to extract all available analog series from compound sources. The approach combines the generation of retrosynthetic MMPs and analog networks in which unique series emerge as disjoint clusters that can be isolated and visualized. In a large-scale analysis of analog space across ChEMBL, 17 371 distinct series were identified and their target annotations studied. Rather unexpectedly, compounds were frequently detected that captured all analog relationships within a newly identified series, including different sites of chemical diversification, and shared all reported biological activities. Thus, these compounds were series-representative from a chemical and biological viewpoint. They delineate currently accessible analog space and provide a previously unconsidered source of knowledge for analog mapping and design including the consideration of multitarget activities. Herein we introduce the computational methodology and report the results of our large-scale application.





RESULTS AND DISCUSSION

Methodological Concept. We introduce an intuitive and effective computational methodology to extract distinct analog series from compound data sets of any source. The approach is outlined in Figure 1. Compound-based analog searches can be carried out in different ways, for example, by simple substructure searching using predefined scaffolds or, more objectively and systematically, by applying the MMP formalism, as discussed above. In order to further increase the chemical relevance of computationally identified analogs, core−substituent size relationships and retrosynthetic rules were applied for MMP generation. MMP calculations make it possible to identify all pairwise analog relationships for query compounds, but this is not sufficient to determine unique series. Therefore, pairwise relationships must be systematically organized, which is accomplished through the generation of a global network where compounds are represented as nodes and connected to each other if they form an MMP relationship. In the network, unique series of analogs emerge as disjoint clusters. Importantly, a cluster contains all compounds sharing a given core structure, all chemically explored substitution sites, and currently available R-groups. The core structure is not predefined but results from comprehensive molecular comparisons. Hence, no prior knowledge or structural definitions are required, which also sets the methodology apart from scaffold-based approaches. As illustrated in Figure 1, a three-step process is applied to identify unique series. In the first step, each compound is used once as a query to generate all possible compound-based analog subsets. In the second step, overlapping subsets are combined and the global network structure is constructed. In the third step, all disjoint clusters representing individual series are identified. A compound can only belong to one series.
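The second and third steps can be sketched compactly. The following is a minimal illustration under stated assumptions, not the implementation used in the study: MMP relationships are taken as already computed and supplied as pairs of compound identifiers (`mmp_pairs` and `isolate_series` are hypothetical names), and disjoint clusters are obtained as connected components of the network with a union-find structure.

```python
from collections import defaultdict


def isolate_series(mmp_pairs):
    """Group compounds into analog series, i.e., disjoint clusters of the MMP network."""
    parent = {}

    def find(x):
        # Register unseen compounds as their own root, then walk up to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    # Each MMP is an edge; merging the two roots combines overlapping subsets.
    for a, b in mmp_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Collect the members of every connected component.
    clusters = defaultdict(set)
    for cpd in list(parent):
        clusters[find(cpd)].add(cpd)
    return sorted((sorted(members) for members in clusters.values()), key=lambda m: m[0])


# Hypothetical MMP relationships among five compounds.
mmp_pairs = [("c1", "c2"), ("c2", "c3"), ("c4", "c5")]
series = isolate_series(mmp_pairs)
```

Because connected components partition the node set, every compound ends up in exactly one series, in line with the statement above.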
The network structure is computationally analyzed by searching the underlying relationship matrix and can be visualized in part, e.g., by focusing on sections or individual clusters, or as a whole. Visualization further supports the analysis of analog series, which can be applied to screening or optimization data sets of any source. If analog series are extracted from large compound collections with activities against many different targets, as presented in the following, target annotations of compounds are mapped to individual clusters once the structural organization has been finalized. This makes it possible to further distinguish between analog series with single- or multitarget activities and identify key compounds with high SAR information content. Searching for Analog Series. In a large-scale application of the method described above, we have systematically searched for analog series in 167 290 ChEMBL compounds for which well-defined activity data were available. These compounds were active against a total of 1594 targets. Retrosynthetic analog relationships were detected for 96 889 compounds with activity against 1382 targets. The resulting global analog network consisted

METHODS AND MATERIALS

Compound Activity Data. Compounds and activity data were assembled from ChEMBL,15 version 21. Only compounds involved in direct interactions (target relationship type “D”) with human targets at the highest confidence level (target confidence score 9) were taken. Two different types of potency measurements were separately considered including assay-independent equilibrium constants (Ki values) and assay-dependent IC50 values. Approximate measurements associated with “>”, “300

Table 1

cluster order   no. clusters   no. CPDs   no. clusters with key CPDs   no. key CPDs   no. targets (median)   no. targets (min−max)
2               7815           15630      7056                         12916          1.0                    1−40
3−5             5222           19135      4121                         10407          1.0                    1−57
6−10            2276           17111      1345                         5671           2.0                    1−49
11−…            1369           19840      544                          3646           2.0                    1−38
…−50            604            17816      114                          1153           2.0                    1−63
51−100          69             4574                                                   3.0                    1−47
101−200         13             1890                                                   4.0                    1−130
201−300         2              557                                                    20.5                   1−40
>300            1              336                                                    4.0                    4

ᵃThe distribution of clusters, compounds (CPDs), and targets across the global analog network is reported. Each cluster represents a unique series of analogs. “Cluster order” gives the number of compounds per cluster. For each cluster order, the total number of key compounds and clusters with key compounds, the median number of target annotations, and the minimum and maximum (min−max) are provided.
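The per-order target statistics reported in the table can be computed along the following lines. This is only a sketch with hypothetical names and toy numbers; it assumes each cluster has been reduced to a pair (number of compounds, number of target annotations) and that cluster-order ranges are given as inclusive bounds.

```python
from statistics import median


def target_stats_by_order(clusters, bins):
    """Summarize target annotations per cluster-order range.

    `clusters` is a list of (n_compounds, n_targets) tuples and `bins` a list
    of inclusive (low, high) cluster-order ranges. Returns a dict mapping each
    non-empty range to (no. clusters, median, min, max) of the target counts.
    """
    stats = {}
    for low, high in bins:
        counts = [n_targets for n_cpds, n_targets in clusters if low <= n_cpds <= high]
        if counts:
            stats[(low, high)] = (len(counts), median(counts), min(counts), max(counts))
    return stats


# Hypothetical clusters: (cluster order, number of annotated targets).
toy_clusters = [(2, 1), (2, 3), (4, 1), (5, 2), (3, 6)]
toy_stats = target_stats_by_order(toy_clusters, bins=[(2, 2), (3, 5)])
```

With an even number of clusters in a range, the median is the mean of the two middle values, which is how a non-integer median such as 20.5 can arise for a range containing only two clusters.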

As a control calculation, compounds were also first retrieved on a per-target basis and then organized. This is often done when extracting active compounds from databases. Compared to our approach, the target-based retrieval was accompanied by a highly significant loss of information. Specifically, between one and 4570 analogs remained undetected for series originating from a total of 1006 target sets. Depending on the set, the missed analogs were active against one to 217 additional targets. For comparison, we also isolated analog series from 304 956 screening compounds from PubChem and 6225 drugs from

DrugBank. From PubChem compounds, 32 375 analog series were extracted comprising 160 446 compounds. In addition, drugs yielded 441 analog sets with 1209 compounds. Hence, a large number of analog series was also obtained from PubChem. We determined the overlap of analog series from different sources. There was essentially no overlap between ChEMBL and DrugBank. Only one series from ChEMBL and one series from DrugBank shared a single compound. Similarly, three series from PubChem and three from DrugBank had four compounds in common. By contrast, 1204 series from ChEMBL and 1156 series


Figure 3. Analog series and key compounds. Venn diagrams show two small series of analogs with key compounds in the center that cover all MMP relationships and substitution sites within a series as well as all target annotations. In (a) and (b), inhibitors of different matrix metalloproteases and PI3 kinase p110 subunits are displayed, respectively.

Activity Profiles. Table 1 also reports target annotations for series of increasing size. For series with up to 50 compounds, median numbers of targets per series were only one or two,

from PubChem shared a total of 3033 compounds, reflecting the fact that there is data exchange between these large repositories.
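The overlap between series from two sources can be determined as sketched below. The function and variable names are hypothetical, and the sketch assumes that compounds from both sources are represented by directly comparable identifiers (for example, standardized structure strings), which is an assumption not stated in the text.

```python
def series_overlap(series_a, series_b):
    """Compare analog series from two sources.

    Each argument maps a series ID to the set of its compound identifiers.
    Returns (shared compounds, no. series in A containing shared compounds,
    no. series in B containing shared compounds).
    """
    cpds_a = set().union(*series_a.values())
    cpds_b = set().union(*series_b.values())
    shared = cpds_a & cpds_b
    hits_a = sum(1 for members in series_a.values() if members & shared)
    hits_b = sum(1 for members in series_b.values() if members & shared)
    return shared, hits_a, hits_b


# Hypothetical series from two sources sharing a single compound.
source_a = {"A1": {"x", "y"}, "A2": {"z"}}
source_b = {"B1": {"y", "q"}, "B2": {"w"}}
shared, n_a, n_b = series_overlap(source_a, source_b)
```

The numbers of overlapping series need not be equal for the two sources, since a series from one source may share compounds with several series from the other.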


Table 2. Promiscuity Degree of Key Compoundsᵃ

no. targets   no. CPDs   no. key CPDs
1             67132      23958
2             17893      6018
3             7205       2258
4             2831       1005
5             799        293
≥6, ≤10       823        217
≥11, ≤20      160        34
≥21, ≤30      31         8
≥31, ≤40      7          2
≥41, ≤68      8          0

The overall large number of key compounds can be rationalized by considering analog series having only one or small numbers of substitution sites and activity against single targets. In these frequently occurring cases, multiple key compounds were typically present. By definition, key compounds are the most promiscuous analogs within a series, since no compound can have a higher degree of promiscuity. Table 2 reports the distribution of compounds and key compounds with increasing degrees of promiscuity. The majority of compounds (67 132) were only active against a single target, which also included 23 958 key compounds. In addition, 6018 key compounds were active against two targets and 2258, 1005, and 293 key compounds against three, four, and five targets, respectively. By contrast, only very small numbers of compounds had more than 10 target annotations. Taken together, the results in Tables 1 and 2 reveal that individual key compounds frequently characterized relatively large analog series with multitarget activities. Only 3075 of the more than 32 000 analog series isolated from PubChem contained key compounds. A total of 3993 key compounds (2.4%) were identified, a much lower proportion than determined for ChEMBL, which might be expected for screening data. Furthermore, 185 of the 441 series from DrugBank contained 304 key compounds (25.1%). Targets. It is emphasized that the methodology does not attempt to predict novel targets of active compounds. Rather, available target annotations of compounds are systematically mapped to analog series. Table 3 reports targets with known active compounds present in many different analog series (with single activity or multitarget activity). Not surprisingly, popular therapeutic targets including, among others, a variety of G protein coupled receptors covered most analog series. The majority of these series contained one or more key compounds, often with multitarget activities. Prospective Applications. 
Our methodology can be prospectively applied in different ways. As described above in detail, a systematic search for all available analog series can be carried out in compound collections of any source, including in-house compound decks, leading to a global structural organization and the identification of all existing series. Then activity data can be mapped to these series. Furthermore, another immediate application of the methodology is the search for available analogs and activity data during the early stages of compound optimization campaigns. When only a few analogs are available, or even a single active compound, the method can be effectively used to search any compound database for all

ᵃThe distribution of all compounds (CPDs) and key compounds over different promiscuity degrees (no. targets) is reported.
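A distribution of this kind follows directly from the promiscuity definition (the number of targets a compound is reported to be active against). The sketch below uses hypothetical names and toy annotations and is not taken from the study.

```python
from collections import Counter


def promiscuity_distribution(target_annotations, key_compounds):
    """Count compounds and key compounds per promiscuity degree.

    `target_annotations` maps compound ID -> set of targets;
    `key_compounds` is a collection of compound IDs.
    Returns {degree: (no. compounds, no. key compounds)}.
    """
    all_counts = Counter(len(targets) for targets in target_annotations.values())
    key_counts = Counter(len(target_annotations[cpd]) for cpd in key_compounds)
    # A Counter returns 0 for degrees at which no key compound occurs.
    return {degree: (all_counts[degree], key_counts[degree]) for degree in sorted(all_counts)}


# Hypothetical annotations for four compounds, two of which are key compounds.
toy_annotations = {
    "c1": {"T1"},
    "c2": {"T1", "T2"},
    "c3": {"T1"},
    "c4": {"T1", "T2", "T3"},
}
distribution = promiscuity_distribution(toy_annotations, key_compounds={"c2", "c3"})
```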

although between 38 and 63 target annotations were detected for individual series, depending on their size. For series with up to 100 and 200 compounds the median numbers of targets increased to three and four, respectively, with individual series having more than 100 target annotations. Thus, many series, especially series with up to 50 analogs, were active against single targets, but multitarget activities were also frequently observed. Key Compounds. We also searched for compounds that covered all substitution sites within a series and all target annotations of analogs. These compounds were considered series-representative. Figure 3 illustrates the concept of key compounds for two small analog series. Figure 3a shows inhibitors of matrix metalloproteases, all of which share activity against matrix metalloprotease 12. Three analogs are also active against matrix metalloprotease 13, and the key compound in the center is additionally active against matrix metalloproteases 2, 9, and 14. The series consists of three subsets with varying substitution sites, all of which are present in the key compound. The series in Figure 3b also consists of three subsets, each of which contains a pair of inhibitors of different subunits of PI3-kinase p110. The key compound in the center combines the activity of the other analogs against the α, β, γ, and δ subunits. A total of 33 793 key compounds were identified in 13 180 series with activity against 1120 targets (81.0%). Thus, there were often multiple key compounds per series. Table 1 reports the distribution of key compounds over series of increasing size. For series with three to five and six to 10 analogs, 10 407 and 5671 key compounds were detected, respectively. Larger series with 11−50 compounds still yielded a total of 4799 key compounds. For series with more than 50 analogs, no key compounds were detected.

Table 3. Targets with Extensive Representation across Analog Seriesᵃ

target                                          no. CPDs   no. analog series   no. key CPDs   no. analog series with key CPDs   additional targets
serotonin transporter                           1489       417                 597            274                               0−38
cannabinoid CB1 receptor                        2007       397                 716            292                               0−5
dopamine D2 receptor                            1562       380                 544            244                               0−38
cannabinoid CB2 receptor                        1901       378                 639            281                               0−5
cytochrome P450 3A4                             1046       365                 326            182                               0−31
adenosine A2a receptor                          2428       345                 472            210                               0−7
carbonic anhydrase II                           1101       341                 391            184                               0−11
vascular endothelial growth factor receptor 2   1591       338                 590            204                               0−18
serotonin 6 (5-HT6) receptor                    1151       332                 561            244                               0−38
adenosine A3 receptor                           1946       324                 391            182                               0−38

ᵃReported are the top 10 targets with active compounds participating in the largest numbers of analog series. In each case, the total number of compounds, key compounds, and additional targets per series (min−max) is also provided.


available analogs and associated activity information. This might reveal important information during early project stages, as illustrated in Figure 4. In the example in Figure 4a, a global analog search was carried out in ChEMBL for a single inhibitor of acetylcholinesterase. A total of 13 analogs were identified, five of which were also known inhibitors of acetylcholinesterase. In addition, three analogs were detected with activity against monoamine oxidase B and four others with dual activity against monoamine oxidase A and B. Furthermore, another analog was

found to be active against an unrelated target, P-glycoprotein 1. In the example in Figure 4b, a global search was carried out for a small series of kinase inhibitors with activity against Aurora A and B. In this case, the series-based search also identified several other analogs with similar activity, one analog active against another kinase, and three others that were exclusively annotated with HERG antitarget activity, thus pointing at potential liabilities associated with the evolving series, which would merit careful follow-up investigation.
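A global analog search of this kind might be sketched as follows. The sketch is illustrative only: it assumes that MMP relationships between the query and database compounds are already available as identifier pairs, that the search returns every compound connected to the query in the analog network (the study may restrict or organize results differently), and that all names and toy data are hypothetical.

```python
from collections import defaultdict


def global_analog_search(query, mmp_pairs, target_annotations):
    """Collect all analogs connected to `query` and group them by known target."""
    neighbors = defaultdict(set)
    for a, b in mmp_pairs:
        neighbors[a].add(b)
        neighbors[b].add(a)

    # Depth-first traversal of the series that contains the query.
    seen, stack = {query}, [query]
    while stack:
        for nxt in neighbors[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    analogs = seen - {query}

    # Map the available target annotations onto the identified analogs.
    by_target = defaultdict(set)
    for cpd in analogs:
        for target in target_annotations.get(cpd, ()):
            by_target[target].add(cpd)
    return {target: sorted(cpds) for target, cpds in sorted(by_target.items())}


# Hypothetical network: the query "q" is connected to a1, a2, and (via a2) a3.
toy_pairs = [("q", "a1"), ("q", "a2"), ("a2", "a3"), ("x", "y")]
toy_targets = {"a1": {"AChE"}, "a2": {"MAO-B"}, "a3": {"MAO-A", "MAO-B"}, "y": {"AChE"}}
hits = global_analog_search("q", toy_pairs, toy_targets)
```

Grouping the analogs by target makes annotations that differ from the query activity, such as an antitarget, immediately visible.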


Figure 4. Global analog search. A prospective application of the computational approach introduced herein is illustrated. Global analog search calculations are carried out in ChEMBL for a (a) single query compound and (b) small series of analogs. In each case, identified analogs are shown and color-coded by known targets. For each target, the target name and the ChEMBL target identification (in parentheses) are given.



CONCLUSIONS

In the practice of medicinal chemistry, analog series are typically defined on a case-by-case basis, which is sufficient for many applications. The study of large individual series can be

These examples illustrate that in addition to identifying all available structural analogs for given query compounds or series, the searches might also yield activity information for evolving series that is of immediate relevance for compound optimization.


Consonni, V.; Kuz’min, V. E.; Cramer, R.; Benigni, R.; Yang, C.; Rathman, J.; Terfloth, L.; Gasteiger, J.; Richard, A.; Tropsha, A. QSAR Modeling: Where Have You Been? Where Are You Going To? J. Med. Chem. 2014, 57, 4977−5010.
(3) Agrafiotis, D. K.; Shemanarev, M.; Connolly, P. J.; Farnum, M.; Lobanov, V. S. SAR Maps: A New SAR Visualization Technique for Medicinal Chemists. J. Med. Chem. 2007, 50, 5926−5937.
(4) Wassermann, A. M.; Bajorath, J. Directed R-Group Combination Graph: A Methodology To Uncover Structure-Activity Relationship Patterns in a Series of Analogues. J. Med. Chem. 2012, 55, 1215−1226.
(5) Zhang, B.; Hu, Y.; Bajorath, J. AnalogExplorer: A New Method for Graphical Analysis of Analog Series and Associated Structure-Activity Relationship Information. J. Med. Chem. 2014, 57, 9184−9194.
(6) Maynard, A. T.; Roberts, C. D. Quantifying, Visualizing, and Monitoring Lead Optimization. J. Med. Chem. 2016, 59, 4189−4201.
(7) Shanmugasundaram, V.; Zhang, L.; Kayastha, S.; de la Vega de León, A.; Dimova, D.; Bajorath, J. Monitoring the Progression of Structure-Activity Relationship Information During Lead Optimization. J. Med. Chem. 2016, 59, 4235−4244.
(8) Kenny, P. W.; Sadowski, J. Structure Modification in Chemical Databases. In Chemoinformatics in Drug Discovery; Oprea, T. I., Ed.; Wiley-VCH: Weinheim, Germany, 2004; pp 271−285, DOI: 10.1002/3527603743.ch11.
(9) Griffen, E.; Leach, A. G.; Robb, G. R.; Warner, D. J. Matched Molecular Pairs as a Medicinal Chemistry Tool. J. Med. Chem. 2011, 54, 7739−7750.
(10) Wawer, M.; Bajorath, J. Local Structural Changes, Global Data Views: Graphical Substructure-Activity Relationship Trailing. J. Med. Chem. 2011, 54, 2944−2951.
(11) Ghosh, A.; Dimova, D.; Bajorath, J. Classification of Matching Molecular Series on the Basis of SAR Phenotypes and Structural Relationships. MedChemComm 2016, 7, 237−246.
(12) Hussain, J.; Rea, C. Computationally Efficient Algorithm to Identify Matched Molecular Pairs (MMPs) in Large Data Sets. J. Chem. Inf. Model. 2010, 50, 339−348.
(13) Lewell, X. Q.; Judd, D. B.; Watson, S. P.; Hann, M. M. RECAP − Retrosynthetic Combinatorial Analysis Procedure: A Powerful New Technique for Identifying Privileged Molecular Fragments with Useful Applications in Combinatorial Chemistry. J. Chem. Inf. Comput. Sci. 1998, 38, 511−522.
(14) de la Vega de León, A.; Bajorath, J. Matched Molecular Pairs Derived by Retrosynthetic Fragmentation. MedChemComm 2014, 5, 64−67.
(15) Gaulton, A.; Bellis, L. J.; Bento, A. P.; Chambers, J.; Davies, M.; Hersey, A.; Light, Y.; McGlinchey, S.; Michalovich, D.; Al-Lazikani, B.; Overington, J. P. ChEMBL: A Large-Scale Bioactivity Database for Drug Discovery. Nucleic Acids Res. 2012, 40, D1100−D1107.
(16) Dimova, D.; Stumpfe, D.; Bajorath, J. Systematic Assessment of Analog Relationships between Bioactive Compounds and Promiscuity of Analog sets. MedChemComm 2016, 7, 230−236.
(17) Hu, Y.; Bajorath, J. Compound Promiscuity - What Can We Learn From Current Data. Drug Discovery Today 2013, 18, 644−650.
(18) Hu, Y.; Bajorath, J. High-Resolution View of Compound Promiscuity. F1000Research 2013, 2, 144.
(19) Hopkins, A. L. Network Pharmacology: The Next Paradigm in Drug Discovery. Nat. Chem. Biol. 2008, 4, 682−690.
(20) Jalencas, X.; Mestres, J. On the Origins of Drug Polypharmacology. MedChemComm 2013, 4, 80−87.
(21) Anighoro, A.; Bajorath, J.; Rastelli, G. Polypharmacology: Challenges and Opportunities in Drug Discovery. J. Med. Chem. 2014, 57, 7874−7887.
(22) Wang, Y.; Xiao, J.; Suzek, T. O.; Zhang, J.; Wang, J.; Zhou, Z.; Han, L.; Karapetyan, K.; Dracheva, S.; Shoemaker, B. A.; Bolton, E.; Gindulyte, A.; Bryant, S. H. PubChem’s BioAssay Database. Nucleic Acids Res. 2012, 40, D400−D412.
(23) Law, V.; Knox, C.; Djoumbou, Y.; Jewison, T.; Guo, A. C.; Liu, Y.; Tang, A. DrugBank 4.0: Shedding New Light on Drug Metabolism. Nucleic Acids Res. 2014, 42, D1091−D1097.

computationally supported, for example, by applying the scaffold or maximum common substructure concepts, which provide formally consistent reference frames for structural analysis but have limited chemical relevance. Herein we have introduced a generally applicable computational approach to extract analog series from large compound data sets, which combines MMP and molecular network analysis and enables visualization of clusters representing different series. By design, the methodology is not series-centric but operates at a global level. It comprehensively and consistently accounts for all analog relationships that can be captured by the further extended and retrosynthetic MMP formalism. Following this approach, prior knowledge of core structures or substitution sites is not required, which are automatically determined on the basis of comprehensive compound comparisons, applying retrosynthetic rules and size restrictions for the generation of R-groups. Analog series can be extracted from data sets with multitarget activities, as reported herein, and key compounds be identified that contain substitution site and activity information representative for an entire series. In our analysis of the current release of ChEMBL, more than 17 000 analog series with varying composition, size, and activity annotations were identified. Rather unexpectedly, more than 13 000 of these series contained key compounds. As defined herein, key compounds delineate currently accessible analog space. All identified analog series and key compounds from ChEMBL, PubChem, and DrugBank are made freely available in an organized form as a part of our study,28 which provides a substantial knowledge base for different practical applications. For example, individual compounds can be searched against these series. Alternatively, global analog searches can be carried out for given queries in any compound repositories. Matching analogs provide hit expansion candidates and SAR information. 
Moreover, for a target of interest, all currently available series of analogs can be obtained, the SAR information associated with these series analyzed, and multitarget activities of compounds considered.



AUTHOR INFORMATION

Corresponding Author

*Phone: +49-228-2699-306. Fax: +49-228-2699-341. E-mail: [email protected].

Author Contributions

†The contributions of D.S. and D.D. should be considered equal.

Notes

The authors declare no competing financial interest.



ACKNOWLEDGMENTS

The use of OpenEye’s toolkits was made possible by their free academic licensing program. D.S. is supported by Sonderforschungsbereich 704 of the Deutsche Forschungsgemeinschaft.



ABBREVIATIONS USED

MMP, matched molecular pair; MMS, matching molecular series; QSAR, quantitative structure−activity relationship; RECAP, retrosynthetic combinatorial analysis procedure; SAR, structure−activity relationship



REFERENCES

(1) The Practice of Medicinal Chemistry, 3rd ed.; Wermuth, C. G., Ed.; Academic Press/Elsevier: San Diego, CA, U.S./London, U.K., 2008.
(2) Cherkasov, A.; Muratov, E. N.; Fourches, D.; Varnek, A.; Baskin, I. I.; Cronin, M.; Dearden, J.; Gramatica, P.; Martin, Y. C.; Todeschini, R.;


(24) Berthold, M. R.; Cebron, N.; Dill, F.; Gabriel, T. R.; Kötter, T.; Meinl, T.; Ohl, P.; Sieb, C.; Thiel, K.; Wiswedel, B. KNIME: The Konstanz Information Miner. In Studies in Classification, Data Analysis, and Knowledge Organization; Preisach, C., Burkhardt, H., Schmidt-Thieme, L., Decker, R., Eds.; Springer: Berlin, Germany, 2008; pp 319−326, DOI: 10.1007/978-3-540-78246-9_38.
(25) OEChem TK; OpenEye Scientific Software, Inc.: Santa Fe, NM, U.S., 2012.
(26) Hu, X.; Hu, Y.; Vogt, M.; Stumpfe, D.; Bajorath, J. MMP-cliffs: Systematic Identification of Activity Cliffs on the Basis of Matched Molecular Pairs. J. Chem. Inf. Model. 2012, 52, 1138−1145.
(27) Shannon, P.; Markiel, A.; Ozier, O.; Baliga, N. S.; Wang, J. T.; Ramage, D.; Amin, N.; Schwikowski, B.; Ideker, T. Cytoscape: A Software Environment for Integrated Models of Biomolecular Interaction Networks. Genome Res. 2003, 13, 2498−2504.
(28) Zenodo. https://zenodo.org/record/59688.
