
Privileged substructures to modulate protein-protein interactions
Nicolas Bosc, Mélaine A Kuenemann, Jérome Bécot, Marek Vavrusa, Adrien H Cerdan, and Olivier Sperandio
J. Chem. Inf. Model., Just Accepted Manuscript • DOI: 10.1021/acs.jcim.7b00435 • Publication Date (Web): 18 Sep 2017
Downloaded from http://pubs.acs.org on September 19, 2017



Journal of Chemical Information and Modeling

Privileged Substructures to Modulate Protein-Protein Interactions

Nicolas Bosc (1,2,3,4), Mélaine A. Kuenemann (1,2), Jerome Bécot (1,2), Marek Vavrusa (1,2), Adrien H. Cerdan (1,2), and Olivier Sperandio (1,2,3,4,*)

1 Inserm, U973, Paris 75013, France
2 Université Paris Diderot, Sorbonne Paris Cité, UMRS 973 Inserm, Paris 75013, France
3 Structural Bioinformatics Unit, Department of Structural Biology and Chemistry, Institut Pasteur, 25-28 rue du Dr Roux 75015 Paris
4 CNRS UMR3528, Institut Pasteur, 25-28 rue du Dr Roux 75015 Paris

*Author to whom all correspondence should be addressed: Olivier Sperandio: email: [email protected]

ABSTRACT

Given the difficulty of identifying chemical probes that can modulate protein-protein interactions (PPIs), actors in the field are starting to agree on the need for PPI-tailored chemical screening collections. However, which types of scaffolds may promote the binding of compounds to PPI targets remains unclear. In this big data analysis, we have identified a list of privileged chemical substructures that are most often observed within inhibitors of PPIs. Using molecular frameworks as a way to perceive chemical substructures, in combination with an experimental dataset and a machine-learning-based predicted dataset of iPPI compounds, we propose a list of privileged substructures in the form of scaffolds and chemical moieties that can be substantially chemically functionalized and that do not present any toxicophore or pan-assay interference (PAINS) alerts. We think that such chemical guidance will be valuable for medicinal chemists in their attempts to identify initial quality chemical probes for PPI targets.

INTRODUCTION

Amongst macromolecular interactions, protein-protein interactions (PPIs) play a fundamental role in cell signaling. Even though their exact number remains undefined, estimates of the number of PPIs in humans vary from 130,000 [1] to 650,000 [2]. Failure or deregulation of their mechanisms, or for instance host/pathogen protein interactions, often leads to disease [3]. Therefore, targeting these interactions for therapeutic intervention is of high medical relevance. For those reasons, considerable efforts have been made to develop small molecules inhibiting PPIs, although these targets still represent a major challenge for drug discovery [4,5]. Unlike G protein-coupled receptors (GPCRs), non-kinase enzymes, or protein kinases, PPIs are often described as being devoid of tractable druggable (also known as ligandable) cavities [6]. Those relatively flat and hydrophobic interfaces make the binding of small-molecule inhibitors more challenging, and the identification of such compounds one of the most difficult tasks in molecular medicine [7,8].

Yet, despite those difficulties, a growing number of inhibitors of protein-protein interactions (iPPIs) are being successfully identified using procedures ranging from pure serendipity to more rational design. Such data can be obtained from several online databases [9–11], including TIMBAL and iPPI-DB, which allow the scientific community to quickly access available inhibitors of a given PPI. Those success stories represent valuable data to learn from and to attempt to rationalize the chemical and physicochemical features, the so-called “chemical space”, characterizing the small compounds capable of binding and modulating PPI targets. Thus, several trends or rules of thumb have emerged to characterize the contours of the known iPPI chemical space, including some from our group [12–17]. They all highlight the heavier and more hydrophobic character of iPPI compounds with respect to more conventional ligands [12], and also molecular structures that rely on ring aromaticity [14,16]. Those properties are unfortunately known to impede drug development and should therefore not be used to drive compound optimization. Thankfully, other studies also describe the importance of molecular shapes and specific architectures [13,17]. Additionally, shape was pinpointed as an important factor in mimicking protein-protein recognition [18,19], as were globularity and the distribution of hydrophilic and hydrophobic interacting regions [20].

Thus, in the past years, several studies have presented methods to distinguish iPPIs from non-iPPIs [13–15,21] in an attempt to build PPI-compliant chemical libraries. They share the same approach: (1) collection of actual iPPIs and non-iPPIs into a training set; (2) calculation of molecular properties on the training set using their chemical structures and pharmacological activities; (3) construction of a statistical model, using so-called machine learning technologies, that uses the information from the training set to predict new putative PPI inhibitors from any commercial catalog. Conversely, very few studies [19,22] have translated those results into actual chemical guidance for medicinal chemists. Indeed, no study has really reported the existence of privileged structures or chemotypes (specific to PPIs or not) that could also be applied to the pharmacological modulation of PPI targets.
The concept of privileged structures was introduced to define structures found in ligands binding different receptors [23], such as benzodiazepines [24]. It eventually evolved to also cover structures found in ligands targeting a given therapeutic area (e.g., anti-inflammatory [25], antimicrobial [26], analgesic [27]) or even only one given protein family [28–31]. However, which types of chemical structures or scaffolds may promote binding to PPI targets remains unclear. Given the difficulty of identifying iPPI compounds on a regular basis, the prospect of characterizing privileged structures is therefore of major importance. Having such chemical information to start from would clearly boost the identification of initial quality chemical probes, especially if such information is combined with precise consideration of Absorption, Distribution, Metabolism, Excretion / Toxicity (ADME/Tox) profiles and pan-assay interference (PAINS) alerts.

In this study, we address this important question by considering molecular frameworks as a way to identify privileged structures using both an experimental dataset of iPPIs (derived from TIMBAL and iPPI-DB) and a dataset of iPPIs predicted using machine learning technologies, derived from millions of chemical compounds. Molecular frameworks were introduced by Bemis and Murcko [32]. In their definition, they decomposed molecules into rings, linkers and side chains, and defined molecular frameworks as the union of rings and linkers. This molecular representation is particularly helpful for grouping molecules that share the same molecular architecture. It is especially relevant given that it is precisely molecular architectures that have been shown to be essential for PPI modulation [13,17]. Since side chains are often a source of diversity or specificity, considering molecules without them makes it possible to capture the essence of molecular chemotypes.
Thus, the analysis of molecular frameworks from a combination of several datasets (experimental and predicted datasets of iPPIs; hit compounds on conventional targets such as enzymes; and drugs under development) has allowed us to list privileged substructures that can be substantially chemically functionalized and that do not present any known toxicophores or PAINS alerts.




MATERIALS AND METHODS

Molecular Datasets

Dataset of inhibitors of Protein-Protein Interactions

We built a dataset of iPPIs by combining the 1,756 compounds from iPPI-DB [33] (version 2016) with the 8,107 compounds from TIMBAL [10,34] (version May 2015). In order to work with the most accurate datasets, we applied several criteria. The compound targets had to be explicitly identified, and the compounds had to inhibit the interaction between their targets rather than stabilize it. Only activities reported as Kd, Ki, IC50 or EC50 were kept, and an activity threshold of 30 µM was retained. We developed a cleaning and standardization protocol in Pipeline Pilot (version 9.0.2) (www.biovia.com) to exclude compounds containing atoms other than C, N, O, S, P and halogens, to keep only small molecules (peptides and macrocycles are not considered in this study) and to remove potential solvent molecules or counter ions. Finally, we removed every duplicate molecule present in both iPPI-DB and TIMBAL, and obtained an experimental dataset of iPPIs containing 3,051 compounds (1,756 from iPPI-DB and 1,295 from TIMBAL).

A diversity set selection was additionally performed on this dataset. Compounds were first described using the FCFP4 fingerprint computed by Pipeline Pilot. We then performed a hierarchical clustering using the Ward method and the Tanimoto distance, and kept cluster centers using a similarity threshold of 0.7. We thus obtained 719 chemically diverse iPPIs.

Dataset of Non-Inhibitors of Protein-Protein Interactions

As a set of non-iPPIs, we used two datasets: BindingDB [35] and BDM. BindingDB is a web-accessible database that collects affinities/activities of small molecules on protein targets. In this study, we considered only compounds targeting the top 100 most studied proteins (version 2011). These proteins belong to five different families: nuclear receptors, G protein-coupled receptors, non-kinase enzymes, kinases and ion channels. Inhibitors that bind these proteins are less likely to bind PPIs due to the major differences between the pockets of “conventional” targets and the relatively flat pockets of PPIs [7,20]. BDM, as described in ref 36, is a selection of compounds extracted from three different commercial chemical libraries: Asinex, ChemDiv and Enamine. All BDM compounds have tested negative on three different PPI targets (BCL2/Bim, XIAP/Smac and MDM2/p53) using a fluorescence polarization assay. Using the same cleaning and standardization protocol as for iPPIs, we obtained a non-iPPI dataset of 83,572 compounds (44,228 from BindingDB and 39,344 from BDM). In the same manner, we performed a diversity selection and obtained 35,686 compounds.

Exploratory datasets

ZINC. The ZINC database [37] contains commercially available compounds from more than 300 vendors. It is accessible from the web and allows the selection of different subsets of molecules depending on several criteria (lead-like, fragment-like, drug-like…). For this study, we used the “All Purchasable” subset, which comprised more than 16 million unique molecules in 2012.

ChEMBL. The ChEMBL database is one of the largest publicly available bioactivity databases [38]. Version 20 contains information on more than 1.4 million compounds and 10,000 targets. In total, more than 13 million bioactivities from the literature are freely available to the scientific community. Additionally, we used a subset, called ChEMBLMT, of around 250,000 compounds for which a main target has been identified according to the data contained in the whole database. For each molecule contained in the database, we searched for all associated IC50, EC50, Kd and Ki values. Only values less than 30 µM and related to binding assays were kept in order to clearly identify the main protein target. If more than one target was found, we retained the one with the largest number of occurrences.
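As an illustration of the main-target rule described above for ChEMBLMT, the following minimal Python sketch filters activity records and keeps, for each compound, the target with the most qualifying measurements. The record layout, the field names and the `assign_main_target` helper are hypothetical, activities are assumed to be expressed in µM, and the tie-breaking behaviour is an arbitrary choice of the sketch rather than something stated in the protocol.

```python
from collections import Counter, defaultdict

ACTIVITY_TYPES = {"IC50", "EC50", "Kd", "Ki"}  # activity types named in the text
THRESHOLD_UM = 30.0                            # 30 µM cutoff from the text


def assign_main_target(records):
    """Return {compound_id: main_target} from an iterable of activity records.

    Each record is a dict with hypothetical keys: 'compound', 'target',
    'type', 'value_um' and 'assay' ('binding' or anything else).
    """
    counts = defaultdict(Counter)
    for rec in records:
        if rec["type"] not in ACTIVITY_TYPES:
            continue  # keep only IC50, EC50, Kd and Ki measurements
        if rec["assay"] != "binding" or rec["value_um"] >= THRESHOLD_UM:
            continue  # keep only binding assays below the activity threshold
        counts[rec["compound"]][rec["target"]] += 1
    # Most frequent target per compound. Ties go to the target seen first,
    # which is an assumption made for this sketch.
    return {cpd: c.most_common(1)[0][0] for cpd, c in counts.items()}


# Toy records (invented for illustration, not taken from ChEMBL).
records = [
    {"compound": "C1", "target": "MDM2", "type": "IC50", "value_um": 0.5, "assay": "binding"},
    {"compound": "C1", "target": "MDM2", "type": "Ki", "value_um": 2.0, "assay": "binding"},
    {"compound": "C1", "target": "XIAP", "type": "IC50", "value_um": 10.0, "assay": "binding"},
    {"compound": "C1", "target": "XIAP", "type": "IC50", "value_um": 45.0, "assay": "binding"},
    {"compound": "C2", "target": "BCL2", "type": "Kd", "value_um": 1.0, "assay": "functional"},
    {"compound": "C2", "target": "BCL2", "type": "EC50", "value_um": 5.0, "assay": "binding"},
    {"compound": "C3", "target": "CD80", "type": "Potency", "value_um": 1.0, "assay": "binding"},
]
main_targets = assign_main_target(records)
```

Compounds with no qualifying measurement (here the hypothetical `C3`) simply do not appear in the result, which mirrors the idea that only compounds with a clearly identified main target enter the subset.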




MDDR. MDDR (www.biovia.com) provides information on therapeutic molecules that are currently under development, whether in biological testing (early development), preclinical phase or clinical phases, as well as launched drugs. The 2015 version used in this study gathered more than 220,000 compounds.

Molecular framework calculation

In order to identify common structures or substructures in our datasets, molecular frameworks were generated in Pipeline Pilot using an approach close to the protocol introduced by Guy Bemis and Mark Murcko [32]. Rings and linkers (sets of atoms linking two rings) were kept. Exocyclic and exolinker double bonds were conserved. Atom identity was conserved in the resulting molecular framework, as described in Figure 1. We generated molecular frameworks for the molecules from the iPPI, ZINC, ChEMBL, ChEMBLMT and MDDR datasets.

Scaffold Hunter

In order to navigate the chemical space of our datasets of molecular frameworks, we used the Java-based open-source software Scaffold Hunter (http://scaffoldhunter.sourceforge.net) [39]. The tool allows a compound collection to be visualized interactively as a tree, using a hierarchical molecule deconstruction. In our study, we used Scaffold Hunter on our already calculated molecular frameworks to analyze hierarchical relationships and ring assembly inheritance within the tree.

ADME/Tox Filters

PAINS (pan-assay interference compounds) [40] (filters A, B and C), the Eli Lilly MedChem rules [41] (relaxed option) and the toxicophore filter were calculated using the online program FAFDrugs3 [42]. All of the parameters were set to default values.
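To make the molecular framework calculation described above more concrete, here is a minimal, library-free Python sketch of a Bemis-Murcko-style reduction on a molecular graph. Terminal atoms are pruned iteratively until only rings and linkers remain, and atoms attached to that core by a double bond are then added back. This is not the Pipeline Pilot protocol used in the study: the graph encoding (atom index mapped to neighbors and bond orders), the helper names `build_graph` and `murcko_like_framework`, and the toy molecule are all assumptions made for illustration.

```python
def build_graph(bond_list):
    """Turn (atom_a, atom_b, bond_order) triples into an undirected graph."""
    bonds = {}
    for a, b, order in bond_list:
        bonds.setdefault(a, {})[b] = order
        bonds.setdefault(b, {})[a] = order
    return bonds


def murcko_like_framework(bonds):
    """Return the set of atom indices kept in the framework."""
    # Step 1: iteratively strip terminal atoms (degree <= 1). What survives
    # is the union of ring atoms and of linker atoms joining two rings.
    graph = {atom: set(nbrs) for atom, nbrs in bonds.items()}
    leaves = [atom for atom, nbrs in graph.items() if len(nbrs) <= 1]
    while leaves:
        atom = leaves.pop()
        if atom not in graph:
            continue  # already removed through another path
        for nbr in graph.pop(atom):
            graph[nbr].discard(atom)
            if len(graph[nbr]) <= 1:
                leaves.append(nbr)
    core = set(graph)

    # Step 2: restore exocyclic / exolinker double bonds, i.e. pruned atoms
    # that are doubly bonded to an atom of the core.
    restored = {
        nbr
        for atom in core
        for nbr, order in bonds[atom].items()
        if order == 2 and nbr not in core
    }
    return core | restored


# Toy molecule: six-membered ring (atoms 0-5) with an exocyclic C=O (atom 6),
# a one-atom linker (7) to a five-membered ring (8-12), a methyl (13) and an
# acetyl-like side chain (14, 15, 16).
toy_bonds = build_graph(
    [(0, 1, 1), (1, 2, 1), (2, 3, 1), (3, 4, 1), (4, 5, 1), (5, 0, 1),
     (0, 6, 2),
     (3, 7, 1), (7, 8, 1),
     (8, 9, 1), (9, 10, 1), (10, 11, 1), (11, 12, 1), (12, 8, 1),
     (10, 13, 1),
     (12, 14, 1), (14, 15, 2), (14, 16, 1)]
)
framework_atoms = murcko_like_framework(toy_bonds)
```

Because the sketch only returns atom indices, atom identity is trivially preserved, in line with the choice reported above. A double bond inside a side chain (atoms 14 and 15 in the toy molecule) is not restored, since neither atom belongs to the ring-and-linker core.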



Machine learning models

Molecular descriptors

Using the MOE software (Molecular Operating Environment, Chemical Computing Group Inc.) [43], we calculated a set of 186 2D descriptors and 112 3D descriptors. 2D descriptors were calculated using the 2D representation of the molecules. However, the calculation of the 3D descriptors requires the active conformation of each molecule. In the absence of such information for the vast majority of the molecules in our datasets, we computed several 3D conformations for each molecule. Specifically, we calculated up to 50 conformations for each molecule using a stochastic method and the MMFF94x force field. We set the parameters to stop the generation of new conformations after 20 failures, and the maximum number of attempts to generate conformations could not exceed 5,000. Energy minimization of each conformation ended after 200 iterations, and a conformation was rejected if its root mean square deviation was less than 0.75 compared to the other conformations. The 3D descriptors were then calculated on all the conformations of a given compound, and the mean value of each descriptor was kept as the final value. A total of 298 descriptors were calculated for each molecule of each dataset. Because no conformation could be calculated for some molecules, the iPPI and non-iPPI datasets finally contained 3,033 and 82,394 molecules, respectively.

Descriptor selection

A descriptor selection procedure was performed on the 298 descriptors in order to select those that best discriminate between iPPIs and non-iPPIs. We first removed invariant molecular descriptors. Then, for each remaining descriptor, we compared the distributions of the iPPI and non-iPPI populations. The significance of the difference was estimated from the p-value obtained with a comparison test. The comparison test was a Student's t-test if the two populations followed a normal distribution according to a Shapiro test and had different variances according to a Fisher test. If not, we applied the non-parametric Mann-Whitney-Wilcoxon test. We kept only descriptors with a p-value lower than 0.001 in order to retain only the most discriminative ones. We then checked for correlations between the remaining descriptors. A correlation matrix was computed, and descriptors with an absolute Pearson correlation coefficient greater than 0.9 were grouped together. Within each group of correlated descriptors, the most discriminative one was selected according to the p-value from the comparison test mentioned above. To avoid selecting descriptors too strongly correlated with molecular size and hydrophobicity, and to highlight new insights into the chemical space of iPPIs, we rejected the descriptors that presented an absolute correlation value greater than 0.6 with the following three descriptors: molecular weight, logP and TPSA. To exclude descriptors whose significance was due to over-represented chemotypes, we applied this protocol (Supplementary Figure S2) to the full iPPI/non-iPPI datasets and to their corresponding diverse subsets. We kept only the descriptors that were selected in both cases. Finally, our last selection step consisted of making sure the descriptors were also significantly discriminant for at least 60% of the PPI targets against the non-iPPIs. In the end, we retained 40 molecular descriptors, which are detailed in Supplementary Table S3.

Learning methods

Decision trees. In classification approaches, a decision tree algorithm aims to classify observations according to their property values. During modeling, a tree is built in which, at each node, observations are split depending on the value of the selected property. Each leaf of the tree represents one of the observation classes, labeled with an associated probability obtained by following the corresponding branch, i.e., the succession of nodes from the root of the tree to the leaf. For this study, we used the J48 decision tree algorithm as implemented in Weka (version 3.7) [44,45].

Random Forest (RF). The RF algorithm builds a user-defined number of decision trees based on random subsets of observations and variables to classify the observations. Each new prediction combines the predictions provided by the individual trees. We used the RF implemented in Weka (version 3.7) [44,46]. The optimal number of trees and the number of variables selected by the trees were both tuned by grid search.

JRip. JRip is the Weka (version 3.7) implementation of Repeated Incremental Pruning to Produce Error Reduction (RIPPER) [44,47]. Like the two previous methods, the algorithm can be applied to classification. In a two-class example, JRip builds rules, one at a time, based on the observation properties in order to separate the classes. Once a rule is found, the observations it covers are removed. The process is repeated until all observations of a class have been used.

Support vector machines (SVM). The SVM algorithm consists of projecting the observations into a high-dimensional feature space in which they can be separated according to the class they belong to [48]. In this study, we used a C-SVM implementation provided in the caret package of the R statistical software [49,50]. Using the radial basis function kernel, the hyper-parameters cost and gamma were tuned by grid search.

Model creation

Using the 40 selected descriptors, the four classification methods (J48, RF, JRip and SVM) were applied to predict the class of the molecules: iPPI or non-iPPI. The whole iPPI and non-iPPI datasets were combined to form the final dataset. The selected descriptors were normalized by centering to zero mean and scaling to unit variance. We partitioned the final dataset into a training set (70%) and a test set (30%), taking care to conserve the same proportion of iPPIs and non-iPPIs. Note that, as described in the previous section “Descriptor selection”, both the training and the test sets were used for the selection of the 40 descriptors. The parameters of each method were optimized by grid search using five-fold cross-validation on the training set. During this process, for each combination of parameters, a model is trained using four folds and the fifth fold is then predicted. This procedure was repeated five times for each combination of parameters, each time holding out a different fold. We selected the model with the best mean F1 score, but the mean sensitivity, the mean specificity and the mean enrichment factor (EF) were also considered. The test set was finally evaluated using the optimized parameters of each model.

Model validation

For each step of the validation, we evaluated the performance of our models by calculating the sensitivity, the specificity, the F1 score and the enrichment factor (EF). Here, EF is defined as the ratio of the proportion of true active compounds after applying a classification model to a given library to the proportion before applying it [14]. The further EF is above 1, the better. It indicates that there are, proportionally, EF times more active compounds in the focused chemical library than in the initial library, even though some active compounds may have been lost due to a lack of precision of the model. These four metrics are calculated from the confusion matrix, which is established by comparing the predicted class to the actual class of each molecule (Supplementary Table S3).

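$$\text{Sensitivity} = \frac{TP}{TP + FN}$$

$$\text{Specificity} = \frac{TN}{TN + FP}$$

$$F_1 = \frac{2\,TP}{2\,TP + FP + FN}$$

$$EF = \frac{TP}{TP + FP} \times \frac{TP + TN + FP + FN}{TP + FN}$$

where TP, TN, FP and FN are the numbers of true positives, true negatives, false positives and false negatives in the confusion matrix.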
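The four metrics can be computed directly from the confusion-matrix counts. The short Python sketch below assumes the standard definitions of sensitivity, specificity and F1, and takes EF as the precision of the model divided by the fraction of actives in the initial library, following the verbal definition given above. The function name and the example counts are illustrative only.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute sensitivity, specificity, F1 and EF from confusion-matrix counts."""
    total = tp + tn + fp + fn
    precision = tp / (tp + fp)        # proportion of actives after filtering
    prevalence = (tp + fn) / total    # proportion of actives before filtering
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "f1": 2 * tp / (2 * tp + fp + fn),
        "ef": precision / prevalence,
    }


# Invented counts: 100 actives among 1,000 compounds, 120 predicted active.
metrics = classification_metrics(tp=80, tn=860, fp=40, fn=20)
```

With these toy counts, the predicted-active subset is about 6.7 times richer in actives than the starting library, even though 20 of the 100 actives are lost.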



For the validation, using the optimized parameters, we first applied the models to the training set that was used to build them (70% of the data). Then, we performed a five-fold cross-validation on the training set and calculated the mean values of the four metrics. Finally, the performance of the models was evaluated by predicting the test set (the remaining 30% of the data). In addition, using the training set, we also performed response permutation testing, so-called Y-scrambling. It consists of a random permutation of the variable to predict, and is used to check that the results are not due to chance correlation [51]. The procedure was repeated 30 times with different random permutations, and mean values over all the repetitions were calculated.
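The Y-scrambling procedure can be sketched in a few lines of Python. The example below is only a toy: a one-dimensional nearest-centroid classifier scored by training accuracy stands in for the actual models and metrics, and the data, function names and seed are assumptions made for illustration. The point is the control itself: the class labels are shuffled, the model is refit, and the mean score over 30 permutations is compared with the score obtained on the true labels.

```python
import random


def nearest_centroid_accuracy(xs, ys):
    """Fit a 1-D nearest-centroid classifier and return its accuracy on the
    same data (a stand-in for any model and metric pair)."""
    class0 = [x for x, y in zip(xs, ys) if y == 0]
    class1 = [x for x, y in zip(xs, ys) if y == 1]
    m0 = sum(class0) / len(class0)
    m1 = sum(class1) / len(class1)
    preds = [1 if abs(x - m1) < abs(x - m0) else 0 for x in xs]
    return sum(p == y for p, y in zip(preds, ys)) / len(ys)


def y_scrambling(xs, ys, score_fn, n_repeats=30, seed=0):
    """Mean score obtained after randomly permuting the labels n_repeats times."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_repeats):
        shuffled = list(ys)
        rng.shuffle(shuffled)  # breaks any real link between xs and labels
        scores.append(score_fn(xs, shuffled))
    return sum(scores) / len(scores)


# Toy data: one descriptor that separates the two classes.
xs = [i / 200 for i in range(200)]
ys = [1 if x >= 0.5 else 0 for x in xs]

true_score = nearest_centroid_accuracy(xs, ys)
scrambled_mean = y_scrambling(xs, ys, nearest_centroid_accuracy)
```

A model whose score on the true labels is clearly above the scrambled mean is unlikely to owe its performance to chance correlation; similar scores in both cases would be a warning sign.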

RESULTS AND DISCUSSION

Although it is known that the peculiar properties of iPPIs make them differ from more conventional pocket binders (e.g., GPCR inhibitors, enzyme inhibitors…), the translation of those properties into chemical guidance has not yet been accomplished. Hence, we examined the presence of privileged structures by analyzing compounds structurally related to iPPIs. Rather than employing a distance metric (e.g., fingerprint-based) between experimentally identified iPPIs and compounds present in reference molecular databases, an approach that would likely also provide interesting results, we considered their molecular frameworks (Figure 1). Such a representation is particularly useful for detecting chemical series or compounds that share the same combinations of ring assemblies and linkers but differ in their terminal chemical moieties. We inspected such molecular frameworks within both our experimental iPPI dataset and a dataset of predicted iPPIs obtained by applying specially designed machine learning models to the purchasable ZINC database.

Location of Figure 1

Figure 1: Detection of Bemis and Murcko Molecular Frameworks




iPPI molecular frameworks in other datasets

As a preliminary step in our search for privileged substructures, we first evaluated the presence in reference datasets of the 1,282 unique molecular frameworks derived from our experimental dataset of 3,033 iPPIs. We selected three different reference datasets. First, we looked into MDDR, which collects information about launched drugs and drugs under development. It gives valuable indications about the development phase reached by a given compound. Second, we looked into ChEMBL, which contains information about compounds published in the literature. For the ChEMBLMT subset, it is also a snapshot of molecules that have been experimentally tested on a main target. Many compounds from iPPI-DB or TIMBAL are included in ChEMBL. Third, we investigated the purchasable dataset of the ZINC database, which is a repository of commercially available compounds from more than 300 vendors. It is one of the largest freely available molecular databases.

We calculated the molecular frameworks of all molecules contained in our experimental iPPI dataset and in the ChEMBL, MDDR and ZINC databases. For each set, we kept one occurrence of each molecular framework. We then compared every dataset against each other and counted the number of molecular frameworks in common (Supplementary Table S1). We found that more than 75% of iPPI molecular frameworks are also present in ChEMBL, around 19% in MDDR and almost 22% in ZINC. Extrapolating this result to the compounds containing those frameworks, we can state that a large majority of iPPIs are also in ChEMBL, as we anticipated.

We looked more specifically at the development phase of the 19% of iPPI molecular frameworks present in MDDR (Figure 2). More than 50% of the corresponding molecules are at the biological testing stage, 23% are preclinical and 14% are in phase I/II/III. Finally, 7% of the identified frameworks are part of drugs currently on the market. Although only 19% of iPPI frameworks are found in MDDR, this shows that they can be found at every stage of drug development, in proportions similar to those of conventional drug candidates.

Location of Figure 2

Figure 2: Proportions of experimental iPPI molecular frameworks found in common in MDDR across the different development phases.
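The dataset comparison reported above (Supplementary Table S1) amounts to set intersections over deduplicated frameworks. A minimal Python sketch follows, assuming each framework is represented by a canonical string so that identical frameworks compare equal. The function name and the toy framework strings are invented for illustration and do not come from the datasets of this study.

```python
def framework_overlap(query_frameworks, reference_sets):
    """Percentage of unique query frameworks also present in each reference set."""
    query = set(query_frameworks)  # one occurrence of each framework
    return {
        name: 100.0 * len(query & set(reference)) / len(query)
        for name, reference in reference_sets.items()
    }


# Toy framework strings (SMILES-like, purely illustrative).
ippi = ["c1ccccc1", "c1ccc(-c2ccccc2)cc1", "O=C1CCCCC1", "c1ccc2ccccc2c1", "c1ccccc1"]
references = {
    "ChEMBL": {"c1ccccc1", "c1ccc(-c2ccccc2)cc1", "O=C1CCCCC1", "C1CCNCC1"},
    "MDDR": {"c1ccccc1"},
    "ZINC": {"C1CCOC1"},
}
overlap = framework_overlap(ippi, references)
```

The percentages are expressed relative to the number of unique query frameworks, which is how the iPPI figures quoted above are framed.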

Finally, 22% of the iPPI molecular frameworks were also found in molecules from ZINC, which represents around 930,000 molecules. This ratio is close to the proportion of ChEMBLMT compounds (ChEMBL compounds with a clearly identified main target) found within ZINC (21.46%).

Identification of privileged molecular frameworks among iPPI compounds

Studies have shown the heterogeneity of protein-protein interfaces; therefore, looking for privileged structures of iPPIs may seem irrelevant. However, current iPPIs share peculiar properties such as, amongst other things, a higher molecular weight and a higher lipophilicity compared to approved drugs [36,52]. More interestingly, they present a specific “architecture” characterized by a more globular shape and a specific distribution of their polar and hydrophobic interacting regions [20]. Hence, looking for privileged structures that could favor and maximize these properties is highly relevant for prioritizing compound selection in the design of PPI-compliant chemical libraries.

To this end, we again used the 1,282 unique molecular frameworks of the 3,033 iPPI compounds of our experimental iPPI dataset. We kept only those that were found for more than one PPI target or those with more than ten occurrences for a given PPI target. These rules aimed to collect only molecular frameworks that could be privileged structures, either because they have been used to inhibit at least two different PPI targets (Type-1) or because they represent a broadly confirmed chemotype (at least ten occurrences) on a specific PPI target for which some structure-activity relationships (SAR) could be established (Type-2). In this way, 6 Type-1 and 51 Type-2 molecular frameworks were identified. From this set, we removed the molecular frameworks consisting of only one ring, such as phenyl or pyrrolidine, which are very conventional rings that would bring little chemical insight into the privileged aspect of those substructures. We also manually excluded recursive molecular frameworks, i.e., substructures contained within other molecular frameworks. Particular attention was paid to rejecting structures that were too large or that could be considered “almost-full-compounds” because a succession of rings forces the algorithm to keep nearly the entire iPPI compound as the molecular framework. Hence, several very large frameworks of BCL2 inhibitors were not taken into account. This filtering led to the selection of 38 molecular frameworks (Figure 3), comprising 4 Type-1 and 34 Type-2 molecular frameworks.

Location of Figure 3

Figure 3: iPPI privileged structures. Molecular frameworks obtained from the experimental iPPI dataset with their associated PPI targets, their occurrence, and the number of different chemical series in which they were found. Molecular weight (MW) and a synthetic tractability score (rsynth) were calculated using the MOE software [43]. Rsynth ranges from 0 (low probability that the molecule can be synthesized) to 1 (high probability that the molecule can be synthesized). IDs marked with a star indicate molecular frameworks that were also found in ChEMBLMT or MDDR.
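The Type-1 / Type-2 selection rule described above can be written as a small counting procedure. The Python sketch below is a hedged illustration, not the authors' implementation: it takes one (framework, PPI target) pair per compound, labels a framework Type-1 when it occurs on more than one target, and Type-2 when it reaches an occurrence cutoff on a single target. The text states the cutoff both as "more than ten" and as "at least ten", so the cutoff is exposed as a parameter and set to at least ten by default, which is an assumption. The framework names and counts are invented.

```python
from collections import Counter, defaultdict


def classify_frameworks(records, min_occurrences=10):
    """Label frameworks from (framework, ppi_target) pairs, one per compound.

    Returns {framework: "Type-1" or "Type-2"}; frameworks that satisfy
    neither rule are left out.
    """
    per_framework = defaultdict(Counter)
    for framework, target in records:
        per_framework[framework][target] += 1

    labels = {}
    for framework, target_counts in per_framework.items():
        if len(target_counts) > 1:
            labels[framework] = "Type-1"  # seen on at least two PPI targets
        elif max(target_counts.values()) >= min_occurrences:
            labels[framework] = "Type-2"  # well populated on a single target
    return labels


# Toy records with hypothetical framework identifiers.
records = (
    [("biphenyl", "BCL2")] * 3
    + [("biphenyl", "MDM2")] * 2
    + [("framework_A", "XIAP")] * 12
    + [("framework_B", "CD80")] * 4
)
labels = classify_frameworks(records)
```

The subsequent manual steps (removing single-ring frameworks, recursive frameworks and "almost-full-compounds") rely on chemical judgment and are deliberately not covered by this sketch.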

We then compared those 38 molecular frameworks to the molecular frameworks found in the MDDR dataset and in the ChEMBLMT subset of around 250,000 compounds related to a main target. These comparisons aimed to investigate both the specificity of those frameworks and whether some have already reached clinical development. Amongst the 34 Type-2 molecular frameworks, 31 were specific to the experimental iPPI dataset and therefore absent from all compounds of ChEMBLMT and MDDR. Amongst them, 19 stem from ligands belonging to three highly represented PPI targets: BCL2, MDM2 and XIAP. Those three targets are particularly well populated in the two databases, TIMBAL and iPPI-DB9,10, from which they were extracted. The 12 remaining identified structures come from 11 distinct PPI targets: annexinA2, BRD, CD80, CTNNB1, cyclophilins, GP120, HIF, ITGAL, neuropilin, RAD51 and TAK1. Amongst the 7 frameworks (4 Type-1 and 3 Type-2) that were also present in MDDR and ChEMBLMT, we identified biphenyl, pentacyclic triterpene and flavone, which occur in inhibitors of different enzymes and receptors (Figure 3, IDs: 27*, 82*, 411*, 666*, 946*, 1061*, 1085*). Biphenyl was one of the first identified privileged structures: its presence was demonstrated in many protein ligands53, and it has often been highlighted as an efficient binder in fragment-based screening campaigns against PPI targets. Although they are not specific to iPPIs, these seven molecular frameworks have also proven useful on PPI targets and are found in molecules that reached clinical trials. Some are even parts of launched drugs. Interestingly, only 4 of the molecular frameworks we identified are found in several PPI targets (Type-1). Those that match this condition are also those we found in non-iPPIs (Figure 3, IDs: 27*, 666*, 946* and 1061*). Given this low number, these results suggest a high singularity in the structure of iPPI compounds and that the successful examples of potent hits have been identified at the cost of substantial medicinal chemistry efforts. This illustrates the urgent need to identify privileged molecular frameworks that could be chemically functionalized for different types of PPI targets.

Identification of PPI privileged molecular frameworks using machine learning technologies

Using our experimental dataset of iPPI compounds, we have identified 38 molecular frameworks that have demonstrated their biological effects on PPI targets (Figure 3). We have seen previously that around 930,000 molecules from the ZINC purchasable database possess one of those molecular frameworks.
This top-down approach for identifying privileged structures, from framework to compounds, does not necessarily make those compounds putative iPPI compounds. It simply highlights that those synthetically tractable compounds share important chemical features with experimental iPPIs and that their frameworks may represent a good starting point to design PPI-compliant chemical libraries. In order to identify more PPI privileged molecular frameworks, especially those that could be chemically functionalized for several PPI targets, we used machine learning technologies. Using this time a bottom-up approach, from compounds to frameworks, we built machine learning models to extract putative iPPIs from the ZINC purchasable database and derived privileged molecular frameworks from these predicted compounds. Using a list of 40 molecular descriptors statistically specific to iPPIs, we trained several machine learning models (J48, JRip, RF, and SVM) on a training set composed of our experimental iPPI dataset on the one hand and a set of non-iPPIs on the other hand (see Materials and Methods). The models were evaluated on the training set within a cross-validation procedure and, once optimized, compared to each other on a test set to assess their predictive capability. Results show a good ability of all the models to distinguish iPPIs from non-iPPIs (Table 1). To ensure that these results were not due to chance correlation, we performed 30 Y-scrambling procedures for each method. As expected, in the four cases we observed a drop in sensitivity and an increase in false positive predictions, confirming that the good predictions were not due to chance.
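The Y-scrambling control can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: a one-descriptor nearest-centroid classifier stands in for the actual models (J48, JRip, RF, SVM), the data are synthetic, and the scrambled models are scored against the true labels of a small held-out set, which may differ from the evaluation protocol actually used. The toy numbers are not expected to reproduce Table 1.

```python
import random
from statistics import mean


def fit(xs, ys):
    """Nearest-centroid stand-in for a real classifier: one centroid per class."""
    c_pos = mean(x for x, y in zip(xs, ys) if y == 1)
    c_neg = mean(x for x, y in zip(xs, ys) if y == 0)
    return c_pos, c_neg


def sensitivity(model, xs, ys):
    """Fraction of true positives recovered by the model, TP / (TP + FN)."""
    c_pos, c_neg = model
    preds = [1 if abs(x - c_pos) < abs(x - c_neg) else 0 for x in xs]
    tp = sum(1 for p, y in zip(preds, ys) if p == 1 and y == 1)
    fn = sum(1 for p, y in zip(preds, ys) if p == 0 and y == 1)
    return tp / (tp + fn)


# Synthetic, well-separated data: one descriptor value per compound.
train_x = [4.0 + 0.05 * i for i in range(20)] + [0.05 * i for i in range(20)]
train_y = [1] * 20 + [0] * 20
test_x = [4.2, 4.6, 4.9, 0.1, 0.5, 0.9]
test_y = [1, 1, 1, 0, 0, 0]

# Model trained on the true labels.
real_sens = sensitivity(fit(train_x, train_y), test_x, test_y)

# 30 models trained on shuffled labels (the descriptor values are untouched).
rng = random.Random(0)
scrambled_sens = []
for _ in range(30):
    shuffled = train_y[:]
    rng.shuffle(shuffled)
    scrambled_sens.append(sensitivity(fit(train_x, shuffled), test_x, test_y))

print(real_sens, mean(scrambled_sens))
```

If the descriptor-label relationship is real, the model trained on true labels should clearly outperform the average of the scrambled models, which is the behavior reported above.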



Table 1: Statistical performances of the machine learning models

Method  Validation     Sensitivity  Specificity  F1-score  EF
J48     Training set   0.88         0.99         0.92      26.8
J48     5-CV           0.71         0.99         0.74      21.65
J48     Y-scrambling   0.03         0.97         0.03      1.0
J48     Test set       0.74         0.99         0.76      22.0
JRip    Training set   0.77         0.99         0.85      26.5
JRip    5-CV           0.70         1.00         0.77      24.4
JRip    Y-scrambling   0.03         0.97         0.03      1.0
JRip    Test set       0.70         0.99         0.77      24.3
RF      Training set   1.00         1.00         1.00      28.1
RF      5-CV           0.71         0.99         0.83      27.8
RF      Y-scrambling   0.04         0.97         0.04      1.0
RF      Test set       0.71         0.99         0.83      27.7
SVM     Training set   0.93         0.99         0.96      28.1
SVM     5-CV           0.74         0.99         0.80      23.9
SVM     Y-scrambling   0.03         0.97         0.03      1.0
SVM     Test set       0.83         0.99         0.89      26.7

Training set (70%), 5-fold Cross Validation (5-CV), Y-scrambling, and Test set (30%)
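For readers who want to recompute statistics of this kind, the sketch below derives them from a confusion matrix. Sensitivity, specificity and F1-score follow their standard definitions. The enrichment factor is assumed here to be the precision divided by the prevalence of iPPIs in the evaluated set; this definition is an assumption, as the exact formula is given in Materials and Methods and not in this section. The counts in the example are hypothetical.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute classification statistics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)            # recall on the iPPI class
    specificity = tn / (tn + fp)            # recall on the non-iPPI class
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    prevalence = (tp + fn) / (tp + fp + tn + fn)
    # Assumed definition: enrichment of true iPPIs among predicted iPPIs
    # relative to a random selection.
    ef = precision / prevalence
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "ef": ef,
    }


# Hypothetical counts: 10 iPPIs among 100 compounds.
m = classification_metrics(tp=8, fp=1, tn=89, fn=2)
print(m)
```

Under this assumed definition, a model with no predictive power has an EF close to 1, which is consistent with the Y-scrambling rows of Table 1.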

Results of the cross-validation and of the test set demonstrated a good ability of the four methods to classify iPPIs and non-iPPIs, with good sensitivities, specificities, F1-scores, and enrichment factors (EF). Therefore, we combined the four methods for the prediction of iPPIs within the ZINC purchasable database. Very low percentages of the ZINC database were predicted as putative iPPI compounds: 0.8%, 0.5%, 0.02% and 0.4% of the compounds were selected by J48, JRip, RF and SVM, respectively. Still, given the initial 16 million compounds of the ZINC purchasable dataset, these low percentages represent thousands of compounds. Then, we grouped those predicted compounds according to their shared molecular framework. In order to identify only PPI-compliant molecular frameworks that can be substantially chemically functionalized, we first considered only molecular frameworks shared by at least 500 different compounds. Second, we kept only frameworks whose percentage of predicted iPPIs within the group was greater than the global percentage of predicted iPPIs calculated over the whole ZINC database. This step allowed us to collect only molecular frameworks that are, by their own structure and chemical functionalization, most often associated with predicted iPPI compounds. Using the FAF-Drugs3 web server42, we took care not to select molecular frameworks that could interfere with biological assays: we filtered out pan-assay interference substructures40 (PAINS) and frameworks rejected by the so-called Eli Lilly rules41 (relaxed option), and removed frameworks containing the most documented toxicophores. PAINS filters were developed considering three levels of interfering substructures depending on the number of occurrences observed in experimental screenings. Even though PAINS are still debated54, their detection can shed light on potential issues with some chemical substructures. Similarly, Eli Lilly developed a set of 275 rules to remove compounds that may lead to invalid screening results, according to their observations over the years. These include, amongst others, promiscuous compounds, ADMET-interfering compounds and unstable compounds. The analysis rejected 15% of the predicted iPPI-friendly molecular frameworks because they were PAINS, a result in agreement with recent observations36. In addition, 22% of the remaining compounds were rejected by the Eli Lilly rules and the toxicophore filter of FAF-Drugs3. This led to 447 molecular frameworks that are likely to be safe for drug development, overrepresented in predicted iPPI compounds, and amenable to substantial functionalization. Amongst these 447 frameworks, 12 (Figure 4) were also present in our experimental iPPI dataset, two of which were characterized as privileged structures in the previous section (Figure 3, IDs: 411* and 946*).
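The two framework-level criteria (group size and enrichment in predicted iPPIs relative to the whole database) can be sketched as follows. This is a minimal illustration, assuming the screened database is available as `(framework, is_predicted_ippi)` pairs; the function name `enriched_frameworks` and the toy framework names are hypothetical. The PAINS, Eli Lilly and toxicophore filters applied afterwards with FAF-Drugs3 are not modeled.

```python
from collections import defaultdict


def enriched_frameworks(compounds, min_size=500):
    """Return frameworks that are large enough and enriched in predicted iPPIs.

    compounds: iterable of (framework, is_predicted_ippi) pairs, one per
    compound of the screened database (assumed input format).
    """
    total = 0
    total_hits = 0
    size = defaultdict(int)
    hits = defaultdict(int)
    for framework, predicted in compounds:
        total += 1
        size[framework] += 1
        if predicted:
            total_hits += 1
            hits[framework] += 1

    # Global percentage of predicted iPPIs over the whole database.
    global_rate = total_hits / total
    return sorted(
        fw for fw, n in size.items()
        if n >= min_size and hits[fw] / n > global_rate
    )


# Toy database; a small min_size replaces the 500-compound threshold.
toy = (
    [("fw_A", True)] * 3 + [("fw_A", False)] * 1   # 4 compounds, 75% predicted
    + [("fw_B", True)] * 1 + [("fw_B", False)] * 5  # 6 compounds, about 17% predicted
    + [("fw_C", True)] * 2                          # only 2 compounds: too small
)
selected = enriched_frameworks(toy, min_size=3)
print(selected)
```

In the toy database the global rate is 50%, so only `fw_A` passes both criteria.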



Location of Figure 4

Figure 4: 12 molecular frameworks shared by the experimental and predicted iPPI datasets.

In addition, 219 of those frameworks (48%) are included in ChEMBLMT. Interestingly, some of these frameworks correspond to iPPIs, more precisely to inhibitors of MCL-1, GP120 and of the PB1 domain of MEK5 (Figure 5).

Location of Figure 5

Figure 5: Examples of molecular frameworks of predicted iPPIs within the ZINC database. Confirmations were made by checking for the presence of these molecular frameworks in iPPI compounds within the ChEMBL database.

Moreover, a total of 174 of those molecular frameworks (38%) were found in MDDR. Looking in more detail, we observed that a clinical phase had been reached for each of them (Figure 6). Interestingly, and compared to Figure 2, we observed the same proportions of the different phases, with a majority of compounds currently in biological testing (49%), 23% of the molecules in preclinical development, and decreasing proportions of compounds in Phase I (8%), Phase II (6%), and Phase III (2%). Finally, we observed that 8% of the corresponding molecules are actual launched drugs.

Location of Figure 6

Figure 6: Proportions of predicted privileged iPPI molecular frameworks found in common in MDDR across the different development phases.

Extending the list of PPI privileged structures using hierarchical scaffold classification

In our search for privileged structures of iPPIs, we further explored our experimental and predicted datasets of molecular frameworks using Scaffold Hunter39. The 447 molecular frameworks retrieved from the prediction made on the ZINC database were added to a file containing the 1,282 unique molecular frameworks from our experimental iPPI dataset. Each molecular framework was tagged with its origin and its PPI target (only for the experimental iPPI dataset). The file was read in Scaffold Hunter. This software performs a hierarchical deconstruction of the molecules (here molecular frameworks). It reads compound structures and bioactivity data, generates compound scaffolds, correlates them in a hierarchical tree-like arrangement, and annotates them with bioactivity. Brachiating along tree branches from structurally complex to simple scaffolds allows identification of new ligand types and common substructures. We used it to find common substructures (portions of molecular frameworks) between the experimental and the predicted iPPI datasets. The rationale behind this approach is that a substructure (here a partial molecular framework) shared by different PPI targets (experimental iPPI dataset) is putatively a good starting point to be chemically functionalized into different compounds for the modulation of several PPI targets. The same is true if a given substructure is at the intersection of an experimental branch and a predicted branch of molecular frameworks. A list of 44 chemical substructures resulting from the Scaffold Hunter analysis is shown in Figure 7. Three types of substructures have been identified. By analogy to the Type-1 molecular frameworks cited above, Type-1 substructures are chemical moieties located within the Scaffold Hunter tree at the intersections of branches leading to experimental iPPI frameworks corresponding to at least two different PPI targets, making them privileged substructures that can be chemically functionalized for several PPI targets. Type-3 substructures are chemical moieties located within the Scaffold Hunter tree at the intersections of branches leading to at least one experimental iPPI framework and one predicted iPPI framework. Finally, Type-4 substructures are both Type-1 and Type-3 substructures.
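The Type-1, Type-3 and Type-4 definitions can be expressed as a small tree traversal. The sketch below does not use Scaffold Hunter itself: the scaffold tree is assumed to be available as a plain parent-to-children dictionary, frameworks are assumed to carry a list of `(origin, target)` tags, and all node names are hypothetical.

```python
def substructure_type(node, children, annotations):
    """Classify a substructure node from the frameworks found below it.

    children: dict mapping a node to the list of its child nodes.
    annotations: dict mapping a framework to a list of (origin, target) tags,
    with origin in {"experimental", "predicted"} and target None if predicted.
    """
    experimental_targets = set()
    has_predicted = False
    stack = list(children.get(node, []))
    while stack:
        current = stack.pop()
        stack.extend(children.get(current, []))
        for origin, target in annotations.get(current, []):
            if origin == "experimental":
                experimental_targets.add(target)
            elif origin == "predicted":
                has_predicted = True

    is_type1 = len(experimental_targets) >= 2                # several PPI targets
    is_type3 = bool(experimental_targets) and has_predicted  # both datasets
    if is_type1 and is_type3:
        return "Type-4"
    if is_type1:
        return "Type-1"
    if is_type3:
        return "Type-3"
    return None


# Toy tree with three candidate substructures.
children = {
    "root": ["sub_X", "sub_Y", "sub_Z"],
    "sub_X": ["fw1", "fw2"],
    "sub_Y": ["fw3", "fw4"],
    "sub_Z": ["fw5", "fw6", "fw7"],
}
annotations = {
    "fw1": [("experimental", "BCL2")],
    "fw2": [("experimental", "MDM2")],
    "fw3": [("experimental", "XIAP")],
    "fw4": [("predicted", None)],
    "fw5": [("experimental", "BCL2")],
    "fw6": [("experimental", "ITGAL")],
    "fw7": [("predicted", None)],
}
types = {n: substructure_type(n, children, annotations) for n in ("sub_X", "sub_Y", "sub_Z")}
print(types)
```

A list of tags per framework is used so that a framework present in both the experimental and the predicted datasets can carry both origins.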



Location of Figure 7

Figure 7: List of Type-1, Type-3, and Type-4 substructures found at key branch intersections within the Scaffold Hunter tree derived from the combination of the experimental and predicted sets of molecular frameworks.

Presence of terminal Phenyl groups as a chemical pattern

In inspecting the PPI privileged molecular frameworks and substructures listed above, we noticed a consistent chemical pattern concerning the combination of rings and ring assemblies. Looking at both the full molecular frameworks (447 predicted privileged molecular frameworks) and the 44 partial molecular frameworks (as determined with Scaffold Hunter), the Phenyl group appeared to be used in a recurrent way within those structures. Phenyl is undoubtedly the most prominent group used in medicinal chemistry regardless of the type of target, and PPI targets are no exception. Nevertheless, the proportion of Phenyl groups is higher in iPPI compounds than in conventional hits, leads, or drugs. It is therefore not surprising to observe biphenyl and benzylbenzene in our list of privileged substructures. More specific, however, are the use and localization of those Phenyl groups within our frameworks and substructures. Strikingly, they most often seem to occur as terminal rings, with no further ring assembly beyond them. As terminal rings, they can be linked to the rest of the structure by different atom types. Moreover, as part of a framework, they may or may not be chemically functionalized with side chains, but rarely with an extra ring system. To check this hypothesis, we measured the proportions of full molecular frameworks in which a terminal phenyl is observed (Table 2) for the experimental iPPI frameworks, the predicted iPPI frameworks and the MDDR frameworks taken as reference. For the latter, we distinguished the MDDR-advanced subset (molecular frameworks of all compounds present in MDDR in all development phases but Biological Testing) from the full MDDR dataset (molecular frameworks of all compounds present in MDDR). We also distinguished different types of Phenyl-based terminal groups on the basis of the atom type carrying the Phenyl chemical moiety: -R-Phenyl indicates a terminal Phenyl group carried by any atom type; -CH2-Phenyl is the Benzyl group; -S-Phenyl is a phenyl group carried by a Sulfur atom; -N-Phenyl is a phenyl group carried by a Nitrogen atom; -O-Phenyl is a phenyl group carried by an Oxygen atom.

Table 2: Comparison of the proportions of terminal phenyl groups observed in the molecular frameworks (MF) of the different datasets.

Dataset               -R-Phenyl  -CH2-Phenyl  -S-Phenyl  -N-Phenyl  -O-Phenyl
Experimental iPPI MF  71% ***    60% ***      5%         5%         2%
Predicted iPPI MF     87% ***    59% ***      4%         23% ***    4%
MDDR-advanced MF      57%        44%          2%         8%         3%
MDDR-total MF         59%        46%          3%         8%         3%

MDDR-advanced represents the molecular frameworks of all compounds present in MDDR in all development phases but Biological Testing; MDDR-total represents the molecular frameworks of all compounds present in MDDR, including Biological Testing. Different types of Phenyl groups are listed depending on the atom that carries them: -R-Phenyl indicates a terminal Phenyl group carried by any atom type; -CH2-Phenyl is the Benzyl group; -S-Phenyl is a phenyl group carried by a Sulfur atom; -N-Phenyl is a phenyl group carried by a Nitrogen atom; -O-Phenyl is a phenyl group carried by an Oxygen atom. *** indicates very significant differences of proportions (χ2 test P values < 0.0001) as compared to MDDR-total MF.

Those differences in proportions were evaluated for significance using a χ2 test. The proportions described in Table 2 confirmed that there are significantly (χ2 test p-value < 0.0001) more terminal Phenyl-based groups, regardless of the atom carrying the ring (-R-Phenyl), within the molecular frameworks of experimental (71%) and predicted iPPIs (87%) as compared to those of MDDR-advanced (57%) and MDDR-total (59%). More specifically, in both datasets of iPPIs the type of terminal Phenyl-based group is most often a Benzyl group (χ2 test p-value < 0.0001). For the -S-Phenyl and -O-Phenyl groups, no significant difference could be measured. Very interestingly, for the -N-Phenyl group the difference is extremely significant between the predicted iPPI dataset and MDDR (χ2 test p-value < 0.0001) but not between the experimental iPPI dataset and MDDR. This is the second biggest contribution to Phenyl-based terminal groups for the predicted iPPI dataset after Benzyl. This highlights that our models suggest expanding the selection of scaffolds based on such chemical features to maximize the capacity of the corresponding compounds to tackle PPI interfaces. Those results confirm industrial strategies undertaken by companies such as Asinex Ltd that have based their design of PPI-compliant chemical libraries on the overrepresentation of such terminal phenyl groups. Although they chose to link those terminal groups to nature-inspired central scaffolds, it is compelling that our models suggest similar chemical patterns and open up complementary options.
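A comparison of two proportions with a χ2 test can be sketched in pure Python as follows. This is a minimal illustration of a Pearson χ2 test on a 2x2 contingency table, without continuity correction (whether a correction was applied in the analysis above is not stated, so this is an assumption). For one degree of freedom the p-value equals erfc(sqrt(χ2/2)). The counts in the example are hypothetical: Table 2 reports proportions only, and the number of MDDR frameworks is not given in this section.

```python
import math


def chi2_two_proportions(hits_a, n_a, hits_b, n_b):
    """Pearson chi-squared test (1 degree of freedom) on a 2x2 table.

    hits_a / n_a and hits_b / n_b are the two proportions being compared,
    e.g. frameworks with a terminal phenyl over all frameworks of a dataset.
    """
    a, b = hits_a, n_a - hits_a
    c, d = hits_b, n_b - hits_b
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of the chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: 910 of 1,282 frameworks (about 71%) versus
# 5,900 of 10,000 frameworks (59%); the second dataset size is invented.
chi2, p = chi2_two_proportions(910, 1282, 5900, 10000)
print(f"chi2 = {chi2:.1f}, p = {p:.2e}")
```

With datasets of this size, a difference of about twelve percentage points yields a very small p-value, whereas differences of one or two points between small percentages may not reach significance.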

CONCLUSION

Using the combination of an experimental dataset and a machine-learning-based predicted dataset of iPPI compounds, we identified a list of privileged chemical substructures that can be chemically functionalized to construct a PPI-tailored chemical library for high-throughput screening campaigns. This chemical information can guide medicinal chemists in such an endeavor and thus goes far beyond the simple cherry-picking of PPI-compliant compounds among existing catalogs. It provides chemical guidance by proposing scaffolds as promising starting points that can be used to expand the available PPI-friendly chemical space. Particular attention was paid to the selection of scaffolds that can be thoroughly chemically functionalized while complying with the physico-chemical profiles required for ADME/Tox safety.




ASSOCIATED CONTENT

Supporting Information

Additional information concerning the coverage percentages of molecular frameworks shared between our experimental iPPI dataset and the reference databases, and the descriptors used in the construction of the models. The list of 447 molecular frameworks derived from the ZINC database can be obtained as a CSV file containing the canonical SMILES of the chemical structures. This material is available free of charge via the Internet at http://pubs.acs.org.

ACKNOWLEDGMENT

We thank Chemical Computing Group for providing us with several hundred licenses for the software MOE. Those licenses allowed us to compute, on a CPU farm, the molecular descriptors for more than 20 million compounds. We thank the Agence Nationale de la Recherche (ANR) for funding a part of this project (ANR-15-CE18-0023-03).

FUNDING SOURCES

Project funded by the Agence Nationale de la Recherche (ANR-15-CE18-0023-03).

ABBREVIATIONS USED

PPI, protein-protein interactions; iPPI, inhibitors of protein-protein interactions; MF, molecular frameworks; ADMET, absorption distribution metabolism excretion toxicity; PAINS, pan-assay interference compounds; RF, random forest; SVM, support vector machine; CV, cross validation; EF, enrichment factor; GPCR, G-protein coupled receptors; PSIP1, PC4 and SFRS1 interacting protein 1; XIAP, X-linked inhibitor of apoptosis; BCL2, B-cell CLL/lymphoma 2; MDM2, mouse double minute 2; GP120, glycoprotein 120; ITGAL, integrin alpha-L; CTNNB1, catenin beta-1; BRD, bromodomain; HIF, hypoxia-inducible factor 1-alpha; TAK1, TGF-beta-activated kinase 1; MEK5, dual specificity mitogen-activated protein kinase kinase 5; MCL-1, induced myeloid leukemia cell differentiation protein Mcl-1.

REFERENCES

(1) Venkatesan, K.; Rual, J.-F.; Vazquez, A.; Stelzl, U.; Lemmens, I.; Hirozane-Kishikawa, T.; Hao, T.; Zenkner, M.; Xin, X.; Goh, K.-I.; Yildirim, M. A.; Simonis, N.; Heinzmann, K.; Gebreab, F.; Sahalie, J. M.; Cevik, S.; Simon, C.; de Smet, A.-S.; Dann, E.; Smolyar, A.; Vinayagam, A.; Yu, H.; Szeto, D.; Borick, H.; Dricot, A.; Klitgord, N.; Murray, R. R.; Lin, C.; Lalowski, M.; Timm, J.; Rau, K.; Boone, C.; Braun, P.; Cusick, M. E.; Roth, F. P.; Hill, D. E.; Tavernier, J.; Wanker, E. E.; Barabási, A.-L.; Vidal, M. An Empirical Framework for Binary Interactome Mapping. Nat. Methods 2009, 6 (1), 83–90.
(2) Stumpf, M. P. H.; Thorne, T.; de Silva, E.; Stewart, R.; An, H. J.; Lappe, M.; Wiuf, C. Estimating the Size of the Human Interactome. Proc. Natl. Acad. Sci. 2008, 105 (19), 6959–6964.
(3) Ryan, D. P.; Matthews, J. M. Protein–protein Interactions in Human Disease. Curr. Opin. Struct. Biol. 2005, 15 (4), 441–446.
(4) Higueruelo, A. P.; Jubb, H.; Blundell, T. L. Protein–protein Interactions as Druggable Targets: Recent Technological Advances. Curr. Opin. Pharmacol. 2013, 13 (5), 791–796.
(5) Mullard, A. Protein–protein Interaction Inhibitors Get into the Groove. Nat. Rev. Drug Discov. 2012, 11 (3), 173–175.
(6) Jubb, H.; Blundell, T. L.; Ascher, D. B. Flexibility and Small Pockets at Protein–protein Interfaces: New Insights into Druggability. Prog. Biophys. Mol. Biol. 2015, 119 (1), 2–9.
(7) Scott, D. E.; Bayly, A. R.; Abell, C.; Skidmore, J. Small Molecules, Big Targets: Drug Discovery Faces the Protein–protein Interaction Challenge. Nat. Rev. Drug Discov. 2016, 15 (8), 533–550.
(8) Kuenemann, M. A.; Sperandio, O.; Labbé, C. M.; Lagorce, D.; Miteva, M. A.; Villoutreix, B. O. In Silico Design of Low Molecular Weight Protein–protein Interaction Inhibitors: Overall Concept and Recent Advances. Prog. Biophys. Mol. Biol. 2015, 119 (1), 20–32.
(9) Labbé, C. M.; Laconde, G.; Kuenemann, M. A.; Villoutreix, B. O.; Sperandio, O. IPPI-DB: A Manually Curated and Interactive Database of Small Non-Peptide Inhibitors of Protein–protein Interactions. Drug Discov. Today 2013, 18 (19–20), 958–968.
(10) Higueruelo, A. P.; Schreyer, A.; Bickerton, G. R. J.; Pitt, W. R.; Groom, C. R.; Blundell, T. L. Atomic Interactions and Profile of Small Molecules Disrupting Protein-Protein Interfaces: The TIMBAL Database. Chem. Biol. Drug Des. 2009, 74 (5), 457–467.
(11) Basse, M. J.; Betzi, S.; Bourgeas, R.; Bouzidi, S.; Chetrit, B.; Hamon, V.; Morelli, X.; Roche, P. 2P2Idb: A Structural Database Dedicated to Orthosteric Modulation of Protein-Protein Interactions. Nucleic Acids Res. 2013, 41 (D1), D824–D827.
(12) Morelli, X.; Bourgeas, R.; Roche, P. Chemical and Structural Lessons from Recent Successes in Protein–protein Interaction Inhibition (2P2I). Curr. Opin. Chem. Biol. 2011, 15 (4), 475–481.




(13) Neugebauer, A.; Hartmann, R. W.; Klein, C. D. Prediction of Protein−Protein Interaction Inhibitors by Chemoinformatics and Machine Learning Methods. J. Med. Chem. 2007, 50 (19), 4665–4668.
(14) Reynès, C.; Host, H.; Camproux, A.-C.; Laconde, G.; Leroux, F.; Mazars, A.; Deprez, B.; Fahraeus, R.; Villoutreix, B. O.; Sperandio, O. Designing Focused Chemical Libraries Enriched in Protein-Protein Interaction Inhibitors Using Machine-Learning Methods. PLoS Comput. Biol. 2010, 6 (3), e1000695.
(15) Hamon, V.; Bourgeas, R.; Ducrot, P.; Theret, I.; Xuereb, L.; Basse, M. J.; Brunel, J. M.; Combes, S.; Morelli, X.; Roche, P. 2P2IHUNTER: A Tool for Filtering Orthosteric Protein-Protein Interaction Modulators via a Dedicated Support Vector Machine. J. R. Soc. Interface 2013, 11 (90), 20130860–20130860.
(16) Sperandio, O.; Reynès, C. H.; Camproux, A.-C.; Villoutreix, B. O. Rationalizing the Chemical Space of Protein–Protein Interaction Inhibitors. Drug Discov. Today 2010, 15 (5–6), 220–229.
(17) Villoutreix, B. O.; M. Labbe, C.; Lagorce, D.; Laconde, G.; Sperandio, O. A Leap into the Chemical Space of Protein-Protein Interaction Inhibitors. Curr. Pharm. Des. 2012, 18 (30), 4648–4667.
(18) Che, Y.; Brooks, B. R.; Marshall, G. R. Development of Small Molecules Designed to Modulate Protein–Protein Interactions. J. Comput. Aided Mol. Des. 2006, 20 (2), 109–130.
(19) Hershberger, S.; Lee, S.-G.; Chmielewski, J. Scaffolds for Blocking Protein-Protein Interactions. Curr. Top. Med. Chem. 2007, 7 (10), 928–942.
(20) Kuenemann, M. A.; Bourbon, L. M. L.; Labbé, C. M.; Villoutreix, B. O.; Sperandio, O. Which Three-Dimensional Characteristics Make Efficient Inhibitors of Protein–Protein Interactions? J. Chem. Inf. Model. 2014, 54 (11), 3067–3079.
(21) Wang, Y.-C.; Chen, S.-L.; Deng, N.-Y.; Wang, Y. Computational Probing Protein–protein Interactions Targeting Small Molecules. Bioinformatics 2015, btv528.
(22) Hamon, V.; Brunel, J. M.; Combes, S.; Basse, M. J.; Roche, P.; Morelli, X. 2P2Ichem: Focused Chemical Libraries Dedicated to Orthosteric Modulation of Protein–protein Interactions. MedChemComm 2013, 4 (5), 797.
(23) Evans, B. E.; Rittle, K. E.; Bock, M. G.; DiPardo, R. M.; Freidinger, R. M.; Whitter, W. L.; Lundell, G. F.; Veber, D. F.; Anderson, P. S. Methods for Drug Discovery: Development of Potent, Selective, Orally Effective Cholecystokinin Antagonists. J. Med. Chem. 1988, 31 (12), 2235–2246.
(24) Welsch, M. E.; Snyder, S. A.; Stockwell, B. R. Privileged Scaffolds for Library Design and Drug Discovery. Curr. Opin. Chem. Biol. 2010, 14 (3), 347–361.
(25) Silva, C. F. M.; Pinto, D. C. G. A.; Silva, A. M. S. Chromones: A Promising Ring System for New Anti-Inflammatory Drugs. ChemMedChem 2016.
(26) Jafari, E.; Khajouei, M. R.; Hassanzadeh, F.; Hakimelahi, G. H.; Khodarahmi, G. A. Quinazolinone and Quinazoline Derivatives: Recent Structures with Potent Antimicrobial and Cytotoxic Activities. Res. Pharm. Sci. 2016, 11 (1), 1–14.
(27) Gouda, A. M.; Abdelazeem, A. H. An Integrated Overview on Pyrrolizines as Potential Anti-Inflammatory, Analgesic and Antipyretic Agents. Eur. J. Med. Chem. 2016, 114, 257–292.
(28) Costantino, L.; Barlocco, D. Privileged Structures as Leads in Medicinal Chemistry. Curr. Med. Chem. 2006, 13 (1), 65–85.



(29) Schnur, D. M.; Hermsmeier, M. A.; Tebben, A. J. Are Target-Family-Privileged Substructures Truly Privileged? J. Med. Chem. 2006, 49 (6), 2000–2009.
(30) Bondensgaard, K.; Ankersen, M.; Thøgersen, H.; Hansen, B. S.; Wulff, B. S.; Bywater, R. P. Recognition of Privileged Structures by G-Protein Coupled Receptors. J. Med. Chem. 2004, 47 (4), 888–899.
(31) Szabo, M.; Klein Herenbrink, C.; Christopoulos, A.; Lane, J. R.; Capuano, B. Structure–Activity Relationships of Privileged Structures Lead to the Discovery of Novel Biased Ligands at the Dopamine D2 Receptor. J. Med. Chem. 2014, 57 (11), 4924–4939.
(32) Bemis, G. W.; Murcko, M. A. The Properties of Known Drugs. 1. Molecular Frameworks. J. Med. Chem. 1996, 39 (15), 2887–2893.
(33) Labbé, C. M.; Kuenemann, M. A.; Zarzycka, B.; Vriend, G.; Nicolaes, G. A. F.; Lagorce, D.; Miteva, M. A.; Villoutreix, B. O.; Sperandio, O. IPPI-DB: An Online Database of Modulators of Protein–Protein Interactions. Nucleic Acids Res. 2016, 44 (D1), D542–D547.
(34) Higueruelo, A. P.; Jubb, H.; Blundell, T. L. TIMBAL v2: Update of a Database Holding Small Molecules Modulating Protein-Protein Interactions. Database 2013, 2013 (0), bat039-bat039.
(35) Liu, T.; Lin, Y.; Wen, X.; Jorissen, R. N.; Gilson, M. K. BindingDB: A Web-Accessible Database of Experimentally Determined Protein-Ligand Binding Affinities. Nucleic Acids Res. 2007, 35 (Database), D198–D201.
(36) Kuenemann, M. A.; Labbé, C. M.; Cerdan, A. H.; Sperandio, O. Imbalance in Chemical Space: How to Facilitate the Identification of Protein-Protein Interaction Inhibitors. Sci. Rep. 2016, 6, 23815.
(37) Irwin, J. J.; Sterling, T.; Mysinger, M. M.; Bolstad, E. S.; Coleman, R. G. ZINC: A Free Tool to Discover Chemistry for Biology. J. Chem. Inf. Model. 2012, 52 (7), 1757–1768.
(38) Bento, A. P.; Gaulton, A.; Hersey, A.; Bellis, L. J.; Chambers, J.; Davies, M.; Kruger, F. A.; Light, Y.; Mak, L.; McGlinchey, S.; Nowotka, M.; Papadatos, G.; Santos, R.; Overington, J. P. The ChEMBL Bioactivity Database: An Update. Nucleic Acids Res. 2014, 42 (D1), D1083–D1090.
(39) Wetzel, S.; Klein, K.; Renner, S.; Rauh, D.; Oprea, T. I.; Mutzel, P.; Waldmann, H. Interactive Exploration of Chemical Space with Scaffold Hunter. Nat. Chem. Biol. 2009, 5 (8), 581–583.
(40) Baell, J. B.; Holloway, G. A. New Substructure Filters for Removal of Pan Assay Interference Compounds (PAINS) from Screening Libraries and for Their Exclusion in Bioassays. J. Med. Chem. 2010, 53 (7), 2719–2740.
(41) Bruns, R. F.; Watson, I. A. Rules for Identifying Potentially Reactive or Promiscuous Compounds. J. Med. Chem. 2012, 55 (22), 9763–9772.
(42) Lagorce, D.; Sperandio, O.; Baell, J. B.; Miteva, M. A.; Villoutreix, B. O. FAF-Drugs3: A Web Server for Compound Property Calculation and Chemical Library Design. Nucleic Acids Res. 2015, 43 (W1), W200–W207.
(43) Molecular Operating Environment (MOE), 2013.08; Chemical Computing Group Inc., 1010 Sherbrooke St. West, Suite #910, Montreal, QC, Canada, H3A 2R7, 2014.
(44) Hall, M.; Frank, E.; Holmes, G.; Pfahringer, B.; Reutemann, P.; Witten, I. H. The WEKA Data Mining Software: An Update. ACM SIGKDD Explor. Newsl. 2009, 11 (1), 10.
(45) Quinlan, J. R. C4.5: Programs for Machine Learning; The Morgan Kaufmann series in machine learning; Morgan Kaufmann Publishers: San Mateo, Calif, 1993.




(46) Breiman, L. Random Forests. Mach. Learn. 2001, 45 (1), 5–32.
(47) Cohen, W. W. Fast Effective Rule Induction. In Twelfth International Conference on Machine Learning; Morgan Kaufmann Publishers, 1995; pp 115–123.
(48) Cortes, C.; Vapnik, V. Support-Vector Networks. Mach. Learn. 1995, 20 (3), 273–297.
(49) R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2014.
(50) Kuhn, M. Building Predictive Models in R Using the Caret Package. J. Stat. Softw. 2008, 28 (5).
(51) Clark, R. D.; Fox, P. C. Statistical Variation in Progressive Scrambling. J. Comput. Aided Mol. Des. 2004, 18 (7–9), 563–576.
(52) Fry, D. C. Protein–protein Interactions as Targets for Small Molecule Drug Discovery. Biopolymers 2006, 84 (6), 535–552.
(53) Hajduk, P. J.; Bures, M.; Praestgaard, J.; Fesik, S. W. Privileged Molecules for Protein Binding Identified from NMR-Based Screening. J. Med. Chem. 2000, 43 (18), 3443–3447.
(54) Capuzzi, S. J.; Muratov, E. N.; Tropsha, A. Phantom PAINS: Problems with the Utility of Alerts for Pan-Assay INterference CompoundS. J. Chem. Inf. Model. 2017, 57 (3), 417–427.

Table of Contents graphic



ID      PPI            Type    Nb of iPPIs  Chemical series  MW   rsynth
27*     PSIP1, BCL2    Type-1  3            2                299  0
666*    BCL2, ITGAL    Type-1  2            2                237  1
946*    PSIP1, BCL2    Type-1  2            2                222  1
1061*   BCL2, E1       Type-1  2            2                154  1
10      BCL2           Type-2  14           9                319  1
82*     GP120          Type-2  70           17               361  1
123     XIAP           Type-2  10           4                351  0.62
203     GP120          Type-2  10           4                324  1
253     MDM2           Type-2  14           8                411  1
260     cyclophilin    Type-2  10           2                328  1
268     XIAP           Type-2  16           3                365  1
271     XIAP           Type-2  18           5                496  0.57
279     XIAP           Type-2  31           2                263  0.52
411*    ITGAL          Type-2  21           5                264  1
522     BCL2           Type-2  11           10               429  1
557     CTNNB1         Type-2  24           6                354  1
677     BCL2           Type-2  10           8                303  1
714     annexinA2      Type-2  12           7                263  0.35
719     RAD51          Type-2  10           5                258  1
744     MDM2           Type-2  10           4                404  1
805     MDM2           Type-2  63           17               342  1
815     MDM2           Type-2  31           12               264  0.35
834     MDM2           Type-2  10           2                369  1
843     MDM2           Type-2  13           3                358  1
880     BRD            Type-2  12           4                300  1
954     CD80           Type-2  38           10               327  1
978     BCL2           Type-2  10           3                297  0.35
1039    MDM2           Type-2  11           3                370  1
1043    TAK1           Type-2  36           5                344  0.73
1044    HIF            Type-2  12           5                236  0.44
1066    MDM2           Type-2  19           6                296  1
1085*   HIF            Type-2  12           2                157  1
1115    MDM2           Type-2  27           8                378  1
1122    neuropilin     Type-2  21           7                312  1
1152    BRD            Type-2  11           3                235  1
1203    ITGAL          Type-2  13           4                332  1
1220    XIAP           Type-2  27           4                22   0.41
1240    MDM2           Type-2  41           17               328  1



[Figure: structure drawings labeled 411* and 946*]


MF        ChEMBL          PPI Target
[image]   CHEMBL1424929   PB1-PB1 (MEK5)
[image]   CHEMBL1536174   HIV-1gp120
[image]   CHEMBL1400469   MCL-1
[image]   CHEMBL1592119   MCL-1
[image]   CHEMBL2069788   MCL-1
[image]   CHEMBL1548111   MCL-1
[image]   CHEMBL1478140   MCL-1
[image]   CHEMBL233531    MCL-1




[Figure: chemical structure drawings, each labeled by type: 9 labeled Type-1, 22 labeled Type-3, and 13 labeled Type-4]
