Prospectively Validated Proteochemometric Models for the Prediction


Computational Chemistry

Prospectively Validated Proteochemometric Models for the Prediction of Small Molecule Binding to Bromodomain Proteins Kathryn Giblin, Samantha Hughes, Helen Boyd, Pia Hansson, and Andreas Bender J. Chem. Inf. Model., Just Accepted Manuscript • DOI: 10.1021/acs.jcim.8b00400 • Publication Date (Web): 20 Aug 2018 Downloaded from http://pubs.acs.org on August 27, 2018



Journal of Chemical Information and Modeling

Prospectively Validated Proteochemometric Models for the Prediction of Small Molecule Binding to Bromodomain Proteins Kathryn A. Giblin1, Samantha J. Hughes2, Helen Boyd3, Pia Hansson3, Andreas Bender1* 1. CENTRE FOR MOLECULAR INFORMATICS, DEPARTMENT OF CHEMISTRY, UNIVERSITY OF CAMBRIDGE, LENSFIELD ROAD, CAMBRIDGE, CB2 1EW, UK 2. COMPUTATIONAL CHEMISTRY, ONCOLOGY, IMED BIOTECH UNIT, ASTRAZENECA, CAMBRIDGE, CB10 1XL, UK 3. DISCOVERY BIOLOGY, DISCOVERY SCIENCES, IMED BIOTECH UNIT, ASTRAZENECA, GOTHENBURG, 431 50 SE, SWEDEN±

* e-mail: [email protected]


ABSTRACT

The bromodomain-containing proteins are a ligandable family of epigenetic readers, which play important roles in oncological, cardiovascular and inflammatory diseases. Achieving selective inhibition of specific bromodomains is challenging, due to the limited understanding of compound and target selectivity features. In this study we build and benchmark proteochemometric (PCM) classification models on bioactivity data for 15,350 data points across 31 bromodomains, using both compound fingerprints and binding site protein descriptors as input variables, achieving a maximum performance as measured by the Matthews Correlation Coefficient (MCC) of 0.83 on the external test set. We also find that histone peptide binding data can be used as a target descriptor to build a high performing PCM model (MCC 0.80), showing the transferability of peptide interaction information to modelling small-molecule bioactivity. A total of 1,139 compounds were selected for prospective experimental testing by performing a virtual screen using model predictions and implementing conformal prediction, which resulted in 319 correctly predicted compound-target pair actives and the correct prediction for certain selectivity profile combinations of the four bromodomains tested against. We identify that conformal prediction can be used to fine-tune the balance between hit retrieval and hit structural diversity in a virtual screening setting. PCM can be applied to future virtual screening and compound design, including off-target prediction for bromodomains.


INTRODUCTION

Bromodomain-containing proteins are epigenetic readers, which regulate gene transcription by recognising and binding to acetyl lysine post-translational modifications (PTMs) on histone proteins.1,2 PTMs alter the accessibility of proximal chromatin to gene transcription and the binding of bromodomain-containing proteins often causes the transcriptional activation of genes.3 In total, 61 unique bromodomains have been identified, which have been classified into eight subfamilies.4 Bromodomains contain a deep and well-defined acetyl lysine recognition binding site, which interacts with acetylated lysine residues on histone peptides in a protein-protein interaction, providing a target for small molecule inhibition.5,2 Bromo and extra terminal (BET) family bromodomains have been well-studied, with bromodomain containing protein 4 (BRD4) having received the most attention.6 BET family bromodomain inhibitors of different chemical scaffolds are currently in clinical development for oncological, cardiovascular and inflammatory indications.7,4,8,29 The roles of other bromodomain-containing proteins have been discussed in depth in the review by Galdeano et al., 2016.10 When designing inhibitors for bromodomain-containing proteins it is important to understand the selectivity profiles between subfamilies and between individual domains. To facilitate efficient inhibitor design, studies have investigated the prediction of affinity of small molecules for bromodomains. Examples include the molecular docking and QSAR study developed for the prediction of naphthyridone derivatives for the ATAD2 bromodomain,11 and the predictions of binding affinities for 16 tetrahydroquinoline (THQ)-based ligands12 against BRD4 BD1 using free energy perturbation (FEP) calculations.
Selectivity has also been predicted on a small scale using relative binding free energy (RBFE) approaches;13 this study used three inhibitor structures and predicted their affinities across up to 22 bromodomains, achieving mean unsigned errors of 0.81-1.76 kcal/mol relative to experimental data. Recently, multiple QSAR models have been developed for a set of 88 organic molecules against the bromodomains in proteins BRD2, BRD3 and BRD4. These models achieved Q² values between 0.75-0.88 for predicting the activity for the three bromodomains by employing QuBiLS-MIDAS 3D molecular descriptors and multiple linear regression models with 6-9 variables per model.14 However, as far as we are aware, no studies have yet modelled bromodomain bioactivity and selectivity data for a large number of ligands and targets. We aimed to address this in the current work by incorporating these data into a single global model using proteochemometric modelling.

Proteochemometric (PCM) modelling is an extension of Quantitative Structure Activity Relationship (QSAR) modelling,15 which makes use of both compound and target descriptors and their interactions to predict bioactivity using machine learning methods. PCM has been shown to be a highly predictive modelling technique, with the ability to predict and interpret selectivity profiles across targets.16 Another advantage of PCM is the ability to utilise the activity data available for similar targets in order to inform prediction of activity for those targets where data is lacking, termed orphan targets.17 PCM has been applied to the following target classes with encouraging results in classification, measured by the area under the receiver operating characteristic curve (ROC AUC), and in regression, measured by the coefficient of determination of predictions to the test set labels (Q²): G-protein coupled receptors (ROC AUC 0.89);18,19 kinases (ROC AUC > 0.8);20,21 HIV proteases (Q² = 0.87);22 cytochrome P450s (ROC AUC > 0.9)23 and HDACs (Q² = 0.75).24 A more recent study employed PCM for the identification of allosteric modulators of glutamate receptors, achieving an overall ROC AUC of 0.97 for the model, which was used to identify hits using virtual screening for the glutamate 7 receptor, described as an orphan target.25 For further information on PCM, we refer the reader to the comprehensive reviews by Cortes-Ciriano et al.16 and Qiu et al.26


This study employs the PCM modelling approach, which has not previously been applied to the bromodomain target class, utilising a dataset of 15,350 bioactivity data points (6,352 distinct compounds) across 31 bromodomains to predict the activity and selectivity of new compounds against bromodomains. To obtain a suitable and information-rich dataset, data from multiple endpoints were incorporated. Although it is appreciated that there are limitations around incorporating different experimental readouts,27 it was necessary to include as many data as possible to provide sufficient small molecule bioactivity values across the bromodomains in the dataset. Much of the Kd data incorporated was from the BROMOscan®28 competition based assay panel screens. Kd and % inhibition data originated from AstraZeneca data and public data. Ki and IC50 values were obtained from public datasets. ∆Tm values (at 10 µM) were taken from panel selectivity screening studies and provide a large amount of inactive binding data. We explored the comparability of data points which were collated from different sources for the same compound-target pair between public data (IC50, Kd and Ki values, as well as the IC50 values estimated from % inhibition values) and AstraZeneca concentration response (CR) assays, which measured Kd values (Figure S1). For compounds with multiple data points, we observed a correlation of R=0.96. Due to the heterogeneity of endpoints, a classification (instead of a regression) model was generated to produce a global model of selectivity.
Classification has been implemented in place of regression previously for similar tasks where there is data variability, including the DREAM challenge to model cytotoxicity.29 Classification and regression PCM studies have been performed previously using multiple bioactivity endpoints.30,31,25 This work is the largest reported in silico study for bromodomain-containing proteins, based on a diverse chemical space and the use of selectivity panel data, which allows information about multiple related biological targets to inform the structure activity model. Furthermore, the model has been prospectively validated as described below.


METHODS

Dataset

The dataset used to generate the models was extracted from the public and licensed sources ChEMBL32, PubChem33, ChEpiMod34 and GOSTAR35, from the manual extraction of data from recent publications,36,37,38,39,40,41,42,43,44,45,46,47,48,49 and from AstraZeneca proprietary databases.

Public Dataset

The public dataset composition across sources is shown in Figure S2 and the distribution across bromodomains is shown in Figure S3. Compound-target bioactivity data points were extracted from ChEMBL-20, using bromodomain UniProtKB Accession IDs and applying the following criteria: a bioactivity type of IC50, Ki, Kd, % inhibition or ∆Tm; a binding (B) assay type; a confidence score of at least 8; and the presence of a numerical value for the bioactivity. Bioassay descriptions were used to manually filter out compounds interacting with multiple protein domains or non-bromodomain domains within a bromodomain-containing protein, and to place data into the correct domain where multiple domains exist within one bromodomain-containing protein. Data points where the domain was unresolved were removed. For percentage inhibition values between 20-80 % at a certain concentration, the following Hill equation (Equation 1) was applied to convert to an estimated pIC50 value:

$$\mathrm{pIC}_{50} = -\left(\log_{10}(100 - Y) - \log_{10}(Y) + X\right) \qquad (1)$$

where Y is the inhibition value (in %) and X is the log concentration (Molar).50 Inhibition values below 20 % or above 80 % were assigned as less than the derived pIC50 at 20 % inhibition or greater than the derived pIC50 at 80 % inhibition, respectively, to account for the fact that the equation no longer applies to these parts of the IC50 curve.
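As an illustration of Equation 1, the conversion from a single-point % inhibition value to an estimated pIC50 can be sketched in a few lines of Python. This is a minimal sketch rather than the code used in the study: the function name and the returned qualifier symbols ("=", "<", ">") are hypothetical, and a base-10 logarithm with a unit Hill slope is assumed.

```python
import math


def pic50_from_inhibition(percent_inhibition, conc_molar):
    """Estimate pIC50 from one % inhibition value (Equation 1).

    Returns a (qualifier, value) pair. Outside the 20-80 % window the
    value is only a bound, so the qualifier is "<" or ">".
    """
    x = math.log10(conc_molar)  # X: log10 of the test concentration in molar

    def hill(y):
        # pIC50 = -(log10(100 - Y) - log10(Y) + X)
        return -(math.log10(100.0 - y) - math.log10(y) + x)

    if percent_inhibition < 20.0:
        return "<", hill(20.0)   # weaker than the pIC50 implied by 20 %
    if percent_inhibition > 80.0:
        return ">", hill(80.0)   # stronger than the pIC50 implied by 80 %
    return "=", hill(percent_inhibition)


if __name__ == "__main__":
    # 50 % inhibition at 10 uM corresponds to an IC50 of 10 uM (pIC50 = 5).
    print(pic50_from_inhibition(50.0, 10e-6))
```

With this form, 50 % inhibition at the test concentration maps to a pIC50 equal to the negative log of that concentration, which is the expected behaviour of a unit-slope Hill curve.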


Inactive data from PubChem were extracted using the UniProtKB Accession ID51 via the PubChem API. Again, bioassay descriptions were used to filter out compounds with unresolved bromodomain activity. Compound IDs were converted to SMILES using the PubChem identifier exchange service.52 Data from ChEpiMod were provided by the curators. We filtered out the large functional screen for BAZ2B with PubChem_AID:504391, as the assay description suggests that this screen should be used with caution due to the likelihood of screening artefacts. Any PDB compounds without numerical data points were also removed, as these might be fragment molecules with low activity. GOSTAR data has been incorporated into the internal AstraZeneca database service and data for bromodomain targets were extracted using SQL Developer (version 2.11.2), queried using EntrezGene IDs (EGIDs).

For the combined public dataset, active (A) and not active (N) classes were assigned by the following criteria: A = pIC50, pKd, pKi ≥ 5 or ∆Tm ≥ 0.9. All other data points were assigned as N and other endpoints were removed. ∆Tm ≥ 0.9 was chosen to maintain consistency between public data and AstraZeneca data, as 0.9 was used as a cut-off in the internal AstraZeneca Differential Scanning Fluorimetry (DSF) assays. A ∆Tm of 0.9 was chosen for these assays as it reflects three standard deviations of the DMSO controls. The cut-off of 10 µM (equivalent to a pIC50, pKd or pKi of 5) was chosen to enable the application of the model to virtual screening, where this is a frequently used activity threshold.

AstraZeneca Dataset

Proprietary AstraZeneca data was extracted for bromodomains by querying the internal IBIS SAR database. Inhibition data was extracted at four commonly used concentrations (1, 3, 10 and 25 µM), as well as concentration response data and data from DSF assays. Where multiple data were present for the same assay, these were aggregated as an average. The AstraZeneca dataset was classified into active (A) or not active (N) classes. pKd and pIC50 data were classified using the same thresholds as above, and compounds with thermal shift values of ∆Tm ≥ 0.9 at a concentration of 10 µM were classified as active, while the remainder were assigned to the not active class. The binding data from DiscoveRx assays had pre-assigned activity flags for each concentration point based on their percentage of control values, which were used for classification. Records with flags other than active or not active were removed. Since some records had values at multiple concentrations, the records were then placed into classes in the following order: active at 1 µM: A, active at 3 µM: A, active at 10 µM: A, active at 25 µM: N, not active at 25 µM: N, not active at 10 µM: N, not active at 3 µM: N, not active at 1 µM: N.

Combined dataset

After classification of compounds into active and not active classes, duplicate entries were removed from the dataset by comparison of structure (as standardised by the StandardiseMolecules function in the camb53 package in R, which uses Indigo’s C API54), domain and activity assignment. Additionally, for entries that were duplicates of structure and domain but had been classified into opposite activity classes, both data points were removed due to inconsistency. Bromodomains with fewer than 25 data points were removed from subsequent analysis. The final dataset contained 15,350 data points, comprising 6,352 compounds across 31 bromodomains. The final distribution of the dataset, split into public and AstraZeneca data, per target and activity label can be found in Table S1 and is depicted in Figure 1. Approximately 53.2 % of data points correspond to type 2 bromodomains. In contrast, PB1 BD5, PCAF and SMARCA2 contain a low proportion of data points (0.2 %, 0.6 %, and 0.2 % respectively). The BET family bromodomains contain an enriched proportion of active compounds (47.3 %) compared to the whole dataset (34.3 %).
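The activity labelling and duplicate handling described above can be sketched as follows. This is a simplified illustration under stated assumptions: the record layout, the endpoint names ("pIC50", "pKd", "pKi", "dTm") and both function names are hypothetical, and structures are assumed to be already standardised to a canonical string.

```python
from collections import defaultdict

P_ACTIVITY_CUTOFF = 5.0  # pIC50 / pKd / pKi >= 5, i.e. 10 uM or more potent
DELTA_TM_CUTOFF = 0.9    # thermal shift threshold quoted in the text


def assign_label(endpoint, value):
    """Return "A" (active), "N" (not active) or None for unused endpoints."""
    if endpoint in ("pIC50", "pKd", "pKi"):
        return "A" if value >= P_ACTIVITY_CUTOFF else "N"
    if endpoint == "dTm":
        return "A" if value >= DELTA_TM_CUTOFF else "N"
    return None  # other endpoints are dropped


def deduplicate(records):
    """Collapse duplicate (structure, domain, label) records.

    Pairs that appear with both labels are removed entirely, because the
    conflicting measurements cannot be reconciled.
    """
    labels_seen = defaultdict(set)
    for structure, domain, label in records:
        labels_seen[(structure, domain)].add(label)
    return sorted(
        (structure, domain, next(iter(labels)))
        for (structure, domain), labels in labels_seen.items()
        if len(labels) == 1
    )
```

Keeping the thresholds as named constants makes it straightforward to re-run the labelling with a different activity cut-off if a different screening threshold is of interest.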
Other domains which have a high proportion of actives include: BRD9 (27.8 %), CREBBP (37.3 %), EP300 (27.2 %) and TAF1 BD2 (55.2 %). PCAF, PB1 BD5 and SMARCA2 bromodomains contain a high proportion of actives (60.0 %, 61.5 % and 100 %, respectively) within a low number of data points, suggesting that these domains are rarely screened in bromodomain panel assays. ATAD2B, BRWD1 BD2 and BRPF3 contain low proportions of actives (0.0 %, 0.5 %, and 0.02 %, respectively), and data for these domains originates primarily from screening panels. These domains were included as compounds overlapped with other domains, and thus provided information on selectivity.

Analysis of Chemical and Biological space

Bromodomains were clustered based on sequence similarity using the R packages SeqinR55 and APE56 to plot a phylogenetic tree using the plot.phylo function, with argument type=fan. DataWarrior57 Similarity Analysis using Skelspheres descriptors was used to generate chemical space visualizations of the dataset. The technique uses a Rubberbanding Forcefield approach to place molecules according to their similarity, with a cut-off value of 0.9 for the similarity. Bemis-Murcko scaffolds58 for the whole dataset and the public portion of the dataset were generated in KNIME59 using the RDKit Find Murcko Scaffolds node. The scaffold visualisations, including the network graph, which used the force-directed layout, were generated in TIBCO Spotfire60. For clarity of presentation, the network graph was generated only for active data points from the public dataset and for scaffolds where more than 4 compound members had active data points for a bromodomain.

Compound and target descriptors

Compound Descriptors

Compounds were standardised from SMILES as described above using the Indigo API54, removing inorganic molecules and salts. Different compound descriptors were benchmarked against one another in PCM models. Compound 1D and 2D physicochemical descriptors
were calculated using the GeneratePadelDescriptors function in camb,53 which links through an API to the Java PaDEL descriptors library,61 and 512-bit hashed binary Morgan fingerprints of radius 2 were generated using the Python RDKit module.62 3D structures were generated using Corina-3.6,63 providing a single low-energy conformer for each molecule, from which 3D VSurf descriptors were generated in MOE64. 3D VSurf descriptors are internal co-ordinate based descriptors calculated from the 3D conformation of molecules and are similar to the Volsurf descriptors used in previous studies.20,65–67 Since 3D VSurf descriptors encode the distribution of molecular size, shape, hydrophilicity and hydrophobicity properties across the molecule,65 they were benchmarked for their use in PCM against the physicochemical 1D and 2D descriptors generated by the PaDEL library.

Target Descriptors

Sequences were aligned by importing one crystal structure per bromodomain into MOE64 (Table S2), using the in-built “protein” function and selecting the automatic alignment option, and then manually adjusting the alignment using the sequence editor to minimise alignment gaps. Binding site residues were chosen by importing all publicly known liganded bromodomain crystal structures in the MOE64 protein family database, filtering to those that contained ligands between 100-600 molecular weight (in total 352 crystal structures) and applying a threshold of 4.5 Å from any ligand for a residue position to be included. The final alignment can be found in Table S3. Z-scales (5)68 and other alignment dependent target descriptors were calculated using the camb53 AADescs function. These descriptors are derived from dimensionality reduction methods (e.g. principal component analysis) of the numerical property values for amino acid residues in the binding site of each protein.69 Peptide array data were taken from the SPOT peptide array data for histone proteins against bromodomains published in Filippakopoulos, et al. 2012 (see Figure 4 of that work),70 obtained by request from the authors. The normalized raw intensity values (normalized to between 0-100) for each peptide-bromodomain interaction were used as a numerical descriptor for bromodomain targets, providing a fingerprint of bromodomain binding specificity for mono-lysine acetylated histone peptides. The final target descriptors comprised a matrix of 22 bromodomain numerical interactions with 136 acetylated-lysine histone peptides.

Algorithms and generation of PCM models

PCM models were built using the camb and caret packages in R.53,71 Compound and target descriptors were concatenated, and highly correlated and near zero variance descriptors were removed using the functions RemoveHighlyCorrelatedFeatures (cut-off 0.95) and RemoveNearZeroVarianceFeatures (cut-off 30/1). The data points for BRD4 BD1 were randomly downsampled to 50 % of the original number to reduce bias in the dataset and to present a more even distribution across bromodomains (Table S4). Variables were centred and scaled to zero mean and unit variance using the PreProcess caret function and split into a 70/30 training to test ratio using the SplitSet function in camb, using stratified sampling according to bioactivity labels. Random Forest (RF), Support Vector Machine (SVM) and Generalized Linear Model (GLM) models were built using the rf, svmRadial and glm methods, respectively, in the caret train function, with training control set up through the GetCVTrainControl function using 5-fold cross validation (CV) and the arguments classProbs=TRUE and summaryFunction=twoClassSummary to calculate class probabilities and summary statistics. Recursive feature elimination using the caret rfe function removed redundant variables, as determined by 5-fold CV assessed by ROC AUC. The number of input variables to randomly select at each node in the random forest trees (mtry) was optimised in CV using a random grid search of 15 values for RF, and a grid search was performed to optimise values of σ (0-0.9) and C (0-5) for SVM.
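Two of the data preparation steps above, concatenating compound and target descriptors into one PCM input row and splitting the data 70/30 with stratification on the bioactivity label, can be sketched in plain Python. This is an illustrative stand-in for the camb and caret functions named in the text, not a reproduction of them; the function names, the rounding of the test-set size and the fixed seed are assumptions.

```python
import random
from collections import defaultdict


def pcm_features(compound_descriptors, target_descriptors):
    """Concatenate compound and target descriptors into one PCM row."""
    return list(compound_descriptors) + list(target_descriptors)


def stratified_split(labels, test_fraction=0.3, seed=0):
    """Return (train_idx, test_idx) preserving the class ratio in both sets."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, test = [], []
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)  # random assignment within each class
        n_test = round(len(idxs) * test_fraction)
        test.extend(idxs[:n_test])
        train.extend(idxs[n_test:])
    return sorted(train), sorted(test)


if __name__ == "__main__":
    # One compound-target pair: a 4-bit fingerprint plus 2 target descriptors.
    print(pcm_features([1, 0, 0, 1], [0.25, -1.3]))
    print(stratified_split(["A"] * 10 + ["N"] * 20))
```

Because the same compound can appear against several bromodomains, each row represents a compound-target pair; the target half of the row is what allows a single model to learn selectivity between domains.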

Model Validation

PCM models were validated through 5-fold cross validation and, after parameter optimisation, by predicting the unseen test set (30 % of the records, split using stratified random sampling according to bioactivity) using the predict function from caret. ROC AUC, Matthews Correlation Coefficient (MCC), area under the precision-recall curve (PRAUC), sensitivity and specificity were used to assess performance on the test set. Leave-one-scaffold-out (LOSO) validation was conducted by grouping compounds by their carbon framework Bemis-Murcko scaffolds,58 which were calculated in KNIME59 using the RDKit Find Murcko Scaffolds node. This resulted in 3,553 framework scaffolds, from which a random sample of 50 scaffolds with more than 10 data points was selected as hold-out scaffolds. Leave-one-target-out (LOTO) validation was conducted by sequentially removing all compound-target pairs associated with one bromodomain target, training a model on the remaining data and predicting the hold-out target data points.

Benchmarking to QSAR, Quantitative Sequence Activity Model (QSAM) and baseline models

Models using only chemical descriptors (global QSAR) and only target descriptors (global QSAM) were generated on the same dataset using the caret and camb R packages71, using the same method described above and the same test set. Their performance as measured by ROC AUC was compared using a pairwise t-test utilising the compare_models function in caret, with Bonferroni correction71. This tests the hypotheses that compound activities are correlated across targets (global QSAR) and that compound activities are only target dependent (global QSAM). Individual QSAR RF models were generated for each bromodomain using the same method as for the PCM models.
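To make the validation scheme concrete, the MCC metric and the leave-one-group-out splits that underlie LOTO and LOSO validation can be sketched as below. This is a minimal sketch: the function names are hypothetical, the group key is assumed to be either the bromodomain name (LOTO) or the scaffold identifier (LOSO), and the sampling of 50 hold-out scaffolds used in the study is not reproduced.

```python
import math


def matthews_corrcoef(y_true, y_pred, positive="A"):
    """Matthews correlation coefficient for a two-class problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention the MCC is reported as 0 when any marginal is empty.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def leave_one_group_out(groups):
    """Yield (held_out_group, train_idx, test_idx) for each distinct group."""
    for group in sorted(set(groups)):
        test = [i for i, g in enumerate(groups) if g == group]
        train = [i for i, g in enumerate(groups) if g != group]
        yield group, train, test
```

For each split, a model would be trained on the `train_idx` rows and scored on the `test_idx` rows, so that no data point from the held-out target or scaffold informs the model that predicts it.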


To assess the utility of the extra target information encoded in the binding site alignment-dependent target descriptors, a comparator RF model was trained on Morgan fingerprints with binary target identity descriptors. The binding site alignment-dependent descriptors, such as Z-Scales, provide a similarity measure between domains based on their amino acid binding site residues and properties, from which the model is expected to make interpolations between similar domains. In contrast, the basic binary label identifiers for the different bromodomains do not encode a relationship between the targets themselves. To assess baseline performance, a generalised linear model was also generated on the whole dataset applying the glm function implemented in caret, using compound and target IDs as binary descriptors. Y-scrambling was performed by randomly reorganising the labels associated with each compound-target pair and rebuilding the models with the same final parameters as for the unscrambled models.72 Class labels were randomly scrambled by both 50 % and 100 % over 50 iterations. The mean ROC AUC over all iterations was calculated.

Public Dataset Model

A RF model was constructed for the public dataset using the same methods described above. This model was tested on a 30 % external test set as well as the proprietary AstraZeneca dataset.

Applicability Domain

Since applicability domain determination is difficult to rationalise in multi-dimensional space,73 the Mondrian cross-conformal prediction (CCP) framework was used to define which new compound-target pairs can be predicted at a certain confidence level (Cl).74 Applying Mondrian CCP results in each data point in the test set being assigned to one of four classes: active, not active, “neither” active nor not active, or “both” active and not active, to achieve the specific error rate provided at different Cls. For our test set we assume that
samples are exchangeable with the training set, i.e. that it fulfils the exchangeable distribution assumption required for a guaranteed error rate. For the conformal method to be suitable, the validity metric (defined as the fraction of samples in the test set that are classified into a label that contains their experimental class, including the “both” class assignment) must be greater than the Cl; i.e. the error rate should not exceed 1-Cl. The applicability domain was assessed by calculating conformal predictions using an implementation of Mondrian cross-conformal prediction75 in R, which provides a wrapper around the caret package. We use the RF probability values (fraction of trees predicting for each class) as the non-conformity measure. The validity and efficiency metrics on the test set were used to assess whether the conformal prediction framework was valid for the performance on the external test set. When interpreting conformal predictions, the p-value assigned to the new sample prediction can be used as an indication of the degree of confidence that a sample is in the assigned class.

Experimental Validation

Predictions and filtering

Compounds in the liquid sample library at AstraZeneca were screened against the model to predict activity against four bromodomains, namely BRD1, BRD4 BD1, BRD9 and BRPF1b. A total of 2,164,399 compounds were pre-processed in the same way as the training set molecules (see above) and combined with target descriptors for each target. The model was used to predict activity for these molecules using conformal predictions. We applied conformal prediction using the confidence levels of 0.7, 0.8 and 0.9 and the corresponding significance values of 0.3, 0.2 and 0.1 to compare new compound-target pair p-values to the model training set values to shortlist compounds for experimental testing. We chose this range because we wanted to find hits with structural diversity to the training set and any
lower confidence than 0.7 would place too many new samples into the “neither” class, especially as the screening library compound set will be more diverse than the model test set. We also selected a range of significance values to observe the relationship between p-values and prediction accuracy. Firstly, the predictions with a p-value of > 0.1 for the active class (corresponding to the 0.9 confidence level) were selected. These actives were filtered to exclude those which were unavailable for testing (availability as a solid < 3 mg, as a solution […] 0.1 (inactive at 0.9 confidence level) was generated. These inactives were combined with the active set to provide bioactivity profiles for the same compound against multiple domains, and compounds were annotated with their selectivity profile. Compound structures that were in the public domain were identified so that a significant number of these structures could be selected.
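
The shortlisting rule described above can be illustrated with a minimal Python sketch of class-conditional (Mondrian) conformal p-values. This is not the authors' R code: the function names, the toy calibration scores and the choice of one minus the RF class probability as the non-conformity score are assumptions made purely for illustration. A class is kept in the prediction set when its p-value exceeds the significance level (one minus the confidence level); keeping both classes gives "both" and keeping none gives "neither".

```python
from typing import Dict, List


def conformal_p_value(calibration_scores: List[float], test_score: float) -> float:
    """Share of calibration samples of one class that are at least as
    non-conforming as the test sample (with the usual +1 correction)."""
    at_least_as_strange = sum(1 for s in calibration_scores if s >= test_score)
    return (at_least_as_strange + 1) / (len(calibration_scores) + 1)


def assign_label(p_values: Dict[str, float], significance: float) -> str:
    """Keep every class whose p-value exceeds the significance level."""
    kept = [label for label, p in p_values.items() if p > significance]
    if len(kept) > 1:
        return "both"
    if not kept:
        return "neither"
    return kept[0]


# Hypothetical calibration non-conformity scores, stored separately per class
# (the Mondrian condition). Assumed score: 1 - RF probability of the class.
calibration = {
    "active": [0.10, 0.20, 0.30, 0.45, 0.60],
    "inactive": [0.05, 0.10, 0.20, 0.30, 0.40],
}

# Hypothetical new compound-target pair: 75 % of trees vote "active".
rf_probability = {"active": 0.75, "inactive": 0.25}
p_values = {
    label: conformal_p_value(calibration[label], 1.0 - rf_probability[label])
    for label in calibration
}

# Significance levels 0.1, 0.2 and 0.3 correspond to confidence 0.9, 0.8 and 0.7.
for significance in (0.1, 0.2, 0.3):
    print(significance, assign_label(p_values, significance))
```

In this toy case the pair is labelled "both" at the 0.9 confidence level and "active" at 0.8 and 0.7, which shows how lowering the confidence level narrows the prediction set.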

In the next step, sampling parameters for a subsequent diversity selection of compounds were calculated, namely the compound similarity to the training set and the cluster membership of the selected compounds. To provide measures of diversity for future selection processes and analysis, the compound-target pairs were assessed for their diversity compared to the training set. Firstly, the similarity of compound structures to the nearest neighbour compound structure in the training set (across any domain) was calculated using the Tanimoto similarity78 index, calculated from 512-bit Morgan Fingerprints in KNIME59. The experimentally determined selectivity profile of the most similar compound in the training set was extracted. Additionally, the nearest compound-target pair neighbour in the training set was identified by selecting the training set instance with the minimum Euclidean distance, calculated in KNIME59 from all compound and target descriptors used in the model, from the new prediction. Clustering was conducted to provide a means of selecting a diverse set of compounds for testing. To cluster a larger set of molecules, compounds were assigned their Murcko carbon framework scaffolds, from which 512-bit Morgan fingerprints were generated and used to hierarchically cluster the scaffolds based on their distance matrix measured as the Tanimoto distance. A distance threshold of 0.375 was used to merge scaffolds into clusters in KNIME,59 derived from the largest non-outlier distance value, which resulted in 585 clusters. 
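
As an illustration of the two diversity measures described above, the following Python sketch computes the Tanimoto similarity to the nearest training-set neighbour and groups scaffolds whose Tanimoto distance falls within a threshold. It is a sketch under stated assumptions, not the KNIME workflow used in the study: fingerprints are represented as sets of on-bit indices, the toy fingerprints and names are hypothetical, and single linkage is assumed because the text does not state which linkage was used.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Fingerprint = FrozenSet[int]  # indices of the bits that are set


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Shared on-bits divided by all on-bits in either fingerprint."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def nearest_neighbour(query: Fingerprint,
                      training: Dict[str, Fingerprint]) -> Tuple[str, float]:
    """Name and similarity of the most similar training-set fingerprint."""
    return max(((name, tanimoto(query, fp)) for name, fp in training.items()),
               key=lambda pair: pair[1])


def threshold_clusters(fps: Dict[str, Fingerprint],
                       max_distance: float) -> List[Set[str]]:
    """Single-linkage grouping (an assumption): items share a cluster if a
    chain of pairwise Tanimoto distances <= max_distance connects them."""
    parent = {name: name for name in fps}

    def find(name: str) -> str:
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path compression
            name = parent[name]
        return name

    names = list(fps)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if 1.0 - tanimoto(fps[first], fps[second]) <= max_distance:
                parent[find(first)] = find(second)

    clusters: Dict[str, Set[str]] = {}
    for name in names:
        clusters.setdefault(find(name), set()).add(name)
    return list(clusters.values())


# Hypothetical toy fingerprints (real ones would be 512-bit Morgan fingerprints).
training = {"train_1": frozenset({1, 2, 3, 4}), "train_2": frozenset({10, 11, 12})}
query = frozenset({1, 2, 3, 5})
name, similarity = nearest_neighbour(query, training)

scaffolds = {
    "s1": frozenset({1, 2, 3, 4}),
    "s2": frozenset({1, 2, 3, 4, 5}),
    "s3": frozenset({20, 21}),
}
clusters = threshold_clusters(scaffolds, max_distance=0.375)
print(name, similarity, clusters)
```

With these toy inputs the query's nearest neighbour is train_1, and the 0.375 distance threshold merges s1 and s2 while leaving s3 on its own.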
Compound Selection for the prospective validation study

Compounds were sorted into three overlapping sets: (a) Interpolation Compounds that were present in the training set but lacking experimental data for one or more of the four bromodomains (BRD1, BRD4 BD1, BRD9 and BRPF1b), (b) Selectivity Profiles for novel compounds, defined as multiple confident active or inactive predictions (at the 0.7 to 0.9 confidence levels) for a given compound, and (c) Singular Active predictions, comprising
compounds that were not in the training set and were predicted to be active (at the 70-90 % confidence levels) for one of the four bromodomains in the validation study. All compounds available from the Interpolation Compound set were tested for their bromodomain activity. This included 9 compounds that had not previously been tested on BRD1, 18 for BRD9 and 8 for BRPF1. For the selection of Selectivity Profile compounds, grouped stratified weighted sampling (using sample_n in the dplyr R package79) was conducted to select profile compounds across groups of public domain compounds, profile annotations, compound Tanimoto similarity to the nearest compound neighbour in the training set (binned into 10 bins between 0 and 1), and cluster numbers, weighted overall to an even distribution across public domain structure, profile annotations and Tanimoto similarity (limited by overall availability of compounds in each category). In total 721 compound-target pairs were tested as part of the Selectivity Profiles set. For the Singular Actives set, grouped stratified weighted sampling was conducted to select active compounds separately for each domain across groups of public availability, binned p-values from conformal prediction (p-value >0.3, 0.2< p-value […] 0.85, specificity > 0.90). For most of these domains the dataset is heavily biased towards inactives
(Figure 1), and the high performance can be explained by the active scaffolds common to training and test set for these domains (Figure S5). High overall ROC AUCs are found for ATAD2, BPTF, BRD1, EP300, BRD7, BRD9, BRPF1 and TAF1 BD2; however, they generally have lower sensitivity values (0.48-0.84) at a higher specificity, resulting in an overprediction of inactives. This could be due to their higher active scaffold diversity, which is represented in Figure S6. Conversely, those domains with a higher sensitivity than specificity outside the BET family include CREBBP and PCAF, both of which have a high proportion of active data points (37.3 % and 60 %; Figure 3). Overall, 21 bromodomains are modelled well (with a ROC AUC > 0.8 and both sensitivity and specificity > 0.6). For bromodomains with poorer performance, this can be attributed to data limitations in many cases, including class imbalances and scaffold diversity as discussed above.

Model Validation

Next, we performed Leave-One-Scaffold-Out (LOSO) and Leave-One-Target-Out (LOTO) validation to assess the ability of the model to predict for new compounds and new targets respectively.

Leave-One-Scaffold-Out Validation

Figure S7 displays the LOSO results, which show a large variation in ROC AUC across scaffolds (mean ROC AUC 0.92, standard deviation 0.12). There is a general trend towards better discrimination of activity for more complex scaffolds (scaffolds with higher numbers are more complex, with a larger number of rings), which is particularly noticeable for scaffold numbers above 3,000, which have consistently high ROC AUC values above 0.95. This trend appears to be driven by the presence of similar scaffolds with slightly different ring structures in the training set, from which the model can interpolate more easily (Figure S8). The higher predictivity for more specific scaffolds is also likely to result from more consistent SAR for these molecules, due to their high degree of
optimisation. A combination of these factors can explain the trend towards better discrimination between active and not active classes for more complex scaffolds. For less complex scaffolds, with lower numbers, there is a higher variability between scaffold interpolation ability and hence model performance. This can be attributed to class imbalance in some cases (Figure S9); for example, scaffolds 192 and 2,753 both have a highly active data point composition, and consequently both have a high sensitivity accompanied by a lower specificity, corresponding to a larger number of overpredictions. Conversely, we observe high specificities for scaffolds 172, 740 and 1614 with zero sensitivity, which have a low number of active data points within the scaffold. However, this is not always the case, as for example scaffold 464 has a lower fraction of active data points (0.09) while still achieving perfect performance. Therefore, we can conclude that multiple factors contribute to the ability of models to interpolate for a new scaffold, including the scaffold similarity, the class distribution and the SAR itself within the dataset.

Leave-One-Target-Out Validation

PCM offers the advantage over QSAR that predictions can be interpolated also for new targets based on information in the training set about similar targets. To test the ability to interpolate to new targets we employed LOTO validation, the results of which are displayed in Figure S10. The average performance across all bromodomains was ROC AUC 0.863, with a sensitivity of 0.672 and a specificity of 0.864, which was unsurprisingly lower than our original model results averaged over all targets (ROC AUC 0.934, sensitivity 0.750, specificity 0.900). Target interpolation/extrapolation for the BET bromodomains (BD1 and BD2 domains from BRD2, BRD3, BRD4 and BRDT proteins) can be achieved with high predictive performance (ROC AUC > 0.8, sensitivities and specificities > 0.7). This can be attributed to the high correlation of activity of molecules between the BETs, as well as a high proportion of shared scaffolds between the domains (Figure 1 and Figure S4). These targets
are also well represented in selectivity panel screens and therefore have data points for a larger number of chemotypes. All targets have a ROC AUC well above 0.5 (random), except for TAF1 BD2, which cannot be predicted using information about other domains. This can be rationalised by the fact that there is only one other member of the subfamily, which is TAF1L BD2 (Figure 2); however, TAF1 BD2 has many more active compounds than TAF1L BD2 (Figure 1), and extrapolation from TAF1L BD2 hence results in the underprediction of active compounds, demonstrated by the low sensitivity of 0.081. The opposite is true for the predictions of TAF1L BD2 based on the activity for TAF1 BD2, where we observe an underprediction of the inactive class, resulting in a low specificity of 0.292. BRPF1 and ATAD2 also have lower ROC AUCs of around 0.7. This can be rationalised again by the lower proportion of active compounds of the nearest domain sequences BRPF3/BRD1 and ATAD2B (Figure 1), resulting in more inactive predictions. SMARCA4 predictions show an overprediction of actives because the nearest domain SMARCA2 has a large proportion of active data points (Figure 1). ATAD2B consisted of all inactive data points and we could predict this domain with a specificity of 0.887 and a false positive rate of 0.113; these findings likely result from the transfer of the activity profile of ATAD2. BRD7, BRD1, KAT2A and PB1 BD5 are predicted with high ROC AUC, sensitivity and specificity values > 0.8. The number of data points for each of these domains is small, and a larger number of data points for a similar domain (BRD9, BRPF1b, CECR2 and SMARCA4 respectively; Figure 1) is contained within the training set, with a similar activity profile (Figure S4). This shows the ability of the model to interpolate between similar domains with similar activity profiles, especially for domains where fewer data are available.
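
The two validation schemes above share one mechanism: every data point belonging to a single group (a scaffold for LOSO, a target for LOTO) is held out in turn while the model is trained on the rest. A minimal Python sketch of that splitting step follows. The record fields, scaffold names and helper function are hypothetical, and model training and scoring are deliberately left out.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

Record = Dict[str, str]


def leave_one_group_out(records: List[Record],
                        key: str) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held-out group, training indices, test indices) for each group."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for index, record in enumerate(records):
        groups[record[key]].append(index)
    for group, test_indices in groups.items():
        held_out = set(test_indices)
        train_indices = [i for i in range(len(records)) if i not in held_out]
        yield group, train_indices, test_indices


# Hypothetical compound-target pairs; scaffold labels are illustrative only.
pairs = [
    {"compound": "c1", "scaffold": "scaf_A", "target": "BRD4 BD1"},
    {"compound": "c2", "scaffold": "scaf_A", "target": "BRD9"},
    {"compound": "c3", "scaffold": "scaf_B", "target": "BRD4 BD1"},
    {"compound": "c4", "scaffold": "scaf_C", "target": "BRD1"},
]

loso_folds = list(leave_one_group_out(pairs, "scaffold"))  # new-compound test
loto_folds = list(leave_one_group_out(pairs, "target"))    # new-target test

for group, train_idx, test_idx in loto_folds:
    print(group, "train:", train_idx, "test:", test_idx)
```

Grouping by target removes every measurement for that bromodomain from training, so any predictive signal in a LOTO fold has to come from the descriptors and data of the remaining domains.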

Overall, we can conclude that PCM can generally extrapolate well for new targets that have similar bioactivity profiles to those in the training set, and less well for novel targets where this is not the case.

Benchmarking to QSAR, QSAM and baseline models

We next compared our PCM model to other techniques which could be performed on the same dataset, in order to establish its comparative advantage for bromodomain bioactivity modelling. The final PCM model significantly (p value […] 0.8 and both sensitivity and specificity > 0.6.

Figure 6. ROC curves for PCM versus Global QSAR and Global QSAM techniques on the same test set. PCM outperforms both Global QSAR and QSAM techniques, showing that an interaction of compound and target features is important for modelling this dataset.

Figure 7. Individual QSAR vs PCM performance for each target measured by Matthews Correlation Coefficient (MCC). On average, PCM outperforms QSAR. The BET bromodomains are highlighted by the blue box outline.

[Figure 8 plot, panels a and b. Axis labels: Tanimoto Similarity to Nearest Compound Neighbour in Training Set; P-value for Active Class from Conformal Prediction.]

Figure 8. Selectivity Profile predictions in terms of p-values plotted against Tanimoto similarity to the nearest compound neighbour in the training set (Morgan fingerprints, 512 bits) for a) compounds that were predicted active at both BRD1 and BRPF1; active=A, not active=N. The p-values are lower for this set on average compared to the data for the other Selectivity Profile, and most of the wrong predictions fall between 0.1-0.3. b) compounds that were predicted active at BRD4 BD1 and inactive at BRD1 and BRPF1; active=A, not active=N. Higher p-values and similarity to the nearest compound neighbour in the training set were predicted correctly as active for BRD4 BD1.

[Figure 9 plot. Axis labels: Tanimoto Similarity to Nearest Compound Neighbour in Training Set; P-value for Active Class from Conformal Prediction.]

Figure 9. The p-value generated from conformal prediction plotted against the Tanimoto similarity of compounds to the nearest compound neighbour in the training set (calculated from 512-bit Morgan fingerprints). A=active, N=not active. The overall trend is towards a higher confidence for molecules with a more similar compound neighbour in the training set. The false positive rate decreases as the p-value cut-off is increased to 0.3 for all bromodomains, and the hit rate increases from 34.1 % to 68.7 %. Many of the incorrectly predicted single-label actives are now classified as the “neither” class, which means they are considered outside of the newly defined applicability domain. However, the improved hit rate comes with decreased diversity of new actives.

Table 1. Comparison of the external test set performance of the PCM binary classification model with other models generated for the dataset.

Validation | ROC AUC | MCC | Sensitivity | Specificity | AUPR curve
RF PCM 5-fold Cross Validation | 0.964 +/- 0.0026 | - | 0.857 +/- 0.0141 | 0.948 +/- 0.0066 | -
RF PCM 30 % external test set | 0.969 | 0.826 | 0.859 | 0.955 | 0.879
Peptide descriptor model with Morgan fingerprints | 0.957 | 0.801 | 0.850 | 0.942 | 0.912
SVM PCM model 30 % external test set | 0.942 | 0.789 | 0.810 | 0.956 | 0.910
Linear Model 30 % external test set | 0.916 | 0.680 | 0.754 | 0.912 | 0.860
Global QSAR | 0.801 | 0.498 | 0.511 | 0.926 | 0.679
Global QSAM (Quantitative Sequence Activity Models) | 0.747 | 0.376 | 0.390 | 0.919 | 0.555
Average individual PCM performance | 0.936 | 0.634 | 0.750 | 0.900 | -
Average Individual QSAR performance | 0.897 | 0.569 | 0.762 | 0.809 | -
RF PCM with binary target id descriptors 30 % external test set | 0.853 | 0.611 | 0.616 | 0.940 | 0.722
Linear baseline Model (Target and compound binary IDs) 30 % external test set | 0.730 | 0.441 | 0.755 | 0.706 | 0.113
Public data model 30 % external test set | 0.951 | 0.767 | 0.919 | 0.848 | 0.884
Public data model proprietary data test set | 0.642 | 0.182 | 0.410 | 0.778 | 0.347

The peptide model uses histone peptide array data as a surrogate for target descriptors. Models utilising different algorithms (RF=Random Forest, SVM=Support Vector Machines, GLM= Generalised Linear Model) were compared, as well as those incorporating different feature spaces (only compound descriptors (QSAR) and only target descriptors (QSAM)). Binary target ID descriptors and a baseline linear model (GLM) based on using only binary compound IDs and target IDs were also benchmarked. A model built on the public dataset and tested on a 30 % random test set as well as the proprietary data was benchmarked. The performance measures were area under the receiver operating characteristic curve (ROC AUC), Matthews’ Correlation Coefficient (MCC), sensitivity (true positive predictions divided by all positive labels), specificity (true negative predictions divided by all negative labels) and the area under the precision recall curve (PRAUC). The table shows that the PCM model based on Morgan Fingerprints and Z-scales 5 outperformed other models in terms of ROC AUC and that the peptide model and the public data model performed well for their external test set validations. On average PCM outperformed individual target QSAR models.

Table 2: Performance as measured by Accuracy, Balanced Accuracy and Matthew’s Correlation Coefficient (MCC) for the three overlapping subsets of compounds selected for testing. Interpolation performs the best for the model, as expected, followed by the Selectivity Profile overall predictions.

Set | True Positive | True Negative | False Positive | False Negative | Accuracy | Balanced Accuracy | MCC | Precision | Median Euclidean distance to nearest compound-target neighbour in training set
Interpolation Compounds | 9 | 12 | 12 | 2 | 0.60 | 0.66 | 0.30 | 0.42 | 0.37
Selectivity Profiles | 204 | 166 | 343 | 8 | 0.51 | 0.64 | 0.31 | 0.37 | 0.52
Singular Actives | 115 | 0 | 273 | 0 | 0.29 | 0.29 | - | 0.29 | 0.57

Table 3: Three compound selectivity profiles which were predicted by the PCM model and their Thermal Shift values, as determined by Differential Scanning Fluorimetry (DSF).

Structure | BRD1 Experimental Activity (∆Tm average / range) | BRD1 Prediction Activity (p-value) | BRPF1b Experimental Activity (∆Tm average / range) | BRPF1b Prediction Activity (p-value) | BRD9 Experimental Activity (∆Tm average / range) | BRD9 Prediction Activity (p-value) | BRD4 BD1 Experimental Activity (∆Tm average / range) | BRD4 BD1 Prediction Activity (p-value)
1 | A (4.65/0.10) | A (0.12) | A (7.81/0.35) | A (0.12) | NV | NV | NV | NV
2 | A (2.15/0.70) | A (0.21) | N (0.87/0.25) | A (0.21) | NV | NV | NV | NV
3 | NV | NV | A (1.15/0.31) | A (0.17) | NV | NV | A (1.15/0.10) | A (0.16)

A = active, N = not active. The table shows the thermal shift values (∆Tm) tested at 10 µM, averaged between the midpoint and first differential (see methods), and the range of the two measurements. p-values were calculated from conformal predictions and show the probability values that the compound belongs to the class assigned. The table shows four compound structures for which selectivity was predicted by the model, including two examples of a dual BRD1 and BRPF1b active prediction (one of which was correctly predicted, the other partially correctly predicted), one example of a dual BRD4 BD1 and BRPF1b active and one example of a compound which is active at BRD4 BD1 but inactive for BRD1, BRD9 and BRPF1b. *Compound classified as inactive, due to being below the value for 3 times the standard deviation of the DMSO control.

Table 4. True positive (TP), True negative (TN), False positive (FP) and False negative (FN) values for the prospective validation results on a per-bromodomain basis and overall. Calculated from these values are the balanced accuracies, Matthew’s Correlation Coefficient (MCC) and the active hit rate percentage.

Bromodomain | TP | TN | FP | FN | Balanced Accuracy | MCC | Active hit rate (%)
Overall | 319 | 193 | 616 | 11 | 0.60 | 0.24 | 34.1
BRD1 | 57 | 66 | 162 | 0 | 0.64 | 0.27 | 26.0
BRD4 BD1 | 129 | 40 | 111 | 1 | 0.63 | 0.36 | 53.8
BRD9 | 62 | 33 | 182 | 8 | 0.52 | 0.05 | 25.4
BRPF1b | 71 | 54 | 161 | 2 | 0.61 | 0.25 | 30.6
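
The derived columns of Table 4 can be recomputed from the four confusion-matrix counts. The Python sketch below uses the standard definitions of balanced accuracy and MCC, and assumes that the active hit rate is TP / (TP + FP); that assumption is consistent with the reported percentages but is not stated explicitly in the caption. The counts are copied from Table 4; the class and function names are illustrative.

```python
from math import sqrt
from typing import Dict, NamedTuple


class Counts(NamedTuple):
    tp: int
    tn: int
    fp: int
    fn: int


def balanced_accuracy(c: Counts) -> float:
    """Mean of sensitivity (recall on actives) and specificity."""
    sensitivity = c.tp / (c.tp + c.fn)
    specificity = c.tn / (c.tn + c.fp)
    return (sensitivity + specificity) / 2


def mcc(c: Counts) -> float:
    """Matthews correlation coefficient; 0.0 if any marginal is empty."""
    numerator = c.tp * c.tn - c.fp * c.fn
    denominator = sqrt(
        (c.tp + c.fp) * (c.tp + c.fn) * (c.tn + c.fp) * (c.tn + c.fn)
    )
    return numerator / denominator if denominator else 0.0


def hit_rate_percent(c: Counts) -> float:
    """Assumed definition: confirmed actives among predicted actives."""
    return 100.0 * c.tp / (c.tp + c.fp)


# Counts as listed in Table 4.
table_4: Dict[str, Counts] = {
    "Overall": Counts(tp=319, tn=193, fp=616, fn=11),
    "BRD1": Counts(tp=57, tn=66, fp=162, fn=0),
    "BRD4 BD1": Counts(tp=129, tn=40, fp=111, fn=1),
    "BRD9": Counts(tp=62, tn=33, fp=182, fn=8),
    "BRPF1b": Counts(tp=71, tn=54, fp=161, fn=2),
}

for name, counts in table_4.items():
    print(
        f"{name}: balanced accuracy={balanced_accuracy(counts):.2f}, "
        f"MCC={mcc(counts):.2f}, hit rate={hit_rate_percent(counts):.1f} %"
    )
```

The high sensitivity and low specificity in every row make the pattern in the table explicit: the model rarely misses an active, and the modest balanced accuracies are driven by the large number of false positives.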

AUTHOR INFORMATION

Corresponding Author
For correspondence please contact Dr. Andreas Bender, Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, UK, CB2 1EW.

Present Addresses
± Helen Boyd Current Address: Chief of Staff, Drug Safety and Metabolism, IMED Biotech Unit, AstraZeneca, Cambridge, CB4 0WG, UK.

Author Contributions
K.G carried out the computational analysis with supervision from A.B and S.H. K.G, A.B and S.H together conceived the idea for the project. P.H and H.B co-ordinated and conducted the experimental validation. All authors have given approval to the final version of the manuscript.

Funding Sources
The authors would like to acknowledge AstraZeneca and the European Research Council for funding. The authors declare no competing financial interest.

ACKNOWLEDGMENTS
The authors would like to acknowledge Robert Sheppard, Thomas Hayhow, Willem Nissink, Al Rabow, Avid Afzal, Fredrik Svensson, Lewis Mervin, Krishna Bulusu, Stephanie Ashenden and Clare Gregson for their useful discussions and feedback throughout the project, as well as Sarah Scholze and Annika Brosig for making the protein for experiments. Kathryn Giblin would like to acknowledge AstraZeneca and the European Research Council for funding her PhD.

ASSOCIATED CONTENT
The following Supporting Information is provided: Figures S1-S17 and Tables S1-S14 (PDF); Public dataset from ChEMBL, PubChem and manually curated data (.xlsx). This information is available free of charge via the Internet at http://pubs.acs.org

REFERENCES

(1)

Sanchez, Roberto; Meslamani, Jamel; Zhou, Ming-Ming. The Bromodomain: From Epigenome Reader to Druggable Target. Biochim. Biophys. Acta 2014, 1839, 676–685.

(2)

Muller, Susanne; Filippakopoulos, Panagis; Knapp, Stefan. Bromodomains as Therapeutic Targets. Expert Rev. Mol. Med. 2011, 13, e29.

(3)

Josling, Gabrielle A.; Selvarajah, Shamista A.; Petter, Michaela; Duffy, Michael F. The Role of Bromodomain Proteins in Regulating Gene Expression. Genes (Basel). 2012, 3, 320–343.

(4)

Ferri, Elena; Petosa, Carlo; McKenna, Charles E. Bromodomains: Structure, Function and Pharmacology of Inhibition. Biochem. Pharmacol. 2016, 106, 1–18.

(5)

Arkin, Michelle R. R.; Tang, Yinyan; Wells, James A. A. Small-Molecule Inhibitors of Protein-Protein Interactions: Progressing toward the Reality. Chem. Biol. 2014, 21, 1102– 1114.

(6)

Brand, Michael; Measures, Angelina M.; Wilson, Brian G.; Cortopassi, Wilian A.; Alexander, Rikki; Höss, Matthias; Hewings, David S.; Rooney, Timothy P. C.; Paton, Robert S.; Conway, Stuart J. Small Molecule Inhibitors of Bromodomain-Acetyl-Lysine Interactions. ACS Chem. Biol. 2015, 10, 22–39.

(7)

Prinjha, R. K.; Witherington, J.; Lee, K. Place Your BETs: The Therapeutic Potential of Bromodomains. Trends Pharmacol. Sci. 2012, 33, 146–153.

(8)

Müller, S.; Knapp, S. Discovery of BET Bromodomain Inhibitors and Their Role in Target Validation. Medchemcomm 2014, 5, 288–296.

(9)

Atkinson, Stephen J.; Soden, Peter E.; Angell, Davina C.; Bantscheff, Marcus; Chung, Chun Wa; Giblin, Kathryn A.; Smithers, Nicholas; Furze, Rebecca C.; Gordon, Laurie; Drewes, Gerard; Rioja, Inmaculada; Witherington, Jason; Parr, Nigel J.; Prinjha, Rab K. The Structure Based Design of Dual HDAC/BET Inhibitors as Novel Epigenetic Probes. Medchemcomm 2014, 5, 342–351.

(10)

Galdeano, Carles; Ciulli, Alessio. Selectivity On-Target of Bromodomain Chemical Probes by Structure-Guided Medicinal Chemistry and Chemical Biology. Future Med. Chem. 2016, 8, 1655–1680.

(11)

Sepehri, Bakhtyar; Rasouli, Zolaikha; Hassanzadeh, Zeinabe; Ghavami, Raouf. Molecular Docking and QSAR Analysis of Naphthyridone Derivatives as ATAD2 Bromodomain Inhibitors: Application of CoMFA, LS-SVM, and RBF Neural Network. Medicinal Chemistry Research. Birkhauser Boston September 7, 2016, pp 2895–2905.

(12)

Wan, Shunzhou; Bhati, Agastya P.; Zasada, Stefan J.; Wall, Ian; Green, Darren; Bamborough, Paul; Coveney, Peter V. Rapid and Reliable Binding Affinity Prediction of Bromodomain Inhibitors: A Computational Study. J. Chem. Theory Comput. 2017, 13, 784–795.

(13)

Aldeghi, Matteo; Heifetz, Alexander; Bodkin, Michael J.; Knapp, Stefan; Biggin, Philip C. Predictions of Ligand Selectivity from Absolute Binding Free Energy Calculations. J. Am. Chem. Soc. 2017, 139, 946–957.

(14)

García-Jacas, C. R.; Martinez-Mayorga, K.; Marrero-Ponce, Y.; Medina-Franco, J. L. Conformation-Dependent QSAR Approach for the Prediction of Inhibitory Activity of Bromodomain Modulators. SAR QSAR Environ. Res. 2017, 28, 41–58.

(15)

Tropsha, Alexander; Golbraikh, Alexander. Predictive QSAR Modeling Workflow, Model Applicability Domains, and Virtual Screening. Curr. Pharm. Des. 2007, 13, 3494–3504.

(16)

Cortés-Ciriano, Isidro; Ain, Qurrat Ul; Subramanian, Vigneshwari; Lenselink, Eelke B.; Méndez-Lucio, Oscar; IJzerman, Adriaan P.; Wohlfahrt, Gerd; Prusis, Peteris; Malliavin, Thérèse E.; van Westen, Gerard J. P.; Bender, Andreas. Polypharmacology Modelling Using Proteochemometrics (PCM): Recent Methodological Developments, Applications to Target Families, and Future Prospects. Med. Chem. Commun. 2015, 6, 24–50.

(17)

Van Westen, Gerard J. P.; Wegner, Jörg K.; Ijzerman, Adriaan P.; Van Vlijmen, Herman W. T.; Bender, A. Proteochemometric Modeling as a Tool to Design Selective Compounds and for Extrapolating to Novel Targets. Medchemcomm 2011, 2, 16–30.

(18)

Gao, Jun; Huang, Qi; Wu, Dingfeng; Zhang, Qingchen; Zhang, Yida; Chen, Tian; Liu, Qi; Zhu, Ruixin; Cao, Zhiwei; He, Yuan. Study on Human GPCR–inhibitor Interactions by Proteochemometric Modeling. Gene 2013, 518, 124–131.

(19)

Lapinsh, Maris; Prusis, Peteris; Lundstedt, Torbjörn; Wikberg, Jarl E. S. Proteochemometrics Modeling of the Interaction of Amine G-Protein Coupled Receptors with a Diverse Set of Ligands. Mol. Pharmacol. 2002, 61, 1465–1475.

(20)

Subramanian, Vigneshwari; Prusis, Peteris; Pietilä, Lars-Olof; Xhaard, Henri; Wohlfahrt, Gerd. Visually Interpretable Models of Kinase Selectivity Related Features Derived from Field-Based Proteochemometrics. J. Chem. Inf. Model. 2013, 53, 3021–3030.

(21)

Subramanian, Vigneshwari; Prusis, Peteris; Xhaard, Henri; Wohlfahrt, Gerd. Predictive Proteochemometric Models for Kinases Derived from 3D Protein Field-Based Descriptors. Med. Chem. Commun. 2016, 7, 1007–1015.

(22)

Lapins, Maris; Eklund, Martin; Spjuth, Ola; Prusis, Peteris; Wikberg, Jarl E. S. Proteochemometric Modeling of HIV Protease Susceptibility. BMC Bioinformatics 2008, 9, 181.

(23)

Lapins, Maris; Worachartcheewan, Apilak; Spjuth, Ola; Georgiev, Valentin; Prachayasittikul, Virapong; Nantasenamat, Chanin; Wikberg, Jarl E. S. A Unified Proteochemometric Model for Prediction of Inhibition of Cytochrome P450 Isoforms. PLoS One 2013, 8, e66566.

(24)

Wu, Dingfeng; Huang, Qi; Zhang, Yida; Zhang, Qingchen; Liu, Qi; Gao, Jun; Cao, Zhiwei; Zhu, Ruixin. Screening of Selective Histone Deacetylase Inhibitors by Proteochemometric Modeling. BMC Bioinformatics 2012, 13, 212.

(25)

Tresadern, Gary; Trabanco, Andres A.; Pérez-Benito, Laura; Overington, John P.; van Vlijmen, Herman W. T.; van Westen, Gerard J. P. Identification of Allosteric Modulators of Metabotropic Glutamate 7 Receptor Using Proteochemometric Modeling. J. Chem. Inf. Model. 2017, 57, 2976–2985.

(26)

Qiu, Tianyi; Qiu, Jingxuan; Feng, Jun; Wu, Dingfeng; Yang, Yiyan; Tang, Kailin; Cao, Zhiwei; Zhu, Ruixin. The Recent Progress in Proteochemometric Modelling: Focusing on Target Descriptors, Cross-Term Descriptors and Application Scope. Brief. Bioinform. 2016, bbw004.

(27)

Burlingham, Benjamin T.; Widlanski, Theodore S. An Intuitive Look at the Relationship of Ki and IC50: A More General Use for the Dixon Plot. J. Chem. Educ. 2003, 80, 214– 218.

(28)

Quinn, Elizabeth; Wodicka, Lisa; Ciceri, Pietro; Pallares, Gabriel; Pickle, Elyssa; Torrey, Adam; Floyd, Mark; Hunt, Jeremy; Treiber, Daniel. Abstract 4238: BROMO Scan - a High Throughput, Quantitative Ligand Binding Platform Identifies Best-in-Class Bromodomain Inhibitors from a Screen of Mature Compounds Targeting Other Protein Classes. Cancer Res. 2013, 73, 4238–4238.

(29)

Eduati, Federica; Mangravite, Lara M.; Wang, Tao; Tang, Hao; Bare, J. Christopher; Huang, Ruili; Norman, Thea; Kellen, Mike; Menden, Michael P.; Yang, Jichen; Zhan, Xiaowei; Zhong, Rui; Xiao, Guanghua; Xia, Menghang; Abdo, Nour; Kosyk, Oksana; Friend, Stephen; Dearry, Allen; Simeonov, Anton; Tice, Raymond R.; Rusyn, Ivan; Wright, Fred A.; Stolovitzky, Gustavo; Xie, Yang; Saez-Rodriguez, Julio; Aittokallio, Tero; Alaimo, Salvatore; Amadoz, Alicia; Ammad-ud-din, Muhammad; Azencott, Chloé-Agathe; Bacardit, Jaume; Barron, Pelham; Bernard, Elsa; Beyer, Andreas; Bin, Shao; Bömmel, Alena van; Borgwardt, Karsten; Brys, April M.; Caffrey, Brian; Chang, Jeffrey; Chang, Jungsoo; Chheda, Himanshu; Christodoulou, Eleni G.; Clément-Ziza, Mathieu; Cohen, Trevor; Cowherd, Marianne; Demeyer, Sofie; Dopazo, Joaquin; Elhard, Joel D.; Falcao, Andre O.; Ferro, Alfredo; Friedenberg, David A.; Giugno, Rosalba; Gong, Yunguo; Gorospe, Jenni W.; Granville, Courtney A.; Grimm, Dominik; Heinig, Matthias; Hernansaiz, Rosa D.; Hintsanen, Petteri; Hochreiter, Sepp; Huang, Liang-Chin; Huska, Matthew; Jaiswal, Alok; Jiao, Yunlong; Kaski, Samuel; Kaur, Ismeet; Khana, Suleiman Ali; Klambauer, Günter; Krasnogor, Natalio; Kuhn, Michael; Kursa, Miron Bartosz; Kutum, Rintu; Lazzarini, Nicola; Lee, Inhan; Leung, Michael K. K.; Lim, Weng Khong; Liu, Charlie; López, Felipe Llinares; Mammana, Alessandro; Mayr, Andreas; Michoel, Tom; Mongiovì, Misael; Moore, Jonathan D.; Mpindi, John-Patrick; Narasimhan, Ravi; Opiyo, Stephen O.; Pandey, Gaurav; Peabody, Andrea L.; Perner, Juliane; Poso, Antti; Pulvirenti, Alfredo; Rawlik, Konrad; Reinhardt, Susanne; Riffle, Carol G.; Ruderfer, Douglas; Sander, Aaron J.; Savage, Richard S.; Scornet, Erwan; Sebastian-Leon, Patricia; Sharan, Roded; Simon-Gabriel, Carl Johann; Stoven, Veronique; Sun, Jingchun; Tang, Jing; Teixeira, Ana L.; Tenesa, Albert; Vert, Jean-Philippe; Vingron, Martin; Walter, Thomas; Wennerberg, Krister; Whalen, Sean; Wiśniewska, Zofia; Wu, Yonghui; Xu, Hua; Zhang, Shihua; Zhao, Junfei; Zheng, W. Jim; Ziwei, Dai. Prediction of Human Population Responses to Toxic Compounds by a Collaborative Competition. Nat. Biotechnol. 2015, 33, 933–940.

(30)

Ain, Qurrat U.; Méndez-Lucio, Oscar; Cortés-Ciriano, Isidro; Malliavin, Thérèse; van Westen, Gerard J. P.; Bender, Andreas. Modelling Ligand Selectivity of Serine Proteases Using Integrative Proteochemometric Approaches Improves Model Performance and Allows the Multi-Target Dependent Interpretation of Features. Integr. Biol. 2014, 6, 1023–1033.

(31)

Koutsoukas, Alexios; Lowe, Robert; KalantarMotamedi, Yasaman; Mussa, Hamse Y.; Klaffke, Werner; Mitchell, John B. O.; Glen, Robert C.; Bender, Andreas. In Silico Target Predictions: Defining a Benchmarking Data Set and Comparison of Performance of the Multiclass Naïve Bayes and Parzen-Rosenblatt Window. J. Chem. Inf. Model. 2013, 53, 1957–1966.

(32)

Bento, A. Patrícia; Gaulton, Anna; Hersey, Anne; Bellis, Louisa J.; Chambers, Jon; Davies, Mark; Krüger, Felix A.; Light, Yvonne; Mak, Lora; McGlinchey, Shaun; Nowotka, Michal; Papadatos, George; Santos, Rita; Overington, John P. The ChEMBL Bioactivity Database: An Update. Nucleic Acids Res. 2014, 42, D1083–D1090.

(33)

Wang, Yanli; Suzek, Tugba; Zhang, Jian; Wang, Jiyao; He, Siqian; Cheng, Tiejun; Shoemaker, Benjamin A.; Gindulyte, Asta; Bryant, Stephen H. PubChem BioAssay: 2014 Update. Nucleic Acids Res. 2014, 42, D1075–D1082.

(34)

Meslamani, Jamel; Smith, Steven G.; Sanchez, Roberto; Zhou, Ming-Ming. ChEpiMod: A Knowledgebase for Chemical Modulators of Epigenome Reader Domains. Bioinformatics 2014, 30, 1481–1483.

(35)

Jagarlapudi, Sarma A. R. P.; Kishan, K. V. Radha. Database Systems for Knowledge-Based Discovery. In Methods in Molecular Biology (Clifton, N.J.); Jacoby E., Ed.; Humana Press: Totowa, NJ, 2009; Vol. 575, pp 159–172.

(36)

Martin, Laetitia J.; Koegl, Manfred; Bader, Gerd; Cockcroft, Xiao-Ling; Fedorov, Oleg; Fiegen, Dennis; Gerstberger, Thomas; Hofmann, Marco H.; Hohmann, Anja F.; Kessler, Dirk; Knapp, Stefan; Knesl, Petr; Kornigg, Stefan; Müller, Susanne; Nar, Herbert; Rogers, Catherine; Rumpel, Klaus; Schaaf, Otmar; Steurer, Steffen; Tallant, Cynthia; Vakoc, Christopher R.; Zeeb, Markus; Zoephel, Andreas; Pearson, Mark; Boehmelt, Guido; McConnell, Darryl. Structure-Based Design of an in Vivo Active Selective BRD9 Inhibitor. J. Med. Chem. 2016, 59, 4462–4475.

(37)

Gerstenberger, Brian S.; Trzupek, John D.; Tallant, Cynthia; Fedorov, Oleg; Filippakopoulos, Panagis; Brennan, Paul E.; Fedele, Vita; Martin, Sarah; Picaud, Sarah; Rogers, Catherine; Parikh, Mihir; Taylor, Alexandria; Samas, Brian; O’Mahony, Alison; Berg, Ellen; Pallares, Gabriel; Torrey, Adam D.; Treiber, Daniel K.; Samardjiev, Ivan J.; Nasipak, Brian T.; Padilla-Benavides, Teresita; Wu, Qiong; Imbalzano, Anthony N.; Nickerson, Jeffrey A.; Bunnage, Mark E.; Müller, Susanne; Knapp, Stefan; Owen, Dafydd R. Identification of a Chemical Probe for Family VIII Bromodomains through Optimization of a Fragment Hit. J. Med. Chem. 2016, 59, 4800–4811.

(38)

Crawford, Terry D.; Tsui, Vickie; Flynn, E. Megan; Wang, Shumei; Taylor, Alexander M.; Côté, Alexandre; Audia, James E.; Beresini, Maureen H.; Burdick, Daniel J.; Cummings, Richard; Dakin, Les A.; Duplessis, Martin; Good, Andrew C.; Hewitt, Michael C.; Huang, Hon-Ren; Jayaram, Hariharan; Kiefer, James R.; Jiang, Ying; Murray, Jeremy; Nasveschuk, Christopher G.; Pardo, Eneida; Poy, Florence; Romero, F. Anthony; Tang, Yong; Wang, Jian; Xu, Zhaowu; Zawadzke, Laura E.; Zhu, Xiaoyu; Albrecht, Brian K.; Magnuson, Steven R.; Bellon, Steve; Cochran, Andrea G. Diving into the Water: Inducible Binding Conformations for BRD4, TAF1(2), BRD9, and CECR2 Bromodomains. J. Med. Chem. 2016, 59, 5391–5402.

(39)

Bamborough, Paul; Barnett, Heather A.; Becher, Isabelle; Bird, Mark J.; Chung, Chun-wa; Craggs, Peter D.; Demont, Emmanuel H.; Diallo, Hawa; Fallon, David J.; Gordon, Laurie J.; Grandi, Paola; Hobbs, Clare I.; Hooper-Greenhill, Edward; Jones, Emma J.; Law, Robert P.; Le Gall, Armelle; Lugo, David; Michon, Anne-Marie; Mitchell, Darren J.; Prinjha, Rab K.; Sheppard, Robert J.; Watson, Allan J. B.; Watson, Robert J. GSK6853, a Chemical Probe for Inhibition of the BRPF1 Bromodomain. ACS Med. Chem. Lett. 2016, 7, 552–557.

(40)

Gao, Nana; Ren, Jixia; Hou, Li; Zhou, Yue; Xin, Ling; Wang, Jiedong; Yu, Heming; Xie, Yong; Wang, Huiping. Identification of Novel Potent Human Testis-Specific and Bromodomain-Containing Protein (BRDT) Inhibitors Using Crystal Structure-Based Virtual Screening. 2016, 38, 39–44.

(41)

Kharenko, Olesya A.; Gesner, Emily M.; Patel, Reena G.; Norek, Karen; White, Andre; Fontano, Eric; Suto, Robert K.; Young, Peter R.; McLure, Kevin G.; Hansen, Henrik C. RVX-297- a Novel BD2 Selective Inhibitor of BET Bromodomains. Biochem. Biophys. Res. Commun. 2016, 477, 62–67.

(42)

Drouin, Ludovic; McGrath, Sally; Vidler, Lewis R.; Chaikuad, Apirat; Monteiro, Octovia; Tallant, Cynthia; Philpott, Martin; Rogers, Catherine; Fedorov, Oleg; Liu, Manjuan; Akhtar, Wasim; Hayes, Angela; Raynaud, Florence; Müller, Susanne; Knapp, Stefan; Hoelder, Swen. Structure Enabled Design of BAZ2-ICR, a Chemical Probe Targeting the Bromodomains of BAZ2A and BAZ2B. 2015, 58, 2553–2559.

(43)

Clark, Peter G. K.; Vieira, Lucas C. C.; Tallant, Cynthia; Fedorov, Oleg; Singleton, Dean C.; Rogers, Catherine M.; Monteiro, Octovia P.; Bennett, James M.; Baronio, Roberta; Müller, Susanne; Daniels, Danette L.; Méndez, Jacqui; Knapp, Stefan; Brennan, Paul E.; Dixon, Darren J. LP99: Discovery and Synthesis of the First Selective BRD7/9 Bromodomain Inhibitor. Angew. Chem. Int. Ed. Engl. 2015, 54, 6217–6221.

(44)

Clark, Peter G. K.; Dixon, Darren J.; Brennan, Paul E. Development of Chemical Probes for the Bromodomains of BRD7 and BRD9. Drug Discov. Today Technol. 2016, 19, 73–80.

(45)

Chen, Peiling; Chaikuad, Apirat; Bamborough, Paul; Bantscheff, Marcus; Bountra, Chas; Chung, Chun Wa; Fedorov, Oleg; Grandi, Paola; Jung, David; Lesniak, Robert; Lindon, Matthew; Müller, Susanne; Philpott, Martin; Prinjha, Rab; Rogers, Catherine; Selenski, Carolyn; Tallant, Cynthia; Werner, Thilo; Willson, Timothy M.; Knapp, Stefan; Drewry, David H. Discovery and Characterization of GSK2801, a Selective Chemical Probe for the Bromodomains BAZ2A and BAZ2B. J. Med. Chem. 2016, 59, 1410–1424.

(46)

McKeown, Michael R.; Shaw, Daniel L.; Fu, Harry; Liu, Shuai; Xu, Xiang; Marineau, Jason J.; Huang, Yibo; Zhang, Xiaofeng; Buckley, Dennis L.; Kadam, Asha; Zhang, Zijuan; Blacklow, Stephen C.; Qi, Jun; Zhang, Wei; Bradner, James E. Biased Multicomponent Reactions to Develop Novel Bromodomain Inhibitors. J. Med. Chem. 2014, 57, 9019–9027.

(47)

Guetzoyan, Lucie; Ingham, Richard J.; Nikbin, Nikzad; Rossignol, Julien; Wolling, Michael; Baumert, Mark; Burgess-Brown, Nicola A.; Strain-Damerell, Claire M.; Shrestha, Leela; Brennan, Paul E.; Fedorov, Oleg; Knapp, Stefan; Ley, Steven V. Machine-Assisted Synthesis of Modulators of the Histone Reader BRD9 Using Flow Methods of Chemistry and Frontal Affinity Chromatography. Med. Chem. Commun. 2014, 5, 540–546.

(48)

Cortopassi, Wilian A.; Kumar, Kiran; Paton, Robert S. Cation–π Interactions in CREBBP Bromodomain Inhibition: An Electrostatic Model for Small-Molecule Binding Affinity and Selectivity. Org. Biomol. Chem. 2016, 14, 10926–10938.

(49)

Hay, Duncan A.; Fedorov, Oleg; Martin, Sarah; Singleton, Dean C.; Tallant, Cynthia; Wells, Christopher; Picaud, Sarah; Philpott, Martin; Monteiro, Octovia P.; Rogers, Catherine M.; Conway, Stuart J.; Rooney, Timothy P. C.; Tumber, Anthony; Yapp, Clarence; Filippakopoulos, Panagis; Bunnage, Mark E.; Müller, Susanne; Knapp, Stefan; Schofield, Christopher J.; Brennan, Paul E. Discovery and Optimization of Small-Molecule Ligands for the CBP/P300 Bromodomains. J. Am. Chem. Soc. 2014, 136, 9308–9319.

(50)

Hulme, Edward C.; Trevethick, Mike A. Ligand Binding Assays at Equilibrium: Validation and Interpretation. Br. J. Pharmacol. 2010, 161, 1219–1237.

(51)

Magrane, Michele; UniProt Consortium. UniProt Knowledgebase: A Hub of Integrated Protein Data. Database 2011, 2011, bar009.

(52)

PubChem Identifier Exchange Service https://pubchem.ncbi.nlm.nih.gov/idexchange/idexchange.cgi (accessed Feb 8, 2018).

(53)

Murrell, Daniel S.; Cortes-Ciriano, Isidro; Van Westen, Gerard J. P.; Stott, Ian P.; Bender, Andreas; Malliavin, Thérèse E.; Glen, Robert C. Chemically Aware Model Builder (Camb): An R Package for Property and Bioactivity Modelling of Small Molecules. J. Cheminform. 2015, 7, 45–55.

(54)

Indigo Toolkit http://lifescience.opensource.epam.com/ (accessed Feb 8, 2018).

(55)

Charif, Delphine; Lobry, Jean R. SeqinR 1.0-2: A Contributed Package to the R Project for Statistical Computing Devoted to Biological Sequences Retrieval and Analysis; Springer, Berlin, Heidelberg, 2007; pp 207–232.

(56)

Paradis, E.; Claude, J.; Strimmer, K. APE: Analyses of Phylogenetics and Evolution in R Language. Bioinformatics 2004, 20, 289–290.

(57)

Sander, Thomas; Freyss, Joel; von Korff, Modest; Rufener, Christian. DataWarrior: An Open-Source Program for Chemistry Aware Data Visualization and Analysis. J. Chem. Inf. Model. 2015, 55, 460–473.

(58)

Bemis, Guy W.; Murcko, Mark A. The Properties of Known Drugs. 1. Molecular Frameworks. J. Med. Chem. 1996, 39, 2887–2893.

(59)

Berthold, Michael R.; Cebron, Nicolas; Dill, Fabian; Gabriel, Thomas R.; Kötter, Tobias; Meinl, Thorsten; Ohl, Peter; Thiel, Kilian; Wiswedel, Bernd. KNIME - the Konstanz Information Miner. In Data Analysis, Machine Learning and Applications. Studies in Classification, Data Analysis, and Knowledge Organization; Preisach C., Burkhardt H., Schmidt-Thieme L., Decker R., Eds.; Springer: Berlin, Heidelberg, 2009; Vol. 11, p 26.

(60)

Ahlberg, Christopher. Spotfire: An Information Exploration Environment. ACM SIGMOD Rec. 1996, 25, 25–29.

(61)

Yap, Chun Wei. PaDEL-Descriptor: An Open Source Software to Calculate Molecular Descriptors and Fingerprints. J. Comput. Chem. 2011, 32, 1466–1474.

(62)

RDKit http://www.rdkit.org (accessed Mar 19, 2018).

(63)

Schwab, Christof H. Conformations and 3D Pharmacophore Searching. Drug Discov. Today Technol. 2010, 7, e245–e253.

(64)

Molecular Operating Environment (MOE), 2013.08; Chemical Computing Group Inc., 1010 Sherbrooke St. West, Suite #910, Montreal, QC, Canada, H3A 2R7, 2016.

(65)

Cruciani, Gabriele; Pastor, Manuel; Guba, Wolfgang. VolSurf: A New Tool for the Pharmacokinetic Optimization of Lead Compounds. Eur. J. Pharm. Sci. 2000, 11, S29–S39.

(66)

Ekins, S.; Mestres, J.; Testa, B. In Silico Pharmacology for Drug Discovery: Applications to Targets and Beyond. Br. J. Pharmacol. 2007, 152, 21–37.

(67)

Mulliner, Denis; Schmidt, Friedemann; Stolte, Manuela; Spirkl, Hans Peter; Czich, Andreas; Amberg, Alexander. Computational Models for Human and Animal Hepatotoxicity with a Global Application Scope. Chem. Res. Toxicol. 2016, 29, 757–767.

(68)

Sandberg, Maria; Eriksson, Lennart; Jonsson, Jörgen; Sjöström, Michael; Wold, Svante. New Chemical Descriptors Relevant for the Design of Biologically Active Peptides. A Multivariate Characterization of 87 Amino Acids. J. Med. Chem. 1998, 41, 2481–2491.

(69)

Van Westen, Gerard J. P.; Swier, Remco F.; Cortes-Ciriano, Isidro; Wegner, Jorg K.; Overington, John P.; IJzerman, Adriaan P.; Van Vlijmen, Herman W. T.; Bender, Andreas. Benchmarking of Protein Descriptor Sets in Proteochemometric Modeling (Part 2): Modeling Performance of 13 Amino Acid Descriptor Sets. J. Cheminform. 2013, 5, 42–62.

(70)

Filippakopoulos, Panagis; Picaud, Sarah; Mangos, Maria; Keates, Tracy; Lambert, Jean-Philippe; Barsyte-Lovejoy, Dalia; Felletar, Ildiko; Volkmer, Rudolf; Müller, Susanne; Pawson, Tony; Gingras, Anne-Claude; Arrowsmith, Cheryl H.; Knapp, Stefan. Histone Recognition and Large-Scale Structural Analysis of the Human Bromodomain Family. Cell 2012, 149, 214–231.

(71)

Kuhn, Max; with contributions from Wing, Jed; Weston, Steve; Williams, Andre. caret: Classification and Regression Training. R package version 5.15-044. http://cran.r-project.org/package=caret (accessed Feb 8, 2018).

(72)

Rücker, Christoph; Rücker, Gerta; Meringer, Markus. Y-Randomization and Its Variants in QSPR/QSAR. J. Chem. Inf. Model. 2007, 47, 2345–2357.

(73)

Norinder, Ulf; Carlsson, Lars; Boyer, Scott; Eklund, Martin. Introducing Conformal Prediction in Predictive Modeling for Regulatory Purposes. A Transparent and Flexible Alternative to Applicability Domain Determination. Regul. Toxicol. Pharmacol. 2015, 71, 279–284.

(74)

Norinder, Ulf; Carlsson, Lars; Boyer, Scott; Eklund, Martin. Introducing Conformal Prediction in Predictive Modeling. A Transparent and Flexible Alternative to Applicability Domain Determination. J. Chem. Inf. Model. 2014, 54, 1596–1603.

(75)

Sun, Jiangming; Carlsson, Lars; Ahlberg, Ernst; Norinder, Ulf; Engkvist, Ola; Chen, Hongming. Applying Mondrian Cross-Conformal Prediction To Estimate Prediction Confidence on Large Imbalanced Bioactivity Data Sets. J. Chem. Inf. Model. 2017, 57, 1591–1598.

(76)

Nissink, J. Willem M.; Blackburn, Sam. Quantification of Frequent-Hitter Behavior Based on Historical High-Throughput Screening Data. Future Med. Chem. 2014, 6, 1113–1126.

(77)

Cumming, John G.; Davis, Andrew M.; Muresan, Sorel; Haeberlein, Markus; Chen, Hongming. Chemical Predictive Modelling to Improve Compound Quality. Nat. Rev. Drug Discov. 2013, 12, 948–962.

(78)

Godden, Jeffrey W.; Xue, Ling; Bajorath, Jürgen. Combinatorial Preferences Affect Molecular Similarity/Diversity Calculations Using Binary Fingerprints and Tanimoto Coefficients. J. Chem. Inf. Comput. Sci. 2000, 40, 163–166.

(79)

Wickham, Hadley; Francois, Romain. A Grammar of Data Manipulation https://cran.r-project.org/web/packages/dplyr/index.html (accessed Feb 8, 2018).

(80)

Niesen, Frank H.; Berglund, Helena; Vedadi, Masoud. The Use of Differential Scanning Fluorimetry to Detect Ligand Interactions That Promote Protein Stability. Nat. Protoc. 2007, 2, 2212–2221.

(81)

Wang, Limei; Wu, Xiuyin; Huang, Ping; Lv, Zhijun; Qi, Yuping; Wei, Xiujuan; Yang, Pishan; Zhang, Fenghe. JQ1, a Small Molecule Inhibitor of BRD4, Suppresses Cell Growth and Invasion in Oral Squamous Cell Carcinoma. Oncol. Rep. 2016, 36, 1989–1996.

(82)

Theodoulou, Natalie H.; Bamborough, Paul; Bannister, Andrew J.; Becher, Isabelle; Bit, Rino A.; Che, Ka Hing; Chung, Chun Wa; Dittmann, Antje; Drewes, Gerard; Drewry, David H.; Gordon, Laurie; Grandi, Paola; Leveridge, Melanie; Lindon, Matthew; Michon, Anne Marie; Molnar, Judit; Robson, Samuel C.; Tomkinson, Nicholas C. O.; Kouzarides, Tony; Prinjha, Rab K.; Humphreys, Philip G. Discovery of I-BRD9, a Selective Cell Active Chemical Probe for Bromodomain Containing Protein 9 Inhibition. 2016, 59, 1425–1439.

(83)

Igoe, Niall; Bayle, Elliott D.; Tallant, Cynthia; Fedorov, Oleg; Meier, Julia C.; Savitsky, Pavel; Rogers, Catherine; Morias, Yannick; Scholze, Sarah; Boyd, Helen; Cunoosamy, Danen; Andrews, David M.; Cheasty, Anne; Brennan, Paul E.; Müller, Susanne; Knapp, Stefan; Fish, Paul V. Design of a Chemical Probe for the Bromodomain and Plant Homeodomain Finger-Containing (BRPF) Family of Proteins. J. Med. Chem. 2017, 60, 6998–7011.

(84)

Filippakopoulos, Panagis; Qi, Jun; Picaud, Sarah; Shen, Yao; Smith, William B.; Fedorov, Oleg; Morse, Elizabeth M.; Keates, Tracey; Hickman, Tyler T.; Felletar, Ildiko; Philpott, Martin; Munro, Shonagh; McKeown, Michael R.; Wang, Yuchuan; Christie, Amanda L.; West, Nathan; Cameron, Michael J.; Schwartz, Brian; Heightman, Tom D.; La Thangue, Nicholas; French, Christopher A.; Wiest, Olaf; Kung, Andrew L.; Knapp, Stefan; Bradner, James E. Selective Inhibition of BET Bromodomains. Nature 2010, 468, 1067–1073.

(85)

Moustakim, Moses; Clark, Peter G. K.; Hay, Duncan A.; Dixon, Darren J.; Brennan, Paul E. Chemical Probes and Inhibitors of Bromodomains Outside the BET Family. Med. Chem. Commun. 2016, 00, 1–3.

(86)

Crawford, Terry D.; Tsui, Vickie; Flynn, E. Megan; Wang, Shumei; Taylor, Alexander M.; Côté, Alexandre; Audia, James E.; Beresini, Maureen H.; Burdick, Daniel J.; Cummings, Richard; Dakin, Les A.; Duplessis, Martin; Good, Andrew C.; Hewitt, Michael C.; Huang, Hon Ren; Jayaram, Hariharan; Kiefer, James R.; Jiang, Ying; Murray, Jeremy; Nasveschuk, Christopher G.; Pardo, Eneida; Poy, Florence; Romero, F. Anthony; Tang, Yong; Wang, Jian; Xu, Zhaowu; Zawadzke, Laura E.; Zhu, Xiaoyu; Albrecht, Brian K.; Magnuson, Steven R.; Bellon, Steve; Cochran, Andrea G. Diving into the Water: Inducible Binding Conformations for BRD4, TAF1(2), BRD9, and CECR2 Bromodomains. J. Med. Chem. 2016, 59, 5391–5402.

(87)

Bennett, James; Fedorov, Oleg; Tallant, Cynthia; Monteiro, Octovia; Meier, Julia; Gamble, Vicky; Savitsky, Pavel; Nunez-Alonso, Graciela a.; Haendler, Bernard; Rogers, Catherine; Brennan, Paul E.; Müller, Susanne; Knapp, Stefan. Discovery of a Chemical Tool Inhibitor Targeting the Bromodomains of TRIM24 and BRPF. J. Med. Chem. 2016, 59, 1642–1647.

(88)

Martin, Laetitia J.; Koegl, Manfred; Bader, Gerd; Cockcroft, Xiao Ling; Fedorov, Oleg; Fiegen, Dennis; Gerstberger, Thomas; Hofmann, Marco H.; Hohmann, Anja F.; Kessler, Dirk; Knapp, Stefan; Knesl, Petr; Kornigg, Stefan; Müller, Susanne; Nar, Herbert; Rogers, Catherine; Rumpel, Klaus; Schaaf, Otmar; Steurer, Steffen; Tallant, Cynthia; Vakoc, Christopher R.; Zeeb, Markus; Zoephel, Andreas; Pearson, Mark; Boehmelt, Guido; McConnell, Darryl. Structure-Based Design of an in Vivo Active Selective BRD9 Inhibitor. J. Med. Chem. 2016, 59, 4462–4475.

(89)

Chen, Peiling; Chaikuad, Apirat; Bamborough, Paul; Bantscheff, Marcus; Bountra, Chas; Chung, Chun-wa; Fedorov, Oleg; Grandi, Paola; Jung, David; Lesniak, Robert; Lindon, Matthew; Müller, Susanne; Philpott, Martin; Prinjha, Rab; Rogers, Catherine; Selenski, Carolyn; Tallant, Cynthia; Werner, Thilo; Willson, Timothy M.; Knapp, Stefan; Drewry, David H. Discovery and Characterization of GSK2801, a Selective Chemical Probe for the Bromodomains BAZ2A and BAZ2B. J. Med. Chem. 2016, 59, 1410–1424.

(90)

De Bruyn, Tom; van Westen, Gerard J. P.; IJzerman, Adriaan P.; Stieger, Bruno; de Witte, Peter; Augustijns, Patrick F.; Annaert, Pieter P. Structure-Based Identification of OATP1B1/3 Inhibitors. Mol. Pharmacol. 2013, 83, 1257–1267.

(91)

Cortés-Ciriano, Isidro; Van Westen, Gerard J. P.; Bouvier, Guillaume; Nilges, Michael; Overington, John P.; Bender, Andreas; Malliavin, Thérèse E. Improved Large-Scale Prediction of Growth Inhibition Patterns Using the NCI60 Cancer Cell Line Panel. Bioinformatics 2015, 32, 85–95.

For Table of Contents Only:

Prospectively Validated Proteochemometric Models for the Prediction of Small Molecule Binding to Bromodomain Proteins

Kathryn A. Giblin, Samantha J. Hughes, Helen Boyd, Pia Hansson, Andreas Bender

[Table of contents graphic. Labels: Chemical Features; Protein Features; PCM Model; Model predictions (Actives, Inactive); Virtual Screening; Experimentally confirmed hits.]