Machine-Learning-Assisted Approach for Discovering Novel Inhibitors

Article pubs.acs.org/jcim

Machine-Learning-Assisted Approach for Discovering Novel Inhibitors Targeting Bromodomain-Containing Protein 4

Jing Xing,†,‡,§,# Wenchao Lu,†,§,# Rongfeng Liu,∥,# Yulan Wang,†,§ Yiqian Xie,†,§ Hao Zhang,†,§ Zhe Shi,∥ Hao Jiang,†,§ Yu-Chih Liu,∥ Kaixian Chen,† Hualiang Jiang,† Cheng Luo,*,† and Mingyue Zheng*,†

† Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China
‡ State Key Laboratory of Natural and Biomimetic Drugs, Peking University, Xue Yuan Road 38, Beijing 100191, China
§ Department of Pharmacy, University of Chinese Academy of Sciences, 19A Yuquan Road, Beijing 100049, China
∥ Shanghai ChemPartner Co., LTD., #5 Building, 998 Halei Road, Shanghai 201203, China

ABSTRACT: Bromodomain-containing protein 4 (BRD4) is implicated in the pathogenesis of a number of different cancers, inflammatory diseases and heart failure. Much effort has been dedicated toward discovering novel scaffold BRD4 inhibitors (BRD4is) with different selectivity profiles and potential antiresistance properties. Structure-based drug design (SBDD) and virtual screening (VS) are the most frequently used approaches. Here, we demonstrate a novel, structure-based VS approach that uses machine-learning algorithms trained on prior structure and activity knowledge to predict the likelihood that a compound is a BRD4i based on its binding pattern with BRD4. In addition to positive experimental data, such as X-ray structures of BRD4−ligand complexes and BRD4 inhibitory potencies, negative data such as false positives (FPs) identified from our earlier ligand screening results were incorporated into our knowledge base. We used the resulting data to train a machine-learning model named BRD4LGR to predict the BRD4i-likeness of a compound. BRD4LGR achieved a 20−30% higher AUC-ROC than that of Glide using the same test set. When conducting in vitro experiments against a library of previously untested, commercially available organic compounds, the second round of VS using BRD4LGR generated 15 new BRD4is. Moreover, inverting the machine-learning model provided easy access to structure−activity relationship (SAR) interpretation for hit-to-lead optimization.



INTRODUCTION

Epigenetic regulation of gene expression is increasingly acknowledged as an important element in the promotion of many cancers. Bromodomain (BD), which reads acetylated lysine (KAc), has received extensive attention, particularly in the search for new ligands that may interfere with its function.1 Among these bromodomain-containing proteins, the bromodomain and extra-terminal (BET) family, which includes bromodomain-containing protein 2/3/4 (BRD2/3/4) and bromodomain testis-specific protein (BRDT), exhibits high sequence similarity, and BRD4 is the most predominant drug target for intervention.2 Abnormal regulation of BRD4 leads to various cancers,2−4 including hematological carcinoma and solid tumors. For instance, BRD4 activates P-TEFb to initiate c-Myc-dependent transcription and cause mixed lineage leukemia (MLL) by binding of the BD1 domain of BRD4 to two distance-specific KAcs of the histone.5 In NUT-midline carcinoma, the BRD4-NUT fusion protein loses the P-TEFb-interacting domain (PID), halting transcription and inactivating p53.6 In addition to their anticancer properties, BRD4 inhibitors (BRD4is) also exhibit anti-inflammatory potential in vivo.7 Besides, BRD4is are effective in preventing heart failure.8 Due to its high therapeutic relevance, significant efforts have been devoted to the discovery and design of structurally diverse BRD4is,9,10 and many compounds have advanced into clinical trials (Table 1). The study of BRD4is dates to 2010, when Panagis Filippakopoulos et al.11 and Edwige Nicodeme et al.7 published their discoveries of the first BRD4is, JQ1 and i-BET, respectively. These two structures are both derived from benzodiazepine, which fits the pocket perfectly. To fully study the biological function and therapeutic potential of BRD4is, many other scaffolds have been investigated, including diazobenzene,12 γ-carboline,13 2-thiazolidinones,14 and BRD4-kinase dual inhibitors.15 However, no BRD4i has been approved as a drug. Hence, the development and exploration of novel chemotypes of BRD4is with different selectivity profiles may provide novel therapeutic approaches. In drug discovery, molecular docking-based virtual screening (VS) plays an important role in the identification of novel hits with diverse structures.16,17 This approach has also been applied

© 2017 American Chemical Society

Received: February 18, 2017
Published: June 21, 2017

DOI: 10.1021/acs.jcim.7b00098 J. Chem. Inf. Model. 2017, 57, 1677−1690

Journal of Chemical Information and Modeling

Table 1. Inhibitors of BRDs in Clinical Trials^a

^a Data collected from https://integrity.thomson-pharma.com/integrity on March 30, 2017.


Figure 1. Three-step workflow of this study: The first round of screening, BRD4-specific score development, and the second round of screening. The first round of screening generated not only potential hits but also negative samples. These results were combined with data from published studies of BRD4is to develop a BRD4-specific scoring model. This model was applied in a second screening that generated hits more effectively. In addition, SAR information was gathered by interpreting this model. The BRD4-specific score can be optimized by updating the data set with the results of the second round of screening.

to the discovery and analysis of BRD4is. Duffy et al.18 found a series of compounds containing N-acylated dihydroquinoxalone as potential BRD4is (IC50 less than 10 μM) from 153 structurally diverse hits generated by protein−ligand docking-based VS. Allen et al.19 identified a dual EGFR−BRD4 inhibitor (IC50 9 μM against BRD4 and 0.044 μM against EGFR) among 24 manually selected BRD4 docking hits, from 908 predicted EGFR inhibitors. However, false-positive (FP) rates of VS are commonly high because selection of a suitable scoring function (SF) remains challenging, and the performance of SFs may vary significantly for different protein targets.20 This target-dependence has restricted the successful application of molecular docking.16 Accordingly, target-specific SFs have been proposed that optimize the SF using ligand structure and activity data for the target of interest.21 Compared with general-purpose approaches, target-specific scoring may lower the FP rate of VS in structure-based drug design (SBDD).22 In addition to hit discovery, the putative binding mode and protein−ligand interactions are also essential for SAR analysis. Kuang et al.23 performed classical molecular dynamics (MD) and state-of-the-art density functional QM/MM MD simulations, revealing that L92, L94, Y97, and N140 are critical in JQ1 binding/releasing kinetics. Ran et al.24 studied 20 diverse bromodomain inhibitors using a combination of molecular docking, interaction fingerprinting, MD simulation, and binding free energy calculation, which provided insights into two hot spots (W81-V87-L92-I146 and N140) for rational design of novel bromodomain inhibitors.

In this paper, we introduce a novel approach for scoring the likelihood that a compound is a BRD4i using a machine-learning-assisted strategy and report some novel BRD4is. We first performed a routine docking-based VS against BRD4. The high-ranking compounds were biologically tested, and a set of active and inactive compounds was obtained due to the high FP rate of the generic docking score. We then developed a BRD4-specific score by mining the structure and activity characteristics of previously reported BRD4is together with the hits and inactive results obtained from the first round of VS. The BRD4-specific score was designed and optimized to meet the following two objectives: (1) improved discrimination between active and inactive BRD4is; (2) facilitation of the perception, understanding, and interpretation of the SAR of the potential BRD4is. The diagram in Figure 1 shows the overall workflow of the study.



MATERIALS AND METHODS

Molecular Modeling. Protein Preparation and Grid Generation for Docking. Each protein structure was prepared by Protein Preparation Wizard in Maestro 9.1 (Schrödinger, LLC, New York, NY, 2010), including assigning bond orders, adding hydrogens, creating disulfide bonds, converting selenomethionines to methionines, and filling in missing side chains. H-bond optimization and restrained minimization (Impref, Schrödinger, LLC, New York, NY, 2010) were then applied. Next, all water molecules were deleted except WAT 9, 12, 15, 23, 33, and 209 (the residue numbering of PDB 3MXF was followed),25 and the structures containing and excluding these six water molecules were saved separately. Finally, all prepared protein structures with and without structural water molecules were subjected to cross-docking. The grids for docking (for enrichment analysis or VS) were generated by the Receptor Grid Generation Module in Maestro 9.1. The center of the enclosing box was defined as the centroid of the native ligand, and other settings were set to default values.

Ligand Preparation for Docking. Pan-assay interference compounds (PAINS),26,27 which carry a series of substructural alerts, should be treated with caution because their undesired binding to a protein or interference with the assay may lead to false-positive results in experimental screening campaigns. To avoid this issue, PAINS, inorganic, and nondrug-like molecules were removed from the ligand database using Pipeline Pilot (version 7.5, Accelrys Software Inc., San Diego, CA). The 3D coordinates of the remaining ligands were generated using LigPrep (version 2.4, Schrödinger, LLC, New York, NY, 2010), and their protonation states were determined at pH 7.0 ± 2.0 with Epik (version 2.1, Schrödinger, LLC, New York, NY, 2010). Ligand structures were desalted, and tautomers were generated in default mode. Stereoisomers were generated at a maximum of 32 per ligand if the chiralities of the ligand were not specified. Only one low-energy ring conformation per ligand was generated.


Table 2. Components of the SAR Data Set

source                   training set            test set                total
                         positive   negative     positive   negative
literature               193        31           66         40           330
PDB^a                    37         6            12         10           65
first round screening    18         346          3          86           453
total                    211        377          69         126          783
total per set            588                     195                     783

^a Last accessed on June 26, 2015. Data from the PDB was included with the literature.
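The positive and negative labels counted in Table 2 follow the rule stated in the "Structure and Activity Data for BRD4is" subsection: a compound is positive when its BEI exceeds 12 and its IC50, Ki, or Kd is below 50 μM. The following is a minimal Python 3 sketch of that rule, not the authors' code; the function names and the example compounds are hypothetical and chosen only for illustration.

```python
import math

def binding_efficiency_index(potency_molar, mol_weight):
    """BEI (eq 2): negative log10 of the potency (in molar) divided by MW, times 1000."""
    return -math.log10(potency_molar) / mol_weight * 1000.0

def is_positive(potency_molar, mol_weight, bei_cutoff=12.0, potency_cutoff=50e-6):
    """Label rule from the text: BEI > 12 and potency (IC50, Ki, or Kd) < 50 uM."""
    bei = binding_efficiency_index(potency_molar, mol_weight)
    return bei > bei_cutoff and potency_molar < potency_cutoff

# Hypothetical compounds: (potency in molar, molecular weight)
examples = {
    "potent_small": (1e-6, 400.0),    # pIC50 = 6, BEI = 15
    "weak_heavy": (40e-6, 450.0),     # under 50 uM, but BEI falls below 12
    "too_weak": (100e-6, 300.0),      # BEI above 12, but potency is not under 50 uM
}
labels = {name: is_positive(p, mw) for name, (p, mw) in examples.items()}
```

Both conditions have to hold, so a light compound with weak potency and a heavy compound with moderate potency are both labeled negative.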

Docking and Screening. Docking was performed using Glide (version 5.6, Schrödinger, LLC, New York, NY, 2010) with standard precision (SP) mode, enhanced planarity of conjugated pi groups, and strain correction terms applied. In the first round of VS, filtered structures (molecular weight less than 520, LE (eq 1) more than 13, and Glide score lower than −6.0) were clustered by ECFP4 similarity with Pipeline Pilot for visual inspection.

$$\mathrm{LE} = \frac{-\mathrm{score}}{\mathrm{MW}} \times 1000 \qquad (1)$$
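The first-round filter can be sketched in a few lines of Python 3. This is an illustration under the stated thresholds (MW < 520, LE > 13, Glide score < −6.0), not the Pipeline Pilot protocol used in the study; the function names and the example values are hypothetical.

```python
def ligand_efficiency(glide_score, mol_weight):
    """LE (eq 1): the negated docking score divided by MW, times 1000."""
    return -glide_score / mol_weight * 1000.0

def passes_first_round_filter(glide_score, mol_weight):
    """Apply the three first-round cutoffs quoted in the text."""
    return (
        mol_weight < 520
        and ligand_efficiency(glide_score, mol_weight) > 13
        and glide_score < -6.0
    )

# Hypothetical docking results: (Glide score, molecular weight)
candidates = [(-7.5, 350.0), (-6.5, 510.0), (-5.5, 300.0), (-9.0, 600.0)]
kept = [c for c in candidates if passes_first_round_filter(*c)]
```

Because Glide scores are negative for favorable poses, the sign flip in eq 1 makes LE positive, so a larger LE means a better score per unit of molecular weight.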

In the second round of VS, a diversity-oriented compound library was evaluated by both Glide and BRD4-specific scores. Structures with lower Glide score and higher BRD4-specific score were selected for bioactivity evaluation.

Construction of BRD4-Specific Scoring Model. Structure and Activity Data for BRD4is. Structure and activity data (814 compounds in total) were collected from our screening data and 17 studies (Table S1). For each compound, a binding site comprising the ligand and residues within 12 Å was extracted from the BRD4−ligand complex based on the following criteria: (1) if a 3D structure of the complex was listed in the PDB database, the binding site was extracted after removing the ions, salts, and nonstructural water molecules in the structure; (2) if a 3D structure of the complex was not available, a putative binding complex was generated using the protein structure model of the first-round screening and the same binding site extraction procedure was followed; and (3) if the predicted binding pose of the compound was significantly different from the binding modes of known ligands (e.g., a large compound that cannot fit into the binding site), the compound was removed from the data set. A total of 30 compounds were removed by applying these criteria. A compound was defined as positive if the binding efficiency index (BEI) was higher than 12 and its IC50, Ki, or Kd was less than 50 μM; otherwise, it was defined as negative. The BEI28 of an active molecule was defined as follows:

$$\mathrm{BEI} = \frac{\mathrm{pIC}_{50},\ \mathrm{p}K_{\mathrm{i}},\ \text{or}\ \mathrm{p}K_{\mathrm{d}}}{\mathrm{MW}} \times 1000 \qquad (2)$$

where MW is the molecular weight, p denotes the negative logarithm, and the units of IC50, Ki, or Kd are molar. Based on this definition, the ratio of the number of positive and negative hits was 5:9. The data set was randomly divided into a training set and a test set at a ratio of 3:1. The sources and amounts of the data set are shown in Table 2.

Molecular Interaction Features. To develop a scoring model from the data set and characterize the significant elements of BRD4-specific binding, molecular interaction features were calculated for each compound. Protein−ligand interaction scores are useful features for approaches that directly correlate interaction energies with potencies to determine interaction-specific weights. Here, the molecular interaction features used to develop the BRD4-specific scoring model were PMF29 scores decomposed on a per-residue basis. By analyzing the existing crystallographic data for BRD4-ligand complexes and reported critical residues,23,24 key residues for ligand binding were identified, including Trp81, Pro82, Phe83, Gln85, Val87, Leu92, Leu94, Tyr97, Asn140, and Ile146, as well as conserved water molecules (as described in the Protein Preparation and Grid Generation for Docking section). For each residue (or water molecule) i, the PMF score Si was further decomposed into polar, nonpolar, or hydrogen-bonding contribution terms depending on the type of protein−ligand atom pair. For example, given a ligand j, the value of the nonpolar term of Val87 x87,np(j) was computed as the summation of all PMF pair potentials between the nonpolar atoms of Val87 and the nonpolar atoms of the ligand and then divided by the number of nonpolar atoms of j (Nj):

$$x_{87,\mathrm{np}}(j) = S_{87,\mathrm{np}} / N_j \qquad (3)$$

In total, 22 molecular interaction features were defined to represent the binding patterns of the BRD4is. More details concerning the features of the molecular interactions are provided in the Supporting Information (Method S1 and Table S2).

Logistic Regression Model. In this study, a logistic regression model named BRD4LGR was developed to predict the BRD4 inhibition potency of a given compound. The logistic function of this model is described as follows:

$$p(X) = \frac{e^{\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n}}{1 + e^{\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n}} \qquad (4)$$

where p(X) is the likelihood of a molecule being active; X = (X1, ..., Xn) are n features (scaled BRD4-specific energy terms in this case) of each molecule; βi (i = 0, 1, 2, ..., n) are the coefficients to be estimated from the training data; and e is Euler's number.30 One of the main advantages of this methodology is that a probability between 0 and 1 is assigned to the binary response, which quantifies the likelihood that a compound is a BRD4i. Moreover, BRD4LGR provides easy access to SAR interpretation for hit-to-lead optimization. In this model, the intermolecular interaction energy is decomposed on each key residue for BRD4-ligand binding. The coefficients of different energy terms correspond to the weighting effects being estimated for the terms. A positive coefficient indicates that a favorable (negative) energy tends to reduce the likelihood of the compound being a BRD4i.

Model Training and Evaluation. The logistic regression model BRD4LGR was established with the following steps. First, the values of features on the training set were standardized, i.e., centered and scaled independently for each feature. This transformation was stored as a scaler for later use

Table 3. Active Compounds Identified by Glide VS and Their Biochemical Binding and Cellular Activities

^a No significant inhibitory effects were observed.

with the test set or in new prediction. Then, a feature-selection phase was performed to identify the best feature combination. Each combination of features was evaluated by calculating the training set area under the receiver operating characteristic curve (AUC-ROC) of the model built with this subset of features. Finally, a logistic regression classifier was constructed using these features. The likelihood ratio and Wald statistics were calculated to assess the significance of the overall model and individual coefficients.31 Data scaling and logistic regression model training were implemented by the RobustScaler and LogisticRegression modules, respectively, using scikit-learn (version 0.17)32 in Python (version 2.7.5). The final model construction and statistical tests were performed using IBM SPSS (version 22.0).

Model Performance Validation. ROC plots33 are commonly used to assess the enrichment of a VS model. This plot reports sensitivity as a function of (1 − specificity).

$$\mathrm{sensitivity} = \frac{N_{\text{selected actives}}}{N_{\text{total actives}}} \qquad (5)$$

$$\mathrm{specificity} = \frac{N_{\text{discarded inactives}}}{N_{\text{total inactives}}} \qquad (6)$$

AUC-ROC, which is the area under the ROC curve, would be 1.0 if the scoring function worked perfectly and 0.5 if the samples were randomly classified. In the selection of the protein structure model for Glide VS, decoys were generated using DUD-E,34 and the ratio of actives to decoys was 1:50. The overall accuracy, sensitivity (SE), and FP rate (eqs 7−9) of the test set were calculated to validate the accuracy and robustness of BRD4LGR:

$$\mathrm{accuracy} = \frac{N_{\mathrm{TP}} + N_{\mathrm{TN}}}{N_{\mathrm{TP}} + N_{\mathrm{FN}} + N_{\mathrm{TN}} + N_{\mathrm{FP}}} \qquad (7)$$

$$\mathrm{SE} = \frac{N_{\mathrm{TP}}}{N_{\mathrm{TP}} + N_{\mathrm{FN}}} \qquad (8)$$

$$\mathrm{FP\ rate} = \frac{N_{\mathrm{FP}}}{N_{\mathrm{TN}} + N_{\mathrm{FP}}} \qquad (9)$$

where NTP, NTN, NFP, and NFN refer to the numbers of true positives, true negatives, false positives, and false negatives, respectively. Enrichment analysis and 10-fold cross validation were implemented using the Metrics and Cross-Validation modules, respectively, in scikit-learn.

Biological Evaluation. Compounds. In total, 385 compounds from Specs and 165 compounds from ChemPartner were purchased for experimental evaluation. The purity of all active compounds is equal to or greater than 95%, except for DC_BD216 (91%), DC_BD233 (93%), DC_BD249 (93%), DC_BD009 (90%), and DC_BD506 (92%) (Table S3). The


Figure 2. BRD4LGR performance on the test set. (A) Likelihood plotted against the summation of weighted energy terms for the test set: (Xβ) sum of the weighted feature values of a molecule; (likelihood) the possibility that a molecule is active. A larger scatter indicates a greater likelihood that a compound is active (above the line) or inactive (under the line). The stars represent the prediction error. True actives are colored in blue, whereas true inactives are in red. (B) ROC curves and area-under-curves (AUCs) for BRD4LGR (blue), PMF (black), and Glide (red) from the test set. (C) Comparison of the ability to distinguish actives and inactives of BRD4LGR, PMF, and Glide on the test set. Active molecules (blue) are expected to fall under the dashed line, whereas inactive ones (red) should be above the dashed line.
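The quantities plotted in Figure 2 can be reproduced in outline from eq 4 and eqs 7−9. The Python 3 sketch below uses only the standard library; the coefficients, intercept, and feature values are hypothetical toy numbers, not the fitted BRD4LGR parameters (those are listed in Table S5), and the AUC is computed with the rank-based definition rather than with scikit-learn.

```python
import math

def likelihood(features, coefs, intercept):
    """Eq 4 written in a numerically stable form: p = e^z / (1 + e^z)."""
    z = intercept + sum(b * x for b, x in zip(coefs, features))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def confusion_counts(labels, probs, threshold=0.5):
    """Return (TP, TN, FP, FN) for a cutoff on the predicted likelihood."""
    tp = tn = fp = fn = 0
    for y, p in zip(labels, probs):
        predicted_active = p >= threshold
        if predicted_active and y == 1:
            tp += 1
        elif predicted_active and y == 0:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return tp, tn, fp, fn

def summary_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, and FP rate as in eqs 7-9."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "fp_rate": fp / (tn + fp),
    }

def auc_roc(labels, scores):
    """AUC as the chance that a random active outscores a random inactive (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical model: one term with a negative weight, one with a positive weight.
toy_coefs, toy_intercept = [-1.2, 0.8], -0.3
toy_features = [(-1.0, -0.5), (0.5, 1.0), (-2.0, 0.0), (1.0, -1.0), (0.0, 1.0), (-1.0, 0.0)]
example_labels = [1, 0, 1, 0, 1, 0]
example_probs = [likelihood(x, toy_coefs, toy_intercept) for x in toy_features]
```

With a negative weight on the first term, making that energy more negative raises the predicted likelihood; the same change in a positively weighted term lowers it, which is the sign convention used in the SAR discussion.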

positive control, JQ1 (100% pure by HPLC), was purchased from Selleck (http://www.selleck.cn/).

Biochemical Binding Assay. The ALPHA Screen (Amplified Luminescent Proximity Homogeneous Assay Screen) was applied to test the compound activity. In brief, the reaction buffer contained 20 mM HEPES and 0.1% (w/v) BSA at pH 7.4, supplemented with 0.01% (v/v) Triton X-100. His-tagged protein BD1 was expressed and purified to >90%, and biotinylated peptide SGRGK(Ac)GGK(Ac)GLGK(Ac)GGAK(Ac)RHRK(Biotin)−OH was synthesized by GL Biochem (Shanghai) Ltd. and purified to 98%. In a 20 μL reaction volume, compound, protein, and peptide were incubated at room temperature for 30 min in 384-well plates (white opaque OptiPlate-384), and then streptavidin-coated donor beads and nickel chelate acceptor beads (PerkinElmer, 6760619M) were added under low light conditions. After 60 min incubation at room temperature, the signals were read by an EnVision Multilabel Plate Reader in ALPHA mode (excitation at 680 nm and emission at 570 nm).

Cell Viability Assay. MV4-11 (leukemia), MM.1s (myeloma), and Raji (lymphoma) cell lines were purchased from ATCC. MV4-11 was maintained in IMDM (Gibco) supplemented with 10% fetal bovine serum (Gibco). MM.1s and Raji were maintained in RPMI 1640 (Gibco) supplemented with 10% fetal bovine serum (Gibco). Cells were seeded in 384-well plates and then treated with different concentrations of inhibitors (0.5% final DMSO concentration) at 37 °C and 5% CO2 for 72 h. Cell viabilities were measured by CellTiter-Glo luminescent cell viability assay (Promega).

RESULTS AND DISCUSSION

First Round of Screening by Glide Docking. Protein Structure Model for Glide VS. In the first round of screening, a routine docking-based VS was performed. Before screening the entire library, cross-docking and enrichment analyses were performed to select the best protein structure model among the 69 BRD4 structures available in the PDB. The protein structure models were selected based on their capacity for predicting correct binding modes and discriminating actives from decoys (Method S3). As calculated by the structural alignment using PyMOL (version 1.7.1.0), these models show a very low mean RMSD value of 0.23 Å around the ligand-binding site. Therefore, protein rigid docking with Glide was adopted due to the marginal structural variance across different crystal structures. The results indicated that retaining structural water molecules generally yielded smaller mean RMSD values when reproducing the native binding poses (Figure S1A). For the best protein structure model, 4GPJ-with-water (PDB ID 4GPJ, retaining structural water molecules 303, 307, 308, 312, and 317), the mean RMSD was 3.09 Å, the standard deviation was 2.36 Å, and RMSD values of less than 2.00 Å were obtained for 38 different BRD4is. Five of the protein structure models with lower mean RMSDs were further assessed using enrichment analysis. Protein structure model 4GPJ-with-water also exhibited the best overall performance, with an AUC-ROC of 0.96 and a top 1% true-positive rate of 29% (Figure S1B). Therefore, this protein structure model was used for the first round of VS and the subsequent generation of putative binding poses.

Screening Results. In the first round of screening, 453 compounds were purchased, including 288 from the Specs compound library (207 639 compounds, http://www.specs.net) and 165 from the compound library of ChemPartner (132 306 compounds, http://www.shangpharma.com/). After in vitro evaluation with an ALPHA screen assay, 22 active compounds (IC50 value less than 50 μM) were discovered. Seven compounds had IC50 values of less than 10 μM and also effectively inhibited three cancer cell lines that are sensitive to BRD4is, i.e., Raji, MM.1s, and MV4-11 (refs 14 and 35; Table 3). Detailed results of the first round of screening are provided in the Supporting Information (Figure S2, Table S4, and ci7b00098_si_002.xlsx).

Establishment of BRD4LGR. Advantage of Target-Specific SF and Machine-Learning. In the first round of VS with Glide, 22 structurally diverse BRD4is exhibited activity in the submicromolar to micromolar range, including 13 compounds with antiproliferative effects in cancer cells. However, 431 compounds were inactive, yielding a total FP rate of 95% and suggesting that there is still substantial room for improvement of the ability of SF to enrich true hits among top-ranked compounds. In addition, these results raise the question of what we can learn from screenings with large

Table 4. Active Compounds Identified by BRD4LGR and Their Biochemical Binding and Cellular Activities

^a No significant inhibitory effects were observed.
^b OOS indicates out of stock; insufficient supply was available for the assay.
^c Not submitted to the assay. Compounds with an IC50 of greater than 15 μM in the ALPHA screen were not evaluated in the cellular level activity assay.

amounts of negative results.36 As mentioned above, designing a target-specific SF based on machine-learning techniques is a viable strategy to enhance VS efficacy using existing knowledge.22 Liwei Li et al.37 assessed the performance of structure-based SVM target-specific scoring functions across 41 targets, and the enrichment factor outperformed many conventional scoring functions. Yali Li et al.38 developed a computational model for predicting selective liver X receptor β agonists using multiple machine-learning methods, and the predictive accuracy exceeded 90% for a test set of 76 compounds with new scaffolds. Although these sophisticated machine-learning models have very impressive prediction results, their practical applications are limited due to their “black box” nature, that is, the contribution of a predefined variable to the prediction is difficult to interpret from a medicinal chemistry perspective.39 By contrast, linear models, such as multiple linear regression (MLR) and partial least-squares (PLS), are commonly used because they construct direct relationships between activity and different structural and physical-chemical descriptors, which may provide useful information for hit-to-lead optimization.

BRD4LGR Training and Testing. From the matrix of molecular interaction features, the BRD4-specific scoring model BRD4LGR was constructed and evaluated. The values of all features were calculated and standardized prior to training and testing the logistic regression model. A feature-selection procedure was then performed by evaluating the AUC-ROC values of a total of 4 194 303 combinations of features from the training set. The best feature subset, which yielded the highest AUC-ROC value (0.87), was used to train the final BRD4LGR model. The resulting BRD4LGR displayed a 10-fold cross-validation accuracy of 0.72.
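The figure of 4 194 303 combinations equals 2^22 − 1, the number of non-empty subsets of the 22 interaction features, which suggests an exhaustive search. The Python 3 sketch below shows such a search in outline; it is an assumption about the procedure, not the authors' script. The scoring callback `score_fn` and the toy gains are hypothetical. In the study, the score for a subset would be the training-set AUC-ROC of a logistic regression model built on that subset.

```python
from itertools import combinations

def count_nonempty_subsets(n_features):
    """Number of non-empty feature subsets: 2^n - 1."""
    return 2 ** n_features - 1

def iter_subsets(n_features):
    """Yield every non-empty subset of feature indices, smallest subsets first."""
    for size in range(1, n_features + 1):
        yield from combinations(range(n_features), size)

def best_subset(n_features, score_fn):
    """Exhaustively score all subsets and return the best one with its score."""
    best, best_score = None, float("-inf")
    for subset in iter_subsets(n_features):
        score = score_fn(subset)
        if score > best_score:
            best, best_score = subset, score
    return best, best_score

# Toy stand-in for "fit a model on this subset and return its AUC-ROC":
# each feature adds a hypothetical gain, minus a small cost per feature.
toy_gain = [0.30, 0.02, 0.20, 0.01]

def toy_score(subset):
    return sum(toy_gain[i] for i in subset) - 0.05 * len(subset)

chosen, chosen_score = best_subset(len(toy_gain), toy_score)
```

An exhaustive search is feasible here only because 22 features give a few million subsets and each logistic fit is cheap; the count doubles with every added feature.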
The likelihood ratio of this model was 513.11, which indicates that the selected features contributed significantly to the prediction of the response variable. The coefficients and Wald statistics of the selected features are listed in Table S5. Some variables that were not statistically significant (P > 0.05) were retained because their corresponding interactions are known to be important for BRD4-ligand binding1 (e.g., residues Pro82, Phe83, Asn140, and Ile146). For the test set, BRD4LGR achieved an overall accuracy of 0.77 and an AUC-ROC value of 0.83. The true-positive and false-positive rates of the test set were 0.80 and 0.24, respectively. To visualize the prediction, a logistic curve of the test set scoring is depicted in Figure 2A. Each point representing a compound was placed in the region of actives or inactives divided by a line at the likelihood of 0.5. Most of the compounds were predicted correctly (dense circles), and only approximately 20% were incorrect (sparse stars). As shown in Figure 2B, the AUC-ROC of BRD4LGR was 0.83. In addition to AUC-ROC, SFs can be compared based on the ability to distinguish active and inactive compounds. In Figure 2C, the compounds are ranked by BRD4LGR score; the reference line was set at 70, which was the number of active compounds in the test set. In an ideal SF, all blue bands should be below the reference line, and all red bands should be above it. BRD4LGR clearly exhibited superior discriminant validity for the red and blue bands. For comparison, the ROC curves and compound rankings of the Glide and PMF scores on the same test set are shown in Figure 2. These results suggest that BRD4LGR is an effective classifier for discrimination between BRD4is and noninhibitors.

Second Round of Screening by BRD4LGR. Screening Results. To determine whether the enriched knowledge improved the performance of the SF, the BRD4-specific score was applied in the second round of VS (Figure 1). First, a diversity-oriented compound library comprising 18 616 compounds from Specs was prepared by structure clustering and similarity-based filtering, where the structural analogues (with a similarity of greater than 0.60) of active compounds discovered during the first-round screening and known BRD4is were removed. The compound library was then docked using Glide, and the generated poses with Glide scores of lower than −5.00 were rescored using BRD4LGR. Finally, visual inspections were performed and a total of 97 structurally diverse molecules were selected for bioassay evaluation. As shown in Table 4, 15 new active compounds were identified with IC50 values ranging from 0.73 to 147 μM. Detailed results of the second round of screening are provided in the Supporting Information (Figure S3, Table S6, and ci7b00098_si_002.xlsx). In this round of VS, the FP rate was 85%. At the hit discovery stage, some off-target effects are difficult to avoid due to the weak potency of compounds. Further structural optimization is needed to improve their potency and selectivity, on the condition that their activities on BRD4 are not false positives. To address the issue, PAINS structures were first removed, and interference and counterscreening experiments were conducted against the 37 potential BRD4is identified after two rounds of screening. As listed in the Supporting Information (Table S3), although none of the hits showed interference with the ALPHA screen assay, some showed significant inhibition (>90% at 50 μM) of other targets in counterscreening experiments, e.g., DC_BD476, DC_BD536, DC_BD530, DC_BD518, and DC_BD506. Furthermore, the Hill slopes for the IC50 determinations are provided in the Supporting Information (Figures S2, S3, and S5), where DC_BD249 and DC_BD476 showed larger values of 3.3, indicating their inhibition can be promiscuous.

SAR Interpretation of BRD4LGR.
In addition to discovering novel hits, scoring functions also must be capable of identifying and visualizing key SARs. The BRD4LGR model was designed to use residue-based ligand−receptor interaction features from crystal complex structures to enable the interpretation of SARs at the residue level. Figure S4 summarizes the distributions of the interaction scores used in this study. Some of the distributions of the interaction scores differed between active and inactive compounds, thus providing a basis for determining whether a compound is active against BRD4. However, in a greater number of cases, the distributions for active and inactive compounds exhibited large overlaps. These overlapping scores and the simple additivity assumption may account for the inferior performance of traditional knowledge-based SFs. By contrast, a target-specific scoring function can be considered a weighting scheme in which different weights can be introduced to adjust for the individual contributions of different protein residues. A sector diagram visualizing the signs and absolute values of the selected features is presented in Figure 3. A negative coefficient indicates that if the corresponding interaction term has a negative value (for PMF-like scores, a negative value is favorable for binding), the likelihood that the predicted molecule is an active BRD4i increases. By contrast, a positive coefficient indicates that a favorable interaction with a specific residue may lower the likelihood that the molecule is a BRD4i. This argument is counterintuitive but highlights the limitations of standard ligand−receptor interaction scores in distinguishing true active ligands. Clearly, not all negative interaction scores are “favorable” for binding to BRD4. A large absolute value indicates that for the negative or positive 1685

DOI: 10.1021/acs.jcim.7b00098 J. Chem. Inf. Model. 2017, 57, 1677−1690

Article

Journal of Chemical Information and Modeling

Figure 3. Weights of selected features indicated in a sector diagram for SAR interpretation of BRD4LGR. Higher feature weights have larger sector areas. Negative coefficients are colored in blue, which indicates that the corresponding interactions are favorable. Positive coefficients are in red, which indicates that the corresponding interactions may reduce the likelihood of the compound being a BRD4i.

Figure 4. Examples of distributions of PMF scores of energy decomposition and SAR analysis with BRD4LGR. (A) Probability density histogram of different energy distribution scores of polar interactions with Tyr97 between positive (blue) and negative (red) samples. (B) Binding poses of compounds 4a (yellow sticks, docking complex) and 4b (green sticks, PDB ID 4QB3). Hydrogen bonds are shown as dotted yellow lines. (C) Probability density histogram of different energy distribution PMF scores of polar interactions with Leu92 between positive (blue) and negative (red) samples. (D) Binding poses of 32 (green sticks, PDB ID 4WIV) and 41 (yellow sticks, docking complex).

coefficients, the corresponding interaction is either crucial for binding or not tolerated, respectively. Specifically, BRD4LGR implies the following information about SARs for ligand interactions with BRD4:

a. Polar interactions with Tyr97 are important for ligand binding,40 corresponding to a ligand fragment containing polar atoms deep in the pocket.
b. Hydrogen-bond interactions with structural water molecules are favorable for activity.12 The water network plays an indispensable role in the stabilization of ligand−BRD4 binding.10 In particular, the water bridge between Tyr97 and the ligand is highly conserved among most known scaffolds of BRD4is.
c. Interactions with the WPF shelf are important because both polar and nonpolar interactions with Trp81 are significant.41
d. Polar interactions with Leu92 or Leu94 are particularly notable42 and may reduce the likelihood that a molecule is an active BRD4i.

From a data-mining perspective, BRD4LGR “learned” the above-mentioned SAR from the previous crystallographic and bioassay data for BRD4is during the model training process, which estimated the weights of different interaction terms based on the relative positions of their PMF-score distributions. Using the Tyr97 polar interaction as an example (Figure 4A), the coefficient is negative because the average score of the active compounds is lower than that of the inactive compounds. By contrast, for polar interactions with Leu92, the coefficient is positive because the PMF scores of the inactive compounds are located on the left side, i.e., they are lower on average (Figure 4C). Furthermore, a larger separation between the active and inactive classes indicates a larger absolute value of the weight of the interaction. Based on these points, a more detailed SAR analysis of this system can be performed. Two compound series relating to the Tyr97 and Leu92 interactions are shown in Figure 4B and D, respectively. Compared with compound 4a, the acetyl group of compound

4b is closer to both the phenol of Tyr97 and a structural water molecule. Accordingly, 4b has lower PMF scores for the polar and water H-bonded interactions with Tyr97. These two terms result in a higher BRD4LGR score, indicating that 4b is more likely to be a BRD4i than 4a. Consistent with this prediction, both compounds were reported by Gacias et al.,43 and 4b has the lower Kd value (3.4 μM).

Figure 4D shows the comparison of compounds 32 and 41. The structures of 32 and 41 differ by the replacement of the phenyl linker in 32 with pyridine in 41, which changes the interactions with Leu92. Although 41 has slightly better polar interactions with Leu92, it has weaker nonpolar interactions with Leu92. Given the positive coefficient of the Leu92 polar score and the negative coefficient of the Leu92 nonpolar score, both changes reduce the likelihood of 41 being active. Comparing the total scores of these two compounds, 32 should be more likely to inhibit BRD4, consistent with the inhibitory activities against BRD4 reported by McKeown et al.42 These two examples suggest that changes to each interaction term in BRD4LGR alter the total probability of a compound being a BRD4i, which may provide useful information concerning structural optimization for medicinal chemists.

To further examine the SAR extracted by BRD4LGR, the benzo[cd]indol-2(1H)-ones, one of the scaffolds in our screening results, were studied. As summarized in Table 5 and Figure 5 (more details in Figure S5 and Table S7), two modes of SARs can clearly be interpreted from the model. The first mode involves the R1 group, which is predicted to form nonpolar

Table 5. SAR of Benzo[cd]indol-2(1H)-ones as BRD4 Inhibitors

Figure 5. SAR interpretation by BRD4LGR. (A−C) Binding modes of three compounds with different R1 groups (DC_BD454 yellow, R1 = ethyl; DC_BD553 pink, R1 = methyl; DC_BD170 green, R1 = H). (D and E) Binding modes of two compounds with different R2 groups (DC_BD557 orange, R2 = diethyl; DC_BD564 magenta, R2 = dimethyl). (F) Plots of logIC50 and decomposed PMF scores: (blue bar) logIC50; (red or green line) PMF scores of nonpolar interactions with Trp81 or Phe83; (gray dotted line) average of these two interactions.


interactions with Phe83. R1 is ethyl, methyl, and H in DC_BD454, DC_BD553, and DC_BD170, respectively (Figure 5A−C). For nonpolar interactions with Phe83, the PMF score of the ethyl compound is much lower than that of the methyl or H compound, which indicates that DC_BD454 has the highest likelihood of inhibiting BRD4. The second mode is related to the R2 group, which is predicted to form nonpolar interactions with Trp81. The compounds exhibited increasing PMF scores for nonpolar interactions with Trp81 in the order pyrrolidine (DC_BD170), N,N-diethyl (DC_BD557), and N,N-dimethyl (DC_BD564) at R2, which indicates that a weaker nonpolar interaction with Trp81 reduces the likelihood of BRD4 inhibition (Figure 5C−E). Overall, the total BRD4LGR score of each compound is determined by both R1 and R2, which can be considered a superimposed effect of the nonpolar interactions with Trp81 and Phe83 (the gray dashed line in Figure 5F). As listed in Table 5, the BRD4LGR score agrees well with the activity of these compounds. These results highlight that hydrophobic interactions with the WPF shelf are important for the design of more potent BRD4is.

Update of BRD4LGR with the Results of the Second Screening. Compared with traditional SFs, one of the benefits of a machine-learning-based SF is its flexibility when new experimental data become available. In this study, BRD4LGR was further updated by integrating the second round of screening data to create BRD4LGR 2.0 based on an expanded data set. Regarding model assessment, the updated version had a likelihood ratio of 585.79, a 10-fold cross-validation accuracy of 0.72, and an AUC-ROC of 0.85 for the training set. For the test set, BRD4LGR 2.0 achieved an overall accuracy of 0.76 and an AUC-ROC of 0.80, with a true-positive rate of 0.74 and a false-positive rate of 0.22. Compared with the first version, the false-positive rate was slightly lower for BRD4LGR 2.0. These results suggest that the performance of BRD4LGR can be further improved.
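As a minimal sketch of how the test-set figures above relate to one another, the following Python snippet computes accuracy, true-positive rate, and false-positive rate from a confusion matrix. The counts are hypothetical: they were chosen only so that the rates match the values reported for BRD4LGR 2.0 (0.76, 0.74, and 0.22), and the actual composition of the test set is not stated in this section.

```python
def classification_metrics(tp, fn, fp, tn):
    """Return accuracy, true-positive rate, and false-positive rate."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "tpr": tp / (tp + fn),  # fraction of actives predicted active
        "fpr": fp / (fp + tn),  # fraction of inactives predicted active
    }


# Hypothetical counts (assumption): 50 actives and 50 inactives.
metrics = classification_metrics(tp=37, fn=13, fp=11, tn=39)
print(metrics)
```

Note that the false-positive rate computed here (false positives divided by all inactives) is a different quantity from the 85% FP rate quoted for the second screening round, which appears to be the fraction of experimentally tested compounds that proved inactive.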
The updated coefficients and Wald statistics for selected features are listed in Table S8. There are some differences between these two versions. For example, the polar interaction with Pro82 and nonpolar interaction with Ile146 were removed in BRD4LGR 2.0 due to their large P values. However, most coefficients exhibited little change, suggesting that the model became stable with the current data.
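The role of the coefficients discussed above can be illustrated with a short Python sketch of a logistic-regression score and a Wald test. All feature names, coefficients, scores, and standard errors below are hypothetical placeholders, not values from Table S8. The sketch only assumes that the model combines per-residue PMF-like scores linearly before a logistic transform, as the LGR (logistic regression) name suggests.

```python
import math


def brd4i_probability(scores, coefficients, intercept=0.0):
    """Logistic probability from per-residue interaction scores."""
    z = intercept + sum(coefficients[k] * scores[k] for k in coefficients)
    return 1.0 / (1.0 + math.exp(-z))


def wald_p_value(coefficient, std_error):
    """Two-sided p-value for the Wald z statistic, coefficient / std_error."""
    z = coefficient / std_error
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical coefficients (assumption): negative for Tyr97 polar,
# positive for Leu92 polar, mirroring the signs described in the text.
coefs = {"Tyr97_polar": -1.2, "Leu92_polar": 0.8}

base = {"Tyr97_polar": -0.5, "Leu92_polar": -0.1}
better_tyr97 = {"Tyr97_polar": -1.0, "Leu92_polar": -0.1}
better_leu92 = {"Tyr97_polar": -0.5, "Leu92_polar": -0.6}

p_base = brd4i_probability(base, coefs)
p_tyr97 = brd4i_probability(better_tyr97, coefs)
p_leu92 = brd4i_probability(better_leu92, coefs)

# A more negative Tyr97 score raises the probability, whereas a more
# negative Leu92 polar score lowers it.
print(p_tyr97 > p_base > p_leu92)

# A coefficient that is small relative to its standard error gives a large
# p-value and would be a candidate for removal from the model.
print(wald_p_value(0.1, 0.5), wald_p_value(-1.2, 0.3))
```

Under these assumptions, a feature with a large Wald p-value contributes little evidence either way, which is the kind of criterion the text describes for dropping the Pro82 polar and Ile146 nonpolar terms.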

ASSOCIATED CONTENT

Supporting Information

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.jcim.7b00098.

Methods of cross-docking and calculation of PMF-based features; dose−response behaviors, purity data (confirmed by 1H NMR and LC/MS), and counterscreening results of active compounds; source literature of the data set; statistics of BRD4LGR models; comparison of score distributions of actives versus inactives (PDF)
Scores and bioassay results of compounds selected from virtual screening (XLSX)
Molecular formula strings for target compounds with associated biochemical and biological data (XLSX)



AUTHOR INFORMATION

Corresponding Authors

*Phone: +86-21-50806600. E-mail: [email protected] (M.Z.).
*Phone: +86-21-50806600. E-mail: [email protected] (C.L.).

ORCID

Cheng Luo: 0000-0003-3864-8382
Mingyue Zheng: 0000-0002-3323-3092

Author Contributions

#J.X., W.L., and R.L. contributed equally to this work.

Notes

The authors declare no competing financial interest.



ACKNOWLEDGMENTS

This work was supported by the “Personalized Medicines Molecular Signature-based Drug Discovery and Development”, Strategic Priority Research Program of the Chinese Academy of Sciences (XDA12050201 to M.Z.), National Key Research & Development Plan (2016YFC1201003 to M.Z.), the National Natural Science Foundation of China (21210003 and 81230076 to H.J., 81430084 to K.C., 21472208 to C.L.), the National Basic Research Program (2015CB910304 to X.L.), the State Key Laboratory of Natural and Biomimetic Drugs (K20160201 to M.Z.), and the Fund of State Key Laboratory of Toxicology and Medical Countermeasures, Academy of Military Medical Science (TMC201505 to C.L.).



CONCLUSION

In this work, we report the development and application of the target-specific scoring function BRD4LGR for the discovery of novel BRD4is. Machine-learning techniques were used for model training on the basis of previously reported crystallographic structures and experimental binding and bioassay data for BRD4is. For the BRD4 target system, BRD4LGR exhibited a significant improvement in predictive performance compared to other general-purpose scoring functions. In total, 37 compounds with novel scaffolds were identified as novel BRD4is. Among these compounds, 17 compounds exhibited potent inhibitory activities in BET inhibitor-sensitive cell lines. In addition to its capacity for hit discovery, BRD4LGR can also be used to interpret the SARs of BRD4is to provide useful information for further hit-to-lead optimization. With the rapid accumulation of large-scale structural and bioactivity data for drug-like molecules, this easy-to-implement, updatable, machine-learning-assisted approach can be applied to other target systems of interest.



ABBREVIATIONS

KAc, acetylated lysine; BRD4, bromodomain-containing protein 4; BD, bromodomain; FDA, Food and Drug Administration; VS, virtual screening; SF, scoring function; FP, false positive; SBDD, structure-based drug design; SAR, structure−activity relationship; PDB, Protein Data Bank; RMSD, root-mean-square deviation; AUC-ROC, area under receiver operating characteristic curve; IC50, half-maximum inhibitory concentration; μ, micro; μM, micromoles per liter; LGR, logistic regression; 3D, three-dimensional; BEI, binding efficiency index; Ki, equilibrium inhibition constant; Kd, equilibrium dissociation constant; MW, molecular weight; PMF, potential of mean force; SP, standard precision; PAINS, pan assay interference compounds; ALPHA screen, amplified luminescent proximity homogeneous assay screen; HEPES, 4-(2-hydroxyethyl)-1-piperazineethanesulfonic acid.



REFERENCES

(1) Filippakopoulos, P.; Knapp, S. Targeting bromodomains: epigenetic readers of lysine acetylation. Nat. Rev. Drug Discovery 2014, 13, 337−356.
(2) Filippakopoulos, P.; Picaud, S.; Mangos, M.; Keates, T.; Lambert, J. P.; Barsyte-Lovejoy, D.; Felletar, I.; Volkmer, R.; Muller, S.; Pawson, T.; Gingras, A. C.; Arrowsmith, C. H.; Knapp, S. Histone Recognition and Large-Scale Structural Analysis of the Human Bromodomain Family. Cell 2012, 149, 214−231.
(3) Shi, J.; Vakoc, C. R. The Mechanisms behind the Therapeutic Activity of BET Bromodomain Inhibition. Mol. Cell 2014, 54, 728−736.
(4) Jung, M.; Gelato, K. A.; Fernandez-Montalvan, A.; Siegel, S.; Haendler, B. Targeting BET bromodomains for cancer treatment. Epigenomics 2015, 7, 487−501.
(5) Rahl, P. B.; Lin, C. Y.; Seila, A. C.; Flynn, R. A.; McCuine, S.; Burge, C. B.; Sharp, P. A.; Young, R. A. c-Myc regulates transcriptional pause release. Cell 2010, 141, 432−445.
(6) Schroeder, S.; Cho, S.; Zeng, L.; Zhang, Q.; Kaehlcke, K.; Mak, L.; Lau, J.; Bisgrove, D.; Schnoelzer, M.; Verdin, E.; Zhou, M.-M.; Ott, M. Two-pronged Binding with Bromodomain-containing Protein 4 Liberates Positive Transcription Elongation Factor b from Inactive Ribonucleoprotein Complexes. J. Biol. Chem. 2012, 287, 1090−1099.
(7) Nicodeme, E.; Jeffrey, K. L.; Schaefer, U.; Beinke, S.; Dewell, S.; Chung, C.-w.; Chandwani, R.; Marazzi, I.; Wilson, P.; Coste, H.; White, J.; Kirilovsky, J.; Rice, C. M.; Lora, J. M.; Prinjha, R. K.; Lee, K.; Tarakhovsky, A. Suppression of inflammation by a synthetic histone mimic. Nature 2010, 468, 1119−1123.
(8) Anand, P.; Brown, J. D.; Lin, C. Y.; Qi, J.; Zhang, R.; Artero, P. C.; Alaiti, M. A.; Bullard, J.; Alazem, K.; Margulies, K. B.; Cappola, T. P.; Lemieux, M.; Plutzky, J.; Bradner, J. E.; Haldar, S. M. BET Bromodomains Mediate Transcriptional Pause Release in Heart Failure. Cell 2013, 154, 569−582.
(9) Brand, M.; Measures, A. M.; Wilson, B. G.; Cortopassi, W. A.; Alexander, R.; Hoess, M.; Hewings, D. S.; Rooney, T. P. C.; Paton, R. S.; Conway, S. J. Small Molecule Inhibitors of Bromodomain-Acetyllysine Interactions. ACS Chem. Biol. 2015, 10, 22−39.
(10) Zhang, G.; Smith, S. G.; Zhou, M.-M. Discovery of Chemical Inhibitors of Human Bromodomains. Chem. Rev. 2015, 115, 11625−11668.
(11) Filippakopoulos, P.; Qi, J.; Picaud, S.; Shen, Y.; Smith, W. B.; Fedorov, O.; Morse, E. M.; Keates, T.; Hickman, T. T.; Felletar, I.; Philpott, M.; Munro, S.; McKeown, M. R.; Wang, Y.; Christie, A. L.; West, N.; Cameron, M. J.; Schwartz, B.; Heightman, T. D.; La Thangue, N.; French, C. A.; Wiest, O.; Kung, A. L.; Knapp, S.; Bradner, J. E. Selective inhibition of BET bromodomains. Nature 2010, 468, 1067−1073.
(12) Zhang, G.; Plotnikov, A. N.; Rusinova, E.; Shen, T.; Morohashi, K.; Joshua, J.; Zeng, L.; Mujtaba, S.; Ohlmeyer, M.; Zhou, M. M. Structure-guided design of potent diazobenzene inhibitors for the BET bromodomains. J. Med. Chem. 2013, 56, 9251−9264.
(13) Ran, X.; Zhao, Y.; Liu, L.; Bai, L.; Yang, C. Y.; Zhou, B.; Meagher, J. L.; Chinnaswamy, K.; Stuckey, J. A.; Wang, S. Structure-Based Design of gamma-Carboline Analogues as Potent and Specific BET Bromodomain Inhibitors. J. Med. Chem. 2015, 58, 4927−4939.
(14) Zhao, L.; Wang, Y.; Cao, D.; Chen, T.; Wang, Q.; Li, Y.; Xu, Y.; Zhang, N.; Wang, X.; Chen, D.; Chen, L.; Chen, Y. L.; Xia, G.; Shi, Z.; Liu, Y. C.; Lin, Y.; Miao, Z.; Shen, J.; Xiong, B. Fragment-based drug discovery of 2-thiazolidinones as BRD4 inhibitors: 2. Structure-based optimization. J. Med. Chem. 2015, 58, 1281−1297.
(15) Chen, L.; Yap, J. L.; Yoshioka, M.; Lanning, M. E.; Fountain, R. N.; Raje, M.; Scheenstra, J. A.; Strovel, J. W.; Fletcher, S. BRD4 Structure-Activity Relationships of Dual PLK1 Kinase/BRD4 Bromodomain Inhibitor BI-2536. ACS Med. Chem. Lett. 2015, 6, 764−769.
(16) Kitchen, D. B.; Decornez, H.; Furr, J. R.; Bajorath, J. Docking and scoring in virtual screening for drug discovery: Methods and applications. Nat. Rev. Drug Discovery 2004, 3, 935−949.
(17) Kumar, V.; Krishna, S.; Siddiqi, M. I. Virtual screening strategies: recent advances in the identification and design of anticancer agents. Methods 2015, 71, 64−70.
(18) Duffy, B. C.; Liu, S.; Martin, G. S.; Wang, R.; Hsia, M. M.; Zhao, H.; Guo, C.; Ellis, M.; Quinn, J. F.; Kharenko, O. A.; Norek, K.; Gesner, E. M.; Young, P. R.; McLure, K. G.; Wagner, G. S.; Lakshminarasimhan, D.; White, A.; Suto, R. K.; Hansen, H. C.; Kitchen, D. B. Discovery of a new chemical series of BRD4(1) inhibitors using protein-ligand docking and structure-guided design. Bioorg. Med. Chem. Lett. 2015, 25, 2818−2823.
(19) Allen, B. K.; Mehta, S.; Ember, S. W.; Schonbrunn, E.; Ayad, N.; Schurer, S. C. Large-Scale Computational Screening Identifies First in Class Multitarget Inhibitor of EGFR Kinase and BRD4. Sci. Rep. 2015, 5, 16924.
(20) Leach, A. R.; Shoichet, B. K.; Peishoff, C. E. Prediction of Protein−Ligand Interactions. Docking and Scoring: Successes and Gaps. J. Med. Chem. 2006, 49, 5851−5855.
(21) Seifert, M. H. J. Targeted scoring functions for virtual screening. Drug Discovery Today 2009, 14, 562−569.
(22) Xue, M.; Zheng, M.; Xiong, B.; Li, Y.; Jiang, H.; Shen, J. Knowledge-Based Scoring Functions in Drug Design. 1. Developing a Target-Specific Method for Kinase-Ligand Interactions. J. Chem. Inf. Model. 2010, 50, 1378−1386.
(23) Kuang, M.; Zhou, J.; Wang, L.; Liu, Z.; Guo, J.; Wu, R. Binding Kinetics versus Affinities in BRD4 Inhibition. J. Chem. Inf. Model. 2015, 55, 1926−1935.
(24) Ran, T.; Zhang, Z.; Liu, K.; Lu, Y.; Li, H.; Xu, J.; Xiong, X.; Zhang, Y.; Xu, A.; Lu, S.; Liu, H.; Lu, T.; Chen, Y. Insight into the key interactions of bromodomain inhibitors based on molecular docking, interaction fingerprinting, molecular dynamics and binding free energy calculation. Mol. BioSyst. 2015, 11, 1295−1304.
(25) Vidler, L. R.; Filippakopoulos, P.; Fedorov, O.; Picaud, S.; Martin, S.; Tomsett, M.; Woodward, H.; Brown, N.; Knapp, S.; Hoelder, S. Discovery of novel small-molecule inhibitors of BRD4 using structure-based virtual screening. J. Med. Chem. 2013, 56, 8073−8088.
(26) Capuzzi, S. J.; Muratov, E. N.; Tropsha, A. Phantom PAINS: Problems with the Utility of Alerts for Pan-Assay INterference CompoundS. J. Chem. Inf. Model. 2017, 57, 417−427.
(27) Baell, J. B.; Holloway, G. A. New Substructure Filters for Removal of Pan Assay Interference Compounds (PAINS) from Screening Libraries and for Their Exclusion in Bioassays. J. Med. Chem. 2010, 53, 2719−2740.
(28) Abad-Zapatero, C. Ligand efficiency indices for effective drug discovery. Expert Opin. Drug Discovery 2007, 2, 469−488.
(29) Muegge, I. PMF scoring revisited. J. Med. Chem. 2006, 49, 5895−5902.
(30) James, G.; Witten, D.; Hastie, T.; Tibshirani, R. Classification. In An Introduction to Statistical Learning: with Applications in R; Springer New York: New York, NY, 2013; pp 127−173.
(31) Bewick, V.; Cheek, L.; Ball, J. Statistics review 14: Logistic regression. Crit. Care 2005, 9, 112−118.
(32) Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; Vanderplas, J.; Passos, A.; Cournapeau, D.; Brucher, M.; Perrot, M.; Duchesnay, E. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825−2830.
(33) Triballeau, N.; Acher, F.; Brabet, I.; Pin, J. P.; Bertrand, H. O. Virtual screening workflow development guided by the ″receiver operating characteristic″ curve approach. Application to high-throughput docking on metabotropic glutamate receptor subtype 4. J. Med. Chem. 2005, 48, 2534−2547.
(34) Mysinger, M. M.; Carchia, M.; Irwin, J. J.; Shoichet, B. K. Directory of Useful Decoys, Enhanced (DUD-E): Better Ligands and Decoys for Better Benchmarking. J. Med. Chem. 2012, 55, 6582−6594.
(35) Mertz, J. A.; Conery, A. R.; Bryant, B. M.; Sandy, P.; Balasubramanian, S.; Mele, D. A.; Bergeron, L.; Sims, R. J., III Targeting MYC dependence in cancer by inhibiting BET bromodomains. Proc. Natl. Acad. Sci. U. S. A. 2011, 108, 16669−16674.
(36) Raccuglia, P.; Elbert, K. C.; Adler, P. D.; Falk, C.; Wenny, M. B.; Mollo, A.; Zeller, M.; Friedler, S. A.; Schrier, J.; Norquist, A. J. Machine-learning-assisted materials discovery using failed experiments. Nature 2016, 533, 73−76.
(37) Li, L.; Khanna, M.; Jo, I.; Wang, F.; Ashpole, N. M.; Hudmon, A.; Meroueh, S. O. Target-Specific Support Vector Machine Scoring in Structure-Based Virtual Screening: Computational Validation, In Vitro Testing in Kinases, and Effects on Lung Cancer Cell Proliferation. J. Chem. Inf. Model. 2011, 51, 755−759.
(38) Li, Y.; Wang, L.; Liu, Z.; Li, C.; Xu, J.; Gu, Q.; Xu, J. Predicting selective liver X receptor β agonists using multiple machine learning methods. Mol. BioSyst. 2015, 11, 1241−1250.
(39) Heikamp, K.; Bajorath, J. Support vector machines for drug discovery. Expert Opin. Drug Discovery 2014, 9, 93−104.
(40) Albrecht, B. K.; Gehling, V. S.; Hewitt, M. C.; Vaswani, R. G.; Cote, A.; Leblanc, Y.; Nasveschuk, C. G.; Bellon, S.; Bergeron, L.; Campbell, R.; Cantone, N.; Cooper, M. R.; Cummings, R. T.; Jayaram, H.; Joshi, S.; Mertz, J. A.; Neiss, A.; Normant, E.; O’Meara, M.; Pardo, E.; Poy, F.; Sandy, P.; Supko, J.; Sims, R. J., III; Harmange, J.-C.; Taylor, A. M.; Audia, J. E. Identification of a Benzoisoxazoloazepine Inhibitor (CPI-0610) of the Bromodomain and Extra-Terminal (BET) Family as a Candidate for Human Clinical Trials. J. Med. Chem. 2016, 59, 1330−1339.
(41) Xue, X.; Zhang, Y.; Liu, Z.; Song, M.; Xing, Y.; Xiang, Q.; Wang, Z.; Tu, Z.; Zhou, Y.; Ding, K.; Xu, Y. Discovery of Benzo[cd]indol-2(1H)-ones as Potent and Specific BET Bromodomain Inhibitors: Structure-Based Virtual Screening, Optimization, and Biological Evaluation. J. Med. Chem. 2016, 59, 1565−1579.
(42) McKeown, M. R.; Shaw, D. L.; Fu, H.; Liu, S.; Xu, X.; Marineau, J. J.; Huang, Y.; Zhang, X.; Buckley, D. L.; Kadam, A.; Zhang, Z.; Blacklow, S. C.; Qi, J.; Zhang, W.; Bradner, J. E. Biased Multicomponent Reactions to Develop Novel Bromodomain Inhibitors. J. Med. Chem. 2014, 57, 9019−9027.
(43) Gacias, M.; Gerona-Navarro, G.; Plotnikov, A. N.; Zhang, G.; Zeng, L.; Kaur, J.; Moy, G.; Rusinova, E.; Rodriguez, Y.; Matikainen, B.; Vincek, A.; Joshua, J.; Casaccia, P.; Zhou, M.-M. Selective Chemical Modulation of Gene Transcription Favors Oligodendrocyte Lineage Progression. Chem. Biol. 2014, 21, 841−854.
