
Article

A Benchmark Study Based on 2P2IDB to Gain Insights Into the Discovery of Small-Molecule PPI Inhibitors

Zhe Wang, Yu Kang, Dan Li, Huiyong Sun, Xiaowu Dong, Xiaojun Yao, Lei Xu, Shan Chang, Youyong Li, and Tingjun Hou

J. Phys. Chem. B, Just Accepted Manuscript • DOI: 10.1021/acs.jpcb.7b12658 • Publication Date (Web): 08 Feb 2018


The Journal of Physical Chemistry

Authors
Zhe Wang1, Yu Kang1, Dan Li1, Huiyong Sun1, Xiaowu Dong1, Xiaojun Yao2, Lei Xu3, Shan Chang3, Youyong Li4, Tingjun Hou1,*

Affiliation
1College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
2State Key Laboratory of Quality Research in Chinese Medicine, Macau Institute for Applied Research in Medicine and Health, Macau University of Science and Technology, Avenida Wai Long, Taipa, Macau (SAR), China
3Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou 213001, China
4Institute of Functional Nano and Soft Materials (FUNSOM), Soochow University, Suzhou, Jiangsu 215123, China

*To whom correspondence should be addressed

Corresponding Author Tel: +86-571-88208412. Email: [email protected] or [email protected].


ABSTRACT

Protein-protein interactions (PPIs) have been regarded as novel and highly promising targets in drug discovery. Numerous experimental techniques and computational approaches have been developed over the past two decades to assist the design of PPI modulators. However, the identification and optimization of small-molecule inhibitors targeting PPIs remains particularly challenging because of the ‘undruggable’ profiles of PPI interfaces. In silico screening, especially docking-based virtual screening (DBVS), has emerged as an effective method to complement experimental high-throughput screening (HTS) in identifying novel and potent small-molecule PPI inhibitors. Here, based on the 2P2IDB database, we explored the structural features of the known small-molecule PPI inhibitors and analyzed the characteristics of the PPI binding pockets. More importantly, we evaluated the sampling power and screening power of six popular docking programs for PPI targets. Our results indicate that (1) chlorinated conjugate groups and amide-like linkages are two types of privileged fragments of PPI inhibitors; (2) the average druggability of the binding sites of the PPI targets in 2P2IDB is slightly worse than that of traditional targets; and (3) both academic and commercial docking programs exhibit acceptable accuracy in pose prediction for PPI inhibitors, but their screening power for identifying PPI inhibitors is still not satisfactory. We expect that our work can provide valuable guidance on the construction of PPI-focused libraries, the determination of druggable PPI binding pockets, and the selection of docking programs for the screening of small-molecule PPI inhibitors.

Keywords: Protein-protein interaction (PPI), Small-molecule PPI inhibitor, Molecular docking, Virtual screening, Hot spot, Benchmark


1. INTRODUCTION

In the post-genomic era, the interactome, especially the protein interactome, has become a pivotal focus of biological research because of its significance in regulating multiple vital cellular processes.1-3 The protein interactome is the full repertoire of protein-protein interactions (PPIs) that can occur in an organism. In living cells, only a small fraction of proteins exert their biological functions independently, and the vast majority (more than 80%) fulfill their duties by interacting with other molecules.4 More importantly, a large number of critical cellular processes and biochemical events, including gene expression, signal transduction, and membrane transport, are achieved through the corresponding PPIs.5, 6 It has been widely recognized that the construction of PPI networks not only plays a key role in predicting protein functions but also provides valuable information for finding “druggable” targets.7, 8 More and more studies reveal that PPIs are involved in different types of disease pathways where therapeutic interventions would bring significant benefits.9-11 For example, blockage of the interaction between programmed cell death protein 1 (PD-1) and its ligand PD-L1 by an antagonist has emerged as an effective way to combat several types of cancer;12, 13 disruption of the interaction between anti- and proapoptotic B-cell lymphoma-leukemia 2 (Bcl-2) proteins by an inhibitor can reactivate apoptosis in malignant cells for cancer therapy.14, 15 It is therefore understandable that PPIs are becoming highly attractive targets for drug discovery, even though this target class was deemed essentially ‘undruggable’ a few years ago.

The development of PPI drugs is filled with both opportunities and challenges. On the one hand, with the release of more crystallographic structures of protein-protein complexes and of different kinds of PPI structural databases, e.g., TIMBAL,16 2P2IDB,17 PrePPI,18 Structure-PPi,19 and PPI3D,20 structure-based approaches can be used to design PPI modulators rationally. On the other hand, the relatively flat and featureless binding surfaces of PPIs remain one of the most challenging obstacles to the development of potent PPI drugs.

Generally, PPI modulators can be divided into three major categories: (1) humanized monoclonal antibodies, (2) peptides and peptidomimetics, and (3) small molecules. Although each class has its own advantages and disadvantages, from the perspective of medicinal chemistry and current drug development, small-molecule modulators appear more amenable.21 Over the last two decades, numerous methodologies and strategies have been developed to design small-molecule PPI inhibitors.22-24 Notably, in silico screening, a computational approach complementary to experimental high-throughput screening (HTS), has been broadly applied to search for small-molecule PPI inhibitors.25-27 Docking-based virtual screening (DBVS) is one of the most widely used structure-based methods and is employed against both PPI targets and traditional drug targets. Several additional issues, including the construction of PPI-focused virtual compound libraries and the identification of druggable binding sites, need to be considered when applying DBVS to PPI targets. In recent years, much attention has been paid to these issues. For example, Reynès and co-workers developed a program named PPI-HitProfiler to generate a focused chemical library enriched with putative PPI inhibitors using machine-learning methods.28 As another example, Bai et al. proposed an integrated approach using molecular fragment docking and coevolutionary analysis to estimate druggable protein-protein interfaces.29 However, relatively few studies have systematically benchmarked the performance of current docking programs on PPI inhibitors.

In the present study, six docking programs (AutoDock Vina, PLANTS, rDock, Glide, GOLD and Surflex-Dock) that perform well for small molecules against traditional targets were selected for benchmarking based on the 2P2IDB database. Briefly, our work can be divided into three parts: (1) the structural features of PPI inhibitors were discussed based on a comparison between 238 PPI inhibitors and 1822 FDA-approved small-molecule drugs; (2) the characteristics of PPI binding sites were analyzed according to a comparison between the PPI targets in 2P2IDB and the traditional targets in the PDBbind core set; (3) the sampling power and screening power of the selected docking programs were evaluated systematically.


2. MATERIALS AND METHODS

2.1. Benchmark Data Set

2P2IDB is a manually curated database dedicated to the structures of PPIs with known small-molecule inhibitors.17, 30 The latest version of 2P2IDB (version 2.0) contains 300 protein-ligand complexes for 26 PPIs. The complexes are roughly categorized into three classes, Class1, Class2 and Class3, which correspond to protein-peptide complexes, globular protein-protein complexes, and bromodomain/histone protein-protein complexes, respectively. To avoid possible failures in the subsequent calculations, each structure was checked manually to make sure that the crystal structure is collision-free and has only a single ligand interacting with the target. Finally, a collection of 289 protein-ligand complexes chosen from 2P2IDB was used in our study. The excluded entries are listed in Table S1 in the Supporting Information.

2.2. Structure Preparation

The structures of the protein-ligand complexes were standardized using an in-house script. Briefly, the Structure Preparation and MM (molecular mechanics optimization) functions in Molecular Operating Environment (MOE) 2014.09 (Chemical Computing Group Inc., Montreal, QC, Canada) were applied to the structure of each complex, and the prepared structure was then split into a protein and a ligand. The rotated and optimized 3D structure of each ligand for the docking benchmark was generated automatically from the crystal structure by an in-house Python script used in our previous study.31 Specifically, the original conformation of each ligand was rotated 180° around the Z axis in three-dimensional space, followed by a structural optimization. The initial 3D conformation of each ligand for virtual screening was obtained from the 2D structure in 2P2IDB using the MM function in MOE. The decoy molecules for the screening power evaluation were generated with the automated decoy generation method provided by DUD-E (http://dude.docking.org/), and their 3D structures were created using the MOE suite.32
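The 180° rotation of the ligand about the Z axis can be illustrated with a short sketch. This is not the in-house script mentioned above; the function name `rotate_about_z` and the toy coordinates are hypothetical, the rotation axis is assumed to pass through the coordinate origin (the text does not say whether the origin or the ligand centroid was used), and the subsequent structural optimization is not reproduced.

```python
import math


def rotate_about_z(coords, angle_deg=180.0):
    """Rotate a list of (x, y, z) coordinates about the Z axis.

    Assumption: the axis passes through the coordinate origin.
    """
    theta = math.radians(angle_deg)
    c, s = math.cos(theta), math.sin(theta)
    # Standard rotation about Z: x and y mix, z is unchanged.
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in coords]


# Hypothetical two-atom "ligand" used only to show the call.
ligand = [(1.0, 2.0, 3.0), (-0.5, 0.0, 1.5)]
rotated = rotate_about_z(ligand)
```

A 180° rotation about Z simply flips the signs of x and y, which moves the ligand away from its crystallographic pose so that the docking program cannot trivially reuse the input orientation.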


2.3. Comparison of Ligand Structures

After removing the duplicates, a total of 238 PPI inhibitors from 2P2IDB were used for structure comparison. The structures of 1822 FDA-approved small-molecule drugs (molecular weight ≤ 2000 g/mol) were obtained from the e-Drug3D database.33 To gain deeper insight into the differences between PPI inhibitors and marketed drugs, the distributions of seven important molecular properties were compared for the two groups. These molecular properties, namely molecular weight, number of OH and NH groups, number of O and N atoms, logP (MOE logP model unpublished. Source code in $MOE/lib/svl/quasar.svl/q_logp.svl), aqueous solubility (logS)34, topological polar surface area (TPSA)35, and number of rotatable bonds, were calculated with MOE.

Furthermore, based on the ECFP_6 fingerprints, a classifier was developed to distinguish PPI inhibitors from marketed drugs using the naïve Bayesian classification (NBC) technique in Discovery Studio 2.5 (Accelrys Software Inc., San Diego, CA, USA). Briefly, each molecule was labeled as a PPI inhibitor (1) or a non-PPI inhibitor (0), and the ECFP_6 fingerprints were then calculated as the feature variables for classification based on Bayes' theorem. Compared with other machine learning approaches, NBC can handle large-scale data, trains quickly, and is tolerant of random noise. Moreover, the naïve Bayesian classifier provides a weight (or score) for each feature using a Laplacian-adjusted probability estimate, so the importance of each fragment characterized by a fingerprint can be evaluated quantitatively. Fragments with high scores contribute positively to PPI-inhibitor likeness, and fragments with low scores contribute negatively. The classifier can therefore be used to discriminate PPI inhibitors from traditional drugs and, in doing so, to highlight the fragments that matter most for the discrimination. A similar, detailed procedure for training a naïve Bayesian classifier has been described previously.36-42
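To make the Laplacian-adjusted feature weight more concrete, the following sketch uses the estimator commonly described for fingerprint-based Bayesian categorization: a feature seen in `n_f` training molecules, `a_f` of them active, receives the weight log((a_f + 1) / (n_f · P + 1)), where P is the overall fraction of actives. This is an illustrative assumption rather than the Discovery Studio implementation, and the feature identifiers and toy training set are hypothetical.

```python
import math
from collections import Counter


def laplacian_feature_scores(samples):
    """Return a Laplacian-corrected log weight for every feature.

    samples: list of (set_of_feature_ids, label), label 1 = PPI inhibitor,
    label 0 = marketed drug.
    """
    n_total = len(samples)
    n_active = sum(label for _, label in samples)
    base_rate = n_active / n_total  # P(active) over the training set

    seen = Counter()         # molecules containing the feature
    seen_active = Counter()  # active molecules containing the feature
    for features, label in samples:
        for f in features:
            seen[f] += 1
            if label == 1:
                seen_active[f] += 1

    # The "+1" terms pull rarely observed features toward a weight of zero.
    return {
        f: math.log((seen_active[f] + 1) / (seen[f] * base_rate + 1))
        for f in seen
    }


def score_molecule(features, scores):
    """Sum the weights of the features present; unseen features add nothing."""
    return sum(scores.get(f, 0.0) for f in features)


# Hypothetical training set: features "a", "b", "c" stand in for ECFP_6 bits.
training = [({"a", "b"}, 1), ({"a"}, 1), ({"b", "c"}, 0), ({"c"}, 0)]
scores = laplacian_feature_scores(training)
```

In this toy set, feature "a" occurs only in inhibitors and gets a positive weight, "c" occurs only in drugs and gets a negative weight, and "b" occurs equally in both and gets a weight of zero, which mirrors how high-scoring and low-scoring fragments are read in the text.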

2.4. Analysis of Binding Pocket


In addition to 2P2IDB, the structures of the PDBbind core set (version 2016) were included for a comparative analysis.43 To ensure that the two datasets do not overlap, all PPI entries (PDB entries: 3P5O, 3U5J, 4BKT, 4LZS, 4OGJ, 4W9C, 4W9H, 4W9I, 4W9L and 4WIV) in the PDBbind core set were excluded. All protein structures were prepared using the Protein Preparation Wizard (PrepWizard) with the default settings. To avoid bias in searching binding poses, all crystallographic water molecules were removed. Then, the SiteMap (version 3.6) module of the Schrödinger suite (Schrödinger, LLC, New York, NY) was used to analyze the binding site properties of each protein.44 The binding site of each protein was located according to the coordinates of the co-crystallized ligand. The sitebox and grid parameters were set to 5.0 Å and 0.35 Å, respectively. To detect shallow binding sites, the enclosure and maxvdw parameters were set to 0.4 and 0.55 kcal/mol, respectively. In addition, the dthresh and rthresh parameters were set to 7.5 and 6 Å to guarantee that nearby subsites were merged with the main site. The other parameters were kept at their default values. Finally, seven SiteMap properties of the top-ranked site of each protein were computed to characterize the binding pocket: SiteScore (for binding-site identification), Dscore (for classifying druggability), Volume (site volume), Exposure (amount of exposure to solvent), Enclosure (degree of enclosure by the protein), Contact (average grid contact strength with the protein) and Balance (the ratio of relative hydrophobicity to hydrophilicity of the binding site).

2.5. Molecular Docking

The six docking programs employed in our benchmarking study comprise three academic programs, AutoDock Vina (version 1.1.2),45 PLANTS (version 1.2)46 and rDock (version 2013.1),47 and three commercial programs, Glide (version 68015),48 GOLD (version 5.2)49 and Surflex-Dock (version 2.706.13302).50 The selection of these programs (except PLANTS) is mainly based on our previous evaluation of ten docking programs on a diverse set of protein-ligand complexes.31 Moreover, given the recent successful applications of PLANTS in VS, we decided to include it in this benchmark.51, 52 For each target, the docking site was determined from the coordinates of the co-crystallized PPI inhibitor. The maximum number of docking conformations was set to 20 and the pose clustering distance cutoff was set to 0.5 Å for all the tested docking programs. The other critical parameters and scoring functions used for each program are described as follows:

AutoDock Vina. The default optimization parameters were used for conformation sampling and the poses were scored by the default scoring function. For each docking run, only single-threaded execution was requested (the cpu parameter was set to 1).

PLANTS. The conformations were sampled under the screen mode based on the ant colony optimization (ACO) algorithm and then scored using the PLANTS_CHEMPLP scoring function. The search_speed parameter was set to speed1.

rDock. The standard docking protocol was used to generate low-energy binding poses. Briefly, three stages of Genetic Algorithm search (GA1, GA2 and GA3) were executed first, followed by low-temperature Monte Carlo (MC) and Simplex minimization (MIN) stages. The docking score was calculated by the default scoring function.

Glide. To obtain the best balance between accuracy and speed, conformation sampling and scoring were carried out under the standard precision (SP) mode of Glide. The OPLS-2005 force field was chosen for all docking calculations.

GOLD. To apply optimal settings for conformation sampling, the autoscale parameter was set to 1, which means that 100% search efficiency was employed for each ligand. In addition, the early_termination option was turned on, so that GOLD terminates the docking runs on a given ligand as soon as a specified number of runs have given essentially the same answer. Docking poses were scored using the ChemPLP scoring function.

Surflex-Dock. The “-pgeom” option was specified to select the built-in default parameter set for docking. Docking poses were ranked by the total scores of Surflex-Dock.


2.6. Virtual Screening

To reduce the computational cost, three representative PPI targets with a sufficient number of known inhibitors, one from each of the three classes, were chosen for assessing the screening power of the six docking programs: human double minute 2 (HDM2), von Hippel-Lindau (VHL), and BRD4 bromodomain 1 (BRD4-1). In VS, the ratio of inhibitors to decoys was set to 1:50. Detailed information about the targets and screening datasets is listed in Table S2 in the Supporting Information. To eliminate the induced-fit bias, the apo structure of each target extracted from the protein-protein/protein-peptide complex in 2P2IDB (PDB entries: 4YCR for HDM2, 4AJY for VHL, and 3UVW for BRD4-1) was used for screening. For each target, the coordinates of the inhibitor were used to determine the docking site. The maximum number of docking conformations and the pose clustering distance cutoff were set to 10 and 2.0 Å, respectively, while the other parameters were the same as those described in the previous section.

2.7. Assessment Methods

In molecular docking, the heavy-atom RMSD (root-mean-square deviation) between the docked binding pose and the native binding pose was used as the key criterion for evaluating the sampling power of each docking program, and a docking was regarded as successful if the RMSD was less than 2.0 Å. The pose with the highest docking score (referred to as the top scored pose) and the pose closest to the native binding pose (referred to as the best pose) were both analyzed. In VS, the AUC (area under the curve) of the ROC (receiver operating characteristic) plot was used to measure the screening power of each docking program. An ROC curve plots the true-positive rate against the false-positive rate over all compounds, and the AUC is the probability that a randomly chosen active compound is ranked ahead of a randomly chosen decoy. The pre-calculated data for the ROC plots were obtained with the enrichment.py script available from the Schrödinger Script Center (https://www.schrodinger.com/scriptcenter).
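The two assessment metrics can be sketched in a few lines of plain Python. This is a minimal illustration, not the enrichment.py script: it assumes that both poses list the same heavy atoms in the same order, applies no superposition or symmetry correction, and assumes that a higher score means a better rank (scores from programs that report lower-is-better energies would need to be negated first). The function names and toy numbers are hypothetical.

```python
import math


def heavy_atom_rmsd(pose, native):
    """RMSD between two lists of heavy-atom (x, y, z) coordinates.

    Assumption: same atoms, same order, same reference frame.
    """
    if len(pose) != len(native) or not pose:
        raise ValueError("poses must be non-empty and of equal length")
    squared = sum(
        (a - b) ** 2 for p, q in zip(pose, native) for a, b in zip(p, q)
    )
    return math.sqrt(squared / len(pose))


def is_successful_docking(rmsd, cutoff=2.0):
    """Success criterion from the text: RMSD below 2.0 angstroms."""
    return rmsd < cutoff


def roc_auc(active_scores, decoy_scores):
    """AUC as the probability that an active outscores a decoy.

    Ties count as one half. Assumes higher score = better rank.
    """
    wins = 0.0
    for a in active_scores:
        for d in decoy_scores:
            if a > d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(active_scores) * len(decoy_scores))


# Hypothetical example: the pose is shifted by 1 angstrom along z.
native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
pose = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
rmsd = heavy_atom_rmsd(pose, native)

# Hypothetical scores for two actives and three decoys.
auc = roc_auc([0.9, 0.8], [0.85, 0.1, 0.2])
```

The pairwise form of the AUC is quadratic in the number of compounds, which is acceptable for a sketch; a rank-based computation would be preferable for full screening libraries at a 1:50 inhibitor-to-decoy ratio.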


3. RESULTS AND DISCUSSION

3.1. The structural features of small-molecule PPI inhibitors

A reasonable PPI-focused library is a prerequisite for identifying potential PPI inhibitors in DBVS campaigns. To facilitate the construction of high-quality chemical databases enriched with putative PPI inhibitors, a proper understanding of the physicochemical and structural features of PPI inhibitors is necessary. Fig. 1 depicts the distributions of seven important physicochemical properties for 238 existing PPI inhibitors and 1822 FDA-approved small-molecule drugs. The results show no significant difference between the two groups in the numbers of OH and NH groups (Lipinski’s hydrogen bond donors), the numbers of O and N atoms (Lipinski’s hydrogen bond acceptors), TPSA, or the numbers of rotatable bonds. In contrast, the average molecular weight is 460 g/mol for PPI inhibitors versus 380 g/mol for regular drugs, the average logP is 4.03 versus 2.43, and the average logS is -5.84 versus -3.93. These characteristics are consistent with the general qualitative trends for PPI inhibitors highlighted in other studies, i.e., higher molecular weight, higher hydrophobicity, and lower solubility.53, 54 Although increasing the molecular weight and hydrophobicity of compounds can maximize their potency against PPI targets, it may cause a series of unfavorable pharmacokinetic issues during further development that eventually lead to failure. Balanced libraries for PPI inhibitors are therefore still urgently needed.

Please insert Figure 1

From the viewpoint of medicinal chemistry, revealing the privileged fragments of PPI inhibitors is of great significance for rationalizing library construction and lead optimization for PPIs. To determine the fragments that are important for PPI-inhibitor likeness, a naïve Bayesian classifier based on the ECFP_6 fingerprints was developed. This classifier was validated with 5-fold cross-validation and had an ROC score of 0.97 for the discrimination of PPI inhibitors from traditional drugs. The important fragments with favorable or unfavorable contributions to the discrimination were then highlighted by their Bayesian scores. Among the top 10 privileged fragments of PPI inhibitors shown in Fig. 2, chlorine-containing groups appear with a relatively high frequency (fragments 1, 2, 3, 4 and 7). It is quite possible that the introduction of a chlorine substituent increases the lipophilicity of the whole molecule, which is conducive to the enhancement of bioactivity.55 Interestingly, amide groups (fragment 5) and amide-like linkages (fragments 9 and 10) are also favorable fragments for PPI inhibitors. This is because small-molecule peptide mimetics account for the vast majority of existing PPI inhibitors. Compared with those of PPI inhibitors, the privileged fragments of marketed drugs are relatively featureless and hard to interpret. However, the fragments from existing drugs may be useful for experimental and computational chemists who want to design PPI inhibitors with improved drug-likeness.

Please insert Figure 2

3.2. The characteristics of binding sites of PPI targets

PPIs are usually classified as difficult or even ‘undruggable’ targets owing to their extended surface areas and shallow interactions at protein-protein binding interfaces. Thus, identifying the PPI targets with druggable binding sites is a crucial step in the development of therapeutics. As listed in Table 1, the average values of the SiteMap properties computed for the 289 binding sites of the PPI targets from 2P2IDB differ in intriguing ways from those computed for the 274 binding sites of the conventional drug targets from the PDBbind core set (the density plots of these properties are also provided in the Supporting Information). The smaller pocket volumes, higher exposure scores and lower enclosure scores of the PPI binding sites indicate that they are too narrow and flat to bury a bound ligand. However, the average values of SiteScore and Dscore indicate that the evaluated PPI binding sites are still druggable, although slightly less so than the traditional ones. One possible explanation is that the relatively higher hydrophobicity of PPI binding sites (reflected in a larger Balance value) compensates for the shape defect and enhances their druggability. These quantitative results not only give deeper insight into the differences in binding site features between prominent PPI targets and traditional drug targets, but also provide a reference standard for the selection of PPI binding sites.

Please insert Table 1

2P2IDB is composed of three different classes of PPI targets whose interfaces have their own characteristics. Thus, a further analysis of the PPI binding sites for the three subsets of 2P2IDB was also conducted. Fig. 3 shows the distributions of the seven SiteMap properties for the binding sites from the different categories. The SiteScore and Dscore distributions of Class1 and Class2 are similar and lower than those of Class3, suggesting that the binding sites in Class3 are more druggable. In addition, the binding sites in Class3 are relatively larger (higher distribution of Volume) and deeper (lower distribution of Exposure and higher distribution of Enclosure). It is noteworthy that there are several extreme cases in the distributions. For example, in Class2, there are 8 binding sites (PDB entries: 4LV6, 4LUC, 4L7D, 4IFN, 4N1B, 4IQK, 4L7B, and 4L7C) with pocket volumes larger than 450 Å³; in Class1, there are 5 binding sites (PDB entries: 4C5D, 4LWV, 4OGN, 4LWU, and 4JSC) with Balance values higher than 20. Overall, the binding sites in Class3 are more likely to be tight-binding sites than those in Class1 and Class2.

Please insert Figure 3

3.3. The sampling power of current docking programs for PPI-inhibitor systems

According to the results of the sampling power benchmark on the entire 2P2IDB for all the tested programs (Fig. 4A), the success rates (the percentage of correctly docked ligands) for the top scored poses range from about 40 to 50%, and those for the best poses range from about 63 to 75%. In terms of the success rates for the top scored poses, the docking programs rank as follows: GOLD (50.5%) > AutoDock Vina (47.4%) > PLANTS (46.4%) > Surflex-Dock (42.6%) > rDock (41.9%) > Glide (40.5%). For the best poses, the rank order changes significantly: Surflex-Dock (75.8%) > AutoDock Vina (74.4%) > rDock (73.0%) > Glide (67.8%) > GOLD (64.0%) > PLANTS (63.0%). Thus, GOLD and Surflex-Dock exhibit the best performance for the top scored poses and the best poses, respectively. These results suggest that the commercial docking programs perform slightly better than the academic ones in terms of sampling power for PPI targets. It is worth noting that AutoDock Vina, an open-source program designed and implemented by Dr. Oleg Trott at the Scripps Research Institute (http://www.scripps.edu/), achieves the second-best accuracy for both the top scored poses and the best poses.

Please insert Figure 4

On the other hand, the consistent rate for each docking program was also calculated. The consistent rate is defined as SRtsp/SRbp, where SRtsp and SRbp are the success rates for the top scored poses and best poses, respectively. To some extent, this parameter reflects the ranking power of a docking program, which is the ability to identify the near-native pose out of a set of docked decoy poses. As we can see from Fig. 4B, GOLD and PLANTS achieve the highest consistent rates (78.9% for GOLD and 73.6% for PLANTS) for the PPI-inhibitor systems in 2P2IDB. Interestingly, such results agree with those reported in our previous benchmarking study (the consistent rate of GOLD is 82.5%), suggesting that the scoring function of GOLD (ChemPLP) is relatively more accurate and robust among the scoring functions used in the six evaluated docking programs.31 The widely used commercial docking program Surflex-Dock exhibits the lowest consistent rate (56.2%), even though it achieves the highest success rate of pose sampling. Therefore, a rescoring step after docking is still necessary to make up for the drawbacks of current docking scoring functions. In addition, the pose prediction performances of the evaluated docking programs were further dissected for each class. As illustrated in Fig. 5, the performances are not uniform when the tested programs are applied to different classes. For example, on the basis of the results for the top scored poses, GOLD shows the best performance on Class2 and Class3, while AutoDock Vina displays a better accuracy than GOLD on Class1; Surflex-Dock shows a mediocre performance on Class1 and Class2, but performs poorly on Class3. The average success rates for the top scored poses of the three academic programs (AutoDock Vina, PLANTS and rDock) and the three commercial programs (Glide, GOLD and Surflex-Dock) are 49.0% and 47.9% for Class1, 52.9% and 53.8% for Class2, and 35.6% and 34.3% for Class3, respectively, indicating that the commercial programs have no obvious advantage in pose prediction, at least for the top scored poses. On the whole, the average success rates of all programs for the best poses are 64.0% (Class1), 73.8% (Class2) and 73.4% (Class3). In other words, the accurate prediction of the binding structures for peptide inhibitors against PPIs is relatively more challenging.
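As a worked example of the consistent rate SRtsp/SRbp, the short sketch below recomputes it from the success rates quoted in the text. The dictionary and function names are illustrative choices. Because the inputs are already rounded percentages, a recomputed value may differ from the reported one in the last digit.

```python
# Success rates (%) quoted in the text for the whole benchmark.
top_scored = {
    "GOLD": 50.5, "AutoDock Vina": 47.4, "PLANTS": 46.4,
    "Surflex-Dock": 42.6, "rDock": 41.9, "Glide": 40.5,
}
best_pose = {
    "Surflex-Dock": 75.8, "AutoDock Vina": 74.4, "rDock": 73.0,
    "Glide": 67.8, "GOLD": 64.0, "PLANTS": 63.0,
}


def consistent_rate(sr_tsp: float, sr_bp: float) -> float:
    """Consistent rate in percent: SRtsp / SRbp."""
    return 100.0 * sr_tsp / sr_bp


rates = {name: consistent_rate(top_scored[name], best_pose[name])
         for name in top_scored}

# Print the programs from highest to lowest consistent rate.
for name, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:14s} {rate:5.1f}%")
```

The ordering reproduces the observation in the text: GOLD ranks first and Surflex-Dock last, even though Surflex-Dock samples near-native poses most often.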

Please insert Figure 5

The analysis of the failure cases may offer valuable information for the users and developers of docking programs. All of the failure cases are summarized in Table S3 in the Supporting Information. The binding structures of the PPI inhibitors in 18 crystal structures could not be well predicted by any of the tested docking programs. In order to figure out the possible reasons for these unsuccessful dockings, both the ligand properties and the binding site features of the failure cases were analyzed. Among the 18 unsuccessfully docked PPI inhibitors, only 4 ligands (one identical ligand bound to 4 different receptors) have more than 15 rotatable bonds, implying that ligand flexibility may not be the dominating factor leading to the failures. On the other hand, about 78% (14/18) of the ligands are not neutral. For instance, the Bcl-xL inhibitor BM501 carries a formal charge of +2 owing to the protonation of the nitrogen atoms of the piperazine group. As depicted in Fig. 6, the large conformational deviation between the docking poses and the crystal structure, especially at the charged group, implies that the accuracy of current docking methodologies is reduced significantly when handling charged systems. Another finding is that the 18 failure cases cover about 64% (7/11) of the targets whose binding sites cannot be successfully recognized by SiteMap (no valid binding pocket was identified by SiteMap). In addition, 4 other difficult or undruggable binding sites with Dscore less than 0.8 are also among these failures. Interestingly, we also observe that there is only 1 failure case in Class3, which is the most druggable target category in 2P2IDB. Apparently, accurate predictions of the binding poses for unrecognizable or poor PPI binding sites are extremely challenging for current docking programs.

Please insert Figure 6

3.4. The screening power of current docking programs for PPI targets

The ROC curves for the three representative targets from the three classes in 2P2IDB are shown in Fig. 7. As evidenced by the figure, there is a high degree of variability in the screening powers, not only among targets but also among docking programs. For VHL (Class2), acceptable early recovery results are obtained from Glide, GOLD and PLANTS, whereas for HDM2 (Class1) only Glide docking yields acceptable early recovery results. For BRD4-1 (Class3), none of the evaluated docking programs shows acceptable results. One possible reason for the poor screening capability of the tested docking programs for BRD4-1 is that we did not keep any crystal water molecules during the VS, although they play a crucial role in the binding of various inhibitors to the bromodomains.56, 57 It was reported that several structurally conserved water molecules at the base of the acetyl-lysine binding pocket of BRD4-1 can offer additional hydrogen bonding potential with small-molecule inhibitors.58, 59 However, the influence of water molecules on the VS of PPI targets is beyond the scope of this study, and will be discussed in our next work.
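For readers who want to reproduce this kind of ROC analysis, the area under the ROC curve can be sketched in a few lines. The example assumes that lower (more negative) docking scores are better, which depends on the program and scoring function, and it uses the pairwise (Mann-Whitney) formulation of the AUC. The function name and the toy scores are hypothetical and do not come from the study.

```python
from typing import Sequence


def roc_auc(active_scores: Sequence[float],
            decoy_scores: Sequence[float],
            lower_is_better: bool = True) -> float:
    """AUC as the probability that a random active outranks a random decoy.

    Ties between an active and a decoy count as half a win.
    """
    wins = 0.0
    for a in active_scores:
        for d in decoy_scores:
            if a == d:
                wins += 0.5
            elif (a < d) == lower_is_better:
                wins += 1.0
    return wins / (len(active_scores) * len(decoy_scores))


# Toy docking scores for illustration only (assumed kcal/mol-like units).
actives = [-9.1, -8.4, -7.0]
decoys = [-8.0, -6.5, -6.0, -5.2]
auc = roc_auc(actives, decoys)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to random ranking, so values close to 0.5 indicate that a program barely separates actives from decoys.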

Please insert Figure 7

As shown in Table 2, different docking programs yield variable screening powers even for the same target, as indicated by the relatively large fluctuations of the AUC values. For example, for VHL, the AUC values given by Glide and GOLD (0.82) are significantly higher than that given by Surflex-Dock (0.54); for HDM2, the AUC value given by Glide (0.77) is clearly higher than that given by rDock (0.41); for BRD4-1, the AUC value given by Glide (0.67) is clearly higher than that given by PLANTS (0.46). Overall, Glide achieved the best screening powers for all three tested targets. However, compared with conventional drug targets, the screening powers of the tested docking programs for the PPI targets deteriorate sharply. Taking Glide as an example, Repasky and co-workers reported that Glide (SP mode) can obtain an average AUC of 0.80 for 39 different conventional drug targets (e.g. serine proteases, kinases, metalloenzymes, nuclear hormone receptors, and folate enzymes) in the DUD dataset with a best-practice preparation scheme.60 However, for the three tested PPI targets in this study, the average AUC achieved by Glide is only 0.75. Even though proper structure preparation and the consideration of protein flexibility and solvent effects may improve the enrichment performance, one should keep in mind that it is still difficult to get the desired results by simply adopting a single step or a single program for the VS of PPI targets. A better strategy for the DBVS of PPI targets would be to combine different docking tools into a single platform, which can benefit from the advantages of different algorithms. For example, we can use Surflex-Dock to generate the binding poses of ligands, and then use Glide to rescore the poses predicted by Surflex-Dock. Furthermore, using molecular docking together with other structure-based methods such as pharmacophore modeling and molecular dynamics simulations, or with ligand-based approaches, may also achieve better results for screening PPI inhibitors.
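The averages discussed above can be checked directly against Table 2. The sketch below tabulates the reported AUC values and computes the mean per program; the variable names are illustrative, and the three-target mean is only a rough summary because the targets come from different classes.

```python
# AUC values as reported in Table 2 (target -> program -> AUC).
auc = {
    "HDM2":   {"AutoDock Vina": 0.50, "PLANTS": 0.63, "rDock": 0.41,
               "Glide": 0.77, "GOLD": 0.56, "Surflex-Dock": 0.42},
    "VHL":    {"AutoDock Vina": 0.60, "PLANTS": 0.79, "rDock": 0.59,
               "Glide": 0.82, "GOLD": 0.82, "Surflex-Dock": 0.54},
    "BRD4-1": {"AutoDock Vina": 0.63, "PLANTS": 0.46, "rDock": 0.64,
               "Glide": 0.67, "GOLD": 0.51, "Surflex-Dock": 0.55},
}

programs = list(auc["HDM2"])

# Mean AUC of each program over the three targets.
mean_auc = {p: sum(auc[t][p] for t in auc) / len(auc) for p in programs}

# Highest AUC obtained for each target by any program.
best_per_target = {t: max(values.values()) for t, values in auc.items()}

for p in sorted(mean_auc, key=mean_auc.get, reverse=True):
    print(f"{p:14s} mean AUC = {mean_auc[p]:.2f}")
```

The Glide mean reproduces the 0.75 quoted in the text, and for every target the Glide value equals the best AUC in the table (tied with GOLD for VHL).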

4. CONCLUSION

In this study, based on the 2P2IDB database, we explored the structural features of small-molecule PPI inhibitors and analyzed the characteristics of PPI binding pockets. Then, we evaluated the performances of six docking programs for the sampling and screening of PPI inhibitors. Although none of the tested docking programs performs satisfactorily on both binding pose prediction and virtual screening for PPI targets, several useful conclusions can be drawn, as summarized below. (1) Compared with traditional drugs, PPI inhibitors possess higher molecular weight, higher hydrophobicity and lower solubility. In addition, we further confirmed the top 10 privileged fragments (e.g. chloride group and amide group) for PPI inhibitors. (2) The average druggability of the binding sites of the PPI targets in 2P2IDB is slightly worse than that of traditional targets. The targets in Class3 are relatively more druggable than those in Class1 and Class2. (3) Surflex-Dock exhibits the best pose sampling power for the best poses with a success rate of 75.8%, while GOLD has the best pose ranking power with a consistent rate of 78.9%. Charged ligands and poor binding sites are the two main factors behind the docking failures of PPI inhibitors. (4) Although Glide shows a relatively better screening power for some PPI targets (HDM2 and VHL), the DBVS for PPI targets is still a big challenge for current docking programs. More rational and more accurate integrated strategies need to be developed to improve the performance of the VS for PPI targets.

ASSOCIATED CONTENT

Figure S1: The density plots of the seven SiteMap properties for the 2P2IDB database and PDBbind core set.

Figure S2: Cumulative distribution curves of RMSD for the entire database. The top scored poses (left) and best poses (right) were both analyzed. Dotted lines indicate a 2.0 Å RMSD cutoff.

Table S1: The detailed information about excluded entries for benchmark.

Table S2: The detailed information about targets and libraries for screening power evaluation.

Table S3: The failure cases of each docking program.

AUTHOR INFORMATION Corresponding Author Tel: +86-571-88208412. Email: [email protected] or [email protected]. Notes The authors declare no competing financial interest.

ACKNOWLEDGEMENTS This study was supported by the National Key R&D Program of China (2016YFA0501701; 2016YFB0201700), and the National Science Foundation of China (21575128; 81773632). We thank the National Supercomputer Center in Guangzhou (NSCC-GZ) for providing the computing resources.

REFERENCES

1. Bonetta, L., Protein-protein interactions: Interactome under construction. Nature 2010, 468, 851-4.
2. Perkel, J. M., Protein-Protein Interaction Technologies Toward a Human Interactome. Science 2010, 329, 463-465.
3. Yu, H.; Tardivo, L.; Tam, S.; Weiner, E.; Gebreab, F.; Fan, C.; Svrzikapa, N.; Hirozane-Kishikawa, T.; Rietman, E.; Yang, X.; Sahalie, J.; Salehi-Ashtiani, K.; Hao, T.; Cusick, M. E.; Hill, D. E.; Roth, F. P.; Braun, P.; Vidal, M., Next-generation sequencing to generate interactome datasets. Nat. Methods 2011, 8, 478-480.
4. Berggard, T.; Linse, S.; James, P., Methods for the detection and analysis of protein-protein interactions. Proteomics 2007, 7, 2833-2842.
5. Ivanov, A. A.; Khuri, F. R.; Fu, H. A., Targeting protein-protein interactions as an anticancer strategy. Trends Pharmacol. Sci. 2013, 34, 393-400.
6. Keskin, O.; Tuncbag, N.; Gursoy, A., Predicting Protein-Protein Interactions from the Molecular to the Proteome Level. Chem. Rev. 2016, 116, 4884-4909.
7. Letovsky, S.; Kasif, S., Predicting protein function from protein/protein interaction data: a probabilistic approach. Bioinformatics 2003, 19, i197-204.
8. Wang, Y. C.; Chen, S. L.; Deng, N. Y.; Wang, Y., Computational probing protein-protein interactions targeting small molecules. Bioinformatics 2016, 32, 226-234.
9. Ideker, T.; Sharan, R., Protein networks in disease. Genome Res. 2008, 18, 644-652.
10. Taylor, I. W.; Wrana, J. L., Protein interaction networks in medicine and disease. Proteomics 2012, 12, 1706-1716.
11. Hu, G.; Xiao, F.; Li, Y.; Li, Y.; Vongsangnak, W., Protein-Protein Interface and Disease: Perspective from Biomolecular Networks. Adv. Biochem. Eng. Biotechnol. 2016, 160, 57-74.
12. Zitvogel, L.; Kroemer, G., Targeting PD-1/PD-L1 interactions for cancer immunotherapy. Oncoimmunology 2012, 1, 1223-1225.
13. Chang, H. N.; Liu, B. Y.; Qi, Y. K.; Zhou, Y.; Chen, Y. P.; Pan, K. M.; Li, W. W.; Zhou, X. M.; Ma, W. W.; Fu, C. Y.; Qi, Y. M.; Liu, L.; Gao, Y. F., Blocking of the PD-1/PD-L1 Interaction by a D-Peptide Antagonist for Cancer Immunotherapy. Angew. Chem. Int. Ed. Engl. 2015, 54, 11760-11764.
14. Souers, A. J.; Leverson, J. D.; Boghaert, E. R.; Ackler, S. L.; Catron, N. D.; Chen, J.; Dayton, B. D.; Ding, H.; Enschede, S. H.; Fairbrother, W. J.; Huang, D. C. S.; Hymowitz, S. G.; Jin, S.; Khaw, S. L.; Kovar, P. J.; Lam, L. T.; Lee, J.; Maecker, H. L.; Marsh, K. C.; Mason, K. D.; Mitten, M. J.; Nimmer, P. M.; Oleksijew, A.; Park, C. H.; Park, C. M.; Phillips, D. C.; Roberts, A. W.; Sampath, D.; Seymour, J. F.; Smith, M. L.; Sullivan, G. M.; Tahir, S. K.; Tse, C.; Wendt, M. D.; Xiao, Y.; Xue, J. C.; Zhang, H. C.; Humerickhouse, R. A.; Rosenberg, S. H.; Elmore, S. W., ABT-199, a potent and selective BCL-2 inhibitor, achieves antitumor activity while sparing platelets. Nat. Med. 2013, 19, 202-208.
15. Delbridge, A. R. D.; Grabow, S.; Strasser, A.; Vaux, D. L., Thirty years of BCL-2: translating cell death discoveries into novel cancer therapies. Nat. Rev. Cancer 2016, 16, 99-109.
16. Higueruelo, A. P.; Schreyer, A.; Bickerton, G. R.; Pitt, W. R.; Groom, C. R.; Blundell, T. L., Atomic interactions and profile of small molecules disrupting protein-protein interfaces: the TIMBAL database. Chem. Biol. Drug Des. 2009, 74, 457-467.
17. Basse, M. J.; Betzi, S.; Bourgeas, R.; Bouzidi, S.; Chetrit, B.; Hamon, V.; Morelli, X.; Roche, P., 2P2Idb: a structural database dedicated to orthosteric modulation of protein-protein interactions. Nucleic Acids Res. 2013, 41, D824-827.
18. Zhang, Q. C.; Petrey, D.; Garzon, J. I.; Deng, L.; Honig, B., PrePPI: a structure-informed database of protein-protein interactions. Nucleic Acids Res. 2013, 41, D828-833.
19. Vazquez, M.; Valencia, A.; Pons, T., Structure-PPi: a module for the annotation of cancer-related single-nucleotide variants at protein-protein interfaces. Bioinformatics 2015, 31, 2397-2399.
20. Dapkunas, J.; Timinskas, A.; Olechnovic, K.; Margelevicius, M.; Diciunas, R.; Venclovas, C., The PPI3D web server for searching, analyzing and modeling protein-protein interactions in the context of 3D structures. Bioinformatics 2016, 33, 935-937.
21. Zinzalla, G.; Thurston, D. E., Targeting protein-protein interactions for therapeutic intervention: a challenge for the future. Future Med. Chem. 2009, 1, 65-93.
22. Milroy, L. G.; Grossmann, T. N.; Hennig, S.; Brunsveld, L.; Ottmann, C., Modulators of protein-protein interactions. Chem. Rev. 2014, 114, 4695-4748.
23. Modell, A. E.; Blosser, S. L.; Arora, P. S., Systematic Targeting of Protein-Protein Interactions. Trends Pharmacol. Sci. 2016, 37, 702-713.
24. Falchi, F.; Caporuscio, F.; Recanatini, M., Structure-based design of small-molecule protein-protein interaction modulators: the story so far. Future Med. Chem. 2014, 6, 343-357.
25. Zhong, S.; Macias, A. T.; MacKerell, A. D., Jr., Computational identification of inhibitors of protein-protein interactions. Curr. Top. Med. Chem. 2007, 7, 63-82.
26. Voet, A.; Banwell, E. F.; Sahu, K. K.; Heddle, J. G.; Zhang, K. Y., Protein interface pharmacophore mapping tools for small molecule protein: protein interaction inhibitor discovery. Curr. Top. Med. Chem. 2013, 13, 989-1001.
27. Johnson, D. K.; Karanicolas, J., Ultra-High-Throughput Structure-Based Virtual Screening for Small-Molecule Inhibitors of Protein-Protein Interactions. J. Chem. Inf. Model. 2016, 56, 399-411.
28. Reynès, C.; Host, H.; Camproux, A.-C.; Laconde, G.; Leroux, F.; Mazars, A.; Deprez, B.; Fahraeus, R.; Villoutreix, B. O.; Sperandio, O., Designing focused chemical libraries enriched in protein-protein interaction inhibitors using machine-learning methods. PLoS Comput. Biol. 2010, 6, e1000695.
29. Bai, F.; Morcos, F.; Cheng, R. R.; Jiang, H. L.; Onuchic, J. N., Elucidating the druggable interface of protein-protein interactions using fragment docking and coevolutionary analysis. Proc. Natl. Acad. Sci. U.S.A. 2016, 113, E8051-8058.
30. Bourgeas, R.; Basse, M. J.; Morelli, X.; Roche, P., Atomic analysis of protein-protein interfaces with known inhibitors: the 2P2I database. PLoS One 2010, 5, e9598.
31. Wang, Z.; Sun, H.; Yao, X.; Li, D.; Xu, L.; Li, Y.; Tian, S.; Hou, T., Comprehensive evaluation of ten docking programs on a diverse set of protein-ligand complexes: the prediction accuracy of sampling power and scoring power. Phys. Chem. Chem. Phys. 2016, 18, 12964-12975.
32. Mysinger, M. M.; Carchia, M.; Irwin, J. J.; Shoichet, B. K., Directory of useful decoys, enhanced (DUD-E): better ligands and decoys for better benchmarking. J. Med. Chem. 2012, 55, 6582-6594.
33. Pihan, E.; Colliandre, L.; Guichou, J. F.; Douguet, D., e-Drug3D: 3D structure collections dedicated to drug repurposing and fragment-based drug design. Bioinformatics 2012, 28, 1540-1541.
34. Hou, T. J.; Xia, K.; Zhang, W.; Xu, X. J., ADME evaluation in drug discovery. 4. Prediction of aqueous solubility based on atom contribution approach. J. Chem. Inf. Comput. Sci. 2004, 44, 266-275.
35. Ertl, P.; Rohde, B.; Selzer, P., Fast calculation of molecular polar surface area as a sum of fragment-based contributions and its application to the prediction of drug transport properties. J. Med. Chem. 2000, 43, 3714-3717.
36. Wang, S.; Li, Y.; Wang, J.; Chen, L.; Zhang, L.; Yu, H.; Hou, T., ADMET evaluation in drug discovery. 12. Development of binary classification models for prediction of hERG potassium channel blockage. Mol. Pharm. 2012, 9, 996-1010.
37. Tian, S.; Wang, J. M.; Li, Y. Y.; Xu, X. J.; Hou, T. J., Drug-likeness Analysis of Traditional Chinese Medicines: Prediction of Drug-likeness Using Machine Learning Approaches. Mol. Pharm. 2012, 9, 2875-2886.
38. Chen, L.; Li, Y.; Yu, H.; Zhang, L.; Hou, T., Computational models for predicting substrates or inhibitors of P-glycoprotein. Drug Discov. Today 2012, 17, 343-351.
39. Li, D.; Chen, L.; Li, Y.; Tian, S.; Sun, H.; Hou, T., ADMET Evaluation in Drug Discovery. 13. Development of in Silico Prediction Models for P-Glycoprotein Substrates. Mol. Pharm. 2014, 11, 716-726.
40. Tian, S.; Sun, H.; Li, Y.; Pan, P.; Li, D.; Hou, T., Development and evaluation of an integrated virtual screening strategy by combining molecular docking and pharmacophore searching based on multiple protein structures. J. Chem. Inf. Model. 2013, 53, 2743-2756.
41. Tian, S.; Sun, H.; Pan, P.; Li, D.; Zhen, X.; Li, Y.; Hou, T., Assessing an ensemble docking-based virtual screening strategy for kinase targets by considering protein flexibility. J. Chem. Inf. Model. 2014, 54, 2664-2679.
42. Wang, S.; Sun, H.; Liu, H.; Li, D.; Li, Y.; Hou, T., ADMET Evaluation in Drug Discovery. 16. Predicting hERG Blockers by Combining Multiple Pharmacophores and Machine Learning Approaches. Mol. Pharm. 2016, 13, 2855-2866.
43. Liu, Z.; Li, Y.; Han, L.; Li, J.; Liu, J.; Zhao, Z.; Nie, W.; Liu, Y.; Wang, R., PDB-wide collection of binding data: current status of the PDBbind database. Bioinformatics 2015, 31, 405-412.
44. Halgren, T. A., Identifying and characterizing binding sites and assessing druggability. J. Chem. Inf. Model. 2009, 49, 377-389.
45. Trott, O.; Olson, A. J., AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. J. Comput. Chem. 2010, 31, 455-461.
46. Korb, O.; Stützle, T.; Exner, T. E., PLANTS: Application of ant colony optimization to structure-based drug design. In International Workshop on Ant Colony Optimization and Swarm Intelligence 2006, 4150, 247-258.
47. Ruiz-Carmona, S.; Alvarez-Garcia, D.; Foloppe, N.; Garmendia-Doval, A. B.; Juhos, S.; Schmidtke, P.; Barril, X.; Hubbard, R. E.; Morley, S. D., rDock: a fast, versatile and open source program for docking ligands to proteins and nucleic acids. PLoS Comput. Biol. 2014, 10, e1003571.
48. Friesner, R. A.; Banks, J. L.; Murphy, R. B.; Halgren, T. A.; Klicic, J. J.; Mainz, D. T.; Repasky, M. P.; Knoll, E. H.; Shelley, M.; Perry, J. K.; Shaw, D. E.; Francis, P.; Shenkin, P. S., Glide: a new approach for rapid, accurate docking and scoring. 1. Method and assessment of docking accuracy. J. Med. Chem. 2004, 47, 1739-1749.
49. Jones, G.; Willett, P.; Glen, R. C.; Leach, A. R.; Taylor, R., Development and validation of a genetic algorithm for flexible docking. J. Mol. Biol. 1997, 267, 727-748.
50. Jain, A. N., Surflex: Fully automatic flexible molecular docking using a molecular similarity-based search engine. J. Med. Chem. 2003, 46, 499-511.
51. Jansen, C.; Wang, H.; Kooistra, A. J.; de Graaf, C.; Orrling, K. M.; Tenor, H.; Seebeck, T.; Bailey, D.; de Esch, I. J.; Ke, H.; Leurs, R., Discovery of novel Trypanosoma brucei phosphodiesterase B1 inhibitors by virtual screening against the unliganded TbrPDEB1 crystal structure. J. Med. Chem. 2013, 56, 2087-2096.
52. Kooistra, A. J.; Vischer, H. F.; McNaught-Flores, D.; Leurs, R.; de Esch, I. J. P.; de Graaf, C., Function-specific virtual screening for GPCR ligands using a combined scoring method. Sci. Rep. 2016, 6, srep28288.
53. Sperandio, O.; Reynes, C. H.; Camproux, A. C.; Villoutreix, B. O., Rationalizing the chemical space of protein-protein interaction inhibitors. Drug Discov. Today 2010, 15, 220-229.
54. Villoutreix, B. O.; Labbe, C. M.; Lagorce, D.; Laconde, G.; Sperandio, O., A Leap into the Chemical Space of Protein-Protein Interaction Inhibitors. Curr. Pharm. Des. 2012, 18, 4648-4667.
55. Naumann, K., Influence of chlorine substituents on biological activity of chemicals: a review (Reprinted from J Prakt Chem, vol 341, pg 417-435, 1999). Pest Manag. Sci. 2000, 56, 3-21.
56. Vukovic, S.; Brennan, P. E.; Huggins, D. J., Exploring the role of water in molecular recognition: predicting protein ligandability using a combinatorial search of surface hydration sites. J. Phys. Condens. Matter 2016, 28, 344007.
57. Crawford, T. D.; Tsui, V.; Flynn, E. M.; Wang, S. M.; Taylor, A. M.; Cote, A.; Audia, J. E.; Beresini, M. H.; Burdick, D. J.; Cummings, R.; Dakin, L. A.; Duplessis, M.; Good, A. C.; Hewitt, M. C.; Huang, H. R.; Jayaram, H.; Kiefer, J. R.; Jiang, Y.; Murray, J.; Nasveschuk, C. G.; Pardo, E.; Poy, F.; Romero, F. A.; Tang, Y.; Wang, J.; Xu, Z. W.; Zawadzke, L. E.; Zhu, X. Y.; Albrecht, B. K.; Magnuson, S. R.; Bellon, S.; Cochran, A. G., Diving into the Water: Inducible Binding Conformations for BRD4, TAF1(2), BRD9, and CECR2 Bromodomains. J. Med. Chem. 2016, 59, 5391-5402.
58. Ember, S. W.; Zhu, J. Y.; Olesen, S. H.; Martin, M. P.; Becker, A.; Berndt, N.; Georg, G. I.; Schonbrunn, E., Acetyl-lysine binding site of bromodomain-containing protein 4 (BRD4) interacts with diverse kinase inhibitors. ACS Chem. Biol. 2014, 9, 1160-1171.
59. Deepak, V.; Wang, B.; Koot, D.; Kasonga, A.; Stander, X. X.; Coetzee, M.; Stander, A., In silico design and bioevaluation of selective benzotriazepine BRD4 inhibitors with potent antiosteoclastogenic activity. Chem. Biol. Drug Des. 2016, 90, 97-111.
60. Repasky, M. P.; Murphy, R. B.; Banks, J. L.; Greenwood, J. R.; Tubert-Brohman, I.; Bhat, S.; Friesner, R. A., Docking performance of the glide program as evaluated on the Astex and DUD datasets: a complete set of glide SP results and selected results for a new scoring function integrating WaterMap and glide. J. Comput. Aided Mol. Des. 2012, 26, 787-799.
61. Pettersen, E. F.; Goddard, T. D.; Huang, C. C.; Couch, G. S.; Greenblatt, D. M.; Meng, E. C.; Ferrin, T. E., UCSF chimera - A visualization system for exploratory research and analysis. J. Comput. Chem. 2004, 25, 1605-1612.


Table 1. Average values of seven SiteMap properties across the 2P2IDB database and PDBbind core set.

Source         SiteScore      Dscore         Volume (1)         Exposure       Enclosure      Contact        Balance
2P2IDB (2)     0.93 ± 0.08    1.04 ± 0.11    217.61 ± 131.33    0.70 ± 0.07    0.44 ± 0.06    0.45 ± 0.08    1.61 ± 1.08
PDBbind (3)    1.01 ± 0.08    1.05 ± 0.10    394.12 ± 193.25    0.54 ± 0.14    0.63 ± 0.15    0.71 ± 0.25    1.30 ± 1.68

(1) The unit of binding pocket volume is Å3.

(2) The number of successfully computed proteins from 2P2IDB is 289.

(3) The number of successfully computed proteins from PDBbind core set is 274.

Table 2. The AUC values of six docking programs for the representative targets from three classes.

Target    AutoDock Vina    PLANTS    rDock    Glide    GOLD    Surflex-Dock
HDM2      0.50             0.63      0.41     0.77     0.56    0.42
VHL       0.60             0.79      0.59     0.82     0.82    0.54
BRD4-1    0.63             0.46      0.64     0.67     0.51    0.55


Legend of Figures

Figure 1. The distributions of seven molecular properties, including molecular weight, number of OH and NH groups, number of O and N atoms, logP, logS, topological polar surface area (TPSA) and number of rotatable bonds, for PPI inhibitors (orange) and FDA approved small molecule drugs (blue).

Figure 2. Top 10 good features from 238 PPI inhibitors (A) and 1822 FDA approved small molecule drugs (B). The structures of PPI inhibitors and drugs are from the 2P2IDB and e-Drug3D databases, respectively.

Figure 3. Boxplot graph of SiteMap properties of the targets from different categories in 2P2IDB. The numbers of successfully computed binding pockets for Class1, Class2, and Class3 are 114 (3 failures), 62 (8 failures), and 102, respectively.

Figure 4. Success rates (A) and consistent rates (B) of six docking programs (AutoDock Vina, PLANTS, rDock, Glide, GOLD and Surflex-Dock) based on 289 complexes from 2P2IDB. 2.0 Å was used as the RMSD cutoff.

Figure 5. Cumulative distribution curves of RMSD for each class. (A) Class1 (117 complexes), (B) Class2 (70 complexes) and (C) Class3 (102 complexes). Top scored poses (left) and best poses (right) are both analyzed. Dotted lines indicate a 2.0 Å RMSD cutoff.

Figure 6. Chemical structure of the Bcl-xL inhibitor (PDB code: 3SPF) and superimposed best docked poses. The binding pocket is represented as a surface with coulombic surface coloring (negative charges are in red; positive charges are in blue) generated using UCSF Chimera (version 1.11.2).61 The ligand is depicted as sticks, and the carbon atoms from the crystal structure and the docking pose are colored yellow and green, respectively.

Figure 7. ROC curves for the three targets in different classes using six docking programs. (A) HDM2 (Class1), (B) VHL (Class2) and (C) BRD4-1 (Class3).

Figure 1.

Figure 2.

Figure 3.

Figure 4.

Figure 5.

Figure 6.

Figure 7.

Table of Contents Graphic