Article pubs.acs.org/jcim

Do Fragments and Crystallization Additives Bind Similarly to Drug-like Ligands?
Malgorzata N. Drwal,† Célien Jacquemard,† Carlos Perez,‡ Jérémy Desaphy,§ and Esther Kellenberger*,†

† Laboratoire d’Innovation Thérapeutique, UMR 7200, CNRS-Université de Strasbourg, 74 Route du Rhin, 67400 Illkirch, France
‡ Eli Lilly Research Laboratories, Avenida de la Industria 30, 28108 Alcobendas, Madrid, Spain
§ Lilly Research Laboratories, Eli Lilly and Company, Lilly Corporate Center, Indianapolis, Indiana 46285, United States

Supporting Information

ABSTRACT: The success of fragment-based drug design (FBDD) hinges upon the optimization of low-molecular-weight compounds (MW < 300 Da) with weak binding affinities to lead compounds with high affinity and selectivity. Usually, structural information from fragment−protein complexes is used to develop ideas about the binding mode of similar but drug-like molecules. In this regard, crystallization additives such as cryoprotectants or buffer components, which are highly abundant in crystal structures, are frequently ignored. Thus, the aim of this study was to investigate the information contained in protein complexes with fragments as well as in those with additives, and how it relates to the binding modes of their drug-like counterparts. We present a thorough analysis of the binding modes of crystallographic additives, fragments, and drug-like ligands bound to four diverse targets of wide interest in drug discovery and highly represented in the Protein Data Bank: cyclin-dependent kinase 2, β-secretase 1, carbonic anhydrase 2, and trypsin. We identified a total of 630 unique molecules bound to the catalytic binding sites, among them 31 additives, 222 fragments, and 377 drug-like ligands. In general, we observed that, independent of the target, protein−fragment interaction patterns are highly similar to those of drug-like ligands and mostly cover the residues crucial for binding. Crystallographic additives can also show conserved binding modes and, in some cases, recover the residues important for binding. Moreover, we show evidence that the information from fragments and drug-like ligands can be applied to rescore docking poses in order to improve the prediction of binding modes.



© 2017 American Chemical Society
Received: December 19, 2016; Published: April 17, 2017
DOI: 10.1021/acs.jcim.6b00769 J. Chem. Inf. Model. 2017, 57, 1197−1209
Journal of Chemical Information and Modeling

INTRODUCTION

The fragment-based drug design (FBDD) paradigm is supported by the fact that over 30 FBDD-derived compounds are currently in clinical trials. Two of them, vemurafenib and venetoclax, both used for the treatment of specific tumor types, have been approved in recent years.1 Case studies have also reported the usefulness of fragment screening to identify starting points for lead optimization for diverse targets, as summarized in various reviews.2−5 These targets include kinases such as B-RAF and cyclin-dependent kinases (CDKs); enzymes like β-secretase (BACE1), phosphodiesterase 10a, and heat shock protein 90; G-protein coupled receptors; and protein−protein interactions. A widely adopted assumption in FBDD is that a fragment and a similar drug-like ligand will have comparable molecular interactions with the protein and thus conserved binding modes.3 Ichihara and colleagues assembled a data set of 23 fragment−ligand pairs from drug design studies that exhibit a conserved binding mode.6 The group of Vajda showed on several examples that binding modes tend to be conserved upon fragment growth when the fragment is located within the lowest-energy interaction hot spot, as predicted by computational solvent mapping.7 Other studies have applied ligand deconstruction techniques to show that fragments of larger ligands, or new ligands obtained by their reassembly, bind in a similar fashion.3,8−10 Nonetheless, several negative examples have also been reported in which fragments did not retain the same binding position and interactions as in the full ligand.7,11−13 To date, however, the analysis of the extent of

Figure 1. Overview of the procedure. PDB files of protein−ligand complexes for the selected targets were obtained from the RCSB Protein Data Bank and filtered according to their quality (deposit date, resolution). The remaining files were prepared in four steps: protein residues were renumbered according to the UniProt sequence, and proteins exhibiting a non-wild-type active site were removed. The remaining complexes were then superimposed and protonated, allowing an analysis of the binding modes.
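The quality filter and active-site check summarized in Figure 1 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline. The `PdbEntry` record and its fields are hypothetical stand-ins for metadata one would retrieve from the RCSB PDB, and the entry IDs are placeholders. The thresholds (resolution of ≤3 Å, deposition after 01/01/2000) come from the data set description in this section. Reading "after" as strictly later than that date is an assumption.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import FrozenSet, List, Optional

# Thresholds quoted in the text; "after 01/01/2000" is read as strictly later (assumption).
MAX_RESOLUTION = 3.0
EARLIEST_DEPOSIT = date(2000, 1, 1)


@dataclass(frozen=True)
class PdbEntry:
    """Hypothetical minimal record for one protein chain in a PDB file."""
    pdb_id: str
    deposit_date: date
    resolution: Optional[float]  # Å; None if no X-ray resolution is reported
    mutated_pocket_residues: FrozenSet[int] = field(default_factory=frozenset)
    missing_pocket_residues: FrozenSet[int] = field(default_factory=frozenset)


def passes_quality(entry: PdbEntry) -> bool:
    """Keep X-ray entries at <= 3 Å that were deposited after the cutoff date."""
    return (
        entry.resolution is not None
        and entry.resolution <= MAX_RESOLUTION
        and entry.deposit_date > EARLIEST_DEPOSIT
    )


def has_wild_type_pocket(entry: PdbEntry) -> bool:
    """Reject chains with any mutated or unresolved active-site residue."""
    return not entry.mutated_pocket_residues and not entry.missing_pocket_residues


def filter_entries(entries: List[PdbEntry]) -> List[PdbEntry]:
    return [e for e in entries if passes_quality(e) and has_wild_type_pocket(e)]


# Placeholder IDs, not real PDB codes.
entries = [
    PdbEntry("AAA1", date(2010, 5, 1), 1.8),
    PdbEntry("BBB2", date(1998, 3, 2), 2.0),                   # deposited too early
    PdbEntry("CCC3", date(2012, 7, 9), 3.4),                   # resolution too low
    PdbEntry("DDD4", date(2015, 1, 1), 2.1, frozenset({83})),  # mutated pocket residue
]
print([e.pdb_id for e in filter_entries(entries)])  # ['AAA1']
```

Keeping the two checks as separate functions mirrors the order in Figure 1: quality filtering first, then removal of non-wild-type active sites.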

disease,15,16 respectively, whereas carbonic anhydrases and serine proteases similar to TRY1 have been investigated for multiple medical applications.17,18 Of the 947 different protein targets in the PDB that have been crystallized with additives, fragments, and drug-like ligands (which we refer to as the filtered PDB set), the investigated targets are the most abundant ones, with a total of more than 1000 PDB files of protein−ligand complexes (see Supplementary Table S1).

Molecule Class Definitions and Data Set Composition. A crucial step for the analysis was the definition of ligand classes (as described in Molecule Classes). The definition of fragments in particular has been the subject of many discussions.19−22 Here we use two simple rules to identify fragments: a molecular weight (MW) below 300 Da and a heavy-atom count below 19. However, we carefully distinguish two classes of fragments, which are examined separately throughout the article. A list of manually curated crystallization additives, including solvents, buffers, and other biologically nonrelevant agents, is used to distinguish these molecules from “true” fragments that were crystallized on purpose. Using these molecule classes, our data set contains 31 unique additives, 222 unique fragments, and 377 unique drug-like molecules. This set of molecules, called the working data set, represents approximately 10% of the ligands present in the filtered PDB set. That set comprises all X-ray structures with a resolution of ≤3 Å that were deposited after 01/01/2000 and crystallized with small molecules of at least two of the previously defined compound classes (see Supplementary Table S1 for more details).
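The two fragment rules and the additive list can be expressed as a small classifier. This sketch rests on several assumptions. `ADDITIVE_CODES` is only an illustrative subset built from the agents named in this article, using PDB chemical component IDs (DMS is the PDB ID for DMSO), not the authors' curated list. Molecules failing the fragment rules are left unassigned rather than called drug-like, because the article applies separate drug-like criteria (defined in its Methods) that are not reproduced here. "LIG" is a hypothetical code.

```python
from typing import FrozenSet

# Illustrative subset of crystallization additives (PDB chemical component IDs).
ADDITIVE_CODES: FrozenSet[str] = frozenset({"GOL", "EDO", "DMS", "PO4", "SO4", "ACT"})

MAX_FRAGMENT_MW = 300.0   # Da, exclusive bound from the text
MAX_FRAGMENT_HEAVY = 19   # heavy atoms, exclusive bound from the text


def is_fragment_sized(mw: float, heavy_atoms: int) -> bool:
    """Both fragment rules: MW below 300 Da and fewer than 19 heavy atoms."""
    return mw < MAX_FRAGMENT_MW and heavy_atoms < MAX_FRAGMENT_HEAVY


def assign_class(het_code: str, mw: float, heavy_atoms: int) -> str:
    """Return 'additive', 'fragment', or 'not fragment-sized'.

    Additives are treated as fragment-sized molecules that appear on the
    curated list; everything else that passes the size rules is a 'true'
    fragment. Larger molecules are not classified further here.
    """
    if not is_fragment_sized(mw, heavy_atoms):
        return "not fragment-sized"
    return "additive" if het_code in ADDITIVE_CODES else "fragment"


print(assign_class("GOL", 92.09, 6))   # additive (glycerol)
print(assign_class("BEN", 120.15, 9))  # fragment (benzamidine)
print(assign_class("LIG", 299.3, 21))  # not fragment-sized (hypothetical compound)
```

Checking size before list membership reflects the text's framing that additives are one of two classes of fragment-sized molecules.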
To verify whether the chosen data set is representative of the fragments and drug-like ligands present in the PDB, we performed a principal component analysis (PCA) on a set of molecular descriptors, comparing our molecule sets for the four targets to the ligands from all available X-ray structures of the filtered PDB set as well as known fragments and drug-like ligands extracted from the ChEMBL database. The descriptors

conservation between protein−fragment and protein−ligand interaction modes has been performed on selected examples rather than systematically, using all of the structural data available for a given target. Thus, it is unclear whether interactions formed by fragments can generally predict the interaction hot spots of drug-like ligands and vice versa. In addition, current studies have focused on compounds considered true fragments, neglecting the information contained in three-dimensional (3D) protein complexes with crystallization additives and other biologically nonrelevant molecules. In the present study, we analyzed the binding modes of three classes of molecules (crystallization additives, fragments, and drug-like ligands) for four protein targets of high pharmaceutical interest that are highly represented in the Protein Data Bank (PDB).14 These include human carbonic anhydrase 2 (CAH2), human cyclin-dependent kinase 2 (CDK2), human β-secretase 1 (BACE1), and bovine trypsin (TRY1). Using shape overlap and protein−ligand interaction fingerprints (IFPs), we compared the interaction patterns of the different molecule classes and determined the binding mode conservation for direct substructure pairs. Finally, we evaluated whether the interaction pattern information derived from one molecular class can be used to predict the binding mode of another class.



RESULTS AND DISCUSSION

In order to determine which information can be obtained from the binding modes of fragments and crystallization additives and how it is relevant for FBDD, we systematically examined publicly available crystal structures of protein−ligand complexes (see Data Set and General Procedure). In this study, we focus on four diverse protein targets highly represented in the PDB and of interest in the drug design field: CDK2, BACE1, CAH2, and TRY1. CDK2 and BACE1 are well-described drug targets for cancer and Alzheimer’s


Figure 2. Interaction heatmaps for (top) BACE1 and (bottom) CDK2 for (left) additives, (middle) fragments, and (right) drug-like ligands. Pocket residues taking part in ligand interactions are shown on the y axis, while different interaction types are shown on the x axis. Five different interaction types are distinguished: hydrophobic interactions (HYD); aromatic interactions (AROM), including face-to-face and edge-to-face π−π interactions as well as π−cation interactions; hydrogen-bonding interactions (HB); and ionic interactions (ION). The heatmaps for drug-like ligands show the frequency of each interaction in all protein complexes containing drug-like molecules, with dark blue encoding a high frequency. On the other hand, the heatmaps for additives and fragments show the differences in interactions between these sets and the drug-like ligands. Interactions occurring only with drug-like ligands (D) are shown in light blue, those occurring only with additives/fragments (F) in red, and the interactions occurring in both sets (overlap, O) in dark blue.
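The Figure 2 caption describes two heatmap encodings: per-bit frequencies for drug-like ligands, and D/F/O overlap labels for additives and fragments. Both reduce to simple set operations once each complex is represented as a set of (residue, interaction type) bits. This is a minimal sketch assuming that representation. The residue names in the usage lines are illustrative and not taken from the article's data.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Bit = Tuple[str, str]  # (pocket residue, interaction type), e.g. ("Leu83", "HB")


def interaction_frequencies(ifps: List[Set[Bit]]) -> Dict[Bit, float]:
    """Fraction of complexes in which each (residue, type) bit is set."""
    counts: Counter = Counter()
    for ifp in ifps:
        counts.update(set(ifp))
    return {bit: n / len(ifps) for bit, n in counts.items()}


def overlap_labels(drug_ifps: List[Set[Bit]], small_ifps: List[Set[Bit]]) -> Dict[Bit, str]:
    """Label each bit 'D' (drug-like only), 'F' (additives/fragments only), or 'O' (both)."""
    in_drug = set().union(*drug_ifps)
    in_small = set().union(*small_ifps)
    labels = {}
    for bit in in_drug | in_small:
        if bit in in_drug and bit in in_small:
            labels[bit] = "O"
        elif bit in in_drug:
            labels[bit] = "D"
        else:
            labels[bit] = "F"
    return labels


drug = [{("Leu83", "HB"), ("Phe80", "HYD")}, {("Leu83", "HB"), ("Ile10", "HYD")}]
frag = [{("Leu83", "HB")}, {("Lys33", "HYD")}]
print(interaction_frequencies(drug)[("Leu83", "HB")])  # 1.0
print(overlap_labels(drug, frag)[("Lys33", "HYD")])    # F
```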

used for the PCA did not include the heavy-atom count or MW, as these clearly differ between the molecule classes. Instead, they comprised the counts of different bond types (single, double, triple, aromatic, rotatable), ring types, and element types, as well as physicochemical properties such as counts of hydrogen-bond donors and acceptors, the logP coefficient, and


Figure 3. Interaction counts for different molecule subsets. The box plots show the distributions of counts of aromatic, ionic, hydrogen-bonding and hydrophobic interactions in protein complexes with additives, fragments, and drug-like ligands. Aromatic interactions include stacking (face-to-face, edge-to-face) and π−cation interactions. Ionic interactions include interactions between charged protein residues, molecule groups, and in the case of CAH2, also metal ions. Interaction fingerprints indicate the presence of a specific type of interaction with a given pocket residue. Even though a residue might form multiple interactions, e.g., hydrogen bonds, these are counted as one. The median of each distribution is indicated as an orange line.
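Figure 3 counts interactions from fingerprint bits rather than from individual contacts, so several hydrogen bonds to the same residue contribute a single count. This is a minimal sketch of that convention and of the per-class medians. It assumes each complex is given as a list of raw (residue, type) contacts, with type labels following Figure 2. The contact lists are made up for illustration.

```python
from collections import Counter
from statistics import median
from typing import Dict, List, Tuple

TYPES = ("AROM", "ION", "HB", "HYD")


def counts_per_type(contacts: List[Tuple[str, str]]) -> Dict[str, int]:
    """Collapse repeated (residue, type) contacts to one bit, then count bits per type."""
    bits = set(contacts)
    per_type = Counter(itype for _, itype in bits)
    return {t: per_type.get(t, 0) for t in TYPES}


def median_counts(complexes: List[List[Tuple[str, str]]]) -> Dict[str, float]:
    """Median count of each interaction type over a set of complexes."""
    tables = [counts_per_type(c) for c in complexes]
    return {t: median([tab[t] for tab in tables]) for t in TYPES}


fragment_complexes = [
    [("Leu83", "HB"), ("Leu83", "HB"), ("Glu81", "HB"), ("Phe80", "HYD")],
    [("Leu83", "HB")],
    [("Leu83", "HB"), ("Glu81", "HB"), ("Asp86", "HB"), ("Ile10", "HYD"), ("Val18", "HYD")],
]
print(median_counts(fragment_complexes))
# {'AROM': 0, 'ION': 0, 'HB': 2, 'HYD': 1}
```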

Binding Mode Comparison. The comparison of binding modes among additives, fragments, and drug-like ligands required the development of a multistep process to filter, prepare, and analyze protein−ligand complexes. The procedure is summarized in Figure 1 and described in detail in Methods. Briefly, we filtered the PDB to establish a high-quality set of protein−ligand crystal structures and, for each target, normalized the PDB files to a common residue numbering and coordinate frame. Importantly, small changes in the protein pocket can influence ligand binding. Therefore, a crucial step of the procedure was the removal of all protein structures with mutated or missing residues within the active site. This step led to the removal of more than 200 protein chains, some of them containing more than one mutated or missing residue. Additionally, the structural alignment of the protein complexes can influence the results of shape overlaps between molecules and pockets. Thus, we verified our protein alignment by computing the Cα atom root-mean-square deviation (RMSD) of the binding pocket residues. For CAH2 and TRY1, the binding pockets in the data set are rigid, and thus, a very good alignment was achieved (average RMSD = 0.215 ± 0.061 Å and 0.249 ± 0.074 Å for CAH2 and TRY1, respectively). On the other hand, the CDK2 and BACE1 pockets are more flexible, and for these targets the alignment was

the polar surface area (see Assessment of Structural Diversity for more details). As shown in Supplementary Figure S1, the fragment and ligand sets for the four targets show a good overlap of molecular features with all fragments and ligands available in the PDB and ChEMBL (not shown). Only a small fraction of drug-like PDB ligands (see Supplementary Figure S1, compounds with PC2 below −0.4) are not covered by the ligands in the working data set. These are mostly hydrophobic compounds, such as steroids or compounds with long aliphatic chains. Additionally, the overlap between crystallographic additives and fragments/drug-like ligands is relatively low. This shows the clear structural differences between these molecular classes and therefore the potential of additives to complement the structural information from fragments. Although the additives in our set are only a small fraction of all additives in X-ray structures, they contain many frequently used crystallization agents. These include the cryoprotectants glycerol (GOL), 1,2-ethanediol (EDO), and dimethyl sulfoxide (DMSO) as well as ions like phosphate (PO4), sulfate (SO4), and acetate (ACT) (see Supplementary Table S2 for more information). Therefore, we believe that the chosen data sets adequately represent the information in the PDB and can be used to derive general conclusions.
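The PCA check can be sketched without external libraries. This is an illustration under stated assumptions, not the authors' protocol:

- Descriptors are z-scored first; the text does not say whether they were.
- The leading components come from power iteration with deflation on the resulting correlation matrix. A production analysis would normally call a linear-algebra library instead.

Consistent with the text, size descriptors (MW, heavy-atom count) are simply left out of the input matrix. The rows below are toy values.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def standardize(rows: Matrix) -> Matrix:
    """Z-score each descriptor column; constant columns are only centered."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1)) or 1.0
           for j in range(m)]
    return [[(r[j] - means[j]) / sds[j] for j in range(m)] for r in rows]


def covariance(z: Matrix) -> Matrix:
    """Sample covariance of centered data (a correlation matrix after z-scoring)."""
    n, m = len(z), len(z[0])
    return [[sum(row[i] * row[j] for row in z) / (n - 1) for j in range(m)]
            for i in range(m)]


def principal_axes(cov: Matrix, k: int = 2, iters: int = 500, seed: int = 0) -> Matrix:
    """Leading k eigenvectors of a symmetric matrix via power iteration with deflation."""
    rng = random.Random(seed)
    a = [row[:] for row in cov]
    m = len(a)
    axes = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(m)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(a[i][j] * v[j] for j in range(m)) for i in range(m)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        eigval = sum(v[i] * sum(a[i][j] * v[j] for j in range(m)) for i in range(m))
        axes.append(v)
        # Deflate: remove the found component so the next pass finds the next one.
        a = [[a[i][j] - eigval * v[i] * v[j] for j in range(m)] for i in range(m)]
    return axes


def pca_scores(rows: Matrix, k: int = 2) -> Matrix:
    """Project z-scored descriptor rows onto the first k principal axes."""
    z = standardize(rows)
    axes = principal_axes(covariance(z), k)
    return [[sum(x * ax[j] for j, x in enumerate(row)) for ax in axes] for row in z]


# Toy descriptor rows (one molecule per row); MW and heavy-atom count excluded.
rows = [[1, 2, 0], [2, 4, 1], [3, 6, 0], [4, 8, 1], [5, 10, 0]]
for pc1, pc2 in pca_scores(rows):
    print(f"{pc1:+.3f} {pc2:+.3f}")
```

Plotting PC1 against PC2 for the working data set and for the reference sets (all PDB ligands, ChEMBL fragments and drug-like molecules) then gives the kind of overlap view described for Supplementary Figure S1.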


Figure 4. CDK2 structure and binding modes. Different domains of CDK2 are indicated by colors: the N-lobe in lemon green, the C-lobe in dark green, and the hinge region in pink. (A) Pocket location: CDK2 complex with the type-I inhibitor RC-2-38 (PDB code 3QU0). The binding pocket is indicated as a gray surface, and the ligand is shown in stick representation. (B) Overlay of all CDK2 fragments in the binding site. All of the residues that frequently interact with drug-like ligands are labeled and shown as sticks. (C) Overlay of the CDK2 complex with a type-I drug-like inhibitor (PDB code 3QU0) with 1,2-ethanediol (EDO) molecules and another ligand, 8-anilino-1-naphthalene (in orange, PDB code 3PY1), binding to the back cleft of the pocket, found behind the gatekeeper residue. The additives are found in four crystal structures with the PDB codes 3QWK, 3TIZ, 4ERW, and 4EZ3.

crystallographic additives can be mostly regarded as secondary hot spots, which nevertheless present interesting subcavities, e.g., for extending drug-like ligands. We will discuss the targets BACE1 and CDK2 in more detail below.

CDK2 Fragment and Ligand Binding Modes. The CDK2 ATP binding pocket is formed by more than 100 consecutive residues at the junction of the C-lobe and the N-lobe of the protein (Figure 4) and is the major site for potent CDK2 inhibitors.28 These are usually classified as type-I and type-II inhibitors, depending on whether they bind to the active or inactive conformation of the protein. The inactive conformation is associated with an outward flip of the conserved DFG (Asp145, Phe146, Gly147) motif.29 Type-I inhibitors usually bind by forming hydrogen bonds with the backbone of the hinge (residues 81−83), a loop connecting the C-lobe and N-lobe. Type-II inhibitors, in contrast, occupy the space created by the outward DFG conformation. Hydrophobic interactions are also common,29 both with Phe80, the gatekeeper residue that controls access to the back cleft of the pocket, and with the aliphatic side chain of Lys33, a conserved kinase residue. Indeed, we observe that most of the drug-like ligands in our data set form hydrogen bonds to the hinge residues Glu81 and Leu83 (Figure 3). They also form hydrophobic contacts with the gatekeeper residue and with several residues in the proximity of the hinge (Ile10, Val18, Ala31, Asp86, and Leu134). All of these interactions are also observed with fragments, although in some cases with lower frequency (e.g., hydrophobic contacts with Ile10 and Leu134). Surprisingly, some of the important interactions can also be found by looking at the interaction patterns of additives (Figure 3). Although their frequency is also much lower than for fragments or drug-like ligands, interactions with the residues Ile10, Lys33, and Phe80 can be observed.
Interestingly, a few additives (e.g., 1,2-ethanediol) show interactions not seen in fragments or drug-like ligands. In some PDB structures, additives are located in the back cleft of the pocket, behind the gatekeeper residue (Figure 4), a region typically occupied by tyrosine kinase type-II inhibitors. Interestingly,

slightly worse but nevertheless acceptable (average RMSD = 1.077 ± 0.330 Å and 1.115 ± 0.678 Å for BACE1 and CDK2, respectively).

To compare the interaction patterns of additives, fragments, and drug-like ligands, we detected all of the protein−ligand interactions that occur in the catalytic cavity, including hydrophobic, aromatic, ionic, metal, and hydrogen-bonding interactions. We visualized them in interaction heatmaps (see Figure 2 for BACE1 and CDK2 and Supplementary Figure S2 for CAH2 and TRY1). As expected, drug-like ligands form more interactions per complex (see Figure 3). Consistent with previous studies,23 we found that fragments and additives engage in two hydrogen bonds on average, whereas drug-like ligands typically form four. The difference is even more pronounced for hydrophobic interactions.

In general, in agreement with FBDD assumptions, we observed that fragments and drug-like ligands show similar interaction patterns and that all of the interactions frequently occurring with drug-like ligands can also be identified with fragments. However, this is not necessarily the case for additives. Furthermore, interaction frequencies differ between additives/fragments and drug-like ligands. These differences could be due to the fact that additives and fragments are smaller and thus cannot form as many interactions per molecule. Finally, we also observed interactions of additives and fragments that do not occur with drug-like ligands and that might be of interest for future drug design studies for these or related targets. Other hot spot detection methods include the analysis of hydration sites or simulated solvent binding regions.24,25 The analysis of additive interaction patterns is computationally less expensive than these methods, but it is obviously biased toward the information available in the PDB.
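The pocket alignment check reported above (mean ± SD of the Cα RMSD over the binding-site residues) can be reproduced on structures that are already superimposed. This is a minimal sketch under several assumptions:

- Each structure is a mapping from UniProt residue number to Cα coordinates.
- RMSDs are measured against a single reference structure without further fitting; the article does not state which reference was used.
- The SD is the sample standard deviation.

Residue numbers and coordinates in the usage lines are illustrative.

```python
import math
from statistics import mean, stdev
from typing import Dict, List, Sequence, Tuple

Coords = Dict[int, Tuple[float, float, float]]  # residue number -> Cα (x, y, z) in Å


def pocket_ca_rmsd(ref: Coords, other: Coords, pocket: Sequence[int]) -> float:
    """Cα RMSD over pocket residues; no superposition is performed here."""
    sq = [sum((a - b) ** 2 for a, b in zip(ref[r], other[r])) for r in pocket]
    return math.sqrt(sum(sq) / len(sq))


def alignment_summary(ref: Coords, others: List[Coords],
                      pocket: Sequence[int]) -> Tuple[float, float]:
    """Mean and sample SD of the pocket RMSD over all aligned structures."""
    values = [pocket_ca_rmsd(ref, s, pocket) for s in others]
    return mean(values), stdev(values)


def shifted(coords: Coords, dz: float) -> Coords:
    """Toy helper: translate every Cα along z."""
    return {r: (x, y, z + dz) for r, (x, y, z) in coords.items()}


ref = {10: (0.0, 0.0, 0.0), 83: (1.0, 0.0, 0.0)}
others = [shifted(ref, 0.5), shifted(ref, 0.0), shifted(ref, 1.0)]
print(alignment_summary(ref, others, pocket=[10, 83]))  # (0.5, 0.5)
```

Because the structures were superimposed beforehand, a large value here flags either genuine pocket flexibility (as for CDK2 and BACE1) or a poor global alignment.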
Experimental solvent mapping techniques allow crystal structures to be solved systematically with a variety of organic solvents.26,27 In contrast, the protein−additive complexes in the PDB were mostly solved in the presence of another ligand. Thus, the hot spots detected by the


Figure 5. BACE1 structure and binding modes. (A) BACE1 structure (PDB code 1FKN) and location of the binding pocket. The inhibitor OM99-2 is shown as sticks. The protein is colored in orange, and the flexible flap region is highlighted in yellow. The location of the active site is indicated as a gray surface. (B) Active site of BACE1. The subpockets are highlighted as follows: S1 in red, S2 in green, S3 in blue, S4 in yellow, S1′ in pink, S2′ in cyan, S3′ in gray, and S4′ in orange. The inhibitor OM99-2 is shown in light green, and all of the additives in the active site are shown in gray.

Figure 6. Examples of conserved binding modes between additives and ligands. (A) BACE1 pocket with dimethyl sulfoxide (light red, sticks) and overlapping ligand superstructure (dark red, wire) from PDB files 4B1C and 2HIZ. (B) CAH2 pocket with glycerol (light blue, sticks) and overlapping ligand superstructure (cyan, wire) from PDB files 2WEJ and 3T84.

type-II inhibitors have recently been reported for CDK2.30 As mentioned, this part of the pocket is accessible to inhibitors of different subfamilies of the kinome; a summary of these is available from the KLIFS database of kinase−ligand interactions.31 Notably, the KLIFS database also lists seven PDB files of CDK2 complexes with a ligand binding to the back pocket. None of these, however, were part of the current data set because their ligands did not fall into any of the ligand categories. 8-Anilino-1-naphthalene is present in four of these structures (PDB entries 3PXF, 3PXQ, 3PXZ, and 3PY1). Because of its MW and atom count, it is neither a fragment nor a drug-like ligand, but it binds in close proximity to the additive molecules (see Figure 4C). Thus, the exploration of additives and fragments for a target might give hints of interaction hot spots not only for that particular target but also for a protein family.

BACE1 Fragment and Ligand Binding Modes. Human BACE1 is an aspartic protease that cleaves the amyloid precursor protein with the help of two catalytic aspartates, Asp32 and Asp228.32 The active site is located in the vicinity of the catalytic dyad and can be divided into different subpockets (Figure 5). As shown in the interaction heatmap (Figure 2), known BACE1 drug-like ligands frequently form hydrogen-bonding or ionic interactions with the catalytic aspartates. These interactions are also formed by fragments but not by additives. However, as shown in Figure 5B, the additives are distributed among many different subpockets, and thus, their 3D structures can be used to identify interaction hot spots in the pocket. The main reason why additives never interact with the catalytic aspartates is that none of the 18 BACE1 complex structures containing additives is ligand-free. As in the case of CDK2, additives also show several interactions that do not


Figure 7. Comparison of binding mode conservation measures between different targets and subsets. Box plots for the shape similarity and IFP similarity are shown for BACE1 (orange), CAH2 (blue), CDK2 (green), and TRY1 (red) and additive (left) and fragment (right) substructure pairs.
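The size-independent comparison of a fragment (or additive) with its drug-like superstructure can be sketched with a Tversky index on interaction fingerprints. The weights are an assumption. With alpha = 1 and beta = 0, the score is the fraction of the fragment's interaction bits that the ligand reproduces, so the larger ligand is not penalized for its extra interactions; alpha = beta = 1 would instead give the Tanimoto coefficient. The averaging over all fragment-complex/ligand-complex combinations and the strict "> 0.6" test are our reading of the conservation threshold given in this section. The bit names use BACE1 residues mentioned in the text purely as labels.

```python
from itertools import product
from statistics import mean
from typing import FrozenSet, List

IFP = FrozenSet[str]  # set bits such as "Asp32:ION"


def tversky(a: IFP, b: IFP, alpha: float = 1.0, beta: float = 0.0) -> float:
    """Tversky index of a against b; alpha weights bits only in a, beta bits only in b."""
    common = len(a & b)
    denom = common + alpha * len(a - b) + beta * len(b - a)
    return common / denom if denom else 0.0


def mean_pair_similarity(frag_ifps: List[IFP], lig_ifps: List[IFP]) -> float:
    """Average Tversky score over every fragment-complex / ligand-complex combination."""
    return mean([tversky(f, l) for f, l in product(frag_ifps, lig_ifps)])


def is_conserved(frag_ifps: List[IFP], lig_ifps: List[IFP], threshold: float = 0.6) -> bool:
    return mean_pair_similarity(frag_ifps, lig_ifps) > threshold


frag = [frozenset({"Asp32:ION", "Asp228:ION"})]
lig = [frozenset({"Asp32:ION", "Asp228:ION", "Arg235:HYD"}),
       frozenset({"Asp32:ION", "Lys224:HB"})]
print(mean_pair_similarity(frag, lig))  # 0.75
print(is_conserved(frag, lig))          # True
```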

with orientations very similar to those of their drug-like superstructures. Two examples are shown in Figure 6. Overall, however, the binding mode conservation between additives and ligands is quite low (see Figure 7), except for CAH2, where the variance is large. Conservation is expressed here as similarity coefficients of shape overlaps and interaction patterns. In contrast, high binding mode conservation can be observed for fragments of all targets (Figure 7). It is important to recall that these observations are not an artifact of the large molecular weight difference between additives, fragments, and drug-like ligands, since the similarity is expressed as a size-independent Tversky metric (see Binding Mode Conservation of Substructure Pairs). Even though the information contained in additives depends on the data set, we observed that clustering of additives can give a good indication of conserved hot spots. In CAH2 especially, glycerol molecules have been crystallized in many protein complexes. In the majority of structures, they cluster at exactly the same position (see Figure 6B) and show average B factors similar to that of the entire protein. This emphasizes the valuable information that can be provided by protein−additive complexes. The high degree of fragment binding mode conservation observed here is confirmed by a recent study published during the preparation of this article. In a similar analysis, Malhotra and Karanicolas36 investigated 297 fragment−ligand pairs from the PDBBind database and noted unchanged binding modes in 86% of the pairs. In the next step, we considered substructure pairs that display a conserved binding mode, here defined as an average IFP similarity above 0.6 (see Binding Mode Conservation of Substructure Pairs for more details). When common fragment−ligand interactions were examined, a large difference between

occur with drug-like ligands or fragments of our set (Figure 3). Those residues (Lys224, Arg235, Arg307, and Ser325) have been previously reported to play a role in the binding of peptidic BACE1 inhibitors,16,33 thus confirming the information present in additive structures.

Quantifying Binding Mode Conservation. The presented analysis can also shed light on how often binding modes are conserved between fragments/additives and ligands, and on whether the degree of conservation can be explained by properties of the fragment. To allow a direct comparison, we identified all molecule pairs in which an additive or fragment is an exact substructure of the larger, drug-like ligand. For the four targets, we found a total of 348 substructure pairs, all bound to the same druggable cavity (see Supplementary Tables S3 and S4). Some of the substructure pairs have been well-described in the literature. The most prominent example is found in TRY1 complexes: the benzamidine fragment, known to exhibit a conserved binding mode,34,35 is a substructure of many different TRY1 ligands. Interestingly, the identified pairs also contain many examples that were not previously reported.6,7 In part, this is because they contain additives. In part, it is because the fragment and ligand structures were investigated by different groups and thus were not part of a single ligand design or deconstruction study. It should also be noted that previous studies have presented fragment−ligand pairs that are not necessarily true substructure pairs. For all of the studied targets, we were able to find fragment−ligand pairs bound in exactly the same position and exhibiting very similar interaction patterns. Surprisingly, for BACE1, CAH2, and TRY1, these pairs also included additives


Figure 8. Fragment docking rescoring. Box plots of the RMSD (in Å) of the selected docking pose to the original X-ray structure pose are shown. Four rescoring schemes are evaluated: maximal IFP similarity (upper left), consensus IFP similarity (upper right), maximal ROCS similarity (lower left), and consensus ROCS similarity (lower right). In each panel, different molecule sets are used for rescoring: (A) additives, (F) fragments, (D) drug-like ligands, (FD) fragments and drug-like ligands, (all) additives, fragments, and drug-like ligands. DS indicates the use of the original docking score (ChemPLP), and best is a control indicating the best solution among the docking poses for each molecule. The horizontal dotted orange line indicates the median of the RMSDs of poses selected by the docking scoring function.
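The quantities in Figure 8 can be computed as shown below: the RMSD of the selected pose, its median over molecules, and the "best" control. The same code also gives the share of good poses (RMSD below 2 Å) used later in the text. This is a minimal sketch, and the `Pose` record is hypothetical. It assumes that both the docking score and the rescoring similarity are "higher is better". That holds for similarity values, but it should be checked for the docking scoring function actually used. The pose values are made up.

```python
from dataclasses import dataclass
from statistics import median
from typing import List, Tuple


@dataclass(frozen=True)
class Pose:
    """Hypothetical record for one docking pose of one molecule."""
    docking_score: float  # assumed higher-is-better
    similarity: float     # rescoring value (e.g. IFP or shape similarity), higher is better
    rmsd: float           # Å, to the crystallographic pose


def select_pose(poses: List[Pose], criterion: str) -> Pose:
    """criterion: 'docking_score', 'similarity', or 'best' (lowest-RMSD control)."""
    if criterion == "best":
        return min(poses, key=lambda p: p.rmsd)
    return max(poses, key=lambda p: getattr(p, criterion))


def summarize(molecules: List[List[Pose]], criterion: str,
              good_cutoff: float = 2.0) -> Tuple[float, float]:
    """Return (median RMSD of selected poses, % of molecules below the cutoff)."""
    rmsds = [select_pose(poses, criterion).rmsd for poses in molecules]
    n_good = sum(1 for r in rmsds if r < good_cutoff)
    return median(rmsds), 100.0 * n_good / len(rmsds)


molecules = [
    [Pose(80.0, 0.3, 4.0), Pose(75.0, 0.9, 1.2)],
    [Pose(60.0, 0.5, 1.0), Pose(55.0, 0.7, 3.0)],
    [Pose(70.0, 0.8, 0.8), Pose(65.0, 0.2, 5.0)],
]
for criterion in ("docking_score", "similarity", "best"):
    med, pct = summarize(molecules, criterion)
    print(f"{criterion}: median {med:.1f} Å, {pct:.0f}% below 2 Å")
```

The "best" row is an upper bound: no rescoring scheme can select poses closer to the X-ray pose than those already present among the docking solutions.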

indication about the conservation or variability of its binding mode when part of a drug-like ligand. Finally, we explored whether structural properties of fragments and additives are indicative of a conserved binding mode. Because the data set of substructures is rather small, no predictive statistical models can be developed. However, we observed a trend that the number of heavy atoms (and the highly correlated MW) can be used to distinguish between conserved and nonconserved binding modes. In almost 90% of cases with a conserved binding mode, the additive/fragment has more than eight heavy atoms and MW higher than 110 Da. These findings are clearly in line with typical FBDD screening libraries, which usually contain fragments with MW between 150 and 250 Da or nine to 18 heavy atoms.5,8,20,22 We further tested whether the results are also consistent with the molecular complexity theory,39 which suggests that simple molecules can bind to diverse proteins and binding sites whereas more complex molecules tend to exhibit a single binding mode. However, molecular complexity lies in the eye of the beholder, as no general definition exists.40 Simple topological complexity descriptors include the fraction of sp3-hybridized carbons over

conserved and variable binding mode fragments was observed. Fragments with conserved binding modes matched most of the hydrogen-bonding and ionic interactions and many of the hydrophobic contacts observed with their drug-like counterparts, but not the aromatic interactions. The average numbers of matched fragment interactions in a conserved binding mode were 2.3 ± 0.8 hydrophobic, 2.6 ± 0.6 hydrogen-bonding, and 1.0 ± 0.2 ionic interactions. The lack of aromatic interaction matches might be influenced by the settings used to identify interactions: here we used a rather strict definition of stacking interactions, with a threshold of 5 Å for the distance between aromatic ring centers. The other values, however, highlight the importance of both polar and apolar interactions for specific binding. As shown in several studies, strong hydrogen bonds are important for fragment anchoring, whereas a hydrophobic environment can be important for the stabilization of hydrogen bonds.23,37,38 Thus, the overlap of a fragment with a low-energy protein hot spot defined by both polar and apolar interaction energies, as suggested previously,7 together with the involvement of a fragment in strong protein interactions can give an


ChemPLP scoring function has a higher deviation from the crystallographic pose (median RMSD below 3 Å; see Supplementary Tables S5 and S6). Therefore, it is possible that the use of a rescoring scheme can improve the binding pose prediction. Indeed, we observed that rescoring with IFPs as well as with ROCS shape and atom-type matching can result in increased performance compared with the docking scoring function. The binding mode rescoring can lead to a decrease of the median RMSD by up to 1 Å and an increase in the percentage of good poses (RMSD below 2 Å) by up to 17%. The improvements can be even more pronounced for targets where the poses selected by the docking scoring function are poor, e.g., for BACE1 fragments or CDK2 ligands (see Supplementary Tables S5 and S6). Although the rescoring with additives alone generally does not seem to be of advantage, the use of other sets (fragments, ligands) or their combination largely improves the prediction. This can be explained by the fact that additives are mostly bound to the protein pocket in the presence of other molecules and therefore seldom occupy the most important part of the pocket exhibiting the most crucial interactions. However, the combination of additives with other molecule classes does not bring any disadvantages to the docking rescoring. Furthermore, for specific targets (e.g., CAH2; see Supplementary Tables S5 and S6) even rescoring with additives alone can increase the percentage of good poses (RMSD below 2 Å) selected. In summary, in the presence of structural information on protein−small-molecule complexes, we observe that their binding modes can be used to guide the ranking of docking poses. In particular, we believe that this approach can be applied for both fragment and ligand docking. 
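The two rescoring schemes described in this subsection can be sketched for interaction fingerprints as follows: maximal similarity to any molecule of a reference class, and similarity to a class-wide consensus fingerprint. The sketch makes three assumptions:

- Fingerprints are sets of bits.
- The maximal scheme uses the binary Tanimoto coefficient.
- The consensus fingerprint stores the fraction of reference complexes in which each bit occurs, and it is compared using the continuous Tanimoto form Σxy / (Σx² + Σy² − Σxy).

The article cites a separate reference for continuous similarity coefficients, so the exact formula used there may differ. ROCS shape scoring is not reproduced, and the bit names are illustrative.

```python
from collections import Counter
from typing import Dict, FrozenSet, List

IFP = FrozenSet[str]


def tanimoto(a: IFP, b: IFP) -> float:
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def max_similarity(pose: IFP, reference: List[IFP]) -> float:
    """Scheme 1: best binary Tanimoto score against any reference complex."""
    return max(tanimoto(pose, ref) for ref in reference)


def consensus(reference: List[IFP]) -> Dict[str, float]:
    """Fraction of reference complexes in which each bit is set."""
    counts: Counter = Counter()
    for ifp in reference:
        counts.update(ifp)
    return {bit: n / len(reference) for bit, n in counts.items()}


def continuous_tanimoto(x: Dict[str, float], y: Dict[str, float]) -> float:
    dot = sum(w * y.get(bit, 0.0) for bit, w in x.items())
    denom = sum(w * w for w in x.values()) + sum(w * w for w in y.values()) - dot
    return dot / denom if denom else 0.0


def consensus_similarity(pose: IFP, reference: List[IFP]) -> float:
    """Scheme 2: continuous Tanimoto between the pose bits and the class consensus."""
    return continuous_tanimoto({bit: 1.0 for bit in pose}, consensus(reference))


def rank_poses(poses: List[IFP], reference: List[IFP], scheme: str = "max") -> List[int]:
    """Indices of poses, best first, under the chosen rescoring scheme."""
    score = max_similarity if scheme == "max" else consensus_similarity
    return sorted(range(len(poses)), key=lambda i: score(poses[i], reference), reverse=True)


fragments = [frozenset({"Leu83:HB", "Phe80:HYD"}), frozenset({"Leu83:HB", "Lys33:HYD"})]
poses = [frozenset({"Lys33:HYD", "Asp86:HB"}), frozenset({"Leu83:HB", "Phe80:HYD"})]
print(rank_poses(poses, fragments, "max"))        # [1, 0]
print(rank_poses(poses, fragments, "consensus"))  # [1, 0]
```

The maximal scheme rewards a pose that closely matches any single reference complex. The consensus scheme instead rewards poses that reproduce the interactions most frequent across the whole class.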
In the case of fragments, their docking is of interest for FBDD because it can be applied to predict the binding modes of structural analogues of experimentally determined fragment hits or, as recently described, for virtual fragment screening.51

all carbons (FSP3)the higher the FSP3 value, the more branched and three-dimensional the molecule is. Other definitions express molecular complexity by calculating fingerprint densities, therefore encoding the richness of different substructures in the molecule.41 Interestingly, we find no substantial difference between fragments with conserved or variable binding modes when investigating their radius-2 extended-connectivity fingerprint (ECFP2) density and FSP3 values. Additionally, no clear trend between properties of the Rule of 3 and binding mode conservation was observed.19 Use of Binding Mode Information To Improve Docking. Several studies indicate that binding mode information can be used to improve the performance of molecular docking in terms of the selection of the correct binding pose. Using a set of 42 protein−ligand complexes for 10 diverse pharmaceutically relevant targets (including CDK2 and BACE1) and low-MW ligands (150−250 Da), Marcou and Rognan42 showed that the binding pose prediction can be enhanced when docking poses are rescored with IFP similarities to the native X-ray pose. In a follow-up study, Desaphy et al.43 demonstrated that this also holds true for additional protein− ligand complexes from the Astex diverse set.44 In a recent study, a force field derived from protein−ligand interactions (PLIff) was developed and applied, among others, to pose prediction and shown to compete with docking scoring functions, especially when considering in-house Astex data.45 Similarly, using shape similarity of docking poses to shapes of crystal structure ligands has been shown to improve enrichment in virtual screening as well as binding pose predictions.46,47 We thus asked ourselves whether the binding modes of one molecule class can be successfully used to rescore the docking poses of another class of compounds, e.g., whether the information from additive and fragment complexes can be used to predict the binding modes of drug-like ligands. 
In addition, we tried to combine different molecule classes for rescoring. The performance of docking scoring functions in native docking pose prediction is generally good, whereas non-native docking remains the more challenging application.48 We therefore applied non-native docking into the selected representative structures for each target (chosen by clustering of all binding site structures; see Methods) and rescored the docking poses using IFP and ROCS shape similarities of additives, fragments, and drug-like ligands. For both approaches, we used two different scoring techniques. First, in accordance with previous studies,43,46 we chose to rescore the docking pose on the basis of the maximal IFP/shape similarity to all molecules of a class. Second, each docking pose was compared to a consensus interaction fingerprint or a consensus shape of a molecule class. For a given class (e.g., fragments), a consensus fingerprint, like the interaction heatmap, represents the relative frequency of each interaction type. Although fingerprints with continuous variables are used only rarely in chemoinformatics,49 the similarity coefficients have been defined similarly to binary fingerprints and used here.50 The consensus shape, on the other hand, is defined from the atom coordinates of all combined atoms of a molecule class, e.g., the shape around all fragment atoms. The general trends of fragment and drug-like ligand rescoring for all four protein targets are presented in Figure 8 (fragments) and Supplementary Figure S3 (ligands). In both cases, the docking program is able to find good docking poses (median RMSD below 1.5 Å), whereas the pose chosen by the



CONCLUSIONS

In the current study, we present a thorough analysis of the binding modes of additives and fragments in comparison with drug-like ligands. Using shape overlap and molecular interaction comparisons, we show that, independent of the target, fragments and drug-like ligands tend to exhibit similar interaction patterns and that the information present in protein−additive complexes can also be valuable in drug design projects. Because we chose drug targets that are highly represented in the PDB, we were able to establish a large and diverse set of molecules and to discover new fragment−ligand pairs with conserved and nonconserved binding modes. Because these targets are well studied, we were able to validate our assumption that the interaction patterns of additives and fragments resemble those of drug-like molecules. In addition, we found that the interaction patterns of each molecule class can be used to improve docking pose prediction. We believe that the present study is an important proof of concept and can be useful for future investigations of new, unliganded protein targets.



METHODS

Data Set and General Procedure. The analysis was performed on publicly available data from the RCSB PDB Web site.52 Crystal structures of protein−ligand complexes with a resolution of ≤3 Å that were deposited between January 2000 and September 2015 and contain at least one protein chain were analyzed. Similar to the filtering rules of the in-house scPDB database,53,54 short peptides with fewer than 36 residues and proteins with only Cα coordinates were discarded. Furthermore, the identity of each protein chain was validated using the scPDB protocol, and only those structures containing a crystallization additive, fragment, or drug-like ligand (see below) were kept. This set is called the filtered PDB set throughout the article. Four protein targets, human β-secretase 1 (BACE1, Uniprot P56817), human cyclin-dependent kinase 2 (CDK2, Uniprot P24941), human carbonic anhydrase 2 (CAH2, Uniprot P00918), and bovine trypsin (TRY1, Uniprot P00760), were selected as representative targets for this study; we refer to them as the working data set. An overview of the filtering and processing of PDB files is given in Figure 1.

Molecule Classes. For each distinct ligand HET code found in the filtered PDB set, the molecular structure in the form of a Simplified Molecular Input Line Entry Specification (SMILES) string was obtained from the mmCIF dictionary55 and prepared using the scPDB procedure.53,54 In particular, ligands were ionized at pH 7.4 with the Filter tool (OpenEye Scientific Software, Santa Fe, NM, USA); their charges and stereo information were then standardized, and duplicates were identified using canonical SMILES, both steps using Pipeline Pilot 9.5 (BIOVIA, Dassault Systèmes, Vélizy-Villacoublay, France). Molecular properties were then determined, also using Pipeline Pilot. All molecules containing fewer than two heavy atoms were removed. The following classes of ligands were defined: drug-like ligands, fragments, and crystallization additives.
Additives were identified using manually assembled in-house HET code lists consisting of common crystallization agents (e.g., buffers, solvents, sugars, and poly(ethylene glycols)). Fragments were defined as molecules with MW below 300 Da and fewer than 19 heavy atoms. Finally, drug-like ligands were defined as all nonfragments that follow Lipinski’s rule of five56 with at most one violation. Additionally, scPDB ligand rules were used for the human targets BACE1, CDK2, and CAH2. Whether fragments or additives were substructures of drug-like ligands was determined using a substructure map implemented in Pipeline Pilot.

Assessment of Structural Diversity. To determine the structural diversity of the BACE1, CDK2, CAH2, and TRY1 data sets of additives, fragments, and drug-like ligands, extended-connectivity fingerprints with a radius of 2 (ECFP2) and MACCS substructure fingerprints (MACCS166) were calculated with Pipeline Pilot. For each molecule, the average Tanimoto similarity to all compounds of its subset was calculated. Furthermore, the sets were compared to all additives, fragments, and drug-like ligands present in the filtered PDB set (see Molecule Class Definitions and Data Set Composition) and also to known active molecules for the four targets. These were obtained from the ChEMBL database version 21 (March 2016 release)57 by searching for compounds with an IC50, EC50, Ki, or Kd below 1 μM as determined in a binding assay. To ensure a high-quality data set,58 further filtering criteria were applied: only molecules with an exact activity value (activity relation “=”), no inconclusive comments, and binding to a single protein or a defined single subunit of a protein complex (confidence score ≥ 7) were included in the analysis. Furthermore, the molecules were required to fulfill the criteria for additives, fragments, or drug-like ligands described above. Data sets were compared using a principal component analysis (PCA) performed with the Python scikit-learn library (http://scikit-learn.org/stable/index.html). The descriptors used for the analysis included bond counts (single, double, triple, aromatic), ring counts (all rings, aromatic rings), element counts (C, N, O, S, P, Cl, F), the fractional polar surface area, AlogP, and the numbers of hydrogen-bond donors and acceptors, all calculated using Pipeline Pilot. Because these descriptors span diverse value ranges, they were all normalized using MinMax scaling (Python scikit-learn). To compute the complexities of fragments and additives, their ECFP2 densities and ratios of sp3-hybridized carbons to all carbons were calculated using Pipeline Pilot.

Preparation of PDB Files. The PDB files for the four targets were downloaded from the RCSB PDB Web site and subjected to a number of preparation steps (see Figure 1 for a summary). Files were first preprocessed using an scPDB procedure,53,54 including the removal of alternative (less frequent) atom locations and the use of continuous residue numbers. To enable comparability, all protein residues were renumbered according to an alignment of the PDB sequence to the original entry from the Universal Protein Resource (Uniprot),59 a database of protein sequences and annotations. The numbers were adjusted to match the residue numbers of catalytic residues used in the literature (BACE1 residue renumbering, −61; TRY1 residue renumbering, −5). Files containing multiple copies of the same protein were separated, and only those containing a ligand were kept. The ligand binding site was identified in PDB files containing drug-like ligands and defined as all residues that have a heavy atom within 6.5 Å of any ligand heavy atom in more than 70% of these PDB entries. In the last step, binding sites were checked for mutated, modified, missing, or inserted residues.
PDB files with affected residues were removed from further analysis. To allow direct comparisons among all complexes of a target, proteins were superimposed onto a reference structure using the CE tool.60 The reference structure was identified on the basis of its binding pocket characteristics: pairwise pocket similarities were computed using Shaper,61 and the centroid structure exhibiting the maximal similarity to all other pockets was selected. The following reference structures were chosen: PDB codes 3IND, 3QU0, 4BF6, and 1F0U for BACE1, CDK2, CAH2, and TRY1, respectively.62−65 Because the structural alignment affects the computed shape overlaps (see Determination of Binding Modes and Interaction Heatmaps and Binding Mode Conservation of Substructure Pairs), we checked the RMSD of the binding pocket residue Cα atoms for each aligned protein. To correctly compute protein−ligand interactions, the selected complexes were protonated using Protoss version 2.66 The protonation and tautomeric states of small molecules and important protein residues were checked manually and corrected if necessary. In particular, the protonation states of CAH2 ligands containing sulfonamide groups and interacting with zinc, as well as those of the two catalytic aspartates of BACE1 (Asp32 and Asp228), were carefully assigned. Furthermore, the electron densities of the example molecules shown in the figures were checked for consistency with the modeled positions of the molecules.

Determination of Binding Modes and Interaction Heatmaps. For the analysis of binding modes, only molecules bound to the main druggable site were considered. In order to identify the main druggable site, the Volsite program was used.61 All files containing a small molecule displaying a minimal ROCS shape overlap with the main druggable cavity were kept for further analysis.67 Specifically, a shape Tversky similarity (ROCS FitTversky with weights of α = 0.95 for the molecule and β = 0.05 for the cavity) of at least 0.1 between the molecule and the cavity was required. The binding mode of each small molecule was encoded as an interaction fingerprint (IFP) based on its molecular interactions with the protein residues. IFPs between the previously defined pocket residues and the bound small molecule were calculated using the in-house IChem software.43 An extended version of the fingerprint was used that recognizes different types of molecular interactions (hydrophobic, hydrogen-bonding, ionic, aromatic, and metal interactions) on the basis of previously described rules.42,43 The distance cutoff between aromatic ring centroids was increased from the default value of 4 Å to 5 Å. The relative occurrence of each interaction type with the pocket residues was calculated from the IFPs and displayed as an interaction heatmap generated with the Python library matplotlib.

Binding Mode Conservation of Substructure Pairs. To compare the binding modes of substructure pairs, the similarity between IFPs was calculated using the Tversky coefficient. The Tversky parameters α and β were set to 0 (ligands) and 1 (fragments), respectively, so that a perfect similarity score of 1 was obtained whenever all fragment interactions were also found in the ligand. In addition to the IFP similarity, the shape overlap of the substructure pairs was determined using the ROCS program.67 The Tversky coefficient was again used to express the shape overlap similarity; a perfect Tversky score was obtained when the fragment shape fully overlapped with the ligand shape. A conserved binding mode was defined for each fragment cluster by an average IFP similarity to the superstructure of ≥0.6.

Docking Procedure. Non-native docking was performed with the PLANTS program using the ChemPLP scoring function with the search speed set to 1 (highest accuracy).68 The binding sites of all structures of each protein target were clustered by Cα RMSD using hierarchical clustering with average linkage, as implemented in Python (SciPy cluster.hierarchy package). The clustering distance was set to 1 Å. For CAH2, TRY1, and CDK2, we observed that the previously chosen reference structure (see Preparation of PDB Files) lies within the largest cluster of sites, and we therefore docked into these structures. The good docking performance confirmed that these reference structures were appropriate for docking. In contrast, the BACE1 reference structure was not found within the largest cluster of sites, and docking into it gave poor results. Therefore, we chose a different docking reference structure for BACE1, representing the centroid of the largest cluster of sites (PDB code 4D8C).69 For all dockings, the binding site center was defined as the centroid of all drug-like ligands for that target, and we checked the binding site radius necessary to encompass all drug-like ligands of the respective data set. A radius of 12 Å was used for all targets except TRY1, for which a radius of 13 Å was chosen to cover large ligands. For each fragment and ligand, 10 poses were saved. The pose clustering RMSD was set to 2 Å, meaning that poses more similar than this cutoff were not saved. Each docking pose was then rescored using interaction fingerprint and shape similarities.

The IFPs for each molecule class were transformed into a continuous consensus fingerprint containing the relative frequency of each interaction type with each residue. The similarity between a docking pose IFP and a consensus fingerprint was calculated using the Tanimoto metric defined for continuous variables.50 Additionally, docking poses were rescored using the individual IFPs of each molecule in the data set, excluding the docked compound itself; the maximal Tanimoto similarity obtained was then used to select the best docking pose. Furthermore, docking poses were rescored using the ROCS combo similarity, which accounts for the overlap of both shape and atom types, expressed as a Tanimoto coefficient. The poses were rescored both by their maximal ROCS similarity to a specific molecule set and by their similarity to a consensus shape for a molecule class. The consensus shape was determined by merging all molecules of a class into one multi-molecule mol2 file. The docking performance was evaluated using the RMSD between the selected pose and the original X-ray structure as calculated by Surflex (version 3066).70 The RMSD box plots were generated using Python (matplotlib library), with information from all four targets combined into one plot. To give the same weight to each target, the RMSD set of each target was replicated so that all targets contributed approximately the same number of values.
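With IFPs encoded as sets of on-bit indices (one bit per interaction type and pocket residue), the asymmetric Tversky comparison and the 0.6 conservation threshold can be sketched as follows. Setting α = 0 for the ligand and β = 1 for the fragment reduces the coefficient to |L ∩ F| / |F|, the fraction of fragment interactions that the ligand reproduces. How fragment clusters are formed is not reproduced here; the sketch assumes each cluster is given as a list of (ligand IFP, fragment IFP) pairs.

```python
from statistics import mean
from typing import List, Set, Tuple


def tversky(a: Set[int], b: Set[int], alpha: float, beta: float) -> float:
    """Tversky index |A & B| / (|A & B| + alpha*|A - B| + beta*|B - A|)."""
    common = len(a & b)
    denom = common + alpha * len(a - b) + beta * len(b - a)
    return common / denom if denom else 0.0


def fragment_recovery(ligand_ifp: Set[int], fragment_ifp: Set[int]) -> float:
    """alpha = 0 (ligand), beta = 1 (fragment): fraction of fragment interactions found in the ligand."""
    return tversky(ligand_ifp, fragment_ifp, alpha=0.0, beta=1.0)


def is_conserved(pairs: List[Tuple[Set[int], Set[int]]], threshold: float = 0.6) -> bool:
    """A fragment cluster is conserved if its mean IFP similarity to the superstructures is >= threshold."""
    return mean(fragment_recovery(lig, frag) for lig, frag in pairs) >= threshold


ligand = {1, 2, 3, 4, 7}
print(fragment_recovery(ligand, {1, 2}))   # 1.0: every fragment interaction is kept
print(fragment_recovery(ligand, {1, 5}))   # 0.5: one of two fragment interactions is kept
```

Because α = 0, extra interactions made by the larger ligand do not lower the score; only fragment interactions that the ligand fails to reproduce do.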
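The selection of a docking structure from the binding-site clusters can be sketched with SciPy's `linkage` and `fcluster`. The input is assumed to be a precomputed square matrix of pocket Cα RMSDs; because the text does not say how the cluster centroid was identified, the sketch takes the member with the lowest mean RMSD to the other members (a medoid) as a stand-in.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def docking_structure_index(rmsd: np.ndarray, cutoff: float = 1.0) -> int:
    """Cluster binding sites by C-alpha RMSD (average linkage, distance cutoff in angstroms)
    and return the index of the medoid of the largest cluster."""
    sym = 0.5 * (rmsd + rmsd.T)          # guard against small numerical asymmetries
    np.fill_diagonal(sym, 0.0)
    Z = linkage(squareform(sym), method="average")
    labels = fcluster(Z, t=cutoff, criterion="distance")
    values, counts = np.unique(labels, return_counts=True)
    members = np.flatnonzero(labels == values[np.argmax(counts)])
    within = sym[np.ix_(members, members)]
    return int(members[np.argmin(within.mean(axis=1))])


# Toy example: structures 0-2 form a tight group, 3-4 a second, smaller group.
rmsd_demo = np.array([
    [0.0, 0.3, 0.5, 3.0, 3.0],
    [0.3, 0.0, 0.4, 3.0, 3.0],
    [0.5, 0.4, 0.0, 3.0, 3.0],
    [3.0, 3.0, 3.0, 0.0, 0.4],
    [3.0, 3.0, 3.0, 0.4, 0.0],
])
print(docking_structure_index(rmsd_demo))
```

In this toy matrix the largest cluster is {0, 1, 2} and structure 1 has the smallest mean RMSD to its cluster mates, so it would be chosen, mirroring the check described above of whether a reference structure lies within the largest cluster.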
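The two IFP rescoring schemes can be sketched with fingerprints stored as equal-length numeric vectors. The consensus fingerprint holds per-bit relative frequencies, and the continuous Tanimoto coefficient, Σxy / (Σx² + Σy² − Σxy), reduces to the usual binary Tanimoto for 0/1 vectors. Function names are illustrative; the ROCS shape-based rescoring is not reproduced.

```python
from typing import List, Sequence


def consensus_fingerprint(ifps: Sequence[Sequence[int]]) -> List[float]:
    """Per-bit relative frequency across all IFPs of one molecule class."""
    n = len(ifps)
    return [sum(column) / n for column in zip(*ifps)]


def continuous_tanimoto(x: Sequence[float], y: Sequence[float]) -> float:
    """Sum(xy) / (Sum(x^2) + Sum(y^2) - Sum(xy)); equals binary Tanimoto for 0/1 vectors."""
    xy = sum(a * b for a, b in zip(x, y))
    denom = sum(a * a for a in x) + sum(b * b for b in y) - xy
    return xy / denom if denom else 0.0


def best_pose_by_consensus(pose_ifps: Sequence[Sequence[int]], consensus: Sequence[float]) -> int:
    """Index of the pose most similar to the class consensus fingerprint."""
    scores = [continuous_tanimoto(p, consensus) for p in pose_ifps]
    return max(range(len(scores)), key=scores.__getitem__)


def best_pose_by_max_similarity(pose_ifps, reference_ifps) -> int:
    """Index of the pose with the highest similarity to any reference IFP.
    reference_ifps should exclude the docked compound's own crystal IFP."""
    scores = [max(continuous_tanimoto(p, r) for r in reference_ifps) for p in pose_ifps]
    return max(range(len(scores)), key=scores.__getitem__)


fragment_ifps = [[1, 1, 0, 0], [1, 0, 1, 0]]
consensus = consensus_fingerprint(fragment_ifps)   # [1.0, 0.5, 0.5, 0.0]
poses = [[0, 0, 0, 1], [1, 1, 0, 0]]
print(best_pose_by_consensus(poses, consensus))    # 1
```

The consensus variant rewards poses that reproduce interactions frequently seen across a class, whereas the maximum-similarity variant can be driven by a single close analogue; combining classes simply means pooling their IFPs before either step.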
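The two summary statistics reported for pose selection, the median RMSD and the percentage of good poses (RMSD below 2 Å), together with the per-target replication used to weight the pooled box plots, can be sketched as follows. The integer replication factor is an assumption about how "approximately the same size" was achieved.

```python
from statistics import median
from typing import Dict, List, Tuple


def pose_summary(rmsds: List[float], good_cutoff: float = 2.0) -> Tuple[float, float]:
    """Median RMSD and percentage of poses with RMSD below `good_cutoff` angstroms."""
    good = sum(r < good_cutoff for r in rmsds)
    return median(rmsds), 100.0 * good / len(rmsds)


def equal_weight_pool(per_target: Dict[str, List[float]]) -> List[float]:
    """Replicate each target's RMSD list by an integer factor so that all targets
    contribute roughly as many values as the largest one."""
    largest = max(len(v) for v in per_target.values())
    pooled: List[float] = []
    for values in per_target.values():
        pooled.extend(values * max(1, round(largest / len(values))))
    return pooled


print(pose_summary([0.5, 1.5, 2.5, 4.0]))   # (2.0, 50.0)
```

The pooled list can then be handed to matplotlib's `boxplot`; replicating values leaves each target's median and quantiles unchanged while preventing the target with the most complexes from dominating the combined plot.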



ASSOCIATED CONTENT

Supporting Information

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.jcim.6b00769.

Supplementary Tables S1−S3, S5, and S6 and Supplementary Figures S1−S3 (PDF)

Supplementary Table S4 (TXT)



AUTHOR INFORMATION

Corresponding Author

*Phone: +33 3 688 54 221. E-mail: [email protected].

ORCID

Esther Kellenberger: 0000-0002-9320-4840

Author Contributions

Project coordination: E.K. and J.D. Design of the protocol: E.K. and M.N.D. Molecule classification: M.N.D., C.P., and E.K. Implementation of the protocol and data analysis: M.N.D. and C.J. Preparation of the manuscript: M.N.D., E.K., C.P., and J.D.

Notes

The authors declare no competing financial interest.



ACKNOWLEDGMENTS

This work was supported by Eli Lilly and Company through the Lilly Research Award Program (LRAP). The authors thank their Eli Lilly and University of Strasbourg collaborators: Jon Erickson and Thibault Varin for reviewing the results of the analysis on the four protein targets, Noé Sturm for help with the protein sequence alignment, Guillaume Bret for technical support, and Didier Rognan for support and critical assessment of the project and manuscript.



ABBREVIATIONS

BACE1, human β-secretase 1; CAH2, human carbonic anhydrase 2; CDK2, human cyclin-dependent kinase 2; FBDD, fragment-based drug design; IFP, interaction fingerprint; MW, molecular weight; PCA, principal component analysis; PDB, Protein Data Bank; RMSD, root-mean-square deviation; TRY1, bovine trypsin



REFERENCES

(1) Erlanson, D. A.; Fesik, S. W.; Hubbard, R. E.; Jahnke, W.; Jhoti, H. Twenty Years on: The Impact of Fragments on Drug Discovery. Nat. Rev. Drug Discovery 2016, 15, 605−619.
(2) Whittaker, M.; Law, R. J.; Ichihara, O.; Hesterkamp, T.; Hallett, D. Fragments: Past. Drug Discovery Today: Technol. 2010, 7 (3), e163−e171.
(3) Murray, C. W.; Verdonk, M. L.; Rees, D. C. Experiences in Fragment-Based Drug Discovery. Trends Pharmacol. Sci. 2012, 33 (5), 224−232.
(4) Keserű, G. M.; Erlanson, D. A.; Ferenczy, G. G.; Hann, M. M.; Murray, C. W.; Pickett, S. D. Design Principles for Fragment Libraries: Maximizing the Value of Learnings from Pharma Fragment-Based Drug Discovery (FBDD) Programs for Use in Academia. J. Med. Chem. 2016, 59 (18), 8189−8206.
(5) Murray, C. W.; Rees, D. C. The Rise of Fragment-Based Drug Discovery. Nat. Chem. 2009, 1 (3), 187−192.
(6) Ichihara, O.; Shimada, Y.; Yoshidome, D. The Importance of Hydration Thermodynamics in Fragment-to-Lead Optimization. ChemMedChem 2014, 9 (12), 2708−2717.
(7) Kozakov, D.; Hall, D. R.; Jehle, S.; Luo, L.; Ochiana, S. O.; Jones, E. V.; Pollastri, M.; Allen, K. N.; Whitty, A.; Vajda, S.; et al. Ligand Deconstruction: Why Some Fragment Binding Positions Are Conserved and Others Are Not. Proc. Natl. Acad. Sci. U. S. A. 2015, 112 (20), E2585−E2594.
(8) Hajduk, P. J. Fragment-Based Drug Design: How Big Is Too Big? J. Med. Chem. 2006, 49 (24), 6972−6976.
(9) Chen, H.; Zhou, X.; Wang, A.; Zheng, Y.; Gao, Y.; Zhou, J. Evolutions in Fragment-Based Drug Design: The Deconstruction−reconstruction Approach. Drug Discovery Today 2015, 20 (1), 105−113.
(10) Andersen, O. A.; Nathubhai, A.; Dixon, M. J.; Eggleston, I. M.; van Aalten, D. M. F. Structure-Based Dissection of the Natural Product Cyclopentapeptide Chitinase Inhibitor Argifin. Chem. Biol. 2008, 15 (3), 295−301.
(11) Barelier, S.; Pons, J.; Marcillat, O.; Lancelin, J.-M.; Krimm, I. Fragment-Based Deconstruction of Bcl-xL Inhibitors. J. Med. Chem. 2010, 53 (6), 2577−2588.
(12) Babaoglu, K.; Shoichet, B. K. Deconstructing Fragment-Based Inhibitor Discovery. Nat. Chem. Biol. 2006, 2 (12), 720−723.
(13) Brandt, P.; Geitmann, M.; Danielson, U. H. Deconstruction of Non-Nucleoside Reverse Transcriptase Inhibitors of Human Immunodeficiency Virus Type 1 for Exploration of the Optimization Landscape of Fragments. J. Med. Chem. 2011, 54 (3), 709−718.
(14) Berman, H. M. The Protein Data Bank. Nucleic Acids Res. 2000, 28 (1), 235−242.
(15) Chohan, T. A.; Qian, H.; Pan, Y.; Chen, J.-Z. Cyclin-Dependent Kinase-2 as a Target for Cancer Therapy: Progress in the Development of CDK2 Inhibitors as Anti-Cancer Agents. Curr. Med. Chem. 2015, 22 (2), 237−263.
(16) Ghosh, A. K.; Osswald, H. L. BACE1 (β-Secretase) Inhibitors for the Treatment of Alzheimer’s Disease. Chem. Soc. Rev. 2014, 43 (19), 6765−6813.
(17) Supuran, C. T. Carbonic Anhydrases as Drug Targets–an Overview. Curr. Top. Med. Chem. 2007, 7 (9), 825−833.
(18) Turk, B. Targeting Proteases: Successes, Failures and Future Prospects. Nat. Rev. Drug Discovery 2006, 5 (9), 785−799.
(19) Congreve, M.; Carr, R.; Murray, C.; Jhoti, H. A “Rule of Three” for Fragment-Based Lead Discovery? Drug Discovery Today 2003, 8 (19), 876−877.
(20) Jhoti, H.; Williams, G.; Rees, D. C.; Murray, C. W. The “Rule of Three” for Fragment-Based Drug Discovery: Where Are We Now? Nat. Rev. Drug Discovery 2013, 12 (8), 644−645.
(21) Köster, H.; Craan, T.; Brass, S.; Herhaus, C.; Zentgraf, M.; Neumann, L.; Heine, A.; Klebe, G. A Small Nonrule of 3 Compatible Fragment Library Provides High Hit Rate of Endothiapepsin Crystal Structures with Various Fragment Chemotypes. J. Med. Chem. 2011, 54 (22), 7784−7796.
(22) Morley, A. D.; Pugliese, A.; Birchall, K.; Bower, J.; Brennan, P.; Brown, N.; Chapman, T.; Drysdale, M.; Gilbert, I. H.; Hoelder, S.; et al. Fragment-Based Hit Identification: Thinking in 3D. Drug Discovery Today 2013, 18 (23−24), 1221−1227.
(23) Ferenczy, G. G.; Keserű, G. M. Thermodynamics of Fragment Binding. J. Chem. Inf. Model. 2012, 52 (4), 1039−1045.
(24) Abel, R.; Young, T.; Farid, R.; Berne, B. J.; Friesner, R. A. Role of the Active-Site Solvent in the Thermodynamics of Factor Xa Ligand Binding. J. Am. Chem. Soc. 2008, 130 (9), 2817−2831.
(25) Ung, P. M. U.; Ghanakota, P.; Graham, S. E.; Lexa, K. W.; Carlson, H. A. Identifying Binding Hot Spots on Protein Surfaces by Mixed-Solvent Molecular Dynamics: HIV-1 Protease as a Test Case. Biopolymers 2016, 105 (1), 21−34.
(26) Landon, M. R.; Lieberman, R. L.; Hoang, Q. Q.; Ju, S.; Caaveiro, J. M. M.; Orwig, S. D.; Kozakov, D.; Brenke, R.; Chuang, G.-Y.; Beglov, D.; et al. Detection of Ligand Binding Hot Spots on Protein Surfaces via Fragment-Based Methods: Application to DJ-1 and Glucocerebrosidase. J. Comput.-Aided Mol. Des. 2009, 23 (8), 491−500.
(27) Mattos, C.; Bellamacina, C. R.; Peisach, E.; Pereira, A.; Vitkup, D.; Petsko, G. A.; Ringe, D. Multiple Solvent Crystal Structures: Probing Binding Sites, Plasticity and Hydration. J. Mol. Biol. 2006, 357 (5), 1471−1482.
(28) Li, Y.; Zhang, J.; Gao, W.; Zhang, L.; Pan, Y.; Zhang, S.; Wang, Y. Insights on Structural Characteristics and Ligand Binding Mechanisms of CDK2. Int. J. Mol. Sci. 2015, 16 (5), 9314−9340.
(29) Roskoski, R. Classification of Small Molecule Protein Kinase Inhibitors Based upon the Structures of Their Drug-Enzyme Complexes. Pharmacol. Res. 2016, 103, 26−48.
(30) Alexander, L. T.; Möbitz, H.; Drueckes, P.; Savitsky, P.; Fedorov, O.; Elkins, J. M.; Deane, C. M.; Cowan-Jacob, S. W.; Knapp, S. Type II Inhibitors Targeting CDK2. ACS Chem. Biol. 2015, 10 (9), 2116−2125.
(31) Kooistra, A. J.; Kanev, G. K.; van Linden, O. P. J.; Leurs, R.; de Esch, I. J. P.; de Graaf, C. KLIFS: A Structural Kinase-Ligand Interaction Database. Nucleic Acids Res. 2016, 44 (D1), D365−D371.
(32) Gorfe, A. A.; Caflisch, A. Functional Plasticity in the Substrate Binding Site of β-Secretase. Structure 2005, 13 (10), 1487−1498.
(33) Menting, K. W.; Claassen, J. A. H. R. β-Secretase Inhibitor; a Promising Novel Therapeutic Drug in Alzheimer’s Disease. Front. Aging Neurosci. 2014, 6, 165.
(34) Mares-Guia, M.; Shaw, E. Studies on the Active Center of Trypsin. The Binding of Amidines and Guanidines as Models of the Substrate Side Chain. J. Biol. Chem. 1965, 240, 1579−1585.
(35) Marquart, M.; Walter, J.; Deisenhofer, J.; Bode, W.; Huber, R. IUCr. The Geometry of the Reactive Site and of the Peptide Groups in Trypsin, Trypsinogen and Its Complexes with Inhibitors. Acta Crystallogr., Sect. B: Struct. Sci. 1983, 39 (4), 480−490.
(36) Malhotra, S.; Karanicolas, J. When Does Chemical Elaboration Induce a Ligand To Change Its Binding Mode? J. Med. Chem. 2017, 60 (1), 128−145.
(37) Muley, L.; Baum, B.; Smolinski, M.; Freindorf, M.; Heine, A.; Klebe, G.; Hangauer, D. G. Enhancement of Hydrophobic Interactions and Hydrogen Bond Strength by Cooperativity: Synthesis, Modeling, and Molecular Dynamics Simulations of a Congeneric Series of Thrombin Inhibitors. J. Med. Chem. 2010, 53 (5), 2126−2135.
(38) Fraser, C. M.; Fernández, A.; Scott, L. R. Dehydron Analysis: Quantifying the Effect of Hydrophobic Groups on the Strength and Stability of Hydrogen Bonds. Adv. Exp. Med. Biol. 2010, 680, 473−479.
(39) Hann, M. M.; Leach, A. R.; Harper, G. Molecular Complexity and Its Impact on the Probability of Finding Leads for Drug Discovery. J. Chem. Inf. Comput. Sci. 2001, 41 (3), 856−864.

(40) Méndez-Lucio, O.; Medina-Franco, J. L. The Many Roles of Molecular Complexity in Drug Discovery. Drug Discovery Today 2017, 22 (1), 120−126.
(41) Selzer, P.; Roth, H.-J.; Ertl, P.; Schuffenhauer, A. Complex Molecules: Do They Add Value? Curr. Opin. Chem. Biol. 2005, 9 (3), 310−316.
(42) Marcou, G.; Rognan, D. Optimizing Fragment and Scaffold Docking by Use of Molecular Interaction Fingerprints. J. Chem. Inf. Model. 2007, 47 (1), 195−207.
(43) Desaphy, J.; Raimbaud, E.; Ducrot, P.; Rognan, D. Encoding Protein-Ligand Interaction Patterns in Fingerprints and Graphs. J. Chem. Inf. Model. 2013, 53 (3), 623−637.
(44) Hartshorn, M. J.; Verdonk, M. L.; Chessari, G.; Brewerton, S. C.; Mooij, W. T. M.; Mortenson, P. N.; Murray, C. W. Diverse, High-Quality Test Set for the Validation of Protein-Ligand Docking Performance. J. Med. Chem. 2007, 50 (4), 726−741.
(45) Verdonk, M. L.; Ludlow, R. F.; Giangreco, I.; Rathi, P. C. Protein−Ligand Informatics Force Field (PLIff): Toward a Fully Knowledge Driven “Force Field” for Biomolecular Interactions. J. Med. Chem. 2016, 59 (14), 6891−6902.
(46) Anighoro, A.; Bajorath, J. Three-Dimensional Similarity in Molecular Docking: Prioritizing Ligand Poses on the Basis of Experimental Binding Modes. J. Chem. Inf. Model. 2016, 56 (3), 580−587.
(47) Kumar, A.; Zhang, K. Y. J. Application of Shape Similarity in Pose Selection and Virtual Screening in CSARdock2014 Exercise. J. Chem. Inf. Model. 2016, 56 (6), 965−973.
(48) Verdonk, M. L.; Mortenson, P. N.; Hall, R. J.; Hartshorn, M. J.; Murray, C. W. Protein−Ligand Docking against Non-Native Protein Conformers. J. Chem. Inf. Model. 2008, 48 (11), 2214−2225.
(49) Chuaqui, C.; Deng, Z.; Singh, J. Interaction Profiles of Protein Kinase-Inhibitor Complexes and Their Application to Virtual Screening. J. Med. Chem. 2005, 48 (1), 121−133.
(50) Bajusz, D.; Rácz, A.; Héberger, K. Why Is Tanimoto Index an Appropriate Choice for Fingerprint-Based Similarity Calculations? J. Cheminf. 2015, 7 (1), 20.
(51) Sirci, F.; Istyastono, E. P.; Vischer, H. F.; Kooistra, A. J.; Nijmeijer, S.; Kuijer, M.; Wijtmans, M.; Mannhold, R.; Leurs, R.; de Esch, I. J. P.; et al. Virtual Fragment Screening: Discovery of Histamine H3 Receptor Ligands Using Ligand-Based and Protein-Based Molecular Fingerprints. J. Chem. Inf. Model. 2012, 52 (12), 3308−3324.
(52) RCSB Protein Data Bank. www.rcsb.org (accessed Feb 1, 2016).
(53) Kellenberger, E.; Muller, P.; Schalon, C.; Bret, G.; Foata, N.; Rognan, D. Sc-PDB: An Annotated Database of Druggable Binding Sites from the Protein Data Bank. J. Chem. Inf. Model. 2006, 46 (2), 717−727.
(54) Desaphy, J.; Bret, G.; Rognan, D.; Kellenberger, E. Sc-PDB: A 3D-Database of Ligandable Binding Sites–10 Years On. Nucleic Acids Res. 2015, 43 (D1), D399−D404.
(55) Worldwide Protein Data Bank Chemical Component Dictionary. www.wwpdb.org/data/ccd (accessed Feb 1, 2016).
(56) Lipinski, C. A.; Lombardo, F.; Dominy, B. W.; Feeney, P. J. Experimental and Computational Approaches to Estimate Solubility and Permeability in Drug Discovery and Development Settings. Adv. Drug Delivery Rev. 2001, 46 (1−3), 3−26.
(57) Bento, A. P.; Gaulton, A.; Hersey, A.; Bellis, L. J.; Chambers, J.; Davies, M.; Krüger, F. A.; Light, Y.; Mak, L.; McGlinchey, S. The ChEMBL Bioactivity Database: An Update. Nucleic Acids Res. 2014, 42, D1083−D1090.
(58) Hu, Y.; Bajorath, J. Influence of Search Parameters and Criteria on Compound Selection, Promiscuity, and Pan Assay Interference Characteristics. J. Chem. Inf. Model. 2014, 54 (11), 3056−3066.
(59) Universal Protein Resource (UniProt). http://www.uniprot.org (accessed Feb 1, 2016).
(60) Shindyalov, I. N.; Bourne, P. E. Protein Structure Alignment by Incremental Combinatorial Extension (CE) of the Optimal Path. Protein Eng., Des. Sel. 1998, 11 (9), 739−747.
(61) Desaphy, J.; Azdimousa, K.; Kellenberger, E.; Rognan, D. Comparison and Druggability Prediction of Protein−Ligand Binding Sites from Pharmacophore-Annotated Cavity Shapes. J. Chem. Inf. Model. 2012, 52 (8), 2287−2299.
(62) Malamas, M. S.; Erdei, J.; Gunawan, I.; Turner, J.; Hu, Y.; Wagner, E.; Fan, K.; Chopra, R.; Olland, A.; Bard, J.; et al. Design and Synthesis of 5,5′-disubstituted Aminohydantoins as Potent and Selective Human Beta-Secretase (BACE1) Inhibitors. J. Med. Chem. 2010, 53 (3), 1146−1158.
(63) Schonbrunn, E.; Betzi, S.; Alam, R.; Martin, M. P.; Becker, A.; Han, H.; Francis, R.; Chakrasali, R.; Jakkaraj, S.; Kazi, A.; et al. Development of Highly Potent and Selective Diaminothiazole Inhibitors of Cyclin-Dependent Kinases. J. Med. Chem. 2013, 56 (10), 3768−3782.
(64) Leitans, J.; Sprudza, A.; Tanc, M.; Vozny, I.; Zalubovskis, R.; Tars, K.; Supuran, C. T. 5-Substituted-(1,2,3-Triazol-4-Yl)thiophene-2-Sulfonamides Strongly Inhibit Human Carbonic Anhydrases I, II, IX and XII: Solution and X-Ray Crystallographic Studies. Bioorg. Med. Chem. 2013, 21 (17), 5130−5138.
(65) Maignan, S.; Guilloteau, J. P.; Pouzieux, S.; Choi-Sledeski, Y. M.; Becker, M. R.; Klein, S. I.; Ewing, W. R.; Pauls, H. W.; Spada, A. P.; Mikol, V. Crystal Structures of Human Factor Xa Complexed with Potent Inhibitors. J. Med. Chem. 2000, 43 (17), 3226−3232.
(66) Bietz, S.; Urbaczek, S.; Schulz, B.; Rarey, M. Protoss: A Holistic Approach to Predict Tautomers and Protonation States in Protein-Ligand Complexes. J. Cheminf. 2014, 6, 12.
(67) Hawkins, P. C. D.; Skillman, A. G.; Nicholls, A. Comparison of Shape-Matching and Docking as Virtual Screening Tools. J. Med. Chem. 2007, 50 (1), 74−82.
(68) Korb, O.; Stützle, T.; Exner, T. E. Empirical Scoring Functions for Advanced Protein-Ligand Docking with PLANTS. J. Chem. Inf. Model. 2009, 49 (1), 84−96.
(69) Rueeger, H.; Lueoend, R.; Rogel, O.; Rondeau, J.-M.; Möbitz, H.; Machauer, R.; Jacobson, L.; Staufenbiel, M.; Desrayaud, S.; Neumann, U. Discovery of Cyclic Sulfone Hydroxyethylamines as Potent and Selective β-Site APP-Cleaving Enzyme 1 (BACE1) Inhibitors: Structure-Based Design and in Vivo Reduction of Amyloid β-Peptides. J. Med. Chem. 2012, 55 (7), 3364−3386.
(70) Spitzer, R.; Jain, A. N. Surflex-Dock: Docking Benchmarks and Real-World Application. J. Comput.-Aided Mol. Des. 2012, 26 (6), 687−699.