Structural Insights on Fragment Binding Mode Conservation - Journal

Article

Structural insights on fragment binding mode conservation Malgorzata N Drwal, Guillaume Bret, Carlos Perez, Célien Jacquemard, Jérémy Desaphy, and Esther Kellenberger J. Med. Chem., Just Accepted Manuscript • DOI: 10.1021/acs.jmedchem.8b00256 • Publication Date (Web): 15 Jun 2018 Downloaded from http://pubs.acs.org on June 17, 2018

Journal of Medicinal Chemistry

Structural insights on fragment binding mode conservation

Malgorzata N. Drwal∞, Guillaume Bret, Carlos Perez§, Célien Jacquemard∞, Jeremy Desaphy#, Esther Kellenberger∞*

∞ Laboratoire d’innovation thérapeutique (UMR7200), Université de Strasbourg, 74 Route du Rhin, 67401 Illkirch, France;

§ Eli Lilly Research Laboratories, Avenida de la Industria, 30, 28108, Alcobendas, Madrid, Spain;

# Lilly Research Laboratories, Eli Lilly and Company, Lilly Corporate Center, Indianapolis, IN 46285, USA;

* To whom correspondence should be addressed.

ACS Paragon Plus Environment

KEYWORDS Protein Data Bank, interaction fingerprints, hot spots, coverage of pocket interactions, crystallization additives

ABSTRACT

Aiming at a deep understanding of fragment binding to ligandable targets, we performed a large-scale analysis of the Protein Data Bank. Binding modes of 1832 drug-like ligands and 1079 fragments to 235 proteins were compared. We observed that the binding modes of fragments and their drug-like superstructures binding to the same protein are mostly conserved, thereby providing experimental evidence for the preservation of fragment binding modes during molecular growing. Furthermore, small chemical changes in the fragment are tolerated without alteration of the fragment binding mode. The exceptions to this observation generally involve conformational variability of the molecules. Our data analysis also suggests that, provided enough fragments have been crystallized within a protein, good interaction coverage of the binding pocket is achieved. Last, we extended our study to 126 crystallization additives and discuss in which cases they provide information relevant to structure-based drug design.

INTRODUCTION

Fragment-based drug design (FBDD) is a well-established method to find new drug candidates by optimizing small chemical fragments into larger molecules.1,2 As compared to larger ligands, fragments have in principle a higher proportion of functional groups involved in protein binding and use many of them to precisely fit the target subpockets. Moreover, due to their reduced size and complexity, fragments allow an efficient exploration of protein binding sites.3 Thus, FBDD usually results in higher hit rates than high-throughput screening (HTS) with large molecules, providing good starting points for drug discovery programs.4,5 Examples have shown that FBDD can be successful where other drug discovery programs have failed, e.g. for difficult protein targets or protein-protein interfaces.6

FBDD generally begins with the experimental screening of fragment libraries to determine possible hits. Once their 3D structure with the protein has been determined, the hits are optimized into larger molecules, either by growing them to occupy the entire binding pocket or by linking fragments that bind to different subpockets. At this stage, computational chemists can support the FBDD project by predicting how to grow or link fragments to develop a high-affinity drug-like ligand. Usually, the predictions rely on the assumption that the binding mode of a fragment is unique, and that the binding mode of a fragment and its drug-like counterpart will be conserved. However, several studies on ligand deconstruction have shown that this is not always the case. Based on eight examples, Kozakov and colleagues showed that fragments coinciding with low-energy hot spots tend to have conserved binding modes.7 The detection of protein hot spots has been proposed to assist fragment selection and elaboration by prioritizing protein subpockets.8

In the current study, we have performed a large-scale analysis of fragment binding modes in crystallographic complexes obtained from the RCSB Protein Data Bank (PDB).9 The first question we asked is how often and why fragments crystallized multiple times in the same protein cavity have variable binding modes. We have evaluated the degree of binding mode conservation between drug-like ligands and their root fragments bound to the same protein target and the influencing parameters. In particular, we have systematically compared interactions made by fragments and their drug-like superstructures bound to the same proteins. Considering all the fragments and drug-like ligands bound to the same protein binding site, we have investigated if fragments cover all the interactions made by the drug-like ligands or if, by contrast, they have specific recognition subsites. We have extended our analyses to small crystallization additives and have characterized which information is relevant for the design of drug-like ligands. Overall, our results provide guidelines for the computational support of an FBDD project where one or more fragment-protein complex structures have been solved.

RESULTS AND DISCUSSION

In the current study, we explore and compare the binding modes of fragments and drug-like ligands which have been crystallized with the same protein target and within the same ligandable binding site (Figure 1A). For this purpose, we have processed protein complexes from the PDB as described in detail in the Experimental Section. Special care was taken to ensure a relevant description of the binding mode by removing complexes with protein pockets containing mutated or missing residues as well as removing small molecules with missing atoms. The quality of 3D-structures was assessed using the EDIA approach recently proposed by Meyder et al.10 On average, the 3D-structure is well covered by the electron density in more than 80% of the studied complexes. Furthermore, in none of them does the structure of the ligand or protein residues fit the corresponding electron density poorly (more statistics are given in Supplementary Table 1). The studied dataset contains 1079 fragments and 1832 drug-like ligands found in 1404 and 2268 PDB files, respectively. In total, 240 binding sites in 235 different protein targets are considered (Figure 1A). The average molecular weight of fragments and drug-like ligands is 204.5±45.6 and 398.0±79.2 Da, respectively. Binding modes are compared by investigating the non-covalent interactions between the small molecule and the protein pocket residues. These include nondirectional (hydrophobic) as well as directional (polar) interactions like hydrogen bonds (H-bonds), aromatic and ionic interactions, and also metal interactions with divalent cations. The numerical representation of interactions involves the definition of a binding site which is common to all the ligands bound to a pocket (Figure 1B). We present here the study of two different aspects of binding modes: binding mode conservation and interaction pattern similarity. Focusing on binding mode conservation, we investigate the binding mode similarity between a single fragment and a single drug-like ligand (one-to-one comparisons scored using IFP similarity, Figure 1C). Focusing on interaction pattern similarity, we study the interactions of all fragments and compare them to the interactions of all drug-like ligands, to answer the question whether the two types of ligands display the same coverage of the pocket interactions (comparisons of consensus binding modes scored using cIFP similarity, Figure 1D). In the last part of the manuscript, we extend the analysis to small crystallization additives and ask whether they contain useful information for FBDD studies. The parsing of the PDB selected 3287 files allowing the comparison of the binding mode of additives with the binding mode of drug-like ligands bound to the same protein pocket (Figure 1). We investigate here 126 additives found in 319 binding sites of 309 proteins. We especially focus on additives bound to a free protein (or apo additives). On average, apo additives have more accurate structures than other additives. About three quarters of them show good local fit to the electron density (Supplementary Table 1).

Part I: Fragment binding mode conservation

The general assumption in FBDD when a fragment is extended into a drug-like ligand is that the binding mode of the shared substructure will be conserved. In this section, we focus on the binding mode conservation between a single fragment and a single drug-like ligand on the PDB scale. In particular, we aim at finding answers to these four different questions: (1) Is the binding mode of a single fragment conserved in multiple complexes with the same protein? (2) Is the binding mode conserved when extending the fragment into a drug-like ligand superstructure? (3) Is the binding mode conserved when extending the fragment into a structurally similar drug-like ligand? (4) Is there a correlation between fragment size and binding mode conservation?

Binding mode conservation of fragments within the same protein pocket

We investigated the binding mode of 453 fragments which have been crystallized multiple times with the same protein and within the same ligandable binding cavity. These fragments are found in 501 complexes, involving 152 binding sites in 149 proteins. In total, 1502 3D-structures are considered. Of note, a single PDB file can contain several biounits, generally distinguished by different chain names, and therefore can provide more than one 3D-structure for the same complex. More than two thirds of the 501 complexes have only two 3D-structures, while a few of them have more than 10 copies in our dataset (Supplementary Figure 1). For each complex, we evaluate the degree of binding mode conservation by considering the minimum IFP similarity value obtained for the comparisons of all pairs of its 3D-structures (Figure 1C). IFP similarity values range from zero (no common interactions) to 1 (exactly the same interactions). IFP similarity values in the 0.6-1 range correspond to low root-mean-square deviations (RMSD) of the fragment atom coordinates, with a mean value of 0.39 ± 0.36 Å (Supplementary Figure 2A). Conversely, low IFP similarity values correspond to high RMSD of the fragment atom coordinates (mean RMSD = 7.62 ± 2.68 Å if 0 ≤ IFP sim. […]

[…] 300 Da, because the larger molecule does not comply with the rule-of-five or because of mutations in the binding site). In our previous study on four targets which are over-represented in the PDB,13 we also observed that there is a relationship between the fragment size and the binding mode conservation. In particular, in 90% of the studied complexes with conserved binding modes, the number of additive/fragment heavy atoms was higher than 8 and the MW was higher than 110 Da. We here extend our analysis to the 359 substructure pairs and the 1533 chemically similar pairs in the PDB (Supplementary Figure 5). We confirm that the binding mode is overall conserved if the fragment MW is high enough, with a threshold around 150 Da. The same trend is observed for the fragment binding pose, although the level of conservation is generally lower, as previously mentioned in the analysis of substructure pairs (Figure 3). Are there other reasons why a fragment binding pose varies? Case studies are more suitable than a global analysis of the PDB to unravel the cause of a change in binding mode. For example, Schauperl et al. used molecular dynamics simulations to provide a thermodynamic understanding of the variable binding mode of fragments in complex with the TGFBR1 kinase.20
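The per-complex conservation score described earlier in this section (the minimum IFP similarity over all pairs of 3D-structures of one complex) can be illustrated with a short Python sketch. It assumes that a fingerprint is stored as the set of its "on" bits and that similarity is a Tanimoto coefficient; the metric is an assumption made for illustration, chosen because it ranges from 0 (no common interactions) to 1 (identical interactions), and the residue labels are hypothetical.

```python
from itertools import combinations


def tanimoto(a: set, b: set) -> float:
    """Tanimoto coefficient between two sets of 'on' interaction bits."""
    if not a and not b:
        # Convention chosen for this sketch: two empty fingerprints are identical.
        return 1.0
    return len(a & b) / len(a | b)


def min_pairwise_similarity(fingerprints: list) -> float:
    """Lowest similarity over all pairs of 3D-structures of one complex.

    Requires at least two fingerprints; min() raises ValueError otherwise.
    """
    return min(tanimoto(a, b) for a, b in combinations(fingerprints, 2))


# Hypothetical data: each bit is a (residue, interaction type) pair.
copies = [
    {("RES1", "ionic"), ("RES2", "hbond"), ("RES3", "hydrophobic")},
    {("RES1", "ionic"), ("RES2", "hbond"), ("RES3", "hydrophobic")},
    {("RES1", "ionic"), ("RES3", "hydrophobic")},
]
score = min_pairwise_similarity(copies)
print(f"minimum IFP similarity: {score:.2f}")
```

In this toy case the third copy lacks one H-bond, so the weakest pair shares two of three bits. Taking the minimum rather than the mean makes a single divergent copy enough to flag the complex as variable.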

Part II: Similarity between fragment and drug-like interaction patterns

In this section, we examine the coverage of the protein binding site by fragments and drug-like ligands. To this end, all available fragment- and drug-like ligand-protein complexes of the same target are considered and global interaction patterns of the fragment and drug-like ligand sets are compared.

Only a few ligandable proteins in the PDB have high interaction coverage datasets

In the entire PDB we found 240 ligandable sites, in 235 proteins, in complex with both fragments and drug-like ligands. When only one fragment and one drug-like ligand complex are available, different levels of interaction pattern similarity are observed. Therefore, we hypothesized that the more fragments and drug-like ligands are available for a given target, the higher the chance of mapping the entire pocket and thus of observing high interaction coverage. Indeed, our hypothesis was confirmed, as indicated in Figure 4. As a rule of thumb, the data suggest that at least nine different fragments and nine different drug-like ligands in complex with the same target are necessary to observe good interaction coverage between the two sets. In our PDB data set, only 11 proteins fulfil this rule ("ligand- and fragment-rich targets"; Table 1). They nevertheless belong to diverse functional classes.

Conditions for similar coverage of the pocket interactions by fragment and drug-like ligands

When investigating the pocket properties of the 11 ligand- and fragment-rich targets, we observe large variability in size, polarity and flexibility (Supplementary Table 3). The size of the pocket, for instance, varies between 25 and 60 residues, with an average volume between 327 and 739 Å³. It is interesting to note that the two largest pockets when considering the residue count (BACE1 and LTA4H) have the lowest interaction coverage between ligands and fragments, while the smallest pocket (BRD4) has the highest interaction pattern similarity. For the other ligand- and fragment-rich target properties, however, no correlation with the interaction pattern similarity is apparent. When extending the analysis to all targets with good interaction coverage (≥ 0.6; 81 targets), the pocket properties are even more variable. The count of pocket residues in those targets lies between 20 and 78 residues, the average volume between 150 and 1037 Å³, the volume variation between 155 and 1121 Å³, the average Cα RMSD to the reference structure between 0.1 and 4.3 Å and the average percentage of polar points between 27 and 94 %. Taken together, the results suggest that high coverage of fragment and drug-like ligand interactions can be observed independently of the pocket properties. We then verified that the similarity of the interaction patterns is not overestimated due to low chemical diversity of the ligands. The structural diversity within the fragment and drug-like ligand sets of the ligand- and fragment-rich targets ranges between medium and high (Table 1, Supplementary Table 3). In most of the cases, the diversity within the fragment set is higher than the diversity of the drug-like ligands. Comparing the fragments and the drug-like ligands, interset similarity is relatively low, although there are, for 10 of the 11 targets, a few pairs where the fragment is an exact or a close substructure of the drug-like ligand (Supplementary Figure 6). The exception is LTA4H, where the maximal similarity of fragments and ligands is 0.63 and the interaction pattern similarity is relatively low (0.425). The low similarity is, however, not due to fragments and drug-like ligands interacting mainly with specific residues, but to an unbalanced distribution of fragments and ligands within the binding site (Figure 5). Whereas drug-like ligands (orange) are evenly distributed over the entire LTA4H pocket, all but one fragment (cyan) are found on the left side of the pocket. Correspondingly, the distribution of fragments and drug-like ligands in the BACE1 pocket (Supplementary Figure 7) also explains the rather low interaction pattern similarity observed for this target, stressing that our computing approach can underestimate interaction coverage in cases of poor spatial coverage. Lastly, we asked ourselves whether some polar interactions are specific to fragments or drug-like ligands. We observe that on average 20 % of polar interactions found in fragment complexes are unique and thus not found in drug-like ligand complexes. For example, fragments form two specific H-bonds with LTA4H (Figure 5A). Considering the 11 ligand- and fragment-rich proteins, there are target-dependent differences in the percentage of unique polar interactions (Supplementary Table 4). The values can vary between 0 and 39 % for unique H-bonds and between 0 and 100 % for unique aromatic bonds. It could be assumed that a factor which influences the results is the nature of the fragments and drug-like ligands in the dataset. However, this is not confirmed for the ligand- and fragment-rich targets, where neither the overall fragment-ligand set similarity (Supplementary Table 3, Supplementary Figure 6) nor the count of H-bond donors or acceptors (data not shown) correlates with the observed unique interaction ratio. Nor can the differences be simply explained by the diverse polarities of the corresponding binding sites (Supplementary Table 4).
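The notion of a unique polar interaction used above (present in fragment complexes, absent from every drug-like ligand complex of the same site) amounts to a set difference. The Python sketch below is an illustration under stated assumptions: interactions are represented as (residue, interaction type) pairs pooled over all complexes of one target, and the function name and residue labels are hypothetical rather than taken from the study.

```python
def unique_fraction(fragment_interactions, ligand_interactions) -> float:
    """Fraction of fragment interactions never seen with drug-like ligands.

    Both arguments are iterables of (residue, interaction_type) pairs,
    pooled over all complexes of one binding site.
    """
    fragment_set = set(fragment_interactions)
    if not fragment_set:
        return 0.0  # no fragment interactions, so nothing can be unique
    unique = fragment_set - set(ligand_interactions)
    return len(unique) / len(fragment_set)


# Hypothetical pooled H-bonds for one binding site.
fragment_hbonds = {
    ("RES1", "hbond"), ("RES2", "hbond"), ("RES3", "hbond"),
    ("RES4", "hbond"), ("RES5", "hbond"),
}
ligand_hbonds = {
    ("RES1", "hbond"), ("RES2", "hbond"), ("RES3", "hbond"),
    ("RES4", "hbond"), ("RES6", "hbond"),
}
print(f"unique fragment H-bonds: {unique_fraction(fragment_hbonds, ligand_hbonds):.0%}")
```

Here one of the five fragment H-bonds has no counterpart in the ligand set. Computing the ratio per interaction type, as in this example for H-bonds only, keeps the H-bond and aromatic percentages separate.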

Part III: Can crystallization additives be regarded as fragments?

In contrast to fragments, which are crystallized with the target protein on purpose, other small molecules identified in PDB structures have been unintentionally incorporated into the protein crystals.21 These compounds are buffers, reducing agents, ions, detergents, or cryoprotectants added to the experimental sample for solubilizing and stabilizing the protein. They can also be precipitants and additives of various chemical nature and size added to the experimental sample for promoting crystal formation. A study of additive interactions in crystals grown under different experimental conditions has emphasized the important role of additives in crystal formation, showing that additives directly mediate intermolecular crystal contacts or induce conformational changes at the protein surface.22,23 An interesting case is the additive benzamidine, which enhances the crystallization of trypsin when it is bound to the active site.23 Benzamidine is actually also a substructure of drug-like competitive inhibitors of trypsin, and its binding mode to the enzyme active site is indeed preserved in the drug-like superstructures;13 benzamidine can therefore also be regarded as a fragment. We wondered whether other examples of protein-additive complexes containing useful information for computational drug design can be found in the PDB. Focusing on small additives with a MW below 300 Da, we repeated the analyses made for fragments (Figure 1): we first checked whether additives have a consistent binding mode in a protein site, next we compared the binding modes of additives and drug-like ligands, pair-by-pair for each protein, and last, we considered the coverage of binding site interactions by additives and by drug-like ligands. We distinguished between additives bound to a free protein (apo additives) and those present in a protein in complex with another molecule, e.g. a fragment or drug-like ligand. This distinction is important, since additives can hardly compete with a larger and stronger ligand for targeting protein hot spots. When considering the additives as a whole, we found that additives and fragments behave differently, whereas when focusing on apo additives, we observed common trends (see Supplementary Figures 8, 9 and 10). An example of a target with many fragment- and additive-drug-like ligand substructure pairs is human macrophage metalloelastase (MMP12). As shown in Figure 6, both structures of the additive HAE, one being an apo additive structure, show a perfect conservation of the binding mode to the drug-like ligand superstructure CGS. Apart from hydrophobic interactions, the metal interactions with the zinc ion as well as an H-bond to Ala182 are conserved. Another example of well conserved binding modes is displayed in Figure 6. Bacillus thermoproteolyticus thermolysin is one of the targets with multiple apo additive complexes. In a previous study, the benzene ring of the fragment N-(phenylcarbonyl)-beta-alanine (BYA) has been identified as a hot spot because several ligands place this ring at the given position.8 Interestingly, several apo additives map the directional interactions of BYA (Figure 6) and show good binding mode conservation. In this case, all of the additives are very small (4 or 5 heavy atoms), indicating that a certain molecular complexity is not necessary to observe binding mode conservation. When considering chemical structure, the set of additives is very heterogeneous. There are small inorganic and organic ions (e.g., phosphate and tris buffer), long straight-chain compounds (e.g., fatty acids and polyethylene glycol), small polyols (e.g., 1,2-ethanediol), small aromatic compounds (e.g., benzoic acid), and other polar cyclic or linear compounds of various sizes (e.g., DMSO, pantothenic acid or MES buffer). We thus questioned whether binding mode conservation depends on the chemical nature of additives. When we investigated the variability of additive binding modes within the same protein, we observed that although there are high discrepancies in the classes containing the smallest compounds (small anions, small aromatic compounds, small inorganic compounds and small polyols), more than half of the additives cluster in three or fewer spots of a protein site (Supplementary Figure 11). For example, the ten poses of malonate bound to the E. coli Aminopeptidase N pocket cluster into three groups in which the orientation and interactions are well preserved (Figure 7, left). By contrast, the 18 poses of glycerol bound to the same protein pocket define a continuous shape overlapping the hydrophobic parts of the drug-like ligand actinonin (Figure 7, right).

Exploring the interactions made by additives revealed that the small anions can act as molecular probes to reveal charged regions in the protein pocket and that the long straight-chain compounds can act as molecular probes to reveal hydrophobic regions. More surprisingly, we found that polar additives such as small polyols or DMSO are largely engaged in hydrophobic contacts (Supplementary Figure 12). The above-described example of glycerol bound to the E. coli Aminopeptidase N pocket illustrates well that small polyols tend to bind efficiently to hydrophobic regions of the protein. In the 18 structures available for glycerol bound to E. coli Aminopeptidase N, we identified 86 interactions between the additive and protein, including 56 hydrophobic contacts with ten protein residues and 30 H-bonds with 14 protein residues. Two additional examples of polar additives engaged in an H-bond on one side and in hydrophobic contact on the other side are shown in Figure 8. Glycerol makes H-bonds with the hinge region of the ALK tyrosine kinase receptor, like most inhibitors of the kinase ATP binding site. The carbon atoms of glycerol face hydrophobic side chains in the protein pocket. Another example is 1,2-ethanediol, which is engaged in two H-bonds with a hydrophobic site of the bromodomain-containing protein 2. Of note, the additive was crystallized alone in the protein in the two examples (apo additive).
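Tallies such as the glycerol one above (number of interactions per type, and number of distinct residues involved per type, pooled over several structures) can be obtained with a simple aggregation. The following Python sketch is only an illustration: the record layout (structure identifier, residue, interaction type) and all values are hypothetical and do not reproduce the study's data.

```python
from collections import defaultdict


def summarize(records):
    """Count interactions and distinct residues per interaction type.

    records: iterable of (structure_id, residue, interaction_type) tuples,
    one tuple per detected interaction.
    Returns {interaction_type: (n_interactions, n_distinct_residues)}.
    """
    counts = defaultdict(int)
    residues = defaultdict(set)
    for _structure_id, residue, kind in records:
        counts[kind] += 1
        residues[kind].add(residue)
    return {kind: (counts[kind], len(residues[kind])) for kind in counts}


# Hypothetical interactions detected in three poses of one additive.
records = [
    ("pose1", "RES_A", "hydrophobic"),
    ("pose1", "RES_B", "hbond"),
    ("pose2", "RES_A", "hydrophobic"),
    ("pose2", "RES_C", "hbond"),
    ("pose3", "RES_D", "hydrophobic"),
]
for kind, (n_interactions, n_residues) in summarize(records).items():
    print(f"{kind}: {n_interactions} interactions with {n_residues} residues")
```

Counting interactions and residues separately is what distinguishes a probe that repeatedly contacts a few residues from one that spreads over many.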

CONCLUSIONS

The PDB-wide analysis of fragment binding modes has answered several questions related to FBDD projects (see Table 2). The study revealed that a fragment crystallized multiple times in the same protein cavity tends to occupy the same position and form the same interactions in the binding pocket. In addition, interactions made by fragments are well conserved in the related drug-like ligands, independently of the fragment size. Directional interactions, especially H-bonds, are highly conserved. Examples of variable binding modes drew our attention to protein flexibility and to the ability of some ligands to adopt multiple bound conformations. Growing fragments into structurally similar ligands results in good conservation of polar interactions provided the chemical modification does not induce a change in the protein structure.

When comparing interaction coverage by fragments and by drug-like ligands, we observed that the more fragments and drug-like ligands have been crystallized with a protein, the better the sampling of interactions in its binding site. We estimated that nine different fragments are enough to achieve good interaction coverage for diverse pockets. Besides, fragments tend to have unique polar interactions (not seen in complexes with drug-like ligands) in polar protein pockets. The study of small additives binding to ligandable PDB proteins showed that clusters of apo additives usually reveal interactions made by drug-like ligands. Interestingly, binding of small polyols and small polar compounds (e.g., glycerol or DMSO) involves both H-bonds and hydrophobic contacts. In conclusion, from a purely structural perspective, small additives cannot be regarded as fragments because their binding modes are too variable; nevertheless, they provide key information on the protein binding properties, especially by revealing pharmacophoric anchors in the protein and supporting target ligandability. The main conclusions of the study are summarized in Table 2.

EXPERIMENTAL SECTION

Data set and general procedure

The analysis was performed on publicly available data from the RCSB PDB web site.9 Crystal structures of additive-, fragment- or ligand-protein complexes were filtered as described previously.13 Briefly, structures with a resolution ≤ 3 Å that were deposited between January 2000 and August 2016 and contained at least one protein chain were analyzed. Fragments were defined as small molecules with a molecular weight (MW) below 300 Da and between 2 and 18 heavy atoms, whereas ligands were defined as non-fragments following the scPDB (ref 24) ligand rules as well as the Rule of 5 (ref 25) with up to one exception. Crystallization additives were removed from the list of fragments and ligands and treated separately if they were found to agree with the fragment rules (small additives).
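The size-based definitions above can be summarized in a small classification routine. This Python sketch is a simplification under explicit assumptions: it uses the standard Rule of 5 thresholds (MW ≤ 500 Da, logP ≤ 5, at most 5 H-bond donors, at most 10 H-bond acceptors), treats the heavy-atom bounds as inclusive, omits the scPDB ligand rules (which are not detailed in this section), and does not handle the separate treatment of crystallization additives. The `Molecule` container and the descriptor values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Molecule:
    mw: float          # molecular weight in Da
    heavy_atoms: int   # number of non-hydrogen atoms
    logp: float        # octanol-water partition coefficient
    hbd: int           # H-bond donors
    hba: int           # H-bond acceptors


def is_fragment(m: Molecule) -> bool:
    """MW below 300 Da and 2 to 18 heavy atoms (bounds assumed inclusive)."""
    return m.mw < 300 and 2 <= m.heavy_atoms <= 18


def ro5_violations(m: Molecule) -> int:
    """Number of violated Rule of 5 criteria (standard thresholds)."""
    return sum([m.mw > 500, m.logp > 5, m.hbd > 5, m.hba > 10])


def classify(m: Molecule) -> str:
    if is_fragment(m):
        return "fragment"
    if ro5_violations(m) <= 1:  # up to one exception is tolerated
        return "drug-like ligand"
    return "other"


# Hypothetical descriptor values for three molecules.
print(classify(Molecule(mw=204.5, heavy_atoms=15, logp=1.2, hbd=2, hba=3)))
print(classify(Molecule(mw=398.0, heavy_atoms=28, logp=3.5, hbd=2, hba=6)))
print(classify(Molecule(mw=650.0, heavy_atoms=45, logp=6.2, hbd=3, hba=8)))
```

Testing the fragment rule first mirrors the text, where ligands are defined as non-fragments; in practice the descriptors would come from a cheminformatics toolkit rather than being entered by hand.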

Preparation of PDB files

All PDB files were downloaded from the RCSB PDB web site and prepared in several steps, similarly to the previously described procedure (ref 13). First, protein complexes were protonated using Protoss (ref 26), and their binding pockets were defined as all residues having at least one atom within 6.5 Å of the ligand. Protein residues were renumbered to match the Universal Protein Resource (UniProt) (ref 27) residue numbering. This was achieved using a protein chain sequence alignment with the EMBOSS needle package (ref 28) or, in cases with many gaps and mismatches, a structure-based sequence alignment with MOE (Molecular Operating Environment, 2016.08; Chemical Computing Group ULC, Montreal, Canada).

For each target with drug-like ligands, one or multiple "common" binding sites were defined as follows. A first distinction was made between pockets containing one protein chain ("monomeric" pockets) or multiple protein chains ("dimeric" pockets, etc.). Because the study focuses on the comparison of binding modes within one protein target, hetero-multimeric pockets were not considered in the analysis. Secondly, drug-like ligand binding sites were clustered based on their residue overlap to distinguish between spatially separated sites of a target. This was achieved using hierarchical clustering with average linkage: two site clusters were merged when they showed a residue overlap of at least 5 %. For each binding site cluster, residues occurring in more than 10 % of sites were defined as the "common" site. All PDB files containing mutations or missing residues in the common site were removed from the analysis. The remaining PDB files for each target and common site were superimposed onto a reference structure using CE (ref 29). The reference was chosen as the centroid according to binding site similarity calculated with the Shaper software (ref 30).

Finally, binding modes of all molecules overlapping with the drug-like ligand cavity were investigated in terms of interaction fingerprints (IFPs) (ref 31) calculated using the in-house IChem software (ref 32). Interaction fingerprints represent the presence and absence of specific interaction types with the binding site residues. Default IChem settings were used to detect H-bonding (distance between H-bond donor atom and H-bond acceptor atom lower than 3.5 Å and H-bond angle of 180° ± 60°), ionic interactions (distance between positively and negatively charged atoms lower than 4.0 Å), and interactions between an H-bond acceptor and a metallic cation (distance lower than 2.8 Å). Hydrophobic contacts were detected between carbon atoms if the interatomic distance was lower than 5 Å. Aromatic interactions were detected as follows: in π-π interactions, the distance between π ring centers is lower than 5.0 Å and the angle between π ring planes is 180° ± 30° or 90° ± 60°; in π-cation interactions, the distance between the π ring center and the aromatic cation is lower than 5.0 Å and the angle between planes is 180° ± 30°. Water molecules, which are not built in all the studied PDB structures, were not considered in the analysis.
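The geometric criteria listed above amount to distance and angle tests on atom coordinates. The sketch below is not the IChem implementation; it only restates the quoted thresholds in Python. The function names and dictionary keys are hypothetical, coordinates are assumed to be (x, y, z) tuples in Å, and the H-bond angle is assumed to be the donor-hydrogen-acceptor angle, which the text does not specify.

```python
import math

# Distance cutoffs in Å quoted in the text; the keys are illustrative names.
DISTANCE_CUTOFF = {
    "hbond": 3.5,        # donor atom to acceptor atom
    "ionic": 4.0,        # positively to negatively charged atom
    "metal": 2.8,        # H-bond acceptor to metallic cation
    "hydrophobic": 5.0,  # carbon to carbon
}


def angle_deg(a, b, c):
    """Angle a-b-c in degrees, measured at point b."""
    v1 = [p - q for p, q in zip(a, b)]
    v2 = [p - q for p, q in zip(c, b)]
    dot = sum(p * q for p, q in zip(v1, v2))
    cosine = dot / (math.hypot(*v1) * math.hypot(*v2))
    # Clamp to [-1, 1] to guard against rounding before acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, cosine))))


def within_cutoff(kind, a, b):
    """True if two atoms are closer than the cutoff for this interaction type."""
    return math.dist(a, b) < DISTANCE_CUTOFF[kind]


def is_hbond(donor, hydrogen, acceptor):
    """Distance below 3.5 Å and angle within 180° ± 60°, i.e. at least 120°.

    Assumption: the angle is taken at the hydrogen (donor-H...acceptor).
    """
    return (within_cutoff("hbond", donor, acceptor)
            and angle_deg(donor, hydrogen, acceptor) >= 120.0)
```

Aromatic interactions would additionally require ring centers and ring plane normals, which are left out to keep the sketch short.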

Comparison of binding modes and interaction patterns

To compare the binding mode conservation of the same fragment within a protein pocket, the IFP similarity was calculated using the Tanimoto coefficient (Tc), and the minimal observed similarity for a given molecule and target was investigated. It was also determined whether the fragment binds to the protein pocket with a similar pose and with a similar conformation in different PDB structures. For that purpose, the RMSD of the non-hydrogen atom coordinates of the fragment was computed with a Python 2.7.14 script before (pose) and after (conformation) optimal superposition of the compared fragment poses, using the maximum common substructure alignment from the OEChem library (OpenEye Scientific, USA).

To compare the binding mode of a fragment (or an additive) with the binding mode of a drug-like ligand within the same protein pocket, IFP similarity was calculated as the proportion (Pr) of common interactions among the smaller molecule's interactions, as described by the following formula:

$$\mathrm{Pr} = \frac{o}{o + u_A}$$

where molecule A is smaller than molecule B (i.e., A is a fragment or an additive and B is a drug-like ligand), $o$ is the count of overlapping interactions, and $u_A$ is the count of interactions unique to molecule A.

For the one-to-one comparisons between a fragment and its drug-like superstructure within the same protein pocket (i.e., the substructure pairs), it was also determined whether the fragment alone and the fragment included in a larger ligand bind to the protein pocket with a similar pose. For that purpose, shape overlap and chemistry match were computed using ROCS v3.2.0.4 (OpenEye Scientific, USA). Importantly, only the scoring calculation was performed, since all the structures had been 3D-aligned onto the reference structure of the protein pocket beforehand. The shape overlap (ROCS shape) and chemistry match (ROCS color) were evaluated using the Tversky and ColorTversky scores, respectively.

To compare the interaction patterns of all the fragments and all the ligands bound to the same protein pocket, consensus interaction fingerprints (cIFPs) were generated. These are numerical fingerprints describing the frequency of each interaction within the given set (fragments or drug-like ligands). To describe the similarity between two cIFPs, the Tanimoto coefficient defined for continuous variables (ref 33) was used. In addition, numerical cIFPs were converted into binary fingerprints to study the ratios of unique and shared interactions between two molecule sets. Different thresholds to convert the fingerprints were tested, but the clearest results were generally obtained with a threshold of > 0. Using this threshold, every interaction observed at least once is converted into an on-bit in the binary fingerprint. Importantly, all interaction similarities (as expressed by IFP Tc, IFP Pr, or cIFP similarity) were calculated separately for hydrophobic and polar interactions, and the two values were each weighted by 0.5.
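The similarity measures described above can be illustrated on toy data. The Python sketch below is an assumption-laden illustration, not the authors' script: each interaction is represented as a hypothetical (residue, type) tuple, every interaction type other than "HYD" is treated as polar, and empty fingerprints are given a similarity of 0; none of these choices is stated in the text.

```python
def tanimoto(a, b):
    """Binary Tanimoto coefficient (Tc) between two sets of interactions."""
    union = a | b
    # Returning 0.0 for two empty sets is a choice made for this sketch.
    return len(a & b) / len(union) if union else 0.0


def proportion(small, large):
    """Pr = o / (o + u_A): share of the smaller molecule's interactions
    that are also made by the larger molecule."""
    return len(small & large) / len(small) if small else 0.0


def weighted(metric, a, b):
    """Apply the metric separately to hydrophobic and polar interactions
    and combine the two values with weights of 0.5 each."""
    def split(ifp):
        hydrophobic = {i for i in ifp if i[1] == "HYD"}  # assumed type label
        return hydrophobic, ifp - hydrophobic
    a_hyd, a_pol = split(a)
    b_hyd, b_pol = split(b)
    return 0.5 * metric(a_hyd, b_hyd) + 0.5 * metric(a_pol, b_pol)


def continuous_tanimoto(x, y):
    """Tanimoto coefficient for real-valued vectors such as cIFP frequencies."""
    dot = sum(p * q for p, q in zip(x, y))
    denom = sum(p * p for p in x) + sum(q * q for q in y) - dot
    return dot / denom if denom else 0.0


# Toy fingerprints with made-up residue labels.
fragment = {("R1", "HYD"), ("R2", "HYD"), ("R3", "HB")}
ligand = {("R1", "HYD"), ("R3", "HB"), ("R4", "HB"), ("R5", "HYD")}
print(weighted(proportion, fragment, ligand))
print(weighted(tanimoto, fragment, ligand))
```

In this toy case the fragment shares one of its two hydrophobic contacts and its only polar contact with the ligand, so the weighted Pr is higher than the weighted Tc, which is also penalized by the ligand's additional interactions.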

Molecular properties

To investigate the similarity/diversity within a molecule set as well as between sets, extended-connectivity fingerprints with the radius 2 (ECFP2) were calculated using Pipeline Pilot 2016 (BIOVIA, Dassault Systèmes, France). Furthermore, two molecular properties were tested for their correlation with binding mode similarity: molecular weight and the count of heavy atoms. Both were also calculated with Pipeline Pilot 2016.

Pocket properties

Properties of binding sites were calculated using the in-house IChem software (VolSite, ref 30). For each common binding site, the average descriptors of all pockets of this target were determined. Apart from calculating the pocket volume, the software determines all possible interaction points describing different molecular interactions (e.g., hydrophobic points, hydrogen bonding points, etc.). The pocket volume variation was also determined by subtracting the minimum from the maximum volume.

Box plot

Box plots were generated with matplotlib 2.1.0 in Python 2.7.14. Only Figures S12 and S13 use the display obtained with the default settings. In the other figures, the box gives the first and third quartiles, the median is shown in red, the whiskers indicate the first and ninth deciles, and outliers are not represented.
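The non-default box plot display described above rests on five summary values per data set. The sketch below computes them with the Python standard library; the function name `box_summary` and the dictionary keys are hypothetical, and the interpolation method ("inclusive") is an assumption, since the text does not state how quantiles were interpolated.

```python
import statistics


def box_summary(values):
    """Return the values drawn in the non-default box plots:
    first and ninth deciles (whiskers), quartiles (box), and median."""
    deciles = statistics.quantiles(values, n=10, method="inclusive")
    quartiles = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "whisker_low": deciles[0],    # first decile
        "q1": quartiles[0],           # first quartile
        "median": quartiles[1],
        "q3": quartiles[2],           # third quartile
        "whisker_high": deciles[8],   # ninth decile
    }


summary = box_summary(list(range(101)))
print(summary)
```

In matplotlib, `boxplot` accepts `whis` as a pair of percentiles and `showfliers=False` to hide outliers, which is one way such a display could be drawn; whether the authors used these options is not stated.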

ASSOCIATED CONTENT

A web interface has been designed to query the data presented in this article. It is freely available at bioinfo-pharma.u-strasbg.fr/PDBmob.

Supporting Information. The following files are available free of charge.


Twelve figures and four tables are provided in a PDF file. They describe the number of structures available for fragment-protein complexes (Supp. Fig. 1), the effect of conformational change and structure quality on the binding mode conservation of the same fragment within the same binding pocket (Supp. Fig. 2), the effect of protein flexibility on binding mode conservation (Supp. Fig. 3), the dependence of fragment-ligand binding mode conservation on chemical similarity (Supp. Fig. 4), the fragment size and conservation of binding mode and binding pose (Supp. Fig. 5), the chemical similarity between fragment and ligand sets for the fragment- and ligand-rich targets (Supp. Fig. 6), the coverage of the human beta secretase pocket (Supp. Fig. 7), the binding mode conservation of the same additive bound to multiple structures of the same protein site (Supp. Fig. 8), the binding mode conservation of additive-ligand substructure pairs (Supp. Fig. 9), the dependence of additive/drug-like ligand interaction pattern similarity on the number of different molecules (Supp. Fig. 10), the variability of additive binding mode in multiple structures of the same protein site (Supp. Fig. 11), the nature of the interactions made by additives (Supp. Fig. 12), the 3D-structure quality evaluated using the electron density support for individual atoms (Supp. Table 1), the EDIA score of the complexes described in the figures (Supp. Table 2), the properties of ligand- and fragment-rich targets (Supp. Table 3), and the unique polar interactions of targets with many fragments and ligands (Supp. Table 4).

AUTHOR INFORMATION

Corresponding Author

*Prof Esther KELLENBERGER, [email protected], phone: +33 368854221

Present Addresses


†If an author’s address is different than the one given in the affiliation line, this information may be included here.

Author Contributions

Project coordination: EK & JD; design of protocol: EK & MND; molecule classification: MND, CP & EK; implementation of protocol: MND; data preparation: MND & GB; data analysis: MND, EK & CJ; preparation of manuscript: MND, EK, CJ, CP & JD. All authors have given approval to the final version of the manuscript.

Funding Sources

This work was supported by Eli Lilly and Company through the Lilly Research Award Program (LRAP).

ACKNOWLEDGMENT

The authors would like to thank the LRAP funding program.

ABBREVIATIONS

CAM – camphor; DMSO – dimethyl sulfoxide; FBDD – fragment-based drug design; IBM – 3-isobutyl-1-methylxanthine; IFP – interaction fingerprint; MES – 2-(N-morpholino)ethanesulfonic acid; MW – molecular weight; MYI – 5-methoxy-indole acetate; PDB – Protein Data Bank; PPARγ – human peroxisome proliferator-activated receptor gamma; RMSD – root mean square deviation; TGFBR1 – transforming growth factor β receptor type 1; TRIS – tris(hydroxymethyl)aminomethane. Other protein name abbreviations are given in Table 1.


REFERENCES

(1) Doak, B. C.; Norton, R. S.; Scanlon, M. J. The Ways and Means of Fragment-Based Drug Design. Pharmacol. Ther. 2016, 167, 28–37.

(2) Rees, D. C.; Congreve, M.; Murray, C. W.; Carr, R. Fragment-Based Lead Discovery. Nat. Rev. Drug Discov. 2004, 3 (8), 660–672.

(3) Kalliokoski, T.; Olsson, T. S. G.; Vulpetti, A. Subpocket Analysis Method for Fragment-Based Drug Discovery. J. Chem. Inf. Model. 2013, 53 (1), 131–141.

(4) Carr, R. A. E.; Congreve, M.; Murray, C. W.; Rees, D. C. Fragment-Based Lead Discovery: Leads by Design. Drug Discov. Today 2005, 10 (14), 987–992.

(5) Hajduk, P. J.; Greer, J. A Decade of Fragment-Based Drug Design: Strategic Advances and Lessons Learned. Nat. Rev. Drug Discov. 2007, 6 (3), 211–219.

(6) Price, A. J.; Howard, S.; Cons, B. D. Fragment-Based Drug Discovery and Its Application to Challenging Drug Targets. Essays Biochem. 2017, 61 (5), 475–484.

(7) Kozakov, D.; Hall, D. R.; Jehle, S.; Luo, L.; Ochiana, S. O.; Jones, E. V.; Pollastri, M.; Allen, K. N.; Whitty, A.; Vajda, S. Ligand Deconstruction: Why Some Fragment Binding Positions Are Conserved and Others Are Not. Proc. Natl. Acad. Sci. U. S. A. 2015, 112 (20), E2585–E2594.

(8) Rathi, P. C.; Ludlow, R. F.; Hall, R. J.; Murray, C. W.; Mortenson, P. N.; Verdonk, M. L. Predicting “Hot” and “Warm” Spots for Fragment Binding. J. Med. Chem. 2017, 60 (9), 4036–4046.

(9) Berman, H. M. The Protein Data Bank. Nucleic Acids Res. 2000, 28 (1), 235–242.


(10) Meyder, A.; Nittinger, E.; Lange, G.; Klein, R.; Rarey, M. Estimating Electron Density Support for Individual Atoms and Molecular Fragments in X‑ray Structures. J. Chem. Inf. Model. 2017, 57 (10), 2437–2447.

(11) Lingel, A.; Sendzik, M.; Huang, Y.; Shultz, M. D.; Cantwell, J.; Dillon, M. P.; Fu, X.; Fuller, J.; Gabriel, T.; Gu, J.; Jiang, X.; Li, L.; Liang, F.; McKenna, M.; Qi, W.; Rao, W.; Sheng, X.; Shu, W.; Sutton, J.; Taft, B.; Wang, L.; Zeng, J.; Zhang, H.; Zhang, M.; Zhao, K.; Lindvall, M.; Bussiere, D. E. Structure-Guided Design of EED Binders Allosterically Inhibiting the Epigenetic Polycomb Repressive Complex 2 (PRC2) Methyltransferase. J. Med. Chem. 2017, 60 (1), 415–427.

(12) Czodrowski, P.; Hölzemann, G.; Barnickel, G.; Greiner, H.; Musil, D. Selection of Fragments for Kinase Inhibitor Design: Decoration Is Key. J. Med. Chem. 2015, 58 (1), 457–465.

(13) Drwal, M. N.; Jacquemard, C.; Perez, C.; Desaphy, J.; Kellenberger, E. Do Fragments and Crystallization Additives Bind Similarly to Drug-like Ligands? J. Chem. Inf. Model. 2017, 57 (5), 1197–1209.

(14) Drwal, M. N.; Bret, G.; Kellenberger, E. Multi-Target Fragments Display Versatile Binding Modes. Mol. Inform. 2017, 36 (10), 1700042.

(15) Kontopidis, G.; McInnes, C.; Pandalaneni, S. R.; McNae, I.; Gibson, D.; Mezna, M.; Thomas, M.; Wood, G.; Wang, S.; Walkinshaw, M. D.; Fischer, P. M. Differential Binding of Inhibitors to Active and Inactive CDK2 Provides Insights for Drug Design. Chem. Biol. 2006, 13 (2), 201–211.

(16) Tarcsay, Á.; Nyíri, K.; Keserű, G. M. Impact of Lipophilic Efficiency on Compound Quality. J. Med. Chem. 2012, 55 (3), 1252–1260.


(17) Smith, C. R.; Dougan, D. R.; Komandla, M.; Kanouni, T.; Knight, B.; Lawson, J. D.; Sabat, M.; Taylor, E. R.; Vu, P.; Wyrick, C. Fragment-Based Discovery of a Small Molecule Inhibitor of Bruton’s Tyrosine Kinase. J. Med. Chem. 2015, 58 (14), 5437–5444.

(18) Malhotra, S.; Karanicolas, J. When Does Chemical Elaboration Induce a Ligand To Change Its Binding Mode? J. Med. Chem. 2017, 60 (1), 128–145.

(19) Congreve, M.; Carr, R.; Murray, C.; Jhoti, H. A “Rule of Three” for Fragment-Based Lead Discovery? Drug Discov. Today 2003, 8 (19), 876–877.

(20) Schauperl, M.; Czodrowski, P.; Fuchs, J. E.; Huber, R. G.; Waldner, B. J.; Podewitz, M.; Kramer, C.; Liedl, K. R. Binding Pose Flip Explained via Enthalpic and Entropic Contributions. J. Chem. Inf. Model. 2017, 57 (2), 345–354.

(21) Kirkwood, J.; Hargreaves, D.; O’Keefe, S.; Wilson, J. Analysis of Crystallization Data in the Protein Data Bank. Acta Crystallogr. Sect. F Struct. Biol. Commun. 2015, 71 (10), 1228–1234.

(22) McPherson, A.; Cudney, B. Searching for Silver Bullets: An Alternative Strategy for Crystallizing Macromolecules. J. Struct. Biol. 2006, 156 (3), 387–406.

(23) Larson, S. B.; Day, J. S.; Cudney, R.; McPherson, A. A Novel Strategy for the Crystallization of Proteins: X-Ray Diffraction Validation. Acta Crystallogr. D Biol. Crystallogr. 2007, 63 (3), 310–318.

(24) Desaphy, J.; Bret, G.; Rognan, D.; Kellenberger, E. Sc-PDB: A 3D-Database of Ligandable Binding Sites--10 Years On. Nucleic Acids Res. 2015, 43 (Database issue), D399–D404.


(25) Lipinski, C. A.; Lombardo, F.; Dominy, B. W.; Feeney, P. J. Experimental and Computational Approaches to Estimate Solubility and Permeability in Drug Discovery and Development Settings. Adv. Drug Deliv. Rev. 2001, 46 (1-3), 3–26.

(26) Bietz, S.; Urbaczek, S.; Schulz, B.; Rarey, M. Protoss: A Holistic Approach to Predict Tautomers and Protonation States in Protein-Ligand Complexes. J. Cheminformatics 2014, 6, 12.

(27) The UniProt Consortium. UniProt: The Universal Protein Knowledgebase. Nucleic Acids Res. 2017, 45 (D1), D158–D169.

(28) Rice, P.; Longden, I.; Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite. Trends Genet. 2000, 16 (6), 276–277.

(29) Shindyalov, I. N.; Bourne, P. E. Protein Structure Alignment by Incremental Combinatorial Extension (CE) of the Optimal Path. Protein Eng. 1998, 11 (9), 739–747.

(30) Desaphy, J.; Azdimousa, K.; Kellenberger, E.; Rognan, D. Comparison and Druggability Prediction of Protein–Ligand Binding Sites from Pharmacophore-Annotated Cavity Shapes. J. Chem. Inf. Model. 2012, 52 (8), 2287–2299.

(31) Marcou, G.; Rognan, D. Optimizing Fragment and Scaffold Docking by Use of Molecular Interaction Fingerprints. J. Chem. Inf. Model. 2007, 47 (1), 195–207.

(32) Desaphy, J.; Raimbaud, E.; Ducrot, P.; Rognan, D. Encoding Protein-Ligand Interaction Patterns in Fingerprints and Graphs. J. Chem. Inf. Model. 2013, 53 (3), 623–637.

(33) Bajusz, D.; Rácz, A.; Héberger, K. Why Is Tanimoto Index an Appropriate Choice for Fingerprint-Based Similarity Calculations? J. Cheminformatics 2015, 7 (1), 20.


Table 1. Ligand- and fragment-rich targets

| Protein name | UniProt AC | Protein class | Ligand count | Fragment count | Ligand/fragment median similarity | Interaction pattern similarity |
|---|---|---|---|---|---|---|
| Human beta secretase 1 (BACE1) | P56817 | Hydrolase (aspartyl protease) | 139 | 17 | 0.41 | 0.665 |
| Human cyclin-dependent kinase 2 (CDK2) | P24941 | Transferase (protein kinase) | 123 | 42 | 0.36 | 0.895 |
| Human heat-shock protein 90α (HSP90) | P07900 | Chaperone | 69 | 44 | 0.40 | 0.785 |
| Human carbonic anhydrase 2 (CAH2) | P00918 | Lyase | 50 | 118 | 0.40 | 0.840 |
| Human protein kinase pim1 (PIM1) | P11309 | Transferase (protein kinase) | 36 | 17 | 0.38 | 0.800 |
| Human estrogen receptor 1 (ESR1) | P03372 | Activator/Receptor (nuclear receptor) | 18 | 11 | 0.50 | 0.765 |
| Human phosphodiesterase 10A (PDE10A) | Q9Y233 | Hydrolase (phosphodiesterase) | 14 | 22 | 0.38 | 0.725 |
| Human tankyrase-2 (TNKS2) | Q9H2K2 | Transferase (glycosyltransferase) | 13 | 26 | 0.47 | 0.860 |
| Human bromodomain-containing protein 4 (BRD4) | O60885 | Chromatin regulator | 13 | 13 | 0.35 | 0.905 |
| Human leukotriene A-4 hydrolase (LTA4H) | P09960 | Hydrolase (metalloprotease) | 11 | 14 | 0.42 | 0.425 |
| Rabbit muscle glycogen phosphorylase (PYGM) | P00489 | Transferase (glycosyltransferase) | 10 | 10 | 0.53 | 0.810 |

The colors indicate the diversity within the fragment or ligand sets. Green: high diversity (average(1 - similarity) > 0.75), light green: medium diversity (average(1 - similarity) > 0.6). Tanimoto similarity between molecules is calculated using ECFP2 fingerprints.


Table 2. Main conclusions for FBDD

| Question | Answer | Cases | Comments |
|---|---|---|---|
| Does a fragment always bind a protein pocket in a similar way? | Yes | 74% (IFP sim. ≥ 0.6) | Exceptions: protein flexibility, multiple conformations of fragment, multiple tautomeric states, multiple molecules within the binding pocket. |
| Is the binding mode between a fragment and its drug-like superstructure conserved? | Yes | 75% (IFP sim. ≥ 0.8) | Exceptions: protein flexibility. Interactions are better conserved than pose. |
| Is the binding mode between a fragment and a similar drug-like ligand conserved? | Yes | 62.5% (ECFP2 sim. ≥ 0.7, IFP sim. ≥ 0.8) | Exceptions: protein flexibility. Polar interactions are better conserved. |
| Is there an influence of fragment size on binding mode conservation? | Yes | Inferred from 623 similar fragment-ligand pairs | Binding mode is generally conserved if MW > 150 Da. Same trend is observed for binding pose. |
| Is there a minimal dataset size to observe interaction pattern similarity between fragment and ligand complexes? | Yes | Inferred from 235 proteins binding 1079 fragments and 1832 drug-like ligands | > 9 different fragments and ligands should be crystallized in the protein pocket (fulfilled by 11 proteins only). |
| Can crystallization additives be regarded as fragments in drug design? | No | Inferred from 309 proteins binding 1079 fragments and 1832 drug-like ligands | Additive binding modes are more variable than those of fragments, and additives are rarely bound alone in the pocket. Apo additives however reveal interactions made by drug-like ligands. |


Table of Contents Graphic


Figure 1. Overview of the study. A. Selection of 3D structures in the PDB. B. Numerical representation of the binding mode used in this study. As an illustration, the fragment X76 bound to the ATP-binding pocket of human CDK2 in the PDB structure 3R1Y is depicted. C. Similarity between the binding modes of two small molecules bound to the same pocket. As an illustration, the fragment X76 and the drug-like ligand Z04 bound to the ATP-binding pocket of human CDK2 in the PDB structures 3R1Y and 3R7Y are displayed (IFP sim. = 1). D. Similarity between the interaction coverage by all fragments and by all drug-like ligands bound to the same pocket. As an illustration, all the ligands of the human CDK2 ATP-binding pocket are displayed (cIFP sim. = 0.895).


Figure 2. Binding mode conservation of the same fragment bound to multiple structures of the same protein site. (A) Histogram showing the frequency of the observed minimum binding mode similarity values. (B) 3-Isobutyl-1-methylxanthine (IBM) bound to human phosphodiesterase 9A2 in two biounits of the PDB structure 2HD1 (biounit 1 in magenta and biounit 2 in yellow). (C) Camphor (CAM) bound to Pseudomonas putida camphor 5-monooxygenase in the PDB structures 1DZ4 (green) and 4JX1 (orange). In (B) and (C), the protein backbone is displayed as cartoon, binding site residues as lines, and fragments as sticks. The fragment chemical structure is inserted in the two images.


Figure 3. Analysis of 359 fragment-ligand substructure pairs. (A) Overall binding mode similarity considering all interactions (left) and polar interactions (right). The median for each substructure pair is shown. (B) Effect of fragment binding mode variability on the binding mode similarity of fragment/ligand substructure pairs. Fragment CK2 (PDB structures 1PXJ and 2C5O) and its drug-like superstructure CK8 (PDB structure 2C5N) bound to human cyclin-dependent kinase 2. The protein in its active form is displayed as ribbon, the fragment (orange) and ligand (green) as sticks, and hydrogen bonds in the hinge region of the protein in yellow. For the sake of comparison, the inactive form of the protein is shown as a light orange ribbon on the right panel. The proportion of conserved interactions is high (0.89) when considering fragment and ligand binding to the active form of the protein (left) and decreases (0.57) when considering the fragment bound to the inactive form and the ligand bound to the active one (right). (C) Overall pose similarity considering shape overlap (ROCS shape, left) and chemistry match (ROCS color, right). The median for each substructure pair is shown.


Figure 4. Similarity between fragment and drug-like ligand interaction patterns. Dependence of fragment-ligand interaction pattern similarity on the number of different molecules considered. In each step, all targets with at least X (1, 2, 3, etc.) HET codes of both fragments and drug-like ligands are considered. The number of distinct binding sites is shown in red.


Figure 5. Coverage of the human leukotriene A-4 hydrolase pocket. (A) Drug-like ligand (left) and fragment (right) interaction heatmaps. Different types of interactions (hydrophobic: HYD, aromatic: AROM, hydrogen bonding: HB, ionic: IONIC, and metal: METAL) are displayed on the x-axis. The binding site residues (one-letter code, residue number, chain) are displayed on the y-axis. The color intensity describes the frequency of the observed interaction in all complexes for this set (e.g., fragments). Residue number 1 is zinc. (B) Overlay of all fragments and drug-like ligands in the reference 3D structure. Drug-like ligands (green) and fragments (magenta) are shown as sticks, the binding site as surface, and the residues of the binding site as lines.


Figure 6. Examples of additive binding mode conservation. (Left panel) Example of the binding mode conservation of an additive/drug-like ligand substructure pair: the drug-like ligand CGS (green, PDB: 1JIZ) and its additive substructure HAE bound alone (white, PDB: 1OS2) and in the presence of the ligand EEG (light pink, PDB: 3LIK) in the human macrophage metalloelastase (MMP12) pocket. The protein is shown as green cartoon and important protein residues as sticks. Polar interactions are indicated as dotted lines. The zinc ion is shown as a small sphere. (Right panel) Example of apo additives matching the polar interactions of a fragment. Bacillus thermoproteolyticus thermolysin bound to the fragment BYA (green, PDB: 3FGD) and three examples of apo additives: 2-bromoacetate (cyan, PDB: 3NN7), S-1,2-propanediol (blue, PDB: 3N21), and acetic acid (white, PDB: 2A7G). The small molecules are shown as sticks, the protein as cartoon, the interacting protein residues as lines, and the zinc ion as a small sphere. Polar interactions are indicated as dotted lines: hydrogen bonds in green and metal interactions in yellow. All electron densities have been deposited and correspond with the shown structures.


Figure 7. Examples of multiple additive poses in a protein site. (Left panel) Malonate (MLI, pink) bound to the E. coli aminopeptidase N pocket (UniProt AC: P04825). The protein is shown as green cartoon and the zinc ion as a sphere. (Right panel) Glycerol (GOL, pink) and actinonin (BB2, green, PDB ID: 4Q4E) bound to the E. coli aminopeptidase N pocket. The protein is shown as green cartoon and the hydrophobic protein residues of the site as lines.


Figure 8. Examples of additive interactions. (Left panel) Glycerol (GOL, pink) and a drug-like inhibitor (OFG, green, PDB ID: 4CCB) bound to the hinge region of the ALK tyrosine kinase receptor ATP-binding site. (Right panel) 1,2-Ethanediol (EDO, pink) and the inhibitor GSK525762 (EAM, green, PDB ID: 2YEK) bound to bromodomain-containing protein 2. The protein is shown as green cartoon and the hydrophobic protein residues of the site as lines. Hydrogen bonds are indicated as yellow dotted lines.
