Identification and Visualization of Kinase-Specific Subpockets


Article

Identification and Visualization of Kinase-Specific Subpockets Andrea Volkamer, Sameh Eid, Samo Turk, Friedrich Rippmann, and Simone Fulle J. Chem. Inf. Model., Just Accepted Manuscript • DOI: 10.1021/acs.jcim.5b00627 • Publication Date (Web): 06 Jan 2016 Downloaded from http://pubs.acs.org on January 10, 2016


Journal of Chemical Information and Modeling is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


Journal of Chemical Information and Modeling

Identification and Visualization of Kinase-Specific Subpockets

Andrea Volkamer1*, Sameh Eid1, Samo Turk1, Friedrich Rippmann2, Simone Fulle1*

1 BioMed X Innovation Center, Im Neuenheimer Feld 515, 69120 Heidelberg, Germany.

2 Global Computational Chemistry, Merck KGaA, Frankfurter Str. 250, 64293 Darmstadt, Germany.

Keywords: Kinases, selectivity, subpockets, flexibility, rational hit optimization

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling Volkamer et al. – Identification of kinase-specific subpockets


Abstract

The identification and design of selective compounds is important for reducing unwanted side effects as well as for developing tool compounds for target validation studies. This is particularly true for therapeutically important protein families that possess conserved folds and have numerous members, such as kinases. To support the design of selective kinase inhibitors, we developed a novel approach that identifies specificity-determining subpockets between closely related kinases solely based on their three-dimensional structures. To account for the intrinsic flexibility of the proteins, multiple X-ray structures of the target protein of interest as well as of unwanted off-target(s) are taken into account. The binding pockets of these protein structures are calculated and fused to a combined target and a combined off-target pocket, respectively. Subsequently, shape differences between these two combined pockets are identified via fusion rules. The approach provides a user-friendly visualization of target-specific areas in a binding pocket which should be explored when designing selective compounds. Furthermore, the approach can be easily combined with in silico alanine mutation studies to identify selectivity-determining residues. The potential impact of the approach is demonstrated in four retrospective experiments on closely related kinases, i.e. p38α vs. Erk2, PAK1 vs. PAK4, ITK vs. AurA, and BRAF vs. VEGFR2. Overall, the presented approach does not require any profiling data for training purposes, provides an intuitive visualization of a large number of protein structures at once, and could also be applied to other target classes.


Introduction

Kinases are established drug targets for cancer and inflammatory diseases.1 Given that they are among the most frequently mutated proteins in tumors,2–4 it is not surprising that they are estimated to account for 30% of all drug discovery projects.5 These efforts have so far resulted in 26 FDA-approved small-molecule kinase inhibitors.6 However, the human genome codes for 560 kinases, and a large number of these kinases, including potential cancer targets, is still unexplored.7,8 To date, over 3600 protein structures of human kinases are available in the PDB,9 and many more crystal structures of kinases exist in-house at pharmaceutical companies.

The three-dimensional structure of kinases, especially the ATP-binding site, is highly conserved, which hinders the identification of selective compounds. Profiling studies10–15 verified that several compounds, including approved drugs, are relatively promiscuous (e.g., sunitinib inhibits more than 50% of the tested kinase panel10). The interaction with multiple targets can be synergistic, either by increasing the effect on a particular pathway or by affecting alternative pathways in the case of a resistance mechanism.16 Imatinib, for example, was initially developed to selectively inhibit platelet-derived growth factor receptors (PDGFR) alpha and beta, but was later found to also bind the oncogenic kinases c-Kit and BCR-Abl.16,17 However, unintended drug-target interactions can cause toxic side effects18,19 and are often a driving cause for the termination of drug development programs.20 Designing selective compounds is therefore of high practical value, not only in drug discovery efforts but also for target validation studies.21,22

Several computational approaches already deal with the prediction of selectivity-determining features. The most prominent ones include the physicochemical analysis of binding sites,23–27 docking calculations,28 matched molecular pair analysis,29–31 QSAR,32 proteochemometric approaches,33,34 analysis of protein-ligand interactions,35,36 as well as rule-based methods.22 Most of the initial binding site comparison studies focused on the identification of similarities in order to functionally classify proteins.37–41 Nevertheless, several approaches to identify selectivity-determining features in binding sites also exist.27,33,35,42,43 To highlight physicochemical hot spots within the binding site, many grid-based approaches use molecular interaction fields (MIFs)23,24 or knowledge-based potentials.25,26 BioGPS,27,44 for instance, uses MIFs to identify and compare hydrophilic as well as hydrophobic areas in the binding sites of interest. While many previous GRID-based methods23,39,40 could only calculate hot spots for one structure at a time, BioGPS introduces so-called pharmacophore interaction points for structural ensembles. The method is very helpful for identifying hot spots in protein ensembles; nevertheless, the direct comparison between two sets of protein ensembles still has to be done manually.

Due to the increased coverage of profiling data as well as crystal structures, especially in the field of kinases,10–15 proteochemometric (PCM) approaches, which combine ligand and receptor information, became feasible for selectivity predictions.33,34 For example, Subramanian et al.33 described binding sites by MIFs and compounds by molecular fingerprints, amongst other descriptors, to set up a PCM model based on kinase profiling data. Involving manual curation, the model allowed extracting structural elements of ligands and protein binding sites relevant for either affinity or selectivity. A potential downside is that crystal structures as well as compounds and activity data for a relatively large number of proteins have to be known beforehand; such data are not available for many orphan targets and might not be accessible to all research groups.

Here, we describe a novel procedure to highlight specific subpockets between two sets of protein structures. The approach is solely based on automatically derived grid representations of the protein binding sites of interest. The only requirement is the corresponding 3D information of the structures and, optionally, the binding mode and activity value of one lead compound. The method allows multiple conformations of the protein(s) under investigation to be considered and thereby implicitly takes the flexibility of the receptor structure into account. The applications are manifold: a key target can be compared against one off-target as well as against several off-targets. The off-targets can be project-specific kinases or other proteins such as common toxicity-related targets.45

In this study, we use DoGSiteScorer46 to calculate grid representations of potential binding pockets. Our novel post-processing procedure first fuses all pocket grids of the key target structures to a ‘combined target pocket’ and all pocket grids of the off-target structures to a ‘combined off-target pocket’, such that the frequency of each grid point in the respective pocket set is retained. Subsequently, the two combined pockets are fused to one ‘difference pocket’ such that three distinguishable areas are revealed: the common core, the target-specific, and the off-target specific areas. Besides the target and off-target specific areas, the common core also contains insightful information regarding selectivity-determining areas, i.e., which grid points appear more frequently in one or the other set. Additionally, we briefly describe a PyMOL-based visualization of the combined and difference grids, which supports the visual inspection of the differences between the proteins of interest. Functionalities include the options to highlight specific subpockets with variable granularity and in different colors as well as to cluster points of the specific areas. Overall, the presented approach can be used as an idea generator by highlighting which areas of the pocket should be further explored and which areas can be avoided when designing selective compounds.

Finally, the benefit of this approach is demonstrated via four kinase examples, namely the selective binding of kinase inhibitors I) to two closely related mitogen-activated protein kinases (MAPKs), i.e. p38 and Erk2,44,47 II) to p21-activated kinases (PAKs)48–51 from the subgroups I (PAK1) and II (PAK4), III) to the IL-2 inducible T-cell kinase (ITK) over Aurora kinase A (AurA),49 and IV) to the rapidly accelerated fibrosarcoma kinase B (BRAF) over vascular endothelial growth factor receptor 2 (VEGFR2), also known as kinase insert domain receptor (KDR).53


Materials and Methods

Preparation of PDB structures. For each test case, all available structures were downloaded from the PDB and processed as described in ref. 8. Briefly, MOE54 was used to add hydrogen atoms and missing residues, and an internal Python script was used to remove non-kinase domains, alternative side chain conformations, waters, ions, and buffers. Additionally, modified amino acids were changed to the corresponding standard amino acids. Subsequently, PyMOL’s55 pairwise alignment function was used to iteratively align all structures to the reference kinase structure 1atp (chain A). To ensure correct binding site superposition, the alignment was refined using only binding site residues, defined as all residues within 7 Å of the ATP molecule in the reference structure 1atp. Finally, in each experiment, the structures were assigned to either the target or the off-target set.
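The binding-site definition used to refine the alignment can be illustrated with a simple distance selection. The following is a minimal sketch and not the authors' script: the function name, the tuple layout of the atoms, and the toy coordinates are hypothetical, and "within 7 Å" is assumed here to mean that any atom of a residue lies within 7 Å of any ligand atom.

```python
import math

def residues_near_ligand(protein_atoms, ligand_coords, cutoff=7.0):
    """Return sorted residue identifiers with any atom within `cutoff` of the ligand.

    protein_atoms: iterable of (residue_id, (x, y, z)) tuples (hypothetical layout).
    ligand_coords: list of (x, y, z) tuples for the reference ligand atoms.
    """
    selected = set()
    for residue_id, coord in protein_atoms:
        if residue_id in selected:
            continue  # residue already selected through another atom
        if any(math.dist(coord, lig) <= cutoff for lig in ligand_coords):
            selected.add(residue_id)
    return sorted(selected)

# Toy coordinates in Å, for illustration only (not taken from 1atp)
protein_atoms = [
    (("A", 50), (1.0, 0.0, 0.0)),
    (("A", 50), (2.5, 0.0, 0.0)),
    (("A", 120), (6.5, 0.0, 0.0)),
    (("A", 200), (15.0, 0.0, 0.0)),
]
ligand_coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
binding_site = residues_near_ligand(protein_atoms, ligand_coords)
```

In this toy input, residues 50 and 120 are selected, while residue 200 lies 13.5 Å from the nearest ligand atom and is left out.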

General processing of binding pockets. The complete difference pocket calculation is based on protein pockets encoded as grids. First, pockets are calculated for each structure. Subsequently, the pockets of each set, i.e. target and off-target, are merged to a combined target and a combined off-target pocket, respectively. Finally, the two combined pockets are fused to one difference pocket (Figure 1).

Pocket prediction: The ATP binding pocket is automatically detected using DoGSiteScorer,46,56 a grid-based pocket detection method, with a grid spacing of 0.4 Å (Figure 1.b). The reference ligand of PDB structure 1atp is employed to automatically select the ATP-binding pocket out of the multiple pockets usually found per structure. To ensure a perfect superposition of the grid points for later pocket fusion, pockets are calculated within a fixed grid box, spanned by the maximum dimensions of all structures in the two sets. In addition to the information on which points belong to a pocket, DoGSiteScorer returns a buriedness value for each grid point. Buriedness57 is calculated by scanning each grid point with seven rays and by counting the number of protein-solvent-protein events within 10 Å of this point, normalized to a value between 0 and 0.7.

Combined pocket calculation: Next, the pocket grids belonging to the target and off-target sets (Figure 1.c) are each fused to a combined pocket (called combined target pocket and combined off-target pocket). For every grid point in the combined pocket, the frequency, normalized to a value between 0 and 1, and the mean buriedness value in the respective set of pockets are calculated. Thus, the two combined pockets describe the available binding site volume of the target and off-target ensembles, respectively.
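The combined pocket calculation can be sketched as follows. This is an illustrative reading of the description above, not the authors' implementation: the dictionary layout keyed by grid index and the function name are hypothetical, and the mean buriedness is assumed to be averaged over those pockets that actually contain the grid point, which the text does not specify.

```python
from collections import defaultdict

def combine_pockets(pockets):
    """Fuse per-structure pocket grids into one combined pocket.

    pockets: list of dicts, one per structure, mapping a grid index (i, j, k)
             on the shared fixed grid to the buriedness of that pocket point.
    Returns a dict mapping each grid index to (frequency, mean_buriedness).
    """
    n_structures = len(pockets)
    counts = defaultdict(int)
    buriedness_sums = defaultdict(float)
    for pocket in pockets:
        for index, buriedness in pocket.items():
            counts[index] += 1
            buriedness_sums[index] += buriedness
    combined = {}
    for index, count in counts.items():
        frequency = count / n_structures                   # fraction of structures with this point
        mean_buriedness = buriedness_sums[index] / count   # assumption: mean over containing pockets
        combined[index] = (frequency, mean_buriedness)
    return combined

# Three hypothetical target pockets on the same fixed grid
target_pockets = [
    {(0, 0, 0): 0.6, (0, 0, 1): 0.4},
    {(0, 0, 0): 0.5},
    {(0, 0, 0): 0.7, (0, 0, 1): 0.2, (1, 0, 0): 0.1},
]
combined_target = combine_pockets(target_pockets)
```

Because all pockets are computed in the same fixed grid box, integer grid indices are enough to match points across structures; no coordinate tolerance is needed in this sketch.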

Figure 1: Procedure to calculate combined and difference pockets. a) Two sets of starting structures: Target (green) and off-target (red) structures are separated and b) pockets are calculated for each structure. c) A combined pocket is derived for each set containing information about the frequency of each grid point as indicated by the blob sizes. d) Calculation of the difference pocket, which encodes similarities (grey points) and differences between the two combined pockets (target-specific: green; off-target specific: red). The size of the blobs represents the frequency of the respective grid points.

Difference pocket calculation: Finally, the two combined pocket grids are fused to one difference pocket (Figure 1.d) in which each grid point is assigned an appropriate frequency and buriedness value (Eq. 1). First, if the grid point belongs only to the off-target set, the frequency value is multiplied by -1, shifting the frequency value range to [-1, 0[. Second, for grid points present in both sets and, thus, belonging to the common core, the frequency values are calculated as the difference of the frequencies of the ‘combined target pocket’ and ‘combined off-target pocket’ and normalized to the range [0, 1]. Values above 0.5 describe common areas that are more frequently present in the target structures, while values below 0.5 point to common areas that are more pronounced in the off-target structures. Third, if a grid point belongs only to the target set, the point’s frequency value is shifted by +1 to the range ]1, 2]. The calculation is outlined in Eq. 1, where freq(p) is the frequency of grid point p, freq(off-tar, p) is the frequency of point p in the combined off-target pocket, and freq(tar, p) is the frequency of point p in the combined target pocket. The resulting frequency values range from -1 to 2.

\[
\mathrm{freq}(p) =
\begin{cases}
\mathrm{freq}(\text{off-tar}, p) \cdot (-1) & : p \text{ belongs only to the combined off-target pocket} \\[4pt]
\dfrac{\mathrm{freq}(\text{tar}, p) - \mathrm{freq}(\text{off-tar}, p)}{2} + 0.5 & : p \text{ belongs to both pockets} \\[4pt]
\mathrm{freq}(\text{tar}, p) + 1 & : p \text{ belongs only to the combined target pocket}
\end{cases}
\qquad \text{(Eq. 1)}
\]
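The three cases of Eq. 1 translate directly into a small function. The sketch below is for illustration only: the function names and the dictionary layout (grid index mapped to frequency) are hypothetical, and the buriedness value of the difference pocket is left out because the text does not state how it is combined.

```python
def difference_frequency(freq_target, freq_off_target):
    """Frequency of one grid point in the difference pocket (Eq. 1).

    Each argument is the frequency of the point in the respective combined
    pocket, or None if the point is absent from that pocket.
    """
    if freq_target is None and freq_off_target is None:
        raise ValueError("grid point belongs to neither combined pocket")
    if freq_target is None:
        return -freq_off_target                 # off-target specific: [-1, 0[
    if freq_off_target is None:
        return freq_target + 1.0                # target specific: ]1, 2]
    return (freq_target - freq_off_target) / 2.0 + 0.5   # common core

def difference_pocket(combined_target, combined_off_target):
    """Map every grid index present in either combined pocket to its Eq. 1 value."""
    indices = set(combined_target) | set(combined_off_target)
    return {
        index: difference_frequency(combined_target.get(index),
                                    combined_off_target.get(index))
        for index in indices
    }

# Hypothetical frequencies for three grid points
example = difference_pocket(
    {(0, 0, 0): 1.0, (0, 0, 1): 0.5},    # combined target pocket
    {(0, 0, 0): 0.5, (1, 0, 0): 0.25},   # combined off-target pocket
)
```

In this toy case the shared point (0, 0, 0) receives 0.75 (more frequent in the target set), the target-only point 1.5, and the off-target-only point -0.25, so the three areas remain separable by value range alone.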

Visualization of combined and difference pockets

The information on the three areas ‘target-specific’, ‘off-target-specific’, and ‘common core’ is visualized via PyMOL.55 For this, each pocket grid point is stored as a pseudo atom in a special PDB file in which the position is described by the coordinates, the frequency is stored in the b-factor column, and the buriedness value is retained in the occupancy column. This format allows highlighting different information depending on the question at hand.

Color coding: To visualize the static as well as the more flexible parts of the pocket, the combined pocket can be colored based on the frequency of the grid points in the pocket ensemble using a yellow-white-blue spectrum. Yellow describes rare parts, while blue describes the more frequent parts of the combined pocket (Figure 2.a). Similarly, the frequencies of the individual parts of the difference pocket are visualized using a red-white-green spectrum: off-target specific (dark red, [-1, 0[), common core (red-white-green spectrum, ]0, 1[), and target-specific (dark green, ]1, 2]) (Figure 2.b). Overall, the use of the frequency information allows highlighting those parts of the binding pocket that are only present in a specific percentage of the structures. The gradients are visualized using PyMOL’s spectrum command applied to the b-factor values.

Buried subpockets: The pocket border definition, i.e. the delineation from the solvent, is relatively ambiguous.58 To reduce the noise from the automatic pocket detection, very solvent-exposed parts can be hidden from the visualization by applying a buriedness cut-off, ranging from 0 to 0.7.
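The pseudo-atom file described above can be sketched with the fixed-column HETATM record of the PDB format. This is a minimal illustration under stated assumptions: the atom name C, the residue name GRD, and chain A are arbitrary placeholders rather than the authors' choices, and the buriedness cut-off is applied here while writing, whereas it could equally be applied as a selection in PyMOL.

```python
def pocket_to_pdb_lines(points, min_buriedness=0.0):
    """Format grid points as PDB HETATM pseudo-atom records.

    points: iterable of (x, y, z, frequency, buriedness) tuples.
    Frequency goes to the b-factor column (61-66), buriedness to the
    occupancy column (55-60). Points below min_buriedness are skipped.
    """
    lines = []
    serial = 0
    for x, y, z, frequency, buriedness in points:
        if buriedness < min_buriedness:
            continue  # hide very solvent-exposed grid points
        serial += 1
        lines.append(
            f"HETATM{serial:5d}  C   GRD A{serial % 10000:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{buriedness:6.2f}{frequency:6.2f}"
            f"           C"
        )
    lines.append("END")
    return lines

# Hypothetical difference-pocket points: (x, y, z, frequency, buriedness)
grid_points = [
    (1.2, -3.4, 5.6, 1.5, 0.55),    # target-specific
    (2.0, 0.0, 0.4, -0.25, 0.10),   # off-target specific, solvent exposed
    (0.0, 0.4, 0.8, 0.75, 0.35),    # common core
]
pdb_lines = pocket_to_pdb_lines(grid_points, min_buriedness=0.3)
```

Once such a file is loaded, the gradient could be applied with a command along the lines of `spectrum b, red_white_green, minimum=-1, maximum=2`; the palette name and limits are assumptions, since the text only states that spectrum is applied to the b-factor values. Note that the five-digit serial field limits this simple writer to 99999 grid points per file.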


Ligand focus zone: To concentrate on potential subpockets around a specific compound of interest, the analysis of the examples in this work is restricted to pocket points within 5 Å of the respective co-crystallized compound.

Clustering: Furthermore, the grid points of the specific areas can be clustered using the k-means clustering function in Python’s scikit-learn library.59 To ensure that the resulting clusters are neither too large nor too small for the present examples, the number of clusters k (starting from k=1) is dynamically increased until the points in each cluster span a diameter of at most 4.0 Å, while final clusters with a volume lower than 10 ų are discarded.
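The stopping rule of the clustering step can be sketched as shown below. To keep the example dependency-free and deterministic, a plain Lloyd iteration with farthest-point seeding stands in for scikit-learn's KMeans, so cluster assignments may differ from those of the original workflow. The function names are hypothetical, and the cluster volume is assumed to be the number of grid points times the cube of the 0.4 Å grid spacing, which the text does not state explicitly.

```python
import math

def farthest_point_seeds(points, k):
    """Deterministic seeding: start at points[0], then add the farthest point."""
    seeds = [points[0]]
    while len(seeds) < k:
        seeds.append(max(points, key=lambda p: min(math.dist(p, s) for s in seeds)))
    return seeds

def kmeans(points, k, max_iterations=100):
    """Plain Lloyd iteration (stand-in for scikit-learn's KMeans)."""
    centers = farthest_point_seeds(points, k)
    clusters = []
    for _ in range(max_iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
            clusters[nearest].append(p)
        new_centers = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else centers[i]
            for i, members in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # assignments are stable
        centers = new_centers
    return [members for members in clusters if members]

def diameter(members):
    """Largest pairwise distance within one cluster."""
    return max(math.dist(a, b) for a in members for b in members)

def cluster_specific_points(points, max_diameter=4.0, min_volume=10.0, spacing=0.4):
    """Increase k from 1 until every cluster spans at most max_diameter,
    then discard clusters whose assumed grid volume is below min_volume."""
    if not points:
        return []
    clusters = [list(points)]
    for k in range(1, len(points) + 1):
        clusters = kmeans(points, k)
        if all(diameter(c) <= max_diameter for c in clusters):
            break
    cell_volume = spacing ** 3  # assumed volume represented by one grid point
    return [c for c in clusters if len(c) * cell_volume >= min_volume]

def block(origin, n, spacing=0.4):
    """Cubic block of n x n x n grid points (toy data)."""
    ox, oy, oz = origin
    return [(ox + spacing * i, oy + spacing * j, oz + spacing * k)
            for i in range(n) for j in range(n) for k in range(n)]

# Two subpocket-sized blocks and one small stray blob, all hypothetical
points = block((0.0, 0.0, 0.0), 6) + block((10.0, 0.0, 0.0), 6) + block((0.0, 20.0, 0.0), 2)
kept = cluster_specific_points(points)
```

In this toy input the loop stops at k=3; the two 216-point blocks are retained, and the eight-point blob is discarded because its assumed volume is well below 10 ų.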

Figure 2: Color coding of the frequency information of grid points. a) Combined pockets are color coded with a frequency spectrum yellow-white-blue (yellow: 0 %; white: 50 %; blue: 100 %). b) The difference pocket can be dissected in target (dark green) and off-target (dark red) specific areas. Furthermore, the common core is colored in a red-white-green spectrum (red: more frequently present in off-target structures, white: equally frequent in the target and off-target set, green: more frequently present in target structures).

Data sets: Kinases undergo large conformational rearrangements, especially in the DFG-loop, when changing from the active (DFG-in) to the inactive (DFG-out) state, which is reflected in major binding site differences.8,60 Thus, it is advisable to only compare structures from the same state. In the following, four retrospective test cases are introduced, three in the DFG-in and one in the DFG-out state, together with the PDB structures used and the activities of the compounds of interest against the specific targets. A zip file containing all refined and aligned input PDB structures is provided as Supplementary Information.


1. Mitogen-activated protein kinases (MAPKs): Six p38 and six Erk2 structures (Table 1), also used for validating BioGPS,44 were chosen and processed as described above. However, in addition to the default 1atp-alignment, a second alignment was done using the PDB entry 1a9u of p38 as reference structure and employing all residues within a radius of 5.5 Å around the co-crystallized ligand. Each structure contained exactly one chain; thus, 12 chains were used for the initial study. An imidazole-based inhibitor (PDB identifier SB2) is known to bind selectively to p38α over Erk2.13 In a second analysis (named reduced set), structures bound to SB2 and its analogues were discarded (1a9u, 3erk and 1pme). Finally, for an in silico mutation study, only two apo structures (1wfc of p38 and 2erk of Erk2) were considered, in which Lys53, Leu104 and Thr106 in 1wfc, and Lys52, Ile101 and Gln103 in 2erk were mutated to Ala using MOE without any further energy minimization.

Table 1: PDB structures of p38α and Erk2 in the MAPK data set.1

| Kinase | pIC50 2 (SB2) | PDB codes and mutation(s) 3 |
|---|---|---|
| p38 | 7.9 | 1a9u (SB2), 1di9 (MSQ), 1ouy (094), 1ove (358, C162S), 1oz1 (FPH), 1wfc |
| Erk2 | 5.0 | 1erk, 2erk, 3erk (SB4), 4erk (OLO), 1gol (ATP, K52R), 1pme (SB2, I103L, Q105T, D106H, E109G, T110A) |

1 The data set contains a calibrated set of six p38 and six Erk2 structures obtained from ref. 44.

2 Activity values are provided for the respective wild type kinase.13

3 In parentheses, PDB identifier of co-crystallized ligand and single point mutations in the respective structure. PDB structures not considered in the ‘reduced set’ (1a9u, 3erk, and 1pme) are marked in italics. The two apo structures used in the Ala mutation experiment (1wfc and 2erk) are underlined.

2. p21-activated kinases (PAKs): All human PAK1 and PAK4 structures were downloaded from the PDB (July 2015) and processed as described above. Only the DFG-in structures with relatively complete activation loops were extracted by considering the classification in MOE’s kinase suite as well as visual inspection. This resulted in 13 PAK1 and 23 PAK4 chains from 13 PAK1 and 19 PAK4 structures, respectively (Table 2). Selective PAK compounds include FRAX597 (PAK1 selective; PDB identifier: XR1) and ‘compound 17’ (PAK4 selective; PDB identifier: 2OL).50,51,61 In a second analysis, all PAK structures with a co-crystallized ligand occupying the selective backpocket were excluded (marked in italics in Table 2, named reduced set). In a third analysis, additionally, only the 2 PAK1 structures without a critical binding site mutation (K299R) were considered, and only 4 PAK4 structures were used to equalize the data set (underlined in Table 2, named K299-wildtype set).

Table 2: PDB structures of PAK1 and PAK4 in the PAK data set.

| Kinase | pIC50 1 (XR1) | pIC50 1 (2OL) | PDB codes, chain identifiers and mutation(s) 2 |
|---|---|---|---|
| PAK1 | 8.1 | 5.3 | 1yhv*, 1yhw*, 2hy8*, 3fxz*, 3fy0*, 3q4z*, 3q52*, 3q53*, 4daw*, 4eqc* (XR1), 4o0r, 4o0t (2OL), 4p90 |
| PAK4 | 5 | 8.1 | 2bva A/B, 2cdz, 2j0i, 2q0n, 2x4z, 4app, 4fif A/B, 4fig A/B, 4fih, 4fii, 4fij, 4jdh, 4jdi, 4jdj, 4jdk, 4l67, 4njd, 4o0v (2OL), 4o0x (2OQ), 4o0y (2OO) |

1 Activity values are provided for the respective wild type kinase.51

2 In parentheses, PDB identifier of co-crystallized ligands used in the current analysis. Note that 10 out of the 13 PAK1 structures have a K299R mutation and are marked with ‘*’. PDB structures not considered in the reduced set are marked in italics. Structures used in the K299-wildtype example are underlined.

3. Selective inhibition of ITK over AurA: All human ITK and AurA structures were downloaded from the PDB (July 2015) and processed as described above. Only the DFG-in structures with relatively complete activation loops were extracted by considering the classification in MOE’s kinase suite as well as visual inspection. 37 chains from 20 ITK structures were compared against 45 chains from 40 AurA structures (Table 3). The starting compound (compound 2 in ref. 52, PDB identifier: 2VU) is co-crystallized with the ITK structure 4ppa (which was not used in this analysis since part of the G-loop is distorted) as well as with the AurA structure 4prj. The selective ITK inhibitor GNE-9822 (PDB identifier: 2W6) is co-crystallized with the ITK structure 4pqn.


Table 3: PDB structures of ITK and AurA.

| Kinase | pIC50 52 (2VU) | pIC50 52 (2W6) | PDB codes and chain identifiers 1 |
|---|---|---|---|
| ITK | 8.7 | 8.9 | 1sm2 A/B, 1snu A/B, 1snx A/B, 3miy A/B, 3mj1, 3mj2, 3qgw A/B, 3qgy A/B, 3t9t, 3v5l A-D, 3v8t A/B, 3v8w A/B, 4hct, 4hcu, 4hcv, 4kio A-D, 4l7s A/B, 4mf1 A/B, 4pp9 A/B, 4pqn (2W6) |
| AurA | 6.8 | 6.0 | 1mq4, 1ol5, 1ol6, 1ol7, 2dwb, 2w1c, 2w1d, 2w1e, 2w1f, 2w1g, 2wtw, 2x6d, 2x6e, 2xne, 2xng, 2xru, 3e5a, 3ha6, 3myg, 3nrm, 3r21, 3uo4, 3uo5, 3uod, 3up2, 3up7, 4bn1, 4byi, 4c3p A/D, 4ceg, 4dea, 4deb, 4ded, 4dee, 4dhf A/B, 4j8n A-D, 4jaj, 4o0s, 4o0w, 4prj (2VU) |

1 In parentheses, PDB identifier of co-crystallized ligands used in the current analysis.
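The chain notation used in the tables (for example ‘A/B’ for two listed chains and ‘A-D’ for a range of chains) can be expanded programmatically to check the structure and chain counts given in the text. The parser below is a hypothetical helper written for this notation only; it assumes that entries are separated by commas, that ligand and mutation annotations are enclosed in parentheses, and that an entry without chain identifiers contributes a single chain.

```python
import re

def count_structures_and_chains(table_cell):
    """Count PDB entries and chains in a cell such as '1sm2 A/B, 3v5l A-D, 4pqn (2W6)'."""
    cell = re.sub(r"\([^)]*\)", "", table_cell)   # drop ligand/mutation annotations
    n_structures = 0
    n_chains = 0
    for entry in cell.split(","):
        parts = entry.split()
        if not parts:
            continue
        n_structures += 1
        if len(parts) == 1:
            n_chains += 1                          # no chain identifier: one chain
        elif "-" in parts[1]:
            first, last = parts[1].split("-")
            n_chains += ord(last) - ord(first) + 1 # range such as A-D
        else:
            n_chains += len(parts[1].split("/"))   # explicit list such as A/B
    return n_structures, n_chains

itk_cell = (
    "1sm2 A/B, 1snu A/B, 1snx A/B, 3miy A/B, 3mj1, 3mj2, 3qgw A/B, 3qgy A/B, "
    "3t9t, 3v5l A-D, 3v8t A/B, 3v8w A/B, 4hct, 4hcu, 4hcv, 4kio A-D, 4l7s A/B, "
    "4mf1 A/B, 4pp9 A/B, 4pqn (2W6)"
)
itk_structures, itk_chains = count_structures_and_chains(itk_cell)
```

Applied to the ITK row of Table 3, this yields 20 structures and 37 chains, in agreement with the numbers stated in the text.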

4. Selective inhibition of BRAF over VEGFR2: All human BRAF and VEGFR2 structures were downloaded from the PDB (July 2015) and processed as described above. In this example, only the DFG-out structures with mostly complete activation loops were extracted by considering the classification in MOE’s kinase suite as well as visual inspection. This resulted in 22 chains from 12 BRAF structures and 31 chains from 22 VEGFR2 structures (Table 4). BRAF structure 4dbn and VEGFR2 structure 3vnt are co-crystallized with compound 162 (PDB identifier: 0JA). The BRAF selective inhibitor TAK632 (PDB identifier: 1SU) is co-crystallized with the BRAF structure 4ksp.53

Table 4: PDB structures of BRAF and VEGFR2.

| Kinase | pIC50 53,62 (0JA) | pIC50 53,62 (1SU) | PDB codes and chain identifiers 1 |
|---|---|---|---|
| BRAF | 7.9 | 8.6 | 1uwh A/B, 1uwj A/B, 3c4c, 3idp A/B, 3ii5 A/B, 3q96, 4dbn (0JA), 4g9c, 4g9r A/B, 4jvg A/B/C/D, 4ksp A/B (1SU), 4ksq A/B |
| VEGFR2 | 8.6 | 6.8 | 3vnt (0JA), 2oh4, 2p2i A/B, 2qu5, 2qu6 A/B, 2rl5, 2xir, 3b8q A/B, 3be2, 3cp9 A/B, 3cpb A/B, 3cpc A/B, 3dtw A/B, 3efl A/B, 3ewh, 3u6j, 3vhe, 3vhk, 4ag8, 4agc, 4agd, 4asd, 4ase |

1 In parentheses, PDB identifier of co-crystallized ligands used in the current analysis.


Results and Discussions

Selectivity of compounds can be obtained by exploiting differences in binding site shape, protein-ligand interactions, solvent interactions, and entropic contributions of the receptor.63 Here, we consider only differences in shape and will show that this can already be useful to pinpoint selectivity-determining subpockets. Given our aim to identify binding site areas that can be exploited to improve the selectivity of compounds (i.e. activity differences), we focused on identifying differences in the buried part of the respective binding sites. The solvent-exposed parts were neglected by applying a buriedness cut-off. The target and the off-target specific points provide information about areas exclusively detected in one of the two sets. Additionally, the common core contains useful information regarding the predominance of target or off-target specific points in the respective common areas, encoded in the frequency values. The benefit of the presented procedure to calculate selective subpockets will be demonstrated based on the following four retrospective kinase examples.


1. Mitogen activated protein kinases: p38 vs. Erk2

The pathways of mitogen activated protein kinases (MAPKs) are mediated by the extracellular-signal-regulated kinases (ERKs), c-Jun amino-terminal kinases (JNKs), and different isoforms of p38. ERKs are involved in the regulation of cell proliferation, differentiation, and apoptosis and are, in the case of aberrant regulation, involved in cancer and other human diseases.64 Members of the p38 family are involved in the regulation of immune and inflammatory responses65 and are, thus, potential targets for autoimmune diseases.44 MAP kinases show generally similar binding profiles,47 which makes it difficult to dissect their exact biological function and, in the case of modulation by drugs, can result in unwanted side effects. Nevertheless, some compound classes show a selective inhibition of p38 over Erk2.47 One example of a selective binder is a pyridinyl imidazole derivative reported by SmithKline Beecham66 (SB2; Figure 3.a), which inhibits p38 (pIC50 = 7.9) but is inactive on Erk2 (pIC50 = 5.0; Table 1).13 While the two kinases p38 and Erk2 share a sequence identity of 56% in the binding site, a difference in the gatekeeper position is known to be one of the major determinants for binding differences between the two kinases: the small threonine in p38 increases the size of the hydrophobic backpocket compared to the larger glutamine in Erk2.47,67 Indeed, a penta-mutant of Erk2 (PDB code: 1pme), which mimics the binding pocket of p38 and contains the Gln103Thr mutation of the gatekeeper residue (Table 1), makes Erk2 susceptible to pyridinyl imidazole-based inhibitors of p38 (including SB2).66,67 The underlying protein-ligand interaction schemes are provided in Figure S1 in the Supplementary Information. Another reason why the SB2 compound is an interesting test case is that it has already been investigated in other binding site comparison studies, e.g. Cavbase (ref. 47) and BioGPS (ref. 44). For comparison, we employed the MAPK data set extracted for the BioGPS study44 (Table 1), which consists of six structures each of p38 and Erk2.

Protein alignment: A visual inspection of the respective X-ray structures (Figure 3) revealed a caveat with respect to the default 1atp-alignment (see Material and Methods). While the procedure nicely superimposed the C-lobes of the kinase structures including the activation loop and DFG-motif, it resulted in a slight shift of the hinge motif and the G-loop (Figure 3.a). In contrast, using 1a9u (p38α co-crystallized with SB2) as reference for the alignment yields a larger RMSD in the C-lobes but a better superimposition of the N-lobes, resulting in an improved alignment of the SB2 compounds (Figure 3.b). Given that our analysis is focused on the binding site area covered by SB2 (i.e. the backpocket and the hinge region), we decided to base the analysis on the latter alignment procedure.
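The trade-off between the two reference structures can be made quantitative by reporting the RMSD separately for each region of the kinase after superposition. The sketch below is a minimal illustration under stated assumptions: the structures are already superimposed, matched Cα positions are supplied as dictionaries keyed by hypothetical residue labels, and the region definitions (e.g. which residues count as hinge or C-lobe) are chosen by the user. None of these names stem from the present work.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of matched (x, y, z) positions."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))

def region_rmsd(ca_a, ca_b, regions):
    """Per-region RMSD of two already superimposed structures.

    ca_a, ca_b: dict mapping a residue label to its C-alpha position.
    regions:    dict mapping a region name to a list of residue labels.
    """
    return {
        name: rmsd([ca_a[r] for r in labels], [ca_b[r] for r in labels])
        for name, labels in regions.items()
    }
```

Running `region_rmsd` once per candidate reference alignment shows directly which alignment fits the region of interest (here the hinge and backpocket) better, even if its overall RMSD is larger.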

Figure 3: p38 structure (PDB code 1a9u) is shown in green and Erk2 structure (PDB code 1pme) in red; both structures are co-crystallized with SB2. Structures have been aligned using the PDB codes a) 1atp and b) 1a9u as reference structure.

Difference pocket analysis: The difference pocket was calculated for the six p38 and six Erk2 structures aligned to 1a9u. As expected, the common core encompasses the binding position of SB2 (Figure 4.a). While a large part of the core is similarly populated in the target and off-target sets (white points in Figure 4.a), trends for target and off-target specific areas are also present in the core (light red and light green points in Figure 4.a). Restricting the visualization of the common core to only points that are 60% more frequent in one or the other set indicates that the buried part of the backpocket is rather p38-specific (blue circle in Figure 4.b), while an adjacent part close to the G-loop is rather Erk2-specific (orange circle in Figure 4.b). Considering the target and off-target specific areas alone (dark green and dark red points in Figure 4.c) does not reveal a pronounced target-specific area, but does reveal an off-target specific cluster close to the G-loop (orange circle). In this experiment, the protein complex structures of SB2 (including the rationally mutated 1pme structure) and of one of its analogues (SB4) were included in the analysis. To demonstrate the benefit of the approach in prospective experiments, we repeated the calculation without these structures (i.e. PDB codes 1pme, 1a9u, and 3erk were removed). Encouragingly, the specific backpocket in the active site of p38 became even more pronounced (Figure 4.d). The underlying reason is that points from the common core in the original set analysis that were more pronounced in the target structures moved to the target-specific area in the reduced set analysis. Overall, similar to previous studies on MAPKs,44,47 our approach identified a p38-specific subpocket in proximity to the gatekeeper residue, which is nicely filled by the fluorophenyl ring of SB2.

Besides the different gatekeeper residues in the backpocket, the conformation of a conserved Lys (Lys53 in p38, Lys52 in Erk2) at the tip of the G-loop is involved in forming this target-specific pocket. An Erk2 (off-target) specific area is predicted in proximity to the G-loop, which could be rationalized by differences in the flexibility of the loop itself (e.g. different conformations of Tyr35 in p38) as well as of the Lys mentioned above. For the reduced set (Figure 4.d), two additional smaller target-specific (p38) clusters were identified, one close to the hinge and one close to the DFG-motif. The target-specific area close to the hinge is caused by different orientations of an Ile (84/82 in p38/Erk2) as well as amino acid differences in the C-lobe, e.g. Ala157 in p38 compared to Leu157 in Erk2 (left light blue circle, Figure 4.d). The target-specific area close to the DFG-motif occurs due to the rather poor alignment within this area (Figure 3.b). Note that the selectivity-driving subpocket was found independently of the reference structure used for the alignment, i.e. 1atp or 1a9u (Figure S2 in the Supplementary Information).
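The partition into common core, target-specific, and off-target specific areas can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used here: each pocket is assumed to be already mapped onto a shared grid and given as a set of grid-point identifiers, and the 60% criterion is read as a frequency difference of at least 0.6 between the two sets. The function names and the `margin` parameter are hypothetical.

```python
from collections import Counter

def combined_pocket(pockets):
    """Map each grid point to the fraction of structures that contain it.

    pockets: list of sets of grid-point ids, one set per structure.
    """
    counts = Counter(p for pocket in pockets for p in pocket)
    return {p: c / len(pockets) for p, c in counts.items()}

def difference_pocket(target, off_target):
    """Split the pocket space into core, target-only and off-target-only."""
    freq_t = combined_pocket(target)
    freq_o = combined_pocket(off_target)
    core = {p: (freq_t[p], freq_o[p]) for p in freq_t.keys() & freq_o.keys()}
    only_t = {p: f for p, f in freq_t.items() if p not in freq_o}
    only_o = {p: f for p, f in freq_o.items() if p not in freq_t}
    return core, only_t, only_o

def predominant(core, margin=0.6):
    """Core points whose frequencies differ by at least `margin`
    (assumed reading of the 60% predominance filter)."""
    toward_t = {p for p, (t, o) in core.items() if t - o >= margin}
    toward_o = {p for p, (t, o) in core.items() if o - t >= margin}
    return toward_t, toward_o
```

The sketch also shows why removing ligand-bound structures can shift points between areas: a point that occurs in only a few off-target structures sits in the common core, and once those structures are removed it falls into the target-specific area.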

Figure 4: p38 structure bound to the ligand SB2 (PDB code 1a9u). The calculation of the difference grids was done in a)-c) for the initial set of six p38 and six Erk2 structures and in d) for the ‘reduce set’ of nine structures in total. Shown are all grid points within 5 Å of SB2; blue circles point to target-specific and orange circles to off-target specific areas. a) Common core with the frequency color coded in a red-white-green spectrum. b) Common core restricted to those points that appear 60% more often in the target pockets (green) or the off-target pockets (red). c-d) Exclusively p38α-specific (dark green) and Erk2-specific (dark red) subpockets calculated in c) for the initial set and in d) for the reduced set. Alanine mutation study: An interesting question remains: Which residues contribute to the p38specific backpocket? The selectivity of SB2 has been mainly attributed so far to the size of the gatekeeper residue (Thr106 in p38; Gln103 in Erk2).47,67 Indeed, mutation of Thr106 to Gln renders the SB2 compound ineffective, most likely because the larger side chain blocks access

ACS Paragon Plus Environment

Page 17 of 42

Journal of Chemical Information and Modeling

Volkamer et al. – Identification of kinase-specific subpockets

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

17

to the backpocket.68 Nevertheless, mutation studies of Lisnock et al.68 suggest that besides Thr106 also other residues may contribute to selective binding. Similarly, our analysis suggested that different conformations of a conserved Lys residue behind the G-loop (Lys53 in 1a9u), and the presence of Ile in Erk2 but Leu in p38 (Leu104 in 1a9u) at the back of the backpocket also contribute to the selectivity (Figure 5.a). To quantify the contribution of these three amino acids to the p38-specific backpocket, the respective residues were incrementally mutated to Ala residues in an apo structure of p38 and Erk2, respectively. The difference pocket calculatio n was repeated for the structure pairs of the respective wildtype (Figure 5.a), the single mutant (Lys to Ala, Figure 5.b), the double mutant (Lys and Thr/Glu to Ala, Figure 5.c), and the triple mutant (Lys, Thr/Glu and Ile/Leu to Ala, Figure 5.d). The difference pocket of the wildtype apo structures contains a pronounced target (p38) specific subpocket (Figure 5.a). The deletion of the Lys side chain reduces the middle part of this p38-specific area and generates a few other p38-specific points close to the G-loop, due to a slightly different conformation of the flexib le G-loop in Erk2 (Figure 5.b, orange circle). The double mutation (including the gatekeeper residue mutation), further shrinks the p38-specific pocket (Figure 5.c). Finally, the triple mutation does not contain any target-specific areas in the backpocket (Figure 5.d). Note that two target-specific areas remain which are further away from the backpocket, located close to the hinge region and DFG-motif. Overall, the experiment indicates that all three residues have an impact on the p38-specific backpocket. 
Although this would require experimental validatio n, it nicely demonstrates that our approach can be easily combined with in silico Ala mutatio n studies to get a better understanding of selectivity determining features without running more time consuming free-energy decomposition analysis. Furthermore, the results indicate that our approach can provide valuable information beyond the sequence level using shape-based differences only.
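An in silico Ala mutation of this kind amounts to truncating a side chain at the Cβ atom. The following sketch shows one simple way to do this on the text lines of a PDB file; it is an assumption about how such a truncation could be scripted, not the protocol used here. The function name is hypothetical, the fixed-column slicing follows the standard PDB ATOM record layout, and any hydrogen atoms of the residue are dropped as well.

```python
KEEP = {"N", "CA", "C", "O", "CB"}  # heavy atoms that alanine also has

def mutate_to_ala(pdb_lines, chain, resseq):
    """Truncate one residue to alanine by removing atoms beyond C-beta.

    pdb_lines: iterable of PDB-format lines (fixed columns).
    chain:     one-letter chain identifier of the residue.
    resseq:    residue sequence number of the residue.
    """
    out = []
    for line in pdb_lines:
        is_atom = line.startswith(("ATOM", "HETATM"))
        if is_atom and line[21] == chain and int(line[22:26]) == resseq:
            if line[12:16].strip() not in KEEP:
                continue  # side-chain atom beyond CB (or a hydrogen): drop
            line = line[:17] + "ALA" + line[20:]  # rename the residue
        out.append(line)
    return out
```

Applying the function successively to the Lys, the gatekeeper, and the Leu/Ile position of both apo structures yields the single, double, and triple mutants whose difference pockets can then be recomputed.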

Figure 5: Influence of individual residues on the p38-specific area in the hydrophobic backpocket (green points). p38 (PDB code 1wfc) is shown in green, Erk2 (PDB code 2erk) in red; residues of interest are rendered as sticks. Circles mark the area in which the points disappear after the respective Ala mutation. The calculation was done using a) apo structures, b) single mutant structures (K53 from p38α and K52 from Erk2 to Ala), c) double mutant structures (single mutant + T106 from p38α and Q103 from Erk2 to Ala), and d) triple mutant structures (double mutant + L104 from p38α and I101 from Erk2 to Ala).

2. p21-activated kinases: PAK1 vs. PAK4

p21-activated kinases (PAKs) play an important role in cell proliferation, survival, motility, and angiogenesis. Their direct role in several tumor subtypes via overexpression, amplification, or activation, as well as their indirect role as key effectors of established oncogenes, such as the RAS small monomeric GTPase,48 indicate that the different PAK isoforms, in particular PAK1 and PAK4, are attractive therapeutic targets.69–72 The PAK family is divided into two groups: the first consists of PAK1-3 and the second of PAK4-6. Sequence identity within both groups is above 80% and between the two groups around 54%.48 This suggests that, although some effectors of PAKs overlap between the two groups, the individual groups may recognize different substrates and govern distinct cellular processes.69 Several pan-PAK inhibitors have been identified to date,73–75 and many of them bind unselectively to PAK1 and PAK4.50,51 However, Afraxis reported in 2010 a series of highly selective group I inhibitors (FRAX486, ref. 76, and FRAX597, ref. 61), while Genentech disclosed in 2014 a selective PAK4 inhibitor (compound 17 in ref. 50). FRAX597 and compound 17 bind with the DFG-motif and the C-helix in the active (‘in’) conformation, even though both compounds extend into the hydrophobic backpocket adjacent to the C-helix (Figure 6). See Figure S1 for protein-ligand interaction schemes.

The initial PAK data set (i.e. 13 PAK1 and 23 PAK4 chains, Table 2) revealed two distinct backpockets of PAK1 and PAK4 (e.g. considering the common core restricted to points with 60% predominance for one set as well as the specific areas with a frequency above 50%, data not shown). Note that the data set contains PAK1 and PAK4 structures co-crystallized with the Genentech compound (PDB codes: 4o0t and 4o0v), both in an almost identical binding mode protruding into the backpocket but with over 700-fold lower affinity to PAK1 than to PAK4 (Table 2). Similar to the p38/Erk2 example, we decided to exclude all structures with a compound bound to one of the backpockets (i.e. excluding the structures of the Genentech series 4o0t, 4o0v, 4o0x, 4o0y, and the Afraxis compound 4eqc). The calculation on the reduced set arguably resembles a real-life scenario more closely, in which the complex structures have not yet been resolved.
After executing the procedure on the reduced set, points from the common core (that were predominant for PAK1 or PAK4, respectively) again moved to the specific areas. The difference pocket of the reduced set revealed a pronounced PAK4-specific area (red points) occupied by the cyclohexanol moiety of the Genentech compound as well as a PAK1-specific area (green points) close to the thiazole ring of the Afraxis compound (Figure 6.a). The PAK1- and PAK4-specific areas extend into the backpocket with different vectors: the PAK1-specific subpocket occupies the upper part of the backpocket, while the PAK4-specific subpocket occupies the lower one.48,50 The occurrence of the PAK4-specific backpocket is dominated by the orientation of a Met of the C-helix (Met319 in PAK1 and Met370 in PAK4, Figure 6.a, orange circle), which in the PAK1 structures points predominantly into the backpocket and, thus, occupies part of this subpocket. The corresponding Met in PAK4 is more flexible and can adopt conformations that open the PAK4-specific backpocket. An explanation is that the flexibility of the C-terminal turn of the C-helix differs between PAK4 and PAK1 due to a sequence difference (Asn322 rigidifies the helix in PAK1 and projects the Met into the binding site, while a Tyr at the same position in PAK4 enables the displacement of the Met).48,50 However, the PAK4-specific backpocket can also open up in PAK1 (e.g. PDB code 4o0t) but at a high energy cost (pIC50 = 5.3).

The PAK1-specific area in the upper part of the backpocket is only partly detected and lies slightly above the Afraxis compound (Figure 6.a, blue circle). A closer inspection of the PDB structures used revealed that 10 out of the 13 PAK1 structures have a K299R mutation in the G-loop region, introduced for crystallization reasons. The bulky side chain of Arg occupies more space than that of Lys and thus points into the binding site. Repeating the analysis without these mutated structures resulted in a PAK1-specific backpocket (Figure 6.b) matching the location of the thiazole ring of the Afraxis compound. The PAK1-specific subpocket is lined by three conserved residues (Lys299, Met301, and Met344 in PAK1; Lys350, Met352, and Met395 in PAK4) which adopt different conformations in PAK1 compared to PAK4: the three amino acids have a more open conformation in PAK1 but point further into the binding site in PAK4, which may be due to differences in the flexibility of the G-loop in the two kinases.

Figure 6: PAK1 structure is shown in green and PAK4 structure in red. PDB codes: a) 4eqc and 4o0v, b) 4p90 and 4o0v. Only target-specific (dark green) as well as off-target specific (dark red) grid points present in at least 50% of the respective structures are visualized. The results were obtained for structures a) without co-crystallized compounds that already explore the specific backpockets (i.e. 11 PAK1 vs. 20 PAK4 chains) and b) without the K299R mutation (i.e. 2 PAK1 vs. 4 PAK4 chains).

3. Selective inhibition of ITK over AurA

Interleukin-2 inducible T-cell kinase (ITK) belongs to the Tec family of tyrosine kinases and is known to play a major role in T-cell signaling downstream of the T-cell receptor.77 Due to the role of ITK in allergic asthma and other inflammatory diseases, several attempts have been made to identify selective ITK compounds.52 For example, an indazole series from Genentech resulted in several hit compounds but with unwanted off-target activity against AurA (e.g. compounds 2 in ref. 52). Using structure-guided design, a selectivity pocket above the ligand plane became apparent and appropriate lipophilic substituents were explored to fill this pocket. This resulted in a tetrahydroindazole (GNE-9822 in ref. 52) with a 660-fold higher activity against ITK than against AurA (Table 3). See Figure S1 for protein-ligand interaction schemes.

We performed the difference analysis on a set of 37 chains from ITK and 45 chains from AurA (Table 3). First, the combined pockets for the two kinases were investigated. Considering pocket points present in at least 15% of the underlying structural ensemble, the known selective pocket above the ligand plane52 became apparent (Figure 7.a and 7.b, orange circles). A major cause for the difference is the gatekeeper residue, which is Phe435 in ITK and Leu210 in AurA. While this area has been identified manually before based on the inspection of single structures, our approach identified this area to be present in the majority of all available ITK structures (37 chains) at one glance. Furthermore, the difference pocket between the two sets (ITK and AurA) also points to this target-specific area above the ligand plane (Figure 7.c). Since the representation of all selective points may be overwhelming, we implemented a clustering procedure to highlight the most populated areas. This resulted in three clusters: one larger cluster located above the ligand plane and two smaller clusters behind the pyrazole ring of the co-crystallized compound (Figure 7.d). This suggests that an extension on the pyrazole ring of the tetrahydroindazole compound (see vector in Figure 7.d) could further enhance the selectivity of this compound.
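To give an idea of how selective grid points can be condensed into a few populated areas, the following sketch groups points into connected components, linking two points whenever they lie within a distance cut-off. This is only one plausible clustering scheme and is not claimed to be the procedure implemented here; the function name, the 1.5 Å cut-off, and the minimum cluster size are assumptions chosen for illustration.

```python
import math
from collections import deque

def cluster_points(points, cutoff=1.5, min_size=2):
    """Group points into connected components.

    Two points are linked when they lie within `cutoff` (in Å) of each
    other; components smaller than `min_size` are discarded. Returns
    lists of point indices, largest cluster first.
    """
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        queue, members = deque([seed]), [seed]
        while queue:
            i = queue.popleft()
            near = [j for j in unvisited
                    if math.dist(points[i], points[j]) <= cutoff]
            for j in near:
                unvisited.remove(j)
                queue.append(j)
                members.append(j)
        if len(members) >= min_size:
            clusters.append(sorted(members))
    return sorted(clusters, key=len, reverse=True)
```

Sorting by size puts the most populated area first, which corresponds to the visual emphasis on the largest cluster in a picture such as Figure 7.d.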

Figure 7: ITK structure (PDB code 4pqn) is shown in green and AurA structure (PDB code 4prj) in red. a-b) The combined pockets are shown in a yellow-white-blue color spectrum calculated a) based on 37 ITK chains and b) based on 45 AurA chains. Only points with a frequency above 15% in the respective ensemble and within 5 Å of any ligand atom are visualized. c) Calculated difference pocket with points that are present in more than 25% of the target-specific (dark green) and off-target specific (dark red) areas as well as points in the common core with a 60% predominance for one kinase (red-white-green). d) Target-specific clusters calculated for the points shown in c).

4. Selective inhibition of BRAF over VEGFR2

The RAF kinase family is known to play a fundamental role in cancer progression.73 BRAF-selective inhibitors such as vemurafenib and dabrafenib are used for treating melanoma (skin cancer) but also result in adverse effects.53 Unlike these DFG-in binders, an effective DFG-out type thiazolo[5,4-b]pyridine derivative (compound 1 in ref. 62) was developed that inhibits BRAF (pIC50 = 7.9) and VEGFR2 kinases (pIC50 = 8.6).62 Compound 1 binds in an almost identical mode to BRAF (PDB code: 4dbn) and VEGFR2 (PDB code: 3vnt). See Figure S1 for protein-ligand interaction schemes. Thus, the question arose how this compound could be modified to become more selective towards BRAF. After identification of a BRAF-specific conformation of the phenylalanine side chain in the DFG-motif, the compound was optimized into a compound called TAK-632 with a 67-fold improved BRAF activity over VEGFR2.53

To retrospectively validate our approach, all available human kinase DFG-out structures of BRAF and VEGFR2 (Table 4; 53 chains) were used for calculating a difference pocket. Generally, the BRAF structures have a larger pocket adjacent to the G-loop, explaining most of the upper target-specific points (data not shown). The clustering results point to a pronounced BRAF-specific subpocket at the phenylalanine of the DFG-motif (Figure 8). Phe1047 in VEGFR2 points inside this target-specific cluster, while Phe594 in BRAF is twisted outwards in most of the underlying structures. This conformation allows binding of the compound 0JA to VEGFR2 (Figure 8, red) but not of TAK-632, which contains an attached cyano group at the C7-position (Figure 8, green), explaining the lower affinity of TAK-632 towards VEGFR2 (Table 4). The different conformations and flexibility of the Phe residue in BRAF and VEGFR2 may be explained by the adjacent residue, which is Gly in BRAF and Cys in VEGFR2. Besides the target-specific subpocket, two off-target specific (VEGFR2) clusters emerged: one cluster is located in the left part, occupied by Phe582 in BRAF, and one cluster is located behind the DFG-motif, occupied by Leu513 in BRAF, whereas at both positions VEGFR2 has residues that open up more space (Leu1035 and Val899, respectively).
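Since selectivity is quoted throughout as pIC50 values or as fold differences, it may help to spell out the conversion between the two. Because pIC50 = -log10(IC50), a pIC50 difference of Δ corresponds to a 10^Δ-fold difference in IC50. The helper below is a small illustrative sketch; its name is hypothetical.

```python
def fold_selectivity(pic50_target, pic50_off_target):
    """Ratio IC50(off-target) / IC50(target) from two pIC50 values.

    Values above 1 mean the compound is more potent on the target.
    """
    return 10 ** (pic50_target - pic50_off_target)

# SB2: p38 (pIC50 = 7.9) vs. Erk2 (pIC50 = 5.0)
sb2 = fold_selectivity(7.9, 5.0)

# Compound 1: BRAF (pIC50 = 7.9) vs. VEGFR2 (pIC50 = 8.6)
cmpd1 = fold_selectivity(7.9, 8.6)
```

For SB2 this gives a roughly 800-fold preference for p38, whereas compound 1 yields a ratio of about 0.2, i.e. it is about five times more potent on VEGFR2 than on BRAF, which is the starting point for the optimization towards BRAF described above.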

Figure 8: BRAF structure (PDB code 4ksp) shown in green and VEGFR2 structure (PDB code 3vnt) in red together with the respective target-specific (green) and off-target specific (red) clusters. The clusters were calculated based on points that are present in more than 25% of the target or off-target specific points as well as points in the common core with a 60% predominance for one kinase.

Conclusion

We presented here an intuitive method to extract specific subpockets in a set of target structures compared to a set of off-target structures based on the fusion of binding pocket grids. The approach first calculates for each set of structures the respective pocket imprint, called ‘combined pocket’. Each pocket grid point is assigned a frequency value, which allows distinguishing the flexible pocket parts from the more rigid ones. Furthermore, the two combined pockets can be directly compared by calculating a ‘difference pocket’ that subdivides the available pocket space into three areas: the common core, the target-specific area, and the off-target specific area. Overall, the resulting difference grids are easily interpretable and enable a visual inspection of multiple structures at one glance. The visualization of the grid points can be easily filtered by their frequency and buriedness values as well as by clustering of the grid points. The frequency information in the target-specific area, for example, allows highlighting those parts of the pocket that are present in all structures of the target set as well as those parts that are only present in a subset due to induced-fit adaptations to already explored ligands. The common core also provides useful information for improving the selectivity of compounds, as it can point to those areas that are, although not unique, more highly populated in the target structures than in the off-target ones. Exploring the common core is particularly beneficial if different proteins are taken into account as off-targets. In turn, the buriedness information can be used to exclude the solvent-exposed part of the binding site from the visualization. This might be necessary as the exact delimitation of the binding pocket from the solvent is error-prone, independently of whether manual or automatic detection procedures are used.58 Moreover, solvent-exposed parts are difficult to exploit in ligand design due to their flexible nature, while buried (especially hydrophobic) subpockets usually contribute substantially to the affinity of compounds.79

Another advantage of the presented method to fuse binding pockets is that the binding mode and activity of only one hit compound are sufficient and that the approach, thus, does not require any profiling data for training purposes. Accordingly, the approach can also be applied to rather unexplored proteins, as long as 3D structures are available. The considered structures can be obtained from the PDB (including alternative conformations if present) or proprietary databases, but can also be generated via sampling approaches such as molecular dynamics (MD) simulations. The method makes it possible to consider the entire conformational ensemble of the respective binding pockets and, thus, brings coping with protein flexibility within reach. Attempts to track pockets as generated by MD simulations already exist, e.g. TRAPP (ref. 80) and MDPocket (ref. 81). However, their main application has so far been the identification of transient pockets or the analysis of the flexibility of binding sites rather than the identification of selectivity-determining subpockets between different proteins as done in the present study.

Finally, not only specific subpockets between two sets of structures but also the residues forming these subpockets can be identified. Thus, the approach can be easily combined with alanine mutation studies of these residues, thereby providing interesting insights into the selectivity-driving features of the respective binding site. These features could be certain residues such as the gatekeeper residue in p38, but also particular conformations such as that of the Met in the backpocket of the PAK structures.

Although the presented method can be applied automatically, we would like to note that the structure selection should be done with caution. For example, it can be beneficial to remove structures of poor quality as well as mutated ones, as done e.g. in the present MAPK and PAK examples. Note also that the alignment used is a crucial step for the subsequent difference calculations. As discussed in the p38/Erk2 example, the alignment procedure of choice might depend on the binding site area of interest.

The presented approach, applied here to identify target-specific subpockets in the ATP binding site of kinases, could also be applied to the analysis of potential allosteric sites as well as to other target classes with many homologous structures such as bromodomains or histone methyltransferases. In conclusion, the method can be applied to detect similarities and dissimilarities between sets of hundreds of structures and, thus, can facilitate the design of more selective compounds.

Author Information

*Corresponding Authors: Andrea Volkamer ([email protected]) and Simone Fulle ([email protected]), BioMed X Innovation Center, Im Neuenheimer Feld 515, 69120 Heidelberg, Germany

Author Contributions: The manuscript was written through contributions of all authors. All authors have given approval to the final version of the manuscript.

Funding Sources: The research of the ‘Selective Kinase Inhibitor’ team at the BioMed X Innovation Center is kindly sponsored by Merck KGaA.

Abbreviations

MIFs: molecular interaction fields; PCM: proteochemometric; MAPKs: mitogen activated protein kinases; PAKs: p21-activated kinases; ITK: IL-2 inducible T-cell kinase; AurA: Aurora kinase A; BRAF: rapidly accelerated fibrosarcoma kinase B; VEGFR2: vascular endothelial growth factor receptor kinase 2; KDR: kinase insert domain receptor; Erk2: extracellular-signal-regulated kinase 2; ATP: adenosine triphosphate; PDB: Protein Data Bank; DFG: kinase-specific motif of the three amino acids aspartic acid (D), phenylalanine (F), and glycine (G); the conformation of this motif is used to distinguish between the active (DFG-in) and inactive (DFG-out) kinase form.

Acknowledgment We thank Paul Finn, Mireille Krier, Daniel Kuhn, and Rebecca Wade for fruitful discussions. Furthermore, we thank BioSolveIT for providing a free license of the DoGSiteScorer.

Associated Content Supporting Information: Figure S1: Protein ligand interaction schemes for the four major target and off-target complex structures in the data sets. Figure S2: Specific subpockets obtained for p38α and Erk2 using 1atp as reference for the structural alignment. Additionally, a zip file containing all refined and aligned input PDB structures is provided. This material is available free of charge via the Internet at http://pubs.acs.org.

References

(1) Kéri, G.; Őrfi, L.; Németh, G. Rational Drug Design of Kinase Inhibitors for Signal Transduction Therapy. In Protein Kinases as Drug Targets; Kleb, B., Muller, G., Hamacher, M., Eds.; Wiley-VCH Verlag GmbH & Co. KGaA, 2011; pp 85–114.

(2) Wood, L. D.; Parsons, D. W.; Jones, S.; Lin, J.; Sjöblom, T.; Leary, R. J.; Shen, D.; Boca, S. M.; Barber, T.; Ptak, J.; Silliman, N.; Szabo, S.; Dezso, Z.; Ustyanksky, V.; Nikolskaya, T.; Nikolsky, Y.; Karchin, R.; Wilson, P. A.; Kaminker, J. S.; Zhang, Z.; Croshaw, R.; Willis, J.; Dawson, D.; Shipitsin, M.; Willson, J. K. V; Sukumar, S.; Polyak, K.; Park, B. H.; Pethiyagoda, C. L.; Pant, P. V. K.; Ballinger, D. G.; Sparks, A. B.; Hartigan, J.; Smith, D. R.; Suh, E.; Papadopoulos, N.; Buckhaults, P.; Markowitz, S. D.; Parmigiani, G.; Kinzler, K. W.; Velculescu, V. E.; Vogelstein, B. The Genomic Landscapes of Human Breast and Colorectal Cancers. Science 2007, 318, 1108–1113.

(3) Lin, J.; Gan, C. M.; Zhang, X.; Jones, S.; Sjöblom, T.; Wood, L. D.; Parsons, D. W.; Papadopoulos, N.; Kinzler, K. W.; Vogelstein, B.; Parmigiani, G.; Velculescu, V. E. A Multidimensional Analysis of Genes Mutated in Breast and Colorectal Cancers. Genome Res. 2007, 17, 1304–1318.

(4) Futreal, P. A.; Coin, L.; Marshall, M.; Down, T.; Hubbard, T.; Wooster, R.; Rahman, N.; Stratton, M. R. A Census of Human Cancer Genes. Nat. Rev. Cancer 2004, 4, 177–183.

(5) Cohen, P. Protein Kinases - the Major Drug Targets of the Twenty-First Century? Nat. Rev. Drug Discov. 2002, 1, 309–315.

(6) Bento, A. P.; Gaulton, A.; Hersey, A.; Bellis, L. J.; Chambers, J.; Davies, M.; Krüger, F. A.; Light, Y.; Mak, L.; McGlinchey, S.; Nowotka, M.; Papadatos, G.; Santos, R.; Overington, J. P. The ChEMBL Bioactivity Database: An Update. Nucleic Acids Res. 2014, 42, D1083–D1090.

(7) Fedorov, O.; Müller, S.; Knapp, S. The (un)targeted Cancer Kinome. Nat. Chem. Biol. 2010, 6, 166–169.

(8) Volkamer, A.; Eid, S.; Turk, S.; Jaeger, S.; Rippmann, F.; Fulle, S. Pocketome of Human Kinases: Prioritizing the ATP Binding Sites of (yet) Untapped Protein Kinases for Drug Discovery. J. Chem. Inf. Model. 2015, 55, 538–549.

(9) Berman, H. M. The Protein Data Bank. Nucleic Acids Res. 2000, 28, 235–242.

(10) Karaman, M. W.; Herrgard, S.; Treiber, D. K.; Gallant, P.; Atteridge, C. E.; Campbell, B. T.; Chan, K. W.; Ciceri, P.; Davis, M. I.; Edeen, P. T.; Faraoni, R.; Floyd, M.; Hunt, J. P.; Lockhart, D. J.; Milanov, Z. V; Morrison, M. J.; Pallares, G.; Patel, H. K.; Pritchard, S.; Wodicka, L. M.; Zarrinkar, P. P. A Quantitative Analysis of Kinase Inhibitor Selectivity. Nat. Biotechnol. 2008, 26, 127–132.

(11) Fedorov, O.; Marsden, B.; Pogacic, V.; Rellos, P.; Müller, S.; Bullock, A. N.; Schwaller, J.; Sundström, M.; Knapp, S. A Systematic Interaction Map of Validated Kinase Inhibitors with Ser/Thr Kinases. Proc. Natl. Acad. Sci. U. S. A. 2007, 104, 20523–20528.

(12) Anastassiadis, T.; Deacon, S. W.; Devarajan, K.; Ma, H.; Peterson, J. R. Comprehensive Assay of Kinase Catalytic Activity Reveals Features of Kinase Inhibitor Selectivity. Nat. Biotechnol. 2011, 29, 1039–1045.

(13) Davis, M. I.; Hunt, J. P.; Herrgard, S.; Ciceri, P.; Wodicka, L. M.; Pallares, G.; Hocker, M.;

ACS Paragon Plus Environment

Journal of Chemical Information and Modeling Volkamer et al. – Identification of kinase-specific subpockets

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 28 of 42 28

Treiber, D. K.; Zarrinkar, P. P. Comprehensive Analysis of Kinase Inhibitor Selectivity. Nat. Biotechnol. 2011, 29, 1046–1051. (14)

Metz, J. T.; Johnson, E. F.; Soni, N. B.; Merta, P. J.; Kifle, L.; Hajduk, P. J. Navigating the Kinome. Nat. Chem. Biol. 2011, 7, 200–202.

(15)

Dranchak, P.; MacArthur, R.; Guha, R.; Zuercher, W. J.; Drewry, D. H.; Auld, D. S.; Inglese, J. Profile of the GSK Published Protein Kinase Inhibitor Set across ATPDependent and-Independent Luciferases: Implications for Reporter-Gene Assays. PLoS One 2013, 8, e57888.

(16)

Morphy, R. Selectively Nonselective Kinase Inhibition: Striking the Right Balance. J. Med. Chem. 2010, 53, 1413–1437.

(17)

Lorusso, P. M.; Eder, J. P. Therapeutic Potential of Novel Selective-Spectrum Kinase Inhibitors in Oncology. Expert Opin. Investig. Drugs 2008, 17, 1013–1028.

(18)

Bamborough, P. System-Based Drug Discovery within the Human Kinome. Expert Opin. Drug Discov. 2012, 7, 1053–1070.

(19)

Scapin, G. Protein Kinase Inhibition: Different Approaches to Selective Inhibitor Design. Curr. Drug Targets 2006, 7, 1443–1454.

(20)

Azzaoui, K.; Hamon, J.; Faller, B.; Whitebread, S.; Jacoby, E.; Bender, A.; Jenkins, J. L.; Urban, L. Modeling Promiscuity Based on in Vitro Safety Pharmacology Profiling Data. ChemMedChem 2007, 2, 874–880.

(21)

Arrowsmith, C. H.; Audia, J. E.; Austin, C.; Baell, J.; Bennett, J.; Blagg, J.; Bountra, C.; Brennan, P. E.; Brown, P. J.; Bunnage, M. E.; Buser-Doepner, C.; Campbell, R. M.; Carter, A. J.; Cohen, P.; Copeland, R. A.; Cravatt, B.; Dahlin, J. L.; Dhanak, D.; Edwards, A. M.; Frye, S. V; Gray, N.; Grimshaw, C. E.; Hepworth, D.; Howe, T.; Huber, K. V. M.; Jin, J.; Knapp, S.; Kotz, J. D.; Kruger, R. G.; Lowe, D.; Mader, M. M.; Marsden, B.; MuellerFahrnow, A.; Müller, S.; O’Hagan, R. C.; Overington, J. P.; Owen, D. R.; Rosenberg, S. H.; Roth, B.; Ross, R.; Schapira, M.; Schreiber, S. L.; Shoichet, B.; Sundström, M.; SupertiFurga, G.; Taunton, J.; Toledo-Sherman, L.; Walpole, C.; Walters, M. A.; Willson, T. M.; Workman, P.; Young, R. N.; Zuercher, W. J. The Promise and Peril of Chemical Probes. Nat. Chem. Biol. 2015, 11, 536–541.

(22)

Uitdehaag, J. C. M.; Verkaar, F.; Alwan, H.; de Man, J.; Buijsman, R. C.; Zaman, G. J. R. A Guide to Picking the Most Selective Kinase Inhibitor Tool Compounds for Pharmacological Validation of Drug Targets. Br. J. Pharmacol. 2012, 166, 858–876.

(23)

Goodford, P. J. A Computational Procedure for Determining Energetically Favorable Binding Sites on Biologically Important Macromolecules. J. Med. Chem. 1985, 28, 849– 857.

(24)

Reynolds, C. A.; Wade, R. C.; Goodford, P. J. Identifying Targets for Bioreductive Agents: Using GRID to Predict Selective Binding Regions of Proteins. J. Mol. Graph. 1989, 7, 103– 108, 100.

(25)

Gohlke, H.; Klebe, G. DrugScore Meets CoMFA: Adaptation of Fields for Molecular Comparison (AFMoC) or How to Tailor Knowledge-Based Pair-Potentials to a Particular Protein. J. Med. Chem. 2002, 45, 4153–4170.

(26)

Hillebrecht, A.; Supuran, C. T.; Klebe, G. Integrated Approach Using Protein and Ligand Information to Analyze Selectivity- and Affinity-Determining Features of Carbonic ACS Paragon Plus Environment

Page 29 of 42

Journal of Chemical Information and Modeling

Volkamer et al. – Identification of kinase-specific subpockets

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

29

Anhydrase Isozymes. ChemMedChem 2006, 1, 839–853. (27)

Ferrario, V.; Siragusa, L.; Ebert, C.; Baroni, M.; Foscato, M.; Cruciani, G.; Gardossi, L. BioGPS Descriptors for Rational Engineering of Enzyme Promiscuity and Structure Based Bioinformatic Analysis. PLoS One 2014, 9, e109354.

(28)

Lionta, E.; Spyrou, G.; Vassilatis, D. K.; Cournia, Z. Structure-Based Virtual Screening for Drug Discovery: Principles, Applications and Recent Advances. Curr. Top. Med. Chem. 2014, 14, 1923–1938.

(29)

Weber, J.; Achenbach, J.; Moser, D.; Proschak, E. VAMMPIRE: A Matched Molecular Pairs Database for Structure-Based Drug Design and Optimization. J. Med. Chem. 2013, 56, 5203–5207.

(30)

Posy, S. L.; Claus, B. L.; Pokross, M. E.; Johnson, S. R. 3D Matched Pairs: Integrating Ligand- and Structure-Based Knowledge for Ligand Design and Receptor Annotation. J. Chem. Inf. Model. 2013, 53, 1576–1588.

(31)

Hu, Y.; Bajorath, J. Systematic Assessment of Molecular Selectivity at the Level of Targets, Bioactive Compounds, and Structural Analogues. ChemMedChem 2015.

(32)

Mei, D.; Yin, Y.; Wu, F.; Cui, J.; Zhou, H.; Sun, G.; Jiang, Y.; Feng, Y. Discovery of Potent and Selective Urea-Based ROCK Inhibitors: Exploring the Inhibitor’s Potency and ROCK2/PKA Selectivity by 3D-QSAR, Molecular Docking and Molecular Dynamics Simulations. Bioorg. Med. Chem. 2015, 23, 2505–2517.

(33)

Subramanian, V.; Prusis, P.; Pietilä, L.-O.; Xhaard, H.; Wohlfahrt, G. Visually Interpretable Models of Kinase Selectivity Related Features Derived from Field-Based Proteochemometrics. J. Chem. Inf. Model. 2013, 53, 3021–3030.

(34)

Paricharak, S. S.; Cortés-Ciriano, I. I.; IJzerman, A. A.; Malliavin, T. T.; Bender, A. A. Proteochemometric Modelling Coupled to in Silico Target Prediction: An Integrated Approach for the Simultaneous Prediction of Polypharmacology and Binding Affinity/potency of Small Molecules. J. Cheminform. 2015, 7, 15.

(35)

Kooistra, A. J.; Kanev, G. K.; van Linden, O. P. J.; Leurs, R.; de Esch, I. J. P.; de Graaf, C. KLIFS: A Structural Kinase-Ligand Interaction Database. Nucleic Acids Res. 2015.

(36)

Méndez-Lucio, O.; Kooistra, A. J.; de Graaf, C.; Bender, A.; Medina-Franco, J. L. Analyzing Multitarget Activity Landscapes Using Protein-Ligand Interaction Fingerprints: Interaction Cliffs. J. Chem. Inf. Model. 2015, 55, 251–262.

(37)

Schmitt, S.; Kuhn, D.; Klebe, G. A New Method to Detect Related Function among Proteins Independent of Sequence and Fold Homology. J. Mol. Biol. 2002, 323, 387–406.

(38)

Shulman-Peleg, A.; Nussinov, R.; Wolfson, H. J. Recognition of Functional Sites in Protein Structures. J. Mol. Biol. 2004, 339, 607–633.

(39)

Weill, N.; Rognan, D. Alignment-Free Ultra-High- Throughput Comparison of Druggable Protein-Ligand Binding Sites. J. Chem. Inf. Model. 2010, 50, 123–135.

(40)

Cross, S.; Baroni, M.; Carosati, E.; Benedetti, P.; Clementi, S. FLAP: GRID Molecular Interaction Fields in Virtual Screening. Validation Using the DUD Data Set. J. Chem. Inf. Model. 2010, 50, 1442–1450.

(41)

Von Behren, M. M.; Volkamer, A.; Henzler, A. M.; Schomburg, K. T.; Urbaczek, S.; Rarey, M. Fast Protein Binding Site Comparison via an Index-Based Screening ACS Paragon Plus Environment

Journal of Chemical Information and Modeling Volkamer et al. – Identification of kinase-specific subpockets

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 30 of 42 30

Technology. J. Chem. Inf. Model. 2013, 53, 411–422. (42)

Naumann, T.; Matter, H. Structural Classification of Protein Kinases Using 3D Molecular Interaction Field Analysis of Their Ligand Binding Sites: Target Family Landscapes. J. Med. Chem. 2002, 45, 2366–2378.

(43)

Hoppe, C.; Steinbeck, C.; Wohlfahrt, G. Classification and Comparison of Ligand-Binding Sites Derived from Grid-Mapped Knowledge-Based Potentials. J. Mol. Graph. Model. 2006, 24, 328–340.

(44)

Siragusa, L.; Cross, S.; Baroni, M.; Goracci, L.; Cruciani, G. BioGPS: Navigating Biological Space to Predict Polypharmacology, off-Targeting, and Selectivity. Proteins 2015, 83, 517–532.

(45)

Lounkine, E.; Keiser, M. J.; Whitebread, S.; Mikhailov, D.; Hamon, J.; Jenkins, J. L.; Lavan, P.; Weber, E.; Doak, A. K.; Côté, S.; Shoichet, B. K.; Urban, L. Large-Scale Prediction and Testing of Drug Activity on Side-Effect Targets. Nature 2012, 486, 361– 367.

(46)

Volkamer, A.; Kuhn, D.; Grombacher, T.; Rippmann, F.; Rarey, M. Combining Global and Local Measures for Structure-Based Druggability Predictions. J. Chem. Inf. Model. 2012, 52, 360–372.

(47)

Kuhn, D.; Weskamp, N.; Hüllermeier, E.; Klebe, G. Functional Classification of Protein Kinase Binding Sites Using Cavbase. ChemMedChem 2007, 2, 1432–1447.

(48)

Rudolph, J.; Crawford, J. J.; Hoeflich, K. P.; Wang, W. Inhibitors of p21-Activated Kinases (PAKs). J. Med. Chem. 2015, 58, 111–129.

(49)

Crawford, J. J.; Lee, W.; Aliagas, I.; Mathieu, S.; Hoeflich, K. P.; Zhou, W.; Wang, W.; Rouge, L.; Murray, L.; La, H.; Liu, N.; Fan, P. W.; Cheong, J.; Heise, C. E.; Ramaswamy, S.; Mintzer, R.; Liu, Y.; Chao, Q.; Rudolph, J. Structure-Guided Design of Group I Selective p21-Activated Kinase Inhibitors. J. Med. Chem. 2015, 58, 5121–5136.

(50)

Staben, S. T.; Feng, J. A.; Lyle, K.; Belvin, M.; Boggs, J.; Burch, J. D.; Chua, C.; Cui, H.; DiPasquale, A. G.; Friedman, L. S.; Heise, C.; Koeppen, H.; Kotey, A.; Mintzer, R.; Oh, A.; Roberts, D. A.; Rouge, L.; Rudolph, J.; Tam, C.; Wang, W.; Xiao, Y.; Young, A.; Zhang, Y.; Hoeflich, K. P. Back Pocket Flexibility Provides Group II p21-Activated Kinase (PAK) Selectivity for Type I 1/2 Kinase Inhibitors. J. Med. Chem. 2014, 57, 1033–1045.

(51)

Karpov, A. S.; Amiri, P.; Bellamacina, C.; Bellance, M.-H.; Breitenstein, W.; Daniel, D.; Denay, R.; Fabbro, D.; Fernandez, C.; Galuba, I.; Guerro-Lagasse, S.; Gutmann, S.; Hinh, L.; Jahnke, W.; Klopp, J.; Lai, A.; Lindvall, M. K.; Ma, S.; Möbitz, H.; Pecchi, S.; Rummel, G.; Shoemaker, K.; Trappe, J.; Voliva, C.; Cowan-Jacob, S. W.; Marzinzik, A. L. Optimization of a Dibenzodiazepine Hit to a Potent and Selective Allosteric PAK1 Inhibitor. ACS Med. Chem. Lett. 2015, 6, 776–781.

(52)

Burch, J. D.; Lau, K.; Barker, J. J.; Brookfield, F.; Chen, Y.; Chen, Y.; Eigenbrot, C.; Ellebrandt, C.; Ismaili, M. H. A.; Johnson, A.; Kordt, D.; MacKinnon, C. H.; McEwan, P. A.; Ortwine, D. F.; Stein, D. B.; Wang, X.; Winkler, D.; Yuen, P.-W.; Zhang, Y.; Zarrin, A. A.; Pei, Z. Property- and Structure-Guided Discovery of a Tetrahydroindazole Series of Interleukin-2 Inducible T-Cell Kinase Inhibitors. J. Med. Chem. 2014, 57, 5714–5727.

(53)

Okaniwa, M.; Hirose, M.; Arita, T.; Yabuki, M.; Nakamura, A.; Takagi, T.; Kawamoto, T.; Uchiyama, N.; Sumita, A.; Tsutsumi, S.; Tottori, T.; Inui, Y.; Sang, B.-C.; Yano, J.; ACS Paragon Plus Environment

Page 31 of 42

Journal of Chemical Information and Modeling

Volkamer et al. – Identification of kinase-specific subpockets

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

31

Aertgeerts, K.; Yoshida, S.; Ishikawa, T. Discovery of a Selective Kinase Inhibitor (TAK632) Targeting Pan-RAF Inhibition: Design, Synthesis, and Biological Evaluation of C-7Substituted 1,3-Benzothiazole Derivatives. J. Med. Chem. 2013, 56, 6478–6494. (54)

Chemical Computing Group Inc. Molecular Operating Environment (MOE). Sci. Comput. Instrum. 2004, 22, 32.

(55)

DeLano, W. Pymol: An Open-Source Molecular Graphics Tool. CCP4 Newsl. Protein Crystallogr. 2002, 40, 44–53.

(56)

Volkamer, A.; Griewel, A.; Grombacher, T.; Rarey, M. Analyzing the Topology of Active Sites: On the Prediction of Pockets and Subpockets. J. Chem. Inf. Model. 2010, 50, 2041– 2052.

(57)

Hendlich, M.; Rippmann, F.; Barnickel, G. LIGSITE: Automatic and Efficient Detection of Potential Small Molecule-Binding Sites in Proteins. J. Mol. Graph. Model. 1997, 15, 359– 363.

(58)

Volkamer, A.; Rarey, M. Exploiting Structural Information for Drug-Target Assessment. Future Med. Chem. 2014, 6, 319–331.

(59)

Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; Vanderplas, J.; Passos, A.; Cournapeau, D.; Brucher, M.; Perrot, M.; Duchesnay, É. Scikit-Learn: Machine Learning in Python. … Mach. Learn. … 2012, 12, 2825–2830.

(60)

Brooijmans, N.; Chang, Y.-W.; Mobilio, D.; Denny, R. A.; Humblet, C. An Enriched Structural Kinase Database to Enable Kinome-Wide Structure-Based Analyses and Drug Discovery. Protein Sci. 2010, 19, 763–774.

(61)

Licciulli, S.; Maksimoska, J.; Zhou, C.; Troutman, S.; Kota, S.; Liu, Q.; Duron, S.; Campbell, D.; Chernoff, J.; Field, J.; Marmorstein, R.; Kissil, J. L. FRAX597, a Small Molecule Inhibitor of the p21-Activated Kinases, Inhibits Tumorigenesis of Neurofibromatosis Type 2 (NF2)-Associated Schwannomas. J. Biol. Chem. 2013, 288, 29105–29114.

(62)

Okaniwa, M.; Hirose, M.; Imada, T.; Ohashi, T.; Hayashi, Y.; Miyazaki, T.; Arita, T.; Yabuki, M.; Kakoi, K.; Kato, J.; Takagi, T.; Kawamoto, T.; Yao, S.; Sumita, A.; Tsutsumi, S.; Tottori, T.; Oki, H.; Sang, B. C.; Yano, J.; Aertgeerts, K.; Yoshida, S.; Ishikawa, T. Design and Synthesis of Novel DFG-Out RAF/vascular Endothelial Growth Factor Receptor 2 (VEGFR2) Inhibitors. 1. Exploration of [5,6]-Fused Bicyclic Scaffolds. J. Med. Chem. 2012, 55, 3452–3478.

(63)

Huggins, D. J.; Sherman, W.; Tidor, B. Rational Approaches to Improving Selectivity in Drug Design. J. Med. Chem. 2012, 55, 1424–1444.

(64)

Peti, W.; Page, R. Molecular Basis of MAP Kinase Regulation. Protein Sci. 2013, 22, 1698–1710.

(65)

Kumar, S.; Boehm, J.; Lee, J. C. p38 MAP Kinases: Key Signalling Molecules as Therapeutic Targets for Inflammatory Diseases. Nat. Rev. Drug Discov. 2003, 2, 717–726.

(66)

Wang, Z.; Canagarajah, B. J.; Boehm, J. C.; Kassisà, S.; Cobb, M. H.; Young, P. R.; AbdelMeguid, S.; Adams, J. L.; Goldsmith, E. J. Structural Basis of Inhibitor Selectivity in MAP Kinases. Structure 1998, 6, 1117–1128.

(67)

Fox, T.; Coll, J. T.; Xie, X.; Ford, P. J.; Germann, U. A.; Porter, M. D.; Pazhanisamy, S.; ACS Paragon Plus Environment

Journal of Chemical Information and Modeling Volkamer et al. – Identification of kinase-specific subpockets

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 32 of 42 32

Fleming, M. A.; Galullo, V.; Su, M. S.; Wilson, K. P. A Single Amino Acid Substitution Makes ERK2 Susceptible to Pyridinyl Imidazole Inhibitors of p38 MAP Kinase. Protein Sci. 1998, 7, 2249–2255. (68)

Lisnock, J.; Tebben, A.; Frantz, B.; O’Neill, E. A.; Croft, G.; O’Keefe, S. J.; Li, B.; Hacker, C.; de Laszlo, S.; Smith, A.; Libby, B.; Liverton, N.; Hermes, J.; LoGrasso, P. Molecular Basis for p38 Protein Kinase Inhibitor Specificity. Biochemistry 1998, 37, 16573–16581.

(69)

Ye, D. Z.; Field, J. PAK Signaling in Cancer. Cellular Logistics. 2012, pp 105–116.

(70)

Kumar, R.; Gururaj, A. E.; Barnes, C. J. p21-Activated Kinases in Cancer. Nat. Rev. Cancer 2006, 6, 459–471.

(71)

Ma, Q.-L.; Yang, F.; Frautschy, S. A.; Cole, G. M. PAK in Alzheimer Disease, Huntington Disease and X-Linked Mental Retardation. Cellular Logistics. 2012, pp 117–125.

(72)

Radu, M.; Semenova, G.; Kosoff, R.; Chernoff, J. PAK Signalling during the Development and Progression of Cancer. Nat. Rev. Cancer 2014, 14, 13–25.

(73)

Coleman, N.; Kissil, J. Recent Advances in the Development of p21-Activated Kinase Inhibitors. Cellular Logistics. 2012, pp 132–135.

(74)

Murray, B. W.; Guo, C.; Piraino, J.; Westwick, J. K.; Zhang, C.; Lamerdin, J.; Dagostino, E.; Knighton, D.; Loi, C.-M.; Zager, M.; Kraynov, E.; Popoff, I.; Christensen, J. G.; Martinez, R.; Kephart, S. E.; Marakovits, J.; Karlicek, S.; Bergqvist, S.; Smeal, T. SmallMolecule p21-Activated Kinase Inhibitor PF-3758309 Is a Potent Inhibitor of Oncogenic Signaling and Tumor Growth. Proc. Natl. Acad. Sci. U. S. A. 2010, 107, 9446–9451.

(75)

Guo, C.; McAlpine, I.; Zhang, J.; Knighton, D. D.; Kephart, S.; Johnson, M. C.; Li, H.; Bouzida, D.; Yang, A.; Dong, L.; Marakovits, J.; Tikhe, J.; Richardson, P.; Guo, L. C.; Kania, R.; Edwards, M. P.; Kraynov, E.; Christensen, J.; Piraino, J.; Lee, J.; Dagostino, E.; Del-Carmen, C.; Deng, Y.-L.; Smeal, T.; Murray, B. W. Discovery of Pyrroloaminopyrazoles as Novel PAK Inhibitors. J. Med. Chem. 2012, 55, 4728–4739.

(76)

Dolan, B. M.; Duron, S. G.; Campbell, D. A.; Vollrath, B.; Shankaranarayana Rao, B. S.; Ko, H.-Y.; Lin, G. G.; Govindarajan, A.; Choi, S.-Y.; Tonegawa, S. Rescue of Fragile X Syndrome Phenotypes in Fmr1 KO Mice by the Small-Molecule PAK Inhibitor FRAX486. Proc. Natl. Acad. Sci. U. S. A. 2013, 110, 5671–5676.

(77)

Felices, M.; Falk, M.; Kosaka, Y.; Berg, L. J. Tec Kinases in T Cell and Mast Cell Signaling. Advances in Immunology. 2007, pp 145–184.

(78)

Maurer, G.; Tarkowski, B.; Baccarini, M. Raf Kinases in Cancer-Roles and Therapeutic Opportunities. Oncogene 2011, 30, 3477–3488.

(79)

Bissantz, C.; Kuhn, B.; Stahl, M. A Medicinal Chemist’s Guide to Molecular Interactions. J. Med. Chem. 2010, 53, 5061–5084.

(80)

Kokh, D. B.; Richter, S.; Henrich, S.; Czodrowski, P.; Rippmann, F.; Wade, R. C. TRAPP: A Tool for Analysis of Transient Binding Pockets in Proteins. J. Chem. Inf. Model. 2013, 53, 1235–1252.

(81)

Schmidtke, P.; Bidon-Chanal, A.; Luque, F. J.; Barril, X. MDpocket: Open-Source Cavity Detection and Characterization on Molecular Dynamics Trajectories. Bioinformatics 2011, 27, 3276–3285.


Figure 1: Procedure to calculate combined and difference pockets. a) The starting structures are separated into a target set (green) and an off-target set (red), and b) a pocket is calculated for each structure. c) A combined pocket is derived for each set; it records the frequency of each grid point, as indicated by the blob sizes. d) The difference pocket is calculated, which encodes similarities (grey points) and differences between the two combined pockets (target-specific: green; off-target-specific: red). Blob size represents the frequency of the respective grid point.
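The procedure in Figure 1 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes that all structures have already been aligned so that pocket grid points share one coordinate frame, and it represents each pocket as a set of integer grid indices. The function names `combined_pocket` and `difference_pocket` are hypothetical.

```python
from collections import Counter


def combined_pocket(pockets):
    """Return {grid_point: fraction of structures whose pocket contains it}."""
    counts = Counter(point for pocket in pockets for point in set(pocket))
    n_structures = len(pockets)
    return {point: count / n_structures for point, count in counts.items()}


def difference_pocket(target_freq, off_target_freq):
    """Split grid points into target-only, off-target-only and common points."""
    target_only = {p: f for p, f in target_freq.items()
                   if p not in off_target_freq}
    off_target_only = {p: f for p, f in off_target_freq.items()
                       if p not in target_freq}
    # Common points keep both frequencies: (target, off-target).
    common = {p: (target_freq[p], off_target_freq[p])
              for p in target_freq.keys() & off_target_freq.keys()}
    return target_only, off_target_only, common


# Toy data (assumed): each pocket is a set of integer grid indices (x, y, z).
target_pockets = [{(0, 0, 0), (1, 0, 0), (2, 0, 0)}, {(0, 0, 0), (1, 0, 0)}]
off_target_pockets = [{(0, 0, 0), (0, 1, 0)}, {(0, 0, 0)}]

target_freq = combined_pocket(target_pockets)
off_target_freq = combined_pocket(off_target_pockets)
target_only, off_target_only, common = difference_pocket(
    target_freq, off_target_freq)
```

In this toy case the point `(0, 0, 0)` ends up in the common core, while `(1, 0, 0)` and `(2, 0, 0)` are target-only and `(0, 1, 0)` is off-target-only.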


Figure 2: Color coding of the frequency information of grid points. a) Combined pockets are color coded with a yellow-white-blue frequency spectrum (yellow: 0%; white: 50%; blue: 100%). b) The difference pocket can be dissected into target-specific (dark green) and off-target-specific (dark red) areas. Furthermore, the common core is colored in a red-white-green spectrum (red: more frequently present in off-target structures; white: equally frequent in the target and off-target sets; green: more frequently present in target structures).
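The two color spectra in Figure 2 can be illustrated with a small mapping function. The sketch below rests on assumptions that the caption does not state: colors are interpolated linearly in RGB space, and the common-core color is driven by the difference between the target and off-target frequencies. All names are hypothetical.

```python
YELLOW, WHITE, BLUE = (1.0, 1.0, 0.0), (1.0, 1.0, 1.0), (0.0, 0.0, 1.0)
RED, GREEN = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)


def lerp_color(c0, c1, t):
    """Linearly interpolate between two RGB triples for t in [0, 1]."""
    return tuple(a + (b - a) * t for a, b in zip(c0, c1))


def three_color(value, low, mid, high):
    """Map a value in [0, 1] onto low -> mid -> high, with mid at 0.5."""
    value = min(max(value, 0.0), 1.0)
    if value <= 0.5:
        return lerp_color(low, mid, value / 0.5)
    return lerp_color(mid, high, (value - 0.5) / 0.5)


def combined_pocket_color(frequency):
    """Yellow (0%) -> white (50%) -> blue (100%)."""
    return three_color(frequency, YELLOW, WHITE, BLUE)


def common_core_color(target_freq, off_target_freq):
    """Red (off-target dominates) -> white (equal) -> green (target dominates)."""
    # Assumption: rescale the frequency difference from [-1, 1] to [0, 1].
    scaled = (target_freq - off_target_freq + 1.0) / 2.0
    return three_color(scaled, RED, WHITE, GREEN)
```

For example, `combined_pocket_color(0.5)` gives white, and `common_core_color(1.0, 0.0)` gives pure green.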


Figure 3: The p38α structure (PDB code 1a9u) is shown in green and the Erk2 structure (PDB code 1pme) in red; both structures are co-crystallized with SB2. The structures were aligned using a) 1atp and b) 1a9u as the reference structure.


Figure 4: p38α structure bound to the ligand SB2 (PDB code 1a9u). The difference grids were calculated in a)-c) for the initial set of six p38α and six Erk2 structures and in d) for the ‘reduced set’ of nine structures in total. Shown are all grid points within 5 Å of SB2; blue circles point to target-specific areas and orange circles to off-target-specific areas. a) Common core with the frequency color coded in a red-white-green spectrum. b) Common core restricted to those points that appear 60% more often in the target pockets (green) or the off-target pockets (red). c-d) Exclusively p38α-specific (dark green) and Erk2-specific (dark red) subpockets, calculated in c) for the initial set and in d) for the reduced set.
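The restriction of the common core in panel b) can be sketched as a simple filter. The caption does not define the 60% cutoff formally; the example below assumes it means a difference of at least 0.6 between the target and off-target frequencies of a grid point. The function name `predominant_points` and the toy data are hypothetical.

```python
def predominant_points(common_core, min_difference=0.6):
    """Split common-core points by the set in which they predominate.

    common_core maps grid point -> (target frequency, off-target frequency).
    """
    target_side, off_target_side = {}, {}
    for point, (f_target, f_off) in common_core.items():
        difference = f_target - f_off
        if difference >= min_difference:
            target_side[point] = difference
        elif difference <= -min_difference:
            off_target_side[point] = difference
    return target_side, off_target_side


# Toy frequencies (assumed) for six target and six off-target structures.
common_core = {
    (0, 0, 0): (6 / 6, 6 / 6),  # equally frequent: dropped
    (1, 0, 0): (6 / 6, 1 / 6),  # predominantly target
    (0, 1, 0): (1 / 6, 5 / 6),  # predominantly off-target
    (0, 0, 1): (4 / 6, 2 / 6),  # difference below the cutoff: dropped
}
target_side, off_target_side = predominant_points(common_core)
```

Under a different reading of the cutoff (for example, a ratio of frequencies rather than a difference), only the comparison inside the loop would change.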


Figure 5: Influence of individual residues on the p38α-specific area in the hydrophobic backpocket (green points). p38α (PDB code 1wfc) is shown in green and Erk2 (PDB code 2erk) in red; residues of interest are rendered as sticks. Circles mark the area in which the points disappear after the respective Ala mutation. The calculations were done using a) apo structures, b) single mutant structures (K53 from p38α and K52 from Erk2 to Ala), c) double mutant structures (single mutant + T106 from p38α and Q103 from Erk2 to Ala), and d) triple mutant structures (double mutant + L104 from p38α and I101 from Erk2 to Ala).


Figure 6: The PAK1 structure is shown in green and the PAK4 structure in red. PDB codes: a) 4eqc and 4o0v, b) 4p90 and 4o0v. Only target-specific (dark green) and off-target-specific (dark red) grid points present in at least 50% of the respective structures are visualized. The results were obtained for structures a) without co-crystallized compounds that already explore the specific backpockets (i.e., 11 PAK1 vs. 20 PAK4 chains) and b) without the K299R mutation (i.e., 2 PAK1 vs. 4 PAK4 chains).


Figure 7: The ITK structure (PDB code 4pqn) is shown in green and the AurA structure (PDB code 4prj) in red. a-b) The combined pockets are shown in a yellow-white-blue color spectrum, calculated a) based on 37 ITK chains and b) based on 45 AurA chains. Only points with a frequency above 15% in the respective ensemble and within 5 Å of any ligand atom are visualized. c) Calculated difference pocket, showing points that are present in more than 25% of the target-specific (dark green) and off-target-specific (dark red) areas as well as points in the common core with a 60% predominance for one kinase (red-white-green). d) Target-specific clusters calculated for the points shown in c).
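The visualization filter used in panels a) and b) combines a frequency cutoff with a distance cutoff to the ligand. A minimal sketch follows, assuming grid points and ligand atoms are given as Cartesian coordinates in Å within the same aligned frame; the function name `filter_points` and the toy data are hypothetical.

```python
import math


def filter_points(frequencies, ligand_atoms, min_frequency=0.15,
                  max_distance=5.0):
    """Keep grid points above the frequency cutoff and near any ligand atom.

    frequencies maps (x, y, z) coordinates in Å to a frequency in [0, 1].
    """
    kept = {}
    for point, frequency in frequencies.items():
        if frequency <= min_frequency:
            continue  # frequency must be strictly above the cutoff
        if any(math.dist(point, atom) <= max_distance for atom in ligand_atoms):
            kept[point] = frequency
    return kept


# Toy data (assumed): one ligand atom at the origin.
ligand_atoms = [(0.0, 0.0, 0.0)]
frequencies = {
    (0.0, 0.0, 0.0): 0.9,   # frequent and at the ligand: kept
    (3.0, 0.0, 0.0): 0.2,   # above the cutoff and 3 Å away: kept
    (10.0, 0.0, 0.0): 0.8,  # frequent but 10 Å away: dropped
    (1.0, 0.0, 0.0): 0.1,   # close but too infrequent: dropped
}
visible = filter_points(frequencies, ligand_atoms)
```

The same function covers the other thresholds quoted in the captions (for example 25% or 50%) by changing `min_frequency`.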


Figure 8: The BRAF structure (PDB code 4ksp) is shown in green and the VEGFR2 structure (PDB code 3vnt) in red, together with the respective target-specific (green) and off-target-specific (red) clusters. The clusters were calculated based on points that are present in more than 25% of the target-specific or off-target-specific points as well as points in the common core with a 60% predominance for one kinase.
