
Article

Discovery of Subtype Selective Janus Kinase (JAK) Inhibitors by Structure-Based Virtual Screening
Dávid Bajusz, György G. Ferenczy, and György M. Keserű
J. Chem. Inf. Model., Just Accepted Manuscript • DOI: 10.1021/acs.jcim.5b00634 • Publication Date (Web): 18 Dec 2015


Journal of Chemical Information and Modeling

Discovery of Subtype Selective Janus Kinase (JAK) Inhibitors by Structure-Based Virtual Screening

Dávid Bajusz, György G. Ferenczy, György M. Keserű*

Medicinal Chemistry Research Group, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Magyar tudósok krt. 2., Budapest 1117, Hungary

ABSTRACT

Janus kinase inhibitors represent a promising opportunity for pharmaceutical intervention in various inflammatory and oncological indications. Subtype selective inhibition of these enzymes, however, is still a very challenging goal. In this study, a novel, customized virtual screening protocol was developed with the intention of providing an efficient tool for the discovery of subtype selective JAK2 inhibitors. The screening protocol involves protein ensemble-based docking calculations combined with an Interaction Fingerprint (IFP) based scoring scheme for estimating ligand affinities and selectivities, respectively. The methodology was validated in retrospective studies and was applied prospectively to screen a large database of commercially available compounds. Six compounds were identified and confirmed in vitro, with an indazole-based hit exhibiting promising selectivity for JAK2 vs. JAK1. Having demonstrated that the described methodology is capable of identifying subtype selective chemical starting points with a favorable hit rate (11%), we believe that the presented screening concept might be useful for other kinase targets with challenging selectivity profiles.

1. INTRODUCTION

Janus kinases (JAKs) are a family of non-receptor tyrosine kinases associated with cytokine receptors.1,2 The JAK family consists of four enzymes: JAK1, JAK2, JAK3 and TYK2.3 Their main function is the regulation of the JAK/STAT intracellular signaling pathway, which affects gene transcription.4,5 The JAK2/STAT pathway is involved in a number of physiological processes, primarily inflammation and immune responses. Malfunctions of this pathway are linked to the development of myeloproliferative neoplasms (MPNs) such as primary myelofibrosis (PMF), essential thrombocythemia (ET) and polycythemia vera (PV).6 JAK pathway activation can also result in impaired immune modulation and inflammation, which can contribute to the development of inflammatory diseases such as rheumatoid arthritis (RA), psoriasis, lupus and inflammatory bowel disease (IBD).7 JAK1 was also shown to contribute to tumour metastasis through the generation of actomyosin contractility.8 Small-molecule inhibition of Janus kinases therefore offers novel treatment options for indications with high unmet medical need.

Currently, there are two marketed JAK inhibitors. The first is the JAK1/JAK2 inhibitor ruxolitinib, approved in late 2011 for the treatment of myelofibrosis.9 A year later, the pan-JAK inhibitor tofacitinib was approved for the treatment of rheumatoid arthritis.10,11 (See Figure 1.) Opportunities for their application in cancer are also being explored, with a recent success in the treatment of pancreatic cancer with a combination therapy involving ruxolitinib and capecitabine.12

Figure 1


A key aspect that makes JAK inhibitor design a thriving and at the same time challenging area is subtype selectivity. While widening the space of explored JAK inhibitors is valuable in itself for providing new drug candidates, the ultimate goal of this area is the selective inhibition of the JAK isoforms. Besides the obvious reason that more selective ligands tend to produce fewer undesired side effects, intra-family and inter-kinome selective JAK inhibitors could also serve as probe compounds, contributing to a better understanding of the functions and mechanisms of Janus kinases. However, designing such compounds is difficult due to the high sequence homology and structural similarity of the ATP binding sites of the JAK isoforms (and likewise, of all kinases).13 Nevertheless, some progress has been made in recent years.3,14 Intra-family selectivities on the scale of 10- to 100-fold have been reached for several compounds. One of the most challenging tasks, however, is achieving JAK2 vs. JAK1 selectivity, with so far only a few moderately selective chemotypes and only three published compounds exceeding 100-fold selectivity.15–17 This likely stems from the high similarity of the ATP sites, which differ in only a few residues: of those, only Gln853 in JAK2 (with Arg879 being its counterpart in JAK1) and Asp939 (with Glu966 as its counterpart) have the right orientations toward the binding site to enable direct targeting with a small molecule.3,14

In the present work, a novel structure-based screening protocol for selective JAK2 inhibitors is presented, along with the first prospective results of its application on a large compound supplier database. We postulated that ligand docking could be complemented with additional binding site information derived from interaction fingerprints (IFPs) to enrich JAK2-selective compounds in a virtual screening campaign. Analyzing interaction fingerprints of known inhibitors, we identified interactions (residues) important for achieving intra-family selectivity. A scoring scheme based on the results of this analysis was developed and implemented into our screening protocol. The hits resulting from virtual screening can form the basis of medicinal chemistry optimizations, during which their potency and selectivity can be further improved. With this work, we also carry on and complement our previous research in the area of JAK2 inhibitor design.18–20

2. MATERIALS AND METHODS

2.1 Databases

Structures and activity data of known JAK inhibitors were retrieved from the ChEMBL Kinase SARfari database (version 17),21 the Thomson Reuters Integrity database22 and an extensive body of primary literature16,23–61 to create our Combined Training Set. Literature data were entered manually, the compounds from different sources were merged, counterions were deleted and duplicates were removed from the dataset. All dataset operations were carried out with Instant JChem62 and KNIME.63 A total of 1601 molecules were collected with bioactivity data on at least one JAK isozyme (including JAK3 and TYK2), out of which 342 compounds had been tested against both JAK1 and JAK2. Several training sets were compiled from these molecules to train our docking methods to distinguish either active or selective compounds, as described in the respective sections below. For prospective screening studies, the Mcule Purchasable Compounds Database (MPCD) was used.64

2.2 Property-based pre-screening

In order to reduce the computational cost of the more robust structure-based protocol, we applied a stepwise pre-screening to the initial MPCD dataset (approx. 5.3M molecules). First, molecules with reactive groups (as defined in the Schrödinger Suite65) were removed. Then, a lead-like filter was applied to the remaining 4.3M molecules, resulting in a total of 1.2M lead-like compounds, as defined by Teague et al. (250 ≤ MW ≤ 350, logP ≤ 3.5, rotB ≤ 7).66 This set was further focused to “kinase-like” compounds using a scoring scheme based on tailor-made desirability functions.67–69 Variables (properties) were selected by a statistical comparison of property distributions for known kinase actives and random molecules. Significant differences in the distributions could be observed for six properties, which were included in the scoring scheme: topological polar surface area (TPSA) and the number of oxygen atoms (NO), nitrogen atoms (NN), rotatable bonds (rotB), aromatic rings (NArom) and hydrogen bond donors (HBD). A score value between 0 and 1 was assigned to each molecule for each of these properties, and these scores were summed to give the overall desirability score (KiDS, Kinase Desirability Score). A KiDS cutoff of 4 was used in our prospective studies to select compounds for structure-based screening for two reasons: the number of compounds submitted to the more time-demanding structure-based calculation was decreased by an order of magnitude, while the ratio of kinase actives was increased roughly 4-fold, as shown in our recent retrospective study (in the sense of ROC and “conventional” enrichment as well).70
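The two pre-screening steps can be illustrated with a minimal sketch. The lead-like limits are those quoted above from Teague et al.; the trapezoidal desirability functions and their breakpoints are purely hypothetical placeholders (the tailor-made functions are not reproduced in this section), and the names `Props`, `trapezoid` and `kids` are illustrative. Treating the cutoff as "KiDS of at least 4" is likewise an assumption about its direction.

```python
from dataclasses import dataclass


@dataclass
class Props:
    mw: float      # molecular weight
    logp: float    # calculated logP
    rotb: int      # rotatable bonds
    tpsa: float    # topological polar surface area
    n_o: int       # oxygen atoms
    n_n: int       # nitrogen atoms
    n_arom: int    # aromatic rings
    hbd: int       # hydrogen bond donors


def is_lead_like(p: Props) -> bool:
    """Lead-like filter with the limits quoted in the text."""
    return 250 <= p.mw <= 350 and p.logp <= 3.5 and p.rotb <= 7


def trapezoid(x, lo0, lo1, hi1, hi0):
    """Hypothetical desirability: 1 on [lo1, hi1], 0 outside (lo0, hi0), linear in between."""
    if lo1 <= x <= hi1:
        return 1.0
    if x <= lo0 or x >= hi0:
        return 0.0
    if x < lo1:
        return (x - lo0) / (lo1 - lo0)
    return (hi0 - x) / (hi0 - hi1)


# Placeholder breakpoints (NOT taken from the paper): attribute -> (lo0, lo1, hi1, hi0)
DESIRABILITY = {
    "tpsa": (40, 70, 110, 140),
    "n_o": (-1, 0, 2, 5),
    "n_n": (1, 3, 6, 9),
    "rotb": (-1, 1, 5, 8),
    "n_arom": (0, 2, 3, 5),
    "hbd": (-1, 1, 3, 5),
}


def kids(p: Props) -> float:
    """Sum of six per-property desirabilities, so the score lies between 0 and 6."""
    return sum(trapezoid(getattr(p, name), *rng) for name, rng in DESIRABILITY.items())


def passes_prescreen(p: Props, cutoff: float = 4.0) -> bool:
    """Assumes compounds at or above the KiDS cutoff are kept."""
    return is_lead_like(p) and kids(p) >= cutoff
```

With real data, the six property values would come from a cheminformatics toolkit and the breakpoints would be fitted to the property distributions of known kinase actives.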

2.3 Structure-based screening

Docking templates were acquired from the Protein Data Bank, which holds more than fifty Janus kinase X-ray structures. Due to their large number, the structures were superimposed with a built-in Schrödinger65 script and visually inspected in order to avoid using redundant structures for ligand docking. Upon visual inspection of the superimposed binding sites, three clusters were identified for 38 JAK2 crystal structures. One representative structure with the most active co-crystallized ligand was selected from each cluster (PDB accession codes: 3TJD (ref 16), 4GMY (ref 71) and 3E62 (ref 61)) for further examination.


The selected structures were used to develop a docking protocol by retrospective enrichment studies. Known actives from our Combined Training Set were mixed into a CDK2 decoy set (2074 molecules) obtained from the Directory of Useful Decoys.72 (Since no decoy set had been published for Janus kinases at the time of our research, the CDK2 decoy set was selected for this purpose, because CDK2 has the highest level of sequence homology with Janus kinases among the targets with published decoy sets.) JAK2 actives were defined as compounds having IC50 values below 1 µM. Several datasets of similar composition were created: a training set and two test sets with 69, 66 and 67 JAK2 actives, respectively. With the number of actives per dataset between 60 and 70 (versus the 2074 decoys), the ratio of actives to decoys was approximately 1:30 in each dataset. The datasets were docked into each of the selected protein structures using the Schrödinger Virtual Screening Workflow and sorted by Glide Docking Score,73,74 and early enrichment factors and Receiver Operating Characteristic (ROC) curves were calculated. Enrichment factors were defined as the y/x ratios for given x values (false positive rates) on the ROC curve, as suggested by Jain and Nicholls,75 to provide a size-independent measure of early enrichment. In addition, we report conventional enrichment factors in the Supporting Information, defined as:

$$\mathrm{EF}_{x\%} = \frac{N_{\mathrm{act},\,x\%} / N_{x\%}}{N_{\mathrm{act}} / N} \qquad (1)$$

Here, $N_{\mathrm{act},\,x\%}$ and $N_{x\%}$ are the number of actives and the total number of compounds, respectively, in the top x% of the ranked list, while $N_{\mathrm{act}}$ and $N$ are the number of actives and the total number of compounds in the whole dataset, respectively.
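Both enrichment measures can be computed from a ranked list of active/decoy labels. The following is a minimal sketch, assuming the list is already sorted from best to worst docking score; the function names and the rounding of the top-x% cutoff are illustrative choices, not taken from the paper.

```python
def conventional_ef(ranked_is_active, top_fraction):
    """Equation (1): active rate in the top fraction divided by the overall active rate."""
    n = len(ranked_is_active)
    n_top = max(1, round(n * top_fraction))      # size of the top x% (rounding is an assumption)
    n_act = sum(ranked_is_active)
    n_act_top = sum(ranked_is_active[:n_top])
    return (n_act_top / n_top) / (n_act / n)


def roc_ef(ranked_is_active, fpr):
    """ROC-based enrichment: true positive rate divided by the false positive rate `fpr`.

    Counts the actives ranked ahead of the point at which the allowed
    fraction of decoys has been exceeded.
    """
    n_act = sum(ranked_is_active)
    n_dec = len(ranked_is_active) - n_act
    max_decoys = int(fpr * n_dec + 1e-9)         # decoys tolerated at this FPR
    tp = fp = 0
    for is_active in ranked_is_active:
        if is_active:
            tp += 1
        elif fp == max_decoys:
            break
        else:
            fp += 1
    return (tp / n_act) / fpr
```

Because the ROC-based value depends only on rates, it is unaffected by the ratio of actives to decoys, whereas the conventional value is capped at N / N_act.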

2.4 Ligand docking

The most effective protocol in single-structure docking runs involved the following steps:

1. Compounds are prepared with LigPrep. Duplicates are removed, protomers are generated at a target pH of 7.4 using Epik,76,77 high-energy tautomer states are removed, then up to 8 stereoisomers are generated and ring conformations are sampled. (For the retrospective tests, reactive compounds were filtered out before ligand preparation; in the prospective study, this step was carried out as a part of the pre-screening.)
2. Ligand preparation is followed by a Glide HTVS docking.
3. The top 20% of best scoring poses are then submitted to a Glide Standard Precision (SP) docking. In this step, the establishment of a hydrogen bond with at least one of the anchor groups of the hinge region of the kinase is required (Figure 2B). Hydroxyl groups near the binding site are allowed to rotate (Ser963 for JAK1, and Tyr931 and Ser936 for JAK2).

The best scoring pose is selected for each ligand as the final output of the docking procedure. The same procedure was applied during ensemble docking. In this case, the best (i.e. lowest) scoring binding pose across the individual protein structures was selected as the final pose (this is the default approach implemented in Schrödinger).
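The selection logic of the funnel (keep the best 20% after HTVS, then take the lowest score across ensemble members) can be sketched in a few lines. This is an illustration only: it assumes docking scores are already available as plain dictionaries, works per ligand rather than per pose, and does not call any Schrödinger API; the function names are hypothetical.

```python
def top_fraction(scores, fraction=0.2):
    """Keep the best-scoring fraction of ligands (lower docking score = better)."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1])
    n_keep = max(1, int(len(ranked) * fraction))
    return dict(ranked[:n_keep])


def best_pose_across_ensemble(per_structure):
    """Pick, for each ligand, the lowest score over all protein structures.

    per_structure: {structure_id: {ligand_id: docking_score}}
    returns:       {ligand_id: (structure_id, docking_score)}
    """
    best = {}
    for structure_id, scores in per_structure.items():
        for ligand_id, score in scores.items():
            if ligand_id not in best or score < best[ligand_id][1]:
                best[ligand_id] = (structure_id, score)
    return best
```

A ligand that fails to dock into some ensemble members is simply absent from those dictionaries and is still ranked by its best available score.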

2.5 MD simulations

MD simulations were carried out with Desmond.78–80 The original ligand and water molecules from the X-ray structures were kept, and the systems were solvated in TIP3P water81 in a cubic box and neutralized by adding four and six sodium ions for 4IVC and 3TJD, respectively. A total of 29 sodium and 29 chloride ions were added to each system to adjust the salt concentration to 0.15 M. Equilibration was carried out with the default protocol of Desmond, which comprises the following steps:

1. Minimization with the solute restrained
2. Minimization without restraints
3. 12 ps of simulation in the NVT ensemble at 10 K using a Berendsen thermostat82 with non-hydrogen atoms of the solute restrained
4. 12 ps of simulation in the NPT ensemble at 10 K and 1 atm using a Berendsen thermostat and a Berendsen barostat82 with non-hydrogen atoms of the solute restrained
5. 24 ps of simulation in the NPT ensemble at 300 K and 1 atm using a Berendsen thermostat and a Berendsen barostat with non-hydrogen atoms of the solute restrained
6. 24 ps of simulation in the NPT ensemble at 300 K and 1 atm using a Berendsen thermostat and a Berendsen barostat without restraints

Equilibration was followed by a 20-ns-long production run in the NPT ensemble at 300 K using a Nosé-Hoover thermostat83,84 and a Martyna-Tobias-Klein barostat.85 The cutoff for short-range electrostatic interactions was 9.0 Å and a Smooth PME method86 was applied for the treatment of long-range interactions. A 2/2/6 fs multistepping scheme was applied with the RESPA integrator for time stepping.
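For bookkeeping, the equilibration stages listed above can be written down as plain data. The sketch below is not a Desmond input file; the `Stage` class and its field names are illustrative labels chosen here, and only the values come from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Stage:
    kind: str                       # "minimize" or "md"
    ensemble: Optional[str]         # None for minimizations
    temperature_k: Optional[float]
    duration_ps: float              # 0 for minimizations
    restraints: str                 # which atoms are restrained


EQUILIBRATION = [
    Stage("minimize", None, None, 0.0, "solute"),
    Stage("minimize", None, None, 0.0, "none"),
    Stage("md", "NVT", 10.0, 12.0, "solute heavy atoms"),
    Stage("md", "NPT", 10.0, 12.0, "solute heavy atoms"),
    Stage("md", "NPT", 300.0, 24.0, "solute heavy atoms"),
    Stage("md", "NPT", 300.0, 24.0, "none"),
]

# 20 ns production run in the NPT ensemble at 300 K, without restraints
PRODUCTION = Stage("md", "NPT", 300.0, 20_000.0, "none")


def total_md_time_ps(stages):
    """Sum the simulated time over the dynamics stages only."""
    return sum(s.duration_ps for s in stages if s.kind == "md")
```

Summing the dynamics stages gives 72 ps of equilibration ahead of the 20 ns production run.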

2.6 JAK inhibition measurements

Hit compounds were tested for JAK inhibition in a Z'-LYTE kinase inhibition assay (Life Technologies).87 The Z'-LYTE biochemical assay employs a fluorescence-based, coupled-enzyme format and is based on the differential sensitivity of phosphorylated and non-phosphorylated peptides to proteolytic cleavage. The peptide substrate (Z'-LYTE Kinase Assay Kit - Tyrosine 06 Peptide for JAK1 and JAK2) is labeled with two fluorophores (coumarin and fluorescein, one at each end) that make up a FRET pair. After incubating the kinase + peptide + test compound mixture for an hour, a development reaction is carried out, during which any peptide that was not phosphorylated by the kinase is cleaved, disrupting the resonance energy transfer between the FRET pair. Based on the ratio of the detected emission at 445 nm (coumarin) and 520 nm (fluorescein), the ratio of cleaved vs. intact peptide (and thus, the reaction progress) can be quantified.


The compounds were screened at 20 µM concentration in 1% DMSO (final). Each data point was acquired twice and the results were averaged. For the IC50 measurements, 10-point curves were obtained with 3-fold serial dilutions of the compounds, starting from 100 µM in 1% DMSO. (Duplicate data points were acquired for each concentration.) For compounds B03, B04 and B06, the top concentration in IC50 determinations was 20 µM due to solubility issues. The following assay protocols were applied:

JAK1: 5 µL of JAK1 / Substrate mixture is prepared and added to 2.5 µL of Test Compound solution, to which 2.5 µL of ATP solution is added. The final 10 µL Kinase Reaction consists of 21.2–91.5 ng JAK1 and 2 µM Substrate in 50 mM HEPES pH 7.0, 0.01% BRIJ-35, 10 mM MgCl2, 1 mM EGTA, 0.01% NaN3. After the 1 hour Kinase Reaction incubation, 5 µL of a 1:128 dilution of Development Reagent A is added. After another hour of incubation, the fluorescence is read out and the data are analyzed.

JAK2: 5 µL of JAK2 / Substrate mixture is prepared and added to 2.5 µL of Test Compound solution, to which 2.5 µL of ATP solution is added. The final 10 µL Kinase Reaction consists of 0.05–0.42 ng JAK2 and 2 µM Substrate in 50 mM HEPES pH 7.5, 0.01% BRIJ-35, 10 mM MgCl2, 1 mM EGTA. After the 1 hour Kinase Reaction incubation, 5 µL of a 1:128 dilution of Development Reagent A is added. After another hour of incubation, the fluorescence is read out and the data are analyzed.
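As a small worked example, the concentrations of a 10-point, 3-fold dilution series follow directly from the top concentration. The helper below is an illustrative sketch; the function name is hypothetical.

```python
def dilution_series(top_um, fold=3.0, points=10):
    """Concentrations (in µM) of a serial dilution, highest first."""
    return [top_um / fold ** i for i in range(points)]


standard = dilution_series(100.0)   # series starting from 100 µM
limited = dilution_series(20.0)     # reduced top concentration used for B03, B04 and B06
```

Starting from 100 µM, the tenth point is 100 / 3^9 µM, i.e. roughly 5 nM; starting from 20 µM, it is roughly 1 nM.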

3. RESULTS AND DISCUSSION

3.1 Retrospective screening and evaluation

To identify the most suitable screening protocol for our hit discovery effort, we carried out a large-scale retrospective screening on known JAK inhibitors and decoy molecules (see Section 2.3). After iterating over a number of settings (HTVS/SP/XP precision, percentage of top-scored compounds to keep, use of H-bond constraints, etc.), we selected the procedure that provided the highest early enrichments for further retrospective and prospective docking studies (see Section 2.4 and Figure 2C for details). While we initially investigated the possible benefits of a Glide XP docking step, we later omitted it, as it did not improve the results significantly but noticeably increased the computational time.

In single-structure docking, the developed docking protocol exhibited moderate to good early enrichments on the training set (see Supplementary Tables S1 and S2). However, its robustness was unsatisfactory, since these results could not be reproduced for the test sets in most cases. Moreover, in all but two cases, less than half of the actives were successfully recovered. (As only the top 20% of the molecules in the HTVS docking were submitted to SP docking, many actives were “lost”.)

To enhance the efficiency of our docking protocol, two options were investigated. First, the presence and possible importance of structurally conserved waters were investigated. Upon inspection of the available JAK X-ray structures, one structurally conserved water molecule could be identified for JAK2, located between the ligand, the DFG activation segment and the N-terminal lobe. When the single-structure docking runs were repeated for the representative JAK2 structures with this water molecule kept as part of the binding site, results deteriorated for two of the structures, but a noticeable improvement was observed for 3E62. The water molecule was therefore kept in our further studies as a part of this protein structure. The second option was changing our protocol from single-structure to ensemble docking, which provides a way to account for the flexibility of the protein.
Since we had concluded earlier that the X-ray structures have a high degree of similarity at the binding site and can be clustered into three groups, we decided to consider more protein conformers by means of molecular dynamics simulations. It was shown in a recent study by Tarcsay et al. that protein conformer sampling from MD simulations can enhance the performance of virtual screening.88 Their results prompted us to initiate an all-atom MD simulation on the structure that performed best on average in single-structure docking (3TJD, ref 16). Twenty-one frames from the production run were selected periodically (one for every nanosecond, including the starting conformation) for further examination and subjected to a short restrained minimization, as implemented in the Schrödinger Protein Preparation Wizard.89,90 To assess the applicability of the acquired protein structures, single-structure docking runs were performed on each of them (according to the protocol described above) for the training set. Structures performing better than arbitrarily defined cutoff values of early enrichment (at least 11-fold enrichment in the top 1% and 6-fold enrichment in the top 2%) were selected for further docking studies.

Ensemble docking was carried out on the representative X-ray structures and on those taken from the MD simulations (a total of 16 structures) using the training set. This approach yielded excellent results in terms of the ratio of recovered actives, and a slight improvement was also observed for the early enrichment factors (reported in Table 1 and Figure 2C). To keep the computational cost reasonable for prospective screening, we reduced the number of ensemble structures to five in a stepwise manner. In each step, the structure which least affected the overall results was omitted. The final ensemble included the representative crystal structure 3E62 (with the structural water molecule) and frames 4, 9, 18 and 20 of the MD simulation. The final docking procedure was validated on the test sets. To our pleasant surprise, the final ensemble even showed some further improvements in terms of early enrichments (see Table 1). The robustness of the method was thus greatly improved by the inclusion of further protein conformers.

Table 1
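The stepwise reduction of the ensemble described above can be read as a greedy backward elimination. The sketch below assumes a user-supplied `evaluate` function that returns a single figure of merit for a list of structures (for example an early enrichment factor from ensemble docking on the training set); both that function and the interpretation of "least affected" as "removal leaves the highest figure of merit" are assumptions made for illustration.

```python
def reduce_ensemble(structures, evaluate, target_size):
    """Greedily drop one structure per step until `target_size` remain.

    structures:  list of structure identifiers
    evaluate:    callable taking a list of identifiers, returning a score
                 (higher = better); hypothetical, supplied by the user
    """
    current = list(structures)
    while len(current) > target_size:
        # Try omitting each structure in turn and keep the best remaining subset
        candidates = ([s for s in current if s != omit] for omit in current)
        current = max(candidates, key=evaluate)
    return current
```

Each step requires one evaluation per remaining structure, so going from 16 to 5 structures costs 16 + 15 + ... + 6 = 121 evaluations; these can reuse the per-structure docking scores, since only the best-pose selection changes.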


3.2 Estimation of selectivity

Since an important goal of our work is to achieve acceptable levels of intra-family selectivity, we devised a scoring system based on interaction fingerprints91 to estimate the selectivity of our docking hits. Interaction fingerprints are typically used to enhance affinity predictions by direct comparison to known high-affinity ligands.92 In our case, however, such a comparison would be insufficient, since selectivity can be achieved by a number of protein-ligand interactions, not all of which are necessarily established by a single ligand. To get around this problem, we identified those interactions which could drive JAK2 vs. JAK1 selectivity and established a scoring scheme for the evaluation of interaction patterns found in docking calculations.

Selection of these key interactions was carried out by docking JAK2-selective and reference ligands into the protein structures used for ensemble docking, and comparing their interaction fingerprints with those of their docked poses in JAK1. To that end, JAK1 X-ray structures 4IVC (ref 93) and 3EYG (ref 94) were used in a similar ensemble docking approach (see the previous section), augmented by three MD frames from a 20-ns-long MD simulation of the X-ray structure 4IVC. (For the selection of the final JAK1 ensemble from the X-ray structures and the MD simulation, we applied the same method as the one detailed in the previous section: ensemble docking was carried out repeatedly on a training set of 62 JAK1 actives mixed into the same decoy set from DUD. In each round, the structure that least affected the outcome was omitted, until we arrived at the final set of five structures. The results were validated on a test set containing 59 other JAK1 actives.)

To identify and assemble those interactions which could be important for JAK2 vs. JAK1 selectivity, we applied the procedure detailed below. Two sets of compounds were compiled based on inhibitory activity: JAK2-selectives (31) and reference (94) compounds.
Selective compounds were defined as having at least 10-fold selectivity for JAK2 over JAK1, while references were defined as having at most 5-fold JAK2/JAK1 selectivity (including JAK1 selectives where this ratio is below 1). Both sets were docked into all five protein structures for each isozyme (Glide SP docking), then interaction fingerprints (IFPs) were generated for each docked complex. To compare the interaction fingerprints, some preparatory steps were necessary:

1. A common subset of residues was compiled. (Since the same set of residues is not necessarily present in each X-ray structure, insertions/deletions were omitted. No binding site residues were affected by this.)
2. In addition, each residue in JAK1 had to be assigned to a corresponding residue in JAK2 in a mutually exclusive manner. To that end, the sequences of the two isozymes were aligned.
3. To simplify the comparison of the IFPs of the same compound on the different isozymes, residues in the JAK2 IFPs were renumbered to reflect the residue numbering of JAK1 (based on the sequence alignment).

After identifying a common subset of residues, the following steps were taken to identify the key interactions:

1. For each compound, and for each interaction, a δ value was calculated:

$$\delta = N(\mathrm{JAK2}) - N(\mathrm{JAK1})$$

where N(JAK1) and N(JAK2) are the number of JAK1 and JAK2 complexes of the ligand (respectively) in which the interaction is present. Hence, N(JAK1) and N(JAK2) can take values between 0 and 5, while δ can take values between -5 and 5. The greater δ is, the more specific the given interaction is towards JAK2 (for the given compound).
2. For each interaction, the δ values were averaged for the two datasets separately (JAK2-selective and reference compounds).
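The δ calculation in the two steps above can be expressed compactly if each docked complex is represented as a set of interaction labels in a common (JAK1-based) numbering. The sketch below assumes that representation; the label strings and function names are illustrative, and the fingerprints themselves would come from the docking software.

```python
def delta(ifps_jak2, ifps_jak1):
    """Per-interaction delta = N(JAK2) - N(JAK1) for one compound.

    ifps_jak2, ifps_jak1: lists of sets of interaction labels,
    one set per docked complex (up to five per isozyme).
    """
    interactions = set().union(*ifps_jak2, *ifps_jak1)
    return {
        i: sum(i in f for f in ifps_jak2) - sum(i in f for f in ifps_jak1)
        for i in interactions
    }


def mean_delta(per_compound_deltas, interactions):
    """Average delta per interaction over a compound set (absent interaction counts as 0)."""
    n = len(per_compound_deltas)
    return {i: sum(d.get(i, 0) for d in per_compound_deltas) / n for i in interactions}
```

Running `mean_delta` once on the JAK2-selective set and once on the reference set yields the two averages that are compared in the selection criteria.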


For an interaction to be classified as important for JAK2 selectivity, it had to meet two criteria. The interaction needs to be specific for JAK2 among JAK2-selective compounds (δavg(JAK2-selectives) > 0), and it needs to be more prevalent in complexes of JAK2-selective ligands than in complexes of reference ligands: δavg(JAK2-selectives) > δavg(references). Based on these criteria, 26 interactions were selected and included in the IFP scoring process. The IFP score was defined as the fraction of these interactions that are established in the protein-ligand complex, averaged over the up to five structures in which the ligand was successfully docked. To validate the scoring scheme, we calculated IFP selectivity scores for the mentioned training set and plotted them against Glide Docking Scores, which estimate the binding affinity (Figure 2D). Cutoff values were determined for both the affinity and selectivity scores to define the region with the best enrichment of JAK2-selective compounds. The cutoffs were used later for the identification of potentially selective compounds in our prospective screening campaign. The main results of the retrospective study are summarized in Figure 2.

Figure 2
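The two selection criteria and the IFP score translate into a short sketch. It assumes averaged δ values are available as dictionaries keyed by interaction label and that each docked complex is a set of such labels; all function names are illustrative. The default cutoffs are the ones reported for the prospective screen (Docking Score ≤ -8 and IFP score ≥ 0.33).

```python
def select_key_interactions(avg_selective, avg_reference):
    """Keep interactions with delta_avg(selectives) > 0 and > delta_avg(references)."""
    return {
        i
        for i, d_sel in avg_selective.items()
        if d_sel > 0 and d_sel > avg_reference.get(i, 0.0)
    }


def ifp_score(complex_ifps, key_interactions):
    """Fraction of key interactions present, averaged over the docked complexes.

    complex_ifps: list of sets of interaction labels, one per structure in which
    the ligand was successfully docked (between one and five sets).
    """
    fractions = [len(f & key_interactions) / len(key_interactions) for f in complex_ifps]
    return sum(fractions) / len(fractions)


def is_virtual_hit(docking_score, score, ds_cutoff=-8.0, ifp_cutoff=0.33):
    """Apply the affinity and selectivity cutoffs together."""
    return docking_score <= ds_cutoff and score >= ifp_cutoff
```

Because the score is a fraction of a fixed list of key interactions, values for different ligands are directly comparable regardless of how many ensemble members each ligand docked into.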

3.3 Prospective in silico and in vitro screening

For prospective screening purposes, we used the MPCD collection that contains ~5.1M compounds in total, out of which there are ~1.2M leadlike molecules. The leadlike subset was focused to kinase-like molecules based on the property-based pre-screening step70 detailed earlier. The resulting ~105k kinase-like compounds were submitted to the ensemble docking-based screening protocol. Based on the activity and selectivity criteria set for selective ligands (Docking Score ≤ -8 and IFP score ≥ 0.33, respectively), 429 virtual hits were retrieved, from which a diverse selection of 130 compounds was extracted (