Accelerating Lead Identification by High Throughput Virtual Screening


Computational Chemistry

Accelerating Lead Identification by High Throughput Virtual Screening: Prospective Case Studies from the Pharmaceutical Industry Kelly L. Damm-Ganamet, Nidhi Arora, Stephane Becart, James Patrick Edwards, Alec D. Lebsack, Heather McAllister, Marina I Nelen, Navin Rao, Lori Westover, John J. M. Wiener, and Taraneh Mirzadegan J. Chem. Inf. Model., Just Accepted Manuscript • DOI: 10.1021/acs.jcim.8b00941 • Publication Date (Web): 28 Feb 2019 Downloaded from http://pubs.acs.org on March 1, 2019

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


Journal of Chemical Information and Modeling

Accelerating Lead Identification by High Throughput Virtual Screening: Prospective Case Studies from the Pharmaceutical Industry

Kelly L. Damm-Ganamet§, Nidhi Arora§, Stephane Becart, James P. Edwards, Alec D. Lebsack, Heather M. McAllister, Marina I. Nelen, Navin L. Rao, Lori Westover, John J. M. Wiener, Taraneh Mirzadegan

Discovery Sciences and Immunology, Janssen Research and Development, 3210 Merryfield Row, San Diego, CA 92121, USA.
Discovery Sciences, Janssen Research and Development, Welsh and McKean Roads, Spring House, PA 19477, USA.

Keywords: High Throughput Virtual Screen, HTVS, In-Silico Screen, Structure-based Drug Design, Bruton Tyrosine Kinase, BTK, RAR-Related Orphan Receptor γ t, RORγt, Major Histocompatibility Complex, MHC, Human Leukocyte Antigen DR isotype, HLA-DR

ACS Paragon Plus Environment


ABSTRACT

At the onset of a drug discovery program, the goal is to identify novel compounds with appropriate chemical features that can be taken forward as lead series. Here, we describe three prospective case studies, Bruton Tyrosine Kinase (BTK), RAR-Related Orphan Receptor γ t (RORγt) and Human Leukocyte Antigen DR isotype (HLA-DR), to illustrate the positive impact of high throughput virtual screening (HTVS) on the successful identification of novel chemical series. Each case represents a project with a varying degree of difficulty due to the amount of structural and ligand information available internally or in the public domain to utilize in the virtual screens. We show that HTVS can be effectively employed to identify a diverse set of potent hits for each protein system even when the gold standard, high resolution structural data or ligand binding data for benchmarking, is not available.


INTRODUCTION

Identification of bioactive leads that carve a novel chemical space remains a constant challenge for drug discovery programs in pharmaceutical settings, especially when a significant number of competitor molecules are progressing towards the clinic. Such a challenge calls for methodologies that can rapidly evaluate millions of molecules from internal or public databases to identify bioactive hits in a resource-efficient manner. High-throughput virtual screening (HTVS) has been shown to be a promising and resource-efficient tool for hit identification1-8 with a number of recent studies illustrating its successful application to drug discovery9-14. The methodology involves utilizing the protein binding site of interest, as determined from available crystal structures or homology models, to dock and score millions of compounds from an internal or public database. In-silico hits identified from this docking experiment are then screened in suitable biophysical or biochemical assays to confirm engagement and determine the affinity to the target of interest. Optimizing the time required for docking millions of molecules, however, requires simplifying the problem: a static snapshot of the protein is used and the binding enthalpy is calculated, while protein flexibility, entropy of binding and desolvation are typically ignored. A review by Wingert and Camacho discusses recent improvements in virtual screening and summarizes lessons learned from prospective studies.15

The success of a structure-based HTVS campaign depends on multiple factors. First, an accurate structure of the target of interest is necessary. High resolution crystal structures are deemed the gold standard; however, nuclear magnetic resonance (NMR) structures or high-quality homology models can also be employed when crystal structures are not available. Experimentally determined structures are typically obtained from the Protein Data Bank (PDB)16 or solved internally when achievable. Having a ligand-bound structure, whether it be a small molecule or
peptide, can aid the HTVS by revealing the location of a legitimate binding site. Additionally, a ligand-bound site may be better suited for small molecule binding because any induced fit that allowed the native ligand to bind optimally may have created a pocket that is more amenable to other small molecules. In contrast, apo sites may have side chains protruding into the binding site, which can decrease the pocket volume or even obstruct a small molecule from binding, thus reducing the number of hits found in an HTVS. Moreover, the pocket may not be entirely formed in an apo structure. When a binding site is unknown, a variety of tools exist to identify and evaluate potential sites on the surface of a protein, as summarized in a recent review by Vajda et al.17 Geometric or grid-based techniques include methods such as SiteMap18, 19, SiteFinder20, PocketPicker21, LIGSITE22, vMAP23 and Fpocket24, while computational tools such as FTMap25, MCSS26, or MixMD27, 28 can be utilized to detect hot spots, which are areas that have a high propensity for ligand binding. The challenge when using these tools is determining which identified sites are valid pockets where ligand binding would be functionally relevant. Lastly, known literature information regarding small molecule binding to the target of interest can also positively influence the outcome of an HTVS by providing validated binding affinities which can be used for benchmarking prior to the large screen. A review by Lagarde et al. provides an overview of benchmarking data sets developed over the years.29 Resources such as Community Structure-Activity Resource (CSAR)30-36 and Drug Design Data Resource (D3R)37, 38 provide curated experimental datasets of crystal structures and multiple ligands with measured affinity data; moreover, ChEMBL39 and PubChem40 are databases of chemical molecules and their corresponding biological activities. With the simplifications that are employed to reduce the computation time of screening large molecule databases, it is important to do a thorough


benchmarking of docking conditions to ensure a high hit rate when possible. In our experience, certain targets, such as kinases, tend to fare better in an HTVS campaign, likely because they are very well studied systems and large amounts of data are available in the public domain (i.e., multiple crystal structures solved with a diverse set of ligands and a wealth of ligand-binding information). Additionally, targets with smaller, well-buried pockets can show enriched performance in an HTVS as there is less volume that a ligand needs to sample and the lack of an appropriate desolvation penalty may not be as big a factor in scoring.

Herein, we describe three prospective case studies, Bruton Tyrosine Kinase (BTK), RAR-Related Orphan Receptor γ t (RORγt) and Human Leukocyte Antigen DR isotype (HLA-DR), to illustrate the positive impact of virtual screening on the identification of novel chemical series in the pharmaceutical industry. The focus here is on the performance of the HTVS and its impact on lead identification, not a full project account, so the complete derivation of the compounds introduced may not be discussed in detail. Each case represents a project with a varying degree of difficulty with respect to available structural and ligand information. Additionally, throughout this article, the term HTVS applies to the general virtual screening methodology, not a specific method by a group or company.

Of the three targets, the kinase BTK presented a relatively straightforward example, considering the availability of many crystal structures bound to a diverse set of ligands with known binding affinity data. Two potent hits identified from the HTVS were successfully developed to become lead series for the internal BTK project. RORγt presented the next level of challenge as a crystal structure of the ligand binding domain (LBD) was not available at the time that the HTVS was completed and very little data was available on small molecule binding affinities.
Despite using homology models, the HTVS was very successful and a compound identified from the HTVS


screen was advanced to New Molecular Entity (NME) status. Lastly, HLA-DR presented the highest level of challenge for a successful virtual screen as the discovery team sought a compound to attenuate a protein-protein interaction. Although multiple crystal structures of HLA-DR are available, the binding site is an open, solvent-exposed peptide binding groove and the few small molecule inhibitors that are reported in the literature have very low potency41. HTVS for the HLA-DR project successfully led to the discovery of compounds with nanomolar cellular potency.

We illustrate here that HTVS is a viable option, both for jumpstarting compound identification for a discovery program and as a method to complement the discovery of bioactive compounds in conjunction with a high throughput screen (HTS). It can be effectively employed to identify a diverse set of potent hits even when the gold standard, high resolution structural data, is not available. Furthermore, this powerful tool can be used to expedite hit identification, especially by research groups that do not have access to larger corporate compound collections or do not have resources to support HTS. Although routine in large pharma, HTS is not always available to academics or small biotech companies. We hope that this work will help other researchers in optimizing the enrichment of their virtual screens and in expediting the identification and progression of in-silico hits to leads.

RESULTS AND DISCUSSION

Case Study 1: Bruton Tyrosine Kinase

Background. Bruton Tyrosine Kinase (BTK) is a member of the Tec family of Tyrosine Kinases and an upstream activator of many apoptotic proteins of the NF-κB pathway. It is

overexpressed in many leukemias and lymphomas of the B-lineage including Acute Lymphoblastic Leukemia (ALL), Chronic Lymphocytic Leukemia (CLL), and Non-Hodgkin’s Lymphoma (NHL).42 As BTK plays a key role in B-cell proliferation, differentiation, function, and signaling, it is considered an important target for modulating immune cell function, for treating lymphomas of B-lineage and for treating autoimmune diseases such as rheumatoid arthritis and systemic lupus erythematosus.43, 44 BTK inhibitors, similar to other kinase inhibitors, are classified based on where they bind and the conformation assumed by structural elements of the protein: Type 1 (bind to the active protein kinase conformation (DFG-Asp in, C-helix in)), Type 1.5 (bind to a DFG-Asp in inactive conformation), Type 2 (bind to a DFG-Asp out inactive conformation) or Type 3 (bind allosterically adjacent to the ATP-binding pocket).45, 46 A more detailed description of the kinase inhibitor classification can be found in a review by Roskoski47. The challenge for this project was to carve out a novel chemistry space in a very crowded chemical landscape as a significant number of potent, non-covalent competitor compounds were already being evaluated pre-clinically for immunology indications. Additionally, covalent BTK inhibitors (Ibrutinib (Pharmacyclics/Janssen), AVL-292 (Avila)) were progressing through Phase 2 and Phase 1, respectively, for oncology indications. Considering the significant amount of literature and crystal structure information available on BTK inhibitors, we decided to leverage this data and prioritized HTVS over HTS as the method of choice to jumpstart our internal discovery program.

Benchmarking and Pocket Validation to Optimize HTVS Enrichment. To determine the optimal BTK structures for the HTVS, eleven liganded and one apo crystal structure publicly available in the PDB were evaluated.
In order to select a diverse set of chemotypes and classes of compounds while being cognizant of resources, we selected two high resolution


crystal structures (PDB IDs 3OCS and 3PJ348) to carry forward for HTVS. The 3OCS structure (1.8 Å resolution) of BTK is bound to a very potent Type 1.5 ligand, CGI-1746 (IC50 = 0.002 µM), while the 3PJ3 structure (1.85 Å resolution) has the ligand 04L bound but the side chains adopt a conformation that allows binding of Type 1-3 inhibitors, as shown in Figure 1. Self-docking (docking of the ligand into its own crystal structure receptor) and cross-docking (docking a series of ligands into different crystal structure conformations of the same receptor) experiments were performed, along with a benchmark study to determine the optimal software, HTVS parameters and hydrogen bond constraints for docking into BTK. Of the protocols tested for the 3PJ3 pocket, the best performance was achieved using Glide-SP49, 50 requiring one of two hydrogen bond constraints (hydrogen bonds with the Glu475 backbone CO and the Met477 backbone NH). For the 3OCS pocket, the best performance was achieved by a combination protocol where the completely enumerated database was docked using FRED51, 52 followed by re-docking of the top 30% of hits into the 3OCS pocket with Glide-XP53, requiring two hydrogen bond constraints (located at the Glu475 backbone carbonyl and the Met477 backbone amine). Further details regarding the benchmark study can be found in the Supporting Information.
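The hand-off between the two stages of the combination protocol (a fast first docking pass, then re-docking of the top 30% by score) can be illustrated with a minimal sketch. The function name `top_fraction`, the compound IDs and the scores below are hypothetical, and the sketch assumes that a lower (more negative) score is better. It does not call FRED or Glide; it only shows the ranking and truncation step.

```python
def top_fraction(scores, fraction):
    """Return the IDs of the best-scoring `fraction` of compounds.

    `scores` maps compound ID -> docking score. Lower (more negative)
    scores are treated as better, which is an assumption of this sketch.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    n_keep = max(1, int(len(scores) * fraction))
    ranked = sorted(scores, key=scores.get)  # ascending: best score first
    return ranked[:n_keep]


# Hypothetical first-pass scores for ten compounds (illustrative values only).
first_pass = {
    "cpd0": -7.1, "cpd1": -9.4, "cpd2": -5.0, "cpd3": -8.8, "cpd4": -6.3,
    "cpd5": -10.2, "cpd6": -4.9, "cpd7": -7.7, "cpd8": -6.0, "cpd9": -8.1,
}

# Compounds that would be passed on to the slower second docking stage.
shortlist = top_fraction(first_pass, 0.30)
print(shortlist)
```

Truncating by rank rather than by an absolute score keeps the size of the second-stage run predictable, which matters when the second stage is the more expensive calculation.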


Figure 1. Significant plasticity is observed in the BTK binding site to accommodate Type 1, Type 1.5, Type 2 and Type 3 kinase inhibitors. A. 3OCS binding pocket with Type 1.5 inhibitor CGI-1746 bound. B. 3PJ3 binding pocket (DFG-out) with Type 2 ligand 04L bound.

HTVS Identified Diverse and Potent Inhibitors. The HTVS database of 891,982 unique compounds (2.7 million total structures), which is a portion of the full Janssen library, was docked in the 3PJ3 and 3OCS pockets using the top performing protocols from benchmarking (3PJ3: Glide-SP requiring one of two hydrogen bonds to the hinge; 3OCS: combination protocol using FRED docking to select the top 30% of hits followed by re-docking using Glide-XP requiring two constraints). Based on available experimental screening resources, the team had requested a focused library of 6000 compounds to be prioritized by the HTVS. We selected and pooled 4970 virtual hits at 2.5σ (GlideScore < -9.05) from the 3PJ3 run and 2047 hits at 2σ (GlideScore < -9.19) from the 3OCS run and subsequently removed the duplicates. As the HTVS database was composed of compounds that had already been pre-filtered into a lead-like set, no further filtration was done based on molecular properties. The remaining compounds were clustered


using an internal proprietary clustering methodology54 in Pipeline Pilot55. This resulted in a focused library of 5547 compounds as the final HTVS prioritized set, which was sent for evaluation in a target binding assay using ThermoFluor56.

Of the 5547 focused library compounds prioritized by HTVS, 880 primary hits were identified by ThermoFluor (TF) in a single-point screen (n=1). Of these primary hits, 795 actives were re-confirmed in TF (n=2) and characterized in dose response in both binding (TF) and high throughput mass spectrometry-based (HTMS) enzyme activity assays. The confirmed HTVS actives had ΔTm values in the range of 1-12 °C and binding IC50 values in the range of 0.003-11.55 µM as shown in Table 1. Upon clustering the 795 actives, 171 unique clusters including 65 singletons were obtained at a Tanimoto cutoff of 0.65. The largest cluster had 67 members. Selected representative compounds identified in the virtual screen are shown in Table 2, highlighting the potency and diversity of the identified compounds. These results illustrate that this virtual screen had a high hit rate of 14.3% (795 of 5547 compounds) and was very successful in identifying a set of diverse and potent inhibitors.

Table 1: ThermoFluor binding affinity and corresponding inhibition constant values for confirmed HTVS actives.

ΔTm (°C)    # VS Actives    % VS Actives    ThermoFluor Kd (µM)    HTMS IC50 (µM)
1-3         326             41.0            1.08-38.34             0.11-11.55
3-5         235             29.6            0.19-38.34             0.023-5.75
5-7         178             22.4            0.04-8.4               0.003-5.75
7-9         50              6.3             0.024-0.56             0.028-0.8
9+          6               0.75            0.014-0.051            0.041-0.57
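The hit selection step described before Table 1 (a score cutoff per docking run, pooling of the two hit lists, removal of duplicates) can be sketched as follows. This is a minimal illustration under a stated assumption: the cutoffs are read as a number of standard deviations below the mean score of each run. The function names, compound IDs and scores are hypothetical and are not taken from the study.

```python
from statistics import mean, pstdev


def sigma_cutoff(scores, n_sigma):
    """Score threshold lying n_sigma standard deviations below the run mean."""
    values = list(scores.values())
    return mean(values) - n_sigma * pstdev(values)


def select_hits(scores, n_sigma):
    """IDs of compounds scoring better (lower) than the sigma-based cutoff."""
    cutoff = sigma_cutoff(scores, n_sigma)
    return {cid for cid, score in scores.items() if score < cutoff}


def pool_hits(*hit_sets):
    """Union of several hit sets; set union drops duplicates automatically."""
    pooled = set()
    for hits in hit_sets:
        pooled |= hits
    return pooled


# Two hypothetical docking runs over overlapping compounds (toy scores).
run_a = {f"a{i}": -6.0 for i in range(8)}
run_a.update({"x": -11.0, "y": -11.0})
run_b = {f"b{i}": -6.0 for i in range(8)}
run_b.update({"y": -11.0, "z": -11.0})

hits_a = select_hits(run_a, 1.5)
hits_b = select_hits(run_b, 1.5)
pooled = pool_hits(hits_a, hits_b)  # "y" is a hit in both runs, kept once
print(sorted(pooled))
```

A relative cutoff of this kind adapts to the score distribution of each pocket and protocol, which is one plausible reason to prefer it over a single absolute score threshold when pooling hits from different runs.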

Table 2: Examples of identified compounds from the BTK HTVS along with respective ThermoFluor and HTMS data.

[Structure drawings for Compounds 1-5]

Compound    ThermoFluor Kd (µM)    HTMS IC50 (µM)
1           0.07                   0.04
2           1.36                   0.97
3           2.33                   2.09
4           0.04                   0.04
5           0.09                   0.12
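The diversity analysis of the confirmed actives relies on Tanimoto similarity between molecular fingerprints and on clustering at a similarity cutoff. The sketch below shows the Tanimoto coefficient on fingerprints represented as sets of on-bits, together with a simple greedy leader clustering. The clustering routine is a generic stand-in chosen for illustration; it is not the proprietary in-house methodology cited in the text, and the fingerprints are toy values.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    if not fp_a and not fp_b:
        return 0.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def leader_cluster(fingerprints, cutoff):
    """Greedy leader clustering.

    Each compound joins the first cluster whose leader it matches at or
    above `cutoff`; otherwise it starts a new cluster as its leader.
    """
    clusters = []  # list of (leader_id, [member_ids])
    for cid, fp in fingerprints.items():
        for leader, members in clusters:
            if tanimoto(fp, fingerprints[leader]) >= cutoff:
                members.append(cid)
                break
        else:
            clusters.append((cid, [cid]))
    return clusters


# Toy fingerprints (hypothetical bit positions, not real compounds).
fps = {
    "A": {1, 2, 3, 4},
    "B": {1, 2, 3, 4, 5},  # shares 4 of 5 bits in the union with A
    "C": {10, 11},         # unrelated to A and B
}

clusters = leader_cluster(fps, cutoff=0.65)
singletons = [leader for leader, members in clusters if len(members) == 1]
print(clusters, singletons)
```

With real compounds the bit sets would come from a cheminformatics toolkit. Note that the number of clusters and singletons depends on both the fingerprint type and the clustering algorithm, so counts from this sketch would not be directly comparable to those reported above.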

Running multiple virtual screens of large corporate databases can be very time-consuming compared to simple ligand-based 2D similarity or substructure searching. As such, an analysis was completed to ensure that the HTVS identified diverse chemistry and not just chemotypes similar to known BTK inhibitors available in the literature. A database was created using a subset of the Kinase Knowledgebase from Eidogen-Sertanty57, a SAR database of kinase inhibitors curated from scientific literature and patents. As of the 2018 Q2 release, a search for BTK inhibitors identified 26,286 unique compounds with assay data. This list was then filtered for those with pIC50 values of ≥ 5, resulting in 2,577 unique compounds. These compounds were then clustered using in-house methodology54 based on the concept of maximum common fingerprints at a level of 0.5 and a cluster head was randomly chosen. This resulted in a

final dataset of 127 very diverse, potent inhibitors. Histograms of the Tanimoto coefficients for Compounds 1-5 (Table 2) compared to the 127 known BTK inhibitors are provided in the Supporting Information and demonstrate that these compounds are indeed structurally very dissimilar to the BTK inhibitors in the literature. Since almost all the Tanimoto coefficients fall below 0.30, it is highly unlikely that these five active hits would have been identified by a simple 2D similarity or substructure searching method.

Accurate Binding Modes Predicted by HTVS. A crystal structure of Compound 1 (HTMS IC50 = 0.04 µM) in complex with BTK was obtained. An overlay of the best scoring Glide docking mode of Compound 1 in the 3OCS pocket with the crystal structure shows that the predicted binding mode was similar to that observed in the crystal structure (RMSD = 2.92 Å). This illustrates that the selected docking protocol was able to predict the binding mode reasonably accurately (Figure 2A). In the docked binding mode, the molecule makes two hydrogen bonds with the hinge Met477 backbone and places the front phenyl and indane group in the same region as the crystal binding mode. The only significant difference is in the placement of the flexible aliphatic chain in the solvent-exposed region in front, which caused the RMSD to be higher than the 2.0 Å threshold typically used as a cut-off to assess successful pose prediction. Furthermore, we were able to obtain the crystal structure for Compound 5 (HTMS IC50 = 0.12 µM), another potent sub-micromolar inhibitor. A superposition of the docked binding mode of this inhibitor in the 3PJ3 pocket also shows a very good overlap with the crystal structure (RMSD = 1.01 Å; Figure 2B). In the predicted binding mode, Compound 5 makes two hydrogen bonds to the hinge backbone of Glu475 and Met477 and a third with Thr474 while positioning the dimethyl phenyl in the back pocket, similar to the experimentally solved structure. This also illustrates that the second HTVS protocol, Glide-SP requiring one of two hydrogen bond constraints with the hinge, was quite effective in predicting the binding mode for this compound.
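The pose comparison above rests on the root-mean-square deviation (RMSD) between matched ligand atoms in the docked and crystallographic poses. A minimal sketch of that calculation is given below. It assumes both poses are already in the same coordinate frame (the proteins have been superimposed) and that atoms are listed in the same order; it does not handle symmetry-equivalent atoms. The function name and coordinates are hypothetical.

```python
import math


def pose_rmsd(coords_a, coords_b):
    """RMSD (same units as the input) between two matched lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("poses must contain the same, non-zero number of atoms")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Toy three-atom ligand: two atoms overlap exactly, one is displaced by 3 Å.
docked = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
crystal = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 3.0, 0.0)]

rmsd = pose_rmsd(docked, crystal)
print(f"RMSD = {rmsd:.2f} Å; within 2.0 Å threshold: {rmsd <= 2.0}")
```

The toy case mirrors the situation described for Compound 1: a single mobile group can dominate the RMSD even when the rest of the ligand overlays well, because each atom's displacement enters the sum squared.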

Figure 2. A. Predicted docking mode of Compound 1 (pink) shown overlaid with the experimental cocrystal structure (PDB ID: 6NFH, yellow). Dotted lines illustrate two important hydrogen bonds formed between the ligand and Glu475 and Met477 in the hinge region of the protein backbone. B. Predicted docking mode of Compound 5 (pink) shown overlaid with the experimental cocrystal structure (PDB ID: 6NFI, yellow).

TF-HTS and Cell-HTS Complement Hits Identified by HTVS. To be comprehensive and potentially identify additional chemotypes, two high throughput screens were subsequently run for BTK. A set of 302,465 compounds was screened by ThermoFluor56 (TF-HTS) and led to the identification of 2083 confirmed actives (n=2), and another set of 433,657 compounds was screened in a Ramos cell-based FLIPR Assay58 (Cell-HTS), yielding 1693


confirmed actives (n=3). The tested compound selections were subsets of the Janssen compound library collection and had a significant overlap with the compounds docked in the HTVS: 70.2% of TF-HTS compounds (197,984) and 81.7% of Cell-HTS compounds (354,209). Furthermore, there was 46.8% (132,008 compounds) overlap between the compounds screened by all three methods (HTVS, TF-HTS and Cell-HTS). A comparison of confirmed actives versus the total number of compounds assayed in the three screens shows that: 1) 5547 VS hits were screened and resulted in 795 confirmed actives, yielding a hit rate of 14.3%; 2) 302,465 compounds were screened in TF-HTS and yielded 2083 hits with a hit rate of 0.68%; and 3) 433,657 compounds were screened in the Cell-HTS and yielded 1693 confirmed hits and a hit rate of 0.4%. Although a much larger number of compounds were screened in the two HTS runs, HTVS was significantly more resource efficient and yielded a 21-fold higher hit rate versus TF-HTS and a 36-fold higher hit rate versus the Cell-HTS. It is interesting to note that the 795 HTVS actives were present in the combined HTS screening decks; the experimental methods together were able to identify 565 of the actives (71%) while the remaining 230 actives (29%) were missed as ‘false negatives’. These 230 active compounds would not have been identified had HTS been the only hit identification strategy, which illustrates the complementarity of the approaches.

HTVS Led to the Discovery of a Novel and Potent Lead Series. The team selected six BTK inhibitors from the HTVS to progress further into our discovery pipeline as lead series, provided in Table 2 above. These compounds were selected for further characterization based on the following criteria: novel chemotype, high potency in BTK assays, selectivity with respect to a small panel of kinases, availability of initial structure-activity relationships (SAR) in the analogs, and chemical tractability. This case study illustrates that when rigorous benchmarking with the large amount of structural and ligand binding data available in the literature is utilized to optimize enrichment, HTVS can be effectively employed to identify a diverse set of potent hits.
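The hit-rate comparison between the three BTK screens is simple arithmetic on the counts reported above, and it can be reproduced with a short sketch. The counts are those quoted in the text; the function and variable names are illustrative only.

```python
def hit_rate(actives, tested):
    """Confirmed actives as a percentage of compounds tested."""
    return 100.0 * actives / tested


# (confirmed actives, compounds tested) as reported for each screen.
screens = {
    "HTVS": (795, 5547),
    "TF-HTS": (2083, 302465),
    "Cell-HTS": (1693, 433657),
}

rates = {name: hit_rate(*counts) for name, counts in screens.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.2f}% hit rate")

# Fold improvement of the HTVS hit rate over each experimental screen.
fold_vs_tf = rates["HTVS"] / rates["TF-HTS"]
fold_vs_cell = rates["HTVS"] / rates["Cell-HTS"]
print(f"HTVS vs TF-HTS: {fold_vs_tf:.1f}-fold; HTVS vs Cell-HTS: {fold_vs_cell:.1f}-fold")
```

Because this sketch works from unrounded rates, the fold values it prints can differ slightly from the figures quoted in the text, which appear to be derived from the rounded percentages.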

Case Study 2: RAR-Related Orphan Receptor γ t

Background. RAR-related orphan receptors (RORs) belong to the nuclear receptor family of intracellular transcription factors and comprise three members, RORα, RORβ and RORγ, each having multiple isoforms. RORγt is expressed in a variety of cell types of the innate and adaptive immune system59, drives Th17 cell differentiation and induces the transcription of cytokines including interleukin (IL)-17A and IL-17F60. Both IL-17A and the Th17 pathways have significant involvement in the development of autoimmune conditions including systemic lupus erythematosus, multiple sclerosis, psoriasis, rheumatoid arthritis, and Crohn’s disease.61 Hence, modulation of RORγt through the displacement of endogenous ligands in the ligand binding domain (LBD) by small molecule binding is an active area of drug discovery research.62-67

The ligand binding domain (LBD) of nuclear receptors is critical for nuclear localization as it contains the activation function 2 (AF2 or Helix 12 (H12)) region. Upon binding to the LBD, small molecule effectors induce a conformational change in H12 which can recruit either co-activator or co-repressor proteins. Agonist binding stabilizes H12 in a conformation that enhances binding of a co-activator protein and results in an increase of RORγt driven gene transcription, whereas inverse agonist binding induces a conformation favorable for co-repressor protein binding, leading to decreased gene transcription. Additionally, ligands falling under the category that includes inverse agonism or neutral antagonism induce a conformation that prevents either co-activator or co-repressor protein binding. At the time of this research, there were no known small molecule binders of RORγt outside of endogenous ligands; as such, the goal of the project team was to discover novel inverse agonists for the treatment of inflammatory diseases.

Homology Modeling of both Antagonist and Agonist Conformations. At the time of this study, a crystal structure of the LBD of RORγt was not available. To determine the optimal template for a homology model, a structural and sequence analysis of the nuclear hormone receptor family members was completed. When available, a representative crystal structure was chosen for each family member to compare the diversity of LBD conformations in the public domain. Upon visual inspection and utilizing sequence alignments, it was determined that the most appropriate templates were RORα (PDB ID 1N8368; 53% identity) for the agonist conformation and PPARα (PDB ID 1KKQ69; 26% identity) for the antagonist conformation.
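Template selection above is reported in terms of percent sequence identity. As an illustration of how such a value can be computed from a pairwise alignment, the sketch below counts identical residues over aligned, non-gap columns. Several conventions exist for the denominator (aligned columns, shorter sequence length, full alignment length), and the text does not state which one was used, so the choice here is an assumption. The sequences are short toy strings, not the actual receptor sequences.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip columns containing a gap
        compared += 1
        if res_a == res_b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# Hypothetical pre-aligned target and template fragments (toy data).
target = "ACDEFGHIK-"
template = "ACDQFGH-KL"

print(f"{percent_identity(target, template):.1f}% identity")
```

In practice the alignment itself would come from a dedicated alignment tool, and identity would be weighed together with structural criteria such as the H12 conformation, as described in the text.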


While the overall goal was to identify novel inverse agonists, compounds that could target both the agonist- and antagonist-bound forms of the receptor were also of interest to the team. The 1.63 Å co-crystal structure of human RORα has cholesterol bound in the hydrophobic region of the LBD and adopts an H12 conformation appropriate for agonist binding. The human PPARα co-crystal structure is of lower resolution (3 Å) and is in complex with the high-affinity antagonist GW6471 (IC50 = 240 nM); hence, H12 assumes a conformation appropriate for antagonist binding. Although the human PPARα structure has lower sequence identity to RORγt and is not a high-resolution structure, it was the most appropriate option for an antagonist conformation at the time of this study.
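
The template identities quoted above (53% and 26%) are percent sequence identities taken from pairwise alignments. As a minimal sketch of how such a number can be computed, the Python function below counts identical residues over the ungapped columns of an alignment. The function name, the choice of denominator (ungapped columns only) and the toy sequences are assumptions for illustration; they are not the alignment protocol or the sequences used in this study, and other tools normalize by a different length.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    Assumption: '-' marks a gap and both strings come from the same alignment.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == "-" or res_b == "-":
            continue  # skip gapped columns
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical toy alignment, not real receptor sequences.
target = "MKT-LLVAG"
template = "MRTALLV-G"
print(f"{percent_identity(target, template):.1f}% identity")
```

With a real alignment, the same function could be applied to each candidate template to rank them before visual inspection.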


The RORt homology models were compared, and the primary difference between the structures is the position of H12, as one would expect, demonstrating the induced conformational changes that occur upon agonist or antagonist binding to the LBD. Figure 3A shows the overlay, and helix 12 is highlighted with an arrow. In the antagonist form, Helix 12 flips out into the solvent, while in the agonist form, it sits on the edge of the active site, Figure 3B. When H12 is in the conformation conducive for agonist binding, part of the antagonist binding site is blocked. For example, comparing the overlay of the agonist and antagonist structures, the side chains of residues Phe 506, Tyr 502, Phe 506, His 479, and Leu 483 in the agonist conformation overlap with the antagonist ligand. Hence, H12 must swing out into the solvent to make room for an antagonist ligand to bind.

Figure 3: Overlay of RORγt agonist (green) and antagonist (purple) conformations. The agonist ligand (cholesterol) is shown in blue while the antagonist ligand (GW6471) is shown in yellow. A. Global view of the structure. An arrow highlights H12. B. Zoom-in view of the active site. The residues with significantly different conformations are labeled (His 479, Leu 483, Phe 498, Tyr 502, and Phe 506). GW6471 is shown in space-filling representation. Ag: agonist conformation; An: antagonist conformation.

HTVS Identified Diverse and Active Compounds. A virtual screen of a portion of the Janssen corporate library (~1.5 million compounds) was completed using both the agonist and antagonist homology models of RORγt, requiring 1 of 2 hydrogen bond constraints to be matched (placed on the guanidinium groups of Arg 367 and Arg 364). At the time of the study, there were no known RORγt compounds for benchmarking. As such, the constraints utilized in this study were chosen based on an examination of publicly available ROR co-crystal structures and on recognizing important interactions across various chemical series. Only compounds with a GlideScore < -8 were retained, and various filters were applied, such as an HTS filter (removing compounds with a molecular weight of […] 80% of the RA population⁸⁹.

For both HLA-DRB1*04:01 and HLA-DRB1*04:04, three distinct sub-pockets were identified: P1, P4 and P9, shown in Figure 7. P1 and its adjacent sub-pocket region have a volume of ~800 Å³ and significant hydrophobic volume for both alleles. P4 and its adjacent sub-pocket region have a volume of 614 Å³ in HLA-DRB1*04:01 but are slightly larger in HLA-DRB1*04:04, with a volume of 886 Å³. Together, P1 and P4 offer significant volume and diverse anchor points for small molecule design. For HLA-DRB1*04:01, P9 and its adjacent sub-pocket region have a volume of 1166 Å³, with 282 Å³ of the site exposed to solvent. However, this pocket is smaller in HLA-DRB1*04:04, with a volume of 648 Å³ (101 Å³ exposed to solvent). This pocket also has significant hydrophobic volume. Based on this analysis, it was concluded that P1 is the deepest pocket with significant volume and may be the most suitable for HTVS.


Figure 7. Pocket volume analysis undertaken to identify sub-pockets amenable to small molecule design for both HLA-DR4:01 and HLA-DR4:04. Sub-pockets P1, P4 and P9 are labeled for each allele (HLA-DRB1*04:01 and HLA-DRB1*04:04). The chemical environment of the predicted volumes is color-coded based on the chart provided.

To validate the pockets identified by vMAP, seven small docking runs were completed using Glide-SP⁴⁹,⁵⁰ without constraints. A test set of 11K compounds (fully enumerated) was docked into each pocket. For both HLA-DRB1*04:01 and HLA-DRB1*04:04, docking to P1 alone or P9 alone resulted in poor docking scores, and very few compounds were identified as hits. Additionally, combinations of the pockets were evaluated; the P4-P9 HTVS results were similar to those for P1 and P9 alone. However, the combined P1-P4 pocket yielded many compounds with reasonable scores and plausible binding modes for both HLA-DRB1*04:01 and HLA-DRB1*04:04. As such, it was concluded that virtual screens would be conducted using the joint P1-P4 pockets from both the HLA-DRB1*04:01 and HLA-DRB1*04:04 alleles.

HTVS Identified Desirable Compounds Similar to HTS. The HTVS database of 1,033,001 unique compounds, which is a portion of the full Janssen library, was docked into the HLA-DRB1*04:01 and HLA-DRB1*04:04 P1-P4 sub-pockets of the binding cleft using Glide-SP⁴⁹,⁵⁰ without constraints. Compounds with a GlideScore < -8.5 were retained from the HLA-DRB1*04:01 screen (12,919) and with a GlideScore < -8.25 from the HLA-DRB1*04:04 screen (6,722). Duplicate compounds were eliminated, and various filters were applied, such as an HTS filter and the availability of the compound. The resulting 12,972 molecules were then clustered using an internal proprietary clustering methodology⁵⁴ in Pipeline Pilot⁵⁵. This resulted in a focused library of 7,260 compounds currently available in the Janssen inventory as the final HTVS-prioritized set to be sent for evaluation in the HTS primary assay. Of this list, 63% (4,587 compounds) were already part of the HTS deck that was scheduled to be screened alongside the HTVS; hence, 2,673 additional compounds were identified that would have been missed if HTS alone had been used for lead identification. Of the 7,260 focused library compounds prioritized by HTVS, 199 actives were reconfirmed in the confirmation assay at 50% inhibition, resulting in a hit rate of 2.7%. In contrast, of the 632,000 compounds in the HTS deck, 1,529 hits were confirmed, resulting in a hit rate of 0.2%. A selection of 502 compounds (with 11 identified from HTVS) was designated by the project team to move forward into further profiling based on IC50 values and chemical structure to ensure tractability.

A cluster analysis of the 502 compounds was completed to confirm the diversity of the set. Using an internal proprietary clustering methodology⁵⁴, a similarity level of 0.5 resulted in 145 clusters with a median of 2 compounds per cluster; 25.5% of the clusters were singletons and 23.4% contained between 4 and 69 compounds. Five lead series were identified and pursued by the HLA-DR project team. Of these five, three were identified by both HTS and HTVS and two by HTS only. The HTRF IC50 values of the original hits identified by both HTS and HTVS were 3.31, 4.72 and 5.78 µM, and those of the hits identified by HTS only were 1.50 and 6.06 µM. It is interesting to note that although only 11 of the 502 compounds taken forward for additional profiling were identified by the HTVS, 3 of these (27%) were cherry-picked by the chemistry team to move forward into lead optimization. This demonstrates that virtual screening was able to identify enriched compound lists with diverse, attractive chemical structures appropriate to carry forward as potential drug candidates. The compounds were further optimized to nanomolar cellular potency and, based on docking models, are hypothesized to bind in the P1-P4 sub-pockets, as shown in Figure 8. The series is predicted to occupy the P1 sub-pocket, similar to the peptide antigen anchor, and to span into the P4 sub-pocket.
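
The hit rates and overlap figures quoted for this case study follow directly from the reported counts. The short Python sketch below reproduces that arithmetic as a sanity check; the variable and function names are illustrative, and the fold difference it prints is a derived quantity that is not stated in the text. Because 4,587 of the HTVS compounds were also in the HTS deck, the two sets overlap and the comparison should be read as indicative only.

```python
def hit_rate(confirmed: int, tested: int) -> float:
    """Confirmed hits as a percentage of compounds tested."""
    return 100.0 * confirmed / tested


# Counts as reported in the text.
htvs_tested, htvs_confirmed = 7_260, 199
hts_tested, hts_confirmed = 632_000, 1_529
htvs_already_in_hts_deck = 4_587

htvs_rate = hit_rate(htvs_confirmed, htvs_tested)   # about 2.7%
hts_rate = hit_rate(hts_confirmed, hts_tested)      # about 0.2%
fold = htvs_rate / hts_rate                         # derived, not reported
htvs_only = htvs_tested - htvs_already_in_hts_deck  # compounds HTS would have missed
overlap_pct = 100.0 * htvs_already_in_hts_deck / htvs_tested

print(f"HTVS hit rate: {htvs_rate:.1f}%")
print(f"HTS hit rate:  {hts_rate:.1f}%")
print(f"Fold difference: {fold:.1f}x")
print(f"HTVS-only compounds: {htvs_only} ({overlap_pct:.0f}% overlap with HTS deck)")
```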


Figure 8. Predicted binding mode of a representative lead series (dark blue). HLA-DR is shown in green, bound to citrullinated aggrecan peptide in orange (PDB ID: 4MD4). The TCR sits on top of the binding cleft (yellow; PDB ID: 1J8H). The P1 and P4 sub-pockets are labeled.

Virtual Screens Can be Very Efficient. In this case study, ~1 million compounds were screened in the HTVS and, after careful curation, a focused library of 7,260 compounds was deemed to have the appropriate size and chemical functionality to test experimentally. In contrast, a set of 632,000 random, diverse compounds was experimentally tested in the HTS. Both methodologies identified compounds that moved forward as lead series; however, using HTVS alone to identify a smaller, focused library for experimental testing would also have resulted in the identification of multiple hit series for this challenging target.


DISCUSSION AND CONCLUSION

At the onset of a drug discovery program, the goal is lead identification: to identify novel compounds with appropriate chemical features that can be taken forward as lead series. HTS is historically the most utilized methodology for this step. Here, we demonstrate how computational chemists can play an important role in lead identification through the implementation of virtual screens as an essential component of the hit identification paradigm. Not all proteins are created equal regarding their behavior in both experimental and theoretical studies, and not all systems have large amounts of data available in the public domain. We show here that virtual screens can be successful at identifying exciting leads even when gold standard data are not available for a given protein system.

Our first case study, BTK, is a kinase, and kinases historically tend to perform well in virtual screens, likely due to their well-defined, buried pockets and the large amount of data available in the literature. In this case, multiple protein structures were utilized to capture protein flexibility in the binding site. Additionally, extensive benchmarking could be performed due to the availability of many crystal structures bound to a diverse set of ligands with known binding affinity data. HTVS was able to identify diverse and potent compounds and, for the crystal structures solved, the predicted binding poses were highly accurate. Furthermore, this case study demonstrated how HTVS can be used to successfully jumpstart a project alone or complement an HTS campaign and mitigate potential primary assay shortcomings such as false negatives. Two potent hits were identified by this HTVS and successfully developed to become the lead series for an internal BTK project.

Our second case study, RORγt, while considerably challenging given the lack of structural and ligand binding affinity data available, is arguably our most successful HTVS as it led to an NME for the company. Nuclear hormone receptors have a well-defined, buried pocket; however, at the time of this study there were no crystal structures of RORγt available, so a homology model was utilized. The use of homology models in virtual screens is becoming more mainstream but can still be seen as a negative factor when determining whether HTVS is viable for a particular project. This case study supports the use of homology models and shows that they are a viable option when crystal structures are not available or when additional conformational forms of the protein need to be captured.

Our final case study, HLA-DR, is considered the most challenging case, as the goal was to identify a compound to block a shallow protein-peptide interface. While many crystal structures of HLA-DR are available with peptides bound in the binding groove, the challenge was to determine the most appropriate sub-pocket(s) for virtual screening through extensive pocket evaluation. Utilizing a combination of sub-pockets, the HTVS was able to successfully identify three of the five lead series pursued by the project team. This case study demonstrates that HTVS campaigns can be very successful not only for well-established protein-ligand binding sites but also for PPI targets. Additionally, virtual screens are very efficient, requiring fewer compounds to be assayed experimentally and resulting in time and cost savings.

In this work, we have shown that high throughput virtual screening can be used to jumpstart a project or add chemical matter to an ongoing HTS. All protein targets should be considered for HTVS even if gold standard data are lacking. This study supports an emerging trend, demonstrated for a diverse set of proteins, that HTVS hit rates tend to be higher than those of HTS⁹⁰. Additionally, the lessons learned here can be applied to ultrahigh throughput virtual screening (UVTS). Virtual libraries with millions or even billions of purchasable compounds are now available, such as the Enamine dataset⁹¹ and ZINC⁷², that would allow companies to cover additional chemical space not fully represented in their corporate libraries or to focus on libraries created for a specific target class. Additionally, academics or small companies without sizable proprietary databases now have access to large, virtual databases of purchasable compounds available in the public domain. A recent paper by Lyu et al. demonstrates the successful docking of a database of 170 million publicly available, make-on-demand compounds against multiple targets.⁹² For each target, a diverse set of chemotypes was identified and optimized to nanomolar or picomolar affinity. UVTS and HTVS are currently very important tools in the drug discovery process and are poised to make an even larger impact in the future.

MATERIALS AND METHODS

Case Study 1: Bruton Tyrosine Kinase


Ligand Database Preparation. A benchmarking database was prepared to evaluate the performance of the docking methodologies. It consisted of 10,993 drug-like decoys selected from our corporate database. The decoys were seeded with 16 known, active literature compounds with a potency of less than 10 µM in BTK biochemical assays. The database was prepared for Glide⁴⁹,⁵⁰ docking using default parameters in the LigPrep⁹³ module of the Schrödinger Suite, with the default Ionizer protocol for generating ligand tautomers and ionization states. Additionally, the structures from LigPrep were used as starting conformations for elaboration by Omega⁹⁴; 200 conformations per structure were generated and used for docking with FRED/OE Dock⁵¹,⁵². A second, previously generated virtual database of 891,982 unique compounds from the Janssen corporate library was used for the HTVS. This dataset was created using default LigPrep⁹³ parameters and contained various tautomeric and ionization states and stereoisomers when appropriate. Compounds were prepared for FRED/OE Dock⁵¹,⁵² as described above.

Protein Structure Preparation. Twelve BTK crystal structures (PDB IDs 3P08, 3OCS, 3PJ1⁴⁸, 3PJ2⁴⁸, 3PJ3⁴⁸, 3K54⁹⁵, 3PIX⁴⁸, 3PIY⁴⁸, 3PIZ⁴⁸, 3GEN⁹⁵, 3OCT and 3T9T⁹⁶) available in the public domain at the time of the study were evaluated visually to prioritize two crystal structures that could accommodate Type 1, Type 1.5, Type 2, and Type 3 kinase inhibitors. The structures were visually evaluated for missing regions, access to the back pocket and phosphorylation state. From this evaluation, two high resolution crystal structures (PDB ID 3OCS, 1.8 Å resolution, and PDB ID 3PJ3⁴⁸, 1.85 Å resolution) were selected for the HTVS. The structures were prepared using the default parameters of the Protonate3D⁹⁷ module in MOE⁹⁸. The protonated structures were reevaluated for any forcefield assignment clashes in Maestro⁹⁹ and FRED⁵¹,⁵², respectively. Ligands and waters were removed from the prepared structures to derive the final, prepared protein structures.

Benchmarking Docking Validation. Self- and cross-docking experiments were conducted using Glide-SP⁴⁹,⁵⁰, Glide-XP⁵³ and FRED⁵¹,⁵². The objective of these experiments was to identify the parameters that would lead to the identification of a docked binding mode conformation as the top ranked pose for each protein (3PJ3⁴⁸ and 3OCS) and to help guide the selection of protocols for the mini virtual screens. Mini virtual screens were then set up to evaluate the performance of various docking protocols at 1% and 2% of the ranked database. A drug-like decoy database of 10,993 compounds (as prepared above) along with the 16 known actives was docked to the two prepared protein pockets. Benchmarking experiments included evaluating the impact of a) varying the docking protocol (Glide-SP, Glide-XP, FRED or combinations), b) using no constraints versus requiring 1 or 2 hydrogen bond constraints (located at the Glu475 backbone carbonyl and the Met477 backbone amine), and c) using a combination of Glide/FRED to optimize runtime versus performance as determined by enrichment. The enrichment factor¹⁰⁰ (EF) was calculated as EF = (Actives-DB / Actives-Total) × (N-Total / N-DB), where Actives-DB is the number of actives among the N-DB top ranked compounds at a given percentage of the database, and Actives-Total is the total number of actives among the N-Total compounds docked. Only the top scoring pose per compound was saved for the benchmarking runs, and the 16 docking protocols tested are provided in the Supporting Information.

High-Throughput Virtual Screen. The database of 891,982 compounds from the Janssen corporate library was docked to the 3PJ3⁴⁸ and 3OCS binding sites.
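
The enrichment factor defined in the benchmarking paragraph can be written as a short Python function. This is a minimal sketch of the stated formula only; the function and argument names are illustrative, the way the top 1% is rounded to a whole number of compounds is an assumption, and the count of eight actives recovered is a made-up value used to show the arithmetic, not a benchmarking result from this study.

```python
def enrichment_factor(actives_in_top: int, n_top: int,
                      actives_total: int, n_total: int) -> float:
    """EF = (Actives-DB / Actives-Total) * (N-Total / N-DB)."""
    if n_top <= 0 or actives_total <= 0 or n_total <= 0:
        raise ValueError("counts must be positive")
    if actives_in_top > min(n_top, actives_total):
        raise ValueError("more actives in the top slice than is possible")
    return (actives_in_top / actives_total) * (n_total / n_top)


# Benchmarking set sizes from the text: 10,993 decoys seeded with 16 actives.
n_total = 10_993 + 16
actives_total = 16
n_top = round(0.01 * n_total)  # top 1% of the ranked database (rounding is an assumption)

# Hypothetical outcome: 8 of the 16 actives ranked in the top 1%.
ef_example = enrichment_factor(8, n_top, actives_total, n_total)
# Upper bound at this cutoff: every active ranked in the top 1%.
ef_max = enrichment_factor(actives_total, n_top, actives_total, n_total)

print(f"top 1% = {n_top} compounds, EF = {ef_example:.1f}, maximum EF = {ef_max:.1f}")
```

An EF of 1 corresponds to random ranking, so comparing each protocol's EF at 1% and 2% against the upper bound gives a quick sense of how much room for improvement remains.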
The final two docking protocols selected from the benchmarking validation runs were used for docking this larger database of compounds on a Linux cluster using 40 Intel Xeon processors (3PJ3: Glide-SP⁴⁹,⁵⁰ requiring 1 of 2 hydrogen bonds to the hinge; 3OCS: combination protocol using FRED⁵¹,⁵² docking to select the top 30% of hits followed by re-docking using Glide-XP²⁵,⁵³ requiring two constraints). OE Tanimoto Combo scoring was used to prioritize the docked ligands for FRED⁷⁷,⁸⁰ docking, and the final selection of docked hits was based on the Glide Score obtained in the subsequent Glide-XP⁵³ docking experiments. The Glide Score distribution for the HTVS hits obtained for each pocket was plotted to determine the number of molecules at 2σ, 2.5σ and 3σ from the mean. The 2047 molecules (2σ, Glide Score < -9.19) for the 3OCS pocket and 4970 molecules (2.5σ, Glide Score < -9.05) were pooled, duplicate structures were eliminated, and the remaining compounds were clustered using Pipeline Pilot⁵⁵.

ThermoFluor HTS. The kinase domain of human BTK (amino acids 389-659) was purchased from Proteros Biostructures GmbH. The BTK kinase domain was expressed in a baculovirus/insect cell expression system. Protein purification included affinity chromatography, ion exchange and gel filtration. Purity assessed by SDS-PAGE was >90%. The purity and identity of the protein were confirmed in-house by SEC and MS analysis, which yielded a single peak with a MW of 31561 Da (theoretical MW 31560 Da). The kinase domain was screened in a non-phosphorylated form.

Compounds used in this study were selected from Janssen R&D internal collections. The screening samples were provided as 100% DMSO solutions at a putative concentration of 2 mM in 384-well microtiter plates (Greiner #781201). Hit confirmations and dose responses were performed using samples resupplied typically at 5-10 mM in 100% DMSO.

ThermoFluor experiments were carried out using instruments owned by Janssen R&D, L.L.C. through its acquisition of 3-Dimensional Pharmaceuticals, Inc. 1,8-ANS was used as a fluorescent dye (Invitrogen A-47). BTK and compound solutions were dispensed into black 384-well polypropylene PCR microplates (Abgene TF-0384/k) and overlaid with silicone oil (Fluka 85411) to prevent evaporation. Barcoded assay plates were robotically loaded onto a thermostatically controlled PCR-type thermal block and then heated at a typical ramp rate of 1 °C/min for all experiments. Fluorescence was measured by continuous illumination with UV light supplied via fiber optic and filtered through a band-pass filter (380-400 nm; >6 OD cutoff). Fluorescence emission of the entire 384-well plate was detected by measuring light intensity using a CCD camera filtered to detect 500 ± 25 nm, resulting in simultaneous and independent readings of all 384 wells. One or more images were collected at each temperature, and the sum of the pixel intensity in a well was recorded vs. temperature. The typical imaging time was 2-30 seconds. If multiple images were collected at a given temperature, the intensity per well of these images was averaged. Reference wells contained BTK without compounds, and the assay conditions were as follows: 0.04 mg/mL BTK and 70 µM 1,8-ANS in 50 mM PIPES pH 7.0, 100 mM NaCl, 0.002% Tween-20.

During the HTS, the compounds were tested at a single dose of 30 µM. HTS hits were confirmed first in duplicate at the screening dose and then profiled further in dose response. Typically, compounds were serially diluted 1:2 in 100% DMSO from a high concentration of 5-10 mM over 12 columns within a series (column 12 is a reference well containing DMSO only). The compounds were robotically dispensed directly into assay plates (50 nL) using a HummingBird nL capillary dispenser (DigiLab/CyBio AG) or an Echo acoustic dispenser (Labcyte Inc.). Following compound dispensing, protein and dye solution in buffer was added to achieve the final assay volume of 3 µL, followed by 1 µL of silicone oil. The binding affinity was estimated as reported previously, using the thermodynamic parameters of protein unfolding ΔH(Tm) = 100 kcal/mol and ΔCp(Tm) = 4 kcal/mol.⁷⁰ A reference Tm was determined for each 384-well plate run because its value depends on the protein batch and buffer conditions and may vary slightly from one ThermoFluor instrument to another. In this study, the typical Tm value for BTK was 47.5 °C.

Cell Based HTS. The Ramos B cells were obtained from the American Type Culture Collection (ATCC, Rockville, MD). Goat anti-IgM (immunoglobulin M) was purchased from Acris Antibodies (San Diego, CA). Hank's balanced salt solution (HBSS) was obtained from Life Technologies (Carlsbad, CA). HEPES was purchased from Thermo Fisher Scientific (Waltham, MA). RPMI and PenStrep were obtained from CellGro (Corning, NY). Heat-inactivated fetal bovine serum (FBS) was purchased from PAA (Pittsburgh, PA). Black, clear-bottom, 384-well, poly-D-lysine coated plates were obtained from Greiner Bio-One (Monroe, NC). The calcium assay kit was purchased from BD Biosciences (San Jose, CA). CGI-1746 was synthesized in house and can be obtained from Selleckchem (Houston, TX).

Ramos cells were maintained in RPMI + 10% FBS + 1× PenStrep at a density between 5×10⁵ cells/mL and 1.2×10⁶ cells/mL. The day before the assay, cells were seeded in RPMI + 1% FBS + 1× PenStrep. On the day of the assay, cells were resuspended in media containing 1% FBS at a density of 1.5×10⁶ cells/mL. An equal volume of the no-wash calcium dye was added to the suspension. Cells were seeded into a 384-well poly-D-lysine coated plate using a Multidrop Combi (Thermo Fisher Scientific, Waltham, MA) in a volume of 40 µL. Cells were incubated at 37 °C/5% CO2 for 1 h. Compounds were diluted in HBSS supplemented with 20 mM HEPES using a Bravo liquid handler (Agilent Technologies, Santa Clara, CA). Compounds were added to the cells (10 µM final concentration) with a VPrep liquid handler (Agilent Technologies, Santa Clara, CA). Cells were incubated with compound at room temperature for 30 min. Anti-IgM was prepared in HBSS supplemented with 20 mM HEPES and 0.1% bovine serum albumin. Cells were stimulated with EC80 anti-IgM. The change in fluorescence was recorded in the FLIPR Tetra (Molecular Devices, Sunnyvale, CA) both pre- and post-anti-IgM addition. Positive inhibitor control wells were treated with CGI-1746.

FLIPR traces were analyzed using ScreenWorks 3.2 (Molecular Devices, Sunnyvale, CA). Raw data were exported from the FLIPR as maximum signal minus minimum signal in relative light units (RLUs) during the kinetic read. Data were exported to 3DX (J&J proprietary software) for normalization, Z' analysis, and calculation of percent inhibition. Curve fitting and IC50 calculation for dose response data were performed in 3DX or GraphPad Prism (GraphPad Software, San Diego, CA). The data were converted to percent inhibition by normalizing the response to controls. The controls were DMSO (negative control: wells A23-P23) and CGI-1746 (10 µM, positive control: wells I24-P24). For quality control, results from an assay plate were discarded if Z' […] 95% pure, Compound 2 is >92% and Compound 5 is >88%.
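
The cell-based assay data above were normalized to controls and checked with a Z' analysis in proprietary software (3DX), and the exact formulas are not given in the text. As a hedged illustration, the Python sketch below uses the widely used definitions: percent inhibition scaled between the mean negative (DMSO) and mean positive (CGI-1746) control signals, and Z' = 1 - 3(σ_pos + σ_neg) / |μ_pos - μ_neg|. Whether 3DX uses exactly these forms is an assumption, and the control readings are invented numbers rather than data from this study.

```python
from statistics import mean, stdev


def percent_inhibition(signal: float, neg_mean: float, pos_mean: float) -> float:
    """Scale a well signal so the negative control maps to 0% and the positive to 100%."""
    return 100.0 * (neg_mean - signal) / (neg_mean - pos_mean)


def z_prime(neg_controls: list[float], pos_controls: list[float]) -> float:
    """Z' factor computed from control wells (sample standard deviations)."""
    spread = 3.0 * (stdev(neg_controls) + stdev(pos_controls))
    window = abs(mean(neg_controls) - mean(pos_controls))
    return 1.0 - spread / window


# Hypothetical control readings in RLU (max minus min signal per well).
dmso_wells = [1000.0, 1020.0, 980.0, 1010.0, 990.0]  # negative control
cgi_wells = [100.0, 110.0, 90.0, 105.0, 95.0]        # positive control

neg_mean, pos_mean = mean(dmso_wells), mean(cgi_wells)
plate_z = z_prime(dmso_wells, cgi_wells)
example = percent_inhibition(550.0, neg_mean, pos_mean)

print(f"Z' = {plate_z:.2f}, example well = {example:.0f}% inhibition")
```

A plate-level acceptance rule would compare `plate_z` with a chosen cutoff; the cutoff applied in this study is not recoverable from the text, so none is hard-coded here.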

Case Study 2: RAR-Related Orphan Receptor γt


Homology Modeling: Antagonist and Agonist Conformations. At the time of this study, a crystal structure of the RORγt LBD was not available. As such, the homology modeling tool Prime¹⁰¹,¹⁰², from the Schrödinger Suite of Programs⁹⁹, was used to generate models of the antagonist and agonist conformations of the RORγt LBD. The RORα crystal structure (PDB ID: 1N83⁶⁸; 53% identity) was utilized as the template for the agonist conformation and the PPARα crystal structure (PDB ID: 1KKQ⁶⁹; 26% identity) for the antagonist conformation, using default parameters. The crystal structures were first prepared using the Protein Preparation Wizard within Maestro⁹⁹, including adding hydrogens, filling in missing side chains, optimizing hydrogen bonds and performing a restrained minimization of all protein atoms. Given that hormone receptor structures are well conserved across the family, that a number of conserved residues could be used to guide the sequence alignments, and that the sequence identity between RORγt and RORα is high for the agonist conformation, the resulting models are considered high quality, and the tertiary structure, secondary structure and side chain conformations of conserved residues are maintained.

Ligand Database Preparation. A virtual database was previously created from a portion of the Janssen corporate library using default LigPrep⁹⁹ parameters to generate tautomers, ionization states (using Epik¹⁰³,¹⁰⁴), and stereoisomers. The final virtual data set contained 1,510,503 unique compounds. At the time of the study, there were no known RORγt compounds for benchmarking.


High-Throughput Virtual Screen. The virtual dataset of ~1.5 million compounds was docked to both the agonist and antagonist homology models of RORγt using Glide HTVS⁴⁹,⁵⁰. Constraints were placed on the guanidinium groups of Arg 367 and Arg 364, and matching 1 of 2 of the hydrogen bonds was required. Analysis of the HTVS results was performed using Pipeline Pilot⁵⁵. All scored compounds were filtered using the following criteria: GlideScore of