Article
GPCR-Bench: A Benchmarking Set and Practitioners’ Guide for G Protein-Coupled Receptor Docking
Dahlia R Weiss, Andrea Bortolato, Ben Tehan, and Jonathan S. Mason
J. Chem. Inf. Model., Just Accepted Manuscript • DOI: 10.1021/acs.jcim.5b00660 • Publication Date (Web): 09 Mar 2016
Journal of Chemical Information and Modeling
GPCR-Bench: A Benchmarking Set and Practitioners’ Guide for G Protein-Coupled Receptor Docking Dahlia R Weiss, Andrea Bortolato, Benjamin Tehan and Jonathan S Mason* Heptares Therapeutics Ltd, BioPark, Broadwater Road, Welwyn Garden City, Herts, AL7 3AX, UK
* To whom correspondence should be addressed. Phone: +44(0)1707 358646; Fax: +44(0)1707 358640; E-mail: [email protected]
Keywords: GPCR, Molecular Docking, Virtual Screening
Abstract
Virtual screening is routinely used to discover new ligands, and in particular new ligand chemotypes, for G Protein-Coupled Receptors (GPCRs). To prepare for a virtual screen, we often tailor a docking protocol that will enable us to select the best candidates for further screening. To aid this, we created GPCR-Bench, a publicly available docking benchmarking set in the spirit of the DUD and DUD-E reference datasets for validation studies, containing 25 non-redundant high-resolution GPCR co-structures with an accompanying set of diverse ligands and computational decoy molecules for each target. Benchmarking sets are often used to compare docking protocols; however, it is important to evaluate docking methods not by ‘retrospective’ hit rates, but by the actual likelihood that they will produce novel prospective hits. Therefore, docking protocols must not only rank active molecules highly, they must also produce good poses that a chemist will select for purchase and screening. Currently, there is no simple, objective, machine-scriptable function that can do this; instead, docking hit lists must be subjectively examined in a consistent way to compare docking methods. We present here a case study highlighting considerations we feel are of importance when evaluating a method, intended to be useful as a practitioners’ guide.
Introduction
Virtual screening of G Protein-Coupled Receptors (GPCRs) is a routine and often very fruitful method for the discovery of new ligands1-4. The use of retrospective benchmarking sets of ligands and decoys to compare docking methods is well established5-10. A recent explosion in the number of GPCR structures11-15, as well as a constantly expanding body of ligand binding data, necessitates a periodic update of the molecular docking benchmarking sets we use to assess competing protocols for virtual screening. It is also important to have a practitioner’s view, evaluating different methods not in terms of whether actives are in the top percentile of the ranked hits, but in terms of whether those actives, as docked, would actually be picked by an experienced computational chemist, and how many different chemotypes/binding motifs are found. Molecules with the same chemotype can broadly be defined as having a general common scaffold and sharing chemical features important for binding. Even if it is possible to define the different chemotypes in a database using computational chemistry approaches, the exact classification depends on the protein target, the chemical space evaluated and the patent landscape. The final chemotype definition is often subjective and agreed within the project team. In an attempt to achieve an analysis as close as possible to a realistic scenario for the adenosine A2A receptor test case, on which we had previously worked using structure-based virtual screening to obtain hit structures1, we follow the same definitions of chemotypes/binding motifs and sub-types used in the project environment. We must normally evaluate only the top 1% or less, not the higher numbers sometimes used, because in a real-world scenario, if 500K compounds are screened, only about 1K compounds (or 0.2%) will be looked at by a human being.
Drilling deeper in a docking list, discussed later, is a possibility, but manual selection based on the evaluation of docked structures is critical to avoid large numbers of false positives. Chemotypes/binding motifs must be considered, as similar overall hit rates may result from the high enrichment of one chemotype or of multiple diverse chemotypes (the desired outcome). Finally, credible poses for the true ligands are of paramount importance, as in a real-world case only those active compounds with an acceptable binding pose will be picked and tested (together with false positives).
The importance of GPCRs as pharmaceutical targets warrants the creation of specialist benchmarking sets. This has been done many times, for example in the GPCR Decoy Database, or GDD16. The ligands in the GDD itself were based on the GLIDA database17, first constructed in 2006. The DUD-E dataset also contains the 5 GPCR targets for which there were high-resolution structures in 2012, when it was constructed. However, over the past 3 years the number of crystal structures has quintupled, with 25 non-redundant ligand binding GPCR high-resolution structures available as of January 2015 (the actual number of GPCR structures is higher, because several receptors have been co-crystallized with multiple ligands a number of times). Notably, structures are now available for several classes of GPCR, which should allow docking protocols to be evaluated/developed for each type of binding site (small molecule, lipid, peptide, etc.). We present here GPCR-Bench: an updated ligand and decoy benchmarking set for the 25 ligand binding GPCRs with high-resolution crystal structures as of January 2015.
Evaluation of docking and scoring protocols is a complicated task18. Clearly, the success or failure of a hit identification effort will depend largely on the ability of the protocol to correctly identify molecules that will bind to the receptor.
Protocols are often tailored for each receptor individually, and might be continually updated as a drug design campaign proceeds. The choice of background decoys is of critical importance, as the outcome of any comparison will depend on the decoys9. While enrichment factors and retrospective hit rates make comparison of methods very straightforward, they cannot truly
inform a ‘real-life’ virtual screen used to prospectively find hits. It is just as important to ask whether high-scoring hits would actually be chosen by a computational chemist for having a good, or at least acceptable, pose (i.e. the docking-based selection is right for the right reasons). Equally important is the enrichment of many diverse chemotypes with good poses. As shown by Good et al.19, enrichments based on chemotype space provide a more realistic measure of virtual screening success, and can provide a differentiation of methods that is masked by overall hit rates, particularly in active data sets containing large numbers of closely related analogues. Such categorizations into chemotypes based on substructure, and the selection of acceptable poses, inevitably contain elements of subjectivity, but reflect what happens in real-life prospective virtual screens. We present here a detailed case study of the adenosine A2A Receptor (A2AR) as a practical guide to evaluating docking and scoring protocols.
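As a rough illustration of why overall hit rates can mask differences in chemotype coverage, the following Python sketch compares two toy ranked lists that share the same enrichment factor in the top 1% but recover different numbers of chemotypes. It is not part of GPCR-Bench: the function names, the dictionary layout and the chemotype labels "A" to "D" are hypothetical, and the labels are assumed to have been assigned beforehand (for example by a project team, as discussed above).

```python
from collections import Counter


def top_fraction(ranked, fraction):
    """Return the best-ranked slice of a list (at least one entry)."""
    n_top = max(1, round(len(ranked) * fraction))
    return ranked[:n_top]


def enrichment_factor(ranked, fraction):
    """Hit rate in the top fraction divided by the hit rate of the whole list."""
    top = top_fraction(ranked, fraction)
    actives_total = sum(1 for m in ranked if m["active"])
    actives_top = sum(1 for m in top if m["active"])
    if actives_total == 0:
        return 0.0
    return (actives_top * len(ranked)) / (len(top) * actives_total)


def chemotypes_recovered(ranked, fraction):
    """Count the actives found in the top fraction, grouped by chemotype label."""
    top = top_fraction(ranked, fraction)
    return Counter(m["chemotype"] for m in top if m["active"])


def mol(active, chemotype=None):
    """Toy record for one docked molecule (hypothetical layout)."""
    return {"active": active, "chemotype": chemotype}


# Shared tail of both toy lists: 980 decoys, then 16 low-ranked actives.
tail = [mol(False)] * 980 + [mol(True, c) for c in "ABCD" for _ in range(4)]

# Four actives ranked first in each list: one chemotype versus four chemotypes.
single_chemotype = [mol(True, "A")] * 4 + tail
diverse_chemotypes = [mol(True, c) for c in "ABCD"] + tail

for name, ranked in [("single", single_chemotype), ("diverse", diverse_chemotypes)]:
    ef = enrichment_factor(ranked, 0.01)
    found = chemotypes_recovered(ranked, 0.01)
    print(name, "EF(1%) =", ef, "chemotypes in top 1% =", dict(found))
```

Both lists place four actives in the top 10 of 1000 molecules, so the enrichment factor alone cannot distinguish them; only the per-chemotype counts show that the second list recovers four chemotypes rather than one. Pose quality, which the text treats as equally important, is not captured by this sketch.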
Methods
Ligand Retrieval from ChEMBL19. Ligands were extracted for each target from the ChEMBL19 database, using an automated script based on the ChEMBL Web Service API. We aimed to automate the retrieval process as much as possible, so that the dataset could be updated periodically as new releases of ChEMBL become available. Only compounds with a bioactivity_type corresponding to Ki, Kd, IC50 or EC50 and with a value less than or equal to 10000 were stored. We chose 10000 as an upper limit; however, as described below, a more stringent cutoff was applied except where there was a paucity of available ligand data. High concentrations can introduce false positives via non-specific mechanisms such as compound aggregation22, and it is therefore preferable to use a stricter threshold where possible. Compounds with an assay_description matching PUBCHEM_BIOASSAY or DRUGMATRIX were not included, as these data sources led to many false positives in the set. Compounds were then filtered for ‘drug-likeness’, which included the following terms: Sulfur Count