Article
A Hybrid Approach to Structure and Function Modeling of G Protein-Coupled Receptors Dorota Latek, Marek W. Bajda, and Slawomir H. Filipek J. Chem. Inf. Model., Just Accepted Manuscript • DOI: 10.1021/acs.jcim.5b00451 • Publication Date (Web): 15 Mar 2016 Downloaded from http://pubs.acs.org on March 17, 2016
Journal of Chemical Information and Modeling is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.
Journal of Chemical Information and Modeling
A Hybrid Approach to Structure and Function Modeling of G Protein-Coupled Receptors Dorota Latek†*, Marek Bajda†‡, Sławomir Filipek†* †Faculty of Chemistry, Biological and Chemical Research Centre, University of Warsaw, Pasteura 1, 02-093 Warsaw, Poland ‡Department of Physicochemical Drug Analysis, Faculty of Pharmacy, Medical College, Jagiellonian University, Medyczna 9, 30-688 Cracow, Poland * To whom correspondence should be sent
ACS Paragon Plus Environment
ABSTRACT
The recent GPCR Dock 2013 assessment of the serotonin receptor 5-HT1B and 5-HT2B targets and the smoothened receptor (SMO) target exposed the strengths and weaknesses of currently used computational approaches. The 5-HT1B and 5-HT2B test cases demonstrated that both the receptor structure and the ligand binding mode can be predicted with atomic-detail accuracy as long as the target-template sequence similarity is relatively high. On the other hand, a low target-template sequence similarity, observed, e.g., between SMO from the frizzled GPCR family and members of the rhodopsin family, hampers GPCR structure prediction and ligand docking. Indeed, in GPCR Dock 2013, accurate prediction of the SMO target was still beyond the capabilities of most research groups. Another bottleneck in current GPCR research, as demonstrated by the 5-HT2B target, is the reliable prediction of global conformational changes induced by activation of GPCRs. In this work, we report details of the protocol we used during GPCR Dock 2013. Our structure prediction and ligand docking protocol was especially successful in the case of the 5-HT1B and 5-HT2B complexes with ergotamine, for which we provided one of the most accurate predictions. In addition to a description of the GPCR Dock 2013 results, we propose a novel hybrid computational methodology to improve GPCR structure and function prediction. This methodology employs two separate rankings for filtering GPCR models. The first ranking is ligand-based, while the second is based on the scoring scheme of the recently published BCL method. In this work, we show that the use of knowledge-based potentials implemented in BCL is an efficient way to cope with major bottlenecks in GPCR structure prediction. We thereby also demonstrate that knowledge-based potentials for membrane proteins have improved significantly owing to the recent surge in available experimental structures.
INTRODUCTION

G Protein-Coupled Receptors (GPCRs) constitute the largest superfamily of human membrane proteins. Due to their fundamental role in cellular signal transduction they are important drug targets. GPCR drugs exert either an agonistic or an antagonistic effect when interacting with a receptor. They inhibit or activate physiological processes associated with the nervous, endocrine and cardiovascular systems of the human body. GPCR drugs may also exhibit a modulatory effect when they bind to secondary, allosteric binding sites of the receptor.1, 2

The lack of crystal structures for the majority of G protein-coupled receptors and difficulties in capturing their fully-active or semi-active conformational states prompted the use of homology models in GPCR drug discovery. In principle, the highly conserved fold of GPCRs, comprising seven transmembrane α-helices, should guarantee accurate homology models and a high success rate in structure-based drug design as long as the target-template sequence identity is above 30%.3 However, despite the well-conserved fold, GPCRs often share sequence identities of less than 30%, and many studies4 have shown that the pharmacological effect of GPCR ligands is extremely sensitive to even slight changes in the receptor binding site. This is in contrast to other drug targets such as, for example, adenosine deaminase (ADA) or Factor Xa (FXa), the homology models of which often perform as well in virtual screening as a high-quality experimental structure.5, 6

Over the last five years, progress in crystallization methodologies for membrane proteins, and GPCRs in particular, has resulted in as many as twelve new experimental GPCR structures per year on average. Nevertheless, it will be impossible to determine experimental structures for more than 800 wild-type human GPCR receptors7, their disease-causing mutants, polymorphic variants and complexes with all small molecules of interest. Therefore, rather than diminishing, the role of comparative modeling of GPCRs and ligand docking is expected to increase over the coming years due to the availability of additional template structures.
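The 30% target-template sequence identity threshold mentioned above can be illustrated with a minimal sketch. It assumes a pairwise alignment is already available as two equal-length gapped strings and counts identity over columns where neither sequence has a gap. This denominator is only one of several conventions in use, and the function names and the example sequences are hypothetical, not taken from GPCRM.

```python
IDENTITY_THRESHOLD = 30.0  # percent, the rule-of-thumb value quoted in the text


def percent_identity(aligned_target: str, aligned_template: str) -> float:
    """Percent identity over aligned columns in which neither sequence has a gap.

    Assumption: '-' marks a gap and both strings come from the same alignment.
    """
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_target, aligned_template):
        if a == "-" or b == "-":
            continue  # skip columns containing a gap
        compared += 1
        if a.upper() == b.upper():
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


def above_threshold(aligned_target: str, aligned_template: str,
                    threshold: float = IDENTITY_THRESHOLD) -> bool:
    """True if the pair exceeds the identity threshold."""
    return percent_identity(aligned_target, aligned_template) > threshold


if __name__ == "__main__":
    # Toy alignment (hypothetical sequences): 4 of 5 ungapped columns match.
    print(percent_identity("ACDE-G", "ACDFHG"))
    print(above_threshold("AAAA", "AGGG"))
```

Other definitions divide by the alignment length or by the length of the shorter sequence, so a reported identity is only comparable across studies when the convention is stated.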
The importance of GPCR modeling and docking generates the need to develop new theoretical approaches and to evaluate their performance carefully. In this context, GPCR Dock is a critical assessment of the current state of the art of the computational methodology used in the prediction of both GPCR structures and ligand binding modes.3, 4 During GPCR Dock 2013, as group #4628, named ‘Warsaw’, we were ranked first among 44 participating groups in the 5-HT2B target category and were consistently ranked among the top groups in the remaining categories.3 Those successful results prompted us to describe our method in detail and to propose ways to further improve our results in future GPCR Dock assessments.

In GPCR Dock 2013 we used our recently developed GPCRM web service8 for target-template alignment, model building and subsequent loop refinement. GPCRM employs a profile-profile alignment, multiple structural templates and Z-coordinate-based filtering to build a GPCR model. In addition to the GPCRM methodology, we ranked models of the 5-HT1B, 5-HT2B and SMO receptors by their performance in ligand docking. Namely, we used the convergence of docking runs, general features of binding modes and the ChemPLP scoring function in GOLD9 to select GPCR models suitable for extra-precision ligand docking in Glide10. Final models of ligand-receptor complexes were supplied to the organizers of GPCR Dock 2013 for further evaluation.

Despite our successful GPCR Dock 2013 results for the 5-HT1B and 5-HT2B complexes with ergotamine, the ligand-based selection of models failed in the case of the SMO receptor (see Results). Moreover, as we found post factum, our ligand-based evaluation of GPCR models correlated only weakly with their similarity to the crystal structure as assessed by the RMSD value. Fan et al.6 drew the same conclusion after more extensive studies that also covered proteins from other families. They observed that the lowest-RMSD protein models do not always guarantee success in ligand docking.6 In this manuscript, we propose a simple improvement of our GPCR model selection by the use of BCL::Score11, which is a membrane-fitted knowledge-based energy
function implemented in the BCL::Fold program.12, 13 BCL::Fold simplifies membrane protein structures to the level of regular secondary structure elements (SSEs) such as α-helices and β-strands. The precise placement of such SSEs in three-dimensional space defines the overall topology of the majority of well-structured proteins, including the membrane protein class.12 Notably, such a reduction of interactions to the most stabilizing ones makes the BCL::Score energy landscape smooth and enables gradient-based minimizations.11

The small number of experimental structures of membrane proteins deposited in the Protein Data Bank imposes serious limitations on statistical analysis, and therefore there are only a few studies on membrane-fitted statistical potentials. Those few studies include, e.g., a membrane-oriented knowledge-based force field implemented in Rosetta membrane ab initio14 and other Rosetta structure prediction protocols15. Statistical potentials, based on Bayesian statistics and the Boltzmann equation, were also implemented in FILM316, which is a program for structure prediction of membrane proteins. In contrast to the scarcity of statistical potentials for structure prediction of membrane proteins, there are quite a few knowledge-based methods for membrane protein model quality assessment, e.g., ProQM17, and for the prediction of one-dimensional features such as transmembrane topology, e.g., TOPCONS18.

In this manuscript we demonstrate that our hybrid approach to GPCR complex structure prediction (see Fig. 1), involving both knowledge-based and ligand-based methods, helps to overcome the limitations of each of these groups of methods. For example, the ligand-based selection of models concentrates on the binding site and neglects the rest of the protein. On the other hand, the selection of models based on knowledge-based potentials derived for structure prediction improves the overall fold of a protein while neglecting the local arrangement of amino acids inside the binding site.
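The idea of filtering models with two separate rankings can be sketched as follows. This is only an illustration under stated assumptions, not the protocol used in this work: the rule for combining the rankings (keeping models that fall in the top N of both) is an assumption, as are the function names, the model names and the score values. A higher ligand-based score (e.g., a docking fitness) and a lower knowledge-based energy are both taken to mean a better model.

```python
from typing import Dict, List


def rank_models(scores: Dict[str, float], higher_is_better: bool) -> List[str]:
    """Return model names ordered from best to worst for one scoring scheme."""
    return sorted(scores, key=lambda name: scores[name], reverse=higher_is_better)


def hybrid_filter(ligand_scores: Dict[str, float],
                  knowledge_scores: Dict[str, float],
                  top_n: int) -> List[str]:
    """Keep models that are in the top_n of BOTH rankings (assumed rule).

    The result is ordered by the knowledge-based ranking.
    """
    ligand_top = set(rank_models(ligand_scores, higher_is_better=True)[:top_n])
    knowledge_rank = rank_models(knowledge_scores, higher_is_better=False)
    return [name for name in knowledge_rank[:top_n] if name in ligand_top]


if __name__ == "__main__":
    # Hypothetical scores for four models.
    ligand = {"m1": 80.0, "m2": 95.0, "m3": 60.0, "m4": 90.0}
    knowledge = {"m1": -120.0, "m2": -100.0, "m3": -150.0, "m4": -140.0}
    print(hybrid_filter(ligand, knowledge, top_n=2))
```

Taking the intersection means that a model with a well-scored binding site but a poor overall fold, or the reverse, is discarded, which mirrors the complementary weaknesses of the two groups of methods described above.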
In general, hybrid approaches to protein structure prediction are used when a protein structure is too large or too complex to be solved with a single computational or experimental
method.19 Hybrid approaches combine various theoretical methods in one modeling pipeline and are often used to cope with sparse, yet easy-to-obtain, experimental data from cross-linking, cryoEM, NMR, EPR or, for example, evolutionary constraints derived from high-throughput genome sequencing20. The biggest advantage of hybrid methods is their intrinsic ability to self-update and self-refine when additional structural data are acquired. Moreover, the diversity of computational methods and structural data used in hybrid approaches to structure prediction limits inaccuracies resulting from theoretical approximations or fuzziness of experimental data.21, 22

MATERIAL AND METHODS

Model building

To build models of the 5-HT1B, 5-HT2B and SMO receptors we used a standalone version of GPCRM8 supplemented with the Rosetta cluster application. In the standalone version of GPCRM a user can manually adjust the alignment and select the template set. Also, the number of GPCR models generated in a single MODELLER (GPCRM-MODELLER) or Rosetta (GPCRM-Rosetta) run can be increased for the sake of improving their accuracy. The standalone version of GPCRM is available on request from the authors of this manuscript, while the online version of GPCRM is available at http://gpcrm.biomodellab.eu.

To build a GPCR model, GPCRM selects a template (or templates) based on the target-template sequence similarity and the desired receptor activation state. Using that methodology in GPCR Dock 2013, we chose the beta-1 adrenergic receptor (β1AR) bound to a full agonist, carmoterol (PDB id: 2Y02),23 as a template structure to build 5-HT1B and 5-HT2B receptor models in a semi-active conformational state (the fully active conformational state is achieved only with a bound G protein or an antibody that mimics it). Here, only a single template structure was used due to the relatively high sequence identity, exceeding 30%, between both targets and the β1AR template. On the
contrary, to build a SMO receptor model in an inactive conformational state, we used a multiple template set (see Supporting Information Table S3) due to the extremely low sequence identity (