Prediction of Loops in G Protein-Coupled Receptor ... - ACS Publications

Mar 15, 2016 - IITB−Monash Research Academy, IIT Bombay, Mumbai 400076, India ... of these errors on loop modeling using ICM High Precision Sampling...
0 downloads 0 Views 492KB Size
Subscriber access provided by MAHIDOL UNIVERSITY (UniNet)

Article

Prediction of Loops in GPCR Homology Models: Effect of Imprecise Surroundings and Constraints. Bhumika Arora, Thomas Coudrat, Denise Wootten, Arthur Christopoulos, Santosh B Noronha, and Patrick M. Sexton J. Chem. Inf. Model., Just Accepted Manuscript • DOI: 10.1021/acs.jcim.5b00554 • Publication Date (Web): 15 Mar 2016 Downloaded from http://pubs.acs.org on March 16, 2016

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a free service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are accessible to all readers and citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

Journal of Chemical Information and Modeling is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 44

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

JCIM ci-2015-00554v

Prediction of Loops in GPCR Homology Models: Effect of Imprecise Surroundings and Constraints. AUTHOR NAMES. Bhumika Arora 1,2,3, Thomas Coudrat 4, Denise Wootten 4, Arthur Christopoulos 4, Santosh B. Noronha* 1, and Patrick M. Sexton* 4.

AUTHOR ADDRESSES 1 Department of Chemical Engineering, IIT Bombay, Mumbai, 400076, India. 2 Department of Pharmacology, Monash University, Clayton, Victoria, 3800, Australia. 3 IITB-Monash Research Academy, IIT Bombay, Mumbai, 400076, India. 4 Drug Discovery Biology, Monash Institute of Pharmaceutical Sciences and Department of Pharmacology, Monash University, Parkville, Victoria, 3052, Australia.

KEYWORDS. GPCRs, GPCR structure models, loop structure prediction, homology modeling, loop anchors, loop anchor positions in models, loop modeling.

ACS Paragon Plus Environment

1

Journal of Chemical Information and Modeling

Page 2 of 44

JCIM ci-2015-00554v

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

ABSTRACT. In the present study, we explored the extent to which inaccuracies inherent in homology models of the transmembrane helical cores of GPCRs could impact on loop prediction. We demonstrate that loop prediction in homology models is much more difficult than loop reconstruction in crystal structures, owing to the imprecise positioning of loop anchors. Deriving information from 17 recently available GPCR crystal structures, we estimated all possible errors that could occur in loop anchors as the result of comparative modeling. Subsequently, we performed an exhaustive analysis to decipher the effect of these errors on loop modeling using ICM high precision sampling. The influence of the presence of other extracellular loops was also explored. Our results reveal that the error space of modeled loop residues is much larger than that of the anchor residues, however, modeling a particular extracellular loop, in the presence of other extracellular loops provides constraints that help in predicting near-native loop conformations observed in crystal structures. This implies that errors in loop anchor positions introduce increased uncertainty in the modeled loop coordinates. Therefore, for the success of any GPCR structure prediction algorithm, minimizing errors in the helical end points is likely to be critical for successful loop modeling.

INTRODUCTION G protein-coupled receptors (GPCRs) constitute a large family of membrane proteins that are involved in a vast variety of physiological functions.1 They form the largest group of potential drug targets, with >30% of the current therapeutic compounds acting through these GPCRs.2 Thus, knowledge of their three dimensional structure is important for rational drug design and understanding basic principles of molecular recognition. However, as with other membrane proteins, the elucidation of their structure has been extremely difficult. The limited availability of experimental structures for GPCRs has led to a need for alternate methods to derive high-

ACS Paragon Plus Environment

2

Page 3 of 44

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

JCIM ci-2015-00554v

resolution structural information. As such, various structure prediction efforts involving methods ranging from de novo to homology-based approaches have been applied to GPCRs.3 Homology or comparative modeling is a method by which a 3D model of a protein with unknown structure (target) can be built from a homolog protein with which it shares a significant sequence similarity and whose structure is known (template).4 GPCRs share a common architecture comprising seven transmembrane helices connected by alternating intra and extracellular loops. As this architecture is conserved, homology modeling is a preferred approach for modeling the trans-membrane helical segments of GPCRs.5, 6 Nonetheless, it has limitations in application to overall GPCR structure prediction due to the relative paucity of template structures, and high variance in loop regions of currently solved crystal structures. Moreover, these homology models are most closely related to the template structure and therefore, they can only be considered low resolution approximates of the target protein structure. The extracellular loops of GPCRs play a significant role in ligand binding by facilitating access to binding sites and/or making important interactions with the ligands.7-11 Thus, the structure and conformation of these loops need to be predicted with high precision. The loop prediction problem can be formulated as generation and identification of near-native loop conformation(s) given the structure of the rest of the protein; that structure could be in the form of exact experimental coordinates or more commonly, an inexact model. The aim is to identify nearnative conformations for a given loop that fit the two anchor regions. Here anchors are defined as the backbone atoms of the protein core that precede and follow the loop, but are not part of it.4 Similar to other classes of protein, the loops are the most variable regions of GPCR sequences and structures. This variability leads to complexity in loop modeling rendering homology-based methods alone largely unsuitable.

ACS Paragon Plus Environment

3

Journal of Chemical Information and Modeling

Page 4 of 44

JCIM ci-2015-00554v

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Lack of structural homology in loop regions of the currently known GPCR structures indicates that the ability to generate accurate loop structures from ab-initio principles would be beneficial to the field, but this is again a very challenging task in itself mainly because of two reasons: 1) The preferred conformation of loops often remains unknown even when the rest of the protein structure is resolved at high resolution. This is due to the high flexibility of loops that is often related to their function. Also, the conformational space of a loop increases exponentially with its length. 2) Any modeling technique used for transmembrane domain (TM) helical region structure prediction will result in some inaccuracies; as such, there will be errors in the helical ends (loop anchors), which could drastically affect the subsequent loop modeling. During the last few decades, a variety of loop prediction tools have been proposed. These methods can be basically categorized into ab initio techniques and database search algorithms (knowledge-based methods). Moreover, some of the prediction methods utilize elements of both these approaches. Database search methods build loops by searching a large database of known loop structures for ideal loop candidates, based on geometric fitness criteria. SuperLooper is an example of this method that uses the LIP (Loop In Protein)12 and LIMP (Loop In Membrane Proteins) databases to search for template loop fragments.13 On the other hand, ab initio folding techniques explore all, or most, probable theoretical conformations of the candidate loops and select the loop conformation based on the minimum energy value or scoring function. Approaches that have been developed for exploring conformational space can further be divided into two broad categories viz. (1) geometric and probabilistic sampling; and (2) kinematic loop closure. The former is implemented in programs including MODELLER4,

14, 15

, ICM16-18, and

PLOP19, 20. Cyclic coordinate descent21, random tweak22-24 and analytical loop closure25-27 are

ACS Paragon Plus Environment

4

Page 5 of 44

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

JCIM ci-2015-00554v

examples of methods that follow the kinematic loop closure. Cyclic coordinate descent21 and analytical loop closure28 have been implemented in Rosetta29, 30. Despite much progress, none of these methods have been broadly tested for GPCR loops except for the recent works of Goldfeld et al.31,36 (using PLOP) and that of Nikiforovich et al.32. The Nikiforovich et al. method involved geometrical sampling of the space between loop anchors to select the potential loop closing conformations, which were subsequently energy minimized using the ECECPP/233, 34 force field. Moreover, most of the methods described above have been tested for reconstruction of loop conformations in proteins where the starting point is a native crystal structure. However, predicting loops in homology models is quite different from restoring loops in native crystal structures due to the structural inaccuracies present in homology models. Homology modeling has been widely used for modeling the helical segments of GPCRs because of the sequence conservation in these regions within receptor sub-classes. But as homology modeling is template-dependent, the model produced will closely resemble the template structure. Thus, in realistic scenarios of loop prediction in GPCR structure models, the conformation of the transmembrane core may still contain significant structural inaccuracies. This implies that terminal attachment points of loops on the protein core model would also have deviations from their crystal structure positions and orientations. In this regard, the sensitivity of the loop modeling to these errors in loop anchors needs to be estimated and this is the aim of the present study. Although some work has been done by Fiser et al.14 and Misura and Baker35, where loops were predicted in distorted starting structures, this was not applied to GPCR loops. Goldfeld et al. in their most recent work36 have examined their method of loop structure prediction on a homology

ACS Paragon Plus Environment

5

Journal of Chemical Information and Modeling

Page 6 of 44

JCIM ci-2015-00554v

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

model of β2AR built using the β1AR as the template; a very close homolog of the β2AR and thus represents the best case scenario. Nonetheless, this method performed well, even for extended loops (>10 amino acids). In the current work we have, for the first time, completed a comprehensive analysis of the effect of inaccuracies in loop anchors, on loop modeling. The present study estimates the range of errors that would occur in loop anchors of family A GPCRs as a result of homology modeling and analyses their effect on loop structure prediction. A previously published loop modeling algorithm, high precision sampling of loops,17,

18

implemented in the ICM modeling suite, has been adopted as the loop modeling method for this study. It uses a biased probability Monte Carlo search procedure for conformational sampling of loops.17 An internal coordinate mechanics force field supplemented with a solvent-accessible surface area solvation model optimized specifically for protein loops is used as the energy function.18 Loop prediction results obtained using this method were reported to be significantly better than or comparable to those achieved by other loop modeling methods.18 MATERIALS AND METHODS In a naïve homology model of TM helices, the backbone structure is the same as that of the aligned segment of the template structure. Therefore, the range of errors produced in the ends of TM helices by homology modeling can be estimated by finding the deviations between the corresponding helical end residues of various GPCR crystal structures. In this study, one of the GPCRs whose crystal structure is known was taken as the protein to be modeled (target) and all other GPCRs with known structures were used as the templates. All the templates were superimposed on the target crystal structure and the Cα distances of the helical end residues of this target GPCR crystal structure, from aligned residues of all other structures (templates), were calculated. Considering that the backbone coordinates of TM helices of a

ACS Paragon Plus Environment

6

Page 7 of 44

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

JCIM ci-2015-00554v

homology model will be the same as those of the aligned segment of template structure, these distances provide an estimate of errors that may arise in the loop anchors as the result of homology modeling. Materials used 1) GPCR structures used: The inverse agonist (Carazolol) bound crystal structure of the β2adrenergic receptor (β2AR) (PDB ID: 2RH1) was used as the target. All other GPCRs (16 in number), whose crystal structures were known at the time of this study, were taken as the templates (Table 1). For this study, crystal structures of all available class A GPCRs in their inactive conformations have been used (except for the rat neurotensin receptor NTSR1). These structures were either bound to inverse agonists that reduce the basal activity or to neutral antagonists that maintain basal activity. 2RH1 was selected as the target as this structure has no missing residues in the loop regions, the helical and loop regions are properly and precisely annotated and it has the highest resolution structure available for the β2AR (Resolution = 2.4 Å). Ligands and T4-lysozyme wherever present were removed from all the structures used. 2) Loop modeling method used: ICM High Precision Sampling of loops (HPS)18 has been used as the loop modeling method for the present study. It uses a biased probability Monte Carlo method for conformational sampling.17 3) Supercomputing facility used: Victorian Life Sciences Computation Initiative (VLSCI) clusters were used for completion of the study. Protocol used

ACS Paragon Plus Environment

7

Journal of Chemical Information and Modeling

Page 8 of 44

JCIM ci-2015-00554v

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

First of all, the performance of ICM HPS in modeling GPCR loops in crystal structures was analyzed. Certain parameters were required to be optimized in the process (see below). Then, the range of errors produced by homology modeling in loop anchors was estimated. Finally, the effect of these errors on loop modeling was analyzed. For this, these errors were built into the loop anchors and loop modeling was performed using the perturbed ends. 1) ICM HPS performance analysis for GPCR loops For this analysis, ICM HPS loop modeling was performed for all three extracellular loops (ECLs) of representative members of the 4 subgroups of Class A GPCRs; 2RH1 (subgroup α), 4GRV (subgroup β), 3ODU (subgroup γ) and 3VW7 (subgroup δ), with loop length and numbering detailed in Table 2. In the case of the second extracellular loop, only its distal portion was modeled i.e. distal ECL2: residues 192-196 (5 residues) in 2RH1. [As per our terminology, using 2RH1 as the exemplar, the conserved cysteine of ECL2 (Cys191) that forms the disulfide bond with 3rd transmembrane helix (Cys1063.25), divides the ECL2 into proximal and distal parts. As such, the region of ECL2 running from helix 4 up to this cysteine (Cys191) is defined as the proximal portion and the region beyond this up until the top of helix 5 as the distal portion]. For each of the three loops, modeling was done in the presence and absence of other loops. For modeling in the absence of other loops, the coordinates of loops other than the one to be modeled were deleted from the PDB file. Energy and RMSD (from the crystal structure) of the lowest energy conformations were used as the measure of performance. As ICM HPS loop modeling uses a stochastic, biased probability algorithm, different loop modeling repeats may produce different lowest energy conformations. However, with sufficient sampling the majority of runs converges towards the same or similar lowest energy conformations. In ICM HPS this is achieved by optimizing the “thoroughness” parameter that

ACS Paragon Plus Environment

8

Page 9 of 44

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

JCIM ci-2015-00554v

defines the simulation length. Therefore, by increasing the value of the thoroughness parameter, convergence in terms of (similar) lowest energy conformations can be achieved; the lowest thoroughness value to achieve convergence is thus computationally optimal. 10 repeats of the ICM HPS loop modeling were performed for each of the 3 three loops of each receptor using different thoroughness values i.e. 1, 5, 7, 10, 15, 20 and 50 (Supporting Information). Although improvements in convergence could be seen for individual loops by increasing thoroughness from 1, no significant improvements in convergence or energy score was achieved with thoroughness values higher than 10 (Supporting Information Table S1, figures S1-S3), however, the optimal thoroughness varied across receptors and ECLs. Where there was no convergence, generally 2 populations of conformations were evident, with the higher precision solution more prevalent with thoroughness parameters ≥5 (Supporting Information Table S1). It should also to be noted that as the thoroughness value defines the simulation length, the computational time taken for modeling a particular loop increases proportionally with it. Thus, looking to the tradeoff between time taken and number of similar lowest energy conformations generated per 10 repeats, the thoroughness value of 10 was considered optimum for modeling of 2RH1 loops that was used as the template for anchor point perturbation. 2) Estimating the range of errors produced in loop anchors by homology modeling Residues in different GPCRs that are positionally equivalent to the helical top residues (loop anchors) of 2RH1 were found. Subsequently, deviations in the position of Cα atoms of the helical top residues of 2RH1 from those of the corresponding (positionally equivalent) residues in other GPCRs were identified by structural superimposition of receptors onto the 2RH1 structure (Figure 1).

ACS Paragon Plus Environment

9

Journal of Chemical Information and Modeling

Page 10 of 44

JCIM ci-2015-00554v

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

These deviations were represented by covariance matrices, where the coordinates of the Cα atoms of the helical top residues of the 2RH1 crystal structure represent the population mean. In order to map the loop anchor spacing distributions as well, these covariance matrices were calculated for the paired loop anchors (see Supporting Information Tables S2-S5). The spread around the population mean, as depicted by the covariance matrix, gives an estimate of the errors that may occur in loop anchors as the result of homology modeling.

The positionally equivalent residues were identified by structure-based multiple sequence alignment. For this: (i) Sequences of the 17 GPCRs used in the study were aligned using the multiple sequence alignment tool of ICM modeling suite. The multiple sequence alignment was done using the ZEGA (Zero End-gap Global Alignment) algorithm with the Gonnet substitution matrix37 and the optimal gap open and gap extension penalties of 2.4 and 0.15 respectively;38 (ii) The helical boundaries of the 2RH1 sequence were adopted from Cherezov et al.39; (iii) The multiple sequence alignment was then manually edited, so that there were no gaps within these boundaries in any of the sequences and the most conserved residues and motifs in each helical segment were aligned. In the resulting alignment, residues of the other GPCRs that were aligned with the helical boundaries of 2RH1 are referred to as their positional equivalents. The final multiple sequence alignment that defines the positionally equivalent residues (for one of the 7 helical segments) is shown in Figure 2. The helical boundaries and the most conserved residue in each helix are represented by the Ballesteros-Weinstein numbering scheme40. The positionally equivalent residues of all the 6 loop anchors of 2RH1, thus obtained, are shown in Table 3. 3) Analysis of the effect of these anchor residue errors on loop modeling using ICM HPS

ACS Paragon Plus Environment

10

Page 11 of 44

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

JCIM ci-2015-00554v

a) On the basis of the covariance matrices calculated in section (2), 100 anchor perturbations were generated for each of the 3 loops of 2RH1 studied. For this, according to the respective paired covariance matrix, all the atoms of 2 consecutive residues (1 residue of loop anchor + 1 residue of loop end) on each end of the loop were perturbed. The rest of the protein geometry was kept the same as that of the crystal conformation. b) 10 repeats of the ICM HPS loop modeling were then performed for each of these 100 perturbations. This was completed both in the presence and absence of other loops. RESULTS AND DISCUSSION Estimating the range of errors produced in loop anchors by homology modeling As the recently available GPCR crystal structures span all four of the phylogenetic groups of the family A GPCRs i.e. alpha, beta, gamma and delta,41 they make a representative dataset of all types of structural variations that occur among the members of this subfamily. Therefore, deriving information from 17 recently available GPCR crystal structures, we estimated all possible errors that could occur in GPCR loop anchors as the result of comparative modeling. These errors are presented in the form of paired covariance matrices involving both anchors of a loop (see Supporting Information). Cα coordinates of the helical top residues in 2RH1 and their positional equivalents in the 16 other GPCRs, obtained from superimposing the 16 template GPCRs on the target 2RH1, are shown in Figure 3. Spacing between the helical tops (i.e. Cα distances between each of the three pairs of loop anchor residues) in 2RH1 and the corresponding distances between their positional equivalents in the other 16 GPCRs are shown in Table 4. It was seen that the difference in loop anchor spacing between different GPCR structures varies independently for each loop. For example, the M2 and M3 muscarinic receptors show a large

ACS Paragon Plus Environment

11

Journal of Chemical Information and Modeling

Page 12 of 44

JCIM ci-2015-00554v

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

deviation in the distance between helix 4-5 tops (3.6 and 4.0 Å, respectively) and a small deviation in distances between helix 2-3 tops (0.26 and -0.23 Å, respectively) and helix 6-7 tops (0.27 and 0.20 Å, respectively) when compared with those of 2RH1 (Table 4). Thus use of multiple templates i.e. different templates for different pairs of helices could produce better homology models. It was also noted that even though the opioid receptors belong to a different phylogenetic group (gamma group) of family A GPCRs than 2RH1 (beta group), they show very little deviation in loop anchor spacing from those of 2RH1. However, the CXCR4 chemokine receptor that also belongs to the gamma group shows much larger deviations (Table 4). This diversity could be attributed to differences in the lengths of helices, presence of helical kinks and other deformations of TM helices like bulges, as has been noted previously by Goldberg and Colleagues36. Collectively this illustrates that even the similar overall sequence similarity across templates can lead to different extent of errors in loop anchors; additional knowledge of the common features between target and template may provide a more rational guide to choosing an optimal template. As the TM helices are the most conserved regions in GPCRs, for the present study we have assumed that there are no gaps in the helical segments within the multiple sequence alignment. However, this may not always be the case. For example, the presence or absence of kinks and bulges may occur due to insertion and deletions in these helical regions that may not be captured by automated multiple sequence alignment (MSA) programs. Multiple sequence alignment, thus, often requires manual editing that is very user-dependent. Since the users, generally, do not have a priori knowledge of the position of kinks and bulges, the homology models are quite similar to the template. In this regard, we use the rational approach of deducing errors inherent to GPCR

ACS Paragon Plus Environment

12

Page 13 of 44

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

JCIM ci-2015-00554v

model helical ends by comparing helical ends of the target and their positional equivalents in other GPCR structures. The Cα coordinates of the helical tops of 2RH1 and their positional equivalents in the 16 other GPCR structures provide us with a representative dataset of all types of errors that could occur in the loop anchors of 2RH1 models created using other family A GPCRs as templates, and also of the errors that could occur in other family A GPCR models using 2RH1 as template. Here, we represented these errors in the form of paired covariance matrices involving both anchors of a loop. These covariance matrices were calculated assuming that the loop anchor coordinates of models from various templates (other GPCRs coordinates) follow a normal distribution with their crystal conformation coordinates (2RH1 coordinates) as the population mean. These matrices are significant as their extrapolation provides a rational estimate of all possible errors that could occur in the loop anchors of homology models of family A GPCRs.

ICM HPS performance analysis for GPCR loops As ICM HPS loop modeling is stochastic, different repeats of the loop modeling may produce different lowest energy conformations. However, with sufficient sampling the majority of runs should result in the same or similar lowest energy conformations. This was achieved by optimizing the “thoroughness” parameter that defines the simulation length for representative examples of each Class A GPCR subgroup. Detailed analysis of optimization of the thoroughness parameter is presented in Supporting Information Table S1). In almost all cases, increasing the thoroughness to 5 improved convergence, although, in some cases 2 populations of solutions were observed; where this occurred the lower RMSD solution was found in the most abundant population. For 2RH1, a thoroughness value of 10 was determined to be optimal

ACS Paragon Plus Environment

13

Journal of Chemical Information and Modeling

Page 14 of 44

JCIM ci-2015-00554v

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

(Supporting Information). The results of HPS loop modeling at this value are detailed in Table 2 and the loop conformations generated are shown in Figure 4. In general, ICM HPS performed well in predicting the near-native conformations ECLs with the outcome dependent upon both length of the modeled loop and the extent to which other loops constrained the conformational sampling available to the individual loop. For ECL1 (3/4 GPCRs) and distal ECL2 (2/4 GPCRs) high precision was observed when the rest of the protein was in crystal conformation and other loops were present. ECL3 and loops longer than 6 amino acids performed less well overall. For the exemplar 2RH1, the median backbone RMSDs of the lowest energy conformations of ECL1 and distal ECL2 obtained were 0.42 Å and 0.49 Å respectively, whereas for ECL3 it was 3.71 Å. For estimating the significance of these results in terms of experimental data, we calculated the backbone RMSDs of the crystal conformations of these loops in 2RH1 from those in other available structures of the β2AR. These RMSDs are listed in Table 5. The superposition of various β2AR structures on 2RH1 is shown in Figure 5. From the RMSDs listed in table 5 and the overlay of ECL1 and distal ECL2 loops of different β2AR crystal structures, it is evident that these 2 loops are constrained to nearly the same conformations in the structures bound to different ligands and/or present in different activation states. These RMSDs for ECL1 and distal ECL2 are comparable to the ones achieved by ICM HPS. Thus ICM HPS is efficient in predicting the near-native conformations of these loops. The inaccurate prediction of ECL3 can to some extent be attributed to the flexibility of this loop, as is evident from the variation seen in its conformation across different crystal structures of β2AR (Figure 5), and this is likely to be true for loops in the other GPCRs where RMSD values of >1 were observed (Table 2).

ACS Paragon Plus Environment

14

Page 15 of 44

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Information and Modeling

JCIM ci-2015-00554v

ECL2 in β2ARs is constrained by the presence of two disulphide bridges and a small helix and thus has similar conformations across all structures. ECL1 is short and interacts with the large ECL2, consequently its conformational space is reduced and it is also found in nearly identical conformations in all the structures of the β2AR. ECL3, in all 4 receptors, is away from other two loops and does not interact with them. This likely contributes to the flexibility of ECL3 seen across different β2AR structures (Figure 5), and to the relatively poor RMSD. In all cases, except for the very short (4 amino acids) dECL2 of 4GRV, removal of other loop constraints negatively impacted on RMSD for loops modeled with high precision in the presence of other loops (Table 2). However, where limited interaction with other loops occurred, or for shorter loops, the loop prediction precision was not markedly affected, with the longer loops having RMSDs of >1.5 Å both in the presence and absence of other loops (Table 2). For the exemplar 2RH1, in the absence of other loops, the median backbone RMSDs of the lowest energy conformations of ECL1, distal ECL2 and ECL3 as predicted by ICM HPS were 3.77 Å , 1.75 Å and 3.72 Å, respectively. These higher RMSDs for ECL1 and distal ECL2 from those when modeled in presence of other loops, clearly indicate that the presence of nearby loops, due to steric hindrance or the interactions with them, limits the conformational space to be searched. Thus, the neighboring loops act as constraints and help in predicting the near-native conformations. As the location of ECL3 is away from the other two loops, its prediction accuracy is not affected by the presence or absence of other loops. These results highlight another major issue in loop prediction in real world GPCR modeling; the conformations of other loops are also unknown and this adds to the difficulty of the problem. In this regard, when modeling loops in GPCRs, one needs to carefully consider which loop is to be modeled first. Our results provide some insight into this: the loops that are shorter in length and

ACS Paragon Plus Environment

15

Journal of Chemical Information and Modeling

Page 16 of 44

JCIM ci-2015-00554v

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

constrained by interactions from the non-loop regions of the protein should be modeled first. These shorter loops once built in place will constrain the large conformational space of the longer loops. Once all the loops are built, the process could be iterated to refine the conformations. For each of the cases of loop modeling by ICM i.e. 3 loops and each modeled in the presence and absence of other loops, it was seen that in the stack of conformations generated for a loop, the near-native conformations (RMSD