A universal activation index for class A GPCRs

Computational Biochemistry

Passainte Ibrahim, David Wifling, and Timothy Clark
J. Chem. Inf. Model., Just Accepted Manuscript • DOI: 10.1021/acs.jcim.9b00604 • Publication Date (Web): 26 Aug 2019

Journal of Chemical Information and Modeling

Passainte Ibrahim (a), David Wifling (b) and Timothy Clark (a,*)

(a) Computer-Chemie-Centrum, Department of Chemistry and Pharmacy, Friedrich-Alexander University Erlangen-Nürnberg, Nägelsbachstraße 25, 91052 Erlangen, Germany.

(b) Pharmaceutical/Medicinal Chemistry II, Institute of Pharmacy, University of Regensburg, Universitätsstraße 31, D-93040 Regensburg, Germany.

(*) [email protected]

Abstract: An index of the activation of Class A G protein coupled receptors (GPCRs) has been trained using inter-helix distances from a series of µsec molecular-dynamics simulations and tested for 268 published X-ray structures. In a three-class model that includes intermediate structures, 63% of the active structures are classified in agreement with the experimental assignment, 81% of the intermediate structures and 89% of the inactives. An alternative two-state model classifies 94% of the actives and 99% of the inactives correctly. The intermediate structures are distributed 2:1 between actives and inactives. X-ray structures with protein nanobodies give good agreement between the assigned activation state and the predictions of the model, whereby many active nanobody structures are predicted to be weakly active. The five inter-helix Cα-Cα distances that occur in the model relate clearly to the established activation mechanism. The model is available as a Python script or via an interactive web page. It can thus be used to classify both experimental and computational GPCR structures.

Introduction

The experimental identification of the structural changes that occur in G-protein coupled receptors (GPCRs) on activation represents a major achievement of modern structural biology [1,2]. Once an experimental activated structure became available, it was possible to simulate the deactivation process by multi-microsecond molecular-dynamics simulations [3], at that time a very significant achievement. However, simulation methodology has progressed remarkably in the past five years. It is now possible to use massively parallel conventional supercomputers combined with metadynamics [4] enhanced-sampling techniques to investigate active and inactive binary (receptor-ligand or receptor-intracellular binding partner, IBP) or ternary (receptor-ligand-IBP) GPCR complexes [5,6] computationally. Data on GPCR structures in the context of the activation of the receptor are thus becoming increasingly available, so that we can now construct a model for an activation index based on key geometrical data. Such an index should be applicable to both X-ray structures and simulation trajectories. It should also be easily calculated, so that it can be used for time-dependent analyses of simulation trajectories.

The problem is that it is not certain that the available data are sufficient to build a fully validated and general model. Indeed, it is not certain that a universal activation mechanism exists for GPCRs, so that the success or failure of a model for a wide variety of receptors also represents a valuable result. We also aim to build a quantitative, rather than binary, model. Such a model cannot be trained on X-ray structures, which have largely been classified as active or inactive. Although some “intermediate” structures are available, we would need to assign arbitrary partial activity values in order to train a model. It is important to emphasize here that no activity values or even accurate classifications are available for the X-ray structures. The assumption is always implicit that the activity state of the crystallized protein corresponds to that of the corresponding biological system. This is not necessarily always the case. Furthermore, choosing
to construct an activity model from simulation data alone and then test it on X-ray structures provides a valuable validation for both sources of structures and demonstrates their mutual consistency.

A valuable series of data for a quantitative model exists: we have recently investigated [7] human and mouse variants of the histamine 4 receptor (hH4R and mH4R, respectively) together with four hH4R mutants. The significance of these systems is that they display varying degrees of basal (constitutive) activity (i.e. activity in the absence of an activating ligand), so that the data are suitable for constructing a quantitative activity model based on geometrical parameters from the simulations. H4R is to our knowledge the only receptor for which such activity data are available. At the beginning of the project, we did not know if H4R is generic enough to serve as a model for Class A GPCRs, but the success of the A100 model reported below suggests that it is.

We used these data to construct a local activity model for H4R [7], but data on a single receptor alone are not sufficient for constructing a generally applicable model. We have therefore used data from a further ten published simulations [5,6,8], for which binary active/inactive assignments are possible. We previously used these simulations to investigate the similarities and differences between GPCR ternary (agonist-GPCR-IBP) complexes with G-protein and β-arrestin IBPs [5,6,8]. However, they contain far more information than can be described in a single publication [9]. We now use the last 500 ns of each simulation as additional input data for model building. Because there is no direct way to compare activities between different receptors, we used a simple two-state model for receptors other than H4R, so that active receptors were assigned the value 100 and inactive 0. This represents an acceptable approximation in building a quantitative model as it extends the range of the parameterization systems to make the model more general. Nonetheless, because we do not
have accurate activity values relative to hH4R for these receptors, the use of 0 and 100 for inactive and active, respectively, is an approximation that cannot be avoided.

As an independent validation of the model, we analyzed the available X-ray crystallographic structures for active and inactive GPCRs in order to compare the activation criteria obtained from the simulations with activation states assigned from the crystallographic data.

Training data

Alignment

UCSF Chimera [10] was used to align and superimpose the sequences of 16 previously published simulations (see Table 1) of nine receptors for comparison and model building. The multiple superposition was carried out by treating each transmembrane helix separately, using the wild type human H4R as the reference structure to which the other structures are matched in a pairwise manner. Additionally, we have added columns that show highly conserved residues, allowing each transmembrane helix to be aligned with respect to the most conserved amino acid residue. The quality of the overlay is evaluated by the corresponding root-mean-square deviation (RMSD) and the number of residues paired (N); a lower RMSD and a higher N both indicate a better superposition. A structure-based sequence alignment was then generated using the standard Needleman-Wunsch algorithm [11] with percent identity scoring. However, the best-matching chains are identified based on additional alignment scores, including 70% weighting of residue similarity (BLOSUM-62 substitution matrix [12]) and 30% weighting of secondary-structure elements.

The fit further excludes gap insertion into helices. Conformationally unrelated regions such as residue pairs far apart in space, flexible intracellular and extracellular loops and termini are also excluded, even when they are aligned in their respective sequences. The settings chosen proved to perform well, especially when proteins of different length are homologous or have common active site features but share low percent identity. The training data include three cases, the μ opioid receptor, the muscarinic M2 receptor and the β2-adrenergic receptor, which were simulated as ternary and binary complexes bound to the same ligand (BU72, iperoxo and adrenaline, respectively; see Table 1). We used visual inspection to validate the choice of secondary structure scoring by comparing the superposition of binding-site residues of each of these receptor complexes with and without the score. This assessment suggests that none of the binding-site residues is correctly superposed in space without secondary structure scoring. The complete alignment process is defined in the script included in the Supporting Information.

After elimination of loops and termini, the alignment used 213 amino-acid residues in each of the systems used. They are divided as follows: 30 in TM1, 30 in TM2, 36 in TM3, 26 in TM4, 32 in TM5, 34 in TM6 and 25 in TM7. The alignment for the nine unique receptors listed in Table 1 is shown in Figure S1 of the SI. Consequently, 194 distances between adjacent Cα-atoms of these residues were measured and finally used for the training set.

Table 1: Structures used to construct the model. Structures were sampled for 500 ns at the end of each simulation. For complexes, only systems with both an agonist ligand and a G-protein are classified as active.

Receptor | Mutant | Ligand | Ligand function | IBP | Activity | Ref.
--- | --- | --- | --- | --- | --- | ---
hH4R | wild type | - | - | - | 100 | 7
hH4R | S179M | - | - | - | 86 | 7
hH4R | F169V | - | - | - | 45 | 7
hH4R | F169V+S179M | - | - | - | 20 | 7
hH4R | F168A | - | - | - | 0 | 7
mH4R | wild type | - | - | - | 0 | 7
ADRB2 | wild type | adrenaline | agonist | Gαs | (100) | 8
M2R | wild type | iperoxo | agonist | Gαi | (100) | 8
µOR | wild type | BU72 | agonist | Gαi | (100) | 8
ADRB2 | wild type | alprenolol | antagonist | β-arrestin | (0) | 6
ADRB2 | wild type | carvedilol | antagonist | β-arrestin | (0) | 6
ADRB2 | wild type | isoprenaline | antagonist | β-arrestin | (0) | 6
ADRB2 | wild type | ICI 118,551 | antagonist | β-arrestin | (0) | 6
ADRB2 | wild type | adrenaline | agonist | - | (0) | 5
M2R | wild type | iperoxo | agonist | - | (0) | 5
µOR | wild type | BU72 | agonist | - | (0) | 5

Distances between Cα-atoms situated on different helices were collected every nanosecond from the last 500 ns of each simulation and averaged. This gave 194 inter-helix distances. The H4R simulations were assigned activity values from 0 to 100 based on the % activity relative to hH4R [13]. The systems used to train the model and their activity values are shown in Table 1. Apo-H4 receptors are assigned their basal activity relative to hH4R wild type, ternary complexes with agonist and G-protein are assigned 100% activity, and both ternary complexes with antagonists and binary ligand-receptor complexes are assigned 0%.
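The averaging step described above can be sketched as follows. This is a minimal illustration rather than the script from the Supporting Information: it assumes that the Cα coordinates of the aligned residues have already been extracted into an array with one frame per nanosecond, and the function name, array layout and index pairs are hypothetical.

```python
import numpy as np

def mean_inter_helix_distances(ca_xyz, pairs):
    """Average each Cα-Cα distance over all frames.

    ca_xyz: array of shape (n_frames, n_residues, 3) with Cα coordinates
            (assumed layout, one frame per nanosecond).
    pairs:  list of (i, j) residue-index pairs on different helices.
    Returns an array with one mean distance per pair.
    """
    ca_xyz = np.asarray(ca_xyz, dtype=float)
    i_idx, j_idx = np.asarray(pairs, dtype=int).T
    # Per-frame distances, shape (n_frames, n_pairs)
    per_frame = np.linalg.norm(ca_xyz[:, i_idx] - ca_xyz[:, j_idx], axis=-1)
    return per_frame.mean(axis=0)

# Toy trajectory: two frames, two residues.
toy = [[[0, 0, 0], [3, 4, 0]],   # distance 5 in frame 1
       [[0, 0, 0], [0, 0, 7]]]   # distance 7 in frame 2
print(mean_inter_helix_distances(toy, [(0, 1)]))  # mean of 5 and 7
```

Applied to 500 frames and 194 residue pairs, the same call would return the 194 averaged distances that enter the regression for one simulation.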

Model Building

The training data (194 distances each for 16 simulations) were used to construct a model using manual stepwise regression. This was necessary because of the limited number of data used for training, so that emphasis was placed on building a very robust (in contrast to the most accurate) parsimonious model. The distance that correlated most strongly with the residual of the previous model in the stepwise procedure was generally added to the model until it was judged that the model was complete. It is clear that alternative models exist but we placed emphasis on as small a scatter as possible of the inactive (0) and active (100) structures. In the stepwise regression procedure, the five distances that correlated most strongly with the residual model error were identified at each step. Of these, the one selected for inclusion in the model was that which both improved R² and grouped the A100-values of the inactive (0) and active (100) receptors as closely as possible, in order to avoid large spreads of the extreme values.
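The selection rule can be illustrated with an automated analogue. The authors performed the procedure manually, so the sketch below is an assumption-laden approximation and not their method: the function names, the `min_gain` threshold for counting an R² improvement, and the use of the summed standard deviation as the measure of spread are all hypothetical choices.

```python
import numpy as np

def fit(X, y):
    """Ordinary least squares with intercept; returns (predictions, R^2)."""
    A = np.column_stack([X, np.ones(len(y))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    pred = A @ coef
    r2 = 1.0 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)
    return pred, r2

def spread(pred, y):
    """Summed standard deviation of the predictions for the 0 and 100 groups."""
    return sum(pred[y == v].std() for v in (0.0, 100.0) if np.any(y == v))

def stepwise_select(D, y, n_terms=5, n_candidates=5, min_gain=1e-6):
    """Forward selection over the columns of D (one column per distance)."""
    D, y = np.asarray(D, float), np.asarray(y, float)
    chosen, r2 = [], 0.0
    pred = np.full_like(y, y.mean())
    for _ in range(n_terms):
        residual = y - pred
        if np.allclose(residual, 0.0):
            break  # nothing left to explain
        unused = [j for j in range(D.shape[1]) if j not in chosen]
        # |Pearson r| between each unused distance and the current residual
        corr = np.nan_to_num(
            [abs(np.corrcoef(D[:, j], residual)[0, 1]) for j in unused])
        top = [unused[k] for k in np.argsort(corr)[::-1][:n_candidates]]
        best = None
        for j in top:
            p, r = fit(D[:, chosen + [j]], y)
            s = spread(p, y)
            # Keep the candidate that improves R^2 and gives the tightest groups
            if r - r2 > min_gain and (best is None or s < best[0]):
                best = (s, j, p, r)
        if best is None:
            break
        _, j, pred, r2 = best
        chosen.append(j)
    return chosen, r2

# Toy data: column 0 separates the two groups, columns 1 and 2 do not.
D_toy = np.array([[1, 1, 1], [1, 2, 2], [1, 1, 3], [1, 2, 4],
                  [2, 1, 4], [2, 2, 3], [2, 1, 2], [2, 2, 1]], float)
y_toy = [0, 0, 0, 0, 100, 100, 100, 100]
print(stepwise_select(D_toy, y_toy))
```

In the toy example only the first column carries information about the activity, so the procedure stops after selecting it.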

Note that, because we cannot assign consistent activity values between different receptors, this fitting procedure is to some extent a classification, rather than a classical regression. Nonetheless, classification procedures such as decision trees would not give a numerical activity index, which was our aim. The fitting was, however, performed without reference to the validation set (see below) and with anonymized inter-residue distances, so that no operator bias could occur.

Figure 1: Results for the training set. Apo-H4R receptors are shown in red, receptor complexes in black. The black dashed line is that resulting from the regression equation for all data points, the red one that for the H4R variants only. The regression equations are given in the appropriate colors.

The final model includes five Cα-Cα distances: one between TM1 and TM7, one between TM6 and TM7, one between TM5 and TM6, one between TM3 and TM4 and one between TM2 and TM3. Figure 1 shows the performance of the model for the training set. The scatter of the results is approximately as expected, with inactive structures ranging from A100 = 13.6 to 17.2 and fully active ones from 61.5 to 91.4. R² for the complete training set is 0.923, and for the H4R variants alone 0.987. A partial least squares (PLS) regression using the same five distances gave a three-component model with a similar R² (0.924) and a leave-one-out cross-validated R² of 0.797, but a larger spread of the “active” and “inactive” values. The A100 model was adjusted to give an intercept for the regression line of zero, which leads to a root-mean-square error for all data points of 15.5. The model clearly separates the actives and inactives in the training set: it gives well-separated “two sigma” (± two standard deviations) ranges of -20.4 to 20.7 for the “inactives” and 48.1 to 110.2 for the “actives”. The final activity model is:

A100 = -14.43r(V1.53-L7.55) – 7.62r(D2.50-T3.37) + 9.11r(N3.42-I4.42) - 6.32r(W5.66-A6.34) - 5.22r(L6.58-Y7.35) + 278.88

(1)

where A100 is the activity index on a scale from 0 (inactive) to 100 (basal activity of hH4R wild type) and r(X-Y) is the Cα-Cα distance between residues X and Y. The five Cα-Cα distances that occur in the model are shown with their regression coefficients in Figure 2.
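Equation (1) is simple enough to evaluate directly. The sketch below is not the authors' published script; it assumes that the Cα coordinates of the ten residues are supplied in Å and keyed by their generic residue numbers, and the function and variable names are hypothetical.

```python
import math

# Coefficients of equation (1), keyed by the residue pair (generic numbering).
A100_COEFFS = {
    ("1.53", "7.55"): -14.43,
    ("2.50", "3.37"): -7.62,
    ("3.42", "4.42"): 9.11,
    ("5.66", "6.34"): -6.32,
    ("6.58", "7.35"): -5.22,
}
A100_INTERCEPT = 278.88

def a100(ca_coords):
    """Evaluate equation (1).

    ca_coords maps a generic residue number (e.g. "1.53") to the (x, y, z)
    position of its Cα atom; distances are assumed to be in Å.
    """
    total = A100_INTERCEPT
    for (res_a, res_b), coeff in A100_COEFFS.items():
        total += coeff * math.dist(ca_coords[res_a], ca_coords[res_b])
    return total

# Toy input: every pair is exactly 10 Å apart (not a real structure).
toy_coords = {}
for res_a, res_b in A100_COEFFS:
    toy_coords[res_a] = (0.0, 0.0, 0.0)
    toy_coords[res_b] = (10.0, 0.0, 0.0)
print(round(a100(toy_coords), 2))  # 278.88 - 10 * 24.48 = 34.08
```

Because four of the five coefficients are negative, shortening those distances raises A100, whereas the positive coefficient for N3.42-I4.42 means that a longer distance raises it.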

Figure 2: Schematic diagram of the five distances that occur in the model with their regression coefficients, shown in the boxes. The residues are numbered and labeled for the reference hH4R receptor (see Figure 1).

Validation: X-ray structures

For an independent validation of the model, the sequences of 50 receptors in 268 GPCR crystal structures were aligned as described above. Note that these structures serve purely for validation. They were not used to build the model and were used unchanged from the PDB entries. Part of our aim was to test the effect of non-biological modifications used to obtain X-ray structures on the predictions of the model. The resulting alignments are shown in Table S2 of the SI. The Cartesian
coordinates for the ten key residues were extracted in every case and the five inter-residue distances were calculated. A100 was calculated for each structure using equation (1). These activation-index values were compared with the experimental assignment of the structure as “active” (67), “intermediate” (32) or “inactive” (169). Histograms of the results obtained are shown in Figure 3.

Figure 3: Histograms of the A100 values for the 268 X-ray structures classified according to the experimentally assigned state. The individual A100-values are given in Table S3 of the SI. The borders between the classes in the three-state model are shown as black dashed lines and that for the two-state model in red.

If we first consider the active and inactive structures, whose assignment should be relatively uncontroversial, A100 separates the classes very well, providing convincing independent validation of the model. The means ± one standard deviation of the A100 values for the two classes are 58.2 ± 22.0 for the active group and -25.4 ± 18.9 for the inactives, suggesting good separation. Note
that many structures, especially inactive ones, give A100-values outside the 0-100 training range. This is a consequence of the approximation that all inactive training structures were set to 0 and all active ones to 100. Constitutive (basal) activity of the “inactive” training structures can lead to this result. We chose to set an upper limit of A100 = 0 for classifying unknown structures as inactive and a lower limit of A100 = 55 for actives. These borders are shown as vertical dashed lines in Figure 3. If we then use the classification scheme

Inactive: A100 < 0
Intermediate: A100 = 0-55
Active: A100 > 55,

we obtain the confusion matrix shown in Table 2. Twenty-five active structures are misclassified (24 as intermediate and one as inactive) and 19 inactives (all as intermediate), an accuracy of 63% for actives and 89% for inactives. If we ignore the “intermediate” structures and set the border between active and inactive structures at A100 = 25, the value for the inactives increases to 99% and for the actives to 94%. We can therefore conclude that the model is very well able to distinguish between active and inactive Class A GPCR structures, either from MD trajectories or from X-ray structures, based on five inter-helix distances.

Table 2: Confusion matrix obtained for the model defined in equation (1) using the alignments shown in Figure S1 for the validation set of 268 GPCR X-ray structures (n = 268). Results for the three- and two-state models are shown. Columns give the experimentally assigned state; values in parentheses are intermediate structures, which do not enter the two-state accuracy.

Model | Predicted | Border (A100) | Assigned active | Assigned intermediate | Assigned inactive
--- | --- | --- | --- | --- | ---
3-state | Active | > 55 | 42 | | 0
3-state | Intermediate | 0-55 | 24 | 26 | 19
3-state | Inactive | < 0 | 1 | | 150
3-state accuracy | | | 63% | 81% | 88.8%
2-state | Active | > 25 | 63 | (20) | 2
2-state | Inactive | < 25 | 4 | (12) | 167
2-state accuracy | | | 94.0% | - | 98.8%
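The two classification schemes and the quoted accuracies can be reproduced with a few lines. This is an illustrative sketch with hypothetical function names; the text does not say how values lying exactly on a border (0, 25 or 55) are treated, so the handling of those cases below is an assumption.

```python
def classify_three_state(a100, lower=0.0, upper=55.0):
    """Three-state scheme: inactive below 0, active above 55."""
    if a100 < lower:
        return "inactive"
    if a100 > upper:
        return "active"
    return "intermediate"  # border values are treated as intermediate (assumption)

def classify_two_state(a100, border=25.0):
    """Two-state scheme with a single border at A100 = 25."""
    return "active" if a100 > border else "inactive"

def accuracy(correct, total):
    """Percentage of structures classified as assigned experimentally."""
    return 100.0 * correct / total

# Counts taken from the text: 67 actives and 169 inactives in the validation set.
print(round(accuracy(67 - 25, 67)))        # three-state actives, 25 misclassified
print(round(accuracy(169 - 19, 169), 1))   # three-state inactives, 19 misclassified
print(round(accuracy(63, 67), 1))          # two-state actives
print(round(accuracy(167, 169), 1))        # two-state inactives
```

The gain of the two-state model comes almost entirely from the 24 experimentally active structures with A100 between 0 and 55, most of which lie above the border of 25.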

“Intermediate” X-ray structures

Thirty-two X-ray structures have been described as representing “intermediate” (persistent partially activated) conformations of the receptor. These structures give a mean A100-value of 33.7 with a standard deviation of 31.0. Figure 3 shows that there is considerable overlap between the active and intermediate classes in the A100-range 20-60. This can be expected because there are certainly partially or weakly active structures among those X-ray structures that have been classified as active. We can thus conclude that the structures described as intermediate are indeed weakly active but that approximately one third of the active structures also fall in this class. If we consider the intermediate structures to be weakly active, a two-state model in which the A100 border lies at 25 classifies both the active and the inactive structures extremely well (94 and 99% accuracy, respectively) and divides the intermediate class into 12 inactives and 20 (weakly) actives. Thus, the X-ray crystal structures characterized by the experimentalists as intermediate are recognized by the A100 index as being weakly active. However, these are static structures, which raises the question as to whether partial activity (as opposed to a receptor that is only weakly active) results from a single static structure or oscillations between active and inactive
conformations. The simulation data for the hH4R F169V mutant (basal activity = 45%) allow us to investigate the time-dependent behavior of A100 for a partially active structure. The results are shown in Figure 4.

Figure 4: (a) Time-dependence of A100 for the last 500 ns of the hH4R F169V simulation. The red dashed lines indicate the borders of the intermediate class in the three-state model. (b) Histogram of the A100 values.

The A100 values range between -18.7 and 84.9 with a mean value of 36.1 ± 18.0. The distribution of the data points in the classes of the three-state model is 10 inactive, 408 intermediate and 83 active. The histogram of the A100-values reflects this distribution with a maximum in the A100 = 35-40 bin and some skew towards active structures. On balance, the results for this simulation indicate that the structure oscillates between intermediate and active (or weakly and fully active) conformations. The shape of the histogram
hints that these two conformations may be stable minima but this is largely conjecture. However, this point can be clarified with metadynamics simulations using A100 as the collective variable. Such studies, both with hH4R F169V and with ternary partial agonist complexes of other receptors, are underway to clarify this point.
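A time series of A100 values can be summarized in the same way as was done for the hH4R F169V simulation. The sketch below is illustrative only: the function name, the bin edges and the use of the sample standard deviation are assumptions, and the toy series is not simulation data.

```python
import numpy as np

def summarize_a100(series, lower=0.0, upper=55.0, bin_width=5.0):
    """Mean, standard deviation, three-state class counts and modal bin."""
    a = np.asarray(series, dtype=float)
    edges = np.arange(np.floor(a.min() / bin_width) * bin_width,
                      a.max() + bin_width, bin_width)
    counts, edges = np.histogram(a, bins=edges)
    k = int(np.argmax(counts))
    return {
        "mean": a.mean(),
        "std": a.std(ddof=1),  # sample standard deviation (assumption)
        "inactive": int((a < lower).sum()),
        "intermediate": int(((a >= lower) & (a <= upper)).sum()),
        "active": int((a > upper).sum()),
        "modal_bin": (float(edges[k]), float(edges[k + 1])),
    }

# Toy series, one value per frame.
toy_series = [-5.0, 10.0, 36.0, 38.0, 60.0]
print(summarize_a100(toy_series))
```

For a real trajectory, `series` would hold one A100 value per saved frame, so the class counts sum to the number of frames analyzed.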

Structures with nanobodies

Table 3 shows the structures with protein nanobodies as surrogates for bound G-proteins. With one exception, these structures have been assigned to the active state, as protein nanobodies were originally developed specifically to stabilize the active state [18]. We have therefore specifically tested the activation state of these structures using A100. The results are shown in Table 3 and Figure 5.

Table 3: Details of GPCR X-ray structures with nanobodies and their calculated A100-values.

Receptor | Species | PDB ID | Resolution (Å) | State | G protein | Antibody | A100
--- | --- | --- | --- | --- | --- | --- | ---
5-HT2B receptor | Human | 5TUD | 3.0 | Active | - | Fab fragment | 46.25
A2A receptor | Human | 6GDG | 4.1 | Active | Gs | Nanobody Nb35 | 57.86
M2 receptor | Human | 4MQT | 3.7 | Active | - | Nanobody | 53.33
M2 receptor | Human | 4MQS | 3.5 | Active | - | Nanobody | 56.28
US28 | Strain AD169 | 5WB1 | 3.5 | Active | - | Nanobody 7 | 91.75
US28 | Strain AD169 | 5WB2 | 3.5 | Active | - | Nanobody 7, Nanobody B1 | 90.39
US28 | Strain AD169 | 4XT1 | 2.9 | Active | - | SignProtMimic_Nanobody | 101.31
β1-adrenoceptor | Turkey | 6IBL | 2.7 | Active | - | Nb80 | 30.51
β1-adrenoceptor | Turkey | 6H7O | 2.8 | Active | - | Nb6B9 | 32.10
β1-adrenoceptor | Turkey | 6H7N | 2.5 | Active | - | Nb6B9 | 32.01
β1-adrenoceptor | Turkey | 6H7L | 2.7 | Active | - | Nb6B9 | 32.64
β1-adrenoceptor | Turkey | 6H7J | 2.8 | Active | - | Nb80 | 32.60
β1-adrenoceptor | Turkey | 6H7M | 2.8 | Active | - | Nb6B9 | 31.81
β2-adrenoceptor | Human | 6MXT | 3.0 | Active | - | Nanobody 71 | 44.12
β2-adrenoceptor | Human | 4QKX | 3.3 | Active | - | SignProtMimic_Nanobody | 27.36
β2-adrenoceptor | Human | 4LDE | 2.8 | Active | - | SignProtMimic_Nanobody | 21.85
β2-adrenoceptor | Human | 4LDL | 3.1 | Active | - | Nanobody | 21.66
β2-adrenoceptor | Human | 4LDO | 3.2 | Active | - | Nanobody | 24.29
β2-adrenoceptor | Human | 3SN6 | 3.2 | Active | Gs | Antibody | 44.00
β2-adrenoceptor | Human | 3P0G | 3.5 | Active | - | SignProtMimic_Nanobody | 33.83
κ receptor | Human | 6B73 | 3.1 | Active | - | Nanobody | 51.19
μ receptor | Mouse | 5C1M | 2.1 | Active | - | Antibody | 60.19
μ receptor | Mouse | 6DDE | 3.5 | Active | Gi | Antibody | 53.68
AT1 receptor | Human | 6DO1 | 2.9 | Active | - | Nb.AT110i1 | 40.84
β2-adrenoceptor | Human | 5JQH | 3.2 | Inactive | - | Nanobody | -11.46

The experimental assignments are largely supported by A100. The one inactive structure (PDB ID 5JQH) is assigned correctly (A100 = -11.5). However, the majority of the structures (18) fall into the intermediate (weakly active) class. We can therefore confirm that the protein antibodies stabilize active states of Class A GPCRs and that the crystal structures reflect this state. However, the A100-distribution suggests that the nanobody-stabilized structures tend to be weakly active. The two-state model classifies all but three structures (4LDE, 4LDL and 4LDO) in agreement with the experimental assignment.
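The counts quoted above can be checked against the A100 values of Table 3. In the short sketch below the values are transcribed from the table in its column order, and the dictionary and variable names are hypothetical.

```python
# A100 values from Table 3, keyed by PDB ID.
table3 = {
    "5TUD": 46.25, "6GDG": 57.86, "4MQT": 53.33, "4MQS": 56.28,
    "5WB1": 91.75, "5WB2": 90.39, "4XT1": 101.31,
    "6IBL": 30.51, "6H7O": 32.10, "6H7N": 32.01, "6H7L": 32.64,
    "6H7J": 32.60, "6H7M": 31.81,
    "6MXT": 44.12, "4QKX": 27.36, "4LDE": 21.85, "4LDL": 21.66,
    "4LDO": 24.29, "3SN6": 44.00, "3P0G": 33.83,
    "6B73": 51.19, "5C1M": 60.19, "6DDE": 53.68, "6DO1": 40.84,
    "5JQH": -11.46,
}

# Experimental assignment: all active except 5JQH.
experimental = {pdb: "active" for pdb in table3}
experimental["5JQH"] = "inactive"

# Three-state model: intermediate class is 0 <= A100 <= 55.
intermediate = [pdb for pdb, value in table3.items() if 0 <= value <= 55]

# Two-state model: border at A100 = 25.
two_state = {pdb: ("active" if value > 25 else "inactive")
             for pdb, value in table3.items()}
mismatches = sorted(pdb for pdb in table3 if two_state[pdb] != experimental[pdb])

print(len(intermediate))  # 18
print(mismatches)         # ['4LDE', '4LDL', '4LDO']
```

All three mismatches are β2-adrenoceptor structures with A100 values just below the two-state border.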

Figure 5: Histogram of the activation index calculated from class A GPCR X-ray structures with nanobodies, as given in Table 3. The vertical dashed lines indicate the borders between classes (A100 = 0 and 55) in the three-state model. Structures described in the experiment as active are shown in green and the one inactive structure in red.

We also investigated the last 500 ns of three simulations of ternary nanobody complexes: the β2-adrenergic receptor with adrenaline (4LDO) [19], the µ-opioid receptor with BU72 (5C1M) [15] and the M2 muscarinic receptor with the superagonist iperoxo (4MQT) [14]. Histograms of the calculated A100-values for the last 500 ns of the 2 µs simulations are shown in Figure 6. The three simulations behave differently. Whereas the M2R simulation with the superagonist iperoxo gives a narrow distribution with a mean A100-value of 126.1 ± 9.6, ADRB2 with the weak agonist adrenaline gives an equally narrow distribution at a significantly lower A100-value (37.8 ± 9.2), which is consistent with the strengths of the two agonists. The most interesting results, however, are found for the µ-opioid receptor with BU72, which shows a relatively high mean A100-value (78.0), indicating significant activity, but a far larger standard deviation (30.7) than the other two simulations. Moreover, the histogram shows a distinctly bimodal distribution; the peak at approximately 100 indicates a fully active conformation and that around 20 a weakly active one. Thus, the three simulations indicate a stable strongly active conformation for M2R/iperoxo, a weakly active one for ADRB2/adrenaline and oscillations between fully and very weakly active conformations for µOR/BU72. These results suggest that both stable intermediate conformations and systems in equilibrium between two differently active conformations are possible. The time-dependent plots of A100 (Figure S1 of the SI) show clear transitions between conformations for µOR/BU72 in contrast to stable single conformations for M2R/iperoxo and ADRB2/adrenaline.

Figure 6: Histogram of the activation index calculated for three simulations of ternary complexes with nanobodies. The red bars indicate M2R with iperoxo, blue the µ-opioid receptor with BU72 and green ADRB2 with adrenaline. The vertical dashed lines indicate the borders between classes (A100 = 0 and 55) in the three-state model.

Physical meaning of the model

Although correlation does not imply causality, we can attempt to relate the five distances that occur in the model to accepted mechanisms of GPCR activation. The ten residues that define the five distances in the model are scattered over all transmembrane helices, one each in TMs 1, 2, 4 and 5, and two each in helices 3, 6 and 7, which are generally thought to be important for activation [20,21]. All but one of the distances (N3.42-I4.42) become shorter on activation of the receptor. Figure 7 shows a schematic view of the movements encoded in the A100-model.

Figure 7: Schematic view of the helix rearrangements encoded in the A100-model. Red distances become shorter in the activated conformation, the green one longer. The movement of helices 6 and 7 is shown by solid lines and that of helix 3 by dashed ones.


The concerted movements of helices 5, 6 and 7 are shown by solid lines. The exoplasmic end of TM7 moves closer to that of TM6, whereas its cytoplasmic end moves closer to TM1. At the same time, the cytoplasmic end of TM6 moves closer to that of TM5. This movement ultimately results in the outward movement of the cytoplasmic end of TM6 that is necessary for activation.22 As a result of the lever effect of L6.58 moving closer to Y7.35 at the exoplasmic end of TM7, the cytoplasmic end of TM6 moves away from TM7, which in turn moves closer to TM1 (V1.53-L7.55). These movements are all consistent with the activation mechanism recently outlined by Manglik and Kruse.23 Additionally, TM3 moves away from TM4 (N3.42-I4.42) and towards TM2 (D2.50-T3.37).

Because the A100 model only uses distances, it does not recognize specific structural switches,24 such as the TM3-TM6 ionic lock25 or the transmission switch.20 We regard this as an advantage because such switches may or may not be involved in the activation mechanism, so that the A100 model is independent of such features.24 Nonetheless, the residues that occur in the A100 model are often close to those involved in structural switches. For instance, the ionic lock occurs between R3.50 and E6.30 in bovine rhodopsin; A6.34 and L6.58, loosely in the region of the ionic lock, occur in the model. Similarly, the transmission switch involves residues in the same region and P5.50, whereas W5.56 occurs in the model. We can conclude that, although strictly speaking the residues and distances that occur in the model need not be causal for activation, the activation movement indicated by the model is related to a physical activation process. However, because inter-helix distances to adjacent residues correlate strongly, the distances that occur in the model are merely diagnostic; others may perform equally well.

Conclusions

The A100 activation model was trained using simulation data and validated with X-ray structures. Its performance for the latter, which are in every sense independent of the training data, encourages us to believe that the model is general for Class A GPCRs. The model can detect intermediate structures between active and inactive, probably because partially basally active H4R variants were used in training. The dynamic behavior of the partially active hH4R F169V mutant can be taken to indicate that the receptor oscillates between active and intermediate conformations, but this is not conclusive. Additional metadynamics simulations are underway to clarify this point. There is considerable overlap between intermediate structures and active ones, which suggests that many of the X-ray crystal structures described as active are in fact partially active. The exact conformational behavior of partially activated receptors is of major interest and is being investigated. Significantly, the model can treat both ligand-induced and basal (constitutive) activity and confirms activity assignments made for X-ray crystal structures in which protein nanobodies serve as surrogates for G-proteins. This suggests quite strongly that, for the receptors investigated here, a general activation mechanism applies to both basal and ligand-induced activity. As a footnote, the ternary complex model1 was assumed throughout, so that the success of the A100-model constitutes strong support for the ternary complex model.


Because the A100-model used exclusively simulation data for training and X-ray crystal structures for validation, it testifies to the quality of both and to the compatibility of the two. Importantly, because the simulations were performed in a membrane with aqueous intra- and extracellular media, in contrast to the crystal environment of the experimental structures, the success of the model underlines the relevance of crystal structures for biological studies, despite the “non-biological” measures often needed to obtain crystals.26 Because the model only requires an alignment step and the calculation of five distances, it is easily applied to both simulation trajectories and X-ray crystal structures. To make the model easily available, we have implemented a Python script, which is included in the SI, and will make an interactive web server available that calculates A100 from the user’s .pdb file (https://www.chemistry.nat.fau.eu/ccc/a100).
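As a rough illustration of how little is needed once a structure has been aligned, the following sketch computes inter-residue distances and combines them linearly. It assumes one representative atom per residue (for example Cα), a linear regression form, and caller-supplied coefficients; the function names, coordinates and coefficients below are placeholders and not values from the paper, and the script in the SI remains the reference implementation.

```python
import math


def pair_distances(coords, pairs):
    """Distance for each residue pair.

    coords: dict mapping a residue label (e.g. "V1.53") to the (x, y, z)
            position of one representative atom (assumed here to be C-alpha).
    pairs:  list of (label_a, label_b) tuples.
    """
    return [math.dist(coords[a], coords[b]) for a, b in pairs]


def linear_index(dists, coefficients, intercept):
    """Linear combination of distances with caller-supplied parameters."""
    if len(dists) != len(coefficients):
        raise ValueError("one coefficient is needed per distance")
    return intercept + sum(c * d for c, d in zip(coefficients, dists))


# Toy coordinates in Angstrom (placeholders, NOT taken from any structure).
toy_coords = {
    "V1.53": (0.0, 0.0, 0.0),
    "L7.55": (3.0, 4.0, 0.0),
    "D2.50": (1.0, 1.0, 1.0),
    "T3.37": (1.0, 1.0, 7.0),
}
# Two of the residue pairs named in the text; the full model uses five.
toy_pairs = [("V1.53", "L7.55"), ("D2.50", "T3.37")]

toy_dists = pair_distances(toy_coords, toy_pairs)
# Placeholder coefficients and intercept (NOT the published regression).
toy_value = linear_index(toy_dists, coefficients=[-1.0, -1.0], intercept=0.0)
print(toy_dists, toy_value)
```

Passing the coefficients in as arguments keeps the sketch independent of any particular parametrization; the published values would be supplied by the user.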

Supporting Information

Supporting Information available: Alignment of the nine receptors in the training set and the fifty unique ones in the X-ray structures, A100 values for the validation X-ray structures, plot of A100 vs. time for three simulations with nanobodies.

Conflicts of interest

There are no conflicts to declare.

Acknowledgements

This work was supported by the Deutsche Forschungsgemeinschaft as part of GRK1910 “Medicinal Chemistry of Selective GPCR Ligands”. We thank the Leibniz Rechenzentrum München for a generous grant of computer time (project number pr74su).


Notes and references

1. De Lean, A.; Stadel, J. M.; Lefkowitz, R. J. A Ternary Complex Model Explains the Agonist-Specific Binding Properties of the Adenylate Cyclase-Coupled β-Adrenergic Receptor. J. Biol. Chem. 1980, 255, 7108-7117.
2. Rasmussen, S. G.; DeVree, B. T.; Zou, Y.; Kruse, A. C.; Chung, K. Y.; Kobilka, T. S.; Thian, F. S.; Chae, P. S.; Pardon, E.; Calinski, D.; Mathiesen, J. M.; Shah, S. T.; Lyons, J. A.; Caffrey, M.; Gellman, S. H.; Steyaert, J.; Skiniotis, G.; Weis, W. I.; Sunahara, R. K.; Kobilka, B. K. Crystal Structure of the β2 Adrenergic Receptor–Gs Protein Complex. Nature 2011, 477, 549-555.
3. Dror, R. O.; Green, H. F.; Valant, C.; Borhani, D. W.; Valcourt, J. R.; Pan, A. C.; Arlow, D. H.; Canals, M.; Lane, J. R.; Rahmani, R.; Baell, J. B.; Sexton, P. M.; Christopoulos, A.; Shaw, D. E. Structural Basis for Modulation of a G-Protein-Coupled Receptor by Allosteric Drugs. Nature 2013, 503, 295-299.
4. Laio, A.; Parrinello, M. Escaping Free-Energy Minima. Proc. Nat. Acad. Sci. USA 2002, 99, 12562-12566.
5. Saleh, N.; Ibrahim, P.; Saladino, G.; Gervasio, F. L.; Clark, T. An Efficient Metadynamics-Based Protocol to Model the Binding Affinity and the Transition State Ensemble of G-Protein-Coupled Receptor Ligands. J. Chem. Inf. Model. 2017, 57, 1210-1217.
6. Saleh, N.; Saladino, G.; Gervasio, F. L.; Clark, T. Investigating Allosteric Effects on the Functional Dynamics of β2-Adrenergic Ternary Complexes with Enhanced-Sampling Simulations. Chem. Sci. 2017, 8, 4019-4026.


7. Wifling, D.; Pfleger, C.; Kaindl, J.; Ibrahim, P.; Kling, R. C.; Gohlke, H.; Buschauer, A.; Clark, T. Basal Histamine H4 Receptor Activation: Agonist Mimicry by the Diphenylalanine Motif. Submitted to Chem. Eur. J.
8. Saleh, N.; Ibrahim, P.; Clark, T. Differences Between G-Protein-Stabilized Agonist–GPCR Complexes and their Nanobody-Stabilized Equivalents. Angew. Chemie Int. Ed. 2017, 56, 9008-9012.
9. Clark, T. Calculations and Simulations: an Invaluable Resource. Beilstein Magazine 2016, 2; https://doi.org/10.3762/bmag.6
10. Pettersen, E. F.; Goddard, T. D.; Huang, C. C.; Couch, G. S.; Greenblatt, D. M.; Meng, E. C.; Ferrin, T. E. UCSF Chimera - A Visualization System for Exploratory Research and Analysis. J. Comput. Chem. 2004, 25, 1605-1612.
11. Needleman, S. B.; Wunsch, C. D. A General Method Applicable to the Search for Similarities in the Amino Acid Sequence of Two Proteins. J. Mol. Biol. 1970, 48, 443-453.
12. Henikoff, S.; Henikoff, J. G. Amino Acid Substitution Matrices from Protein Blocks. Proc. Nat. Acad. Sci. USA 1992, 89, 10915-10919.
13. Wifling, D.; Bernhardt, G.; Dove, S.; Buschauer, A. The Extracellular Loop 2 (ECL2) of the Human Histamine H4 Receptor Substantially Contributes to Ligand Binding and Constitutive Activity. PLoS One 2015, 10, e0117185.
14. Kruse, A. C.; Ring, A. M.; Manglik, A.; Hu, J.; Hu, K.; Eitel, K.; Hübner, H.; Pardon, E.; Valant, C.; Sexton, P. M.; Christopoulos, A.; Felder, C. C.; Gmeiner, P.; Steyaert, J.; Weis, W. I.; Garcia, K. C.; Wess, J.; Kobilka, B. K. Activation and Allosteric Modulation of a Muscarinic Acetylcholine Receptor. Nature 2013, 504, 101-106.
15. Huang, W.; Manglik, A.; Venkatakrishnan, A. J.; Laeremans, T.; Feinberg, E. N.; Sanborn, A. L.; Kato, H. E.; Livingston, K. E.; Thorsen, T. S.; Kling, R. C.; Granier, S.; Gmeiner, P.; Husbands, S. M.; Traynor, J. R.; Weis, W. I.; Steyaert, J.; Dror, R. O.; Kobilka, B. K. Structural Insights into µ-Opioid Receptor Activation. Nature 2015, 524, 315-321.
16. Haga, K.; Kruse, A. C.; Asada, H.; Yurugi-Kobayashi, T.; Shiroishi, M.; Zhang, C.; Weis, W. I.; Okada, T.; Kobilka, B. K.; Haga, T.; Kobayashi, T. Structure of the Human M2 Muscarinic Acetylcholine Receptor Bound to an Antagonist. Nature 2012, 482, 547-551.
17. Manglik, A.; Kruse, A. C.; Kobilka, T. S.; Thian, F. S.; Mathiesen, J. M.; Sunahara, R. K.; Pardo, L.; Weis, W. I.; Kobilka, B. K.; Granier, S. Crystal Structure of the µ-Opioid Receptor Bound to a Morphinan Antagonist. Nature 2012, 485, 321-326.
18. Manglik, A.; Kobilka, B. K.; Steyaert, J. Nanobodies to Study G Protein–Coupled Receptor Structure and Function. Annu. Rev. Pharmacol. Toxicol. 2017, 57, 19-37 and references therein.
19. Ring, A. M.; Manglik, A.; Kruse, A. C.; Enos, M. D.; Weis, W. I.; Garcia, K. C.; Kobilka, B. K. Adrenaline-Activated Structure of β2-Adrenoceptor Stabilized by an Engineered Nanobody. Nature 2013, 502, 575–579.
20. Deupi, X.; Standfuss, J. Structural Insights into Agonist-Induced Activation of G-Protein-Coupled Receptors. Curr. Opin. Struct. Biol. 2011, 21, 541-551.
21. Kobilka, B. K. G protein coupled receptor structure and activation. Biochim. Biophys. Acta 2007, 1768, 794-807.


22. Mahoney, J. P.; Sunahara, R. K. Mechanistic Insights into GPCR–G Protein Interactions. Curr. Opin. Struct. Biol. 2016, 41, 247-254.
23. Manglik, A.; Kruse, A. C. Structural Basis for G Protein-Coupled Receptor Activation. Biochemistry 2017, 56, 5628-5634.
24. Trzaskowski, B.; Latek, D.; Yuan, S.; Ghoshdastider, U.; Debinski, A.; Filipek, S. Action of Molecular Switches in GPCRs - Theoretical and Experimental Studies. Curr. Med. Chem. 2012, 19, 1090-1109.
25. Palczewski, K.; Kumasaka, T.; Hori, T.; Behnke, C. A.; Motoshima, H.; Fox, B. A.; Le Trong, I.; Teller, D. C.; Okada, T.; Stenkamp, R. E.; Yamamoto, M.; Miyano, M. Crystal Structure of Rhodopsin: A G Protein-Coupled Receptor. Science 2000, 289, 739-745.
26. Ghosh, E.; Kumari, P.; Jaiman, D.; Shukla, A. K. Methodological Advances: The Unsung Heroes of the GPCR Structural Revolution. Nature Rev. Mol. Cell Biol. 2015, 16, 69–81.


Figure 1: Results for the training set. Apo-H4R receptors are shown in red, receptor complexes in black. The black dashed line is that resulting from the regression equation for all data points, the red one that for the H4R variants only. The regression equations are given in the appropriate colors.


Figure 2: Schematic diagram of the five distances that occur in the model with their regression coefficients, shown in the boxes. The residues are numbered and labelled for the reference hH4R receptor (see Figure 1).


Figure 3: Histograms of the A100 values for the 268 X-ray structures classified according to the experimentally assigned state. The individual A100-values are given in Table S3 of the ESI. The borders between the classes in the three-state model are shown as black dashed lines and that for the two-state model in red.


Figure 4: (a) Time-dependence of A100 for the last 500 ns of the hH4R F169V simulation. The red dashed lines indicate the borders of the intermediate class in the three-state model. (b) Histogram of the A100 values.


Figure 5: Histogram of the activation index calculated from class A GPCR X-ray structures with nanobodies, as given in Table 3. The vertical dashed lines indicate the borders between classes (A100 = 0 and 55) in the three-state model. Structures described in the experiment as active are shown in green and the one inactive structure in red.
