The Too Many Faces of PD-L1: A Comprehensive Conformational

Sep 12, 2017 - In the current study, we focused on the immune-checkpoints PD-1 pathway and in particular on the ligand PD-L1. We studied the conformat...
0 downloads 9 Views 3MB Size
Subscriber access provided by Gothenburg University Library

Article

The Too Many Faces Of PD-L1: A Comprehensive Conformational Analysis Study Marawan Ahmed, and Khaled H. Barakat Biochemistry, Just Accepted Manuscript • DOI: 10.1021/acs.biochem.7b00655 • Publication Date (Web): 12 Sep 2017 Downloaded from http://pubs.acs.org on September 13, 2017

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a free service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are accessible to all readers and citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

Biochemistry is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 41

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Biochemistry

The Too Many Faces Of PD-L1: A Comprehensive Conformational Analysis Study Marawan Ahmed1, and Khaled Barakat*1,2 1

Faculty of Pharmacy and Pharmaceutical Sciences, University of Alberta,

Edmonton, Alberta, Canada 2

Li Ka Shing Institute of Virology, University of Alberta, Edmonton, Alberta,

Canada *Corresponding author: Email: mmahmed@ualberta [email protected]

ACS Paragon Plus Environment

1

Biochemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 2 of 41

ABSTRACT In the current study, we focused on the immune-checkpoints PD-1 pathway and in particular on the ligand, PD-L1. We studied the conformational dynamics of PD-L1 through principal component analysis (PCA) of existing crystal structures combined with classical and accelerated molecular dynamics simulations. We identified the maximum structural displacements that take place in all PD-L1 crystal structures and in the MD trajectories. We found that these displacements are attributed to specific flexible regions in the protein. We also investigated the conformational preference for small molecule binding and highlighted a Methionine residue at the binding site, which plays a key role in drug binding. The binding mechanism of PD-L1 to other binding partners is also discussed in details from a computational perspective. We hope that the data presented here supports the ongoing efforts to discover effective therapies targeting the PD-1 immune-checkpoint pathway.

Key words: hPD-1/hPD-L1, Immune-checkpoints, Protein-protein interactions, conformational selection, Principal component analysis (PCA)

ACS Paragon Plus Environment

2

Page 3 of 41

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Biochemistry

1.

Introduction:

Immune-checkpoints represent a major and effective therapeutic target in the fight against cancer.1-3 Blocking these pathways led to impressive and unprecedented clinical success. The first proof-of-principle was demonstrated by anti-CTLA-4 monoclonal antibody (MAB), Ipilimumab (YERVOY®) 4, 5, which efficiently restored the function of exhausted T cells and reactivated the immune system to recognize and kill tumor cells. Following this great discovery, other MABs against another immunecheckpoint target was identified, namely PD-1.6, 7 While targeting CTLA-4 has been always associated with severe toxicity and side effects, targeting PD-1 seems to be less toxic.8, 9 Furthermore, while blocking the CTLA-4 pathway is more efficient when targeting the receptor, for the PD-1 pathway one can target the receptor, PD-1, or its ligand, PD-L1. For example, nivolumab, a currently FDA approved monoclonal antibodies (MAB) targeting PD-1 has shown remarkable clinical benefits in patients with recurrent squamous-cell carcinoma of the head and neck 10, advanced renal-cell carcinoma

11

and melanoma.12 To the best of our knowledge, none of the PD-L1

directed MAB have yet gained FDA clinical approval, but the data is very encouraging. For example, MPDL3280A, a PD-L1 MAB in clinical trials has resulted in a response rate of approximately 43.3% in PD-L1 expression-positive patients with metastatic bladder cancer.13 This data suggests that PD-L1 targeted therapy will be a major game player in the field of immunotherapy in the next few years.

Structurally, the human PD-L1 protein is a membrane bound protein receptor that is composed of 290 amino acid residues. PD-L1 belongs to the B7 protein family and is believed to exist in a dimeric state in solution.14 As shown in Figure 1, the extracellular region of PD-L1 is composed of two distinct protein domains, an Ig-like V-type domain (IgV domain, amino acids 19–127) and an Ig-like C2-type domain (IgC domain, amino acids 133–225). Both domains adopt the standard, sandwich-like β-sheet composition (Immunoglobulin Fold).15, 16 The IgV domain is the region responsible for the binding function of PD-L1. It is also the target site for small molecule compounds 17 and MAB 18 targeting PD-L1. The complicated conformational dynamics of the IgV domain allows

ACS Paragon Plus Environment

3

Biochemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 4 of 41

it to adopt a specific ensemble of 3D conformations. This ensemble of structures is thermodynamically controlled to execute the required physiological function of binding to either PD-1 or B7-1.19-21 The concept of protein conformational ensemble and population shift is fundamental in understanding the biological functions of protein-protein interactions, including PD-L1.22

The rational design of small molecule drugs and MAB directed toward PD-L1 requires a significant level of knowledge of the conformational dynamics of the adopted conformational ensemble with different binding partners.23-25 There are fourteen human PD-L1 crystal structures available to date in the protein data bank (PDB) database. These structures comprise the free (unbound) conformation of the protein (e.g. 5C3T)

19,

the PD-1 bound conformation (e.g. 4ZQK)

19, 26,

the MAB bound

conformation (eg: 5GGT) or the small molecule bound dimer conformation of PD-L1 (e.g. 5J89).17 For a complete list of all PDB entries, phylogenetics, experimental as well as citation data, please see Table S1 in the supporting information, generated by the PDBe database.27, 28 Apparently, all of these conformations represent a limited space of the thermodynamically accessible structural ensemble of the PD-L1 protein. These conformations are biased toward the preferred states triggered by the PD-L1 binding partner, being a small organic molecule, a protein or nothing at all in the free form.

The major objective of this study was to characterize the detailed conformational dynamics of the IgV (binding domain) of PD-L1. Functionally relevant (low frequency) global scale motions were identified through principal component analysis and were extracted from all experimental crystal structures as well as by analyzing the generated MD trajectories. The impact of the conformational states of binding site residues is also discussed. Our findings highlight specific flexible regions at the PD-L1 surface that induce these conformational dynamics. Our findings also suggest a key role for a methionine residue at the small molecule-binding site to PD-L1. The data presented here identify key structural elements and dynamics that can be used to rationally design small molecule and MAB inhibitors for the PD-L1 target.

ACS Paragon Plus Environment

4

Page 5 of 41

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Biochemistry

2. Computational Methods and Details 2.1 Protein Preparation and Constrained Refinement We started this work by retrieving all crystal structures for PD-L1 from the Research Collaboratory for Structural Bioinformatics (RCSB) Protein Data Bank (PDB).29 There are fourteen human PD-L1 entries deposited in the PDB database (see Table S1 for PDB codes and all relevant details). These entries represent two unbound monomeric structures of PD-L1 (PDB codes: 3BIS, 5C3T)

19, 30,

three unbound dimeric structures

of PD-L1 (PDB codes: 3FN3, 4Z18 and 5JDR) 14, two structures bound to human PD-1 (PDB codes: 4ZQK and 5IUS) 3BIK and 3SBW)

30,

19, 26,

two structures bound to mouse PD-1 (PDB codes:

two structures bound to small molecule inhibitors (PDB codes:

5J89 and 5J8O) 17 and three bound to antibodies (PDB codes: 5GGT, 5GRJ and 5JDS).18, 31, 32

The retrieved structures were prepared and refined using the protein

preparation wizard in Maestro

33,

the Schrodinger suite.34 The preparation stage

included adding missing atoms and residues, and assigning proper protonation states at neutral pH for charged residues, followed by hydrogen only constrained energy minimization. Non-IgV domain components of each structure were deleted. Given the sufficient structural differences between the dimer-forming chains in the small molecule bound structures of PD-L1 (PDB codes: 5J89 and 5J8O)

17,

we decided to

keep the two chains of these two PDB entries. Thus, we ended up by sixteen distinct human PD-L1 conformations from fourteen PDB entries.

2.2. Classical and Accelerated MD Simulations In addition to analyzing the conformational changes in all human PD-L1 structures, we used both classical and accelerated MD simulations to address this question. To explore the conformational space sampled by the apo IgV domain of PD-L1, a long MD simulation in explicit solvent were carried out. The most recent crystal structure of the IgV domain of human PD-L1 (PDB code: 5C3T)

19

was used to carry out these

simulations. This structure, 5C3T, has the best resolution among all free human PD-L1 crystal structures (Table S1, 1.8 Å). The system preparation and MD simulations were executed using the AMBER16 software package and modules within.35 Original Gln47

ACS Paragon Plus Environment

5

Biochemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 6 of 41

replaced the Glu47 mutation in the deposited crystal structure. The structure was solvated in a periodic cubic box of explicit TIP3P water molecules with a minimum distance of 15 Å between the protein atoms and the edge of the water box and the system was neutralized by counter ions. Forcefield parameters were assigned through the AMBERff14SB forcefield

36

with the tleap module in AMBER. After several stages

of minimization, NVT heating and NPT equilibration, six consecutive production 20 ns MD simulations were conducted giving a total of 120 ns simulation time where each simulation started from a random velocity. We then discarded the first 20 ns simulation trajectories and extracted the last snapshot of the last 5 MD trajectories. For each of these five snapshots and after running 10 ns NPT equilibration simulation, we ran additional 200 ns classical and 200 ns accelerated MD simulation with a boosting potential. This gave a total of 1000 ns (1 μs) of classical and 1 μs of accelerated MD simulations.

Accelerated MD (aMD) is a common technique for exploring the conformational space of bio-macromolecules and has been applied successfully to study many proteins of therapeutic interest.37, 38 For example, using aMD, Balmith et al. were able to explain the effect of the K127A, T129A and N130A mutations on the N-terminal domain of the VP40 protein of Ebola virus with cell membrane at the atomic level.39 Another study by Grant et al. has explored the conformational transition of the Ras protein, a critical protein involved in cell signaling and a significant target for cancer treatment.40 The authors were able to characterize several intermediate structures that could be mapped to existing crystal structures. They also showed that cMD was not capable of recovering such intermediate states. The encouraging results obtained by aMD in the aforementioned studies and several other studies encouraged us to apply aMD to explore the conformational space of the IgV domain of PD-L1.

In aMD, a positive boosting potential, ΔV(r), is applied to the simulated molecular system once the potential energy of the system drops below a certain potential energy threshold (E).41 Introducing this non-negative biasing potential accelerates the conformational transitions of the molecular system across the energy barriers,

ACS Paragon Plus Environment

6

Page 7 of 41

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Biochemistry

preventing the entrapment of the molecular system in a specific potential energy basin. A direct consequence of this accelerating potential is the extension of the simulation time scale compared to the unbiased simulation.

The modified V*(r) is therefore expressed as follow:

 ∗  =  +  … 1 Where V(r) is the system’s original potential as calculated from an equilibrium simulation. And ΔV(r) can be approximated by the following equation:

0 Vr ≥ E Vr =   −  … 2 Vr < E  +  −  In the previous equation, E is the reference energy below which the positive potential, ΔV(r), will be applied. The parameter α determines the required acceleration transition and a lower value of this parameter enhance the transition of the bimolecular system across the energy barriers. In practice, the AMBER implementation of the aMD allows the application of the boosting potential on the whole system (iamd =1), boosting only the torsional potential energy (iamd = 2) or boosting the full system potential, adding additional boost on the torsion angles, i.e. dual boosting (iamd = 3). In this study, we applied the dual boosting scheme (iamd = 3). As recommended by the aMD developers in AMBER, the selected aMD input parameters (EthreshP, alphaP, EthreshD, and alphaD) are estimated from average energy values as extracted from the 10 ns NPT equilibration simulation.” As we ran one independent aMD simulation for each of the selected five snapshots (see above), these values are different for each simulation; details are given in supporting information (Table S1, sheet 2). All production accelerated (aMD) and conventional MD (cMD) simulations were carried out at constant pressure (1 bar) and constant

ACS Paragon Plus Environment

7

Biochemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 8 of 41

temperature (300 K) using the Berendsen barostat 42 and the Langevin thermostat 43, respectively. Covalent bonds involving hydrogen atoms were constrained through the SHAKE algorithm

44

and the equations of motion were integrated using a 2 fs time

step. Short-range interactions were truncated after 9 Å cut-off and long-range electrostatic interactions were calculated using the particle mesh Ewald summation method.45 Trajectory snapshots were saved every 4 ps and trajectory analysis was carried out for 250000 frames.

2.3

Multivariate Analysis: Principal component analysis (PCA) and residue

cross correlation analyses Principal component analysis (PCA) is a robust method that can be used in identifying the collective motion of bio-macromolecules (direction and amplitude).46-48 PCA reduces the noise associated with typical MD trajectories, revealing significant inferences from complex MD simulation data. In the current study, PCA was performed on three different systems. Namely, the sixteen experimental X-ray crystal structures of human PD-L1, the 1 μs classical MD trajectory and the 1 μs accelerated MD trajectory. The steps required to perform PCA can be described as follow; first, the rotational and translational degrees of freedom were removed by least square fitting the protein Cα atoms to a reference structure (for PCA on MD trajectories, a relaxed snapshot after 20 ns initial MD simulation was used). A covariance matrix C was generated using the Cartesian space of the corresponding Cα atoms (3N*3N) and diagonalized to obtain the eigenvalues (amplitude) and their corresponding eigenvectors (directions). The matrix elements (Cij) of this C matrix were calculated from the Cartesian coordinates (r) of Cα atoms as follow:

 = <  − <  > .  − <  > > ⋯ 3 In the above equations, ri and rj represent the Cartesian coordinates of all Cα atoms and the sign denotes the ensemble average of the atomic positions in the Cartesian space. The resulting covariance matrix is diagonalized to yield the eigenvalues (PCs)

ACS Paragon Plus Environment

8

Page 9 of 41

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Biochemistry

and their corresponding eigenvectors (direction of motions). The resulting principal components (PCs) were ranked by the corresponding structural variance they can capture. The few largest PCs that translate to the maximum variance across the data were projected on the 3D coordinates of the protein to visualize dominant motions. To characterize the correlated motion between protein residues, a pairwise residue cross-correlation analysis was performed using the Linear Mutual Information (LMI) algorithm.49 In LMI, the correlated motion between two atoms, here we selected the Cα atoms, can vary between 0 (no correlation) to 1 (fully correlated motion). The matrix elements of the resulting LMI correlation matrix were calculated as follows:

 ! ", " = ½  !|| + !|| − !|| … 4 In the previous equation, I is the LMI correlation between xi and xj, which could be residues or atoms, Ci and Cj are marginal covariance matrices, and Cij represents the pair-covariance matrix.50-52 PCA analysis was carried out using the R statistical package BIO3D

53,

LMI residue cross-correlation analysis was performed using the

Wordom software package.50, 51 3 Results and Discussion Our journey to characterize the conformational landscape of the IgV domain of human PD-L1 involved four main stations. First, performing PCA analysis on the available experimentally determined structures of PD-L1. These different structures are expected to span a wide conformational landscape as they represent the interaction of this domain with a verity of bound structures. Two other stages involved performing a similar analysis on the trajectories of the two long (1 μs) classical and accelerated MD trajectories. Finally, local structural changes assisted by the presence of small molecule compounds that induce PD-L1 dimerization are thoroughly discussed.

3.1 Unique structural shifts of the crystal structures identified by PCA

ACS Paragon Plus Environment

9

Biochemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 10 of 41

Our first attempt to analyze the conformational space of human PD-L1 focused on the experimental crystal structures. In total, sixteen unique PD-L1 structures were extracted from the PDB database and analyzed. To assess the conformational fluctuations in these PD-L1 crystal structures, PCA was performed on them as described in details in the Methods section. A two-dimensional representation of the structural data set projected onto the subspace defined by the two largest eigenvectors (principal components) is presented in Figure 2. The two major eigenvectors correspond to the total structural variation in the structural ensemble by approximately 53.5%. In fact, over 70% of the total structural fluctuation could be captured with the first four eigenvectors as shown in the inset scree plot in Figure 2.

All crystallized PD-L1 structures can be grouped into four main clusters (See Figure 2). This simplified view of the PD-L1 structural displacements is based on the projection of the structures on the two largest principal components (eigenvectors). The dominant contribution of the first few principal components is associated with the collective motion of the C``D loop (residues: 72 to 82) that is intermittent by short helical segments. To a less extent, an additional contribution is also attributed to the short BC loop (residues: 44 to 53). These loops are colored in orange in Figure 2.

The largest cluster, cluster 1, comprise two antibody bound structures; the first is for PD-L1 bound to the BMS-936559 Fab fragment (PDB entry 5GGT)32, while the second is for PD-L1 bound to the MAB Avelumab (PDB entry 5GRJ) 18. This is despite the fact that the BMS-936559 Fab fragment and the MAB Avelumab have a different, yet overlapping binding sites at the PD-L1 binding interface. The two superimposed structures are shown in Figure 3. There is a perfect overlap of human PD-L1 conformations coming from the two crystal structures, validating their existence in a close proximity in cluster 1 as revealed from the PCA (See Figure 2). In the same cluster includes a PD-L1 conformation that was co-crystallized with a high-affinity PD1 mutant (PDB entry 5IUS). According to Pascolutti et al.

26,

the engineered PD-1

mutant shows an enhancement in the binding affinity to the wild-type PD-L1 by approximately 35,000-fold. The two remaining entries in cluster 1 (5J89 and 5J8O)

ACS Paragon Plus Environment

10

Page 11 of 41

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Biochemistry

represent the second chain (chain B) of the high-affinity PD-L1 dimer complexes induced by the presence of small molecule ligands.17 This observation suggests that high affinity PD-L1 binding proteins will populate similar PD-L1 conformations. The same cluster includes two apo forms of the IgV domain of PD-L1, namely, 5C3T 19 and 5JDR.31

A fundamental concept in the binding theory of bio-macromolecules, including proteins, is the presence of the given macromolecule in an ensemble of interchangeable states of structures separated by low free energy barriers.54-56 The association of two proteins is believed to take place through one of two mechanisms; i) conformational selection and/or ii) induced fit.57, 58 In the conformational selection mechanism of protein-protein binding, the bound conformation of a given protein is populated in the structure ensemble of the free state in solution. In the induced fit mechanism of binding, however, significant conformational changes are required for this binding to take place, i.e. a significant energy barrier exists between the free and the bound conformation. Given the fact that cluster 1 populates two free PD-L1 structures, we may conclude that most probably PD-L1 binding in the complexes exist in cluster 1 is taking place through a conformational selection mechanism.54-56 Additional discussion regarding this phenomenon is given below.

The second largest cluster, cluster 2, is populated by six PD-L1 conformational structures (See Figure 2). Three of these structures represent unbound PD-L1. These structures are 3BIS, 3FN3 and 4Z18. The remaining complexes represent hPD1/hPDL-1 complex (4ZQK), and mouse(m)PD-1/hPD-L1 complexes (3SBW and 3BIK respectively). As the same cluster populates both free and PD-1 bound conformations of PD-L1, one can conclude that PD-L1 binding to PD-1 is not accompanied by significant conformational changes to PD-L1. Regarding PD-1, a recent study by Liu et al. found that the conformational selection followed by induced fit is the predominant binding mechanism of PD-1 to PD-L1.59 During their MD simulation, the authors found that the PD-1 can sample both the open CC` loop (unbound like conformation) and a closed CC` loop (PD-L1 bound like) forms. One major drawback of their study is that

ACS Paragon Plus Environment

11

Biochemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 12 of 41

the authors did not consider analyzing the 35 NMR conformations available for the free PD-1 in solution (PDB code: 2M2D).60 In principle, NMR structure ensemble samples a much longer time scale than what MD can reach. It is not yet clear whether ignoring these PD-1 conformations may have affected the overall conclusion.

The last two clusters exhibited an interesting behavior and are sufficiently distinguished from the two main clusters discussed above. Cluster 3 represents the principal chain (chain A) that is required for binding to a small molecule PD-L1 inhibitor as illustrated in the small molecule induced dimerization structures of PD-L1 (5J89 and 5J8O, chain A).17 Cluster 4, however, represents an orphan PD-L1 conformation co-crystallized with a novel nanobody from a study by Zhang et al. (PDB code: 5JDS).31 As it is evident from Figure 2, cluster 3 is distinguished from both cluster 1 and cluster 2 along the second principal component. Given the small scale of the Y-axis compared to the X-axis, it does not seem that 5J89 and 5J8O, chain A, are very different from other PD-L1 structures. On the other hand, cluster 4 is significantly distinguished from other clusters along the first principal component (PC1, X-axis). To show the structural differences of cluster 4 from reference PD-L1 conformations, Figure 4a illustrates the structural superposition of the sole member of cluster 4, that is 5JDS, superimposed over the closest PD-L1 reference structure along the PC2 axis, 4Z18. The residue PC loadings of the first three principal components are given in Figure 4b with a significant contribution from the C``D loop for the first three principal components. As shown in Figure 4a, 5JDS (a PD-L1 conformation that is bound to a nanobody, KN035) 31 exhibited a significant C``D loop displacement from the reference PD-L1 conformation, 4Z18. The shift in the C’’D loop of PD-L1 (≈7.5 Å) minimizes the steric clashes between the PD-L1 interface and the KN035 nanobody. The authors of the 5JDS crystal structure, however, claimed that this shift is attributed to the crystal packing forces.31 It was clear that the C’D loop of one monomeric structure in the resolved crystal lattice forms several H-bonds and lipophilic interaction with another nearby monomer in the crystal. This justifies the observed C’D loop shift in the resolved crystal structure.

ACS Paragon Plus Environment

12

Page 13 of 41

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Biochemistry

3.2

MD Trajectories Root Mean Square Deviation (RMSD) and B-factor analysis

In the previous sections, PCA was conducted on the experimental crystal structures of human PD-L1. This analysis identified specific conformational clusters for the Igv domain. A central aspect to distinguish between these clusters was the collective motion of the C’’D loop. However, given the limitation of performing such analysis on static X-ray structures with limited sampling capacity, our next step was to conduct a similar analysis on long trajectories of MD simulations. As discussed in details in the computational methods section, both conventional MD (cMD) and accelerated MD (aMD) were run for five starting protein conformations. For each starting configuration, 200 ns of cMD and 200 ns of aMD were carried out. This gave a total of 1 μs of cMD and 1 μs of aMD simulations. Before analyzing the generated trajectories using PCA, we had to check the stability of these long MD simulations. The Cα associated root mean square deviation (RMSD) and B-factors over the entire simulations were calculated and shown in Figure 5. The RMSD plots reveal no large-scale structural displacements during the simulation time for both the cMD and the aMD simulation trajectories (See Figure 5), meaning that the immunoglobulin fold of PD-L1 is very stable. For example, the average RMSD value of the cMD trajectory is 1.25 Å and reached a maximum value of 2.0 Å after 600 ns of the simulation time. As also expected from a standard enhanced sampling method with a biasing potential, the aMD simulation experiences a slightly less stable MD trajectory. The average RMSD of the aMD trajectory was 1.64 Å and reached a maximum of 2.73 Å after 300 ns MD simulation time. Given the collective nature of the RMSD calculations that maps the fluctuation of the entire protein and in order to get a residue-based contribution to the total fluctuation, we calculated temperature (B) factor plots as estimated from the MD trajectories (see Figure 5b). The calculated B-factors in both the classical and accelerated MD simulations agree quantitatively well. Our data suggests that the highest fluctuations in the protein structure originate from specific loops (See Figure 5b). For example, the

ACS Paragon Plus Environment

13

Biochemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 14 of 41

largest displacement in the calculated B-factor is associated with the thermal fluctuation in the BC loop. The second largest displacement correlates with the C``D loop. Furthermore, accelerated MD simulation did not cause any significant perturbation to the overall protein conformational dynamics. The data presented in Figure 5b is in a good qualitative agreement with the B-factor plots as extracted from all free PDL-L1 crystal structures (supporting information, Figure S2). Figure S2 shows the BC loop residues to have the highest B-factors in the majority of the crystal structures. This suggests that the generated MD trajectories are good representatives of the true protein dynamics and validates the use of these trajectories in conducting a more complex analysis using PCA.

3.3 PCA and residue correlation analysis of the classical and accelerated MD simulation trajectories To further characterize the low-frequency global motion of the IgV domain of PD-L1 from the generated MD trajectories, PCA was performed on the covariance matrix of the corresponding protein Cα Cartesian coordinates. The same analysis was independently performed on the aMD and the cMD trajectories. The matrix was constructed after superimposing both trajectories to a single reference structure (see Methods). To gain more insights on the specific structural segments responsible for the collective motions of the IgV domain of PD-L1, the low-frequency modes of motions for the IgV structure are shown in Figure 6. It is apparent from this figure that the largest PC (PC1) describes the collective motion of the BC loop perpendicular to the plane formed by the beta-sheet layer at the interface. While the cMD trajectory shows an exclusive contribution from the BC loop for PC1, aMD shows additional contributions from the G strand. The contribution of the G strand for the cMD could be visualized in the third largest principal component (PC3) of the cMD simulation. The PC scores of the MD frames projected onto the two largest principal components (PC1 & PC2) plot are shown for the cMD (Figure 7a) and aMD (Figure 7b). A PC score is a measure of the PDB conformations under investigation in the space defined by the combined variables (principal components). The largest principal components

ACS Paragon Plus Environment

14

Page 15 of 41

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Biochemistry

(ordered by data variance) are usually the ones associated with maximum structural displacement and hence the scores of the generated PDB conformations at the space defined by these principal components are the scores that are typically investigated.46, 61

Representative conformations for the low energy (high density) regions of the PC

score plots are also shown as inset figures. For the cMD simulation, low energy (high density) conformations of the BC loop adopts three specific conformations, a, b and c. In conformations a and c, the BC loop represents a closed conformation with respect to the PD-L1 binding interface, while in conformation c it adopts an open state with respect to the PD-L1 binding interface. Similarly, the aMD PC score plots could identify three regions with high density (d,e,f), all of them represent a more open loop conformations, where d and e are overlapping. The corresponding eigenvalue ranks (see scree plots) that translate to the variance that each principal component can capture are given in in the lower panel of Figure 7. The first four PCs could capture up to 50% of the total variance in the Cα displacement in cMD and up to 42.9% in the case of aMD. Given the significant level of energetic noise associated with reweighting the aMD trajectories of medium to large sized proteins as discussed in several studies, we did not attempt to reweight the free energy profile of the principal components generated from the accelerated simulation.62, 63 In order to assess the correlation strength between the amino acid residues with distant residues, LMI residue correlation analysis for the cMD and the aMD were conducted. Residue correlation plots of both classical and accelerated MD simulations are given in Figure 8. The maximum correlation was found between adjacent residues and residues that share the same secondary structure element. The compact fold of the IgV domain allows a strong connection even between distant residues as evidenced through the correlation plot (See Figure 8). This means that even a small perturbation, that may be induced by a weak allosteric ligand, may have a significant impact on the overall signal transmission across the PD-1/PD-L1 pathway. Indeed, a recent study by Fenwick et al. has investigated the extent of correlated motion in several proteins and the authors concluded that correlated motion is a fundamental property of β-sheets.64 Additional experimental and computational studies are

ACS Paragon Plus Environment

15

Biochemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 16 of 41

required to explore the relationship between this strongly correlated motion in PD-L1 with the PD-1/PD-L1 signaling pathway. 3.4. The Conformation Of Met115 Side Chain At The Small Molecule Binding Site Zac et al. produced the first co-crystalized small molecule compounds with PD-L1. As discussed in their work (PDB codes: 5J89 and 5J8O), the pocket is formed by residues Ile54, Tyr56, Met115 and Ser117 and is induced by the presence of the small molecule ligand.17, 65 The binding interface is common between the small molecule and the PD-1 protein. Once a molecule is bound to one chain, a second PD-L1 chain can bind, forming

a

high-affinity

PD-L1/compound/PD-L1

sandwich-like

complex.

Consequently, these compounds serve as potent and unusual protein-protein interaction inhibitors, although their therapeutic value remains questionable. The three-dimensional ligand interaction diagram of a representative PD-L1 small molecule bound to the complex is shown in Figure 9a. In the meantime, Figure 9b shows the three-dimensional structure of the unbound PD-L1, 4Z18. The free structure shows the occlusion of the small molecule pocket by the side chain of Met115 in the absence of the small molecule (See Figure 9b).

It is well established that the preferred side-chain conformations of many amino acids, particularly those with long side chains follow a Boltzmann probability distribution 66. In a recent study by Virrueta et al. 67, the authors developed a reduced dipeptide model to explore the preferred side-chain conformations of some amino acids, including Methionine. Methionine owns the longest aliphatic side-chain. From a structural perspective, the overall side chain conformation is determined by the first side chain torsion, Chi1 (χ1, N-CA-CB-CG). According to the Virrueta et al. study 67, the preferred Chi1 torsion of Methionine is distributed in three rotameric bins with an increasing probability going forward (P{0° ≤ χ1  -120°} < P{±120°≤ χ1