Structural Perturbations Present in the Folding Cores of Interleukin-33

Jun 10, 2015 - Structural Perturbations Present in the Folding Cores of Interleukin-33 and Interleukin-1β Correlate to Differences in Their Function...
0 downloads 0 Views 3MB Size
Article pubs.acs.org/JPCB

Structural Perturbations Present in the Folding Cores of Interleukin33 and Interleukin-1β Correlate to Differences in Their Function V. V. Hemanth Giri Rao and Shachi Gosavi* National Centre for Biological Sciences, Tata Institute of Fundamental Research, Bellary Road, Bangalore 560065, India S Supporting Information *

ABSTRACT: The interleukin-1 cytokines belong to the β-trefoil fold family and play a key role in immune responses to infections and injury. We simulate the structure-based models of two interleukin-1 cytokines, IL-33 and IL-1β, and find that IL-33 has a lower barrier to folding than IL-1β. We then design the folding motif (FM) of the β-trefoil fold and identify structural deviations of IL-33 and IL-1β from this FM. In previous work, we found that structural deviations from the FM that are large enough to lower folding free energy barriers were part of ligand binding sites. In contrast, we find that structural perturbations in IL-33 and IL-1β which reduce the folding free energy barrier are located in the folding core and do not bind ligands. In both proteins, such core residues are interleaved with surface residues which are proximal to receptor binding sites. However, IL-33 has a lower folding barrier because its core perturbations are larger than those in IL-1β. In order to understand the role of these core perturbations, we perform atomistic simulations of both proteins and find that the larger core perturbations may allow IL-33 to communicate signals differently across the protein. Integrating previous data, we also hypothesize that the larger IL-33 core perturbations may help accommodate its more charged binding site and may also aid in its inactivation by caspase-mediated cleavage. Together, our results suggest that protein folding landscapes are modulated not only by larger functional features such as binding sites but also by the details of protein function and fate. Furthermore, a comparative study of such landscapes may be a facile way to identify subtle differences in allosteric connectivity between two similar proteins.



INTRODUCTION Understanding the molecular details of protein folding is not only important from a biophysical perspective but also from an evolutionary standpoint. However, such details are difficult to access both experimentally and in atomistic simulations. For instance, despite great advances in computing power,1 it takes too long to perform atomistic protein folding simulations in a range of differing solvent, temperature and other biophysical conditions except using the smallest of proteins.1−7 An elegant solution to this time-scale problem (both computer time: CPU hours and real biochemical time: micro-milliseconds) is the use of energy-landscape inspired models.8−10 The energy landscape theory for protein folding posits that evolution has sculpted funnel-shaped energy landscapes for proteins.11,12 This funnelshape reduces trapping caused by the stabilization of interactions which are not present in the folded (or the native) state of the protein making the folding energy landscape smoother and allows proteins to fold in a biologically reasonable time. The reduced trapping due to non-native interactions allows models based only on the native structure of the protein to correctly capture the main features of folding.8,9,13,14 In fact, well-designed structure-based models are computationally efficient and have been able to correctly describe the stabilizing and destabilizing effects of temperature, denaturants, cosolvents, and functional residues on protein folding.9,15−17 © 2015 American Chemical Society

Here, we explore the effect of functional residues on the rates of protein folding using molecular dynamics (MD) simulations of coarse-grained structure-based models. The smoothing of landscapes to ensure fast folding has evolutionarily been achieved by choosing residues which reduce both the free energy of the final folded state and kinetic trapping while folding.12 However, because protein sequences encode both structure and function, only those residues which are not essential for function (i.e., those which are not part of active or binding sites) can be optimized for protein stability and folding.18,19 As a result, functional regions in a protein can destabilize structure, create non-native traps, or slow folding, and it has been shown experimentally in diverse proteins that mutating binding regions increases the stability or the “foldability” of a protein.19−22 In order to systematically analyze this trade-off between functional residues and folding and to explore the different ways in which function affects folding,23−25 we recently designed a method to construct the folding motif (FM) of a fold family.19 In this method, we assumed that conserved structure (both residue positions and their interactions) within Special Issue: Biman Bagchi Festschrift Received: March 31, 2015 Revised: May 26, 2015 Published: June 10, 2015 11203

DOI: 10.1021/acs.jpcb.5b03111 J. Phys. Chem. B 2015, 119, 11203−11214

Article

The Journal of Physical Chemistry B proteins which are functionally diverse but structurally similar would code for the stability and the foldability of the fold. Nonconserved structures on the other hand would code for the diverse individual functions of the proteins. By comparing the folding of the FM of the β-trefoil fold to the folding of the wildtype (WT) β-trefoil proteins, we concluded that in some proteins, such as interleukin-1β (IL-1β), much of the function (receptor binding) is encoded by adding structural elements onto the basic structure of the FM.19 Adding structural elements increased the complexity of the fold and slowed folding. However, in other proteins, function is encoded by repurposing folding residues to bind ligands. This degrades the packing of the fold and increases folding rate but at the expense of stability.19 Here, we explore the potentially subtle effects of specificity and allostery on protein folding by using the FM to identify structural perturbations that affect the folding of two functionally similar β-trefoil proteins. The β-trefoil fold is composed of 12 β-strands organized into three pseudosymmetric trefoil (β−β−β-loop-β) units (see Figure 1). The first and the fourth β-strands of each trefoil together form a sixstranded β-barrel which is capped by a hairpin triplet. The sixstranded barrel creates a well-connected compact structure and this may promote communication across the protein. As a consequence, small structural changes may modulate allostery in the β-trefoil proteins.26−28 The interleukin-1 cytokines are an essential component of the immune system and are involved in regulating immune and inflammatory response.30 Several members of this family are βtrefoil proteins that bind to the extracellular domains of their respective membrane bound receptors. When a cytokine that promotes inflammation binds to its receptor, a coreceptor gets recruited to form a ternary complex. The interleukin-1 proteins, interleukin-33 (IL-33) and IL-1β, share the same coreceptor IL1RAcP.30 They also have structurally congruent receptor binding sites named A and B.31,32 However, they have diverse sequences and bind to different receptors. IL-33 binds tighter to its receptor ST-2, and this binding is mediated by electrostatic interactions.32 The binding of IL-1β to its receptor IL-1RI is weaker and is through van der Waals and hydrogen bonding interactions.31,33 There are indications in IL-1β that the receptor binding involves allosteric communication between the two receptor binding sites of the protein.27,34 Further, previous studies have shown that function has a detectable effect on the folding of IL-1β.35,36 The well-connectedness of the β-trefoil fold, the likely existence of allostery in IL-1β, and the similar binding modes of IL-33 and IL-1β to their respective receptors make this pair of proteins an attractive system to computationally investigate the effects of allostery on protein folding.

Figure 1. Structures and native contact maps of IL-33 and IL-1β. (A) Structure of IL-33 from PDB 2KLL model 1. N- and C- termini of the protein are marked. (B) C-α native contact map of IL-33. Each native contact is marked on the contact map at (x,y) and (y,x), where x and y are the residue numbers of the residues in contact. The contacts present inside the three squares on the contact map are intratrefoil contacts, while those outside these squares are intertrefoil contacts. Secondary structural elements are marked on both the axes of the contact map with filled violet rectangles marking every β strand. The β-trefoil fold comprises three trefoils, with each trefoil having a β−β−β-loop-β architecture. The corresponding secondary structural elements for each trefoil are labeled as β1, β2, and so on. β1, β4, β5, β8, β9, β12 are organized into a β-barrel, while the remaining strands form a cap with three hairpins. Every loop between adjacent β strands is numbered, starting with the loop between strands β1 and β2 being L1. All secondary structural contacts between β strands are perpendicular to the diagonal indicating the antiparallel nature of the interacting β strands. (C) Structure of IL-1β from PDB 6I1B and its (D) C-α native contact map. The description of the intra- and intertrefoil contacts, trefoil architecture, labeling of the secondary structural elements, and the antiparallel nature of the β strands is similar to that of IL-33. L3 of IL-1β has a 310 helix which is marked on the secondary structure as an open rectangle. All structures in this manuscript were drawn using VMD 1.9.1.29 See Supporting Information for the definition of the trefoil boundaries and the lists of native contacts for both proteins.



METHODS C-α Structure-Based Models (SBMs). SBMs of proteins have found common use in protein folding because they are computationally efficient and have been shown to correctly describe both folding mechanisms and trends in folding rates.8,14 SBMs are so-called because they encode only the folded structure of the protein in their energy functions and thus have perfectly funneled energy landscapes.10 Here, we use a coarse-grained form of the SBM in which each residue is represented by a single atom at the C-α position. This SBM has previously been used to study IL-1β35−39 and its energy function is described in detail elsewhere.13 In practice, the C-α

SBMs were constructed using the SMOG Web server40 with the C-α coordinates and the contact maps of the various proteins as input. These contact maps are discussed later in the Methods and are listed in the Supporting Information. In general, SBMs employ reduced units. Since we use GROMACS41 (parameters and protocol described later) for carrying out simulations, the energy scale of the SBM was set to 1 kJ/mol and temperatures were reported in K. The temperatures can be converted into reduced units by multiplying with kB, the Boltzmann constant (= 0.008314 kJ/ 11204

DOI: 10.1021/acs.jpcb.5b03111 J. Phys. Chem. B 2015, 119, 11203−11214

Article

The Journal of Physical Chemistry B mol-K). The FAQ section of the SMOG Web server40 provides additional information on both GROMACS units and reduced units. Construction of the C-α Folding Motif (FM) for the βTrefoil Fold. Natural proteins have sequences which accommodate both protein function as well as protein folding. The FM is a protein model which is expected to incorporate all the stabilizing structural features of a fold and none of the functional features such as extra loops, kinks in secondary structure, among others. Comparing the structure and the folding of this FM to that of natural proteins has previously been shown to help understand how functional features affect protein folding.19 The FM can be obtained by choosing conserved residue positions from a structural alignment of several natural proteins having diverse functions but which share the same fold. In this case, representative members of the β-trefoil fold having different functions were chosen, aligned using STAMP42 in VMD 1.9.1,29 and residue positions that were conserved in at least 7 proteins (out of 13 proteins aligned) were chosen to constitute the C-α FM. We obtained an FM comprising 127 residues that captures all the necessary features of a β-trefoil fold (Figure 2A). In addition, the C-α native contact maps of these representative proteins, obtained using Contacts of Structural Units (CSU) software,43 were also aligned based on the structural alignment. Native contacts which were observed to be present in at least four proteins were considered as conserved pairwise interactions. These constituted the native contact map of the FM (Figure 2B). The details of this method have been described in detail previously.19 Identification of Conserved Native Contacts Which Are Absent in WT IL-33 and IL-1β. We structurally aligned and compared the contact maps of IL-33 and IL-1β with that of the FM. Contacts which were unique to the FM represent conserved pairwise interactions which were absent in the wildtype (WT) proteins. For both IL-33 and IL-1β, absent contacts having a sequence separation shorter than the median sequence separation were termed CLOSE contacts (Figure 2C,D black contacts), while those which had longer sequence separations were termed FAR contacts (Figure 2C,D red contacts). It should be underlined that the CLOSE and FAR contact sets are specific to each of IL-33 and IL-1β and are different for each protein. C-α Structure-Based Models (SBMs) of WT and Mutant IL-33 and IL-1β. C-α SBMs of WT IL-33 and IL1β were derived using the C-α coordinates from PDB IDs 2KLL (model 1), and 6I1B respectively (Figure 1A,C). The contact maps of WT IL-33 and IL-1β were also computed from the same structures using the CSU software43 (Figure 1B,D). The mutants, IL-33+CLOSE and IL-33+FAR, were constructed by appending the SBM of WT IL-33 with its CLOSE and FAR contact sets, respectively. Native contact distances for CLOSE and FAR contacts were computed from C-α coordinates of 2KLL (model 1). These contact sets are shown in Figure 2C. Similarly, the mutants IL-1β+CLOSE and IL-1β+FAR were constructed by appending the SBM of WT IL-1β with its corresponding CLOSE and FAR contact sets respectively (Figure 2D). Native contact distances for CLOSE and FAR contacts of IL-1β were computed from C-α coordinates of PDB 6I1B. Absolute Contact Order (ACO). The ACO is one way to quantitate the structural complexity of a protein. It has been shown to be linearly correlated with the negative logarithm of

Figure 2. Structure and contact map of the FM and missing interactions in IL-33 and IL-1β. (A) Structure of the 127 residue FM which captures the essential features of the β-trefoil fold. Protein specific loops of WT β-trefoil proteins do not constitute conserved residue positions and are not present in the FM. As a consequence, the FM is smaller in length than IL-33 or IL-1β. (B) Native contact map of the FM on which the set of conserved pairwise interactions are marked. Intratrefoil interactions of each trefoil lie within the three black squares, while intertrefoil interactions lie outside these squares. Secondary structural elements, strands β1 to β12, are indicated along the axes by violet, filled rectangles, one for each β strand. (C, D) Native contact maps of IL-33 and IL-1β (violet contacts reproduced from Figure 1B,D) respectively. Contacts marked in black and red together are conserved pairwise interactions that are present at structurally equivalent positions in the FM but are absent in WT IL-33 (marked in C) and WT IL-1β (marked in D). For both proteins, the missing interactions which are closer to the diagonal constitute their respective CLOSE contacts (black), and those which are further away from the diagonal constitute their respective FAR contacts (red). See Supporting Information for all the contact lists.

protein folding rates.44 We calculated the ACO as follows: Let (i, j) represent any pair of residues which are defined to be in contact in the contact map of a protein. Then ACO = Σi,j|i−j|/ Ntot where Ntot is the total number of contacts in the contact map. |i−j| is the sequence separation of the residues for a given contact and ACO is the average sequence separation of contacts in the protein.44 Molecular Dynamics (MD) Simulations of SBMs. The folding temperature, Tf, is that temperature at which the folded and the unfolded ensembles of the protein are equally populated. Folding simulations are usually performed at Tf because this ensures adequate sampling of the transition region. The β-trefoil proteins are slow folding and some way of enhancing sampling was necessary to calculate thermodynamic properties near Tf. Here we used well-tempered metadynamics.45 In well-tempered metadynamics, an adaptive, historydependent, bias potential is added to the Hamiltonian of the simulation at a fixed frequency to enhance sampling between the folded and the unfolded states. The external bias potential 11205

DOI: 10.1021/acs.jpcb.5b03111 J. Phys. Chem. B 2015, 119, 11203−11214

Article

The Journal of Physical Chemistry B

done for all the three replicates to obtain a total of 15 block averaged free energy profiles for every model. Subsequently, they were all aligned to the same baseline and the standard deviation in the free energy was considered to be the error due to variations in sampling. As expected, the errors on the folding free energy profiles were higher for states having higher free energy (e.g., transition states). The maximum error on the free energy of the transition state was approximately ±0.6 kBT for IL-1β+FAR. The standard deviation of the Tfs of the 15 block averaged free energy profiles was considered to be the error in the Tf. The errors on the Tfs were at most ±0.4 K across all the models for both IL-33 and IL-1β. The magnitude of the errors on the free energy profiles and the Tfs also indicate that the simulations have converged sufficiently. All-Atom, Explicit Solvent MD Simulations of WT IL-33 and IL-1β. All-atom explicit solvent MD simulations of WT IL33 and IL-1β were carried out at 300 K and 1 bar to obtain the dynamic cross-correlation maps of these proteins. These simulations were performed in GROMACS 4.5.349 using AMBER99SB-ILDN50 force field and TIP3P water. Both proteins were solvated with water molecules, and they were charge neutralized by adding sodium ions in separate dodecahedron boxes having a spacing of at least 1 nm between the protein and the box. The box size and the number of water molecules varied across replicate simulations and depended on the NMR structure used. The details of the structures are given in the next paragraph. After energy minimization using the steepest descent algorithm, the systems were equilibrated for 100 ps in the NVT ensemble followed by another 100 ps of equilibration in the NPT ensemble. The protein was position restrained during these two steps of equilibration. The Vrescale thermostat and Parinello-Rahman barostat were used to maintain temperature and pressure, respectively. After NPT equilibration, the system was simulated for 160 ns while saving coordinates and energies every 4 ps. A time step of 2 fs was used for both systems, and all bonds were constrained using PLINCS.51 Three replicate simulations were carried out for each protein by using different starting structures. These starting structures were chosen from the multiple NMR structures that were available in the PDB (IDs 2KLL for IL-33 and; 6I1B and 7I1B for IL-1β). One replicate simulation for IL-33 and IL-1β was initiated using structures from 2KLL model 1 and 6I1B, respectively. For the remaining two replicates, the structures in the NMR ensembles of both proteins were clustered separately using the single linkage algorithm implemented in g_cluster of GROMACS 4.5.3. Root-mean-square deviation (RMSD) for every pair of structures was computed using the non-hydrogen heavy atoms in these structures. RMSD was used as the distance criteria to cluster the structures. Using a cutoff RMSD of 0.2 nm for the 10 structures in 2KLL, and a cutoff RMSD of 0.12 nm for the 32 structures in 7I1B, we obtained four clusters for both cases. We chose alternate starting structures from different clusters, in order to initiate our replicate simulations with structures that are most distinct from each other in the NMR ensemble. We chose models 2 and 3 from 2KLL and models 26 and 32 from 7I1B to initiate the replicate simulations. These were solvated, energy minimized, and equilibrated exactly as described above. Production runs were simulated for 160 ns for each replicate. However, only the last 100 ns of each replicate was used for analysis. Root-Mean-Square Fluctuations and Dynamic CrossCorrelation Maps from All-Atom, Explicit Solvent MD

was applied along the coordinate: number of native contacts formed (Nc). We used a switching function to define Nc according to the formula used in the PLUMED 1.3 manual.46 The external bias potential was a Gaussian with a height of 1.5 kJ/mol and a σ (width) of 16. The height of the Gaussian decreases with simulation time depending on the existing bias potential and the bias-factor. The bias-factor determines the rate of decrease of the height of the Gaussian, and we used a bias-factor of 20. A Gaussian was added every 2 × 106 steps with its centroid at the value of Nc sampled at that time point. The system was simulated for a total of 1.2 × 109 steps in which there were at least 25 reversible transitions between the folded and the unfolded states. The trajectory was reweighted to obtain unbiased free energy profiles versus Nc using a previously described algorithm.47 Appropriately normalizing Nc with the total number of native contacts (Ntot) resulted in free energy profiles versus the fraction of native contacts (Q). The algorithm of unbiasing applied here47 also allows for obtaining 2D free energy profiles along a pair of arbitrary reaction coordinates. We obtained the unbiased 2D free energy profiles of Qregion versus Q at Tf and used these to compute the mean Qregion along Q, for each model. Qregion is the number (and not the fraction) of native contacts present only in a given region of the protein (similar to Nc but only for a specific region of a protein). On the basis of this formulation, several reaction coordinates to monitor the folding of different regions of the proteins are defined later in the text. We obtained three independent replicate simulations for every model. For each model, the free energy profiles obtained from well-tempered metadynamics across the replicates agreed well with each other, indicating convergence. The free energy profiles, Tfs and the mean Qregions along Q were averaged over the three replicates and reported. All simulations were performed in the NVT ensemble using the stochastic dynamics integrator, with a time step of 0.5 fs, in GROMACS 4.0.741 patched with the PLUMED 1.3 plugin.48 Analysis of SBM Simulations. The contact map is the main structural input that defines the folded state of the protein in SBMs. Thus, the fraction of formed native contacts, 0 < Q < 1, can be used to measure the amount of structure formation during protein folding and is often used as a reaction coordinate. Protein folding rates may be understood in terms of the folding free energy barriers (ΔG/kBTf) at Tf, while the folding mechanism may be investigated by plotting structure formation in different regions of the protein, i.e., Qregion as a function of Q. The proteins that were studied here have different numbers of contacts in their contact maps and different Tfs (averaged over 3 replicates; WT IL-33 139.5 K, IL33+CLOSE 147.2 K, IL-33+FAR 148.7 K; WT IL-1β 140.3 K, IL-1β+CLOSE 147.0 K, IL-1β+FAR 149.2 K). Using fraction of native contacts formed (Q) and folding free energies scaled by kBTf allowed comparison across these proteins. Estimating the Errors on the Folding Free Energy Profiles. To understand the variations in sampling during a simulation, errors on the free energy profiles were estimated. Because simulations were performed using well-tempered metadynamics, we first computed free energy profiles by block averaging. We considered the histogram counts from only a fifth of the trajectory and computed five such free energy profiles, one from each of the fifths of the trajectory. Each of these block averages for the free energy profiles were reweighted to a temperature at which the free energy difference between the folded and the unfolded basins was zero. This was 11206

DOI: 10.1021/acs.jpcb.5b03111 J. Phys. Chem. B 2015, 119, 11203−11214

Article

The Journal of Physical Chemistry B Simulations. Residue-wise root-mean-square fluctuations (RMSF) were plotted from the last 100 ns of the simulations from all the three replicates for both systems using the g_rmsf tool of GROMACS 4.5.3. This RMSF includes fluctuations only from C-α atoms. These were averaged over the three replicates for each residue of the protein. To obtain the dynamic cross correlation maps, the covariance matrix of fluctuations about the average structure was computed from the last 100 ns of the simulation for each of the three replicates of both proteins systems using the g_covar tool of GROMACS 4.5.3. The covariance matrix from a replicate was reduced from 3 M × 3 M to an M × M matrix, where M is the number of C-α atoms, by storing only the trace of every 3 × 3 submatrix. It is to be noted that each 3 × 3 submatrix comprises covariances from the x, y, z positions of the same atom. Subsequently, the square root of the diagonal elements were used to normalize the covariances to obtain the Pearson product-moment correlation coefficient for each pair of C-α atoms. These cross-correlations were averaged over the three replicates for each protein system and plotted to obtain the residue−residue dynamic cross correlation maps (DCCMs).



RESULTS IL-33 Has a Smaller Folding Free Energy Barrier than IL-1β. The C-α structure based model (SBM) simulated here has previously been successfully used to describe diverse features of the folding of several β-trefoil proteins including IL1β.16,19,35−39 IL-33 is a recently discovered cytokine,52 and its folding mechanism has not been studied using experiments. Because the SBM used here has correctly described the trends in barrier heights and folding routes of the similar IL-1β,19,35−39 we use it to make several predictions about the folding of IL-33. MD simulations of IL-33 and IL-1β were performed close to their respective folding temperatures (Tfs) in order to obtain adequate sampling of both folding and unfolding transitions. The folding free energy profiles obtained from these simulations are shown in Figure 3A as a function of the fraction of native contacts (Q). We find that IL-33 has a smaller free energy barrier (∼10 kBTf) as compared to that of IL-1β (∼13 kBTf). Folding free energy barriers have been shown to correlate well with the negative logarithms of both simulated and experimental folding rates.14 Consequently, the higher barrier of IL-1β implies that IL-33 folds faster than IL-1β. Folding Mechanisms of IL-33 and IL-1β. The sequence of structural events during folding can be seen by plotting the average number of native contacts formed within different regions of the protein (Qregion; Figure 3B,D) at increasing values of Q, the fraction of total native contacts (Figure 3C,E).35 The various regions are defined in the contact maps (Figure 3B,D) and approximately coincide with individual trefoils (Figure 1B,D). In IL-33, we find that trefoil 1 folds early and then unfolds as the protein gets more folded (Figure 3C blue; Q ∼ 0.39). Such formation followed by unfolding has been seen in trefoil 3 of IL-1β (Figure 3E green; Q ∼ 0.38) and is called backtracking.35,53 The intertrefoil contacts between trefoils 2 and 3 (Figure 3C black; Q ∼ 0.39) increase when trefoil 1 backtracks. Additionally, trefoils 2 (Figure 3C red) and 3 (green) fold. Subsequently (Q ∼ 0.58), trefoil 1 (Figure 3C blue) and its contacts with trefoils 2 and 3 form (black). A little after Q ∼ 0.6, the folding mechanisms of IL-33 and IL-1β become similar. The folding mechanism of IL-1β, has been studied extensively.35−39,53−55 IL-1β has been shown to populate

Figure 3. The folding of IL-33 and IL-1β. (A) The folding free energy profiles of IL-33 (solid) and IL-1β (dashed) at their respective Tfs are plotted as a function of the fraction of native contacts formed (Q). The free energy is in units of kBTf. The native and unfolded basins are marked by N and U respectively. The native basin is at a higher value of Q (∼0.85), and the unfolded basin is at a lower value of Q (∼0.12). The free energy barrier of IL-33 is lower than that of IL-1β by ∼3 kBTf. The error bars on the free energy represent one standard deviation on either side of the data points (see Methods). (B) The native contact map of IL-33 is divided into four regions and the contacts within each region are used to define its Qregion (i.e., the number of native contacts within that region). These regions are intratrefoil contacts of trefoil 1 (blue, Qtref1), intratrefoil contacts of trefoil 2 along with its intertrefoil contacts with β4 and β9 (red, Qcenter), intratrefoil contacts of trefoil 3 (green, Qtref3), and the remainder of the contact map which essentially constitutes intertrefoil contacts (black, Qitc). The average folding pathway can be discerned by observing the order in which these regions fold with increasing Q. (C) Qtref1 (blue), Qcenter (red), Qtref3 (green) and Qitc (black) for IL-33 (right y axis) are plotted as a function of Q. The folding free energy profile for IL-33 reproduced from panel A (left y axis) is shown in gray for reference. (D) The native contact map of IL-1β is divided into trefoil 1 (blue), center (red), trefoil 3 (green), and itc (black) similar to IL-33. (E) Qtref1 (blue), Qcenter (red), Qtref3 (green) and Qitc (black) for IL-1β (right y axis) are plotted as a function of Q along with the folding free energy profile (reproduced from panel A; gray; left y axis). See Supporting Information for the definitions of all the Qregions for both proteins.

three different folding routes: a backtracking route (trefoil 3 forms and unfolds before trefoil 2), a direct route (trefoil 3 followed by trefoil 2), and an ends-together route (trefoils 1 and 3 fold first). The backtracking route was the most populated route when the SBM used here was simulated.38 As expected, this route is predominant in the present simulations 11207

DOI: 10.1021/acs.jpcb.5b03111 J. Phys. Chem. B 2015, 119, 11203−11214

Article

The Journal of Physical Chemistry B

used here show a spurious dependence on the total number (and not type) of contacts14 and as expected, the Tfs of both variants of IL-33 (IL-33+CLOSE and IL-33+FAR) increase by ∼8 K. In order to account for the differences in the Tfs of proteins, we only compare free energies normalized by kBTf. The folding free energy barrier of IL-33+CLOSE is observed to be similar to that of IL-33 (Figure 4A). However, the folding

of IL-1β. In agreement with previous definitions of the backtracking route,35,37,38 trefoil 3 folds early (Figure 3E green; Q ∼ 0.25) but subsequently unfolds to allow the folding of trefoil 2 (Figure 3E red; Q ∼ 0.38). Trefoil 3 eventually refolds, and the average structure of the transition state ensemble has both trefoil 2 and trefoil 3 folded (Figure 3E; Q ∼ 0.5). Trefoil 1 (blue) and the intertrefoil contacts (black) fold post the transition state. The cause of backtracking in IL-1β has been explored in detail earlier,35,53 and we will not comment any further on it here. IL-33 and IL-1β Lack Several Contacts That Are Conserved across the β-Trefoil Family of Proteins. As stated earlier (see Methods), conserved residue positions and the frequently occurring interactions between these positions (contacts) are extracted from a structural alignment of structurally similar but functionally diverse proteins to create a folding motif (FM). The structure of the FM of the β-trefoil fold captures the essential structural features of this fold and is shown in Figure 2A. The contact map of this FM (Figure 2B) is assumed both to contain the interactions important for the stability and the foldability of the β-trefoil fold and to omit interactions important for function.19 Simulations of the FM indicated that it had a higher barrier to folding than IL-33. It was previously shown that some WT proteins did not have conserved FM contacts for functional reasons and this could reduce the folding free energy barrier.19 Thus, it was possible that the lower folding free energy barrier of IL-33 was also caused by a loss of conserved contacts. In order to isolate such contacts, we compared the structures and the contact maps of the FM, IL-33, and IL-1β. The structures and the contact maps of IL-33 and IL-1β are shown in Figures 1 and 2C,D (violet contacts). IL-33 and IL-1β have 421 and 422 native contacts, respectively. On aligning and comparing the contact maps of IL-33 and the FM, we find 65 contacts that are unique to the FM (Figure 2C; red and black contacts). We partition these contacts into two sets based on the sequence separation between the pair of residues that form a contact. Contacts having a sequence separation shorter than the median sequence separation are closer to the diagonal and are primarily intratrefoil contacts (Figure 2C; 33 black contacts). The rest are further away from the diagonal and primarily constitute intertrefoil contacts (Figure 2C; 32 red contacts). Similarly, there are 61 contacts unique to the FM and not present in IL-1β which are then partitioned into 30 contacts with shorter sequence separations (Figure 2D; black contacts) and 31 contacts with longer sequence separations (Figure 2D; red contacts). For both proteins, we term the set of contacts with shorter sequence separations CLOSE and the set of contacts with longer sequence separations as FAR. In the next section, we understand the effect of the respective sets of CLOSE and FAR contacts on the folding of IL-33 and IL-1β. Addition of FAR Contacts Increases the Folding Free Energy Barrier by a Greater Extent in IL-33 than in IL-1β. To understand the effect of CLOSE and FAR contacts on the folding of IL-33 and IL-1β, we simulate two variant SBMs of each protein in which the WT contact maps are appended with either the CLOSE (Figure 2C,D; violet and black contacts) or the FAR contacts (Figure 2C,D; violet and red contacts). It is expected that the addition of CLOSE contacts will increase the number of local contacts and lower the height of the folding free energy barrier, while the addition of FAR contacts will increase the number of long-range contacts and the height of the folding free energy barrier.19 The Tfs of the form of SBM

Figure 4. Folding free energy profiles of IL-33 and IL-1β with added CLOSE (Figure 2C,D; black) and FAR (Figure 2C,D; red) contacts. (A) Folding free energy profiles of IL-33 (violet), IL-33+CLOSE (black), and IL-33+FAR (red) at their respective Tfs are plotted as a function of the fraction of native contacts formed (Q). The free energy is in units of kBTf. Addition of the CLOSE contacts does not significantly affect the free energy barrier observed in IL-33 (violet curve reproduced from Figure 3A). Addition of the FAR contacts increases the free energy barrier by ∼5 kBTf relative to IL-33. (B) Folding free energy profiles of IL-1β (violet), IL-1β+CLOSE (black), and IL-1β+FAR (red) at their respective Tfs plotted as a function of Q. As in IL-33, addition of the CLOSE contacts does not significantly affect the free energy barrier relative to IL-1β. Addition of the FAR contacts increases the free energy barrier by ∼2 kBTf relative to IL-1β. In both panels, the native and unfolded basins are marked by N and U, respectively. The error bars on the free energy represent one standard deviation on either side of the data points (see Methods).

free energy barrier of IL-33+FAR is significantly higher (by ∼5 kBTf) than that of IL-33 (Figure 4A). The observed effect of the CLOSE and FAR contacts on the free energy barrier of IL-33 correlates moderately with the absolute contact order (ACO) of IL-33+CLOSE and IL-33+FAR. Addition of CLOSE contacts reduces the ACO from 38.54, to 36.99, while addition of the FAR contacts increases the ACO to 41.36. The increase in folding free energy barriers (i.e., the negative logarithm of the folding rate) with an increase in ACO is a well-known correlation observed across diverse proteins.44 The changes to the folding of IL-1β upon mutation are similar to that of IL-33. Specifically, the addition of either CLOSE or FAR contacts to IL-1β increases the Tf by ∼8 K. The folding free energy barrier of IL-1β+CLOSE is similar to that of IL-1β while that of IL-1β+FAR is higher by ∼2 kBTf (Figure 4B). The addition of the CLOSE contacts reduces the ACO from 39.46 to 37.93, while the addition of the FAR contacts increases the ACO to 41.66. It should be noted that the ACO of IL-1β is larger than that of IL-33, while the ACOs of IL-33+FAR and IL-1β+FAR are similar. This correlates with the size of the free energy barriers of all four proteins (Figure 4A,B; IL-33: ∼10 kBTf, IL-1β: ∼13 kBTf, IL-33+FAR and IL1β+FAR: ∼15 kBTf). In summary, the addition of the CLOSE contacts has a minimal effect on the folding free energy barriers of the proteins, while the addition of the FAR contacts increases the folding free energy barriers for both proteins, but this increase is larger for IL-33. We note in passing that the number of intertrefoil contacts between trefoils 1 and 2 (62 11208

DOI: 10.1021/acs.jpcb.5b03111 J. Phys. Chem. B 2015, 119, 11203−11214

Article

The Journal of Physical Chemistry B contacts) and trefoils 1 and 3 (56 contacts) are lower than the number of intertrefoil contacts between trefoils 2 and 3 (67 contacts) in WT IL-33. The addition of FAR contacts makes IL-33 more symmetric with respect to its intertrefoil contacts (∼70 contacts across all pairs of trefoils). However, because the addition of FAR contacts increases the IL-1β folding barrier but does not symmetrize the number of intertrefoil contacts, we conclude that the increase in barrier is due to the nature of the added long-range contacts and not necessarily their number. Residues Which Gain at Least Two FAR Contacts Are Proximal to Binding Site A in Both IL-33 and IL-1β. In order to understand the difference in the effect of the FAR contacts on IL-33 and IL-1β, we selected those residues which gain at least two FAR contacts upon mutation (Figure 5A,B; red residues). The FAR contacts of these residues are also

shown in Figure 5C,D. IL-33+FAR has 28 such FAR contacts out of which 22 either span the barrel or are contacts between cap hairpins. These 22 contacts involve 28 residues, 18 of which are hydrophobic. All these hydrophobic residues are buried and have a fractional solvent accessible surface area (fSASA)