
Article

Intrinsic Disorder in a Well-Folded Globular Protein Nahren Manuel Mascarenhas, Vishram L. Terse, and Shachi Gosavi J. Phys. Chem. B, Just Accepted Manuscript • DOI: 10.1021/acs.jpcb.7b12546 • Publication Date (Web): 05 Jan 2018 Downloaded from http://pubs.acs.org on January 6, 2018


The Journal of Physical Chemistry B is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


The Journal of Physical Chemistry

Intrinsic Disorder in a Well-Folded Globular Protein

Nahren Manuel Mascarenhas1,*, Vishram L. Terse2 and Shachi Gosavi2,*

1 Department of Chemistry, Sacred Heart College, Tirupattur (Vellore) 635601, India

2 Simons Centre for the Study of Living Machines, National Centre for Biological Sciences, Tata Institute of Fundamental Research, Bangalore 560065, India

*Correspondence to: Nahren Manuel Mascarenhas ([email protected]) or Shachi Gosavi ([email protected]) Phone: +91-80-23666105 Fax: +91-80-23636662

1 ACS Paragon Plus Environment


ABSTRACT

The folded structure of the heterodimeric sweet protein monellin mimics single-chain proteins of the topology β1-α1-β2-β3-β4-β5 (chain A: β3-β4-β5 and chain B: β1-α1-β2). Further, like naturally occurring single-chain proteins of a similar size, monellin folds cooperatively with no detectable intermediates. However, the two monellin chains, A and B, are marginally structured in isolation and fold only upon binding to each other. Thus, monellin presents a unique opportunity to understand the design of intrinsically disordered proteins that fold upon binding. Here, we study the folding of a single-chain variant of monellin (scMn) using simulations of an all heavy-atom structure-based model. These simulations can explain mechanistic details derived from scMn experiments performed using several different structural probes. scMn folds cooperatively in our structure-based simulations as is also seen in experiments. We find that structure formation near the transition state ensemble of scMn is not uniformly distributed but is localized to a hairpin-like structure which contains one strand from each chain (β2, β3). Thus, the sequence and the underlying energetics of heterodimeric monellin promote the early formation of the inter-chain interface (β2-β3). By studying computational scMn mutants whose “inter-chain” interactions are deleted, we infer that this energetic distribution allows the two protein chains to remain largely disordered when this interface is not folded. From these results, we suggest that cutting the protein backbone of a globular protein between residues which lie within its folding nucleus may be one way to construct two disordered fragments which fold upon binding.


INTRODUCTION

The requirement to fold on a biologically relevant timescale leads to funnel-shaped energy landscapes for structured proteins.1–3 However, many proteins are unstructured or disordered either in part or in their entirety.4–7 Although several of these intrinsically disordered proteins (IDPs) do not show any signatures of structure formation, others bind their partners and robustly fold to the same structure upon binding.8–10 Such IDPs, when in the presence of their binding partners, have constraints that are similar to those of structured proteins and thus could have funnel-shaped energy landscapes.11,12 In fact, structure-based models (SBMs; protein models with a Gō-like potential energy function), which work only when such funnelled energy landscapes exist, have been successfully used to predict the binding mechanisms of protein dimers that fold only upon binding.13

The two chains of the heterodimeric sweet protein monellin fold upon binding.14 What is unique about monellin is that together its two chains (chain A: β3-β4-β5 and chain B: β1-α1-β2; Fig. 1) fold to a β1-α1-β2-β3-β4-β5 topology which is structurally homologous to the single-chain proteins of the cystatin fold.15 Single-chain variants of monellin have been engineered by connecting the C-terminus of chain B to the N-terminus of chain A either directly16,17 or with a Gly-Phe linker.18,19 However, more information about the order of structure formation20–23 during folding is available for the variant with the linker and we use that variant here.

This conversion of heterodimeric monellin to single-chain monellin (scMn) does not seem to affect either its structure or its function.18 Analogous to “true” single-chain globular proteins, scMn folds cooperatively in a two-state manner without a detectable population of equilibrium intermediates.24 In fact, monellin has been used as a model for the folding of single-domain proteins.17–25 Experiments show that the two chains of heterodimeric monellin remain largely unfolded in isolation but fold when mixed together.14 scMn can thus be thought of as a protein that is composed of two intrinsically disordered chains that behave like a single-chain globular protein when connected. Thus, studying the folding mechanism of scMn provides a unique opportunity to understand some of the design principles of IDPs that fold upon binding.26–33

Figure 1. The structure and contact map of monellin. (A) scMn (PDB ID: 1IV7: β1-α-loopA-β2-loop1-β3-loop2-β4-loop3-β5) was created by connecting chain B (orange: β1-α-loopA-β2) and chain A (cyan: β3-loop2-β4-loop3-β5) of the naturally occurring two-chain monellin by a Gly-Phe linker (blue: loop1). The secondary structural elements and the loop regions are labelled. (B) The top left triangle: all-atom contact map of scMn calculated from the structure shown in (A). The intra-chain contacts are colored the same as the chains in (A) (chain B: orange, chain A: cyan). Inter-chain contacts are colored in black (β2-β3 contacts) and gray (rest). The bottom right triangle: specific β-β contact clusters from the contact map are reproduced for clarity and are labelled by the secondary structures that contribute to each cluster.

Here, we simulate the folding of wild-type (WT) scMn and several of its mutants using all-heavy-atom SBMs. The folding mechanism calculated using these scMn simulations reconciles several structural results obtained under diverse experimental conditions.20–22 Specifically, the simulations show that the folding of scMn is cooperative with a single free energy barrier and no population of equilibrium intermediates. Upon further analysis, we find that the earliest structure to form during folding consists of the β2 and β3 strands and the contacts between them. β2 is part of chain B while β3 is part of chain A (Fig. 1), and so the inter-chain interface forms early during folding. Here, we explore the implications of this folding mechanism for intrinsic disorder and folding upon binding.

METHODS

Structure-based or Gō-like models. The sequences of structured proteins have evolved to fold on a biologically reasonable timescale. This is made possible by minimizing frustration or trapping during folding.1 Trapping stabilizes misfolded intermediates with non-native structural content. Such intermediates need to unfold before folding to the final native structure and this slows folding. Minimizing trapping speeds up folding by creating a strong correlation between native structure formation and energetic stabilization, and this leads to funnelled energy landscapes. Structure-based models (SBMs) are computationally efficient because they take this a step further and make the approximation that no traps or stabilized non-native interactions can form. This is done by encoding the folded structure of the protein in their force fields.1 SBMs have been effective at quantitating bulk properties such as folding rates, barrier heights and folding cooperativity.34 They have also been able to capture the order of structure formation during folding and the structural features of partially folded ensembles at spatial resolutions difficult to observe in experiment.35 More recently, SBMs have been successfully used to predict unfolded state properties36 and to understand the behaviour of intrinsically disordered proteins.37–40
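To make concrete what it means to encode the folded structure in the force field, the following minimal sketch evaluates a Gō-like pair term whose energy minimum sits at the distance found in the native structure, alongside a purely repulsive term for all other pairs. The 12-6 functional form, the function names and the default parameter values are assumptions chosen for illustration; they are not taken from the parameter files used in this work.

```python
def native_contact_energy(r, r_native, epsilon=1.0):
    """Attractive pair term (assumed 12-6 form) with minimum -epsilon at r_native."""
    x = r_native / r
    return epsilon * (x**12 - 2.0 * x**6)


def excluded_volume_energy(r, sigma=0.25, epsilon=0.01):
    """Purely repulsive term for pairs that are not native contacts.

    sigma (nm) and epsilon (kJ/mol) are hypothetical illustrative values.
    """
    return epsilon * (sigma / r) ** 12
```

Because the minimum of the attractive term is placed at the native distance of each contact, every native contact that forms lowers the energy, while non-native pairs can never be stabilized. This is the sense in which such a model has no energetic traps.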

Details of the protein model. scMn (PDB ID: 1IV7, number of residues: 96; Fig. 1A) and its contact deletion mutants (see Fig. 1B and Table S1) were simulated using an SBM in which every heavy atom of the protein is represented.41 This SBM includes both bonded and non-bonded terms. The positions of the minima of the potential energy terms are determined from the folded structure of the protein. Bonded terms that represent bond, angle, improper dihedral and planar constraints are encoded using strong harmonic potentials, while dihedral constraints have a weaker strength and are encoded using a cosine potential. Non-bonded terms include native and non-native interactions. The latter describe only excluded volume effects and are represented using a repulsive interaction. Native interactions are attractive at longer distances and are represented using a Lennard-Jones potential. Native interactions or contacts are present between pairs of atoms which are close in the folded structure of the protein and which are separated by at least three residues in sequence. The shadow contact map algorithm with a cutoff radius of 6 Å and a shadowing radius of 1 Å was used to calculate native contacts.42 The total number of contacts in scMn was 992. Electrostatic interactions are not explicitly included in the model. The input files (.top and .gro) required for the GROMACS43 simulations were generated using the SMOG server44 with default parameters. It should be noted that GROMACS uses 1 kJ/mol, 1 nm and 1 ps as internal units, and the basic energy, length and time scales of the simulations were set to these values. The folding of some proteins is sensitive to the choice of SBM and its parameters.45 The current SBM was considered appropriate for the folding of scMn because it reproduces several diverse experimental observables (Fig. 2).

Contact deletion mutants of monellin. In order to understand the effect of the inter-chain interface of double-chain monellin on structure formation in scMn, we created four mutants, namely, scMn(∆(β1-β2)), scMn(∆(β2-β3)), scMn(∆(β3-β4)) and scMn(∆(β4-β5)).
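A contact deletion mutant of this kind can be pictured as a filter on the native contact list: every contact that joins the two named secondary structural elements is dropped and all other contacts are left unchanged. The sketch below is a simplification that stores contacts as pairs of residue indices rather than atom pairs. The function name and the residue ranges in `ELEMENTS` are hypothetical placeholders, not the monellin strand boundaries.

```python
# Hypothetical residue ranges (half-open, as in Python's range);
# these are NOT the real scMn strand boundaries.
ELEMENTS = {"b2": range(10, 16), "b3": range(20, 26)}


def delete_contacts(contacts, elem_a, elem_b, elements=ELEMENTS):
    """Return the contact list with all pairs linking elem_a and elem_b removed."""
    a, b = elements[elem_a], elements[elem_b]

    def links(i, j):
        # True if one residue lies in each of the two elements, in either order.
        return (i in a and j in b) or (i in b and j in a)

    return [(i, j) for i, j in contacts if not links(i, j)]


# Toy contact list: two b2-b3 contacts, two intra-strand contacts, one unrelated.
contacts = [(11, 22), (12, 14), (21, 24), (24, 13), (3, 40)]
mutant = delete_contacts(contacts, "b2", "b3")
```

In this toy example only the two contacts that bridge the hypothetical b2 and b3 ranges are removed. Contacts within either element, and contacts elsewhere in the protein, survive.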
These mutants have the same terms representing all the backbone and the dihedral interactions as in the scMn SBM. However, they have fewer contacts than scMn. Each of these mutants was constructed by deleting the contacts present between the secondary structural elements given in the brackets in the name of the mutant. The contacts are shown in Fig. 1B, marked by the same secondary structural elements. Further details about the deleted contacts are given in Table S1. The relevant contacts were deleted from the .top file of scMn to generate the appropriate input files for these mutants.

Simulation details. MD simulations were performed using the GROMACS 5.0 program suite.43 The SMOG server44 provides a sample simulation parameters file (.mdp; http://smogserver.org/MDP_sample.v5.html) required for GROMACS simulations. This file, with the appropriate temperature and number of time steps, was used to perform the simulations. Specifically, the leap-frog stochastic dynamics integrator was used with a time step of 0.0005 ps and an inverse friction constant of 1 ps. Trajectory snapshots from simulations were saved every 2000 time steps and were used for analysis. In order to sample the transition region between the folded and the unfolded states, MD simulations of scMn were performed near its folding temperature, Tf (~114.2 in reduced units). Tf is the temperature at which the folded and the unfolded states occur with equal probabilities and multiple transitions occur between the folded and the unfolded ensembles. Simulating at Tf allows extensive sampling of the transition region. The simulation trajectory with 102 (un)folding transitions was reweighted using single-histogram reweighting such that the folded and unfolded ensembles were equally populated. This reweighting was used to calculate Tf. The contact deletion mutants were simulated at the Tf of scMn for 6×10⁸ time steps. This is approximately the amount of time in which 5 scMn transitions take place. No mutant folding transitions took place within this time at this temperature.

Free energy profiles, error analysis and other simulation analyses.
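The single-histogram reweighting used to locate Tf can be sketched as follows. The sketch assumes that each snapshot carries a potential energy and a fraction of native contacts Q, that a snapshot counts as folded when Q exceeds a cutoff, and that the temperature is the numerical value passed to GROMACS. The function names, the cutoff and the bisection search are illustrative choices and are not necessarily the procedure used here.

```python
import math

KB = 0.0083144626  # Boltzmann constant in kJ/(mol K), matching GROMACS energy units


def folded_fraction(energies, q_values, t_sim, t_new, q_cut):
    """Reweight snapshots sampled at t_sim to t_new; return the folded population."""
    d_beta = 1.0 / (KB * t_new) - 1.0 / (KB * t_sim)
    exponents = [-d_beta * e for e in energies]
    shift = max(exponents)  # subtract the maximum to avoid overflow in exp()
    weights = [math.exp(x - shift) for x in exponents]
    folded = sum(w for w, q in zip(weights, q_values) if q > q_cut)
    return folded / sum(weights)


def estimate_tf(energies, q_values, t_sim, q_cut, lo, hi, tol=1e-6):
    """Bisect between lo and hi for the temperature with a folded population of 0.5."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if folded_fraction(energies, q_values, t_sim, mid, q_cut) > 0.5:
            lo = mid  # still mostly folded, so Tf lies at a higher temperature
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

The bisection relies on the folded population decreasing monotonically with temperature, which holds when the folded snapshots are lower in energy than the unfolded ones. Reweighting is reliable only for temperatures close to the simulation temperature, which is one reason to simulate near Tf in the first place.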
A contact is considered to be formed (q=1) in a given snapshot of the simulation trajectory if the distance between the two atoms in contact, r