Sampling Long- versus Short-Range Interactions ... - ACS Publications


Article

Sampling long versus short range interactions defines the ability of force fields to reproduce the dynamics of intrinsically disordered proteins. Davide Mercadante, Johannes Andreas Wagner, Iker Valle Aramburu, Edward A. Lemke, and Frauke Gräter J. Chem. Theory Comput., Just Accepted Manuscript • DOI: 10.1021/acs.jctc.7b00143 • Publication Date (Web): 14 Aug 2017 Downloaded from http://pubs.acs.org on August 16, 2017

Journal of Chemical Theory and Computation

Sampling long versus short range interactions defines the ability of force fields to reproduce the dynamics of intrinsically disordered proteins.

Davide Mercadante1,2*, Johannes A. Wagner1, Iker V. Aramburu3, Edward A. Lemke3 and Frauke Gräter1,2

1 HITS – Heidelberg Institute for Theoretical Studies, 35 Schloß Wolfsbrunnenweg, 69118 Heidelberg, Germany
2 IWR – Interdisciplinary Center for Scientific Computing, Heidelberg University, Mathematikon, Im Neuenheimer Feld 205, 69120 Heidelberg, Germany
3 Structural and Computational Biology Unit, Cell Biology and Biophysics Unit, European Molecular Biology Laboratory (EMBL), Meyerhofstraße 1, 69117 Heidelberg, Germany
*Corresponding Author

Abstract

Molecular dynamics simulations have valuably complemented experiments describing the dynamics of intrinsically disordered proteins (IDPs), particularly since the proposal of models to solve the artificial collapse of IDPs in silico. Such models suggest redefining non-bonded interactions, by either increasing water dispersion forces or adopting the Kirkwood-Buff force field. These approaches yield extended conformers that better comply with experiments, but it is unclear if they all sample the same intra-chain dynamics of IDPs. We have tested this by employing MD simulations and smFRET spectroscopy to sample the dimensions of systems with different sequence compositions: namely strong and weak polyelectrolytes. For strong polyelectrolytes in which charge effects dominate, all the proposed solutions equally reproduce the expected ensemble’s dimensions. For weak polyelectrolytes, at lower cutoffs, force fields abnormally alter intra-chain dynamics, overestimating excluded volume over chain flexibility or reporting no difference between the dynamics of different chains. The TIP4PD water model alone can reproduce experimentally observed changes in extensions (dimensions), but not quantitatively and with only weak statistical significance. Force field limitations are reversed with increased interaction cutoffs, showing that chain dynamics are critically defined by the presence of long-range interactions. Force field analysis aside, our study provides the first insights into how long-range interactions critically define IDP dimensions and raises the question of which length range is crucial to correctly sample the overall dimensions and internal dynamics of the large group of weakly charged yet highly polar IDPs.

ACS Paragon Plus Environment

Introduction

Intrinsically disordered proteins (IDPs) have unhinged the dogma of modern biology by which structure is inextricably related to function. IDPs lack a stable secondary structure but are still able to fulfill vital cellular functions and are linked to the occurrence of numerous diseases.1 Importantly, their abundance in proteomes is correlated with organismal complexity,2 indicating that the loss of structure can be related to the acquisition of function. Because of the absence of a well-defined structure, IDPs are characterized by high conformational plasticity, which makes their investigation through conventional structural biology techniques difficult.3-4 IDP conformers are therefore often portrayed by coarse observables describing, for example, their overall dimensions. The conformational heterogeneity of IDP ensembles can thus be exhaustively depicted only through a statistical description of the explored conformational space, with factors such as solvent quality and excluded volume effects determining the overall shape and extension of populated conformers.5 Computer simulations can be invaluable for understanding IDPs’ functions and intra-chain dynamics, provided they are first capable of correctly reproducing the overall dimensions of the intrinsically disordered conformers populating an ensemble. Unfortunately, simulations of IDPs have suffered from severe inaccuracies, as they commonly yield overly compact conformers that are far from experimental dimensions.6-7 This flaw, which arises with any of the canonical force fields and water models, has been attributed to overstating protein-protein over protein-water interactions.
Such limitations in correctly sampling the dynamics of IDPs can be more easily rationalized when considering that force fields and water models have been developed and tested over decades in which the existence of intrinsically disordered chains was neglected.2, 8 Nevertheless, the over-stabilizing tendency of force fields also strongly affects structured proteins. Petrov et al. reported how several force fields failed to reproduce the experimentally derived aggregation-prone behavior of villin headpieces, driving all the simulated systems to aggregate, even though experiments revealed non-aggregating properties for some of the villin isoforms tested.9 The inaccuracy of force fields becomes remarkably obvious when simulating IDPs, as these quickly collapse into unphysical globe-like conformations. The dimensions of IDPs have often been related to the presence of charged residues and their distribution along the protein sequence.10-12 It has been shown that the distribution of charged residues strongly affects the compaction of conformers for model systems that feature either alternating or block co-polymeric charges along their sequence10 or, more recently, for IDPs undergoing multiple phosphorylation events.13 Nevertheless, IDPs very often contain polar and, sometimes, hydrophobic amino acids that make the reproduction of the dimensions of experimental ensembles more challenging than for model systems or charged chains.6, 10 Generally, advancements in our ability to sample more extended conformers in explicit solvent have been brought forward by models that rebalance protein solvation by reshaping protein-water interactions. Mostly, these strategies can be categorized into two different approaches, proposed by Best et al.14 and Piana et al.15 The approach proposed by Best et al. is focused on an empirical rescaling of protein-water Lennard-Jones (LJ) interactions in order to match experimental dimensions of a series of IDPs. This attempt has led to the creation of the AMBER03ws force field,14 coupled with the TIP4P2005s water model.16 In parallel, Piana et al. proposed a conceptually similar solution focused on the redefinition of water LJ parameters, which, however, is achieved through an analytical, rather than empirical, determination of dispersion forces in water and, consequently, between water and proteins. This attempt led to the creation of the TIP4PD water model.15 Although based on different frameworks, both these water-rescaling models are able to yield more extended conformers and in many cases have sampled dimensions for IDPs better in line with experimental observations.6, 14-15, 17 The strikingly similar chain dimensions given by both these approaches have led to questions of how these two models, which are based on considerably diverging approaches, could yield such similar outcomes.17 This is especially interesting considering the possibility that protein and water dynamics are intimately linked,18 suggesting that different water models affect a protein’s internal dynamics differently. Recently, more expanded IDP ensembles in agreement with experiments have also been obtained when increasing the dispersion term of water hydrogen atoms in CHARMM-TIPS3P water molecules.
Nevertheless, this effort leads to overly extended peptides in the case of charged chains.19 In parallel, the over-collapse of IDPs in canonical force fields has also prompted the development of more efficient and optimized implicit-solvent models.20-21 Interestingly, a recent analysis of the IDP sequences investigated by means of computational models revealed that the sequences of simulated intrinsically disordered chains may be biased towards the presence of only some residues, with others, like proline and glutamine, being under-represented.17 Overall, the study questioned the effective ability of these models to sample the dimensions of chains containing a series of these residues, which are nonetheless abundant in IDPs. Additionally, Miller et al. ran an exhaustive analysis of the osmotic coefficients for different amino acids.22 The outcomes of their analysis trigger two interesting observations: firstly, experimental osmotic coefficients of different amino acids are never fully reproduced, even qualitatively, by any force field or water model proposed to solve the problem of IDP over-compaction and, secondly, the osmotic coefficients retrieved from simulations are in agreement with experiments for some residues and in disagreement for others.22 This is particularly relevant for water-rescaling approaches, which aim to solve the problem of over-compaction by using empirically scaled LJ parameters for solute-solvent interactions on the basis of matching a limited set of model systems.
It is indeed unclear how such an approach may affect the ability to sample the sequence specifics that determine IDPs’ dynamics or excluded volume effects, which are crucial to correctly describe the conformational dynamics of IDPs.23 One possibility is that a crude increase in water dispersion forces could lower the ability of a force field to resolve the contributions of each amino acid in modulating molecular dynamics, eventually shifting the conformational dynamics of a protein to that of a homopolymer. By contrast, a de novo parameterization of water that defines the effects of short-distance dispersion terms (C8 and C10) not considered in the LJ parameters of canonical water models would take care of short-range interactions. These would
improve the interaction energy within water at different length scales and enhance the solvent's ability to solvate IDPs meaningfully. In this scenario, it is also unclear how short- and long-range interactions may impact the ability of force fields to define the overall dimensions of IDPs. Interestingly, for both folded and unfolded states, it has been claimed that protein free energies converge at interaction cutoffs of 0.9 nm.24 A cutoff no higher than 1.0 nm is most often adopted for the sake of computational efficiency, especially in the case of IDP dynamics, as such entities are characterized by an enormous conformational heterogeneity. Nevertheless, some force fields retrieve a meaningful parameterization only at cutoffs larger than 1.0 nm. In parallel to the attempts of water-rescaling models to solve the over-collapse of IDPs, molecular dynamics simulations presented in our previous work showed that a new approach based on the Kirkwood-Buff force field (KBFF),25-31 developed following the Kirkwood-Buff theory of solutions,32 is able to reproduce the dimensions of chains retrieved from smFRET and SAXS measurements for a non-charged, highly polar and hydrophobic peptide of the intrinsically disordered nucleoporin Nup153.6 In order to test how water-rescaling models and KBFF perform when sampling the internal dynamics of weak and strong polyelectrolytes, we first simulated strong polyelectrolytes described by alternating or block co-polymeric negative and positive charges along the sequence. We then simulated the behavior and dimensions of a weak polyelectrolyte represented by the previously studied fragment of Nup153 (Nup153PxFG).
This peptide is particularly abundant in proline and phenylalanine residues, as it contains numerous ‘PxFG’ repeats, and is thus composed of amino acids that were thought to be mostly under-represented in simulated IDP sequences.17 Additionally, we rationally designed Nup153PxFG mutants in which all the proline or phenylalanine residues in the chain are mutated into alanine. We term these systems Nup153AxFG and Nup153PxAG, respectively. The substitution of proline residues with alanine is well known to shrink the ensemble, as proline, due to its steric properties, creates kinks along the chain.33-35 By contrast, the effects of phenylalanine-to-alanine mutations are more difficult to predict, as such substitutions can lead to two different scenarios: a compaction of the ensemble due to a decrease in the excluded volume generated by replacing a bulky residue with a smaller one, or an extension of the chain caused by the higher conformational freedom of alanine with respect to phenylalanine. We investigated the relative differences in overall dimensions for the Nup153PxFG and Nup153PxAG systems using smFRET and size-exclusion chromatography and found that the phenylalanine-to-alanine substitution leads to an expansion of the IDP ensemble, in agreement with previous studies.6, 36 Although all tested force fields and water models reproduce the expected trends in chain dimensions dictated by charge distribution along the chain for strong polyelectrolytes, they sample the conformational dynamics of weak polyelectrolytes very differently. AMBER99sb*-ILDN coupled with the TIP4PD water model was found to be the only model that yields results in line with experimental findings of compacted and expanded ensembles for Nup153AxFG and Nup153PxAG, respectively, when using a cutoff of 1.0 nm. Importantly, the inaccuracy of AMBER03ws and KBFF in reproducing the relative differences that are observed experimentally can be reversed by increasing
interaction cutoffs and by sampling longer-range interactions. More generally, our findings underpin the limitations or capabilities of each approach proposed to solve the problem of over-compaction, provide a useful guideline for simulating highly polar and/or hydrophobic IDPs and outline the importance of sampling long-range interactions. Importantly, they provide evidence that, aside from their ability to successfully sample the overall dimensions of IDPs, different force fields sample intra-chain dynamics in opposite ways, by favoring the establishment of either short- or long-range interactions.

Materials and Methods

Protein purification

The 82 aa fragments of human Nup153 (amino acid position 1313-1390 with respect to the full length protein) (Nup153PxFG) and its corresponding fragment with all the F mutated to A (Nup153PxAG) were cloned into a pTXB3 vector with a 6His-tev cleavage site fused at the N-terminal site and an intein chitin binding domain (CBD) fused at the C-terminal site of the Nup153 fragments. Nup153 fragments contained a single cysteine and an Amber (TAG) codon mutation at the positions indicated in the sequence below. The proteins were recombinantly expressed, purified using standard Ni and chitin affinity purification procedures under mild denaturing conditions (1x phosphate buffered saline (PBS) pH 8, 150 mM NaCl and 2 M urea) and site-specifically labeled (oxime ligation with Alexa488-hydroxylamine and maleimide labeling with Alexa594-maleimide) as previously described.37 The labeled protein was purified from the free dye by size exclusion chromatography (Superdex75 10/300 GL). The selected fractions were concentrated and stored at -80 °C in 4 M guanidinium hydrochloride, 1x PBS.
Sequence of Nup153PxFG
GCPSASPAFG ANQTPTFGQS QGASQPNPPG FGSISSSTAL FPTGSQPAPP TFGTVSSSSQ PPVFGQQPSQ SAFGSGTTPN AcFA

Sequence of Nup153PxAG
GCPSASPAAG ANQTPTAGQS QGASQPNPPG AGSISSSTAL APTGSQPAPP TAGTVSSSSQ PPVAGQQPSQ SAAGSGTTPN AcFA

Single molecule FRET spectroscopy

Single molecule FRET experiments were performed on a custom built, time correlated, single photon counting multiparameter single molecule spectrometer centered around a 60x, 1.27 NA water immersion objective (Nikon) similar to that described previously.38-39 The Nup153 FRET labeled fragments were measured at around 50 pM in 1x PBS pH 7.4, 2 mM DTT and 2 mM Mg(CH3COO)2. In brief, the measurements of freely diffusing labeled proteins were acquired using alternating linearly polarized lasers at a pulse rate of 27 MHz. The green laser (LDH 485; Picoquant, Berlin, Germany), filtered through an excitation filter of 482/18, and a white light laser (SuperK Extreme, NKT Photonics), filtered at 572 nm (NKT Photonics), were used for the excitation of the donor and acceptor dyes, respectively. The signal
from the photons was acquired with a multichannel time-correlated single-photon counting module (Hydraharp 400, Picoquant). The measured data were analyzed by multiparameter fluorescence analysis (Eggeling et al., 2001; Kudryavtsev et al., 2012; Sisamakis et al., 2010). Single molecules were identified using a burst search algorithm (Enderlein et al.; Schaffer et al., 1999) and fluorescence intensities (I), donor lifetimes (τ) and donor anisotropies (r) were extracted from individual bursts. These acquired data were analyzed with a custom-written program using Igor Pro (Wavemetrics, Lake Oswego, OR). The interphoton lag time threshold for burst selection was set to 90 microseconds and identified bursts were subjected to a photon-based selection threshold of 70 photons. EFRET = IA/(IA + ID) was calculated from the background, leakage and direct excitation corrected donor (ID) and acceptor (IA) signals. In Figure S1 we show additional data to rule out the possibility that observed EFRET changes could originate from changes in dye quantum yields.

Molecular dynamics simulations

Molecular dynamics simulations were performed using GROMACS version 5.1.1.40 The simulated systems are listed in Table 1 and were initially modeled as extended or semi-extended chains. Molecular topologies were created according to the parameters from the AMBER03ws,14 AMBER99sb*-ILDN41 or KBFF25-26, 29-31 (version 2) force fields, which were coupled with the TIP4P2005s,16 TIP4PD15 and SPC/E42 water models, respectively. After the creation of molecular topologies, the systems were placed in a dodecahedron box and solvated. Equilibration of the solvent around the protein was achieved in two steps in which protein atoms were subjected to a positional restraint through the application of a harmonic potential of 1000 kJ mol-1.
In the first step, 0.5 ns-long simulations were carried out in the NVT ensemble, keeping the temperature constant with the V-rescale thermostat,43 which was set to rescale the temperature to 300 K every 0.1 ps. In this step, initial velocities for each particle were generated following a Boltzmann distribution at a temperature of 300 K. In the second step, a 0.5 ns-long simulation was run in the NpT ensemble, using the velocities generated in the NVT step. In this step, pressure was kept isotropically constant at a value of 1.0 bar by using the Parrinello-Rahman barostat,44 with the temperature kept at 300 K as in the NVT step. After equilibration, production MD runs were carried out on 20 independent replicates for each system. The replicates differed with respect to the initial set of velocities randomly generated during the NVT equilibration step. Each replicate was run for 150 ns, accounting for a total simulated time of 3 μs for each of the investigated systems and a total collective simulated time of 45 μs. The cutoffs used to calculate van der Waals and short-range Coulomb electrostatic interactions are reported in Table 1. Beyond the cutoff used to compute Coulomb interactions, the Particle Mesh Ewald (PME) summation method45 was used to account for long-range electrostatic interactions. Trajectory data were collected every 10 ps and the results were extracted after discarding the first 100 ns, which were still considered as equilibration time. Analyses were performed using tools available in the GROMACS suite or developed in the CTraj package (http://pappulab.wustl.edu/CTraj.html) by Alex Holehouse in the Pappu Lab at Washington University in St. Louis, or using
MDTraj46 and CONAN, the latter being an in-house developed tool for the computation of contact maps.

Results

Protein/water models solving IDP over-compaction equally reproduce the expected dimensions of strong polyelectrolytes.

In order to assess the different solutions proposed to solve the problem of over-compaction, we chose two sets of IDPs for the simulations. Firstly, 25 residue-long peptides that were highly charged in an alternating or block co-polymeric fashion were simulated. Secondly, an 82 residue-long fragment of Nup153 (Nup153PxFG) and mutated forms that feature the substitution with alanine of all the proline (Nup153AxFG) or phenylalanine (Nup153PxAG) residues were simulated. Where the expression profiles of the wild-type and mutated forms of the Nup153 fragments allowed, the dimensions of the simulated peptides were characterized by means of smFRET spectroscopy. Finally, the obtained ensembles were compared on the basis of the FRET efficiency (EFRET) and of the end-to-end distance (RE) distributions obtained from the simulations. This comparison allowed for the direct assessment of simulations, in order to identify which approach reproduced experimental findings. Additionally, we also monitored radius of gyration (RG) distributions for which, however, no experimental measure was available. The analysis of the dynamics for the charged 25-mer peptides revealed that all the tested force fields and water models reproduce well the expected trends previously delineated by Das and Pappu.10 In fact, semi-extended, elongated conformers are sampled for the peptides with alternating positive and negative charges, whereas collapsed states populate the ensemble when blocks of positive and negative stretches appear along the sequence. Therefore, all the tested force fields equally reproduce the trend in the dimensions of charged chains in the ensembles.
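As an illustration of the observables compared here, the following minimal sketch computes RE and RG for a single conformation and converts a distance into a FRET efficiency. It is not the analysis code used in this work (which relied on GROMACS tools, CTraj, MDTraj and CONAN). The function names are hypothetical, equal masses are assumed for RG, and the Förster relation E = 1/(1 + (r/R0)^6) is an assumption of the sketch, since the text does not state how simulated distances were mapped to EFRET. The Förster radius `r0` must be supplied by the user and dye linkers are ignored.

```python
import math


def end_to_end_distance(coords):
    """Distance between the first and last positions of the chain (R_E)."""
    return math.dist(coords[0], coords[-1])


def radius_of_gyration(coords):
    """Root-mean-square distance from the centroid (R_G), equal masses assumed."""
    n = len(coords)
    centroid = [sum(p[k] for p in coords) / n for k in range(3)]
    return math.sqrt(sum(math.dist(p, centroid) ** 2 for p in coords) / n)


def forster_efficiency(distance, r0):
    """Assumed Foerster relation E = 1 / (1 + (r / R0)^6); R0 is user-supplied."""
    return 1.0 / (1.0 + (distance / r0) ** 6)


# Hypothetical test conformation: a straight five-bead chain, 0.38 nm spacing.
rod = [(0.38 * i, 0.0, 0.0) for i in range(5)]
r_e = end_to_end_distance(rod)
r_g = radius_of_gyration(rod)
```

Applied frame by frame to a trajectory, these functions would yield the RE and RG distributions discussed in this section.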
The derived ensembles can be considered as quantitatively equivalent because there are no statistically significant differences between the RE and RG distributions obtained from the different models (Figure 1). Similarly, the scaling profiles collected for these peptides show that alternating charges shift dynamics towards the domination of excluded volume effects (Figure 2). Interactions are mostly dictated by charge repulsion and by the impossibility to establish favorable intra-chain salt-bridges due to the topology of oppositely charged amino acids along the polymer. On the other hand, intra-chain interactions are abundant when stretches featuring blocks of charged residues are present (Figure 2). In this case long-range interactions can be formed and salt-bridges stabilize the observed globe-like structures. Consistent with this scenario, asphericity profiles shift to higher values when stretches of charges are replaced by alternating charges (Figure S2), as is expected when excluded volume increases.

Protein/water models solving IDP over-compaction only partially reproduce the dimensions of weak polyelectrolytes.

In contrast to the strong polyelectrolytes, the analysis of the dynamics sampled for the intrinsically disordered Nup153PxFG fragment and its designed mutants reports remarkable differences in RE and RG between the different force fields and water models adopted (Figure 3). The differences in mean RE and RG are statistically significant, with p-values < 0.05 (Figure 4). KBFF reports the largest dimensions among the protein/water models for the wild-type Nup153PxFG peptide. These dimensions are in line with what has been previously measured by smFRET spectroscopy and SAXS.6 AMBER99sb*-ILDN with TIP4PD yields ensembles that are more collapsed compared to the other force fields, and as such differ more significantly from the other two models in terms of the predicted overall dimensions of the set of IDPs (Figures 3 and 4). Within each force field, the dimensions of the ensembles sampled for Nup153PxFG and its mutants Nup153AxFG and Nup153PxAG differ significantly only for KBFF and AMBER99sb*-ILDN/TIP4PD, and not for AMBER03ws (Figures 3 and 4). In the case of KBFF, the dimensions of both the Nup153AxFG and Nup153PxAG mutants are smaller than those of Nup153PxFG. The difference between Nup153PxFG and Nup153AxFG or Nup153PxAG is significant, with p-values < 0.01 and < 0.05, respectively. According to AMBER99sb*-ILDN/TIP4PD, the mutation from proline to alanine also leads to a reduction of the ensemble’s overall dimensions (p-value < 0.01). Surprisingly, and contrary to the prediction by KBFF, AMBER99sb*-ILDN predicts the dimensions of the Nup153PxAG peptide to be significantly larger (p-value < 0.05) than those of Nup153PxFG. The AMBER03ws force field, on the other hand, does not sample statistically significant differences (p-values > 0.05) between the dimensions of the ensembles for Nup153PxFG, Nup153AxFG or Nup153PxAG (Figure 4).
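The text reports p-values for differences in mean RE and RG but does not name the statistical test behind them. Purely as an illustration of how such a comparison could be made, the sketch below uses a two-sided permutation test on the difference of means. The choice of test, the function name and the idea of feeding it one mean value per replicate are all assumptions of this sketch, not the procedure used in this work.

```python
import random
from statistics import mean


def permutation_p_value(sample_a, sample_b, n_permutations=10000, seed=0):
    """Two-sided permutation test on the difference of sample means.

    sample_a, sample_b: e.g. per-replicate mean R_E values for two systems
    (an assumption of this sketch).
    """
    rng = random.Random(seed)  # fixed seed keeps the estimate reproducible
    observed = abs(mean(sample_a) - mean(sample_b))
    pooled = list(sample_a) + list(sample_b)
    n_a = len(sample_a)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_permutations + 1)
```

A permutation test makes no assumption about the shape of the underlying distributions, which is convenient for the broad, skewed distance distributions typical of disordered chains. Its resolution is limited by the number of permutations drawn.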
To validate the observed differences between the force fields with experimental evidence, we performed smFRET spectroscopy on the Nup153PxFG and Nup153PxAG peptides. Although Nup153AxFG was also simulated, it did not express, and we therefore relied on previously collected evidence that the substitution of proline residues with alanine generates a compaction of the ensemble.33-35, 47 The experiments were thus carried out for the Nup153PxFG and Nup153PxAG peptides. The behavior of the latter has not been previously predicted, in contrast to the proline-to-alanine substitution, for which earlier evidence describes the changes in the dimensions of the Nup153AxFG system. The collected FRET efficiency (EFRET) distributions show that Nup153PxAG is substantially more extended than Nup153PxFG, with a mean EFRET of 0.49 compared to 0.6 for the wild type (Figure 5). These results confirm that only AMBER99sb*-ILDN/TIP4PD reproduces the trends suggested by the experiments, while both KBFF in combination with SPC/E and AMBER03ws coupled with TIP4P2005s are not able to capture such differences.

Protein/water models solving IDP over-compaction interpret the internal dynamics of weak polyelectrolytes differently when sampling shorter-range interactions.

We next asked which underlying internal dynamics give rise to the observed differences in the collapse tendencies of the three protein/water models. Figure 6A
shows scaling profiles of the lengthening of residue-residue distances with increasing sequence separation along the peptide for the simulated ensembles. Scaling profiles reflect the ability to form interactions at different positions along the chain, and thereby reveal deviations from simple polymer dynamics. In KBFF, the scaling profiles diverge more evidently among the three Nup variants compared to the other models. Their shape suggests that the dynamics of the wild type and mutants are dominated by excluded volume effects (Figure 6A). The asphericity profiles, as another measure of the deviation from a simple Gaussian chain-like polymer, also show interesting differences. KBFF shifts asphericity distributions towards more spherical conformers in the case of both Nup153AxFG and Nup153PxAG mutants (Figure S3A, D and G). By contrast, AMBER99sb*-ILDN/TIP4PD heavily shifts the mostly spherical shape of the wild-type ensemble towards the sampling of an aspherical space (Figure S3B, E and H). On the other hand, similarly to what was observed above for the RE and RG distributions, AMBER03ws does not sample significant differences in asphericity among the simulated systems (Figure S3C, F and I). With respect to the formation of intra-chain contacts, AMBER99sb*-ILDN/TIP4PD suggests the occurrence of longer-range and longer-lived interactions (Figures S4 and S5), with both hydrophobic interactions and hydrogen bonds branching off the contact map diagonal (Figures 6B and S6). Finally, the profiles obtained from AMBER03ws are virtually identical, as they completely overlap (Figure 6C), further confirming the lack of statistically significant differences (Figure 4) between the collected RE and RG distributions shown in Figure 3 and the identical scaling between intra-chain stretches and chain dimension.

Sampling long-range contacts by increasing interaction cutoffs reconstitutes the ability to reproduce experimental trends for IDP dimensions.
A plausible explanation for the inability of the force fields to reproduce the experimental trends observed when phenylalanine residues are mutated into alanine might be found by analyzing the spatial range within which Coulomb or LJ interactions play a role. Most often, to gain computational efficiency at the expense of resolution, the distance cutoff for computing molecular interactions is set at 1.0 nm. For folded proteins and their unfolded states it has been reported that the folding free energy converges at such a value, which justifies the choice of a shorter cutoff.24, 48 Nevertheless, IDPs, especially when they avoid the formation of secondary structure elements and are characterized by large conformational fluctuations on short timescales, may escape such an approximation, because stretches within the same chain may fold back and come into closer contact, thus playing a significant role in the overall definition of chain dimensions. To test this, we simulated the Nup153PxFG and Nup153PxAG peptides using exactly the same parameters as before, except that the Coulomb and LJ cutoffs were set at 1.5 and 1.4 nm for the KBFF and AMBER03ws force fields, respectively. Remarkably, for both force fields this increase reversed the trend in dimensions observed in the simulations performed using 1.0 nm cutoffs. When simulated at a longer cutoff, Nup153PxAG is more expanded than the Nup153PxFG peptide (Figure 7), as experiments show (Figure 5).

The shape of the collected scaling profiles reveals differences that point to the occurrence of longer-range interactions. Nevertheless, there are interesting differences between the KBFF and AMBER03ws force fields. In the former, intra-chain interactions are minimal, whereas in the latter the scaling profile collected for Nup153PxFG kinks more heavily and suggests the occurrence of intra-chain interactions (Figure 6). Overall, these interactions account for the lower RE and RG sampled by AMBER03ws (Figure 7), with differences only slightly more statistically significant for the RE of AMBER03ws than for KBFF (Figure S7). Strikingly, an analysis of the interaction types shows opposite trends when KBFF or AMBER03ws is used. At larger cutoffs, AMBER03ws samples more long-range contacts for the Nup153PxFG peptide (compare Figure S6G with Figure S8C), which mostly accounts for the reduction of inter-residue distances at larger |i-j| and away from the excluded-volume (EV) regime (Figure S6E). These interactions occur between residues spaced approximately 30 amino acids apart along the sequence (Figure S8C). The insertion of the mutations (Nup153PxAG) increases chain flexibility and promotes the loss of long-range contacts; the contacts are in this case largely distributed along the matrix diagonal (Figure S8D). The trend observed for KBFF is the opposite (Figure S8A and B): Nup153PxAG shows the formation of additional contacts occurring locally, within a range spanning approximately 10 residues. Overall, our data reveal how differently force fields sample the internal dynamics of the simulated system, even though they reproduce equally well the experimental trends of collapse tendencies upon mutation.
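The scaling profiles discussed above report the average distance between residues i and j as a function of their sequence separation |i-j|. The following is a minimal sketch of that calculation in plain Python, not the analysis code used for this work. It assumes that each residue is represented by a single point (for instance the Cα position, in nm); the function names `scaling_profile` and `random_walk`, the 0.38 nm bond length and the synthetic random-walk chains are all illustrative assumptions.

```python
import math
import random

def scaling_profile(frames):
    """Mean inter-residue distance versus sequence separation.

    frames: list of frames, each a list of (x, y, z) tuples, one per residue.
    Returns a list whose entry k is the distance averaged over all frames
    and all residue pairs (i, j) with |i - j| = k + 1.
    """
    n_res = len(frames[0])
    sums = [0.0] * (n_res - 1)
    counts = [0] * (n_res - 1)
    for frame in frames:
        for i in range(n_res - 1):
            for j in range(i + 1, n_res):
                sums[j - i - 1] += math.dist(frame[i], frame[j])
                counts[j - i - 1] += 1
    return [s / c for s, c in zip(sums, counts)]

def random_walk(n_res, bond=0.38, rng=random):
    """Hypothetical test chain: freely jointed walk with fixed bond length."""
    pos = [(0.0, 0.0, 0.0)]
    for _ in range(n_res - 1):
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        pos.append(tuple(p + bond * c / norm for p, c in zip(pos[-1], v)))
    return pos

# Synthetic ensemble standing in for a trajectory (assumption, not real data)
rng = random.Random(0)
frames = [random_walk(30, rng=rng) for _ in range(100)]
profile = scaling_profile(frames)
for sep in (1, 5, 10, 29):
    print(f"|i-j| = {sep:2d}  <R> = {profile[sep - 1]:.3f} nm")
```

Plotting `profile` against the separation for each peptide variant gives curves of the kind compared in Figure 6; kinks or a flattening at large |i-j| would indicate intra-chain contacts between distant residues.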

Discussion

In this study we have sampled the dynamics of peptides bearing amino acids with different chemical properties (Table 1) and compared the overall dimensions of the sampled ensembles in order to assess the validity of models proposed to solve the problem of overcompaction in IDPs. For this purpose we have analyzed the dynamics of two model systems: 25-mer peptides made of alternating or block co-polymeric negatively and positively charged residues, and a peptide from Nup153, which we previously adopted as a test case.6 The latter system is particularly suited to understanding how such models sample the dynamics of a chain that has no charged residues but instead numerous hydrophobic and polar amino acids. Overall, no significant differences in sampling the charged peptides were found among the KBFF, AMBER99sb*-ILDN and AMBER03ws force fields; all the force fields behave similarly when applied to systems in which charge contributes extensively to defining the dimensions of the ensemble. In contrast, the analysis of Nup153PxFG revealed remarkable differences between the models, answering the question of whether all these models sample the dynamics of intrinsically disordered regions in the same way. To test the ability of the different models to sample relative differences between similar peptides, we designed a series of Nup153PxFG mutants in which proline or phenylalanine residues in the chain were mutated to alanine, Nup153AxFG and Nup153PxAG respectively. From experiments we already knew the effects of these mutations on the overall dimensions of the ensembles, and we were thus able to dissect the capacity of each model to reproduce the experimentally observed trends.

Interestingly, using interaction cutoffs set at 1.0 nm, KBFF correctly predicts the reduction in the dimensions of the Nup153AxFG ensemble, but fails to predict the extended ensemble upon the phenylalanine to alanine mutations, as observed with smFRET spectroscopy and size-exclusion chromatography (Figure 5). The inconsistency shown by KBFF can be explained by the collected scaling profiles, which show that the dynamics sampled by this model is strongly dominated by excluded volume effects. In this scenario, the substitution of bulky and hydrophobic phenylalanine residues with smaller ones such as alanine allows the chain to pack more closely (as excluded volume decreases), consequently increasing the occurrence of intra-chain interactions (Figure S3). This lowers the median of the RE and RG distributions (Figure 3). The tendency to pack the chain more closely where the substitutions are made is also reflected in the asphericity profiles, distances and contact maps (Figures S2-S6), in which more contacts, mainly hydrogen bonds, are formed in the proximity of the contact map diagonal. Importantly, the failure to reproduce the experimental trends shown in Figure 5 can be overcome if 1.5 nm Coulomb and LJ cutoffs are used. In this case, the overall dimensions sampled for the Nup153PxFG and Nup153PxAG peptides reverse and the alanine-substituted system is more expanded than the WT.
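The ensemble dimensions compared throughout this discussion (RE, RG and asphericity) can all be derived from the coordinates of a single conformer. The sketch below is one possible way to do so and is not taken from the original analysis. It assumes equal weights for all points, takes RE as the distance between the first and last point, and uses one common definition of asphericity, 1 - 3(λ1λ2 + λ2λ3 + λ1λ3)/(λ1 + λ2 + λ3)², where λ are the eigenvalues of the gyration tensor; the definition used for Figure S3 may differ. The function name `chain_dimensions` and the two test shapes are illustrative.

```python
import math

def chain_dimensions(frame):
    """Return (RE, RG, asphericity) for one conformer.

    frame: list of (x, y, z) tuples, one per residue or atom, equal weights.
    Asphericity is 0 for a spherically symmetric shape and 1 for a rod.
    """
    n = len(frame)
    center = [sum(p[k] for p in frame) / n for k in range(3)]
    # Gyration tensor T_ab = <(r_a - c_a) * (r_b - c_b)>
    t = [[sum((p[a] - center[a]) * (p[b] - center[b]) for p in frame) / n
          for b in range(3)] for a in range(3)]
    # Tensor invariants avoid an explicit eigenvalue decomposition:
    # trace = l1 + l2 + l3 = RG^2, second = l1*l2 + l2*l3 + l1*l3
    trace = t[0][0] + t[1][1] + t[2][2]
    second = (t[0][0] * t[1][1] + t[1][1] * t[2][2] + t[0][0] * t[2][2]
              - t[0][1] ** 2 - t[1][2] ** 2 - t[0][2] ** 2)
    r_e = math.dist(frame[0], frame[-1])
    r_g = math.sqrt(trace)
    asphericity = 1.0 - 3.0 * second / trace ** 2
    return r_e, r_g, asphericity

# Two limiting test shapes (illustrative only)
rod = [(0.1 * i, 0.0, 0.0) for i in range(11)]
octahedron = [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
              (0, -1, 0), (0, 0, 1), (0, 0, -1)]
print("rod:       ", chain_dimensions(rod))
print("octahedron:", chain_dimensions(octahedron))
```

Applying the function to every frame of a trajectory yields the distributions whose medians and shapes are compared between the wild type and the mutants.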
Scaling profiles show significant quantitative differences, starting to diverge as soon as the relation between blob length and R|i-j| leaves the previously described regime in which |i-j| < 2g.10 At longer cutoffs, the scaling profiles also show kinks that describe a dynamic regime less dominated by excluded volume effects, although the shape of the scaling profile largely recalls dynamics within the EV limit. The two distinct water rescaling models suggested by Best14 and Piana15 (AMBER03ws and AMBER99sb*-ILDN, respectively) are much less dominated by excluded volume, even though Nup153PxFG does not contain net charged residues and the shortening of inter-residue distances has only been described for charged systems, where it is driven by electrostatic interactions between oppositely charged residues.10 Notwithstanding the increase in water dispersion forces, both force fields still favor the occurrence of durable and consistent intra-chain interactions that lead to lower dimensions for all the sampled peptides when compared to KBFF. More importantly, although they start from the same idea with different approaches, the two models do not equally reproduce the relative differences in the ensembles’ dimensions observed between the wild-type and mutated Nup153 peptides. Of the two approaches, only AMBER99sb*-ILDN/TIP4PD succeeds in predicting smaller and larger conformers for Nup153AxFG and Nup153PxAG, respectively, as reported in experiments. The expansion upon phenylalanine to alanine substitution correctly predicted by AMBER99sb*-ILDN/TIP4PD can be explained by the larger number of alanine residues along the Nup153PxAG chain, which increases the peptide’s flexibility through increased solvation and thereby leads to larger dimensions even at a lower cutoff.
AMBER03ws coupled with TIP4P2005s at a lower cutoff flattens the effects of the mutations on the dimensions of the ensembles: all the simulated systems appear virtually identical with respect to RE and RG, scaling profiles and asphericity profiles, none of which shows statistically significant differences.
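Statistical significance between two RE or RG distributions is assessed in this work with a Kolmogorov-Smirnov test and reported as the negative logarithm of the p-value (Figure 4). The sketch below shows what such a comparison could look like; it is an illustration, not the code used for the figures. It computes the two-sample statistic directly and uses the standard asymptotic approximation for the p-value, which assumes independent samples (consecutive MD frames are correlated, so in practice the data would need to be subsampled). The function name `ks_two_sample` and the Gaussian test data are assumptions.

```python
import math
import random

def ks_two_sample(a, b):
    """Two-sample Kolmogorov-Smirnov statistic D and asymptotic p-value."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    # Walk through both sorted samples and track the largest gap
    # between the two empirical cumulative distribution functions.
    while i < n and j < m:
        x = min(a[i], b[j])
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    n_eff = n * m / (n + m)
    lam = (math.sqrt(n_eff) + 0.12 + 0.11 / math.sqrt(n_eff)) * d
    if lam < 0.3:
        # The alternating series converges poorly here; p is 1 to ~1e-5
        return d, 1.0
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                  for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)

# Synthetic stand-ins for two RG distributions (assumption, not real data)
rng = random.Random(1)
wild_type = [rng.gauss(2.0, 0.3) for _ in range(500)]
mutant = [rng.gauss(2.3, 0.3) for _ in range(500)]

d_same, p_same = ks_two_sample(wild_type, wild_type)
d_diff, p_diff = ks_two_sample(wild_type, mutant)
# Negative logarithm of the p-value; the floor guards against log10(0)
neg_log_p = -math.log10(max(p_diff, 1e-300))
print(f"D = {d_diff:.3f}, -log10(p) = {neg_log_p:.1f}")
```

A matrix of such `neg_log_p` values over all pairs of ensembles corresponds to the layout described for Figure 4.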


Increasing the cutoff enables AMBER03ws to reproduce the experimental trends in overall dimensions, even though it still samples RE and RG very differently from KBFF. KBFF, on the other hand, seems to better reproduce the values of end-to-end distance and radius of gyration previously measured by means of smFRET and SAXS for the Nup153PxFG system investigated here.6 A larger cutoff more strongly affects the sampling of LJ interactions, as electrostatic interactions beyond the cutoff are taken into account by the particle mesh Ewald (PME) summation method.45 PME, however, computes electrostatics in reciprocal space by interpolating charges on a grid, rather than by direct summation as within the Coulomb cutoff, and overall leads to a reduced accuracy in sampling electrostatic contributions to dynamics. Surprisingly, a more in-depth analysis of the interaction types sampled for Nup153PxFG and Nup153PxAG at larger cutoffs shows that AMBER03ws and KBFF both reproduce the same trends in the peptides’ overall dimensions, but by two opposite mechanisms (Figure S8). While for AMBER03ws the mutations lead to more extended conformers through the loss of interactions, for KBFF a gain of local interactions can be observed and is reflected in differently shaped scaling profiles that do not show kinks as previously described for charged systems alone. This raises the question of what the internal dynamics of highly polar, uncharged IDPs really look like, and whether the long-range interactions described for charged systems equally occur or only short-range contacts define the experimentally shown trends. Such questions have enormous biological significance, as the large majority of IDPs feature highly polar rather than charged sequences.
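The division of electrostatics into a direct part within the cutoff and a reciprocal-space part can be illustrated with the standard Ewald identity 1/r = erfc(βr)/r + erf(βr)/r, in which the first term decays quickly and is summed directly, while the second is smooth and is handled on the grid. The sketch below is an illustration under stated assumptions rather than a description of the simulation setup: it assumes that the splitting parameter β is chosen so that erfc(β·r_cut) equals a relative tolerance of 1e-5 (a GROMACS-style convention), and the function names `ewald_beta` and `coulomb_split` are hypothetical.

```python
import math

def ewald_beta(r_cut, rtol=1e-5):
    """Solve erfc(beta * r_cut) = rtol for beta (in 1/nm) by bisection."""
    lo, hi = 0.0, 10.0 / r_cut  # erfc(0) = 1 > rtol and erfc(10) < rtol
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if math.erfc(mid * r_cut) > rtol:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def coulomb_split(r, beta):
    """Return the (direct, reciprocal) parts of 1/r; they sum to 1/r."""
    return math.erfc(beta * r) / r, math.erf(beta * r) / r

r = 0.8  # pair distance in nm, inside both cutoffs considered here
for r_cut in (1.0, 1.5):
    beta = ewald_beta(r_cut)
    direct, reciprocal = coulomb_split(r, beta)
    share = direct / (direct + reciprocal)
    print(f"r_cut = {r_cut} nm: beta = {beta:.3f} 1/nm, "
          f"direct-space share at {r} nm = {share:.2e}")
```

Under this assumption a longer cutoff gives a smaller β, so a larger share of each pair interaction is evaluated by direct summation. LJ interactions have no such reciprocal-space counterpart in the setup described here, which is why the cutoff affects them more strongly.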
Overall, these observations point to significantly different performances of the models proposed to mitigate the collapse of IDPs, especially when the simulated system is predominantly or exclusively composed of polar residues and lacks charged amino acids. Indeed, while all the tested models cope well with the presence of charges along the chain and reproduce charge repulsion in alternatingly charged chains as well as attraction in the presence of block co-polymeric charged stretches, they describe strongly different dynamics when polar chains are simulated. Recently, an additional model has surfaced that proposes to increase the ε term of the LJ interaction for the water hydrogen (εH) of the CHARMM-modified water model.19 While this model, like the other water rescaling models, yields more extended conformers, it has been shown to yield overly extended conformers of strong polyelectrolytes and therefore seems to be system-dependent.19 The parameterization of water has also recently evolved towards the optimization of its dielectric constant, with two additional water models, namely TIP4Q and TIP4P/ε, that accurately reproduce water properties at different temperatures.49-50 It will be interesting to see how these water models perform in sampling the dynamics of weakly charged proteins. Overall, our results indicate that predicting the effects of sequence changes along IDPs, in particular those not involving charged residues, remains a challenge, in spite of the recent advancements proposed to solve the collapse of IDPs in molecular simulations.


List of Tables

System: EKEK; EEKK; Nup153PxFG; Nup153AxFG; Nup153PxAG
Force field (for each system): KBFF; AMBER99sb*-ILDN; AMBER03ws
Water model (paired with the force fields above): SPC/E; TIP4PD; TIP4P2005s
Number of mutations: -; 14 (~17%); 14 (~17%); 14 (~17%); 8 (~10%); 8 (~10%); 8 (~10%)
Cutoff for computing interactions (nm), by row: 1.0; 1.0; 1.0; 1.0; 1.0; 1.0; 1.0 & 1.5; 1.0; 1.0 & 1.4; 1.0 & 1.5; 1.0; 1.0 & 1.4; 1.0; 1.0; 1.0

Table 1. Synoptic table of the simulated systems, the number and percentage of mutations applied to the Nup153PxFG peptide, and the interaction cutoffs at which direct Coulomb and LJ interactions have been computed, with PME being used to compute electrostatics beyond the cutoff.


List of Figures

Figure 1. End-to-end distance (RE) and radius of gyration (RG) distributions for the ensembles of 25-mer peptides featuring block co-polymeric (A-C) or alternating (D-F) charged residues, simulated using the KBFF (A and D), AMBER99sb*-ILDN (B and E) or AMBER03ws (C and F) force fields coupled with the SPC/E, TIP4PD and TIP4P2005s water models, respectively. RE and RG values are colored according to their frequency from dark blue to yellow.


Figure 2. Scaling profiles for the ensembles of 25-mer peptides featuring alternating (red) or block co-polymeric (blue) charged residues, simulated using the KBFF (A), AMBER99sb*-ILDN (B) and AMBER03ws (C) force fields coupled with the SPC/E, TIP4PD and TIP4P2005s water models, respectively.


Figure 3. (A) End-to-end distance (RE) and (B) radius of gyration (RG) distributions of the Nup153PxFG, Nup153AxFG and Nup153PxAG simulated ensembles. The different force fields tested are reported in yellow (KBFF), light blue (AMBER99sb*-ILDN) and green (AMBER03ws). The red squares inside the boxplots represent the means of the distributions whereas the red lines show the medians.


Figure 4. Statistical Kolmogorov-Smirnov analysis of the simulated ensembles. The matrices show the p-values estimated from the comparison of the end-to-end distance (A) and radius of gyration (B) distributions shown in Figure 3. For representation purposes, the negative logarithm of the p-value is reported rather than the p-value itself. Squares are colored according to the color bar shown at the bottom of the figure. p-values of 0.05, 0.01 and 0.001 are marked on the color bars by green dashed lines.


Figure 5. Nup153PxAG is more extended than Nup153PxFG. A) Cartoon representing the Nup153PxFG and Nup153PxAG fragments. The red and green spheres represent the locations of the Alexa594 and Alexa488 fluorophores, respectively. B) Normalized size-exclusion chromatography elution profiles monitoring the absorbance at 488 nm of Alexa488 singly labeled Nup153PxFG (black) and Nup153PxAG (red). Nup153PxAG elutes at a lower volume, indicating that it has an apparently larger hydrodynamic volume than Nup153PxFG. C) Single-molecule fluorescence 2D histograms of burst-integrated lifetime (τ) versus FRET efficiency (EFRET) of Nup153PxFG (upper plot) and Nup153PxAG (lower plot); the plots are color-coded for frequency of occurrence. EFRET is high when the donor and acceptor dyes are in close proximity and low when the distance between them is greater, because the efficiency of energy transfer from the donor to the acceptor is then lower. The dashed line marks the center position of the FRET peak for Nup153PxFG. The smFRET data indicate that Nup153PxFG is more compact (higher EFRET) than Nup153PxAG (lower EFRET), which is more extended. The shift in the τ data is consistent with this observation.


Figure 6. Scaling profiles for the ensembles simulated using the KBFF (A and D), AMBER03ws (B and E) and AMBER99sb*-ILDN (C) force fields coupled with the SPC/E, TIP4P2005s and TIP4PD water models respectively. The reported profiles have been computed from trajectories performed using Coulomb and van der Waals cutoffs at 1.0 (in panels A and B) or 1.5 nm (in panels D and E). They show the average distance, along the simulated trajectory, between residue i and j as previously described by Das and Pappu.10 The differently simulated variants of the Nup153 fragment investigated are shown in yellow (Nup153PxFG), green (Nup153AxFG) and light blue (Nup153PxAG) and are dubbed, in the legend on the right-hand side, as WT, PtoA and FtoA respectively.


Figure 7. End-to-end distance (RE) and radius of gyration (RG) distributions of the Nup153PxFG and Nup153PxAG ensembles simulated with Coulomb and van der Waals cutoffs of 1.0 (yellow) or 1.5 (green) nm. The KBFF (panel A) and AMBER03ws (panel B) force fields have been tested for their ability to reproduce the experimental trends shown in Figure 5. The red squares inside the boxplots represent the means of the distributions, whereas the green lines show the medians. A Kolmogorov-Smirnov test showing the statistically significant differences between the distributions reported above can be found in Figure S7.


TOC graphic. Description: Schematic representation of the performance of compaction-solving force fields or water models in sampling dimensions (and dynamics) of intrinsically disordered protein chains that either exhibit a highly charged (strong polyelectrolytes) or highly polar and hydrophobic sequence (weak polyelectrolytes). For weak polyelectrolytes, the experimentally-derived trends can only be reproduced with increased interaction cutoffs, accounting for the importance of long-range interactions in defining dimensions and conformational dynamics of polar intrinsically disordered proteins.


References

(1) Babu, M. M.; van der Lee, R.; de Groot, N. S.; Gsponer, J., Intrinsically disordered proteins: regulation and disease. Curr. Opin. Struct. Biol. 2011, 21, 432-40.
(2) Xue, B.; Dunker, A. K.; Uversky, V. N., Orderly order in protein intrinsic disorder distribution: disorder in 3500 proteomes from viruses and the three domains of life. J. Biomol. Struct. Dyn. 2012, 30, 137-49.
(3) Wright, P. E.; Dyson, H. J., Linking folding and binding. Curr. Opin. Struct. Biol. 2009, 19, 31-8.
(4) Ball, K. A.; Wemmer, D. E.; Head-Gordon, T., Comparison of structure determination methods for intrinsically disordered amyloid-beta peptides. J. Phys. Chem. B 2014, 118, 6405-16.
(5) Mao, A. H.; Lyle, N.; Pappu, R. V., Describing sequence-ensemble relationships for intrinsically disordered proteins. Biochem. J. 2013, 449, 307-18.
(6) Mercadante, D.; Milles, S.; Fuertes, G.; Svergun, D. I.; Lemke, E. A.; Grater, F., Kirkwood-Buff Approach Rescues Overcollapse of a Disordered Protein in Canonical Protein Force Fields. J. Phys. Chem. B 2015, 119, 7975-84.
(7) Henriques, J.; Cragnell, C.; Skepo, M., Molecular Dynamics Simulations of Intrinsically Disordered Proteins: Force Field Evaluation and Comparison with Experiment. J. Chem. Theory Comput. 2015, 11, 3420-31.
(8) Dunker, A. K., In Intrinsically Disordered Proteins Studied by NMR Spectroscopy. I ed.; Springer International Publishing: New York, 2015; Vol. 870, pp 3-10.
(9) Petrov, D.; Zagrovic, B., Are current atomistic force fields accurate enough to study proteins in crowded environments? PLoS Comput. Biol. 2014, 10, e1003638.
(10) Das, R. K.; Pappu, R. V., Conformations of intrinsically disordered proteins are influenced by linear sequence distributions of oppositely charged residues. Proc. Natl. Acad. Sci. U. S. A. 2013, 110, 13392-7.
(11) Müller-Späth, S.; Soranno, A.; Hirschfeld, V.; Hofmann, H.; Rüegger, S.; Reymond, L.; Nettels, D.; Schuler, B., Charge interactions can dominate the dimensions of intrinsically disordered proteins. Proc. Natl. Acad. Sci. U. S. A. 2010, 107, 14609-14614.
(12) Mao, A. H.; Crick, S. L.; Vitalis, A.; Chicoine, C. L.; Pappu, R. V., Net charge per residue modulates conformational ensembles of intrinsically disordered proteins. Proc. Natl. Acad. Sci. U. S. A. 2010, 107, 8183-8.
(13) Martin, E. W.; Holehouse, A. S.; Grace, C. R.; Hughes, A.; Pappu, R. V.; Mittag, T., Sequence determinants of the conformational properties of an intrinsically disordered protein prior to and upon multisite phosphorylation. J. Am. Chem. Soc. 2016, 138, 15323-15335.
(14) Best, R. B.; Zheng, W.; Mittal, J., Balanced Protein-Water Interactions Improve Properties of Disordered Proteins and Non-Specific Protein Association. J. Chem. Theory Comput. 2014, 10, 5113-5124.
(15) Piana, S.; Donchev, A. G.; Robustelli, P.; Shaw, D. E., Water dispersion interactions strongly influence simulated structural properties of disordered protein states. J. Phys. Chem. B 2015, 119, 5113-23.


(16) Abascal, J. L.; Vega, C., A general purpose model for the condensed phases of water: TIP4P/2005. J. Chem. Phys. 2005, 123, 234505.
(17) Henriques, J.; Skepo, M., Molecular Dynamics Simulations of Intrinsically Disordered Proteins: On the Accuracy of the TIP4P-D Water Model and the Representativeness of Protein Disorder Models. J. Chem. Theory Comput. 2016, 12, 3407-15.
(18) Köhler, M. H.; Barbosa, R. C.; da Silva, L. B.; Barbosa, M. C., Role of the hydrophobic and hydrophilic sites in the dynamic crossover of the protein-hydration water. Physica A: Statistical Mechanics and its Applications 2017, 468, 733-739.
(19) Huang, J.; Rauscher, S.; Nawrocki, G.; Ran, T.; Feig, M.; de Groot, B. L.; Grubmuller, H.; MacKerell, A. D., Jr., CHARMM36m: an improved force field for folded and intrinsically disordered proteins. Nat. Meth. 2017, 14, 71-73.
(20) Bottaro, S.; Lindorff-Larsen, K.; Best, R. B., Variational Optimization of an All-Atom Implicit Solvent Force Field to Match Explicit Solvent Simulation Data. J. Chem. Theory Comput. 2013, 9, 5641-5652.
(21) Vitalis, A.; Pappu, R. V., ABSINTH: a new continuum solvation model for simulations of polypeptides in aqueous solutions. J. Comput. Chem. 2009, 30, 673-99.
(22) Miller, M. S.; Lay, W. K.; Elcock, A. H., Osmotic Pressure Simulations of Amino Acids and Peptides Highlight Potential Routes to Protein Force Field Parameterization. J. Phys. Chem. B 2016, 120, 8217-29.
(23) Song, J.; Gomes, G.-N.; Gradinaru, C. C.; Chan, H. S., An Adequate Account of Excluded Volume Is Necessary To Infer Compactness and Asphericity of Disordered Proteins by Förster Resonance Energy Transfer. J. Phys. Chem. B 2015, 119, 15191-15202.
(24) Piana, S.; Lindorff-Larsen, K.; Dirks, R. M.; Salmon, J. K.; Dror, R. O.; Shaw, D. E., Evaluating the effects of cutoffs and treatment of long-range electrostatics in protein folding simulations. PLoS One 2012, 7, e39918.
(25) Ploetz, E. A.; Bentenitis, N.; Smith, P. E., Developing Force Fields from the Microscopic Structure of Solutions. Fluid Phase Equilib. 2010, 290, 43.
(26) Ploetz, E. A.; Smith, P. E., A Kirkwood-Buff force field for the aromatic amino acids. Phys. Chem. Chem. Phys. 2011, 13, 18154-67.
(27) Smith, P. E., Equilibrium dialysis data and the relationships between preferential interaction parameters for biological systems in terms of Kirkwood-Buff integrals. J. Phys. Chem. B 2006, 110, 2862-8.
(28) Tironi, I. G.; Sperb, R.; Smith, P. E.; van Gunsteren, W. F., A generalized reaction field method for molecular dynamics simulations. J. Chem. Phys. 1995, 102, 5451-5459.
(29) Weerasinghe, S.; Smith, P. E., Kirkwood-Buff derived force field for mixtures of acetone and water. J. Chem. Phys. 2003, 118, 10663-10670.
(30) Weerasinghe, S.; Smith, P. E., A Kirkwood-Buff derived force field for the simulation of aqueous guanidinium chloride solutions. J. Chem. Phys. 2004, 121, 2180-6.
(31) Weerasinghe, S.; Smith, P. E., A Kirkwood-Buff derived force field for methanol and aqueous methanol solutions. J. Phys. Chem. B 2005, 109, 15080-6.
(32) Kirkwood, J. G.; Buff, F. P., The Statistical Mechanical Theory of Solutions. I. J. Chem. Phys. 1951, 19, 774-777.


(33) Marsh, J. A.; Forman-Kay, J. D., Sequence determinants of compaction in intrinsically disordered proteins. Biophys. J. 2010, 98, 2383-90.
(34) Perez, R. B.; Tischer, A.; Auton, M.; Whitten, S. T., Alanine and proline content modulate global sensitivity to discrete perturbations in disordered proteins. Proteins 2014, 82, 3373-84.
(35) Tomasso, M. E.; Tarver, M. J.; Devarajan, D.; Whitten, S. T., Hydrodynamic Radii of Intrinsically Disordered Proteins Determined from Experimental Polyproline II Propensities. PLoS Comput. Biol. 2016, 12, e1004686.
(36) Yamada, J.; Phillips, J. L.; Patel, S.; Goldfien, G.; Calestagne-Morelli, A.; Huang, H.; Reza, R.; Acheson, J.; Krishnan, V. V.; Newsam, S.; Gopinathan, A.; Lau, E. Y.; Colvin, M. E.; Uversky, V. N.; Rexach, M. F., A bimodal distribution of two distinct categories of intrinsically disordered structures with separate functions in FG nucleoporins. Mol. Cell. Proteomics 2010, 9, 2205-24.
(37) Brustad, E. M.; Lemke, E. A.; Schultz, P. G.; Deniz, A. A., A general and efficient method for the site-specific dual-labeling of proteins for single molecule fluorescence resonance energy transfer. J. Am. Chem. Soc. 2008, 130, 17664-5.
(38) Delaforge, E.; Milles, S.; Bouvignies, G.; Bouvier, D.; Boivin, S.; Salvi, N.; Maurin, D.; Martel, A.; Round, A.; Lemke, E. A.; Jensen, M. R.; Hart, D. J.; Blackledge, M., Large-Scale Conformational Dynamics Control H5N1 Influenza Polymerase PB2 Binding to Importin alpha. J. Am. Chem. Soc. 2015, 137, 15122-34.
(39) Sisamakis, E.; Valeri, A.; Kalinin, S.; Rothwell, P. J.; Seidel, C. A. M., Chapter 18 - Accurate Single-Molecule FRET Studies Using Multiparameter Fluorescence Detection. In Methods Enzymol., Nils, G. W., Ed. Academic Press: 2010; Vol. 475, pp 455-514.
(40) Van Der Spoel, D.; Lindahl, E.; Hess, B.; Groenhof, G.; Mark, A. E.; Berendsen, H. J., GROMACS: fast, flexible, and free. J. Comput. Chem. 2005, 26, 1701-18.
(41) Lindorff-Larsen, K.; Piana, S.; Palmo, K.; Maragakis, P.; Klepeis, J. L.; Dror, R. O.; Shaw, D. E., Improved side-chain torsion potentials for the Amber ff99SB protein force field. Proteins 2010, 78, 1950-8.
(42) Berendsen, H. J. C.; Grigera, J. R.; Straatsma, T. P., The Missing Term in Effective Pair Potentials. J. Phys. Chem. 1987, 91, 6269-6271.
(43) Bussi, G.; Donadio, D.; Parrinello, M., Canonical sampling through velocity rescaling. J. Chem. Phys. 2007, 126, 014101.
(44) Parrinello, M.; Rahman, A., Polymorphic transitions in single crystals: A new molecular dynamics method. J. Appl. Phys. 1981, 52, 7182-7190.
(45) Darden, T.; York, D.; Pedersen, L., Particle mesh Ewald: An N⋅log(N) method for Ewald sums in large systems. J. Chem. Phys. 1993, 98, 10089.
(46) McGibbon, R. T.; Beauchamp, K. A.; Harrigan, M. P.; Klein, C.; Swails, J. M.; Hernandez, C. X.; Schwantes, C. R.; Wang, L. P.; Lane, T. J.; Pande, V. S., MDTraj: A Modern Open Library for the Analysis of Molecular Dynamics Trajectories. Biophys. J. 2015, 109, 1528-32.
(47) Dyson, H. J.; Wright, P. E., Intrinsically unstructured proteins and their functions. Nat. Rev. Mol. Cell Biol. 2005, 6, 197-208.
(48) Piana, S.; Lindorff-Larsen, K.; Shaw, D. E., How robust are protein-folding simulations with respect to force field parameterization? Biophys. J. 2011, 100, L47-49.


(49) Fuentes-Azcatl, R.; Alejandre, J., Non-polarizable force field of water based on the dielectric constant: TIP4P/epsilon. J. Phys. Chem. B 2014, 118, 1263-72.
(50) Alejandre, J.; Chapela, G. A.; Saint-Martin, H.; Mendoza, N., A non-polarizable model of water that yields the dielectric constant and the density anomalies of the liquid: TIP4Q. Phys. Chem. Chem. Phys. 2011, 13, 19728-40.


283x325mm (300 x 300 DPI)

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

121x183mm (300 x 300 DPI)

ACS Paragon Plus Environment

Page 34 of 41

Page 35 of 41

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

378x223mm (300 x 300 DPI)

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

378x224mm (300 x 300 DPI)

ACS Paragon Plus Environment

Page 36 of 41

Page 37 of 41

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

374x377mm (300 x 300 DPI)

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

99x88mm (300 x 300 DPI)

ACS Paragon Plus Environment

Page 38 of 41

Page 39 of 41

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

579x495mm (300 x 300 DPI)

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

195x118mm (300 x 300 DPI)

ACS Paragon Plus Environment

Page 40 of 41

Page 41 of 41

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

291x241mm (300 x 300 DPI)

ACS Paragon Plus Environment