
Perspective

Developments and Applications of Coil-library-based Residue-Specific Force Fields for Molecular Dynamics Simulations of Peptides and Proteins Fan Jiang, Hao-Nan Wu, Wei Kang, and Yun-Dong Wu J. Chem. Theory Comput., Just Accepted Manuscript • DOI: 10.1021/acs.jctc.8b00794 • Publication Date (Web): 08 Jan 2019 Downloaded from http://pubs.acs.org on January 9, 2019



Journal of Chemical Theory and Computation

Developments and Applications of Coil-library-based Residue-Specific Force Fields for Molecular Dynamics Simulations of Peptides and Proteins

Fan Jiang,1,† Hao-Nan Wu,1,† Wei Kang,1,2,† and Yun-Dong Wu1,2,* 1. Laboratory of Computational Chemistry and Drug Design, State Key Laboratory of Chemical Oncogenomics, Peking University Shenzhen Graduate School, Shenzhen, 518055, China. 2. College of Chemistry and Molecular Engineering, Peking University, Beijing, 100871, China.

†These authors contributed equally to this paper.

*Correspondence and requests for materials should be addressed.


Abstract: Molecular dynamics (MD) simulation has become a powerful tool for studying the structures and functional mechanisms of biomolecules, and its reliability crucially depends on the accuracy of the underlying force fields. This review describes our recent efforts to develop more accurate protein force fields by improving the description of the intrinsic conformational preferences of amino acid residues using residue-specific dihedral-angle-related parameters. Both backbone and side-chain conformational distributions and their coupling were optimized to fit those from a protein coil library. The resulting force fields, RSFF1 and RSFF2, have been found to be more accurate than popular protein force fields in reproducing experimental structural data of various peptides and proteins. They have also been successfully used in studying folding mechanisms and in the refinement of structure models. Further methodology developments related to intrinsically disordered proteins (RSFF2+) and a more universal implementation (RSFF2C) based on CMAP potentials are also described.


1. Introduction

The brilliant physicist Richard Feynman once famously stated: “Everything that living things do can be understood in terms of the jiggling and wiggling of atoms.” In 1977, McCammon et al. reported the first molecular dynamics (MD) simulation of a small protein (58 residues), which was performed in the gas phase for less than 10 picoseconds.1 Since then, driven by rapidly developing computational technologies (both hardware and software), a surging number of MD simulations of biomolecules have been reported, with increasingly longer simulation times and greater complexity of the simulated systems. In 2010, an all-atom MD simulation reaching the millisecond time scale was achieved on the same protein in explicit solvent.2 Also, MD simulations of an entire virus (the simplest life form) were reported more than a decade ago.3 Now, MD simulation is becoming a powerful tool extensively used in understanding the mechanisms of living things.

All MD simulations depend on a description of the interactions between atoms. Although some very early MD simulations used a discontinuous potential energy surface (PES),4 currently the vast majority of MD simulations use the force on each atom calculated from an analytically smooth PES. These force fields are approximations of the underlying quantum mechanics (QM) governing the microscopic world, and are usually empirical with many adjustable parameters. Thus, the accuracy of these force fields is fundamental to the reliability of MD simulations. With increasingly better conformational sampling,5−9 more and more inaccuracies in protein force fields have been revealed, inspiring continuous improvements.

Over the past two decades, considerable efforts have been made (Figure 1) to update classical protein force fields such as AMBER,10−13 CHARMM,14−17 and OPLS-AA.18−20 Most of them have focused on adjusting the parameters of backbone (ϕ, ψ) and/or side-chain (χ) dihedral-angle (torsion) terms to fit ab initio QM calculations or experimental (especially NMR) observables. For example, AMBER ff99SB was developed by fitting the conformational energies of glycine and alanine tetrapeptides (Ac-X3-NHMe) from gas-phase QM calculations.11 Later, the side-chain potentials of Ile, Leu, Asp, and Asn in ff99SB were improved by fitting to gas-phase QM energies of dipeptide models (blocked amino acids, Ac-X-NHMe), resulting in ff99SB-ildn.12 Despite these efforts, simulations of peptide/protein folding have revealed biased secondary structure propensities in various force fields,21−25 prompting further corrections. For example, Best et al. added small corrections to the backbone ψ potential to reproduce the α-helical contents of poly-alanine-based peptides (AMBER ff99SB* and ff03*).26 More recent force field optimizations, including CHARMM22*,16 AMBER ff14SB,13 and CHARMM36,17 have adopted a comprehensive approach that takes into account QM calculations, conformational distributions in protein crystal structures, as well as NMR and other experimental data.

Figure 1. Recent developments of all-atom protein force fields. AMBER has more force field variants than CHARMM and OPLS-AA.

Currently, MD simulations of proteins with state-of-the-art force fields have proved to be quite accurate in modeling well-folded proteins27 and are able to fold various small proteins spontaneously.28 However, the accuracy of these force fields is still significantly system-dependent. For example, AMBER ff99SB*-ildn and CHARMM22* can both reproduce the experimental helicity of a poly-alanine-based peptide at 300 K, but the former significantly overestimates the stability of the α-helical miniprotein Trp-cage while the latter severely underestimates it.29 Besides, AMBER ff99SB*-ildn cannot adequately stabilize the folded state of some β-hairpin peptides.29 Therefore, the match between sequence and structure is still suboptimal in these force fields.

It has long been known that different amino acid (AA) residues have different conformational preferences. For example, the α-helix is preferred by Ala/Glu/Gln/Leu/Met while the β-sheet is preferred by Val/Ile/Thr.30−32 However, these force fields use the same backbone torsion parameters for most AAs, which may not capture the AA-specific conformational behaviors well. We first faced this problem in the development of the PACE force field,33 which is a multiscale model with a united-atom description of the peptide/protein coupled with coarse-grained water (the MARTINI model).34 In PACE, we use 10 different sets of backbone (ϕ, ψ) parameters for the 20 different AAs. Since neither QM calculations nor NMR experiments can give detailed and reliable residue-specific (ϕ, ψ) free energy surfaces (potential of mean force, PMF) of dipeptide models in water, we used conformational distributions from statistical analysis of protein crystal structures as the reference data. Mining information from databases of known protein structures is a common practice in developing knowledge-based potentials, which have been widely and successfully used in protein structure prediction.35−47 Some popular physics-based potentials such as CHARMM22/CMAP (refs 15 and 48) also include empirical adjustments based on backbone conformational distributions observed in protein crystal structures.
Our PACE force field also proved to be a success: it can fold a variety of peptides and mini-proteins even though it is a simplified model.49 Inspired by the success of the PACE force field, we applied this strategy to improve all-atom protein force fields, with the goal of achieving higher accuracy in simulating peptide and protein systems without sacrificing any simulation speed.

Several recently developed force fields also take residue specificity into account. Starting from AMBER ff99SB, Wang et al.50 refitted all parameters for the bond, angle, and backbone and side-chain dihedral terms using ForceBalance,51 yielding the AMBER-FB15 force field. Within this model, residue-specific torsion parameters were implemented by assigning new atom types to Cβ atoms (and Cγ atoms for some residues). These were validated by better agreement with experimental NMR scalar couplings. The AMBER ff15ipq force field reported by Debiec et al.52 has different ϕ/ψ torsions for five kinds of amino acid residues: Gly, Pro, negatively charged, positively charged, and other neutral residues. However, both of these force fields were developed by fitting against QM results, which is quite different from our coil-library-based parametrization philosophy.

In the following sections, we first briefly introduce the construction and statistical analyses of the protein coil library. Then, we describe how the obtained conformational distributions of the 20 AA residues were incorporated into all-atom protein force fields in the development of residue-specific force fields. Finally, their validations and applications are introduced, especially in folding and structure prediction of peptides and proteins.

2. Statistical Analyses of Protein Coil Library

2.1 Construction of protein coil library

Obtaining the intrinsic conformational (both backbone and side-chain) preferences of an AA residue without bias from other residues is crucial for accurate modeling of both foldable and intrinsically disordered proteins (IDPs), especially the latter, where the inaccuracies in existing force fields are more evident.53,54 In our studies, these intrinsic preferences were extracted from the so-called protein “coil library”,55−59 which consists of residues outside regular secondary structures. Previously, the potential of using coil library statistics has been explored in simulations of denatured or intrinsically disordered proteins.60−64 Herein, we describe the construction and verification of our coil library.

The conformation of a single residue in a protein is strongly affected by its interactions with other residues. The impact of other residues can be observed even in short peptides without any secondary structure. Two-residue model peptides have been studied experimentally65 and computationally66 to analyze the effect of residues on the conformational preferences of their neighboring residues, and residues with bulky aromatic side chains were found to have the largest neighboring residue effects (NREs). When peptide chains grow longer, the formation of secondary structures can further restrict the conformational space to a quite small (ϕ, ψ) region in the Ramachandran plot, driven by inter-residue backbone hydrogen bonds. For example, the (ϕ, ψ) distribution of all alanine residues in high-resolution protein crystal structures is mostly within a small right-handed-helical (αR) region (Figure 2A),67 due to the formation of α-helices. This is very different from spectroscopic experiments on short Ala-based peptides in aqueous solution, which give the polyproline-II (PII) conformation as the dominant one.68−73


Figure 2. Ramachandran plots ((ϕ, ψ) distributions) of the Ala residue from whole protein structures (top) and a protein coil library (bottom). All probability densities were obtained using a 2D Gaussian kernel estimator (σ = 10°), and contours were drawn on a natural logarithm scale (every kBT free energy difference). Five major conformational basins are labeled.
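As a rough illustration of the density estimate described in the caption, the following Python sketch builds a periodic 2D Gaussian kernel estimate on a (ϕ, ψ) grid and converts it to a free energy in units of kBT. The 10° grid spacing, the function names, and the single-image treatment of periodicity (adequate when σ is much smaller than 360°) are assumptions made for this example, not details taken from the original work.

```python
import numpy as np

def ramachandran_kde(phi, psi, sigma=10.0, step=10.0):
    """Periodic 2D Gaussian kernel density on a (phi, psi) grid (degrees)."""
    centers = np.arange(-180.0 + step / 2, 180.0, step)  # grid-cell centers

    def kernel(samples):
        samples = np.asarray(samples, dtype=float)
        # Signed angular difference wrapped into [-180, 180)
        d = (centers[:, None] - samples[None, :] + 180.0) % 360.0 - 180.0
        return np.exp(-0.5 * (d / sigma) ** 2)

    # An isotropic 2D Gaussian factorizes into a phi kernel times a psi kernel,
    # so summing over samples is a single matrix product.
    density = kernel(phi) @ kernel(psi).T  # shape (n_phi, n_psi)
    return centers, density / density.sum()

def free_energy_kT(density, floor=1e-12):
    """Free energy in units of kBT, shifted so that the global minimum is zero."""
    g = -np.log(np.maximum(density, floor))
    return g - g.min()
```

Contours drawn at integer values of `free_energy_kT(density)` would then correspond to the "every kBT" spacing mentioned in the caption.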

Firstly, to avoid the bias from inter-residue backbone hydrogen bonding, for each high-resolution crystal structure from the Protein Data Bank (PDB), any residue within any secondary structure defined by the DSSP program74 was excluded, except for the “bend” structure, which has no backbone hydrogen bonding. Secondly, pre-Pro residues were also excluded because of Pro’s strong NREs.75 The resulting (ϕ, ψ) distribution of Ala indeed gives PII as the most populated conformation, with a much lower αR probability (Figure 2B). In previous dispersion-corrected density functional theory (DFT) calculations, the PII conformation had the highest population (55% from ωB97X-D, 68% from B2PLYP-D) in water (SMD model),76 further supporting the use of the coil library as a reference for force field development. As for the influences from other neighboring residues, we can safely assume that they are averaged out in our coil library. This assumption was validated by a similar conformational distribution in a more stringent coil library (Coil-6)77 constructed by removing any secondary structure neighbors, the 15% most exposed residues, and any residues adjacent to SDNVITFYHW. The high similarity (S = 0.99) between the two coil libraries indicates that NREs are either statistically averaged out or not very strong in most cases.
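The two filtering rules above can be sketched in a few lines of Python. This is only an illustration: the set of DSSP one-letter codes treated as regular secondary structure, the use of "-" for unassigned residues, and the function name are assumptions of this example, and practical details such as resolution cutoffs and chain termini are ignored.

```python
# DSSP codes assumed here to mark hydrogen-bonded secondary structure.
# 'S' (bend) and '-' (no assignment) are deliberately not in this set.
EXCLUDED_DSSP = set("HGIEBT")

def coil_library_residues(sequence, dssp):
    """Return indices of residues retained in a coil library.

    sequence: one-letter amino acid codes of a single chain
    dssp:     DSSP codes for the same residues (same length)
    """
    if len(sequence) != len(dssp):
        raise ValueError("sequence and dssp must have the same length")
    kept = []
    for i, ss in enumerate(dssp):
        if ss in EXCLUDED_DSSP:
            continue  # rule 1: residue lies in a regular secondary structure
        if i + 1 < len(sequence) and sequence[i + 1] == "P":
            continue  # rule 2: pre-Pro residue
        kept.append(i)
    return kept
```

For the hypothetical input `coil_library_residues("AGPASK", "-SH-SE")`, residue 1 is dropped as pre-Pro, residues 2 and 5 are dropped for their DSSP codes, and the bend at position 4 is kept.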

2.2 Coupling between backbone and side-chain conformations

Our coil library statistics revealed strong coupling between backbone and side-chain conformations within a single residue.78 This coupling had been found previously,79 but not from a coil library representing the intrinsic conformational behaviors. In particular, we found that, in most situations, different side-chain χ1 rotamers of the same residue give (ϕ, ψ) plots that differ more than those of different AA residues with the same side-chain rotamer. Most of the coupling can be understood from the perspective of interactions between the γ atom(s) and the adjacent backbone peptide groups. Further, we found that these conformational features cannot be well reproduced by popular protein force fields, such as OPLS-AA/L, AMBER ff99SB, ff99SB-ildn, and ff03,67,78 indicating that further improvements are needed.

2.3 Three-dimensional free energy decomposition

Among the 20 common AAs, 17 (all except Gly/Ala/Pro) are Cβ-derivatives of Ala. The intrinsic (ϕ, ψ, χ1) conformational preferences of each Ala-derived residue are determined by the solvent-mediated interactions among four fragments: the two neighboring peptide groups, the Cβ group, and the substituent(s) on Cβ. As shown in Figure 3, the three pairwise interactions among Cβ and the two peptide groups together (with a strong solvent effect)80 give the intrinsic backbone (ϕ, ψ) conformational preferences of Ala. The remaining two interactions are from the side chain (beyond Cβ) to the ϕ-related and the ψ-related peptide groups, which are functions of (χ1, ϕ) and (χ1, ψ), respectively. By assuming that the interaction free energies of these pairs are additive, the (χ1, ϕ, ψ) 3D PMF can be decomposed into three 2D PMFs:

$$G_{\mathrm{X}}(\chi_1, \phi, \psi) = G_{\mathrm{A}}(\phi, \psi) + G_{\mathrm{X}}(\chi_1, \phi) + G_{\mathrm{X}}(\chi_1, \psi) \tag{1}$$

Based on the Boltzmann relationship, eq 1 can be rewritten as:

$$p_{\mathrm{X}}(\chi_1, \phi, \psi) = p_{\mathrm{A}}(\phi, \psi) \cdot p_{\mathrm{X}}(\chi_1, \phi) \cdot p_{\mathrm{X}}(\chi_1, \psi) \tag{2}$$

Here pX(χ1, ϕ, ψ) is the conformational distribution of residue type X and pA(ϕ, ψ) is the conformational distribution of Ala.

Figure 3. Pairwise additivity of five interactions (dashed arrows) determining local backbone (ϕ, ψ) and side-chain χ1 conformational preferences within a dipeptide model (blocked AA residue). The corresponding 3D and 2D distributions of the Glu residue obtained from the coil library are shown as an example (bottom).

Based on the coil library (ϕ, ψ) distribution of Ala, the residue-specific information encoded in the 3D (ϕ, ψ, χ1) distribution of a given AA type X is extracted as two 2D distributions, pX(χ1, ϕ) and pX(χ1, ψ), and a very similar 3D distribution can be reconstructed from the resulting 2D distributions together with the (ϕ, ψ) plot of Ala.81 This strategy greatly reduces the amount of numerical data needed to represent such hyper-surfaces and can guide our force field development.

The feasibility of free energy decomposition is a well-discussed topic.82−85 Although we cannot provide a theoretical proof for this free energy decomposition scheme, as is the case for other additivity assumptions widely used in biochemistry,86 we can provide strong evidence for its accuracy from two different aspects. Firstly, our previous QM calculations (MP2/6-31+G** level87 with the CPCM88,89 solvent model for water) of the effects of side-chain substitution on the backbone ϕ and ψ torsions separately can be used to reproduce the conformational distributions observed in the coil library.77 Secondly, the original coil library statistics and those reconstructed based on eq 2 have very high similarities (S > 0.99) for the vast majority of side-chain rotamers.81
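The reconstruction step of eq 2 can be sketched as a single tensor product. In this hedged example the two side-chain factors are taken as given arrays (how they are extracted from the coil library is not reproduced here), and the array layout and function name are assumptions of the sketch.

```python
import numpy as np

def reconstruct_3d(p_ala, f_phi, f_psi):
    """Rebuild p_X(chi1, phi, psi) from three 2D arrays, following eq 2.

    p_ala[i, j] : Ala (phi, psi) distribution on a grid
    f_phi[k, i] : chi1/phi factor for rotamer or chi1 bin k
    f_psi[k, j] : chi1/psi factor for rotamer or chi1 bin k
    Returns p[k, i, j], normalized so that all entries sum to 1.
    """
    p = np.einsum("ij,ki,kj->kij", p_ala, f_phi, f_psi)
    return p / p.sum()
```

With 36 × 36 backbone grids and three χ1 rotamers, the three input arrays hold 36·36 + 2·3·36 = 1512 numbers, compared with 3·36·36 = 3888 for the full 3D table, which illustrates the data reduction mentioned above.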

3. Developments of coil-library-based residue-specific force fields

3.1 RSFF1 & RSFF2

We developed two residue-specific force fields, RSFF1 (ref 29) and RSFF2 (ref 90), as improvements of the OPLS-AA/L and AMBER ff99SB force fields, respectively. In the RSFFs, all bond stretching and angle bending potentials were adopted from the parent force fields without modification. All the atomic parameters of the Lennard-Jones (L-J) potentials and all atomic charges were also unchanged. On the other hand, residue-specific dihedral-angle parameters were optimized for all backbone and side-chain torsions, except for the ω torsion of the peptide bond. The goal is for simulations of the various AA dipeptides in explicit water to yield conformational distributions very similar to those from coil library statistics.

Figure 4. (A) The first step in the optimization of RSFF1 parameters after assignment of initial parameters (zeros in most cases). (B) Special Lennard-Jones potentials for 1-5 (red) and 1-6 (blue) non-bonded interactions in RSFF1. (C) Free energy surface (PMF) for side-chain χ2 torsion in Asp residue, from coil library and various force fields.

Parametrization was guided by comparison between replica-exchange molecular dynamics (REMD) simulation results for dipeptides and coil library statistics, including backbone (ϕ, ψ) distributions, χ1-rotamer-dependent (ϕ, ψ) distributions, percentages of the three χ1 rotamers, and PMFs of all χ torsions. The backbone-related parameters were first optimized on the Ala dipeptide, using a decomposition method29 to derive corrections to the ϕ and ψ potentials separately:

$$\Delta G_{\phi}(\phi) + \Delta G_{\psi}(\psi) = G_{\mathrm{Coil}}(\phi, \psi) - G_{\mathrm{MD}}(\phi, \psi) \tag{3}$$

where ΔGϕ(ϕ) and ΔGψ(ψ) are discrete functions with a 10° interval and are fitted to Fourier expansions.
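One simple way to realize a split of the kind written in eq 3 is an unweighted least-squares fit of an additive model, followed by a least-squares Fourier fit of each 1D correction. The sketch below is an assumption-laden illustration: the actual decomposition method of ref 29 is not described here and may differ (for example, by weighting grid points by population), and all function names and the number of Fourier terms are hypothetical.

```python
import numpy as np

def correction_surface(p_coil, p_md, floor=1e-6):
    """G_Coil - G_MD in units of kBT, from two distributions on the same grid."""
    return np.log(np.maximum(p_md, floor) / np.maximum(p_coil, floor))

def additive_split(delta_g):
    """Unweighted least-squares split of delta_g[i, j] into f[i] + g[j]."""
    mean = delta_g.mean()
    f = delta_g.mean(axis=1) - 0.5 * mean  # phi-only part
    g = delta_g.mean(axis=0) - 0.5 * mean  # psi-only part
    return f, g

def fit_fourier(angles_deg, values, n_terms=3):
    """Fit values ~ a0 + sum_k [a_k cos(k x) + b_k sin(k x)] by least squares."""
    x = np.radians(np.asarray(angles_deg, dtype=float))
    cols = [np.ones_like(x)]
    for k in range(1, n_terms + 1):
        cols += [np.cos(k * x), np.sin(k * x)]
    design = np.column_stack(cols)
    coeffs, *_ = np.linalg.lstsq(design, np.asarray(values, dtype=float), rcond=None)
    return coeffs  # ordered as [a0, a1, b1, a2, b2, ...]
```

Any part of the difference surface that is not additive in ϕ and ψ is left as a residual by this split, which is consistent with the need for extra coupling terms discussed later in this section.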


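To evaluate the agreement between two distributions n1(ϕ, ψ) and n2(ϕ, ψ), the cosine similarity coefficient (S) was calculated:

$$S = \cos\alpha = \frac{\mathbf{n}_1 \cdot \mathbf{n}_2}{\lVert \mathbf{n}_1 \rVert \, \lVert \mathbf{n}_2 \rVert} = \frac{\sum \left[ n_1(\phi, \psi) \cdot n_2(\phi, \psi) \right]}{\sqrt{\sum n_1(\phi, \psi)^2 \cdot \sum n_2(\phi, \psi)^2}} \tag{6}$$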

where each summation is over all (ϕ, ψ) grid points. As shown in Figure 4A, after only one cycle of optimization, the agreement of the simulated (ϕ, ψ) distribution with the coil library improved significantly, from S = 0.49 to S > 0.97. To better describe the coupling between the ϕ and ψ torsions, some special local 1-5 L-J interactions were introduced and optimized (Figure 4B), including a weak repulsion between Hi···Ni+1 to destabilize the α′ conformation and a weak attraction between Hi···Oi to stabilize the C5 conformation. We also found that the 1-5 L-J parameters optimized on Ala can be transferred to Gly, with re-optimization of the ϕ and ψ parameters. Thus, the same backbone 1-5 L-J parameters were used for all other AAs, with their ϕ and ψ parameters optimized separately.

The parameters for the side-chain χi potentials were also optimized. As an example (Figure 4C), both ff99SB and OPLS-AA/L give PMFs of χ2 rotation in Asp that are very different from the coil library statistics, and even the ff99SB-ildn force field with improved side-chain parameters cannot reproduce the coil library PMF. In contrast, with optimized torsion parameters, our RSFF1 and RSFF2 reproduce it much better.

As discussed in section 2.2, the couplings between the side-chain χ1 rotamer and backbone ϕ/ψ cannot be ignored. However, the current force field framework does not allow the use of different backbone dihedral parameters for different rotamers. To optimize these couplings, a specially optimized 1-5 L-J interaction between the Cγ atom and the backbone amide H atom (in RSFF1) or carbonyl O atom (in RSFF2) is sufficient for most AAs. However, some residues with short polar side chains (Ser, Thr, Asp, Asn) are much more difficult to optimize. Understandably, they involve strong short-range electrostatic interactions between side-chain and backbone atoms, which may not be accurately described using the fixed atom-centered charges of classical force fields. We used some special 1-5 and 1-6 L-J interactions (Figure 4B) to optimize their χ1/ϕ and χ1/ψ couplings, with the σ and ε parameters adjusted manually. Thus, better χ1-rotamer-dependent (ϕ, ψ) distributions can be achieved.

Both RSFF1 and RSFF2 excellently reproduce the experimental 3J(HN,Hα) couplings of all 19 dipeptides (excluding Pro, which lacks the HN atom).29 In contrast, we found that the AMBER ff99SB*-ildn, ff99SB-ildn-nmr, and OPLS-AA/L force fields cannot reproduce these experimental J-couplings very well. In an independent test, Li & Elcock found that RSFF2 gave J-couplings of short peptides and proteins in much better agreement with experimental data than the then state-of-the-art AMBER force fields ff99SB-ildn and ff99SB-ildn-nmr.91
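The cosine similarity coefficient S used in this section to compare a simulated distribution with its coil library reference can be computed directly from two gridded distributions. The following is a minimal sketch; the function name is an assumption, and the two inputs are assumed to be histograms or densities defined on the same (ϕ, ψ) grid.

```python
import numpy as np

def cosine_similarity(n1, n2):
    """Cosine of the angle between two distributions viewed as flat vectors."""
    a = np.asarray(n1, dtype=float).ravel()
    b = np.asarray(n2, dtype=float).ravel()
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))
```

Because numerator and denominator scale together, S does not depend on how either distribution is normalized, so raw counts and probabilities give the same value.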

3.2 RSFF2+

In the last decade, the increasing interest in intrinsically disordered proteins (IDPs)92−94 has brought both opportunities and challenges for current force fields.95 Simulated structural ensembles of the unfolded states of foldable proteins and of IDPs were too collapsed compared with those inferred from experimental data.53,54 Some evidence indicated that protein-protein interactions and protein-water interactions are not well balanced in popular force fields, as they were developed for folded proteins.96,97 To solve this problem, Best et al. slightly increased the L-J interactions between the water oxygen atom and all protein atoms in AMBER03w, resulting in expanded unfolded ensembles that are much more consistent with experiments.97 Piana et al. proposed that this problem may arise from the underestimation of dispersion interactions in popular water models, as supported by high-level QM calculations. Their new water model (TIP4P-D), in which the dispersion coefficient C6 was increased by ~50%, can produce more expanded and accurate structural ensembles for disordered states.98 When TIP4P-D is used to simulate disordered peptides, with either the AMBER ff99SB-ildn or the RSFF2 force field, very good reproduction of the experimentally derived radius of gyration (Rg) can be achieved, while the TIP3P water model underestimates Rg values (Figure 5).99

Figure 5. Radius of gyration (Rg) of intrinsically disordered peptides: Histatin-5 (top) and RS peptide (bottom). Results from simulations with TIP3P and TIP4P-D water models were compared with small-angle X-ray scattering (SAXS) experiments.
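For reference, the radius of gyration of a single simulation frame can be computed as in the sketch below. It uses the common mass-weighted definition, which is an assumption of this example; comparison with SAXS-derived values may call for a different weighting, and the function name is hypothetical.

```python
import numpy as np

def radius_of_gyration(coords, masses=None):
    """Mass-weighted radius of gyration of one conformation.

    coords: (N, 3) array of atomic positions
    masses: length-N array of atomic masses (equal weights if omitted)
    """
    coords = np.asarray(coords, dtype=float)
    if masses is None:
        masses = np.ones(len(coords))
    masses = np.asarray(masses, dtype=float)
    center = np.average(coords, axis=0, weights=masses)   # center of mass
    sq_dist = ((coords - center) ** 2).sum(axis=1)         # squared distances to it
    return float(np.sqrt(np.average(sq_dist, weights=masses)))
```

An ensemble value comparable to those in Figure 5 would then be obtained by averaging over the frames of a trajectory.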

However, we found that some well-folded small proteins could not be stabilized using TIP4P-D with the AMBER ff99SB-ildn and CHARMM36m force fields during REMD simulations (Figure 6A).99 Accordingly, we speculate that the good performance of these force fields with the previous TIP3P water model may be due to error cancellation. Water models like TIP3P give an overly compact unfolded state with underestimated conformational entropy, and therefore relatively overstabilize the folded state. Consequently, these AMBER and CHARMM force fields might not give enough stabilization of the folded state by themselves. On the other hand, RSFF1 and RSFF2 overestimate the stability of folded states, with higher folding midpoint temperatures (Tm) in most cases, when used with the TIP4P/Ew and TIP3P water models, respectively.29,90

Figure 6. (A) The folded structures of Trp-cage and GB1 hairpin cannot be stabilized by popular AMBER and CHARMM force fields when the TIP4P-D water model is used. Each obtained representative structure of the largest cluster is given on the right. (B) Representative structures and melting curves of Trp-cage and GB1 hairpin using the RSFF2 and RSFF2+ force fields with TIP4P-D, compared with the corresponding experimental data. (C) The melting curve of Ac-Ala14-NHMe (top) is sensitive to the energy (ε) parameter of the additional L-J potential on each backbone Oi···Hi+4 atom pair (bottom). ε = 1.3 kJ/mol and σ = 1.5 Å were finally chosen for the new RSFF2+ force field.
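To give a feel for the size of the additional backbone Oi···Hi+4 term quoted in the caption, the sketch below evaluates a Lennard-Jones potential with ε = 1.3 kJ/mol and σ = 1.5 Å. The standard 12-6 functional form is an assumption of this example, since the exact form used in RSFF2+ is not stated here, and the function name is hypothetical.

```python
import numpy as np

def lennard_jones(r, epsilon, sigma):
    """Standard 12-6 form: V(r) = 4*eps*[(sigma/r)**12 - (sigma/r)**6]."""
    sr6 = (sigma / np.asarray(r, dtype=float)) ** 6
    return 4.0 * epsilon * (sr6 ** 2 - sr6)

EPSILON = 1.3                     # kJ/mol, value quoted in the Figure 6 caption
SIGMA = 1.5                       # angstrom, value quoted in the Figure 6 caption
R_MIN = 2.0 ** (1.0 / 6.0) * SIGMA  # location of the minimum of a 12-6 potential
```

Under this assumed form the potential is zero at r = σ and reaches its minimum of −ε at R_MIN, about 1.68 Å, which is a short O···H contact distance.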


When using TIP4P-D, we found that RSFF2 can correctly fold various miniproteins and peptides containing different secondary structures in REMD simulations.99 However, the stability of the folded state of α-helical systems is severely underestimated (Figure 6B). Indeed, there is significant H-bonding cooperativity (a many-body effect) in the α-helix, which is partly due to electronic polarization beyond the description of all classical force fields.100−104 By adding an additional L-J potential that stabilizes Oi···Hi+4 backbone hydrogen bonds, the new RSFF2+ force field gives melting curves in very good agreement with the experimental data for both α-helix and β-hairpin systems (Figure 6B). We have, for the first time, correctly reproduced the experimental Tm and folding enthalpy of different proteins simultaneously, which is a very difficult and attractive challenge.26,105−107 Overall, the new RSFF2+/TIP4P-D combination can yield realistic structural ensembles for both folded and disordered states, and correctly describe their equilibrium thermodynamics.

Recently, others have also been trying to develop force fields that can accurately describe both protein states. For example, Huang et al. improved CHARMM36 by adding CMAP potentials and improving salt-bridge interactions, for simulations of both folded proteins and IDPs with the CHARMM-modified TIP3P water model.108 The resulting CHARMM36m yields more accurate descriptions of well-folded proteins and IDPs. Another example is AMBER 99SB-disp, which substantially improves the accuracy of IDP simulations without sacrificing the accuracy for folded proteins when used with a slightly modified TIP4P-D water model.109 In the development of this force field, PDB coil library data were also used for the re-parameterization of backbone and side-chain torsions.

3.3 RSFF2C

During the development of RSFF1 and RSFF2, a special treatment of local (1-5 or 1-6) nonbonded interactions was introduced to optimize the couplings between neighboring torsions. As a result, Gromacs110 is the only package in which these force fields can be implemented without modifying the program code (using ‘pair’ interactions). Besides, the parametrization required many rounds of iterative manual adjustment and verification. Thus we saw the need to implement the RSFF models universally in currently popular MD packages (Gromacs,111 Amber,112,113 Charmm,114 OpenMM,115 etc.) using a more efficient way to optimize torsion-torsion couplings. Fortunately, all of the above MD packages have implemented a type of 2D grid-based potential energy function called Correction-Map (CMAP), first introduced by MacKerell et al. to optimize the 2D (ϕ, ψ) energy surfaces.15,116 CMAPs have also been used to introduce residue-specific optimizations of backbone (ϕ, ψ) preferences for IDPs.117

According to the free energy decomposition described in section 2.3, the 3D free energy hyper-surface of the (ϕ, ψ, χ1) torsions can be decomposed into free energy surfaces of (ϕ, ψ), (χ1, ϕ) and (χ1, ψ). For each term, a CMAP can be optimized to compensate for the difference between the surface from force field simulation and that from the coil library statistics. Using AMBER ff14SB as the starting model, we optimized a backbone CMAP for the Ala residue, which was then used for all Ala-derived residues. Then, two side-chain-related CMAPs for each Ala-derived residue were obtained. The side-chain χ2 and χ3 torsion potentials for Glu, Gln and Asn were also optimized to better fit coil library statistics. A separate CMAP was used for (ϕ, ψ) of Gly. The resulting force field was termed RSFF2C, but it is not an exact re-implementation of RSFF2.81

We have found that the addition of CMAPs gives excellent reproduction of the coil-library (ϕ, ψ) distributions of Gly and Ala, achieving similarities S > 0.995 (Figure 7A). In particular, the diagonal shape of the αR basin is better captured. Also, compared to RSFF2, RSFF2C reproduces the χ1-dependent (ϕ, ψ) distributions from the coil library much better. RSFF2C simulations give 50 out of 53 rotamers with S > 0.98, compared with only 11 rotamers from RSFF2 simulations. For the t rotamer of Asp, only RSFF2C reproduces its special coil-library (ϕ, ψ) plot well (Figure 7B). This indicates that CMAPs are not only easier to optimize but also more accurate in describing torsion-torsion couplings than the special L-J interactions in RSFF1 and RSFF2. Besides, although RSFF2C was initially developed within the Amber MD package, we successfully implemented it in Gromacs as a standalone force field package (RSFF2C.ff), where new atom types were defined to allow residue-specific CMAP terms, and in OpenMM, because it can parse topology files generated by Gromacs.
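The idea of fitting a CMAP to compensate for the difference between simulated and coil-library free energy surfaces, as described above, can be sketched as a single-pass correction. This is only an illustration under stated assumptions: the 24 × 24 grid (15° bins), the pseudocount, the temperature and all function names are hypothetical choices, and the actual optimization procedure used for RSFF2C is not reproduced here.

```python
import math

KT_298 = 0.0019872041 * 298.0  # kT in kcal/mol at 298 K (assumed temperature)


def histogram2d(angles, nbins=24):
    """Count (x, y) torsion pairs (degrees) on a periodic nbins x nbins grid."""
    width = 360.0 / nbins
    counts = [[0] * nbins for _ in range(nbins)]
    for x, y in angles:
        # Shift to [0, 360) and guard against float round-off at the edge.
        i = int(((x + 180.0) % 360.0) // width) % nbins
        j = int(((y + 180.0) % 360.0) // width) % nbins
        counts[i][j] += 1
    return counts


def free_energy(counts, kt=KT_298, pseudo=1.0):
    """F = -kT ln P for each bin; the pseudocount avoids log(0)."""
    nb = len(counts)
    total = sum(map(sum, counts)) + pseudo * nb * nb
    return [[-kt * math.log((c + pseudo) / total) for c in row] for row in counts]


def cmap_correction(sim_angles, ref_angles, nbins=24):
    """One-step correction map F_ref - F_sim, shifted to zero mean.

    Adding this energy to the simulated force field would, to first order,
    move the simulated distribution toward the reference distribution.
    """
    f_sim = free_energy(histogram2d(sim_angles, nbins))
    f_ref = free_energy(histogram2d(ref_angles, nbins))
    diff = [[r - s for r, s in zip(rr, sr)] for rr, sr in zip(f_ref, f_sim)]
    mean = sum(map(sum, diff)) / (nbins * nbins)
    return [[d - mean for d in row] for row in diff]
```

The same function applies unchanged to (χ1, ϕ) and (χ1, ψ) pairs, which is what makes a 2D grid term a convenient carrier for all three surfaces in the decomposition. In practice such a correction would likely need to be iterated, because changing one map shifts the sampling seen by the others.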


Figure 7. Backbone (ϕ, ψ) distributions from dipeptide simulations with various force fields, compared with those from the coil library statistics. The similarity to the corresponding coil library plot is given for each simulated (ϕ, ψ) plot. (A) (ϕ, ψ) distributions of Gly and Ala. (B) χ1 rotamer-dependent (ϕ, ψ) distributions of aspartate residue (Asp).

We conducted ab initio folding simulations of two Trp-cage variants (TC5b, TC10b), the α-helical protein HP35, and β-sheet systems of various sizes (CLN025, Trpzip-2, GB1 hairpin, and WW domain). RSFF2C was found to have similar or sometimes better performance compared to RSFF2. In Trpzip-2, RSFF2C overestimates the stability of the folded state. This probably arose from unbalanced protein-protein and protein-water interactions when using the TIP3P water model, as discussed in Section 3.2. Backbone dynamics with RSFF2C were also evaluated and compared with RSFF2 and ff14SB. For three folded proteins (GB3, ubiquitin, and lysozyme), these force fields give similarly good agreement with N-H S² order parameters from NMR experiments. We also evaluated the compatibility of RSFF2C with TIP4P-D in dipeptides and two IDPs (Histatin 5 and HIV-1 Rev) in terms of reproduction of experimental ³JHNHα couplings; RSFF2C gave results similar to those of RSFF2, with significant improvement over ff14SB. For simulations of foldable peptides/proteins in TIP4P-D, an L-J correction potential similar to that in RSFF2+ may be needed to sufficiently stabilize α-helices.
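As an illustration of how a simulated ϕ distribution is compared with the ³JHNHα couplings mentioned above, here is a minimal sketch based on a Karplus relation. The source does not state which Karplus parameterization was used; the coefficients below are the widely used Vuister and Bax (1993) set and are an assumption, as are the function names.

```python
import math

# Assumed Karplus coefficients in Hz (Vuister-Bax 1993 set); the
# parameterization actually used in the work is not given in this section.
A, B, C = 6.51, -1.76, 1.60


def j_hnha(phi_deg):
    """3J(HN,Ha) in Hz for one backbone phi angle given in degrees."""
    theta = math.radians(phi_deg - 60.0)
    return A * math.cos(theta) ** 2 + B * math.cos(theta) + C


def ensemble_j(phi_samples):
    """Average the coupling over sampled phi values.

    The coupling is averaged, not the angle: J of the mean phi would be
    wrong for a broad or multi-basin distribution.
    """
    return sum(j_hnha(p) for p in phi_samples) / len(phi_samples)


if __name__ == "__main__":
    # Hypothetical sample: mostly extended (phi near -120) with some helix.
    print(round(ensemble_j([-120.0] * 7 + [-60.0] * 3), 2))
```

Because the coupling depends only on ϕ, it is a direct probe of the backbone preferences that residue-specific parameters are meant to capture.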

4. Applications of residue-specific force fields

4.1 Cyclic peptides

Peptides are promising candidates for targeting protein-protein interactions (PPIs) with high potency and specificity. However, linear peptides are often conformationally flexible, which lowers their binding affinity because of the entropic loss required to reach the bound (functional) conformation. Also, they often have low bioavailability and poor in vivo stability. Cyclization of linear peptides, either by backbone or side-chain linkages, can achieve much lower conformational flexibility and better drug-like properties.118

Apart from their importance as chemical tools and drug leads, we think that cyclic peptides (CPs) can be very good model systems to validate protein force fields: (1) compared with proteins, they have smaller sizes and limited conformational spaces, so converged conformational sampling can be achieved using enhanced sampling methods119,120 with readily affordable computational resources; (2) compared with linear peptides, CPs can be crystallized much more easily, giving high-resolution experimental structures and avoiding the error-prone interpretation of NMR measurements;121 (3) we have found that the conformational sampling of residues in small CPs is much more similar to that in globular proteins than is the sampling in linear peptides.122

Recently, Lin and coworkers investigated the ability of MD-based simulations with enhanced sampling to predict the NMR structure of a model CP, using various force fields including our RSFF1.119 For a more systematic study, we simulated twenty CPs of 5-12 residues using REMD to achieve converged conformational sampling.122 RSFF1 and RSFF2 can correctly identify the crystal structures of more than half of the CPs as belonging to the most populated conformational cluster (that of lowest free energy). RSFF2 can predict the crystal structures of 17 out of these 20 CPs with backbone RMSD < 1.1 Å, and 8 CPs with backbone RMSD < 0.5 Å, much better than previous OPLS-AA and AMBER force fields (Figure 8).
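The statement above that the most populated conformational cluster is the one of lowest free energy follows from the Boltzmann relation between populations and free energies. A minimal sketch is given below; it assumes that cluster labels have already been assigned to the frames of a converged trajectory, and the function name, the temperature and the example labels are hypothetical.

```python
import math
from collections import Counter

R_KCAL = 0.0019872041  # gas constant in kcal/(mol*K)


def cluster_free_energies(labels, temp_k=300.0):
    """Relative free energy of each cluster from its population.

    dG_i = -RT ln(n_i / n_max), so the most populated cluster is the
    zero of the scale and every other cluster has a positive value.
    """
    counts = Counter(labels)
    top = max(counts.values())
    return {c: -R_KCAL * temp_k * math.log(n / top) for c, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical cluster assignment for 100 frames.
    frames = ["A"] * 60 + ["B"] * 30 + ["C"] * 10
    for name, dg in sorted(cluster_free_energies(frames).items()):
        print(name, round(dg, 2))
```

This relation only holds if the sampling is converged at the temperature of interest, which is why the small size of CPs matters for force field validation.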


Figure 8. Statistics (box plots) of the RMSD between crystal structures of the 20 cyclic peptides and those predicted from REMD simulations with various force fields.

Thus, RSFF2 was used recently by Lin and coworkers to study sequence–structure relationships of some simple cyclic hexapeptides,123 and to design well-structured cyclic pentapeptides.124 RSFF2 has also been used to study α-helical stapled peptides, giving predictions very similar to experimental structures.125,126 Stapled peptides are CPs formed by side-chain linkages, which involve non-standard amino acids. Because our RSFF2 force field can be regarded as an improvement of the AMBER ff99SB force field, the generalized AMBER force field (GAFF) can be used compatibly for these residues.

4.2 Protein folding

Some peptides and small proteins can spontaneously fold to their native structures from unfolded states within a few microseconds,127 a time scale that has recently become accessible to MD simulations. Thus, protein folding simulations have been increasingly used to validate protein force fields. In 2011, Lindorff-Larsen et al. reported the first successful large-scale folding simulation: twelve proteins of 10 to 80 residues were reversibly folded using the CHARMM22* force field in explicit water.28 This was made possible by their special-purpose computer Anton. Using REMD to enhance the conformational sampling, we found that our RSFF1 can fold all of these 12 proteins, along with the Trp-cage TC5b and wild-type engrailed homeodomain (EnHD), whose native structures could not be well stabilized by CHARMM22* (Figure 9).128 This indicates that RSFF1 not only gives balanced secondary structure preferences, but can also be used consistently on different protein sequences. By analyzing continuous trajectories that track every replica exchange, we also found that REMD can increase the folding rate by about 6 times. As expected, high temperature (T) indeed elevates the folding free energy barrier, but it also significantly (> 10² times in most cases) increases the diffusion rate on the rough energy landscape, following Zwanzig’s super-exponential T-dependence.129
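The super-exponential temperature dependence referred to above can be illustrated with Zwanzig's expression for diffusion on a rough landscape, D_eff = D0 exp[-(ε/kT)²], where ε is the root-mean-square roughness. The sketch below computes the ratio of effective diffusion rates at two temperatures. Treating D0 as temperature independent is a simplifying assumption, and the function name and the roughness value in the example are hypothetical, not fitted values from this work.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def roughness_speedup(eps, t_low, t_high):
    """Ratio D_eff(t_high) / D_eff(t_low) for D_eff = D0 * exp(-(eps/kT)**2).

    eps is the RMS roughness in kcal/mol; D0 is assumed to be the same at
    both temperatures, so it cancels in the ratio.
    """
    return math.exp((eps / K_B) ** 2 * (1.0 / t_low ** 2 - 1.0 / t_high ** 2))


if __name__ == "__main__":
    # Hypothetical roughness of 1.5 kcal/mol, replicas at 300 K and 450 K.
    print(round(roughness_speedup(1.5, 300.0, 450.0), 1))
```

Because the exponent scales as 1/T² rather than 1/T, the gain from heating grows quickly with roughness, which is consistent with a net rate increase even when the free energy barrier itself rises with temperature.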

Figure 9. Superpositions of the experimental (magenta) and predicted (rainbow) structures (from the most populated conformational cluster) of the 14 proteins in our large-scale folding simulation. The PDB ID of each protein is given in parentheses; below it are the percentage of the cluster and the Cα-RMSD (the value in parentheses is calculated without a few terminal residues).

We also reported, for the first time, successful folding simulations of an α-helical hairpin using both RSFF1 and RSFF2.130 Besides, we carried out long REMD simulations on five Trp-cage variants to reach folding-unfolding equilibrium; the simulations reproduce the experimental structures of TC5b and TC10b well. The calculated Tm and folding enthalpies (ΔHf) of the five variants correlate very well with the corresponding experimental data.131 This indicates that our residue-specific force fields can indeed describe the conformational effects of different AA residues. We also explored their folding free energy landscapes and revealed the similarities and differences in their folding mechanisms.

Post-translational modifications of proteins, such as phosphorylation, play very important functional roles in eukaryotes, especially in intra-cellular and inter-cellular signal transduction. However, phosphorylated residues are rare in our coil library, so an RSFF-like parameterization of them is impractical. Nevertheless, our RSFF2 is based on, and should be fully compatible with, AMBER force fields. We can therefore use previously developed AMBER force fields for phosphorylated residues together with RSFF2 for the rest of the system. Through this approach, we investigated the mechanism of the folding of an intrinsically disordered protein (eIF4E-binding protein isoform 2, 4E-BP2) induced by site-specific phosphorylation(s), and found that the phosphorylated residues can function as nucleation sites of the folding.132

4.3 Protein structure refinement

There is a huge and growing gap between the number of experimental protein structures and the number of known sequences. The high demand for protein structures in modern biological research and drug discovery has placed sequence-based protein structure prediction at a crucial position. However, a predicted protein model is usually much less accurate than the corresponding experimental structure, which limits its use in some important applications including structure-based drug design.133,134 Refining low-accuracy models into high-accuracy ones is therefore very important, but it is also a very difficult challenge.135

One promising approach to structure refinement is physics-based MD simulation. Feig and coworkers applied an MD-based refinement protocol to targets from the Critical Assessment of Structure Prediction (CASP) competitions, and they found that the CHARMM36 force field achieved moderate but consistent refinement for most targets.136,137 However, Shaw and colleagues found that their CHARMM22* force field, which can fold a diverse set of small proteins, may not stabilize the experimental structure of a protein in long MD simulations, so that good refinement can hardly be achieved.138 Thus, the success of structure refinement relies heavily on force field accuracy.

We recently evaluated the applicability of RSFF1 in protein structure refinement.139 We chose 30 single-domain proteins with diverse secondary and tertiary structures from the CASP8-10 refinement targets. During 1.0 μs unrestrained MD simulations (298 K) initiated from the experimental structures, 27 of them have average Cα RMSD < 2.90 Å with low fluctuations, better than previous CHARMM22* simulations. Initiated from their homology models, which cover a wide Cα RMSD range of 1-9 Å, MD simulations (380 K) with weak Cα position restraints gave best structures with RMSD reduced by 0.85 Å on average, better than previous results from Feig’s group. With long REMD simulations, RSFF1 can fold two homology models, TR614 and TR624, with initial Cα-RMSD > 5 Å, into near-native structures.

In 2016, a total of 39 groups took part in the CASP12 refinement challenge, including us. The top-ranking groups were relatively conservative, yielding structures that are close to the initial ones. Considering the best of the five submitted models for each protein, our approach (REMD simulation using the RSFF1 force field) ranked 4th and 12th with penalty thresholds of 0.0 and -2.0, respectively, based on the sum of Z-scores. This indicates that our approach is adventurous: it can provide significantly refined models for some targets (such as TR866, TR894 and TR944) but performs modestly overall.140

Encouraged by the good performance in structure refinement, we applied RSFFs to investigate the structural and dynamic features of proteins, including the role of intrinsically disordered regions in RfaH141 and the role of structural waters in bromodomain dynamics.142 In a recent work, we predicted the possible binding mode between the WRPW motif of HES1 and the LRR domain of FBXL14.143 The homology model of the apo-FBXL14 protein and the predicted complex structure were refined with our RSFF2, and the results provided a possible model of how the WRPW motif and FBXL14 interact.
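The dependence of the CASP12 refinement ranking on the penalty threshold, mentioned above, can be illustrated with a small sketch. It assumes the usual convention that a per-target Z-score below the threshold is raised to the threshold before summing; the exact assessment procedure is not described in this text, and the function name and the scores are made up for illustration.

```python
def summed_z(z_scores, threshold):
    """Sum per-target Z-scores, raising any score below threshold to it.

    Assumed convention: a threshold of 0.0 ignores how badly a model fails,
    while -2.0 lets poor models subtract from the total.
    """
    return sum(max(z, threshold) for z in z_scores)


if __name__ == "__main__":
    # Hypothetical groups: one adventurous, one conservative.
    adventurous = [2.5, -3.0, 1.5, -2.5]
    conservative = [0.5, 0.4, 0.6, 0.5]
    for floor in (0.0, -2.0):
        print(floor, summed_z(adventurous, floor), summed_z(conservative, floor))
```

Under this assumption, a method that produces a few large improvements and a few large failures ranks higher with the 0.0 threshold than with -2.0, which matches the pattern of the 4th and 12th places reported above.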

5. Summary

Here, we have described the developments and applications of our residue-specific force fields (RSFFs), which were parameterized based on the backbone and side-chain conformational distributions of various amino acid residues from a protein coil library. RSFF1 and RSFF2, as improvements of the OPLS-AA/L and AMBER ff99SB force fields respectively, describe the different intrinsic conformational preferences of the 20 amino acid residues much better. RSFF1 successfully folded a set of small proteins with diverse secondary structure contents and performs better in protein structure refinement, indicating the transferability of our force fields among different sequences. RSFF2 can provide reliable and accurate structure predictions of cyclic peptides, which we regard as good model systems for testing protein force fields. Building on the success of RSFF2, we then developed the RSFF2+ and RSFF2C force fields. With a correction for the many-body effect in helix formation and stronger protein-water interactions, the new RSFF2+/TIP4P-D method can correctly describe the thermodynamic equilibrium between folded and unfolded states, showing great transferability for both IDPs and folded proteins. The RSFF2C force field removes the software limitation by using CMAP potentials and can be easily implemented in different MD software packages. We hope our work on force field development will help increase the usefulness of MD simulations in modelling the structures and dynamics of peptides and proteins, in studying their functional mechanisms, and in drug discovery and design.

AUTHOR INFORMATION

Corresponding Author
*E-mail: [email protected]

Notes
The authors declare no competing financial interest.

ACKNOWLEDGEMENTS

We are grateful for financial support from the Shenzhen Science and Technology Innovation Committee (JCYJ20170412150507046) and the National Natural Science Foundation of China (21573009).

REFERENCES

(1) McCammon, J. A.; Gelin, B. R.; Karplus, M. Dynamics of folded proteins. Nature 1977, 267, 585−590.

(2) Shaw, D. E.; Maragakis, P.; Lindorff-Larsen, K.; Piana, S.; Dror, R. O.; Eastwood, M. P.; Bank, J. A.; Jumper, J. M.; Salmon, J. K.; Shan, Y.; Wriggers, W. Atomic-Level Characterization of the Structural Dynamics of Proteins. Science 2010, 330, 341−346.

(3) Freddolino, P. L.; Arkhipov, A. S.; Larson, S. B.; McPherson, A.; Schulten, K. Molecular Dynamics Simulations of the Complete Satellite Tobacco Mosaic Virus. Structure 2006, 14, 437−449.

(4) Alder, B. J.; Wainwright, T. E. Studies in Molecular Dynamics. I. General Method. J. Chem. Phys. 1959, 31, 459−466.

(5) Sugita, Y.; Okamoto, Y. Replica-Exchange Molecular Dynamics Method for Protein Folding. Chem. Phys. Lett. 1999, 314, 141−151.

(6) Kumar, S.; Rosenberg, J. M.; Bouzida, D.; Swendsen, R. H.; Kollman, P. A. The Weighted Histogram Analysis Method for Free-Energy Calculations on Biomolecules. I. The Method. J. Comput. Chem. 1992, 13, 1011−1021.

(7) Hamelberg, D.; Mongan, J.; McCammon, J. A. Accelerated Molecular Dynamics: A Promising and Efficient Simulation Method for Biomolecules. J. Chem. Phys. 2004, 120, 11919−11929.

(8) Laio, A.; Parrinello, M. Escaping Free-Energy Minima. Proc. Natl. Acad. Sci. 2002, 99, 12562−12566.

(9) Gao, Y. Q. An Integrate-over-Temperature Approach for Enhanced Sampling. J. Chem. Phys. 2008, 128, 064105.

(10) Cornell, W. D.; Cieplak, P.; Bayly, C. I.; Gould, I. R.; Merz, K. M.; Ferguson, D. M.; Spellmeyer, D. C.; Fox, T.; Caldwell, J. W.; Kollman, P. A. A Second Generation Force Field for the Simulation of Proteins, Nucleic Acids, and Organic Molecules. J. Am. Chem. Soc. 1995, 117, 5179−5197.


(11) Hornak, V.; Abel, R.; Okur, A.; Strockbine, B.; Roitberg, A.; Simmerling, C. Comparison of Multiple Amber Force Fields and Development of Improved Protein Backbone Parameters. Proteins 2006, 65, 712−725.

(12) Lindorff-Larsen, K.; Piana, S.; Palmo, K.; Maragakis, P.; Klepeis, J. L.; Dror, R. O.; Shaw, D. E. Improved Side-Chain Torsion Potentials for the Amber ff99sb Protein Force Field. Proteins 2010, 78, 1950−1958.

(13) Maier, J. A.; Martinez, C.; Kasavajhala, K.; Wickstrom, L.; Hauser, K. E.; Simmerling, C. ff14SB: Improving the Accuracy of Protein Side Chain and Backbone Parameters from ff99SB. J. Chem. Theory Comput. 2015, 11, 3696−3713.

(14) MacKerell, A. D.; Bashford, D.; Bellott, M.; Dunbrack, R. L.; Evanseck, J. D.; Field, M. J.; Fischer, S.; Gao, J.; Guo, H.; Ha, S.; Joseph-McCarthy, D.; Kuchnir, L.; Kuczera, K.; Lau, F. T. K.; Mattos, C.; Michnick, S.; Ngo, T.; Nguyen, D. T.; Prodhom, B.; Reiher, W. E.; Roux, B.; Schlenkrich, M.; Smith, J. C.; Stote, R.; Straub, J.; Watanabe, M.; Wiórkiewicz-Kuczera, J.; Yin, D.; Karplus, M. All-Atom Empirical Potential for Molecular Modeling and Dynamics Studies of Proteins. J. Phys. Chem. B 1998, 102, 3586−3616.

(15) Mackerell, A. D.; Feig, M.; Brooks, C. L. Extending the Treatment of Backbone Energetics in Protein Force Fields: Limitations of Gas-Phase Quantum Mechanics in Reproducing Protein Conformational Distributions in Molecular Dynamics Simulation. J. Comput. Chem. 2004, 25, 1400−1415.

(16) Piana, S.; Lindorff-Larsen, K.; Shaw, D. E. How Robust Are Protein Folding Simulations with Respect to Force Field Parameterization? Biophys. J. 2011, 100, L47−L49.

(17) Best, R. B.; Zhu, X.; Shim, J.; Lopes, P. E.; Mittal, J.; Feig, M.; Mackerell, A. J. Optimization of the Additive CHARMM All-Atom Protein Force Field Targeting Improved Sampling of the Backbone ϕ, ψ and Side-Chain χ1 and χ2 Dihedral Angles. J. Chem. Theory Comput. 2012, 8, 3257−3273.

(18) Jorgensen, W. L.; Maxwell, D. S.; Tirado-Rives, J. Development and Testing of the OPLS All-Atom Force Field on Conformational Energetics and Properties of Organic Liquids. J. Am. Chem. Soc. 1996, 118, 11225−11236.

(19) Kaminski, G. A.; Friesner, R. A.; Tirado-Rives, J.; Jorgensen, W. L. Evaluation and Reparametrization of the OPLS-AA Force Field for Proteins via Comparison with Accurate Quantum Chemical Calculations on Peptides. J. Phys. Chem. B 2001, 105, 6474−6487.

(20) Robertson, M. J.; Tirado-Rives, J.; Jorgensen, W. L. Improved Peptide and Protein Torsional Energetics with the OPLS-AA Force Field. J. Chem. Theory Comput. 2015, 11, 3499−3509.

(21) Hu, H.; Elstner, M.; Hermans, J. Comparison of a QM/MM force field and molecular mechanics force fields in simulations of alanine and glycine “dipeptides” (Ace-Ala-Nme and Ace-Gly-Nme) in water in relation to the problem of modeling the unfolded peptide backbone in solution. Proteins 2003, 50, 451−463.

(22) Yoda, T.; Sugita, Y.; Okamoto, Y. Secondary-structure preferences of force fields for proteins evaluated by generalized-ensemble simulations. Chem. Phys. 2004, 307, 269−283.

(23) Best, R. B.; Buchete, N.-V.; Hummer, G. Are Current Molecular Dynamics Force Fields Too Helical? Biophys. J. 2008, 95, L07−L09.

(24) Freddolino, P. L.; Park, S.; Roux, B.; Schulten, K. Force Field Bias in Protein Folding Simulations. Biophys. J. 2009, 96, 3772−3780.

(25) Matthes, D.; de Groot, B. L. Secondary Structure Propensities in Peptide Folding Simulations: A Systematic Comparison of Molecular Mechanics Interaction Schemes. Biophys. J. 2009, 97, 599−608.

(26) Best, R. B.; Hummer, G. Optimized Molecular Dynamics Force Fields Applied to the Helix-Coil Transition of Polypeptides. J. Phys. Chem. B 2009, 113, 9004−9015.

(27) Lindorff-Larsen, K.; Maragakis, P.; Piana, S.; Eastwood, M. P.; Dror, R. O.; Shaw, D. E. Systematic Validation of Protein Force Fields against Experimental Data. PLoS One 2012, 7, e32131.

(28) Lindorff-Larsen, K.; Piana, S.; Dror, R. O.; Shaw, D. E. How Fast-Folding Proteins Fold. Science 2011, 334, 517−520.

(29) Jiang, F.; Zhou, C.-Y.; Wu, Y.-D. Residue-specific force field based on the protein coil library. RSFF1: modification of OPLS-AA/L. J. Phys. Chem. B 2014, 118, 6983−6998.

(30) Chou, P. Y.; Fasman, G. D. Conformational parameters for amino acids in helical, β-sheet, and random coil regions calculated from proteins. Biochemistry 1974, 13, 211−222.

(31) Lyu, P.; Liff, M. I.; Marky, L. A.; Kallenbach, N. R. Side chain contributions to the stability of alpha-helical structure in peptides. Science 1990, 250, 669−673.

(32) Minor, D. L.; Kim, P. S. Measurement of the β-sheet-forming propensities of amino acids. Nature 1994, 367, 660−663.

(33) Han, W.; Wan, C.; Jiang, F.; Wu, Y.-D. PACE force field for protein simulations. 1. Full parameterization of version 1 and verification. J. Chem. Theory Comput. 2010, 6, 3373−3389.

(34) Marrink, S. J.; de Vries, A. H.; Mark, A. E. Coarse grained model for semiquantitative lipid simulations. J. Phys. Chem. B 2004, 108, 750−760.

(35) DeBolt, S. E.; Skolnick, J. Evaluation of atomic level mean force potentials via inverse folding and inverse refinement of protein structures: atomic burial position and pairwise nonbonded interactions. Protein Eng. 1996, 9, 637−655.

(36) Zhang, C.; Vasmatzis, G.; Cornette, J. L.; DeLisi, C. Determination of atomic desolvation energies from the structures of crystallized proteins. J. Mol. Biol. 1997, 267, 707−726.

(37) Melo, F.; Feytmans, E. Assessing protein structures with a non-local atomic interaction energy. J. Mol. Biol. 1998, 277, 1141−1152.

(38) Samudrala, R.; Moult, J. An all-atom distance-dependent conditional probability discriminatory function for protein structure prediction. J. Mol. Biol. 1998, 275, 895−916.

(39) Gatchell, D. W.; Dennis, S.; Vajda, S. Discrimination of near-native protein structures from misfolded models by empirical free energy functions. Proteins 2000, 41, 518−534.

(40) Lu, H.; Skolnick, J. A distance-dependent atomic knowledge-based potential for improved protein structure selection. Proteins 2001, 44, 223−232.

(41) Shen, M.-y.; Sali, A. Statistical potential for assessment and prediction of protein structures. Protein Sci. 2006, 15, 2507−2524.

(42) Tanaka, S.; Scheraga, H. A. Medium- and Long-Range Interaction Parameters between Amino Acids for Predicting Three-Dimensional Structures of Proteins. Macromolecules 1976, 9, 945−950.

(43) Hinds, D. A.; Levitt, M. A lattice model for protein structure prediction at low resolution. Proc. Natl. Acad. Sci. 1992, 89, 2536−2540.

(44) Jones, D. T.; Taylor, W. R.; Thornton, J. M. A new approach to protein fold recognition. Nature 1992, 358, 86−89.

(45) Simons, K. T.; Ruczinski, I.; Kooperberg, C.; Fox, B. A.; Bystroff, C.; Baker, D. Improved recognition of native-like protein structures using a combination of sequence-dependent and sequence-independent features of proteins. Proteins 1999, 34, 82−95.

(46) Rajgaria, R.; McAllister, S. R.; Floudas, C. A. A novel high resolution Cα-Cα distance dependent force field based on a high quality decoy set. Proteins 2006, 65, 726−741.

(47) Zhou, H.; Zhou, Y. Distance-scaled, finite ideal-gas reference state improves structure-derived potentials of mean force for structure selection and stability prediction. Protein Sci. 2002, 11, 2714−2726.

(48) MacKerell, A. D.; Feig, M.; Brooks, C. L. Improved Treatment of the Protein Backbone in Empirical Force Fields. J. Am. Chem. Soc. 2004, 126, 698−699.

(49) Han, W.; Wan, C.-K.; Wu, Y.-D. PACE force field for protein simulations. 2. Folding simulations of peptides. J. Chem. Theory Comput. 2010, 6, 3390−3402.

(50) Wang, L. P.; McKiernan, K. A.; Gomes, J.; Beauchamp, K. A.; Head-Gordon, T.; Rice, J. E.; Swope, W. C.; Martinez, T. J.; Pande, V. S. Building a More Predictive Protein Force Field: A Systematic and Reproducible Route to AMBER-FB15. J. Phys. Chem. B 2017, 121, 4023−4039.

(51) Wang, L.-P.; Martinez, T. J.; Pande, V. S. Building Force Fields: An Automatic, Systematic, and Reproducible Approach. J. Phys. Chem. Lett. 2014, 5, 1885−1891.

(52) Debiec, K. T.; Cerutti, D. S.; Baker, L. R.; Gronenborn, A. M.; Case, D. A.; Chong, L. T. Further along the Road Less Traveled: AMBER ff15ipq, an Original Protein Force Field Built on a Self-Consistent Physical Model. J. Chem. Theory Comput. 2016, 12, 3926−3947.

(53) Piana, S.; Klepeis, J. L.; Shaw, D. E. Assessing the Accuracy of Physical Models Used in Protein-Folding Simulations: Quantitative Evidence from Long Molecular Dynamics Simulations. Curr. Opin. Struct. Biol. 2014, 24, 98−105.

(54) Rauscher, S.; Gapsys, V.; Gajda, M. J.; Zweckstetter, M.; de Groot, B. L.; Grubmüller, H. Structural Ensembles of Intrinsically Disordered Proteins Depend Strongly on Force Field: A Comparison to Experiment. J. Chem. Theory Comput. 2015, 11, 5513−5524.

(55) Swindells, M. B.; MacArthur, M. W.; Thornton, J. M. Intrinsic ϕ, ψ Propensities of Amino Acids, Derived from the Coil Regions of Known Structures. Nat. Struct. Biol. 1995, 2, 596−603.

(56) Serrano, L. Comparison between the ϕ distribution of the amino acids in the protein database and NMR data indicates that amino acids have various ϕ propensities in the random coil conformation. J. Mol. Biol. 1995, 254, 322−333.

(57) Fiebig, K. M.; Schwalbe, H.; Buck, M.; Smith, L. J.; Dobson, C. M. Toward a Description of the Conformations of Denatured States of Proteins. Comparison of a Random Coil Model with NMR Measurements. J. Phys. Chem. 1996, 100, 2661−2666.


(58) Fitzkee, N. C.; Fleming, P. J.; Rose, G. D. The Protein Coil Library: A Structural Database of Nonhelix, Nonstrand Fragments Derived from the PDB. Proteins 2005, 58, 852−854.

(59) Jha, A. K.; Colubri, A.; Zaman, M. H.; Koide, S.; Sosnick, T. R.; Freed, K. F. Helix, Sheet, and Polyproline II Frequencies and Strong Nearest Neighbor Effects in a Restricted Coil Library. Biochemistry 2005, 44, 9691−9702.

(60) Jha, A. K.; Colubri, A.; Freed, K. F.; Sosnick, T. R. Statistical coil model of the unfolded state: Resolving the reconciliation problem. Proc. Natl. Acad. Sci. 2005, 102, 13099−13104.

(61) Bernadó, P.; Blanchard, L.; Timmins, P.; Marion, D.; Ruigrok, R. W. H.; Blackledge, M. A structural model for unfolded proteins from residual dipolar couplings and small-angle x-ray scattering. Proc. Natl. Acad. Sci. 2005, 102, 17002−17007.

(62) Fujitsuka, Y.; Chikenji, G.; Takada, S. SimFold energy function for de novo protein structure prediction: Consensus with Rosetta. Proteins 2005, 62, 381−398.

(63) Betancourt, M. R. Knowledge-Based Potential for the Polypeptide Backbone. J. Phys. Chem. B 2008, 112, 5058−5069.

(64) Rata, I. A.; Li, Y.; Jakobsson, E. Backbone Statistical Potential from Local Sequence-Structure Interactions in Protein Loops. J. Phys. Chem. B 2010, 114, 1859−1869.

(65) Jung, Y.-S.; Oh, K.-I.; Hwang, G.-S.; Cho, M. Neighboring Residue Effects in Terminally Blocked Dipeptides: Implications for Residual Secondary Structures in Intrinsically Unfolded/Disordered Proteins. Chirality 2014, 26, 443−452.

(66) Li, S.; Andrews, C. T.; Frembgen-Kesner, T.; Miller, M. S.; Siemonsma, S. L.; Collingsworth, T. D.; Rockafellow, I. T.; Ngo, N. A.; Campbell, B. A.; Brown, R. F.; Guo, C.; Schrodt, M.; Liu, Y. T.; Elcock, A. H. Molecular Dynamics Simulations of 441 Two-Residue Peptides in Aqueous Solution: Conformational Preferences and Neighboring Residue Effects with the Amber ff99SB-ildn-NMR Force Field. J. Chem. Theory Comput. 2015, 11, 1315−1329.

(67) Jiang, F.; Han, W.; Wu, Y.-D. The intrinsic conformational features of amino acids from a protein coil library and their applications in force field development. Phys. Chem. Chem. Phys. 2013, 15, 3413−3428.

(68) Poon, C.; Samulski, E. T.; Weise, C. F.; Weisshaar, J. C. Do Bridging Water Molecules Dictate the Structure of a Model Dipeptide in Aqueous Solution? J. Am. Chem. Soc. 2000, 122, 5642−5643.


69 Shi, Z.; Olson, C. A.; Rose, G. D.; Baldwin, R. L.; Kallenbach, N. R. Polyproline II Structure in a Sequence of Seven Alanine Residues. Proc. Natl. Acad. Sci. 2002, 99, 9190−9195.

70 Shi, Z.; Chen, K.; Liu, Z.; Kallenbach, N. R. Conformation of the Backbone in Unfolded Proteins. Chem. Rev. 2006, 106, 1877−1897.

71 Graf, J.; Nguyen, P. H.; Stock, G.; Schwalbe, H. Structure and Dynamics of the Homologous Series of Alanine Peptides: A Joint Molecular Dynamics/NMR Study. J. Am. Chem. Soc. 2007, 129, 1179−1189.

72 Hagarman, A.; Measey, T. J.; Mathieu, D.; Schwalbe, H.; Schweitzer-Stenner, R. Intrinsic Propensities of Amino Acid Residues in GxG Peptides Inferred from Amide I’ Band Profiles and NMR Scalar Coupling Constants. J. Am. Chem. Soc. 2010, 132, 540−551.

73 Grdadolnik, J.; Mohacek-Grosev, V.; Baldwin, R. L.; Avbelj, F. Populations of the Three Major Backbone Conformations in 19 Amino Acid Dipeptides. Proc. Natl. Acad. Sci. 2011, 108, 1794−1798.

74 Kabsch, W.; Sander, C. Dictionary of Protein Secondary Structure: Pattern Recognition of Hydrogen-bonded and Geometrical Features. Biopolymers 1983, 22, 2577−2637.

75 Ho, B. K.; Brasseur, R. The Ramachandran plots of glycine and pre-proline. BMC Struct. Biol. 2005, 5, 14.

76 Kang, Y. K.; Byun, B. J. Assessment of density functionals with long-range and/or empirical dispersion corrections for conformational energy calculations of peptides. J. Comput. Chem. 2010, 31, 2915−2923.

77 Jiang, F.; Han, W.; Wu, Y. D. The intrinsic conformational features of amino acids from a protein coil library and their applications in force field development. Phys. Chem. Chem. Phys. 2013, 15, 3413−3428.

78 Jiang, F.; Han, W.; Wu, Y. D. Influence of side chain conformations on local conformational features of amino acids and implication for force field development. J. Phys. Chem. B 2010, 114, 5840−5850.

79 McGregor, M. J.; Islam, S. A.; Sternberg, M. Analysis of the Relationship between Side-Chain Conformation and Secondary Structure in Globular-Proteins. J. Mol. Biol. 1987, 198, 295−310.

80 Wang, Z.; Duan, Y. Solvation effects on alanine dipeptide: A MP2/cc-pVTZ//MP2/6-31G** study of (Φ, Ψ) energy maps and conformers in the gas phase, ether, and water. J. Comput. Chem. 2004, 25, 1699−1716.

81 Kang, W.; Jiang, F.; Wu, Y.-D. Universal Implementation of a Residue-Specific Force Field Based on CMAP Potentials and Free Energy Decomposition. J. Chem. Theory Comput. 2018, 14, 4474−4486.

82 Gao, J.; Kuczera, K.; Tidor, B.; Karplus, M. Hidden thermodynamics of mutant proteins: a molecular dynamics analysis. Science 1989, 244, 1069−1072.

83 Boresch, S.; Archontis, G.; Karplus, M. Free energy simulations: The meaning of the individual contributions from a component analysis. Proteins 1994, 20, 25−33.

84 Mark, A. E.; van Gunsteren, W. F. Decomposition of the Free Energy of a System in Terms of Specific Interactions: Implications for Theoretical and Experimental Studies. J. Mol. Biol. 1994, 240, 167−176.

85 Boresch, S.; Karplus, M. The Meaning of Component Analysis: Decomposition of the Free Energy in Terms of Specific Interactions. J. Mol. Biol. 1995, 254, 801−807.

86 Dill, K. A. Additivity Principles in Biochemistry. J. Biol. Chem. 1997, 272, 701−704.

87 Møller, C.; Plesset, M. S. Note on an Approximation Treatment for Many-Electron Systems. Phys. Rev. 1934, 46, 618−622.

88 Barone, V.; Cossi, M. Quantum Calculation of Molecular Energies and Energy Gradients in Solution by a Conductor Solvent Model. J. Phys. Chem. A 1998, 102, 1995−2001.

89 Takano, Y.; Houk, K. N. Benchmarking the Conductor-like Polarizable Continuum Model (CPCM) for Aqueous Solvation Free Energies of Neutral and Ionic Organic Molecules. J. Chem. Theory Comput. 2005, 1, 70−77.

90 Zhou, C.-Y.; Jiang, F.; Wu, Y.-D. Residue-specific force field based on protein coil library. RSFF2: modification of Amber ff99SB. J. Phys. Chem. B 2015, 119, 1035−1047.

91 Li, S.; Elcock, A. H. Residue-Specific Force Field (RSFF2) Improves the Modeling of Conformational Behavior of Peptides and Proteins. J. Phys. Chem. Lett. 2015, 6, 2127−2133.

92 Dunker, A. K.; Brown, C. J.; Lawson, J. D.; Iakoucheva, L. M.; Obradović, Z. Intrinsic Disorder and Protein Function. Biochemistry 2002, 41, 6573−6582.

93 Chouard, T. Structural Biology: Breaking the Protein Rules. Nature 2011, 471, 151−153.

94 Oldfield, C. J.; Dunker, A. K. Intrinsically Disordered Proteins and Intrinsically Disordered Protein Regions. Annu. Rev. Biochem. 2014, 83, 553−584.


95 Huang, J.; MacKerell, A. D. Force Field Development and Simulations of Intrinsically Disordered Proteins. Curr. Opin. Struct. Biol. 2018, 48, 40−48.

96 Nerenberg, P. S.; Jo, B.; So, C.; Tripathy, A.; Head-Gordon, T. Optimizing Solute-Water van Der Waals Interactions To Reproduce Solvation Free Energies. J. Phys. Chem. B 2012, 116, 4524−4534.

97 Best, R. B.; Zheng, W.; Mittal, J. Balanced Protein-Water Interactions Improve Properties of Disordered Proteins and Non-Specific Protein Association. J. Chem. Theory Comput. 2014, 10, 5113−5124.

98 Piana, S.; Donchev, A. G.; Robustelli, P.; Shaw, D. E. Water Dispersion Interactions Strongly Influence Simulated Structural Properties of Disordered Protein States. J. Phys. Chem. B 2015, 119, 5113−5123.

99 Wu, H.-N.; Jiang, F.; Wu, Y.-D. Significantly Improved Protein Folding Thermodynamics Using a Dispersion-Corrected Water Model and a New Residue-Specific Force Field. J. Phys. Chem. Lett. 2017, 8, 3199−3205.

100 Wu, Y.-D.; Zhao, Y.-L. A Theoretical Study on the Origin of Cooperativity in the Formation of 3₁₀- and α-Helices. J. Am. Chem. Soc. 2001, 123, 5313−5319.

101 Morozov, A. V.; Tsemekhman, K.; Baker, D. Electron Density Redistribution Accounts for Half the Cooperativity of α Helix Formation. J. Phys. Chem. B 2006, 110, 4503−4505.

102 Duan, L. L.; Mei, Y.; Zhang, D.; Zhang, Q. G.; Zhang, J. Z. H. Folding of a Helix at Room Temperature Is Critically Aided by Electrostatic Polarization of Intraprotein Hydrogen Bonds. J. Am. Chem. Soc. 2010, 132, 11159−11164.

103 Huang, J.; MacKerell, A. D. Induction of Peptide Bond Dipoles Drives Cooperative Helix Formation in the (AAQAA)3 Peptide. Biophys. J. 2014, 107, 991−997.

104 Ouyang, J. F.; Bettens, R. P. A. When Are Many-Body Effects Significant? J. Chem. Theory Comput. 2016, 12, 5860−5867.

105 Best, R. B.; Mittal, J. Protein Simulations with an Optimized Water Model: Cooperative Helix Formation and Temperature-Induced Unfolded State Collapse. J. Phys. Chem. B 2010, 114, 14916−14923.

106 Day, R.; Paschek, D.; Garcia, A. E. Microsecond Simulations of the Folding/Unfolding Thermodynamics of the Trp-Cage Miniprotein. Proteins 2010, 78, 1889−1899.

107 Best, R. B.; Mittal, J.; Feig, M.; MacKerell, A. D. Inclusion of Many-Body Effects in the Additive CHARMM Protein CMAP Potential Results in Enhanced Cooperativity of α-Helix and β-Hairpin Formation. Biophys. J. 2012, 103, 1045−1051.

108 Huang, J.; Rauscher, S.; Nawrocki, G.; Ran, T.; Feig, M.; de Groot, B. L.; Grubmüller, H.; MacKerell, A. D. CHARMM36m: An Improved Force Field for Folded and Intrinsically Disordered Proteins. Nat. Methods 2016, 14, 71−73.

109 Robustelli, P.; Piana, S.; Shaw, D. E. Developing a Molecular Dynamics Force Field for Both Folded and Disordered Protein States. Proc. Natl. Acad. Sci. 2018, 115, E4758−E4766.

110 Van der Spoel, D.; Lindahl, E.; Hess, B.; Groenhof, G.; Mark, A. E.; Berendsen, H. GROMACS: Fast, flexible, and free. J. Comput. Chem. 2005, 26, 1701−1718.

111 Bjelkmar, P.; Larsson, P.; Cuendet, M. A.; Hess, B.; Lindahl, E. Implementation of the CHARMM Force Field in GROMACS: Analysis of Protein Stability Effects from Correction Maps, Virtual Interaction Sites, and Water Models. J. Chem. Theory Comput. 2010, 6, 459−466.

112 Case, D. A.; Cheatham, T. E.; Darden, T.; Gohlke, H.; Luo, R.; Merz, K. M.; Onufriev, A.; Simmerling, C.; Wang, B.; Woods, R. J. The Amber Biomolecular Simulation Programs. J. Comput. Chem. 2005, 26, 1668−1688.

113 Crowley, M. F.; Williamson, M. J.; Walker, R. C. CHAMBER: Comprehensive Support for CHARMM Force Fields within the AMBER Software. Int. J. Quantum Chem. 2009, 109, 3767−3772.

114 Brooks, B. R.; Brooks, C. L.; MacKerell, A. D.; Nilsson, L.; Petrella, R. J.; Roux, B.; Won, Y.; Archontis, G.; Bartels, C.; Boresch, S.; Caflisch, A.; Caves, L.; Cui, Q.; Dinner, A. R.; Feig, M.; Fischer, S.; Gao, J.; Hodoscek, M.; Im, W.; Kuczera, K.; Lazaridis, T.; Ma, J.; Ovchinnikov, V.; Paci, E.; Pastor, R. W.; Post, C. B.; Pu, J. Z.; Schaefer, M.; Tidor, B.; Venable, R. M.; Woodcock, H. L.; Wu, X.; Yang, W.; York, D. M.; Karplus, M. CHARMM: The Biomolecular Simulation Program. J. Comput. Chem. 2009, 30, 1545−1614.

115 Eastman, P.; Swails, J.; Chodera, J.; McGibbon, R. T.; Zhao, Y.; Beauchamp, K. A.; Wang, L.-P.; Simmonett, A.; Harrigan, M.; Stern, C. D.; Wiewiora, R. P.; Brooks, B. R.; Pande, V. OpenMM 7: Rapid development of high performance algorithms for molecular dynamics. PLoS Comput. Biol. 2017, 13, e1005659.

116 MacKerell, A. D.; Feig, M.; Brooks, C. L. Improved Treatment of the Protein Backbone in Empirical Force Fields. J. Am. Chem. Soc. 2004, 126, 698−699.

117 Wang, W.; Ye, W.; Jiang, C.; Luo, R.; Chen, H.-F. New Force Field on Modeling Intrinsically Disordered Proteins. Chem. Biol. Drug Des. 2014, 84, 253−269.

118 Craik, D. J.; Fairlie, D. P.; Liras, S.; Price, D. The Future of Peptide-Based Drugs. Chem. Biol. Drug Des. 2013, 81, 136−147.

119 Yu, H.; Lin, Y.-S. Toward Structure Prediction of Cyclic Peptides. Phys. Chem. Chem. Phys. 2015, 17, 4210−4219.

120 McHugh, S. M.; Rogers, J. R.; Yu, H.; Lin, Y. S. Insights into How Cyclic Peptides Switch Conformations. J. Chem. Theory Comput. 2016, 12, 2480−2488.

121 Nabuurs, S. B.; Spronk, C. A. M.; Vuister, G. W.; Vriend, G. Traditional Biomolecular Structure Determination by NMR Spectroscopy Allows for Major Errors. PLoS Comput. Biol. 2006, 2, e9.

122 Geng, H.; Jiang, F.; Wu, Y.-D. Accurate structure prediction and conformational analysis of cyclic peptides with residue-specific force fields. J. Phys. Chem. Lett. 2016, 7, 1805−1810.

123 McHugh, S. M.; Yu, H.; Slough, D. P.; Lin, Y. S. Mapping the Sequence-Structure Relationships of Simple Cyclic Hexapeptides. Phys. Chem. Chem. Phys. 2017, 19, 3315−3324.

124 Slough, D. P.; McHugh, S. M.; Cummings, A. E.; Dai, P.; Pentelute, B. L.; Kritzer, J. A.; Lin, Y.-S. Designing Well-Structured Cyclic Pentapeptides Based on Sequence-Structure Relationships. J. Phys. Chem. B 2018, 122, 3908−3919.

125 Hu, K.; Geng, H.; Zhang, Q.; Liu, Q.; Xie, M.; Sun, C.; Li, W.; Lin, H.; Jiang, F.; Wang, T.; Wu, Y.-D.; Li, Z. An in-tether chiral center modulates the helicity, cell permeability, and target binding affinity of a peptide. Angew. Chem. Int. Ed. 2016, 55, 8013−8017.

126 Zhao, H.; Liu, Q.; Geng, H.; Tian, Y.; Cheng, M.; Jiang, Y.; Xie, M.; Niu, X.; Jiang, F.; Zhang, Y.; Lao, Y.; Wu, Y.-D.; Xu, N.; Li, Z. Crosslinked aspartic acids as helix-nucleating templates. Angew. Chem. Int. Ed. 2016, 55, 12088−12093.

127 Kubelka, J.; Hofrichter, J.; Eaton, W. A. The protein folding ‘speed limit’. Curr. Opin. Struct. Biol. 2004, 14, 76−88.


128 Jiang, F.; Wu, Y.-D. Folding of fourteen small proteins with a residue-specific force field and replica-exchange molecular dynamics. J. Am. Chem. Soc. 2014, 136, 9536−9539.

129 Zwanzig, R. Diffusion in a rough potential. Proc. Natl. Acad. Sci. U.S.A. 1988, 85, 2029−2030.

130 Zeng, J.; Jiang, F.; Wu, Y.-D. Folding simulations of an α-helical hairpin motif αtα with residue-specific force fields. J. Phys. Chem. B 2016, 120, 33−41.

131 Zhou, C.; Jiang, F.; Wu, Y.-D. Folding thermodynamics and mechanism of five trp-cage variants from replica-exchange MD simulations with RSFF2 force field. J. Chem. Theory Comput. 2015, 11, 5473−5480.

132 Zeng, J.; Jiang, F.; Wu, Y.-D. The mechanism of phosphorylation-induced folding of 4E-BP2 revealed by molecular dynamics simulations. J. Chem. Theory Comput. 2017, 13, 320−328.

133 Zhang, Y. Protein structure prediction: when is it useful? Curr. Opin. Struct. Biol. 2009, 19, 145−155.

134 Gront, D.; Kmiecik, S.; Blaszczyk, M.; Ekonomiuk, D.; Koliński, A. Optimization of protein models. WIREs Comput. Mol. Sci. 2012, 2, 479−493.

135 Nugent, T.; Cozzetto, D.; Jones, D. T. Evaluation of predictions in the CASP10 model refinement category. Proteins 2014, 82, 98−111.

136 Mirjalili, V.; Feig, M. Protein Structure Refinement through Structure Selection and Averaging from Molecular Dynamics Ensembles. J. Chem. Theory Comput. 2013, 9, 1294−1303.

137 Mirjalili, V.; Noyes, K.; Feig, M. Physics-Based Protein Structure Refinement through Multiple Molecular Dynamics Trajectories and Structure Averaging. Proteins 2014, 82, 196−207.

138 Raval, A.; Piana, S.; Eastwood, M. P.; Dror, R. O.; Shaw, D. E. Refinement of protein structure homology models via long, all-atom molecular dynamics simulations. Proteins 2012, 80, 2071−2079.

139 Xun, S.; Jiang, F.; Wu, Y.-D. Significant refinement of protein structure models using a residue-specific force field. J. Chem. Theory Comput. 2015, 11, 1949−1956.

140 Hovan, L.; Oleinikovas, V.; Yalinca, H.; Kryshtafovych, A.; Saladino, G.; Gervasio, F. L. Assessment of the model refinement category in CASP12. Proteins 2017, 86, 152−167.


141 Xun, S.; Jiang, F.; Wu, Y.-D. Intrinsically disordered regions stabilize the helical form of the C-terminal domain of RfaH: A molecular dynamics study. Bioorg. Med. Chem. 2016, 24, 4970−4977.

142 Zhang, X.; Chen, K.; Wu, Y.-D.; Wiest, O. Protein Dynamics and Structural Waters in bromodomains. PLoS One 2017, 12, e0186570.

143 Chen, F.; Zhang, C.; Wu, H.; Ma, Y.; Luo, X.; Gong, X.; Jiang, F.; Gui, Y.; Zhang, H.; Lu, F. The E3 Ubiquitin Ligase SCF(FBXL14) Complex Stimulates Neuronal Differentiation by Targeting the Notch Signaling Factor HES1 for Proteolysis. J. Biol. Chem. 2017, 292, 20100−20112.

TOC graphic
