Maximum Likelihood Calibration of the UNRES Force Field for Simulation of Protein Structure and Dynamics
Pawel Krupa, Anna Halabis, Wioletta Zmudzinska, Stanislaw Oldziej, Harold Abraham Scheraga, and Adam Liwo
J. Chem. Inf. Model., Just Accepted Manuscript • DOI: 10.1021/acs.jcim.7b00254 • Publication Date (Web): 15 Aug 2017
Journal of Chemical Information and Modeling is published by the American Chemical Society, 1155 Sixteenth Street N.W., Washington, DC 20036. Copyright © American Chemical Society.
Maximum Likelihood Calibration of the UNRES Force Field for Simulation of Protein Structure and Dynamics
Paweł Krupa,1,2,3 Anna Hałabis,4 Wioletta Żmudzińska,4 Stanisław Ołdziej,4 Harold A. Scheraga,2 Adam Liwo1,5
1 Faculty of Chemistry, University of Gdańsk, ul. Wita Stwosza 63, 80-308 Gdańsk, Poland
2 Baker Laboratory of Chemistry and Chemical Biology, Cornell University, Ithaca, NY 14853-1301, U.S.A.
3 Institute of Physics, Polish Academy of Sciences, Aleja Lotników 32/46, PL-02668 Warsaw, Poland
4 Laboratory of Biopolymer Structure, Intercollegiate Faculty of Biotechnology, University of Gdańsk and Medical University of Gdańsk, Abrahama 58, 80-307 Gdańsk, Poland
5 School of Computational Sciences, Korea Institute for Advanced Study, 85 Hoegiro, Dongdaemun-gu, Seoul 130-722, Republic of Korea
The authors declare no competing financial interest.
∗ Corresponding author, phone: +48 58 523 5124, fax: +48 58 523 5012, email: [email protected]
Abstract
The physics-based coarse-grained UNRES force field for simulations of protein structure and dynamics was optimized with seven small training proteins exhibiting a variety of secondary and tertiary structures. Optimization used the maximum likelihood method for force-field calibration recently developed in our laboratory, which aims at achieving agreement between the simulated conformational ensembles of selected training proteins and the corresponding ensembles determined experimentally at various temperatures. Four optimization runs, in which the number of optimized force-field parameters was gradually increased, were carried out, and the resulting force fields were subsequently tested with a set of 22 α-, 12 β-, and 12 α + β-proteins not used in optimization. The variant in which energy-term weights, local and correlation potentials, and side-chain radii and anisotropies were optimized turned out to be the most transferable and outperformed all previous versions of UNRES on the test set.
Key words: Maximum likelihood, Force field optimization, UNRES force field, Multiplexed replica exchange molecular dynamics, Protein folding.
INTRODUCTION
Molecular simulations with empirical force fields are now commonly used to study protein structure, folding pathways and energy landscapes, functionally related conformational changes, and biological processes.1 Even though large-scale all-atom simulations are possible, owing to the rapid development of high-performance computers, including the construction of dedicated machines,2,3 coarse-grained force fields are becoming increasingly popular,4–9 because their use extends the size- and time-scale of simulations by several orders of magnitude with respect to all-atom simulations.10,11 In our laboratory, we are developing the physics-based coarse-grained UNRES force field for the simulation of protein structure and dynamics.12–16 The force field is based on the expansion of the potential of mean force of polypeptide chains in water into Kubo’s cluster-cumulant functions,17 which enabled us to identify the respective effective energy terms with potentials of mean force of small model systems,18,19 which are comparatively easy to handle. The different contributions to the effective potential energy are then multiplied by appropriate weights (termed energy-term weights13) and added up to produce the complete effective energy function.13,14 UNRES has performed well in protein-structure prediction20–24 and has been applied successfully to protein-folding simulations, including simulations of folding kinetics,25–27 studies of free-energy landscapes,10,28–30 and investigations of biologically important processes such as PICK1 to BAR binding,31 the Hsp70 chaperone cycle,32 and modeling of the structure and stability of the complex between the iron-sulfur-binding protein 1 (Isu1) and the Jac1 co-chaperone.33 For the results of simulations to be reliable, the force field has to be calibrated, with a set of training proteins or small molecules, to reproduce structural and thermodynamic data.
Various approaches have been used for this purpose, including maximization of the energy gap between the lowest-energy native-like and lowest-energy non-native structures,34–37 maximization of the Z-score between the native-like and non-native structures,38–41 and the hierarchical optimization method developed in our laboratory, in which the force field is optimized so that the free energies of the sub-ensembles with increasing degree of native-likeness are in descending order below and in ascending order above the melting temperature.42–44 The last method results in force fields most consistent with the thermodynamics of protein folding. However, all these methods suffer from the arbitrariness of the division of the set of conformations into sub-ensembles. In our recent work,45 we developed a completely new approach to force-field calibration, which is based on fitting the simulated conformational ensembles to the experimental ensembles determined at various temperatures by using the maximum-likelihood principle. This approach is free from arbitrary partitioning of the conformational ensembles into sub-ensembles; the maximum-likelihood function is constructed by matching each conformation of the simulated set to each conformation of the experimental set by using a soft Gaussian function. As in other force-field optimization methods for ab initio protein simulations,34,35,37–44 optimization is run in cycles, each of which consists of generating the ensembles of conformations of the training proteins with the current force-field parameters and minimizing the target function with the generated set of conformations. It should be noted that the maximum-likelihood method is restricted neither to the optimization of the UNRES force field nor even to force-field optimization; it can be applied to any data-fitting problem in which the probability density corresponding to a model fitted to the data cannot be obtained in analytical or histogram form but only as a set of simulated points (e.g., protein conformations).
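The matching of simulated to experimental conformations through a soft Gaussian function can be illustrated with a minimal sketch. It is not the implementation of ref 45: the distance measure (plain Euclidean distance between coordinate lists), the kernel width sigma, the uniform default weights, the unnormalized kernel, and all function names are assumptions introduced here for illustration only. The weights stand in for whatever parameter-dependent probabilities a fit would assign to the simulated conformations.

```python
import math

def distance(x, y):
    """Euclidean distance between two conformations given as flat coordinate lists
    (an assumed stand-in for a structural dissimilarity measure)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def neg_log_likelihood(simulated, experimental, weights=None, sigma=1.0):
    """Negative log-likelihood of the experimental conformations under a
    Gaussian-kernel density built from the simulated conformations.

    Every simulated conformation is matched to every experimental one;
    the kernel is left unnormalized, which only shifts the result by a
    constant for fixed sigma.
    """
    n = len(simulated)
    if weights is None:
        weights = [1.0 / n] * n  # assumption: equally probable simulated points
    total = 0.0
    for y in experimental:
        density = sum(
            w * math.exp(-distance(x, y) ** 2 / (2.0 * sigma ** 2))
            for w, x in zip(weights, simulated)
        )
        # Floor the density to avoid log(0) when no simulated point is nearby.
        total -= math.log(max(density, 1e-300))
    return total
```

In this sketch, a simulated ensemble that lies closer to the experimental conformations yields a lower value of the target function, which is the quantity a minimizer would drive down within each optimization cycle.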
We tested the method by using it to calibrate the UNRES force field with the tryptophan cage, selected as the training protein.45 We used the experimental structures that were determined by the nuclear magnetic resonance (NMR) method at three temperatures, including a temperature below the melting point, the melting-point temperature, and a temperature above the melting point.46 Despite being trained with a single α-helical protein, the resulting variant of the UNRES force field produced native-like structures with a resolution below 5 Å for 10 out of 14 α-helical benchmark proteins with various chain lengths and structures with native-like topology for 13 out of 14 benchmark proteins.45 However, more than a single training protein should be used in order to obtain a fully transferable force field.
In this paper, we report the application of the maximum-likelihood method to a full-blown calibration of the UNRES force field. We have used seven training proteins with various chain lengths and types of secondary structure (α, β, and α + β) and selected four sets of parameters for optimization: (i) the minimal set consisting of energy-term weights only, (ii) the set consisting of energy-term weights plus the torsional and correlation-term parameters, (iii) the set that additionally included side-chain radii and anisotropies, and (iv) the extended set consisting of set (iii) plus the well depths of the side chain – side chain interaction potentials. We tested the resulting variants of the UNRES force field with a benchmark set consisting of 22 α-, 12 β-, and 12 α + β-proteins and found that variant (iii) outperformed all previous versions of UNRES.
METHODS
UNRES energy function
In the UNited RESidue (UNRES) model of polypeptide chains developed in our laboratory,12–16 each amino-acid residue has two interaction sites, namely the united peptide group (p) and the united side chain (SC). The geometry of the backbone is defined in terms of the positions of the α-carbon (Cα) atoms, which are not interaction sites. Each peptide group is placed in the middle between the consecutive Cα atoms, and the united side chains are attached to the Cα atoms. The geometry of the virtual-bond chain can also be defined in terms of the internal coordinates, namely the backbone virtual-bond angles θ, backbone virtual-bond dihedral angles γ, and the zenith and azimuthal angles α and β, respectively, which determine the location of a side-chain center with respect to the backbone (Figure 1). The effective energy function originates from the Potential of Mean Force (PMF) of the protein in water in which all degrees of freedom that do not belong to the coarse-grained representation have been integrated out.18,19 The PMF is then expanded into the generalized Kubo cluster-cumulant series,17 which has enabled us to derive analytical expressions for the effective energy terms, including the correlation terms which are inherent in coarse-grained force fields.18,19 The effective energy function is expressed by eq 1.
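As an illustration of the backbone geometry described in this subsection, the following minimal sketch computes the peptide-group centers (midpoints of consecutive Cα atoms), the virtual-bond angles θ, and the virtual-bond dihedral angles γ from a Cα trace. The function names, the use of degrees, and the sign of the dihedral returned by the atan2 form are choices made here for illustration, not taken from the UNRES code; the side-chain angles α and β are not covered.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def norm(a):
    return math.sqrt(dot(a, a))

def peptide_centers(ca):
    """United peptide groups: midpoints between consecutive Cα positions."""
    return [tuple((x + y) / 2.0 for x, y in zip(a, b))
            for a, b in zip(ca, ca[1:])]

def virtual_bond_angles(ca):
    """theta: angle (degrees) at the middle atom of three consecutive Cα atoms."""
    angles = []
    for a, b, c in zip(ca, ca[1:], ca[2:]):
        u, v = sub(a, b), sub(c, b)
        cos_t = dot(u, v) / (norm(u) * norm(v))
        # Clamp to [-1, 1] to guard against rounding before acos.
        angles.append(math.degrees(math.acos(max(-1.0, min(1.0, cos_t)))))
    return angles

def virtual_bond_dihedrals(ca):
    """gamma: signed dihedral (degrees) for four consecutive Cα atoms."""
    dihedrals = []
    for p0, p1, p2, p3 in zip(ca, ca[1:], ca[2:], ca[3:]):
        b1, b2, b3 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
        n1, n2 = cross(b1, b2), cross(b2, b3)
        y = dot(b2, cross(n1, n2)) / norm(b2)
        x = dot(n1, n2)
        dihedrals.append(math.degrees(math.atan2(y, x)))
    return dihedrals
```

A chain of N Cα atoms yields N − 1 peptide-group centers, N − 2 virtual-bond angles, and N − 3 virtual-bond dihedral angles in this sketch.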
U = w_{SC} \sum_{i<j} U_{SC_i SC_j} + w_{SCp} \sum_{i \neq j} U_{SC_i p_j} + w_{pp}^{VDW}