
Article

Use of Restraints from Consensus Fragments of Multiple Server Models to Enhance Protein-Structure Prediction Capability of the UNRES Force Field
Magdalena Anna Mozolewska, Pawel Krupa, Bartlomiej Zaborowski, Adam Liwo, Jooyoung Lee, Keehyoung Joo, and Cezary Czaplewski
J. Chem. Inf. Model., Just Accepted Manuscript • DOI: 10.1021/acs.jcim.6b00189 • Publication Date (Web): 17 Oct 2016


Journal of Chemical Information and Modeling is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


Journal of Chemical Information and Modeling

Use of Restraints from Consensus Fragments of Multiple Server Models to Enhance Protein-Structure Prediction Capability of the UNRES Force Field

Magdalena A. Mozolewska,1 Paweł Krupa,1 Bartłomiej Zaborowski,1 Adam Liwo,1,2,∗ Jooyoung Lee,2 Keehyoung Joo,3 and Cezary Czaplewski1

1 Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, 80-308 Gdańsk, Poland

2 Center for In Silico Protein Structure and School of Computational Sciences, Korea Institute for Advanced Study, 85 Hoegiro, Dongdaemun-gu, Seoul 130-722, Republic of Korea

3 Center for Advanced Computation, Korea Institute for Advanced Study, 85 Hoegiro, Dongdaemun-gu, Seoul 130-722, Republic of Korea



∗ Corresponding author; phone: +48585235124; fax: +48585235012; e-mail: [email protected]


Abstract

Recently, we developed a new approach to protein-structure prediction, which combines template-based modeling with the physics-based coarse-grained UNited RESidue (UNRES) force field. In this approach, restrained multiplexed replica-exchange molecular dynamics (MREMD) simulations are carried out with UNRES, with Cα-distance and virtual-bond-dihedral-angle restraints derived from knowledge-based models. In this work, we report a test of this approach in the 11th Community Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction (CASP11), in which we used the template-based models from early-stage predictions by the LEE group CASP11 server (group 038, called 'nns'), as well as further improvement of the method. The quality of the models obtained in CASP11 was better than that resulting from unrestrained UNRES simulations; however, the obtained models were generally worse than the final nns models. Calculations with the final nns models, performed after CASP11, resulted in substantial improvement, especially for multi-domain proteins. Based on these results, we modified the procedure by deriving restraints from models from multiple servers, in this study the 4 top-performing servers in CASP11 (nns, BAKER-ROSETTASERVER, Zhang-server, and QUARK), and implementing either all restraints or only the restraints on the fragments that appear similar in the majority of models (the consensus fragments), with outlier models discarded. Tests with 29 CASP11 human-prediction targets with length less than 400 amino-acid residues demonstrated that the consensus-fragment approach gave better results than the best of the parent server models, i.e., lower α-carbon Root-Mean-Square Deviation (RMSD) from the experimental structures and higher Template Modeling score (TM-score) and Global Distance Test Total Score (GDT_TS) values. Apart from global improvement (repacking and improving the orientation of domains and other substructures), improvement was also achieved for template-based modeling targets, indicating that the approach has refinement capacity. The consensus-fragment analysis is therefore able to remove lower-quality models and poor-quality parts of the models without knowledge of the experimental structure.


1 Introduction

Prediction of three-dimensional protein structures from amino-acid sequence alone is a very important research field, which combines molecular simulations with bioinformatics. Its importance stems from the demand for the structures of biologically important proteins, which determine their functions, and from the fact that experimental determination of protein structures is time- and labor-consuming.1 The most successful approaches to protein-structure prediction are template-based methods or other bioinformatics approaches that rely strongly on the information from structural databases. The physics-based approaches, which rely on the Anfinsen thermodynamic hypothesis,2 according to which the native structure of a protein is the global minimum of the free energy of the protein in water, have been less successful than knowledge-based approaches. The main reason for their poorer performance is that, generally, the present energy functions are not accurate enough to distinguish native-like structures from non-native structures, even though examples of successful physics-based predictions of protein structures3–5 and folding simulations at the all-atom6–8 and coarse-grained level9–14 are known. Nevertheless, the development of physics-based methods is important because, as opposed to knowledge-based methods, they are independent of structural databases and, therefore, the discovery of new folds does not affect their performance. Moreover, the knowledge-based methods sometimes fail to predict protein structures correctly even though the respective folds are in the database(s) used for prediction, because the correct folds produce signals that are too weak. This was the case for, e.g., the CASP3 target T0061,3 the CASP6 target T0215,4 and the CASP10 target T0663,5 for which our physics-based methodology was able to find the correct folds, while the knowledge-based methods did not.

In collaboration with Harold A. Scheraga, Cornell University, we have been developing the physics-based coarse-grained UNited RESidue (UNRES) force field for protein-folding simulations and protein-structure prediction.15–31 Because of the reduced number of interaction centers and of averaging out fast-moving degrees of freedom, UNRES simulations are 10³ to 10⁴ times faster than all-atom simulations,11,32 which tremendously reduces the time required to carry out converged simulations. With our UNRES-based methodology, we achieved good results in protein-structure prediction in previous CASP experiments4,5,33 and also used it with success to investigate biologically important protein systems such as the bacterial DnaK chaperone Hsp70,34 the heterodimeric Isu1-Jac1 system in yeast,35 which is involved in iron-sulfur cluster biogenesis, and the bacterial arginine-binding protein.36 Reduction of protein representation results, however, in lower resolution of the simulated structures; at present, the UNRES-simulated structures are about 5 Å away from the experimental structures in Cα RMSD for 50-70 residue proteins, although structures with lower Cα RMSD were also obtained,4,5,33,37 e.g., the best UNRES model of the 97-residue target T0769 submitted in the last CASP experiment had a Cα RMSD of 3.8 Å.37 On the other hand, our UNRES-based methodology is successful in predicting the overall fold and can predict the overall topology, in particular domain packing, better than knowledge-based methods, as demonstrated in the CASP10 exercise.5

This observation prompted us to develop an approach which combines the accuracy of knowledge-based methods in the sections of the sequence that can be well predicted by homology-based approaches with the ability of UNRES to predict the overall fold.31 To accomplish this task, we implemented distance and virtual-bond-dihedral-angle restraints in the form of log-Gaussian restraints derived from a number of homology models, similar to those used in the MODELLER program.38,39 We tested the method with single-domain CASP9 targets (generating the models to derive restraints with MODELLER) and two-domain CASP10 targets (taking the models submitted by the LEEcon group during CASP10).40 We found that the models of single-domain template-based modeling (TBM) targets obtained with the template-aided UNRES simulations were of comparable quality to the parent models, while for 3 out of 4 two-domain proteins, better models were obtained than the parent LEEcon models.31

In this paper, we report the results of the blind test of our approach in the CASP11 experiment, in which we participated as the KIAS-Gdansk group using early-stage template-based models from the LEE group CASP11 server (group 038, called 'nns'),41 and also post-CASP11 tests in which the final models from the nns group were used to derive restraints. Since CASP7, the LEE group server has been quite successful in the TBM category. In particular, nns was ranked as the second-best server for TBM in terms of both the first model and the best model.41,42 Because combining various, even very diverse, approaches to obtain better protein models has proved beneficial (an example of widespread application of diverse methods is the WeFold project43), we have also tested an approach in which restraints are derived using knowledge-based models from several sources. To enhance the quality of the predictions with the use of UNRES supplemented by knowledge-based information, we designed a simple but well-performing consensus-fragment analysis method, which enables us to select automatically, without prior knowledge of the experimental structure, models and fragments of the models that are of good quality and, therefore, appropriate for deriving restraints.
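The log-Gaussian restraints mentioned in this section can be illustrated with a minimal Python sketch. It assumes one common form of such a penalty, the negative logarithm of an equally weighted sum of Gaussians centered at the values that a restrained quantity (for example a Cα-Cα distance) takes in the individual models. The function name, the common width `sigma`, and the equal weighting of models are illustrative assumptions, not the expression or parameters implemented in UNRES or MODELLER.

```python
import math


def log_gaussian_penalty(d, model_values, sigma=1.0):
    """Negative log of an equally weighted Gaussian mixture (illustrative form only).

    d            -- current value of the restrained quantity (e.g. a distance in angstroms)
    model_values -- values of the same quantity in the template-based models
    sigma        -- assumed common Gaussian width (hypothetical parameter)
    """
    if not model_values:
        raise ValueError("need at least one model value")
    exponents = [-((d - m) ** 2) / (2.0 * sigma ** 2) for m in model_values]
    # Log-sum-exp keeps the result finite when d is far from every model value.
    top = max(exponents)
    log_sum = top + math.log(sum(math.exp(e - top) for e in exponents))
    # Dividing the mixture by the number of models makes the penalty zero
    # when all models agree and d equals their common value.
    return -(log_sum - math.log(len(model_values)))


if __name__ == "__main__":
    models = [5.0, 6.0]  # made-up distances from two hypothetical models
    for d in (5.5, 10.0):
        print(d, round(log_gaussian_penalty(d, models), 3))
```

With this form, the penalty is lowest near the values on which the models agree and grows as the simulated structure departs from all of them, without forcing the structure toward any single model.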

2 Methods

2.1 UNRES model

In the coarse-grained UNRES model,15–27,44 a polypeptide chain is represented by a sequence of united peptide groups (p), each of which is placed between two consecutive Cα atoms, and united side chains (SC), represented by ellipsoids of revolution, attached to the Cα atoms. Only the SC and p centers are interaction sites; the α-carbon atoms serve only to define the geometry of the chain (Figure 1). The effective energy function is expressed by eq. 1.
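The chain geometry described above can be sketched in a few lines of Python. The example takes a Cα trace, places each united peptide group at the midpoint of two consecutive Cα atoms (reading "between" as the midpoint is an assumption here), and computes the signed dihedral angle γ defined by each run of four consecutive Cα atoms. Function names and the toy coordinates are hypothetical; this is not code from the UNRES package.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def peptide_centers(ca):
    """Midpoint of each pair of consecutive Calpha positions (assumed site of p)."""
    return [tuple((x + y) / 2.0 for x, y in zip(p, q)) for p, q in zip(ca, ca[1:])]


def virtual_bond_dihedrals(ca):
    """Signed dihedral angle (degrees) for every four consecutive Calpha atoms."""
    angles = []
    for p0, p1, p2, p3 in zip(ca, ca[1:], ca[2:], ca[3:]):
        b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
        n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
        x = _dot(n1, n2)
        y = _dot(_cross(n1, n2), b2) / math.sqrt(_dot(b2, b2))
        angles.append(math.degrees(math.atan2(y, x)))
    return angles


if __name__ == "__main__":
    # Toy planar trace (not a real protein): a cis-like arrangement gives 0 degrees.
    trace = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
    print(peptide_centers(trace))
    print(virtual_bond_dihedrals(trace))
```

A chain of N Cα atoms yields N−1 peptide-group centers and N−3 virtual-bond dihedral angles, which is why only the Cα trace is needed to define the backbone geometry in this representation.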

U = wSC

X

USCi SCj + wSCp X

Utor (γi ) + wtord f3 (T )

i

+ wb

X

V DW USCi pj + wpp

i6=j

i