Article pubs.acs.org/JCTC
A Simple and Efficient Protein Structure Refinement Method
Qianyi Cheng, InSuk Joung, and Jooyoung Lee*
Center for In Silico Protein Science and School of Computational Sciences, Korea Institute for Advanced Study, Seoul 02455, Korea
Supporting Information
ABSTRACT: Improving the quality of a given protein structure can serve as the ultimate solution for accurate protein structure prediction, and finding such a method remains a challenge in computational structural biology. To promote and encourage these much-needed efforts, CASP (Critical Assessment of Structure Prediction) has provided an ideal platform for computational experiments. Only recently (since CASP10) has it been reported there that systematic protein structure refinement is possible by carrying out extensive (approximately millisecond) MD simulations with proper restraints generated from the given structure. Using an explicit solvent model, with much weaker positional restraints and far fewer distance restraints than previously used, we propose a refinement protocol that combines a series of short (5 ns) MD simulations with energy minimization procedures. Testing and benchmarking on 54 CASP8−10 refinement targets and 34 CASP11 refinement targets show quite promising results. This work demonstrates systematic protein structure refinement using only a small fraction of the MD simulation steps used previously (nanoseconds versus milliseconds), indicating that a given model can be refined with a few hours of desktop computing.
INTRODUCTION
The Critical Assessment of Structure Prediction (CASP) has continued for more than two decades, and substantial progress has been achieved since the first CASP. Both template-based modeling (TBM) and template-free modeling (FM), which differ in whether an evolutionarily related structural template is available, have advanced considerably through community-wide efforts. In contrast, refining comparative models so that they attain experimental accuracy remains a major challenge.1 In the refinement category of CASP, the best server predictions are generally released as refinement targets, and participating groups then try to improve these structures blindly. Sometimes the most problematic regions of the protein model are announced along with the initial structure. As a CASP assessor noted, because the starting models have often already been refined by the original servers, refinement predictors face the even more difficult challenge of trying to add further value to the best prediction, beyond the capabilities of the best TBM predictors.1 In addition to this inherent difficulty, other problems such as inadequate conformational sampling, insufficiently accurate force fields, and insufficiently long molecular dynamics (MD) trajectories2 often complicate the task further, especially if MD simulation was already used as a refinement tool to generate the original model. Although there has been some progress in the refinement category, the inconsistency of the results still leaves open the question "what is an effective solution for refining a given model?" Furthermore, "is there a simple and general refinement solution that can serve various studies such as drug discovery and phenotype analysis?"
© XXXX American Chemical Society
Received: May 6, 2017
Published: August 11, 2017
DOI: 10.1021/acs.jctc.7b00470 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX

Table 1. Comparison among Princeton_Tigress and Our CASP11 and CASP12 Protocols^a
stage | Princeton_Tigress | CASP11 protocol | CASP12 protocol
force field / solvation | CHARMM 19; implicit solvent/FACTS19 | AMBER ff14SB; implicit solvent/generalized Born | AMBER ff14SB; explicit solvent/TIP3P
Min1−Min4 | 50 steps each; kpos gradually reduced from 100 to 5; all heavy atoms, then backbone atoms | 200 steps each; kpos = 100, 50 (all heavy atoms), then 10, 5 (backbone atoms) | 100, 200, 400, and 800 steps; kpos = 100, 50 (all heavy atoms), then 10, 5 (backbone atoms)
Equl1 | 500 ps; kpos = 5.0, kdist^b | 500 ps; kpos = 5.0, kdist^b | 500 ps; kpos = 5.0, kdist^b
Equl2 | 500 ps; kpos = 3.0, kdist^b | 500 ps; kpos = 3.0, kdist^b | 500 ps; kpos = 3.0, kdist^b
Prod1 | 2 ns; kpos^c, kdist^b | 1 or 2 ns; kpos^c, kdist^b | 4 ns; kpos = 0.1, kdist^b
Prod2 | none | 1 ns; 0.50 kpos^c, kdist^b | none
Prod3 | none | 2 ns; 0.25 kpos^c, kdist^b | none
restraint targets (Equl, Prod) | kpos on backbone atoms; kdist on secondary structure Cα−Cα atom pairs | kpos on backbone atoms; kdist on secondary structure Cα−Cα atom pairs | kpos on backbone atoms; kdist on secondary structure Cα−Cα atom pairs
^a The ways of generating kpos and kdist are modified from Princeton_Tigress (see eqs 1 and 3); kpos and kdist are in units of kcal mol−1 Å−2. ^b See eq 1. ^c See eq 3.

Figure 1. Our CASP12 refinement protocol. The protocol begins from the input structure and proceeds through four minimization steps, two equilibration steps, and one production step. The derived distance restraints on intra-β-strand Cα−Cα atom pairs are included in the equilibration and production steps. The averaged structures MD0, MD1, MD2, MD1.1, MD1.2, MD2.1, and MD2.2, obtained from different portions of the production trajectory, are further minimized for the final submission.

In more recent CASPs, new approaches have been developed and applied in the refinement category,1,3,4 including physics-based and knowledge-based approaches and their hybrids, fragment-based methods,5 elastic network methods,6 hydrogen bond network optimization,7 and side-chain rebuilding.8−10 All of these approaches bring new insights to the structural refinement problem, although most of them are based on MD.11−18 In CASP10 in particular, longer MD simulations under explicit solvent conditions followed by structural averaging showed consistent improvement of the given models.17,18 In 2015, Princeton_Tigress19 showed that under implicit solvent conditions, which are computationally less demanding, short MD simulations alone could effectively produce better-quality models in terms of the Global Distance Test_Total Score (GDT_TS). In this approach, various restraints were applied to backbone atoms, with their strengths set according to the quality of the initial model. However, the net improvement was rather limited. Protein structure refinement is a complicated problem with no easy or simple solution. Until now, systematic and clear protein structure refinement in CASP experiments has been achieved only by extensive MD simulations, which only a few groups with extensive computational resources can perform. In addition, the energy functions applied in those simulations, as well as the model selection methods, were rather complicated (ad hoc). Therefore, applying or developing such MD-based refinement methods is not an option for most research groups with limited computational resources.

Over the years, many factors have been considered important for successful protein structure refinement, such as force fields, statistical potentials, restraints, solvent conditions, and trajectory analysis. We believe that these factors should be investigated in order to develop a much more efficient refinement protocol than is available today, and a simple but efficient method is much needed to demonstrate their effects. Here, we propose a simple approach similar to Princeton_Tigress, but with a few modifications that eliminate its dependence on protein size and initial model quality. Using this approach, our goal is to raise the GDT_TS score of a given model above both its initial score and the Princeton_Tigress result. The details of our protocol are described in the following sections, and the results are compared with those of Princeton_Tigress and our own CASP11 protocol.20−22 Through this study, we intend to answer the following questions on refinement:
1. Does the quality of the starting structure matter?
2. Does the target length affect the quality of refinement?
3. If MD simulation is used, is a long simulation necessary?
4. Is there an effective way to pick the best model among the many models generated?

MATERIALS AND METHODS
Here, we describe our CASP11 and CASP12 refinement protocols, which are based on the Princeton_Tigress MD-only refinement protocol. The three protocols are compared in Table 1, and the entire flowchart of our CASP12 protocol is shown separately in Figure 1. There are two versions of Princeton_Tigress, MD-only and machine-learning-augmented, and we deal only with the first version in this work. We note that the results of machine-learning-augmented Princeton_Tigress were mixed, with a slight improvement in the average GDT score but a decreased number of improved targets. Here, we propose a simple and efficient MD-only protocol for protein structure refinement. Notable differences among Princeton_Tigress and our CASP11 and CASP12 protocols are the choice of force field and solvent model; the exclusion of all distance restraints other than intra-β-strand ones; a reduced weight for the positional restraint energy term; the protein-size and initial-model-quality dependency (present in Princeton_Tigress and our CASP11 protocol, removed in CASP12); and the detailed schedules of the minimization procedures and MD simulations (see Table 1).

CASP11 Refinement Protocol. For a given initial model, any missing residues and hydrogen atoms were added using the MODELLER program.23 Two types of restraints were introduced at different stages of this protocol: positional restraints (force constant kpos) and Cα−Cα distance restraints (force constant kdist). We note that the exact values of kpos (in the minimization procedures) and kdist used in Princeton_Tigress were not specified in the original publication,19 so for our CASP11 protocol we independently estimated values that produce results similar to Princeton_Tigress (see Table 1). The distance restraints in our CASP11 protocol were generated as follows. The β-strands of the given initial structure were first identified using the DSSP program.24,25 Harmonic distance restraints were then applied to intra-β-strand Cα pairs that satisfy all three of the following conditions:
1. both residues are in the same contiguous β-strand,
2. they are separated by an even number of residues (i.e., their residue indices differ by an even number),
3. they are located within 15/0.98 (∼15.31) Å of each other.
For example, if residues 4−13 form a β-strand, distance restraints are considered for the following Cα−Cα pairs: 4−6, 4−8, 4−10, 4−12, 5−7, 5−9, 5−11, 5−13, 6−8, ..., 11−13. The harmonic force constant (in units of kcal mol−1 Å−2) was determined by the distance between the two Cα atoms, rij (in Å), as follows:

k_{\mathrm{dist}} = \dfrac{7.452 \times 10^{2}}{r_{ij}^{2}} \qquad (1)

We used the AMBER14 molecular dynamics package26 for all MD simulations in this protocol, with the AMBER ff14SB force field27 and the implicit-solvent generalized Born model.28 All targets were subjected to four minimization procedures, two equilibration procedures, and production procedures, as shown in Table 1. Each of the four minimizations ran for 200 steps, with the force constants of the harmonic positional restraints (kpos) set at 100, 50, 10, and 5 kcal mol−1 Å−2, respectively. These restraints were applied to all heavy atoms in the first two minimizations and only to backbone heavy atoms in the last two. The first and second equilibrations were each carried out for 500 ps with kpos set at 5.0 and 3.0 kcal mol−1 Å−2, respectively, applied only to backbone heavy atoms. Starting from the equilibration procedure, the distance restraints were added. The total potential energy is

E = E_{\mathrm{bond}} + E_{\mathrm{angle}} + E_{\mathrm{dihedral}} + E_{\mathrm{1\text{-}4\,NB}} + E_{\mathrm{1\text{-}4\,EEL}} + E_{\mathrm{vdW}} + E_{\mathrm{elec}} + E_{\mathrm{restraint}} \qquad (2)

The length of the production run was set according to the chain length of the target, nr. If nr < 154, only one production run of 2 ns was performed. If nr ≥ 154, three consecutive production runs were carried out: 1 ns with kpos, another 1 ns with kpos/2, and finally 2 ns with kpos/4. Here, kpos is defined by the following equation:

k_{\mathrm{pos}} = \begin{cases} \mathrm{GDT\_TS} \times 0.035 - 1.65, & n_r \ge 154 \\ (\mathrm{GDT\_TS} \times 0.035 - 1.65) \times 1.5, & n_r < 154 \end{cases} \qquad (3)

where GDT_TS represents the initial structure quality provided by the CASP organizers, which can be found in the description of each target on the CASP Web site. For targets where GDT_HA was provided instead of GDT_TS (such as in CASP11), GDT_TS was estimated by

\mathrm{GDT\_TS} = \mathrm{GDT\_HA} / 0.75 \qquad (4)

Finally, the average structure of the last 2 ns of the production run was generated and subjected to 100 steps of steepest descent minimization using the same potential energy as in the MD simulation, and we submitted this structure as model 1.

CASP12 Refinement Protocol. In the CASP12 protocol, we used the same force field as in our CASP11 protocol, AMBER ff14SB, but treated solvation differently, using the TIP3P explicit water model.29 All proteins were solvated in a truncated octahedron water box with a 15 Å cutoff to the octahedron edge. The systems were neutralized by adding Na+ and Cl− ions30 and then subjected to MD simulations with periodic boundary conditions. The cutoff for nonbonded interactions was set at 8 Å. The simulations were performed using Langevin dynamics under constant-temperature and constant-pressure (NPT) conditions at 300 K and 1 atm. The SHAKE algorithm was used to constrain all bonds involving hydrogen atoms. As in our CASP11 protocol, all targets were subjected to four minimization procedures and two equilibration procedures (see Table 1). The four minimizations ran for 100, 200, 400, and 800 steps with the same kpos values and settings as in the CASP11 protocol. The equilibrations were the same as in the CASP11 protocol, but the production run was carried out for 4 ns regardless of the chain length of the target (nr). In our CASP12 protocol, the positional restraint kpos was set to 0.10 kcal mol−1 Å−2 regardless of the quality of the starting structure. The method for generating the distance restraints was identical to that of the CASP11 protocol. Finally, the average structure of the entire 4 ns production run (denoted MD0) was generated, along with the averages of two consecutive 2 ns windows (MD1 and MD2) and of four consecutive 1 ns windows (MD1.1, MD1.2, MD2.1, and MD2.2). Each of these seven structures was subjected to 100 steps of steepest descent local minimization to generate seven final models (see Figure 1). The potential energy function used here was the same as in the MD simulation, except that the solvation model was switched from explicit solvent29 to implicit solvent.28

RESULTS AND DISCUSSION
Results on 54 Refinement Targets of CASP8, 9, and 10. Unless noted otherwise, all RMSD, TM-score, GDT_TS, and GDT_HA values reported in this work were calculated by aligning a structure to its corresponding native structure using the TM-score program.31 There were 12, 14, and 28 refinement targets for CASP8, 9, and 10, respectively, ranging in length from 63 to 249 amino acids. The initial qualities of these 54 targets, in terms of GDT_TS, range from 44.6 to 92.6, and the secondary structure contents of these targets are quite diverse.
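The three pair-selection conditions and eq 1 of the CASP11 protocol (reused unchanged in CASP12) can be made concrete with a short sketch. This is not the authors' code. It assumes β-strands are already available as inclusive residue-index ranges (for example, parsed from DSSP output) and that Cα coordinates are stored in a dictionary keyed by residue index. Condition 2 is read as an even difference in residue index, following the worked example (4−6, 4−8, ...). The function names are hypothetical, and writing the restraints into an MD engine's input format is not shown.

```python
import math
from itertools import combinations

MAX_DIST = 15.0 / 0.98  # condition 3: about 15.31 Å


def k_dist(r_ij):
    """Eq 1: harmonic force constant (kcal mol^-1 Å^-2) for a Cα pair r_ij Å apart."""
    return 7.452e2 / r_ij ** 2


def intra_strand_restraints(strands, ca):
    """Return (i, j, r_ij, k_dist) for every Cα pair that
    1. lies in the same contiguous β-strand (strands: inclusive (first, last) ranges),
    2. has an even residue-index difference,
    3. is no more than MAX_DIST Å apart (ca: residue index -> (x, y, z) in Å)."""
    restraints = []
    for first, last in strands:
        for i, j in combinations(range(first, last + 1), 2):
            if (j - i) % 2:  # condition 2 fails: odd index difference
                continue
            r = math.dist(ca[i], ca[j])
            if r <= MAX_DIST:  # condition 3
                restraints.append((i, j, r, k_dist(r)))
    return restraints


# Hypothetical input: an idealized straight strand, residues 4-13, Cα atoms 3.4 Å apart.
ca = {i: (3.4 * i, 0.0, 0.0) for i in range(4, 14)}
pairs = intra_strand_restraints([(4, 13)], ca)
for i, j, r, k in pairs[:4]:
    print(f"{i}-{j}: r = {r:.1f} Å, k_dist = {k:.2f}")
```

In this idealized geometry, only pairs two or four residues apart fall inside the 15.31 Å cutoff. The candidate list from the example (4−6, 4−8, 4−10, 4−12, ...) therefore shrinks to 14 restraints, and eq 1 gives closer pairs stiffer springs (about 16.1 kcal mol−1 Å−2 at 6.8 Å versus about 4.0 at 13.6 Å).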
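Eqs 3 and 4, together with the length-dependent CASP11 production schedule, can also be sketched in a few lines. The function names are hypothetical. The sketch follows eq 3's split (nr ≥ 154 versus nr < 154) both for the force constant and for the choice between one and three production runs. It deliberately does not clamp kpos, because the text does not say how a non-positive value from eq 3 was handled.

```python
def gdt_ts_from_gdt_ha(gdt_ha):
    """Eq 4: estimate GDT_TS when only GDT_HA is provided (e.g. CASP11 targets)."""
    return gdt_ha / 0.75


def k_pos_casp11(gdt_ts, n_res):
    """Eq 3: production-stage positional restraint constant (kcal mol^-1 Å^-2).
    Note: the value is <= 0 for GDT_TS below about 47.1; no clamping is applied."""
    k = gdt_ts * 0.035 - 1.65
    return k if n_res >= 154 else 1.5 * k


def casp11_production(gdt_ts, n_res):
    """Return the CASP11 production schedule as [(length_ns, k_pos), ...]."""
    k = k_pos_casp11(gdt_ts, n_res)
    if n_res < 154:
        return [(2.0, k)]  # one 2 ns run
    return [(1.0, k), (1.0, k / 2), (2.0, k / 4)]  # kpos, kpos/2, kpos/4


# CASP12: a single 4 ns run with a fixed kpos, independent of size and quality.
CASP12_PRODUCTION = [(4.0, 0.10)]

for name, gdt, n in [("TR389", 80.97, 135), ("TR454", 64.06, 192)]:
    print(name, casp11_production(gdt, n))
```

As a worked check, TR389 (nr = 135, starting GDT_TS 80.97) gets kpos = (80.97 × 0.035 − 1.65) × 1.5 ≈ 1.78 kcal mol−1 Å−2 for a single 2 ns run. TR429 (nr = 154, starting GDT_TS 44.57) would get a slightly negative value (≈ −0.09) from eq 3 as written. The fixed CASP12 setting removes both the size dependence and the quality dependence.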
Table 2. Initial GDT_TS Scores and ΔGDT_TS Results of Our New Protocol for CASP8, CASP9, and CASP10 Refinement Targets as a Benchmark, along with Our CASP11 Protocol and Princeton_Tigress Results^a
(Columns MD0 through Princeton_Tigress give ΔGDT_TS = GDT_TSrefined − GDT_TSinitial.)
target | length | start GDT_TS | MD0 | MD1 | MD2 | MD1.1 | MD1.2 | MD2.1 | MD2.2 | CASP11 protocol | Princeton_Tigress
CASP8 targets
TR389 | 135 | 80.97 | 0.56 | 0.56 | 0.75 | 0.19 | 0.19 | 0.37 | 0.00 | 0.00 | −0.56
TR429 | 154 | 44.57 | 2.17 | 1.63 | 1.81 | 1.08 | 1.81 | 1.63 | 1.44 | 1.99 | 1.81
TR432 | 130 | 91.73 | 2.50 | 2.69 | 1.92 | 2.12 | 2.69 | 2.12 | 1.92 | 0.61 | −0.19
TR435 | 137 | 81.36 | 3.39 | 3.39 | 3.60 | 2.75 | 4.02 | 3.39 | 3.17 | 3.07 | 0.25
TR453 | 87 | 88.08 | 3.49 | 3.20 | 3.20 | 2.91 | 2.62 | 3.20 | 2.62 | −1.03 | −0.15
TR454 | 192 | 64.06 | 0.65 | 0.78 | 0.65 | 0.65 | 0.65 | 0.65 | 0.52 | −0.26 | 0.78
TR461 | 157 | 89.72 | 2.53 | 2.53 | 2.37 | 2.69 | 2.37 | 2.69 | 2.53 | −0.16 | 0.79
TR462 | 143 | 66.78 | 1.93 | 2.45 | 1.58 | 2.63 | 1.75 | 1.05 | 1.93 | 0.18 | 0.65
TR464 | 69 | 79.35 | 0.72 | 1.08 | 0.72 | 0.36 | 0.72 | 0.36 | 1.08 | −3.26 | −2.90
TR469 | 63 | 82.54 | 1.59 | 1.98 | 0.79 | 1.19 | 0.79 | 0.40 | 0.79 | −1.98 | 0.00
TR476 | 87 | 46.55 | −1.15 | −0.86 | −1.15 | −0.86 | −0.86 | −0.86 | −1.44 | 0.58 | 0.00
TR488 | 95 | 87.89 | 4.48 | 4.22 | 4.74 | 4.22 | 4.48 | 4.48 | 3.95 | 0.82 | 0.53
CASP9 targets
TR517 | 159 | 70.91 | 0.63 | 0.47 | 0.63 | 0.47 | 0.32 | 0.63 | 0.47 | 0.32 | 0.47
TR530 | 80 | 85.94 | 0.93 | 0.93 | 0.62 | 0.62 | 1.25 | 0.62 | 0.00 | 0.93 | 0.62
TR557 | 125 | 67.00 | 3.20 | 2.60 | 2.80 | 2.60 | 2.60 | 2.80 | 2.80 | 0.80 | −1.00
TR567 | 142 | 78.17 | 2.46 | 2.29 | 2.64 | 2.11 | 2.82 | 2.11 | 2.64 | 0.18 | 0.35
TR568 | 97 | 54.90 | 0.00 | 0.51 | 0.00 | 0.25 | 0.00 | −0.26 | 0.00 | 0.51 | 0.25
TR569 | 79 | 72.15 | −0.95 | −0.95 | −1.26 | −0.95 | −0.63 | −1.26 | −1.26 | 0.63 | 0.95
TR574 | 102 | 62.01 | 1.47 | 1.23 | 1.47 | 0.49 | 1.47 | 0.98 | 1.47 | 1.96 | 0.74
TR576 | 138 | 64.31 | −0.54 | −0.72 | −0.18 | −1.09 | 0.00 | −0.18 | −0.72 | −0.54 | 0.73
TR592 | 105 | 90.24 | 2.38 | 2.62 | 2.14 | 2.62 | 2.62 | 2.62 | 1.43 | 0.47 | 0.47
TR594 | 140 | 86.61 | 1.43 | 1.07 | 1.60 | 0.53 | 0.89 | 1.25 | 1.43 | 0.35 | 0.00
TR606 | 123 | 71.75 | 1.22 | 1.42 | 0.20 | 0.81 | 0.81 | 0.61 | 0.81 | 1.01 | 1.22
TR614 | 121 | 69.63 | 2.68 | 2.27 | 2.68 | 1.86 | 2.68 | 1.65 | 2.89 | 0.62 | 0.62
TR622 | 122 | 66.80 | 2.05 | 1.03 | 2.05 | 1.03 | 1.44 | 0.62 | 3.28 | −1.43 | 1.03
TR624 | 69 | 55.43 | 0.73 | 0.73 | 1.09 | 0.37 | 0.37 | 0.73 | 1.45 | 2.18 | 1.45
CASP10 targets
TR644 | 141 | 84.22 | 0.89 | 0.89 | 0.71 | 0.35 | 1.06 | 0.89 | 0.71 | 0.18 | 0.35
TR655 | 175 | 78.50 | −1.67 | −1.17 | −1.83 | −0.83 | −1.67 | −1.50 | −1.83 | 0.17 | −9.79
TR661 | 185 | 80.00 | 1.35 | 1.22 | 1.08 | 0.95 | 0.95 | 1.22 | 1.22 | 0.14 | 0.54
TR662 | 75 | 81.33 | 1.34 | 1.00 | 1.67 | 1.00 | 1.00 | 2.00 | 1.67 | −1.07 | 0.67
TR663 | 152 | 69.08 | −0.49 | −0.17 | −0.33 | −0.17 | −0.33 | −0.17 | 0.16 | 0.33 | −0.17
TR671 | 88 | 55.68 | 0.57 | −0.28 | 0.29 | 0.29 | 0.29 | 0.00 | 0.85 | −0.28 | 0.57
TR674 | 132 | 85.23 | 1.70 | 1.51 | 1.32 | 1.70 | 1.70 | 1.13 | 1.51 | 1.13 | 1.13
TR679 | 223 | 71.86 | 1.13 | 1.51 | 0.50 | 1.76 | 1.38 | −0.25 | 1.38 | 0.63 | 0.12
TR681 | 224 | 78.27 | 0.79 | 0.26 | 0.79 | 0.00 | 0.26 | 0.39 | 0.66 | −0.78 | −0.39
TR688 | 185 | 78.38 | 1.48 | 1.48 | 1.35 | 1.48 | 1.76 | 1.76 | 1.48 | −0.14 | 0.27
TR689 | 234 | 87.80 | −0.24 | −0.48 | −0.24 | −0.24 | −0.36 | 0.00 | −0.36 | 0.00 | 0.75
TR696 | 100 | 70.75 | 0.25 | −2.00 | 1.25 | −2.25 | −1.50 | 0.75 | 1.25 | 0.00 | −0.75
TR698 | 119 | 64.71 | 0.63 | 0.42 | 1.05 | 1.26 | 0.00 | 0.63 | 0.84 | 0.00 | 1.26
TR699 | 234 | 84.11 | 1.22 | 0.78 | 1.11 | 1.11 | 1.33 | 1.22 | 0.67 | 0.33 | 0.11
TR704 | 235 | 69.70 | 3.78 | 3.14 | 4.22 | 3.24 | 3.46 | 3.89 | 4.00 | 1.19 | 0.62
TR705 | 96 | 64.58 | 1.57 | 1.83 | 1.31 | 2.35 | 1.57 | 2.09 | 2.09 | 0.78 | 1.31
TR708 | 196 | 86.48 | −0.26 | −0.26 | −0.13 | −0.38 | −0.77 | −0.26 | −0.13 | 0.13 | 0.00
TR710 | 194 | 74.87 | 1.29 | 1.03 | 1.29 | 0.65 | 1.16 | 1.16 | 1.29 | −0.51 | 0.90
TR712 | 186 | 92.61 | 0.27 | 0.40 | 0.40 | 0.40 | 0.13 | −0.14 | 0.40 | 0.40 | 0.40
TR720 | 202 | 57.83 | 0.25 | 0.38 | 0.38 | 0.76 | 0.12 | 0.25 | −0.38 | 0.00 | 0.38
TR722 | 127 | 57.09 | −0.59 | −0.59 | −0.40 | −0.40 | −0.59 | 0.19 | −0.20 | 0.19 | −0.20
TR723 | 132 | 85.11 | 0.77 | 0.58 | 0.96 | 0.00 | 1.15 | 0.58 | 0.77 | 0.00 | 0.20
TR724 | 113 | 59.35 | −0.44 | −0.22 | −0.44 | −0.44 | −0.44 | 0.65 | −0.87 | −0.44 | 0.00
TR738 | 249 | 90.06 | 1.71 | 1.41 | 1.41 | 1.71 | 1.10 | 1.21 | 1.21 | 0.70 | 0.20
TR747 | 98 | 82.50 | 0.56 | 0.28 | 0.00 | 0.00 | −0.28 | 0.28 | 0.00 | −0.83 | −1.94
TR750 | 182 | 76.79 | 2.61 | 2.61 | 2.06 | 2.74 | 2.88 | 3.02 | 1.78 | 1.09 | 0.27
TR752 | 156 | 90.37 | 0.17 | 0.00 | 0.00 | −0.51 | −0.17 | −0.17 | 0.17 | −0.17 | 0.17
TR754 | 68 | 77.94 | 0.37 | −0.37 | −0.37 | 0.00 | −0.37 | −0.73 | −0.73 | −0.73 | 0.00
avg^b | | | 1.14 | 1.01 | 1.05 | 0.89 | 0.98 | 0.97 | 1.02 | 0.20 | 0.11
fraction^b | | | 0.83 | 0.78 | 0.81 | 0.80 | 0.78 | 0.80 | 0.81 | 0.70 | 0.79
avg^c | | | 1.16 | 1.00 | 1.07 | 0.88 | 1.02 | 0.95 | 1.03 | 0.36 | 0.40
fraction^c | | | 0.87 | 0.80 | 0.84 | 0.82 | 0.82 | 0.78 | 0.84 | 0.76 | 0.84
^a GDT_TS values have no units and range from 0 to 100, where 100 corresponds to a perfect match. ^b Results for all given targets. ^c Results excluding 9 targets: TR453, TR462, TR464, TR655, TR662, TR689, TR704, TR724, and TR747.
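The "avg" and "fraction" rows of Table 2 (and of Table 3) are simple per-column statistics, and the correlations with target length and starting quality discussed in the text can be computed from the same per-target values. The minimal sketch below uses the CASP8 rows of Table 2 as input. It treats the correlation as a Pearson coefficient, which is an assumption because the text does not name the coefficient. It also exposes, as a hypothetical `count_ties` switch, the unstated choice of whether a target with ΔGDT_TS = 0.00 counts as improved.

```python
import math
from statistics import mean

# CASP8 rows of Table 2: target lengths and MD0 ΔGDT_TS values.
CASP8_LENGTH = [135, 154, 130, 137, 87, 192, 157, 143, 69, 63, 87, 95]
CASP8_MD0 = [0.56, 2.17, 2.50, 3.39, 3.49, 0.65, 2.53, 1.93, 0.72, 1.59, -1.15, 4.48]


def net_refinement(deltas):
    """Total ΔGDT_TS divided by the number of targets (the 'avg' rows)."""
    return sum(deltas) / len(deltas)


def refined_fraction(deltas, count_ties=True):
    """Fraction of targets counted as improved (the 'fraction' rows).
    count_ties is an assumption: should ΔGDT_TS == 0 count as improved?"""
    improved = [d for d in deltas if (d >= 0 if count_ties else d > 0)]
    return len(improved) / len(deltas)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


print(round(net_refinement(CASP8_MD0), 3))    # CASP8 mean ΔGDT_TS for MD0
print(round(refined_fraction(CASP8_MD0), 3))  # 11 of 12 CASP8 targets
print(round(pearson(CASP8_LENGTH, CASP8_MD0), 3))
```

The CASP8 mean from this column is about 1.905, consistent with the 1.91 reported for CASP8 in the text. Applied to the full 54-value MD0 column of Table 2, counting ties as improved gives 45/54 ≈ 0.83, the tabulated fraction, whereas strict improvement gives 44/54 ≈ 0.81 (TR568 has ΔGDT_TS = 0.00).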
The performance of our CASP12 protocol is compared with that of our CASP11 protocol in Table 2. Performance is measured in terms of GDT_TS, and the results of Princeton_Tigress19 are also included, along with the target length and the GDT_TS score of the given model. However, by our calculation, the GDT_TS scores of 8 given models [3 CASP8 targets (TR453, TR462, and TR464) and 5 CASP10 targets (TR655, TR662, TR689, TR704, and TR747)] differ slightly from those listed in Table S2 of ref 19. Additionally, one CASP10 target, TR724, is not listed in ref 19. To make a fair comparison between our results and those of Princeton_Tigress, Table 2 lists both the amount of net refinement (i.e., the total ΔGDT_TS divided by the total number of targets) and the fraction of refinement (i.e., the number of targets whose quality improved divided by the total number of targets), each computed both including and excluding the above 9 targets (54 or 45 targets in total). Princeton_Tigress improved the GDT_TS score by 0.40 on average for the 45 targets, with averages of 0.38, 0.56, and 0.30 for the 9 CASP8, 14 CASP9, and 22 CASP10 targets, respectively. Our CASP11 protocol produced a comparable result in terms of the average change in GDT_TS (ΔGDT_TSavg, where ΔGDT_TS = GDT_TSrefined − GDT_TSinitial): ΔGDT_TSavg is 0.36 for the 45 targets, and 0.52, 0.57, and 0.16 for the 9 CASP8, 14 CASP9, and 22 CASP10 targets, respectively. Thus, the performance of our CASP11 protocol is similar to that of the Princeton_Tigress MD-only procedure (denoted Princeton_Tigress throughout this manuscript). The refined fraction of Princeton_Tigress is 84% (38/45), better than the 76% (34/45) of our CASP11 protocol. Using our CASP12 protocol, we generated 7 models for each target (MD0, MD1, MD2, MD1.1, MD1.2, MD2.1, and MD2.2) by averaging different portions of the production run, as indicated in Figure 1.

All 7 ΔGDT_TSavg values were better than those of the CASP11 protocol and of Princeton_Tigress (see Table 2). Among the 7 models, MD0 often produced the best result, while the quality of MD1.1 was often less satisfactory. We believe this is because MD1.1 was averaged over the first 1 ns of MD simulation after the abrupt reduction of kpos from 3.0 to 0.1. Although MD1.1 was not very reliable, for some targets it still generated slightly improved structures in terms of ΔGDT_TS compared with the models of both Princeton_Tigress and our CASP11 protocol. In terms of the refined fraction, MD0 was again the best: it improved 39 of the 45 targets, 1 more than Princeton_Tigress and 5 more than our CASP11 protocol. The explicit solvent, the weaker positional restraints, and the reduced number of distance restraints (only on Cα atom pairs from β structures, not from α-helices) seem to allow the given model to move more freely, which helped produce better-quality structures. As in Princeton_Tigress, the MD simulations in our CASP11 protocol depended on target length and initial model quality (see eq 3 and Table 1). However, we did not observe many advantages from using such length- and quality-specific MD protocols. For example, the correlation between ΔGDT_TS and target length for all 54 targets was 0.15, and the correlation between ΔGDT_TS and the target's initial quality (starting GDT_TS score) was −0.19. Therefore, in our CASP12 protocol, we set kpos to 0.1 kcal mol−1 Å−2 instead of using eq 3, regardless of target length and initial model quality.

It is worth examining how the distance restraints (see eq 1) actually affect the refinement results. Note that the only distance restraints used in our CASP11 and CASP12 protocols are intra-β-strand ones. To address this question, we repeated the identical refinement computation on the 54 targets without distance restraints and measured the changes in model quality in terms of GDT_TS (see Table 3). In this case, the trajectories showed increased fluctuations, since more flexibility was allowed. Without distance restraints, the GDT_TS score improved on average by 0.84, 0.77, 1.03, and 0.86 for MD1.1, MD1.2, MD2.1, and MD2.2, respectively. In contrast, the MD simulations with distance restraints resulted in corresponding GDT_TS improvements of 0.89, 0.98, 0.97, and 1.02, a more uniform improvement. Without distance restraints, the best results were all generated by MD0, which improved GDT_TS by 1.06, 1.02, and 1.12 for CASP8, 9, and 10, respectively. With distance restraints, the corresponding improvements were 1.91, 1.26, and 0.75, or 1.14 on average over the 54 targets, slightly better than the result without distance restraints (1.08). With distance restraints, MD0 improved 45 of the 54 targets in terms of GDT_TS, more than without restraints (42). With insufficient restraints, the structures populated in short MD simulations are likely to deviate more from the initial structure, but not to the extent that sampling could reach a new basin of attraction with better structural quality. In summary, the contribution of the distance restraints was not large, but it guided the refinement more consistently within 4 ns of MD simulation.

In addition to GDT_TS, RMSD and GDT_HA are two important backbone assessment metrics. The initial RMSD and GDT_HA values for all 54 targets, as well as their changes (i.e., ΔRMSD and ΔGDT_HA) for MD0, MD1, MD2, MD1.1, MD1.2, MD2.1, MD2.2, and the CASP11 protocol, are shown in Tables 4 and 5. The RMSD improvements from the 7 CASP12 protocols were marginal, the best being MD0 with an average RMSD improvement of only 0.013 Å, but they were still better than that of our CASP11 protocol (0.006 Å). In terms of the refined fraction, 29 of the 54 targets were improved. A positive correlation between DFIRE and ΔRMSD was reported17 to be useful when selecting the final structure; however, we did not observe such a correlation. Similarly, we did not observe a correlation between the target chain length and ΔRMSD. Among the 7 CASP12 protocols, MD0 produced the best result, with an average GDT_HA improvement of 2.03 and 46 of the 54 targets improved. The other 6 protocols also
Table 3. Improvement without Distance Restraints in Terms of GDT_TS for CASP8, CASP9, and CASP10 Refinement Targets, ΔGDT_TS = GDT_TSwo−restraints − GDT_TSinitial^a
target | MD0 | MD1 | MD2 | MD1.1 | MD1.2 | MD2.1 | MD2.2
CASP8 targets
TR389 | 1.12 | 1.31 | 1.31 | 1.49 | 0.93 | 0.93 | 1.12
TR429 | 1.63 | 1.08 | 1.81 | 0.72 | 0.72 | 1.81 | 2.17
TR432 | 1.73 | 1.15 | 2.31 | 1.35 | 0.96 | 1.54 | 2.12
TR435 | 0.84 | 0.63 | 0.42 | 1.06 | 0.42 | 0.21 | 0.63
TR453 | 1.75 | 2.04 | 1.75 | 2.62 | 1.16 | 1.75 | 1.75
TR454 | 0.13 | 0.13 | 0.91 | −0.13 | 0.00 | 0.91 | 0.65
TR461 | 2.37 | 2.69 | 2.37 | 2.37 | 2.69 | 2.05 | 2.69
TR462 | 2.10 | 0.88 | 1.75 | 1.23 | 0.70 | 1.75 | 1.05
TR464 | −0.36 | 0.00 | 0.36 | −0.36 | 0.00 | 1.08 | 0.00
TR469 | −1.19 | −1.98 | −0.40 | −0.40 | −1.59 | −0.79 | −0.79
TR476 | −1.15 | −2.01 | −0.86 | −1.44 | −1.72 | −0.86 | −0.29
TR488 | 3.69 | 3.95 | 3.16 | 3.95 | 3.43 | 3.16 | 2.90
CASP9 targets
TR517 | 0.47 | 0.47 | 0.32 | 0.47 | 0.47 | 0.63 | 0.00
TR530 | 1.56 | 0.93 | 0.62 | 1.56 | 1.25 | 0.62 | 0.31
TR557 | 2.40 | 2.20 | 3.00 | 1.80 | 2.60 | 2.60 | 3.00
TR567 | 0.70 | 1.06 | 1.06 | 1.23 | 1.41 | 1.06 | 1.23
TR568 | 0.51 | 0.00 | 1.54 | 0.25 | 0.25 | 1.03 | 1.54
TR569 | −0.95 | −1.26 | −1.26 | −0.95 | −0.95 | −0.31 | −2.21
TR574 | 1.23 | 1.47 | 1.23 | 0.49 | 2.45 | 0.74 | 1.23
TR576 | −1.27 | −0.54 | −2.35 | −0.36 | −1.81 | −1.45 | −2.72
TR592 | 2.38 | 2.38 | 2.62 | 1.43 | 2.86 | 2.14 | 2.86
TR594 | 2.14 | 1.78 | 1.96 | 1.78 | 0.89 | 1.43 | 2.14
TR606 | 2.64 | 2.44 | 2.44 | 2.44 | 1.22 | 2.64 | 2.03
TR614 | −1.24 | −1.65 | −1.65 | −2.27 | −1.45 | −1.24 | −1.86
TR622 | 1.85 | 1.44 | 2.46 | 1.64 | 1.03 | 2.46 | 2.67
TR624 | 1.82 | 1.45 | 0.37 | 1.45 | 1.09 | 0.73 | 0.73
CASP10 targets
TR644 | 0.18 | 0.35 | −0.35 | −0.18 | 0.35 | 0.00 | −0.89
TR655 | −1.17 | −1.33 | −1.00 | −0.67 | −2.33 | −1.33 | −1.00
TR661 | 1.89 | 1.62 | 0.95 | 1.35 | 1.62 | 1.22 | 0.95
TR662 | 3.34 | 3.34 | 4.34 | 4.00 | 3.00 | 3.67 | 3.34
TR663 | 0.82 | 0.00 | 0.99 | 0.00 | 0.49 | 0.82 | 0.66
TR671 | 1.71 | 1.99 | 1.71 | 1.99 | 2.27 | 1.42 | 1.14
TR674 | 2.84 | 1.70 | 3.60 | 1.51 | 1.70 | 3.60 | 3.60
TR679 | 0.63 | 1.00 | 0.50 | 1.26 | 0.63 | 0.75 | 0.25
TR681 | 2.23 | 0.53 | 1.83 | 0.13 | 0.53 | 1.44 | 1.44
TR688 | 1.48 | 0.94 | 1.08 | 0.67 | 0.13 | 0.54 | 1.35
TR689 | −0.24 | 0.35 | 0.11 | −0.12 | −0.36 | −0.48 | 0.23
TR696 | 0.75 | 0.75 | 0.75 | 0.50 | 0.75 | 1.00 | 0.50
TR698 | 1.26 | 1.05 | 1.05 | 1.68 | 0.00 | 1.47 | −0.21
TR699 | 1.45 | 1.33 | 1.56 | 1.22 | 1.56 | 1.33 | 1.33
TR704 | 3.14 | 2.38 | 3.68 | 1.73 | 3.57 | 3.35 | 4.00
TR705 | 2.87 | 2.87 | 3.65 | 2.35 | 3.39 | 3.39 | 3.65
TR708 | 0.38 | 0.13 | 0.25 | 0.13 | 0.51 | 0.51 | 0.13
TR710 | 0.39 | 1.29 | 0.77 | 1.16 | 1.29 | 0.77 | 0.77
TR712 | 0.27 | −0.14 | 0.27 | −0.41 | −0.14 | 0.27 | 0.00
TR720 | 0.88 | 0.88 | 0.50 | 0.38 | 0.38 | 0.38 | 0.12
TR722 | 0.19 | 0.00 | −0.40 | 0.00 | 0.00 | −0.20 | −0.40
TR723 | 0.96 | 0.39 | 0.58 | 0.20 | 0.20 | 0.77 | 0.39
TR724 | −0.22 | −1.31 | 0.00 | −1.52 | −0.65 | 0.65 | −1.52
TR738 | 3.01 | 2.71 | 2.81 | 2.81 | 2.71 | 2.81 | 2.51
TR747 | 1.67 | 1.11 | 0.83 | 1.11 | 1.11 | 1.39 | 0.28
TR750 | 2.61 | 2.19 | 2.61 | 2.06 | 2.19 | 2.33 | 2.33
TR752 | −0.51 | −0.34 | 0.00 | −0.67 | −0.51 | −0.67 | −0.34
TR754 | −1.47 | −1.10 | −2.57 | −0.73 | −1.84 | −2.20 | −2.94
avg | 1.08 | 0.87 | 1.06 | 0.84 | 0.77 | 1.03 | 0.86
fraction | 0.78 | 0.81 | 0.83 | 0.74 | 0.80 | 0.81 | 0.78
^a GDT_TS values have no units and range from 0 to 100, where 100 corresponds to a perfect match.
worked well, and their average improvements were mostly three times that of our CASP11 protocol (0.54 GDT_HA improvement), except for MD1.1 (1.61 GDT_HA improvement; see Table 5). MD2.2 produced the second-smallest average improvement in ΔGDT_HA. Since CASP accepts five models for each target, we decided to use the following 5 CASP12 protocols, MD0, MD2, MD1, MD1.2, and MD2.1, to generate models 1, 2, 3, 4, and 5, respectively, according to their average values in Table 5. With these 5 protocols, many targets improved by more than 2.0 in ΔGDT_HA (see Table 5), and TR461 improved the most among all targets and all 5 protocols (8.39 GDT_HA improvement).

Previously, MD simulations were not particularly successful in refining given protein 3D models; for example, Levitt reported the inability of MD to refine protein structures.32 However, CASP10 marked the first case in which well-designed restrained MD simulations were shown to improve protein structures systematically. Mirjalili and Feig17 (method name FEIG) reported refinement success on 27 CASP10 targets, with an average improvement of 2.6 GDT_HA (see Table 6). In terms of GDT_HA, 24 targets improved, one tied, and two became worse. Our 5 CASP12 protocols are compared with the FEIG method of CASP10 in Table 6; we note that our models in this comparison are from MD0. Although our values are less impressive than those of FEIG, the computational resources used in our CASP12 protocol are only about 1/1000 of those used by FEIG. Also, 81% of the tested targets improved in GDT_HA, with an average improvement of 1.5 GDT_HA units in our case. Among the 28 CASP10 refinement targets, TR722 was not included in ref 17. For the remaining 27 CASP10 targets, the best of five was selected individually among our five CASP12 protocol models. The average ΔGDT_HA of the best model for these 27 targets was 2.0, not as high as FEIG's 3.8. The best of five models from both FEIG and our CASP12 protocol failed in 2 separate cases (TR698 and TR754 for FEIG; TR655 and TR689 for our protocol). FEIG's best model for target TR754 was much worse than the initial model (ΔGDT_HA = −6.3), whereas we slightly improved the GDT_HA score of this target (ΔGDT_HA = 1.5). Note that in many cases the best model is not model 1. This raises the question: is there a method or metric that can help identify better-quality structures among all the models generated from the simulation? Energies are often considered important metrics, so we analyzed the correlation between various energy components and quality scores for each structure of all CASP8, 9, and 10 targets (Table 7). Unfortunately, no clear correlations between the energy components (DFIRE energy, potential energy, and positional restraint energy) and the structural quality scores (RMSD, GDT_TS, GDT_HA, and TM-score) were observed. This means that even if better-quality models exist, they cannot be identified under the current setting.

According to the CASP11 evaluation, all LEER model 1 submissions exhibited significant improvement in MolProbity. The initial MolProbity scores of all 54 refinement targets and their changes (ΔMolProbity = MolProbityrefined − MolProbityinitial; a negative value means improvement) for the 7 averaged structures and for the CASP11 protocol are listed in Table 8. The CASP11 protocol improved all targets in terms of MolProbity except TR530, which became worse than the initial structure by 0.154. For this target, the MolProbity scores of the MD0, MD1, MD1.2, and MD2.2 structures were better than that from the CASP11 protocol, but none were better than that of the initial structure. It is possible that the initial structure was already of quite good quality (GDT_TS 86, GDT_HA 69, RMSD 1.99, MolProbity 0.612), leaving little room for further improvement. This trend is also found for targets TR661 and TR710, with initial MolProbity scores of 0.766 and 0.500, respectively. Note that the perfect MolProbity score is 0.5. Overall, for all 54 targets, the MolProbity score improved on average by 1.10, 1.16, 1.21, 1.18, 1.28, 1.26, 1.30, and 1.27 for MD0, MD1, MD2, MD1.1, MD1.2, MD2.1, MD2.2, and the CASP11 protocol, respectively.

Comparison of Model Selection Methods. Before finalizing our CASP12 protocols, we tested different model selection/ranking methods. Cluster analysis examines the population of structures in the production trajectory. For each CASP10 refinement target, we applied CPPTRAJ33 with the epsilon clustering option to the RMSD distance metric over the entire 4 ns production trajectory to generate 5 clusters. After clustering, the average structure of each cluster was generated and energy minimized by 100 steps of steepest descent. The quality of the 5 cluster models is summarized in Supporting Information Tables S1−S3. For some targets, the cluster models win in one metric, but on average none of the 5 cluster models are of better quality than our CASP12 protocol models in terms of GDT_TS (Table S1) and RMSD (Table S2). On average, the best of the 5 cluster models has a slightly better GDT_HA score than our CASP12 MD1 and MD1.1 protocol models (Table S3), but it is not as good as any of the other CASP12 protocol models. A similar trend is observed when we use the second 2 ns of the production trajectory; these results are shown in Supporting Information Tables S4−S6. In recent CASPs, ProQ2 (ref 34) performed quite well in the model quality assessment category. We used ProQ2 to rank the structures of the entire 4 ns trajectory and of the second 2 ns of the production trajectory. For the 4 ns trajectory, the top 20% ranked structures were selected, and their average structure was energy minimized. For the second 2 ns trajectory, the top 40% ranked structures were selected for averaging and minimization. The quality of these 2 models in terms of GDT_TS, RMSD, and GDT_HA is shown in Supporting Information Tables S7−S9. The model from the second 2 ns trajectory is better than the 4 ns model, but neither is as good as our CASP12 protocol models in GDT_TS or GDT_HA. As an attempt to select a better-quality model, a separate score different from the energy used for sampling is often considered. Here, we tried both the AMBER energy plus the DFIRE statistical potential and DFIRE alone to rank the structures from the second 2 ns of simulation. In both cases, the top 40%, 20%, and 10% ranked structures were selected for averaging and energy minimization. The results using AMBER plus DFIRE are shown in Supporting Information Tables S10−S12, and those using DFIRE alone are shown in Tables S13−S15.
DOI: 10.1021/acs.jctc.7b00470 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX
Article
Journal of Chemical Theory and Computation
Table 4. Initial RMSD (in Å) and ΔRMSD (in Å) Results for CASP8, CASP9, and CASP10 Refinement Targets along with Our CASP11 Protocol Results
(start = initial RMSD; MD0 to MD2.2 = CASP12 protocols; CASP11 = CASP11 protocol; a negative ΔRMSD means improvement; avg = average ΔRMSD; fraction = fraction of the 54 targets improved)
target  |   start     MD0     MD1     MD2   MD1.1   MD1.2   MD2.1   MD2.2  CASP11
TR389   |   2.638   0.005   0.028  −0.018   0.018   0.034  −0.003  −0.028  −0.040
TR429   |   6.796   0.006  −0.001   0.025  −0.008   0.004   0.018   0.033  −0.123
TR432   |   1.646  −0.115  −0.130  −0.095  −0.128  −0.115  −0.108  −0.074   0.020
TR435   |   1.886  −0.182  −0.175  −0.186  −0.143  −0.201  −0.187  −0.178  −0.015
TR453   |   1.331  −0.074  −0.061  −0.072  −0.072  −0.059  −0.090  −0.076   0.034
TR454   |   3.238  −0.050  −0.035  −0.063  −0.041  −0.022  −0.048  −0.079  −0.161
TR461   |   1.634  −0.123  −0.129  −0.110  −0.135  −0.112  −0.107  −0.104   0.029
TR462   |   2.541  −0.043  −0.048  −0.019  −0.047  −0.024   0.006  −0.034  −0.009
TR464   |   2.734  −0.031  −0.019  −0.036  −0.003  −0.025  −0.037  −0.018   0.082
TR469   |   2.176  −0.171  −0.181  −0.145  −0.167  −0.166  −0.108  −0.182   0.009
TR476   |   6.848   0.013   0.044   0.026   0.079   0.005   0.025   0.028  −0.106
TR488   |   2.109  −0.130  −0.123  −0.119  −0.100  −0.134  −0.106  −0.136   0.004
TR517   |   4.646   0.014   0.019   0.013   0.010   0.035   0.028  −0.001   0.006
TR530   |   1.990  −0.066  −0.084   0.008  −0.115  −0.054   0.018   0.003   0.009
TR557   |   4.074   0.011   0.003   0.022   0.012   0.000   0.021   0.008   0.030
TR567   |   3.435  −0.030  −0.012  −0.035  −0.013  −0.015  −0.031  −0.040   0.013
TR568   |   6.149   0.083   0.067   0.093   0.062   0.077   0.088   0.093   0.131
TR569   |   3.011   0.028   0.025   0.034   0.037   0.021   0.046   0.026   0.028
TR574   |   3.583   0.040   0.050   0.033   0.049   0.060   0.031   0.030  −0.025
TR576   |   6.851   0.228   0.247   0.212   0.247   0.246   0.208   0.218   0.086
TR592   |   1.257  −0.109  −0.124  −0.075  −0.088  −0.120  −0.074  −0.062  −0.025
TR594   |   1.818  −0.009  −0.013   0.002  −0.009  −0.008  −0.006   0.018  −0.014
TR606   |   4.850   0.075   0.059   0.091   0.056   0.062   0.079   0.107  −0.007
TR614   |   6.491   0.005  −0.008  −0.010   0.006   0.020  −0.002  −0.025   0.016
TR622   |   7.474   0.195   0.215   0.177   0.223   0.185   0.189   0.176  −0.586
TR624   |   5.189   0.005   0.016   0.002   0.027   0.018   0.025  −0.013   0.011
TR644   |   2.712   0.008  −0.004   0.021  −0.021   0.009   0.027   0.021   0.000
TR655   |   3.970  −0.007  −0.023   0.015  −0.032  −0.006   0.030   0.005  −0.012
TR661   |   2.743  −0.022  −0.008  −0.002  −0.006   0.002  −0.005  −0.007   0.018
TR662   |   1.920  −0.022  −0.022  −0.025  −0.033  −0.010  −0.021  −0.023  −0.001
TR663   |   3.372   0.013   0.009   0.025  −0.005   0.024   0.023   0.027  −0.011
TR671   |   7.716   0.092   0.100   0.085   0.123   0.086   0.079   0.095   0.067
TR674   |   3.444   0.055   0.061   0.041   0.040   0.083   0.046   0.035   0.028
TR679   |   3.949  −0.056  −0.057  −0.042  −0.046  −0.067  −0.034  −0.047   0.012
TR681   |   2.305  −0.043  −0.037  −0.036  −0.040  −0.024  −0.029  −0.036  −0.017
TR688   |   2.524  −0.005  −0.011   0.006  −0.012  −0.003   0.011   0.006   0.032
TR689   |   1.573  −0.010  −0.005  −0.010   0.010  −0.014  −0.025   0.013   0.018
TR696   |   3.519   0.025   0.040   0.023   0.085   0.036   0.029   0.018   0.036
TR698   |   4.653  −0.020  −0.029  −0.025  −0.039  −0.012  −0.015  −0.030   0.015
TR699   |   2.211  −0.011   0.025  −0.036   0.036   0.018  −0.037  −0.034  −0.023
TR704   |   2.540  −0.142  −0.134  −0.147  −0.128  −0.129  −0.139  −0.148  −0.012
TR705   |   4.709   0.031   0.060   0.009   0.072   0.043   0.024   0.000   0.050
TR708   |   4.626   0.094   0.100   0.084   0.087   0.114   0.074   0.097   0.030
TR710   |   2.440  −0.001  −0.001   0.009   0.003   0.005   0.015   0.008   0.004
TR712   |   1.992   0.014   0.025   0.004   0.034   0.018   0.013   0.004   0.004
TR720   |   8.515   0.058   0.066   0.051   0.062   0.063   0.051   0.060   0.066
TR722   |   4.422   0.044   0.043   0.039   0.042   0.044   0.033   0.054   0.006
TR723   |   2.232  −0.022  −0.012  −0.037   0.009  −0.034  −0.046  −0.021  −0.004
TR724   |   5.951   0.040   0.051   0.033   0.064   0.039   0.030   0.038   0.050
TR738   |   1.396  −0.109  −0.102  −0.103  −0.090  −0.097  −0.090  −0.104  −0.036
TR747   |   1.956   0.006   0.013   0.010   0.009   0.029   0.003   0.025   0.015
TR750   |   2.125  −0.122  −0.126  −0.119  −0.128  −0.117  −0.124  −0.112  −0.060
TR752   |   1.495  −0.019  −0.003  −0.033   0.012  −0.016  −0.024  −0.041  −0.002
TR754   |   2.410  −0.123  −0.083  −0.144  −0.076  −0.075  −0.135  −0.135  −0.003
avg     |          −0.013  −0.008  −0.010  −0.004  −0.005  −0.007  −0.010  −0.006
fraction|           0.537   0.574   0.481   0.500   0.481   0.481   0.500   0.407
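The avg and fraction rows of Tables 4, 5, and 8, the k/34 counts of Table 9, and the best-of-five values of Table 6 are simple bookkeeping over per-target Δ values. The Python sketch below shows one way to compute them, assuming Δ = Score_refined − Score_initial as defined in the Table 9 footnote. The function names are hypothetical, and counting a tie (Δ = 0) as an improvement is our assumption about the authors' convention (with that choice, the k/34 counts in Table 9 are reproduced).

```python
def delta(refined, initial):
    """ΔScore = Score_refined − Score_initial (Table 9 footnote convention)."""
    return refined - initial


def is_improved(d, higher_is_better=True, count_ties=True):
    """GDT_TS/GDT_HA: higher is better; RMSD/MolProbity: lower is better.

    ASSUMPTION: a tie (d == 0) counts as improved when count_ties is True.
    """
    if d == 0:
        return count_ties
    return d > 0 if higher_is_better else d < 0


def summarize(deltas, higher_is_better=True, count_ties=True):
    """Return (average Δ, number of targets improved, number of targets)."""
    n = len(deltas)
    improved = sum(is_improved(d, higher_is_better, count_ties) for d in deltas)
    return sum(deltas) / n, improved, n


def best_of(model_deltas, higher_is_better=True):
    """Best Δ among the submitted models of one target ("best of five")."""
    return max(model_deltas) if higher_is_better else min(model_deltas)


# Usage: ΔGDT_HA of TR644 for models 1-5 (MD0, MD2, MD1, MD1.2, MD2.1), from Table 5.
print(best_of([2.66, 2.66, 1.95, 2.66, 2.84]))  # 2.84 (reported as 2.8 in Table 6)
# ΔRMSD-style values, where lower is better:
print(summarize([-0.5, 0.0, 0.25, -0.25], higher_is_better=False))  # (-0.125, 3, 4)
```

For RMSD and MolProbity, pass `higher_is_better=False` so that a negative Δ counts as an improvement.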
Table 5. Initial GDT_HA Score and ΔGDT_HA Results of Our Protocol for CASP8, CASP9, and CASP10 Refinement Targets as a Benchmark, along with Our CASP11 Protocol Results(a)
(start = initial GDT_HA; MD0 to MD2.2 = CASP12 protocols; CASP11 = CASP11 protocol; a positive ΔGDT_HA means improvement; avg = average ΔGDT_HA; fraction = fraction of the 54 targets improved)
target  |   start     MD0     MD1     MD2   MD1.1   MD1.2   MD2.1   MD2.2  CASP11
TR389   |   63.25   −0.19   −0.19   −0.38   −0.56   −0.75   −0.56   −1.12   −0.38
TR429   |   25.72    3.45    3.08    3.81    2.00    3.45    3.45    3.63    3.81
TR432   |   77.50    4.62    5.00    3.27    4.23    4.23    3.85    3.27    1.13
TR435   |   65.04    4.88    5.51    4.45    4.24    6.99    4.88    2.97    3.82
TR453   |   72.09    4.65    3.78    3.78    4.36    2.91    4.07    2.91   −0.70
TR454   |   42.32    1.56    1.95    1.56    1.95    1.69    1.43    1.56   −0.39
TR461   |   72.78    6.81    6.97    6.65    7.76    6.49    8.39    5.38    0.16
TR462   |   43.71    2.09    2.79    1.57    2.97    1.92    1.05    1.92    0.35
TR464   |   59.78    1.09    1.81    1.09    0.00    1.09    1.09    1.45   −2.53
TR469   |   63.49    5.16    6.35    3.18    4.37    3.57    2.78    2.78   −0.39
TR476   |   28.74   −2.02   −1.44   −1.44   −2.02   −1.44   −1.15   −2.88    0.28
TR488   |   75.00    4.47    4.21    4.47    3.42    3.95    3.95    4.21    1.08
TR517   |   53.62    2.67    2.35    2.51    1.88    2.51    2.51    2.51    0.31
TR530   |   69.06    1.25    0.94    0.00    1.25    1.57    0.00    0.94    2.19
TR557   |   47.80    4.20    3.00    3.80    2.60    3.40    3.60    3.80    1.00
TR567   |   58.27    3.53    3.17    3.17    3.00    3.88    2.65    3.53    0.00
TR568   |   35.82    0.26    1.04    0.26    0.26    0.52   −0.25    0.26    0.52
TR569   |   52.22   −0.32   −0.32   −1.59   −0.32    0.00   −1.27   −1.59    0.31
TR574   |   39.95    3.68    3.43    3.43    2.21    3.92    2.45    2.94    2.94
TR576   |   45.29    0.18   −0.91    0.18   −1.27   −0.18    0.00   −0.54   −1.09
TR592   |   73.33    3.57    3.57    3.81    2.86    4.05    3.81    3.10    0.96
TR594   |   66.96    2.86    2.68    3.58    1.79    2.50    2.86    3.40    1.08
TR606   |   52.64    2.03    2.03    1.43    1.02    1.43    2.24    2.03    0.82
TR614   |   52.48    3.92    3.51    3.51    3.51    2.89    2.69    3.31    1.03
TR622   |   48.57    2.86    1.84    2.86    2.04    2.04    0.82    3.89   −0.21
TR624   |   35.87    1.45    1.45    1.81    1.45    1.09    1.45    2.17    2.90
TR644   |   66.84    2.66    1.95    2.66    1.25    2.66    2.84    2.31    0.71
TR655   |   60.33   −0.66   −0.83   −0.66   −0.16   −0.83   −0.83   −1.00    0.84
TR661   |   59.46    2.30    1.62    1.62    1.22    1.89    1.49    1.35    1.62
TR662   |   60.33    2.00    1.67    2.67    2.00    1.67    3.34    2.34   −0.46
TR663   |   49.34   −0.49    0.17   −0.66   −0.33   −0.16   −0.66    0.00    0.17
TR671   |   36.65    0.57    0.28    0.85    0.28    1.13    0.57    1.13   −0.85
TR674   |   70.45    3.98    3.79    3.22    4.36    3.41    3.22    3.22    1.52
TR679   |   51.26    1.38    2.01    1.00    2.51    2.13    0.00    1.76    0.88
TR681   |   57.46    2.36    1.96    2.36    1.44    1.96    1.83    2.75   −0.52
TR688   |   56.89    2.16    2.30    2.16    2.30    2.43    2.57    2.43    0.41
TR689   |   72.99   −2.02   −2.26   −2.26   −1.90   −2.49   −2.14   −2.49   −0.12
TR696   |   50.25    0.50   −2.00    2.50   −2.75   −1.25    1.75    2.25    0.25
TR698   |   44.54    0.63    0.42    1.05    0.84   −0.21    1.05    0.84    0.21
TR699   |   64.44    3.12    3.12    2.23    2.45    3.23    2.78    1.89    2.00
TR704   |   48.48    3.90    3.14    4.66    3.47    3.68    4.23    4.44    1.20
TR705   |   43.75    1.82    2.08    1.56    3.39    1.56    2.60    2.60    0.52
TR708   |   72.19    0.39    0.39    0.90    0.26    0.13    0.64    0.77    0.00
TR710   |   53.61    1.67    1.03    2.19    0.64    1.29    1.67    2.19   −0.39
TR712   |   80.51    1.08    0.54    1.48    0.40    0.27    0.27    0.81   −0.13
TR720   |   39.77    0.89    1.01    1.01    1.14    0.51    0.63   −0.38    0.26
TR722   |   38.19   −0.59   −0.39   −0.20   −0.20   −0.79    0.20    0.20    0.39
TR723   |   66.03    2.48    2.67    2.67    1.53    3.44    2.48    3.44    1.53
TR724   |   41.52    0.44    0.65    0.00    0.65    0.87    1.52   −0.22   −0.43
TR738   |   74.20    3.61    3.21    3.41    3.81    2.21    2.81    3.61    1.50
TR747   |   63.33    1.39    0.84    0.28    0.84    1.11    0.84    0.28   −0.55
TR750   |   55.08    3.99    3.85    3.57    3.71    4.26    4.26    3.30    1.51
TR752   |   75.84   −0.33   −1.18   −0.16   −1.85   −0.33   −0.16    0.00   −1.18
TR754   |   58.82    1.47    0.37    0.37    0.37    0.37    0.37   −0.73   −0.73
avg     |            2.03    1.85    1.88    1.61    1.81    1.80    1.76    0.54
fraction|            0.85    0.83    0.85    0.81    0.81    0.85    0.83    0.69
(a) GDT_HA and GDT_TS values have no units and range from 0 to 100, where 100 corresponds to a perfect match.
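GDT_HA, like GDT_TS, is the mean percentage of Cα atoms that lie within a set of distance cutoffs of their native positions (0.5, 1, 2, and 4 Å for GDT_HA; 1, 2, 4, and 8 Å for GDT_TS). The sketch below illustrates this definition under a simplifying assumption: the model is taken to be already superposed on the native structure, whereas the official LGA-based scoring searches over many superpositions, so this version can only underestimate the official value. The function and constant names are illustrative, not from the paper.

```python
import math

# Distance cutoffs (in Å) that define each GDT variant.
GDT_TS_CUTOFFS = (1.0, 2.0, 4.0, 8.0)
GDT_HA_CUTOFFS = (0.5, 1.0, 2.0, 4.0)


def gdt(model_ca, native_ca, cutoffs):
    """Simplified GDT score (0-100) for Cα coordinates that are ALREADY superposed.

    The official LGA implementation maximizes the count at each cutoff over
    many superpositions; one fixed superposition gives a lower bound on it.
    """
    if len(model_ca) != len(native_ca) or not model_ca:
        raise ValueError("need two non-empty, equal-length Cα coordinate lists")
    deviations = [math.dist(m, n) for m, n in zip(model_ca, native_ca)]
    n = len(deviations)
    # Fraction of residues within each cutoff, then averaged over the cutoffs.
    fractions = [sum(d <= c for d in deviations) / n for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


# Usage: four Cα atoms displaced by 0.3, 0.8, 1.5, and 3.0 Å from the native.
native = [(3.8 * i, 0.0, 0.0) for i in range(4)]
model = [(x, dy, 0.0) for (x, _, _), dy in zip(native, (0.3, 0.8, 1.5, 3.0))]
print(gdt(model, native, GDT_HA_CUTOFFS))  # 62.5
print(gdt(model, native, GDT_TS_CUTOFFS))  # 81.25
```

The same displacements score lower on GDT_HA than on GDT_TS because the tighter cutoffs reward only near-exact Cα positions, which is why GDT_HA is the more sensitive measure of small refinement gains in Tables 5 and 6.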
Table 6. Refinement Results Showing the First Submitted Model and the Best of Five Submitted Models from Our New Protocol in Comparison with FEIG (ref 17) in Terms of ΔRMSD (in Å) and ΔGDT_HA for 27 CASP10 Targets
        |  new protocol,    |  new protocol,    |  FEIG,            |  FEIG,
        |  first model      |  best of five     |  first model      |  best of five
target  |    ΔRMSD  ΔGDT_HA |    ΔRMSD  ΔGDT_HA |    ΔRMSD  ΔGDT_HA |    ΔRMSD  ΔGDT_HA
TR644   |     0.01      2.7 |     0.03      2.8 |    −0.03      2.8 |    −0.55      5.3
TR655   |    −0.01     −0.7 |    −0.03     −0.2 |     0.04      0.3 |     0.00      0.3
TR661   |    −0.02      2.3 |    −0.02      2.3 |    −0.03      1.9 |    −0.03      1.9
TR662   |    −0.02      2.0 |    −0.02      3.3 |    −0.20      5.3 |    −0.25      6.7
TR663   |     0.01     −0.5 |     0.01      0.2 |    −0.12      2.6 |    −0.15      3.6
TR671   |     0.09      0.6 |     0.09      1.1 |    −0.01      0.6 |    −0.25      2.8
TR674   |     0.06      4.0 |     0.04      4.4 |     0.00      4.9 |    −0.06      4.9
TR679   |    −0.06      1.4 |    −0.05      2.5 |     0.01      0.6 |    −0.03      3.3
TR681   |    −0.04      2.4 |    −0.04      2.8 |    −0.04      1.1 |    −0.15      5.4
TR688   |     0.00      2.2 |     0.01      2.6 |     0.01      1.5 |    −0.02      2.2
TR689   |    −0.01     −2.0 |     0.01     −1.9 |    −0.10      3.5 |    −0.13      4.9
TR696   |     0.02      0.5 |     0.02      2.5 |    −0.13      3.5 |    −0.33      4.8
TR698   |    −0.02      0.6 |    −0.03      1.1 |    −0.02     −0.4 |    −0.02     −0.4
TR699   |    −0.01      3.1 |     0.02      3.2 |    −0.09      4.6 |    −0.09      4.6
TR704   |    −0.14      3.9 |    −0.15      4.7 |    −0.17      3.9 |    −0.23      5.6
TR705   |     0.03      1.8 |     0.07      3.4 |    −0.14      6.0 |    −0.24      7.3
TR708   |     0.09      0.4 |     0.08      0.9 |     0.09      2.7 |     0.09      2.9
TR710   |     0.00      1.7 |     0.01      2.2 |    −0.04      4.3 |    −0.06      4.3
TR712   |     0.01      1.1 |     0.00      1.5 |    −0.08      3.4 |    −0.14      5.0
TR720   |     0.06      0.9 |     0.06      1.1 |     0.02      2.7 |    −0.99      3.2
TR723   |    −0.02      2.5 |    −0.03      3.4 |    −0.13      6.5 |    −0.39      9.7
TR724   |     0.04      0.4 |     0.03      1.5 |    −0.01      2.6 |    −0.48      3.7
TR738   |    −0.11      3.6 |    −0.09      3.8 |    −0.20      6.0 |    −0.30      9.5
TR747   |     0.01      1.4 |     0.01      1.4 |    −0.10      0.8 |    −0.10      0.8
TR750   |    −0.12      4.0 |    −0.12      4.3 |    −0.16      4.8 |    −0.16      4.8
TR752   |    −0.02     −0.3 |    −0.04      0.0 |    −0.12      1.4 |    −0.12      1.4
TR754   |    −0.12      1.5 |    −0.12      1.5 |     0.09     −6.3 |     0.09     −6.3
avg     |    −0.01      1.5 |    −0.01      2.0 |    −0.06      2.6 |    −0.19      3.8
Table 7. Average Correlation for 54 CASP8−CASP10 Refinement Targets(a)
(DFIRE = DFIRE energy; pot. = potential energy; pos. = positional restraint energy)
        |  RMSD                  |  GDT_TS                |  GDT_HA                |  TM-score
protocol|   DFIRE    pot.    pos. |   DFIRE    pot.    pos. |   DFIRE    pot.    pos. |   DFIRE    pot.    pos.
MD0     |    0.09    0.02    0.00 |   −0.08   −0.04   −0.01 |   −0.10   −0.05    0.00 |   −0.09   −0.04   −0.01
MD1     |    0.08    0.02    0.01 |   −0.09   −0.03   −0.02 |   −0.10   −0.04   −0.01 |   −0.10   −0.04    0.00
MD2     |    0.09    0.01    0.01 |   −0.07   −0.02   −0.04 |   −0.08   −0.03   −0.04 |   −0.08   −0.03   −0.04
MD1.1   |    0.10    0.02    0.02 |   −0.09   −0.03   −0.04 |   −0.10   −0.04   −0.03 |   −0.10   −0.03   −0.02
MD1.2   |    0.10    0.01    0.03 |   −0.08   −0.02   −0.04 |   −0.10   −0.02   −0.04 |   −0.09   −0.02   −0.03
MD2.1   |    0.10    0.03    0.03 |   −0.11   −0.03   −0.05 |   −0.12   −0.03   −0.05 |   −0.12   −0.03   −0.07
MD2.2   |    0.08    0.00    0.01 |   −0.06   −0.01   −0.04 |   −0.08   −0.02   −0.04 |   −0.07   −0.01   −0.04
(a) The correlation is between the energy (DFIRE, potential, or positional restraint energy) and the score (RMSD, GDT_TS, GDT_HA, or TM-score) of each frame of the 4 ns MD run, taken over frames 1−4000 for MD0, 1−2000 for MD1, 2001−4000 for MD2, 1−1000 for MD1.1, 1001−2000 for MD1.2, 2001−3000 for MD2.1, and 3001−4000 for MD2.2.
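The Table 7 analysis amounts to a per-target Pearson correlation between a per-frame energy and a per-frame quality score, restricted to each protocol's frame window. A minimal sketch follows, assuming that the per-target coefficients are simply averaged with equal weight (the paper does not spell out the aggregation) and that each series is non-constant within the window. The data layout and the function names are hypothetical; only the frame windows come from the footnote.

```python
import math

# Frame windows (1-based, inclusive) of the 4000-frame production run,
# taken from the Table 7 footnote.
FRAME_WINDOWS = {
    "MD0": (1, 4000), "MD1": (1, 2000), "MD2": (2001, 4000),
    "MD1.1": (1, 1000), "MD1.2": (1001, 2000),
    "MD2.1": (2001, 3000), "MD2.2": (3001, 4000),
}


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def average_correlation(per_target, protocol):
    """Mean over targets of the energy/score correlation in one frame window.

    per_target maps a target name to (energies, scores), two lists with one
    value per production frame (frame 1 stored at index 0).
    ASSUMPTION: per-target coefficients are averaged with equal weight.
    """
    first, last = FRAME_WINDOWS[protocol]
    window = slice(first - 1, last)  # 1-based inclusive -> 0-based slice
    rs = [pearson(e[window], s[window]) for e, s in per_target.values()]
    return sum(rs) / len(rs)


# Usage with synthetic data: the score rises with the energy for one target
# and falls for the other, so the average correlation is zero.
energies = [float(i % 50) for i in range(4000)]
data = {
    "target_A": (energies, [2.0 * e + 1.0 for e in energies]),
    "target_B": (energies, [-e for e in energies]),
}
print(average_correlation(data, "MD1.2"))  # 0.0
```

The toy example also illustrates a caveat in reading Table 7: a near-zero average can hide strong but opposite per-target trends, so small averaged values alone do not prove that every target lacks a correlation.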
None of these models is better than our CASP12 protocol models. Benchmarking Results on 34 CASP11 Refinement Targets. Considering all metrics, MD0 produced better-quality models in a consistent manner. We applied this protocol to 34 CASP11 refinement targets, and its performance is shown in Table 9. Along with the ΔGDT_TS, ΔGDT_HA, and ΔMolProbity results, the initial quality of each target, the performance of the top-ranked models (each top-ranked model is the one with the highest GDT_TS score among all models listed in the target evaluation table; more information about these models is given in Supporting Information Table S20), and the results of our CASP11 protocol are also shown in Table 9. Once again, MD simulation proves effective in improving the MolProbity score. Our CASP12 MD0 protocol
Table 8. Initial MolProbity Score and ΔMolProbity Results for CASP8, CASP9, and CASP10 Refinement Targets(a)
(start = initial MolProbity score; MD0 to MD2.2 = CASP12 protocols; CASP11 = CASP11 protocol; avg = average ΔMolProbity)
target  |   start     MD0     MD1     MD2   MD1.1   MD1.2   MD2.1   MD2.2  CASP11
TR389   |   2.678  −1.691  −1.643  −1.691  −1.573  −1.691  −1.766  −1.807  −1.691
TR429   |   4.366  −2.387  −2.503  −2.574  −2.528  −2.498  −2.696  −2.951  −3.009
TR432   |   1.820  −1.159  −1.075  −1.320  −1.320  −1.159  −1.159  −1.320  −1.320
TR435   |   2.529  −2.029  −1.664  −1.871  −1.690  −2.029  −1.754  −1.871  −1.435
TR453   |   2.533  −1.802  −1.793  −1.802  −1.458  −2.033  −1.947  −2.033  −1.793
TR454   |   2.080  −1.086  −1.420  −1.362  −1.545  −1.325  −1.580  −1.420  −1.580
TR461   |   2.415  −1.448  −1.517  −1.139  −1.262  −1.298  −1.303  −1.175  −1.552
TR462   |   3.710  −1.760  −2.216  −2.307  −2.482  −2.795  −2.242  −2.746  −2.356
TR464   |   0.946  −0.168   0.119  −0.446  −0.446  −0.168  −0.446  −0.446  −0.446
TR469   |   2.816  −1.827  −1.827  −1.827  −1.403  −2.316  −2.005  −2.316  −1.830
TR476   |   2.008  −0.437  −0.902  −0.704  −0.855  −1.112  −0.888  −0.957  −1.058
TR488   |   2.548  −1.827  −2.048  −1.972  −2.017  −2.048  −1.743  −1.743  −1.972
TR517   |   1.207  −0.379  −0.308  −0.464  −0.464  −0.379  −0.379  −0.567  −0.707
TR530   |   0.612   0.005   0.005   0.154   0.154   0.005   0.492   0.005   0.154
TR557   |   1.326  −0.621  −0.826  −0.366  −0.732  −0.826  −0.317  −0.826  −0.621
TR567   |   1.557  −0.730  −0.537  −0.871  −1.003  −0.806  −0.673  −0.848  −0.734
TR568   |   1.485  −0.077  −0.592  −0.592  −0.183  −0.308  −0.238  −0.324  −0.492
TR569   |   0.773  −0.273  −0.273   0.000  −0.273  −0.146  −0.146  −0.146  −0.273
TR574   |   3.108  −1.618  −1.638  −1.623  −1.504  −1.645  −1.766  −1.811  −1.753
TR576   |   3.367  −1.684  −1.462  −1.962  −1.301  −1.668  −1.870  −1.823  −1.856
TR592   |   2.789  −1.921  −1.950  −2.022  −1.921  −2.087  −2.124  −2.022  −2.087
TR594   |   2.795  −1.534  −1.499  −1.623  −1.588  −1.588  −1.623  −1.664  −1.240
TR606   |   3.702  −2.424  −1.951  −2.700  −2.735  −2.570  −2.923  −2.796  −2.771
TR614   |   2.851  −1.090  −1.242  −1.251  −1.276  −1.836  −1.724  −1.472  −1.156
TR622   |   2.821  −1.255  −1.217  −1.411  −1.110  −1.389  −1.151  −1.549  −1.480
TR624   |   1.750  −1.073  −1.250  −1.073  −0.934  −1.073  −1.073  −1.073  −0.934
TR644   |   2.214  −1.714  −1.714  −1.553  −1.393  −1.510  −1.714  −1.553  −1.310
TR655   |   4.478  −3.001  −3.144  −3.396  −3.184  −3.450  −3.344  −3.486  −3.396
TR661   |   0.766   0.422   0.001   0.070   0.001   0.070   0.168  −0.005  −0.221
TR662   |   1.609  −0.688  −1.109  −1.109  −1.109  −0.847  −1.109  −1.109  −1.109
TR663   |   3.145  −1.473  −1.654  −1.728  −1.409  −1.866  −1.637  −1.690  −1.551
TR671   |   3.137  −1.956  −1.239  −2.141  −2.058  −2.195  −1.956  −2.195  −2.195
TR674   |   2.806  −1.578  −1.816  −1.584  −1.936  −2.049  −1.442  −1.817  −1.800
TR679   |   1.000  −0.070  −0.104  −0.210  −0.062   0.069  −0.249   0.048  −0.210
TR681   |   2.848  −1.678  −1.716  −1.683  −1.783  −1.716  −1.869  −1.627  −1.758
TR688   |   1.472  −0.434  −0.703  −0.532  −0.433  −0.532  −0.576  −0.660  −0.535
TR689   |   2.376  −1.033  −1.335  −1.389  −1.400  −1.508  −1.449  −1.428  −1.510
TR696   |   3.239  −1.829  −2.011  −2.536  −1.973  −1.704  −2.437  −2.154  −1.709
TR698   |   1.681  −0.773  −0.515  −0.582  −0.600  −0.872  −0.886  −1.070  −0.650
TR699   |   1.829  −0.585  −0.673  −0.924  −0.819  −0.817  −0.849  −1.112  −0.876
TR704   |   2.591  −1.446  −1.542  −1.653  −1.453  −1.758  −1.653  −1.721  −1.727
TR705   |   3.350  −1.854  −1.939  −1.939  −2.229  −2.046  −2.358  −2.001  −1.875
TR708   |   2.319  −1.274  −1.600  −1.109  −1.786  −1.491  −1.569  −1.055  −1.522
TR710   |   0.500   0.221   0.366   0.125   0.221   0.366   0.125   0.125   0.000
TR712   |   2.028  −1.224  −1.400  −1.400  −1.157  −1.528  −1.400  −1.400  −1.311
TR720   |   1.114  −0.231   0.049  −0.168  −0.323  −0.423  −0.391  −0.601  −0.614
TR722   |   0.500   0.000   0.167   0.000   0.167   0.000   0.000   0.000   0.000
TR723   |   1.160  −0.380  −0.380  −0.212  −0.380  −0.660  −0.660  −0.660  −0.660
TR724   |   2.851  −1.843  −1.843  −1.398  −1.509  −1.709  −1.898  −1.548  −1.637
TR738   |   1.967  −1.040  −1.062  −1.084  −1.161  −1.221  −1.040  −1.005  −1.457
TR747   |   1.335  −0.198  −0.198  −0.473  −0.543  −0.543  −0.338  −0.473  −0.338
TR750   |   2.433  −1.151  −1.476  −1.535  −1.182  −1.613  −1.382  −1.535  −1.776
TR752   |   1.073   0.024   0.024   0.057  −0.088  −0.138  −0.088   0.057  −0.290
TR754   |   2.030  −0.395  −0.838  −0.545  −0.569  −0.553  −0.926  −0.606  −0.613
avg     |          −1.101  −1.160  −1.212  −1.178  −1.278  −1.259  −1.296  −1.271
(a) ΔMolProbity = MolProbity_refined − MolProbity_initial; a negative value means improvement.
Table 9. GDT_TS, GDT_HA, and MolProbity Scores for the Initial 34 CASP11 Refinement Targets and the Performance of Our Protocol in Terms of ΔGDT_TS, ΔGDT_HA, and ΔMolProbity(a)
(Within each metric: start = score of the initial model; top = top-ranked model(a); new = new protocol (MD0); CASP11 = CASP11 protocol. All columns except start are Δ values. avg = average; improved = fraction of improvement, as the number of targets out of 34.)
        |  GDT_TS                          |  GDT_HA                          |  MolProbity
target  |   start     top     new  CASP11 |   start     top     new  CASP11 |   start     top     new  CASP11
TR217   |   82.62    2.86    3.21    1.55 |   64.40    3.10    5.12    2.39 |   3.208  −1.382  −1.734  −1.688
TR274   |   45.90    2.32    0.14   −1.23 |   28.96   −1.23    0.00   −1.91 |   2.477   1.009  −1.259  −1.670
TR280   |   78.39    7.55    2.86    1.04 |   59.38   11.71    4.16    1.56 |   2.528  −1.733  −1.596  −1.712
TR283   |   62.82    1.92    0.96    1.92 |   41.19    3.36    1.12    3.36 |   2.677  −1.744  −1.595  −1.744
TR759   |   62.90   13.71    5.65    0.81 |   43.95   14.52    6.05    1.61 |   2.297  −1.797  −1.318  −1.585
TR760   |   75.12    1.75   −0.99    0.13 |   57.34   −0.13   −0.62    0.50 |   2.790  −1.990  −1.415  −1.767
TR762   |   85.60    1.85    0.10   −0.39 |   70.43    0.97   −0.68   −0.68 |   1.785   0.508  −0.922  −0.932
TR765   |   77.30   11.52    4.94    1.98 |   57.89   19.74    4.61    1.98 |   1.231  −0.352  −0.468  −0.241
TR768   |   80.94    5.42    2.80    0.70 |   63.81    8.57    3.85    1.05 |   1.674  −0.028  −0.625  −0.581
TR769   |   79.38    4.64    4.38    0.77 |   58.51    5.92    4.38    0.25 |   0.808  −0.308  −0.099  −0.266
TR772   |   70.58    1.39    1.01    0.00 |   52.40    1.01    0.63   −0.51 |   2.634  −0.857  −0.853  −1.114
TR774   |   57.67    2.50    0.16   −0.50 |   37.67    3.16    1.00    0.50 |   3.528  −1.547  −2.099  −2.466
TR776   |   82.19    2.17    1.94    1.71 |   62.79    6.50    4.79    3.88 |   0.924   0.085   0.309   0.062
TR780   |   74.21    4.74    2.11    0.79 |   53.95    5.26    1.84    0.79 |   2.436  −1.530  −1.162  −1.429
TR782   |   83.18    4.32    1.14    0.23 |   64.77    8.64    2.73    0.68 |   1.381  −0.417  −0.737  −0.632
TR783   |   77.78    2.78    2.57    0.41 |   57.30    5.05    5.97    1.14 |   3.334  −2.020  −1.967  −2.385
TR786   |   69.01    4.15    1.73    1.73 |   47.81    5.88    2.07    1.96 |   1.808  −0.984  −0.826  −1.018
TR792   |   78.44   11.87    1.25    0.93 |   58.13   13.12    1.87    0.93 |   1.614  −0.195  −1.114  −1.114
TR803   |   52.05    4.48    2.05    1.49 |   33.21    4.85    2.80    1.68 |   2.376  −1.006  −1.266  −1.334
TR810   |   71.56    2.33    1.33    1.33 |   53.89    5.67    2.55    2.33 |   1.991  −1.491  −0.763  −1.059
TR811   |   90.24    1.00    0.00    0.40 |   73.21    3.98    0.99    1.49 |   1.163  −0.465  −0.327  −0.515
TR816   |   71.69   12.13    1.84    0.37 |   52.21   11.03    1.47   −0.37 |   1.342   1.158  −0.705  −0.842
TR817   |   75.75   10.57    0.67   10.57 |   58.77    7.55    1.61    7.55 |   2.899  −1.424  −1.643  −1.424
TR821   |   69.71   12.45    3.72    2.05 |   48.33   13.83    6.28    2.75 |   1.847  −1.204  −0.976  −1.204
TR822   |   53.29    6.36    3.73    1.53 |   30.26    7.02    4.61    1.54 |   5.045  −1.646  −3.014  −3.495
TR823   |   61.81    3.90    2.95    1.30 |   40.54    5.99    3.99    1.91 |   1.228   0.162  −0.158  −0.534
TR827   |   56.74    6.60    0.51    0.25 |   33.94    6.60    0.52   −0.13 |   2.487  −0.570  −1.279  −1.987
TR829   |   66.79    6.72   −0.75   −0.37 |   50.37    6.35   −1.12   −0.37 |   1.537   0.116   0.033  −0.485
TR833   |   76.16    5.09    1.39    0.92 |   61.34    3.24    2.78    1.39 |   2.299  −1.799  −1.648  −1.799
TR837   |   65.91    2.69   −0.41    0.00 |   43.39    2.68   −0.21    0.00 |   2.054  −0.308  −0.485  −0.289
TR848   |   75.54    6.16    3.26    0.55 |   58.15    5.80    3.99    1.09 |   2.468  −1.333  −1.572  −1.410
TR854   |   78.21    3.22    0.36   −0.35 |   58.57    6.07    2.50    0.00 |   1.517  −0.002  −0.546  −0.710
TR856   |   79.72    0.94   −0.79   −0.16 |   61.48   −0.16   −0.79   −1.73 |   1.969   0.129  −0.224  −0.669
TR857   |   55.73    4.69    0.26   −0.52 |   33.07    4.17   −0.26   −0.52 |   2.046  −0.597  −0.845  −1.125
avg     |   71.32    5.20    1.65    0.94 |   52.10    6.17    2.37    1.12 |    2.16  −0.752  −1.026  −1.211
improved|           34/34   30/34   27/34 |           31/34   28/34   26/34 |           27/34   32/34   33/34
(a) Comparison is also made with the top-ranked models (structures from the CASP evaluation site; each top-ranked model is the one with the highest GDT_TS score among all models listed in the target evaluation table) and with the models generated by our CASP11 protocol. ΔScore = Score_refined − Score_initial. For ΔGDT_TS and ΔGDT_HA, a positive value means improvement; for ΔMolProbity, a negative value means improvement.
improved almost all targets in this metric, the exceptions being TR776 and TR829. The average ΔMolProbity (−1.03) was better than the average of the top-ranked models (a MolProbity improvement of 0.75). Using MD0, the average GDT_TS score improved by 1.65, and 30 of the 34 targets improved. Similarly, the average GDT_HA score improved by 2.37, and 28 of the 34 targets improved. Although the average GDT_TS and GDT_HA improvements are not as large as those of the top-ranked models, MD0 is much more effective than our CASP11 protocol. For target TR217 (see Figure 2), MD0 generated a better-quality model than the top-ranked model; for target TR783 (see Figure 2), MD0 scored almost 1.0 GDT_HA unit higher than the top-ranked model. Finally, for target TR769, MD0 improved GDT_TS by 4.38, close to the improvement of the top-ranked model (see Figure 2).
The most successfully refined regions are mostly well-defined α-helices and β-strands. For example, TR769 contains 4 β-strands and two relatively long helices. After refinement, the N-terminal part (a β-sheet and an α-helix) moved closer to the native structure (see Figure 2), while the rest of the protein remained more or less unchanged. In terms of GDT_HA, the new protocol produced better structures for TR856 (see Figure 3) than the top-ranked model. For TR274 (see Figure 4), all CASP11 prediction groups failed to improve the GDT_HA score, whereas MD0 at least did not degrade it (ΔGDT_HA = 0.00; Table 9). Compared with TR769, the structure of TR274 is much more complex, with many long loops mixed with several strands of various lengths. Improving the structural quality of this kind of protein model remains a difficult challenge in computational protein modeling.
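The MolProbity score tracked in Tables 8 and 9 combines the clashscore, the percentage of side-chain rotamer outliers, and the percentage of residues outside the favored Ramachandran regions into a single number for which lower is better. The sketch below uses the combination formula as we recall it from the MolProbity literature; treat the coefficients as an assumption to be verified against the MolProbity documentation before use. It also shows why the text calls 0.5 the perfect score: with no clashes and few outliers, every logarithmic term vanishes and only the constant remains. The function names are hypothetical.

```python
import math


def molprobity_score(clashscore, rotamer_outlier_pct, rama_not_favored_pct):
    """Combined MolProbity score; lower is better and 0.5 is the floor.

    ASSUMPTION: coefficients reproduced from the published definition as we
    recall it; verify them against the MolProbity documentation before use.
    """
    return (0.426 * math.log(1 + clashscore)
            + 0.33 * math.log(1 + max(0.0, rotamer_outlier_pct - 1))
            + 0.25 * math.log(1 + max(0.0, rama_not_favored_pct - 2))
            + 0.5)


def delta_molprobity(refined, initial):
    """ΔMolProbity = MolProbity_refined − MolProbity_initial (< 0 is better)."""
    return refined - initial


# Usage: a clash-free model with few outliers sits at the 0.5 floor, so a
# starting model already at or near 0.5 (e.g. TR710) has little room to improve.
print(molprobity_score(0, 0.5, 1.0))         # 0.5
print(molprobity_score(10, 3.0, 5.0) > 0.5)  # True
```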
Figure 3. Refinement in target TR856. The native reference is shown in dark salmon, the initial model in pale green, and the model refined by the new protocol in light blue. Residues 75−79 (successfully refined) and 123−130 (unsuccessfully refined) are highlighted in saturated colors.
Are the Minimized Average Structures Realistic? As discussed above, the MolProbity scores of our refined structures were consistently better than those of the given models, which shows that their local quality (clashscore, side-chain rotamer outliers, and percentage of backbone Ramachandran outliers) improved. However, averaging multiple structures can always yield a tangled structure that simple energy minimization cannot resolve. Because of this concern, we visually inspected all of our refined structures, and none of our models suffered from such defects. This is mainly because all models generated during the MD simulation were subjected to identical backbone restraints; consequently, the backbone variation was not large enough to produce tangled structures. Side-chain variation was relatively larger, but energy minimization was sufficient to eliminate any tangles. All of our CASP refinement models are available for inspection from the CASP homepage. As another test of whether our refined model MD0 is tangled or volume-compressed, we compared the 1−4 VDW terms of MD0 with those of the energy-minimized final structure of our 5 ns MD simulation. Since no structure averaging was performed in generating this minimized-MD structure, similar VDW terms for the two structures make it reasonable to assume that MD0 is neither tangled nor volume-compressed. We observed large energy drops at the very early steps of minimization of the MD0 structures, typically in the 1−4 VDW energy term. However, as shown in Supporting Information Table S16, after minimization all corresponding energy terms of MD0 and the minimized-MD structure, including the 1−4 VDW term, are similar to each other. The radii of gyration of MD0 and the minimized-MD structure are also very close (see Table S17). In summary, our refined MD0 model does not contain unphysical features such as tangled or compressed structures. Several factors helped keep our models "in good shape". First, the initial model was already of relatively good quality. Second, even with relatively weak restraints, the overall shape of the generated MD models was preserved during the entire simulation. Third, the relatively short MD simulation time prevented the structure from moving too far from the initial structure. Fourth, energy minimization took care of potentially unphysical side chains in the averaged structure.
Figure 2. Successful refinement in targets TR217, TR783, and TR769. The native reference is shown in dark salmon, the initial model in pale green, and the model refined by the new protocol in light blue. Successfully refined regions are highlighted in saturated colors.
■
CONCLUSIONS In summary, we tried to improve the MD-only protocol of Princeton_Tigress as well as our CASP11 protocol by analyzing the
Figure 4. Refinement in target TR274. The native reference is shown in dark salmon, the initial model in pale green, and the refined model by the new protocol in light blue. Residues 185−216, 305−332, and 343−368 (long strands) are highlighted in saturated colors.
effects of distance restraints, solvent model, and trajectory averaging. Because of the enormous computational resources required, this could not be done in a fully systematic manner by investigating the effect of each component one by one, which remains a challenge for protein structure refinement. Here, we propose a simple and efficient protein structure refinement protocol, the CASP12 protocol, in which energy minimizations and a series of short MD simulations totaling 5 ns (two 0.5 ns equilibration simulations and one 4 ns production simulation) are carried out with weak positional and distance restraints in explicit water (see Figure 1 and Table 1 for details). Distance restraints were applied only to intrastrand Cα atom pairs. The final refined structure was obtained by averaging the 4000 structures from the entire 4 ns production run and then energy minimizing the average. When tested on 54 CASP8−10 targets and 34 CASP11 targets, the protocol improved the given protein 3D models consistently. The average GDT_TS improvement was 1.14 for the 54 CASP8−10 targets, better than that of the MD-only version of Princeton_Tigress (average gain of 0.40) and of our CASP11 protocol (average gain of 0.36). For the 34 CASP11 refinement targets, the average improvements in GDT_TS, GDT_HA, and MolProbity were 1.65, 2.37, and 1.03, respectively. Our method improves on existing methods in two ways. First, our protocol requires only a small fraction (a few thousandths) of the runtime of the method of Mirjalili and Feig (ref 17). Second, our procedure does not depend on the chain length or the quality of the given model, in contrast to Princeton_Tigress (ref 19). Unfortunately, we have not yet identified a metric or a clustering method that selects a better model than the averaged model of our protocol; this remains a challenge for protein structure refinement. While developing new sampling methods or scoring techniques is certainly one way to pursue refinement, we show that consistent refinement can be achieved by a simple and fast application of MD simulations with energy minimization.
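The final MD0 model is the energy-minimized average of the 4000 production frames, whereas the model-selection experiments averaged only a top-ranked fraction of frames (for example, the top 20% by ProQ2). The Python sketch below illustrates the averaging and selection steps, together with an unweighted radius of gyration like the one compared between MD0 and the minimized-MD structure (Table S17). It assumes that all frames already share a common reference frame (no superposition is performed) and leaves energy minimization to an MD package; all names are hypothetical.

```python
import math


def average_structure(frames):
    """Per-atom mean coordinates over a list of frames.

    Each frame is a list of (x, y, z) tuples in the same atom order.
    ASSUMPTION: frames already share a common reference frame; no
    superposition is performed here.
    """
    n_frames = len(frames)
    n_atoms = len(frames[0])
    return [
        tuple(sum(frame[i][k] for frame in frames) / n_frames for k in range(3))
        for i in range(n_atoms)
    ]


def top_fraction(frames, scores, fraction, lower_is_better=True):
    """Keep the best-scoring fraction of frames (e.g. 0.2 for "top 20%").

    Use lower_is_better=True for energies and False for scores such as
    ProQ2, where a higher value means a better model.
    """
    order = sorted(range(len(frames)), key=lambda i: scores[i],
                   reverse=not lower_is_better)
    keep = max(1, round(fraction * len(frames)))
    return [frames[i] for i in order[:keep]]


def radius_of_gyration(coords):
    """Unweighted radius of gyration (every atom counts equally)."""
    n = len(coords)
    center = tuple(sum(c[k] for c in coords) / n for k in range(3))
    return math.sqrt(sum(math.dist(c, center) ** 2 for c in coords) / n)


# Usage: the second atom fluctuates symmetrically about y = 0, so the
# averaged structure is more compact than either individual frame.
frame_a = [(0.0, 0.0, 0.0), (2.0, 1.0, 0.0)]
frame_b = [(0.0, 0.0, 0.0), (2.0, -1.0, 0.0)]
avg = average_structure([frame_a, frame_b])
print(avg)  # [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
print(radius_of_gyration(avg) < radius_of_gyration(frame_a))  # True
```

The toy usage shows why a compactness check is worthwhile: averaging pulls fluctuating atoms toward their mean positions, so an averaged structure can be more compact than any single frame, which is one reason averaged coordinates are energy minimized and compared against an unaveraged, minimized MD structure before use.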
■
ASSOCIATED CONTENT
Supporting Information
The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.jctc.7b00470. O
DOI: 10.1021/acs.jctc.7b00470 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX
Article
Journal of Chemical Theory and Computation
Quality (in terms of GDT_TS, RMSD, and GDT_HA) of the five cluster models generated by averaging the entire 4 ns simulation trajectory followed by energy minimization; by averaging the second 2 ns of the simulation trajectory followed by energy minimization; by averaging the top 20% ProQ2-ranked structures of the entire 4 ns simulation trajectory followed by energy minimization, together with the model generated by averaging the top 40% ProQ2-ranked structures of the second 2 ns of the trajectory followed by energy minimization; by averaging the top 40%, 20%, and 10% E_AMBER + E_DFIRE-ranked structures of the second 2 ns of the trajectory followed by energy minimization; and by averaging the top 40%, 20%, and 10% E_DFIRE-ranked structures of the second 2 ns of the trajectory followed by energy minimization. Energy decomposition and comparison of the MD0 and minimized-MD models for CASP10 targets; radius of gyration comparison of the MD0 and minimized-MD models for CASP10 targets; model evaluation results from the WHATCHECK program; ΔGDT_TS comparison of the MD0 and minimized-MD models for CASP10 targets; further details on the CASP11 top-ranked models; side-chain quality of the models in terms of χ1 and χ2 accuracy; and quality of the MD0 model in terms of side chains, with GDT_TS decomposition details (PDF)
■
AUTHOR INFORMATION
Corresponding Author
*E-mail: [email protected].
ORCID
Jooyoung Lee: 0000-0002-4432-6163
Funding
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MEST) (No. 2008-0061987). We would like to thank the KIAS Center for Advanced Computation for providing computing resources. This work was also supported by the National Institute of Supercomputing and Networking/Korea Institute of Science and Technology Information with supercomputing resources, including technical support (KSC-2014-C3-01).
Notes
The authors declare no competing financial interest.
■
REFERENCES
(1) Nugent, T.; Cozzetto, D.; Jones, D. T. Evaluation of Predictions in the CASP10 Model Refinement Category. Proteins: Struct., Funct., Genet. 2014, 82 (Suppl 2), 98−111.
(2) Moult, J. A decade of CASP: progress, bottlenecks and prognosis in protein structure prediction. Curr. Opin. Struct. Biol. 2005, 15, 285−289.
(3) MacCallum, J. L.; Pérez, A.; Schnieders, M. J.; Hua, L.; Jacobson, M. P.; Dill, K. A. Assessment of protein structure refinement in CASP9. Proteins: Struct., Funct., Genet. 2011, 79 (Suppl 10), 74−90.
(4) MacCallum, J. L.; Hua, L.; Schnieders, M. J.; Pande, V. S.; Jacobson, M. P.; Dill, K. A. Assessment of the protein-structure refinement category in CASP8. Proteins: Struct., Funct., Genet. 2009, 77 (Suppl 9), 66−80.
(5) Zhang, J.; Liang, Y.; Zhang, Y. Atomic-level protein structure refinement using fragment-guided molecular dynamics conformation sampling. Structure 2011, 19, 1784−1795.
(6) Gniewek, P.; Kolinski, A.; Jernigan, R. L.; Kloczkowski, A. Elastic network normal modes provide a basis for protein structure refinement. J. Chem. Phys. 2012, 136, 195101−195104.
(7) Bhattacharya, D.; Cheng, J. 3Drefine: consistent protein structure refinement by optimizing hydrogen bonding network and atomic-level energy minimization. Proteins: Struct., Funct., Genet. 2013, 81, 119−131.
(8) Heo, L.; Park, H.; Seok, C. GalaxyRefine: Protein structure refinement driven by side-chain repacking. Nucleic Acids Res. 2013, 41, W384−W388.
(9) Park, H.; Seok, C. Refinement of unreliable local regions in template-based protein models. Proteins: Struct., Funct., Genet. 2012, 80, 1974−1986.
(10) Park, H.; Ko, J.; Joo, K.; Lee, J.; Seok, C.; Lee, J. Refinement of protein termini in template-based modeling using conformational space annealing. Proteins: Struct., Funct., Genet. 2011, 79, 2725−2734.
(11) Lee, M. R.; Tsai, J.; Baker, D.; Kollman, P. A. Molecular dynamics in the end game of protein structure prediction. J. Mol. Biol. 2001, 313, 417−430.
(12) Fan, H.; Mark, A. E. Refinement of homology-based protein structures by molecular dynamics simulation techniques. Protein Sci. 2004, 13, 211−220.
(13) Chen, J.; Brooks, C. L., III. Can molecular dynamics simulations provide high-resolution refinement of protein structure? Proteins: Struct., Funct., Genet. 2007, 67, 922−930.
(14) Chopra, G.; Summa, C. M.; Levitt, M. Solvent dramatically affects protein structure refinement. Proc. Natl. Acad. Sci. U. S. A. 2008, 105, 20239−20244.
(15) Ishitani, R.; Terada, T.; Shimizu, K. Refinement of comparative models of protein structure by using multicanonical molecular dynamics simulations. Mol. Simul. 2008, 34, 327−336.
(16) Raval, A.; Piana, S.; Eastwood, M. P.; Dror, R. O.; Shaw, D. E. Refinement of protein structure homology models via long, all-atom molecular dynamics simulations. Proteins: Struct., Funct., Genet. 2012, 80, 2071−2079.
(17) Mirjalili, V.; Feig, M. Protein Structure Refinement through Structure Selection and Averaging from Molecular Dynamics Ensembles. J. Chem. Theory Comput. 2013, 9, 1294−1303.
(18) Mirjalili, V.; Noyes, K.; Feig, M. Physics-based protein structure refinement through multiple molecular dynamics trajectories and structure averaging. Proteins: Struct., Funct., Genet. 2014, 82 (Suppl 2), 196−207.
(19) Khoury, G. A.; Tamamis, P.; Pinnaduwage, N.; Smadbeck, J.; Kieslich, C. A.; Floudas, C. A. Princeton_TIGRESS: Protein geometry refinement using simulations and support vector machines. Proteins: Struct., Funct., Genet. 2014, 82, 794−814.
(20) Joo, K.; Joung, I.; Cheng, Q.; Lee, S. J.; Lee, J. Contact Assisted Protein Structure Modeling by Global Optimization in CASP11 Experiments. Proteins: Struct., Funct., Genet. 2016, 84, 189−199.
(21) Joung, I.; Lee, S. Y.; Cheng, Q.; Joo, K.; Lee, S. J.; Lee, J.; Kim, J. Y. Template Free Modeling by LEE and LEER in CASP11. Proteins: Struct., Funct., Genet. 2016, 84, 118−130.
(22) Joo, K.; Joung, I.; Lee, S. Y.; Kim, J. Y.; Cheng, Q.; Manavalan, B.; et al. Template Based Protein Structure Modeling by Global Optimization in CASP11 Experiments. Proteins: Struct., Funct., Genet. 2016, 84, 221−232.
(23) Šali, A.; Blundell, T. L. Comparative protein modelling by satisfaction of spatial restraints. J. Mol. Biol. 1993, 234, 779−815.
(24) Touw, W. G.; Baakman, C.; Black, J.; te Beek, T. A. H.; Krieger, E.; Joosten, R. P.; et al. A series of PDB-related databanks for everyday needs. Nucleic Acids Res. 2015, 43, D364−D368.
(25) Kabsch, W.; Sander, C. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 1983, 22, 2577−2637.
(26) Case, D.; Berryman, J.; Betz, R.; Cerutti, D.; Cheatham, T. E., III; Darden, T.; et al. AMBER 14; University of California, San Francisco, 2014.
(27) dos Reis, M. A.; Aparicio, R.; Zhang, Y. Improving Protein Template Recognition by Using Small-Angle X-Ray Scattering Profiles. Biophys. J. 2011, 101, 2770−2781.
(28) Nguyen, H.; Roe, D. R.; Simmerling, C. Improved generalized born solvent model parameters for protein simulations. J. Chem. Theory Comput. 2013, 9, 2020−2034.
(29) Jorgensen, W. L.; Chandrasekhar, J.; Madura, J. D.; Impey, R. W.; Klein, M. L. Comparison of simple potential functions for simulating liquid water. J. Chem. Phys. 1983, 79, 926−935.
(30) Joung, I. S.; Cheatham, T. E., III. Determination of alkali and halide monovalent ion parameters for use in explicitly solvated biomolecular simulations. J. Phys. Chem. B 2008, 112, 9020−9041.
(31) Zhang, Y.; Skolnick, J. Scoring function for automated assessment of protein structure template quality. Proteins: Struct., Funct., Genet. 2004, 57, 702−710.
(32) Koehl, P.; Levitt, M. A brighter future for protein structure prediction. Nat. Struct. Biol. 1999, 6, 108−111.
(33) Roe, D. R.; Cheatham, T. E., III. PTRAJ and CPPTRAJ: Software for Processing and Analysis of Molecular Dynamics Trajectory Data. J. Chem. Theory Comput. 2013, 9, 3084−3095.
(34) Ray, A.; Lindahl, E.; Wallner, B. Improved model quality assessment using ProQ2. BMC Bioinf. 2012, 13, 224.
DOI: 10.1021/acs.jctc.7b00470 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX