Balancing Simulation Accuracy and Efficiency with the Amber United

Feb 4, 2010 - We have analyzed the quality of a recently proposed Amber united-atom model and its overall efficiency in ab initio folding and thermody...
0 downloads 0 Views 3MB Size
2886

J. Phys. Chem. B 2010, 114, 2886–2893

Balancing Simulation Accuracy and Efficiency with the Amber United Atom Force Field Meng-Juei Hsieh and Ray Luo* Department of Molecular Biology and Biochemistry, UniVersity of California, IrVine, California 92697-3900 ReceiVed: July 15, 2009; ReVised Manuscript ReceiVed: December 16, 2009

We have analyzed the quality of a recently proposed Amber united-atom model and its overall efficiency in ab initio folding and thermodynamic sampling of two stable β-hairpins. It is found that the mean backbone structures are quite consistent between the simulations in the united-atom and its corresponding all-atom models in Amber. More importantly, the simulated β turns are also consistent between the two models. Finally, the chemical shifts on HR are highly consistent between simulations in the two models, although the simulated chemical shifts are lower than experiment, indicating less structured peptides, probably due to the omission of the hydrophobic term in the simulations. More interestingly, the stabilities of both β-hairpins at room temperature are similar to those derived from the NMR measurement, whether the united-atom or the allatom model is used. Detailed analysis shows high percentages of backbone torsion angles within the β region and high percentages of native contacts. Given the reasonable quality of the united-atom model with respect to experimental data, we have further studied the simulation efficiency of the united-atom model over the all-atom model. Our data shows that the united-atom model is a factor of 6-8 faster than the all-atom model as measured with the ab initio first pass folding time for the two tested β-hairpins. Detailed structural analysis shows that all ab initio folded trajectories enter the native basin, whether the united-atom model or the allatom model is used. Finally, we have also studied the simulation efficiency of the united-atom model as measured in terms of how fast thermodynamic convergence can be achieved. It is apparent that the unitedatom simulations reach convergence faster than the all-atom simulations with respect to both mean potential energies and mean native contacts. These findings show that the efficiency of the united-atom model is clearly beyond the per-step dynamics simulation of about 2 over the all-atom model. Thus, reasonable reduction of a protein model can be achieved with improved sampling efficiency while still preserving a high level of accuracy for applications in both ab initio folding and thermodynamic sampling. This study motivates us to develop more simplified protein models with sufficient consistency with the all-atom models for enhanced conformational sampling. 1. Introduction Ab initio protein folding has remained a challenging problem in biophysical chemistry due to the enormous sampling space and the atomistic resolution that are needed for consistent accuracy in theoretical prediction. The sampling space of typical protein domains (aka ∼200 residues) is exponentially large. If each residue can adopt only two possible conformations, a protein domain of 200 residues can adopt a total of 1.6 × 1060 possible conformations, which cannot be adequately sampled at the atomistic resolution within reasonable time. Quite often, accuracy at the atomistic resolution is a must for further functional analyses of proteins. Addressing the challenges in sampling and accuracy requires significant computational resources. Increasing computing power has made it possible to simulate ab initio folding of small proteins,1 but it has remained impractical for typical protein domains. The computational difficulties in ab initio protein folding can be roughly grouped into two categories: overcoming enthalpic barriers (i.e. how to escape from local minima) and overcoming entropic barriers (i.e. how to sample the exponentially large conformational space with reasonable scalability). Most previous developmental efforts have been focused on crossing enthalpic barriers. Monte Carlo-based methods, including simulated annealing,2 and genetic algorithms,3 are widely * Corresponding author. E-mail: [email protected].

used.4-12 This is across different protein models, either knowledgebased or physics-based potentials, although its application to all-atom potentials based on molecular mechanics force fields is challenging. Potential energy surface smoothing is also widely used. Various methods, such as the diffusion equation method,13 Gaussian “coarse-graining”,14 and packet annealing,15,16 have been proposed. Smoothing potential energy surface, in principle, reduces the number of minima, thus reducing the number of energy barriers. With the hope that the global minimum of the deformed potential energy surface shares a similar topology with the global minimum of the original surface,14 the resulting global minimum is then mapped back to the original surface by a reversing procedure. However, the reversing procedure has been a major challenge in these approaches.17 Generalized-ensemble methods, such as a multicanonical algorithm, simulated tempering, and replica-exchange methods, have recently emerged to be effective conformational sampling methods.18-46 Their applications to ab initio protein folding simulation have also been reported. The multicanonical algorithm relies on the idea of artificially eliminating energy barriers, thereby circumventing the problems associated with the traditional sampling methods. In simulated tempering, temperature becomes a dynamical variable that takes values ranging over a

10.1021/jp906701s  2010 American Chemical Society Published on Web 02/04/2010

Accuracy, Efficiency of Amber UA Force Field definite set. In this method, it is possible to utilize the fact that at higher temperatures, potential energy barriers are effectively lower. Metadynamics methods, such as self-guided Langevin dynamics47 and adaptively biased molecular dynamics,48 have also been used to enhance the sampling efficiency. Many other methods have been proposed to enhance conformational sampling for systems of high dimensionality. It seems that overcoming entropic barriers is much harder without sacrificing the resolution of protein models. Previous approaches can be roughly grouped into two categories. The first strategy utilizes discrete conformational space, such as the use of lattice models9,49,50 or discrete torsion-angle values in Monte Carlo sampling,7 to overcome entropic barriers. The second strategy aims at reducing the dimensionality of sampling space; this leads to reduced protein models at residue-level representations. The applications of reduced protein models to ab initio protein structure predictions are very encouraging;8,10 These models have also been used to understand the physical principles in protein folding.51-59 The reduced models can use only one particle per residue with either protein-specific Goj potentials derived from specific protein structures or statistical potentials derived from a database of high-quality native structures.60,61 More complex reduced protein models represent each residue by more than one particle.62-67 Compared with the minimal models with one particle per residue, these reduced models improve the predictive power with a better description of the side-chain geometry and interactions. These models also perform gracefully well when combined with postrefinement using other atomistic models68-70 in ab initio protein folding simulations. Nevertheless, use of reduced models at lower resolutions apparently results in low-resolution theoretical predictions. Similarly, use of discrete space results in limited coverage of sampling space that in turn also leads to lowresolution predictions. Given these limitations, new strategies, such as resolution or Hamiltonian exchange,38-45 can be utilized to couple high-resolution all-atom models for accuracy and lowresolution reduced models to enhancing sampling efficiency without sacrificing the much needed all-atom accuracy during simulations. The reduction of model complexity can also be achieved by using implicit solvents because the bottleneck of all-atom simulations lies in the expensive solvent-solvent interactions, although there are still issues remaining in their development, especially when they are compared with explicit solvents.71,72 With the growing popularity of implicit solvents in protein simulations, the advantages of united-atom models become apparent because these models can be used to achieve modest reduction of complexity without much sacrifice of atomistic accuracy. Indeed, it is subject to debate whether it is necessary to represent hydrogen atoms in atomistic simulations when we have given up explicit representation of solvent molecules. The idea of using united-atom models for efficient simulations goes back to the 1970s when Dunfield et al. developed the UNICEPP force field.73 In UNICEPP, nonpolar hydrogens are not represented explicitly, but are included implicitly by representing nonpolar carbons and their bonded hydrogens as single particles.73 Compared with all-atom models, the advantages in using united-atom models are apparent even if only raw efficiency gain in simulations is considered. First, they can significantly reduce the size of most problems, since roughly half of the atoms in biological or other organic macromolecules are hydrogens. Thus, there are fewer nonbonded interactions and internal degrees of freedom in united-atom models. Second,

J. Phys. Chem. B, Vol. 114, No. 8, 2010 2887 larger dynamics integration step sizes can be used by not including hydrogens, since their small mass requires a smaller time step for accurate integration. However, an often overlooked and more important advantage in adopting united-atom models is the efficiency gain in conformational sampling. First, the simpler energy model can reduce the noise of the potential energy (Ep) landscape. Second, with fewer degrees of freedom (N), the total energy (E) fluctuation is also higher, since σE/E is on the order of N-1/2.74 The reduced model may provide just enough fluctuation to overcome potential energy barriers between local minima. This is an added benefit for conformational sampling in molecular dynamics-based methods. It should be pointed out that these advantages become less apparent when the systems are solvated in explicit solvent. Earlier comparisons between the all-atom and the united-atom simulations show that the united-atom model is a satisfactory representation of internal vibrations and bulk properties of small molecules and short peptides.73,75 However, limitations were also revealed in previous studies:75 (1) explicit representation of hydrogens was found to be necessary for accurate treatment of hydrogen bonding; (2) π-stacking could not be represented without including hydrogens in aromatic groups explicitly; and (3) dipole and quadrupole moments were found inaccurate when uniting hydrogens with polar heavy atoms. New approaches were found to overcome the limitations of united-atom models. For example, only aliphatic hydrogens, which are not significantly charged and do not participate in hydrogen bonds, are represented as united atoms, whereas other hydrogens are represented explicitly. In this way, the limitations of the unitedatom model are partially mitigated while preserving most of the benefits of the united-atom model. Of course, larger dynamics time steps can no longer be used due to the use of polar and aromatic hydrogens. However, with increasing computing power, a factor of about 2 saving from using a larger time step becomes less important. Recently a new united-atom force field, termed “ff03ua” in the Amber package,76,77 was reported to achieve a high-level agreement with its all-atom counterpart, “ff03”,78 in both structures and dynamics for tested proteins. A major difference of ff03ua with earlier united-atom models lies in its all-atom representation of the protein main chain. In addition, its parameter development was tightly coupled with that of its allatom counterpart, ff03. Our previous study showed that the new united-atom model was more efficient than the ff03 all-atom force field for the tested system of ALA18 in the distancedependent dielectric, which is known to fold into a stable helical structure.76 In this study, we have investigated the overall simulation efficiency of ff03ua for ab initio folding simulations of the more challenging β-hairpins and in the more realistic generalized-Born implicit solvent treatment. 2. Methods 2.1. Model Molecules. To examine the overall sampling efficiency of the new united-atom (UA) model over the allatom (AA) model, a set of model molecules was used as benchmarks. Due to the number of independent trajectories needed for a meaningful comparison and our limited computing resources, we chose two short but stable β-hairpin peptides designed for protein folding studies. These are HP5w4 and HP5F.79 Their sequences are shown in Table 1. Both the UA and the AA models for the two β-hairpins were built with the LEaP program in Amber 9.77 All initial structures were relaxed with a brief steepest descent minimization of 1000 steps. 2.2. Simulation Methods. The generalized-Born implicit solvent80 with the default options in Amber was used to treat

2888

J. Phys. Chem. B, Vol. 114, No. 8, 2010

TABLE 1: Sequences of the Two Tested β-Hairpin Peptides HP5w4 and HP5F molecule

sequence

HP5w4 HP5F

KKWTWNPATGKWTWQE KKYTWNPATGKFTVQE

polar solvation. Our previous analysis of solvent models shows that the classical SA term of nonpolar solvation overstabilizes the hairpin in both GB and PB solvents;81 thus, the nonpolar solvent accessible surface area term was turned off. In addition, it is highly inaccurate when compared with the TIP3P solvent.72 All nonbonded interactions were computed without cutoff. SHAKE82 was used to constrain bonds containing hydrogen atoms. The SANDER program in Amber 9 was used to perform Langevin dynamics simulation at 300 K with a friction constant of γ ) 1 ps-1. Four sets of trajectories at 300 K and one set of trajectories at 280 K were collected for each combination of the two tested peptides and the two models. The first set, termed the “native” set, consists of 30 independent trajectories (with different random seeds for initial velocity assignments) starting from the native conformation. The native set was used to analyze the folded populations of the peptides and was also used as reference for the following ab initio simulations. Each trajectory was run for at least 200 ns or until convergence of chemical shift deviation was reached, as will be discussed below. The second set, termed the “ab initio” set, consists of 20 independent trajectories with randomized initial structures and different random seeds for up to 600 ns. The ab initio set was used to study the ab initio folding efficiency. The randomized initial structures for the ab initio set were generated from the saved snapshots (every 5 ps) of a high-temperature Langevin dynamics simulation at 600 K from the fully extended all-trans conformation. The third and fourth sets, termed the “ergo1” set and “ergo2” set, respectively, were used for studying the thermodynamic sampling efficiency, as will be discussed below. Each set consists of 100 independent trajectories of 20 ns. The randomized initial structures for the ergo1 set were generated from the saved snapshots (every 5 ps) of a high-temperature Langevin dynamics simulation at 600 K from the native conformation. The randomized initial structures for the ergo2 set were generated from a high-temperature Langevin dynamics simulation from the fully extended all-trans conformation. Finally the fifth set, termed “native280k” set, were collected at 280 K. It consists of 40 independent trajectories and is set up in the same way as the native set. Note the number of trajectories was increased to ensure good convergence at the lower simulation temperature. The native280k set was used as a reference to analyze the folded populations of the peptides at 300 K and the native structures, as will be discussed below. Each trajectory was run for at least 200 ns or until convergence of chemical shift deviation was reached, as will be discussed below. 2.3. Native State Analysis. The quality of the UA model was first studied by comparing the native structures simulated in the UA and AA models. All structural analyses were conducted on the native280k set simulated at 280 K, which gives more ordered structures, although there is no NMR structure available for either peptide. Native structural analysis consists of a backbone analysis of the mean structures from the UA and AA simulations and a more detailed β-turn analysis and classification based on the main chain φ/ψ angles.83 In addition,

Hsieh and Luo we compared computed and measured chemical shifts on all main chain HR protons for both tested peptides. The quality of the UA model was then studied by comparing the populations of folded conformations at 300 K from both simulation and experiment. In the NMR experiment, the folded population at a given temperature (300 K) was estimated by assuming a linear relationship between the folded population and the chemical shift deviations (CSD).79 Thus, if we assume that a peptide is ∼100% folded and its CSD reaches maximum at a reference low temperature (280 K),79 its folded population at 300 K can be estimated by the ratio between the CSD at 300 K and that at 280 K.79 Thus, to compute the folded population for a peptide, two CSDs are needed, one at 280 K and one at 300 K. The CSD at 300 K was computed using the native set at 300 K described above. The CSD at 280 K was computed using the native280k set at 280 K. All trajectories were continued until the CSD is converged, as described below. In this study, we followed the strategy used in the NMR experiment in selecting chemical shifts for the analysis of folded populations of both peptides.79 That is, to select those chemical shifts with the most significant CSDs on the beta strands in the reference simulations at 280 K. Specifically, we have selected those HR chemical shifts with the most significant CSDs for at least three out of the four simulations (two peptides by two models). This leads to the choice of HR chemical shifts at residues 5, 6, 13, and 14. Specifically, the sum of absolute CSD values at the four sites, ∑|CSD|, were used in the folded population estimation for each peptide: 100 × ∑|CSD|300K/ ∑|CSD|280K. Finally, simulated chemical shifts were computed with the SHIFTS84 program by Case and co-workers (http:// casegroup.rutgers.edu/qshifts/qshifts.htm). The CSDb database by Andersen and co-workers (http://andersenlab.chem.washington.edu/CSDb/) was then used to estimate CSD as in the NMR experiment.85 The convergence of CSD is not trivial, even if 30-40 independent trajectories were used for each peptide/each model. It was found that at least 200 ns per trajectory was necessary to observe converged ∑|CSD| for the tested peptides in both the UA and AA models. Specifically, it took up to 300 ns per trajectory in the AA model to converge. To monitor convergence, the cumulative averages of ∑|CSD| were monitored every 20 ns for the last 120 ns at both 280 and 300 K (Figure 1). The cumulative averages of ∑|CSD| for the last 120 ns were then used to compute the folded population for each peptide/each model. In addition to the analysis of folded populations, we also analyzed the main chain φ/ψ distributions of the intended β-strand portions of the peptides (residues 3-6 and 11-14), native contact distributions, and salt bridge distributions in the 300 K simulations. Native contacts are defined as CR/CR pairwise contacts at least three residues away in sequence and are within the cutoff of 6.5 Å in the mean structure.86 The following native contacts were identified and monitored (residue pairs): 3/13, 3/14, 4/12, 4/13, 5/11, 5/12, 6/10, 6/11, 6/12, 6/9, and 7/10. Since there is only one possible type of salt bridge in the two peptides (i.e., that between residues LYS and GLU) identification of a salt bridge is quite straightforward: the two residues are in a salt bridge if the atomic distance between LYS/ NZ and GLU/CD is within the cutoff of 4.0 Å. 2.4. Ab Initio Folding Analysis. The first-pass folding times were first used to quantify the sampling efficiency of UA and AA models. The first-pass folding time is defined as the simulation time needed to reach the native basin. Three criteria are used to detect whether a trajectory reaches the native basin:

Accuracy, Efficiency of Amber UA Force Field

J. Phys. Chem. B, Vol. 114, No. 8, 2010 2889

Figure 2. Comparison of simulated mean structures in the UA and AA models. Only main chain atoms are shown. The snapshots chosen are closest to the mean structures of the last 40 × 100 ns in the native280k set simulation. The UA models are colored in light orange, and the AA models are colored by atom types.

Figure 1. Top: Cumulative averages of ∑|CSD| monitored every 20 ns for the last 120 ns per trajectory in both the 280 and 300 K simulations starting from the native structure. Forty independent trajectories were used for each peptide/each model at 280 K. Thirty independent trajectories were used for each peptide/each model at 300 K.

(1) the running average potential energy should reach within one standard deviation from the mean potential energy of the reference native set; (2) the backbone rmsd from the mean structure of the reference native set is less than 1 Å; and (3) the peptide has to stay at the native basin so defined in criteria 1 and 2 for at least 5 ns. 2.5. Thermodynamic Convergence Analysis. The sampling efficiency of the UA model can also be measured in terms of how fast convergence of observables can be achieved. Here, cumulative averages of potential energies and native contacts are monitored to see how fast the UA and AA models can converge to the same mean values. It should be pointed out that convergence of cumulative averages is a necessary, but not a sufficient, condition for the ergodic hypothesis to be satisfied.38,87,88 To quantify the convergence efficiency of the UA and AA models on the two tested peptides, we simulated two sets (ergo1 and ergo2) of 100 independent trajectories for each peptide and each model. Since the time-dependent accumulative averages for the two different sets of trajectories, in principle, converge to the same values, a faster convergence would indicate a faster conformational sampling. Specifically, we have monitored the time-dependent relative deviations for both mean potential energies and mean native contacts for both tested peptides in both models to investigate the convergence efficiency of the UA and AA models. 3. Results and Discussion The first goal of this study is to see if the UA model is good enough by comparing it with both the simulation data in the AA model and the experimental data from NMR. Second, we want to know whether the UA model is more efficient than the AA model under identical simulation conditions. We answered the question by kinetic analysis and then by thermodynamic analysis. 3.1. Quality of United-Atom Model. We first analyzed simulated mean structures in the UA and AA models at the low simulation temperature of 280 K. Here, the closest-to-the-

Figure 3. Comparison of β turns in the simulated mean structures in the UA and AA models. The snapshots chosen are closest to the mean structures of the last 40 × 100 ns in the native280k set simulation. The UA models are colored in light orange, and the AA models are colored by atom types.

TABLE 2: β-Turn Types in the Simulated Mean Native Structures at 280 Ka φi+1 ψi+1 φi+2 ψi+2 type

UA HP5w4

AA HP5w4

UA HP5F

AA HP5F

-78.62 -37.20 -87.02 -30.28 I

-73.27 -35.00 -90.27 -17.20 I

-78.77 -37.22 -80.51 -35.23 I

-74.88 -34.54 -97.59 -29.74 I

a The snapshots closest to the mean structures of the last 40 × 100 ns in the native280k set simulation were used in the analysis. The β-turn types were determined by the φ/ψ dihedral angles for residues i + 1 and i + 2, given i to be the first residue of the β-turn and i + 3 to be the last residue of the β-turn.83

mean snapshots in the last 100 ns of the native280k set were used to compare the backbone structures simulated in the UA and the AA models. The superimposed mean structures of the UA and AA models are shown in Figure 2. It was found that the backbone rmsd is 0.37 Å for HP5F and 0.44 Å for HP5w4. Overall, a good agreement in the mean backbone structures can be observed. We next analyzed the simulated β turns in more detail using the same structures shown in Figure 2. Comparison of these snapshots shows that the turn region (P7-G10) between the UA and AA simulation is quite similar (Figure 3). In addition, the β turn types of the tested peptides are “type I” in all simulations (Table 2). This agrees with the prediction based on the knowledge-based β turn potential compiled by Hutchinson and Thornton.83,89 This is also consistent with the template structure from protein G (pdb code: 1pgb) that was used to design both peptides and the related TRPZIP4 peptide (pdb code: 1le3).

2890

J. Phys. Chem. B, Vol. 114, No. 8, 2010

Hsieh and Luo

TABLE 3: Folded Populations of Two Simulated β-Hairpins at 300 K HP5w4 HP5F

UA

AA

NMR

96 ( 4% 89 ( 2%

>99 ( 4% 91 ( 5%

>96% 82 ( 4%

We also compared all HR chemical shifts from simulation and experiment. It is found that the correlations between the UA and AA simulations are excellent: 0.96 for HP5F and 0.94 for HP5w4. The root-mean-square deviations between the UA and the AA simulations are ∼0.12 ppm for both peptides. The correlations between both simulations and experiment are lower but still reasonable: 0.77-0.80 for HP5F and 0.57-0.63 for HP5w4, considering that the chemical shifts are distributed in a very narrow range from 3 to 5 ppm. The root-mean-square deviations between simulation and experiment are 0.31-0.36 ppm for HP5F and 0.49-0.51 ppm for HP5w4. Overall, the chemical shifts are underestimated in simulation, indicating both peptides are less structured in both AA and UA simulations, probably due to the omission of the hydrophobic term in the simulations. Given the overall agreement in the simulated native structures in the UA and AA models, we further compared folded populations (at 300 K) of the two β hairpins in the UA model with those in the AA model and in experiment.79 It can be found that both UA and AA simulations produce percentages of β-hairpin similar to experiment as measured by CSD (Table 3), with HP5W4 being more stable in both simulation and experiment. In addition, the simulations in both the UA and AA models are quite consistent with each other. Next, we analyzed populations of secondary structures, native contacts, and salt bridges in the native set simulation at 300 K. It would be interesting to quantify β strand populations in both the UA and AA models. Unfortunately for β hairpins this short, the conformations are too dynamic for the DSSP algorithm90 or other similar measure to pick up noticeable β strand populations. Thus, we turned to analyze the φ/ψ dihedral angle distributions to study the consistency between the UA and AA models in the simulations of the two peptides. The convergence of the distributions was monitored by checking the timedependent values for every consecutive block of 50 ns up to the 300 ns simulated. Figure 4 shows that both peptides stay in the β region for ∼67% of the time in the UA model and ∼61% in the AA model. It is interesting to note that the UA model favors the β region slightly more than the AA model, even if the backbone charges and torsion angle terms are identical between the two. Nevertheless, the difference is, indeed, very small in terms of the free energy, ∼0.1 kT, which can be easily contributed by the difference in the side chains between the two models. Figure 5 shows that the majority of the populations are with high fractions of native contacts for both peptides in both models. Overall agreement between the UA and AA models can be observed, although difference also exists. For example, HP5w4 appears to have more native contacts in the UA model than in the AA model. The mean native contact populations, 82% in the UA model versus 61%, correspond to a difference of ∼0.3 kT in terms of free energy, which is larger than the difference in the backbone β region distributions. Finally, Figure 6 shows that the salt bridge populations are consistently low in both peptides in both models. 3.2. First-Pass Folding Times. Given the reasonable quality of the UA model with respect to the experimental data, we went ahead to study the efficiency gain of the UA model over the AA model. Here, the first pass folding times for the two peptides

Figure 4. φ/ψ distributions of the eight β-strand residues during the last 30 × 100 ns of the native set at 300 K. The partition of φ/ψ distribution is from Hu et al.91

Figure 5. Distributions of native contact fractions of the last 30 × 100 ns in the native set simulation at 300 K.

Figure 6. Populations of all salt bridges of the last 30 × 100 ns in the native set simulation at 300 K.

were analyzed. As defined in the Methods section, a folding event into the native basin consists of both potential energy falling in one standard deviation from the mean potential energy of the reference native set simulations and main chain rmsd within 1 Å from the mean structure of the reference native set simulations. Figure 7 shows the ab initio simulations of HP5w4 for both UA and AA models for the first 60 ns simulated. It

Accuracy, Efficiency of Amber UA Force Field

J. Phys. Chem. B, Vol. 114, No. 8, 2010 2891

Figure 7. Potential energy and rmsd running averages verses simulation time for the ab initio simulations of HP5w4 at 300 K. In the potential energy plot, magenta lines indicate the averages of the reference simulations, and cyan lines indicate the standard deviations. Magenta lines in the rmsd plot represent the rmsd cutoff for folding events.

Figure 8. Potential energy and rmsd running averages verses simulation time for the ab initio simulations of HP5F at 300 K. In the potential energy plot, magenta lines indicate the averages of the reference simulations, and cyan lines indicate the standard deviations. Magenta lines in the rmsd plot represent the rmsd cutoff for folding events.

can been seen that 6 ab initio folding events out of 20 independent trajectories occur during the 60 ns simulated for the UA model, whereas there are only 3 ab initio folding events out of 20 independent trajectories during the same amount of simulation time. Similarly, for HP5F in Figure 8, there are 6 ab initio folding events up to 60 ns and 4 events in the AA model within the same amount of simulation time. All ab initio folding simulations were continued until the 11th ab initio folding event in UA and AA was reached. Specifically, 200 ns was simulated in UA and 600 ns was simulated in AA for HP5w4, respectively. For HP5F, 140 ns was simulated in UA and 600 ns was simulated in AA, respectively. All available

Figure 9. Representative snapshots for the average folded conformations in ab initio folding simulations in UA (left) and AA (right): a and b are for HP5w4, and c and d are for HP5F. A representative snapshot is defined as the closest snapshot to the mean folded conformation in all ab initio folding trajectories.

ab initio first pass folding times are also listed in Table 4. These data show that the median first pass folding time is 184.15 ns for HP5w4 in UA and 120.95 ns for HP5F in UA. The median first pass folding time is 442.30 ns for HP5w4 in AA and 481.60 ns for HP5F in AA. Thus, the efficiency of the UA model is clearly beyond the per-step dynamics simulation of about 2 over the AA model. We have also analyzed the ab initio folded mean structures for both peptides. The representative snapshots are shown in Figure 9. Apparently, all ab initio folded trajectories indeed enter the native β-hairpin basin, whether the UA model or the AA model is used. The testing data confirms our hypothesis that a reasonable reduction in the protein model can be achieved with noticeable improvement of sampling while still preserving a high level of accuracy for theoretical prediction in ab initio protein folding simulations. 3.3. Convergence of Cumulative Averages. The sampling efficiency of the UA model was also measured in terms of how fast thermodynamic convergence can be achieved. It should be pointed out that ergodic measures were proposed to assess the sampling efficiency.87,88 On the basis of the ergodic theorem, the time average of an observable is equal to the configuration space average for an ergodic system. Usually, in practice, squared variants of the observable between two different simulations can be calculated as ergodic measures.38 However, ergodic measures suitable for protein systems are yet to be developed. Thus, in this study, cumulative averages of potential energies and native contacts from different sets of independent trajectories were analyzed to see how fast the cumulative averages can reach the same values. Specifically, we have monitored the time-dependent relative deviations in cumulative

TABLE 4: Comparison of First Pass Ab Initio Folding Times (ns) between UA and AA Modelsa molecule UA AA UA AA a

HP5w4 HP5w4 HP5F HP5F

1

2

3

4

5

6

7

8

9

10

11

21.8 25.2 4.0 8.7

44.8 38.9 8.5 9.0

59.4 60.5 37.0 39.3

62.5 94.7 38.2 44.4

64.7 125.7 38.2 58.2

73.6 171.6 46.8 79.4

163.4 181.3 69.3 143.5

165.1 185.1 89.9 146.6

172.6 190.0 95.0 244.0

182.0 365.7 102.8 431.3

186.3 518.9 139.1 531.9

A total of 20 independent trajectories for each system in each model were simulated. The mean of the 10th and 11th first pass times of the folding set is defined as the median folding times.

2892

J. Phys. Chem. B, Vol. 114, No. 8, 2010

Hsieh and Luo added benefit for conformational sampling in molecular dynamics-based methods. Our findings of enhanced sampling in both the kinetic and the thermodynamic simulations in the more realistic continuum solvent are consistent with these arguments for higher sampling efficiency. Thus, the efficiency of the Amber UA model is clearly beyond the per-step dynamics simulation of about 2 over its counterpart AA model; it is about 6-8 times faster as measured with the first-pass folding times for the tested peptides. 4. Conclusions

Figure 10. Deviations of cumulative averages between two different sets of independent trajectories versus simulation time for the UA and AA simulations at 300 K. Upper: relative deviation (%) between the cumulative average potential energies of the two different sets. Lower: relative deviation (%) between the cumulative average native contacts of the two different sets.

averages for both tested peptides in both models to investigate their convergence efficiency. Recall that the two sets of 100 independent trajectories were started from two completely different sets of random structures. The time-dependent deviations in cumulative averages between the two sets of trajectories are shown in Figure 10, which clearly shows an earlier convergence in the cumulative averages of native contacts in both peptides when the UA model is used; the relative deviations decrease earlier and stay low longer. Note that the convergence of average potential energies is much faster than that of the average native contacts, with relative deviations approaching zero within just a few nanoseconds of simulations. Thus, the advantage of the UA model over the AA model is not very apparent. For example, the relative deviation comes down first in the AA simulation of HP5w4. However, the deviation goes higher only after 2 ns. In contrast, the relative deviation comes down to zero slower in the UA simulation of HP5w4, but it remains low throughout the rest of the 20 ns. 3.4. Discussion. With the growing popularity of implicit solvents in protein simulations, the advantages of UA models become apparent because such models can be used to achieve modest reduction of complexity without much sacrifice of atomistic accuracy. Indeed, it is subject to debate whether it is necessary to represent hydrogen atoms in atomistic simulations when we have given up explicit representation of solvent molecules. Apparently, UA models significantly reduce the size of most problems; that is, there are fewer nonbonded interactions and internal degrees of freedom in the UA models. More importantly, the efficiency gain in conformational sampling cannot be neglected in the UA models because there are fewer degrees of freedom. Aside from the smaller configurational space, the higher sampling efficiency can be attributed, in part, to the reduced noise in the potential energy landscape. Indeed, it is observed that the σEp/Ep for the two tested peptides is 3.58% in the UA model versus 6.57% for the AA model in the native set simulations at 300 K. The higher sampling efficiency may also be attributed to the higher fluctuation in total energy fluctuation from fewer degrees of freedom, since σE/E is on the order of N-1/2. This may provide just enough fluctuation to overcome potential energy barriers between local minima, an

In this study, we have analyzed the quality of the recently proposed Amber united-atom model and its overall efficiency in ab initio folding of two β-hairpins. It is found that the mean backbone structures are quite consistent between the unitedatom and the all-atom models. More importantly, the simulated β turns are also consistent between the two models. Finally, the chemical shifts on HR are highly consistent between simulations in the two models, although the simulated chemical shifts are lower than experiment, indicating less structured peptides, probably due to the omission of the hydrophobic term in the simulations. More interestingly, the percentages of both β-hairpins at room temperature are similar to those derived from measured CSD data, whether the united-atom or the all-atom model is used. Detailed analysis of simulated conformations shows high percentages of backbone torsion angles within the β region and high percentages of native contacts. Interestingly, all salt bridges are populated with low occupancy. Nevertheless, it is still apparent that salt bridge populations are consistent between the united-atom and all-atom models. Given the reasonable quality of the united-atom model with respect to both the all-atom model and experiment, we studied the efficiency gain of the united-atom model over the all-atom model. Analysis of the ab initio first pass folding time data shows that the median first pass folding time is 184.15 ns for HP5w4 and 120.95 ns for HP5F in the united-atom model, whereas the median first pass folding time is 442.30 ns for HP5w4 and 481.60 ns for HP5F in the all-atom model. Detailed structural analysis shows that all ab initio folded trajectories indeed enter the native β-hairpin basin, whether the united-atom model or the all-atom model is used. Finally, we have also studied the sampling efficiency of the united-atom model as measured in terms of how fast thermodynamic convergence can be achieved. It is apparent that the united-atom simulations may reach convergence faster with respect to both cumulative averages of potential energies and native contacts for both tested peptides. The testing data confirm our hypothesis that reasonable reduction of a protein model can be achieved with noticeable improvement of sampling while still preserving a high level of accuracy for theoretical prediction in ab initio protein folding simulations. This study motivates us to develop more simplified protein models to further enhance conformational sampling while still maintaining sufficient accuracy in ab initio protein folding simulations. Acknowledgment. This work is supported in part by NIH (GM069620 and GM079383). References and Notes (1) Duan, Y.; Kollman, P. A. Science 1998, 282, 740–744. (2) Kirkpatrick, S.; Gelatt, C. D.; Vecchi, M. P. Science 1983, 220, 671–680. (3) Holland, J. H. SIAM J. Comput. 1973, 2, 88–105.

Accuracy, Efficiency of Amber UA Force Field (4) Pedersen, J. T.; Moult, J. Curr. Opin. Struct. Biol. 1996, 6, 227– 231. (5) Lomize, A. L.; Pogozheva, I. D.; Mosberg, H. I. Proteins: Struct., Funct., Genet. 1999, 199–203. (6) Osguthorpe, D. J. Proteins: Struct., Funct., Genet. 1999, 186–193. (7) Samudrala, R.; Xia, Y.; Huang, E.; Levitt, M. Proteins: Struct., Funct., Genet. 1999, 194–198. (8) Scheraga, H. A.; Lee, J.; Pillardy, J.; Ye, Y. J.; Liwo, A.; Ripoll, D. J. Global Optimization 1999, 15, 235–260. (9) Skolnick, J.; Kolinski, A. Comput. Sci. Eng. 2001, 3, 40–50. (10) Standley, D. M.; Eyrich, V. A.; An, Y. L.; Pincus, D. L.; Gunn, J. R.; Friesner, R. A. Proteins: Struct., Funct., Genet. 2001, 133–139. (11) Baker, D. Abstr. Papers Am. Chem. Soc. 2001, 221, U432–U432. (12) Jones, D. T. Proteins: Struct., Funct., Genet. 2001, 127–132. (13) Piela, L.; Kostrowicki, J.; Scheraga, H. A. J. Phys. Chem. 1989, 93, 3339–3346. (14) Stillinger, F. H.; Stillinger, D. K. J. Chem. Phys. 1990, 93, 6106– 6107. (15) Church, B. W.; Oresic, M.; Shalloway, D. DIMACS Series in Discrete Mathematics and Theoretical Computer Science; American Mathematical Society: Providence, RI, 1996; Vol. 23; pp 41-64. (16) Shalloway, D. In Recent adVances in global optimization; Floudas, C. A., Pardalos, P. M., Eds.; Princeton University Press: Princeton, NJ, 1992; p 433-477. (17) Wales, D. J.; Scheraga, H. A. Science 1999, 285, 1368–1372. (18) Hansmann, U. H. E.; Okamoto, Y. Annual ReView Computational Physics; World Scientific: Singapore, 1999; Vol. 6; pp 129-157. (19) Okamoto, Y. J. Mol. Graphics. Model. 2004, 22, 425–439. (20) Mitsutake, A.; Sugita, Y.; Okamoto, Y. Biopolymers 2001, 60, 96– 123. (21) Yang, W.; Nymeyer, H.; Zhou, H. X.; Berg, B.; Bru¨schweiler, R. J. Comput. Chem. 2008, 29, 668–672. (22) Berg, B. A.; Neuhaus, T. Phys. Lett. B 1991, 267, 249–253. (23) Berg, B. A.; Neuhaus, T. Phys. ReV. Lett. 1992, 68, 9–12. (24) Nadler, W.; Hansmann, U. H. E. Phys. ReV. E 2007, 75, 026109. (25) Lyubartsev, A. P.; Martsinovski, A. A.; Shevkunov, S. V.; Vorontsovvelyaminov, P. N. J. Chem. Phys. 1992, 96, 1776–1783. (26) Marinari, E.; Parisi, G. Europhys. Lett. 1992, 19, 451–458. (27) Irba¨ck, A.; Potthast, F. J. Chem. Phys. 1995, 103, 10298–10305. (28) Hansmann, U. H. E.; Okamoto, Y. Phys. ReV. E 1996, 54, 5863– 5865. (29) Hansmann, U. H. E.; Okamoto, Y. J. Comput. Chem. 1997, 18, 920–933. (30) Marinari, E.; Parisi, G.; Ruiz-Lorenzo, J. J. Spin Glasses and Random Fields; World Scientific: Singapore, 1998; pp 59-98. (31) Irba¨ck, A.; Sandelin, E. J. Chem. Phys. 1999, 110, 12256–12262. (32) Swendsen, R. H.; Wang, J. S. Phys. ReV. Lett. 1986, 57, 2607– 2609. (33) Geyer, C. J. Stat. Sci. 1992, 7, 473–483. (34) Hukushima, K.; Nemoto, K. J. Phys. Soc. Jpn. 1996, 65, 1604– 1608. (35) Hansmann, U. H. E. Chem. Phys. Lett. 1997, 281, 140–150. (36) Hansmann, U. H. E.; Okamoto, Y. Curr. Opin. Struct. Biol. 1999, 9, 177–183. (37) Sanbonmatsu, K. Y.; Garcı´a, A. E. Proteins 2002, 46, 225–234. (38) Lwin, T. Z.; Luo, R. J. Chem. Phys. 2005, 123, 194904–194910. (39) Lyman, E.; Ytreberg, F. M.; Zuckerman, D. M. Phys. ReV. Lett. 2006, 96, 028105. (40) Li, H.; Yang, W. J. Chem. Phys. 2007, 126, 114104. (41) Liu, P.; Voth, G. A. J. Chem. Phys. 2007, 126, 045106. (42) Mu, Y.; Yang, Y.; Xu, W. J. Chem. Phys. 2007, 127, 384119. (43) Bandyopadhyay, P. J. Chem. Phys. 2008, 128, 134103. (44) Christen, M.; van Gunsteren, W. F. J. Comput. Chem. 2008, 29, 157–166. (45) Li, W.; Takada, S. J. Chem. Phys. 2009, 130, 214108. (46) Zheng, L. Q.; Chen, M. G.; Yang, W. Proc. Natl. Acad. Sci. U.S.A. 2008, 105, 20227–20232. (47) Wu, X. W.; Brooks, B. R. Chem. Phys. Lett. 2003, 381, 512–518. (48) Babin, V.; Roland, C.; Sagui, C. J. Chem. Phys. 2008, 128, 134101. (49) Skolnick, J.; Kolinski, A. Monte Carlo Methods Chem. Phys. 1999, 105, 203–242. (50) Kolinski, A.; Galazka, W.; Skolnick, J. J. Chem. Phys. 1998, 108, 2608–2617. (51) Goj, N. Annu. ReV. Biophys. Bioeng. 1983, 12, 183–210.

J. Phys. Chem. B, Vol. 114, No. 8, 2010 2893 (52) Wolynes, P. G.; Onuchic, J. N.; Thirumalai, D. Science 1995, 267, 1619–1620. (53) Onuchic, J. N.; Wolynes, P. G.; Lutheyschulten, Z.; Socci, N. D. Proc. Natl. Acad. Sci. U.S.A. 1995, 92, 3626–3630. (54) Karplus, M.; Sali, A. Curr. Opin. Struct. Biol. 1995, 5, 58–73. (55) Dill, K. A.; Chan, H. S. Nat. Struct. Biol. 1997, 4, 10–19. (56) Bryngelson, J. D.; Onuchic, J. N.; Socci, N. D.; Wolynes, P. G. Proteins: Struct., Funct., Genet. 1995, 21, 167–195. (57) Dill, K. A.; Bromberg, S.; Yue, K. Z.; Fiebig, K. M.; Yee, D. P.; Thomas, P. D.; Chan, H. S. Protein Sci. 1995, 4, 561–602. (58) Li, A. J.; Daggett, V. J. Mol. Biol. 1996, 257, 412–29. (59) Shakhnovich, E. I. Curr. Opin. Struct. Biol. 1997, 7, 29–40. (60) Crippen, G. M.; Ponnuswamy, P. K. J. Comput. Chem. 1987, 8, 972–981. (61) Skolnick, J.; Kolinski, A. J. Mol. Biol. 1990, 212, 787–817. (62) Levitt, M.; Warshel, A. Nature 1975, 253, 694–8. (63) Miyazawa, S.; Jernigan, R. L. Macromolecules 1985, 18, 534–552. (64) Zhou, Y. Q.; Hall, C. K.; Karplus, M. Protein Sci. 1999, 8, 1064– 1074. (65) Munoz, V.; Eaton, W. A. Proc. Natl. Acad. Sci. U.S.A. 1999, 96, 11311–11316. (66) Thirumalai, D.; Klimov, D. K. Curr. Opin. Struct. Biol. 1999, 9, 197–207. (67) Klimov, D. K.; Thirumalai, D. Proc. Natl. Acad. Sci. U.S.A. 2000, 97, 2544–9. (68) Liwo, A.; Oldziej, S.; Pincus, M. R.; Wawak, R. J.; Rackovsky, S.; Scheraga, H. A. J. Comput. Chem. 1997, 18, 849–873. (69) Liwo, A.; Pincus, M. R.; Wawak, R. J.; Rackovsky, S.; Oldziej, S.; Scheraga, H. A. J. Comput. Chem. 1997, 18, 874–887. (70) Liwo, A.; Kazmierkiewicz, R.; Czaplewski, C.; Groth, M.; Oldziej, S.; Wawak, R. J.; Rackovsky, S.; Pincus, M. R.; Scheraga, H. A. J. Comput. Chem. 1998, 19, 259–276. (71) Tan, C.; Yang, L.; Luo, R. J. Phys. Chem. B 2006, 110, 18680– 18687. (72) Tan, C.; Tan, Y. H.; Luo, R. J. Phys. Chem. B 2007, 111, 12263– 12274. (73) Dunfield, L. G.; Burgess, A. W.; Scheraga, H. A. J. Phys. Chem. 1978, 82, 2609–2616. (74) McQuarrie, D. A. Statistical mechanics; University Science Books: Sausalito, CA, 2000, p 62. (75) Brooks, B. R.; Bruccoleri, R. E.; Olafson, B. D.; States, D. J.; Swaminathan, S.; Karplus, M. J. Comput. Chem. 1983, 4, 187–217. (76) Yang, L. J.; Tan, C. H.; Hsieh, M. J.; Wang, J. M.; Duan, Y.; Cieplak, P.; Caldwell, J.; Kollman, P. A.; Luo, R. J. Phys. Chem. B 2006, 110, 13166–13176. (77) Case, D. A.; Cheatham, T. E.; Darden, T.; Gohlke, H.; Luo, R.; Merz, K. M.; Onufriev, A.; Simmerling, C.; Wang, B.; Woods, R. J. J. Comput. Chem. 2005, 26, 1668–1688. (78) Duan, Y.; Wu, C.; Chowdhury, S.; Lee, M. C.; Xiong, G. M.; Zhang, W.; Yang, R.; Cieplak, P.; Luo, R.; Lee, T.; Caldwell, J.; Wang, J. M.; Kollman, P. J. Comput. Chem. 2003, 24, 1999–2012. (79) Fesinmeyer, R. M.; Hudson, F. M.; Andersen, N. H. J. Am. Chem. Soc. 2004, 126, 7238–7243. (80) Onufriev, A.; Bashford, D.; Case, D. A. Proteins: Struct., Funct., Bioinf. 2004, 55, 383–394. (81) Lwin, T. Z.; Zhou, R. H.; Luo, R. J. Chem. Phys. 2006, 124, 034902. (82) Ryckaert, J. P.; Ciccotti, G.; Berendsen, H. J. C. J. Comput. Phys. 1977, 23, 327–341. (83) Hutchinson, E. G.; Thornton, J. M. Protein Sci. 1994, 3, 2207– 2216. (84) Sitkoff, D.; Case, D. A. J. Am. Chem. Soc. 1997, 119, 12262– 12273. (85) Andersen, N. H.; Neidigh, J. W.; Harris, S. M.; Lee, G. M.; Liu, Z. H.; Tong, H. J. Am. Chem. Soc. 1997, 119, 8547–8561. (86) Lwin, T. Z.; Luo, R. Protein Sci. 2006, 15, 2642–2655. (87) Thirumalai, D.; Mountain, R. D. Phys. ReV. A 1990, 42, 4574– 4587. (88) Thirumalai, D.; Mountain, R. D.; Kirkpatrick, T. R. Phys. ReV. A 1989, 39, 3563–3574. (89) Kaur, H.; Raghava, G. P. S. Protein Sci. 2003, 12, 627–634. (90) Kabsch, W.; Sander, C. Biopolymers 1983, 22, 2577–2637. (91) Hu, H.; Elstner, M.; Hermans, J. Proteins: Struct., Funct., Genet. 2003, 50, 451–463.

JP906701S