6260
J. Phys. Chem. B 1998, 102, 6260-6272
Use of Multiple Molecular Dynamics Trajectories To Study Biomolecules in Solution: The YTGP Peptide Graham A. Worth*,† Theoretische Chemie, Physikalisch-Chemisches Institut, UniVersita¨ t Heidelberg, INF 253, 69120 Heidelberg, Germany
Frederico Nardi and Rebecca C. Wade*,‡ European Molecular Biology Laboratory, Meyerhofstrasse 1, 69012 Heidelberg, Germany ReceiVed: January 23, 1998
NMR studies of the YTGP tetrapeptide in aqueous solution show a large structural shift on the glycine peptide proton, indicating that the peptide adopts structures in which the aromatic ring lies close to this proton in an aromatic-(i+2) amine interaction (Kemmink and Creighton, J. Mol. Biol. 1993, 234, 861). Here, molecular dynamics simulations are used to obtain details at the atomic level of the structural properties of this system. To maximize the volume of conformational space searched, 13 separate trajectories of 300 ps were calculated at 300 K, with starting structures chosen from a conformational search, protein crystal structures, and modeling. Analysis shows that together the trajectories sample all the important regions of conformational space. Structures from the trajectories were clustered into conformational states, defined as regions in coordinate space around maxima in the probability distribution. Nine conformational states with significant populations were identified in which the glycine peptide proton has a large structural NMR shift. The relative importance of the trajectories for the system were assigned using energetic weighting factors, and the four most important trajectories were then extended to 1 ns. The structural shift spectra calculated from the significantly populated conformational states with an aromatic-(i+2) amine interaction are in good agreement with the experimental spectrum. The analysis shows that the potential energy surface of the system is complicated and divided into two separate regions at this temperature.
Introduction Classical molecular dynamics (MD) simulations are a useful tool to provide an atomic level description of experimental phenomena for a wide range of chemical and biochemical systems. For the study of the folding of peptides, however, the method suffers from the long time scale of the process; many nanoseconds of simulation are needed.1-6 A major problem is that in a complex molecular system there are often energy barriers in phase space between a randomly chosen starting structure and the native fold, and it can take a long time for these to be crossed. This problem is further complicated when many conformations are accessible at the temperature of interest. To obtain reliable populations, more powerful methods, such as umbrella sampling, may be needed to calculate the free energy differences between conformations.7-9 Standard MD can act as a precursor to such a study, identifying the major conformations and the properties of the potential energy surface. The tetrapeptide with the sequence YTGP is known to be structured in aqueous solution,10,11 and in this paper we use MD to find possible geometries for the folded state. To speed up the search for the desired, folded, conformations, we use a set of trajectories starting in different regions of conformational space. Previous uses of multiple MD trajectories differ from this approach, taking the same starting structure with different * Corresponding authors. † Fax No.: +49-6221-545221. E-mail:
[email protected]. ‡ Fax No.: +49-6221-387517. E-mail:
[email protected].
seed velocities.12,13 As the results show however, the use of different starting structures is important as, over the length of the trajectories (up to 1 ns after equilibration), the conformational space for this molecule has distinct regions; that is, the conformations sampled by a finite trajectory are dependent on the initial structure. As the trajectories do not travel through the same parts of conformational space, a problem is then how to decide which trajectories are important. Here we introduce simple weighting factors, based on the partition function for a region of phase space, to estimate the relative importance of the trajectories. These weights are given by quantities directly calculated during the simulation and are therefore simple to apply. After the characterization of conformational states, it is also then possible to analyze the importance of the sampled conformations. The YTGP sequence is found as residues 10-13 of the bovine pancreatic trypsin inhibitor protein (BPTI) and has been implicated in the folding pathway of this protein. Protein folding is an interesting and very complex process. Despite much work, the pathways followed and the guiding energetic factors are still not fully elucidated.14,15 BPTI has possibly the best studied folding process.16,17 Experimental evidence shows that one possible folding route is the formation of secondary structure in two regions (residues 15-36 and 43-58), which are then mutually stabilized by the formation of a disulfide bridge between residues 30 and 51.18,19 As well as such regions that spontaneously form long-range structure, there appear to be certain peptide sequences that show a limited range of folded
S1089-5647(98)00853-0 CCC: $15.00 © 1998 American Chemical Society Published on Web 07/21/1998
The YTGP Peptide conformations, i.e., local structure, even when isolated in solution as short peptides. These sequences may be important in determining the rate at which the protein folds.11 YTGP is one of these sequences. In both the crystal structure of the protein20 (Brookhaven databank code 4pti) and the structure in aqueous solution from 1H nuclear magnetic resonance (NMR) spectroscopy,21,22 the tyrosine 10 aromatic ring is clearly lying close to the glycine 12 peptide proton. This interaction is also seen in the 30-51 disulfide-bound intermediate,19 in which the first 15 N-terminal residues are otherwise fairly disordered. More surprisingly, the local structure with the tyrosine ring obviously close to the glycine peptide proton is still observed when this region is isolated, either as the first 15 residues of BPTI or as the tetrapeptide YTGP.10 A structured tetrapeptide is unusual in aqueous solution. For example, the related met5-enkephalin peptide (YGGFM) is unstructured.23 The structure of the YTGP peptide has been inferred from NMR spectral measurements. The NMR chemical shifts for the protons in an unstructured peptide are given by the so-called random coil values.24,25 These take account of local effects on the magnetic field experienced by a proton. A deviation from these values indicates the presence of an electron-rich group nearby in space, but not close in sequence. Aromatic rings produce particularly large deviations due to the magnetic field set up by the delocalized electrons. In the YTGP NMR spectrum10 all protons exhibit shifts close to the random coil values, with the exception of the glycine peptide proton. This shift from the expected value25 of 8.21 ppm to 7.00 ppm corresponds to a large so-called structural shift of ∆δ ) -1.21 ppm. A similar structural shift is found for this proton in the NMR spectrum of the protein.22 In the present study, the aim is to identify conformations of the YTGP peptide that are important for the NMR spectrum. The close proximity of an aromatic ring to a proton in a protondonating group, such as an amine group, is found to be common in the database of protein crystal structures.26-30 Molecular mechanics calculations by Levitt and Perutz31 showed that the interaction between an amine group and an aromatic ring has a minimum energy of around -12 kJ mol-1, with the amine bond lying perpendicular to the ring plane and the proton under the center of the ring. The existence of this favorable interaction, known as an aromatic-amine hydrogen bond, is supported by further calculations30,32 and the formation of a complex between ammonia and benzene in the gas phase.33 This would lead one to expect YTGP to adopt a structure with such an interaction between the tyrosine ring and glycine peptide proton, and it is indeed possible to build a structure with a near perpendicular interaction which reproduces the features of the NMR spectrum. The most recent database searches have however shown that interaction geometries with the amine bond parallel to the aromatic ring are more common. As Mitchell et al.30 point out, in this geometry the amine proton is free to make a stronger hydrogen bond to a standard hydrogen bond acceptor, such as a solvent molecule or a peptide carbonyl group. In a previous study,34 we focused on the aromatic-(i+2) amine interaction. Here, as in the YTGP peptide, the amine group is part of the peptide link to the (i+2)th residue, and the aromatic ring is part of the ith residue. For this interaction, the perpendicular interaction geometry of the aromatic-amine hydrogen bond is prevented by the constraints of the intervening bonds. A simple search of the conformational space for the FGG peptide in a model hydrogen-bonding environment was made, from which a set of minimum energy conformations containing this interaction was derived. In agreement with the
J. Phys. Chem. B, Vol. 102, No. 32, 1998 6261
Figure 1. Coordinates used to describe the geometry of an amine group with respect to an aromatic ring.
recent database searches, when a hydrogen bond acceptor other than the aromatic ring is available for the (i+2) peptide proton, these conformations all have a parallel aromatic-amine interaction geometry. The energy stabilizing the conformations is then found to be dominated not by the aromatic-amine interaction but by the packing of the glycine methylene group protons under the ring. As a result, the model structure that reproduces the NMR spectrum, mentioned above, is not predicted to be an energy minimum in aqueous solution. However, none of these minimum energy structures are able to reproduce the NMR spectrum. In all cases the calculated ∆δ for the (i+2) peptide proton is smaller than that observed. Also, as mentioned above, in all these parallel structures the glycine R protons are packed under the ring. In this geometry, the packing is asymmetric, with one proton closer to the ring than the other. This leads not only to deviations from the random coil values but also to a split in the chemical shifts, which is not observed experimentally.10 In the experiment, the peptide is in aqueous solution at 271 K and therefore will not be in a single minimum energy conformation. Simulations that include the effect of temperature, such as MD, are therefore needed to show the population of the various conformational states and to calculate the properties of the structures in the states. In a recent study, van der Spoel et al.35 have done just that, performing one long (1 ns) trajectory for each of several force fields. They found a folded state for only one force field and concluded that the results are sensitive to the force field, and in particular the water model, used. Here, we use multiple trajectories, starting from distinct structures such as the minimum energy structures described above. The various trajectories sample different conformational states, a number of which have an aromatic(i+2) amine interaction. The results over 1 ns thus depend not only on the force field but also on the starting structure. Methods (i) Definition of Coordinate Systems. The geometry of an amine group relative to an aromatic ring can be described by the coordinates shown in Figure 1. R and θ define how far the amino proton is away from the center of the ring. Using an empirical relationship,36 these parameters can be used to estimate the ring shift experienced by the proton:
σring ) 27.0(1 - 3 cos2 θ)/R3
(1)
where R is in Å and σring in ppm. As in our earlier work,34 an aromatic-amine interaction is defined to be present when σring e -0.5 ppm, i.e., a significant shift is experienced. This definition means that if the proton is directly under the ring, cos θ ) 0° and R e 4.7 Å. The angle R defines the orientation of the N-H bond with respect to the plane of the ring. This coordinate is needed for the classification of the type of aromatic-amine interaction. If
6262 J. Phys. Chem. B, Vol. 102, No. 32, 1998
Worth et al. TABLE 1: Energy-Minimized Starting Geometries for 13 Separate Trajectories of the YTGP Peptide in Aqueous Solution The Dihedral Angles Are Those Defined in Figure 2. See Text for Further Details name
Figure 2. Essential dihedral angles defining the conformation of the YTGP peptide. The standard IUPAC nomenclature is used.37
R e 30°, the interaction is classified as parallel, whereas if R > 30°, the interaction is classified as perpendicular. The reason for this distinction is that when a parallel interaction geometry is adopted, the amino proton is able to form a hydrogen bond to a receptor other than the ring, whereas in a perpendicular geometry this is not the case. For the present study, the conformations of the YTGP peptide are best classified by the relative orientation of the tyrosine aromatic ring to the glycine residue. The seven torsion angles thus needed to define a conformation are shown in Figure 2. They are named according to the standard IUPAC definitions37 and are referred to as the “essential torsion angles”. (ii) Molecular Dynamics Trajectories. Starting from distinct conformations of the YTGP peptide, a simulation consisting of 13 separate trajectories was made. In our previous work34 nine minimum energy conformations, labeled 6HB1 to 6HB9, were identified as representing the regions of conformation space with an aromatic-(i+2) amine interaction when the peptide was in a hydrogen-bonding environment; that is, all polar atoms could make hydrogen bonds to the solvent. The name 6HBx was chosen as in the model, 6 hydrogen bonds were possible. These formed nine of the starting structures. The tenth had all φ and ψ angles set at 180° and is termed the “linear” structure. The next structure was formed from residues 10-13 in the BPTI protein crystal structure and is called “P10-13”. Residues 35-38 in the same protein have the sequence YGGC and contain a perpendicular aromatic-(i+2) amine interaction. The geometry of this sequence was therefore taken as a basis for “P35-38”. The final structure, named “NMRmod”, was the model mentioned in the Introduction which reproduces the major features of the NMR spectrum. It has a perpendicular aromaticamine interaction geometry and the two glycine R protons are equidistant away from the aromatic ring. The essential torsion angles for the starting structures are listed in Table 1. The sequence YTGP was set up with the listed geometries using the QUANTA graphics program38 and, as in the NMR experiments, was capped with an acetyl group on the N-terminus and an amine group on the C-terminus. As the 6HBx and P35-38 structures are not based on the YTGP peptide (FGG and YGG, respectively), the threonine residue backbone geometry was relaxed by steepest descent optimization, keeping the tyrosine and glycine residues fixed. Finally, the proline residue was set in a trans conformation (ω ) 180°), and the ψ3 and C-terminus torsion angles were taken from the energy minimum found by a conformational search. The ARGOS program39 was used with the CHARMm22 allatom force field40,41 for the simulation with the following protocol for each trajectory: 1. SolVation. Each starting conformation was solvated in a periodic cubic box of TIP3P water molecules.42 The sides of the box were 41 Å in length, and the minimum solute-solvent
χ21
χ11
ψ1
φ2
ψ2
φ3
ψ3
6HB1 56.7 -170.9 101.9 -71.3 -39.6 -64.8 133.5 6HB2 97.2 59.8 16.2 70.0 43.7 -68.2 148.3 6HB3 109.1 -167.5 102.4 -66.2 -39.6 67.2 -119.8 6HB4 93.9 38.2 15.7 66.5 43.1 49.7 179.8 6HB5 102.3 82.8 -13.9 160.9 -108.1 -65.1 149.9 6HB6 99.8 49.9 45.7 67.7 24.5 66.1 -150.1 6HB7 74.0 178.1 138.0 -159.4 105.8 66.7 -150.0 6HB8 79.3 163.3 140.2 -159.7 118.9 -64.3 121.3 6HB9 68.4 -161.0 103.6 -68.4 -38.2 161.0 178.2 linear 91.5 -120.1 178.2 178.2 178.3 178.2 -179.7 P10-13 83.2 179.8 110.3 -74.0 -36.7 88.7 174.9 P35-38 45.8 173.5 131.8 -81.8 -2.1 100.5 163.1 NMRmod 81.4 167.1 89.8 -124.8 50.5 155.8 135.1
distance was 3 Å. This resulted in 2310-2315 water molecules being placed around each structure. 2. Relaxation of the solVent, keeping the solute fixed. The solvent was energy minimized and then equilibrated at 300 K. Finally the energy of the whole system was minimized. 3. Heating and equilibration of the whole system. The system was equilibrated to 300 K in three stages: 1 ps of NVE at 50 K, 9 ps of NPT to 200 K and 1 atm, 120 ps NPT to 300 K and 1 atm. 4. Simulation. The trajectories were then produced by propagation for 300 ps at 300 K and 1 atm. The solute coordinates were saved every 1 ps, making a total of 3900 conformations for analysis. 5. Extension of important trajectories. The important trajectories were identified using the weighting factors described below. These were then continued from 300 ps to 1 ns under the same conditions. In all calculations, a time step of 2 fs was used, and the bond lengths were constrained using the SHAKE algorithm,43 with a tolerance of 0.001 Å. For the NPT dynamics, the temperature was kept constant using the Berendsen temperature scaling algorithm,44 with a relaxation time of 0.4 ps, and the pressure was kept around 1 atm by scaling the system volume,45 with a relaxation time of 0.5 ps. The nonbonded interactions were treated within a simple 10 Å group-based cutoff, and the pair list was updated every 20 steps. The problems of truncating the nonbonded interactions for the dynamics of peptides in solution are well-known.46-48 and can lead to incorrect dynamics due to the incorrect alignment of water dipoles in the truncated electric field of the solute. However, the solvated peptide systems studied here contained approximately 8000 atoms, and so a nonbonded cutoff was needed to keep the computational resources required to a minimum. The use of a group-based cutoff means that dipoles are not split by the cutoff scheme. Furthermore, not only was the force field used calibrated with a cutoff but studies47 have shown that a 10 Å cutoff leads to reasonable dynamics over a few hundred picoseconds for a peptide in water due to a fortuitous cancellation of errors. As the primary use of the simulation is to search configuration space for suitable conformers rather than to study the detailed dynamics, artifacts due to the truncation are unlikely to be important. (iii) Definition and Calculation of Properties. The chemical shifts for the protons of a peptide structure were calculated using the TOTAL program.49-51 The position of the spectral peak, relative to the random coil chemical shift, is described by
∆δ ) σring + σani + σel
(2)
The YTGP Peptide
J. Phys. Chem. B, Vol. 102, No. 32, 1998 6263
where there are contributions due to aromatic rings (σring), the anisotropy of C-O and C-N bonds (σani), and the electric field from polar groups (σel). The ∆δ are known as structural shifts, as they provide information about changes in the environment of a proton relative to the random coil structures. These contributions are described by geometrical factors and parameters, the latter having been optimized by empirical fitting to protein NMR data. To calculate the aromatic ring contribution, the Haigh-Mallion method52 is used, which is more accurate than the empirical formula, eq 1. For R protons an additional empirical term of -0.65 ppm is also added (for glycine R protons the term is -0.43 ppm) Using this program, the shifts calculated for a set of crystal structures (one of which was BPTI) agreed with the experimental values to an error of less than 0.3 ppm standard deviation.49 To convert ∆δ values to an NMR spectrum, a set of random coil values are needed for each proton type. Various sets have been measured, and the value is known to change slightly depending on the environment. As a standard, the TOTAL program uses the set of Bundi and Wu¨thrich.24 Here we use the more recent set from Wishart et al.25 These were chosen as these authors measured shifts for residues neighboring a proline, and significant differences (on the order of 0.2 ppm) were found compared to other neighboring residues. A different selection of random coil values will of course change the relative differences between the experimental and calculated spectra. Surface-accessible surface areas (SASs) were calculated using the NACCESS program,53 which uses the Lee and Richards algorithm.54 The program divides the surface into areas of polar side chain, nonpolar side chain, and main chain. The main chain area does not include that assigned to the R carbon. The potential energy from the MD simulations can be partitioned as
Utot ) Uss + Usw + Uww
(3)
where Uss is the solute intramolecular (solute-solute) energy, Usw the solute-solvent interaction energy, and Uww the interaction energy between solvent molecules (the solvent-solvent energy). As the various trajectories have slightly different numbers of water molecules, Uww was normalized to 2315 molecules. Another partition that we shall use is to split Usw between the solute and solvent, and so we define an assigned solute energy as
Usolute ) Uss + 0.5Usw
reduced representation of coordinate space, the values of the torsion angles from each structure were first collected into separate histograms with bins of 10° width. The maxima along each dihedral coordinate were then noted. These represent the most populated values of each torsion angle, averaging over the other coordinates. Major density maxima in the full multidimensional space must therefore lie near a combination of these values, which can be used as seed conformations for clustering. Each coordinate set was assigned to the cluster to which it was nearest, according to a criterion of lowest torsional root-mean-square difference (RMSD) with respect to the seed geometry. Each cluster is a conformational state, and the conformation geometry is the average within the cluster. The clustering analysis was repeated to extract different information. First all the structures from all the trajectories were included. This allowed the total conformational space visited to be analyzed, so as to check how well this space was covered and which regions were most populated. The structures saved from each trajectory were then analyzed separately. This enabled dynamic information to be extracted and an idea of the connectivity of phase space to be gained. It also showed how close the trajectories stay to their initial conformation and how quickly the conformational states interconverted. (v) Assigning the Relative Importance of Multiple Trajectories. If a set of trajectories are distinct, it is a problem to know how to compare them. By distinct we mean that the sampling is different: either different conformational states are sampled or else the same states are sampled with different populations. A simple measure of the relative importance of each trajectory can however be obtained by considering how trajectories relate to an ensemble average. Under NPT conditions, assuming PV is constant for all ensemble members, the ensemble average of a property, Φ, which only depends on the spatial coordinates, can be defined by
〈Φ〉 )
Q
(5)
where the integral is over all configurations, and U(q) is the potential energy of a configuration. Here, Q is related to the partition function and, ignoring the standard prefactor, is given by
Q)
(4)
It should be noted that when averages are made of the above properties over a trajectory, 〈∆δ〉 and 〈SAS〉 are averages over the structures saved from each trajectory, whereas the average energies, such as 〈Uss〉 are averages over the whole trajectory, i.e., the energies at each time step. (iv) Conformational Analysis. To analyze the configuration space visited during the trajectories, the saved structures were clustered into conformational states. Various methods have been proposed to assign such states; for example Karpen et al.55 used neural nets to recognize similar structures, and McKelvey et al.56 assigned states according to the Ramachandran plot region occupied. Here we use a simple method, based on the probability density in coordinate space; maxima of density are at the centers of conformational states, minima between the states. Rather than trying to locate the maxima and minima in this multidimensional space, the following procedure was used. Using the essential torsion angles, shown in Figure 2, as a
∫Φ(q) exp(-U(q)/RT) dq
∫exp(-U(q)/RT) dq
(6)
Thus in eqs 5 and 6 the constant factor of the kinetic energy has been divided out. Dividing the coordinate space into distinct sets, {qR}, each of which can then be sampled by different trajectories, the expectation value of Φ can then be written
〈Φ〉 )
1
∑R ∫Φ(q) exp(-U(q)/RT) dqR
Q
(7)
where ∫dqR is the integral over sets of coordinates in the set {qR}, or in ensemble average notation
〈Φ〉 )
QR 〈Φ〉R Q
∑R
(8)
where 〈Φ〉R is the property average over {qR}, and QR is the partition function for the set.
6264 J. Phys. Chem. B, Vol. 102, No. 32, 1998
Worth et al.
The step from eq 5 to eq 7 assumes that all the coordinates are contained once, and only once, in the set of trajectories. Even if only a sample of coordinates are present however, the ratio QR/Q is the relative importance of the trajectory sampling the set {qR}. A problem arises if a coordinate set is repeated in different trajectories, i.e., the trajectories meet in coordinate space. However, given the large number of solute and solvent atoms in this calculation, such a situation is unlikely to occur. Indeed it was found that in the 13 trajectories of YTGP on calculating the RMSD of the essential torsion angles between all the solute structures saved, only three pairs of structures were within an RMSD of 5° from one another, and none coincided. Defining QR by eq 6, then eq 8 can be written
〈Φ〉 )
1
∑R exp(-〈Utot〉R/RT)〈Φ〉R
Q
(9)
TABLE 2: Maxima of Histograms of the Essential Dihedral Angles Found in a Set of 3900 YTGP Conformers in Aqueous Solution Obtained at 1 ps Intervals in 13 Trajectories, Each of 300 ps Length. The Population of the 10° Bin around the Maximum is Given in Parentheses angle χ21 χ11 ψ1 φ2 ψ2 φ3 ψ3
histogram maxima (population) -55 (66) -35 (192) -75 (379) -35 (70) -75 (403)
85 (967) 65 (331) 55 (205) 55 (316) 65 (194) 75 (494)
-175 (701) 125 (335) -155 (361) 165 (300) 175 (120) -175 (806)
interaction energy differences between the trajectories, could be important. Results
which has the simple interpretation of an ensemble of trajectories, each weighted by
weight 1 )
1 exp(-〈Utot〉R/RT) Q
(10)
where Q ) ∑RQR, and the free energy of each trajectory GR is approximated as 〈Utot〉R. A practical problem with this weighting factor is that in a simulation the large number of solvent molecules produces large fluctuations in the solvent-solvent energy. As the solvent is acting as a heat bath, it is reasonable to assume that its average energy term, Uww + 0.5Usw, is in fact a constant. Partitioning the solute-solvent interaction energy evenly between the parts of the system, we then make the approximation that GR ≈ 〈Usolute〉R, where Usolute is as defined above in eq 4. Each trajectory is then weighted by
1 weight 2 ) exp(-〈Usolute〉R/RT) Q
(11)
and as the large fluctuations have now been removed, the average energy should be more reliable. Another approach is to assume that the most important contribution to the free energy is the free energy of solvation. From linear response theory, this can be approximated as Gsolv ≈ 0.5〈Usw,el〉, where Usw,el is the solute-solvent electrostatic interaction energy.57-60 Hence,
1 weight 3 ) exp(-0.5〈Usw,el〉R/RT) Q
(12)
While this term accounts for the free energy of charging a solute in a solvent environment, a refinement for large solutes is the addition of a term to allow for the free energy of cavity formation. An estimate for this term, found appropriate in binding free energy studies, is 0.161〈Esw,lj〉, where Esw,lj is the solute-solvent Lennard-Jones interaction energy.61 A term based on the solvent-accessible surface area (SAS),62 such as the average SAS multiplied by 0.2 kJ mol-1 Å-2, has also been proposed. In the present study, little difference in the results from those using weight 3 was found using either form for the nonpolar cavity term. As seen from the solvent-accessible surface areas in Table 5, the solute-solvent interface is similar for all the trajectories; that is, the average SAS does not vary very much, and the ratio of polar to nonpolar surface area remains similar. For other systems this may not be the case, and then this term, which allows for solvent entropy and
(i) YTGP Conformations Sampled during the Simulation. The values of the essential torsion angles were gathered into histogram bins along each coordinate. The most populated values for these angles, shown by the maxima of the distributions, are listed in Table 2. The values of the χ21 angle, which governs the angle of the plane of the tyrosine aromatic ring to the CR-Cβ bond, can vary only between 0° and 180° due to symmetry. In this range there is a single maximum between 80° and 90°, with approximately 25% of all structures having a value in this range. The distribution of χ11 tyrosine side chain angle values has approximately 3-fold symmetry, as is expected from a free torsion involving tetrahedral carbon atoms. The weakly populated maximum around -55° is important as none of the starting structures contain this angle (see Table 1), and so the trajectories are sampling values of this angle away from the starting geometries. Interestingly this is the most populated tyrosine rotamer found in protein structures.63 It is however rarely observed in this system because, as the tyrosine residue is at the N-terminal position, a torsion angle of χ11 ) -55° would mean that the aromatic side chain is solvated. The distributions for the essential backbone angles, excluding the ψ3 angle, also show three maxima. These are not as symmetrically arranged as for the χ11 angle, due to the steric hindrance experienced when rotating around the backbone bonds. As a result, the distributions reflect the Ramachandran plot. For example, the density maxima for the ψ1 and ψ2 angles both avoid the unfavored region of the Ramachandran plot between -90° and -180°. It can also be seen that the φ3 angle has a higher population of positive values than the φ2 angle, as is expected for a glycine compared to a nonglycine (threonine) residue. In contrast, the distribution of values for the ψ3 angle between the glycine and proline residues only has a single maximum at -175°. This shows that the proline residue makes this region of the peptide more rigid. Along each degree of freedom, the structures sample all of the expected angles. To judge how well the conformational space is covered, it is necessary to extend the analysis to combinations of the torsion angles. After clustering with respect to the threonine φ2 and ψ2 angles, structures are found in all nine possible clusters, the sizes of which are related to the Ramachandran plot. The largest cluster (1251 members) is in the β sheet region, with φ2 ) -155° and ψ2 ) 165°. Five other large clusters (187-843 members) lie in the other favored regions of the plot, including the region for left-handed R helices with φ2 ≈ 60° and ψ2 ≈
The YTGP Peptide
J. Phys. Chem. B, Vol. 102, No. 32, 1998 6265
TABLE 3: Properties of Conformational States with a Large Glycine Peptide Proton Structural Shift Found in a Set of 3900 Structures of YTGP in Aqueous Solution at 300 K Generated as in Table 2 av. interaction geometrye name
ntota
nintb
nperpc
〈∆δ〉d (ppm)
〈R〉 (Å)
〈θ〉 (deg)
〈R〉 (deg)
6HBxf
RMSDg (deg)
ArNH1 ArHN2 ArNH3 ArHN4 ArHN5 ArHN6 ArHN7 ArHN8 ArHN9
268 257 88 75 62 44 43 38 33
115 114 65 40 26 34 11 30 13
8 63 39 14 4 29 1 3 8
-0.60 -0.76 -1.08 -0.69 -0.60 -1.17 -0.59 -0.86 -0.69
3.9 3.6 3.6 4.0 5.0 3.5 5.2 3.9 4.3
36.6 40.0 23.9 30.5 25.9 23.4 29.5 17.4 34.3
18.0 23.1 27.9 26.3 19.4 43.1 17.4 15.6 32.2
6HB2 6HB4 6HB1 6HB8 6HB9 6HB8 6HB3 6HB3 6HB8
23.7 26.0 13.3 48.7 16.4 28.4 20.0 11.4 37.6
a Total number of structures in the conformational state. b Number of structures with an aromatic-(i+2) amine interaction, defined as σ ring < -0.5 ppm in eq 1. c Number of structures with a perpendicular aromatic-(i+2) amine interaction, i.e., R > 30°. d Average structural shift on the glycine peptide proton. e Defined in Figure 1. f Nearest minimum energy structure with an aromatic-(i+2) amine interaction found in a conformational search.34 g RMSD of essential torsion angles (see Figure 2) between the ArHNx conformation and the 6HBx structure.
TABLE 4: Brief Description of the Number of Conformational States Sampled by Each Trajectory and How Long Each Trajectory Sampled Each State starting structure
length of stay in statesc shortd mede longf
ncona
nmoveb
6HB1 6HB2 6HB3 6HB4 6HB5 6HB6 6HB7 6HB8
15 4 27 4 11 8 12 15
81 33 80 23 40 34 70 111
73 23 70 18 31 27 61 108
8 9 10 3 8 6 9 3
0 1 0 2 1 1 0 0
6HB9 linear P10-13 P35-38 NMRmod
14 9 10 4 12
68 63 105 12 100
59 56 100 7 94
9 6 5 3 6
0 1 0 2 0
ArHN states sampledg ArHN1 (89%) ArHN2 (86%)
ArHN4 (16%) ArHN6 (8%) ArHN9 (8%) ArHN8 (7%)
ArHN3 (26%)
a
Number of conformational states sampled along the trajectory. b Number of times the peptide moves into a different conformational state. c Residence time spent in a state during a single sampling along a trajectory. d Number of states with residence times 10 ps and < 50 ps. f Number of states with residence times >50 ps. g Nomenclature as in Table 3. The population of the state is given in parentheses.
60°. In comparison, the three clusters that lie in the unfavored regions are only weakly populated (22-34 members). Including the glycine φ3 angle, all of the possible 27 clusters are populated. Extending the analysis to include the tyrosine ψ1 torsion angle, 58 out of a possible 81 clusters are found; the low population of the unfavored regions of the threonine φ-ψ space is responsible for the missing structures. Finally, examining the clustering around the tyrosine χ11 and ψ1 angles, only eight of the possible nine clusters are populated; the combination (χ11, ψ1) ) (-55°, -35°) is not found. On examination of the structures, the reason appears to be that, as would be expected, the tyrosine side chain is predominantly packed against the peptide chain, and this restricts the possible torsional angle value for each conformation. The 3900 structures were finally assigned into the conformational states; that is, they were clustered around the set of torsion angle combinations. Of the 243 possible states, 78 are populated. The 10 most populated states account for over 65% of the structures. With one exception, the φ2, ψ2 angles of the threonine residue are in either a β sheet or a left-handed helix conformation. While a tetrapeptide is not long enough to
produce a complete structural element, the preference for a β sheet type geometry is not a surprise, as here the backbone is ideally able to make hydrogen bonds with its environment. The left-handed helix geometry is however not usually observed, being a less stable structural element than the right-handed version. (ii) Conformations with an Aromatic-(i+2) Amine Interaction. We need to identify the conformational states of interest, i.e., those sets of structures with a large structural shift, 〈∆δ〉, on the glycine peptide proton. To do this, the TOTAL program was used to calculate the NMR proton shifts for all 3900 structures. These shifts were then averaged over the structures in each conformational state. Nineteen states were found to have values of 〈∆δ〉 < -0.5 ppm. All the histogram maxima torsion angles are represented, with the exception of χ11 ) -55°. Ignoring the conformational states which have 20 members or less, nine states are left. Table 3 shows the population (ntot) of these states, which are labeled ArHNx and collectively known as ArHN conformational states. The value of 〈∆δ〉 for the glycine peptide proton and the average aromatic-amine interaction geometry, as defined in Figure 1, are also listed. It can be seen that some of the states are well populated. In fact ArHN1 and ArHN2 are the third and fourth most populated conformational states of all. One of the states, ArHN6, is interesting as it has a predominantly perpendicular aromatic-amine interaction (〈R〉 ) 43°). The conformations, defined as the average geometry of the structures in the conformational state, can be divided into three groups by the values of the threonine φ2 and ψ2 angles. The first group (ArHN1 and ArHN2) are in the right-handed R helix region, with (φ2, ψ2) ≈ (60°, 60°). The second group (ArHN3, ArHN5, ArHN7, and ArHN8) are in the left-handed R helix region, with (φ2, ψ2) ≈ (-60°, -60°), while the final group (ArHN4, ArHN6, and ArHN9) are in the β sheet region, with (φ2, ψ2) ≈ (-120°, 90°). The essential torsion angles for these groups are plotted in Figures 3a-c, and the structural relationships are clear. The essential torsion angles from the crystal structures P10-13 and P35-38 are plotted for comparison in Figure 3d. They lie in the right-handed R helix region and are both closest to the ArHN8 conformation. The ArHN conformations were also compared to the structures 6HB1-6HB9 found in a conformational search of the FGG peptide in a model solvent environment.34 A criterion of lowest RMSD of the essential torsion angles was used, ignoring the ψ3 angle which is not present in the 6HB1-6HB9 model
6266 J. Phys. Chem. B, Vol. 102, No. 32, 1998
Worth et al.
TABLE 5: Average Properties and Relative Importance of 13 Trajectories of the YTGP Peptide in Aqueous Solution over 300 ps at 300 K starting structure
SAS (Å2)a total % polarityc
6HB1 6HB2 6HB3 6HB4 6HB5 6HB6 6HB7 6HB8 6HB9 linear P10-13 P35-38 NMRmod
704.3 676.8 697.3 661.3 707.8 650.5 706.9 656.7 709.5 649.8 651.6 684.7 668.7
36.7 36.5 39.2 35.1 36.7 32.8 36.0 35.6 40.5 34.0 38.3 40.8 35.9
〈∆δ〉 (ppm)
Utotd
-0.13 -0.61 -0.17 -0.72 0.00 -0.29 -0.05 -0.48 -0.04 -0.20 -0.25 -0.06 -0.58
45.5 11.8 43.2 9.3 49.4 30.6 81.7 41.0 83.7 281.8 70.1 74.9 0.0
b
average energy (kJ mol-1) Usolutee (1/2)Usw,elf (1/2)Usw,ljg 28.1 2.0 45.3 24.3 37.9 25.1 44.3 0.0 70.0 27.9 49.5 56.1 5.8
42.7 10.8 32.8 0.0 37.9 28.1 33.0 44.4 27.0 54.1 39.9 32.6 36.6
0.2 4.9 0.2 5.6 0.3 6.1 0.1 6.3 0.0 6.5 7.0 2.6 5.4
weight 1h 0.000 0.009 0.000 0.023 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.968
weighting factor weight 2i weight 3j 0.000 0.286 0.000 0.000 0.000 0.000 0.000 0.650 0.000 0.000 0.000 0.000 0.063
0.000 0.013 0.000 0.987 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
a Average solvent-accessible surface area. b Average structural shift on the glycine peptide proton. cAverage % of SAS that is polar. The polar SAS includes the area assigned to the main chain (which does not include the R-carbon atoms) as well as polar side chains. d Average total energy relative to that of the trajectory with the lowest average total energy, E0 ) -96 635.8 kJ mol-1. e Average assigned solute energy, Uss + 0.5Usw,el + 0.5Usw,lj, relative to that of the trajectory with the lowest average assigned solute energy, E0 ) -655.2 kJ mol-1. f Half the average solutesolvent electrostatic interaction energy relative to that of the trajectory with the lowest average energy, E0 ) -201.5 kJ mol-1. g Half the average solute-solvent Lennard-Jones interaction energy relative to that of the trajectory with the lowest average energy, E0 ) -123.0 kJ mol-1. h exp(-〈U 〉/RT). i exp(-〈U j tot soluble〉/RT). exp(-0.5-〈Usw,el〉/RT).
Figure 3. (a-c) Essential dihedral angles for important conformations of the YTGP peptide in aqueous solution which contain an aromatic-(i+2) amine interaction, classified according to the region of the Ramachandran plot occupied by the threonine φ2 and ψ2 angles: (top left) left-handed R helix, (top right) right-handed R helix, (bottom left) β sheet, (bottom right) conformations from the crystal structure of BPTI containing an aromatic-(i+2) amine interaction: P10-13 and P35-38.
structures. The lowest RMSDs were all less than 50°. The structures found in the conformational search are therefore good representations of the possible conformations with an aromatic(i+2) amine interaction; even though the trajectories search large amounts of conformational space, these interactions were only found near these structures. The nearest 6HBx structure and the RMSD are included for the ArHN conformations listed in Table 3. Using the definition of the presence of an aromatic-(i+2) amine interaction by a ring shift contribution σring < -0.5 ppm as predicted by eq 1, 557 of the 3900 structures are classified as containing the interaction. All except for 109 of these
structures are contained within the nine most populated ArHN conformational states. A perpendicular interaction, with R > 30°, is found in 203 structures. The numbers of interactions, nint, and numbers of perpendicular interactions, nperp, are included for the conformational states listed in Table 3. It can be seen that larger proportions of nperp correlate with larger values of both 〈R〉 and 〈∆δ〉. (iii) Clusters Sampled along the Individual Trajectories. The structures saved from each trajectory were separately clustered into the 243 conformational states described above. All trajectories, excepting those starting from the 6HB2, 6HB4, 6HB5, and 6HB6 structures, sample some common states.
The YTGP Peptide Likewise the trajectories starting from the 6HB4 and 6HB6 structures have conformational states in common. The remaining trajectories, starting from the 6HB2 and 6HB5 structures, however sample states not seen otherwise. The trajectories are therefore sampling four distinct regions of conformational space. The trajectories starting from the 6HB2, 6HB4, and 6HB6 structures all sample conformational states which lie in the lefthanded R helix region of the threonine (φ, ψ) space. These are not found in other trajectories, and so it seems that interconversion between this region and other areas of the Ramachandran plot is inhibited. Table 4 lists data related to the sampling by each trajectory. The number of conformational states sampled is given by nconf, while nmove shows how often the peptide moved to a different state. How long it remained in the conformational state is then classified for each visit as short (10 ps, 50 ps). For example, the trajectory starting from the 6HB1 structure changed conformational state 81 times, moving between 15 distinct states. Of these 81 periods along the trajectory, for 73 the peptide remained in the conformational state for a short time only, and for eight it remained a medium amount of time. The table thus gives an idea of the properties of the region of conformational space through which the trajectory is moving. The data show that the trajectories behave quite differently. Some (e.g., those starting from the 6HB1, 6HB3, 6HB8, P10-13, and NMRmod structures) move often between many states. Others (e.g., those starting from the 6HB2, 6HB4, and P35-38 structures) sample fewer states and remain in a single one for a relatively long time. Also included in the table are the ArHN conformational states sampled by each trajectory, including the percentage of the trajectory spent in this state. The conformation is only listed if it accounts for more than 1% of the trajectory. Five trajectories contain significant populations of these states. The trajectory starting from the 6HB2 structure is dominated by state ArHN1, while the trajectory starting from the 6HB4 structure is dominated by state ArHN2. In both cases the dominant conformer is close to the starting structure (see Table 3). These are in fact closely related, differing only in the φ3 angle (see Table 1). The trajectory starting from the NMRmod structure has a reasonable population of the ArHN3 state, while the trajectory starting from the 6HB8 structure is found to sample a number of the ArHN conformational states. (iv) Average Properties from the Trajectories. Table 5 lists various average properties calculated from each trajectory: average structural shift for the glycine peptide proton, energy (assigned solute and total), and solvent-accessible surface area. Only four trajectories show significant structural shifts (here taken as < -0.45 ppm) on the glycine peptide proton: those starting from the 6HB2, 6HB4, 6HB8, and NMRmod structures. As can be seen in Table 4, these are the trajectories with significant populations of ArHN conformational states. Surprisingly, both the P10-13 an P35-38 trajectories do not show anomalous shifts. This must mean that the structures in the protein crystal environment are not stable in solution. The range of both average solute and total energies spanned by the trajectories is large, over 50 kJ mol-1. Even though the figures are different, the five trajectories with the lowest energies are the same for both properties (those starting from the 6HB2, 6HB4, 6HB6, 6HB8, and NMRmod structures), although the orders are different. It is noticeable that the trajectory starting from a linear geometry has a very high total energy. As the solute energy is comparable to the other trajectories, this indicates that the solvent has not relaxed fully around the solute.
J. Phys. Chem. B, Vol. 102, No. 32, 1998 6267 The solvent-accessible surface area (SAS) can be taken as a measure of fold; the smaller the surface area, the more folded a peptide is. There is no correlation between average total SAS and glycine peptide proton shift, and so tightly folded structures are possible without this feature. As expected, there is a correlation of total SAS with low, i.e., favorable, solvent-solute Lennard-Jones energy. There is however no correlation between either the total SAS or the ratio of average polar SAS to average total SAS with the solute-solvent electrostatic interaction energy. This SAS ratio however shows a definite correlation with the assigned solute energy: small ratio corresponds to low average energy. This is surprising, as it is usually assumed that a peptide will maximize this ratio, forming a nonpolar core and polar interface with the solvent. (v) Selecting and Extending the Important Trajectories. The trajectory energies in Table 5 are reflected in the weights assigned using the three different weighting schemes. While the weights are different, all schemes pick out only trajectories with large glycine peptide proton anomalous shifts. This indicates that folded conformations are indeed energetically favored. Using the average total energy as an approximation for the free energy (weight 1, eq 10), it is found that the trajectory starting from the NMRmod structure dominates the ensemble, with a small admixture of the trajectories staring from the 6HB4 and 6HB2 structures. Using the average energy assigned to the solute (weight 2, eq 11), a mix of the trajectories staring from the 6HB8, 6HB2, and NMRmod structures is found. weight 3 (eq 12), based on an estimate of the free energy of solvation, generates an ensemble dominated by the trajectory starting from the 6HB4 structure, with a small amount of the trajectory starting from the 6HB2 structure. The uncertainty in the average energies is defined as the rootmean-square deviation of the running average, taken at 1 ps intervals. This is a measure that will approach zero if the average remains constant as the trajectory progresses. The uncertainty in the 〈Utot〉 values varied between 14.7 and 29.1 kJ mol-1, excluding the much larger uncertainty for the trajectory starting from the linear structure. Given that the five trajectories with the lowest average total energy are 42 kJ mol-1 apart, this uncertainty is significant. In contrast, the uncertainty in the 〈Usolute〉 values is smaller: between 2.1 and 10.6 kJ mol-1. The five trajectories with the lowest solute energy span 25 kJ mol-1, and the uncertainty here is less significant. The uncertainty in the solute-solvent interaction energy term is also significant; for the 0.5 〈Usw,el〉 values these lie between 1.4 and 13.7 kJ mol-1. Although the uncertainty in the energies makes the weights tentative, the trajectories picked out by the various schemes, those starting from the 6HB2, 6HB4, 6HB8, and NMRmod structures, are clearly of interest due to the large glycine peptide proton structural shift. These were therefore continued to 1 ns. Table 6 summarizes the average properties over this total time. The average energies over 1 ns are different from those over 300 ps. The energies have become closer together: 〈Usolute〉 spans 5.2 kJ mol-1 instead of 24.3 kJ mol-1, 〈Utot〉 spans 15.6 kJ mol-1 instead of 42.0 kJ mol-1, 0.5 〈Usw,el〉 spans 26.2 kJ mol-1 instead of 44.4 kJ mol-1. The exception to this is the solute-solvent Lennard-Jones interaction energy, 0.5〈Usw,lj〉, which still spans approximately 2 kJ mol-1. The uncertainty in the energies is also reduced, but is still of the same order of magnitude as the energies. The weighting of the trajectories has changed during the extension of the four trajectories. Using weight 1 the trajectory starting from the 6HB2 structure is now found to dominate the
6268 J. Phys. Chem. B, Vol. 102, No. 32, 1998
Worth et al.
TABLE 6: Sampling and Average Properties of Selected Trajectories of the YTGP Peptide in Aqueous Solution over 1 ns at 300 K average energy (kJ mol-1)d Usolutef (1/2)Usw,elg (1/2)Usw,ljh
ncona
nmoveb
〈∆δ〉c (ppm)
6HB2
13
136
-0.56
1.3
6.4
12.5
4.8
0.990
0.361
0.489
6HB4
11
131
-0.66
14.9
11.6
12.4
5.7
0.004
0.044
0.511
6HB8
26
317
-0.41
16.9
7.1
36.4
3.9
0.002
0.269
0.000
NMRmod
31
346
-0.29
14.9
6.7
38.6
3.7
0.004
0.319
0.000
starting structure
e
Utot
weight 1i
weighting factor weight 2j weight 3k
ArHN states sampledm ArHN1 (47%) ArHN2 (22%) ArHN2 (46%) ArHN1 (13%) ArHN8 (17%) ArHN9 (9%) ArHN6 (5%) ArHN4 (5%) ArHN3 (5%) ArHN9 (13%) ArHN3 (8%) ArHN6 (5%)
a Number of conformational states sampled along the trajectory. b Number of times the peptide moves into a different conformational state. Average structural shift on the glycine peptide proton. d The zero point of these energies is that in Table 5. e Average total energy relative to E0 ) -96 635.8 kJ mol-1. f Average assigned solute energy, Ussj + 0.5Usw,el + 0.5Usw,lj, relative to E0 ) -655.2 kJ mol-1. g Half the average solutesolvent electrostatic interaction energy relative to E0 ) -201.5 kJ mol-1. h Half the average solute-solvent Lennard-Jones interaction energy relative to E0 ) -123.0 kJ mol-1. i exp(-〈Utot〉/RT). j exp(-〈Usolute〉/RT). k exp(-0.5〈Usw,el〉/RT). m Nomenclature as in Table 3. The population of the state is given in parentheses. c
ensemble, rather than that starting from the NMRmod structure, which is now almost unpopulated. The populations found using weight 2 now show an ensemble that is almost evenly divided between three trajectories starting from the 6HB2, 6HB8, and NMRmod structures, with a small proportion of the trajectory starting from the 6HB4 structure. The scheme based on the solute-solvent interaction energy, weight 3, leads to an ensemble divided reasonably evenly between the trajectories starting from the 6HB2 and 6HB4 structures. All the trajectories sample more conformational states than during the first 300 ps, but the average time for sampling a conformation remains approximately the same: 9 ps for the trajectories starting from the 6HB2 and 6HB4 structures, and 3 ps for those starting from the 6HB8 and NMRmod structures. The trajectories starting from the 6HB2 and 6HB4 structrues are still dominated by the two conformational states ArHN1 and ArHN2. Whereas over 300 ps, each trajectory sampled only one of these, they are now seen to be sampling both. Both trajectories retain a significant structural shift on the glycine peptide proton with 〈∆δ〉 < -0.5 ppm. In contrast, the extension of the 6HB8 trajectory continues to sample many conformations. Even so, over 40% of the population are in ArHN conformational states, and 〈∆δ〉 ) -0.41 ppm. The trajectory starting from the NMRmod structure also samples many conformational states. The majority of these states however do not have an aromatic-(i+2) amine interaction, and on average over the 1 ns no significant structural shift is observed on the glycine peptide proton. Figure 4 shows the structural shift, ∆δ, as a function of time for two of these trajectories. One sees that the fluctuations are large, reflecting the mobility of the peptide and the sensitivity of ∆δ to the peptide structure. For the trajectory starting from the 6HB2 structure (Figure 4a), the glycine peptide proton stuctural shift varies between 0 and -1.2 ppm. In general a sizable negative shift is always present. Also along this trajectory the threonine peptide proton continuously exhibits a large negative ∆δ. The trajectory starting from the 6HB4 structure shows a similar pattern. The time evolution of the structural shifts from the trajectory starting from the 6HB8 structure is shown in Figure 4b. The glycine peptide proton shift is at times as large as -1.5 ppm. A major conformational transition clearly occurs after 680 ps; the glycine peptide proton structural shift vanishes. In contrast to the trajectory starting
Figure 4. Structural shifts, ∆δ, for the threonine peptide proton (upper panel) and glycine peptide proton (lower panel) as a function of time for trajectories starting from (a) the 6HB2 structure and (b) the 6HB8 structurue. The values plotted are averages over 10 ps windows.
from the 6HB2 structure, the threonine peptide proton structural shifts are in general small. Toward the end of the trajectory the structural shift becomes larger, and the peptide has changed from a conformation with an aromatic-(i+2) amine interaction to one with an aromatic-(i+1) amine interaction. (vi) Calculated NMR Spectrum. Figure 4a indicates that the conformational states sampled by the trajectories starting from the 6HB2 structure, ArHN1 and ArHN2, possess structural shifts on both the threonine and glycine peptide protons. It is therefore necessary to compare the complete proton structural shift spectra with the experimental spectrum and not just concentrate on the glycine peptide proton structural shift values. The ensemble of structures was calculated at 300 K. As the random coil shifts25 were measured at 298 K, the calculated
The YTGP Peptide
J. Phys. Chem. B, Vol. 102, No. 32, 1998 6269 conformational states with a β sheet threonine conformation are plotted in Figure 5c. These in general show good qualitative agreement with the experimental spectra. The threonine peptide proton structural shifts are now smaller, although the ∆δ for this proton is rather large for the ArHN4 and ArHN6 conformational states. The major deviations from the experimental spectrum are for the glycine R protons, where the shifts are split due to the parallel arrangement of the aromatic-(i+2) amine interaction geometry. The splitting pattern however varies from state to state: sometimes ∆δ for H3R1 is more negative than that for H3R2, and sometimes this is reversed. As a result, it is possible that a mixture of these states would lead to the same shift on both protons. Assuming the trajectories cover the whole conformational space, it is possible to calculate the NMR spectrum for the peptide in aqueous solution by weighting the averages calculated for each trajectory. Using weight 1 or weight 3, the spectrum is very similar to that for the ArHN1 and ArHN2 conformational states. Using weight 2, based on the average solute energy, the spectrum is closer to the experimental one, although the glycine peptide proton structural shift is too weak at ∆δ ) -0.43 ppm, and the threonine peptide proton structural shift is too strong at ∆δ ) -0.58 ppm. The R proton shifts are nearly equivalent, with ∆δ ) -0.17 and -0.09 ppm. Discussion
Figure 5. NMR structural shifts of the YTGP peptide protons in aqueous solution at 300 K: experimental (bold line) compared to calculated values for conformational states (dashed lines), classified according to the region of the Ramachandran plot occupied by the threonine φ2 and ψ2 angles; (top) left-handed R helix, (middle) righthanded R helix, (bottom) β sheet.
NMR spectra are relevant at this temperature. The experimental spectrum was however measured at 271 K. Temperature factors for the shifts were also measured,10 and so this difference can be corrected for. As we wish to see whether the model at 300 K is reasonable, the experimental spectrum was corrected by adding -0.41, -0.29, and +0.47 ppm to the peptide protons of tyrosine, threonine, and glycine, respectively. The corrected experimental spectrum is shown by a bold line in Figure 5 as deviations from the random coil values. The spectrum clearly deviates little from the random coil spectrum, with the exception of the large negative peak for the glycine peptide proton. The spectra calculated from the various ArHN conformational states are also plotted, divided into the three sets defined by the threonine φ2 and ψ2 angles (see Results section (ii)). Each spectrum is an average over the spectra of the structures within the state. Large deviations from the experimental spectrum for the proline δ protons are seen for all conformational states. This could be due to systematic errors which may arise because this is a C-terminal residue. The spectra for the conformational states with the threonine residue in the left-handed R helix region of the Ramachandran plot are shown in Figure 5a. The two spectra are very similar. As mentioned above, these conformational states have a large structural shift not only on the glycine but also on the threonine peptide protons. In Figure 5b, the spectra for conformational states with a right-handed R helix threonine are plotted, while those for
It appears that after 1 ns the trajectories are beginning to converge but sample two separate regions of conformational space: the trajectories starting from the 6HB2 and 6HB4 structures in region 1, and the trajectories starting from the 6HB8 and NMRmod structures in region 2. In region 1 the threonine residue is in a left-handed R helix geometry, while in region 2, the φ2 and ψ2 angles adopt values related to either a left-handed R helix or a β sheet. Thus the regions are separated by a barrier between these regions of the Ramachandran plot. The two regions are seen to have different characteristics. In region 1 a trajectory samples a few conformational states, remaining in each for a long time, while in region 2 a trajectory passes quickly through many states. Such a division makes the use of multiple trajectories essential. The use of many short trajectories, followed by selection of the most important, seems the most sensible way of getting to the stable regions of conformation space as quickly as possible. For example, of the trajectories not extended past 300 ps, it is likely that trajectory 6HB6 would also converge in region 1; and the remaining trajectories, in region 2. The amount of time needed for this will however be much more than that needed for the four selected trajectories. The structural shift spectra for the various ArHN conformational states sampled in region 2 (Figure 5b,c) agree qualitatively with the experimental spectrum, and it seems possible that a mixture of these states could be responsible for the observed NMR spectrum. The structural shift spectra for the ArHN conformational states in region 1 (Figure 5a) however have a very strong structural shift on the threonine peptide proton, which is not observed experimentally. Two conformations of the peptide are shown in Figure 6. The conformations look similar, but the glycine and threonine peptide protons (labeled HGLY and HTHR respectively) are coming out of the plane of the paper in the ArHN1 conformation (Figure 6a), while they are going into the paper in the ArHN8 conformation (Figure 6b). The structural shifts on the glycine peptide protons are -0.60 and -0.85 ppm, while those on the threonine peptide protons are -0.94 and -0.35 ppm, respectively.
6270 J. Phys. Chem. B, Vol. 102, No. 32, 1998
Worth et al.
Figure 6. Stereoscopic plots of two conformations of the YTGP peptide in aqueous solution which have a large structural shift on the glycine peptide proton, HGLY (a) ArHN1 (∆δ ) -0.60 ppm) and (b) ArHN8 (∆δ ) -0.85 ppm). The structural shifts on the threonine peptide protons, HTHR, are -0.94 and -0.35 ppm, respectively.
It is difficult to estimate how accurate the calculated shifts are. One source of error is the empirical method itself. Furthermore the parameters used were optimized for protons in a protein environment, which may not be so appropriate for a short peptide in an aqueous environment. Another source of error is that the values are very sensitive to the geometry, and the slightly different packing of atoms due to a different force field will change the calculated shifts. The error of the shift calculation method of (0.3 ppm (see Methods section (iii)) must therefore be taken as a lower bound to the error of the calculated structural shift values. The structural shifts on the threonine peptide proton for the conformational states in region 1 however are large enough to be a definite effect. The results give a large weighting to these conformations, which does not agree well with the experimental spectrum. As shown by the uncertainty in the average energies and the changing conformational state populations within a trajectory on extending it from 300 ps to 1 ns, it is clear that the trajectories are not converged enough for the weights to be reliable. It is thus not possible to analyze some of the assumptions used to set up the weighting schemes, e.g., the neglect of entropy terms in weight 1 and weight 2. It does however seem clear that the neglect of solute-solute terms in weight 3, where the free energy difference is approximated by a solvation free energy term, definitely leads to the overweighting of conformations in region 1. The force field is also crucial in this respect. For example, slight changes in the solvation terms could significantly change the balance between conformational states. The reliance on the force field has been examined by van der Spoel et al.35 There, a set of 1 ns trajectories was calculated, all starting with a linear conformation but using different force fields. In only one trajectory was a peptide structure with an anomalous glycine peptide proton shift found, and they con-
cluded that the results depend heavily on the force field and water model used. Although the CHARMm force field was not tested, the TIP3P water model was found to be inadequate with all the force fields tested. As a comparison to this earlier study, the trajectory starting from a linear structure was also extended to 1 ns. At the end of this time it was found that, in agreement with the van der Spoel et al. simulation, no significant glycine peptide proton structural shift was found. In the last 600 ps of the trajectory however the conformational states ArHN7, ArHN8, and ArHN9 are being sampled and finally have populations of 4%, 13%, and 10% respectively. Thus this trajectory is indeed slowly converging in region 2 of the conformation space. From the results presented above, it is clear that the starting structure is crucial for trajectories of this length. Without extensive calculation it is not easy to separate this effect from dependence on the force field. Thus different starting structures may also have sampled ArHN conformational states using the force fields that produced no anomalous shift starting from a linear structure. The test of the quality of a force field is whether it produces results in agreement with experiment. The CHARMm/TIP3P force field has produced a number of structures that agree well with the experimental NMR spectrum. The geometry of the folded conformation found by van der Spoel et al. is unfortunately not reported in full by them. The threonine φ2 and ψ2 angles are however shown to be in the right-handed R helix region. Thus it could be either ArHN3, ArHN5, ArHN7, or ArHN8. Unfortunately the structural shifts of the glycine R protons and threonine protons are also not reported, and so it is not possible to distinguish between them. Kemmink and Creighton64 have analyzed the experimental data to study the physical properties of the aromatic-(i+2) amine interaction in aqueous solution, using a two-state open-
The YTGP Peptide closed model. As a measure of the “closed” aromatic-(i+2) amine interaction, they take the ∆δ value for the peptide proton of G37 in BPTI. This is however a strong perpendicular interaction and has a very large anomalous shift of ∆δ ≈ -3 ppm. From the results presented above two points should be made about this model. First, structures with such large structural shifts are never found in the simulation of the peptide in aqueous solution. Second, the use of the ∆δ from a single conformation ignores the dynamic nature of the system. A more realistic model would use the value of the structural shift from the folded conformational state. Their results could therefore be reinterpreted using the lower value of 〈∆δ〉 for an ArHN conformational state in aqueous solution. Using the data at 300 K, i.e., ∆δ(expt) ≈ -0.8 ppm and ∆δ(calc) > -1.2 ppm for folded states, more than 66% of the peptide must be in a folded state with an aromatic-(i+2) amine interaction. Conclusion The YTGP peptide in aqueous solution has an extremely complex potential energy surface, and many conformational states are sampled. A major problem is that the system has two distinct, and important, regions of conformational space, and conversion from one region to another does not occur during the trajectories calculated here. As a result, single trajectories of up to even 1 ns are insufficient to sample the conformational space of the system. In general, the trajectories are found to be dependent on the starting structure, even after 1 ns. Choosing a suitable starting structure reduces the time needed to reach the region of conformation space important for the system. The strategy adopted, running many short trajectories from selected starting conformations, followed by selection of the most important for further study, seems to be a sensible method to try and shorten this equilibration time when the best starting structure is not known a priori. This strategy is especially useful when, as for this peptide, there are conformations that interchange very slowly. Other possible methods to approach such a problem are the use of high-temperature dynamics,65-68 mass weighting,69 weight histogram analysis (WHAM),70 or locally enhanced dynamics.71,72 The experimental NMR spectrum for this peptide has a large structural shift on the glycine peptide proton. The simulation of the peptide found nine well-populated conformational states with this feature. The aromatic-(i+2) amine interaction geometry is found to be predominantly parallel, although perpendicular structures were also found. Seven of the states have a calculated structural shift spectrum in qualitative agreement with the experimental spectrum, and so the system is likely to be a mixture of a number of conformational states at this temperature. The remaining two conformational states, despite their apparent importance in the simulation, have a large structural shift on the threonine (i+1)-peptide proton, which is not observed experimentally. As the set of conformational states are close in energy however, the results will be dependent on the force field used; that is, the relative importance of the conformational states will change. Likewise, the calculated shifts will also change with different force fields, as their values are very sensitive to the structure of the peptide. In conclusion, the YTGP peptide in aqueous solution is a difficult system to study using standard MD methods. The use of multiple trajectories has however enabled us to find a set of likely structures for the peptide in solution and to get a feel for the dynamic behavior of the system. It has also provided
J. Phys. Chem. B, Vol. 102, No. 32, 1998 6271 information on the sensitivity of the system to various factors such as force field and simulation length. This information can all be used to set up an efficient, more detailed calculation on the free energy surface. Acknowledgment. We thank Tom Creighton for first bringing our attention to this interesting and complex problem, and Johan Kemmink for discussions of his NMR data. We also thank Mike Williamson for providing the TOTAL program, and Simon Hubbard for providing the NACCESS program. References and Notes (1) Tobias, D. J.; Mertz, J. E.; Brooks, C. L., III. Biochemistry 1991, 30, 6054. (2) Brooks, C. L., III; Case, D. A. Chem. ReV. 1993, 93, 2487. (3) Demchuk, E.; Bashford, D.; Case, D. A. Fold. Des. 1997, 2, 35. (4) van der Spoel, D.; Berendsen, H. J. C. Biophys. J. 1997, 72, 2032. (5) Nardi, F.; Worth, G. A.; Wade, R. C. Fold. Des. 1997, 2, 62. (6) Mohanty, D.; Elber, R.; Thirumalaie, D.; Beglov, D.; Roux, B. J. Mol. Biol. 1997, 272, 432. (7) Brooks, C. L., III; Karplus, M.; Pettitt, B. M. AdV. Chem. Phys. 1988, 71, 1. (8) Guo, Z.; Brooks, C. L., III; Boczko, E. M. Proc. Natl. Acad. Sci. U.S.A. 1997, 94, 10161. (9) Straatsma, T. P.; McCammon, J. A. J. Chem. Phys. 1994, 101, 5032. (10) Kemmink, J.; Van Mierlo, C. P. M.; Scheek, R. M.; Creighton, T. E. J. Mol. Biol. 1993, 230, 312. (11) Kemmink, J.; Creighton, T. E. J. Mol. Biol. 1993, 234, 861. (12) Auffinger, P.; Westhof, E. J. Mol. Biol. 1997, 269, 326. (13) Caves, L. S.; Evanseck, J. D.; Karplus, M. Protein Sci. 1998, 7, 649. (14) Creighton, T. E., Ed. Protein Folding; W. H. Freeman: New York, 1992. (15) Honig, B.; Yang, A.-S. AdV. Protein Chem. 1995, 46, 27. (16) Creighton, T. E. Biochem. J. 1990, 270, 1. (17) Creighton, T. E. In Protein Folding; W. H. Freeman: New York, 1992; pp 301-351. (18) Van Mierlo, C. P. M.; Derby, N. J.; Creighton, T. E. Proc. Nat. Acad. Sci. U.S.A. 1992, 89, 6775. (19) Van Mierlo, C. P. M.; Derby, N. J.; Keeler, N. J.; Neuhaus, D.; Creighton, T. E. J. Mol. Biol. 1993, 229, 1125. (20) Wlodawer, A.; Deisenhofer, J.; Huber, R. J. Mol. Biol. 1987, 193, 145. (21) Pardi, A.; Wagner, G.; Wu¨thrich, K. Eur. J. Biochem. 1983, 137, 445. (22) Wagner, G.; Braun, W.; Havel, T. F.; Shaumann, T.; Go, N.; Wu¨therich, K. J. Mol. Biol. 1987, 196, 611. (23) Higashijima, T.; Kobayashi, J.; Nagai, U.; Miyazawa, T. Eur. J. Biochem. 1979, 97, 43. (24) Bundi, A.; Wu¨thrich, K. Biopolymers 1979, 18, 285. (25) Wisart, D. S.; Bigham, C. G.; Holm, A.; Hodges, R. S.; Sykes, B. D. J. Biomol. NMR 1995, 5, 67. (26) Burley, S. K.; Petsko, G. A. FEBS Lett. 1986, 203, 139. (27) Singh, J.; Thornton, J. M. J. Mol. Biol. 1990, 211, 595. (28) Mitchell, J. B. O.; Nandi, C. L.; Ali, S.; McDonald, I. K.; Thornton, J. M.; Price, S. L.; Singh, J. Nature 1993, 366, 413. (29) Flocco, M. M.; Mowbray, S. L. J. Mol. Biol. 1994, 235, 709. (30) Mitchell, J. B. O.; Nandi, C. L.; McDonald, I. K.; Thornton, J. M.; Price, S. L. J. Mol. Biol. 1994, 239, 315. (31) Levitt, M.; Perutz, M. F. J. Mol. Biol. 1988, 201, 751. (32) Cheney, J.; Cheney, B. V.; Richards, W. G. Biochim. Biophys. Acta 1988, 954, 137. (33) Rodham, D. A.; Suzuki, S.; Suenram, R. D.; Lovas, F. J.; Dasgupta, S.; Goddard, W. A.; Blake, G. A. Nature 1993, 362, 735. (34) Worth, G. A.; Wade, R. C. J. Phys. Chem. 1995, 99, 17473. (35) van der Spoel, D.; van Buuren, A.; Tieleman, D. P.; Berendsen, H. J. C. J. Biomol. NMR 1996, 8, 229. (36) Abraham, R. J.; Fisher, J.; Loftus, P. Introduction to NMR Spectroscopy; Wiley and Sons: New York, 1987; p 20. (37) IUPAC-IUB Commission on Biochemical Nomenclature. Biochemistry 1970, 9, 3471. (38) Molecular Simulations Inc., 200 Fifth Ave., Waltham, MA 02154, 1992. (39) Straatsma, T. P.; McCammon, J. A. J. Comput. Chem. 1990, 11, 943. (40) Brooks, B. R.; Bruccoleri, R. E.; Olafson, B. D.; States, D. J.; Swaminathan, S.; Karplus, M. J. Comput. Chem. 1983, 4, 187. (41) QUANTA parameter handbook, release 3.3; Molecular Simulations Inc., San Diego, 1992.
6272 J. Phys. Chem. B, Vol. 102, No. 32, 1998 (42) Jorgensen, W.; Chandrasekhar, J.; Madura, J.; Impey, R.; Klein, M. J. Chem. Phys. 1983, 79, 926. (43) van Gunsteren, W. F.; Berendsen, H. J. C. Mol. Phys. 1977, 34, 1311. (44) Berendsen, H. J. C.; Postma, J. P. M.; van Gunsteren, W. F.; Di Nola, A.; Haak, J. R. J. Chem. Phys. 1984, 81, 3684. (45) Andersen, H. C. J. Chem. Phys. 1980, 72, 2384. (46) Loncharich, R. J.; Brooks, B. R. Proteins: Struct., Funct. Genet. 1989, 6, 32. (47) Schreiber, H.; Steinhauser, O. Chem. Phys. 1992, 168, 75. (48) Guenot, J.; Kollman, P. A. J. Comput. Chem. 1993, 14, 295. (49) Williamson, M. P.; Asakura, T. J. Magn. Reson. Ser. B 1993, 101, 63. (50) Asakura, T.; Taska, K.; Demura, M.; Williamson, M. P. J. Biomol. NMR 1995, 6, 227. (51) Williamson, M. P. TOTAL, computer program; Krebs Institue, Department of molecular biology and biotechnology, University of Sheffield, Sheffield, S10 2UH, U.K., 1993. (52) Haigh, C. W.; Mallion, R. B. Prog. NMR Spectrosc. 1980, 13, 303. (53) Hubbard, S. J.; Thornton, J. M. NACCESS, computer program; Department of Biochemistry and Molecular Biology, University College, London, U.K., 1993. (54) Lee, B.; Richards, F. M. J. Mol. Biol. 1971, 55, 379. (55) Karpen, M. E.; Tobias, D. J.; Brooks, C. L., III. Biochemistry 1993, 32, 412.
Worth et al. (56) McKelvey, D. R.; Brooks III, C. L.; Mokotov, M. J. Protein Chem. 1991, 10, 265. (57) Jayaram, B.; Fine, R.; Sharp, K.; Honig, B. J. Phys. Chem. 1989, 93, 4320. (58) Roux, B.; Yu, H.-A.; Karplus, M. J. Phys. Chem. 1990, 94, 4683. (59) A° qvist, J.; Hansson, T. J. Phys. Chem. 1996, 100, 9512. (60) Nina, M.; Beglov, D.; Roux, B. J. Phys. Chem. 1997, 101, 5239. (61) Hansson, T.; Marelius, J.; Aqvist, J. J. Comput.-Aided Mol. Des. 1998, 12, 27. (62) Sharp, K. A.; Nicholls, A.; Fine, R. F.; Honig, B. Science 1991, 252, 106. (63) Ponder, J. W.; Richards, F. M. J. Mol. Biol. 1987, 193, 775. (64) Kemmink, J.; Creighton, T. E. J. Mol. Biol. 1995, 245, 251. (65) Daggett, V.; Levitt, M. J. Mol. Biol. 1992, 223, 1121. (66) Mark, A. E.; van Gunsteren, W. F. Biochemistry 1992, 31, 7745. (67) Tirado-Reves, J.; Jorgensen, W. L. Biochemistry 1993, 32, 4175. (68) Caflisch, A.; Karplus, M. J. Mol. Biol. 1995, 252, 672. (69) Mao, B. Biophys. J. 1991, 60, 611. (70) Kumar, S.; Rosenberg, J. M.; Bouzida, D.; Swendsen, R. H.; Kollman, P. A. J. Comput. Chem. 1995, 16, 1339. (71) Roitberg, A.; Elber, R. J. Chem. Phys. 1991, 95, 9277. (72) Simmerling, C.; Elber, R. Proc. Nat. Acad. Sci U.S.A. 1995, 92, 3190.