Pathways for Conformational Change in Nitrogen Regulatory Protein

Pathways corresponding to the conformational change in nitrogen regulatory protein C are calculated using the CHARMM19 force field with an implicit so...
1 downloads 0 Views 847KB Size
2456

J. Phys. Chem. B 2008, 112, 2456-2465

Pathways for Conformational Change in Nitrogen Regulatory Protein C from Discrete Path Sampling Mey Khalili* and David J. Wales UniVersity Chemical Laboratories, Lensfield Road, Cambridge CB2 1EW, UK ReceiVed: August 17, 2007

Pathways corresponding to the conformational change in nitrogen regulatory protein C are calculated using the CHARMM19 force field with an implicit solvation model. Our analysis employs the discrete path sampling approach to grow a database of local minima and transition states from the potential energy surface that contains kinetically relevant pathways. The pathways with the largest contribution to the phenomenological two-state rate constants are found to exhibit a number of structural features that agree with experimental observations. Further details of the calculated pathways for conformational change may therefore provide useful predictions of how this large-scale motion is achieved.

1. Introduction Nitrogen regulatory protein C (NtrC) is a member of the “twocomponent system’’ signal transduction pathway abundant in bacteria,1-3 which utilizes phosphoryl transfer from sensor kinases to response regulatory proteins for signal propagation. The members of this family become activated when they are phosphorylated on an aspartate residue in their receiver domain and modulate the activity of their transcription activation domain. NtrC is phosphorylated at aspartate-54, after which it utilizes the σ54-holoenzyme form of RNA polymerase, and activates transcription.4 Unphosphorylated NtrC exists as a dimer. Upon phosphorylation, it forms large octameric and hexameric oligomers that have the capacity to hydrolyze ATP and provide the energy for the transcriptional activation. The solution structure of the unphosphorylated receiver domain of NtrC (apo-NtrCr) was determined in 1995 with NMR5 (PDB code: 1NTR). The structure of the transiently phosphorylated form was determined in 19996 (PDB code: 1DC8). Later, it was discovered that berrylofluoride (BeF3-) forms a persistent complex with NtrC and fully activates it. The structure of beryllofluoride-activated NtrCr (BeF3-NtrCr) was solved by NMR at high resolution in 20037 (PDB code: 1KRW). There have been a number of experimental studies that suggest large conformational changes upon phosphorylation.8 The wild-type monomer does not have any significant ATP-ase activity with or without the phosphorylation, unless its concentration is above 5 µ molar.8 With phosphorylation, the ATP-ase activity (measured as picomoles per 10 µL per 10 min) is about 2525 at 5 µ molar concentrations. A number of NtrC mutants have much higher ATP-ase activity in the monomeric state, with NtrCS160F having the highest value of 908 at 1 µmol without phosphorylation and 1896 with phosphorylation.8 A ribbon diagram of the N-terminal receiver domain of NtrC is shown in Figure 1, with all the secondary structures numbered according to their position along the chain. Phosphorylation induces large conformational changes involving a displacement of R3, R4, β4, and β5 away from the active site, and a shift and an axial rotation in R4.6 There is also substantial rearrangement of the loop above the active site (β3-R3 loop, residues 55-64) as well as the loops leading to and leaving from R4.6

Figure 1. Ribbon diagram for the N-terminal receiver domain of NtrC (residues 1-123). The position of the active-site residue (D54), which becomes phosphorylated, is marked. The secondary structures are labeled according to their position along the chain.

The loops fluctuate freely in the inactive protein and become restricted upon phosphorylation. As well as large secondary structure rearrangements, a number of sidechains change their orientation upon phosphorylation. Among these are L87, A90, and V91, which rotate from the core of the protein to the outside. Upon phosphorylation, new interactions [identified by the nuclear overhauser effect (NOE)] are formed between residues A89 (in R4) and M81 (in β4) and residues Y94 (in R4) and I80 (in β4) and Y101 (in β5). The reorientation of these hydrophobic residues results in formation

10.1021/jp076628e CCC: $40.75 © 2008 American Chemical Society Published on Web 02/05/2008

Change in N Regulatory Protein C of a hydrophobic surface.6 The phosphorylation-dependent exposure of the hydrophobic surface of R4 results in its interaction with the transactivation domain of the protein, which is needed for the biological function. Other experimental studies have found a strong correlation between phosphorylation-driven activation of NtrC and its microsecond time scale backbone dynamics.9 There is evidence of dynamical exchange between the active and inactive conformations. Both states are populated in the wild-type unphosphorylated form, and phosphorylation shifts the equilibrium toward the active form. The activation process, which happens on micro- to millisecond timescales, is quantified by Rex, which measures the conformational exchange factor between the conformations that have different chemical environments. Rex shows that many residues of the inactive wild-type show microto millisecond timescale motions, which disappear in the active form. The population of the active form of NtrC in the wildtype unphosphorylated ensemble is between 2 and 10%.9 After phosphorylation, the equilibrium involves more than 99% of the active form, and inactive conformations virtually disappear. These observations suggest a two-state, allosteric behavior for the conformational transition from the inactive form to the active form. Hu and Wang10 have studied the unphosphorylated (inactive) form (PDB code: 1DC7) and the phosphorylated (active) form (PDB code: 1DC8) with molecular dynamics (MD) and targeted molecular dynamics simulations. They did not include the phosphoryl group of the activated form in their simulations. They used the AMBER7.0 force field11 and the TIP3P water model.12 They found the inactive form to be more flexible than the active form, in agreement with the experiments. They monitored the time evolution of the root-mean-square deviation (rmsd) of the heavy atoms of the backbone from the initial structure, averaged over the last 6 ns of the MD run. The rmsd of the active form stabilized around 3.5 Å after about 3.5 ns of simulation. However, the rmsd of the inactive form was still increasing from 4.5 Å at 9.5 ns at this point. This result could be due to the inactive form being more flexible. However, it could also be due to the protein unfolding. The average structure calculated from the NMR snapshots in the PDB file of the inactive form, 1NTR (see section 3.2), has a significant reduction in the secondary structure content in the regions where Hu and Wang report seeing large fluctuations, i.e., R4 and loop β3R3. However, they also reported that the C-terminal end of the protein (R5) had a deviation of 14 Å from the initial structure, which suggests that R5 might have been unfolding. We also note that deviations of over 10 Å, which were reported for R4, may also be associated with helix unfolding rather than fluctuation. Our MD results, reported below, are more consistent with unfolding behavior. In this paper, our aim is to study the conformational changes associated with the activation of NtrC, using the discrete path sampling method.13-15 The key question we are aiming to answer here is how the wild-type NtrC transforms into the activated conformation from the inactive conformation. In future work, we aim to compare these results with pathways for the phosphorylated protein. This comparison should enable us to distinguish between the suggested allosteric activation mechanism,9 where phosphorylation shifts a preexisting equilibrium between the inactive and active forms, and an induced fit mechanism, where phosphorylation triggers a conformational shift. Elucidating the precise mechanism is an important issue, since it has been suggested that the allosteric mechanism may

J. Phys. Chem. B, Vol. 112, No. 8, 2008 2457 be at work in other proteins that exhibit conformational exchange equilibria,9,16 such as calmodulin. To start our DPS calculations, we required two representative endpoints for the active and inactive conformations of NtrC. Initially, we tried to obtain these endpoints using MD simulations starting from the NMR structures of the active and inactive forms. However, the active conformation (PDB code: 1NTR) unfolds after 1 ns of simulation with CHARMM19 and the EEF1 solvation model. Instead, we performed an initial DPS run from the active and inactive NMR conformations. We selected a number of low-energy active and inactive conformations obtained by DPS and performed short 1 ns MD simulations on each one to further relax them. We collected snapshots from the MD simulations every 20 ps and minimized each snapshot. To distinguish between the active and inactive conformation, we chose an order parameter that clearly distinguished the corresponding NMR conformations, namely, the distance between the R3-β3 loop and R2. We clustered the minimized snapshots according to this order parameter and selected the lowest energy conformation from the active and inactive cluster as the starting endpoints for subsequent DPS runs. 2. Methods 2.1. Discrete Path Sampling. We have studied the conformational transition between the active and inactive forms of NtrC using the discrete path sampling (DPS) method. Details of the theory have been presented elsewhere;13-15 here, we provide a brief overview. The DPS approach is based upon construction of databases of local minima on the potential energy surface (PES) and the transition states that connect them. Here, a transition state is defined geometrically as a point where the Hessian matrix of second derivatives has only one negative eigenvalue, and a minimum is defined as a point where this matrix has no negative eigenvalues. Phenomenological two-state rate constants are calculated from these databases using statistical rate theory for the individual minimum-to-minimum transitions. In the present work, we employ the harmonic approximation to construct local densities of states for the minima and transition states, along with a consistent formulation of transition state theory for the rate constants between directly connected minima. DPS does not require prior knowledge of a reaction coordinate. An initial pathway is constructed by linking together minimum-transition state-minimum triples on the potential energy surface, using tools based on geometry optimization and network flow analysis.17 For distant minima, the initial path usually involves many high-energy structures, and is kinetically irrelevant. However, as the database of stationary points is refined, more relevant pathways are progressively located. This refinement is continued until further additions have a negligible effect on the rate constants calculated from the database. Since large steps are taken between local minima, it is usually possible to recover rapidly from a poor initial path and obtain results that are independent of this starting point. It is important to define an order parameter that is capable of classifying the local potential energy minima into product and reactant sets (A and B) for the endpoints and all the intervening minima that belong to neither set. Calculating a particular A to B path requires a number of transition state searches, and we employ a previously described approach based on Dijkstra’s algorithm18 from network flow theory to connect distant minima. Since DPS is designed to locate the most important kinetic paths between fixed endpoints, it does not sample stationary points corresponding to high-energy, off-

2458 J. Phys. Chem. B, Vol. 112, No. 8, 2008 pathway events, such as unfolding of the starting structure, observed in MD simulations of NtrC. The DPS procedure can be considered as a framework designed to grow stationary point databases that are relevant to extracting global kinetics.19-21 The rate constants calculated from such databases are formally exact within a Markov approximation for transitions between the local minima if the database is large enough. However, further simplifications are required to make the calculations feasible, and aside from finite sampling, the other main approximation employed in the present work is the use of harmonic densities of states and harmonic transition state theory for the individual minimum-to-minimum rate constants. These additional approximations are deemed appropriate because the empirical potential itself is probably the greatest source of uncertainty. For example, one recent study has found that torsional anharmonicity only changes the harmonic transition state rate constant by a factor of 2 in hydrogen abstraction from ethane.22 The Markov assumption itself would probably lead to an underestimation of the rate constants, although the largest effects will occur for low barrier processes that are probably not rate determining. In contrast, the use of an implicit solvent model will probably accelerate the kinetics, since explicit relaxation of the solvent is not included. In the present contribution, we focus upon structural aspects of the predicted pathways for conformational change, which are likely to be more accurate than numerical predictions of the corresponding rate constants. 2.2. Potential Energy Function. The force field we used in the DPS calculations was the CHARMM19 polar hydrogen energy function,23 in which only the polar hydrogens are explicitly defined. The solvent model we used was the effective energy function (EEF1) for proteins in solution.24 In this solvation model, the ionic sidechains are neutralized and a distance-dependent dielectric constant is used to approximate the charge-charge interactions between the residues. In benchmark MD simulations with EEF1, the deviations from the experimental structure of a number of test proteins were found to be similar to those obtained in explicit solvent calculations.24 The calculated enthalpy of unfolding for polyalanine is also similar to the experimental value.24 EEF1 is capable of distinguishing between correctly and incorrectly folded proteins, both in static energy evaluations and in MD simulations.24 EEF1 works with a specific set of nonbonded cutoffs. The van der Waals and electrostatic interactions are cut off at 9 Å, with a switching function between 7 and 9 Å. The distance cutoff for generating the pair list was set to 15 Å since the usual value of 10 Å for EEF1 produced discontinuities that interfere with accurate geometry optimization. The temperature of the simulation was 298 K. EEF1 simulations require about twice as much computer time as vacuum simulations. 3. Results 3.1. Order Parameter Selection. Following advice from the experimental group who obtained the original data,6,9 we selected NMR structures for the inactive form (PDB code: 1NTR) and the berrylofluoride activated form (PDB code: 1KRW) to define the conformations of the two endpoints for the DPS calculations. 1NTR and 1KRW have 20 and 26 NMR snapshots, respectively. The first NMR model of 1KRW lacks the last residue (GLU 124). To provide a consistent comparison, we removed this residue from the rest of the NMR models. Since the protein becomes more accessible to the solvent upon activation, we checked the solvent accessible surface area for each residue in the active and inactive databases of the NMR

Khalili and Wales

Figure 2. Average CR surface area, S, of each residue in the (A) open (inactive) and (B) closed (active) forms of NtrC. The amino acids that show more than a 6 Å2 change in surface area are marked. (C) Absolute difference in the surface area for the marked residues. Regions β4 to mid R5 show the greatest change in the solvent accessible surface area.

conformations. We used the program Naccess,25 which calculates the atomic accessible surface by rolling a probe of given size around the van der Waals surface of each atom. The default probe size of 1.4 Å was used. The accessible area of all the CR atoms of each NMR snapshot, in the active and inactive forms, were averaged over the entire pool of snapshots belonging to each group. The results are shown in Figure 2. The largest changes in surface area are observed in R1 and the C-terminal region of the protein. It is important to highlight these changes because these areas are the same ones that lose the secondary structure in our MD simulations (see section 3.2). They are also the same areas that are reported to rearrange upon activation of the protein in experiments.6 Upon activation, the accessible surface area increases for some residues (R3, G27,

Change in N Regulatory Protein C

J. Phys. Chem. B, Vol. 112, No. 8, 2008 2459

Figure 4. N37-G59 CR distance in angstroms for each NMR conformation of 1NTR and 1KRW. The solid line is the distance obtained from the NMR conformations of the “open’’, inactive form (1NTR). The dashed line is the one obtained from the “closed’’, active form (1KRW). There is a 2 Å gap between the shortest distance of the open form and the longest distance of the closed form.

Figure 3. Superposition of the average NMR structures of 1NTR9 (open (inactive), black) and 1KRW7 (closed (active), gray). The open average geometry loses some secondary structure. However, R2 stays the same in the N-terminal part of both average structures. The position of the β3-R3 loop changes significantly in the closed and open structures. In the average active structure, the loop has a closed conformation, whereas in the inactive structure, it adopts an open conformation.

K46, H84, G97, P103, P105, D107, A113), while it decreases for others (A24, G25, A93, A98, F106, A118). Hence, we see that polar and charged groups become more accessible to the solvent. There is no significant increase in surface area for L87, A90, and V91. The loop above the active site (β3-R3 loop, residues 5564) changes conformation significantly in the active and inactive forms. In the inactive form, it has an “open’’, upright conformation, whereas in the active form it flips toward R2 and adopts a “closed’’ conformation.6 To see this change more clearly, the average structures of the open and closed forms were calculated from the NMR databases. The average structures were then minimized for 1000 steps with the steepest-decent minimization method in CHARMM to remove clashes and relax them. The superposition of the minimized “average’’ closed and open structures is shown in Figure 3. In both average conformations, some of the secondary structures are shorter, especially for the open average structure, in which β2 is significantly shorter, R3 has lost one turn, and R1 is one and half turns shorter in its N-terminal region. However, the structure of R2 is the same in its N-terminal section, although it is shorter by one turn in the average structure of the closed form (Figure 3). Hence, the distance between this loop and R2 seems to be a suitable order parameter. After monitoring the distance between a number of residues in the loop and in R2, the distance between the CR atoms of N37 and G59 showed a clear separation between the open (inactive) and closed (active) forms (Figure 4). 3.2. MD Simulations of the Endpoints. The structures obtained experimentally by either NMR or X-ray methods are usually very high in energy and have clashes when they are used as starting points for empirical force fields. Although it is possible to minimize these structures to small rms forces, this procedure may produce a local minimum that is still much

higher in energy than the global minimum of the potential energy landscape. The stationary point databases produced by DPS depend on the choice of the endpoints, and it is important to supply physically meaningful structures. To relax the experimental geometries, we selected the first conformation of each of the two PDB files, and minimized them to an rms force of 10-6 kcal/mol. We then performed MD simulations for 5.5 ns to relax the structures. Figures 5A and B show the all-atom rmsd deviation of selected secondary structure segments for the selected conformations. It is evident that for the open (inactive) conformation R1, β5 and R5 unfold significantly. We note that these are the same regions that show the greatest change in the accessible surface area (Figure 2). For the closed (active) state, these secondary structures are stable for the duration of the MD simulation (Figure 5B). Since we could not obtain low-energy conformations from MD without unfolding the closed endpoint, we instead started a short DPS run with the energy-minimized PDB structures as endpoints and constructed a small database of minima and transition states between them. The disconnectivity graph14 for this database is shown in Figure 6. Disconnectivity graphs partition the minima on the potential energy (or free energy26,27) landscape of the protein into disjoint sets based on the energies of the transition states.28,29 In a disconnectivity graph, the vertical axis represents the potential or free energy, and the horizontal axis can be arbitrary. The most important feature of the potential energy landscape in Figure 6 is that it separates into two distinct regions. The coloring of the graph identifies the minima based on the value of our chosen order parameter (the N37-G59 CR distance). The purple regions contain closed (active) conformations, while the green regions contain open (inactive) conformations. The orange regions contain very open structures, which are not observed in the NMR databases. Both regions of the graph contain significant, low-energy traps. However, these traps are probably just artifacts of incomplete sampling. This preliminary survey of the PES was designed to improve the initial path between the endpoints. Low-lying minima that were found during this process, but lie off-pathway, were not selected for further connection attempts. The two initial endpoints do not appear in the graph because they lie a great deal higher in energy than other minima located during the refinement of the database (-3459.5 kcal/mol for the open conformation and -3443.2 kcal/ mol for the closed conformation).

2460 J. Phys. Chem. B, Vol. 112, No. 8, 2008

Figure 5. (A) All-atom rmsd for selected secondary segments of the open protein during MD simulations. The protein unfolds after about 1 ns. The unfolding is especially pronounced in R1 and β5. (B) Values of the rmsd for the selected secondary structure segments of the closed protein. Unlike the open (inactive) protein, the closed (active) protein is stable on the MD timescale.

We selected a number of minima from each region of the disconnectivity graph for further analysis. Figure 7 shows the conformations that are marked in Figure 6. In all four minima, the secondary structures are well-formed, except for minimum 51662, which is missing β4 and β5. However, β4 and β5 are also poorly formed in some of the NMR conformations of 1NTR. The other open minimum (44474) is well-formed and well-packed. The minimum belonging to the non-NMR open category (56183), is distinctively open. In this minimum, we also see a significant change to R4, which is longer and has taken over some parts of the β4-R4 loop. This change removes any clashes in the right-hand of the active-side loop and allows it to flip to the right and adopt the very open conformations we have characterized. It is quite possible that this region of the energy landscape is an artifact of the potential energy function or reflects the quality of the NMR database. However, the fact that similar unfolding and instability have also been observed in other MD simulations with the AMBER force field and explicit TIP3P water suggests that this region might possess ambiguous secondary structure, which becomes stabilized upon the activation of the protein. If so, this observation would be in good agreement with the experimental NMR snapshots of the open (inactive) form, which fluctuate significantly and exhibit a large solvent accessibility in the same regions. It would be interesting to conduct simulations with alternative force fields and solvent models in future work for comparison.

Khalili and Wales

Figure 6. Potential energy disconnectivity graph obtained after DPS sampling started from the PDB minimized structures as endpoints. The vertical axis is the potential energy in kilocalories per mole. The horizontal axis is arbitrary. Each node on the graph represents a set of structures that are connected by a transition state below that threshold. The orange regions of the graph represent structures that are very open and have loop configurations not observed in the NMR database. The green regions of the graph represent open (inactive) structures. The purple regions represent closed (active) structures. The positions of four particular local minima are indicated by short red lines; these minima are illustrated in Figure 7.

We performed 1 ns MD simulations for each of the minima in Figure 7 just to let the structures relax, but not allowing them to unfold. We collected conformations every 20 ps, and subsequently minimized every snapshot to an rms force of 10-6 kcal/mol. Figure 8 shows the potential energy histograms of the MD snapshots. Compared to the two initial PDB structures, the conformations are about 300 kcal/mol lower in energy. They are also about 100 kcal/mol lower in energy than the four original minima from which we started the MD simulations. Figure 9 shows that the histogram of the order parameter in the MD snapshots can clearly be divided into three ranges: 5-11, 11-17, and 17-25.5 Å. The most probable conformations are the open (inactive) category. We selected the lowest energy conformations in the open and closed categories for use in further DPS calculations, excluding the very open, non-NMR conformations. The two new endpoints along with their energies are shown in Figure 10; the secondary structures are well formed in each of them. The order parameter is 20 Å for the open conformation and 10.5 Å for the closed conformation, and the potential energies are almost identical and fall in the low-energy tail of the potential energy histogram (Figure 8).

Change in N Regulatory Protein C

J. Phys. Chem. B, Vol. 112, No. 8, 2008 2461

Figure 7. Representative structures from each region of the disconnectivity graph in Figure 6. Structures 51662 and 44474 are open conformations, 43728 is a closed conformation, and 56183 is not observed in the NMR database. In all four conformations, the secondary structures are wellformed and well-packed, except β4 and β5, which are missing in 51662. β4 and β5 are also missing in some of the NMR conformations of 1NTR. In 56183, R4 is significantly longer and twisted, causing the β4-R4 loop to be shorter and allowing the active side loop to swing toward R4 and produce large values for the chosen order parameters.

Figure 8. Potential energy histogram of MD snapshots. The energies sampled lie significantly lower in energy than the initial PDB structures and hence the endpoint conformations in the original DPS run.

3.3. DPS Calculations. After obtaining two suitable endpoints, we proceeded to sample the potential energy landscape of the protein using the DPS approach. The resulting disconnectivity graph is shown in Figure 11. From the graph, it is evident that there are virtually no low-energy, non-NMR like conformations in the database. Such conformations only appear if we start from high-energy PDB structures, because then the protein can fall into local traps. The energy landscape of the protein could be described as a potential energy funnel,30 and

Figure 9. Histogram of N37-G95 CR distances obtained from the MD snapshots. The distances can clearly be divided into three groups, closed (5-11), open (11-17), and very open (17.5-25.5 Å).

the global minimum is a homogeneous cluster of open conformations. In general, the graph lacks hierarchical structure,28,29 meaning that open and closed structures are not necessarily separated by barriers on different energy scales. However, there are a number of clusters of different energies that contain similar open conformations and one cluster that contains similar closed conformations. The two endpoints are no longer the lowest in potential energy, since DPS sampling has located structures that lie as much as 15 kcal/mol lower.

2462 J. Phys. Chem. B, Vol. 112, No. 8, 2008

Khalili and Wales

Figure 11. Potential energy disconnectivity graph for NtrC obtained by sampling pathways between the new endpoints. The scheme is the same as for Figure 6. There are no non-NMR conformations in this graph because they are much higher in energy than the NMR-like conformations. A number of conformations with energies below the chosen endpoints are observed. The open and closed structures do not generally separate into clusters along the horizontal axis. This structure indicates that the transition states between open and closed structures span a wide range of potential energy.

Figure 10. New closed and open endpoints for NtrC selected from the energy-minimized MD snapshots. The protein is well-packed in both structures, which have comparable potential energies.

We now compare the potential energy and free energy landscapes. Free energies were first calculated for individual potential energy minima in the harmonic approximation using normal-mode analysis. The database was then regrouped by combining the minima separated by free energy barriers lower than 4 kcal/mol. The transition states were regrouped according to the free energy groups they connected, as described elsewhere.31 This scheme produces a free energy disconnectivity graph26,27 (Figure 12) that again lacks hierarchical structure, especially at higher energy. Nevertheless, at the bottom of the graph, we see separate funnels consisting exclusively of either open or closed structures. This separation means that the landscape has a hierarchical structure, with a higher free energy barrier to interconversion of the open and closed forms than the barriers between minima within each region. The rate of transition between these two minima is 200 per second obtained from graph transformation rate calculations.32 There is another double funnel of separate open and closed structures that lies on average about 4 kcal/mol higher in free energy. At room

temperature, the protein would preferentially occupy the two funnels at the bottom of the free energy landscape in equilibrium. Figure 13 shows snapshots of stationary points from the PES along one pathway between the two endpoints in the DPS database. This path makes the largest contribution to the phenomenological two-state rate constant if recrossings are ignored and the intervening minima are assumed to be in steady state. Visualizing the path in terms of stationary points provides a particularly clear picture of the underlying structural changes, because vibrational noise is removed. However, it is important to realize that all the thermodynamic and kinetic properties are calculated at 298 K using harmonic densities of states and a consistent rate theory. We do not attempt to correct the potential energy, entropy, or free energy for further anharmonic effects because the empirical potential is probably the greatest source of error in the present work. There is a significant correlation between the motion of R2, R3, and R4 in the structures illustrated in Figure 13. Helices “slide’’ past one another before the active site loop flips to the closed conformation. In the open (inactive) endpoints, the three helices become more closely aligned. Initially, R4 moves up relative to R2 and R3, and R2 moves up relative to R3 and becomes more level with respect to R4. Finally, the loop flips toward R2. The loop does not close immediately after the upward shift of R2; instead, there are some minor conformational fluctuations first. Furthermore, if we look from the direction of N and C termini to the center of the protein, it is clear that R2, R3, and R4 rotate clockwise from the center. The rotation and shift of R4 have been observed experimentally.6

Change in N Regulatory Protein C

J. Phys. Chem. B, Vol. 112, No. 8, 2008 2463

Figure 12. Free energy disconnectivity graph for NtrC using the same scheme as for Figure 6. In contrast to the potential energy landscape, the global minimum of the free energy landscape includes distinct regions of open and closed structures. Higher energy parts of the graph lack hierarchical structure, except for another double cluster of separate closed (active) and open (inactive) structures, which lie about 4 kcal/ mol higher in free energy.

The behavior of some of the residues, which showed the largest increases in surface area in Figure 2, is very interesting (Figure 14). R3, which in the open conformation interacts tightly with R5, flips from the interior of the protein toward R1 early along the path, and it interacts with the carbonyl of A24 for the remainder. The flip of R3 unhinges β1 and R5 from one another. This flip happens before any rearrangement of the secondary structure occurs in the rest of the protein (the conformation is still inactive) and explains why so many open NMR conformations show fluctuations in their C-terminal domain. The same behavior is observed later along the path for K46, which in the open conformation is flipped toward the interior of the protein and interacts with β5. K46 flips outward and is exposed to the solvent. The outward flip of K46 unhinges R2 from β2 and allows R2 to slide upward relative to R3. The shift of R2 relative to R3 happens only after this residue has flipped to the outside. The result of the unhinging of these two key residues (R3 and K46) is that the protein expands along the pathway and the total surface area increases substantially, as shown in Figure 15. Here, the total surface area of the protein is calculated as the sum of the accessible areas of all the atoms. The surface area of the first endpoint is subtracted from all the other snapshots along the pathway to show the change in the surface area. The increase in surface area that we observe is in agreement with experimental observations.6 We find a significant interaction between residues Y94 and Y101 upon activation, as in experiment.6 These residues do not

Figure 13. (A) Side views and (B) bottom views of NtrC for a kinetically relevant pathway. The three helices, R2, R3, and R4, move significantly and in phase with each other. Initially, R4 slides upward relative to R2 and R3, then R2 slides up relative to R3, and finally the loop closes. (B) There is a significant clockwise rotation of the three helices with respect to the center of the protein. The label n corresponds to the number of stationary points along the discrete pathway.

interact in the open (inactive) protein. However, the ring of Y94 moves toward the ring of Y101 at the same time that the active site loop flips, and the ring-ring stacking interaction is present for the rest of the path. Since the active site loop is far away from these residues, this rearrangement provides a clear example of distal motion, which occurs upon changes in the active site. Experimentalists observe the protein to fluctuate less upon activation.9 In order to measure the degree of fluctuation, we calculated the log product of positive normal-mode frequencies, Λ, for each minimum and transition state along the pathway. The vibrational entropy in the harmonic approximation is proportional to -Λ. If the protein becomes “stiffer’’ we would expect Λ to increase. Figure 16 shows Λ for the minima and transition states along the pathway. Λ for the transition states

2464 J. Phys. Chem. B, Vol. 112, No. 8, 2008

Khalili and Wales

Figure 14. Alternative view of NtrC along the selected pathway with two important charged residues marked. The blue residue is R3, and the red one is K46. The movement of these two residues along the path unhinges the secondary structures from one another and allows for the expansion of the protein and a significant increase in the solvent accessible surface area. The label n is the number of stationary points along the pathway.

pathway for both minima and transition states. This increase is consistent with the protein becoming stiffer as it becomes activated because more intramolecular contacts are formed. 4. Conclusions

Figure 15. Change in the total surface area, ∆S, of the protein along the pathway. Initially, the protein contracts before expanding significantly. Toward the end of the pathway, the protein contracts again but remains significantly more exposed than the initial endpoint.

Figure 16. Logarithm of the product of positive normal-mode frequencies, Λ, for all the minima and transition states along the pathway. -Λ is proportional to the vibrational entropy in the harmonic approximation. Λ is higher by about 30 for the minima compared to the transition states, corresponding to an extra normal mode angular frequency of about 1013 rad/s. Although there is a large fluctuation in Λ, there is also a systematic increase as the protein becomes stabilized along the pathway.

is about 30 less than for the minima, corresponding to an additional angular frequency of order 1013 rad/s (or 1012 Hz) in each minimum. This difference occurs because the imaginary frequency of each transition state, corresponding to the reaction vector, is excluded from Λ. Although there are significant fluctuations, there is a systematic increase in Λ along the

In this study, we have applied the DPS method to study the conformational change associated with the activation of NtrC. We could not obtain the initial endpoints from MD simulations of the PDB structures, because the closed conformation unfolded after about 1 ns of simulations. Instead, we obtained the endpoints by performing a short DPS simulation to survey the potential energy landscape. We then selected a number of lowenergy closed and open conformations from the PES and used them as starting points for short MD simulations where snapshots were collected every 20 ps and subsequently minimized. We sought a suitable order parameter by analyzing the closed and open conformations of NtrC obtained by NMR and selected the distance between the R3-β3 loop and R2. This loop adopts an open conformation in the unphosphorylated form. In the phosphorylated form it moves toward R2 and adopts a closed configuration. We clustered the MD snapshots into closed and open sets according to the order parameter and selected the lowest-energy conformations as the initial endpoints for further DPS runs. We constructed both potential and free energy disconnectivity graphs from the resulting stationary point databases. The free energy landscape is different from the PES in that the lowest feature could be described as a double funnel. One of these regions contains only closed conformations, while the other contains only open conformations. If we take the average free energy of each set, then the closed and open conformations are separated by a barrier of only about 5 kcal/mol. At roomtemperature we would therefore expect the unphosphorylated protein to fluctuate between the closed and open conformations, in agreement with experiment. It is possible the barrier becomes lower in energy upon phosphorylation, so that the transition between the closed and open conformations becomes even faster. We obtained a transition rate of 200 per second between the two free energy minima, which is around the expected range for conformational transitions for proteins. We also found another double funnel feature of open and closed conformations, which is about 4 kcal/mol higher in free energy at 298 K. To analyze the structural changes that accompany the conformational transition, we then focused upon the discrete path that makes the largest contribution to the two-state rate constant when intervening local minima are placed in steadystate. We checked that this fastest path made a dominant contribution to the kinetics using the recursive enumeration algorithm33 of Jime´nez and Marzal to obtain the k shortest paths between the end points. By weighting the connections between

Change in N Regulatory Protein C local minima appropriately,13,34,35 this procedure actually ranks the contributions of discrete paths to the two-state rate constant. We observed significant shuffling and rotation of both individual residues and secondary structure segments along the activation pathway. R4 in particular shifts upward relative to R2 and R3, and it rotates counterclockwise relative to the center of the protein. It is the first secondary structure to change its position significantly. The movement of R4 occurs before the R3-β3 loop closes. A number of residues play a major role along the activation pathway. In particular, the ring-ring stacking of Y94 and Y101, which is lacking in the open conformation, is essential for the stabilization of the closed conformation. The rotation of Y94 occurs at the same time that the R3-β3 loop adopts the closed conformation and persists for the remainder of the activation pathway. Experimentalists have observed an increase in the solvent accessible surface area upon activation. To check if we observe the same behavior, we calculated the total solvent accessible surface area and found that indeed the total surface area of the protein increases significantly upon activation. To investigate the cause of this increase, we went back to the NMR conformations and determined which residues had the largest change in the solvent accessible surface area upon phosphorylation. We monitored the behavior of these residues along the activation pathway. Two residues showed the most dramatic change in their orientation in DPS. These two residues were R3 and K46. R3 stabilizes R5 to the core of the protein, while K46 stabilizes R2 to the core of the protein in the open conformations. Along the activation pathway, both of these residues rotate from their initial positions. Consequently, the unhinging of R2 and R5 causes the protein to expand and the surface area to increase. Experimentalists have observed that the protein becomes less mobile upon activation. In fact, comparison of the collective NMR structures of the open and activated (closed) forms shows that there is greater fluctuation in the open form, which disappears in the closed form. We monitored the log product of normal-mode frequencies along the activation pathway for all the stationary points. This quantity increases for both the minima and the transition states when the protein adopts the closed conformation, in agreement with the observation that the protein becomes more rigid upon activation. Other features of our calculated pathways are also in agreement with experiment, including the large shifts and rotations of the helices and the small movements of key residues. This agreement suggests that other details of the pathways, for which no experimental data is available, may provide useful predictions of how the conformational change is achieved. The DPS approach is designed to analyze kinetics between fixed sets of product and reactant structures. In the present study, our original choice of endpoints proved to be inappropriate, and this was reflected by the location of minima with significantly

J. Phys. Chem. B, Vol. 112, No. 8, 2008 2465 lower potential energy. A consistent picture emerged through a combination of local relaxation using thermodynamic weighting through molecular dynamics and analysis of the global kinetics using DPS. Acknowledgment. We gratefully acknowledge advice from Prof. Dorothee Kern and her group concerning the initial setup of these calculations, and Dr. Joanne Carr for her assistance with the k shortest paths calculations. This research was supported by a grant from the BBSRC. References and Notes (1) Parkinson, J. S.; Kofoid, E. C. Annu. ReV. Genet. 1992, 26, 71. (2) Mizuno, T.; Kaneko, T.; Tabata, S. DNA Res. 1996, 3, 407. (3) Mizuno, T. DNA Res. 1997, 4, 161. (4) Porter, S. C.; North, A. K.; Kustu, S. In Two-Component Signal Transduction; Hoch, 1998; pp 271-277. (5) Volkman, B. F.; Nohaile, M. J.; Amy, N. K.; Kustu, S.; Wemmer, D. E. Biochemistry 1995, 34, 1413. (6) Kern, D.; Volkman, B. F.; Luginbuhl, P.; Nohaile, M. J.; Kustu, S.; Wemmer, D. E. Nature 1999, 402, 894. (7) Hastings, C. A. S.-Lee, Y.; Cho, H. S.; Yan, D.; Kustu, S.; Wemmer, D. E. Biochemistry 2003, 42, 9081. (8) Hwang, I.; Thorgeirsson, T.; Lee, J.; Kustu, S.; Shin, Y. Proc. Natl. Acad. Sci. U.S.A. 1999, 96, 4880. (9) Volkman, B. F.; Lipson, D.; Wemmer, D. E.; Kern, D. Science 2001, 291, 2429. (10) Hu, X.; Wang, Y. J. Biomol. Struct. Dyn. 2006, 23, 509. (11) Pearlman, D. A.; Case, D. A.; Caldwell, J. W.; Ross, W. S.; Cheatham, T. E., III; DeBolt, S.; Ferguson, D.; Seibel, G.; Kollman, P. Comp. Phys. Commun. 1995, 91, 1. (12) Jorgensen, W. L. J. Phys. Chem. 1983, 87, 5304. (13) Wales, D. J. Mol. Phys. 2002, 100, 3285. (14) Evans, D. A.; Wales, D. J. J. Chem. Phys. 2004, 121, 1080. (15) Wales, D. J.; Bogdan, T. V. J. Phys. Chem. B 2006, 110, 20765. (16) Hawkins, R. J.; McLeish, T. C. B. Phys. ReV. Lett. 2004, 93, 098104. (17) Carr, J. M.; Trygubenko, S. A.; Wales, D. J. J. Chem. Phys. 2005, 122, 234903. (18) Dijkstra, E. W. Numer. Math. 1959, 1, 269. (19) Berry, R. S.; Breitengraser-Kunz, R. Phys. ReV. Lett. 1995, 74, 3951. (20) Ball, K. D.; Berry, R. S. J. Chem. Phys. 1998, 109, 8557. (21) Ball, K. D.; Berry, R. S. J. Chem. Phys. 1999, 111, 2060. (22) Sturdy, Y. K.; Clary, D. C. Phys. Chem. Chem. Phys. 2007, 9, 2397. (23) Brooks, B. R.; Bruccoleri, R. E.; Olafson, B. D.; States, D. J.; Swaminathan, S.; Karplus, M. J. Comp. Chem. 1983, 4, 187. (24) Lazaridis, T.; Karplus, M. Proteins: Struct. Funct. Genet. 1999, 35, 133. (25) Hubbard, S. J.; Thornton, J. M. NACCESS; Department of Biochemistry and Molecular Biology: University College London. (26) Krivov, S. V.; Karplus, M. J. Chem. Phys. 2002, 117, 10894. (27) Evans, D. A.; Wales, D. J. J. Chem. Phys. 2003, 118, 3891. (28) Becker, O. M.; Karplus, M. J. Chem. Phys. 1997, 106, 1495. (29) Wales, D. J.; Miller, M. A.; Walsh, T. R. Nature 1998, 394, 758. (30) Goldstein, R. A.; Luthey-Schulten, Z. A.; Wolynes, P. G. Proc. Natl. Acad. Sci. U.S.A. 1992, 89, 9029. (31) Carr, J. M.; Wales, D. J. J. Chem. Phys. 2005, 123, 234901. (32) Trygubenko, S. A.; Wales, D. J. Mol. Phys 2006, 104, 1497. (33) Jimenez, V. M.; Marzal, A. In Algorithm Engineering: 3rd International Workshop, WAE’99, London, UK, July 1999; Vitter, J. S., Zaroliagis, C. D., Eds.; Springer: Berlin/Heidelberg, 1999. (34) Wales, D. J. Mol. Phys. 2004, 102, 891. (35) Wales, D. J. Int. ReV. Phys. Chem. 2006, 25, 237.