Hamiltonian Replica Exchange Method Study of Escherichia coli and

Nov 19, 2009 - A small number of systems were found to be sufficient to enlarge the sample space substantially, on the basis of root-mean-square fluct...
0 downloads 0 Views 4MB Size
J. Phys. Chem. B 2009, 113, 16197–16208

16197

Hamiltonian Replica Exchange Method Study of Escherichia coli and Yersinia pestis HPPK Li Su and Robert I. Cukier* Department of Chemistry, Michigan State UniVersity, East Lansing, Michigan 48824-1322 ReceiVed: April 27, 2009; ReVised Manuscript ReceiVed: August 18, 2009

HPPK (6-hydroxymethyl-7,8-dihydropterin pyrophosphokinase) catalyzes the transfer of pyrophosphate from ATP to HP (6-hydroxymethyl-7,8-dihydropterin). This first reaction in the folate biosynthetic pathway is a potential target for antimicrobial agents. A Hamiltonian replica exchange method (HREM) molecular dynamics (MD) approach is used, with the goal of improving conformational sampling, whereby multiple copies of the system are run without requiring a large number of system copies. For HPPK, the aim is to improve conformational sampling around the HP binding pocket and thereby find near-closed conformations (similar but not identical to the binding pocket of HP, as defined by the ternary crystal structure). Near-closed conformations may be better targets for the design of species-selective inhibitors. Well-populated, nearclosed conformations of Escherichia coli HPPK (EcHPPK) and Yersinia pestis HPPK (YpHPPK) were found with HREM by focusing on the interactions involving loops 2 and 3 that are known to be the more flexible regions of HPPK. A small number of systems were found to be sufficient to enlarge the sample space substantially, on the basis of root-mean-square fluctuation measures, relative to the results of a conventional MD simulation. By clustering snapshots on the basis of some of the key residues that form the HP binding pocket, distinct HREM-generated conformations are found. Residue displacements mainly from loop 2 are responsible for the distinct conformers found, relative to the crystal structure, for both EcHPPK and YpHPPK. In contrast, the conventional MD simulations of EcHPPK and YpHPPK each lead essentially to one cluster, with use of the same clustering criterion as for the HREM. The shapes of the HREM near-closed binding pockets are qualitatively investigated and found to be different. Some of these conformations are distinguishable between EcHPPK and YpHPPK, indicating that there may be differing species-selective, near-closed conformations suited to HP binding. 1. Introduction Molecular dynamics (MD) simulations of protein motions with explicit solvent that are run on the current practical time scale of tens of nanoseconds often stay in the neighborhood of the starting configuration. Due to potential energy barriers that are large compared with the thermal energy, separate stable states may not be sampled or may not be sampled with weights corresponding to the Boltzmann distribution. This generic sampling problem is a major concern in MD and Monte Carlo (MC) simulations. A substantial number of sampling enhancement methods (such as multicanonical ensemble;1,2 simulated tempering;3,4 and parallel tempering, also referred to as the replica exchange method (REM)5-11) have been developed to address this issue. The first versions of the REM used in MD or MC simulations were temperature REM (TREM), and recently, a Hamiltonian REM (HREM) was introduced.12 The REM methods contain two elements: (1) multiple copies of configurations are run independently by MD or MC with different temperatures (TREM) or Hamiltonians (HREM). (2) Neighboring systems (different temperatures or Hamiltonians) may be exchanged, according to the Metropolis-Hastings algorithm.13 In the TREM, sampling is improved at the desired (usually lowest) temperature by higher temperature replicas overcoming barriers in the potential energy surface.14 TREM has the drawback that the required number of system copies roughly scales with the square root of the number of degrees of freedom of the system of interest.12 In explicit solvent * Corresponding author. Phone: 517-355-9715, ext 263. Fax: 517-3531793. E-mail: [email protected].

simulations of a typical protein, the total number of degrees of freedom is dominated by the solvent. Thus, use of TREM would require a great number of system copies to provide sufficient overlap between neighboring temperature systems to guarantee adequate exchange probabilities. To address this difficulty with TREM, Fukunishi et al.12 introduced the HREM method, in which the potential energy functions in different Hamiltonians differ only in a limited set of the total number of degrees of freedom required to characterize the system, thereby, in principle, reducing the number of systems needed. In practice, the problem of implementing an HREM method is to choose appropriate degrees of freedom to accelerate the exploration rate of the configuration space. For example, we demonstrated that for a five-residue peptide (Met-enkephalin), modifying the potential among the protein atoms and the protein and solvent while leaving the interactions of the solvent molecules intact led to a thorough exploration of the configuration space, in contrast with conventional MD.15 The need for methods that enhance conformational exploration is related to current suggestions that conformational change occurs, to a greater or lesser extent, when proteins and ligands form complexes. A progression of models has been introduced that in essence increase the role of protein plasticity in the binding of ligands.16-20 The “lock and key” model asserts that a protein has a cavity that a ligand (or another protein) can be fit into with minor rearrangements of protein and ligand. But this model cannot account for proteins that can bind differently shaped substrates. In contrast, the “induced fit” model tries to account for this by suggesting that a ligand induces a conformational change at the binding site, shifting it toward an active

10.1021/jp903861a  2009 American Chemical Society Published on Web 11/19/2009

16198

J. Phys. Chem. B, Vol. 113, No. 50, 2009

state. The pre-existing equilibrium hypothesis,21,22 based on more recent funnel energy landscape protein-folding theories,23,24 asserts that the native state of a protein can exhibit an ensemble of conformations that can span apo to more ligand-bound-like conformations. The ligand can select a prebinding conformation from the ensemble of protein conformers and then bias the equilibrium toward the catalytically competent binding conformation. Some studies have shown that a number of ligand-free proteins (staphylococcal nuclease,25,26 calbindin,27 adenylate kinase,28-32 and calmodulin33) do sample conformations required for ligand binding. A recent NMR study34 of apo ubiquitin found that the generated NMR ensemble spanned the known 46 X-ray structures of it with 46 different ligands, providing strong evidence for conformational selection.35,34 Conformational plasticity might be more advantageous for ligand binding by requiring smaller protein rearrangements (with a reduced energetic requirement) than would be necessary in the more open forms. With the idea in mind that proteins can fluctuate to produce conformations suited for ligand binding, in this work we focus on what we shall refer to as near-closed conformations (close to the closed, catalytically competent structure that characterizes the ligand-protein complex, as determined by crystallography) that are relatively stable (i.e., significantly populated) for the following reason: Active-site residues are often well preserved in a given enzyme among different species due to the fact that the fundamental catalytic mechanism and substrates are typically the same. Consequently, the closed conformation usually represents a challenging template for developing speciesselectiVe inhibitors. Compared with closed conformations, nearclosed conformations may provide greater opportunities for two homologous proteins to discriminate between different ligands. The identification of new, low-energy protein conformations that have not been trapped by crystallographic or NMR studies should provide targets for species-selective inhibitors. In this work, we investigate if HREM can be used to generate near-closed conformations. Although we anticipate that a properly designed HREM will increase the exploration of conformation space around a ligand-binding site, it could be that a more-or-less diffuse set of conformations would be obtained that are not well-distinguished. That a distinct set of near-closed conformations is obtained and can be categorized in terms of the positioning of key residues is addressed with a clustering algorithm complemented by a qualitative method of defining substrate binding cavities. The target of our study is HPPK (6-hydroxymethyl-7,8dihydropterin pyrophosphokinase), a key enzyme in the folate biosynthetic pathway, which catalyzes the transfer of pyrophosphate from ATP to 6-hydroxymethyl-7,8-dihydropterin (HP).36 The protein is essential for most microorganisms but absent from mammals because, although mammals obtain folates from their diet, most microorganisms must synthesize folates de novo.37 HPPK is an attractive target for the development of new antibiotics that are urgently needed to counteract antibiotic resistance. The two homologues that are studied here are Escherichia coli HPPK38 (EcHPPK) and Yersinia pestis HPPK39 (YpHPPK). EcHPPK is the most studied HPPK, and these studies have shown that it has several well-defined, biologically important conformational states along its catalytic cycle.40-42 YpHPPK is a causative agent of septicemic, pneumonic, and bubonic plague and is one of the most virulent pathogens known.43 Implementing HREM for a protein requires information about which parts are most rigid (a core region) and which parts are

Su and Cukier more flexible. By comparing the closed (the ternary complex with substrates HP and ATP)40,41 and open (apo)42 forms of EcHPPK, it is clear that the regions of the protein that have the largest differences between the two forms are loops 2 and 3. Furthermore, for both EcHPPK33 and YpHPPK,38,39 these loops are among the most flexible regions. Therefore, the HREM method implemented in this work focused on these two loops. Compared with using conventional MD, in which the force field is not modified, HREM helps to sample the low-energy states neighboring the closed conformations of the crystal structures for both EcHPPK and YpHPPK so that broader simulated ensembles of dominant conformations were obtained. On the basis of these simulated ensembles, clustering methods were applied to identify well-populated conformations that differ significantly in HP binding pocket conformations between the two species. Differences between near-closed conformations in the EcHPPK and YpHPPK HP binding pockets that are potential targets for designing ligands with enhanced specificity can be identified. 2. Methodology 2.1. Hamiltonian Replica Exchange Method. The temperature replica exchange method (TREM) constructs independent copies of a system that differ by their temperatures. The REM concept can be generalized to a Hamiltonian replica exchange method12 (HREM) in which the systems differ by their Hamiltonian (in practice, in their potential energy function when the kinetic energy part is left intact). As a matter of terminology, we shall refer to these different Hamiltonians as “systems” (versus “replicas”), since replica connotes a copy of an item. In our MD program, CUKMODY,30,15 for each MD step, the configuration is maintained on a particular computer node, and the systems (with different potential functions) move onto and out of that node. The Hamiltonian for the ith system within the extended HREM ensemble can be represented as, Hi(X, P) ) T(P) + Vi(X) where T(P) is the kinetic energy and Vi(X) is the potential function for the ith system with phase space coordinates X, P. Exchange attempts will be made at certain predetermined MD steps, and for this work, the attempt was made every 40 steps. Between exchange attempts, MD is run for each system, and the exchanges may be thought of either as configuration exchanges or potential energy function exchanges, which will be scale-of-interaction in this HREM implementation. Computationally, it is efficient that only a scale factor for the potential energy function needs to be exchanged, versus exchanging configurations. Exchanges are attempted only between neighboring systems because for the method to be effective, the overlap between the systems’ probability distributions needs to be adequate. (Efficiency can be roughly measured by the exchange acceptance probability, which is the fraction of successful exchange attempts, and is a compromise between the speed of motion and step size through the configuration space.) When system interchanges are attempted, detailed balance equations for pairs of neighboring systems

R(X, X' f X', X) Pi(X) Pj(X') ) R(X', X f X, X') Pi(X') Pj(X)

(1)

are enforced. Here, R(X, X′fX′, X) is the acceptance probability (transition probability) that configuration X in the ith system and X′ in the jth system before exchange results in configuration X′ in the ith system and X in the jth system after exchange, and Pi(X) is the Boltzmann distribution at temperature T ) 1/kBβ for the ith system. The Metropolis rule for exchange between two systems,

Study of E. coli and Y. pestis HPPK

R(X, X' f X', X) ) min(1, e-∆(X,X'fX',X))

J. Phys. Chem. B, Vol. 113, No. 50, 2009 16199

(2a)

where

∆(X, X' f X', X) ) β[(Vi(X') - Vj(X')) + (Vj(X) (2b) Vi(X))] is used to impose the detailed balance equations to guarantee that Boltzmann equilibrium in the extended ensemble of the product of all the systems’ ensembles will result for a sufficiently long trajectory.12 If the potential functions differ by a restricted set of degrees of freedom, only those will contribute to eq 2b. In the HREM implementation for the HPPK study, the potential energy is parametrized as

Vi(X) ) λiVLL(xL, xL) + √λiVLR(xL, xR) + VRR(xR, xR)

(3)

where the “L” subscript in eq 3 stands for “loop” and denotes the HPPK loop 2 (residues 44-53 for both EcHPPK and YpHPPK) and loop 3 (EcHPPK residues 82-93, YpHPPK residues 87-94). The R subscript stands for “rest” which denotes the rest of the atoms in the system, including the remaining parts of HPPK: ATP, the two magnesium ions, and the explicit solvent. Thus, the three terms in eq 3 are interactions between atoms of the two loops, between a loop atom and an atom within the “rest” parts, and between atoms within the “rest” parts. λi is a scaling factor for the Lennard-Jones and electrostatic nonbonded interactions. It is clear by the construction in eq 3 that the number of degrees of freedom is dominated by the rest parts; thus, the indicated scaling is much reduced relative to TREM, for which the global scaling βVi(X) ) βλiV(X) would be used. The form of scaling in eq 3 is a requirement for the use of an Ewald method, in which the evaluation of energy and force in the reciprocal space is based on a structure factor (a sum over atoms) and, consequently, necessitates assignment of the scale factor to atoms, versus to atom-atom interactions. The electrostatic and Lennard-Jones interactions are uniformly scaled; therefore, as λi decreases, both softer Lennard-Jones and reduced electrostatic interactions are obtained. The standard potential energy function is being used in system 0 (λ0 ) 1) so that the trajectory associated with it samples the correct canonical ensemble. Therefore, all our analysis in Section 3 was based on trajectories for system 0. 2.2. Molecular Dynamics Simulations. The CUKMODY protein molecular dynamics code, which uses the GROMOS9644 force field, was used to carry out the simulations. To conduct HREM simulations, the CUKMODY code was modified to incorporate the HREM, on the basis of our previous REM code.30,15 For HREM simulations, the systems were run separately on different nodes of a Linux cluster computer, and when exchanges were attempted, information was passed using MPICH, an implementation of the message passing interface technique. For all simulations, bond distances were constrained using SHAKE,45 enabling a 2 fs time step, and the temperature was globally controlled at 298 K using a Berendsen thermostat46 with a relaxation time of 0.2 ps. For the evaluation of the electrostatic and the attractive part of the Lennard-Jones energies and forces, the PME method was applied with a direct-space cutoff of 9.0 Å, an Ewald coefficient of 0.45, and a 72 × 72 × 72 reciprocal space grid. The simulations were all carried out in a cubic box with sides of 64.1 Å. For the EcHPPK (YpHPPK) simulations, 7671 (7644) waters were included, after waters overlapping the protein were removed. Five sodium cations were added to neutralize the systems for EcHPPK.

The simulation starting structures were obtained from the crystal ternary complex structures for EcHPPK (PDB entry 1q0n) and YpHPPK (PDB entry 2qx0) with AMPCPP (an ATP analog that prevents turnover), two associated magnesium cations, and HP. For our simulations, HP was removed, and AMPCPP was replaced by ATP, with the ATP modeled using the GROMOS9644 force-field parameter values. The two Mg2+ cations were each covalently linked to an oxygen phosphate atom of ATP with one linked to an R phosphate oxygen and the other to a γ phosphate oxygen, in accord with their placement in the crystal structures. The ATP phosphates were assumed fully deprotonated (ATP thus has a total charge of -4), in agreement with the ATP protonation state when ligated to Mg2+.47 The protonation states of all the ionizable residues were set to their normal ionization states at pH 7. The available crystal structure of YpHPPK is a dimer, and the issue of whether dimerization is catalytically relevant is not clear.39 From the crystal dimer structure, it is evident that each monomer’s loops 2 and 3 (especially loop 2, which is longer) extends close to the other monomer and seems to strongly interact with the other monomer. Therefore, in the dimer, these active site loops might spend a substantial amount of time bumping into each other, suggesting that the dimer is not the physiologically relevant state. For these reasons, we have simulated the monomer of YpHPPK. In the HREM simulation of both EcHPPK and YpHPPK, six systems were used with the scale factors set to λ0 ) 1, λ1 ) 0.83, λ2 ) 0.73, λ3 ) 0.65, λ4 ) 0.57, and λ5 ) 0.5. They were started from their respective initial configurations, and exchanges were attempted every 40 steps. The first 2 ns are considered as equilibration times. We monitored both the root-mean-square deviation (rmsd) and the root-mean-square fluctuation (rmsf) as functions of time for the backbone atoms and all the atoms in loops 2 and 3 and for the backbone and all atoms in the entire protein. After the initial 2 ns, all these quantities along the following 3 ns production runs for the EcHPPK and YpHPPK HREM trajectories are stable to within the normal statistical fluctuations. For the conventional MD simulations, run with the force field parameters corresponding to λ0 ) 1, the same initial configurations as the HREM were used, and the first 4 ns are considered as equilibration periods. These simulations will be referred to as MD simulations in the following. The scaling in eq 3 does include interactions between loops 2 and 3 and the rest of the protein (and the solvent). Thus, there should be and there are some modest enhancements of the rest of the protein rmsf’s relative to the MD results. The interaction of loops 2 and 3 with the rest of the protein is somewhat stronger in EcHPPK than in YpHPPK, as monitored by the trajectoryaveraged residue-by-residue rmsf’s. The weaker interaction for YpHPPK might be due to the shorter length of its loop 3. But even for EcHPPK, using the HREM approach, the rmsf’s for core residues are mostly below 2 Å. An interesting feature, especially evident for EcHPPK, is that although loop 1 is not scaled, its rmsf has increased substantially. This observation supports the crystal structure evidence that there may be strong interactions between loop 1 and some parts of loops 2 and 3. As desired, by the conservative HREM scaling in eq 3, the core protein structure is very well maintained. 2.3. Deviation and Fluctuation Measures. The standard structural deviation measure used is the rmsdj, defined as follows:

16200

J. Phys. Chem. B, Vol. 113, No. 50, 2009

rmsdj )

√( ∫

T

0

Su and Cukier

)

dt (rj(t) - rj0) · (rj(t) - rj0) /T

(4)

TABLE 1: Acceptance Ratios for the HREM Simulations potential index

λa

r0j

where is the base position vector (usually the crystal structure) for atom j. The conventional rmsfj (root-mean-square fluctuation for atom j) measure of protein fluctuations (that can be compared to B-factors)48 is defined as

rmsfj ) ∫T0

√( ∫

T

0

)

dt (rj(t) - r¯j) · (rj(t) - r¯j) /T

(5)

where jrj ) dt rj(t)/T is the trajectory-averaged position vector for atom j. 2.4. Clustering Method and HP Binding Pocket Profile Calculation. The clustering was done with the “g_cluster” routine of GROMACS implemented in the software package using the algorithm as described in Daura et al.49 For each input structure, the algorithm counts the number of neighbors within an rmsd cutoff (for this article, all the clustering is done using a 2 Å cutoff); takes the structure with the largest number of neighbors, along with all its neighbors, as a cluster; and eliminates all the structures within this cluster from the pool of structures. These steps are repeated for the remaining structures in the pool until no structures are left. The HP binding pocket profile calculation is carried out as follows: First, the conformation of a protein snapshot is superimposed onto the starting crystal structure. Then using the initial position of the mass center of HP in the crystal structure as the center and a radius of 5 Å as a search sphere, a gridpoint search is performed within this sphere with an oxygen test atom for points that have Lennard-Jones interaction potential energy with protein atoms no greater than zero, and they are recorded. Finally, those points are connected to provide an excluded volume region that defines the HP binding pocket profile. 3. Results and Discussion 3.1. HREM Replica Exchange Diagnostics. The first important issue for successful application of an HREM method is to limit the number of systems required while still providing enhanced configurational sampling. In explicit solvent simulations, not scaling the solvent-solvent interactions should provide a large reduction in the number of systems relative to the temperature REM that scales all the degrees of freedom. In addition, since it can be expected that within a reasonable time span, the relatively rigid core and loop 1 of HPPK should not change their conformations substantially, the interactions among those atoms can be left unscaled to reduce further the number of systems required. As in all REM versions, the choice and optimization of the acceptance probability of attempted exchanges in the HREM is a central issue. There should be a “reasonable” acceptance probability because for too low an exchange probability, the rate of movement through configuration space is too small, whereas for too high an exchange probability, the movement through configuration space is too slow. Table 1 lists the acceptance ratios for the HREM simulation for EcHPPK and YpHPPK. Predescu and coworkers14 analyzed the optimization of the TREM acceptance ratio (the fraction of successful exchange attempts) for a multidimensional oscillator system and found that acceptance probabilities in the 7-82% range provide sufficiently close to optimal sampling rates. All the acceptance ratios are within this suggested range. HREM simulations rely on the complementary features of a given system migrating into and out of the configurations and

λb EcHPPK 0.83 0.73 0.65 0.57 0.5

0T1 1T2 2T3 3T4 4T5

1.0 0.83 0.73 0.65 0.57

0T1 1T2 2T3 3T4 4T5

YpHPPK 1.0 0.83 0.83 0.73 0.73 0.65 0.65 0.57 0.57 0.5

acceptance ratio 0.13 0.094 0.14 0.080 0.12 0.64 0.50 0.43 0.41 0.37

a given configuration migrating into and out of the systems. The time courses of these migration patterns (namely, whether all the configurations can visit a particular system (with a particular λi value) and whether given configurations are visited by all the systems), are displayed in Figures 1 and 2. (Since the plots for the third and fourth nanosecond are similar to the plots for the fifth nanosecond, only the plots for the fifth nanosecond are displayed.) From the plots, it is clear that for the simulation of YpHPPK, all the systems can be visited by all the configurations (a-f), and conversely, all the configurations can be visited by all the systems (g-l). Although the plots for the simulation of EcHPPK are not as uniform as the ones for YpHPPK, it is still true that configurations 2 and 3 can be visited by all the systems and systems 2 and 3 can be visited by all the configurations, whereas all the systems have configurations with both higher and lower indices visiting them. These results are necessary conditions for the functioning of HREM but do not guarantee that the sampling of desired nearclosed conformations will result. The following section takes up this issue. 3.2. Comparing rmsd and rmsf for HREM Trajectories with Those for MD. As noted in the Introduction and Methodology sections, the most flexible regions of HPPK are loops 2 and 3, and these two loop regions comprise the atoms whose interactions were scaled. To see whether the HREM simulations improve sampling relative to MD simulations carried out with the standard MD force field, the rmsd and rmsf of the two loops are compared. (We will refer to these simulations as MD.) The system 0 HREM trajectories are used for the comparison, since it is system 0 that samples from the MD force field’s (λ ) 1) canonical ensemble. The rmsd and rmsf are evaluated on the basis of the backbone atoms and on the basis of all the atoms for each residue in the two loops. All the trajectories used are ones obtained after the 2 ns equilibration periods for HREM (4 ns for MD). The rmsd comparisons are displayed in Figures 3 and 4. On the basis of EcHPPK backbone atoms, the rmsd calculated from the MD simulation is mostly around 1-3 Å, showing that the simulated conformations did not deviate far from the original crystal structure. The rmsd for most residues in loops 2 and 3 based on the HREM trajectories has a value between 3 and 5 Å, with residues at the ends of two loops having a smaller rmsd and two residues in the center of loop 3 having a bit larger rmsd. The simulated conformations based on HREM deviate more than those based on MD trajectories from the starting structure. A larger deviation has a better chance to occur, since the starting structure was made by removing HP from the ternary crystal structure. On the basis of YpHPPK backbone atoms, using either the MD or the HREM method, the resulting rmsd plots show that the simulated protein loop 2 and 3 conformations

Study of E. coli and Y. pestis HPPK

J. Phys. Chem. B, Vol. 113, No. 50, 2009 16201

Figure 1. Migration checks for EcHPPK. (a-f) Migration of configurations into and out of a given system (λi scale value). (g-l) Migration of systems into and out of a given configuration. The figures of the upper two rows correspond, for a-f, to systems 0-5, respectively, and the figures of the bottom two rows, for g-l, to configurations 0-5, respectively. (Note that in view of the number of data points that are plotted, it appears as if, at a particular time, several configurations occupy the same system or several systems visit the same configuration; this does not happen.)

are substantially different from the starting crystal structure after several nanoseconds of simulation. Basing the rmsd calculations on all atoms of loops 2 and 3, the changes of conformations for most residues are larger for both EcHPPK and YpHPPK either by regular MD or by HREM. This result is expected because the side chains are usually more flexible than backbones. The larger rmsd difference for YpHPPK versus EcHPPK may be due to the fact that the starting crystal structure for YpHPPK was obtained by simulating one monomer from the dimeric crystal structure.39 As discussed in Section 2.2, in the dimer crystal structure, each monomer’s loop 2 interacts strongly with residues in the other monomer, and it is loop 2 that shows the largest rmsd’s from the crystal structure. The loop 2 residues of YpHPPK deviate more from the starting crystal structure than those of EcHPPK. Figure 5 shows that the loop 2 conformations for the starting crystal structures of the two proteins are quite similar, whereas from the ternary structures, it is clear that the HP binding pocket in HPPK is close to its loop 2. Therefore, the difference of the magnitudes

of the deviations from the crystal structures observed for the loop 2 residues in the two proteins suggests the possible existence of different HP binding pocket conformations for the two proteins. The differences between the rmsd’s based on HREM and MD trajectories are in the main not large, but rmsd shows only the deviation from the starting crystal structure and does not contain direct information about protein flexibility. In particular, if the protein is trapped in a small area far from the starting point within the configuration space, the rmsd can be large. Therefore, rmsf comparisons should be more revealing of the extent of configurational sampling. If they turn out to be small, then the rmsd measures indicate that the various trajectories are confined to basins in configuration space that are displaced by varying degrees from the crystal structures. Figures 6 and 7 display the rmsf comparisons, which show that the HREM rmsf is consistently much larger than the rmsf based on MD trajectories for both EcHPPK and YpHPPK. The MD rmsf based on the backbone atoms is less than 2 Å for each residue of the two

16202

J. Phys. Chem. B, Vol. 113, No. 50, 2009

Su and Cukier

Figure 2. Migration check for YpHPPK. (a-f) Migration of configurations into and out of a given system (λi scale value). (g-l) Migration of systems into and out of a given configuration. The figures of the upper two rows correspond, for a-f, to systems 0-5, respectively, and the figures of the bottom two rows, for g-l, to configurations 0-5, respectively. (Note that in view of the number of data points that are plotted it appears as if, at a particular time, several configurations occupy the same system or several systems visit the same configuration; this does not happen.)

loops for the two proteins, with the only exception that the Gly47 of YpHPPK has an rmsf of 2.2 Å. Even when side chains are included, only Arg84 of EcHPPK has an rmsf greater than 3 Å. It is clear that the MD trajectories were trapped around some basin in the configuration space over the simulation time scale after a stable rmsd was achieved. The rmsf obtained through HREM simulation is much larger, especially for residues in the middle of the loops that are less susceptible to the influence of the rigid core. On the basis of only backbone atoms, the residues in the middle of the loops for the two proteins all have an rmsf greater than or very close to 3 Å. It is noteworthy that the YpHPPK backbone of loop 2 exhibits a much larger flexibility as compared with that of EcHPPK. The rmsf based on backbone atoms is mostly more than 5 Å for the residues in the middle of loop 2 of YpHPPK, with the central three (Pro48, Gln49, and Asp 50) having an rmsf close to 7 Å, whereas the similarly calculated rmsf’s for loop 2 are mostly between 3 and 4 Å for EcHPPK. Again, since the HP binding pocket is close to loop 2, this large difference suggests that there might be different

configurations of residues contributing to the HP binding pockets for the two proteins. The relatively large backbone atom rmsf based on the HREM simulations makes it possible to search for the potential existence of different configurations suited for binding ligands. A possible reason for this large HREM loop 2 rmsf difference between the two proteins may be due to the short length of the loop 3 of YpHPPK. Loop 3 of YpHPPK contains only eight residues, making it difficult to interact with loop 2, leading to the possibility of enhanced freedom of motion for loop 2. 3.3. Binding Pockets for EcHPPK and YpHPPK. The HREM improvement in conformational sampling around the HP binding pocket raises the possibility of sampling near-closed conformations that are, in our terminology, similar to but not essentially identical to the crystal structure binding pocket of HP. If the rmsf increases discussed in Section 3.2 were a consequence of sampling a more-or-less uniform distribution of conformations, then the concept of near-closed conformations would not be tenable here. Thus, we follow the strategy of

Study of E. coli and Y. pestis HPPK

J. Phys. Chem. B, Vol. 113, No. 50, 2009 16203

Figure 3. EcHPPK rmsd’s for loop 2 (panels a and c) and loop 3 (panels b and d) residues, with solid lines for MD and dotted lines for HREM MD. Panels a and b are rmsd’s based on backbone atoms, and panels c and d are rmsd’s based on all atoms.

Figure 4. YpHPPK rmsd’s for loop 2 (panels a and c) and loop 3 (panels b and d) residues, with solid lines for MD and dotted lines for HREM MD. Panels a and b) are rmsd’s based on backbone atoms, and panels c and d are rmsd’s based on all atoms.

16204

J. Phys. Chem. B, Vol. 113, No. 50, 2009

Su and Cukier

Figure 5. Starting crystal structures for the simulations: (a) for EcHPPK (PDB entry 1q0n) (b) for YpHPPK (PDB entry 2qx0)

Figure 6. EcHPPK rmsf’s for loop 2 (panels a) and c) and loop 3 (panels b and d) residues, with solid lines for MD and dotted lines for HREM MD. Panels a and b are rmsf’s based on backbone atoms, and panels c and d are rmsf’s based on all atoms.

clustering the data to see if there are relatively stable sets of conformations that can be used to define near-closed conformations. To investigate the possibility of near-closed conformers and the potential differences between EcHPPK and YpHPPK binding pockets, we selected residues that are close to HP in the respective crystal structures. The residues include Thr42, Pro43, Pro44, Leu45, Try53, Asn55, and Phe123 for EcHPPK and Thr43, Lys44, Pro45, Leu46, Phe54, Asn55, and Phe123 for YpHPPK. (These residues will be referred to as pocket residues hereafter.) Only the backbone atoms of these residues are included in view of their greater stability relative to sidechain atoms. The g_cluster method described in Section 2.4 can be used to divide the 3000 snapshots of the HREM trajectories

for the two proteins into different groups according to the backbone atom rmsd’s between among pairs of snapshots. By using a 2 Å cutoff in the g_cluster, the HREM trajectories for EcHPPK and YpHPPK can be clustered into seven and eight clusters, respectively. (The last two clusters for EcHPPK are single snapshot clusters.) That the YpHPPK trajectory can be separated into more clusters using the same cutoff value may be due to the greater flexibility of loop 2 of YpHPPK. An examination of the g_cluster output for both EcHPPK and YpHPPK shows that there are at least 200 transitions (out of the 3000 snapshots) among the different clusters that occur in a uniform manner over the simulation time interval, allowing for the statistical fluctuations inherent to dwell times. Thus, the

Study of E. coli and Y. pestis HPPK

J. Phys. Chem. B, Vol. 113, No. 50, 2009 16205

Figure 7. YpHPPK rmsf’s for loop 2 (panels a and c) and loop 3 (panels b and d) residues, with solid lines for MD and dotted lines for HREM MD. Panels a and b are rmsf’s based on backbone atoms, and panels c and d are rmsf’s based on all atoms.

Figure 8. EcHPPK backbone atom conformations of pocket residues of the HP binding pocket of the central structures for clusters 1 and 4 of HREM snapshots: The left panel is for cluster 1, and the right, for cluster 4. The central structure of each cluster is shown in green; the starting crystal structure is shown in yellow; and HP, in pink, is placed according to the ternary complex crystal structure.

conformational state occupations based on the cluster algorithm are the result of transitions among these states, and there is no evidence of nonuniform behavior with regard to cluster sampling over the HREM trajectories. That the clustering algorithm for the HREM data resulted in well-defined clusters supports the basic premise that there are near-closed conformations in HPPK. The g_cluster method was also applied to the MD trajectories using the same 2 Å cutoff, but all snapshots turned out to be in the same cluster for EcHPPK, whereas 99.9% of the snapshots were in the same cluster for YpHPPK. The backbone atom conformations of the pocket residues of the central structures are shown in light green in Figures 8 and 9 for clusters 1 and 4 obtained from the EcHPPK trajectory and for clusters 1, 2, 5, and 6 from YpHPPK. The crystal structure conformations are shown in bright yellow in the two figures for comparison. HP is also put back and shown in

Figure 9. YpHPPK backbone atom conformations of pocket residues of the HP binding pocket of the central structures for clusters 1, 2, 5, and 6 of HREM snapshots. The left panels are for clusters 1 (top) and 5 (bottom), and the right panels, for clusters 2 (top) and 6 (bottom). The central structure of each cluster is shown in green; the starting crystal structure is shown in yellow; and HP, in pink, is placed according to the ternary complex crystal structure.

magenta in the two figures, according to its place in the respective crystal ternary complexes, to give a clearer picture of the binding pocket. Central structures of other dense clusters are not shown, since they are similar to either the crystal structures or the central structures shown in Figures 8 and 9. From the two figures, it is clear that the major difference

16206

J. Phys. Chem. B, Vol. 113, No. 50, 2009

Su and Cukier

Figure 10. The HP binding pockets profiles for EcHPPK. The HP binding pocket profiles are displayed in bright red. (a) Profile for the starting crystal structure; (b, c) profiles for the central structures of cluster 1 and cluster 4, respectively.

between the central structure of a certain cluster and the corresponding starting crystal structure come from the movement of the residues Thr42, Pro43, Pro44, Leu45, and Try53 for EcHPPK and residues Thr43, Lys44, Pro45, Leu46, and Phe54 for YpHPPK. These residues are either included in or close to loop 2 of the two proteins and have relatively larger flexibility, as displayed in Figures 6 and 7. From Figure 8, it is clear that the central structure of cluster 1 of EcHPPK has residues Thr42 and Pro43 moved downward and residues Pro44 and Leu45 moved in to be closer to the HP position in the X-ray ternary complex, and there exist substantial changes of the Φ and Ψ dihedral angles of Pro44 and Φ dihedral angle of Leu 46 in the structure. The downward and inward movement and change of dihedrals of Pro44 are mainly a consequence of the displacement of the left half of loop 2 (residues 44-49), in the orientation of Figure 10, toward the protein core. (Ribbon representations of the protein for the central snapshots of all displayed clusters are shown in Figures 10 and 11.) The cluster 4 central structure, shown in Figure 8, has residues Pro43, Pro44, Leu45, and Try53 moved upward, and there are substantial changes in the Φ and Ψ dihedral angles of Pro43. These changes are mainly due to loop 2 transiting toward the solvent. For the four central structures of YpHPPK shown in Figure 9, the central structures of both clusters 1 and 5 have residues Thr43, Lys44, Pro45, Leu46, and Phe54 displaced substantially upward. In the central structure of cluster 1, there are significant dihedral changes from the starting crystal structure for the Ψ of residue Thr43 and the Φ dihedrals of Pro45 and Leu46. In the central structure of cluster 5, the large dihedral changes are Ψ of Lys44 and the Φ angles of Pro45 and Leu46. The Φ dihedrals of Pro45 and Leu46 are also quite different for the two central structures. The upward movements of those residues and dihedral changes of Thr43 for cluster 1 and Lys44 in cluster 2 are due mainly to the extension of loop 2 toward the solvent. The central structures of clusters 2 and 6 for YpHPPK have residues Thr43, Lys44, Pro45, Leu46, and Phe54 moved upward, with the displacement for cluster 6 being larger. The Pro45 Ψ

Figure 11. The HP binding pockets profiles for YpHPPK. The HP binding pocket profiles are displayed in bright red. (a) Profile for the starting crystal structure; (b, c, d, e) profiles for the central structures of clusters 1, 2, 5 and 6, respectively.

and Leu46 Φ of the central structure of cluster 2 are substantially different from those in the starting structure. For cluster 6, the different dihedrals are Ψ for Lys44 and Pro45 and Φ for Pro45 and Leu46. The upward displacements are due mainly to the extension of loop 2 toward the solvent. The differences in pocket residue orientations in the various clusters described above on a residue basis are, to a large degree, more globally attributable to changes in loop 2 orientation. The flexibility of loop 2 leads to orientations that are more toward the protein core or more toward the solvent, relative to the ternary crystal structure. To obtain a clearer picture of the overall HP binding pocket, on the basis of the backbone atom conformations of the pocket residues shown in Figures 8 and 9, we calculated HP binding pocket profiles. The method given in Section 2.4 defines a binding pocket profile by providing an excluded volume region around HP. The binding pocket profile shapes can be correlated with the different pocket residue conformations. These pocket profiles are shown in red in Figures 10 and 11 for EcHPPK and YpHPPK, respectively. It is clear from the two figures that when loop 2 moves downward as compared with the starting crystal structure (cluster 1 for EcHPPK and clusters 2 and 6 for YpHPPK), the HP binding pockets get narrower, but they get wider when loop 2 moves upward (cluster 4 for EcHPPK and clusters 1 and 5 for YpHPPK), and when the movement is larger, the change of the binding pocket profile is more substantial. Figures 10 and 11 also show that the HP pocket conformations for the central

Study of E. coli and Y. pestis HPPK

J. Phys. Chem. B, Vol. 113, No. 50, 2009 16207 our previous study50 of EcHPPK that also emphasized the role of Asp 95 and Asp 97 in keeping ATP stable in its binding pocket. 4. Conclusions

Figure 12. The alignment of HP and ATP for the central structures of cluster 1 for EcHPPK (left panel) and of cluster 1 for YpHPPK (right panel). The two Mg2+ cations are shown in pink. Indicated are the distances between the hydroxyl oxygen of HP and the β-phosphate of ATP (2.46 Å for EcHPPK and 2.34 Å for YpHPPK), between the carboxylate carbons of Asp95 of EcHPPK and Asp 96 of YpHPPK and Mg2+ (2.26 Å for EcHPPK and 2.13 Å for YpHPPK), and between the carboxylate carbons of Asp97 of EcHPPK or of Asp 98 of YpHPPK and the other Mg2+ (2.22 Å for EcHPPK and 2.94 Å for YpHPPK).

structures are quite different for EcHPPK and YpHPPK. This difference is most prominent for the central structure of cluster 1 of EcHPPK and cluster 6 of YpHPPK. Although both of them have narrower binding pockets, the profiles are substantially different because in cluster 1 of EcHPPK, only the left half of loop 2 has moved toward the core, whereas in YpHPPK, the whole loop 2 moves toward the solvent. This finding supports the hypothesis that it might be possible to design inhibitors that can discriminate between the proteins. An issue important to the biochemical function of HPPK is the assembly of the catalytic site. In the ternary complex of EcHPPK,38 the two magnesium cations are coordinated by the phosphate tail of ATP (AMPCPP in this crystal structure); the hydroxyl group of HP; and the carboxyl groups of the two core residues, Asp 95 and Asp 97. By this coordination, HP is aligned with the phosphate tail of ATP to produce conformations for an associative, one-step pyrophosphoryl transfer from ATP to HP.38 To check that a reasonable catalytic site alignment was maintained for the conformations of the central structures of the clusters for EcHPPK and for YpHPPK (the relevant core residues are Asp96 and Asp 98 for YpHPPK), we reinserted HP to its original place in the respective ternary complexes. Of course, since we have positioned HP according to the crystal structures, it would move somewhat if it were included in a simulation and might enhance its interactions with the other catalytic site moieties. For all the clustered central structures, ATP did not undergo large positional changes so that when HP was reinserted, the alignment of HP and ATP is well-preserved. In particular, the β-phosphate of ATP is closest to the hydroxyl oxygen of HP for all clusters, in agreement with the crystal structures. Figure 12 displays ATP, the two magnesium cations, the two aspartates, and HP of cluster 1 (the densest clusters) for EcHPPK and YpHPPK. Note that, as discussed in the previous paragraph, even though cluster 1 for EcHPPK has a narrower HP binding pocket than the crystal structure and cluster 1 for YpHPPK has a broader HP binding pocket, in both cases, the alignments are still good. The stability of ATP is probably due to the strong charge interactions between the aspartate carboxylates and the magnesium cations. For all the structures, the magnesium cation that was linked to an R phosphate oxygen of ATP for making the starting structure is close to Asp95 in EcHPPK (Asp96 in YpHPPK), with the distance between the magnesium ion and the carboxylate carbon within 3 Å. The other magnesium cation is close to the Asp97 carboxylate in EcHPPK (Asp98 in YpHPPK). This result is in agreement with

A HREM approach to improving MD sampling was introduced to investigate the possibility of near-closed (relative to the ternary crystal structures) conformations of EcHPPK and YpHPPK. The HREM approach can be viewed as an attempt to excite only important degrees of freedom compared with the TREM, in which all degrees of freedom are thermally excited. The crucial point of the HREM implementation is the decision as to which are the important degrees of freedom. For a protein with a rigid core simulated in explicit solvent, exciting only the degrees of freedom of the most flexible parts of the protein (usually just some loops) is a great advantage since their number is at least an order of magnitude smaller than the number of other degrees of freedom in the system. In our application of the method, by focusing on loops 2 and 3, the HREM improves the MD sampling of both EcHPPK and YpHPPK, when monitored by comparing the rmsf’s based on HREM trajectories and those based on MD simulation. Using clustering methods focused on the conformations generated with the HREM, some well-populated, near-closed structures were obtained for EcHPPK and YpHPPK. In contrast, with the same cluster definition, the MD trajectory data essentially falls into one cluster, illustrating the utility of the HREM. That clusters are found, versus obtaining a broad set of conformations that do not fall into clusters, supports the hypothesis that there are near-closed conformations of HPPK with sufficient populations to be thermally accessible. In principal, the relative free energies between the most populated (cluster 1) and the other clusters could be evaluated from the cluster populations. However, obtaining sufficient statistics to reliably predict a modest free energy difference of ∼3 kcal/ mol (a relative population