Toward Quantitative Simulations of Carbon Monoxide Escape

Jan 19, 2008 - opener. Nevertheless, an intriguing visualization analysis of ... Computational studies of escape pathways started with the work of Cas...
0 downloads 0 Views 501KB Size
J. Phys. Chem. B 2008, 112, 6147-6154

6147

Toward Quantitative Simulations of Carbon Monoxide Escape Pathways in Myoglobin† Ron Elber*,‡ and Quentin H. Gibson§ Department of Chemistry and Biochemistry, and Institute of Computational Sciences and Engineering, UniVersity of Texas at Austin, 1 UniVersity Station, ICES, C0200, Austin, Texas 78712, and Department of Biochemistry and Cell Biology, Rice UniVersity, MS 140, 6100 Main Street, Houston, Texas 77005-1892 ReceiVed: August 30, 2007; In Final Form: NoVember 8, 2007

Straightforward molecular dynamics trajectories have been computed to explore the diffusion of carbon monoxide through myoglobin. The classical equations of motion were integrated for 2 ns and the resulting pathways analyzed. Two types of runs were examined. Type i: Myoglobin and a ligand embedded in a periodic box with 9996 water molecules; the water molecules are rigid but the bonds of the protein are flexible. Type ii: Myoglobin with a solvation shell (153 water molecules) in which all bond lengths are fixed. In trajectories of type i, the diffusing ligand visits a significant part of the protein matrix and was not constrained to the proximity of the heme pocket before escaping. The maximum time of the trajectories was 2 ns. It was shorter if the ligand escaped earlier. Two ligands (from a total of 88) escape to the solvent from nonclassical gates (non-E-helix gates). In trajectories of type ii, the overall fluctuations of the protein are smaller and the ligand explores significantly smaller internal space. The escape rate from type ii trajectories (11 of 400) is comparable to type i and is not dramatically different from experiment (1 of 100). Interestingly, the two simulations with comparable rates sampled different pathways. In trajectories of type ii, we observe escapes from the classical gate (His 64) and from the Xe4 cavity. Further studies (that are underway) are required to define the escape pathways and the overall rate.

I. Introduction The experimental determination of the structures of myoglobin and hemoglobin with X-ray crystallography1,2 revealed that the active site of these proteins (the heme pocket) is buried inside the protein matrix with no obvious pathway to the exterior. Since this discovery, the mechanism of ligand entry, exit, and diffusion in the protein matrix has continued to be an active topic of research. The working hypothesis is that thermal fluctuations in the neighborhood of the native structure serve as the gate opener. Nevertheless, an intriguing visualization analysis of X-ray structures suggests that the path is imprinted in the static structure and connections between known Xe sites3 create the diffusion pathway. However, the detailed atomic characteristics of these motions remain unsettled and the problem has led to a large number of experimental and computational studies, several of which are current.4,5 In addition, the geometric characteristics of the path, and its quantitative aspectssthe thermodynamics and kinetics of ligand bindingsshould be predictable computationally. Computational studies of escape pathways started with the work of Case and Karplus6 on the E-helix gate. This gate was further examined by a free energy calculation.7 We define as the “E-helix gate” escape routes that include doorway residues at the E helix, such as Val 68 and His 64. These studies were extended significantly by Tilton et al.8 using a small ligand probe of the structure, by the mean field calculations of ligand diffusion by Elber and Karplus,9 and more recently by Cohen et al.,4 using ligand insertion to probe plausible pathways. These investigations have given a qualitative picture of the ligand †

Part of the “Attila Szabo Festschrift”. * Corresponding author. Phone: 512-232-5415. Fax: 512-471-8694. ‡ University of Texas at Austin. § Rice University.

pathways, but their relation to function (thermodynamics and kinetics of binding and diffusion) has been more difficult to address. Other interesting investigations of ligand diffusion solved the diffusion equation in two dimensions, illustrating rapid exchange of the ligand between the heme pocket and the Xe4 cavity.10 A similar suggestion was made based on a detailed experiment of ligand recombination.11 Further insights into and questions about the ligand diffusion problem have been obtained by comprehensive mutation experiments coupled to thermodynamic, kinetic, crystallographic, and computational investigations.5,12-22 In particular, the locally enhanced sampling (LES) simulations of ligand diffusion in myoglobin mutants showed encouraging correlations with numerous experiments on a picosecond time scale.14,18 The process of ligand escape from the protein matrix is challenging from a computational viewpoint for two reasons: First, experimental estimates of the rate15 have yielded half times of hundreds of nanoseconds at room temperature, times that are hard to reach by straightforward molecular dynamics (MD) simulations. Second, the process is not at equilibrium. Ligands that leave the protein are unlikely to return to it on the time scales we consider (below microseconds), raising questions about the validity of equilibrium statistical mechanics in estimation of escape rates. While three approximate computational methods4,6,9,10 have been applied to the problem, the theoretical assumptions are difficult to validate, and many experimental puzzles remain. Some of the approximations are dynamical in nature, like the LES approach,9 and some are based on statistical arguments.4 Recently, time dependent crystallography21,23-25 has provided more information on geminate time scales following the breakage of the ligand-protein bond. The ligand has been observed to remain in the heme pocket for considerable periods,

10.1021/jp0769779 CCC: $40.75 © 2008 American Chemical Society Published on Web 01/19/2008

6148 J. Phys. Chem. B, Vol. 112, No. 19, 2008 an observation perhaps at variance with the picture of multiple pathways and multiple docking sites (of xenon).8,9 Ligand diffusion in the heme pocket and study of early times may help establish the general direction (or lack of a general direction) which the ligand is likely to follow on its way out of the protein matrix. As the ligand drifts away at early times from the nearest E-helix gate, it becomes less likely to use this gate later. There is, however, no detailed information on late-time escape pathways. The significant and continuous growth and availability of computer power has now made it possible to compute straightforward nonequilibrium trajectories of the early times that follow bond breaking between the ligand and the iron. The use of a nonequilibrium ensemble of straightforward MD trajectories allows the numerous approximations previously required to save computer time to be avoided. Instead of using the locally enhanced sampling (LES) with many ligand copies in a single protein matrix9 (representing a mean field density of one ligand), we use many independent trajectories, each with only one ligand. The LES approximation softens the interactions with the ligand. It accelerates diffusion and escape rates, making it difficult to compute time scales, thermodynamics, and kinetics properties from the simulations. The straightforward trajectories avoid the problems of unrealistically rapid diffusion rates. The simulations have been extended to 2 ns. This time scale overlaps the time dependent crystallography experiments of the groups of Moffat23,24 and of Schotte at al.25 that have illustrated ligand drift to the back of the heme pocket (the docking site near Leu 29). The use of straightforward molecular dynamics trajectories leaves only the quality of the energy function, the representation of the solvent, and the size of the ensemble as limiting factors of the simulation, since no other significant approximations are made. To explore plausible approaches for speeding up the calculations, we considered two models of solvation. One model embeds the protein in a periodic box of water molecules (type i), and the second uses only a solvation shell (type ii). Previous studies have argued that a limited solvation shell is sufficient to reproduce the expected protein fluctuations.26 Moreover, earlier studies of ligand diffusion with only a few water molecules correlated well with experiments on the picosecond time scales.14 Simulations that use the water shell are of course a lot faster. Interestingly, the two types of simulations produce similar ligand-escape rates that are faster by a factor of about 2-3 compared to experiment. On the basis of this comparison alone, we might have concluded that the simulations are similar and perhaps satisfactory. However, as we illustrate below, the mechanisms of ligand escape differ appreciably in the two types of simulations. More simulations and comparisons to other experimental observations (for example, comparing diffusion rates of mutants) are required to differentiate between the pathways proposed in the two computer experiments. II. Methods The simulations were initiated from the PDB structure 1MBO.27 Water molecules were added to create a box of 69.83 Å3 for simulations of type i, and 153 water molecules defined by crystallography were used as a solvation shell for simulations of type ii. No cutoff for elecrostatic calculations was used in simulations of type i in which the particle mesh Ewald method was used.28 In simulations of type ii, the electrostatic cutoff was 12 Å. The cutoff of van der Waals interaction was 9 Å for both simulations. The calculations use the molecular modeling program MOIL.29 The complete source code of MOIL is in the

Elber and Gibson public domain (http://cbsu.tc.cornell.edu/software/moil/moil.html). The potential energy in the present case is the extended atom model OPLS for nonbonded interactions,30 and the Amber force field for covalent energies,31 as implemented in the molecular dynamics program MOIL.29 The simulations were done with the particle mesh Ewald28 for summation of longrange (electrostatic) forces. All bonds were constrained with the SHAKE algorithm in simulations of type ii.32 The internal structure of the water molecules was kept rigid with a symmetric SHAKE matrix.33 In simulations of type i, only the bonds of the water molecules were constrained while all of the protein bonds were unconstrained. The integration time step was 1 fs. Initially, the protein was kept rigid and the water molecules were equilibrated for 10 ps. Then, the protein atoms were allowed to move and the whole system (protein, ligand, and solvent) was equilibrated for an additional 10 ps for simulations of type i and 6 ps for simulations of type ii. The ligand remains in the heme pocket during the equilibration time, though some drift to the back of the pocket (near Leu 29) was noted. An ensemble of trajectories of type i and of type ii was initiated by assigning different starting velocities (sampled from the Maxwell distribution at 300 K) to each run. The same coordinate set was used for all trajectories. Although it would be possible to vary the initial coordinate set also, the statistics in that case are not obvious, since the process (as mentioned above) is initiated far from equilibrium. Since the trajectories are chaotic, randomizing the velocities provides highly diverse paths, as became clear when examining individual trajectories. A total of 88 trajectories were computed for type i and 400 trajectories for type ii. The calculations were conducted on a computer cluster using one CPU (AMD 2.2 GHz) for a trajectory. The calculations required about 2 weeks to complete 2 ns trajectories of type i (the trajectories of type ii required 23 h for each group of 100 runs). The images were prepared with the graphic interface of MOIL and the integrated display program CMOIL (cbsu.tc.cornell.edu/ software/moil/). The other plots were prepared by the program xmgrace (http://plasma-gate.weizmann.ac.il/Grace/). III. Results In Table 1, we summarize an average view of the places the ligand “visited” during its diffusion through the protein matrix, listing collisions with the protein residues. The small ligand (carbon monoxide) collides with an amino acid if at least one atom of that residue is found within a distance of 4 Å from the ligand. It is therefore possible that multiple collisions between the same pair of residues will occur at a given time slice. Collisions with the heme group were included but not those with water molecules. The collisions in trajectories of type i and of type ii are considered separately. A measure of the collision strength is the collision fraction, f, which is the number of times that a “collision” is found between either atom of the small ligand (carbon monoxide) and any atom of a particular protein residue, Ncol-res, divided by the total number of protein structures, nstr, that were sampled during the trajectories and divided by the total number of residues which collide with the ligand, nres, i.e., f ) Ncol-res/(nres‚nstr). Since we count collisions between atoms, it is possible that the collision fraction will be larger than one when more than a pair of atoms collide, making the collision particularly significant. For example, having 88 trajectories of type i and 4000 structures for each of the trajectories (we saved coordinates every 0.5 ps), we examine collisions in 88 × 4000 ) 352 000 structures. For clarity, only residues with a collision fraction exceeding 0.01 are listed in

Carbon Monoxide Escape Pathways in Myoglobin

J. Phys. Chem. B, Vol. 112, No. 19, 2008 6149

TABLE 1: Comparisons of Collision Fractions of Different Residues with the Diatomic Ligand (Carbon Monoxide)a index

amino acid

box

8 11 14 15 18 25 26 29 30 33 34 44 47 65 66 69 70 73 76 77 83 90 91 94 105 108 109 112 115 116 132 135 136 138 139 140 143 156

Trp Val Val Trp Val His Gly Ile Leu Leu Phe Phe Phe His Gly Val Leu Leu Ile Leu His Leu Ala His Leu Ile Ser Ile Val Leu Met Ala Leu Leu Phe Arg Ile heme

0.04 0.01 0.01 0.09 0.06 0.05 0.36 0.49 0.66 0.05 0.61 0.01 0.33 0.14 1.00 0.29 0.09 0.04 0.06 0.07 0.04 0.01 0.10 0.04 0.27 0.03 0.18 0.01 0.07 0.04 0.09 0.03 0.05 0.16 0.04 0.03 3.82

shell (1)

shell (2)

TABLE 2: Comparing Collision Patterns at Different Time Slicesa

shell (3)

index

amino acid

2 ns

200 ps

0.01 0.07 0.09 0.08 0.06 0.06 0.01 0.15 0.05 0.02 0.04 0.03 0.38

0.08 0.34 0.42 0.25 0.18 0.20 0.07 0.57 0.31 0.09 0.10 0.15 1.12

2.a 0.09 0.01 0.03 0.32 0.39 0.26 0.02 0.02 0.14

0.10

0.08

0.02 0.34 0.43 0.25

0.34 0.42 0.25

0.12

0.18

0.17 0.09 0.49 0.31 0.08

0.14 0.10 0.54 0.30 0.09

0.20 0.07 0.57 0.31 0.09

0.09

0.11

0.10

0.13

0.14

0.15

0.01 0.01 0.02

0.83

Trp Gly Ile Leu Phe His Gly Val Leu Leu Ile Ile Hem1

8 11 14 15 18 25 26 29 30 33 44 47 65 66 69 70 73 76 77 83 90 91 94 105 108 109 112 115 116 132 135 136 138 139 140 143 156

Trp Val Val Trp Val His Gly Ile Leu Leu Phe Phe His Gly Val Leu Leu Ile Leu His Leu Ala His Leu Ile Ser Ile Val Leu Met Ala Leu Leu Phe Arg Ile Hem1

2.b

0.02

0.78

15 26 29 30 44 65 66 69 70 73 108 112 156

1.12

a

0.04 0.01 0.01 0.09 0.06 0.05 0.36 0.49 0.66 0.05 0.61 0.01 0.33 0.14 1.00 0.29 0.09 0.04 0.06 0.07 0.04 0.01 0.10 0.04 0.27 0.03 0.18 0.01 0.07 0.04 0.09 0.03 0.05 0.16 0.04 0.03 3.82

0.10 0.11 0.16 0.14 0.06 0.04 0.23 0.07

The collision fraction is the number of times during the trajectories in which the ligand was within 4 Å of any atom of that residue divided by the total number of collisions and the number of residues (the summation runs over all of the trajectories of a particular type and over all structures saved for a particular trajectory). The first column is the index of the amino acid. The MOIL program adds an N terminal residue to the beginning of the chain and a C terminal residue to the end of it. As a result, the amino acid indices are shifted by one compared to the PDB numbers, and the index of the heme group increases by two. Collisions are counted for both the carbon and the oxygen of the CO molecule. Therefore, the CO Molecule may collide twice with the same atom. The third column is the collision fraction of the simulation in a box of water (type i), and the fourth, fifth, and sixth columns are the collision fraction of a batch of 100 trajectories with a solvation shell (type ii). We compare the simulations of type i to batches of 100 trajectories of type ii (instead of a sum of 400 trajectories) in order to have comparable statistics to the simulations of type i and to be able to compare the batches with respect to each other. The fourth batch of type ii has very similar properties and is not shown.

Table 2.a: A table comparing 200 ps to 2 ns collisions for one of the batches (100 trajectories total) of type ii simulations. Note that the colliding partners are identical but the frequencies of the collisions vary significantly. Table 2.b: A table comparing 200 ps collision patterns with 2 ns patterns for the simulations of type i (a periodic box of water).

the table. Even if we remove the threshold (i.e., all residues that collide at least once with the ligand are counted), the overall picture remains the same. For example, the number of residues with a collision fraction greater than zero in trajectories of type i is roughly double the number of colliding residues in trajectories of type ii (77 in type i and 49 in type ii). From the table, it is clear that type i trajectories are more diverse than those of type ii (the ligand collides with a larger number of amino acids). Only about 20 residues collide with the ligand in 400 trajectories of type ii. Geographically, in trajectories of type ii, the ligand is localized in the heme pocket

and in the Xe4 cavity for most, or even the complete duration, of the trajectory. In contrast, 37 residues collide with the ligand in simulations of type i, illustrating the wider scope of the diffusion of the ligand. Another measure of the extent of the diffusion of the ligand is given in Table 2. We examine the distribution of residues that collide with the ligand using a different time window (the first 200 ps) and compare it to the 2 ns picture. In Table 2.a, we report the average collisions for the simulations of type ii. While the list of colliding residues is the same for the two time windows, the absolute number of collisions for each of the

0.07 0.02

0.73

a

6150 J. Phys. Chem. B, Vol. 112, No. 19, 2008

Elber and Gibson

Figure 1. Total number of collisions as a function of time for the small ligand (carbon monoxide) as it diffuses through the protein myoglobin for two sampled trajectories: (1.a) a trajectory of type i; (1.b) a trajectory of type ii.

residues is always higher for the longer times, suggesting a higher density near the ligand. Interestingly, using the same comparison of collisions at different time windows for type i simulation yields significantly different results, as illustrated in Table 2.b. There is a significant change in the number of the colliding residues above threshold. This suggests a phase shift of a ligand primarily localized at the heme pocket and the Xe4 cavity in the first 200 ps to a more delocalized ligand at longer times. We did not observe this shift in trajectories of type ii. The observation of consistently different rate and diffusive behavior leave the choice of the different pathways and their relative contributions to the overall rate as an open question. It is tempting to conclude that the ligand with larger spatial distribution inside the protein matrix would also have a smaller collision fraction to assist its higher mobility. This is however

incorrect. Trajectories of type i show a considerably larger number of collisions (per time slice) in the long time limit compared to trajectories of type ii. In Figure 1.a, we show the numbers of collisions of protein heavy atoms with either of the ligand atoms (carbon or oxygen) as a function of the time slices for one typical trajectory. The total length of the trajectory was 2 ns, and coordinate sets were saved every 0.5 ps for trajectories of type i (and 1 ps for trajectories of type ii). While the fluctuations around the average are significant, they seem to converge at times longer than 600 ps to (about) 17 collisions per time slice. These limits are (approximately) between 10 and 25. Before convergence, the fluctuations of the collision numbers are much higher. The average collision number is significantly lower with a significant sampling of scores below 10. It takes about 600 ps to converge, explaining the differences in the collision tables of 200 ps and 2 ns.

Carbon Monoxide Escape Pathways in Myoglobin

J. Phys. Chem. B, Vol. 112, No. 19, 2008 6151

Figure 2. Diffusion of the ligand in myoglobin. We show a single static structure for the protein. The backbone is a green ribbon, the heme is in red with atoms modeled as spheres, and the amino acid residues are shown as sticks with gray and red color in type i. The residues are not shown in one of the trajectories of type ii to enhance clarity. The carbon atom of the carbon monoxide molecule is shown as a yellow space-filling sphere. A sequence of yellow spheres defines the trajectory of the ligand as a function of time. In part 2.a, we show trajectories of type i, and in part 2.b, trajectories of type ii. Trajectories 2.a.1-2.a.4 illustrate the confinement of the ligand at early phases. Trajectories 2.a.5-2.a.8 show rare fluctuations and ligand exits from the primary heme pocket. Trajectory 2.a.7 shows an event of ligand escape (see text for more details). We also show escape pathways from trajectories of type ii. One of these trajectories is directly through the E-helix gate (between His 65 and Val 69). The second escape pathway, close to the N terminal, is similar to one of the pathways observed for type i escapes. The trajectories of type ii are more diverse. See text for more details. The images were prepared with the program CMOIL, the visualization package of the simulation program MOIL.29

In Figure 1.b, we show a similar plot for a sampled trajectory of type ii. The asymptotic average is significantly lower. It is about 10 collisions per time slice, and it fluctuates rapidly between 5 and 15 collisions. Not only is the asymptotic, longtime behavior different, but also the beginning of the trajectory shows significant differences as well. At the start, the number of collisions is higher than the asymptotic values (an average of about 20 which is higher than any other value we have seen for type i or type ii trajectories). Hence, the qualitative trend of modifying the collision environment from lower to higher value in type i trajectories is now inverted in type ii (moving from high to low). The transition between the two phases occurs rather early (130 ps) compared to the transition time in type i (600 ps). Hence, there are substantial quantitative differences between the mechanisms of diffusion in the two types of the calculation. It is somewhat surprising that this relatively small number of collisions is sufficient to contain the ligand in its restricted space so well in simulations of type ii. The restriction and the smaller number of collisions are results of ligand seating firmly

in the Xe4 cavity which is quite large and empty, and does not enforce a large number of collisions on the ligand. It is also of interest to comment on the concept of local equilibrium. In local equilibrium, we assume that the escape rate is a small perturbation on the system that rapidly reaches a steady state at times shorter than the time scale of escape. This assumption is useful in approximate theories for calculation of rate constants such as the transition state theory,34 and it is clearly satisfied for trajectories of type ii in which the collision behavior reaches a steady state on the 100 ps time scale. However, the local equilibrium assumption is more problematic for trajectories of type i in which the diffusive and collision patterns of the ligand with protein residues are changing phase at 600 ps. The time scale of 600 ps is not far off from the 1-2 ns time scale that we use computationally to probe ligand escapes. Another analysis of ligand diffusion is obtained by examination of individual trajectories of type i and type ii. In Figure 2, we show a few trajectories with emphasis on ligand motions. In all figures, only one protein frame (initial structure) is used.

6152 J. Phys. Chem. B, Vol. 112, No. 19, 2008 In the initial frame, we plot as yellow spheres the different positions that the ligand visited during the trajectories. Escaping and nonescaping trajectories are shown to emphasize the difference between the two and to examine the extent of ligand penetration to other spots in the protein matrix. The plot does not take into account the probability of being at a particular site. Sites that were visited only once carry the same yellow sphere as sites that were visited more frequently. From this perspective, the plots do not do justice to the actual ligand probability density and overemphasize rare positions. However, for visualization purposes and highlighting the escape pathways, we find them adequate. It is of interest to examine a few escape pathways and list the residues that enclose the ligand diffusion pathways. The pathways below are similar to those discovered and discussed in the LES simulaton;9 however, the pathways of the LES simulations are more permissive and greater variation in exit pathways and access to different protein locations is observed. This is particularly significant, since the statistics in the LES simulations was poorer. Escaping Trajectories of Type i. 2.a.6. The ligand was confined at the beginning by residues Leu 29, Phe 43, His 64, Val 68, and Ile 107 and the heme. Later, the ligand diffused back to and from along the elongated heme pocket and added the following residues of the B and E helix to the list: His 24, Gly 25, and Ile 28. Oscillations showing a subset of the list so far continued to 1.5 ns. At around this time, the ligand shifts position to include residues Leu 86, Leu 89, Ala 90, His 93, and Leu 104. From there, the ligand eventually escaped after 1.6 ns. 2.a.7. The start (not surprisingly) was in the heme pocket: Leu 29, Phe 43, His 64, Val 68, and Ile 107. At around 300 ps, there was a transition to another cavity in the protein matrix, a cavity defined by Trp 14, Val 17, His 24, Ile 28, Ile 69, Leu 72, Ile 111, Leu 115, and Met 131. The ligand continues to migrate toward the N terminal and after 435 ps can be found near Trp 7, Ile 75, Leu 76, His 82, Ala 134, Leu 137, and Phe 138. From this position, the ligand escapes to the solvent. Escaping Trajectories of Type ii. Trajectory ii.1. The ligand never leaves the heme pocket: Leu 29, Phe 43, His 64, Val 68, and Ile 107. Escape after 600 ps directly through the E gate (between His 64 and Val 68). Trajectory ii.2. Heme pocket to start with (Leu 29, Phe 33, His 64, Val 68, Ile 107, and heme). After 85 ps, shift to the Xe4 cavity (contacts with Gly 25, Ile 28, Val 68, Leu 69, Ile 107, and Ile 111), back to the heme pocket after 150 ps, back to Xe4 after 350 ps, back to the heme at 400 ps, back at Xe4 after 540 ps. It finally does something new at around 600 ps in which it diffuses further close to the N terminal (Val 10, Val 13,Trp 14, Val 17, Ile 111, Leu 115, Phe 123, and Met 131). The escape from the last site occurred at around 700 ps. Another quantity of interest is the fluctuation of the protein, defined as RMS ) [(1/L‚M)∑i)1...L,j)1...N mi(xij - xj,av)t(xij xj,av)]1/2, where L is the number of structures that we sampled during the trajectory, M is the total mass M ) ∑i)1,...,N mi, mi is the mass of particle i, the structure index is j, and xij is the vector position of atom i in structure j. The average coordinate vector, xj,av, is computed after optimal overlap with the initial coordinate set.35 The difference xij - xj,av is computed after optimal overlap of xij with xj,av. In our multiple trajectories, we have the opportunity to compare the fluctuations produced by different trajectories. This is a better measure than convergence tests that are based on a single trajectory.36 In Figure 3, we overlay the fluctuations of two trajectories of type i, averaged

Elber and Gibson over all time slices for different amino acids. We also added the fluctuations extracted from the B factors. When the fluctuations of the two trajectories are similar, the agreement with the B factor is quite good. Overall, the fluctuations of type i seem somewhat larger than the thermal fluctuations estimated from the B factor. Interestingly, the situation is reversed for type ii trajectories with significantly smaller fluctuations. The difference may be due to the application of SHAKE on all bonds in type ii and not necessarily to the difference in solvation models. Unsurprisingly, the largest fluctuations are in nonhelical regions. IV. Discussions and Conclusions The present investigation builds on prior studies of ligand diffusion, and focuses on the most direct approach to the problem. Twenty years ago, it was not possible to compute even a single trajectory of myoglobin (without solvent) on the nanosecond time scale and simulations searching for alternative pathways were carried out: (1) in a rigid protein matrix and with a small (helium-like) probe8 and (2) in a vacuum, at elevated temperatures to accelerate the dynamics, and within the mean field LES approximation.9 Such searches are clearly approximate. They are likely to overestimate the number of available pathways and give preference to more direct and geometrically short paths even if they are energetically costly. Of course, the level of overestimation is not known. In an attempt to explore further the diffusion pathways of the ligand using gentler approximations, we performed two types of simulations, both at room temperature and using a straightforward and detailed molecular dynamics approach. We focus on early events of the diffusion process, restricting the simulation time to 2 ns. We hope to explore the possibility that the diffusion is activated, which in the context of the present application means the existence of a metastable ligand state in the protein matrix with population that decays exponentially in time. If it is indeed the case, then sampling of a few events at shorter times may allow us to extrapolate to times significantly longer and more relevant to the process at hand (ligand exit requires hundreds of nanoseconds). One set of simulations (type i) uses a periodic box of water to model protein solvation, and the second set includes only a solvation shell (type ii). Simulations of type i are considerably more expensive and require about 2 weeks of CPU from start of a trajectory to its completion. We assume that each trajectory is running in serial and that we use multiple CPUs to run multiple independent trajectories. As a result, the alternative simulations (with solvation shells only) are particularly attractive. They usually require only a few hours to complete a single 2 ns simulation. It is therefore not surprising that, while 88 trajectories of type i were calculated, the statistics for the shell simulations are significantly better (400 trajectories) and were obtained at a fraction of the computational cost. In the use of limited solvation (type i), we were encouraged and motivated by previous LES simulations with only a few water molecules, which showed good correlation with experiment on a picosecond time scale.14,17 The present study is in a broad sense complementary to the study by Zhang et al.37 on the effect of solvation on heme relaxation processes in myoglobin. Other studies of energy relaxation in myoglobin were published.38,39 It shows reasonable correlation of the computed value with estimates of rates. The experimental rate of ligand escape from the protein matrix15 has been reported to be k ) 6.1 × 106 s-1. This rate is not at significant variance with the observed number of ligand escapes in type i and type ii simulations if we assume first-order

Carbon Monoxide Escape Pathways in Myoglobin

J. Phys. Chem. B, Vol. 112, No. 19, 2008 6153

Figure 3. Root-mean-square displacements of myoglobin resiudes averaged over all time slices for two trajectories each of type i and type ii. Fluctuations computed from two trajectories are shown in red and black. The fluctuations estimated from the experimental B factors are shown in green.

kinetics: n(t) ) n0 exp(-kt), where n0 is the number of trajectories we started with and n(t) is the number of trajectories that remain in the protein matrix after time t. Plugging in the above rate constant, we obtain that after 2 ns 98.8% of the population should remain in the protein (one escape from 88 trajectories of type i and 11 from 400 trajectories of type ii). Obviously, more will have to be done to accurately estimate the rate constant, but this is a reasonable start. The retention time of the ligand in the pocket and the backward “docking site” are also consistent with the time dependent experiments of Aranda et al.21 The current model offers (at least) a semiquantitative description of small ligand migration through a protein.

The encouraging results from the perspective of rate prediction and cavity retention time do not imply convergence to a unique diffusion mechanism. Different diffusion and escape mechanisms were observed in the two types of simulations. One of the mechanisms (found in the solvation shell simulation) is dominated by motion in the heme pocket and the Xe4 cavity. The second mechanism (found in the simulations with periodic boundary conditions) occurs in a more permissive protein in which the ligand collides with a broader range of amino acids but is still allowed to move forward. The decision of which of the two diffusion mechanisms better describes experiment will require more simulations or experiments. Studies of the diffusion and ligand escapes in other protein mutants, using comparable

6154 J. Phys. Chem. B, Vol. 112, No. 19, 2008 approaches to the investigations described here, are underway and hopefully will shed more light on these open questions. It is possible to think of the variations in the models of solvation as a different choice of solvent viscosity.40 This phenomenological choice is consistent with current observation. We expect that the viscosity of the solvation shell will be higher (high surface tension), and indeed, the extent of proten motions (at the observed time scale) is smaller. Finally, we comment on the quality of the modeling from a physical viewpoint. The use of a small number of water molecules is not a good imitation of solution conditions, and from physical considerations, we expect that the simulation with periodic boundary conditions will provide more accurate results. Nevertheless, other sources of errors (inadequate sampling, inaccuracies in the force field, etc.) make it useful to examine the stability of the results under significant variation in the model parameters. The shell simulations allow for rapid exploration of plausible models that can be examined in greater detail with the more elaborate solvation models. Acknowledgment. This research was supported by NIH grant GM059796. References and Notes (1) Kendrew, J. C.; Bodo, G.; Dintzis, H., M.,; Parrish, R., G.,; Wyckoff, H.; Phillips, D. C. Nature 1958, 181, 662. (2) Perutz, M. F. Structure of Hemoglobin; Brookhaven Symposia in Biology; 1960. (3) Teeter, M. Biophys. J. 2004, 86, 1A. (4) Cohen, J.; Arkhipov, A.; Braun, R.; Schulten, K. Biophys. J. 2006, 91, 1844. (5) Hummer, G.; Schotte, F.; Anfinrud, P. A. Proc. Natl. Acad. Sci. U.S.A. 2004, 101, 15330. (6) Case, D. A.; Karplus, M. J. Mol. Biol. 1979, 132, 343. (7) Kottalam, J.; Case, D. A. J. Am. Chem. Soc. 1988, 110, 7690. (8) Tilton, R. F.; Kuntz, I. D.; Petsko, G. A. Biochemistry 1984, 23, 2849. (9) Elber, R.; Karplus, M. J. Am. Chem. Soc. 1990, 112, 9161. (10) Banushkina, P.; Meuwly, M. J. Chem. Phys. 2007, 111. (11) Wang, Y. H.; Baskin, J. S.; Xia, T. B.; Zewail, A. H. Proc. Natl. Acad. Sci. U.S.A. 2004, 101, 18000. (12) Draghi, F.; Miele, A. E.; Travaglini-Allocatelli, C.; Vallone, B.; Brunori, M.; Gibson, Q. H.; Olson, J. S. J. Biol. Chem. 2002, 277, 7509. (13) Carlson, M. L.; Regan, R. M.; Gibson, Q. H. Biochemistry 1996, 35, 1125. (14) Gibson, Q. H.; Regan, R.; Elber, R.; Olson, J. S.; Carver, T. E. J. Biol. Chem. 1992, 267, 22022.

Elber and Gibson (15) Scott, E. E.; Gibson, Q. H. Biochemistry 1997, 36, 11909. (16) Scott, E. E.; Gibson, Q. H.; Olson, J. S. J. Biol. Chem. 2001, 276, 5177. (17) Carlson, M. L.; Regan, R.; Elber, R.; Li, H. Y.; Phillips, G. N.; Olson, J. S.; Gibson, Q. H. Biochemistry 1994, 33, 10597. (18) Quillin, M. L.; Li, T. S.; Olson, J. S.; Phillips, G. N.; Dou, Y.; Ikedasaito, M.; Regan, R.; Carlson, M.; Gibson, Q. H.; Li, H. Y.; Elber, R. J. Mol. Biol. 1995, 245, 416. (19) Brunori, M.; Cutruzzola, F.; Savino, C.; Travaglini-Allocatelli, C.; Vallone, B.; Gibson, Q. H. Biophys. J. 1999, 76, 1259. (20) Bourgeois, D.; Vallone, B.; Arcovito, A.; Sciara, G.; Schotte, F.; Anfinrud, P. A.; Brunori, M. Proc. Natl. Acad. Sci. U.S.A. 2006, 103, 4924. (21) Aranda, R.; Levin, E. J.; Schotte, F.; Anfinrud, P. A.; Phillips, G. N. Acta Crystallogr., Sect. D 2006, 62, 776. (22) Bossa, C.; Amadei, A.; Daidone, I.; Anselmi, M.; Vallone, B.; Brunori, M.; Di, Nola, A. Biophys. J. 2005, 89, 465. (23) Srajer, V.; Teng, T. Y.; Ursby, T.; Pradervand, C.; Ren, Z.; Adachi, S.; Schildkamp, W.; Bourgeois, D.; Wulff, M.; Moffat, K. Science 1996, 274, 1726. (24) Srajer, V.; Ren, Z.; Teng, T. Y.; Schmidt, M.; Ursby, T.; Bourgeois, D.; Pradervand, C.; Schildkamp, W.; Wulff, M.; Moffat, K. Biochemistry 2001, 40, 13802. (25) Schotte, F.; Lim, M. H.; Jackson, T. A.; Smirnov, A. V.; Soman, J.; Olson, J. S.; Phillips, G. N.; Wulff, M.; Anfinrud, P. A. Science 2003, 300, 1944. (26) Steinbach, P. J.; Brooks, B. R. Proc. Natl. Acad. Sci. U.S.A. 1996, 93, 55. (27) Phillips, S. E. V. J. Mol. Biol. 1980, 142, 531. (28) Essmann, U.; Perera, L.; Berkowitz, M. L.; Darden, T.; Lee, H.; Pedersen, L. G. J. Chem. Phys. 1995, 103, 8577. (29) Elber, R.; Roitberg, A.; Simmerling, C.; Goldstein, R.; Li, H. Y.; Verkhivker, G.; Keasar, C.; Zhang, J.; Ulitsky, A. Comput. Phys. Commun. 1995, 91, 159. (30) Jorgensen, W. L.; Tiradorives, J. J. Am. Chem. Soc. 1988, 110, 1657. (31) Weiner, S. J.; Kollman, P. A.; Case, D. A.; Singh, U. C.; Ghio, C.; Alagona, G.; Profeta, S.; Weiner, P. J. Am. Chem. Soc. 1984, 106, 765. (32) Ryckaert, J. P.; Ciccotti, G.; Berendsen, H. J. C. J. Comput. Phys. 1977, 23, 327. (33) Weinbach, Y.; Elber, R. J. Comput. Phys. 2005, 209, 193. (34) Truhlar, D. G.; Garrett, B. C.; Klippenstein, S. J. J. Phys. Chem. 1996, 100, 12771. (35) Kabsch, W. Acta Crystallogr., Sect. A 1978, 34, 827. (36) Straub, J. E.; Rashkin, A. B.; Thirumalai, D. J. Am. Chem. Soc. 1994, 116, 2049. (37) Zhang, Y.; Fujisaki, H.; Straub, J. E. J. Phys. Chem. B 2007, 111, 3243. (38) Kondrashov, D. A.; Montfort, W. R. J. Phys. Chem. B 2007, 111, 9244. (39) Takayanagi, M.; Okumura, H.; Nagaoka, M. J. Phys. Chem. B 2007, 111, 864. (40) Finkelstein, I. J.; Massari, A. M.; Fayer, M. D. Biophys. J. 2007, 92, 3652.