J. Phys. Chem. B 2010, 114, 4389–4399
4389
Molecular Mechanism of Acid-Catalyzed Hydrolysis of Peptide Bonds Using a Model Compound Bin Pan,† Margaret S. Ricci,‡ and Bernhardt L. Trout*,† Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, and Department of Process & Product DeVelopment, Amgen Inc., Thousand Oaks, California 91320 ReceiVed: June 9, 2009; ReVised Manuscript ReceiVed: January 21, 2010
The stability of peptide bonds is a critical aspect of biological chemistry and therapeutic protein applications. Recent studies found elevated nonenzymatic hydrolysis in the hinge region of antibody molecules, but no mechanism was identified. As a first step in providing a mechanistic interpretation, this computational study examines the rate-determining step of the hydrolytic reaction of a peptide bond under acidic pH by a path sampling technique using a model compound N-MAA. Most previous computational studies did not include explicit water molecules, whose effects are significant in solution chemistry, nor did they provide a dynamic picture for the reaction process in aqueous conditions. Because no single trajectory can be used to describe the reaction dynamics due to fluctuations at finite temperatures, a variant version of the transition path sampling technique, the aimless shooting algorithm, was used to sample dynamic trajectories and to generate an ensemble of transition trajectories according to their statistical weights in the trajectory space. Each trajectory was computed as the time evolution of the molecular system using the Car-Parrinello molecular dynamics technique. The likelihood maximization procedure and its modification were used in extracting dynamically relevant degrees of freedom in the system, and approximations of the reaction coordinate were compared. Its low log-likelihood score and poor pB histogram indicate that the C-O distance previously assumed as the reaction coordinate for the rate-determining step is inadequate in describing the dynamics of the reaction. More than one order parameter in a candidate set including millions of geometric quantities was required to produce a convergent reaction coordinate model; its involvement of many degrees of freedom suggests that this hydrolytic reaction step is very complex. In addition to affecting atoms directly involved in bond-making and -breaking processes, the water network also has determining effects on the hydrolytic reaction, a fact that is manifest in the expression of the one-dimensional best-ranked reaction coordinate, which includes three geometric quantities. The pB histograms were computed to verify the results of the likelihood maximization and to verify the accuracy of approximation to the “true” reaction coordinate. Introduction Hydrolytic reactions of peptide bonds in protein molecules have been investigated with respect to the proficiencies of hydrolases and proteases1-3 and for understanding the chemical stability of protein pharmaceuticals.4 In addition to hydrolysis of peptide bonds on the polypeptide backbone, the routes of chemical degradation of protein pharmaceuticals include oxidation,5,6 deamidation,7,8 and other mechanisms.9 Monoclonal antibodies, a fast-growing class of biopharmaceuticals,10 have been reported to undergo nonenzymatic fragmentation in the hinge region.11-13 This finding was identified as resulting mainly from the hydrolysis of specific peptide bonds in the hinge. We believe that understanding the underlying reaction mechanism, particularly from a molecular perspective, will help to direct formulation development and antibody engineering efforts to minimize the extent of degradation in a rational and more efficient way. An example of this approach is the discovery of water interactions as a key parameter in the oxidation mechanism, which leads to the formulation strategy of manipulating solvent accessibility to control methionine oxidation.5 Cordoba et al. were the first to identify the nature of the fragmentation of several different humanized recombinant IgG1 * To whom correspondence should be addressed. E-mail:
[email protected]. † Massachusetts Institute of Technology. ‡ Amgen Inc.
monoclonal antibodies.11 They found that neither the addition of host cell proteins nor the addition of protease inhibitors caused any change in the levels of fragmented antibody molecules during a 1 month incubation of these antibodies, and they concluded the nonenzymatic nature of the fragmentation. They also identified that the fragmentation was related to a hydrolytic cleavage of several peptide bonds only in the hinge region on a heavy chain of the antibody molecule, resulting in the two Fab and Fab + Fc fragments. However, if the hydrolysis of these peptide bonds was not catalyzed by proteases or hydrolases, the occurrence of hydrolysis of these peptide bonds only in the hinge region was difficult to interpret. Further investigation is required to understand the mechanism of such fragmentation. Using optimized reversed-phase methods coupled with mass spectrometry, Dillon et al.12 reached similar conclusions about the location and extent of hydrolytic cleavage of peptide bonds in the hinge region for IgG2 antibodies. Furthermore, the resulting fragments subsequently associated to form high molecular-weight species via a clip-mediated aggregation mechanism,14,15 which can be the primary degradation pathway at elevated temperatures for IgG2 antibodies. This work and a companion paper16 aim at understanding the hydrolytic reaction of peptide bonds in a model compound in order to provide a basis for studying hydrolysis of peptide bonds in antibody molecules.
10.1021/jp905411n 2010 American Chemical Society Published on Web 03/18/2010
4390
J. Phys. Chem. B, Vol. 114, No. 13, 2010
There have been numerous experimental and computational studies on the hydrolysis of peptide bonds in aqueous solutions due to the important biological relevance of peptide bonds. To elucidate enzyme proficiencies in catalyzing peptide bond hydrolysis, a number of groups2-4,17 extensively characterized the reaction rates of hydrolysis of peptide bonds. Studies found that, in general, the reaction rate at neutral pH is extremely slow, with a half-time corresponding to hundreds of years. However, this reaction occurs much faster in both acidic and basic conditions, as studied by Smith et al.18 They extensively mapped the pH and concentration dependences of reaction rates for the hydrolytic reaction of a peptide bond in N-(phenylacetyl)glycylD-valine (PAGV) and identified three regimes of reaction mechanisms, an acid-catalyzed mechanism for pH < 4 with a first-order reaction rate in concentrations of both PAGV and H+, a base-catalyzed mechanism for pH > 10 with a first-order reaction rate in concentrations of both PAGV and OH-, and a water-mediated mechanism for pH values in between with a first-order reaction rate only in the concentration of PAGV. A number of computational efforts19-21 were devoted to understanding the energetics of both stable and intermediate species involved in the hydrolytic reaction. However, these studies showed only a static picture of the reaction process. Stranton et al.22 used combined quantum mechanical and classical mechanical calculations to study the hydrolytic reaction of peptide bonds catalyzed by trypsin in solution; free energies of reaction were reported. Zahn et al.23-26 studied the hydrolytic reaction of N-MAA in various solution conditions, namely, acidic, basic, and neutral pHs. These ab initio calculations of the potential of mean force assumed reaction coordinates and performed the projection of the free-energy hypersurface onto these coordinates using constrained molecular dynamics. However, if the reaction coordinates assumed were not correct, the free-energy profile could be irrelevant to the reaction process. A separate study by Zahn27 investigated the rate-limiting step of the acid-catalyzed hydrolysis of a peptide bond using the path sampling Car-Parrinello technique, the exact same problem and approach as presented here. However, not enough transition trajectories were collected in Zahn’s study nor was the determination of the reaction coordinate performed. Even though the hydrolytic reaction occurs slowly at pH 4-10, hydrolysis of therapeutic antibodies can be significant in physiological conditions during circulation or within the formulation solution conditions over shelf life, as illustrated in the studies by Cordoba et al.11 and Dillon et al.12 Thus, from a scientific and practical standpoint, elucidation of the underlying mechanism of the hydrolytic reaction is essential. Previous experimental studies identified distinct reaction mechanisms for the hydrolytic reaction of peptide bonds at different solution pHs. In acidic conditions, as shown in Figure 1, the hydrolytic reaction is accelerated by the protonation of carbonyl oxygen,17 followed by a water-assisted H2O attack on the resulting O-protonated amide to yield a tetrahedral intermediate I1. This attack is the rate-limiting step, as confirmed by experiments of solvent kinetic isotope effects.17 Following the formation of the tetrahedral intermediate, a proton is added by another water molecule on the amide nitrogen to form another intermediate I2, and C-N cleavage ensues with formation of unprotonated carboxylic acid and an amine. In this work and in a companion paper,16 we studied the hydrolytic reaction of peptide bonds under both acidic and neutral pH conditions by using a model compound N-methyl acetyl acrylamide (N-MAA), in which two methyl groups are the minimal but computationally reasonable constituents on the
Pan et al.
Figure 1. The acid-catalyzed pathway of hydrolytic reaction of the peptide bond.17 Only the rate-limiting step is studied in this paper. I1 and I2 are the two metastable intermediates.
peptide bond -NH-CO-. Acidic and neutral pH conditions were chosen because these conditions are the formulation conditions most frequently used for the stabilization of monoclonal antibody molecules.28 Ab initio molecular dynamic simulations at finite temperature equipped with transition path sampling (TPS),29-31 likelihood maximization32-34 and pB histogram analysis31 techniques were conducted to gain understanding of the reaction mechanism, including identification and verification of the reaction coordinate. This work aims at identifying the reaction coordinate in the rate-determining step, that is, the formation of intermediate I1 in the acid-catalyzed pathway of the hydrolytic reaction. The reaction coordinate is defined here as a descriptor of the real physical progress of the transition from the reactant state to the product state. This definition of reaction coordinate has to be contrasted with that of order parameters, which serves mainly to distinguish the two ending states. An order parameter (OP) is a quantity whose value can be used to distinguish different thermodynamic states, for example, reactant or product states and crystalline, amorphous, or liquid states. A reaction coordinate has to be an OP, while the contrary is not necessarily true. Identification of the “correct” reaction coordinate from a collection of OPs is a very challenging problem, especially when the transition is complex. However, the information about the correct reaction coordinate is necessary when computing quantities of interest, such as free-energy barriers and reaction rate constants, from molecular simulations. Also, we believe that knowledge of the correct reaction coordinate, and hence the reaction mechanism, can provide essential information on the molecular level for judicious engineering of complex systems. Transition path sampling (TPS) is a technique well-suited for studying transition processes that have time scale separation and exhibit a rough potential energy surface when the system is not too diffusive. It focuses on the dynamic bottleneck for the rare event which is defined as the transition-state surface. That the dynamic trajectories cross a metastable intermediate or metastable intermediates can be identified automatically in this TPS procedure, indicated by long lingering of the system inside of such a region in phase space, but this was not in fact found in this study. TPS starts with a predetermined transition path connecting two stable states, and then by making shooting and shifting moves in the trajectory space using a Monte Carlo procedure, TPS can eventually lead to the true dynamics at
Acid-Catalyzed Hydrolysis of Peptide Bonds transition states and an ensemble of physically meaningful pathways. Eventually, complete reaction profiles, transition states, free energies of reaction barrier, and reaction rates can be obtained. The prerequisites to perform a transition path sampling are (1) definitions of two stable states and (2) an initial trajectory, or a series of points in phase space, which connects the two stable states. This initial trajectory may not be physical or at the same condition as the one of interest. The most appealing feature of TPS is that no prior knowledge about the reaction mechanism, the reaction coordinate, and the transition state is needed to begin with. A more efficient sampling technique to explore the trajectory space is the aimless shooting algorithm,33,33 a variant of the TPS algorithm. It has the advantage of having more decorrelation between successive trajectories than the original version of TPS and thus can explore the trajectory space more quickly. The detailed implementation of the algorithm can be found in Peters et al.32-34 Given a collected ensemble of trajectories, with their statistical weight in trajectory space correctly incorporated, there are different ways to identify the important degrees of freedom, namely, the factors comprising the reaction coordinate. Ma et al.35 developed an algorithm based on neural networks and a genetic algorithm to identify the reaction coordinate. In their method, neural networks are used to determine the functional dependence of the committor probability on a set of coordinates, and the genetic algorithm selects the combination of elements that yields the best fit from this set of coordinates. Likelihood maximization,32,33 an alternative and complementary technique to select physically important degrees of freedom, is based on informatics theory. It screens a set of candidate collective variables for a viable reaction coordinate that depends only on a few relevant variables. Selection of the candidate collective variables is crucial in both this approach and the approach developed by Ma et al.35 The possible collective variables can be several, (1) pseudointernal coordinates including geometric quantities such as distances, angles, and dihedral angles, (2) accessible surface area of, radius of gyration of, and excluded volume of the solute, and (3) the number of hydrogen bonds, protein native contact number, and so forth. Because the computational screening of a single such collective variable is fast, it is feasible to perform the screening over a large number of candidate variables. The likelihood maximization algorithm starts with a data set, which is the rejected or accepted shooting points in TPS, and it seeks a committor probability model pB[r(x)] and a reaction coordinate model r(x) as a function of the shooting point configuration x in order to explain the realizations of pB[r(x)] that were obtained from TPS. The Bayesian information criterion32 can be used to distinguish different reaction coordinate models such that an optimal but still simplistic model can be obtained. Methodology System Description. A simulation setup very similar to Zahn’s potential of mean force calculations23 was chosen for our path sampling and analysis, but with a few important differences. First, we performed equilibration molecular dynamic (MD)36,37 simulations by running long trajectories using the classical CHARMM force field38 under ambient temperature and pressure. This was done using the CHARMM package39 using the NPT ensemble where the number of particles N, the pressure P, and the temperature T were held constant. Force field
J. Phys. Chem. B, Vol. 114, No. 13, 2010 4391
Figure 2. Simulation box (a) together with bond distances used to define basins of stable states (b).
parameters for N-MAA were taken from similar structures (ALA and CT3 in CHARMM22 for residues alanine and N-methylamide C-terminus), and its geometry was fixed. The TIP3P force field40 parameters were used for water in the system, and the CHARMM force field was used for HCl. In the final equilibrated system, our 12.49 Å × 9 Å × 9 Å simulation cell was comprised of 31 water molecules, one N-MAA, and one HCl in order to provide an acidic solution condition, as shown in Figure 2. We found that the value of the water density in Zahn’s studies23 was too high, while the water density values used in these studies had the correct value at the ambient temperature and pressure. Our initial setup aimed at assessing the validity of the C-O distance along which the free-energy profile was projected in Zahn’s study;23 therefore, the same number of water molecules, that is, 31 water molecules, was used. Compared to Marx’s study (9.8652 Å × 9.8652 Å × 9.8652 Å),41 our simulation box (12.49 Å × 9 Å × 9 Å) has 5 additional HCl molecule together with 31 water molecules, while being 31.7% greater in volume than that of the simulation box used in Zahn’s study (12 Å × 8 Å × 8 Å). Having said that, there could be finite size effects which would need to be investigated in a subsequent, much more extensive study. Of course, increasing the number of water molecules in the simulation system makes the type of computations illustrated in this paper prohibitively expensive. Our choice of the number of water molecules was a compromise between accuracy and efficiency. A long constant-temperature MD trajectory of 50 ps was performed with the O1-H1, the C-O1, and the H1-O2 distances (as shown also in Figure 2) constrained in the Car-Parrinello molecular dynamics (CPMD)42 using the CPMD package43 in order to equilibrate other degrees of freedom. Dynamic trajectories were collected using the CPMD package43 in the NVT ensemble. A time step of 4 au (∼0.1 fs) and an electron fictitious mass of 400 au were used. A chain of four Nose´-Hoover thermostats was used to control the temperature at 298° K. The molecular orbitals were described by a planewave basis with an energy cutoff of 70 Ry. Norm-conserving Troullier-Martins pseudopotentials44 and BLYP density functionals were used, both of which were validated by Zahn.23 We found that selecting from two points, x-∆t or x+∆t, is sufficient to sample the transition-state ensemble, and that is what we did in this study. Stable Basin Definitions. Here, three distances (the O1-H1 distance, the C-O1 distance, and the H1-O2 distance) as shown in Figure 2, are used to determine the stable state to which a particular configuration corresponds. Long molecular dynamics trajectories (5000 steps, equivalently ∼5 ps) with neither constraints nor restraints were applied using CPMD to obtain the fluctuations in these bond distances, and then, an appropriate range was taken by computing the variances of bond distances in the stable bond regions with adjustment such that
4392
J. Phys. Chem. B, Vol. 114, No. 13, 2010
Pan et al.
Figure 3. Transition trajectory with the associated changes in the OPs of bond distances. The distances are d(O1-H1), d(C-O1), and d(H1-O2), corresponding to d(O2-H46), d(C36-O1), and d(H46-O3), respectively.
TABLE 1: Ranges of Bond Distances in Figure 2 Used for Definitions of Basins of Stable Statesa fluctuation reactant mean
std
definition
product mean
std
reactant
product
O1-H1 dist (Å) 1.00 0.04 [0.88,1.12] >1.62 O1-C dist (Å) 1.40 0.08 >2.14 [1.28, 1.64] H1-O2 dist (Å) 1.00 0.03 >1.62 [0.88, 1.12] a A configuration corresponds to a particular stable state (either reactant or product) only when three bond distances are simultaneously within the specified ranges.
in the aimless shooting procedure, no returning to an indeterminate region occurred once the system reached one stable basin (shown in Figure 3). The mean values of these bond distances and their fluctuations are listed in Table 1, which also gives the choice of basin definitions. Note that there is not a unique way to define stable basins within TPS. To do so, an order parameter that can distinguish the two basins is defined and updated as needed until as high of a percentage as possible of the TPS trajectories is harvested as conclusive (one way or another). Our way of defining stable basins gave us 1) are included in the likelihood maximization procedure, the following systematic approach was adopted. Basically, it assumes that important OPs previously screened based on likelihood scores will also be important in comprising reaction coordinate models with more OPs. It starts with the best m one-OP-variable reaction coordinate models. Then, in each round for d-OP variable optimization, every best-ranked (d - 1)-OP variable result is supplemented with every best m one-OP variable to give a d-OP variable model. Next, only the best m d-OP variable results are retained for (d + 1)-OP variable screening. Thereby, each round has roughly m × m optimization problems to solve. Figure 6 shows the flowchart for this algorithm. Uncertainty Analysis in Likelihood Maximization. The criterion to discriminate among different reaction coordinate models r(q) is the Bayesian information criterion (BIC),32 which equals log(N)/2, where N is the number of accepted trajectories in aimless shooting. If the difference in two log-likelihood scores is greater than BIC, the reaction coordinate model having a higher log-likelihood score is superior in describing the transition process. On the other hand, if the difference in two loglikelihood scores is within BIC, the two reaction coordinate models are indistinguishable, at least from the data collected. However, due to finite sampling, there is statistical uncertainty present in the estimate of likelihood score in eq 3.33
σ2(ln L) ) 1 pB(r) ) [1 + tanh(r)] 2
xkfA
∑ pB(xk)[1 - pB(xk)]{ln pB(xk) xk
(2)
ln[1 - pB(xk)]}2 (4)
4394
J. Phys. Chem. B, Vol. 114, No. 13, 2010
Pan et al.
Figure 4. Key snapshots describing the rate-determining step in an acid-catalyzed hydrolysis pathway. Only three water molecules are shown for clarity. The overall trajectory is 4000 MD steps, during which the formation of intermediate I1 occurs. C-O bond formation and a proton-transfer step are concerted.
Figure 5. Accumulation of four different types of trajectories in the procedure of aimless shooting for acid-catalyzed hydrolysis of a peptide bond. These types are the forward half trajectory having reached (reactant) basin A and the backward half trajectory having reached (product) basin B (type 1), the forward half trajectory having reached basin B and the backward half trajectory having reached basin A (type 2), both forward and backward half trajectories having reached basin A (type 3), and both forward and backward half trajectories having reached basin B (type 4).
where the sum is the overall shooting points, each of which is a pB realization. Reaction Coordinate Validation. After the likelihood maximization is used to generate an approximation to the reaction coordinate, its correctness must be checked. This can be done by computing the estimate of the probability of reaching the product basin (pB) from the predicted transition-state region obtained in the likelihood maximization, commonly referred to as a committor distribution analysis, or pB histogram computation.30 In this procedure, independent configurations are generated that all satisfy the predicted transition state. Then, a number of trajectories are initiated with the momenta drawn from a Maxwell-Boltzmann distribution from these configurations, and the estimate of pB values for these configuration can thus be obtained. Finally, a histogram of the number of configurations versus pB values can be constructed.
Figure 6. Flowchart showing how the likelihood maximization is conducted when more OP variables are included in comprising a RC model. Essentially, every iteration for a more than two-OP-variable RC model screened approximately m × m models.
For complex reaction coordinates like the ones used in this study, generating independent configurations for pB histogram computation can be done efficiently by the BOLAS algorithm.46 In our work, shooting points were first examined, and several of them were selected close to the predicted transition state region, as defined by r(q) ) 0 in eq 1. Very short trajectories were fired randomly from each initial configuration, and the end points were evaluated to determine if they were within a narrow window on the transition-state isosurface. If so, this
Acid-Catalyzed Hydrolysis of Peptide Bonds configuration was accepted and became the next shooting point. This process was repeated until an adequate number of configurations was generated from which to shoot reactive trajectories to build a pB histogram. To construct the histogram, trajectories are shot from each configuration with a length corresponding to half of the length of a reactive trajectory. The end points of the trajectories are evaluated, and a histogram is constructed of the probability of reaching basin B from the predicted transition-state isosurface. The basin definitions for constructing the pB histograms correspond to the same basin definitions used for the reactant and product basins in the aimless shooting simulations. An adequate approximation to the true reaction coordinate will yield a histogram that is sharply peaked at pB ) 0.5.31 Additionally, one can make a quantitative comparison of the histogram to the binomial distribution, which will have a mean value of µ ) 0.50 with a standard deviation of σ ) 0.050. The trajectories for the generation of new configurations are ∼5 fs or 50 MD steps long, and the end point window width at r ) 0 is constrained within a range of (1% of the total configuration space sampled, as measured by ∆r. For each histogram assembled in this study, 20 shooting points were collected. From each configuration collected, 20 trajectories were shot, corresponding to approximately 400 trajectories for each histogram. The trajectory length for calculating pB values is ∼200 fs or 2000 MD steps, which is half of the length of the reactive trajectories in the aimless shooting simulations, again resulting in a low rate of inconclusive paths. Results and Discussion Initial Trajectory. As mentioned previously, an initial reactive trajectory was obtained by first running long equilibration MD when fixing postulated bond distances of O2-H46, C36-O1, and H46-O3 (labeled in Figure S.1 in the Supporting Information) in order to remove potential artifacts when using constraints. Then, repeated forward and backward shootings were conducted until the two half-trajectories combined to yield a reactive trajectory. Figure 3 shows the how bond distances change in this particular trajectory, together with extended sampling in the stable states in order to see how these bond distances fluctuate. The bond distances in stable states compare well with known experimental values, and the transition dynamics has a time scale of ∼4000 MD steps. The values and fluctuations of these three bond distances provide a basis for choosing the MD steps in the aimless shooting procedure to reach the compromise between efficiency and a minimal number of inconclusive trajectories. Trajectory Characteristics. Snapshots from a typical reactive trajectory show what happens during the transition, as shown for the reaction core part in Figure 4. All of the reactive trajectories collected showed that the rate-determining step of hydrolysis proceeds in a concerted fashion, that is, one protontransfer process between two nearby water molecules and C-O bond formation occur simultaneously. The initial step of protonation of carbonyl oxygen of the peptide bond, that is, the formation of CH3C+(OH)NHCH3 species, was also observed transiently in some trajectories (∼5%), in agreement with the conclusion that this initial step does not lead to a stable intermediate. (see Movie 1 in Supporting Information for a better view of the reaction dynamics.) Statistics of Trajectory Ensemble. In Figure 5, the total number of four different types of trajectories is plotted against the trajectory index recorded in the aimless shooting procedure. Type 1 and 2 trajectories are reactive since they connect both
J. Phys. Chem. B, Vol. 114, No. 13, 2010 4395 TABLE 2: Likelihood Maximization Results for N ) 1836 Aimless Shooting Paths, with BIC ) log(N)/2 ) 3.758a number of OP variables in RC model 1 2
c
3c c
4
OPs in best ranked RC models phi(O22-O15-O20-H46) d(O2-C36)b a(O22-O24-H46), phi(O2-O20-H82-H46) phi(O15-H46-H105-H55), d(O22-H46), phi(O2-O20-H82-H46) a(O22-O24-H46), phi(O2-O20-H82-H46), phi(H80-O25-H46-H106), phi(H46-O25-H106-H80)
log-likelihood score (ln(L)) -1062.0 -1178.5 -1041.1 -1035.6 -1032.9
a The order parameters (OPs) have the following meaning: d(n1,n2) is the distance between atom numbers n1 and n2; a(n1,n2,n3) is the angle comprised of atom numbers n1, n2, and n3; and phi(n1,n2,n3,n4) is the dihedral angle comprised of atom numbers n1, n2, n3, and n4. The column Ri ’s give the vector R ) (R0,R1, ...,Rn) corresponding to reduced and normalized OP qi ∈ [0,1]. b This order parameter was used in the previous potential-of-mean-force calculation.26 c Results of likelihood maximization with more than one OP variable, with m ) 100, n ) 100. Convergence was achieved at d ) 3. See the text for the modified algorithm of likelihood maximization.
reactant and product states, while types 3 and 4 are rejected shooting moves. As required in configurational Monte Carlo sampling, the acceptance ratio (defined as the ratio of the number of reactive trajectories to that of total) should be around 4050% for both efficiency and sampling speed in exploring trajectory space. As just stated, our choice of ∆t yielded a 44.8% acceptance ratio, which is excellent. It is expected that in the aimless shooting algorithm, the number of trajectories of type 1 at any point in the accumulation of trajectory ensemble should be approximately equal to that of type 2 due to the random nature of assigning initial shooting velocities, as is the case in Figure 5. One also expects that the number of trajectories of type 3 should be approximately equal to that of type 4 if the trajectory space was sufficiently explored during the aimless shooting procedure. Here, the product side of the transitionstate region was explored more because there are more type 3 trajectories than type 4 trajectories. There are a total of 603 type 3 trajectories and 320 type 4 trajectories. The reason for this is that even though the aimless shooting algorithm is selfcorrecting (the shooting points tend to move back to the transition-state region if they have drifted away), it does take time (or rejected shifting moves) for it to get back. When we use fewer trajectories to perform the likelihood maximization procedure where the numbers of type 3 and type 4 trajectories do not show a big difference (∼300 type 3 and ∼300 type 4 trajectories), the results do not differ significantly, and the RC obtained are the same. Thus, we think that the uneven numbers of type 3 and type 4 trajectories are relatively unimportant for our purpose of determining a correct RC. Likelihood Maximization. The results of likelihood maximization are shown in Table 2. A few best reaction coordinate models with different numbers of OP variables (up to four) are listed. (see Table 1 in Supporting Information for more information; the atom labeling is referred to in Figure 1 of the Supporting Information.) In addition, one-OP-variable best reaction coordinate models are pictorially shown in Figure 8. On the basis of likelihood scores, these one-OP-variable reaction coordinate models are much better in describing the reaction in the statistical sense of likelihood maximization than the order parameter of the distance between O(2) and C(36), which was used in the calculation of the potential of mean force for the rate-limiting step of the hydrolysis of N-MAA under acidic pH
4396
J. Phys. Chem. B, Vol. 114, No. 13, 2010
Pan et al.
Figure 7. Comparison of the pB(r) model versus aimless shooting data. Here, the half trajectory pB(r) model33 was used, that is, pB(r) ) [1 + tanh(r)]/2. Note that the error bars appear on the model, not the data. The error bars show how far shooting point data should deviate from the probabilities pB(r) for a perfect reaction coordinate model. (a) Two-OP-variable reaction coordinate model. (b) Three-OP-variable reaction coordinate model.
Figure 8. Illustration of best-ranked one-OP-variable RC models in the likelihood maximization procedure. The OP is given as a dihedral among quadruple atoms (denoted as phi(atom_index_1, atom_index_2, atom_index_3, atom_index_4)), as an angle among triple atoms (denoted as a(atom_index_1, atom_index_2, atom_index_3)), or as a bond distance between pair atoms (denoted as d(atom_index_1, atom_index_2)). The associated likelihood scores (LS) are also given. Note that almost all of these best-ranked involve the hydrogen atom indexed as 46:H.
conditions.23 As more OPs were used in the linear combination expression of the reaction coordinate model in eq 1, higher and higher log-likelihood scores were obtained, suggesting a complex reaction mechanism involving many physical degrees of freedom. In our approximation scheme for the likelihood maximization procedure when including more OP variables in the reaction coordinate models, the log-likelihood score fell within the BIC criterion after three-OP variables were included, indicating a convergent result. Both best reaction coordinate models with two-OP and threeOP variables were checked against the aimless shooting data, as shown in Figure 7. All accepted shooting points for which the forward and backward shootings led to a conclusive trajectory were used to construct two histograms of the reaction coordinate determined from likelihood maximization. The two histograms were based on only the half-trajectories of the forward shooting from each accepted shooting point whether
they ended in the reactant basin or in the product basin. Then, pB(r) data values in Figure 7 were computed as the ratio
pB(r)
|
data,ith bin
)
NfA,i NfA,i + NfB,i
(5)
where NfA, i and NfB, i stand for the number of shooting points whose configurations give the reaction coordinate value r in the ith bin and from which the forward shooting half-trajectory ends in A or B, respectively. Thus, the comparison of model versus data provides a measure of how well aimless shooting collects information about transition paths and shooting points, along with how well likelihood maximization is calculated. Figure 7 shows that both reaction coordinate models fit the aimless shooting data well. Using the equation shown in eq 5, the statistical uncertainty in the log-likelihood score was computed with the collection
Acid-Catalyzed Hydrolysis of Peptide Bonds
J. Phys. Chem. B, Vol. 114, No. 13, 2010 4397 TABLE 3: Maximal Likelihood Estimates for Means and Standard in Deviations the pB Histograms Shown in Figure 8a reaction coordinate model/distribution
µh
σh
C(36)-O(2) best one-OP variable best two-OP variable best three-OP variable binomial distribution ideal P(pB)
0.565 0.590 0.595 0.550 0.500 0.500
0.109 0.042 0.040 0.023 0.025 0.000
a
Figure 9. Illustration of constituent OPs in the best three-OP-variable RC model. Nomenclature of these OPs is defined in Figure 8.
of accepted shooting points to be σ2(ln L) ) 15.33 and 14.60 for the best RC models with two- and three-OP variables, respectively. These values provide reasonable confidence in our log-likelihood score value.47 Figure 9 shows the OPs which comprise the best reaction coordinate model with three-OP variables. Similar to our study on the hydrolytic reaction under neutral pH,16 these OPs include the local bonding pattern changes, such as proton H46 being transferred between the two water molecules, namely, the O22-H46 distance. Solvent degrees of freedom in affecting reaction dynamics are also needed, as seen by the presence of two dihedral angles phi(O15-H46-H105-H55) and phi(O2O20-H82-H46). The inclusion of both local OPs close to the
The procedure was used as that in Peters.47
reaction center and the OPs describing the solvent networks suggests the importance of solvent in determining reaction dynamics. Consistent with the findings of other studies, such as proton transfer in liquid water,45 protein folding,48 and the solvation of small hydrophobic molecules,49,50 aqueous-phase reactions can critically depend on solvent degrees of freedom. Reaction Coordinate Validation. Four pB histograms were computed using the method described above. These include using the C36-O2 distance reaction coordinate model and the best one-, two-, and three-OP-variable RC models. The results are shown in Figure 10. The poor description of the reaction process by the C-O distance RC model can be seen in this pB histogram since its distribution is very skewed. Both the best one- and the best two-OP-variable RC models show inferior histograms compared to the best three-OP-variable RC model. Summary and Conclusions To elucidate the mechanism of the nonenzymatic fragmentation of peptide bonds in the hinge region of antibody molecules, we examined the mechanism of the rate-determining step of hydrolytic reaction of peptide bonds under acidic pH conditions
Figure 10. Committor probability histogram using C(36)-O(2) as the reaction coordinate model (a), the best one-OP-variable reaction coordinate model (b), the best two-OP-variable reaction coordinate model (c), and the best three-OP-variable reaction coordinate model (d), compared with the binomial distribution (red line). Quantification of means and standard deviations for these histograms following the procedure in Peters47 is shown in Table 3.
4398
J. Phys. Chem. B, Vol. 114, No. 13, 2010
using a model compound N-MAA. We considered the effects of explicit solvent molecules and of dynamic effects at finite temperatures important for the study of this reaction. Thus, a path sampling method with ab initio molecular dynamics was used to generate an ensemble of trajectories according to their statistical weights in trajectory space for our simulation system where explicit water molecules were included. In order to determine the reaction coordinate, the technique of likelihood maximization and its modification were used in extracting physically important degrees of freedom in the system. Different approximations of the reaction coordinate were compared. It was found that this hydrolytic reaction is very complex in nature and involves many degrees of freedom. The specific conclusions obtained in our study are the following. • The rate-determining step of the hydrolytic reaction of N-MAA under acidic pH occurs in a concerted fashion; a stable intermediate was found to be the final state in our path sampling simulations. • The likelihood maximization procedure was extended to screen RC models with more OP variables, and within BIC, a reaction coordinate with three constituent geometric variables was found to be the best in describing the path ensemble that we generated. • In the best RC model, both geometric quantities that reflect bonding pattern changes and those which reflect the solvent network changes are included, suggesting a complicated reaction involving many degrees of freedom. The participation of the solvent degrees of freedom necessitates the use of explicit water molecules in studying chemical reactions such as this one. This finding is in line with other studies where the importance of solvent effects is revealed, such as proton transfer in liquid water,45 protein folding,48 and the solvation of small hydrophobic molecules.49,50 • Several pB histograms were computed to verify the results of likelihood maximization, and the relative ranking of the top reaction coordinate models is in accord with their respective likelihood scores. The technique of likelihood maximization is a very powerful statistical tool in determining a reaction mechanism, especially when it is complex. However, new problems arise when the set of candidate order parameters is large, such as the combinatorial problem in exhaustive screening for the best reaction coordinate model. This study provides an approach to conduct a curtailed optimization procedure to determine the reaction coordinate. In addition, our approach critically relies on the proper choice the set of candidate OPs, as does the approach of the genetic algorithm and neutral network.35 One can take the advantage that the computational cost of a single such OP variable is fast in order to perform the screening over a large number of candidate variables, thus partially overcoming the problem of dependence on the set of candidate OPs. Even though it may be difficult to relate a concise physical picture with the likelihood maximization using many OPs, with respect to the pB histogram and the calculation of quantities of interest such as free-energy profile and reaction rates, this improvement has advantages. Overall, our study shows that combining path sampling and reaction coordinate determination techniques gave us direct information about the nature of the hydrolysis reaction under acidic pH. In particular, the reaction itself involves many degrees of freedom, including solvent degrees of freedom. There does not seem to be a simple geometic coordinate that governs the reaction. The reaction coordinate found here should be physically relevant in that rates of reaction and free-energy barriers
Pan et al. to reaction that can be calculated using this reaction coordinate should be physically accurate. This reaction coordinate could also be used to understand the reason why rates of hydrolysis of the same pairs of amino acids depend on the structure of the protein in which they reside. Both of these would be separate studies and outside of the scope of the present work. Acknowledgment. We thank Amgen Inc. for their generous financial support of this study. We thank Dr. Gregg Beckham, Dr. Baron Peters, Dr. Erik Santiso, and Nicholas Musolino for insightful discussions. We are grateful to Tom Dillon, Dr. Pavel Bondarenko, Dr. David Brems, and Jeff Abel at Amgen Inc. for their support of this work and our previous work. Special thanks to Elizabeth Fox at MIT’s Writing Center for editing of this manuscript. Supporting Information Available: Some figures, tables, and movies. This material is available free of charge via the Internet at http://pubs.acs.org. References and Notes (1) Radzicka, A.; Wolfenden, R. Science 1995, 267, 90–93. (2) Bryant, R. A. R.; Hansen, D. E. J. Am. Chem. Soc. 1996, 118, 5498–5499. (3) Radzicka, A.; Wolfenden, R. J. Am. Chem. Soc. 1996, 118, 6105– 6109. (4) Kahne, D.; Still, W. C. J. Am. Chem. Soc. 1988, 110, 7529–7534. (5) Chu, J. W.; Yin, J.; Brooks, B. R.; Wang, D. I. C.; Ricci, M. S.; Brems, D. N.; Trout, B. L. J. Pharm. Sci. 2004, 93, 3096–3102. (6) Pan, B.; Abel, J.; Ricci, M. S.; Brems, D. N.; Wang, D. I. C.; Trout, B. L. Biochemistry 2006, 45, 15430–15443. (7) Liu, D. T. Y. Trends Biotechnol. 1992, 10, 364–369. (8) Kosky, A. A.; Razzaq, U. O.; Treuheit, M. J.; Brems, D. N. Protein Sci. 1999, 8, 2519–2523. (9) Wei, W. Int. J. Pharm. 1999, 185, 129–188. (10) Daugherty, A. L.; Mrsny, R. J. AdV. Drug DeliVery ReV. 2006, 58, 686–706. (11) Cordoba, A. J.; Shyong, B. J.; Breen, D.; Harris, R. J. J. Chromatogr., B 2005, 818, 115–121. (12) Dillon, T. M.; Bondarenko, P. V.; Rehder, D. S.; Pipes, G. D.; Kleemann, G. R.; Ricci, M. S. J. Chromatogr., A 2006, 1120, 112–120. (13) Cohen, S. L.; Price, C.; Vlasak, J. J. Am. Chem. Soc. 2007, 129, 6976–6977. (14) Perico, N.; Purtell, J.; Dillon, T.; Ricci, M. S. J. Pharm. Sci. 2009, 98, 3031–3042. (15) Buren, N. V.; Rehder, D.; Matsumura, H.; G, M.; Jacob, J. J. Pharm. Sci. 2009, 98, 3013–3030. (16) Pan, B.; Ricci, M. S.; Trout, B. L. Submitted. (17) Brown, R. S.; Bennet, A. J.; Slebockatilk, H. Acc. Chem. Res. 1992, 25, 481–488. (18) Smith, R. M.; Hansen, D. E. J. Am. Chem. Soc. 1998, 120, 8910– 8913. (19) Krug, J. P.; Popelier, P. L. A.; Bader, R. F. W. J. Phys. Chem. 1992, 96, 7604–7616. (20) Antonczak, S.; Ruizlopez, M. F.; Rivail, J. L. J. Am. Chem. Soc. 1994, 116, 3912–3921. (21) Bakowies, D.; Kollman, P. A. J. Am. Chem. Soc. 1999, 121, 5712– 5726. (22) Stanton, R. V.; Perakyla, M.; Bakowies, D.; Kollman, P. A. J. Am. Chem. Soc. 1998, 120, 3448–3457. (23) Zahn, D. J. Phys. Chem. B 2003, 107, 12303–12306. (24) Zahn, D. Chem. Phys. Lett. 2004, 383, 134–137. (25) Zahn, D. Chem. Phys. 2004, 300, 79–83. (26) Zahn, D. Eur. J. Org. Chem. 2004, 4020–4023. (27) Zahn, D.; Schmidt, K. F.; Kast, S. M.; Brickmann, J. J. Phys. Chem. A 2002, 106, 7807–7812. (28) Wang, W.; Singh, S.; Zeng, D. L.; King, K.; Nema, S. J. Pharm. Sci. 2007, 96, 1–26. (29) Bolhuis, P. G.; Chandler, D.; Dellago, C.; Geissler, P. L. Annu. ReV. Phys. Chem. 2002, 53, 291–318. (30) Dellago, C.; Bolhuis, P. G.; Csajka, F. S.; Chandler, D. J. Chem. Phys. 1998, 108, 1964–1977. (31) Dellago, C.; Bolhuis, P. G.; Geissler, P. L. AdV. Chem. Phys. 2002, 123, 1–78. (32) Peters, B.; Trout, B. L. J. Chem. Phys. 2006, 125, 054108/1–054109/ 10.
Acid-Catalyzed Hydrolysis of Peptide Bonds (33) Peters, B.; Beckham, G. T.; Trout, B. L. J. Chem. Phys. 2007, 127, 034109/1–034109/9. (34) Beckham, G. T.; Peters, B.; Starbuck, C.; Variankaval, N.; Trout, B. L. J. Am. Chem. Soc. 2007, 129, 4714–4723. (35) Ma, A.; Dinner, A. R. J. Phys. Chem. B 2005, 109, 6769–6779. (36) Allen, M. P.; Tildesley, D. J. Computer simulation of liquids; Oxford University Press: Oxford, U.K., 1989. (37) Frenkel, D.; Smit, B. Understanding molecular simulation: from algorithms to applications, 2nd ed.; Academic: San Diego, CA and London, 2002. (38) MacKerell, A. D., Jr.; et al. J. Phys. Chem. B 1998, 102, 3586– 3616. (39) Brooks, B. R.; Bruccoleri, R. E.; Olafson, B. D.; States, D. J.; Swaminathan, S.; Karplus, M. J. Comput. Chem. 1983, 4, 187–217. (40) Jorgensen, W. L.; Chandrasekhar, J.; Madura, J. D.; Impey, R. W.; Klein, M. L. J. Chem. Phys. 1983, 79, 926–935. (41) Marx, D.; Tuckerman, M. E.; Hutter, J.; Parrinello, M. Nature 1999, 397, 601–604.
J. Phys. Chem. B, Vol. 114, No. 13, 2010 4399 (42) Car, R.; Parrinello, M. Phys. ReV. Lett. 1985, 55, 2471–2474. (43) CPMD; IBM Corp. and MPI fu¨r Festko¨rperforschung: Stuttgart, Germanyhttp://www.cpmd.org/ (1997-2001). (44) Troullier, N.; Martins, J. L. Phys. ReV. B 1991, 43, 1993–2006. (45) Geissler, P. L.; Dellago, C.; Chandler, D.; Hutter, J.; Parrinello, M. Science 2001, 291, 2121–2124. (46) Radhakrishnan, R.; Schlick, T. J. Chem. Phys. 2004, 121, 2436– 2444. (47) Peters, B. J. Chem. Phys. 2006, 125, 241101 1–4. (48) Bolhuis, P. G. Proc. Natl. Acad. Sci. U.S.A. 2003, 100, 12129– 12134. (49) Miller, T. F.; Vanden-Eijnden, E.; Chandle, D. Proc. Natl. Acad. Sci. U.S.A. 2007, 104, 14559–14564. (50) van Erp, S. T.; Meijier, J. E. Angew. Chem., Int. Ed. 2004, 43, 1660–1662.
JP905411N