J. Phys. Chem. 1993,97, 12959-12966
12959
Molecular Dynamics Simulations of a Protein in the Canonical Ensemble Douglas J. Tobias, Glenn J. Martyna, and Michael L. Klein’ Department of Chemistry, University of Pennsylvania, Philadelphia, Pennsylvania 19104-6323 Received: April 13, 1993’
We report the results of constant energy (microcanonical ensemble) and constant temperature (canonical ensemble) molecular dynamics simulations of bovine pancreatic trypsin inhibitor. The constant temperature simulations were carried out by using either Langevin dynamics or one of three ”extended system” (ES) methods: a single NosbHoover thermostat, a single Nos€-Hoover chain, and “massive” Nosb-Hoover chains (one chain coupled to each degree of freedom). We evaluate the abilities of the different methods to provide temperature control and examine their effects on kinetic energy equipartitioning rates, average structures and fluctuations, and dynamical properties. Provided the system is well-equilibrated, all of the methods considered in this study can be used to perform essentially constant temperature simulations of proteins that generate canonical velocity distributions. The methods differ in their utility as equilibration techniques. During equilibration, in microcanonical simulations, the temperature can drift, while in NosbHoover simulations, relatively large, undesirable oscillations in the temperature can build up. The simulations based on the Langevin or NosbHoover chain dynamics do not suffer from these problems. The equipartitioning rates were comparable in the microcanonical, NosbHoover, and Nos€-Hoover chain simulations, but they were significantly faster in the “massive”chains and Langevin simulations, particularly at larger values of the coupling parameter (thermostat frequency and friction, respectively). The average structures from all of the simulations were similar, differing about 2 A from the experimentally determined structure. The root-mean-squared fluctuations were roughly the same in the microcanonical, low-friction Langevin, NosbHoover, Nosk-Hoover chains, and low-frequency “massive” simulations. The fluctuations were moderately increased at larger thermostat frequencies in the “massive” chain simulations but were drastically reduced with increasing friction constants in the Langevin simulations. Our analysis of various time correlation functions shows that the protein dynamics was not seriously affected by NosGHoover thermostats but, not surprisingly, thedynamics in the high-friction Langevin simulations were much different from the microcanonical and ES simulations. On the basis of the results of this investigation, we recommended the “massive” NosbHoover chain dynamics as an efficient technique for equilibration and any of the Nos€-Hoover chain based ES methods for production dynamics where good temperature control is desired in protein simulations.
I. Introduction The potential utility of computer simulations in biology was demonstratedwhen the first molecular dynamics (MD) simulation of a protein was reported in 1977.1 Subsequently,MD simulations of proteins were instrumental in changing the static view of the structure of biological molecules to a dynamic More recently, a variety of studies based on MD simulations have complementedexperimentalstudies, furthering our understanding of protein folding, structure, stability, and function.*J MD simulations of proteins are currently also being used in many other important applications, including the design of new drugs4 and as a tool for the refinement of protein structures determined from X-ray diffractions and nuclear magnetic resonance spectroscopic data.6 Thus far, most MD simulations of proteins have been carried out in the microcanonical ensemble where the number of particles, volume, and energy are constant (constant NVE), and the characteristic thermodynamic potential is the entropy, S(N,V,E). (MD simulationsare most easily carried out in the microcanonical ensemble.) There are, however, several disadvantages to doing simulationsin the microcanonical en~emble.~ First, oneobjective of performing a simulation study is to calculate experimentally measurable quantities. The experiments with which we want to make contact are usually carried out under the conditions of constant particle number and temperature and either constant pressureor volume (i.e., in the constant NPT (isobaric-isothermal) or constant NVT (canonical) ensembles). The temperature and Abstract published in Advance ACS Absfracfs,November 1, 1993.
0022-3654/93/2097-12959$04.00/0
pressure are computed from microcanonical simulations as averages of the kinetic energy and the virial, respectively, and therefore, we do not know whether a study has been performed at the desired thermodynamic state until after it has been carried out. In the isobaric-isothermal and canonical ensembles, the characteristic thermodynamic potentials are free energies (the Gibbs free energy, G(N,P,T), and the Helmholtz free energy, A(N,V,T)). These thermodynamic functions are much easier to calculate than the entropy via simulations.8 Furthermore, expressions for properties related to fluctuations of the thermodynamic potential are generally less convenient to use in the microcanonical ensemble. Several techniques have been employed to control the temperature in protein simulations. Perhaps the simplest and most commonly used method is the velocity scaling a l g ~ r i t h m .Here, ~ the velocities of all the particles in the system are periodically multiplied by a factor which scales thekinetic energy to thedesired value. This method gives the correct canonical distribution in coordinate space (to an accuracy of the order of the time step, At) if the scaling is performed every time steps7Another simple, commonly employed method that gives thecanonical distribution is the method of stochastic collisions10 in which some or all of the velocities are occasionally resampled at random from the Maxwell-Boltzmann distribution. Constant temperature simulations of proteins4 have also been carried out by integrating Langevin’s equations where coupling to a heat bath is accomplished by including frictionaland stochasticterms in the equations of motion.11 The final method that has been commonly applied in protein simulations is the so-called “coupling to an external 0 1993 American Chemical Society
Tobias et al.
12960 The Journal of Physical Chemistry, Vol. 97,No. 49, 1993
bath” method of Berendsen et al.,I2 which is a refinement of the velocity scaling algorithm. All of the constant temperature simulation methods discussed thus far have some serious disadvantages. First, the dynamics generated by using the velocity scaling, stochastic collision, and Langevin algorithms are irreversible: in the velocity scaling and stochastic collision methods the dynamics become discontinuous when the veclocities are modified, and the Langevin dynamics are irreversible due to the presence of the random force. In these three methods there are also no well-defined, conserved quantities. Furthermore, trajectories generated by the stochastic collision and Langevin dynamics methods are difficult to reproduce by another worker on a different computer because their computation involves the use of random numbers. The method of Berendsen et al., which is also irreversible, suffers from the additional problem that it does not generate the canonical distribution or, in fact, states in any known statistical mechanical e n ~ e m b l e . ~ One class of constant temperature simulation methods that has seen numerous applications to a wide range of problems in such diverse areas aschemical and solid-statephysics, spin systems, and lattice guage theory, but not, until now, in simulations of biological molecules, is the so-called ‘extended system” (ES) methods7 The ES method, as originally proposed by Nosei3 following the use of an extended system by Andersen’Ofor constant pressure simulations, involves the introduction of an additional dynamical variable that corresponds to the heat bath and acts by scaling the time. Hoover14 rewrote Noses equations in a more convenient form, eliminating the time scaling variable and introducing a friction coefficient. The resulting method, commonly referred to as the NoseHoover thermostat, is identical to NOSC’Soriginal method in the case of a single thermostat but is more generally applicable, for example, to the case of multiple thermostats.’ The ES methods are attractive because they produce continuous dynamics with a well-defined, conserved quantity and, assuming the dynamics is ergodic, they generate canonically distributed positions and m ~ m e n t a Unfortunately, .~ for small or “stiff“ systems (among which proteins might be an example) the dynamics is not ergodic, and the correct distributions are not generated.l4JS Martyna et a1.16 recently devised a solution to the problem of nonergodicity in Nost-Hoover dynamics. They pointed out that, although the Gaussian fluctuations of the particle momenta are driven by the thermostat, there is nothing in the dynamics to drive the correct fluctuations in the thermostat momentum. However, ergodicity requires that the dynamics of the system covers the whole phase space which includes the thermostat variables. To ensure the ergodicity of the extended system, Martyna et al. introduced a “chain” of Nos€-Hoover thermostats, referred to collectively as a “Nos€-Hoover chain”. The first thermostat in the chain is coupled to the particles, the second is coupled to the first thermostat, the third to the second, etc. Martyna et al. showed that the NosbHoover chain dynamics gives a very good approximation to the canonical ensemble in pathological cases where the original Nost-Hoover dynamics failed. One attractive featureof the No&-Hoover thermostat methods is that they may be easilyextended to include multiple thermostats. The ability to couple different degrees of freedom with different energy flow characteristics to separate thermostats is very useful in various situations. For example, separate thermostats can be coupled to solvent and protein degrees of freedom to keep them at the same temperature in simulations of a protein in solution. (This is often a problem in microcanonicalsimulations.) Separate thermostats can also be used to keep nuclear degrees of freedom hot and electronic degrees of freedom cold, Le., on the BornOppenheimer surface, a requirement that is crucial in CarParrinello ~imu1ations.I~In this paper we propose as a rapid equilibration technique for proteins the extreme case of coupling a separate Nost-Hoover chain to each degree of freedom in the system. Indeed, this technique, which we refer to as “massive”
Nost-Hoover chains, has proven to be a crucial ingredient in a new, efficient path integral M D method.’* Our objectivesin this study were toinvestigate theconsequences of using ES methods for carrying out MD simulations of proteins in the canonical ensemble. To this end, we carried out several simulations of the small, globular protein, bovine pancreatic trypsin inhibitor (BFTI). BFTI is a good candidate for comparing simulation methods because its structure and dynamics have been extensively studied experimentally and with computer simulat i o n ~ , ~ and J ~ Jits~ small size (58 amino acid residues) makes it quite tractable computationally. In this paper we compare the results of microcanonical, Langevin, and ES simulations of BFTI, evaluating the abilities of the different simulation methods to provide temperature control and examining the effects of using the different methods on kinetic energy equipartitioning rates, average structures and fluctuations, and dynamical properties. The ES methods we consider are the original Nose-Hoover thermostat, the Nost-Hoover chain, and “massive” Nost-Hoover chains. Before we present the results, we briefly review the ES and Langevin dynamics methods and report the details of the BPTI simulations. 11. NosCHoover Chain Dynamics
Here we briefly review the equations of motion in the NostHooverchaindynamics.16 Weconsider thecaseofnchains where the j t h chain is coupled to gj degrees of freedom and n
is the total number of degrees of freedom in the system. Newton’s equation of motion for the ith degree of freedom coupled to the jth chain is
where Fij is the force on qij, ml, is the associated mass, and q l j is the coordinate of the first thermostat variable in the chain. The equations of motion for the thermostat variables in thejth chain containing Mjthermostats at a temperature Ti are
..
ill
. .
(3) where k~ is Boltzmann’s constant and Qkj is the mass of the kth thermostatvariablein thechain. Itcanbeshownthat theextended system has the conserved quantity7916
I
M.
where U is the potential energy. Note that the original Nos& Hoover method corresponds to Mj = 1 in the above equations. The rate of convergence of a system to the canonical ensemble can depend on the choice of the thermostat mass parameters.21
Simulations of a Protein in the Canonical Ensemble Analyses of the Nos&Hoover equations by and the Nos& Hoover chain equations by Martyna et a1.16 suggest that a good choice for the thermostat masses is Qu = g,kBTpZ and Qkl = kBT/rZwhere T is a characteristic time scale of the system. In the case of the “massive” chains we choose all the masses in a given chain to be the same (since gj = 1) and equal to Qkl = ( m l / m ~ ) k ~ T ’where ~ , mH is the mass of a hydrogen atom, so that the relative rates of thermostat fluctuations are inversely proportional to the masses, mi, of the particles to which they are coupled. In practice, the choice of thermostat masses is not very crucial in simulations with a single NosbHoover chain.I6 In preliminary work, however, we found that the “massive” chains simulations were quite sensitive to initial conditions. But we discovered subsequently that this sensitivity could be eliminated by coupling an additional chain with a thermostat time scale 7’ to the whole system. We have employed the additional chain in the “massive” chain simulations described below. 111. Langevin Dynamics
Langevin dynamics are described by the Langevin equation12
+
m&, = F, - m,rAr I?,(?) (5) where y is a friction constant and R ( t ) is a Gaussian stochastic variable with zero mean and variance The friction constant y determines the strength of the coupling to the heat bath. and its inverse. which we denote as 7. can be interpreted as a velocity relaxation time analogous to the 7 used to choose the thermostat masses in the ES methods. Note that, in the limit of zero fiction, Langevin’sequations become Newton’s equa tions.
IV. Simulations The starting point of our calculationswas the crystal structure of BPTI (file 5PTI in the Brookhaven Protein Data Bank) determined by joint refinement of X-ray diffraction data to 1-A resolution (R = 0.200) and neutron diffraction data to 1.8 A (R = 0.197) by Wlodawer et al.19 We discarded all but the four internal water molecules in the crystal structure and employed the “polar-H” modelz2in which only hydrogen atoms that are capable of hydrogen bonding (i.e., those attached to 0 and N atoms) are explicitly represented and nonpolar hydrogen atoms are implicitly represented through the parameters of the heavy atoms to which they are attached. The total number of atoms is 580 in the polar-H model of BPTI with the four internal water molecules. In the energy-based calculations we used the CHARMM potential energy functionz2 with the PARAM19 protein parameters23 and the TIP3P water parameters24 and shifting functions to truncate the nonbonded interactions at 8 A. We used the CHARMM program,22 version 22, as modified by us to incorporate ES simulation methods, to carry out the energy minimization and MD calculations described below. Webegan the preparationofthe protein for theMDsimulations by running lo00 cycles of energy minimization with the Powell algorithm.25 Following the minimization we equilibrated the protein by running 40 000 steps (20 ps) MD using the Verlet algorithm26 to integrate the equations of motion with a time step of 0.0005 ps. During this run we checked average temperature of the system every 100 steps and resampled the velocities from a Maxwell-Boltzmann distribution with T = 300 K if the temperaturewas not within 5 Kof 300 K. This 20-ps equilibration period appeared to be sufficient since the potential energy and temperature ceased to drift after 15 p. Starting with the atomic positions and velocities from the end of the 20-ps equilibration, we initiated eight 80 000-step (40-ps)
The Journal of Physical Chemistry, Vol. 97, No. 49, 1993 12961
MD simulationsof BPTI. The first was a microcanonical (NVE) simulationcarried out using the Verlet algorithm. The next three were Langevin dynamics simulations at 300 K in which the equations of motion were integrated with the low-friction Langevin/Verlet algorithm of Brunger et al.27928 The three Langevin simulationswere carried out with the following friction coefficients assigned to the heavy (non-hydrogen) atoms: the values y = 2 and 20 ps-l, corresponding to 7 = 0.5 and 0.05 ps, respectively, as well as thevalue y = 50 ps-l (7 = 0.02ps) suggested by Fleischman and BrooksS4The remaining four simulations were performed at 300 K using the ES methods (constant NVT) with the following thermostats, respectively: a single Nos& Hoover thermostat (NH) with 7 = 0.5 ps, a single Nost-Hoover chain (NHC) with M 3: 5 and T = 0.5 ps, “massive” Nos& Hoover chains (MNHC) with M = 5 and T‘ = T = 0.5 ps, and “massive” chains with M = 5 and 7‘ = 7 = 0.05 ps. Because the atomic accelerations depend on the atomic velocities in the ES equationsof motion, we used a Verlet-like algorithm with velocity iteration29 to integrate the equations of motion in the ES simulations. (We note that other algorithms, e.g. the velocity Verlet algorithm,ls can be used, but we found it easier to simply modify the existing Verlet algorithm in CHARMM and had no problems with stability in the present study.) With tolerances on the root-mean-squared change in the velocities of 5 X 10-17 Asps-’ in the N H and NHC simulations and 5 X Amps-1 in the MNHC simulations,4-5 iterations were required to converge the velocities at each time step. To start the ES simulations, we set the initial values of the thermostat variables to zero and assigned the initial thermostat velocities randomly from a Maxwell-Boltzmann distribution. Our experience suggested that temperature control in microCanOniCal and ES SimUlatiOIlS might be Sensitive to initial conditions. We carried Out, therefore, an additional five 2O-ps simulations to investigatethe effects of incomplete equilibration on the temperature controllingabilities of these methods: NVE. N H ( 7 = 0.5 ps), NHC ( 7 =%.5 ps), and MNHC ( 7 = 0.5 and 0.05 ps). These simulations were identical to those described in the last paragraph except that they were started using the configuration and velocities from the middle (after 10 ps) of the equilibration run and were run for half as long. Before presenting the results, we comment briefly on the relative computational requirements of the simulations. The NVE and Langevin simulations used roughly the same amount of CPU time and memory. The N H and NHC methods used about the same amount of memory as the NVE method but more CPU time because of the iterative algorithm used to integrate the equations of motion. With the convergence criteria used in this study, the N H and NHC simulationsused about 15%more CPU time than the NVE simulation. The computationalrequirements of the MNHC simulationsaresignificantlygreater than the other methods because the MNHC method introducesL = (3N+ l ) M additional equations of motion, where N is the number of atoms and M is the number of thermostats per chain. Our integrator uses five arrays of length L,which translates to less than 1 MWord = 8 Mbyte of additional memory (no problem on present generation workstationsand mainframe computers) for a relatively large system of 10 000 atoms with M = 5. The significance of the CPU overhead depends on the amount of time spent calculating the forces. In this study, the MNHC simulations used about twice as much CPU time as the other simulations. V. Result9 and Discussion
(a) Temperature Control. In Figure 1 we show plots of the instantaneous temperature versus time for the eight simulations of BPTI started after 20-ps equilibration. In all of the simulations the average temperature is very close to the desired value of 300 K. The temperature displays very rapid fluctuations (