Estimation of Error in the Diffusion Coefficient from Molecular

Adsorption and Diffusion of H2 in the MOF Type Systems MIL-47(V) and MIL-53(Cr): A Combination of Microcalorimetry and QENS Experiments with Molecular...
34 downloads 0 Views 420KB Size
J. Phys. Chem. B 1997, 101, 5437-5445

5437

Estimation of Error in the Diffusion Coefficient from Molecular Dynamics Simulations† R. Chitra‡ and S. Yashonath*,‡,§ Solid State and Structural Chemistry Unit and Supercomputer Education and Research Centre, Indian Institute of Science, Bangalore-560012, India ReceiVed: January 23, 1997; In Final Form: April 28, 1997X

Long molecular dynamics simulations (120 ns) of guests confined in zeolite NaCaA as well as that of pure fluid have been carried out in the microcanonical ensemble to obtain an estimate of the error associated with the diffusion coefficient (D). It is found that the error is about 50% for a 1 ns run for argon parameters in NaCaA zeolite. It is found that shorter runs lead to an overestimation of the value of D. It is also found that a linear fit to a region of the msd (mean squared displacement) below ∼200 ps leads to an overestimation of D. For runs that are sufficiently long, the initial configuration has no effect on D. Further, the values obtained from the msd and the velocity autocorrelation function converge for the runs reported here. It is shown that the error in D depends on the nature of the system and its underlying potential energy surface. The calculated statistical inefficiency suggests that the averages of properties such as the total energy will be uncorrelated over blocks longer than 150 ps. Dependence of the error in D on the simulation length and size is reported.

1. Introduction Molecular dynamics and Monte Carlo methods are being used increasingly to model and simulate many processes and phenomena in nature.1-5 The error involved in the calculation of a property would have a bearing on the conclusions drawn, whether such conclusions are based on the results of the calculation itself or whether they are based on comparison of the calculated values with experimentally derived properties. A knowledge of the associated error is relevant not only to simulation scientists but also to experimental scientists as well as theorists since they often compare their results with numbers from simulation. Methods for estimation of the error involved in the results obtained from simulation have been discussed in the literature by several workers.6-10 Zwanzig and Ailawadi6 have obtained the dependence of the error associated with replacement of an equilibrium ensemble average by a time average over a finite interval of time. This is particularly relevant in the computation of time correlation functions of dynamical variables. They have shown that the error is inversely proportional to the square root of the duration of the run. Wood7 has discussed the method for the estimation of the error involved in the computation of any quantity by dividing the run into a number of shorter blocks. This method has since been used widely. Allen and Tildesley8 have discussed methods to estimate errors in equilibrium or firstorder quantities, in fluctuation-dependent second-order quantities, and in structural and correlation functions. Smith and Wells9 have suggested a method to estimate errors that attempts to reduce the effect of correlation within the run using autocorrelation and partial autocorrelation functions. Straatsma et al.10 have proposed a modification that does not require the tedious analysis of the correlation behavior. The error associated with a first-order quantity can be estimated without much difficulty. However, as is well-known, the calculation of the mean and the variance of a second-order quantity such as, for example, specific heat, Cp or CV, isothermal † Contribution No. 1261 from the Solid State and Structural Chemistry Unit. ‡ Solid State and Structural Chemistry Unit. § Supercomputer Education and Research Centre. X Abstract published in AdVance ACS Abstracts, June 15, 1997.

S1089-5647(97)00305-2 CCC: $14.00

compressibility, κ, and the coefficient of thermal expansion, R, is more difficult due to the slower convergence.5,8 These quantities are dependent on the fluctuation in a related quantity, and they are frequently reported in the literature, as they are of considerable interest to chemists, physicists, biologists, and materials scientists among others. Consequently, it is essential to have a knowledge of the error involved in these properties. Properties such as the diffusion coefficient are even more difficult to evaluate accurately since we are dealing with timedependent as well as fluctuation-dependent quantities. One needs to evaluate autocorrelation functions to obtain the diffusion coefficient, and the sources of error in the evaluation of the autocorrelation function are several.6 Many transport properties such as the diffusion coefficient are frequently reported, and it is of considerable importance to know the accuracy with which we are able to calculate such quantities. Detailed investigations carried out a few years ago in this laboratory to obtain dependence of the diffusion coefficient, D, on the run length and size of the system did not reveal any well-defined trend. In retrospect, we realize that our runs were not sufficiently long, as a result of which no definitive trend could be seen. Availability of faster computers in recent times has made possible the present study. During the past few years, simulations of sorbed species in porous media such as zeolites have been carried out by several workers.11-17 A detailed study suggests that the diffusion coefficient of sorbates in porous systems shows an anomalous behavior. A peak is observed when the sorbate size is significantly large and comparable to the pore diameter. It is not clear whether the changes in the diffusion coefficient reported are larger than the error involved in their estimation. The present study attempts to calculate the error in the calculated value of D for such systems. In this work we report microcanonical ensemble molecular dynamics computer simulations of total length of about 120 ns or 0.12 µs consisting of two different runs, each of about 60 ns, and several shorter (15 ns) runs. We have also chosen a sufficiently large system, which enables us to answer some of the questions listed below. The system consisted of sorbates diffusing in zeolite NaCaA of approximate size 50 × 50 × 250 Å. Even though the principal results reported here are directly relevant in the study of particles diffusing in the host zeolite © 1997 American Chemical Society

5438 J. Phys. Chem. B, Vol. 101, No. 27, 1997

Chitra and Yashonath TABLE 1: Self-Interaction Parameters for Zeolite Atoms type

σ, Å

, kJ/mol

O-O Na-Na Ca-Ca

2.79 3.26 3.35

1.759 920 0.089 015 9.545 283

TABLE 2: Cross-Interaction Parameters for Guest-Guest and Guest-Zeolite Interactionsa σ, Å

type

Guest-Zeolite System guest-guest interactions a-a 2.21 a-b 2.38 a-c 2.775 b-b 2.55 b-c 2.945 c-c 3.34 guest-zeolite interactions a-O 2.5 a-Na 2.735 a-Ca 2.78 b-O 2.67 b-Na 2.905 b-Ca 2.95 c-O 3.065 c-Na 3.3 c-Ca 3.345

Figure 1. One crystallographic unit cell of zeolite A consisting of eight R-cages. The shaded spheres indicate the positions of the Na and Ca atoms near the centers of the 6-rings. The Ca atoms are slightly larger than the Na atoms.

system, the results reported also throw light on the error associated with simulations of pure fluids. Our principal results concern the value of the diffusion coefficient and the error in it. We shall also attempt to answer a few related questions, some of which are (i) How does the error in D depend on the duration of the simulation? In other words, what should be the duration of the simulation to obtain D within a specified accuracy? (ii) What is the effect of using a shorter run to estimate D? (iii) Does the choice of initial configuration have an effect on the estimated value of D? (iv) Do the values of D obtained from the velocity autocorrelation function (vacf) and from the mean squared displacement (msd) converge? (v) How does error in D depend on the nature of the system being investigated? (vi) How does the error in D depend on the system size? We have carried outsin addition to investigations on sorbate particles in zeolitesssimulations of pure fluid in order to answer the last two questions. 2. Structural Details The structure of zeolite NaCaA was taken from the work of Pluth and Smith.18 This zeolite belongs to the Fm3hc space group with a unit cell edge length of 24.555 Å. Each unit cell contains 192 Si, 384 O, 32 Na, and 32 Ca atoms. The sodium and the calcium atoms occupy positions close to the centers of the 6-rings in the zeolite framework. There are eight R-cages in each unit cell. The diameter of an R-cage is approximately 11.4 Å. The R-cages are all octahedrally connected by 8-ring windows that have a diameter of about 4.5 Å (see Figure 1). 3. Intermolecular Potential

[( ) ( ) ] σ rij

12

-

σ rij

6

(1)

where i and j are any two sorbate atoms. σ is the diameter of the interacting sorbates, and  is the well depth. The total sorbate-sorbate interaction is the sum of the interactions between all distinct pairs of sorbates:

Ugg )

1

N

Pure Fluid System 2.21

0.997 729 0.997 729 0.997 729 0.997 729 0.997 729 0.997 729 1.325 113 0.298 015 3.086 034 1.325 113 0.298 015 3.086 034 1.325 113 0.298 015 3.086 034 0.997 729

a

Particles of types a, b, and c are of sizes 2.21, 2.55, and 3.34 Å, respectively.

The interactions of the sorbates with the O, Na, and Ca atoms in the zeolite framework were also assumed to be of the (6,12) Lennard-Jones form. The bulky oxygen atoms shield the Si atoms of the zeolite lattice from interacting with the sorbates, and hence, the sorbate-Si interactions are neglected. The selfinteraction parameters for the zeolite atoms were taken from the work of Kiselev and Du19 and from Englesfield.20 The selfinteraction parameters are listed in Table 1. Cross-interaction parameters were computed by using the Lorentz-Berthelot combination rules8 for both guest-guest and guest-zeolite interactions. These are listed in Table 2. The total sorbatezeolite potential energy is given by N

Ugh )

∑ ∑ φ(riz) i)1 z)O,Na,Ca

(3)

Both the guest-guest and guest-zeolite potentials were shifted so as to have zero energy at the cutoff radius. The total potential energy associated with the system is the sum of the sorbate-sorbate and sorbate-zeolite contributions:

U ) Ugg + Ugh

(4)

4. Computational Details

The sorbate-sorbate interactions have been modeled by simple (6,12) Lennard-Jones potentials:

φ(rij) ) 4

a-a

, kJ/mol

N

∑ ∑

2 i)1 j)1,j*i

φ(rij)

(2)

Simulation was carried out on guest atoms confined in zeolite NaCaA. Ten of each of three different sizes of Lennard-Jones atoms were taken. The sizes were 2.21, 2.55, and 3.34 Å, which are referred to as particles of type a, b, and c, respectively. All the guest atoms were assigned a mass of 40 amu. As previously mentioned, the unit cell of zeolite NaCaA is cubic, with an edge length of 24.555 Å. The simulation cell consisted of two unit cells along each of the x and y directions and 10 unit cells along the z direction. The dimensions of the simulation cell were, therefore, 49.11 × 49.11 × 245.55 Å. Ten crystallographic unit cells were chosen along the z direction in order to study any possible effect of the periodic boundary conditions on the calculated diffusion coefficient. The zeolite

Estimation of Error in the Diffusion Coefficient

J. Phys. Chem. B, Vol. 101, No. 27, 1997 5439

framework was assumed to be rigid. Guest-host interactions were computed in the usual way. However, we made use of the translational symmetry of the zeolite to compute the guestzeolite interactions. A spherical cutoff radius of 12 Å was used in calculating the sorbate-sorbate as well as the sorbate-zeolite interactions. Periodic boundary conditions were imposed. All the simulations were carried out in the microcanonical ensemble using the molecular dynamics technique. The velocity Verlet scheme was used to carry out the integration. A time step of ∆t ) 10 fs was found to be adequate for a good conservation of the total energy. All properties have been calculated from positions and velocities stored every ∆ts ) 0.2 ps (unless mentioned otherwise) in the production phase of the MD run. The time evolution of the mean squared displacement was obtained from the stored coordinates. The starting configuration consisted of at most one sorbate in each R-cage. All sorbates were assigned initial velocities from a MaxwellBoltzmann distribution appropriate to that temperature. Simulations were performed with an equilibration of 500 ps, which was followed by a run of 60 ns or 60 000 ps during which no velocity scaling was carried out. Two such runs, termed A and B, were carried out with two different starting configurations to see if the results depend on the initial configuration. Properties were calculated from data accumulated over 54 ns, a period of 6 ns immediately following equilibration being neglected. This was done to avoid any possible error since we suspect that the period immediately following equilibration might not generate the trajectory correctly. We have also performed reasonably long runs of length 15 ns at three different temperatures: 129, 207, and 276 K in order to obtain the activation energies of different sized particles. We also carried out simulations of a pure liquid consisting of 108 atoms of size 2.21 Å at 123 K (see Table 2). The equilibration and the production phases were, respectively, 1 and 10 ns long. This was done to study the difference in the error involved in computing the diffusion coefficient for a pure fluid as compared to that of a confined fluid. 5. Method

σ12 p



s12

m

1

∑ (fhi - hf )2

) p

(7)

(m - 1) i)1

An estimate of the variance of hf can be obtained from

var hf ≈ s12/L Calculation of D. D can be obtained from either the time evolution of the mean squared displacement, u2(t), or from the velocity autocorrelation function, c(t). The mean squared displacement may be obtained from the time evolution of the guest atomic positions, b ri:

u2(t) )

1

N

1

∫ ∑ N i)1 (T - t) 0

T-t

[b r i(t + τ) - b r i(τ)]2 dτ

(8)

where T is the duration of the simulation. A least squares straight line fit to the points was then obtained by fitting to the points in the closed interval [t1,t2]. The slope of this straight line was used to compute the diffusion coefficient, D, using the Einstein relation

D)

2 2 〈u2(t)〉 u (t2) - u (t1) ) 2dt 2d(t2 - t1)

(9)

where we have assumed d to be equal to 3. D was also calculated by integration of the unnormalized velocity autocorrelation function (vacf):

1

T c(t) dτ ∫ 0 T f∞ d

D ) lim

c

(10)

c

where the velocity autocorrelation function, c(t), is defined by

c(t) )

1

N

1

∫ ∑ N i)1 (T - t) 0

T-t

b V i(t+τ) b V i(τ) dτ

(11)

Here b Vi(t) is the velocity of the ith particle. Wood7

Estimation of Error. The method proposed by has been used to estimate the error involved. In the computation of any property, f, of the system, the overall average is given by

hf )

1

L

∑ fk

(5)

L k)1

where L is the total number of MD steps: L ) T/∆t. We now split the total run into m blocks of p steps (or p∆t in units of time) each such that

L ) mp The average over each block indicated by hfi is given by

hf i )

1

ip



p k)(i-1)p+1

fk

(6)

where i ) 1, 2, ..., m. When p is large, the block averages, hfi, will be approximately normally distributed with mean 〈f〉 and variance σ12/p. Since the block averages are statistically independent, one can obtain an estimate, s12/p, of the true variance:

6. Results and Discussion Figure 2 shows the time evolution of the mean squared displacement (msd) over the first 500 ps for run A and also run B. We report results of the simulations for sorbates of size 2.21 and 2.55 Å only, as these do not lie in the anomalous regime17 and exhibit the expected dependence of D on 1/σgg2. The average temperatures were 93.3 and 98.6 K for runs A and B, respectively. The initial part of the msd represents ballistic motion. Hence, it is preferable to exclude these points from the calculation of the least squares straight line whose slope will yield the diffusion coefficient, D. Statistics is poor at large t, as only (T - t)/∆ts points are available from a simulation of duration T resulting in larger error. It is therefore advisable to omit these points as well. As a result, our choice has been to choose the points in the range between 200 and 500 ps. A least squares straight line was obtained by fitting to the points of the msd curve over this range. From the slope of the line and the Einstein’s equation (9) we obtained the diffusion coefficient, D. These are listed in Table 3 for sorbates of sizes 2.21 and 2.55 Å for both run A and run B. We consider this to be the best estimate of D due to the judicious choice of the range over which the straight line has been fitted and the long run length. We denote it by Dapp. Below we shall compare estimates obtained for a different set of parameters or under different conditions with Dapp to understand their influence on D.

5440 J. Phys. Chem. B, Vol. 101, No. 27, 1997

Chitra and Yashonath

Figure 2. Evolution of the mean square displacement, u2(t), during the first 500 ps for runs A and B (for sorbates confined in zeolite NaCaA) each with equilibration, neglect, and production periods of 0.5, 6.0, and 54 ns, respectively. The temperatures of the runs A and B were 93.3 and 98.6 K, respectively. Table 3 lists the values of D obtained from this plot. The solid and dotted lines are for particles with σgg ) 2.21 and 2.55 Å, respectively. Insets show a ln-ln plot of the same curves.

TABLE 3: D Obtained from Run A and Run B with T ) 54 ns and (t1,t2) t (200,500) ps from the msd Curves Shown in Figure 2 D × 108, m2/s σgg, Å

run A

run B

2.21 2.55

0.122 911 0.129 782

0.171 366 0.170 660

TABLE 4: Dependence of D on (t1,t2) Based on Data from Runs A and B with T ) 54 ns D × 108, m2/s run A (t1,t2), ps 30-100 100-200 200-300 300-400 200-500

run B

σgg ) 2.21 Å σgg ) 2.55 Å σgg ) 2.21 Å σgg ) 2.55 Å 0.133 157 0.128 652 0.126 320 0.122 997 0.122 911

0.137 699 0.138 164 0.132 732 0.129 760 0.129 782

0.181 394 0.172 880 0.171 925 0.172 551 0.171 366

0.170 616 0.166 260 0.168 843 0.169 183 0.170 660

Dependence of D on (t1,t2). The choice of the interval (t1,t2) over which one fits a straight line to the msd curve (see eq 9) determines the value of D. It is not clear how D depends on the choice of t1 and t2. Table 4 lists the values of D for different choice of the interval (t1,t2). D was estimated from the slope of the straight line fitted to the msd curves obtained from run A and run B. It is seen from Table 4 that the value of D obtained by fitting a straight line over the range 300-400 ps is the one that is closest to Dapp. The range 30-100 ps seems to yield values of D that are higher than Dapp. The slope obtained by fitting a straight line to the initial part of the curve appears to lead to estimates that are generally higher than Dapp. This is understandable in view of the fact that the motion is ballistic16 where u2(t) ≈ t2. Hence, it is advisable that the value of t1 should be chosen so as to avoid the ballistic regime. Here sufficient care should be taken to ensure that t1 is well beyond the ballistic regime. Normally, the ballistic regime does not extend beyond a few picoseconds. However, its influence on the slope seems to extend well beyond it. From Table 4, it is seen that the value of D fitted over the range 100-200 ps and even 200-300 ps is higher. It is not clear whether the effect

of ballistic motion extends as far as 200 ps, resulting in higher values of D. This result is of considerable importance in view of the fact that most of the reported values of the diffusion coefficient in the literature have been obtained by fitting over a range much below 200 ps. In fact, even t2 is often chosen to be less than 200 ps. As a result, the reported Values generally oVerestimate the Value of D. For t2, a value that is not so very large would be appropriate to ensure better statistics. The choice of t2 will, however, depend on the duration of the simulation or the total run length, T. In the present study, t2 has been chosen to be 1% of T. Dependence of D on Block Length p. Figure 3 displays samples of the msd curves calculated from MD run A over blocks that are 1, 3, 6, and 9 ns long. From the figure, it is obvious that the msd’s obtained from 1 ns blocks often do not show a well-defined slope. The msd curve is more straight when the block size is 3 ns. For 6 and 9 ns blocks, the msd is almost straight and has a consistent slope over 200-500 ps range even though the slope is often different for different blocks. Table 5 lists the D values obtained for run A and run B with p ) 1, 3, 6, and 9 ns. D was calculated from Einstein’s relation using the slope of the u2(t) vs t curve between 200 and 500 ps. It is seen that the estimates of D are generally higher than Dappsthe value obtained from the 54 ns simulationswhich are listed in Table 4. A comparison of the two tables suggests that for a given range (t1,t2) the average value of D calculated from shorter runs results in an estimate that is usually higher than the correct value. It is worthwhile looking at the results in the literature in the light of this result. Most of the runs in the literature are at best a few hundred picoseconds long. The present result suggests that the Values of the diffusion coefficient reported from these rather short runs are higher than the correct Value of D for the system under consideration. Effect of Initial Configuration on D. As previously mentioned, Table 3 lists the values of the diffusion coefficient for both run A and run B as calculated using Einstein’s relation with (t1,t2) being chosen to be the interval 200-500 ps. The msd curves are shown in Figure 2. The values of D obtained from the two runs are different. The two runs themselves are identical except for the starting configuration. However, the resulting temperature for run B is 98.6 K, whereas it is 93.3K for run A. It is seen that the value of D obtained for this run (run B) is slightly higher than that obtained for run A. It is not clear whether this difference is a consequence of the starting configuration or is simply due to the difference in temperature between them. In order to clarify this point, we performed reasonably long runs of length 15 ns at three different temperatures: 129, 207, and 276 K. We calculated the activation energy for diffusion from the Arrhenius plot, which is shown in Figure 4. Ea was found to be 2.94 and 3.11 kJ/mol for sorbates of size 2.21 and 2.55 Å, respectively. The values of D for 2.21 and 2.55 Å sorbates were estimated for a temperature of 98.6 K using the Arrhenius relation

D ) D0 exp(-Ea/RT)

(12)

The estimated values of D from the above expression using the values of D0 and Ea obtained from the fit are 0.160670 × 10-8 and 0.156939 × 10-8 m2/s respectively for 2.21 and 2.55 Å sorbates. The value for σgg ) 2.21 Å; namely, (0.160670 ( 0.018) × 10-8 m2/s is within the range obtained from run B ((0.171366 ( 0.018) × 10-8 m2/s). Similarly, the value of σgg ) 2.55 Å, viz., (0.156939 ( 0.018) × 10-8 m2/s, is within the range obtained from run B ((0.170660 ( 0.018) × 10-8 m2/s). These are listed in Table 6. It is seen that the difference in the value of D between the value calculated from run B and that estimated for 98.6 K using eq 12 is within the error in the

Estimation of Error in the Diffusion Coefficient

J. Phys. Chem. B, Vol. 101, No. 27, 1997 5441

Figure 3. Samples of msd curves obtained from run A for sorbates in zeolite NaCaA for different T values: (a) 1, (b) 3, (c) 6, and (d) 9 ns. Four curves are shown for each T value. Curves for diffusing particles with σgg ) 2.21 and 2.55 Å are shown by solid and dotted lines, respectively. The data range from which the msd curves were calculated is also indicated.

TABLE 5: Dependence of D on the Block Length, p, for Runs A and Ba,b D × 108, m2/s run A

run B

p, ns

σgg ) 2.21 Å

σgg ) 2.55 Å

σgg ) 2.21 Å

σgg ) 2.55 Å

1 3 6 9

0.149 671 0.133 690 0.136 488 0.138 357

0.150 377 0.144 734 0.148 378 0.141 953

0.192 282 0.185 108 0.189 860 0.187 301

0.189 706 0.194 581 0.188 902 0.187 649

a

The simulation of length T ()54) ns is divided into m blocks, each of length p ns. b Fitted over (t1,t2) t (200,500) ps.

calculation of the diffusion coefficient. Thus, it appears that the difference in the value of D between run A and run B can be attributed to the difference in the temperatures of the two runs. It therefore seems reasonable to conclude that initial configuration does not have any effect on the value of D for a simulation that is sufficiently long. D As Calculated from Velocity Autocorrelation Functions. It is known that calculation of D from vacf (see eq 10) is somewhat approximate due to errors associated with the following steps: (1) integration of the vacf introduces additional errors, and (2) limiting integration to Tc without taking into account the long-time tail that may exist also adds to errors.8 Figure 5 shows a plot of the vacf obtained from run A. Velocities stored every 0.2 ps were used to calculate the vacf.

Figure 4. Arrhenius plots for particles of size 2.21 and 2.55 Å based on data from MD runs at 93.3, 129, 207, and 276 K for the sorbatezeolite system.

This introduces additional errors due to the large size of the stored step. Table 7 lists the value of D obtained from the vacf with Tc ) 500 ps for run A and run B. The values of D obtained from vacf and those obtained from Einstein’s relation do not differ by more than the error involved in the calculation of D. It, therefore, appears that the value of D obtained from vacf

5442 J. Phys. Chem. B, Vol. 101, No. 27, 1997

Chitra and Yashonath

Figure 5. Unnormalized velocity autocorrelation function plotted as a function of time using data from run A for particles of size 2.21 and 2.55 Å. c(t) is in units of (Å/fs).2 The temperature of the sorbatezeolite system at which the simulation was carried out is 93.3 K, as already stated.

TABLE 6: D from Run B and Arrhenius Expression at 98.6 Ka D × 108, m2/s

a

Figure 6. Percent error in D, δD, shown as a function of different run lengths T for runs A and B for particles with σgg ) 2.21 and 2.55 Å diffusing in NaCaA.

TABLE 9: Dependence of the Error in D on the Run Length, T, for the Guest-Zeolite System from Eq 13, Based on 54 ns Long Simulations δD

σgg, Å

from run B

from eq 12

2.21 2.55

0.171 366 0.170 660

0.160 670 0.156 939

run A

Fitted over (t1,t2) t (200,500) ps.

TABLE 7: D As Calculated from Unnormalized vacfs for Runs A and Ba,b with T ) 54 ns

run B

T, nsa

σgg ) 2.21 Å

σgg ) 2.55 Å

σgg ) 2.21 Å

σgg ) 2.55 Å

1.0(54) 3.0(18) 6.0(9) 9.0(6)

54.5 48.9 39.9 39.9

53.9 46.3 35.8 24.4

41.8 29.7 23.3 17.8

36.9 31.7 30.6 32.1

a The numbers on parentheses indicate the number of blocks of a given length employed in estimating the error.

D × 108, m2/s σgg, Å

run A

run B

2.21 2.55

0.118 999 (0.122 911) 0.137 932 (0.129 782)

0.167 731 (0.171 366) 0.190 702 (0.170 660)

TABLE 10: Error Associated with D as a Function of the Block Length, p, Obtained from Wood’s Analysis Using Data from Runs A and Ba

a T ) 500 ps for the calculation of D from vacf. b The values in c parentheses indicate the value of D from msd.

∆D × 108, m2/s run A

TABLE 8: D As Calculated from vacf with T ) 0.5 ns D × 108, m2/s σgg, Å

∆ts ) 0.05 ps

∆ts ) 0.2 ps

2.21 2.55

0.190 856 0.098 389

0.176 237 0.089 394

and that obtained from the u2(t) vs t curve will converge provided the run length is sufficiently long. However, it is advisable to calculate D from vacf from data stored at more frequent intervals. Table 8 lists the value of D obtained from a single 0.5 ns simulation where data were stored every 0.05 ps as well as every 0.2 ps. Here, Tc was chosen to be 100 ps. It is evident that additional errors are introduced when data are stored with a larger interval size. Dependence of the Error in D on Run Length T. The error in the value of D obtained from runs of length p can be estimated from

∆Dp ) xm∆DT

(13)

where ∆Dp and ∆DT are the errors when the run lengths are respectively p and T. Here the estimated value of the diffusion coefficient and the associated error is given by D ( ∆D. We define percentage error, δD, in D as

δD ) ∆D/D × 100

(14)

run B

p, ns

σgg ) 2.21 Å

σgg ) 2.55 Å

σgg ) 2.21 Å

σgg ) 2.55 Å

1 3 6 9

(0.021 (0.018 (0.018 (0.023

(0.017 (0.018 (0.018 (0.014

(0.017 (0.015 (0.015 (0.014

(0.021 (0.018 (0.019 (0.025

a The simulation of length T ()54) ns is divided into m blocks, each of length p ns.

Table 9 lists the errors associated with 1, 3, 6, and 9 ns long simulations. It is seen that the error is as much as 55% for a run that is 1 ns long and can be as large as 40% when the run is 9 ns long. Figure 6 shows how the percentage error in D, δD, varies as a function of the run length, T, for runs A and B. The errors themselves vary over a wide range. This is to be expected since D, a second-order quantity, is known to converge slowly. Consequently, the error in D will be a fourth-order quantity which is expected to converge even more slowly, and therefore the 54 ns run reported here should be too short for estimating error in D accurately even though it is sufficiently long to estimate D itself reasonably accurately. While the above errors were obtained from expression 13, the error for the total run of 54 ns was obtained by the method proposed by Wood.7 Runs were divided into blocks of length 1, 3, 6, and 9 ns, and the errors calculated are reported in Table 10. The corresponding values for the diffusion coefficient, D, were reported in Table 5. The error in the estimated value of D is about (9-17%.

Estimation of Error in the Diffusion Coefficient

J. Phys. Chem. B, Vol. 101, No. 27, 1997 5443 TABLE 11: Details of the Molecular Dynamics Runs for Sorbate-Zeolite System and Pure Liquid sorbate-zeolite system 〈Ugg〉 (kJ/mol) 〈Ugh〉 (kJ/mol) 〈U〉 (kJ/mol) 〈T〉 (K) N equilibration length (ns) run length (ns)

run A

run B

pure liquid

-0.019 892 -12.2925 -12.3124 93.3 ( 11.5 30a 0.5

-0.040 119 -12.1523 -12.1924 98.6 ( 12.1 30a 0.5

-4.653 21

54.0c

54.0c

10.0

a

c

Figure 7. Percent error in D, δD, for the guest-zeolite system as a function of different run lengths T based on an estimate of the error averaged over the two runs (A and B) and over the two sizes (2.21 and 2.55 Å). The solid curve has been obtained from a least squares fit of the equation ∆D ) (A/xT) + c, where A ) 8.267458 × 10-4 s1/2 and c ) 21.5622.

These estimates are based on position coordinates of the particles extending over 54 ns. As mentioned earlier, the estimates of the error in D are poor, and a better estimate is desired. A somewhat more accurate estimate of the error and its dependence on T can be obtained by combining runs A and B as well as the results for particles of size 2.21 and 2.55 Å. This is possible due to the fact that the errors themselves do not depend on the size of the particle or the starting configuration, the temperatures for the two runs being very similar. Such a combined error estimate is shown in Figure 7 along with a least squares fit curve of the form

δD ) A/xT + c

-4.653 21 122.6 ( 5.9 108b 1.0

b

10 sorbates each of sizes 2.21, 2.55, and 3.34 Å. σgg ) 2.21 Å. 6 ns after equilibration was neglected.

TABLE 12: Block Averages of D over Every 1 ns Block for the Pure Liquid and for the Sorbate-Zeolite System D × 108, m2/s block

pure liquid

run A

run B

1 2 3 4 5 6 7 8 9 10 average

0.223 891 0.249 774 0.195 275 0.244 366 0.162 723 0.226 394 0.181 405 0.238 316 0.178 197 0.210 566 0.211 091 ( 0.009 601

0.210 293 0.100 017 0.059 194 0.198 734 0.090 732 0.184 627 0.227 140 0.276 575 0.084 993 0.267 602 0.169 991 ( 0.025 276

0.162 057 0.316 119 0.107 367 0.378 361 0.096 727 0.211 013 0.208 061 0.083 706 0.124 117 0.054 280 0.174 181 ( 0.033 336

TABLE 13: Block Averages of D over Every 1 ns Block for the Pure Liquid D × 108, m2/s

(15)

to the calculated points. The values of the constants are found to be A ) 8.267458 × 10-4 s1/2 and c ) 21.5622. As we shall see, the values of A and c depend on the nature of the system. From this it is possible to estimate the value of the error for any given simulation length T. It is seen that even for a 1 ns run the error will be around 50% or more. Earlier, we had made a rough estimate of the error as 20% for sorbates in NaY and NaCaA based on runs that were 0.6-1.6 ns long.17 Our present results, which report runs significantly longer and therefore obtain a more accurate estimate of the error, suggest that the errors are, in fact, larger. Dependence of Error in D on Nature of the System. Is there a dependence of the error associated with D on the nature of the system? We have attempted to answer this by carrying out simulations of a pure Lennard-Jones fluid. The details of this run are given in the section dealing with computational details. Table 11 lists the average values of properties calculated for the sorbate-zeolite and the pure liquid systems along with some details of the simulation. Even though this simulation was carried out with 108 particles, initially we employed only 10 particles to estimate D and the error in D. The temperature of the run was 123 K, and the run was 10 ns long. A block of 10 ns from run A as well as run B of the fluid-zeolite system was taken and analyzed. Results are presented in Table 12. It is seen that the error associated with guest-zeolite system is usually larger by a factor of 2-4. In the case of the guest-zeolite system, the potential energy varies over a wide range of energies. In contrast, in the liquid, the range over which the potential energy change occurs is smaller, and consequently, the gradient of the potential energy is also probably smaller. It appears that the larger the range over which the potential energy varies and the steeper the

block

n ) 10

n ) 108

1 2 3 4 5 6 7 8 9 10 average

0.223 891 0.249 774 0.195 275 0.244 366 0.162 723 0.226 394 0.181 405 0.238 316 0.178 197 0.210 566 0.211 091 ( 0.009 601

0.206 737 0.231 116 0.227 563 0.220 330 0.211 484 0.207 288 0.212 285 0.215 662 0.222 911 0.230 219 0.218 560 ( 0.002 909

gradient in the potential energy, the greater will be the error associated with the value of the calculated D. Consequently, a longer run is necessary to obtain D within a specified accuracy if the system has a steeper potential energy surface. Dependence of Error in D on System Size. The analysis related to this was carried out on the data obtained by simulating a pure Lennard-Jones fluid consisting of 108 particles with σgg ) 2.21 Å. The analysis was carried out using data for different numbers of particles, n (n e 108), from the simulation for 108 particles. Table 13 presents the block averages or subaverages over every nanosecond for 10 and 108 particles, along with the overall average for the diffusion coefficient and the estimated error. The error is 4.5% when the data for 10 particles is used and 1.3% for n ) 108. A similar analysis using other values of n was carried out, and Figure 8 shows a plot of this error as a function of n. The points in the figure correspond to the errors obtained from the analysis of the MD trajectory, while the curve represents the least squares fit to the points. The form of the curve was assumed to be

δD ) B/xn

(16)

5444 J. Phys. Chem. B, Vol. 101, No. 27, 1997

Figure 8. Dependence of percent error in D on the number of particles, n, for the pure fluid. The run was 10 ns long with 1 ns equilibration. The solid curve is obtained from the best fit to the equation ∆D ) B/xn, where B ) 13.0674.

Chitra and Yashonath

Figure 10. Distribution, f(U), of the block averages of the total potential energy, U, shown by the data points. Here U is the total potential energy of the guest-zeolite system comprising the guest-guest and the guest-zeolite contributions. The solid line is only a guide to the eye.

Analysis of Correlation. Estimation of the error associated with D using the method given by Wood7 is valid only if the block averages are uncorrelated. When the blocks are sufficiently long, it is expected that the block averages will be uncorrelated. However, the critical size beyond which the block averages are not correlated is not known. In order to estimate this, we have calculated the statistical inefficiency,8 s, defined as

pσ2(fhi) s ) lim , i ) 1, 2, ...m and L ) mp pf∞ σ2(fh)

Figure 9. Plots of the msd along x, y, and z directions indicating the absence of any effect on the msd due to the pbc. The msd was obtained from run A for the sorbates in the zeolite NaCaA system at a temperature of 93.3 K.

From the least squares fit, B was estimated to be 13.0674. The value reported here is probably typical for Lennard-Jones systems. The value of B will depend on the nature of the system. The figure clearly indicates that the error in D exhibits the expected 1/xn dependence, where n is the number of particles. Periodic Boundary Condition and D. One of the problems that arise due to the finite size of the simulation cell is that periodic boundary conditions (pbc) can give rise to spurious peaks in the time correlation function. Figure 9 shows plots of ux2(t), uy2(t), and uz2(t) from the 54 ns long run A. Note that along the x and y directions the simulation cell has a dimension of 49.11 Å, whereas along the z direction the dimension is 245.55 Å. One expects that ux2(t), uy2(t), and uz2(t) are all identical since Fm3hc is a cubic space group with no anisotropy. Since uz2(t) is identical to ux2(t) and uy2(t), it appears that the pbc does not have any effect on the resulting evolution of the msd. This result suggests that the effect of the pbc is absent when the simulation cell is equal to or longer than 49.11 Å. It is possible that when the edge length is shorter, pbc may still have some effect on the msd and the resulting D. This result may in part be due to the fact that we are simulating the fluid phase and not the solid phase since phonons are strongly damped in a fluid but not in a solid.

(17)

where p is the length of the block. Here i varies from 1 to m and L ) mp. σ2(fhi)/p is the variance (see eq 7) of the block averages, and the denominator is the variance associated with the total run length. When the values of the block averages obey Gaussian statistics, then the limiting value of s obtained from eq 17 gives an idea of the correlation time. More precisely, the statistical inefficiency, s, is related to the correlation time,8 τc, by

s ) 2τc/∆ts

(18)

Figure 10 shows the distribution of block averages for the total potential energy, U. The solid line is a Gaussian curve. The points lie close to this curve, showing thereby that the points obey Gaussian statistics. Following Allen and Tildesley,8 we plot s against p1/2 in Figure 11. The property considered is, again, the total potential energy, U. It is seen that the plateau in the curve occurs around 150 ps. If the correlation time is independent of the property, it follows that s will also be independent of the property, and in such a situation this gives us an estimate of the minimum block size that gives uncorrelated values of the diffusion coefficient D. Assuming that it is so, the averages obtained over blocks of 1 ns or longer seem to be reasonable since they have a length that is several times 150 ps. The correlation time, τc, is found to be 15 ps from expression 18. 7. Conclusions The principal conclusions of this study are as follows. (1) It is suggested that the values of t1 and t2 should be chosen so as to avoid both the initial and the long-time regions. If the initial part of msd is considered in the estimation of D, then the resulting value will normally be higher than the true longtime D. Unfortunately, in the literature D is obtained from msd

Estimation of Error in the Diffusion Coefficient

Figure 11. Plot of the statistical inefficiency, s, as a function of length of the block p. Abscissa is chosen as p1/2 for convenience. The value of s at which the plateau is reached is around 0.15 ns or 150 ps when the averages are expected to be uncorrelated. The property considered is the total potential energy, U, of the guest-zeolite system based on data from run A.

by fitting to a range in which t1 is usually not very large. Therefore, the values reported in such studies are usually higher than the correct value. Neglect of the initial part ensures that the ballistic part of the curve is avoided. At large values of t, the statistics is poor. The final choice of t2 will depend on the length, T, of the run. (2) Even if t1 and t2 are selected carefully, the shorter the run length T, the greater will be the overestimation of D over and above the correct value. Thus, the value of D from shorter blocks seems to result in an overestimation of the value of D. (3) The value of D from msd and vacf converge to the same value within the error in the calculation for sufficiently long runs. It is not clear beyond what value of T such convergence is assured. (4) For a guest-zeolite system with 10 particles, a run of length 1 ns is associated with an error of about 50%. The error reduces to 15-40% when the run length is increased by a factor of 9, i.e., for a 9 ns run. (5) There does not seem to be any effect of the pbc on the behavior of the msd even when the system is about 50 Å in length. It is not clear whether this will still be true even when the simulated phase is a solid or when the edge length is shorter. (6) A steep or widely varying potential energy surface seems to give rise to a larger error in D. Thus, the error in D is system dependent. The error in D for the sorbate-zeolite system is about 2-4 times larger than the error of pure fluid. (7) The calculated statistical inefficiency, s, is around 150 ps, suggesting that block averages of the total potential energy, obtained from blocks longer than 150 ps, are uncorrelated. (8) For the runs reported here there seems to be no effect of the starting configuration on the calculated values of D.

J. Phys. Chem. B, Vol. 101, No. 27, 1997 5445 The present calculation suggests that accurate diffusion coefficients can be obtained if the runs are sufficiently long. The results indicate that the runs should be about 10 ns long or t* ) 9663.1 (for a sorbate of size 2.21 Å), where t* is the reduced time unit for a Lennard-Jones system. The error for such a run could be anywhere between 15 and 40% for a system consisting of 10 particles confined to the cavities of a zeolite lattice. For a pure fluid, the errors are likely to be smaller. It is expected that the error for a pure fluid with 108 particles will be 5% for a run that is 1 ns long, while the error will be about 1% if the run is 10 ns long. So, for pure LJ fluids it is seen that runs that are 1 ns long can give a reasonable estimate of D. The implications these results have on the values of D reported are now considered. In the literature, most of the simulations are usually only a few picoseconds long, although in recent times, with the availability of faster CPUs, somewhat longer runs of several hundred picoseconds with particles whose mass is close to that of argon and run lengths of 1 or 2 ns with xenon-like mass have also been reported. The present results indicate that the error for these could be anywhere between 50% and several hundred percent of the value of D reported. It is, therefore, necessary that the values of D reported and often compared with experimentally measured values of D should be interpreted with caution. Present results indicate that the errors in D can depend on the nature of the system, i.e., the underlying potential energy surface, and consequently, the error can be different for different zeolites or guest-host systems. Further, the errors are usually larger than those for the system without the host. Acknowledgment. The authors acknowledge the computational facilities provided by the Supercomputer Education and Research Centre, Indian Institute of Science, for carrying out this work. The authors would also like to thank T. S. Mohan for computational assistance. References and Notes (1) Metropolis, N.; Rosenbluth, A. W.; Rosenbluth, M. N.; Teller, A. H.; Teller, E. J. Chem. Phys. 1953, 21, 1087. (2) Alder, B. J.; Wainwright, T. E. J. Chem. Phys. 1959, 31, 459. (3) Rahman, A. Phys. ReV. 1964, 136A, 405. (4) Verlet, L. Phys. ReV. 1967, 159, 98. (5) Jorgensen, W. L.; Madura, J. D.; Swenson, C. J. J. Am. Chem. Soc. 1984, 106, 6638. (6) Zwanzig, R.; Ailawadi, N. K. Phys. ReV. 1969, 182, 280. (7) Wood, W. W. In Physics of Simple Liquids; Temperley, H. N. V., Rowlinson, J. S., Rushbrooke, G. S., Eds.; North-Holland Publishing Co.: Amsterdam, 1968. (8) Allen, M. P.; Tildesley, D. J. Computer Simulations of Liquids; Clarendon Press: Oxford, 1987. (9) Smith, E. B.; Wells, B. H. Mol. Phys. 1984, 53, 701. (10) Straatsma, T. P.; Berendsen, H. J. C.; Stam, A. J. Mol. Phys. 1986, 57, 89. (11) June, R. L.; Bell, T. A.; Theodorou, D. N. J. Phys. Chem. 1990, 94, 1508. (12) Nicholas, J. B.; Trouw, F. R.; Mertz, J. E.; Iton, L. E.; Hopfinger, A. J. J. Phys. Chem. 1993, 97, 4149. (13) Henson, N. J.; Cheetham, A. K.; Peterson, B. K.; Pickett, S. D.; Thomas, J. M. J. Comput.-Aided Mater. Des. 1993, 41, 1. (14) Bandyopadhyay, S.; Yashonath, S. J. Phys. Chem. 1995, 99, 4286. (15) Mosell, T.; Schrimpf, G.; Hahn, C.; Brickmann, J. J. Phys. Chem. 1996, 100, 4571. (16) Santikary, P.; Yashonath, S.; Ananthakrishna, G. J. Phys. Chem. 1992, 96, 10469. (17) Yashonath, S.; Santikary, P. J. Phys. Chem. 1994, 98, 6368. (18) Pluth, J. J.; Smith, J. V. J. Am. Chem. Soc. 1980, 102, 4704. (19) Kiselev, A. V.; Du, P. Q. J. Chem. Soc., Faraday Trans. 2 1981, 77, 1. (20) Englesfield, E. J. In Computer Simulation of Solids; Catlow, C. R. A., Mackrodt, W. C., Eds.; Springer-Verlag: Berlin, 1982.