Monte Carlo Method Applied to the Estimation of Coefficient Errors in

Cc. Bento da Rocha Cabral, 14 - 1200 Lisboa, Portugal. Received: March 4, 1994; In Final Form: May 25, 1994@. A Monte Carlo method is employed to ...
0 downloads 0 Views 441KB Size
J. Phys. Chem. 1994,98, 9537-9540

9537

Monte Carlo Method Applied to the Estimation of Coefficient Errors in In k =f(T) Equations Raquel M. C. Gongalves,' Filomena E. L. Martins, and Ruben A. Elvas Leitlo CECUL - FCUUISEL, Centro de Electroquimica e Cinetica da Universidade de Lisboa, Cc. Bento da Rocha Cabral, 14 - 1200 Lisboa, Portugal Received: March 4, 1994; In Final Form: May 25, 1994@

A Monte Carlo method is employed to estimate coefficient standard deviations and their confidence intervals for rate constant temperature equations. The method used, unlike parametric statistics, requires no assumptions about the distribution of errors or the covariances between pairs of parameters. In this way, biased estimates of coefficient confidence intervals are avoided, and therefore more reliable values for thermodynamic activation functions' confidence intervals are obtained. The advantages and importance of using weighted regression in this context are also shown. The proposed procedure is applied to rate constants of the solvolysis of 2-iodo-2-methylpropane in methanol, between 0 and 55 "C. New rate constants are reported at 0.00 and 5.00 "C. A comparison is made with data derived from conventional weighted least squares regression.

Introduction The dependency of rate constants, k, on temperature is commonly studied by fitting the data to suitable In k = AT) equations, using traditional least squares analysis. 1-6 In this context, a useful analytical procedure has been to appeal to the transition state theory to establish a link between thermodynamics and kinetics. Among the equations proposed to study this dependency, the most frequently used is a particular case of the Clarke and Glew expression:'

The use of a Monte Carlo method to estimate coefficient standard deviations and their confidence intervals for eq 1 avoids biased estimates of coefficient confidence interval^,^^,^^ and therefore more reliable values for thermodynamic activation functions' confidence intervals are obtained. The proposed method is applied to the solvolysis of 2-iodo-2-methylpropane in methanol. Weighted Least Squares

m

In k = a,

+ u 2 T 1+ a3 In T + x a x 7 ( r - 3 )

(1)

x=4

(As Blandamer et al. pointed the Clarke and Glew equation uses a reference temperature, 8, for the dependence on temperature. If 8 is chosen near the middle of the experimental range, the standard errors are less than those based on equations where the effective reference is at 8 = 1.) In these studies it is assumed that the model parameter errors are normally distributed, which is often not correct. Moreover, it is still common to use unweighted regression although the need of correct weighting has been stressed by several author^.^-^ Through the use of F-statistics, one can decide how many significant terms in eq 1 are needed to describe the data at a chosen confidence level. However, there is evidence for collinearity among equation coefficient^,^^^^^^.^^ which in tum reflects itself upon the magnitude of the associated variances and covariances. Errors in these coefficients are important, as they are used to estimate the activation parameter uncertainties of the system under study, namely, the standard deviations associated with the enthalpy, oA*H, entropy, oA*S, and heat capacity, oA*C,, of activation. Furthermore, with this procedure one should calculate the confidence intervals (CI) rather than simply standard deviations, u,

CI = f r o

(2)

in order to analyze the results, especially when comparisons among several systems are involved.'*

* Author to whom correspondence should be addressed at CECUL,Cg. Bento da Rocha Cabral, 14 - 1200 Lisboa, Portugal. Abstract published in Advance ACS Abstracts, July 15, 1994. @

It is a well-known practice to use weights, wi, whenever the experimental values have different standard deviations, ui. This type of individual weighting accounts for the precision of each value, and in most situations wi is considered to be inversely proportional to In the case of kinetic measurements, since the precision of mean ki values is usually lower at the extreme temperatures of the working interval, this is specially meaningful. However, the effect of a variable transformation is frequently not taken into consideration. For a logarithmic transformation such as Yi = In ki, the original assumption is that ki values are normally distributed, but we are now minimizing the sum of squares of the deviations in In ki. If ki values are likely to follow a Gaussian distribution, those in In ki are definitely not, since there is a compression of the data for low ki values and an expansion for the higher o n e ~ . ~Following J~ the usual assumption that AYi and Aki are relatively small, we can write

4.

AY, dYi -=Aki - dk,

(3)

When the experimental data k, are converted to Yi, one should then introduce a global weighting factor, w'i, dictated by the mathematical transformation of the data, given by Wli=

-

(dY,idki)?

(4)

The global weighting must be used in addition to the individual weights. The combination is achieved by multiplying

0022-365419412098-9537$04.50/0 0 1994 American Chemical Society

9538 J. Phys. Chem., Vol. 98, No. 38, 1994

Gonqalves et al. 04 0 .2 -

TABLE 1: Average Rate Constants and Associated Standard Deviations for the Solvolysis of t-BuI in Methanol no. of order T/K k1s-I ads-' ref 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

273.15 277.69 278.15 283.15 288.03 288.15 293.15 298.15 298.21 303.15 307.99 308.15 313.15 317.99 318.15 323.15 326.22 328.15

3.26 x 7.06 x 5.36 x 2.33 x 2.97 x 3.51 x 6.68 x 1.29 x 1.26 x 2.32 x 4.14 x 4.43 x 8.00 x 1.31 x 1.31 x 2.09 x 3.14 x 3.73 x

4.30 x 1.77 x 1.50 x lo-' 3.08 x 7.43 x 9.90 x lo-' 2.45 x 4.94 x 3.15 x 2.02 x 1.04 x 6.07 x 1.80 x 3.27 x 4.59 x 8.16 x 7.85 x 8.90 x

this work 19 this work 20 19 20 20 20 19 20 19 20 20 19 20 20 19 20

both factors, which in the present case yields the expression

118

w.= WjW'. =

n

ky

(5)

a

GJ

0

i

6-02

5-0.4

t x w

p

-I -1 2

0

5

10 15 number of order

20

Figure 1. Residuals for each type of regression: (W) nonweighted; (0)weighted.

of experimental uncertainties, which is not always accurate. In our case a different approach was possible: (i) regression analysis was performed using the Arrhenius equation since in this case collinearity problems do not arise; (ii) the MC method was then applied to the same data and the same equation, producing parameter and error estimates; (iii) the scattering factor was increased, and step ii was repeated until the parameter errors equaled those obtained in the first step. The factor determined in this manner can be safely used for any other temperature equation. Example

Monte Carlo Calculation of Confidence Intervals The Monte Carlo (MC) method is being increasingly used to estimate confidence intervals on estimated model coefficients calculated by linear and nonlinear regression. 13,14,16 In this work a large number, N , of synthetic data sets based on the original experimental values (kL,7') were generated using an assembly of random numbers (q)normally distributed:

+

k: = ki ei For this purpose an algorithm using the power residual method" was used to ensure long nonrepeating sequences (5 x IO8) with appropriate standard deviation and mean zero. Regression analysis was then performed over each synthetic data collection (k;, T) to obtain N parameter sets ( u l , uz, ..., um),(bl, b2, ..., bm),etc. To obtain a given confidence interval, x%, the upper and lower (100 - x)/2% values of the ordered list for each coefficient were eliminated. This procedure, as stated by Alper and Gelb,l3,l4has the advantage that there is no need to assume a given parameter distribution. The number of MC simulations has to be chosen so that essentially constant estimates of the coefficients are obtained. The choice of the scattering factor is of vital importance in this method. This value is usually chosen on the basis of an estimate TABLE 2: Choice of the Best Fit Equation In k = a1

+ a2T-l +

The following example refers to the choice of the best fit temperature equation for the rate constants of the solvolysis of 2-iodo-2-methylpropane (t-BuI) in methanol, based on the Clarke and Glew treatment (eq l ) . ' ~ ' In ~ the first part, the need and importance of weighted regression are demonstrated, and in the second part, a comparison between the MC approach and classical regression analysis is made. Table 1 shows the experimental data set used in both cases. Weighted vs Nonweighted Regression. Nonweighted regression was performed over the data points in Table 1. The number of significant terms was chosen according to the F-test for the 95% confidence level. In this test, the calculated value (Fcalc) is compared with the tabulated value (Ftab). This procedure, with weights calculated as described above, has also been applied to the same set of data. Table 2 summarizes the results. The results in Table 2 show that, for any confidence interval greater than 95%, the number of terms in the best fit equation is not the same in both cases: two parameters for the nonweighted and three for the weighted regression. Also, a residual analysis (Figure 1) clearly shows that the residuals are not randomly distributed along the zero line in the case of nonweighted regression whereas a better distribution is found when weights are considered. a3

In T

+~~.-t~~T(x-~)

nonweighted parameters and uncertainties = (2.93 f 0.06) x 10' (-1.14 f 0.02) x lo4

afit

0.1451

weighted

SSR" 0.3370

parameters and uncertainties

ofit

(2.97 f 0.05) x 10' = (-1.15 f 0.01) x 104

0.0845

(3.1 1 f 0.87) x 10' uz = (-2.42 f 0.39) x 104 a3 = (-4.19 f 1.29) x 10'

0.0669

UI

3.74 ( < F t a b )

u2=

= (3.18 f 1.49) x lo2 u2 = (-2.43 f 0.67) x lo4 ~3 = (-4.31 z t 2.23) x 10'

calcd F value

a2

SSR" 0.1140

10.45 ("Ftab)

0 1=

0.1341

0.2698

lo3 uz = (-0.92 1.60) x lo5 a3 = (-0.49 f 1.06) x lo3 ~4 = (0.74 zk 1.76) x 10'

0.0672 0.19 ('Ftab)

U I = (2.87 f 6.06) x

*

a

calcd F value

SSR = sum of squared residuals.

0.0688

0.0663

J. Phys. Chem., Vol. 98, No. 38, 1994 9539

Estimation of Coefficient Errors by MC Method TABLE 3: Randomness Test for Nonweighted and Weighted Regression runs test nonweighted regression no. of positive residuals no. of negative residuals no. of runs Z

7 11 7 -2.34

weighted regression no. of positive residuals no. of negative residuals no. ofruns Z

not random distrib. at 95% level of confidence

8 10 8 -0.44

random distrib. at 95% level of confidence

TABLE 4: Comparison of Parameters and 68% Confidence Intervals Given by the Classical Regression and the MC Method param

least squares regression

Monte Carlo sim

a1

(3.111 f 1.850) x lo2 (-2.419 f 0.831) x lo4 (-4.195 f 2.758) x 10'

(3.125 f 0.600) x lo2 (-2.425 f 0.278) x lo4 (-4.216 f 0.889) x 10'

a2 a3

The randomness of the residuals distribution can be quantified, measured, and tested by the so-called runs test.21 In this test a standardized variable, 2,is calculated according to the number of positive and negative residuals and the number of times the sequence of residuals changes sign. This value is then compared with the normal distribution for a chosen confidence interval. In our case the 95% level was chosen and therefore -1.96 < Z < 1.96

(7)

Z values higher than the upper limit indicate oscillations in the data values that have to be considered in the model; if the value is too low, then the model is inadequate. A Z value contained in the interval defined by this two-sided test indicates randomness. The test results, summarized in Table 3, confirm that the residuals pertaining to the nonweighting regression are not randomly distributed. Conversely, the ones for weighted regression are distributed at random. MC Method vs Classical Weighted Regression. In order to build the confidence intervals for the estimated parameters, ai, 200 MC numerical simulations were performed using a scattering factor chosen as described earlier. The average value and the 68% confidence interval for each parameter were calculated as previously mentioned. The standard error estimates of each parameter, using classical regression, have been obtained by taking the appropriate values

from the covariance matrix and are shown in Table 2. Each 68% CI was calculated using eq 2, the student t statistic being 1.035 for this level with 15 degrees of freedom (number of points - number of parameters). This procedure also follows the hypothesis of normality of the residuals. The results of the calculated coefficients and confidence intervals for both methods are shown in Table 4. The results indicate large differences in the confidence intervals. In percentage terms, the confidence intervals calculated by the MC method are approximately 67% of those obtained through parametric statistics. This shows the inadequacy of the Gaussian distribution to describe the parameters' distribution due to their collinearity. The covariances between pairs of coefficients of the best fit equation, sa,,, were calculated according to the following expression: N

N

N

The resulting values for the thermodynamic activation functions (TAF), calculated using the two methods, are presented in Table 5. Again, the uncertainties given by the MC method are much smaller than those obtained from conventional weighted least squares analysis. This was, however, expected since the variances and covariances of the parameters are much smaller in the MC procedure due to the elimination of collinearity problems. Experimental and Computational Conditions The solvolytic path of the reaction of t-BuI (0.01 mol dm-3) with methanol was followed conductimetrically until 90-95% conversion was achieved. Rate constant values, k, were obtained in the range 0-55 "C through the use of the Kezdy-Swinboume method.22 Conductance was measured with a Wayne-Kerr B905 autocalibrated conductance bridge linked to an IBM-compatible computer through a RS 232C interface. The cells were covered with aluminium foil in order to prevent photoinduced reactions, and a Braun thermostat was used to ensure a temperature control better than fO.O1 0C.4,6,23At least three independent experi-

TABLE 5: Thermodynamic Activation Functions and 68 % Confidence Intervals least squares regression values

T/K 273.15 278.15 277.69 283.15 288.03 288.15 293.15 298.15 298.21 303.15 307.99 308.15 313.15 317.99 318.15 323.15 326.22 328.15

A*H- f oA*H%J mol-' 103.6 f 3.3 101.8 f 2.8

A*s" f aA*s"/J mol-' K-'

29.0 f 11.5 22.5 f 9.6 101.9 f 2.8 23.1 f 9.8 100.0 f 2.3 16.2 f 7.7 98.2 f 1.8 10.1 f 6.1 9.9 f 6.0 98.2 f 1.8 96.4 f 1.4 3.8 f 4.5 94.6 f 1.1 -2.3 f 3.5 94.6 f 1.1 -2.3 f 3.5 -8.2 f 3.3 92.9 f 1.1 -13.9 f 4.0 91.1 f 1.3 91.1 f 1.3 -14.1 f 4.1 -19.8 f 5.3 89.3 f 1.7 -25.3 f 6.8 87.6 f 2.2 87.5 f 2.2 -25.5 f 6.8 -31.0 f 8.4 85.7 f 2.7 -34.4 f 9.4 84.6 f 3.0 -36.5 f 10.0 83.9 f 3.1 A*c,- = -,357 f 112 J mol-' K-'

Monte Carlo values

A*H- f UA*H-M mol-'

A*S- f oA*s"/J mol-' K-'

103.6 f 2.9 29.2 f 8.8 22.7 f 6.8 101.8 f 2.6 23.3 f 7.0 102.0 f 2.6 100.0 f 2.2 16.3 f 4.9 98.3 f 1.7 10.2 f 3.1 10.1 f 3.0 98.2 f 1.7 96.4 f 1.3 3.9 f 1.2 -2.2 f 0.7 94.7 f 1.0 94.6 f 1.0 -2.3 f 0.7 92.9 f 0.7 -8.2 f 2.5 91.1 f 0.4 -13.8 f 3.8 -14.0 f 3.8 91.1 f 0.4 -19.8 f 5.0 89.3 f 0.0 -25.3 f 6.0 87.5 f 0.4 -25.5 f 6.0 87.5 f 0.4 -31.1 f 8.1 85.7 f 0.8 -34.5 f 9.0 84.6 f 1.0 -36.6 f 9.4 83.9 f 1.1 A*C,- = -359 f 74 J mol-' K-'

GonGalves et al.

9540 J. Phys. Chem., Vol. 98, No. 38, 1994

ments were performed at each temperature, under the same experimental conditions. For the MC method, 200 numerical simulations were used and the scattering factor, found as described earlier on the basis of the Arrhenius equation, was 0.43. This relatively large value reflects the fact that values from two different experimental sources were used in the calculations.

The use of MC simulation showed that the parametric error estimates of TAF are too conservative and that this can be overcome, allowing thus sounder mechanistic interpretations. Finally, the same type of treatment can be easily applied to expressions for the study of the dependency of rate constants on pressure24and of equilibrium constants on temperature and pressure, where collinearity is also present.

Final Considerations

Acknowledgment. This work was supported by the INIC and JNICT, Lisbon, Portugal.

Our results clearly show that correct weighting of the data is very important in dealing with rate constant temperature equations. The use of unweighted regression leads to biased standard deviations, as can be seen through residual analysis. Weighting also increased the intemal consistency of the tested data set. Subsets of the original data (e.g., taking out the extreme points of the interval or considering only our own results) have also been tested. With nonweighted least squares analysis the number of terms in the best fit equation and even the sign of the parameters may change. The same tests performed with weighted regression lead consistently to the same number of coefficients in the best fit equation and comparable estimated parameters. This is very significant since all these aspects influence dramatically the TAF values and thus the mechanistic conclusions that can be drawn thereafter. Although weighting ought to be used whenever a transformation of variable is involved and/or the errors in the experimental values differ significantly, the analysis of the results presented in Table 2 shows that the variances of the parameters of temperature equations calculated from the diagonal values of the variancekovariance matrix can nevertheless become very large. Several authors,2~3~10~11 showed that the two last terms in the three-term equation that arises from the Clarke and Glew treatment are highly correlated and that this accounts for the abnormally high values of both parameter variances and covariances. Our results also indicate the adequacy of the MC simulations to provide useful estimates of confidence intervals for the coefficients derived from In k equations, especially when parametric covariances are important. The calculation of confidence intervals by elimination of extreme values, as proposed by Alper and Gelb,13J4precludes the need for the often erroneous assumption of normally distributed parameters.

=fin

References and Notes (1) Clarke, E. C. W.; Glew, D.N. Trans. Faraday Soc. 1966,62,539. (2) Wold, S.; Ahlberg, P. Acta Chem. Scand. 1970, 24, 618. (3) Blandamer, M. J.; Robertson, R. E.; Ralph, E.; Scott, J. M. W. J . Chem. Rev. 1982, 82, 259. (4) Gonplves, R. M. C.; Martins, F. E. L. An. Quim. 1990, 86, 694. (5) GonGalves, R. M. C.; Simdes, A. M. N.; Albuquerque, L. M. P. C.; Formosinho, S. J. J . Chem. Soc., Perkin Trans 2 1991, 931. (6) Gonplves, R. M. C.; Simdes, A. M. N.; Leitlo, R. A. S. E.; Albuquerque, L. M. P. C. J . Chem. Res. (S)1992, 330; (M) 1992, 2701. (7) Bevington, P. R. Data Reduction and Error Analysis for the Physical Sciences; McGraw-Hill: New York, 1969. (8) Levie, R. J . Chem. Ed. 1986, 63, 10. (9) Kalantar, A. H. J . Phys. Chem. 1986, YO,6301. (10) Blandamer, M. J.; Robertson, R. E.; Scott, J. M. W. Can. J . Chem. 1980, 58, 772. (11) Blandamar, M. J.; Robertson, R. E.; Scott, J. M. W.; Vrielink, A. J. Am. Chem. SOC.1980, 102, 2585. (12) Milton, J. S.; Amold, J. C. Probability and Statistics in the Engineering and Computing Sciences; McGraw-Hill: New York, 1986. (13) Alper, J. S.; Gelb, R. I. J. Phys. Chem. 1990, 94, 4747. (14) Alper, J. S.; Gelb, R. I. J . Phys. Chem. 1991, 95, 104. (15) Gomes, L.; Vital, J.; Lobo, L. S. CHEMPOR '89, Lisbon, 1989. (16) Gatland, I. R.; Thompson, W. J. Am. J . Phys. 1993, 61, 269. (17) Reference Manual on Random Number Generation and Testing; IBM, 1959. (18) Euranto, E. K.; Kanerva, L. T. J. Chem. Soc. Faraday 1 1983, 79, 1483. (19) Biordi, J.; Moelwyn-Hughes, E. A. J . Chem. Soc. 1962, 4291. (20) Martins, F. E. L. Ph.D. Thesis, Lisbon, 1993. (21) Brownlee, K. A. Statistical Theory and Methodology in Science and Engineering, 2d ed.; John Wiley and Sons: New York, 1965. (22) Swinboume, E. S. Analysis of Kinetic Data; Thomas Nelson: London, 1971. (23) Gonplves, R. M. C.; Martins, F. E. L.; Simdes, A. M. N. An. Quim. 1992, 88, 417. (24) Albuquerque, L. M. P. C. High Temp.-High Press. 1992,24, 697.