Calculating the Confidence and Prediction Limits ... - ACS Publications

Feb 13, 2017 - Calculating the Confidence and Prediction Limits of a Rate Constant at a Given Temperature from an Arrhenius Equation Using Excel...
0 downloads 0 Views 302KB Size
Communication pubs.acs.org/jchemeduc

Calculating the Confidence and Prediction Limits of a Rate Constant at a Given Temperature from an Arrhenius Equation Using Excel Ronald A. Hites* School of Public and Environmental Affairs, Indiana University, Bloomington, Indiana 47405, United States ABSTRACT: Calculating error limits for rate constants derived from a fitted Arrhenius expression is problematic. In the past, de Levie has suggested that centering the data around the mean ln(k) and 1/T values before doing the regression gives a correct result. It is suggested here that a convenient and standardized approach (and thus, one likely to be used by students) is to calculate the confidence and prediction limits of the rate constant at a given temperature using the Regression Data Analysis tool of Excel and using some of Excel’s built-in functions. KEYWORDS: Second-Year Undergraduates, Physical Chemistry, Computer-Based Learning, Kinetics



BACKGROUND The application of the Arrhenius relationship, which links a reaction’s rate constant to the reaction temperature, presents some unusual statistical challenges. The linearized form of the Arrhenius equation is1,2 ln(k) = ln(A) −

⎛ Ea ⎞⎛ 1 ⎞ ⎜ ⎟⎜ ⎟ ⎝ R ⎠⎝ T ⎠

Table 2. Output of Excel’s Regression Data Analysis Tool Using the Data in Table 1 Regression Statistics Multiple R R square Adjusted R square Standard error Observations

(1)

where k is the rate constant at temperature T, A is the “preexponential factor”, Ea is the reaction’s activation energy, and R is the gas constant. Thus, a plot of ln(k) versus 1/T is a straight line with an intercept of ln(A) and a slope of −Ea/R. Obtaining this intercept and slope using linear regression is conveniently done in Excel. In addition, Excel’s Regression Data Analysis Tool can be used to get the standard errors of these two parameters. After exponentiation of the intercept plus and minus its error, one can express A and its standard errors. The slope of the line is −Ea/R, and the Regression tool gives its standard error directly. As indicated by de Levie,3 it is tricky to get a reasonable error for a rate constant back-calculated from these results at a specific temperature. Table 1 and Table 2 show the regression for the example Arrhenius data used by de Levie.3 The resulting line fits the

Degrees of Freedom Regression Residual Total

Intercept Slope

a

k, min−1

1/T

ln(k)

298 303 308 312

0.000560 0.00102 0.00183 0.00278

0.003356 0.003300 0.003247 0.003205

−7.4876 −6.8880 −6.3034 −5.8853

1 2 3

Mean Sums of Squares

1.4623 1.4623 0.0004 0.0002 1.4627 Regression Results

F Stat

P Value

7439.4206

0.0001

Coefficients

Std Error

t Stat

P Value

28.3585 −10680.4418

0.4058 123.8282

69.8756 −86.2521

0.0002 0.0001



IDENTIFYING THE PROBLEM What if one wants to know the rate constant at, say, 25 °C (298.15 K), and what if one wants to know the error associated with that result? From the Arrhenius equation, the rate constant at this temperature is 5.73 × 10−4 min−1. But how does one get the error associated with this particular rate constant value? It is tempting to do a simple propagation of error analysis of eq 1 and get ⎡ ⎤1/2 ⎛ 1 ⎞2 s ln(k) = ⎢s ln(A)2 + ⎜ ⎟ s Ea / R 2 ⎥ ⎝T ⎠ ⎣ ⎦

See ref 3.

data very well (R2 = 0.9997 and F = 0.0001). Table 2 shows that Ea/R = −10680.4 ± 123.8 K and that A = exp(28.3585) = 2.07 × 1012 min−1. The upper limit increment of A is exp(28.3585 + 0.4058) − A = 1.04 × 1012 min−1, and the lower limit increment is exp(28.3585−0.4058) − A = −0.69 × 1012 © XXXX American Chemical Society and Division of Chemical Education, Inc.

Sums of Squares

min−1. Note that these two limits are not symmetrical; thus, the 12 Arrhenius expression for this reaction is k = (2.07+1.04 −0.69) × 10 exp[−(10680 ± 124)/T].

Table 1. Measured Rate Constants (k’s) for the Hydrolysis of p-Nitrophenol Acetate in Ethanol/Water as a Function of Temperature (T)a T, K

0.9999 0.9997 0.9996 0.01402 4 ANOVA

(2)

Received: October 31, 2016 Revised: January 31, 2017

A

DOI: 10.1021/acs.jchemed.6b00842 J. Chem. Educ. XXXX, XXX, XXX−XXX

Journal of Chemical Education

Communication

The confidence limits (cl’s) in y for a given x value are

where sln(k) is the standard error associated with ln(k), sln(A) is the standard error associated with ln(A) (the intercept), and sEa/R is the standard error associated with −Ea/R (the slope). Both of the latter error values come from the regression analysis as shown in Table 2. Using these results and T = 298.15 K, one finds sln(k) = 0.581. Using eq 1 and including this error, ln(k) at 298.15 K is −7.464 ± 0.581. Exponentiating this gives a rate +4.51 constant k(298) = (5.73−2.53 ) × 10−4 min−1. These are enormous errors and are clearly wrong. A large part of this problem is that the 1/T values are not anywhere close to the origin. Figure 1 shows the data from this

1/2 ⎛1 (x − x ̅ )2 ⎞ ⎜ ⎟ cl(x) = ±tcritsreg ⎜ + n ⎟ ∑i = 1 (xi − x ̅ )2 ⎠ ⎝n

(3)

where xi values are the independent variables (1/T values in this case), n is the number of measurements, tcrit is the critical value from the Student’s t-distribution for the correct number of degrees of freedom (n − 2) and for the desired significance level (usually 0.05 or 0.01), x̅ is the mean of the xi values, and sreg is the standard error of the regression.4 Notice that the quantity (xi − x)̅ appears in this equation, which is the same as centering the x values on their mean as suggested by de Levie. The value of sreg is given by n ⎛ 1 ⎞1/2 sreg = [∑ (yi − yi ̂ )2 ]1/2 ⎜ ⎟ ⎝ (n − 2) ⎠ i=1

(4)

where yi values are the measured dependent variables [ln(k) values in this case] and ŷi is the fitted value for each xi.4 This equation is the heart of the least-squares fitting method, and it is this sum which is being minimized by changing the intercept and slope, hence the term “least squares”. In Excel, one does not need to calculate sreg because, in the Regression Data Analysis tool output, it is given in the cell called “Standard Error”. More properly, sreg should be called the standard deviation in y; this is an important distinction because standard errors will decrease (generally by a factor of √n) as the number of observations increase, while standard deviations will not. There is also a built-in Excel function called STEYX that will give the value of sreg. Another built-in function, DEVSQ, will give ∑(xi − x)̅ 2. The calculation of the confidence limits of k(298) for the data shown in Table 1 uses the following values: 1/T = 0.003356, the average of the four 1/T values = 0.003277, and the sum of the squared deviations of each 1/T from the mean of these values = 1.282 × 10−8. The critical t value is obtained using the Excel function TINV, which for the arguments of two degrees of freedom and 5% significance returns a value of 4.30. As given in Table 2, sreg = 0.01402. Hence, at T = 298 K, using eq 3, the confidence limit is ±0.0517. Casting this result out of logarithmic space, one gets the 95% confidence limits of k(298) −4 = (5.63+0.30 min−1, which is a realistic error. −0.28) × 10 What exactly does a confidence limit mean in this context? These limits indicate that, if one were to repeat the entire experiment again with, in this case, four measurements, one would have a 95% probability of determining the rate constant at 298 K to be in the range 5.35 × 10−4 to 5.93 × 10−4 min−1. This does not mean that if someone measured the rate constant at 298 K with a single independent experiment that they would probably get a value in this range. In this case, one needs the prediction limits (pl’s), which for a given x value are4

Figure 1. Arrhenius plot of the data from Table 1 showing the linear regression (solid red line) and the full range of the 1/T and ln(k) values such that the origin is included. See Table 2 for regression details. This plot was constructed with SigmaPlot Ver. 13.

example plotted on a scale that includes 1/T = 0. Note that the data are clustered in a tiny space at the far right of the plot and that a projection of the linear regression line to the y-axis is not accurate. A small change in the measurements would result in a large change in the intercept. A literal reading of this plot would indicate that the rate constant of this reaction at an infinite temperature (1/T = 0) is on the order of 2 × 1012 min−1, which is silly. In reality, the linearity of the Arrhenius equation is true only over a limited temperature range, which is usually given with the equation’s fitted parameters. In fact, some Arrhenius plots have inflection points, indicating that the mechanism of the reaction is different in different temperature regimes. de Levie3 has noted this problem before. He attributed it to the collinearity of the intercept and slope and suggested centering the data around the means of ln(k) and 1/T and doing the regression of the resulting adjusted data. Indeed, this approach gives correct results, reducing the error in de Levie’s example to (5.73 ± 0.07) × 10−4 min−1.



CONFIDENCE AND PREDICTION LIMITS A standardized alternative to this general, but cumbersome, approach is to use a convenient formula for the standard error of a computed y value, which is applicable to unweighted linear fits. The resulting error estimate can then be presented as a confidence or prediction limit rather than as a standard error. The approach espoused by de Levie and the calculation of confidence limits are actually the same thing, as will be apparent in the calculations that follow. The main difference is that the calculation of confidence limits can be done in Excel, requires no additional macros, is likely to be used by students after they graduate, and, perhaps most importantly, gives a result with a built-in error limit (for example ±95%).

1/2 ⎛ (x − x ̅ )2 ⎞ 1 ⎟⎟ pl(x) = ±tcritsreg ⎜⎜1 + + n n ∑i = 1 (xi − x ̅ )2 ⎠ ⎝

(5)

Typically, in most papers reporting Arrhenius regressions, the rate constants back-calculated from the regression at a specific temperature are reported as the 95% confidence limits because these limits are the best estimate of the error associated with this particular set of experimental data. Nevertheless, one should be aware of the differences among standard deviations, standard errors, confidence limits, and prediction limits. B

DOI: 10.1021/acs.jchemed.6b00842 J. Chem. Educ. XXXX, XXX, XXX−XXX

Journal of Chemical Education

Communication

details of doing this for the Arrhenius equation have been worked out by Cvetanovic et al.5 Other statistical packages, such as Minitab, can also be used to implement a weighted linear regression.

Once calculated, confidence and prediction limits can be plotted on the same graph that shows the data and the regression line. Figure 2 shows the data from Table 1, the fitted



CONCLUSION Clearly, students should properly learn to use the common statistical tools that are now almost universally available to them in Excel. Unfortunately, there is no single function in Excel that will calculate confidence and prediction limits, but implementing these equations in user-defined functions in Excel is easy (and instructive).



AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected]. ORCID

Figure 2. Arrhenius plot of the data from Table 1 showing the linear regression (solid red line), the 95% confidence limits (dashed green lines), and the 95% prediction limits (dotted blue lines). The points are labeled with the temperature of the measurement. The regression details are given in Table 2. This plot was constructed with SigmaPlot Ver. 13.

Ronald A. Hites: 0000-0003-0975-5058 Notes

The author declares no competing financial interest.

■ ■

ACKNOWLEDGMENTS I thank Philip S. Stevens for helpful comments on an early draft of this paper.

line, the lower and upper 95% confidence limits, and the lower and upper 95% prediction limits. These limits look surprisingly wide, but the back-calculated 95% prediction limits of k(298) = −4 (5.63+0.47 min−1 are only about a ±10% relative error. −0.43) × 10



REFERENCES

(1) Arrhenius, S. About the reaction rate in the inversion of cane sugar. Z. Phys. Chem. 1889, 4, 226−248. (2) Logan, S. R. The origin and status of the Arrhenius equation. J. Chem. Educ. 1982, 59, 279−281. (3) de Levie, R. Collinearity in least-squares analysis. J. Chem. Educ. 2012, 89, 68−78. (4) Kennedy, J. B.; Neville, A. M. Basic Statistical Methods for Engineers and Scientists, 2nd ed.; A Dun-Donnelley: New York, 1976; pp 264−268. (5) Cvetanovic, R. J.; Singleton, D. L.; Paraskevopoulos, G. Evaluations of the mean values and standard errors of rate constants and their temperature coefficients. J. Phys. Chem. 1979, 83, 50−60.

COMPLICATIONS It is important to remember that the calculated confidence and prediction limits of the rate constant at a given temperature include a critical t value, which changes with the number of degrees of freedom, that is, with the number of data used for the regression. In the case of de Levie’s example,3 there were only four data points in the regression so the t value was 4.3. In fact, this is the difference between de Levie’s error of ±0.07 and the 95% confidence limits reported here. Of course, one can select any level of confidence one wants. Usually, one reports the 95% confidence limits, but there may be occasions when it is better to report the 99% confidence limits. This is a simple change in the arguments going into the TINV function. For example, using the data in Table 1, the 99% confidence limits for the rate constant at 298 K would be k(298) = (5.63+0.71 −0.63) × 10−4 min−1, and the 99% prediction limits would be k(298) = −4 (5.63+1.13 min−1. In this case the 99% limits are very −0.94) × 10 much larger than the 95% limits because there are only four observations, which means the critical t value changes from 4.3 to 9.9, which is a lot. By the way, this is an argument for using as many data points for a regression analysis as possible. In this case, if one had 10 measurements instead of 4, the critical t values would be 2.3 and 3.4 at 95% and 99% significance, respectively. In practice, there is another complication: Sometimes the individual rate constants that go into the Arrhenius equation regression analysis have different precisions. For example, one may have measured the rate constant at a given temperature many more times that at the other temperatures; thus, the precision of such a rate constant is higher than those at the other temperatures. If one wants to explicitly use these varying precisions in the Arrhenius equation regression, one would need to do a weighted regression, which is beyond the scope of this communication and of the Regression tool in Excel. The C

DOI: 10.1021/acs.jchemed.6b00842 J. Chem. Educ. XXXX, XXX, XXX−XXX