Information
•
Textbooks
•
Media
•
Resources edited by
Computer Bulletin Board
Steven D. Gammon Western Washington University Bellingham, WA 98225-9150
Design of an Excel Spreadsheet To Estimate Rate Constants, Determine Associated Errors, and Choose Curve’s Extent
W
Luís Moreira and Filomena Martins* Department of Chemistry and Biochemistry, Faculty of Sciences, University of Lisbon, CQB, Lisboa, Portugal; *
[email protected] Ruben Elvas-Leitão Department of Chemical Engineering, Engineering Institute (ISEL), Polytechnic Institute of Lisbon, CQB, Lisboa, Portugal
The determination of rate constants, k, is an important feature in the study of solvent effects. Typically, a reaction is carried out in several media and k values are analyzed to disclose different properties or reactivity aspects that are responsible for reaction rates dissimilarities. Often, a reaction performed in a given solvent is followed at least three times to obtain a statistically meaningful average k value, thus enlarging the number of total experiments. The need for a reliable and expeditious way to work out these determinations becomes, therefore, imperative. In this article we wish to present a new Microsoft Excel spreadsheet design that enables a prompt calculation of k values and associated errors and also addresses less common features such as the choice of an experimental curve’s extent. This procedure, free from any auxiliary macros, although partly similar to others reported, relies upon a novel approach that considerably simplifies calculations, without any loss in reliability, and adds new tools to achieve and evaluate results. Overview The use of Excel spreadsheets has been extensively reported in this Journal. Several contributions have highlighted their immense pedagogical potential. They were successfully used, for instance, to produce curve simulations for enzyme kinetics (1), perform curve fitting through a trial-and-error approach to deal with potentiometric titrations (2), or carry out curve fitting using Excel’s routine Solver. This last application allowed, for example, the analysis of electronic and infrared spectra (3, 4) and voltammograms (5); the deconvolution of overlapped gas chromatographic peaks (6); the determination of the van Deemter’s equation coefficients from experimental gas chromatographic data (3, 7, 8); the determination of standard addition and calibration curves (9), equilibrium constants, and unknown concentrations in acid– base titrations (8); and enthalpies of solution (10) and reaction orders or other kinetic parameters (8, 11–13). In kinetic studies, nonlinear curve fitting offers some advantages over linearization procedures or time-lag methods. Biases introduced by the use of unweighted or inappropriately weighted linearization procedures, for example, have been thoroughly addressed (14) and it has been shown that results from correctly weighted linearized fits match those from nonlinear ones (15). Also, problems arising from neglecting covariance in the estimation of parameters’ uncerwww.JCE.DivCHED.org
•
tainties have been detected (9) in linearization approaches. On the other hand, time-lag methods necessarily involve equally spaced data acquisition and grouping of kinetic data, thus limiting the number of points used in rate determinations (13). Additionally, in certain cases, a slight dependence on the chosen fixed-time interval has been noticed (16). However, the use of Solver over dedicated commercial software also presents some disadvantages; namely, its inability to provide standard deviations for the determined coefficients, rightly considered as important as the optimized values themselves (7, 17). To overcome this drawback, several authors have used auxiliary macros like SolverAid (9) written in VBA or Solvstat (17), written in XLM. The jackknife procedure (7), considered impractical for all but small data sets (8) or the direct calculation of uncertainties (3, 8, 13), via established formulas, available in several statistical textbooks, have also been considered likely options. The latter offers the possibility of acquiring an insightful knowledge of statistical and numerical analysis theory and its practical application on spreadsheets’ design. Spreadsheet Design
General Considerations This spreadsheet was developed to determine k values for first-order reactions but it can also be used to attend pseudofirst-order conditions or easily changed to address different reaction orders. The reactions selected to illustrate the use of the spreadsheet are the heterolysis of 3-bromo-3methylpentane in protic and aprotic media (18). Alkyl bromide concentration was 0.01 mol dm3 and a small quantity of 2,6-lutidine was added to the solution, when necessary, to prevent acid formation. The reactions proceeded at 25.00 C.1 The integrated rate law can thus be expressed as product concentration, c, measured as a function of time, t, where c0 represents the initial reagent concentration:
(
c = c 0 1 − e − kt
)
(1)
Often, one performs an indirect measurement of concentration in terms of a related property, P. In our case, all reactions produce a charged species responsible for an increase in conductance, G. This allowed the use of this property as a kinetic probe. Two possibilities can arise when dealing with
Vol. 83 No. 12 December 2006
•
Journal of Chemical Education
1879
Information
•
Textbooks
•
Media
•
Resources
indirect concentration measurements: either there is a linear relationship between the property and c (2)
c = a0 + a1P
(where a0 and a1 are fitting parameters) or calibration curves must be obtained to calculate the concentration values from the measured property and carry on with k’s calculation. In either case, one obtains a collection of (t, P ) pairs. Combining eqs 1 and 2, we obtain
P = b + m e −kt
(3)
where b = (c0 − a0 )兾a1 and m = (c0 )兾a1, from which k can be computed using nonlinear adjustment by maximizing the coefficient of determination, R 2, that is, the square of the linear correlation coefficient, R , for all (ekt, P ) pairs. This optimization condition is not commonly used in fitting designs. The most frequent option is to minimize the sum of squares due to error (SSE). We must, however, emphasize the usefulness of our choice. In fact, if one deals with a large number of kinetic curves, it is easier to evaluate a normalized value, R 2, that can be directly compared to those obtained for other curves, rather than a number, such as SSE, dependent on the magnitude of variation of the measured property. To demonstrate the mathematical equivalence of both approaches, let us consider the following equations 2
N
SSE =
∑ ( yn
− yˆn )
(4)
n =1
2
N
SSM =
∑ ( yn
− y)
(5)
n =1
R2 = 1 −
Choice of Curve’s Extent
SSE SSM
(6)
where yn is the nth measured dependable variable, y^n is the nth dependent variable adjusted to fit, y– the arithmetic average of all measured dependent variables, N the number of considered points, and SSM is the sum of squares about mean. It is clear that minimizing SSE is equivalent to maximizing R 2, since SSM is independent of the adjustment procedure.
Rate Constant Determination One can now start to create the spreadsheet. The worksheet is renamed data and columns B and C are reserved to import, or write directly, time and property values. The k value in min1 is assigned to cell H25. Initially, a reasonable starting value for this variable is introduced (this will be detailed further on). Column O is used to determine ekt. So, one enters the formula EXP(-$H$25*B1) on cell O1 and then pastes it into the other 4999 below. Throughout the spreadsheet design all built columns will have this size, since it is assumed that no experiment will exceed this
1880
Journal of Chemical Education
number of data points. Cell L21 is used to calculate R2 through Excel’s function CORREL(C1:Cz;O1:Oz) (where z is the number of the last cell containing numerical information in columns B and C). Also, b and m values are calculated through functions INTERCEPT(C1:Cz;O1:Oz) and SLOPE(C1:Cz;O1:Oz). These two values are assigned to cells H23 and H24. Once these calculations are performed, the user can call upon Solver (Tools > Solver), set L21 as the target cell to be maximized, and H25 as the changing cell. After changing the convergence to 1 1010 in the options window, press OK, and then solve. After a few iterations, Solver finds the best solution, and the user should press OK to keep it. With optimized k, b, and m values, predicted P versus t can be plotted. To obtain this plot, place $H$23+$H$24 *EXP(-$H$25*Q1) expression in cell R1, assign Q1 cell equal to B1 and paste throughout columns R and Q. Time, t, transference adds the possibility to artificially increase the time column, allowing an extended visualization of the estimated plots, without altering the direct acquisition columns (B and C). For practical reasons, it is recommended that the predicted plot is implemented in a new sheet, named chart, together with the experimental curve, to envisage the fitting. Although the comparison between experimental and predicted data produces a useful means to evaluate the fit, more powerful information can be available if residual plots are depicted. To implement this plot set aside columns T and U to input the ordinal of the acquired (t, P ) pair as the x axis and the difference between P and (b mekt ) as the y axis, respectively (this last information can be calculated from the difference between columns C and R). Since it is best to obtain a percentage value in the residual plots, the difference must be divided by C column values and percentage selected as the format number presentation. This plot should be implemented in the data sheet, within visual range.
•
Often, it is not convenient to use all the acquired experimental data points, since the linear relationship between the property and concentration might be restricted to a given concentration range. The analysis of the residual plots may become useful to detect this problem. The user might also want to set a default range related to the extent of reaction. However, the user should have the possibility of easily changing the curve limits used in k determination, without having to modify numerous functions and plotting parameters. The best way to implement this is to assign two cells for the working lower and upper limits of acquisition data. We thus define cells G7 and G8 as these limits and choose percentage as the format number presentation.2 Values of 20% and 90% were used in the given examples. The choice of these limits should preferably be connected to an inspection of the residual plots. Nevertheless, it is always useful to set default values related to the extent of reaction. Initially, the calculation of the limits must be linked to the experimental curve to avoid a circular reference. Take cells G18 and G19 and use the MIN(C:C) and MAX(C:C) functions to determine the minimum and maximum P values (Pmin and Pmax ). Cells H18 and H19 calculate the experimental
Vol. 83 No. 12 December 2006
•
www.JCE.DivCHED.org
Information
values corresponding to the lower and upper percentage limits (Px’s), as defined by cells G7 and G8, the so-called chosen limits (x ’s). This is done by applying eq 7 (transformed to calculate the needed Px values). x =
Px − Pmin Pmax − Pmin
Plot limits can also be automatically changed if a data collection is defined to determine which points appear. Given that, it seems useful to add another plot to the experimental and predicted curves, that is, the predicted curve obtained within fitting range. To define data collections, on the toolbar, press Insert > Name > Define. Then, add the names max and min to the workbook and assign them to cells I8 and I 7 , respectively. Next, define r a n g e 0 as =OFFSET(data! $T$1;min-1;0;max-min+1;1) and range1, range2, and range3 in the same way, substituting T for U, Q, and R, successively. Percentage residuals can now be plotted setting up x values =data!range0 and y values =data!range1. Also, the predicted curve obtained within fitting range can be added to the chart sheet with x values as =data!range2 and y values as =data!range3.
Calculation of Uncertainties To complement this spreadsheet, it is essential to calculate the uncertainties, i , associated to the adjustable parameters, ai . This can be achieved through
pii
SSE N − v
www.JCE.DivCHED.org
(8)
•
•
Media
•
Resources
where SSE is defined as before, N is the number of considered points, v is the number of adjustable variables, and pii1 denotes the iith element of the v × v inverted curvature matrix. Each element of the Pij matrix (curvature matrix) is calculated from N
pij =
=INTERCEPT(INDIRECT(“C”&I7):INDIRECT (“C”&I8);INDIRECT(“O”&I7):INDIRECT(“O”&I8)). Cells H24 and L21 should be likewise changed.
σk =
Textbooks
(7)
The next step is to inquire what cell numbers contain the closest values to H18 and H19. Those should be the array limits in the first optimization of k. To find those cell numbers use MATCH(value;C:C) function and cells H18 and H19 as the value. The results are shown in cells I18 and I19. However, only after k, b, and m values are determined through Solver, can one clearly determine the actual 20% and 90% limits in the reaction’s extent (). The same methodology is used, placing in cells H26 and H27 the lower and upper values of the predicted curve using the same formula as above, replacing the maximum value (Pmax ) by b and the minimum (Pmin ) by (b m) and repeating the calculation of the range limits in cells I26 and I27. Cells I7 and I8 are taken as the user’s final choice to set the minimum and maximum array limits. They can initially be set as I18 and I19 values to perform the first Solver run and then be manually modified either to I26 and I27 values or to another set of limits, chosen from an analysis of the residual plots. The b, m, and R2 formulas may now be changed to include the considered limits. This is done by using function INDIRECT, which allows calling upon a given column with limits equal to previously chosen cell values. For instance, b formula is rewritten as
−1
•
∂ F
∑ ∂ ani
n =1
∂ Fn ∂ a j
(9)
Subscripts i and j range from 1 to v, aj also represents an adjustable parameter, and Fn is the adjusted function, which is defined as Fn = a1 + a2 e − a3 t n
(10)
Partial differentials have to be calculated and correspond to ∂ Fn = 1 ∂ a1
(11)
∂ Fn = e − a3 t n ∂ a 2
(12)
∂ Fn = − a2 tn e − a3 tn ∂ a3
(13)
To build the auxiliary matrix formulas, we must perform another five intermediate calculations. First, SSE is calculated in column W, according to eq 4. Then, ∂ Fn ∂ Fn ∂ Fn ∂ Fn ∂ Fn ∂ Fn ∂ Fn ∂ Fn , , , ∂ a1 ∂ a3 ∂ a 2 ∂ a2 ∂ a 2 ∂ a3 ∂ a3 ∂ a3 are calculated in columns Y, Z, AA, and AB using equations 11 to 13. Next, assemble Pij in a 3 × 3 cells area (G35 to I37), where each Pij component is the sum of a correspondent column, taking the previously defined limits. Take, for instance, p12, set it as =SUM(INDIRECT(“O”&I7):INDIRECT (“O”&I8)). Notice that the value of p11 is simply calculated from (I8-I7+1) and that pij = pji. To obtain the error matrix (Pij1) paint a 3 × 3 area (cells G39 to I41) and then write =MINVERSE(G35:I37) but instead of pressing Enter, press Shift+Control+Enter to view the full matrix. Finally, calculate SSE兾(N − v) by setting cell H 3 3 as =(SUM(INDIRECT(“w”&I7):INDIRECT (“w”&I8)))/ ((I8-I7+1)-3), and place the square root of product of cells H33 and I41, divided by 60, in cell L17. This corre-
sponds to k in s1. It is also best to calculate k value in s1 and place it in L16, while cell L19 is filled with log(k兾s1). The relative error in k is presented in L18.
Relevant Information and General Improvements Other relevant information should be included to improve the spreadsheet, namely, minimum and maximum and t limits used to determine k and total measured and t. Equation 7 is used to calculate the first two values. For in-
Vol. 83 No. 12 December 2006
•
Journal of Chemical Education
1881
Information
•
Textbooks
•
Media
•
Resources
Figure 1. Data from 3-Br-3-MePe + Prop. Carb. final 20%–90% sheet.
In summary, to work with this spreadsheet the user must first enter time and measured property values in columns B and C, respectively. Secondly, the user must choose the limits for (initially referred to P ) and an initial k value.3 Thirdly, Solver should be run for the first time. Finally, the user should adjust range limits, either according to those selected for or from the evaluation of dispersion plots4 and then run Solver again until the adjusted parameters obtained are in agreement with the user’s decision. Final Remarks
Figure 2. Chart from 3-Br-3-MePe + Prop. Carb. final 20%–90% sheet
stance, minimum is calculated in cell L24 using the formula =(INDIRECT(“C”&min)-(H24+H23))*100/(H23(H23+H24)). Maximum , placed in cell L25, is obtained by just substituting min by max. Minimum time is determined through the =INDIRECT(“B”&min) formula and placed in cell M24 and maximum time is obtained similarly. Total is calculated in cell L28 using the expression =(MAX(C:C)-(H24+H23))*100/(H23-(H23+H24)) and on its right, total time is obtained through an inspection of the time column, searching for its maximum value. To complete the spreadsheet design, several highlights and squared colored boxes were included to assist the user. Colors were chosen to differentiate several types of information (Figure 1, color available in the online version): green for information provided by the user, black for complementary data and red for the final relevant information. Plots were also presented in different colors to help identifying the three curves defined in a previous section (Figure 2, color available in the online version). It is important that the spreadsheet is user friendly to facilitate a new user’s adaptation or to hasten one’s own work. 1882
Journal of Chemical Education
•
Selected examples, given in the Supplemental Material,W provide the chance to directly interact with the devised spreadsheet before the user starts to construct his or her own spreadsheet. This spreadsheet and the examples can be used to test the determination of k for different extents of reaction and, if a well-established relationship between the measured property and concentration is known, assess the adequacy of a firstorder behavior. On the other hand, the same approach can be used to clearly identify situations for which a relationship between the property and concentration is not known and a strong effect in k is observed as a result of the use of different extents of reaction. In these cases, this procedure recommends that the best estimate of k is obtained when one uses a conveniently large extent of reaction, chosen in view of a careful analysis of the residual plots. The three examples provided in the Supplemental MaterialW illustrate the importance of a critical overview of the results and the several potentialities of the spreadsheet. The perfect match between calculated k and associated errors given by the designed spreadsheet and those obtained through commercial software,5 demonstrates the absence of any accuracy loss, thus confirming the merits of the calculation method and the changes introduced by us in the differentials computation. The tools provided in this spreadsheet allow the users to perform all the above tasks with a considerable economy of effort and time and contribute to develop a critical mind over these matters.
Vol. 83 No. 12 December 2006
•
www.JCE.DivCHED.org
Information
•
Textbooks
Notes
Literature Cited
1. For further experimental details please see ref 18. 2. In this format, although we see the number as 20%, if the cell is called to perform calculations its true value is 0.20. 3. A divergence from the solution might be noticed if iterations begin with a k value higher than the optimal one. To ensure that this does not occur, the user should confirm in the chart that the predicted curve does not come to a plateau before the experimental one. Plateau is here defined as the curve region where the measured property values become approximately constant. Using a default value of 1 105 min1 should suffice to prevent any divergence. 4. Remember not to assign them directly to cells I24 and I25 to avoid a circular reference. The changes should be manual. 5. Comparisons were made with Table Curve 2D, version 5.01.
1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14.
Acknowledgments We would like to thank Nelson Nunes for helpful comments and suggestions and Bernard Liengme for the valuable information promptly provided concerning Excel. Luís Moreira gratefully acknowledges financial support from FCT through grant BD 6190/2001. W
Supplemental Material
Nine Excel for Windows spreadsheet examples are available in this issue of JCE Online.
www.JCE.DivCHED.org
•
•
Media
•
Resources
Bruist, M. F. J. Chem. Educ. 1998, 75, 372–375. Ma, N. L.; Tsang, C. W. J. Chem. Educ. 1998, 75, 122–123. Ogren, P.; Davis, B.; Guy, N. J. Chem. Educ. 2001, 78, 827–836. Iannone, M. J. Chem. Educ. 1998, 75, 1188–1189. Howard, E.; Cassidy, J. J. Chem. Educ. 2000, 77, 409–411. Arena, J. V.; Leu, T. M. J. Chem. Educ. 1999, 76, 867. Harris, D. C. J. Chem. Educ. 1998, 75, 119–121. de Levie, R. J. Chem. Educ. 1999, 76, 1594–1598. Salter, C.; de Levie, R. J. Chem. Educ. 2002, 79, 268–270. Brown, P. J. Chem. Educ. 2001, 78, 268–270. Machuca-Herrera, J. O. J. Chem. Educ. 1997, 74, 448–449. Vitz, E. J. Chem. Educ. 1998, 75, 1661–1663. Denton, P. J. Chem. Educ. 2000, 77, 1524–1525. Zielinski, T. J.; Allendoerfer, R. D. J. Chem. Educ. 1997, 74, 1001–1007.
15. McNaught, I. J. J. Chem. Educ. 1999, 76, 1457. 16. Hemalatha, M. R. K.; Noorbatcha, I. J. Chem. Educ. 1997, 74, 972–974. 17. Billo, E. J. Excel for Chemists; Wiley: New York, 1997; pp 295–300. 18. Martins, F.; Leitão, R. E.; Moreira, L. J. Phys. Org. Chem. 2004, 17, 1061–1066.
Vol. 83 No. 12 December 2006
•
Journal of Chemical Education
1883