A Global Least-Squares Fit for Absolute Zero - Journal of Chemical Education

Sep 1, 2003 - Department of Chemistry, Moravian College, Bethlehem, PA 18018 ... the global fit to absolute zero using Solver in an Excel spreadsheet...
In the Classroom

A Global Least-Squares Fit for Absolute Zero
Carl Salter
Department of Chemistry, Moravian College, Bethlehem, PA 18018

The concept of absolute zero is essential to an understanding of ideal gas behavior, so it is not surprising that many undergraduate lab experiments have been designed to measure absolute zero (1–3). Usually a gas sample is trapped and its pressure or volume is measured as a function of temperature; when the data are plotted as P versus T or V versus T, the x-intercept is the estimated value of absolute zero. If data are collected for several different gas samples, a graph will show lines of different slopes, but all the lines will converge to the same x-intercept, a satisfying illustration of the universality of absolute zero. The purpose of this article is to promote the use of least-squares (LS) fits in which the x-intercept is a fit parameter, so that a fit of gas thermometry data immediately yields an unbiased estimate of absolute zero and its error. (Previous articles in this Journal have suggested a shortcut method that can produce a biased value of absolute zero; see the discussion below.) It is possible to extract x-intercept information from an LS fit to the slope and y-intercept, but then determining the error in the x-intercept involves a tedious calculation using the variance–covariance matrix (4). Although a direct fit to the x-intercept requires a nonlinear LS routine, such routines are commonplace and easy to use in many computer programs. Furthermore, this method is readily adapted to permit multiple sets of data from different samples, and even different gases, to be combined in a single LS fit to find absolute zero. This unusual "global" fit to absolute zero is the LS equivalent of the graphical procedure described above. The fit can be carried out with surprising ease in Excel using two spreadsheet tools that have already been described in this Journal: Solver (5–10), which performs nonlinear LS fits, and SolverAid (8–10), an Excel macro written to compute the errors associated with Solver fits.
Least-Squares Theory

A set of n ordered pairs of data (x_i, y_i) can be fitted to an equation for a straight line, either y = mx + b (slope, y-intercept) or y = m(x − x_int) (slope, x-intercept). The experimental error is assumed to reside solely in the y values, which in practice requires the more precise measurements to be assigned to the x variable in the fit. Minimizing the sum of the squared errors (SSE) in y leads to a pair of equations for the two fit parameters. The first fit function leads to a set of linear equations that can be expressed as a matrix equation Ap = y, so the parameters are obtained as p = A⁻¹y. The second fit function leads to a nonlinear set of equations because it contains the product of the two fit parameters, m·x_int. These equations can be solved by iteration, and when the fit has converged, an equivalent A⁻¹ matrix can be constructed. For both fits the variance–covariance matrix is V = s_y²A⁻¹, where the fit variance is s_y² = SSE/(n − 2). V contains all the error information for the fit function, including the errors in the fit parameters. When these two LS fits are applied to identical data, the fit variances are equal and all residuals are identical; the two fits are completely equivalent. The V matrices are not equal, but they contain the same error information; in particular, it can be shown that the variance of the x-intercept is

$$
s_{x_{\mathrm{int}}}^{2} = \frac{s_y^{2}}{m^{2}} \cdot \frac{n\,x_{\mathrm{int}}^{2} - 2\,x_{\mathrm{int}} \sum x_i + \sum x_i^{2}}{n \sum x_i^{2} - \left( \sum x_i \right)^{2}}
$$
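As a numerical check of this variance formula, the Python sketch below (our own illustration, not code from the article; the function name xint_from_linear_fit is hypothetical) fits y = mx + b by ordinary least squares, forms x_int = −b/m, and evaluates s_xint two ways: from the closed-form expression above, and by propagating the (m, b) variance–covariance matrix V through x_int = −b/m. The two routes should agree to rounding error. The data are the set 1 helium points of Table 1.

```python
import math


def xint_from_linear_fit(x, y):
    """Fit y = m*x + b by ordinary least squares (all error assumed in y).

    Returns (x_int, s_xint_formula, s_xint_propagated) with x_int = -b/m.
    """
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    d = n * sxx - sx * sx                      # n*sum(x^2) - (sum x)^2
    m = (n * sxy - sx * sy) / d
    b = (sxx * sy - sx * sxy) / d
    x_int = -b / m

    sse = sum((yi - (m * xi + b)) ** 2 for xi, yi in zip(x, y))
    s2 = sse / (n - 2)                         # fit variance s_y^2

    # Route 1: the closed-form variance of the x-intercept.
    var_formula = s2 / m ** 2 * (n * x_int ** 2 - 2 * x_int * sx + sxx) / d

    # Route 2: propagate V = s_y^2 A^-1 for (m, b) through x_int = -b/m.
    var_m, var_b, cov_mb = s2 * n / d, s2 * sxx / d, -s2 * sx / d
    d_dm, d_db = b / m ** 2, -1.0 / m          # partial derivatives of -b/m
    var_prop = d_dm ** 2 * var_m + d_db ** 2 * var_b + 2 * d_dm * d_db * cov_mb

    return x_int, math.sqrt(var_formula), math.sqrt(var_prop)


# Set 1 helium data from Table 1 (T in deg C, P in torr).
T1 = [50.0, 0.0, -70.5, -195.8]
P1 = [599.0, 512.0, 394.0, 145.0]
x_int, s_formula, s_prop = xint_from_linear_fit(T1, P1)
print(f"absolute zero = {x_int:.1f} +/- {s_formula:.1f} C (propagated: {s_prop:.1f} C)")
```

For these four points both routes give roughly 7.2 °C, the uncertainty listed for set 1 in Table 1.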

where m is the slope of the fit line and the summations run from 1 to n. This equation, derived in ref 4 for a fit to mx + b, also applies to a fit to m(x − x_int), as shown in the appendix of ref 9. For data from a gas thermometry experiment we assign T as the x variable and V or P as the y variable. Either fit function gives equivalent results, but the fit to m(x − x_int) immediately yields absolute zero, and the error in absolute zero (the square root of the above expression) is then the routine output for the standard deviation of that fit parameter. The fit must be "seeded" with estimated fit parameters, but these can be obtained roughly from a graph of the data without fear of convergence problems, because there is only one minimum on the SSE surface. If the usual fit to mx + b is carried out, absolute zero must be computed from −b/m, and the error in absolute zero must be obtained by hand using the equation above or by matrix algebra using V.

It is tempting to avoid these postfit calculations by assigning temperature as the y variable and fitting the data to mx + b, so that the fit immediately returns absolute zero as the y-intercept with its associated error. This procedure, which has been suggested at least twice in this Journal (1, 3), is usually incorrect because the pressure or volume measurements are less precise than the temperature measurements, so the assumption that all the error resides in the y variable no longer holds. Used in that situation, this shortcut biases the value of absolute zero, the size of the bias depending on the distribution of the temperature data.

Suppose we record n_s sets of gas thermometry data by using different quantities of the same gas, and perhaps even different gases. The advantage of the m(x − x_int) formula is that the data can be fitted directly to n_s slopes and one x-intercept, permitting the determination of a global value of absolute zero and its error.
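A direct fit to slope and x-intercept needs only a small nonlinear routine. The sketch below is a minimal Gauss–Newton iteration in Python, assuming errors in y only and equal weights; it is our own illustration (the name fit_slope_xint and the seed values are hypothetical), not the Solver or QuickBasic code used in the article. After convergence it builds V = s_y²(JᵀJ)⁻¹ from the Jacobian J of the model, so the standard deviation of x0 comes out as a routine by-product of the fit to m(x − x_int), as described above.

```python
import math


def fit_slope_xint(x, y, m, x0, max_iter=50, tol=1e-12):
    """Gauss-Newton fit of y = m*(x - x0), seeded with rough values m and x0.

    Returns (m, x0, s_m, s_x0, sse); errors come from V = s_y^2 (J^T J)^-1.
    """
    n = len(x)
    for _ in range(max_iter):
        u = [xi - x0 for xi in x]                  # d(model)/dm
        r = [yi - m * ui for yi, ui in zip(y, u)]  # residuals
        # J^T J and J^T r, using d(model)/dx0 = -m for every point.
        a11 = sum(ui * ui for ui in u)
        a12 = -m * sum(u)
        a22 = n * m * m
        g1 = sum(ui * ri for ui, ri in zip(u, r))
        g2 = -m * sum(r)
        det = a11 * a22 - a12 * a12
        dm = (a22 * g1 - a12 * g2) / det           # solve the 2x2 system
        dx0 = (a11 * g2 - a12 * g1) / det
        m, x0 = m + dm, x0 + dx0
        if abs(dm) <= tol * abs(m) and abs(dx0) <= tol * max(1.0, abs(x0)):
            break
    u = [xi - x0 for xi in x]
    a11, a12, a22 = sum(ui * ui for ui in u), -m * sum(u), n * m * m
    det = a11 * a22 - a12 * a12
    sse = sum((yi - m * ui) ** 2 for yi, ui in zip(y, u))
    s2 = sse / (n - 2)
    # Diagonal of s2 * (J^T J)^-1 for the parameter order (m, x0).
    return m, x0, math.sqrt(s2 * a22 / det), math.sqrt(s2 * a11 / det), sse


T1 = [50.0, 0.0, -70.5, -195.8]
P1 = [599.0, 512.0, 394.0, 145.0]
m, x0, s_m, s_x0, sse = fit_slope_xint(T1, P1, m=2.0, x0=-250.0)
print(f"m = {m:.3f} +/- {s_m:.3f} torr/C, absolute zero = {x0:.1f} +/- {s_x0:.1f} C")
```

Seeded with rough graphical values (m ≈ 2 torr/°C, x0 ≈ −250 °C), the iteration should settle on the set 1 entries of Table 1: a slope of about 1.847 ± 0.055 torr/°C and absolute zero of about −277.4 ± 7.2 °C, the same numbers the linear fit gives after the postfit calculation.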
Absolute cell referencing in spreadsheets makes it easy to adjust the fit function so that each set of data has its own slope parameter; Solver's search routine handles this unusual situation with ease. The ability to define both "local" and "global" fit parameters is a powerful and underappreciated feature of spreadsheet LS analysis. An equivalent fit can be obtained using n_s slopes and n_s y-intercepts by introducing n_s − 1 constraint equations that force all the lines through the same x-intercept. Such constraints can be incorporated into the LS equations using the method of undetermined multipliers; Solver can add constraints to its search, and it will perform this fit without difficulty. But when the fit is complete, the postfit calculations for absolute zero and its error still remain, and SolverAid cannot handle the error calculation in this situation.
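The split between local and global parameters can be illustrated without a spreadsheet. For a trial value x0 of the shared x-intercept, each set's best slope has the closed form m_j = Σ P(T − x0)/Σ (T − x0)², so the global SSE becomes a function of x0 alone. The Python sketch below minimizes that profile with a golden-section search. This profile (variable-projection) approach is a substitute we chose for Solver's search, not the method of the article; the function names and the search bracket of −320 to −240 °C are assumptions. It reports no parameter errors, which require the full variance–covariance matrix.

```python
import math


def profile_sse(x0, sets):
    """Global SSE after choosing the best slope for every set at this x0."""
    total, slopes = 0.0, []
    for temps, pres in sets:
        u = [t - x0 for t in temps]
        m = sum(p * ui for p, ui in zip(pres, u)) / sum(ui * ui for ui in u)
        slopes.append(m)                      # local parameter, closed form
        total += sum((p - m * ui) ** 2 for p, ui in zip(pres, u))
    return total, slopes


def global_fit_profile(sets, lo, hi, tol=1e-8):
    """Golden-section search for the shared x-intercept x0 inside [lo, hi]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0          # about 0.618
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = profile_sse(c, sets)[0], profile_sse(d, sets)[0]
    while b - a > tol:
        if fc < fd:                           # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = profile_sse(c, sets)[0]
        else:                                 # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = profile_sse(d, sets)[0]
    x0 = 0.5 * (a + b)
    sse, slopes = profile_sse(x0, sets)
    return x0, slopes, sse


# Helium data from Table 1: (temperatures / deg C, pressures / torr) per set.
sets = [
    ([50.0, 0.0, -70.5, -195.8], [599.0, 512.0, 394.0, 145.0]),
    ([50.0, 0.0, -69.5, -195.8], [399.0, 341.0, 259.0, 95.0]),
    ([50.0, 0.0, -69.0, -195.8], [203.0, 172.0, 130.0, 47.0]),
]
x0, slopes, sse = global_fit_profile(sets, -320.0, -240.0)
print(f"absolute zero = {x0:.2f} C, slopes = {[round(m, 4) for m in slopes]}, SSE = {sse:.2f}")
```

Applied to the three helium sets, the search should land near the Solver result of Figure 2 (about −276.12 °C, with SSE near 263.15), because minimizing over the slopes first and then over x0 reaches the same minimum as adjusting all four parameters together.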

JChemEd.chem.wisc.edu • Vol. 80 No. 9 September 2003 • Journal of Chemical Education


Table 1. Gas Thermometry Data for Helium Samples and Results of Least-Squares Fitting

Set      T/°C     Pressure/torr   Absolute Zero/°C   Slope/(torr/°C)   SSE/torr²   s_y/torr
1        50       599             −277.4 ± 7.2       1.847 ± 0.055     205.9       10.1
         0        512
         −70.5    394
         −195.8   145
2        50       399             −274.5 ± 4.8       1.239 ± 0.025     41.5        4.6
         0        341
         −69.5    259
         −195.8   95
3        50       203             −271.1 ± 2.9       0.635 ± 0.008     4.3         1.5
         0        172
         −69      130
         −195.8   47
Global   -----    -----           −276.1 ± 3.2       -----             263.1       5.7

NOTE: Data fit to P = m(T − T_AZ) using Solver and SolverAid.


Data Analysis

Data from a gas thermometry experiment performed by physical chemistry students at Moravian College are presented in Table 1. A sample of helium was trapped in a stainless steel sphere and plunged into a constant-temperature bath at 50 °C, into ice water, into dry ice/acetone, and into liquid nitrogen. A digital pressure gauge was used to monitor the helium pressure. There are three sets of data, with pressures of roughly 200, 400, and 600 torr at 50 °C. A graph of the data and the LS fits to each set is presented in Figure 1. The x-intercepts for the three sets of data agree well; the numerical values of absolute zero and their errors are listed in Table 1, together with the slopes and their errors, the standard deviation, and the sum of the squared errors (SSE) for each fit. These fits were performed using Solver and SolverAid as described in example 1 of ref 9. The results were corroborated using a nonlinear LS program written in QuickBasic, which uses a simple Newton–Raphson search described by Wentworth (11).

An Excel spreadsheet that performs a global fit of all the data to one value of absolute zero is illustrated in Figure 2. The experimental data are in columns B and C, calculated values of pressure are in column D, residuals are in column F, and fit parameters are in column H; parameter errors and the standard deviation of the fit (in italics) are in column I. The fit function changes for each set of data so that it refers to a different slope (cells H3–H5), but all fit functions refer to absolute zero in cell H6. Cell H8 contains the SSE of the global fit, which Solver minimizes by adjusting the fit parameters. The global value obtained for absolute zero is −276.1 ± 3.2 °C, in fair agreement with the values from the individual fits. The global slopes differ slightly from those of the individual fits; the largest change is a 2% decrease in the slope of set 3, the smallest slope.
The SSE of the global fit is slightly higher than the sum of the SSEs of the individual fits. The contribution of each set to SSE is computed in cells H10–H12 using the SUMSQ function on that set's residuals in column F; each set's contribution is slightly higher than the SSE of the corresponding individual fit. The results of the global fit were also confirmed using the QuickBasic program.
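The spreadsheet of Figure 2 can be mirrored outside Excel. The Python sketch below, a hedged illustration rather than the author's QuickBasic program, runs a Gauss–Newton iteration on the parameter vector [m1, m2, m3, x0]: each row of the Jacobian has a nonzero entry only in its own set's slope column plus the shared x0 column, which is the local/global structure that absolute cell references create in the spreadsheet. Parameter errors are taken as square roots of the diagonal of s_y²(JᵀJ)⁻¹ with s_y² = SSE/(N − 4); that this matches what SolverAid reports is our assumption. The helper names (invert, jacobian_and_residuals, global_fit) and the seed values are hypothetical.

```python
import math


def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def jacobian_and_residuals(sets, params):
    """Rows of J and residuals for P = m_j*(T - x0); params = [m_1, ..., m_k, x0]."""
    k = len(sets)
    x0 = params[k]
    rows, resid = [], []
    for j, (temps, pres) in enumerate(sets):
        for t, p in zip(temps, pres):
            row = [0.0] * (k + 1)
            row[j] = t - x0          # local parameter: only this set's slope
            row[k] = -params[j]      # global parameter: x0 appears in every set
            rows.append(row)
            resid.append(p - params[j] * (t - x0))
    return rows, resid


def global_fit(sets, slopes, x0, iterations=20):
    """Gauss-Newton global fit; returns params, errors, SSE, s_y, per-set SSE."""
    k = len(sets)
    params = list(slopes) + [x0]
    for _ in range(iterations):
        rows, resid = jacobian_and_residuals(sets, params)
        jtj = [[sum(r[a] * r[b] for r in rows) for b in range(k + 1)] for a in range(k + 1)]
        jtr = [sum(r[a] * e for r, e in zip(rows, resid)) for a in range(k + 1)]
        step = [sum(v * g for v, g in zip(row, jtr)) for row in invert(jtj)]
        params = [p + s for p, s in zip(params, step)]
    rows, resid = jacobian_and_residuals(sets, params)
    jtj = [[sum(r[a] * r[b] for r in rows) for b in range(k + 1)] for a in range(k + 1)]
    cov = invert(jtj)
    sse = sum(e * e for e in resid)
    s2 = sse / (len(resid) - (k + 1))      # degrees of freedom = points - parameters
    errors = [math.sqrt(s2 * cov[a][a]) for a in range(k + 1)]
    set_sse, start = [], 0
    for temps, _ in sets:                  # like SUMSQ over each set's residuals
        set_sse.append(sum(e * e for e in resid[start:start + len(temps)]))
        start += len(temps)
    return params, errors, sse, math.sqrt(s2), set_sse


sets = [
    ([50.0, 0.0, -70.5, -195.8], [599.0, 512.0, 394.0, 145.0]),
    ([50.0, 0.0, -69.5, -195.8], [399.0, 341.0, 259.0, 95.0]),
    ([50.0, 0.0, -69.0, -195.8], [203.0, 172.0, 130.0, 47.0]),
]
params, errors, sse, s_y, set_sse = global_fit(sets, [1.8, 1.2, 0.6], -270.0)
print(f"absolute zero = {params[-1]:.2f} +/- {errors[-1]:.2f} C")
for j, (m, e) in enumerate(zip(params[:-1], errors[:-1]), start=1):
    print(f"m{j} = {m:.4f} +/- {e:.4f}")
print(f"SSE = {sse:.2f}, s_y = {s_y:.2f}, per-set SSE = {[round(v, 1) for v in set_sse]}")
```

With these seeds the iteration should settle close to the Figure 2 values: absolute zero near −276.12 ± 3.23 °C, m1 near 1.8562 ± 0.0259, and s_y near 5.74 torr. By construction, the per-set SSE values it prints sum to the total SSE.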

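Figure 1. Pressure versus temperature plot for gas thermometry data in Table 1. [Plot of pressure/torr (axis 0 to 700) versus temperature/°C (axis −300 to 100), showing the three data sets and their least-squares fit lines.]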
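The comparisons made in the Discussion below are simple arithmetic on the Table 1 entries. The Python lines that follow reproduce them from the rounded table values (a sketch; the variable names are our own). Because of rounding, the summed individual SSE comes out as 251.7 rather than the 251.6 quoted, and the m²-weighted absolute zero as about −276.09 °C rather than −276.10 °C; the unrounded fit results presumably account for the difference.

```python
# Individual-fit results from Table 1: (absolute zero / deg C, slope / (torr/deg C), SSE).
individual = [(-277.4, 1.847, 205.9), (-274.5, 1.239, 41.5), (-271.1, 0.635, 4.3)]
points_per_set = 4
global_sse, global_params = 263.1, 4          # 3 local slopes + 1 shared x-intercept

# Degrees of freedom: two parameters per individual fit, four in the global fit.
dof_individual = len(individual) * (points_per_set - 2)          # 3 * 2 = 6
dof_global = len(individual) * points_per_set - global_params    # 12 - 4 = 8

sse_individual = sum(sse for _, _, sse in individual)
var_individual = sse_individual / dof_individual
var_global = global_sse / dof_global

# Weighted mean of the individual absolute-zero values with weights m^2.
weights = [m * m for _, m, _ in individual]
az_weighted = sum(w * az for w, (az, _, _) in zip(weights, individual)) / sum(weights)

print(f"SSE increase: {100 * (global_sse / sse_individual - 1):.1f} %")
print(f"fit variance: individual fits {var_individual:.1f}, global fit {var_global:.1f}")
print(f"m^2-weighted absolute zero: {az_weighted:.2f} C")
```

The two extra degrees of freedom are what allow the slightly larger global SSE to correspond to a smaller fit variance.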

Discussion

The SSE of the global fit is 263.1, while the sum of the SSEs of the individual fits in Table 1 is 251.6. The constraint that all three lines must have the same x-intercept results in a 5% increase in SSE; this small increase indicates that the data support the model of a common x-intercept. In fact, because the global fit has two more degrees of freedom than the combined individual fits (12 − 4 = 8 versus 3 × (4 − 2) = 6), it has a smaller fit variance (263.1/8 ≈ 32.9) than the pooled variance of the individual fits (251.6/6 ≈ 41.9). We conclude that the global fit is a reasonable reduction of the experimental data. One of the advantages of the global fit is that it makes it easier to spot a point suffering from a large determinate error. The "dry ice" point in set 1 (−70.5 °C, 394 torr) has a residual of more than 2s_y in the global fit and so would qualify for rejection under some criteria.

It is perhaps surprising that the value of absolute zero from the global fit is closest to that obtained from the fit to set 1, which has the largest error. The data in the global fit are weighted equally and no prior knowledge of measurement precision is assumed; this situation, known as a posteriori weighting (12), is the typical situation in a least-squares fit. Under such circumstances the variance formula for the x-intercept given above implies that, other things being equal, the fit with the greatest slope should have the most precise x-intercept; accordingly, the global fit produces a value of absolute zero that is strongly influenced by the data with the greatest slope. To confirm this, notice that the weighted average of the absolute zero values from the individual fits, with the squared slopes m² as weights, yields −276.10 °C, in excellent agreement with the global fit value of −276.12 °C.

Figure 2. Spreadsheet for the global fit to absolute zero using Solver and SolverAid.

Row  A       B         C             D         E                    F
1            Temp C    Pres (torr)   P calc    fit formula          residuals
2    set 1   50        599           605.3     =$H$3*(B2-$H$6)      -6.3
3            0         512           512.5                          0.0
4            -70.5     394           381.7                          12.3
5            -195.8    145           149.1                          -4.1
6    set 2   50        399           401.7     =$H$4*(B6-$H$6)      -2.7
7            0         341           340.1                          0.0
8            -69.5     259           254.5                          4.5
9            -195.8    95            98.9                           -3.9
10   set 3   50        203           203.0     =$H$5*(B10-$H$6)     0.0
11           0         172           171.9                          0.0
12           -69       130           128.9                          1.1
13           -195.8    47            50.0                           -3.0

Global Fit     H (value)    I (error)
m1             1.856151     0.025927
m2             1.231653     0.019365
m3             0.622536     0.014189
Ab zero        -276.121     3.230391
total SSE      263.1496     5.735303
set1 SSE       209.0
set2 SSE       42.9
set3 SSE       10.1

Absolute zero and its error are in bold print. The fit functions are shown in column E for clarity. The formula in cell D2 (shown in E2) is copied to D3–D5; the formula in D6 is copied to D7–D9; the formula in D10 is copied to D11–D13. Cell H8 contains =SUMXMY2(C2:C13,D2:D13), which computes the SSE for the global fit. Solver minimized the value of H8 by adjusting cells H3–H6. Cell H9 contains =SUMSQ(F2:F5), which computes the contribution to SSE from set 1. SolverAid produces the error information in column I. Cell I8 contains the standard deviation of the global fit.

It is possible to carry out a "merge" fit in which the parameters from the individual fits are treated as "data" and "fitted" to the global parameters (13). The details of the merge fit are beyond the scope of this article, but one point is of interest: a key element of the merge is the formation of a weight matrix that is assembled from the A matrices of the individual fits. Each A contains the reciprocal of the variance of the x-intercept, so the weight matrix provides the merge with the correct weights to find the same global value of absolute zero. Thus the "individual plus merge" LS procedure is completely equivalent to the global LS fit, yielding the same slopes and absolute zero. The equivalence of these two fits is a pleasing example of the self-consistency of least-squares analysis; readers interested in details about the merge fit are invited to view the author's Web site at http://www.cs.moravian.edu/~csalter/TheMergeFit.html (accessed Jun 2003).

Conclusion

The transparency and simplicity of a fit to slope and x-intercept make this fit function the clear choice for determinations of absolute zero in gas thermometry. Since the method is readily extended to multiple data sets, it provides an easy way to determine a global value of absolute zero when student data are pooled in a class or a lab section. In the physical chemistry lab at Moravian, students run the gas thermometry experiment on different samples of air, helium, and argon; the global fit makes it easier to reduce their large data sets and compare values of absolute zero for different gases. A subsequent paper will deal with the details of this experiment.

Can the global fit be implemented in other application programs that students might use in the lab? Lotus and Quattro Pro contain Solver-like routines, so the global fit should be straightforward in these spreadsheet programs as well. MathCad has nonlinear fitting routines (14) and enough programming flexibility that it should be possible to define local and global fit parameters. Programs with nonlinear fit routines such as KaleidaGraph and PsiPlot are "column-bound"; that is, fit functions are defined at once for an entire set of (columnar) data. However, KaleidaGraph permits Boolean expressions in the fit function, and these can be used to change the function for different data subsets (see the "Bomb Calorimetry" example in ref 12).

There may be reluctance to adopt this global fit for lab or lecture work in general chemistry; nonlinear fitting is still not the "push-button" technique that linear fitting is. But in many ways students are probably better served educationally by nonlinear fitting: it forces them to work with their data, to examine graphs, and to estimate parameters, and it reveals the essence of least-squares fitting. A time investment is required to use nonlinear fitting, but the cost of that investment has dropped enormously; it is time to reap the pedagogical benefits.

Literature Cited
1. Strange, R. S.; Lang, F. T. J. Chem. Educ. 1989, 66, 1054.
2. Garrett, D. D.; Banta, M. C.; Arney, B. E. J. Chem. Educ. 1991, 68, 667.
3. Kim, M.-H.; Kim, M. S.; Ly, S.-Y. J. Chem. Educ. 2001, 78, 238.
4. Salter, C. J. Chem. Educ. 2000, 77, 1239.
5. Machuca-Herrera, J. O. J. Chem. Educ. 1997, 74, 448.
6. Harris, D. C. J. Chem. Educ. 1998, 75, 119.
7. Denton, P. J. Chem. Educ. 2000, 77, 1524.
8. de Levie, R. J. Chem. Educ. 1999, 76, 1594.
9. Salter, C.; de Levie, R. J. Chem. Educ. 2002, 79, 268.
10. de Levie, R. How to Use Excel in Analytical Chemistry and in General Scientific Data Analysis; Cambridge University Press: Cambridge, England, 2001; p 442. SolverAid can be freely downloaded from http://uk.cambridge.org/chemistry/resources/delevie (accessed Jun 2003).
11. Wentworth, W. E. J. Chem. Educ. 1965, 42, 96.
12. Tellinghuisen, J. J. Chem. Educ. 2000, 77, 1233.
13. Tellinghuisen, J. J. Mol. Spec. 1996, 179, 299.
14. Zielinski, T. J.; Allendoerfer, R. D. J. Chem. Educ. 1997, 74, 1001.
