
the computer

edited by

RUSSELL H. BATT, Kenyon College, Gambier, OH 43022

Applying a Simple Linear Least-Squares Algorithm to Data with Uncertainties in Both Variables

Paul J. Ogren and J. Russell Norton
Earlham College, Richmond, IN 47374

There are cases in which it is desirable to find an optimum linear least-squares fit to data with significant uncertainties in both the x and y variables. Solving this problem also permits "x" and "y" data sets to be treated on an equivalent, interchangeable basis, which is important, for example, for calibration graphs (1). Although other approaches to this problem have been presented by a number of authors in this Journal (2–5), a particularly useful algorithm, which is readily adaptable to spreadsheet analysis or to simple programming, has been developed by Williamson (6). This method uses an iterative procedure to determine the intercept $a$ and slope $b$ parameters that will minimize the following sum:

$$ S = \sum_i \left[ \frac{(X_i - x_i)^2}{u_i} + \frac{(Y_i - y_i)^2}{v_i} \right] \tag{1} $$

where $(X_i, Y_i)$ denotes the experimental data pairs, and $(x_i, y_i)$ denotes the "true" values of these data pairs, which lie on the line $y_i = a + b x_i$. The weighting factors $u_i$ and $v_i$ are the variances of $X_i$ and $Y_i$.

The Williamson Approach

The Williamson approach proceeds in seven steps.

1. Determine estimates of $a$ and $b$ by a standard, weighted least-squares calculation in which uncertainties in the $X_i$ values are assumed to be zero (7).

2. Use the slope value $b$, with the $u_i$ and $v_i$ values, to calculate a weighting factor for each point:

$$ W_i = \frac{1}{v_i + b^2 u_i} $$

3. Determine the weighted average of the $X_i$ values,

$$ \bar{X} = \frac{\sum_i W_i X_i}{\sum_i W_i} $$

and the set of differences between each $X_i$ and $\bar{X}$:

$$ U_i = X_i - \bar{X} $$

Do the same types of calculations for the $Y_i$ values, giving $\bar{Y}$ and $V_i = Y_i - \bar{Y}$.

4. Calculate

$$ q_i = W_i \left( v_i U_i + b\, u_i V_i \right) $$

Then determine a better approximation to the slope $b$ from

$$ b = \frac{\sum_i W_i q_i V_i}{\sum_i W_i q_i U_i} $$

This is a linear approximation to a more complex relationship (6, 8).

5. Using the new value of $b$, repeat steps 2–4 until satisfactory convergence is obtained.

6. Determine $a$ from

$$ a + b\bar{X} = \bar{Y} \tag{5} $$

7. Finally, determine the variances in $b$ and $a$ using Williamson's formulas (6), eqs 6 and 7, which are written in terms of an auxiliary quantity $Q$.
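The seven steps translate directly into a short program. The Python sketch below is not the authors' spreadsheet or ZBasic code; the function names, the convergence tolerance, and the iteration cap are illustrative assumptions. Because Williamson's variance formulas (eqs 6 and 7) are not reproduced in this text, the sketch substitutes the standard-error expressions of York et al. (Am. J. Phys. 2004, 72, 367), which for uncorrelated errors are built from the same $W_i$ and $q_i$. It also reports $S$ using the fact that, for a fixed line, minimizing eq 1 over the true points reduces it to $S = \sum_i W_i (Y_i - a - bX_i)^2$ with $W_i = 1/(v_i + b^2 u_i)$.

```python
import math


def _steps_2_to_4(X, Y, var_x, var_y, b):
    """Steps 2-4 for a trial slope b; returns W, Xbar, Ybar, U, V, q."""
    W = [1.0 / (vy + b * b * vx) for vx, vy in zip(var_x, var_y)]  # step 2: W_i
    sum_w = sum(W)
    x_bar = sum(w * x for w, x in zip(W, X)) / sum_w                 # step 3: weighted means
    y_bar = sum(w * y for w, y in zip(W, Y)) / sum_w
    U = [x - x_bar for x in X]                                       # U_i = X_i - Xbar
    V = [y - y_bar for y in Y]                                       # V_i = Y_i - Ybar
    q = [w * (vy * du + b * vx * dv)                                 # step 4: q_i
         for w, vx, vy, du, dv in zip(W, var_x, var_y, U, V)]
    return W, x_bar, y_bar, U, V, q


def williamson_fit(X, Y, var_x, var_y, rel_tol=1e-10, max_iter=200):
    """Fit y = a + b*x when X and Y both carry (uncorrelated) errors.

    var_x, var_y are the variances u_i and v_i (squared standard deviations).
    Returns (a, b, sigma_a, sigma_b, S).
    """
    # Step 1: weighted least squares with errors in Y only (weights 1/v_i).
    w = [1.0 / vy for vy in var_y]
    sw = sum(w)
    swx = sum(wi * x for wi, x in zip(w, X))
    swy = sum(wi * y for wi, y in zip(w, Y))
    swxx = sum(wi * x * x for wi, x in zip(w, X))
    swxy = sum(wi * x * y for wi, x, y in zip(w, X, Y))
    b = (sw * swxy - swx * swy) / (sw * swxx - swx * swx)

    # Steps 2-5: improve the slope until it stops changing.
    for _ in range(max_iter):
        W, x_bar, y_bar, U, V, q = _steps_2_to_4(X, Y, var_x, var_y, b)
        b_new = (sum(wi * qi * dv for wi, qi, dv in zip(W, q, V))
                 / sum(wi * qi * du for wi, qi, du in zip(W, q, U)))
        converged = abs(b_new - b) <= rel_tol * abs(b_new)
        b = b_new
        if converged:
            break

    # Step 6: a + b*Xbar = Ybar, with weights recomputed for the final slope.
    W, x_bar, y_bar, U, V, q = _steps_2_to_4(X, Y, var_x, var_y, b)
    a = y_bar - b * x_bar

    # Minimum of eq 1 for this line: sum of W_i * (Y_i - a - b*X_i)^2.
    S = sum(wi * (y - a - b * x) ** 2 for wi, x, y in zip(W, X, Y))

    # Standard errors from York et al. (2004), substituted for eqs 6 and 7.
    sum_w = sum(W)
    x_adj = [x_bar + qi for qi in q]              # adjusted ("true") x values
    x_adj_bar = sum(wi * xa for wi, xa in zip(W, x_adj)) / sum_w
    sigma_b = 1.0 / math.sqrt(sum(wi * (xa - x_adj_bar) ** 2 for wi, xa in zip(W, x_adj)))
    sigma_a = math.sqrt(1.0 / sum_w + x_adj_bar ** 2 * sigma_b ** 2)
    return a, b, sigma_a, sigma_b, S


# Table 2 data (x and y of eq 9) with their estimated standard deviations.
x_data = [4721, 3213, 2441, 1967, 1647, 1414]
y_data = [2793, 3486, 3851, 4074, 4224, 4321]
sigma_x = [62, 15, 11, 9, 8, 8]
sigma_y = [38, 18, 19, 21, 23, 26]
a, b, sigma_a, sigma_b, S = williamson_fit(
    x_data, y_data, [s * s for s in sigma_x], [s * s for s in sigma_y])
print(f"b = {b:.3f} +/- {sigma_b:.3f}, a = {a:.0f} +/- {sigma_a:.0f}, "
      f"S/(N-2) = {S / (len(x_data) - 2):.3f}")
```

Run on the Table 2 data, the sketch should give a slope near −0.466 ± 0.012, an intercept near 4990 ± 30, and S/(N − 2) near 0.049, in line with the values reported for that table.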

Examples

The examples¹ shown were evaluated with the Microsoft Excel 2.2 spreadsheet program and with a program written in ZBasic 5.0. The latter has the advantage that it easily generates plots with error bars. It should be obvious that the error analyses given here pertain only to random sources of uncertainty, not to systematic errors.

Table 1 and Figure 1 show data from a physical chemistry lab in which students measure the Joule–Thomson coefficient for a gas. This is done using a modification of a published experiment (9) in which gas flows through an insulated medium-porosity glass frit. Voltage differences, $\Delta V$, between two thermocouples above and below the frit are measured as a function of the pressure difference $\Delta P$ across the frit. The slope of the graph yields a value for $\mu_{\mathrm{JT}} = (\partial T/\partial P)_H$.

¹An Excel spreadsheet with supporting macros to carry out the Williamson algorithm was used for the calculations on a Macintosh Plus system. Graphs were done using a ZBasic version of the program. The spreadsheet, macros, and BASIC program (in application and list form) are available on a 3.5-in. disk from the authors for $3. This disk also includes the data for the examples shown, a useful test data set used by Williamson and York (6, 8), and some documentation.

A130

Journal of Chemical Education

Table 1. Joule–Thomson Expansion Data for CO2

x (kPa)   estimated σx   y (μV)   estimated σy
150.0     1.0            62.0     1.0
140.0     1.0            57.0     1.0
130.0     1.0            51.0     1.0
…         …              …        …

Slope = 0.470 ± 0.014 (1σ) μV kPa⁻¹
Intercept = −9.04 ± 1.69 (1σ) μV
S/(N − 2) = 0.246
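The Figure 1 caption converts the Table 1 slope, 0.470 ± 0.014 μV kPa⁻¹, into $\mu_{\mathrm{JT}}$ = 1.19 ± 0.04 K atm⁻¹. That is a two-step unit change: divide by the thermocouple sensitivity to turn microvolts into kelvins, then multiply by 101.325 kPa atm⁻¹. The article does not state the sensitivity it used; the sketch below assumes about 40 μV K⁻¹ for copper–constantan near room temperature, a value chosen because it reproduces the quoted result. The function name is illustrative.

```python
KPA_PER_ATM = 101.325  # 1 atm = 101.325 kPa (exact)


def slope_to_mu_jt(slope_uv_per_kpa, sensitivity_uv_per_k=40.0):
    """Convert a thermocouple-voltage vs pressure slope (uV/kPa) to mu_JT in K/atm.

    sensitivity_uv_per_k is an ASSUMED copper-constantan sensitivity near room
    temperature; the article does not say which value was used.
    """
    k_per_kpa = slope_uv_per_kpa / sensitivity_uv_per_k  # uV/kPa -> K/kPa
    return k_per_kpa * KPA_PER_ATM                        # K/kPa -> K/atm


# Table 1 slope and its 1-sigma uncertainty; the conversion is linear,
# so the uncertainty scales by the same factor.
mu_jt = slope_to_mu_jt(0.470)
sigma_mu_jt = slope_to_mu_jt(0.014)
print(f"mu_JT = {mu_jt:.2f} +/- {sigma_mu_jt:.2f} K/atm")  # mu_JT = 1.19 +/- 0.04 K/atm
```

Because the conversion is a pure scale factor, any error in the assumed sensitivity shifts $\mu_{\mathrm{JT}}$ proportionally but does not change the relative uncertainty from the fit.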

[Figure 1: JOULE-THOMSON, CARBON DIOXIDE]

Figure 1. Joule–Thomson data for CO2 at 22 °C. Copper–constantan thermocouple voltage differences are plotted as a function of pressure differences across an insulated frit. Error bars in 2σ units are shown for both variables at each point. The slope converts to a value of $\mu_{\mathrm{JT}}$ = 1.19 ± 0.04 (1σ) K atm⁻¹.

The nonzero intercept is probably due to slight variations in composition of the thermocouple junctions. The pressure uncertainties are significant in these data due to the use of an inexpensive gauge with a small scale marked in 1-kPa intervals.

The second example shown concerns the spectrophotometric determination of $K_c$ for mesitylene complexing with iodine in carbon tetrachloride (10, 11).

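This system is analyzed by measuring the absorbance $A$ due to the complex over a range of concentrations and then constructing the following plot (10):

$$ \frac{A}{b\,[\mathrm{I_2}]_0\,[\mathrm{M}]_0} = \varepsilon K_c - K_c\,\frac{A}{b\,[\mathrm{I_2}]_0} \tag{9} $$

where $[\mathrm{M}]_0$ and $[\mathrm{I_2}]_0$ are the initial concentrations of mesitylene and iodine, $\varepsilon$ is the molar absorptivity of the complex, and $b$ = 1-cm path length (here $b$ denotes the cell path length, not the slope). The quantity $A/(b[\mathrm{I_2}]_0)$ is the x variable and $A/(b[\mathrm{I_2}]_0[\mathrm{M}]_0)$ the y variable, so the slope is $-K_c$ and the intercept is $\varepsilon K_c$. Table 2 and Figure 2 show results based on eq 9, which can be used to determine $K_c$ and $\varepsilon$. The spreadsheet approach is particularly useful in this case because it can be expanded to systematically estimate uncertainties in the x and y variables of eq 9. These are obtained from estimated errors in $A$ and in the volumes of the stock solutions used to prepare the mesitylene–iodine mixtures.

For the two examples, the σ values for slopes and intercepts determined by the Williamson method differ by about 10% from values determined by a y-error-only weighted least-squares analysis. Both methods give σ values that are 2–6 times higher than those determined from an unweighted least-squares analysis.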
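Computing the eq 9 coordinates from raw data, and turning the fitted line back into $K_c$ and $\varepsilon$, can be sketched as follows. The function names are hypothetical, and the uncertainty in $\varepsilon$ uses simple relative-error propagation that ignores the covariance between slope and intercept; that is an assumption of this sketch, not necessarily the authors' procedure. With the first mixture in Table 2 it gives x ≈ 4720 and y ≈ 2793 (Table 2 lists 4721 and 2793; the small difference presumably reflects rounding of the tabulated inputs), and with the Table 2 slope and intercept it gives $K_c$ = 0.466 ± 0.012 M⁻¹ and $\varepsilon$ ≈ (1.07 ± 0.03) × 10⁴ M⁻¹ cm⁻¹.

```python
import math


def eq9_coordinates(conc_m, conc_i2, absorbance, path_cm=1.0):
    """x = A/(b[I2]0) and y = A/(b[I2]0[M]0) for one mixture (b = path length in cm)."""
    x = absorbance / (path_cm * conc_i2)
    return x, x / conc_m


def kc_and_epsilon(slope, sigma_slope, intercept, sigma_intercept):
    """Eq 9 has slope -Kc and intercept eps*Kc, so Kc = -slope and eps = intercept/Kc.

    The error in eps uses simple relative-error propagation and ignores the
    slope-intercept covariance (an assumption of this sketch).
    """
    kc = -slope
    eps = intercept / kc
    sigma_eps = eps * math.hypot(sigma_intercept / intercept, sigma_slope / slope)
    return kc, sigma_slope, eps, sigma_eps


# First mixture of Table 2: [M]0 = 1.690 M, [I2]0 = 7.817e-5 M, A = 0.369, 1-cm cell.
x1, y1 = eq9_coordinates(1.690, 7.817e-5, 0.369)
print(f"x = {x1:.0f}, y = {y1:.0f}")

# Fitted line reported with Table 2.
kc, sigma_kc, eps, sigma_eps = kc_and_epsilon(-0.466, 0.012, 4990.0, 30.0)
print(f"Kc = {kc:.3f} +/- {sigma_kc:.3f} M^-1, "
      f"eps = ({eps / 1e4:.2f} +/- {sigma_eps / 1e4:.2f}) x 10^4 M^-1 cm^-1")
```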


Table 2. Data for the Mesitylene–Iodine Equilibrium

[Mesitylene]  [Iodine]   A      x of eq 9  est. σx  y of eq 9  est. σy
1.690         7.817E-5   0.369  4721       62       2793       38
0.9218        2.558E-4   0.822  3213       15       3486       18
0.6338        3.224E-4   0.787  2441       11       3851       19
0.4829        3.573E-4   0.703  1967        9       4074       21
0.3900        3.788E-4   0.624  1647        8       4224       23
0.3271        3.934E-4   0.556  1414        8       4321       26

Slope = −0.466 ± 0.012 (1σ) M⁻¹ (= −Kc)
Intercept = 4990 ± 30 (1σ) (ε = 1.07 ± 0.03 (1σ) × 10⁴ M⁻¹ cm⁻¹ at 332 nm)
S/(N − 2) = 0.049

Figure 2. Absorbance (332 nm)–concentration data for mixtures of I2 and mesitylene in CCl4 solution at 21 °C. The error bars are in 2σ units. The derived values of $K_c$ and $\varepsilon$(332 nm) are given in Table 2 and may be compared with ref 11.

The slopes and intercepts differ by less than 1% from those calculated by the simpler methods. However, for data that are more scattered, the discrepancy in these terms will be greater: almost 30% for the slope and intercept in the test set given by York and Williamson (8, 6).

Variance expressions for the slope and intercept, eqs 6 and 7, are often multiplied by $S/(N-2)$, where $S$ is given by eq 1, to give a "standardized" variance. Williamson notes that this procedure, which amounts to assuming that the individual variances in X and Y are off by the same constant factor, is not likely to be correct (6). If the error estimates for the individual x and y data are reasonable, then the calculated standard deviations $\sigma_a$ and $\sigma_b$ must be good estimates of the uncertainty in the $a$ and $b$ values. At the same time, deviations of the experimental data from the optimum line should give a value of $S/(N-2)$ in the neighborhood of 1, the expected value of $\chi^2/(N-2)$ for the fitted data. The low values of $S/(N-2)$ for the data in Tables 1 and 2 indicate that the points are "better" than might be expected on a statistical basis. This may not be too surprising, since some subjective selection often occurs when a particular data set is chosen for presentation. Error estimates can also be obtained by an analysis involving $\chi^2 + 1$ contour plots (12). If that method is used, the estimated errors in $a$ and $b$ will differ from those reported here by the factor $(S/(N-2))^{1/2}$.
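The difference between the unscaled and the "standardized" uncertainties, and the sense in which S/(N − 2) = 0.049 is "better" than expected, can be made concrete with a small sketch. It assumes Gaussian errors and correct σ estimates, so that $S$ follows a $\chi^2$ distribution with N − 2 degrees of freedom; the closed-form $\chi^2$ distribution function used here is valid only for an even number of degrees of freedom, and the function names are illustrative. For Table 2 (N = 6), it scales $\sigma_b$ = 0.012 down to about 0.0027 and shows that a value of $S$ this small would arise by chance with a probability of only about 0.45%.

```python
import math


def standardized_sigma(sigma, S, n_points):
    """Multiply the variance by S/(N-2), i.e. scale sigma by sqrt(S/(N-2))."""
    return sigma * math.sqrt(S / (n_points - 2))


def chi2_cdf_even_dof(chi2, dof):
    """P(chi-square <= chi2) for even dof: 1 - exp(-chi2/2) * sum_{k<dof/2} (chi2/2)^k / k!."""
    if dof <= 0 or dof % 2:
        raise ValueError("the closed form used here needs a positive, even dof")
    half = chi2 / 2.0
    term = total = 1.0
    for k in range(1, dof // 2):
        term *= half / k          # builds (chi2/2)^k / k! incrementally
        total += term
    return 1.0 - math.exp(-half) * total


n = 6                        # points in Table 2
S = 0.049 * (n - 2)          # recovered from the reported S/(N-2)
print(f"standardized sigma_b = {standardized_sigma(0.012, S, n):.4f}")
print(f"P(chi2 <= S) with {n - 2} dof = {chi2_cdf_even_dof(S, n - 2):.4f}")
```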

Acknowledgment

PJO received sabbatical support from the Chemistry Division and the Division of Educational Programs at Argonne National Laboratory for much of this work. We thank Jan P. Hessler of Argonne for pointing out the importance of the Williamson paper. We also thank Laurence E. Strong for his assistance in developing the computer program.

Literature Cited


1. Miynek, P. D. J. Chem. Educ. 1981, 68, 180.
2. Kalantar, A. H. J. Chem. Educ. 1984, 61, 28.
3. Christian, S. D.; Tucker, E. E. J. Chem. Educ. 1984, 61, 788. See also Christian, S. D.; Tucker, E. E. Am. Lab. 1982, 14, 36–41.
4. Irvin, J. A.; Quickenden, T. I. J. Chem. Educ. 1983, 60, 711–712.
5. Christian, S. D.; Lane, E. H.; Garland, F. J. Chem. Educ. 1974, 51, 475–476.
6. Williamson, J. H. Can. J. Phys. 1968, 46, 1845–1847.
7. Bevington, P. R. Data Reduction and Error Analysis for the Physical Sciences; McGraw-Hill: New York, 1969; p 107.
8. York, D. Can. J. Phys. 1966, 44, 1079–1086.
9. Shoemaker, D. P.; Garland, C. W. Experiments in Physical Chemistry, 2nd ed.; McGraw-Hill: New York, 1967; pp 5 3 4 1.
10. Daniels, F.; Alberty, R. A.; Williams, J. W.; Cornwell, C. D.; Bender, P.; Harriman, J. E. Experimental Physical Chemistry, 7th ed.; McGraw-Hill: New York, 1970; pp 106–112.
11. Andrews, L. J.; Keefer, R. M. J. Am. Chem. Soc. 1952, 74, 4500–4503.
12. Meyer, S. L. Data Analysis for Scientists and Engineers; Wiley: New York, 1975; pp 3–75.

Volume 69 Number 4 April 1992

A131