Linear least squares treatment when there are errors in both x and y

John A. Irvin¹ and Terry I. Quickenden
Department of Physical and Inorganic Chemistry, The University of Western Australia, Nedlands, W.A., 6009, Australia

When a straight line is fitted to experimental x, y data by the method of least squares (1-11) it is commonly assumed that the y value contains all the error in each data pair. Unfortunately, experimental data does not always conform to this restriction, and York (12) has presented examples in which the use of this assumption can cause fitted slopes to deviate by as much as 40% from the correct values. York (12) has also shown that the artifice of averaging the results of a least squares fit of y on x with the results of a fit of x on y can lead to equally large errors. Outside of the specialist statistical literature (9-11) there have been few attempts (12) to deal with this problem, and the literature of chemistry provides no practical advice on how to handle least squares calculations under these circumstances. The purpose of the present paper is to delineate the circumstances under which the conventional linear least squares treatment can be used and also to present an algorithm for carrying out linear least squares treatments in all cases when significant errors exist in both x and y.

Validity of the Conventional Treatment When Both Variables Contain Errors

It is possible to list three cases² in which the conventional linear least squares treatment can still be used even when there are significant errors in both x and y. If the fitted line is y = a + bx, these cases are:

(1) when $\sigma_{y_i}^2 \gg b^2 \sigma_{x_i}^2$, where $\sigma_{x_i}$ represents the standard error in $x_i$ and $\sigma_{y_i}$ represents the standard error in $y_i$; or

(2) when $\sigma_{x_i}/\sigma_{y_i}$ is constant for all data points; or

(3) when $\sigma_{y_i}^2 + b^2 \sigma_{x_i}^2$ is constant for all data points, providing in this case that all points are assigned the same arbitrary error in y (i.e., that an unweighted least squares fit is applicable)³.
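The three cases can be screened numerically before a conventional fit is trusted. The following is only a minimal sketch: the function name `check_conventional_cases` and the thresholds `dominance` and `rel_tol` are illustrative assumptions, since the paper gives no numerical criteria for "much greater than" or "constant", and `b` is assumed to be a preliminary slope estimate (for example from an ordinary fit).

```python
def check_conventional_cases(sigma_x, sigma_y, b, dominance=100.0, rel_tol=0.01):
    """Report which of the three cases appear to hold (hypothetical helper).

    dominance: assumed factor by which sigma_y^2 must exceed b^2 * sigma_x^2.
    rel_tol:   assumed relative spread below which a quantity counts as constant.
    """
    var_y = [sy * sy for sy in sigma_y]
    var_bx = [b * b * sx * sx for sx in sigma_x]

    def nearly_constant(values):
        mean = sum(values) / len(values)
        return max(values) - min(values) <= rel_tol * abs(mean)

    # Case (1): the error in y dominates at every point.
    case1 = all(vy >= dominance * vbx for vy, vbx in zip(var_y, var_bx))
    # Case (2): sigma_x / sigma_y is the same for every point.
    case2 = all(sy > 0 for sy in sigma_y) and nearly_constant(
        [sx / sy for sx, sy in zip(sigma_x, sigma_y)]
    )
    # Case (3): sigma_y^2 + b^2 * sigma_x^2 is the same for every point
    # (the fit must then be carried out unweighted).
    case3 = nearly_constant([vy + vbx for vy, vbx in zip(var_y, var_bx)])

    return {"case1": case1, "case2": case2, "case3": case3}


result = check_conventional_cases([0.1, 0.1, 0.1], [0.2, 0.2, 0.2], b=1.0)
```

If none of the three entries is true, the conventional treatment should not be relied upon for that data set.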

¹ Present address: "Photocare," Belmont Plaza, Abernethy Rd., Belmont, W.A., 6104, Australia.

² The conventional treatment can, of course, also be applied in the obvious case where most of the error is concentrated in x instead of y (i.e., when the reverse of the inequality in condition (1) holds) simply by carrying out the fit using y as the independent variable.

³ The treatment described by Worthing and Geffner (13), in which $\sigma_{x_i}$ and $\sigma_{y_i}$ are both constant for all data points, is a subset of this special case. It is unnecessary to use the special equations provided by these authors as the conventional treatment in which x is assumed to be error-free gives the correct solution in this case. The proof of this statement follows simply from an inspection of the form of the equations given in connection with the flow-chart. If $w_i$ is constant for all i, then $w_i$ may be taken out of the summations and cancelled out of the numerators and denominators of all the equations.

A General Linear Least Squares Treatment

Unfortunately, the above conditions are not always met, and it is indeed rare for them to even be checked. In order to meet the needs of those whose data does not meet the above conditions, or who do not wish to run through these checks each time a least squares calculation is carried out, an algorithm is now presented for the generation of a relatively simple program which can be used to fit a straight line to x, y data when the errors in x and y are in any ratio whatsoever. The algorithm is presented in flow-chart form and can readily be converted to a Fortran or Basic program by a person possessing moderate programming experience. The algorithm can also be programmed for use on portable programmable calculators possessing magnetic card memories, although the program may need to be separated into two for smaller machines. As an example, the authors have programmed the flow-chart using an HP 25 calculator. Each iteration required two separate programs and convergence was generally achieved in two iterations. The algorithm has also been used for several years as a Fortran program on a DEC 10 computer in both teaching and research laboratories. No operating problems have arisen.

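Equations for Use with the Flow Chartᵃ

Covariance: [expression not recoverable from the source]

Correlation coefficient:

$$ r = \frac{\sum w_i \sum w_i x_i y_i - \sum w_i x_i \sum w_i y_i}{\left\{ \left[ \sum w_i \sum w_i x_i^2 - \left( \sum w_i x_i \right)^2 \right] \left[ \sum w_i \sum w_i y_i^2 - \left( \sum w_i y_i \right)^2 \right] \right\}^{1/2}} $$

ᵃ $\sum_{i=1}^{n}$, in which n is the number of data points, is abbreviated by $\sum$ in both the flow-chart and in the above.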
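As a rough illustration of how the weighted sums combine, the weighted correlation coefficient can be evaluated as sketched here. The function name `weighted_correlation` is an assumption made for this example, and the weights `w` are taken as already computed by whichever branch of the flow chart applies.

```python
import math


def weighted_correlation(x, y, w):
    """Weighted correlation coefficient built from the weighted sums."""
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swyy = sum(wi * yi * yi for wi, yi in zip(w, y))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))

    numerator = sw * swxy - swx * swy
    denominator = math.sqrt((sw * swxx - swx ** 2) * (sw * swyy - swy ** 2))
    return numerator / denominator


# Points lying exactly on y = 1 + 2x, all with unit weight.
r = weighted_correlation([1, 2, 3, 4], [3, 5, 7, 9], [1, 1, 1, 1])
```

Multiplying every weight by the same constant leaves the result unchanged, since the constant cancels between numerator and denominator.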

Volume 60 Number 9 September 1983 711

Ample opportunity has been provided for cross-checking the output in this period. As physical scientists frequently require error estimates for the quantities derived from the slope and the intercept of a least squares line, an important feature of the algorithm is the calculation of slope and intercept errors. The correlation coefficient also forms part of the output.

Theory of the Treatment

The problem of finding the line of best fit to experimental data is usually (1, 8) solved by minimizing the weighted sum of the squares, s, of the deviation of the measured points from the fitted line. The quantity, s, is given by

$$ s = \sum_{i=1}^{n} \frac{(y_i - a - b x_i)^2}{\sigma_{y_i}^2 + b^2 \sigma_{x_i}^2} \qquad (1) $$

and is minimized with respect to the independent variables, a and b, which respectively represent the intercept and slope of the least squares line.

The exact solution to this problem requires (10) a complicated numerical approach when $\sigma_{x_i}$ is non-zero. This type of treatment requires substantial programming expertise. An approximate solution has been provided by Deming (1) but is still somewhat complex for the novice programmer. In the present work, we have used a simpler, but mathematically equivalent (9) approximation. Here, b is kept constant in the denominator of eqn. (1) and s is then minimized with respect to a and b in the numerator. The new estimate of b thus produced is inserted into the denominator and the process reiterated until convergence is obtained.

It has been shown by Powell and Macdonald (10) that even in a very adverse case, the approximate solution converges to give a slope and an intercept which each lie within a few percent of the exact solution. This is a very substantial improvement upon the performance of the conventional linear least squares method, which can introduce (12) discrepancies as high as 40% in the slope and the intercept when the error in x is ignored.

The only significant deficiency in the present treatment is that it is possible in extreme cases for the standard errors in the slope and intercept to be too large by up to 40% (10). Occasional overestimation of the errors in the slope and intercept is a small price to pay for the simplicity of the algorithm presented here, and normally this will not pose a significant problem. In situations where the intercept and slope errors must be known with a high degree of reliability, the reader is referred to the exact treatment of Powell and Macdonald (10), which makes much more substantial demands on programming expertise and on computational facilities.

Acknowledgment

J. A. I. gratefully acknowledges the tenure of a postgraduate research studentship from the Australian Institute of Nuclear Science and Engineering during the earlier stages of this work.

Literature Cited

(5) Christian, S. D., Lane, E. H., and Garland, F., J. CHEM. EDUC., 51, 475 (1974).
(6) Sands, D. E., J. CHEM. EDUC., 51, 477 (1974).

Flow chart for the linear least squares program. Recoverable box text, in order:

Start
Input n
Are errors in both x and y to be fed in?
    Yes: Input ($x_i$, $y_i$, $\sigma_{x_i}$, $\sigma_{y_i}$) for i = 1 to n. Input maximum number of iterations.
    No: Are errors in y only to be fed in?
        Yes: Input ($x_i$, $y_i$, $\sigma_{y_i}$), then evaluate $w_i = 1/\sigma_{y_i}^2$ for i = 1 to n.
        No: Input ($x_i$, $y_i$) for i = 1 to n. Set $w_i = 1$ for i = 1 to n.
Evaluate the slope ...
Were errors in both x and y fed in at the input step?
... so far and the slope ...
Has the number of iterations exceeded the maximum?
... for i = 1 to n using the new slope
Evaluate the error in the slope and the intercept, the covariance and the correlation coefficient.
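The iterative procedure described in the text (hold b fixed in the weights of eqn. (1), solve the resulting weighted fit, insert the new b into the weights, and repeat) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' program: the function names, the unweighted starting fit, the convergence tolerance `tol`, and the textbook weighted least squares expressions for the slope, intercept and their standard errors are all assumptions, because the corresponding equations in the table did not survive in this copy.

```python
import math


def weighted_line(x, y, w):
    """Weighted least squares line y = a + b*x for fixed weights w."""
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    delta = sw * swxx - swx ** 2
    b = (sw * swxy - swx * swy) / delta
    a = (swxx * swy - swx * swxy) / delta
    # Textbook standard errors for absolute weights (assumed, see lead-in).
    sigma_a = math.sqrt(swxx / delta)
    sigma_b = math.sqrt(sw / delta)
    return a, b, sigma_a, sigma_b


def fit_line_xy_errors(x, y, sigma_x, sigma_y, max_iter=20, tol=1e-10):
    """Iteratively reweighted fit with errors in both x and y (sketch)."""
    # Assumed starting point: an unweighted fit gives the first slope.
    a, b, sigma_a, sigma_b = weighted_line(x, y, [1.0] * len(x))
    iterations = 0
    for iterations in range(1, max_iter + 1):
        # Weights from the denominator of eqn. (1), with b held constant.
        w = [1.0 / (sy ** 2 + b * b * sx ** 2) for sx, sy in zip(sigma_x, sigma_y)]
        a, b_new, sigma_a, sigma_b = weighted_line(x, y, w)
        converged = abs(b_new - b) <= tol * max(1.0, abs(b_new))
        b = b_new
        if converged:
            break
    return a, b, sigma_a, sigma_b, iterations


a, b, sigma_a, sigma_b, n_iter = fit_line_xy_errors(
    [0.0, 1.0, 2.0, 3.0], [1.1, 2.9, 5.2, 6.8],
    [0.1, 0.2, 0.1, 0.3], [0.2, 0.1, 0.3, 0.1],
)
```

Setting every entry of `sigma_x` to zero reduces the weights to $1/\sigma_{y_i}^2$, so the same routine then reproduces the conventional weighted fit. A point with both errors equal to zero would cause a division by zero and is not handled in this sketch.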

Journal of Chemical Education