Sherril D. Christian, Edwin H. Lane, and Frank Garland
The University of Oklahoma, Norman, 73069

Linear Least-Squares Analysis: A Caveat and a Solution

Chemists are generally familiar with methods of least squares analysis; numerous articles in this Journal have discussed the applicability and limitations of linear and nonlinear least squares techniques. Unfortunately, several aspects of least squares, particularly in regard to the use of weights and error analysis, appear to be poorly understood by most experimenters. The present article deals with the use of weighted linear least squares analysis in fitting experimental results in which all of the independent and dependent variables are subject to errors of measurement. It is assumed here that a number of sets of measurables (X_1, Y_1, Z_1, ...; X_2, Y_2, Z_2, ...; ...; X_n, Y_n, Z_n, ...) has been obtained experimentally, and that the experimenter has reason to believe that there exists a functional relation among these variables of the form y = ax + bz + ... which would be followed exactly by the observables were it not for errors of measurement. The problem of least squares analysis is to obtain the values of a, b, ... which produce a minimum in the weighted sum of squares of residuals, given by

    S = \sum_i ( w_{y_i} R_{y_i}^2 + w_{x_i} R_{x_i}^2 + w_{z_i} R_{z_i}^2 + \cdots )    (1)

The individual residuals (R_{y_i}, R_{x_i}, ...) appearing in eqn. (1) represent differences between the observed values of the variables and adjusted values conforming exactly to the equation y = ax + bz + ... in which a, b, ... assume their least squares values. A variational method, which will not be described here, is ordinarily used to obtain the least squares values of the parameters and the adjusted values of the observed variables (y_i, x_i, z_i, ...) from sets of the observables (Y_i, X_i, Z_i, ...) (1-3). The comment should be made here, however, that the solution of the variational problem commonly given in statistical references is based on the assumption that the weights (w_{y_i}, w_{x_i}, w_{z_i}, ...) are functions of the variances of the observable quantities but not of the parameters a, b, .... Thus, in differentiating S to obtain the derivatives ∂S/∂a, ∂S/∂b, etc., the weights are treated as constants.

An alternative form of eqn. (1) has been used in least squares analysis. In terms of the observed variables, one may write

    S = \sum_i w_i (Y_i - a X_i - b Z_i - \cdots)^2    (2)

where S is again to be minimized with respect to the parameters a, b, .... The weights, w_i, are now necessarily functions both of the variances of the individual observables (σ_{Y_i}², σ_{X_i}², σ_{Z_i}², ...) and of the values of the parameters (4, 5). To see how w_i should depend on the values of the parameters and variances, consider the following semi-intuitive development. One of the residual terms in eqn. (2), for the ith data point, is

    R_{y_i} = Y_i - a X_i - b Z_i - \cdots    (3)

Imagine now that sets of values of the ith data point (Y_i, X_i, Z_i, ...) are measured over and over again; i.e., assume that the conditions of the experiment are fixed exactly (insofar as that is possible chemically and physically) and that values of all the observables for the ith point are obtained repeatedly. Now, even if the true values of a, b, ... were known exactly, the variability of R_{y_i} would obviously be greater than that of Y_i alone.¹ For small variations in all of the observables for the ith point, one may express the variation in R_{y_i} as

    \delta R_{y_i} = \delta Y_i - a\,\delta X_i - b\,\delta Z_i - \cdots    (4)

and, assuming a random distribution of deviations in Y_i, X_i, Z_i, ..., with no correlation among the variances in the measurables, one may write that on the average

    \overline{\delta R_{y_i}^2} = \overline{\delta Y_i^2} + a^2\,\overline{\delta X_i^2} + b^2\,\overline{\delta Z_i^2} + \cdots

This latter equation becomes

    \sigma_{R_{y_i}}^2 = \sigma_{Y_i}^2 + a^2 \sigma_{X_i}^2 + b^2 \sigma_{Z_i}^2 + \cdots    (5)

when the squares of the standard errors, or variances, are substituted for the individual \overline{\delta^2} terms. Thus, the weighting factor to use in eqn. (2) for the ith data point is

    w_i = 1/\sigma_{R_{y_i}}^2 = 1/(\sigma_{Y_i}^2 + a^2 \sigma_{X_i}^2 + b^2 \sigma_{Z_i}^2 + \cdots)    (6)

and the expression for S may be written

    S = \sum_i \frac{(Y_i - a X_i - b Z_i - \cdots)^2}{\sigma_{Y_i}^2 + a^2 \sigma_{X_i}^2 + b^2 \sigma_{Z_i}^2 + \cdots}    (7)
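As a concrete illustration (a minimal sketch, not the authors' program; the data arrays and variance estimates are hypothetical placeholders), eqn. (7) may be evaluated for trial parameter values in a few lines of Python:

    import numpy as np

    def s_eqn7(params, Y, X, Z, var_y, var_x, var_z):
        # Weighted sum of squares of eqn. (7): the trial parameters a, b
        # enter the effective variance in the denominator as well as the
        # residual in the numerator.
        a, b = params
        resid = Y - a * X - b * Z                       # R_yi, eqn. (3)
        eff_var = var_y + a**2 * var_x + b**2 * var_z   # eqns. (5) and (6)
        return np.sum(resid**2 / eff_var)

Because a and b appear in the denominator, S is not a quadratic function of the parameters, and the familiar normal equations of linear least squares do not locate its minimum.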

For those familiar with the χ² statistic, it is worth noting that the value of S obtained by substituting the least squares values of a, b, ... into eqn. (7) is expected to be on the average equal to the number of degrees of freedom (i.e., the number of data points less the number of parameters). Therefore, if the individual variances in X_i, Y_i, and Z_i are known, tables of the χ² distribution may be used to determine whether or not the least squares solution provides an adequate fit of the data.
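In place of tables, the test is one line with a statistics library. The following sketch assumes the nine-point, two-parameter fit of the absorbance example below and, for illustration, the properly minimized S reported in Table 2:

    from scipy.stats import chi2

    n, p = 9, 2        # data points and fitted parameters (a and b)
    dof = n - p        # the value S should approach at the minimum
    s_min = 5.66       # minimized S for the absorbance data (Table 2)
    # Probability of observing S at least this large if the model and the
    # assumed variances are correct; a very small value signals a poor fit.
    p_value = chi2.sf(s_min, dof)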

Equation (7) is an interesting and convenient starting point for a numerical treatment of linear least squares. In attempting to determine the values of a, b, ... which minimize S, the form of eqns. (6) and (7) makes it obvious that one cannot legitimately neglect the dependence of w_i on the parameters unless one of the terms (σ_{Y_i}², a²σ_{X_i}², b²σ_{Z_i}², ...) is much larger than the sum of all the others for each value of i. If one of the terms is dominant, eqn. (7) simply reduces to the well-known form of least squares which applies when only one of the observables (e.g., Y) is subject to error.

In the general case, however, it is not justifiable to ignore the variation of w_i with a, b, ... in differentiating S to obtain ∂S/∂a, ∂S/∂b, .... Yet, unfortunately, this is equivalent to what is done in most standard treatments of the linear least squares problem. The procedure commonly developed corresponds to seeking a minimum in the function

    S' = \sum_i \frac{(Y_i - a X_i - b Z_i - \cdots)^2}{\sigma_{Y_i}^2 + a_0^2 \sigma_{X_i}^2 + b_0^2 \sigma_{Z_i}^2 + \cdots}

by treating the parameters a, b, ... as variables in the numerator while holding them constant (at trial values a_0, b_0, ...) in the denominator. The fact that iterative procedures are sometimes used, in which estimated values of the least squares parameters determined in one linear least squares cycle are employed to derive values of the weights for use in the next cycle, does not surmount the basic fallacy in the method.

The point of the foregoing criticism, and the implied caveat to those who would apply conventional least squares methods, can be clarified by considering the analysis of actual data sets. Table 1 consists of absorbance values determined by infrared spectroscopy at three wavelengths and 9 different total pressures for trimethylamine-acetylene vapor mixtures (6). For purposes of the present discussion it will be assumed that random errors occur in absorbance measurements independent of wavelength or vapor concentration and that the standard error in absorbance is 0.005 for all measured A values. It is assumed that the absorbances at the three wavelengths should obey the linear dependence relation A_1 = aA_2 + bA_3. Results obtained by applying the usual weighted linear least squares method to these data are given in Table 2. When the absorbance at wavelength 1 (A_1) is selected as the dependent variable and fitted as a linear function of A_2 and A_3 (in the form given above), the result is not even approximately equivalent to the solutions obtained by fitting A_2 as a function of A_1 and A_3, or A_3 as a function of A_1 and A_2. Widely disparate values of S are obtained, and the constants in the linear expressions are not algebraically consistent from one choice of dependent variable to the next. Clearly this is an unacceptable result, in that it indicates that there are three different "least squares" planes which are "best" fits; in fact, none of the first three solutions in Table 2 approximates that which minimizes S as given by eqn. (7). In fitting the data given in Table 1 to the linear form A_1 = aA_2 + bA_3, the expression for S which is equivalent to eqn. (7) is²

    S = \frac{1}{(0.005)^2} \sum_i \frac{(A_{1i} - a A_{2i} - b A_{3i})^2}{1 + a^2 + b^2}    (8)

This function can be minimized with respect to a and b by a nonlinear optimization technique. Details of the procedure are not given here, but it may be worth mentioning that the general problem of optimizing functions of the type given by eqn. (7), with any number of parameters (a, b, c, ...) and with variances which vary from point to point, can be reduced to a one-parameter nonlinear optimization problem in addition to the usual weighted linear least-squares procedure (7). A computer program is available for minimizing eqn. (7) and calculating the least squares parameters a, b, .... The last row in Table 2 shows results obtained by correctly minimizing S for the absorbance data from Table 1.
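For readers who wish to reproduce this kind of calculation, the sketch below minimizes eqn. (8) with a general-purpose Nelder-Mead simplex search rather than the reduction of ref. (7). The nine absorbance triples are synthetic stand-ins generated from an assumed relation, not the data of Table 1:

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(1)
    # Synthetic data: exact relation A1 = 1.5*A2 - 0.2*A3 (an assumed,
    # illustrative choice), with normal errors of 0.005 added to every
    # measured absorbance, as in the text.
    A2_true = np.linspace(0.03, 0.60, 9)
    A3_true = np.linspace(0.07, 0.90, 9)
    A1 = 1.5 * A2_true - 0.2 * A3_true + rng.normal(0.0, 0.005, 9)
    A2 = A2_true + rng.normal(0.0, 0.005, 9)
    A3 = A3_true + rng.normal(0.0, 0.005, 9)

    def s_eqn8(params):
        # eqn. (8): the trial parameters a, b appear in the denominator,
        # so the weights change as the search moves.
        a, b = params
        resid = A1 - a * A2 - b * A3
        return np.sum(resid**2) / ((1.0 + a**2 + b**2) * 0.005**2)

    result = minimize(s_eqn8, x0=[1.0, 0.0], method="Nelder-Mead")
    a_fit, b_fit = result.x
    s_min = result.fun      # compare with n - p = 7 (see the χ² test above)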


Table 1. Absorbance Values Determined by IR Spectroscopy for 9 Different Total Pressures of Trimethylamine-Acetylene Vapor Mixtures

    Run No.   Wavelength 1 (1954 cm⁻¹)   Wavelength 2 (1976 cm⁻¹)   Wavelength 3 (2312 cm⁻¹)
    1         0.018                      0.026                      0.072
    ...       ...                        ...                        ...

Table 2. Results from the Usual Weighted Linear Least Squares Treatment for Data in Table 1 (A_i = Absorbance at Wavelength i)

    Dependent variable                Linear least squares result       S (eqn. (7))
    A_1                               ...                               ...
    A_2                               A_2 = 0.646 A_1 + 0.091 A_3       7.89
    A_3                               ...                               ...
    Proper minimization of eqn. (7)   A_1 = 1.550 A_2 - 0.166 A_3       5.66

The value of S obtained when one allows for the variability of a and b in the denominator of eqn. (8) in this case approximates most closely the value obtained with the conventional least squares method when A_2 is fit as a linear function of A_1 and A_3. However, the correct least squares solution leads to values of the parameters which are considerably different from those obtained in any of the conventional least squares solutions.

There appear to be many cases in which one of the conventional least squares solutions (with a particular observable used as the dependent variable) will be approximately equivalent to that obtained by the more general (and, we think, preferable) method given here. But, as in the case shown above, there are also instances where the least squares fit and minimum value of S obtained by minimizing eqn. (7) are considerably different from those obtained with standard methods. Therefore, if standard techniques are employed in curve-fitting, it is advisable that each variable in turn be chosen as dependent, and that all of the possible solutions be examined before reporting least squares fitting parameters. However, even if this procedure is followed, it should be realized that the best of the conventional least squares solutions may not be in satisfactory agreement with the solution which minimizes eqn. (7).
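That advice is easy to automate. A minimal sketch (using the same synthetic stand-in data as before; none of these numbers come from Table 1) fits each absorbance in turn as the dependent variable and exposes the inconsistency:

    import numpy as np

    rng = np.random.default_rng(1)
    A2_true = np.linspace(0.03, 0.60, 9)
    A3_true = np.linspace(0.07, 0.90, 9)
    A1 = 1.5 * A2_true - 0.2 * A3_true + rng.normal(0.0, 0.005, 9)
    A2 = A2_true + rng.normal(0.0, 0.005, 9)
    A3 = A3_true + rng.normal(0.0, 0.005, 9)

    def ols(dep, r1, r2):
        # With equal assumed errors (0.005 everywhere), the conventional
        # weighted fit reduces to ordinary least squares on whichever
        # variable is called "dependent".
        coef, *_ = np.linalg.lstsq(np.column_stack([r1, r2]), dep, rcond=None)
        return coef

    c12 = ols(A1, A2, A3)   # A1 = c12[0]*A2 + c12[1]*A3
    c21 = ols(A2, A1, A3)   # A2 = c21[0]*A1 + c21[1]*A3
    c31 = ols(A3, A1, A2)   # A3 = c31[0]*A1 + c31[1]*A2
    # Algebraic consistency would require c21 ≈ (1/c12[0], -c12[1]/c12[0]);
    # with errors in every variable, the three planes generally disagree.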

Footnotes

¹ Note that the expectation values of R_{y_i} are zero when the least squares values of a, b, ... are substituted into eqn. (3); however, errors of measurement will lead to a distribution of values of R_{y_i} having an average value of zero and variance σ_{R_{y_i}}².

² Assuming that there is a unique least squares solution which leads to an absolute minimum in S, it turns out to be immaterial whether we select A_1, A_2, or A_3 as the dependent variable. If A_2 is chosen as dependent, the individual residuals in the numerator of eqn. (8) will all be multiplied by 1/a, and the sum of squares of residuals by 1/a². However, the denominator becomes 1 + (1/a²) + (b²/a²), which is just the denominator in eqn. (8) multiplied by 1/a². This shows that fundamentally there is no meaningful distinction between the terms dependent and independent variable as applied to this problem, and that the final least squares fit of the data will not be influenced by the order in which the variables are arranged in the regression equation.

Acknowledgment

E. H. L. wishes to express his appreciation for a graduate fellowship from the National Science Foundation. The authors also wish to thank Dr. Eric Enwall and Dr. Bradford Crain for many helpful discussions.

Literature Cited

(1) Deming, W. E., "Statistical Adjustment of Data," Wiley and Sons, New York, New York, 1943.
(2) Sprent, P., "Models in Regression and Related Topics," Methuen and Co., London, 1969.

(3) Wolberg, J. R., "Prediction Analysis," Van Nostrand Co., Princeton, N.J., 1967.
(4) Reference (1), pp. 34 ff. and p. 145.
(5) Reference (2), Chapter 3.
(6) McNutt, W., M.S. Thesis, The University of Oklahoma, 1974.
(7) Lane, E. H., Christian, S. D., and Garland, F., unpublished work.