W. E. Wentworth

University of Houston, Houston, Texas

Rigorous Least Squares Adjustment: Application to Some Non-linear Equations, I

The importance of statistical treatment of data is generally recognized. This is especially true in recent years, as evidenced by the increased use of statistical analysis in the literature and by recent texts on the subject as applied to each of the branches of science. For example, in chemistry a well-known book by Youden (1) presents many aspects of the treatment of data in a manner which can readily be utilized by chemists. More recently, a text was published by Young (2) in which the subject was presented in such a manner that it can be used conveniently within a course at the sophomore, junior, or senior level of the undergraduate curriculum. Articles (3-5) published in THIS JOURNAL discuss the Gaussian distribution and describe two experiments on the subject for physical chemistry.

If there has been any hesitancy on the part of investigators to use statistical analysis, it probably has stemmed from the rather time consuming and tedious calculations which are involved. This is especially true in least squares curve fitting where the function relating the variables is complicated and/or involves several parameters to be determined. However, in more recent years, with the advent of high speed computers, this task can be eliminated. Considerable effort may be required initially to set up and check out the program, but in situations where data from a large number of experiments can be analyzed by the same program, the initial work is more than justified. For example, in this laboratory a large number of experiments involving equilibrium measurements and subsequent evaluation of thermodynamic parameters is under way. In these cases most of the equilibria can be expressed by the same functions; hence the same computer program can be utilized numerous times.

In the future we will most likely find an even greater use of electronic computers in analyzing data and, subsequently, a greater interest in the application of more refined techniques. For this reason it is essential that we give our students an adequate background in statistical analysis. The author believes that this is most essential in physical chemistry, in the general problem of curve fitting, where a wide variety of functions of varying complexity are encountered. A general treatment of the least squares solution is presented by Deming (6). The specific case of curve fitting with parameters has been extracted from the general theory of Deming so that it can be understood more easily by an undergraduate student. An alternative yet simpler derivation is given also, along with recent suggestions to enhance the convergence of the iterative process. A condensed summary of the necessary computations, followed by the application to two functions encountered in physical chemistry, will complete this paper.

Limitations of the Usual Presentation to Undergraduates

The physical chemistry texts available discuss at some length the treatment of experimental data within the subject of least squares. They also present quite adequately the Gaussian distribution and its application to the analysis of a straight line or possibly a general polynomial. Of course, the majority of the functions encountered in physical chemistry are not linear or polynomial, and a least squares analysis with this background does not allow the student to solve these problems. Frequently the statement is made that various functions can be put into a linear relationship between various combinations of the variables. However, this statement is seldom followed up by an appropriate discussion of weighting the observations or data points. In the process of making a function linear in terms of some new variables, the new variables may take on some rather weird relationships with respect to the observed variables. As a consequence the new variables derived from the original observations rarely have equal weights. It is therefore obvious that a discussion of weighting observations is essential in order to obtain the correct least squares solution. A quote from the text by Deming (6) (p. 195) concerning the least squares adjustment of the exponential function

$$y = ae^{bx}$$

is appropriate to this discussion: "It is customary among computers to fit the exponential equation by taking logarithms and treating it as linear in log y and x, but it is not so usual for them to change the weightings to correspond to the logarithms. The neglect of the factor $(2.30y)^2$ not only distorts the results for a and b, but also invalidates the reciprocal matrix and all calculations made with it on the standard error of a function of the parameters. . . ."
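To make the effect of this weighting concrete, the following sketch (my illustration, not part of the original paper; the data and numerical choices are hypothetical) fits $y = ae^{bx}$ by linear least squares on log y, once with unit weights and once with the weights $(2.30y_i)^2$ required when the original y observations are of equal weight.

```python
# Illustrative sketch: fitting y = a*exp(b*x) by taking logarithms.
# If the errors in y are of equal weight, then z = log10(y) carries
# weight w = (2.303*y)**2, since dz = dy/(2.303*y).
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 4.0, 25)
y = 3.0 * np.exp(-1.2 * x) + rng.normal(0.0, 0.05, x.size)  # equal errors in y
y = np.clip(y, 1e-6, None)                                  # keep logs defined

z = np.log10(y)                        # linear model: z = log10(a) + (b/2.303)*x
A = np.column_stack([np.ones_like(x), x])

def weighted_lsq(A, z, w):
    """Solve the normal equations A^T W A p = A^T W z."""
    W = np.diag(w)
    return np.linalg.solve(A.T @ W @ A, A.T @ W @ z)

p_unw = weighted_lsq(A, z, np.ones_like(x))    # naive: unit weights on log y
p_wtd = weighted_lsq(A, z, (2.303 * y) ** 2)   # weights changed to match the logs

for name, p in [("unweighted", p_unw), ("weighted", p_wtd)]:
    a, b = 10.0 ** p[0], 2.303 * p[1]
    print(f"{name:10s}: a = {a:.3f}, b = {b:.3f}")   # true values: a = 3.0, b = -1.2
```

The weighted fit down-weights the small-y points, whose logarithms are the noisiest; this is precisely the factor Deming warns is usually neglected.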

The importance of this statement will become more apparent at the end of this paper. In a recent laboratory text in physical chemistry (7), the expansion of a function in a Taylor's series about the parameters prior to the least squares adjustment eliminates the need of weighting observations in many situations. It is assumed that the function can be solved for the dependent variable and that the observations of the dependent variable are of equal weight. However, if the weights are not equal then we are back to the same problem as before. Furthermore, it is inherently assumed that the errors in the dependent variable predominate over the errors in the independent variables. Situations where this is true are not always obvious, especially to an inexperienced person. Nevertheless, it should be stated that this presentation is a very significant extension over other treatments of curve fitting, in that many complex functions containing several parameters can be adequately evaluated on the principle of least squares. The desirable feature in this treatment is the resulting linear relationship with respect to the parameters. Upon closer evaluation one will notice that the least squares calculations are simplified significantly. A Taylor's Series expansion will be used in this paper also, except that a more general least squares treatment will be carried out.

The purpose of this paper is not to criticize the present laboratory texts. In general the texts give an excellent presentation of the over-all problem of treatment of data. This is especially true when one considers the vast amount of material that must be covered in such a text and the space that can be allotted justifiably to each topic. The intent of this paper is to bridge the gap between the background normally acquired in the undergraduate curriculum and more advanced matrix treatments of the general problem of least squares. If the student, after graduation, should encounter a more complex problem in curve fitting in his research work, it is hoped that this paper will be helpful.

Importance of Systematic Errors

Before delving into the problem of least squares adjustment of data, it is appropriate to reiterate the topic of systematic errors and the importance of properly accounting for them before attempting to take into account random errors. All too frequently a systematic or bias error has been overlooked in an experiment, and as a result the residuals in the least squares adjustment to some function are distorted from their original distribution. As a consequence, under such circumstances the method of least squares becomes somewhat meaningless, since the sum of the weighted squared residuals has a different relationship to this new distribution with the bias error superimposed. If this bias or systematic error is great enough, the time expended in carrying out the least squares adjustment could be completely fruitless. Possibly a very simple example involving the use of the well-known Beer-Lambert absorption law will clarify this point. The function of concern is

$$A = -\log T = \epsilon c l \qquad (1)$$

where A = absorbance, T = transmittance, ε = molar extinction coefficient, c = molar concentration, and l = path length in centimeters. In order to carry out the absorption measurements it is necessary to use cuvettes, and generally these will not have identical transmittance characteristics. Assume the cuvettes have the same path length within a tolerance an order of magnitude smaller than the precision of the absorption measurements. Equation (1) may be rewritten for this practical situation as

$$A = -\log T = \epsilon c l + b \qquad (2)$$

where b = a constant which accounts for the difference in absorbance of the cells.

Offhand, one may be tempted to evaluate the constant b by filling both cuvettes with solvent (c = 0), whereupon the absorbance reading is simply b. This value could then be subtracted from successive absorbance readings for corresponding concentrations, and the least squares solution carried out considering (A − b) and c as observations and ε as the parameter to be determined in the equation

$$(A - b) = \epsilon c l \qquad (3)$$

However, in general this would be incorrect, since b is not known exactly but is only estimated from the absorbance measurement with solvent in the cuvettes. The least squares solution of equation (3), in which the curve is forced to go through the origin, will distort the residuals of the original observations, as can be seen in Figure 1. In this case assume the errors in the A measurements far exceed those in the c measurements. Note that the residuals using equation (3) are not only larger than those using equation (2), but they also are not randomly distributed about the least squares estimate of the function. In such a situation the use of a least squares adjustment is basically incorrect, since the individual residuals do not belong to Gaussian distributions. From this simple example it is apparent that systematic errors, such as the difference in absorbance of the cells, must be properly taken into account before considering the least squares adjustment. Many times this may be extremely difficult to do and may actually require more effort than that required to carry out the experiment. However, it is imperative that the student be aware of such problems in the treatment of data.

Figure 1. Hypothetical graph of absorbance measurements versus concentration: dashed line shows least squares fit to equation (3); solid line shows least squares fit to equation (2). δb = difference in the observation of b and the least squares estimate using equation (2).
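A short numerical sketch of this pitfall (my illustration; the simulated data and values are hypothetical, and Figure 1 is qualitative): fit equation (2) with b as a free parameter, then compare with subtracting a single noisy blank reading and forcing equation (3) through the origin.

```python
# Hypothetical Beer-Lambert data: A = eps*c*l + b, with errors mainly in A.
import numpy as np

rng = np.random.default_rng(7)
l, eps_true, b_true, sigma_A = 1.0, 120.0, 0.05, 0.01
c = np.linspace(0.001, 0.010, 10)              # molar concentrations
A = eps_true * c * l + b_true + rng.normal(0.0, sigma_A, c.size)

# Equation (2): fit A = eps*c*l + b with both eps and b adjustable.
X = np.column_stack([c * l, np.ones_like(c)])
eps_fit, b_fit = np.linalg.lstsq(X, A, rcond=None)[0]

# Equation (3): subtract one noisy blank reading, then force through origin.
b_blank = b_true + rng.normal(0.0, sigma_A)    # single solvent (c = 0) reading
eps_forced = np.sum((A - b_blank) * c * l) / np.sum((c * l) ** 2)

print(f"fit with intercept  : eps = {eps_fit:.1f}, b = {b_fit:.3f}")
print(f"forced through zero : eps = {eps_forced:.1f}")
# residuals retain a concentration-dependent trend from the blank error:
print("residuals:", np.round(A - b_blank - eps_forced * c * l, 4))
```

Because the blank reading carries its own error, the forced fit tilts the line to compensate, and the residuals acquire the systematic trend sketched in Figure 1.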

Basis for Least Squares Adjustment of Data

Assume that the errors of each of the observations of a given quantity belong to a Gaussian distribution, commonly expressed as

$$f(r_i) = \frac{1}{\sigma_x \sqrt{2\pi}} \exp\left( -\frac{r_i^2}{2\sigma_x^2} \right) \qquad (4)$$

where $\sigma_x$ = standard error (also called standard deviation or root mean square error), $\sigma_x^2$ = variance, and $r_i$ = true residual of x = (observed value − true value). The probability of observing the ith residual in the region $r_i$ to $r_i + dr_i$ is

$$P_i = \frac{1}{\sigma_x \sqrt{2\pi}} \exp\left( -\frac{r_i^2}{2\sigma_x^2} \right) dr_i \qquad (5)$$

Since the probability of obtaining a given set of n observations is simply the product of the probabilities of each of the ith measurements,

$$P = \prod_{i=1}^{n} P_i = \left( \frac{1}{\sigma_x \sqrt{2\pi}} \right)^n \exp\left( -\sum_{i=1}^{n} \frac{r_i^2}{2\sigma_x^2} \right) dr_1 dr_2 \cdots dr_n \qquad (6)$$

Now based upon the principle of maximum likelihood, the probability becomes a maximum when the sum of the squared residuals becomes a minimum.

$$\sum_{i=1}^{n} r_i^2 = \text{minimum} \qquad (7)$$

Hence, the origin of the term least squares is apparent. In the above discussion it is assumed that all measurements arise from the same distribution. However, in the more general situation where this is not true, equation (5) should be written as

$$P_i = \frac{1}{\sigma_i \sqrt{2\pi}} \exp\left( -\frac{r_i^2}{2\sigma_i^2} \right) dr_i \qquad (8)$$

Equation (6) then becomes

$$P = \prod_{i=1}^{n} \frac{1}{\sigma_i \sqrt{2\pi}} \exp\left( -\sum_{i=1}^{n} \frac{r_i^2}{2\sigma_i^2} \right) dr_1 dr_2 \cdots dr_n \qquad (9)$$

and the least squares principle then becomes

$$\sum_{i=1}^{n} \frac{r_i^2}{\sigma_i^2} = \text{minimum} \qquad (10)$$

If the weight of an observation is defined as a quantity inversely proportional to the variance,

$$w_i = \frac{\sigma_0^2}{\sigma_i^2} \qquad (11)$$

where $\sigma_0^2$ = variance of unit weight (an arbitrary constant), then the principle of least squares is

$$\sum_{i=1}^{n} w_i r_i^2 = \text{minimum} \qquad (12)$$

i.e., the sum of the weighted squares of the residuals is made a minimum.

In actual practice, of course, the true value of the quantity x is not known. However, the principle of least squares adjusts the estimate $\hat{x}$ of the true parameter such that the residuals

$$V_{x_i} = (x_i - \hat{x}) \qquad (13)$$

satisfy

$$\sum_{i=1}^{n} w_{x_i} V_{x_i}^2 = \text{minimum} \qquad (14)$$
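As a concrete check of equations (11)-(14) (an illustrative sketch, not from the paper; the numbers are invented): for repeated measurements of a single quantity with unequal variances, the estimate that minimizes the weighted sum of squared residuals is the weighted mean, with weights inversely proportional to the variances.

```python
# Illustrative: least squares estimate of a single quantity x from
# measurements of unequal variance, with weights w_i = sigma0^2/sigma_i^2.
import numpy as np

x_obs = np.array([10.02, 9.97, 10.10, 9.90])   # hypothetical observations
sigma = np.array([0.01, 0.02, 0.05, 0.05])     # their standard errors
w = sigma.min() ** 2 / sigma ** 2              # sigma0^2 chosen for convenience

# Setting dS/dxhat = 0 for S = sum(w*(x_obs - xhat)**2) gives the weighted mean:
x_hat = np.sum(w * x_obs) / np.sum(w)

# Numerical check that x_hat indeed minimizes S:
S = lambda t: np.sum(w * (x_obs - t) ** 2)
grid = np.linspace(9.8, 10.2, 4001)
assert abs(grid[np.argmin([S(t) for t in grid])] - x_hat) < 1e-3
print(f"weighted least squares estimate: {x_hat:.4f}")
```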

General Least Squares Adjustment to a Single Function with Uncorrelated Data

The general problem of least squares adjustment of data with uncorrelated errors is thoroughly presented by Deming (6). One section of this text is concerned with the specific problem of curve fitting to a single function containing parameters. In physical chemistry, and generally in chemistry as a whole, the adjustment is frequently concerned with a single function containing one or more parameters to be determined (or a system of equations which can be reduced to a single function). For this reason Deming's general treatment as applied specifically to curve fitting with parameters will be presented in this discussion. Under this restriction Deming's treatment can be simplified considerably. Deming's nomenclature has been followed closely so that the reader can refer back to the original work.

It will be assumed that pairs of observations $(x_i, y_i)$ are obtained of the variables (ξ, η), with a single function F relating these variables and containing in addition three parameters (α, β, γ) for which we wish estimates. The extension to more than two variables and more than three parameters will be obvious when the normal equations are developed. The treatment of the problem considering a varying number of variables (e.g., x, y, z) and a variable number of parameters is facilitated by the use of matrix algebra. Generally undergraduates taking physical chemistry have not been exposed previously to this level of mathematics. For reasons of clarity and simplicity this treatment will be concerned with a fixed number of variables and a fixed number of parameters.

Suppose a function relates two variables ξ, η and three parameters α, β, γ. Furthermore, assume that n pairs of observations are made of the variables, which will be designated by $(x_i, y_i)$, $i = 1, 2, \ldots, n$. Since n is a finite number it is not possible to evaluate the true variables (ξ, η) and the true parameters (α, β, γ). However, an estimate of these parameters may be found, designated by (a, b, c), based upon the criterion of least squares of the observations $(x_i, y_i)$. Designate the adjusted or calculated values of the variables by $(\hat{x}_i, \hat{y}_i)$, where $i = 1, 2, \ldots, n$, and the residuals of the observations by

$$V_{x_i} = x_i - \hat{x}_i, \qquad V_{y_i} = y_i - \hat{y}_i \qquad (15)$$
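For concreteness in the sketches that follow, consider a hypothetical condition function of this form (an exponential with background; my choice of example, not the paper's): F vanishes when the variables take their true values, and its partial derivatives supply the coefficients needed below.

```python
# Illustrative condition function F(x, y; a, b, c) = 0 relating two
# variables and three parameters: here F = y - (a*exp(b*x) + c).
import numpy as np

def F(x, y, a, b, c):
    """Condition function; zero when (x, y) lies exactly on the curve."""
    return y - (a * np.exp(b * x) + c)

# Partial derivatives used in the Taylor's Series expansion below:
def Fx(x, y, a, b, c): return -a * b * np.exp(b * x)    # dF/dx
def Fy(x, y, a, b, c): return 1.0                       # dF/dy
def Fa(x, y, a, b, c): return -np.exp(b * x)            # dF/da
def Fb(x, y, a, b, c): return -a * x * np.exp(b * x)    # dF/db
def Fc(x, y, a, b, c): return -1.0                      # dF/dc
```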

The least squares problem can now be stated mathematically as a desire to obtain a minimization of the sum of the squares of the weighted residuals,

$$S = \sum_{i=1}^{n} \left( W_{x_i} V_{x_i}^2 + W_{y_i} V_{y_i}^2 \right) = \text{minimum} \qquad (16)$$

under the restriction that the condition equations

$$F(\hat{x}_i, \hat{y}_i, a, b, c) = 0, \qquad i = 1, 2, \ldots, n \qquad (17)$$

be satisfied. The weights $W_{x_i}$, $W_{y_i}$ are defined in equation (11); $\sigma_0^2$ is generally selected so that the magnitude of the weights is convenient.

The solution of this problem is simplified considerably if the condition equations are linear with respect to the $x_i$, $y_i$, a, b, c. This can be accomplished very simply by expanding the function in a Taylor's Series about the point $(x_i, y_i, a_0, b_0, c_0)$ and truncating the series after the first order terms. It is assumed that satisfactory first approximations $(a_0, b_0, c_0)$ to the parameters can be obtained by graphical means or by a numerical solution of a system of equations (three equations in this case) with a selected set of the observations (three sets of $x_i, y_i$ in this case). Represent the difference between this first approximation to the parameters and the least squares estimate of these parameters by

$$\Delta a = a_0 - a, \qquad \Delta b = b_0 - b, \qquad \Delta c = c_0 - c \qquad (18)$$
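For the hypothetical exponential condition function introduced above, such first approximations can be obtained in closed form from three equally spaced observations (an illustrative recipe, not the paper's):

```python
# Illustrative first approximations (a0, b0, c0) for y = a*exp(b*x) + c
# from three selected observations with equally spaced x values.
import numpy as np

(x1, y1), (x2, y2), (x3, y3) = (0.0, 2.30), (1.5, 0.82), (3.0, 0.43)

# With x2 - x1 == x3 - x2 = h, the three equations y_k = a*exp(b*x_k) + c
# give (y3 - y2)/(y2 - y1) = exp(b*h) exactly:
h = x2 - x1
b0 = np.log((y3 - y2) / (y2 - y1)) / h
a0 = (y2 - y1) / (np.exp(b0 * x2) - np.exp(b0 * x1))
c0 = y1 - a0 * np.exp(b0 * x1)
print(f"a0 = {a0:.3f}, b0 = {b0:.3f}, c0 = {c0:.3f}")
```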

The Taylor's Series expansion of the condition equations (17) then becomes

$$F(\hat{x}_i, \hat{y}_i, a, b, c) = F^0(x_i, y_i, a_0, b_0, c_0) - F_{x_i} V_{x_i} - F_{y_i} V_{y_i} - F_a \Delta a - F_b \Delta b - F_c \Delta c = 0, \qquad i = 1, 2, \ldots, n \qquad (19)$$

where the usual nomenclature for the partial derivatives has been employed; e.g.,

$$F_{x_i} = \left( \frac{\partial F}{\partial x} \right)_{x_i, y_i, a_0, b_0, c_0}, \qquad F_a = \left( \frac{\partial F}{\partial a} \right)_{x_i, y_i, a_0, b_0, c_0}, \quad \text{etc.}$$
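A quick numerical check of this linearization, continuing the hypothetical F and its derivatives defined earlier (the two sketches are meant to be run together):

```python
# Verify the first order Taylor expansion of the condition equation
# about (x_i, y_i, a0, b0, c0) for one hypothetical observation.
x_i, y_i = 1.0, 2.5
a0, b0, c0 = 1.8, 0.3, 0.1

# Small residuals V and parameter corrections (recall Delta a = a0 - a):
VX, VY, da, db, dc = 0.01, -0.02, 0.005, -0.003, 0.002

F0 = F(x_i, y_i, a0, b0, c0)
linear = (F0 - Fx(x_i, y_i, a0, b0, c0) * VX - Fy(x_i, y_i, a0, b0, c0) * VY
             - Fa(x_i, y_i, a0, b0, c0) * da - Fb(x_i, y_i, a0, b0, c0) * db
             - Fc(x_i, y_i, a0, b0, c0) * dc)
exact = F(x_i - VX, y_i - VY, a0 - da, b0 - db, c0 - dc)
print(f"exact F = {exact:.6f}   linearized = {linear:.6f}")  # nearly equal
```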

There are now n + 3 differentials remaining in equation (24). However, it was previously decided that exactly this number, n + 3, of the differentials were arbitrary. Therefore it may be concluded that, since the remaining n + 3 differentials in equation (24) are arbitrary, the only way in which equation (24) can be satisfied is that the coefficient of each differential be equal to zero; i.e.,

$$W_{x_i} V_{x_i} = \lambda_i F_{x_i}, \qquad W_{y_i} V_{y_i} = \lambda_i F_{y_i} \qquad (25)$$

$$\sum_{i=1}^{n} \lambda_i F_{a_i} = 0, \qquad \sum_{i=1}^{n} \lambda_i F_{b_i} = 0, \qquad \sum_{i=1}^{n} \lambda_i F_{c_i} = 0 \qquad (26)$$

Equations (20), (25), and (26) comprise 3n + 3 equations in 3n + 3 unknowns, namely $V_{x_1}, \ldots, V_{x_n}$; $V_{y_1}, \ldots, V_{y_n}$; $\Delta a$, $\Delta b$, $\Delta c$; and $\lambda_1, \lambda_2, \ldots, \lambda_n$. Equations (25) can be solved for the residuals

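Pulling the development together, the following sketch (my illustrative implementation of this linearized adjustment, with invented data; the algebraic route from equations (19), (25), and (26) to the normal equations is carried out in the remainder of the paper) iterates the corrections $\Delta a$, $\Delta b$, $\Delta c$ for the hypothetical exponential condition function, allowing errors in both x and y. Eliminating the residuals and multipliers leaves three normal equations whose effective weights are $1/L_i$, with $L_i = F_{x_i}^2/W_{x_i} + F_{y_i}^2/W_{y_i}$.

```python
# Illustrative general least squares adjustment for
# F(x, y; a, b, c) = y - (a*exp(b*x) + c) = 0 with uncorrelated
# errors in both x and y. A sketch, not the paper's worked example.
import numpy as np

rng = np.random.default_rng(3)
n = 20
x_true = np.linspace(0.0, 3.0, n)
y_true = 2.0 * np.exp(-0.9 * x_true) + 0.3
sig_x, sig_y = 0.02, 0.03
x = x_true + rng.normal(0.0, sig_x, n)
y = y_true + rng.normal(0.0, sig_y, n)
Wx, Wy = 1.0 / sig_x**2, 1.0 / sig_y**2    # weights with sigma0 = 1

a, b, c = 2.009, -0.889, 0.291             # first approximations (a0, b0, c0)
for iteration in range(50):
    e = np.exp(b * x)
    F0 = y - (a * e + c)                   # F at the observations and (a0, b0, c0)
    Fx, Fy = -a * b * e, np.ones(n)        # dF/dx, dF/dy
    Fa, Fb, Fc = -e, -a * x * e, -np.ones(n)

    L = Fx**2 / Wx + Fy**2 / Wy            # effective variance of F0
    P = np.column_stack([Fa, Fb, Fc])
    # Normal equations (P^T diag(1/L) P) * Delta = P^T diag(1/L) F0:
    N = P.T @ (P / L[:, None])
    rhs = P.T @ (F0 / L)
    da, db, dc = np.linalg.solve(N, rhs)

    a, b, c = a - da, b - db, c - dc       # a = a0 - Delta a, etc. (eq. 18)
    if max(abs(da), abs(db), abs(dc)) < 1e-10:
        break

print(f"converged in {iteration + 1} iterations:")
print(f"a = {a:.4f}, b = {b:.4f}, c = {c:.4f}   (true values: 2.0, -0.9, 0.3)")
```

Each pass re-linearizes about the improved parameter values, and iteration continues until the corrections are negligible; this is exactly the iterative scheme whose convergence the suggestions mentioned in the introduction are intended to enhance.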