When, Why, and How to Use Weighted Least Squares

Robert de Levie, Georgetown University, Washington, DC 20057

Journal of Chemical Education 1986, 63 (1), 10-15

The ready availability of computers, often incorporated in modern instruments, affects the type of experimental data collected. With much of the tedium of manual data-taking gone, there is often less hesitance to acquire a large number of sequential experimental data. The larger the data set, the more attractive it becomes to reduce it, subsequently, to its essential parameters by statistical analysis; such a procedure promises the advantage of decreased sensitivity to random experimental errors.

There are two rather distinct approaches to data acquisition. One is to collect a relatively small number of data, and to repeat each of these a sufficient number of times so that their standard deviations can be calculated and can then be used to assign individual weights to those data points. This is, in general, the method of choice, because no assumptions need be made about the precision of the various data points; such precision is often found to be nonuniform in a given data set. Here we will focus on the often-practiced alternative of taking a single set of closely spaced data, in which case insufficient information is available to justify assigning separate weights to the individual data points. Nonetheless, a general, global weighting may still be required, viz., one dictated by any transformation used to adapt a given mathematical model to the mold of a polynomial, the common starting point for a noniterative least-squares analysis. Failure to do so can lead to a false choice between less than optimal results or a needlessly cumbersome, iterative procedure.

Unweighted Least Squares

The least-squares analysis is based on the observation that, often, random experimental errors closely follow a Gaussian distribution. (This distribution is so common that it is also referred to as the "normal" one.) In that case, minimizing the sum of the squares of the deviations between the data and the assumed model should yield the best estimate of the model parameters. A second assumption is often made as well, viz., that virtually all of the random experimental errors are concentrated in one measured parameter, while the other(s) can be considered to be free of such errors. When we sample, say, the optical absorbance of a transient species as a function of time, then the accuracy and precision of the timer used will usually be much superior to that of the absorbance, in which case we are justified in considering random errors in the absorbance only. (We are not dealing here with systematic errors, which can occur in both parameters and are usually much harder to catch.) In what follows, we will make both assumptions, namely that the only random errors to be considered are concentrated in one parameter, here denoted by Y, and that such errors are adequately described by a Gaussian distribution.

The Proportionality Y = AX

We consider a set of N data pairs X_i, Y_i which must be fitted to the proportionality Y = AX, where A is, as yet, unknown. Assuming that all experimental errors are concentrated in Y, we write for the deviation ΔY_i between the experimental Y value and that calculated from A times the corresponding X value


ΔY_i = Y_i − A X_i   (1)

where i denotes the ith data pair. Note that the proportionality constant A is not yet known; in fact, the whole point of the present exercise is to extract a value for A. We do that by requiring that A be chosen in such a way that the resulting sum of squares of the individual deviations ΔY_i will be minimal, so that its derivative is zero,

(d/dA) Σ_i (ΔY_i)² = 0   (2)

Combining eqs 1 and 2 and interchanging summation and differentiation, which is permissible because the derivative of a sum of terms is equal to the sum of the derivatives of the individual terms, yields

−2 Σ_i X_i (Y_i − A X_i) = −2 Σ_i X_i Y_i + 2A Σ_i X_i² = 0   (3)

so that

A = Σ_i X_i Y_i / Σ_i X_i²   (4)

which is the sought formula with which to calculate A.
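As a quick illustration (mine, not the article's), eq 4 amounts to two running sums; the short Python sketch below uses an invented data set and a hypothetical helper name.

```python
# Least-squares proportionality Y = A*X (eq 4): A = sum(X*Y) / sum(X*X).
def fit_proportionality(x, y):
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / sxx

# Invented data scattered about Y = 2.5*X.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.4, 5.1, 7.4, 10.2, 12.4]
print(fit_proportionality(x, y))  # about 2.5
```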

The Linear Relation Y = AX + B

The errors are now given by

ΔY_i = Y_i − (A X_i + B)   (5)

and the sum of squares of ΔY_i must be minimized with respect to both A and B, so that

(∂/∂A) Σ_i (ΔY_i)² = −2 Σ_i X_i Y_i + 2A Σ_i X_i² + 2B Σ_i X_i = 0   (6)

and

(∂/∂B) Σ_i (ΔY_i)² = −2 Σ_i Y_i + 2A Σ_i X_i + 2BN = 0   (7)

where the last term on the right-hand side of eq 7 derives from the presence of N data pairs, so that Σ_i 2B = 2BN. Combining eqs 6 and 7 yields

A = (N Σ_i X_i Y_i − Σ_i X_i Σ_i Y_i) / (N Σ_i X_i² − (Σ_i X_i)²)   (8)

with

B = (Σ_i X_i² Σ_i Y_i − Σ_i X_i Σ_i X_i Y_i) / (N Σ_i X_i² − (Σ_i X_i)²)   (9)

which constitute the basic relations for a linear regression.
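A minimal sketch (my own, with invented data) of eqs 8 and 9 written as running sums:

```python
# Unweighted linear regression Y = A*X + B from the running sums of eqs 8 and 9.
def fit_line(x, y):
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    d = n * sxx - sx * sx            # common denominator
    a = (n * sxy - sx * sy) / d      # slope, eq 8
    b = (sxx * sy - sx * sxy) / d    # intercept, eq 9
    return a, b

# Invented data scattered about Y = 1.8*X + 0.5.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [0.6, 2.2, 4.2, 5.8, 7.8]
print(fit_line(x, y))
```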

The Parabolic Relation Y = AX² + BX + C

In this case we have ΔY_i = Y_i − A X_i² − B X_i − C, so that minimization with respect to A, B, and C produces three simultaneous normal equations, from which the coefficients A, B, and C can be obtained (eqs 13 through 16).

This example shows how the least-squares method can readily be extended to polynomial expressions in X.
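As a sketch of that extension (mine, not the article's; the data are invented), the three normal equations for a parabola can be assembled from power sums and solved directly:

```python
import numpy as np

# Fit Y = A*X**2 + B*X + C by solving the three simultaneous normal equations
# for the parabola, built from the running power sums.
def fit_parabola(x, y):
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    s = {k: np.sum(x**k) for k in range(5)}       # sums of x^0 ... x^4
    m = np.array([[s[4], s[3], s[2]],
                  [s[3], s[2], s[1]],
                  [s[2], s[1], s[0]]])
    rhs = np.array([np.sum(x**2 * y), np.sum(x * y), np.sum(y)])
    return np.linalg.solve(m, rhs)                # returns A, B, C

# Invented data near Y = 0.5*X**2 - X + 2.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 1.4, 1.9, 3.6, 6.1, 9.4]
print(fit_parabola(x, y))
```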

Transformations and the Need for Transformation-Related Weighting

The range of least-squares data-fitting methods can be expanded far beyond that of polynomial expressions by the simple expedient of transformations. Let us consider an irreversible first-order reaction whereby a compound reacts with rate constant k. We will assume that the reaction is initiated at time t = 0 by a change in, say, temperature or pH, and that the concentration of the reacting species is monitored by measuring the corresponding light absorption y as a function of time. Under these conditions, y is given by

y = y₀ e^(−kt)   (17)

Because y is measured as a function of t, eq 17 contains only two unknown parameters, y₀ and k. Even though it does not resemble the linear relation Y = AX + B, it can easily be transformed into such a form by taking natural logarithms,

ln y = ln y₀ − kt   (18)

which we rewrite as

Y = AX + B   (19)

where Y = ln y, A = −k, X = t, and B = ln y₀. This transformed equation can, of course, be fitted to the linear relation, using eqs 8 and 9. In doing so, however, we are minimizing the sum of the squares of the deviations in Y = ln y rather than those in the measured absorbance y. In the present example, the errors in y are likely to follow a Gaussian distribution, but those in ln y definitely are not, since taking the logarithm tends to compress the high absorbance values while expanding the low ones. This defect can, fortunately, be remedied by using a globally weighted least-squares method instead.

The usual unweighted linear regression of eq 19 will minimize Σ_i (ΔY_i)², whereas we want to minimize Σ_i (Δy_i)². Assuming (as we must for any valid application of least-squares methods) that ΔY_i and Δy_i are relatively small, we can approximate the relation between ΔY_i and Δy_i as

ΔY_i / Δy_i ≈ dY_i / dy_i = d(ln y_i) / dy_i = 1 / y_i   (20)

Consequently, we can validly use a linear regression on the transformed data provided we minimize not ΔY_i but, instead, Δy_i = y_i ΔY_i. This can be achieved by assigning a weight W_i = y_i² to (ΔY_i)², and by minimizing the so-weighted sum of squares of the deviations,

W_i (ΔY_i)² = y_i² (ΔY_i)² = (Δy_i)²   (21)

In general, then, when experimental data y_i are converted into transformed data Y_i for subsequent use in a least-squares analysis, one should introduce a weighting factor W_i given by

W_i = (dy_i / dY_i)²   (22)
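To make eq 22 concrete, here is a small numerical check (my own illustration, with made-up numbers): for the logarithmic transformation Y = ln y, eq 22 gives W = y², and W(ΔY)² indeed approximates (Δy)² when the deviation is small.

```python
import math

# For the transformation Y = ln(y), dy/dY = y, so eq 22 gives W = y**2.
def weight_for_log_transform(y):
    return y * y

y = 0.80                                 # a measured absorbance (made-up value)
dy = 0.01                                # a small random error in y
dY = math.log(y + dy) - math.log(y)      # the corresponding error in Y = ln y
print(weight_for_log_transform(y) * dY**2)  # ~1.0e-4, cf. eq 21
print(dy**2)                                # exactly 1.0e-4
```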

The general incorporation of such global weighting factors, based on the transformations applied to the data rather than on their individual standard deviations, is quite straightforward. For example, instead of eqs 6 and 7 we now write the corresponding weighted normal equations (eqs 23 and 24), whose solutions for the slope A and the intercept B are eqs 25 and 26. When all weighting factors W_i are set equal to unity, eqs 25 and 26 reduce to eqs 8 and 9, respectively.

First-Order Reaction Kinetics

We now return to our example of an irreversible first-order chemical reaction, monitored by recording the time course of the absorbance y. We anticipate that the data pairs t_i, y_i will fit eq 17 and can be transformed into eq 19. This transformation automatically implies a weighting factor W_i = y_i² (eq 27), quite apart from any additional weighting of the individual data points which may be required. Equations 25 and 26 then yield the weighted estimates of A = −k and B = ln y₀ (eqs 28 and 29). These are the properly weighted least-squares expressions for this particular case. They differ from the corresponding unweighted expressions and will usually yield more reliable estimates of k and y₀.

Enzyme Kinetics

The initial rate v of a noncooperative enzyme-catalyzed reaction involving a single substrate is given by (1)

v = V s / (K + s)   (30)

where s denotes the substrate concentration, V the maximum rate, and K the so-called Michaelis constant. Inversion of eq 30 converts it into the linear form

1/v = 1/V + (K/V)(1/s)   (31)

which is the basis for the double-reciprocal or Lineweaver-Burk plot (2). Clearly, eq 31 can be made to be identical with eq 19 by setting Y = 1/v and X = 1/s, so that A = K/V and B = 1/V (eq 32). Since the dominant random experimental errors are, typically, concentrated in the measured initial rates v, we have, according to eq 22, W = (dv/dY)² = v⁴ (eq 33), so that eqs 25 and 26 lead to the weighted estimates of A and B (eqs 34 and 35), from which V is obtained by inverting eq 35, while division of eq 34 by eq 35 yields K (eq 36).
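A sketch (not from the article) of how eqs 25 and 26 are applied: every sum simply carries the factor W_i, and N is replaced by ΣW_i. The Michaelis-Menten data below are invented, and the weight is the v⁴ of eq 33.

```python
# Weighted linear regression Y = A*X + B (eqs 25 and 26): every sum carries W_i,
# and N is replaced by sum(W_i).
def fit_line_weighted(x, y, w):
    sw   = sum(w)
    swx  = sum(wi * xi for wi, xi in zip(w, x))
    swy  = sum(wi * yi for wi, yi in zip(w, y))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    d = sw * swxx - swx * swx
    a = (sw * swxy - swx * swy) / d
    b = (swxx * swy - swx * swxy) / d
    return a, b

# Lineweaver-Burk analysis: Y = 1/v, X = 1/s, A = K/V, B = 1/V, weight W = v**4.
s = [0.5, 1.0, 2.0, 5.0, 10.0]           # substrate concentrations (invented)
v = [0.32, 0.51, 0.68, 0.84, 0.92]       # measured initial rates (invented)
X = [1.0 / si for si in s]
Y = [1.0 / vi for vi in v]
W = [vi**4 for vi in v]
A, B = fit_line_weighted(X, Y, W)
print(1.0 / B, A / B)                    # V from eq 35, K from eq 34 / eq 35
```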

We note here that the need to use proper weighting was clearly emphasized by Lineweaver et al. (2) but has often been overlooked by subsequent workers. Equation 30 can also be written in the alternative linear form (3)

s/v = (1/V) s + K/V   (37)

which is of the form of eq 19 if we set Y = s/v and X = s, so that A = 1/V and B = K/V (eq 38). Given that the dominant random errors are in v, the weighting factor in this case follows from eq 22 as W = v⁴/s² (eq 39), so that application of eq 25 results in the corresponding weighted expression, which agrees with eq 34. Thus these two seemingly separate approaches, based on the Lineweaver-Burk and Hanes plots, respectively, yield identical results provided that the proper weighting factors are applied; the reader can readily verify that identical results are not obtained when, instead of eqs 33 and 39, weighting factors of unity are used, which would correspond to unweighted least squares. Finally, we note that a third popular form (4),

v = V − K v/s   (42)

cannot be handled by the simple least-squares approach discussed here since, now, both Y = v and X = v/s contain the random errors from v. Therefore, eq 42 requires a more sophisticated form of least-squares analysis, which, when applied, indeed yields the same values for K and V. However, from the point of view of least-squares analysis, there is no advantage in using eq 42 over either eq 31 or 37.
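The equivalence of the two properly weighted fits can be checked numerically; the sketch below is my own (invented data) and leans on numpy.polyfit, whose w argument multiplies the unsquared residuals, so the square root of the weights of eqs 33 and 39 is passed.

```python
import numpy as np

# Weighted Lineweaver-Burk (W = v**4) versus weighted Hanes (W = v**4/s**2):
# both should return the same Michaelis-Menten parameters K and V.
s = np.array([0.5, 1.0, 2.0, 5.0, 10.0])      # invented substrate concentrations
v = np.array([0.32, 0.51, 0.68, 0.84, 0.92])  # invented initial rates

# Lineweaver-Burk: 1/v = (K/V)(1/s) + 1/V, weight v**4 (eq 33).
a_lb, b_lb = np.polyfit(1.0 / s, 1.0 / v, 1, w=np.sqrt(v**4))
print(1.0 / b_lb, a_lb / b_lb)                # V and K

# Hanes: s/v = (1/V)s + K/V, weight v**4/s**2 (eq 39).
a_h, b_h = np.polyfit(s, s / v, 1, w=np.sqrt(v**4 / s**2))
print(1.0 / a_h, b_h / a_h)                   # the same V and K
```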

The Polarographic Current

As our final example, consider the polarographic current, which, in general, consists of two components: a faradaic current i_f associated with oxidation or reduction processes, and a charging current i_c resulting from the changing area of the growing mercury droplets. Often, the time dependence of the faradaic current can be described by the Ilkovič equation (5)

i_f = a t^(1/6)   (43)

while the charging current has a time dependence given by (6)

i_c = b t^(−1/3)   (44)

The coefficient a in eq 43 is directly proportional to the concentration of the electroactive species, and the faradaic current usually dominates when that concentration is sufficiently large. In order to extend the sensitivity of polarography into the region where i_f is no longer much larger than i_c, the total current i = i_f + i_c can be measured as a function of time and subsequently analyzed as (7)

i t^(1/3) = a t^(1/2) + b   (46)

which is of the form Y = AX + B, where Y = i t^(1/3) and X = t^(1/2), while A = a and B = b. Thus dY/di = t^(1/3), so that W = t^(−2/3), and hence eq 25 yields

an expression for a (eq 47), which is the very same result as found in eq 35, while eq 26 now yields the corresponding expression for b (eq 48), where, for the sake of simplicity of notation, we have deleted the indices i. The parameter a in eqs 43 and 47 contains all the information about the faradaic current except its time dependence, while the parameter b of eqs 44 and 48 contains the equivalent information about the charging current. We note here that the very same results for a and b, eqs 47 and 48, are obtained when eq 46 is replaced by its alternative form, i t^(−1/6) = a + b t^(−1/2), with Y = i t^(−1/6), X = t^(−1/2), W = t^(1/3), A = b, and B = a, just as eqs 31 and 37 led to identical expressions for the Michaelis-Menten parameters.
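A sketch (mine, with noise-free synthetic data) of this decomposition, using the weighted fit of eqs 25 and 26 with W = t^(−2/3):

```python
import math

# Decompose a polarographic current i(t) = a*t**(1/6) + b*t**(-1/3) via eq 46:
# Y = i*t**(1/3), X = t**(1/2), weighted with W = t**(-2/3).
def fit_line_weighted(x, y, w):
    sw   = sum(w)
    swx  = sum(wi * xi for wi, xi in zip(w, x))
    swy  = sum(wi * yi for wi, yi in zip(w, y))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    d = sw * swxx - swx * swx
    return (sw * swxy - swx * swy) / d, (swxx * swy - swx * swxy) / d

t = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]                    # drop times, s (invented)
i = [2.0 * ti**(1/6) + 0.3 * ti**(-1/3) for ti in t]  # synthetic total current
X = [math.sqrt(ti) for ti in t]
Y = [ii * ti**(1/3) for ii, ti in zip(i, t)]
W = [ti**(-2/3) for ti in t]
a, b = fit_line_weighted(X, Y, W)
print(a, b)   # recovers the faradaic a = 2.0 and charging b = 0.3
```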

Conclusion

The above examples illustrate how straightforward it is to use weighted least squares. The examples given were all based on linear regression, i.e., on a fit to the linear relationship Y = AX + B, and therefore used eqs 25 and 26 together with the result for the weighting factor, eq 22. The latter is equally valid for the proportionality, the parabola, or any higher-order polynomial.

Least-squares analysis usually provides a most efficient way to fit experimental data to an appropriate model. It should be realized, of course, that the correctness of the mathematical model is presumed, and that systematic errors are not considered in the analysis. Within that framework, the method as discussed here in its simplest form, i.e., with a Gaussian distribution of random errors in only one of the measured parameters, is often applicable. The method is direct and can be written in such a way as to use a minimum of memory, because no original data need be stored, but only the running sums updated to incorporate subsequent data pairs.

Strictly speaking, such a least-squares analysis applies only to polynomial expressions. However, its range can be extended considerably by using transformations. Tables 1 through 3 show some examples of functions which can be transformed into a proportionality, a linear relation, or a quadratic one, together with the transformations involved and the resulting global weighting factors. Conversion of the relations for unweighted least squares into their weighted counterparts is equally simple. One just adds a term W_i to any sum, and changes any term N into Σ_i W_i. Thus, the weighted least-squares expression for a proportionality is

A = Σ_i W_i X_i Y_i / Σ_i W_i X_i²   (49)

instead of eq 4, and the relations for a weighted least-squares fit to a parabola are obtained by replacing eqs 13 through 16 by their weighted counterparts (eqs 50 through 52). When a transformation is used but the associated weighting is ignored, nonoptimal results can of course be expected.

It should be noted here that the transformation-dependent weighting discussed here does not replace, but should be used in addition to, any weighting based on measured standard deviations σ of individual data points. The combination is achieved merely by dividing the weighting factors given by eq 22 by the individual variances σ_i². However, when the individual standard deviations are not known, the weights given by eq 22 are the best one can do under the circumstances.

Of course, not every function can be converted into a suitable form. For example, the Poisson distribution

P_n = λⁿ e^(−λ) / n!   (55)

contains only one (λ) or, if n is not specified, two unknowns. In either case, we know of no transformation which will convert eq 55 into either Y = AX or Y = AX + B such that X is equal to, or a known function of, time t. However, as the specific examples given above and in Tables 1 through 3 indicate, a large number of commonly encountered functions can be converted into a form suitable for simple, noniterative least-squares analysis. If such analysis is performed using proper weighting, it can be an efficient, reliable, and powerful method.
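As a final sketch (my own; the data, rate constant, and standard deviations are invented), eq 49 and the division of the transformation weight by σ_i² can be combined in a few lines:

```python
import math

# Weighted proportionality Y = A*X (eq 49): A = sum(W*X*Y) / sum(W*X*X).
# When individual standard deviations sigma_i are also known, the transformation
# weight of eq 22 is simply divided by sigma_i**2.
def fit_proportionality_weighted(x, y, w):
    return (sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
            / sum(wi * xi * xi for wi, xi in zip(w, x)))

# Example: y = exp(a*x), transformed to Y = ln y = a*x, so W = y**2 (cf. Table 1),
# here further divided by invented individual variances.
x = [1.0, 2.0, 3.0, 4.0]
y = [1.35, 1.82, 2.46, 3.32]                  # roughly exp(0.3*x), invented
sigma = [0.02, 0.02, 0.05, 0.05]              # invented standard deviations of y
Y = [math.log(yi) for yi in y]
W = [yi**2 / si**2 for yi, si in zip(y, sigma)]
print(fit_proportionality_weighted(x, Y, W))  # estimate of a, near 0.3
```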


Table 1. Some Examples(a) of Equations That Can Be Transformed into the Simple Proportionality Y = AX

function        | Y =        | X =    | A = | W =    | comments
y = ax          | y          | x      | a   | 1      |
y = a/x         | y          | 1/x    | a   | 1      | x ≠ 0 (b)
y = ax^r        | y          | x^r    | a   | 1      | r must be known; x > 0 when r noninteger
y = a ln x      | y          | ln x   | a   | 1      | x > 0
y = a sin x     | y          | sin x  | a   | 1      |
y = a sin² x    | y          | sin² x | a   | 1      |
y = sin ax      | arcsin y   | x      | a   | 1 − y² | −1 ≤ y ≤ 1
y = sin ax^r    | arcsin y   | x^r    | a   | 1 − y² | −1 ≤ y ≤ 1; r must be known; x > 0 when r noninteger
y = e^(ax)      | ln y       | x      | a   | y²     | y > 0
y = e^(a/x)     | ln y       | 1/x    | a   | y²     | y > 0; x ≠ 0 (b)
y = e^(ax^r)    | ln y       | x^r    | a   | y²     | y > 0; r must be known; x > 0 when r noninteger
y = x^a         | ln y       | ln x   | a   | y²     | x > 0; y > 0
y = x e^(ax)    | ln (y/x)   | x      | a   | y²     | x > 0; y > 0
y = x^r e^(ax^s)| ln (y/x^r) | x^s    | a   | y²     | x > 0; y > 0; r and s must be known

(a) Transformed parameters are denoted by capitals. It is assumed that the random errors are concentrated in y, and that they follow a Gaussian distribution. The relevant least-squares equation for the calculation of A is eq 49, while the weight factor W is given by eq 22. (b) Use a sufficiently small number instead of x = 0.

Table 2. Some Examples(a) of Equations That Can Be Transformed into the Linear Relation Y = AX + B

function        | Y =   | X =     | A = | B =  | W =    | comments
y = ax + b      | y     | x       | a   | b    | 1      |
y = a/x + b     | y     | 1/x     | a   | b    | 1      | x ≠ 0 (b)
y = ax^r + b    | y     | x^r     | a   | b    | 1      | r must be known; x > 0 for r noninteger
y = a ln x + b  | y     | ln x    | a   | b    | 1      | x > 0
y = ax^r + bx^s | y/x^s | x^(r−s) | a   | b    | x^(2s) | r and s must be known; x > 0 when r, s, or both noninteger
y = b e^(ax)    | ln y  | x       | a   | ln b | y²     | y > 0
y = b e^(a/x)   | ln y  | 1/x     | a   | ln b | y²     | x ≠ 0 (b); y > 0
y = b e^(ax^r)  | ln y  | x^r     | a   | ln b | y²     | y > 0; r must be known; x > 0 when r noninteger
y = b x^a       | ln y  | ln x    | a   | ln b | y²     | x > 0; y > 0

(a) Transformed parameters are denoted by capitals. It is assumed that the random errors are concentrated in y, and that they follow a Gaussian distribution. The relevant least-squares equations for the calculation of A and B are eqs 25 and 26, respectively, while the weight factor W is given by eq 22. (b) Use a sufficiently small number instead of x = 0.


Table 3. A Few Examples(a) of Equations That Can Be Transformed into the Quadratic Form Y = AX² + BX + C

function                  | Y =  | X =  | A =  | B =  | C =         | W =    | comments
H = a + b/u + cu          | Hu   | u    | c    | a    | b           | u^(−2) | van Deemter eq; u ≠ 0 (b)
y = a exp[−(x − b)²/c]    | ln y | x    | −1/c | 2b/c | ln a − b²/c | y²     | Gauss distribution; y > 0
y = 1/[a(x + b)² + c]     | 1/y  | x    | a    | 2ab  | ab² + c     | y⁴     | Cauchy distribution; y > 0
y = a exp[−(ln x − b)²/c] | ln y | ln x | −1/c | 2b/c | ln a − b²/c | y²     | lognormal distribution; x > 0; y > 0

(a) Transformed parameters are denoted by capitals. It is assumed that the random errors are concentrated in y (or, in the first example given, in H), and that they follow a Gaussian distribution. The relevant least-squares equations for the calculation of A, B, and C are eqs 50, 51, and 52, respectively, while the weight factor W is given by eq 22. (b) Use a sufficiently small number instead of zero.


Acknowledgement

This work was supported by AFOSR grant 84-0017.

(1) Henri, V. Compt. Rend. 1902, 135, 916; Michaelis, L.; Menten, M. L. Biochem. Z. 1913, 49, 333; Van Slyke, D. D.; Cullen, G. E. J. Biol. Chem. 1914, 19, 141; Briggs, G. E.; Haldane, J. B. S. Biochem. J. 1925, 19, 338.
(2) Lineweaver, H.; Burk, D.; Deming, W. E. J. Amer. Chem. Soc. 1934, 56, 225; Lineweaver, H.; Burk, D. J. Amer. Chem. Soc. 1934, 56, 658.
(3) Hanes, C. S. Biochem. J. 1932, 26, 1406.
(4) Eadie, G. S. J. Biol. Chem. 1942, 146, 85; Hofstee, B. H. J. J. Biol. Chem. 1952, 199, 357.
(5) Ilkovič, D. Coll. Czech. Chem. Commun. 1934, 6, 498; Ilkovič, D. J. Chim. Phys. 1938, 35, 129.
(6) Ilkovič, D. Coll. Czech. Chem. Commun. 1936, 8, 170.
(7) Atwell, R. J.; Stidham, R.; de Levie, R. J. Electroanal. Chem. 1985, 194, 143.
