COMMUNICATIONS
On the Violation of Assumptions in Nonlinear Least Squares by Interchange of Response and Predictor Variables¹

When estimating the parameters of linear or nonlinear models, one should always minimize the sum of squares of discrepancies for the response variable, even when the response cannot be expressed explicitly as a function of the predictor variables. Minimization of the sum of squares of discrepancies for any of the predictor variables is theoretically wrong. This point is illustrated by a Hougen-Watson rate equation example. Fifty different simulated data sets are used to demonstrate the kinds of errors that can arise when the incorrect procedure is employed.
In many scientific studies, investigators seek quantitative relationships among the physical quantities involved. Generally such relationships contain parameters which must be estimated from the available experimental data, most usually via the method of least squares. In employing the least-squares technique, we must distinguish between two different types of variables: (1) the “independent” or predictor variables, which are usually assumed to be free from experimental error; (2) the “dependent” or response variable, whose values are presumed to be largely determined by the values of the predictors but which is also subject to experimental error. We would usually write our postulated quantitative relationship as a model of general form
$$y_u = f(\mathbf{x}_u, \boldsymbol{\theta}) + \epsilon_u \qquad (1)$$
where $u = 1, 2, \ldots, n$ indexes the $n$ data points, $y_u$ is the response at the $u$th data point, $\mathbf{x}_u = (x_{1u}, x_{2u}, \ldots, x_{mu})'$ is the vector of values of the predictor variables $x_1, x_2, \ldots, x_m$ at the $u$th point, $\boldsymbol{\theta} = (\theta_1, \theta_2, \ldots, \theta_p)'$ is a vector of parameters (for example, rate constants) to be estimated, and $\epsilon_u$ is the random error that occurs in the $u$th run. The method of least squares under these assumptions then consists of choosing, as estimates $\hat{\theta}_i$ of $\theta_i$, $i = 1, 2, \ldots, p$, those values of the $\theta_i$ which minimize the sum of squares function
$$S(\boldsymbol{\theta}) = \sum_{u=1}^{n} \{y_u - f(\mathbf{x}_u, \boldsymbol{\theta})\}^2 \qquad (2)$$
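Here is a minimal sketch of what minimizing $S(\boldsymbol{\theta})$ in eq 2 involves. It runs a Gauss-Newton iteration for a single parameter and then forms the residuals $y_u - \hat{y}_u$. Several choices are hypothetical and made only for illustration:

- the first-order decay model $f(x, \theta) = e^{-\theta x}$;
- the helper names `sum_of_squares`, `gauss_newton_1d`, `decay` and `decay_dtheta`;
- the noise-free data.

This is not one of the published routines referred to in the text.

```python
import math
from typing import Callable, List, Sequence

Model = Callable[[float, float], float]  # f(x, theta) -> predicted response


def sum_of_squares(theta: float, xs: Sequence[float], ys: Sequence[float], f: Model) -> float:
    """S(theta) of eq 2: discrepancies are measured in the response y."""
    return sum((y - f(x, theta)) ** 2 for x, y in zip(xs, ys))


def gauss_newton_1d(xs: Sequence[float], ys: Sequence[float], f: Model, df: Model,
                    theta0: float, tol: float = 1e-12, max_iter: int = 100) -> float:
    """Minimize S(theta) over a single parameter by Gauss-Newton iteration."""
    theta = theta0
    for _ in range(max_iter):
        r = [y - f(x, theta) for x, y in zip(xs, ys)]  # residuals y_u - f(x_u, theta)
        jac = [df(x, theta) for x in xs]              # df/dtheta at each x_u
        # Linearize f about theta; solve the one-column normal equation J'J step = J'r.
        step = sum(a * b for a, b in zip(jac, r)) / sum(a * a for a in jac)
        theta += step
        if abs(step) < tol:
            break
    return theta


# HYPOTHETICAL model for illustration only: first-order decay in x.
def decay(x: float, theta: float) -> float:
    return math.exp(-theta * x)


def decay_dtheta(x: float, theta: float) -> float:
    return -x * math.exp(-theta * x)


xs = [0.5 * k for k in range(1, 9)]
ys = [decay(x, 0.5) for x in xs]  # noise-free data generated with theta = 0.5
theta_hat = gauss_newton_1d(xs, ys, decay, decay_dtheta, theta0=0.3)
residuals: List[float] = [y - decay(x, theta_hat) for x, y in zip(xs, ys)]  # y_u - yhat_u
print(theta_hat, sum_of_squares(theta_hat, xs, ys, decay))
```

With real data, the residuals would then be examined and plotted to check the error assumptions. A practical fit would normally use a library optimizer rather than this hand-rolled loop.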
Various routines are available for achieving this minimization; see, for example, Draper and Smith (1966) and Bard and Lapidus (1968). The use of least squares, when justified by the principle of maximum likelihood, has as a standard implication the underlying assumptions that the distributions of the random errors $\epsilon_u$ are all normal, that these distributions all have the same mean, zero, and the same variance, $\sigma_u^2 = \sigma^2$, and that any two errors are independent of each other. Unless this is so, the use of least squares would be improper. In particular, if not all the $\sigma_u^2$ are equal, some sort of weighted least squares would have to be performed; otherwise grossly inaccurate parameter estimates can be obtained. (This latter assumption is probably the one most frequently violated in applications of least squares to engineering data.)

¹ This work was done in the Chemical Engineering Department of New York University in cooperation with the Statistics Department of The University of Wisconsin.

The way one often discovers such discrepancies is to assume (perhaps wrongly) that all is well. One then examines and plots the residuals $y_u - \hat{y}_u$ to see whether they reveal faults in the assumptions made. Here $\hat{y}_u = f(\mathbf{x}_u, \hat{\boldsymbol{\theta}})$, $u = 1, 2, \ldots, n$, are the fitted values; see Chapter 3 of Draper and Smith (1966). Any faults identified would then be rectified in a further analysis, and the process would be continued until a satisfactory explanation of the data was attained. Such a sequential procedure is both correct and usual.

A faulty model-fitting procedure that is seen more frequently than the present writers would like is one in which a formal least-squares analysis is carried out on one of the predictor variables rather than on the response. Such a procedure appears very tempting when the equation connecting response and predictors cannot be expressed straightforwardly and directly in the desirable form
$$\eta_u = f(\mathbf{x}_u, \boldsymbol{\theta}) \qquad (3)$$
where $\eta_u$ is the theoretical response, but can, for example, be expressed in the general form

$$x_{ju} = g(\eta_u, x_{1u}, x_{2u}, \ldots, x_{j-1,u}, x_{j+1,u}, \ldots, x_{mu}; \boldsymbol{\theta}) \qquad (4)$$
$u = 1, 2, \ldots, n$, where $x_j$ is one of the predictor variables. If we write $\mathbf{x}_u^*$ for a vector similar to $\mathbf{x}_u$ but which omits $x_{ju}$, we can express (4) more succinctly as
$$x_{ju} = g(\eta_u, \mathbf{x}_u^*, \boldsymbol{\theta}) \qquad (5)$$

In such a situation, it is very tempting to fit, by least squares, a model of the form

$$x_{ju} = g(y_u, \mathbf{x}_u^*, \boldsymbol{\theta}) + e_u \qquad (6)$$
and so to estimate $\boldsymbol{\theta}$ by minimizing the sum of squares quantity
$$T(\boldsymbol{\theta}) = \sum_{u=1}^{n} \{x_{ju} - g(y_u, \mathbf{x}_u^*, \boldsymbol{\theta})\}^2 \qquad (7)$$
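The effect of minimizing $T$ instead of $S$ can be seen in a small simulation. The sketch uses a deliberately simple stand-in for the paper's model, an assumption for illustration and not the Hougen-Watson example: the straight line through the origin, $y_u = \theta x_u + \epsilon_u$. Its inverse, $x_u = y_u/\theta$, plays the role of eq 5.

- Minimizing $S$ gives $\hat{\theta}_S = \sum x_u y_u / \sum x_u^2$, which is unbiased.
- Minimizing $T(\theta) = \sum (x_u - y_u/\theta)^2$ gives $\hat{\theta}_T = \sum y_u^2 / \sum x_u y_u$.

Expanding $\hat{\theta}_T$ to first order shows it is biased upward by about $(n-1)\sigma^2/(\theta \sum x_u^2)$. For the hypothetical settings below ($\theta = 2$, $\sigma = 3$, $x_u = 1, \ldots, 10$) that is roughly 0.1. The function names and settings are illustrative only.

```python
import random
import statistics
from typing import List, Sequence, Tuple


def estimate_by_S(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Correct: minimize S(theta) = sum (y_u - theta*x_u)^2, giving sum(xy)/sum(x^2)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def estimate_by_T(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Incorrect: rewrite the model as x = g(y, theta) = y/theta and minimize
    T(theta) = sum (x_u - y_u/theta)^2. With phi = 1/theta the minimizer is
    phi = sum(xy)/sum(y^2), i.e. theta = sum(y^2)/sum(xy)."""
    return sum(y * y for y in ys) / sum(x * y for x, y in zip(xs, ys))


def simulate(theta: float = 2.0, sigma: float = 3.0, n_sets: int = 2000,
             seed: int = 1) -> Tuple[List[float], List[float]]:
    """Fit many simulated data sets by both methods (hypothetical settings)."""
    rng = random.Random(seed)
    xs = [float(u) for u in range(1, 11)]  # predictors assumed free of error
    s_est: List[float] = []
    t_est: List[float] = []
    for _ in range(n_sets):
        # Independent normal errors with constant variance, in the response only.
        ys = [theta * x + rng.gauss(0.0, sigma) for x in xs]
        s_est.append(estimate_by_S(xs, ys))
        t_est.append(estimate_by_T(xs, ys))
    return s_est, t_est


s_est, t_est = simulate()
for name, est in (("min S", s_est), ("min T", t_est)):
    print(name, "mean =", round(statistics.mean(est), 3),
          "variance =", round(statistics.variance(est), 4))
```

Both estimators are kept in closed form. As a result, the only difference between the two lines of output is which variable carries the residual.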
Such a procedure has been employed, for example, by Johnson (1960), by Froment and Mezaki (1970), and by Miller and Kirk (1962). As we have said, it is theoretically incorrect to estimate $\boldsymbol{\theta}$ by minimization of $T(\boldsymbol{\theta})$. One should always minimize $S(\boldsymbol{\theta})$, even if the conversion of eq 5 to eq 3 can only be done numerically. The reader may wonder, however, about the effect of using the wrong procedure. For example, how badly are the estimates affected? The purpose of this paper is to show, via an example, what one can expect.

Ind. Eng. Chem. Fundam., Vol. 12, No. 2, 1973

[Figure 1. Histograms of $\hat{\theta}$ with $\sigma^2 = 1 \times 10^{-5}$: (a) via correct method; (b) via incorrect method.]
[Figure 2. Histograms of $\hat{\alpha}$ with $\sigma^2 = 1 \times 10^{-5}$: (a) via correct method; (b) via incorrect method.]
[Figure 3. Histograms of $\hat{\beta}$ with $\sigma^2 = 1 \times 10^{-5}$: (a) via correct method; (b) via incorrect method.]
[Figure 4. Histograms of $\hat{\theta}$ with $\sigma^2 = 1 \times$ …: (a) via correct method; (b) via incorrect method.]
[Figure 6. Histograms of … with $\sigma^2 = 1 \times$ …: (a) via correct method; (b) via incorrect method. Note that the lower scale is far to the left of the upper scale.]

Example

For illustration, consider a solid-catalyzed gaseous reaction, A → B + C, with a Hougen-Watson rate equation given by

In (7a), $r$ is the reaction rate; $\theta$, $\alpha$, and $\beta$ are parameters to be estimated; and the predictor variables $p_A$ and $p_B$ are the partial pressures of A and B, respectively. Suppose we are interested in the conversion, $\eta$. If $w$ represents the predictor variable space time, then

Equation 7a can be expressed in terms of the conversion, $\eta$, and the total pressure, $\pi$, as follows.

From (8a) and (8) we can obtain, after integration,

where $\pi$ represents the total pressure. We see that, if we attached subscripts $u$ to $w$ and $\eta$, we should have an equation of the general form (4), where $w$ and $\pi$ are the predictors and $\theta$, $\alpha$, $\beta$ are the parameters. We cannot obtain an explicit equation of the form of (3) for $\eta$ from (9), so we have an example of exactly the type previously described. The two methods given above for estimating the parameters now correspond to choosing $\theta$, $\alpha$, and $\beta$ to minimize, respectively, the quantities
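$$S(\theta, \alpha, \beta) = \sum_{u=1}^{n} \{y_u - f(w_u, \pi_u; \theta, \alpha, \beta)\}^2 \qquad (10)$$

and

$$T(\theta, \alpha, \beta) = \sum_{u=1}^{n} \{w_u - g(y_u, \pi_u; \theta, \alpha, \beta)\}^2 \qquad (11)$$

where $y_u$ is the observed conversion in the $u$th run.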
In the first of these expressions, eq 10, the specific values $f(w_u, \pi_u; \theta, \alpha, \beta)$ are determined numerically from (9). This is done after specific values for $w_u$, $\pi_u$, $\theta$, $\alpha$, and $\beta$ have been substituted, using standard techniques for obtaining roots of equations. For (11), eq 9 is used directly. Thus the latter procedure is much easier to carry out, but it is incorrect. Note, however, that when the parameter values are known, it is perfectly legitimate to use both eq 3 and eq 5 to obtain values of the predictors which give rise to specified response values. To demonstrate the inferential errors that occur when (11) is erroneously minimized, we treated two groups of simulated data by each method. Each group consisted of 50 sets of data generated from the model defined by eq 5 to 9, with true parameter values selected as
$$\theta = 10, \qquad \alpha = 5, \qquad \beta = 1 \qquad (12)$$
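The numerical evaluation of $f$ in eq 10 (finding the root of eq 9 for each run) can be sketched as follows. Eq 9 itself is not reproduced in this text, so several pieces are hypothetical:

- the implicit model `g` is a made-up, monotone stand-in for eq 9;
- `f`, `S`, `T`, the parameter values and the data are likewise illustrative only;
- bisection is simply one standard root-finding technique, not necessarily the one the authors used.

The example also uses `g` directly to generate predictor values $w_u$ for chosen responses. As the text notes, this is legitimate when the parameters are known.

```python
import math
from typing import Sequence, Tuple

# HYPOTHETICAL implicit model standing in for eq 9 (NOT the paper's equation):
#   w = g(eta, p; theta, alpha) = (alpha*eta - ln(1 - eta)) / (theta * p)
# It increases with eta on [0, 1) for alpha >= 0, so each w >= 0 has one root.


def g(eta: float, p: float, theta: float, alpha: float) -> float:
    return (alpha * eta - math.log(1.0 - eta)) / (theta * p)


def f(w: float, p: float, theta: float, alpha: float, tol: float = 1e-13) -> float:
    """eta = f(w, p; theta, alpha): solve g(eta, p; theta, alpha) = w by bisection."""
    lo, hi = 0.0, 1.0 - 1e-15
    if not g(lo, p, theta, alpha) <= w <= g(hi, p, theta, alpha):
        raise ValueError("no root of g(eta) = w in [0, 1)")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid, p, theta, alpha) < w:
            lo = mid  # root lies above mid because g increases with eta
        else:
            hi = mid
    return 0.5 * (lo + hi)


Run = Tuple[float, float, float]  # (w_u, p_u, observed response y_u)


def S(params: Tuple[float, float], data: Sequence[Run]) -> float:
    """Correct criterion (analogue of eq 10): residuals in the response, f found numerically."""
    theta, alpha = params
    return sum((y - f(w, p, theta, alpha)) ** 2 for w, p, y in data)


def T(params: Tuple[float, float], data: Sequence[Run]) -> float:
    """Incorrect criterion (analogue of eq 11): residuals in the predictor w, g used directly."""
    theta, alpha = params
    return sum((w - g(y, p, theta, alpha)) ** 2 for w, p, y in data)


# With known (hypothetical) parameters, use g to find the w that gives each chosen eta.
true_params = (2.0, 0.5)
data = [(g(eta, p, *true_params), p, eta)
        for eta, p in [(0.2, 1.0), (0.4, 1.5), (0.6, 2.0), (0.8, 1.0)]]
print(S(true_params, data), T(true_params, data), S((2.5, 0.5), data))
```

Each evaluation of `S` costs one root-finding solve per run, while `T` needs none. That is the practical temptation the text describes.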
… values for each method and comparing the pairs of histograms with each other and with the known true values of eq 12. Figure 1 compares histograms of the two sets of 50 estimates of $\theta$ obtained via the two methods when $\sigma^2 = 1 \times 10^{-5}$. The centers of the intervals are indicated in the figure, and the width of the intervals is 0.1. The true value of $\theta$, we recall, is 10. We see that the correct least-squares method consistently produces satisfactory estimates. The incorrect method based on minimizing eq 11, by contrast, not only incurs a bias to the right of the true value but also spreads the estimates more widely, giving a larger variance for the estimated value. Figures 2 and 3 provide similar comparisons for $\alpha$ and $\beta$, respectively. The true values are $\alpha = 5$ and $\beta = 1$. Figure 2 shows that $\alpha$ is estimated properly on average but with a larger variance. Figure 3 shows that, for $\beta$, the incorrect method produce.