ESTIMATION OF UNKNOWN CONSTANTS FROM MULTIRESPONSE DATA

The analysis of data when only a single response variable is measured has been discussed extensively in the chemical and chemical engineering literature. Frequently in practice, however, data are obtained not on a single response alone but on two or more responses simultaneously. Several different criteria that have been used for fitting such data are discussed, and their limitations are indicated.

The analysis of data when only a single response variable is measured has been discussed extensively in the chemical and chemical engineering literature; see, for example, the work on nonlinear least squares (2, 4, 7-16). Frequently in practice, however, data are obtained not on a single response alone but on two or more responses simultaneously for a given setting of the controllable variables. As a simple example, consider the chemical reaction A -> B -> C, where both steps are first order and irreversible, the rate constants for the two steps being k_1 and k_2, respectively. Suppose the initial conditions of the reaction at time t = 0 are [A] = 1, [B] = 0, and [C] = 0. If only [B] is measured, the appropriate mathematical model to consider is

    [B] = k_1 (k_2 - k_1)^{-1} (e^{-k_1 t} - e^{-k_2 t})    (1)

Suppose n data points are available at times t_u, u = 1, 2, ..., n. Using the method of nonlinear least squares, one would minimize the quantity

    \sum_{u=1}^{n} ([B]_{u,obsd} - [B]_{u,calcd})^2    (2)

with respect to k_1 and k_2, obtaining the so-called least squares estimates \hat{k}_1 and \hat{k}_2. An introductory account of nonlinear least squares is given by Draper and Smith (6).
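In practice the minimization of Equation 2 is carried out numerically on a digital computer. Purely as an illustration, the single-response fit can be sketched in Python as follows; the observation times, noise level, and starting guesses are hypothetical, and any standard nonlinear least-squares routine could be substituted.

```python
import numpy as np
from scipy.optimize import least_squares

def B_model(k, t):
    """Equation 1: [B] for A -> B -> C, both steps first order, [A]0 = 1."""
    k1, k2 = k
    return k1 / (k2 - k1) * (np.exp(-k1 * t) - np.exp(-k2 * t))

# Hypothetical experiment: noisy [B] readings simulated from assumed
# "true" rate constants, for illustration only.
rng = np.random.default_rng(0)
t_obs = np.linspace(0.25, 8.0, 12)
B_obs = B_model([0.7, 0.2], t_obs) + rng.normal(0.0, 0.01, t_obs.size)

# Equation 2: minimize the sum of squared residuals [B]obsd - [B]calcd
# with respect to k1 and k2.
fit = least_squares(lambda k: B_obs - B_model(k, t_obs), x0=[0.5, 0.1])
print("least squares estimates of k1, k2:", fit.x)
```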

Now, if [A] and [C] had been measured in addition to [B], one could presumably obtain better estimates of k_1 and k_2. Use would be made of the following two equations in addition to Equation 1:

    [A] = e^{-k_1 t}    (3)

    [C] = 1 + (k_2 - k_1)^{-1} (k_1 e^{-k_2 t} - k_2 e^{-k_1 t})    (4)
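One simple consistency check on these expressions, sketched here in Python purely as an illustration, is that they satisfy the mass balance [A] + [B] + [C] = 1 for all t, since [A]_0 = 1 and the system is closed:

```python
import sympy as sp

k1, k2, t = sp.symbols("k1 k2 t", positive=True)

A = sp.exp(-k1 * t)                                                  # Equation 3
B = k1 / (k2 - k1) * (sp.exp(-k1 * t) - sp.exp(-k2 * t))             # Equation 1
C = 1 + (k1 * sp.exp(-k2 * t) - k2 * sp.exp(-k1 * t)) / (k2 - k1)    # Equation 4

# Should print 1, confirming [A] + [B] + [C] = 1 for all t and k1 != k2.
print(sp.simplify(A + B + C))
```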

But what criterion should be used to fit the data simultaneously from all three responses? This note discusses several possible answers to this question, not only for the case where there are three responses but also for the general situation where there are r responses. Indications are given as to the applicability of each criterion.

Straightforward Generalization of Nonlinear Least Squares

Perhaps the most obvious extension of the method of nonlinear least squares is to minimize the combined sum of squares for all the responses. For instance, in the example just cited this criterion is

    \sum_{u=1}^{n} ([A]_{u,obsd} - [A]_{u,calcd})^2 + \sum_{u=1}^{n} ([B]_{u,obsd} - [B]_{u,calcd})^2 + \sum_{u=1}^{n} ([C]_{u,obsd} - [C]_{u,calcd})^2    (5)

The criterion in the general case where there are r responses can be written (in a more convenient notation) simply as

    C_6 = \sum_{v=1}^{r} \sum_{u=1}^{n} [y_u^{(v)} - \eta_u^{(v)}]^2    (6)

In terms of the above example, r = 3, y_u^{(1)} = [A]_{u,obsd}, y_u^{(2)} = [B]_{u,obsd}, y_u^{(3)} = [C]_{u,obsd}, \eta_u^{(1)} = [A]_{u,calcd}, \eta_u^{(2)} = [B]_{u,calcd}, and \eta_u^{(3)} = [C]_{u,calcd}. In general, a mathematical model will, therefore, consist of r response functions, \eta^{(v)} = f_v(k), v = 1, 2, ..., r (similar to Equations 1, 3, and 4), depending on a set of constants, k, which are to be estimated from the data. This criterion, C_6, has been adopted by a number of research workers, in particular, Ball and Groenweghe (3). It is worthwhile to ask whether this criterion can be derived from first principles. Such a derivation, if possible, would be especially useful in understanding what assumptions were implicit in its use.
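As an illustrative sketch of criterion C_6 for the three-response example, the combined sum of squares can be evaluated and minimized numerically as follows; the data, noise level, and starting values are hypothetical, and Equations 1, 3, and 4 supply the response functions.

```python
import numpy as np
from scipy.optimize import minimize

def responses(k, t):
    """Equations 3, 1, and 4: [A], [B], [C] for A -> B -> C with [A]0 = 1."""
    k1, k2 = k
    A = np.exp(-k1 * t)
    B = k1 / (k2 - k1) * (np.exp(-k1 * t) - np.exp(-k2 * t))
    C = 1.0 - A - B                       # Equation 4, via the mass balance
    return np.column_stack([A, B, C])     # n x r array of eta_u^(v)

def C6(k, t, y_obs):
    """Equation 6: combined sum of squares over all r responses and n runs."""
    resid = y_obs - responses(k, t)
    return np.sum(resid ** 2)

# Hypothetical multiresponse data: noisy readings of [A], [B], [C] at n times.
rng = np.random.default_rng(1)
t_obs = np.linspace(0.25, 8.0, 12)
y_obs = responses([0.7, 0.2], t_obs) + rng.normal(0.0, 0.01, (t_obs.size, 3))

fit = minimize(C6, x0=[0.5, 0.1], args=(t_obs, y_obs), method="Nelder-Mead")
print("estimates of k1, k2 minimizing C6:", fit.x)
```

The derivative-free Nelder-Mead search is used here only for brevity; a Gauss-Newton or Marquardt (14) type routine applied to the stacked residuals would serve equally well.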


Derivation of Criterion C_6

Suppose that for the vth response the probability density function for y_u^{(v)} is normal and that the variance \sigma_v^2 is the same for all experimental conditions, that is, for u = 1, 2, ..., n,

    p(y_u^{(v)}) = (2\pi\sigma_v^2)^{-1/2} \exp\{-[y_u^{(v)} - \eta_u^{(v)}]^2 / 2\sigma_v^2\}    (7)

Furthermore suppose that y_u^{(v)} and y_s^{(v)} (u \ne s) are statistically independent. Then the joint probability density function for the n observations y^{(v)} = (y_1^{(v)}, y_2^{(v)}, ..., y_n^{(v)})' on the vth response is

    p(y^{(v)}) = (2\pi\sigma_v^2)^{-n/2} \exp\{-\sum_{u=1}^{n} [y_u^{(v)} - \eta_u^{(v)}]^2 / 2\sigma_v^2\}    (8)

Now suppose the above assumptions hold for all r responses, that is, for v = 1, 2, ..., r. Also assume that y_u^{(w)} and y_s^{(x)} (u \ne s, w \ne x) are statistically independent. Making these assumptions, one obtains for the joint probability density function of all n x r readings y = (y^{(1)}, y^{(2)}, ..., y^{(v)}, ..., y^{(r)})

    p(y) = \prod_{u=1}^{n} (2\pi)^{-r/2} |\Sigma|^{-1/2} \exp\{-\tfrac{1}{2} (y_u - \eta_u)' \Sigma^{-1} (y_u - \eta_u)\}    (9)

where \eta_u is the vector of calculated values corresponding to y_u, and the variance-covariance matrix of y_u = [y_u^{(1)}, y_u^{(2)}, ..., y_u^{(r)}]', the r observations made during the uth experiment, is

    \Sigma = \{\sigma_{ij}\}    (10)

its inverse is

    \Sigma^{-1} = \{\sigma^{ij}\}    (11)

and c_{ij} is the sum of cross products of deviations from the ith and jth response functions,

    c_{ij} = \sum_{u=1}^{n} [y_u^{(i)} - \eta_u^{(i)}][y_u^{(j)} - \eta_u^{(j)}]    (12)

Now, considering the data as having been obtained (and therefore fixed) and considering the parameters as the variables, one can write the likelihood function as

    l(k, \{\sigma^{ij}\} | y) = (2\pi)^{-nr/2} |\Sigma|^{-n/2} \exp\{-\tfrac{1}{2} \sum_{i=1}^{r} \sum_{j=1}^{r} \sigma^{ij} c_{ij}\}    (13)

If the values \sigma^{ij} are known and it is desired to find the maximum likelihood estimates \hat{k}, namely, those values of k which maximize Equation 13, clearly the term \sum_{i=1}^{r} \sum_{j=1}^{r} \sigma^{ij} c_{ij} in the exponent is positive and appears with a negative sign. Therefore

    C_14 = \sum_{i=1}^{r} \sum_{j=1}^{r} \sigma^{ij} c_{ij}    (14)

is to be minimized. If the further assumption is made that \sigma^{ij} = 0 for i \ne j, criterion C_14 simplifies to

    C_15 = \sum_{i=1}^{r} \sigma^{ii} c_{ii}    (15)

What the assumption \sigma_{ij} = 0 (i \ne j) means in physical terms is that there is no correlation, for a single experiment, among the readings y_u^{(1)}, y_u^{(2)}, ..., y_u^{(r)}. This would not usually be a realistic assumption to make in practice. But if it is appropriate, then the criterion to be used is C_15, which would be minimized with respect to k to obtain the maximum likelihood estimates for the parameters. C_15 is seen to be a weighted sum of squares, the weighting factors being the reciprocals of the variances. If a response, say \eta^{(w)}, is measured relatively precisely (\sigma_{ww} = \sigma_w^2 is small), the data on that response will be weighted rather heavily (\sigma^{ww} = \sigma_w^{-2} is large); conversely, if \sigma_{zz} is large, the weight given to the data on response \eta^{(z)} will be small.

Finally, assume that \sigma_1^2 = \sigma_2^2 = ... = \sigma_v^2 = ... = \sigma_r^2 = \sigma^2, that is, that the variance is the same for all responses. Once again, for most practical situations this assumption will not hold because some of the responses will usually be measured more precisely than others. But, making this assumption together with all those mentioned previously, one obtains criterion C_6, since C_15 with all \sigma^{ii} equal is equivalent to C_6.

Thus, we see that it is possible to derive C_6 from first principles via the method of maximum likelihood, but only if one makes some fairly restrictive assumptions, which are, in fact, unrealistic for most engineering applications. In particular, it has been assumed that:

1. Each of the r responses is normally distributed, each with its own variance \sigma_v^2, and y_u^{(v)} is independent of y_s^{(v)}, u \ne s.
2. All the variances \sigma_v^2 are assumed equal, that is, \sigma_1^2 = \sigma_2^2 = ... = \sigma_r^2.
3. There is no correlation between y_u^{(w)} and y_u^{(x)}, w \ne x, for all experimental runs u = 1, 2, ..., n, that is, \sigma_{ij} = 0 for i \ne j.

Correlation Absent, \sigma_{ii} Known Case

If assumption 2 alone is violated (meaning that certain responses can be measured more precisely than others), criterion C_15 should be used instead of C_6.

Correlation Present, \sigma_{ij} Known Case

If, in addition to assumption 2, assumption 3 is also violated (meaning that for a given experiment the measurements obtained on the different responses are correlated), criterion C_14 should be used in preference to either C_6 or C_15. This generalization of the method of least squares was first given by Aitken (1).
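If the \sigma^{ij} are indeed known, criteria C_14 and C_15 are easily evaluated from the matrix of cross products c_{ij} of Equation 12. The sketch below is illustrative only; the covariance matrix, the data, and the starting values are all hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

def responses(k, t):
    """Equations 3, 1, and 4: [A], [B], [C] for A -> B -> C with [A]0 = 1."""
    k1, k2 = k
    A = np.exp(-k1 * t)
    B = k1 / (k2 - k1) * (np.exp(-k1 * t) - np.exp(-k2 * t))
    return np.column_stack([A, B, 1.0 - A - B])

def cross_products(k, t, y_obs):
    """Equation 12: c_ij = sum_u (y_u(i) - eta_u(i)) (y_u(j) - eta_u(j))."""
    resid = y_obs - responses(k, t)       # n x r matrix of residuals
    return resid.T @ resid                # r x r matrix {c_ij}

def C14(k, t, y_obs, sigma_inv):
    """Equation 14: sum over i, j of sigma^{ij} c_ij, with Sigma assumed known."""
    return np.sum(sigma_inv * cross_products(k, t, y_obs))

# Hypothetical covariance matrix Sigma (assumed known) and simulated data
# whose errors are correlated across responses within each run.
Sigma = np.array([[1.0e-4, 5.0e-5, 0.0],
                  [5.0e-5, 4.0e-4, 0.0],
                  [0.0,    0.0,    1.0e-4]])
rng = np.random.default_rng(2)
t_obs = np.linspace(0.25, 8.0, 12)
errors = rng.multivariate_normal(np.zeros(3), Sigma, size=t_obs.size)
y_obs = responses([0.7, 0.2], t_obs) + errors

fit = minimize(C14, x0=[0.5, 0.1], args=(t_obs, y_obs, np.linalg.inv(Sigma)),
               method="Nelder-Mead")
print("estimates of k1, k2 minimizing C14:", fit.x)
```

Replacing sigma_inv by a diagonal matrix of reciprocal variances gives C_15, and taking all the diagonal weights equal recovers C_6.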

\sigma_{ij} Unknown Case

But there is still a point of difficulty with Equation 14 as a criterion. The equation involves the variances and covariances \sigma_{ij} as known quantities, although in practice the \sigma_{ij} will be unknown. Box and Draper (5) have successfully tackled this problem and, using Bayes' theorem, they derived the following determinant criterion which, when minimized with respect to k, gives estimates of the unknown constants k in the multiresponse model under consideration:

    C_16 = |\{c_{ij}\}|    (16)

the determinant of the r x r matrix whose (i, j)th element is the quantity c_{ij} defined by Equation 12. For example, referring back to the chemical example cited earlier and using multiresponse data on [A], [B], and [C] (r = 3) to estimate k_1 and k_2, one would minimize the following 3 x 3 determinant with respect to k_1 and k_2:

    | c_{11}  c_{12}  c_{13} |
    | c_{21}  c_{22}  c_{23} |
    | c_{31}  c_{32}  c_{33} |

where, for example, c_{11} = \sum_{u=1}^{n} ([A]_{u,obsd} - [A]_{u,calcd})^2 and c_{12} = \sum_{u=1}^{n} ([A]_{u,obsd} - [A]_{u,calcd})([B]_{u,obsd} - [B]_{u,calcd}).

Recalling that y_u^{(1)} = [A]_{u,obsd} and \eta_u^{(1)} = [A]_{u,calcd}, we see that the upper left-hand element is nothing more than the sum of squares associated with A. Similarly, the other two diagonal terms are simply the sums of squares associated with B and C. The off-diagonal terms are merely the sums of cross products between A and B, A and C, and B and C. For instance, the second element in the first row is the sum of cross products between A and B. The calculation is, then, straightforward. Usually, such a problem would be solved by carrying out an iterative procedure with a digital computer. Since the r diagonal terms of the determinant are the sums of squares for each of the r response functions, if there is only one response this criterion reduces to the ordinary method of least squares. Box and Draper (5) point out the relationship between their criterion (Equation 16) and Aitken's (Equation 14); essentially the \sigma_{ij}, which are assumed known in Aitken's criterion, have been replaced in Box and Draper's criterion by their maximum likelihood estimates. The proper criterion to use in a given data-fitting problem will, of course, depend on the assumptions one is entitled to make. In any event, the assumptions should always be checked by examining the residuals y_u^{(v)} - \hat{\eta}_u^{(v)} for any unusual behavior. But, lacking any special knowledge, minimizing the quantity defined in Equation 16 is more appropriate, as a method to try first, than any of the alternative methods mentioned above for general multiresponse estimation problems.
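A minimal numerical sketch of the determinant criterion of Equation 16 for the three-response example follows; as before, the data and starting values are hypothetical and the choice of optimizer is incidental.

```python
import numpy as np
from scipy.optimize import minimize

def responses(k, t):
    """Equations 3, 1, and 4: [A], [B], [C] for A -> B -> C with [A]0 = 1."""
    k1, k2 = k
    A = np.exp(-k1 * t)
    B = k1 / (k2 - k1) * (np.exp(-k1 * t) - np.exp(-k2 * t))
    return np.column_stack([A, B, 1.0 - A - B])

def C16(k, t, y_obs):
    """Equation 16: determinant of the r x r matrix of cross products {c_ij}."""
    resid = y_obs - responses(k, t)
    return np.linalg.det(resid.T @ resid)

# Hypothetical multiresponse data on [A], [B], and [C] with independent errors.
rng = np.random.default_rng(3)
t_obs = np.linspace(0.25, 8.0, 12)
y_obs = responses([0.7, 0.2], t_obs) + rng.normal(0.0, 0.01, (t_obs.size, 3))

fit = minimize(C16, x0=[0.5, 0.1], args=(t_obs, y_obs), method="Nelder-Mead")
print("estimates of k1, k2 minimizing the determinant criterion:", fit.x)
```

Because the diagonal of resid.T @ resid holds the three sums of squares and the off-diagonal elements the cross products, the quantity being minimized is exactly the 3 x 3 determinant written out above.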

Acknowledgment

The author expresses his appreciation to the National Science Foundation for its support and to the Wisconsin Alumni Research Foundation.

Nomenclature

A, B, C = reacting components
[.] = concentration of reacting component
[.]_{u,calcd} = calculated concentration of reacting component for the uth run, where Equations 1, 3, and 4 are used for components B, A, and C, respectively
[.]_{u,obsd} = observed concentration of reacting component for the uth run
c_{ij} = quantity defined by Equation 12
C_6 = criterion defined by Equation 6
C_14 = criterion defined by Equation 14
C_15 = criterion defined by Equation 15
C_16 = criterion defined by Equation 16
f_v(k) = response function for vth response
k_i = rate constant for ith reaction step
k = vector of unknown constants = (k_1, k_2, ..., k_p)'
\hat{k} = vector of maximum likelihood estimates = (\hat{k}_1, \hat{k}_2, ..., \hat{k}_p)'
l = likelihood function (Equation 13)
n = number of runs
p(.) = probability density of (.)
r = number of responses
t = time
y_u^{(v)} = uth observed value of vth response, where u = 1, 2, ..., n and v = 1, 2, ..., r
\hat{y}_u^{(v)} = uth predicted value of the vth response, using \hat{k} in the response function equation \eta^{(v)} = f_v(k)
y = n x r matrix of all observations = [y^{(1)}, y^{(2)}, ..., y^{(v)}, ..., y^{(r)}]
y_u = r x 1 vector of observed values for the uth experiment = [y_u^{(1)}, y_u^{(2)}, ..., y_u^{(v)}, ..., y_u^{(r)}]'
y^{(v)} = n x 1 vector of observed values for the vth response = [y_1^{(v)}, y_2^{(v)}, ..., y_u^{(v)}, ..., y_n^{(v)}]'
\eta_u^{(v)} = calculated value of vth response for uth experiment
\sigma_v = standard deviation of vth response variable
\sigma_{ij} = covariance between responses y^{(i)} and y^{(j)}; (i, j)th element of \Sigma
\sigma^{ij} = (i, j)th element of \Sigma^{-1}
\Sigma = variance-covariance matrix of y_u (Equation 10)
\Sigma^{-1} = inverse of variance-covariance matrix of y_u (Equation 11)

Literature Cited

(1) Aitken, A. C., Proc. Roy. Soc. Edinburgh 55, 42 (1935).
(2) Atkinson, A. C., Chem. Eng. 73, 149 (May 9, 1966).
(3) Ball, W. E., Groenweghe, L. C. D., Ind. Eng. Chem. Fundamentals 5, 181 (1966).
(4) Blakemore, J. W., Hoerl, A. E., Chem. Eng. Progr. Symp. Ser. 59, No. 42, 14 (1963).
(5) Box, G. E. P., Draper, N. R., Biometrika 52, 355 (1965).
(6) Draper, N. R., Smith, Harry, "Applied Regression Analysis," Chap. 10, Wiley, New York, 1966.
(7) Hays, G. E., Poska, F. L., Stafford, J. D., Chem. Eng. Progr. 60, No. 1, 61 (1964).
(8) Hunter, J. S., Chem. Eng. Progr. Symp. Ser. 56, No. 31, 10 (1960).
(9) Hunter, W. G., Atkinson, A. C., Chem. Eng. 73, 159 (June 6, 1966).
(10) Kittrell, J. R., Hunter, W. G., Watson, C. C., A.I.Ch.E. J. 11, 1051 (1965).
(11) Kittrell, J. R., Mezaki, Reiji, Watson, C. C., Ind. Eng. Chem. 57, No. 12, 18 (1965).
(12) Lapidus, Leon, Peterson, T. I., A.I.Ch.E. J. 11, 891 (1965).
(13) Lapidus, Leon, Peterson, T. I., "Nonlinear Estimation Kinetics Analysis of the Catalytic Dehydrogenation of Ethanol," 59th National Meeting, A.I.Ch.E., May 16, 1966.
(14) Marquardt, D. W., Chem. Eng. Progr. 55, No. 6, 65 (1959).
(15) Mezaki, Reiji, Watson, C. C., Ind. Eng. Chem. Process Design Develop. 5, 62 (1966).
(16) Peterson, T. I., Chem. Eng. Sci. 17, 203 (1962).

WILLIAM G. HUNTER
University of Wisconsin
Madison, Wis.

RECEIVED for review October 10, 1966
ACCEPTED February 1, 1967

Research supported by the National Science Foundation under Contract NSF GP 2755. MBR Technical Report 32.
