Use of Statistical Experimental Design in a Kinetics Study - Industrial

Ind. Eng. Chem. Process Des. Dev. , 1972, 11 (3), pp 355–359. DOI: 10.1021/i260043a006. Publication Date: July 1972. ACS Legacy Archive. Cite this:I...
3 downloads 0 Views 495KB Size
Use of Statistical Experimental Design in a Kinetics Study Vernon E, Saterl Department of Chemical Engineering, Arizona State University, Tempe, A Z 85281

F. Dee Stevenson Ames Laboratory] PSAEC and Department of Chemical Engineering, Iowa State University, Ames, I A 50010

A brief review of the statistical design of experiments where the objective of the experiments is to estimate parameters in a given model i s presented. Methods for choosing experimental conditions that, for a given number of experiments, will result in the maximum accuracy (or the smallest confidence region) of the estimated parameters are discussed. The techniques are applied to the design of an experimental study of the rate equation for formation of phosgene from carbon monoxide and chlorine with constraints on the entering feed composition and temperature of the reactor. The resulting confidence regions for several designs are presented and show, for instance, that two seemingly good designs can differ in size of confidence regions b y a factor of 30.

M o s t experiments are performed to establish both the qualitative and quantitative relationships between variablesLe., the effects of independent or controllable variables on the dependent or measurable variables. Information concerning both the functional form of the correlation and the values of parameters in the correlation is usually desired. However, in some cases, the functional form of the relationship is already known, and the purpose of the experiment is solely t o evaluate the constants in the relationship. One such example was a recent investigation of the kinetics of the formation of phosgene from chlorine and carbon monoxide (Kowalczyk, 1968), the objective being to establish the value and confidence interval of the parameters in the rate equation originally proposed by Bodenstein and Plaut (1924). Using this reaction along with his possible ranges of experimental conditions as an example, this article will outline a method for choosing the conditions a t which the experiments should be run to achieve maximum accuracy in the values of the sought parameters. Background

Let y = y(zl, z2, . . . zk; pl, p2, . . . p p ) represent the true relationship between the dependent variable, y I and t’he k independent variables, $1, XZ, . . . xk;.The const’ants, 81,Pz, . . , p p , represent the true values of the p parameters in the relationship. To determine the best estimates of the paramet’ers,y is measured a t various levels of the x’s,and for a linear relationship, a least-squares analysis yields:

B = (X’X)-l(X’Y)

(1)

0

where = (&), Y = { y i l t and X = { xji).In the above, xjl is the value of the j t h independent variable for the i t h experiment while y i is the resulting value of the measured variable for the ith experiment. T,he resulting estimate of the rth parameter is denoted by P7. (The term linear means here P

that the relationship is of the form: y

=

Po

To whom correspondence should be addressed.

+ i = l pixi.

However, even a highly nonlinear relationship such as: bxZ c log x can be linearized by letting 21 = x 2 and x2 = log x.) If the errors in the experiment are normally distributed, it has been shown (Atkinson, 1966; Atkinson and Hunter, 1968; Box and Lucas, 1959; Hunter and Atkinson, 1966; Wald, 1943) that the boundary of a confidence region of the is given by the values of which satisfy: true parameters, y = a

+

+

e,

(e - O ) ’ ( X w 3 - 6) = psZFa(p,

y)

(2)

where s2 = estimate of experimental error variance F = statistic F from the F distribution CY = confidence level Y = degrees of freedom associated with sz The region represents a hyperellipsoid in space centered at Thus, one can be confident a t the a level that the ellipsoid includes the true values. Since the volume of the ellipsoid is inversely proportional to [X’XI, it is evident that by proper design of the experiment] the value of /X’XI can be maximized] thereby minimizing the Confidence region and increasing the accuracy of For the special case where the number of experiments] N , is equal to the number of parameters] p , the matrix, X, is square, and

0.

6.

JX‘XI

=

1 x 1 2

(3)

This simplifies the problem of maximizing IX’XI to the easier case of maximizing 1x1. Geometrically, 1x1 is proportional to the volume of a simplex in x space formed by the origin (0, 0, . . . , 0) and the N-points corresponding to (XU, XZI, . ., x p l ) , . . ., (XNI, XN2,

, ,

,

1

xNp).

As an example, let y = PlXl

+

P2xz

(4)

represent a two-parameter model. Also imagine constraints on x1and x2 as follows: Ind. Eng. Chem. Process Des. Develop., Vol. 1 1 , No. 3, 1972

355

a n experimenter usually has some feel for the parameters and can at least make an order-of-magnitude guess. Another application would be where some runs yield results with wide confidence ranges. T o narrow the range requires more data and the succeeding experiments should be designed making use of the preliminary results. For the case where N = p , the analogy of Equation 3 holds for nonlinear cases.

0 I N

X

IF’FI

0

2

I XI

Figure 1. Optimal design for model given by Equations 4 and 5

1 5 2 1 5 2 0 5 2 2 5 1 The feasible region in s-space is shoyn in Figure 1. I n this two-dimensional case, is equal to twice the area formed by the two experiments and the original. By inspection, the triangle with the minimum area would be formed by (O,O), (2,O) and any point along the line, x2 = 1, with 1 5 s1 5 2. Thus, a pair of experiment’s, one a t point 0 and the second anywhere on line 0as shown on Figure 1, would yield the most accurate values of and 02. If the relationship between y and the 2 ’ s cannot be linearized, the problem is complicated. I n this case, @ is found by solving the set of nonlinear least-squares equations (Box, 1959) :

1x1

N

C

i=l

(jri)(Gi

-

~ i ]=

0

(r = 1, 2, . . ., P)

(6)

F’(P - Yj where Y

=

=

0

{ yi]

Application to a Kinetics Problem

Because of interest a t the Ames Laboratory in using phosgene as a chlorinating agent in the purification of the less common metals (Boesiger, 1967j, Kowalczyk (1968) began a study to verify the existing rate equations describing the formation and dissociation of phosgene according to the overall reaction:

co + Cln is COClZ Christiansen (1922) proposed the following rate equation based on the assumption that both the formation and dissociation reactions are catalyzed by chlorine: =

kD(COC1n)(C12)”2 - kF(CO)(~ls)~’2

log kF = (-5741/T)

3

for the conditions of the i t h experiment. Because of their nonlinear form, the above equations must be solved numerically, usually by a series of linear approximations which converge on Although a n analogy to Equation 2 for a nonlinear model is not exact, if the model is assumed linear in the region of then the following equation would approximate the confidence region for the nonlinear case (Box, 1959; Sater, 1970).

b.

6,

(e - @)’(F’F)(@ - Q) = p s 2 ~ & , v )

(7)

Here, the volume of the confidence region is inversely proportional to IF’FI which is a function of both the estimated parameters, and the experimental conditions represented by X. Thus, to maximize IF’Fi by adjusting X,it is necessary to have a good estimate of tjhe parameters, 8. This seems to be a paradox since the results of the experiment must be known to design a good experiment to obtain these results. However, through preliminary experiments or prior information,

1,

356 Ind. Eng. Chem. Process Des. Develop., Vol. 1 1 , No. 3, 1972

(8)

(9)

Bodenstein and Plaut (1924) fitted their data to this expression and arrived a t the following expressions for the rate constants as functions of temperature:

ci = value of y i calculated from the model using @

F = (fri} f r r = [byi/3b,l,=

IFi2

Here, 1 FI is proportional to the volume of a simplex in f-space formed by the origin and the N-points corresponding to S N A . . . , SN,,). Each point corre(A1,jZ1, . . . , f,d, . . . , sponds to an experimental run and is also a function of @ for nonlinear models. All constraints must be translated from 2-space to f-space to form the feasible region before a geometrical approach can be used. (Box and Lucas (1959) present an excellent example using this approach.) For more than two parameters in the model the geometric approach is difficult to use directly but is useful from a qualitative basis as shown in the following example. It is generally necessary to maximize IF1 using some numerical, iterative optimization techniques.

-rcocip

or in matrix notation:

=

log

kD =

(-11451/T)

+ 8.023 + 13.483

(10)

where the units are consistent with g-moles, liters, seconds, and O K . There were thus eight parameters in the model (including the powers on the concentrations in rate equation) which Kowalczyk wanted to verify, assuming the general functional form of the rate equation to be correct. Komalczyk’s apparatus consisted of a tubular reactor in a constant temperature bath and a system for providing feed streams of any compositions using helium as a dilut’ant’.His experimental variables were temperature and initial compositions of CO and Cln. Residence time did vary slightly during his experiments, but will be considered constant in t’his discussion. Conversions were based on the CO composition in t,he outlet gas. This example will deal only with the case where t’hefeed gas contains only He, CO, and C12,and conversion is kept low so that the dissociation reaction can be ignored. Then:

~ C O C l *=

=

where A

=

kF(C0)(C1Z)a’z

A exp (-E/RT)(CO)(C12)3’2

(11)

5.0 X lo7 (mol/l.)-l(sec)-l

E / R = -12,700°K-’ Using this rate equation in the material balance for a constant temperature plug flow reactor, where the concentrations of chlorine and CO are represented as cA and cB, respectively, gives

where

CA

= cAa and C B = cB0 a t

CB = CBO CA

=

t

=

n

3

Y

0

u(

T.652.4

--

0

- (CAO - CA)

cA, when t

=

7

V/F

=

Because of metering flow rates and measuring concentrations, the following constraints were placed on the experimental variables:

Ce,

(xg)

Figure 2. Feasible region in x space

1 0.003

c A O ,C B ~ cA0

+

5 0.018

C B ~

The temperature range of interest was limited to :

(.Ot5, .

0

0

3

,

0

652.4 _< T _< 746.4’K *

Kowalczyk’s experiments essentially confirmed the model given by Equation 9 and the parameters given in Equation 10. The following shows how the previously discussed techniques could have been used to make his experiments as efficient as possible. To generalize the equation using the previously established nomenclature, let: cAo X? = cBo 53=T

XI =

Y

= cAf

PI

= 1.5 (power of cA in rate equation)

P2 = 1 (power of cB in rate equation) P3=A 0-1= E / R

1.009,. 0 0% 652.4)

009,.o 09 746.4)

/

(. 003,. 0I5,74 6.4I

Figure 3. Optimal design for kinetics example

with the constraints: XI,

xz 2 0.003

XI

+ xz 5 0.018

652.4

5

23

5 746.4OK

Since there are four parameters in the model, a t least four experimental runs are required and the resulting determinant, IFI,which is to be maximized will be fourth order. This means that for each design, 16 derivatives must be evaluated. Each derivative is evaluated by numerically integrating Equation 12 a t some design point, xi, using B and then repeating four times after incrementing the parameters, one a t a time, by APT.Then the derivatives can be approximated by:

a normal optimization scheme such as changing one variable a t a time, steepest ascent, etc., would be extremely timeconsuming even on a large computer. Because of the large number of variables, the problem was approached through a combination of trial-and-error search and examination of the function and its derivatives as the variables are changed. All of the derivates, f r i , monotonically increase with an increase in any of the variables, eAo,cBo,or T . The constraints are also in a linear form. Therefore the feasible region in F space should also be simplex and the optimal design should lie on the intersections of the planes described by: XI,

XI

This requires five integrations of the equation a t each design point. A digital computer program was used to perform the integrations and evaluate the derivatives. il fourth-order Runge-Kutta numerical integration technique was used. The problem of finding the optimum design is complicated by the calculations required. There are three independent variables to be determined a t each of the four points resulting in a 12-dimensional search problem. Since each time any one of these 12 values is changed, five integrations are required,

xz

=

0.003

+ xz = 0.018 23 =

652.4 and x3 = 746.4

The intersections form the perimeters of the two triangles shown in Figure 2 . Some designs and the resulting values of IF1 are shown in Table I. The best design using the criterion of minimum confidence region would be design 3 shown in Figure 3. The following can be surmised from the results and inspection of Equation 12: Ind. Eng. Chem. Process Des. Develop., Vol. 1 1 , No. 3, 1972

357

Table 1. Effect Design

1

2

3

4

5

6

7

8

9

10

11

of Various Designs on IF1 Design

Temp, Point

O F

CAo

CBP

1 2 3 4

652.4 746.4 746.4 746.4

0.009 0.015 0.003 0.003

0.009 0.003 0.003 0.015

0.72 X

1 2 3 4

652.4 746.4 746.4 746.4

0.003 0.015 0.009 0.003

0.003 0.003 0.009 0.015

0.22

1 2 3 4

652.4 746.4 746.4 746.4

0.009 0.015 0.009 0.003

0 I009 0.003 0.009 0.015

0.34

1 2 3 4

652.4 746.4 746.4 746.4

0.009 0.009 0.003 0.009

0.009 0.003 0.009 0.009

0.14

1 2 3 4

652.4 652.4 652.4 746.4

0.015 0.003 0.003 0.009

0.003 0.003 0.015 0.009

0.63

1 2 3 4

652,4 652.4 652.4 746.4

0.015 0.003 0.003 0.003

0.003 0.003 0.015 0.003

0.48

1 2 3 4

652.4 652.4 652.4 746.4

0.015 0.003 0.003 0.003

0.003 0.003 0.015 0.015

0.44

1 2 3 4

652.4 652.4 652.4 746.4

0.015 0.009 0.003 0.003

0.003 0.009 0.015 0.003

0.28 X

1 2 3 4

652.1 652.1 746.4 746.4

0.015 0.003 0.003 0.009

0.003 0.015 0.003 0.009

0.10

1 2 3 4

652.1 652.1 746.4 746.4

0.015 0.003 0.015 0.003

0.003 0.015 0.003 0.015

0.30 X

1 2

652.1 652.1

0.015 0.003

0.003 0.015

0.25

IF1

x

x x

O F

CAP

3 4

746.4 746.4

0.003 0.015

0.003 0.003

12

1 2 3 4

652.1 652.1 746.4 746.4

0.003 0.009 0.015 0.003

0.003 0.009 0.003 0.015

0.87

x

13

1 2 3 4

652,4 652.4 746.4 746.4

0.015 0.003 0.003 0.009

0.003 0.003 0.003 0.009

0.45

x

14

1 2 3 4

652.4 652.4 746.4 746.4

0.009 0.003 0.003 0,009

0.003 0.009 0.003 0.009

0.19

x

10-27

15

1 2 3 4

652.4 652.4 746.4 746.4

0.009 0.003 0.003 0.009

0,003 0.015 0.003 0.009

0.50

x

10-27

16

1 2 3 4

652.4 652.4 746.4 746.4

0.015 0.003 0.003 0.009

0.003 0.009 0.003 0.009

0.39

x

10-27

1

652.4 652.4 746.4 746.4

0.015 0.003 0.003 0.009

0.003 0.015 0.003 0.009

0.99

x

10-27

2 3 4 18

1 2 3 4

652.4 652.4 746.4 746.4

0.015 0.003 0.003 0.003

0,003 0.003 0.015 0.003

0.46 X

19

1 2 3 4

652.4 652.4 746.4 746.4

0.015 0.003 0,003 0.015

0,003 0,003 0.015 0.003

0.21 x 10-27

20

1 2 3 4

652,4 652.4 746.4 746,4

0.015 0.003 0.003 0.003

0.003 0.003

0

0.015

1 2 3 4

652.4 652.4 746.4 746,4

0.003 0.015 0.003 0.009

0.003 0.003 0.015 0.009

10-28

10-26

10-25

x

x 17

x

x

10-28

10-26

21

x

10-27

Only one of the four runs should be made at a lower temperature. T o evaluate E / R , runs a t at least two different temperatures should be made, and the best estimate of E / R would be achieved with two runs a t each temperature. However, the conversion a t the lower temperature is less, resulting in a lower value of IF/if two runs, rather than one, are made at the lower temperature. This is seen by comparing the results of designs 1 and 2 with those of designs 12 and 21. Two of the runs should be made with cAodiffering as much as possible from cBo.Point 2 in design 3 is with cAa at its maximum while cBo is a t its minimum. Point 3 is the converse. This results in the best estimate of the difference between PI and 0 2 since if cA0 = cB0, then cA = cB. If the concentrations are no longer independent, no distinction can be made between PLand Pz. This maximum spread in cAaand cB0 also gives a smaller confidence region as seen by comparing designs 3 and 4. 358 Ind. Eng. Chem. Process Des. Develop., Vol. 1 1 , No. 3, 1972

Temp, Point

IFi

CBO

O.Ol5I

10-27

Duplicate runs 0.19

x

10-27

One of the runs should be made at conditions giving maximum conversions. This r u n will give t h e best estimate of A and should be a t the highest possible temperature. I n the case where cAoand cB0 are both independently constrained, they should both be a t their maximum values. I n this example, however, cAa + cB0 = 0.018 and the relative values for maximum conversion depend on the powers of cA and cB in the rate equation. With equal powers, cAa = C B ~= 0.009, but when PI = 1.5 and p2 = 1.0, cAo > cBo for the maximum conversion. Thus, design 3 could be improved by adjusting point 3 to a higher cAa/cBa ratio, but it is doubtful that the improvement in the design would justify the effort. Point 3 also gives a good estimate of (PI p2) which when coupled with the results of points 2 and 4 yields the best estimates of both PI and pl. It should be evident that an analysis similar to the above, if done before any experiments are run, would aid in achieving

+

the best estimate of the parameters. For example, design 10 with two runs a t each temperature level and cAo differing as much as possible from cBo would apparently be a good design. B u t comparing design 10 with 3, it is seen that the ratio of /Fl’sis loa corresponding to a ratio of IF’FI’s of lo8. Assuming this difference in volumes of the confidence region was distributed equally over each of the four parameters, design 3 would give confidence ranges on each variable ( l O 6 ) 1 / 4 or approximately 30 times smaller than those resulting from design 10 even though the same number of experiments with the same precision of measurement is required in either case. Comments

A few points should be mentioned here. The above development is limit,ed to the case where the functional form of the relationship between the variables is known, and only the values of the parameters are to be found. This is normally not the case during experimental investigations and designs which maximize lXX/ for a given modd and yield no information as to the validity of that model. Therefore, if the purpose of the experiments is also to determine the model form, other runs must be made. Box and Lucas (1959) present a n interesting discussion of this point. Even when the model is known, setting the number of experiments equal to the number of parameters will not allow evaluation of the estimate of the experimental error, s. Maximizing IX’X/ or /F’Fl will minimize the confidence region, but s must be known t o fix its size. (See Equation 2.) Therefore, some replication is necessary. Furthermore, if the model is not linear, good estimates of the parameters must be available since F is a function of the parameters themselves. Bowever, after p experiments are completed, the results will yield improved values of the parameters and these can be used in the design of any sequential experiments. Each succeeding experiment would be based on the results of all the prior runs and chosen to maximize IF’FI. Applications of this sequential design technique are given by Atkinson and Hunter (1968) and by Graham and Stevenson (1972a,b). Nomenclature

A

= specific reaction rate constant c = concentration, g-mol/l. C A = concentration of C12, g-mol/l.

CB =

concentration of CO, g-mol/l.

E/R

kp,

= activation energy/gas constant, OK-’ f T i = (by/bb,)a,$ for the i t h experiment F = Ifvii F = ;ai& of t h e F statistic k~ = rate constants for the forward and

N p r s2 1

T zj(

dissociation reactions, respectively, (g-mol/l.) -l(sec) -1 = number of experiments performed = number of parameters in model = rate of production of phosgene, (g-mol/l.)/sec = estimate of experimental error variance = time, see = temperature, O K = value of the j t h independent or controlled variable for the i t h experiment

x = {%I

y i = value of dependent or measured variable for t h e i t h experiment y = {Yil

GREEKLETTERS a = confidence level in F statistic

p7

=

the r t h parameter in model

ev = degrees of freedom associated with s2 iP71

=

T =

residence time in reactor, sec

literature Cited Atkinson, A. C., Chem. Eng., 73 ( l o ) , 149 (1966). Atkinson, A. C., Hunter, W. G., Technometrics, 10,271 (1968). Bodenstein, &I.,Plaut, H., Z. Phys. Chem., 110, 399 (1924).

Boesiger, D. D., PhD thesis, Iowa State University, Ames, 1-4, 1967.

Box, G. E. P., Lucas, H. L., Biometrika, 46, 77 (1959). Christiansen, J. A,, Z. Phys., 103, 99 (1922). Graham, R. J., Stevenson, F. D., Ind. Eng. Chem. Process Des. D e v e h . . 11. 160 11972a).

Graham; R . J.; Stevenson,’F. D., ibid., 164 (1972b). Hunter, W. G., Atkinson, A. C., Chem. Eng., 73 (12), 159 (1966). Kowalczyk, K. G., 11s thesis, Iowa State University, Ames, IA, 1a

m

Sater, V. E., Engineering Center Report, Arizona State University, Tempe, AZ, 1970. Wald, A., Ann. Math. Statist., 14, 134 (1943). RECEIVED for review December 12, 1970 ACCEPTED April 12, 1972 This work was initiated while one of the authors (V. E. S.) was under the Summer Faculty Research Participation Program at the Ames Laboratory. Support was also received in the form of a Faculty Research Grant from Arizona State University.

Ind. Eng. Chem. Process Des. Develop., Vol. 1 1 , No. 3, 1972

359