More on Planning Experiments to Increase Research Efficiency


In the March 1967 issue of I&EC, an excellent article appeared on “Planning Experiments to Increase Research Efficiency,” by Hunter and Hoff (15). This paper clearly demonstrated by example how experimental designs are used to screen variables efficiently in research investigations with little or no mathematical or statistical analysis, and in a minimum number of runs. The message was that if experiments are planned wisely, incorporating statistical designs such as those made popular by Box and Hunter (5, 6), the information in the data is often apparent and straightforward to extract. However, if the planning of experiments is poorly done, no matter how sophisticated the analysis, one will generally not be able to extract much useful information. This is because poorly planned experiments are not likely to contain the information in the first place.

The problems of poorly planned experiments can be seen as follows. To determine the effect of temperature and pressure on percent conversion in a chemical reaction, a plan that studies temperature at a fixed level of pressure, and then studies pressure at a fixed level of temperature, may reveal very little about the real effects of these variables. This can be the case if there is a significant interaction effect between temperature and pressure. For instance, if an increase in temperature causes an increase in conversion at a low pressure, and the same increase in temperature causes a decrease in conversion at a higher pressure, then the one-variable-at-a-time design just described would fail to detect this important information. The resulting failure to detect this interaction may lead to incorrect conclusions about

INDUSTRIAL AND ENGINEERING CHEMISTRY

the effects of the variables. In contrast to one-variable-at-a-time experimentation, statistically planned experiments can detect these interactions easily, as was well illustrated in the Hunter and Hoff paper.

In many experimental programs in research, the study of one or more responses (e.g., conversion, yield, quality, impurity, cost) leads to one or more of the following questions: (1) which variables (e.g., temperature, pressure, catalyst concentration) affect the response(s)? (2) how do these variables affect the response(s)? and (3) why do these variables affect the response(s) the way they do?

The first question involves the screening of variables, and the approach discussed by Hunter and Hoff applies here. The second question involves developing functional relationships visually, or using polynomials to relate the responses to the variables. In simple terms, it is French curving of the data to approximate empirically the relationship between a response and the variables. The results can be used in predicting process behavior within the range of variables studied, and in optimizing the process with respect to the variables. Statistically planned experiments are very useful here, and their application has become popularly known as response surface methodology (8, 12). The third question involves finding the underlying mechanism that relates the response(s) to the variables. It can involve the development of a theoretically based mechanistic model that describes the system under study. Statistically designed experiments can also be very helpful here; that is, in identifying the mechanism (2, 11, 17), in estimating precisely the physical parameters

Emphasis on the application of statistical designs as a powerful research tool to screen variables and to find their optimal levels

William J. Hill and Walter R. Demler

such as rate constants and activation energies (1, 4, 7, 14, 16, 20), and in jointly performing both of these functions (13).

The purpose of this paper is to show by an actual example how simply statistically planned experiments can be used to answer questions (1) and (2). While Hunter and Hoff concentrated on question (1), that is, screening variables, this paper will illustrate by example how an experimenter can progress through the screening stage and on to the optimization of the process, which involves answering question (2). Answering question (3), the determination of the underlying mechanism, is not considered in the following example. However, examples of the use of statistically designed experiments in mechanism studies can be found in the literature (2, 3, 9, 13, 14, 16, 17, 18, 19, 20).

Example

A research group in the Specialty Chemicals Division of Allied Chemical was assigned a project to improve the yield of a dyestuff in order to make the product economically attractive. The product formation was thought to take place by the reaction steps

[Reaction scheme, steps (1) to (3). Recoverable terms: F and C on the reactant side; B,E written over the arrows of steps (2) and (3); right-hand sides "… + Others", "Intermediates + Others", and "Product + Others".]

where F is the basic starting material that reacts in the presence of a solvent with material C to give intermediates and then the final product. The second reaction is a condensation step, and the third reaction is a ring-closing step. It wasn’t certain what role materials B and E played in reactions 2 and 3. The “others” are off-gases.

Even though there were at least these three separate reactions, preliminary laboratory runs indicated that the reactants could be charged all at once. Therefore, the reaction procedure was to run for t1 hr at T1 °C, corresponding to the steps needed for condensation, and then to raise the temperature to T2 °C for t2 more hours to complete the ring closure. Preliminary experiments indicated that the time and temperature of the condensation deserved additional study, because this step appeared more critical and less robust than the ring-closing step. Other variables thought to affect the reaction were the amounts of B, E, and the solvent. Therefore, these five variables were studied in an initial screening design to determine which had significant effects on product yield.

In many speeches and writings by Box, it has been emphasized that the experimental method is an iterative procedure where the steps are conjecture, design, experiment, and analysis. The sequence is continued in as many cycles as necessary to find the answers to one or more of the which, how, and why questions. Therefore, to initiate a screening design to answer the which question, all existing knowledge should be used to choose a relevant set of variables and practical ranges over which their effects can be studied. As briefly described above, this very important first step (conjecture) was performed prior to setting up the screening design.

VOL. 62 NO. 10 OCTOBER 1970

AUTHORS William J. Hill is a Senior Research Engineer, and Walter R. Demler is a Research Chemist, Specialty Chemicals Division, Allied Chemical Corporation, Buffalo, N.Y. 14240.

TABLE I. STATISTICAL DESIGN ($2^{5-2}$)

Run  X1  X2  X3  X4  X5  Field color    Crystals   Yield, g  Filtration time, sec
1    -   -   -   -   +   Blue           None       23.2      32
2    +   -   -   +   -   Very red       Much       16.9      20
3    -   +   -   +   +   Much red       None       16.8      25
4    +   +   -   -   -   Very red       Very much  15.5      21
5    -   -   +   +   -   Red            None       23.8      30
6    +   -   +   -   +   Red            Much       23.4       8
7    -   +   +   -   -   Blue           None       16.2      17
8    +   +   +   +   +   Very much red  Much       18.1      28

Variable                           Variable level
                                   (-)       (+)
1. X1 = Condensation temperature   90°C      110°C
2. X2 = Amount B                   29.3 cc   39.1 cc
3. X3 = Solvent volume             125 cc    175 cc
4. X4 = Condensation time          16 hr     24 hr
5. X5 = Amount E                   29 cc     43.5 cc

Screening

The statistical design that was used in the laboratory screening study is shown in Table I. It is a five-variable fractional factorial design, described and referred to as a $2^{5-2}$ design in (5, 15). That is, the variables are studied in $2^{5-2} = 2^3 = 8$ runs, where each variable is studied at two levels, a low level (-) and a high level (+). The levels for each variable are indicated in Table I.

The responses considered to be important were (1) color of field after condensation step, (2) number of red crystals after condensation step, (3) filtration time, and (4) yield of product. (Note: color of field is the shade of liquor under a microscope.) Responses (1) and (2) served to indicate how the reaction proceeded. Based on preliminary experiments, a bluish field with a very slight amount of crystals present prior to raising the temperature for ring closing was thought to be important for high yields. A short filtration time indicated desirable large uniform particles that could easily be freed of impurities.

The results of this designed set of experiments could be interpreted visually for most responses. For example, the reddest fields corresponded to those runs at the high (+) level of temperature. Also, more crystals were formed at the high temperature (i.e., the “much” values match up with the (+) levels of temperature). Yield and filtration time were a little more difficult to interpret visually, but there were trends to indicate that the lower level of reactant B and the higher solvent level were desirable for high yield (see runs 5 and 6). No clear trends could be seen for filtration time.

These visual results can be verified by a slightly more rigorous analysis of the data. That is, by averaging the results at the high level of a variable and subtracting the average at the low level of the variable, one can obtain an estimate of the variable’s main effect (i.e., the average change in the response when going from the low to the high level of the variable). In a full factorial design (e.g., $2^5 = 32$ runs) this estimate is independent of estimates of other effects. However, in a fractional factorial design we sometimes tie up main effects with interactions. This is called confounding, and for the design used here the confounding patterns are shown in Table II. In Table II, the main effect of variable 1 is shown to be confounded with the interaction between variables 3 and 5. This means that when the results are averaged at the high level of variable 1, and the average at the low level of variable 1 is subtracted, the combined effect of variable 1 and the interaction 3 × 5 is being calculated, and not their individual effects; this is similar for the other confounding relations. However, if it is reasonably sure that certain of the interactions will be negligible, then most of the combined effect can be attributed to the main effect rather than to an interaction. The effect involving the interactions 1 × 2 and 3 × 4 is found by cross multiplying columns 1 and 2 (or 3 and

TABLE II. CONFOUNDING PATTERNS FOR DESIGN

1 + 3 × 5   = (Effect of variable 1) + (Interaction between variables 3 & 5)
2 + 4 × 5
3 + 1 × 5
4 + 2 × 5
5 + 1 × 3 + 2 × 4
1 × 2 + 3 × 4
1 × 4 + 2 × 3

All 3-factor interactions (e.g., 1 × 2 × 3) and higher are assumed negligible.
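The column arithmetic behind the effect estimates (average at (+) minus average at (-), with sign products for interactions) can be sketched in a few lines. This is a minimal illustration, not the authors' computation: it assumes the eight runs are in standard order for variables 1 to 3, with generators 5 = 1 × 3 and 4 = 2 × 5 chosen to be consistent with the confounding patterns of Table II, and it takes the standard error of an effect as sqrt(4 S^2 / N) with S = 0.3. All function names are hypothetical.

```python
import math

# Responses for runs 1-8, transcribed from Table I.
YIELD_G = [23.2, 16.9, 16.8, 15.5, 23.8, 23.4, 16.2, 18.1]
FILTRATION_SEC = [32, 20, 25, 21, 30, 8, 17, 28]


def multiply_columns(col_a, col_b):
    """Cross multiply two sign columns: (-)(-) = (+), (-)(+) = (-), (+)(+) = (+)."""
    return [a * b for a, b in zip(col_a, col_b)]


def build_design():
    """Sign columns of an 8-run 2^(5-2) design.

    ASSUMPTION: variables 1-3 are in standard order and the generators are
    5 = 1x3 and 4 = 2x5, picked to agree with Table II.
    """
    x1 = [-1, 1] * 4
    x2 = [-1, -1, 1, 1] * 2
    x3 = [-1] * 4 + [1] * 4
    x5 = multiply_columns(x1, x3)
    x4 = multiply_columns(x2, x5)
    return {1: x1, 2: x2, 3: x3, 4: x4, 5: x5}


def effect(signs, response):
    """Effect = average(+) - average(-)."""
    plus = [y for s, y in zip(signs, response) if s > 0]
    minus = [y for s, y in zip(signs, response) if s < 0]
    return sum(plus) / len(plus) - sum(minus) / len(minus)


def effect_standard_error(s, n_runs):
    """Standard error of an effect from an error standard deviation s."""
    return math.sqrt(4.0 * s ** 2 / n_runs)


design = build_design()
contrasts = {
    "1 + 3x5": design[1],
    "2 + 4x5": design[2],
    "3 + 1x5": design[3],
    "4 + 2x5": design[4],
    "5 + 1x3 + 2x4": design[5],
    "1x2 + 3x4": multiply_columns(design[1], design[2]),
    "1x4 + 2x3": multiply_columns(design[1], design[4]),
}

two_se = 2 * effect_standard_error(0.3, len(YIELD_G))
for label, signs in contrasts.items():
    print(f"{label:15s} yield {effect(signs, YIELD_G):+.2f} (2 S.E. = {two_se:.2f})"
          f"  filtration {effect(signs, FILTRATION_SEC):+.2f}")
```

Under these assumptions, the printed contrasts can be compared line by line with the entries of Table III, and the equality of columns such as 1 and 3 × 5 shows directly why those two effects cannot be separated in this design.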

4), and finding the difference in the averages corresponding to the resulting (+) and (-) levels; this is similar for other interaction effects. When cross multiplying columns, a (-) times a (-) is a (+), a (-) times a (+) is a (-), and a (+) times a (+) is a (+).

A summary of the effects on the quantitative responses (yield and filtration time) is shown in Table III. The standard error (S.E.) for the effects on yield was obtained from the formula $\sqrt{4S^2/N}$, where $N = 8$ is the number of runs and $S = 0.3$ is an estimate of the error standard deviation for yield, which was found from prior repeats. The variable (and/or interaction) having the largest effect on yield was reactant B (variable 2). However, all the other combinations had effects that were significantly larger than two standard errors. This led the researchers to speculate that they were near an optimum or straddling it, meaning that there were unaccounted-for quadratic effects owing to the variables. This type of behavior is well documented in Chapter 11 of (10). A similar behavior was noticed in the filtration time results. It was on this basis that future experiments were planned to study this quadratic or curvature behavior in the response surfaces. (Note: if the optimum is not near after an initial screening design, a very useful technique for finding the optimum region is the method of steepest ascent (10). This is analogous to “hill climbing,” where one changes the significant variables in the direction predicted by the screening design to give maximum improvement in a response. Once the optimum region is found, using this iterative procedure, then an optimization design, such as the one below, can be applied.)

TABLE III. SUMMARY OF EFFECTS

Effect               Grams yield ± 2 S.E.   Filtration time, sec
1 + 3 × 5            -1.5 ± 0.4             -7
2 + 4 × 5            -5.2 ± 0.4              0
3 + 1 × 5             2.3 ± 0.4             -4
4 + 2 × 5            -0.7 ± 0.4              6
5 + 1 × 3 + 2 × 4     2.3 ± 0.4              1
1 × 2 + 3 × 4         1.8 ± 0.4             10
1 × 4 + 2 × 3        -1.3 ± 0.4              3

Effect = average(+) - average(-)

Optimization

It was decided that only temperature (variable 1) and the amount of B (variable 2) would be studied in the initial optimization runs. These variables appeared to have the most important effects on the responses on the basis of the screening runs. Although also important, variables 3 and 5 were set at their present best levels (+) because of economic factors associated with batch size and cost of materials of a visualized commercial process. Later it was shown that variable 3 was in fact at its optimum level. Variable 4 did not appear to be important and was fixed at the preferred (-) level.

A simple design for studying both linear and quadratic effects of two variables is a $3^2 = 9$ run design. That is, each variable is studied at three levels. This design is shown in Table IV and was used in this study. Notice that the range in temperature was kept the same, but the range on the amount of B was lowered so that the center point (0) in this design was the low level (-) in the previous design. This trend was indicated by the first design. Notice, also, that run 6 of the first design fitted into the second design.

The results of these 9 runs were very informative. They did confirm the quadratic effects owing to these two variables on both yield and filtration time. (See Figures 1 and 2, respectively.) Because of the grid-like pattern of the design runs 10-17, plus runs 6 and 8 of the first design, curves of constant yield could be drawn in visually (e.g., contour for 24 grams yield); this could also be done for filtration time. For both responses, the contours represent concentric ellipses with unique optimum values of the variables at the center

TABLE IV. OPTIMIZATION DESIGN ($3^2$)

Run  X1  X2  Yield, g  Crystals
10   -   -   21.1      None
11   -   0   23.7      None
12   -   +   20.7      None
13   0   -   21.1      None
14   0   0   24.1      Very slight
15   0   +   22.2      Slight
16   +   -   18.4      Slight
 6   +   0   23.4      Much
17   +   +   21.9      Much

Field color (entries in extracted order, run alignment uncertain): Blue, Blue, Red, Slightly red, Blue, Slightly red, Red, Very much red
Filtration time, sec (entries in extracted order, run alignment uncertain): 10, 8, 35, 8, 7, 18, 8, 10
(One table entry reads "...".)

Variable                           Variable levels
                                   (-)       (0)       (+)
1. X1 = Condensation temperature   90°C      100°C     110°C
2. X2 = Amount B                   24.4 cc   29.3 cc   34.2 cc
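The relation between the coded levels (-, 0, +) and the physical settings can be made concrete with a short sketch. It is an illustration only, assuming the coding given in Table V (center at 100°C and 29.3 cc, half-ranges of 10°C and 4.9 cc); the function names `decode` and `encode` are hypothetical.

```python
from itertools import product

# Centers and half-ranges of the coded variables, as defined in Table V.
TEMP_CENTER_C, TEMP_STEP_C = 100.0, 10.0
B_CENTER_CC, B_STEP_CC = 29.3, 4.9


def decode(x1, x2):
    """Map coded levels to (temperature in deg C, amount of B in cc)."""
    return (round(TEMP_CENTER_C + TEMP_STEP_C * x1, 1),
            round(B_CENTER_CC + B_STEP_CC * x2, 1))


def encode(temp_c, b_cc):
    """Map natural units back to coded levels."""
    return (round((temp_c - TEMP_CENTER_C) / TEMP_STEP_C, 2),
            round((b_cc - B_CENTER_CC) / B_STEP_CC, 2))


# Full 3^2 factorial: every combination of the coded levels -1, 0, +1.
grid = [decode(x1, x2) for x1, x2 in product((-1, 0, 1), repeat=2)]
for temp_c, b_cc in grid:
    print(f"{temp_c:5.1f} C  {b_cc:5.1f} cc")

# Screening runs 6 and 8 expressed on the new coded scale.
print(encode(110, 29.3))  # run 6: high temperature, low B of the first design
print(encode(110, 39.1))  # run 8: high temperature, high B of the first design
```

On this scale run 6 of the screening design falls on a grid point of the new design, while run 8 lies one step beyond the (+) level of B, which is why it can still be used when sketching contours.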


[Figures 1 to 4: contour plots over amount B (cc), with ticks at 24.4, 29.3, 34.2, and 39.1, and a vertical scale with ticks at 90, 100, and 110 (condensation temperature, °C).]

Figure 1. Contours of constant yield (grams)

Figure 2. Contours of constant filtration time (seconds)

Figure 3. Contour diagram of field color after condensation step

Figure 4. Contour diagram of crystals after condensation step

of the ellipses. For yield, the maximizing values of temperature and amount of B were predicted to be 100°C and 29.3 cc, respectively. Run 14 verifies this result. Also, this same run is near the optimum for minimum filtration time.

In addition to these two responses, contour surfaces were drawn in for the qualitative responses, field color and crystal amount. These are shown in Figures 3 and 4, respectively. Their behavior is parallel in nature; an increase in both temperature and amount of B produced more redness and more crystals. This was interpreted as an extremely valuable indicator for the process operator. That is, if he noticed that the field was too blue with no crystals at the end of the designated condensation time and before raising the temperature for ring closing, he could take corrective measures (increase the temperature and/or the amount of B) to make the reaction mass less blue in field, with very slight crystals. This would correspond to desirable yields and filtration times. One can see this correspondence by superimposing transparencies of the contour surfaces on top of each other. The optimum conditions for yield (100°C and 29.3 cc of B) correspond to a bluish to slightly red field with very slight crystals. The resulting yield represented a better than 50% improvement over previous technology.

A more rigorous statistical analysis was also performed on this second set of data as a check of the visual interpretation. An empirical graduating function (e.g., polynomial) was fitted to the yield, using an analysis procedure described in (10). This model

TABLE V. REGRESSION EQUATION FOR YIELD

$$\hat{Y} = 24.4 - 0.3X_1 + 0.7X_2 - 0.9X_1^2 - 2.8X_2^2 + 1.0X_1X_2$$

where $\hat{Y}$ = predicted value for yield,

$$X_1 = \frac{\text{Condensation temperature} - 100\,^\circ\text{C}}{10\,^\circ\text{C}}, \qquad X_2 = \frac{\text{Amount B} - 29.3\ \text{cc}}{4.9\ \text{cc}}$$

Multiple correlation coefficient $R^2 = 0.99$. (This means 99% of the variation is explained by the equation.)

E.g., at 100°C and 29.3 cc, $X_1 = 0$, $X_2 = 0$, and $\hat{Y} = 24.4$. This compares with the observed value of 24.1.
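A least-squares fit of the full quadratic in the two coded variables can be sketched as follows. This is a minimal pure-Python illustration, not the procedure of (10): the pairing of yields with coded levels is an assumed reading of the flattened Table IV (with run 6 carrying its 23.4 g yield from Table I), and all function names are hypothetical.

```python
# Coded 3^2 design and yields. ASSUMPTION: this run/level pairing is a
# reconstruction of Table IV; run 6 reuses the 23.4 g yield from Table I.
RUNS = [
    (-1, -1, 21.1), (-1, 0, 23.7), (-1, 1, 20.7),
    (0, -1, 21.1), (0, 0, 24.1), (0, 1, 22.2),
    (1, -1, 18.4), (1, 0, 23.4), (1, 1, 21.9),
]


def model_terms(x1, x2):
    """Terms of the full quadratic: 1, X1, X2, X1^2, X2^2, X1*X2."""
    return [1.0, x1, x2, x1 * x1, x2 * x2, x1 * x2]


def solve_linear(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_quadratic(runs):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    rows = [model_terms(x1, x2) for x1, x2, _ in runs]
    ys = [y for _, _, y in runs]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve_linear(xtx, xty)


def predict(coef, x1, x2):
    return sum(c * t for c, t in zip(coef, model_terms(x1, x2)))


def r_squared(coef, runs):
    ys = [y for _, _, y in runs]
    mean = sum(ys) / len(ys)
    ss_res = sum((y - predict(coef, x1, x2)) ** 2 for x1, x2, y in runs)
    ss_tot = sum((y - mean) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def stationary_point(coef):
    """Coded (X1, X2) at which both partial derivatives of the fit vanish."""
    _, b1, b2, b11, b22, b12 = coef
    return solve_linear([[2 * b11, b12], [b12, 2 * b22]], [-b1, -b2])


coef = fit_quadratic(RUNS)
x1_opt, x2_opt = stationary_point(coef)
print("coefficients:", [round(c, 2) for c in coef])
print("R^2:", round(r_squared(coef, RUNS), 3))
print("stationary point:", round(100 + 10 * x1_opt, 1), "C,",
      round(29.3 + 4.9 * x2_opt, 1), "cc")
```

With the data as assumed here, the fitted coefficients can be checked against the rounded values in Table V, and the stationary point of the surface falls close to the center of the design.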

or regression equation is shown in Table V and contains linear terms in the variables, an interaction term, and quadratic terms in both variables. $X_1$ is a coded variable for temperature, and $X_2$ is a coded variable for the amount of B, where the coding is defined in Table V. This equation can be used for prediction within the range of variables studied and to generate the response contours, but extrapolation using this model would be unwise, because it is only a French curve of the data and has no mechanistic meaning. In this study, the response contours generated by the equation agreed with the visual approximation; that is, the equation generated the same concentric ellipses as in Figure 1.

Once the process appeared optimized for temperature and reactant B, three additional runs were performed at varying solvent levels at the optimum conditions of the other four variables. These runs are shown in Table VI. When these runs are compared to run 14, the existing solvent volume (i.e., 175 cc in run 14) gave the best overall results. Therefore, the optimum lab operating conditions were concluded to be 100°C condensation temperature, 29.3 cc B, 175 cc solvent, 16 hours condensation time, and 43.5 cc E.

TABLE VI. RUNS TO DETERMINE BEST SOLVENT LEVEL

Run  Solvent vol., cc  Yield, g  Field color  Crystals     Filtration time, sec
18   225               24.1      Blue         Very slight  14
19   125               22.4      Blue         Very slight  15
20   150               23.4      Blue         Very slight  10

Other variables fixed at: Condensation temperature equals 100°C; Amount B equals 29.3 cc; Condensation time equals 16 hr; Amount E equals 43.5 cc.

Epilogue

After the lab study, the process was moved into the pilot plant to make market development quantities for potential customers. The best conditions for the lab were scaled up to the equivalent conditions in the larger scale unit. The results of the first pilot plant run indicated that the yield was below optimum by about 10%. To find a corrective measure to improve the pilot plant yield, the condensation step results were checked. They indicated that the field was bluer than it should be under optimum conditions. A review of Figure 3 indicated that the amount of B should be increased in order to give a blue to slightly red field that would correspond to near optimum yield conditions. Therefore, the next pilot plant run was performed at 10% higher B, giving the desired field results and the same optimum yield found in the lab. The process was further scaled up to full scale plant operation, where the same optimum yield was attained. The same process conditions were used as in the pilot plant except for a slight change in the operating procedure that was necessary to adjust for a change in the raw material source.

Summary

The purpose of this paper is to show, by a case history, the value and simplicity of using statistical designs in research situations involving variable screening and optimization. The point stressed here is that once the experiments are well planned, as in a statistical design, the analysis of the data is very often straightforward and leads to informative results. This is because, with many statistical designs, the experiments are patterned in such a way as to concentrate on the information associated with the variables, and to provide an easy interpretation of the experimental results. However, with classical one-variable-at-a-time experimentation the task is usually much more difficult. If the experiments are poorly planned, the information on the variables and their interactions probably isn’t there; even if the information is present, the pattern of the experiments may make for difficult interpretation. A combination of limited information and difficulty in interpretation can lead to overall inefficiency in the experimental method. The fact that statistical designs help to avoid these hazards is perhaps reason enough for their application.

Acknowledgments

The authors wish to express their appreciation to Arthur Jachlewski and Margaret Orluk for their excellent work in conducting the laboratory and pilot plant runs. Also, the authors wish to thank E. Kenneth Roop for performing the computer verification of the yield and filtration time response surfaces.

REFERENCES

(1) Atkinson, A. C. and Hunter, W. G., Technometrics, 10, 271 (1968).
(2) Box, G. E. P. and Hill, W. J., ibid., 9, 57 (1967).
(3) Box, G. E. P. and Hunter, W. G., ibid., 4, 301 (1962).
(4) Box, G. E. P. and Hunter, W. G., Proc. IBM Sci. Comput. Symp. Statist., October, 113 (1965).
(5) Box, G. E. P. and Hunter, J. S., Technometrics, 3, 311 (1961).
(6) Box, G. E. P. and Hunter, J. S., ibid., 3, 449 (1961).
(7) Box, G. E. P. and Lucas, H. L., Biometrika, 46, 77 (1959).
(8) Box, G. E. P. and Wilson, K. B., J. Roy. Statist. Soc., B13, 1 (1951).
(9) Box, G. E. P. and Youle, P. V., Biometrics, 11, 287 (1955).
(10) Davies, O. L., “The Design and Analysis of Industrial Experiments,” Hafner Publ. Co., New York, 1954.
(11) Hill, W. J. and Hunter, W. G., Technometrics, 11, 396 (1969).
(12) Hill, W. J. and Hunter, W. G., ibid., 8, 571 (1966).
(13) Hill, W. J., Hunter, W. G., and Wichern, D. W., ibid., 10, 145 (1968).
(14) Hunter, W. G. and Atkinson, A. C., Chem. Eng. (New York), 73, 159 (1966).
(15) Hunter, W. G. and Hoff, M. E., Ind. Eng. Chem., 59, 43 (1967).
(16) Hunter, W. G., Hill, W. J., and Henson, T. L., Can. J. Chem. Eng., 47, 76 (1969).
(17) Hunter, W. G. and Mezaki, R., AIChE J., 10, 315 (1964).
(18) Hunter, W. G. and Mezaki, R., Can. J. Chem. Eng., 45, 247 (1967).
(19) Kittrell, J. R. and Erjavec, J., Ind. Eng. Chem., Process Des. Develop., 7, 321 (1968).
(20) Kittrell, J. R., Hunter, W. G., and Watson, C. C., AIChE J., 12 (5), (1966).