
FACTOR SCREENING IN PROCESS DEVELOPMENT. Cuthbert Daniel. Ind. Eng. Chem., 1963, 55 (5), pp 45-48. DOI: 10.1021/ie50641a007.
STATISTICAL APPROACHES TO EXPERIMENTAL DATA


FACTOR SCREENING IN PROCESS DEVELOPMENT

A method of organizing both known and unknown effects of many factors, a systematic scheme for analyzing all data available, and entirely new experimental designs to supplement existing data.

Factor screening is often just prejudice screening. There is nothing wrong with screening prejudices. On the contrary, that may be the most salutary thing to do with them. But surely we should avoid dignifying our current beliefs by calling them theory.

The term screening is used here to describe what I think you should do when the number of factors whose effects you know is only a small fraction of those whose effects you do not know. Three steps are suggested. First, compile an influence matrix containing what you think you do and do not know about the effect of variables on responses which you can measure. Second, analyze by simultaneous least squares all data available. Third, on the basis of the second step, revise the influence matrix and plan the first phase of factor screening in balanced sets, never using fewer than 16 runs in a set. For this stage, many published plans are available.

CUTHBERT DANIEL

Compiling the Influence Matrix

It is useful to set down the present state of knowledge in a table called the influence matrix. Table I shows part of a typical influence matrix. The rows identify the factors (independent variables, operating conditions, and raw material properties). The columns identify responses (dependent variables, product properties, and yields). Cell entries indicate present knowledge and beliefs about causes and effects. Some entries will be regression coefficients already known. Many will be simply plus or minus signs or zeros to indicate the judged direction of an effect. Other cells will contain simply an i for ignorance. Small sketched graphs are sometimes useful.

If such a table is not compiled, the experimenter may proceed in an unconscious, inadvertent way. He will drop rows and columns because he feels that they are less important or because he fears that the experimental program will become too big. The aim of a factor-screening program is to find out as much as possible about the correct entries for a particular influence matrix within the usual time, money, and manpower limitations. Clearly this aim includes a large part of industrial experimentation. However, two aspects of current practice can be criticized. First, inefficient and even erroneous methods of analyzing data are the rule rather than the exception. Often more information can be obtained from existing data than from uncritical use of canned multiple regression programs. Second, much wider ranges of experience and hence more validity can be reached by planning and executing experimental programs that include a larger number of factors in each part of the campaign.

Screening the Mass of Available Data

The mass of data referred to here is in the analyst's own records, or in those of his own company. Here also, the ideal way to present such data is as a table where one column represents the values of each independent variable, x_i, and one column represents the observed values of each response, Y_j. Also, one row, numbered from 1 to N, must be included for each experimental, pilot plant, or plant run.
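As a minimal sketch of the bookkeeping just described, an influence matrix can be kept as a small table in code, with "+" or "-" for a judged direction, "0" for no influence, "i" for ignorance, and a number for a regression coefficient already known. The factor and response names below are invented for illustration, not taken from Table I:

```python
# A sketch of an influence matrix: rows are factors, columns are responses.
# Entries: '+'/'-' = judged direction of effect, '0' = no influence,
# 'i' = ignorance, or a numeric regression coefficient already known.
# Factor and response names are invented for this illustration.
influence = {
    "feed_rate":   {"yield": "+",  "purity": "i", "color": "0"},
    "conc_A":      {"yield": 0.35, "purity": "+", "color": "i"},
    "temperature": {"yield": "i",  "purity": "i", "color": "-"},
}

def unknown_cells(matrix):
    """List (factor, response) pairs still marked 'i' -- the screening targets."""
    return [(f, r) for f, row in matrix.items()
            for r, entry in row.items() if entry == "i"]

targets = unknown_cells(influence)
```

Listing the cells still marked i makes explicit which factor-response pairs the screening program must settle.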



Those data are best which cover a wide range of experimental conditions, preferably more than is needed for new experimental or plant-design work. However, when this is not the case, the scope of factor screening broadens even more to include what G. E. P. Box calls the exploration and exploitation of response surfaces.

The reason for analyzing all data is to find a simple equation, having preferably only a few constants, which describes adequately all the data. This objective is rarely reached in a single step. More often than not, a series of trial fits is required. Sometimes a small part of the data, perhaps only one or two runs, must be rejected because it does not fit the pattern established by the remainder. Items which according to my own experience (gained of course with a wide group of colleagues) are desirable at each step in least squares fitting are listed on page 48. But perhaps something can be gained by the so-called standard, one-at-a-time, graphical, or other methods. If you think so, try them. Try also the many new nostrums that excess computer time has made so temptingly available. There is stepwise regression, and there are many other ways of regressing, all devised to save time when we do not need to save time. None of these techniques has been discussed in refereed reports in technical journals. Occasionally these methods work. But this compares in importance to the fact that one-factor-at-a-time experimentation sometimes works also.

The commonest defects in collections of multifactor data are plot-splitting, bad values, and function bias. Each of these defects invalidates the conventional least squares computing routine, and each should be looked for before, during, and after each attempt at a least-squares fit.

Plot splitting, or just plain grouping, appears in much laboratory data, most pilot plant series, and nearly all plant data. It results from collecting data in sequences based on ease of operation. Hard-to-change factors are changed rarely, usually only when necessary. On the other hand, easy-to-change factors are whipped through several levels, all at one level of the hard-to-change factor. Often this is adequate for pilot plant or even bench scale equipment, but not for standard multiple regression computer programs, all of which assume statistically independent observations with the same average error variance. For example, in the 21 data points given by Brownlee (7), the successive points are by no means independent. Most of them represent the plant's approach to five or six equilibrium conditions. Similarly, the laboratory data given by Hader and Grandage (10) to illustrate multiple regression techniques do not represent 32 independent tests, but rather the behavior of 10 crude oils, each tested under 3 or 4 conditions of distillation. The multifactor data given on multiple regression in the book by Box and others


(5) do not seem to have this shortcoming. This book is to be recommended for its clear explanation of the illogicality of one-at-a-time analysis of multiple regression data.

Occasionally, bad values can be spotted by extreme discrepancy of y values from points taken at x_i levels which are nearly the same. Such a discrepant y value at an extreme x_i condition may mean curvature away from the fitting equation. However, when it is near the middle of the whole set of x_i conditions, it can be called a bad value with more confidence. Indeed, its omission usually will not change the fitting equation noticeably; it will only decrease the residual mean square and hence improve the goodness of fit as reported by R² and F. The plot of residuals on a normal grid often aids in spotting excessive deviates, but of course, these are dropped only if their corresponding Y values are not extreme.

Function bias, or systematic lack of fit of an equation, can be detected easily if a separate estimate of variance is available. But detection is not easy when the variance estimate must be derived from the same data. Two suggestions, as yet unreported in the literature, are made here. If points that are clustered in factor space but taken at widely different times show residuals of the same sign, then they are on the same side of the fitted plane and hence show probable function bias. If those points which are closer together in factor space show relatively small spread in their corresponding y values, then this mean square y variation may be taken as an upper bound on the variance of y's taken at the same point in factor space. With this upward-biased variance, perhaps it still can be shown that the equation does not fit properly.

For giving the type of information just described, an interim regression program is usually convenient. It can indicate the degree of improvement in fit produced by each new cleanup or transformation of variables.
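The bad-value hunt described above can be sketched in a few lines: fit a straight line by least squares, compute the residuals, and flag the run with the largest deviation. The data here are invented, with the discrepant point placed near the middle of the x range, where a large residual is more safely called a bad value than curvature:

```python
# Sketch: fit y = b0 + b1*x by least squares, then inspect residuals
# to spot a possible bad value.  Data are invented for illustration;
# the run at x = 4 is deliberately discrepant.
xs = [1, 2, 3, 4, 5, 6, 7]
ys = [2.1, 4.0, 6.1, 13.0, 10.1, 11.9, 14.2]

n = len(xs)
xbar = sum(xs) / n
ybar = sum(ys) / n
b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / \
     sum((x - xbar) ** 2 for x in xs)
b0 = ybar - b1 * xbar
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

# The largest absolute residual marks the suspect run.
suspect = max(range(n), key=lambda i: abs(residuals[i]))
```

Plotting the ordered residuals on a normal grid, as the text suggests, is the graphical counterpart of this ranking: one residual standing off the line of the rest is the candidate for rejection.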
Given on page 48 are nine items that are useful in these preliminary passes.

Building up a regression equation by looking for the most influential factor and then for the best supporting factor has the same defects as one-at-a-time experimentation. Contradictions may appear: a factor that was earlier judged most influential may later be rejected. The principal defect of this practice is that it provides no internal evidence of its correctness or consistency. Similarly, starting with a full equation containing all factors is merely an inverted one-at-a-time method. All regressions produced by the set of K factors taken in all combinations should be examined. There are 2^K such equations. If K is greater than five, most analysts are unwilling to compute the whole set, and all are glad to hear that this does not appear necessary.
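The idea of examining a balanced fraction rather than all 2^K subset regressions can be sketched as follows. Treating "factor in / factor out" as +1/-1 levels, a half replicate keeps only the subsets satisfying a defining relation, so that each factor still appears in exactly half of the chosen regressions. The particular defining relation below is an invented illustration, not one taken from the article:

```python
from itertools import product

# Sketch: with K factors there are 2**K possible subset regressions.
# Rather than fit them all, choose a balanced half fraction: code
# "factor in / out" as +1/-1 and keep subsets whose levels multiply
# to +1 (one possible defining relation of a half replicate).
K = 4
all_subsets = list(product([1, -1], repeat=K))        # 1 = factor in
half_fraction = [s for s in all_subsets
                 if s[0] * s[1] * s[2] * s[3] == 1]   # balanced half

# Balance check: each factor is "in" for exactly half of the chosen subsets.
in_counts = [sum(1 for s in half_fraction if s[j] == 1) for j in range(K)]
```

Because of the balance, the consistency of each factor's contribution can be judged across the fraction without computing the full set of 2^K fits.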

TABLE I.  AN INFLUENCE MATRIX

FACTORS (name, symbol, range)           RESPONSES
Feed rate        10-15                  +
Conc. of A       0.1-0.4                0    i
Season           0-1 (Winter +, Summer -)

i = don't know sign of effect.  0 = factor has no influence on this response.

For K less than 9, it is usually sufficient to fit a balanced fraction of the whole set to see if each factor operates consistently in all regressions. One useful measure of goodness of fit appears to be the logarithm of the ratio of the fitted mean square to the residual mean square. This is called log F, even though F cannot possibly be distributed as the ratio of two chi-square variates, because the residual mean square in its denominator must occasionally contain some real regression. Table II gives a balanced set of 16 regressions (N = 94, K = 7) and their corresponding log F values. The usual computation to show effects and interactions gives the values shown in the third column of the table. However, there is no simple correspondence between the numbers in this column and those in the first two columns. A half-normal plot of the 15 contrasts is sometimes needed to see that factors x1 and x2 should be included in the approximating equation, and that the remaining factors should be excluded.
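The goodness-of-fit measure tabulated in Table II, 100 log (F/10), can be sketched numerically. The base-10 logarithm and the mean-square values below are assumptions for illustration; the article's column heading does not state the base:

```python
import math

# Sketch of the goodness-of-fit measure used to compare many regressions:
# F = fitted mean square / residual mean square, reported on the scale
# 100 * log10(F / 10) as in Table II's column heading.
# Base-10 log and the numbers below are assumed for illustration.
def log_f_measure(fitted_ms, residual_ms):
    """Return 100 * log10(F / 10), with F = fitted MS / residual MS."""
    f = fitted_ms / residual_ms
    return 100.0 * math.log10(f / 10.0)

score = log_f_measure(fitted_ms=480.0, residual_ms=1.2)
```

On this scale a regression with F = 10 scores 0, and each factor of 10 in F adds 100, which makes the column easy to scan across the 16 balanced regressions.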

TABLE II.  A BALANCED SET OF 16 REGRESSIONS ON A SET OF DATA WITH N = 94 AND K = 7

[Columns: factors included in each regression; 100 log (F/10); total effects; effect due to each factor set. The 16 balanced subsets run from "None" through 1234567.]

a. Zero arbitrarily inserted.
b. A large negative interaction means that when x1 is in, adding x2 decreases log F.
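The "usual computation to show effects and interactions" on 16 balanced values can be carried out with Yates's standard algorithm: repeated passes of pairwise sums and differences. The data below are invented, arranged in standard order for a 2^4 pattern with only the first factor active, so that every contrast except the first comes out zero:

```python
# Sketch: Yates's algorithm for the contrasts of 2**k balanced values.
# Each pass replaces the column by pairwise sums (top half) and pairwise
# differences (bottom half); after log2(n) passes, entry 0 is the grand
# total and the rest are the factorial contrasts in standard order.
def yates(obs):
    col = list(obs)
    k = len(col)
    while k > 1:
        half = len(col) // 2
        sums  = [col[2 * i] + col[2 * i + 1] for i in range(half)]
        diffs = [col[2 * i + 1] - col[2 * i] for i in range(half)]
        col = sums + diffs
        k //= 2
    return col

# Invented 2**4 data in standard order: only the fastest-varying
# factor has a real effect (levels 10 and 18, difference 8).
obs = [10, 18] * 8
contrasts = yates(obs)
```

A half-normal plot of the 15 non-total contrasts would show one point standing off the line of near-zero values, which is exactly the judgment the text says such a plot supports.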

AUTHOR. Cuthbert Daniel is a Consultant in Industrial Experimentation in Rhinebeck, N. Y. He is especially indebted to J. W. Gorman, Engineering Research Dept., American Oil Co., and O. Dykstra, Research Center, General Foods Corp., for their criticism and care in carrying out this work.

Screening by Collecting New Data

All experimental plans reflect what you know, what you think you know but don't, what you don't know, and what you think you don't know but do. All experimental plans should give information about the contents of each of these categories, but they should not be hypersensitive to a major mistake in some of your less-sure assumptions. For example, if a catalyst ages while it is being tested at each of two pressures, P1 and P2, then the simplest linear trend-free, one factor, two level plan is the one with three runs done in the order P1, P2, P1. If the trend in catalyst activity is indeed linear in time, then the average of the two results at P1 can be compared with the result at P2. But if the catalyst has a seriously curved trend of activity with time, then the effect of pressure change and curvature with time are measured as a sum, completely confounded. The plan used would give no clue to this.

Do not take a standard screening program and hope it will work. Consider your ignorances and allow for all of them you can afford. Do not spend too much time looking for the controlling factor. There may not be any. Rather, plan your work in blocks of 16 to 32 runs. Separate main effects from two-factor interactions early, and do not leave this unhappy contingency for later study. To get the gains of high precision along with the gains of increased validity, you must do your screening in well-balanced fractional replicates. Time is not lost by this procedure if you need the precision of duplicated tests, and if you must eventually examine for interactions anyway.

This advice is good when it can be followed. But sometimes the experimenter is uncertain whether or not he can reach all conditions for a balanced set. Screening will then mean making a set of "operability runs."
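The trend-free three-run plan P1, P2, P1 can be checked arithmetically. With invented numbers (a true pressure effect of 5.0 and a linear drift of -2.0 per run), averaging the two P1 results cancels the linear drift exactly when compared with the P2 result:

```python
# Sketch of the trend-free comparison described above: runs in the
# order P1, P2, P1.  If catalyst activity drifts linearly with time,
# the average of the two P1 results cancels the drift.
# All numbers are invented: true P1 response 20.0, true P2 response
# 25.0 (pressure effect 5.0), linear drift -2.0 per run.
true_p1, true_p2, drift = 20.0, 25.0, -2.0

run1 = true_p1 + drift * 0    # P1 at time 0
run2 = true_p2 + drift * 1    # P2 at time 1
run3 = true_p1 + drift * 2    # P1 at time 2

pressure_effect = run2 - (run1 + run3) / 2.0   # linear drift cancels
```

If the drift term were quadratic rather than linear, the same arithmetic would fold the curvature into the estimate, which is the confounding the text warns about.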


If the operability region of a process is under study, perhaps a set of random levels for each of the K factors should be chosen, and then 2K runs (or 25, whichever is greater) should be done. This is to determine in each run only whether or not the process will operate, without troubling to take all measurements (12). On the other hand, if the object is to select a small number of factors (four or less) from a large number of candidates (20 or more), group-screening methods may be desirable. In such plans, levels of whole sets of factors, usually three, four, or five, can be changed one at a time to see if any of the set makes a difference. The remarkable paper of Watson (13) shows how the size of the grouping depends on the (subjective) probability of a factor's influence. This has been extended to multistage grouping by Patel (11) and to factorial arrangements by Connor (8).

If the relative influences of, say, four to 20 factors must be estimated, some at two, three, or four levels, then Addelman's plans (1) may be preferable. However, if one or at most two influential factors, among say 18 candidates, are sought, then the main effect, supersaturated plans of Booth and Cox (3) may be effective, at least for a first trial. When it is safe to assume that only a few factors, say two or three out of seven, and a very small minority of two-factor interactions are large, then Youden's partially confounded pairs of fractional replicates (15) will also be safe. If the main effects of from four to 16 factors must be detected with maximum sensitivity and separated from all two-factor interactions, then the so-called plans of Resolution IV (4) should be used. Such plans are fully efficient for the first-order effects of their factors, but are supersaturated for all two-factor interactions. The supersaturation can sometimes be satisfactorily resolved by doing one, two, four, or eight additional runs (9).

If you cannot afford to miss a main effect or a two-factor interaction even in the first round of experimentation, then use a "two-factor interaction clear" plan. The plans given in the standard texts, e.g., by Box (6), are efficient in the purely statistical sense of giving maximum precision for all estimates. But some are inefficient in the sense that they use about twice as many runs as are really necessary (for K greater than 5). They can all be reduced in size with some loss in precision (2, 9, 14).

RESULTS TO BE PRINTED AFTER EACH COMPUTER PASS

1. N (number of independent data points)
2. K (number of constants used in fitting an equation of the form Y = b0 + b1x1 + ... + bKxK)
3. R² (coefficient of determination for each y by all x_i)
4. F (ratio of mean square fitted to mean square not fitted)
5. MSR (residual mean square)
6. A table of individual judgments of the reality t_i, of the linear interdependence R_i², and of the average influence b_iS_i of each of the K factors
7. Chronological listing of the residuals d_m for each response, where d_m = y_m - Y_m, m = 1, 2, ..., N
8. Plot on a normal grid of the observed cumulative distribution of the d_m
9. Plot of d_m vs. Y_m (equation values) to detect bad values, curvature, and nonconstant variance

REFERENCES

(1) Addelman, S., Technometrics 3, 479-96 (1961).
(2) Ibid., 4, 21-46 (1962).
(3) Booth, K., Cox, D. R., Technometrics (to be published).
(4) Box, G. E. P., Hunter, J. S., Ibid., 3, 311-51 (1961).
(5) Box, G. E. P., and others (O. L. Davies, ed.), "Statistical Methods in Research and Production," Oliver and Boyd, 1957.
(6) Box, G. E. P., and others (O. L. Davies, ed.), "Design and Analysis of Industrial Experiments," Oliver and Boyd, 1956.
(7) Brownlee, K. A., "Statistical Theory and Methodology," Wiley, New York, 1960.
(8) Connor, W. S., "Developments in the Design of Experiments: Group-Screening Designs," Proc. 6th Conf. on Design of Experiments in Army Research, Development and Testing.
(9) Daniel, C., J. Am. Statistical Assoc. 57, 403-29 (June 1962).
(10) Hader, R. J., Grandage, A. H. E., in "Experimental Designs in Industry" (V. L. Chew, ed.), Wiley, New York, 1958.
(11) Patel, M. S., Technometrics 4, 209-17 (1962).
(12) Robbins, H. E., personal communication.
(13) Watson, G. S., Technometrics 3, 371-88 (1961).
(14) Webb, S. R., Annals of Mathematical Statistics 33, 296 (1962).
(15) Youden, W. J., Technometrics 3, 353-8 (1961).