CUTHBERT DANIEL, 116 Pinehurst Ave., New York 33, N. Y.

Application of Statistical Methods in Chemical Engineering

The backlog of statistical knowledge ready to be applied to chemical engineering problems is large. Best immediate prospect is that some engineers will study this research and decide to use some of it.
Most chemical engineers spend part of their time interpreting data. Nearly all data require interpretation; only rarely do the numbers force a single indubitable conclusion on the interpreter. Even when "the data" consist of a single number, the experienced engineer knows that another measurement taken under essentially the same conditions would not always agree exactly with the number at hand. Such discrepancies are often shrugged off with some unverified statement: "Conditions have changed" or "We thought that carload was poor" or "The assay lab again." The fact that every system possesses some intrinsic random variability is not easily accepted by chemical engineers. Perhaps part of the reason for this rejection is the engineer's lack of knowledge of how to be objective in drawing conclusions from uncertain data. He can hardly be blamed for this, since none of the standard texts, e.g., Walker, Lewis, McAdams, and Gilliland's "Principles of Chemical Engineering" (22), Perry's "Chemical Engineers' Handbook" (76), or Sherwood and Reed's "Applied Mathematics in Chemical Engineering" (20), makes even a passing reference to modern statistical methods. Random variation and the uncertainty it brings with it cannot be entirely eliminated by mathematical manipulation.
There are, however, many cases in the experience of every statistician, and quite a few in the literature, in which random error has been strikingly reduced by statistical means. Some examples appear later. A major contribution of statistics to chemical engineering lies in giving engineers a means of measuring the random variability of their processes, together with a means of partitioning the variability into parts allocable to different sources. Thus, if the engineer could actually say "80% of our observed product variability comes from raw-material variation, 15% from process variation, and 5% from assay uncertainty," he would take entirely different action from that needed if the percentages were respectively 20, 40, and 40. This sort of partitioning (discussed later) is called components-of-variance analysis. When large masses of data on pressures, temperatures, concentrations, and other operating conditions have already been accumulated, it is natural to believe that they must contain a great deal of information about the process. The analysis of these data is usually handled by the statistical device called "multiple regression" analysis. The yield of information from these data, per dollar spent, is generally low. They contain much less information than data taken under planned conditions. To this
must be added the further adverse circumstance that the numerical work required to extract the little information present is expensive and complex, usually requiring large-scale computing machinery. When the effects of a considerable number of independently variable factors are to be studied, factorial designs and their fractional replications are available. It is the writer's opinion that the greatest gains from current statistical methods applied to chemical engineering problems will be made in this area. The practicability of studying the simultaneous impact of several factors (five to fifteen are often manageable) is a windfall unsuspected by most engineers. By far the commonest application of statistics to the process industries is in the use of some of the techniques of statistical quality control. Simple plotting of the averages of small adjacent groups of readings gives many operating engineers a chance to visualize the operation of the plant more effectively than the usual columns of hourly or shift figures. Parallel plotting of the scatter (range) within the small sets gives one a feeling for how large the variability of neighboring points is and, even more important, a sense of how stable this variability is. If the papers published in the Transactions of the National Conventions of the American Society for Quality Control are typical, then improvements in stability can often be produced even with very little attention given to such tiresome details as assumptions. The simple attention-calling of the range and mean charts appears sufficient to induce improvement in many cases. More careful application of the serious ideas behind statistical quality control can hardly fail to produce even better results. The idea of verifying the stability of the population sampled (establishing a "state of statistical control,"
in W. A. Shewhart's phrase) is a serious one, cutting deeply into the whole set of ideas and practices called scientific method. The idea of rational subgroups gives in many cases a real handhold in the separation of measurement error from process error. The idea of examining long sequences of process measurements in rational groups, to watch for trends, jumps, mavericks, and mistakes, is the third first-rate idea of statistical quality control. Perhaps the principal use of statistical quality control for chemical engineers will turn out to be as an educational step in the development of men who will want to make more thoroughgoing applications of statistics. Statistical quality control as it may be applied in the process industries is discussed further in a later section. The statistical devices mentioned so far (components-of-variance analysis, design of experiments, multiple regression, and statistical quality control), even though modern, are standard and so to speak ready-made. Whenever the assumptions made in their derivations appear justified, efficient use may be made of these tools. The bulk of this paper is directed to discussing these methods and some others which have already attained wide use. Some statisticians feel that these ready-made tools are not likely to remain the major contributions of statistics, that they are not modern (the main principles were clearly understood twenty years ago), and that new statistical tools should be forged, more closely fitted to chemical engineering situations. These contentions may well be true. Much can be learned, however, by using the available and tested methods. Their lack of modernity does not mean that chemical engineers have used them widely; indeed their age has been attained in occasional and cautious application. Some parts of statistics apply with about the same cogency to laboratory scale engineering research, to pilot plant work, to semiworks process development, and to full scale plant tests. This degree of generality is claimed for regression analysis and for the related area of statistical design of experiments.
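The mean-and-range plotting described above can be illustrated with a minimal sketch. The readings, the subgroup size of four, and the chart factors are not from this article; they are the conventional ones and are used here only to show the arithmetic.

```python
# Sketch of mean (X-bar) and range (R) charting for small adjacent subgroups
# of process readings.  Data and subgroup size are hypothetical.
import random

random.seed(1)
readings = [50 + random.gauss(0, 2) for _ in range(40)]    # e.g., hourly yields
n = 4                                                      # subgroup size

# Split the sequence into adjacent subgroups of n readings each.
subgroups = [readings[i:i + n] for i in range(0, len(readings), n)]
means = [sum(g) / n for g in subgroups]
ranges = [max(g) - min(g) for g in subgroups]

xbar_bar = sum(means) / len(means)
r_bar = sum(ranges) / len(ranges)

# Conventional control-chart factors for subgroups of four (A2, D3, D4).
A2, D3, D4 = 0.729, 0.0, 2.282
print("mean chart:  center %.2f, limits %.2f to %.2f"
      % (xbar_bar, xbar_bar - A2 * r_bar, xbar_bar + A2 * r_bar))
print("range chart: center %.2f, limits %.2f to %.2f"
      % (r_bar, D3 * r_bar, D4 * r_bar))
```

Points falling outside these limits, or long runs on one side of the center line, are the signals that call attention to instability.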
Operating engineers, and those connected with assay and control laboratories, find quality-control charts of direct and easy use, both in judging process stability and in evaluating the analyses reported by the laboratories. The recently developed methods of acceptance sampling by variables have found application in plants which receive raw materials in batches, or in lots that have serious heterogeneity (e.g., iron ore) and which therefore require some sophistication in sampling methods. To return to the general discipline called here "the statistical part of experimental design," developed originally by Fisher (72), it appears that some parts of this field apply directly whenever more or less variable measurements of the effects of several factors are to be made. The general problem of so placing experimental points that they give maximum information about the effects of several independent variables has been solved for many common situations.
The normal law of error
The work of Box and his associates (3, 6, 8) is finding increasing application, especially in bench and pilot plant work. On the other hand, plant scale tests, with their great expense, limited range of variation of factors, and high variability, are usually best run as "factorial designs." An unexpected advantage of these designs is that they can usually be arranged so that no out-of-specification material is produced. Statistical Literature. There are now available several works written expressly for chemists and chemical engineers by statisticians of rank. Two of these deserve immediate reference since they should be on the desk of any engineer using statistical methods.
Results at high temperature and at low temperature, plotted against time. Solid lines show no interaction between temperature and time. Lower solid line and upper dotted line illustrate interaction between temperature and time
"Statistical Analysis in Chemistry and the Chemical Industry," by Bennett and Franklin (2), is a one-volume text planned to take a patient engineer most of the way from the very beginning to medium competence in application. Its balance between theory and application is excellent, especially as its authors show frequent recognition of the fact that it is only theory that is applied. No experienced engineer will expect a single text to answer all his questions, or to give a clear summary of all the latest developments. Some developments that were already well along when the book was being written do not appear. New work on error rates and allowances, mainly by Tukey and his associates (27), on fractional replication of factorial designs (7, 14), and on the fitting of response surfaces (3-6) is in the writer's opinion not fully treated in this volume. But it is quite safe to say that engineers who have mastered the contents of this single book will generally know what to read next, and they will actually know more about the field than most engineers who have been brought up on less important texts. Probably the very next book to be read by the engineer who has studied Bennett and Franklin will be "Design and Analysis of Industrial Experiments," edited by Davies (9).
An experimental plan which uses 1/2 (figure at left), 1/4 (center), 1/8 (right) of the treatment combinations from a complete factorial design with seven factors, each at two levels. (Letters denote the factors, subscripts the levels of factors, shaded squares the combinations to be used)
Millions and millions of digits: indispensable building blocks in statistical design of many experiments
Many new examples of factorial designs are given, and an excellent summary of the main points in Box's surface-fitting program takes up a full chapter. The research or development engineer who has used or read both these works will be able to decide responsibly on the extent to which statistical design can aid him in his work. Some will wish to extend their insight into the designs proposed by reading some of the historically important papers and books. The leading papers are by Fisher and by his colleagues, especially Yates. Forty-three of Fisher's papers are available in (77). Yates's pamphlet "Design and Analysis of Factorial Experiments" (24) is readable by chemical engineers with average mathematical background. A good summary of the underlying distribution theory is to be found in Kempthorne's book (74). Some engineers and chemists, finding the ideas of statistics unfamiliar and abstruse, may wish to start their reading with Youden's little book (25). Written by a chemist who has himself made major contributions to statistical methods, this work has been criticized only as being perhaps too persuasive. An optimist might report cheerfully on the prospects, the trends, and the opportunities in statistical education, but the record of current activities is short. A term of statistics is now required in the course work of undergraduates in chemical engineering at Cornell. As far as the writer knows, no other school has such a requirement. A similar elective course at Princeton is well attended.
In many universities chemical engineering students can take courses in statistics given in other departments. The range of excellence of these courses is so wide that it would seem safer to plan self-teaching until some more serious system of accrediting is in wide use. Something can be judged by the quality of texts used in statistics courses for engineers. If either of the two longer works praised above or Hald's book (73) is used, and if the instructor has himself some experience beyond the field of quality control in applying statistical methods to chemical engineering problems, then the course is in all likelihood a recommendable one. But it appears, at least in the East and Midwest, that these conditions are not met by many of the courses offered. An interesting interim effort to bridge the gap between demand and supply of serious statistical teaching is being made by some schools in the form of short "intensive" courses. These courses tend to become propaganda sessions if they are shorter than a week, and are sometimes merely extended exhortations when longer. The minimum course from which something can be gained by a willing chemical engineer is, in this writer's opinion, a 6-week course at one of the major statistical centers of the country. (Most statisticians would judge the statistical departments of the State College of North Carolina, of Iowa State University, and of Princeton, Columbia, Stanford, and the University of California as in this category. Many would want
to add to these the departments at Virginia Polytechnic Institute and at the University of Illinois.) Statistical and Scientific Methods. There is a feeling among some engineers and scientists that statistical methods are put forward as a substitute for other methods usually referred to as "scientific." This feeling is admittedly reinforced by the scientific limitations of some of the proponents of statistical methods. The defective integration of modern statistical ideas with the general scientific program has resulted in the more-or-less ingrown development of a sizable bolus of material that now must be accepted or rejected as a whole. There is little content to the argument often heard from scientists who are not statisticians to the effect that they prefer exact methods and hence will have nothing to do with statistics. The real decisions to be made are:

a. Between archaic and modern statistics
b. Between less and more efficient methods of collecting data
c. Between judging by fashion and current practice and judging by taking the time to make a more logical appraisal

Each of these choices has been stated to make clear which is the desirable alternative.
Scientists can hardly maintain the convention that whatever they are now doing is scientific and that all else is not scientific. Many introductory texts in physical chemistry, elementary physics, and
applied mathematics contain a section on "errors of measurement." This section is usually an oversimplified presentation of Gauss's propagation-of-error equations, making the assumption (usually unstated) that if there is random variation, it must be in the measuring operation. It violates the scientists' deterministic preconceptions to consider that random variation may be present in the process itself. This objection persists in spite of the successes of random ("stochastic") models in many parts of pure science, from the kinetic gas theory and statistical mechanics to particle physics and radiation chemistry. The preference for exact methods is entirely natural and such methods will always be striven for, but it seems safe to say that there will for some time remain an area of scientific endeavor in which such exact methods are not at hand. The graphs which appear in every issue of every chemical engineering journal, and which show the data points, support this contention. For many readers, it is not necessary to add the comment that statistical methods cannot be substituted for technical competence or for creative scientific thought. They are only offered as efficient means of testing scientific ideas, of measuring properties, of screening large numbers of alternative possibilities, of aiding in making the decisions that plague all technical men, especially chemical engineers. A clearer exposition of this point of view is to be found in "Introduction to Research," by Wilson (23).

Improvements in Precision and Accuracy of Measurements
Paired Comparisons Versus Absolute Measurements. For some purposes so-called "absolute" measurements are required, but more often than not it is the comparisons between several conditions that are of major interest. Such direct comparisons must usually be made "close together." Color matching, flavor preferencing, and scent grading all have their more objective counterparts in chemical engineering. It is usually better to test the effect of raising some temperature in adjacent time periods, and with the same lot of raw material, if that is possible. Of course, the effect will have to be checked at other (adjacent pairs of) time periods, and with other lots of raw material. But it is generally, and rightly, believed that comparisons that are more closely grouped are likely to be more sensitive indicators of the difference to be estimated. Suppose four different catalysts are to be compared, and that only three can be tested in one day. The obvious subdivision to make is into all possible sets of 3, namely (1, 2, 3), (1, 2, 4),
(1, 3, 4), and (2, 3, 4). Direct comparisons of, say, 1 with 2 are possible in the first two of the sets. An indirect comparison is possible for the latter two sets, since 1 is compared with the average of 3 and 4 in the third set, and 2 with the average of 3 and 4 in the fourth set. The proper means of weighting these three comparisons will not be detailed here, but it is worth remarking that information from all four blocks is used in drawing conclusions about every difference. Instead of using some standard catalyst in every block, which would consume one third of the test time, we can use pairs of catalysts as go-betweens for making indirect comparisons. The variety of such balanced block designs is great and has been very completely worked out. An introductory discussion appears in Youden (76). More detailed treatments appear in Kempthorne (74), in Cochran and Cox (8), and in Davies (9). Controls Generalized. As suggested above, the scientific dogma of "adequate controls" is capable of considerable generalization and strengthening, especially when small blocks of experimental material tend to be more homogeneous than larger ones. Suppose that only two treatments (one comparison) can be arranged inside the natural block. The block may be a day, a shift, a batch of raw material, a pair of reactors, or whatever other natural grouping is expected to be homogeneous. To compare, say, nine treatments using a control each time means that half of all the work done is given over to the control. Suppose each pair is done in duplicate. Eighteen blocks are required and each condition or treatment is looked at twice. But different treatments can be compared only through their (matching) standards or controls, which brings in some error each time. Consider in contrast to the nine treatments, each paired with a standard in duplicate, the following set of 18 pairs taken from Kempthorne (74, p. 554): (1, 2), (1, 3), (1, 4), (1, 7), (2, 3), (2, 5), (2, 8), (3, 6), (3, 9), (4, 5), (4, 6), (4, 7), (5, 6), (5, 8), (6, 9), (7, 8), (7, 9), (8, 9). Each of the nine treatments or conditions can be compared with each other, but not all comparisons are direct. For example, 1 is compared with 2, 3, 4, and 7 directly. These comparisons are exactly as precise as the duplicate comparisons through a standard considered above. But in addition, 1 may be compared with 2 by matching (1, 3) with (2, 3), also by matching (1, 4), (4, 5), and (2, 5), and finally by matching (1, 7), (7, 8), and (2, 8). The proper weighting to give the three sorts of comparison is not shown here, but it is clear that the comparison using all these bits of information is considerably more precise than by the classical each-treatment-versus-standard method.
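The balance of these 18 pairs can be checked mechanically. The short sketch below is not part of the original discussion; it counts how often each treatment occurs and lists, for every pair of treatments, whether the comparison is direct and which treatments serve as one-step go-betweens.

```python
# Sketch: check the balance of the 18 partially balanced incomplete blocks.
from itertools import combinations
from collections import Counter

pairs = [(1, 2), (1, 3), (1, 4), (1, 7), (2, 3), (2, 5), (2, 8), (3, 6), (3, 9),
         (4, 5), (4, 6), (4, 7), (5, 6), (5, 8), (6, 9), (7, 8), (7, 9), (8, 9)]

# Each of the nine treatments occurs in exactly four of the blocks.
print(Counter(t for p in pairs for t in p))

# A comparison of i with j is direct when (i, j) is itself a block; treatments k
# appearing in a block with i and in a block with j give one-step indirect routes.
for i, j in combinations(range(1, 10), 2):
    direct = (i, j) in pairs
    via = [k for k in range(1, 10) if k not in (i, j)
           and tuple(sorted((i, k))) in pairs and tuple(sorted((j, k))) in pairs]
    print(i, j, "direct" if direct else "indirect", "go-betweens:", via)
```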
It is also possible to extract some information from these 18 "partially balanced incomplete blocks" about the random error made inside blocks. Exactly analogous remarks hold for the other eight treatments, and so this design will in general give more precise results than the usual plan. Of course, if the standard is still needed for comparison purposes, one of the nine numbers used must be allocated to it. Drift and Time Trends During Experimentation. Since drift is usually less between adjacent units or consecutive times, the ideas reviewed in the preceding section have also direct use here. Youden's work, some of it already known to chemical engineers through his columns in Industrial and Engineering Chemistry (1954-5), is especially noteworthy because of its combination of ingenuity and fitness. His paper (76) shows how to arrange measurements in one sequence so as to have all comparisons practically unaffected by instrumental drift. A more exact but more difficult scheme, allowing for elimination of any polynomial trend (of known degree), is discussed by Box (4, 5). This method is appropriate for estimating the first-order effects of any number of independently variable factors. First-order multifactor designs are also discussed later. Extra Measurements. It is evident from the experimental plans outlined thus far that all possess a high degree of internal balance and symmetry. It follows that any later additions, for example, an extra set of measurements at an intermediate temperature level, will not ordinarily have the built-in richness of comparability that the earlier sets showed. In many cases it is possible to reserve some of the several batches of raw material so that, if further comparisons are needed, material will be available for linking the new treatments with the old. Again Youden has made the earliest contribution. A discussion of successive blocks in a multifactor experiment, to be run one after another when improved sensitivity is desired, is given in the section on applications. Carrying this idea to the point where the decision to make each measurement rests on the results of the last measurement gives us the devices called sequential plans. These are also discussed under "applications." Control of Laboratory Precision and Accuracy. Although many analytical chemists incline to consider such action as impugning their professional competence, engineers often send in duplicate samples for assay, with different code letters. It is no secret that such duplicates do not in general agree so well as
the parallel pairs that chemists traditionally run. A more justifiable course would be for the assay laboratory to publish regularly its precision for each type of assay. These precisions would have to be determined by "sleepers" (duplicate samples unidentifiable by the analysts, but known to the laboratory director). If some of the sleepers are also made up as standards, then a running pair of plots, one of the ranges of duplicate pairs, the other of the averages of the pairs, can be used to keep track of the analyst's performance, both as to the stability of his precision (as measured by the ranges behaving stably) and as to his accuracy (as judged by the pair averages hovering around the true or standard value, with no long runs on either side). Naturally such a chart system should be set up with the aid of a quality control engineer. Naturally too, as stability over considerable periods of time is demonstrated, relaxation in the frequency of sampling should be arranged.
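The arithmetic behind such a pair of plots is slight. The sketch below is not from the article; the duplicate results and the standard value are invented, and the factor 1.128 is the usual d2 constant that converts the mean range of pairs into a standard deviation.

```python
# Sketch: estimating assay precision and accuracy from blind duplicate "sleepers".
pairs = [(49.8, 50.3), (50.1, 49.6), (50.4, 50.2), (49.5, 50.0), (50.2, 49.9)]
standard = 50.0                                   # known value of the standard

ranges = [abs(a - b) for a, b in pairs]           # within-pair scatter
averages = [(a + b) / 2 for a, b in pairs]        # pair means, for accuracy

r_bar = sum(ranges) / len(ranges)
sigma_hat = r_bar / 1.128                         # d2 for samples of two
bias_hat = sum(averages) / len(averages) - standard

print("estimated assay standard deviation: %.3f" % sigma_hat)
print("estimated bias against the standard: %.3f" % bias_hat)
```

The ranges would be plotted in sequence to watch the stability of the precision, and the pair averages to watch for drift away from the standard value.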
Applications of Statistics to Chemical Engineering Research and Development

Use of Data Already Available. Statistical advice is usually sought by the chemical engineer who has collected a mass of data which resists "ordinary" interpretation. The data may have been accumulated over a term of plant operation or may be the result of a number of laboratory experiments or successive pilot plant runs. Indeed such collections are sometimes made the occasion for "trials" of statistical methods. It should be said at once that such trials are usually failures. A common reason for failure is that the data are inhomogeneous. Inhomogeneity as used here means that some conditions (not measured and often not even namable) have changed (slowly or suddenly), so that different sections of the data are actually samples from quite different populations. Perhaps almost as frequent a cause of failure is the personal one, the self-styled statistician not having been careful to master the assumptions and limitations of the data-reduction methods he uses. The so-called multiple-regression methods are by far the most useful in reducing massive data to a few manageable equations. In the hands of competent workers aided by modern computing machinery, such data can sometimes be made to yield valuable information. Even in the contrary cases, where "no significant regression" is found, the analysis may pin the possible effects of some factors into a small enough range so that useful statements may be made about the lack of influence of these factors. Many of the statistical contributions to data correlation can be indicated through a rather generalized example. A correlator is confronted with a volume of plant data which gives the daily values of several outcomes (yields, various properties of a product being made, etc.) along with the measured values of a number of plant operating conditions and some repeatedly measured properties of incoming streams. The outcomes, of which J are recorded, are to be called y_j (where j = 1, 2, 3, ..., J). The independent variables will be named x_i, where i = 1, 2, 3, ..., I. There are then I "independent" factors. The main assumptions that must be satisfied for the simplest or classical (least-squares) fit of a set of equations of the form Y_j = f_j(x_i) are as follows:

a. An equation connecting each y_j with the x_i is known as to form, but has several unknown constants in it. The equations are (for the simplest case) linear in the unknown constants, but not necessarily in the x_i. Thus it is assumed that

Y_j = b_0 + b_1 φ_1(x_i) + b_2 φ_2(x_i) + ... + b_K φ_K(x_i)    (1)
where the form of all the φ_k(x_i) are known and only the constants b_k (k = 0, 1, 2, ..., K) are to be determined. Each φ may be a function of several x_i. It is not judged necessary here to adjoin the subscript j to every term on the right.

b. Each datum point is complete, consisting of at least one measurement of each x_i and one of each y_j; a "datum point" is then a set of I + J numbers.

c. Each x_i is quite precisely measured, compared with its actual spread in the data under study.

d. The scatter of each y_j for fixed x_i, although unknown, is about the same regardless of the set of values at which the x_i were fixed.

e. No pair of x_i shows very close linear correlation. (If such linear correlation is found, it is easy to reinstate this assumption by dropping one of the correlated pair.)

If assumption a (known form of equation) is not a safe one, it may be practicable to test the data for the existence of some connection simply by seeing whether "near duplicates," i.e., y_j values taken for the (small) sets of nearly matching x_i, are more closely equal than nonneighbors, i.e., y_j values taken for widely differing sets of x_i values. If near-duplicate y_j do in fact match better than nonneighbors (exact statistical tests are not available), then there must be some dependence of y_j on the x_i, and full engineering and technical effort would be brought to bear to derive the form of the relation, thus reinstating assumption a. If the form of equation relating y_j to the x_i is known but is not linear in the unknown constants, then laborious trial-and-error methods must be used.
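Assumption a can be made concrete with a small sketch. The data and the particular φ_k chosen below (x, x², log x) are invented; the point is only that the fit is linear in the constants b_k even though the φ_k are not linear in x.

```python
# Sketch: least-squares fit of Y = b0 + b1*x + b2*x**2 + b3*log(x),
# linear in the constants b_k though not in x.  Data are invented.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(1.0, 5.0, 30)
y = 2.0 + 1.5 * x - 0.3 * x**2 + 0.8 * np.log(x) + rng.normal(0, 0.1, x.size)

# One column of the design matrix per phi_k(x); the first column is the constant.
X = np.column_stack([np.ones_like(x), x, x**2, np.log(x)])
b_hat = np.linalg.lstsq(X, y, rcond=None)[0]
print("estimated constants:", np.round(b_hat, 3))
```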
Occasionally a series approximation can be developed to bring this part of assumption a back into force. If assumption b requires, as it often does, that a large part of the data be removed from the study, this is mainly an advantage, since it serves to emphasize that voluminous data do not necessarily contain a large amount of information. If certain fixed sets of x_i are present in subgroupings of the original collection, then each may be analyzed separately. But if, as is more usual, there is little uniformity in the names of the missing x_i, then only the "complete" points can be analyzed. (A recent case in the writer's experience saw the reduction of 1300 datum cards to 300, in order to meet this restriction.) Assumption c is sometimes mistakenly ignored. Each x_i that is measured with error comparable with the scatter of that x_i in the data will generally be judged to have no influence, or to have one that is an underestimate of its true influence. However, if the observed values of some x_i cluster around a few target or specification values from which they differ because of control difficulties (not measurement errors), then "unbiased" estimates of the true b_k may be found. The "error in y_j for fixed x_i," sometimes called the "error of estimate," is usually assumed independent of the x_i when a least-squares fit is made. If the error of estimate varies widely from point to point in x_i space, then unbiased estimates of the b_k may still be obtained, but the precision of these estimates may be very low. The alternative is to weight the datum points inversely by their local scatter. Such a "weighted regression analysis" is not easy, and is most fruitful when the weights are rather well known or vary very widely. Once these assumptions are shown to apply tolerably well for a particular set of data, then the work of estimating the constants b_k, and their uncertainties, is not too onerous. The estimation of thirty such values, with the precision of each, would have been considered quite unmanageable twenty years ago, but is now commonplace, thanks to the development of automatic computing machinery. One useful way to present the conclusions of such a numerical analysis is by means of two equations for each dependent variable. The first is a prediction equation, which gives the best prediction of Y_j (Y_j being the true value of which y_j is an observation), given any set of x_i in the range studied. The second equation shows how well the first one predicts. In terms of the φ_k, this equation is that of a hyperboloid of two sheets. It is usually displayed as a table of values showing the error of prediction for various favorable and unfavorable combinations of the x_i. This form of judgment of the quality of prediction (or
of representation of the data used) is more intelligible to engineers than is a large collection of statements about the "significance" of the various constants in the prediction equation. Many engineering readers will have long since asked themselves, What is wrong with current common sense methods? The common sense method usually starts by winnowing the available collection of numbers, retaining only those that are thought by experienced men to have some relevance to the current problem. No one objects to this phase of the "c.s. method." The second step, sometimes used even as part of the winnowing process, is that of graphing some of the important independent variables against one (or more) of the dependent ones. A serious error is sometimes made here. To take a very simple example, suppose that the true relation is

Y = b_0 + b_1 x_1 + b_2 x_2    (2)

and that in the data at hand x_1 and x_2 are in fact linearly correlated, so that

x_2 = c + d x_1    (3)
The constants b_0, b_1, b_2, c, and d are not known. If a single scatter diagram of y versus x_1 is now attempted, it is clear that the rate of change of Y with x_1 for fixed x_2 that will be estimated is (b_1 + b_2 d) and not b_1. It is entirely possible for the former quantity to be widely different from the latter. Such confusions are eliminated by the multiple-regression type of analysis. Put more generally, it is a pleasing and quite general property of least-squares analyses (when the assumptions given above are all roughly satisfied) that they give unbiased estimates of the true (unknown) constants. The term unbiased refers to a long range property. In a particular case some of the estimated slopes will be larger than their true values, and some smaller. The writer has occasionally encountered enterprising engineers who have decided on their own to tackle cases where there is heavy error in one or more x_i as well as in some of the y_j. They have responded to this by deriving equations that minimize, say, the perpendicular distances of the observed points from the fitting plane. This plan is usually of little use, since it will have parameters that are not scale invariant. Thus changing the scale of one of the x_i will give a different plane. Even using five squares per unit for x_1 on the graph paper chosen instead of ten will then give a different predicting equation! A sounder approach to such a problem would start with the services of a statistician who is familiar with the "identifiability problem" and with so-called multivariate analysis.
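The confusion described around Equations 2 and 3 is easy to reproduce numerically. The constants and data below are invented: when x_2 follows x_1 closely, the slope of y on x_1 alone comes out near b_1 + b_2 d, while the two-variable least-squares fit comes close to recovering b_1 and b_2 themselves.

```python
# Sketch: a single scatter diagram of y against x1 estimates (b1 + b2*d),
# not b1, when x2 = c + d*x1 (+ noise).  All constants are invented.
import numpy as np

rng = np.random.default_rng(1)
b0, b1, b2 = 5.0, 2.0, -3.0          # true relation  Y = b0 + b1*x1 + b2*x2
c, d = 1.0, 0.5                      # x2 tracks x1:  x2 = c + d*x1 + noise

x1 = rng.uniform(0, 10, 200)
x2 = c + d * x1 + rng.normal(0, 0.5, x1.size)
y = b0 + b1 * x1 + b2 * x2 + rng.normal(0, 0.3, x1.size)

slope_single = np.polyfit(x1, y, 1)[0]            # y against x1 alone

X = np.column_stack([np.ones_like(x1), x1, x2])   # multiple regression
b_hat = np.linalg.lstsq(X, y, rcond=None)[0]

print("slope of y on x1 alone: %.2f  (b1 + b2*d = %.2f)" % (slope_single, b1 + b2 * d))
print("multiple-regression estimates of b1, b2: %.2f, %.2f" % (b_hat[1], b_hat[2]))
```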
The paragraphs just preceding are not planned to make their readers better analysts, but only to indicate some rather common situations which are now manageable by fairly well understood methods. These methods, while not so simple as to be mastered overnight, are not so complicated that chemical engineers need despair. It is fortunate that at least two first-rate expositions by engineers and chemists are now available (2, 73). The engineer who is approaching multiple regression for the first time will probably want to have both books at his desk, since different aspects of nomenclature, order of presentation, and even subject matter of examples will appeal to different readers. A word of warning should be given to the engineer who has read some book on data fitting and who feels that the methods he is familiar with are much simpler and probably nearly as good. Many results of great usefulness were given by Gauss. These were partly rediscovered and greatly extended by Fisher. Perhaps a fair touchstone of the responsibility of recommendations on curve and surface fitting would be the extent of reference to Gauss and to Fisher. An excellent set of supplementary readings on regression is given in (73). Of use to the engineer confronted with a multivariate problem (uncertainty in several variables in an equation) is the book by Rao (78), but this work is far more difficult than those hitherto named. Research in multivariate analysis is proceeding rapidly, but the services of a statistician active in the field will be required for up-to-date application. The results of a careful multiple-regression analysis are often discouraging. Great masses of data are fed into complicated calculations only to find that a few slopes are estimated with about ±50% precision. It violates our sense of fitness to find so little issue from such a voluminous substrate. A more objective sense of the proportionality factors relating amount of data to amount of information can be developed by considering a few elementary examples. Suppose that the relation between some variable Y, measured with precision σ, and a variable x, which can be measured with negligible error, is known to be linear. Thus Y = b_0 + b_1 x. The true value of the slope, b_1, is not known. The value of x can be set at any value inside some range, x_I to x_II, and then y can be measured. Since each Y is measured with uncertainty, the slope is also estimated with uncertainty. N measurements are to be made, and they may be spaced as the experimenter sees fit. The uncertainty in slope is related to the uncertainty in the y values by the equation

σ_b² = σ²/S(x - x̄)²    (4)
where σ_b is the standard error of the slope
σ is the standard deviation of duplicate y values
S(x - x̄)² is the sum of squares of deviations of the chosen x values from their average, x̄

The measurements may be equally spaced over the interval x_I to x_II, but it seems likely that the points near the middle of the x range are giving little information about the slope. Equation 4 confirms this conjecture. At the other extreme the data may all be taken with x at x_I or at x_II, and again it seems reasonable to take N/2 observations at each end. Using only Equation 4, one easily finds that the ratio of the variance of slope for the equally spaced case to the variance of slope for the equally bunched case is 3(N - 1)/(N + 1). (The square of a standard deviation or standard error is called a variance.) Thus for any tolerably large N the equally bunched arrangement is in this important sense three times as effective as equal spacing. Put the other way, almost three times as many equally spaced measurements will be required to reach any given precision of slope estimation as will be needed if the measurements can be equally bunched over the same x range. The new property of a set of data that is utilized here is its arrangement. The extensions and generalizations of this notion are a large part of the field, unfamiliar to most engineers, called the statistical design of experiments. Some of these developments will be discussed later. The connection between Equation 4 and the apparent paucity of information in much plant and operating data is easy to make. Inspection of such data, classified by each x_i in turn, usually reveals that most of the data are grouped rather closely around the average value for that x_i. That is just the position that gives least information about the rate of change of y with x_i. It is quite common to find that pairs of x_i vary together (or in opposition) so that it is not easy to separate the effects of the two. Such correlations, which can occur for many pairs of x_i in a single set of data, further weaken the informative value of the set with respect to the "effects" (constant rates of change in the simplest case) of the x_i on y. If the writer's experience is typical, then it can be said that the prospects of finding a good prediction equation by multiple-regression techniques by use of plant data are poor. However, exceptions have occurred and many hints about the relations among a number of factors have been garnered by this means. The testing of derived prediction equations and the using of the hints just mentioned are both carried forward most effectively by the methods to be described.
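Equation 4 and the 3(N - 1)/(N + 1) ratio quoted above can be checked numerically; the range of x and the number of measurements below are arbitrary.

```python
# Sketch: variance of the estimated slope (Equation 4) for N equally spaced
# x values versus N/2 observations bunched at each end of the same range.
def slope_variance(xs, sigma=1.0):
    xbar = sum(xs) / len(xs)
    s = sum((x - xbar) ** 2 for x in xs)      # S(x - xbar)^2
    return sigma ** 2 / s                     # Equation 4

N, lo, hi = 10, 0.0, 1.0
spaced = [lo + i * (hi - lo) / (N - 1) for i in range(N)]
bunched = [lo] * (N // 2) + [hi] * (N // 2)

print("variance ratio, spaced/bunched: %.3f"
      % (slope_variance(spaced) / slope_variance(bunched)))
print("3(N - 1)/(N + 1)              : %.3f" % (3 * (N - 1) / (N + 1)))
```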
Figure 1a. Region of permissible experimentation
These devices, most of them by no means new, are discussed because so few chemical engineers are aware of their advantages. The improvements are aimed at the deliberate elimination of the two disadvantages of "unplanned" data just outlined. Data will be taken, then, where they do the most good for detecting influences or effects and so that there are no correlations (or minimal ones) between the independent variables. Balanced Schedules of Experimental Runs. Deliberate variation of factors (independent variables) is more costly than is the unselective taking of data. This commonplace observation should be followed by the more important statement that it is only by the deliberate variation of factors that the engineer can become sure that observed correlations are in fact in causal relation to one another. The more experienced and pessimistic operating engineer would want to add several more comments, one being that even after such deliberate variation we are often not sure enough.
Figure 1b. "Classical" experiment
The claim made here is only that such variation is necessary, not that it is sufficient. Perhaps the simplest example of statistical design of a multifactor experiment can be constructed to answer the (rhetorical) question, if we vary more than one factor at a time, say x_1 and x_2, how can we tell which produced the observed variation in y? It is curious that this question was not raised in connection with the gleaning of information from plant data, since it is only the rarest of accidents to find single and independent variation of two factors under normal plant operation. Although the question asked above is usually not meant to be answered, it is exactly in its careful answer that we find new gains in the power of variable data to give information about effects. Suppose that the "response surface" is a plane to a close enough approximation over the practicable range of variation of x_1 and x_2. The equation is then

Y = b_0 + b_1 x_1 + b_2 x_2    (5)
Figure 1c. Experiment covering larger region of permissible experimentation
where the constants b_0, b_1, and b_2 are to be estimated from the data. It is further assumed that there is only a certain (known) range of x_1 and x_2 in which experimentation is practicable. The experimental conditions (but not the results of the experiments) can be plotted on an x_1, x_2 grid (Figure 1a), inside or on the practicability boundaries. The "classicist," varying only one factor at a time, will do the three runs indicated by the three dots in Figure 1b. Comparing this "design" with that indicated in Figure 1c suggests that the latter might be somewhat better, since it seems to span the available experimental region a little better. Calculation confirms this suspicion, the slope b_1 being determined just as well by the symmetrical design, but b_2 being determined with somewhat better precision, namely with three fourths as large a variance. This gain, though small, becomes proportionately greater as more factors, and more constants, are included in the prediction equation. It need not be overemphasized that the three points in the second design are no more expensive to run than the classical three. The contrast becomes sharper when three factors are studied in four runs; Figure 2 compares the classical one-factor-at-a-time plan
with a more fully balanced design. Figure 2a gives relative positions of the points in "factor space" for the former design; Figure 2b for the latter. The crampedness that was only just noticeable in the two-factor case is here rather painful. No one would claim that the four points of Figure 2a cover the permissible region, here a cube, as well as do the four points of Figure 2b. In Figure 2a the effect of x_1 is judged by the single difference (y_2 - y_1) between two points lying on one face of the cube.
Such a difference has in it a component of uncertainty from each measurement. If the y precision, σ, is constant at all x, then the standard error of the difference (y_2 - y_1) is √2 σ. Less obvious is the fact that the y difference in Figure 2b will be measured by the difference between the average of the two y values at the higher level of x_1 and the average of the two at the lower level. Everyone feels that the average of a pair of independent measurements is more reliable than a single number. The standard error of the relevant difference here, namely of (y_3 + y_4)/2 - (y_1 + y_2)/2, is σ, which is notably smaller than the √2 σ found above. The same advantage holds for the estimates of the effects of varying x_2 and x_3. Put another way, the four points of Figure 2a would have to be run in duplicate, making eight runs in all, to gain results as precise as those from the four points of Figure 2b. The gain in economy of effort, at least in those cases where the cost of experimentation is heavily influenced by the number of runs, increases steadily with the number of factors to be varied. Similar designs exist for any number of factors. They are exceptionally easy to write when the number of runs is a power of 2 and the number of factors is one less. Two (equivalent) nomenclatures are given in Table I for the two designs just discussed. The three independent variables x_1, x_2, and x_3 are here renamed A, B, and C.
Table I. Numerical and Literal Notations for Multifactor Experiments

          Classical Design, One-at-a-Time      Balanced Design, Two-at-a-Time
Run       Numerical (ABC)    Literal           Numerical (ABC)    Literal
1         000                (1)               000                (1)
2         100                a                 011                bc
3         010                b                 101                ac
4         001                c                 110                ab
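The precision claim made for the balanced plan of Table I can be verified by least squares. The sketch below is not from the article; it fits y = b_0 + b_A A + b_B B + b_C C to each four-run plan with 0/1 factor levels and prints the variance factors (multiples of σ²) of the estimated effects, taken from the diagonal of the inverse of X'X.

```python
# Sketch: variance factors of the estimated effects of A, B, C for the two
# four-run plans of Table I, assuming unit error variance per run.
import numpy as np

def variance_factors(runs):
    X = np.array([[1] + list(r) for r in runs], dtype=float)   # intercept + A, B, C
    return np.diag(np.linalg.inv(X.T @ X))[1:]                 # skip the intercept

classical = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]   # one at a time
balanced  = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]   # two at a time

print("classical:", variance_factors(classical))   # 2, 2, 2
print("balanced :", variance_factors(balanced))    # 1, 1, 1
```

The factors 2 and 1 correspond to the standard errors √2 σ and σ discussed in the text.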
A listing of the designs for all numbers of runs up to 100 divisible by 4 has been given by Plackett and Burman (77). Box (4) has shown that all the intermediate designs also exist. They are called by Box multifactor designs of first order. It is not always safe to speak about the "effect of factor A" without further qualification as to the levels of other factors. It may well be that the effect of factor A, by which is meant the effect on some dependent variable of changing factor A from one level, 0, to another level, 1, is not the same at one level of factor B as it is at another level of B.
Figure 3a. 2² factorial design
Figure 3b. Five-point design
If four measurements are taken, the consistency of effect of A at two levels of B can be judged. The four measurements required are indicated as the vertexes of a rectangle in Figure 3a. The corresponding literal notation is given beside each point. The design of Figure 3a is the first shown here of the type called factorial, meaning that all levels chosen for each factor are tested at all levels of every other factor. It is in fact the factorial design called the 2², the exponent meaning that two factors are being varied and the base that each is at two levels. Such a design gives three results, each using all the data and each being of maximum precision. The three results are an estimate of the average effect of increasing factor A (or x_1) from the low level chosen to the higher level; a similar estimate for the effect of B; and finally a judgment of the consistency, or additivity, of these two effects. This measure of additivity (or of nonadditivity) is found as the agreement (or otherwise) of the two "effects" (ab - b) and [a - (1)], and so is given by combining the results of the runs as follows: [ab + (1) - a - b]. If this quantity is of considerable magnitude, one then speaks of A and B as interacting. The effect of varying A from its low level to its high level is then not the same when B is low as when B is high. The average effects of A and B no longer have simple meaning. The effects of the two factors can no longer be spoken of as additive. Regarding the four y values obtained as points on a surface, there is some twist at right angles to the direction of motion as one proceeds from the two points at low A to the two at high A. Since only two levels of A and of B are available in the 2², no measure of curvature in the direction of motion is possible. But by adding one point, as in Figure 3b, it is possible to get some idea of the "total" curvature. Consider
now the biquadratic equation in x_1 and x_2:

Y = b_0 + b_1 x_1 + b_2 x_2 + b_11 x_1² + b_22 x_2² + b_12 x_1 x_2    (6)
The coefficients b_1 and b_2 are estimated by the "main effects" of x_1 and x_2, each divided by the change in the respective independent variable. The coefficient b_12 is estimated by the corresponding interaction effect. The sum of the two pure quadratic coefficients, i.e., b_11 + b_22, is estimated by the difference between the average response of the four outside points and the response at the center. (For simplicity, the ranges of variation of x_1 and x_2 are here taken as unity.) If the response surface is simply curved, being uniformly convex or uniformly concave over the region of experimentation, then the design given is efficient. If the response is possibly saddle shaped in the region studied, then this arrangement of points may fail, since the true value of (b_11 + b_22) may then be zero or very small, even though b_11 and b_22 are large. The five-point design just given is one that would rarely be used, because the precision of the various effects and of their sum as assembled in Equation 6 is not good enough. A few simple rules are needed to show how the precision of a quantity (let us call it y) measured with uncertainty is related to the precision of functions of that quantity. In the first place, if two independent measurements of y are made, the variance (standard deviation squared) of the sum or difference of the two measurements is the sum of the variances of the single measurements. Thus the uncertainty (in variance units) of the total weight of some material that must be weighed in two lots is the sum of the uncertainties (still in variance units) of the weights of the two parts. The net weight, estimated as the difference between a gross and a tare weighing, also
has a variance equal to the sum (not the difference) of the variances of the individual weighings. Generalizing to N independent measurements on y, y_1, y_2, ..., y_i, ..., y_N, yields

Var{y_1 + y_2 + ... + y_N} = Var{y_1} + Var{y_2} + ... + Var{y_N}    (8)
where Var{ } symbolizes the variance of the quantity in curly braces. The y_i need not all be measurements of the same true Y but are required only to be independent measurements. If the variances of all the y_i are the same, then the sum of the variances shown in Equation 8 may be written as Nσ_y², where σ_y is the standard deviation of duplicate y measurements. Since standard deviations have the same units as the random variables to which they refer, it comes as no surprise to learn that the standard deviation of cy is cσ_y for any constant c. Similarly,

Var{cy} = c²σ_y²    (9)
and in particular

Var{y/N} = σ_y²/N²    (10)

where c has been set at 1/N. Equations 8 and 10 may be combined to give an expression for the variance of the commonest of all linear functions of a set of measurements, their average:

ȳ = (y_1 + y_2 + ... + y_N)/N    (11)

where the y_i are repeated independent measurements on y. Using Equations 8 and 10 above gives

Var{ȳ} = σ_y²/N    (12)
Equation 12, possibly the most important equation in our field, justifies and gives quantitative expression to the widely held feeling that an average value is more reliable than a single measurement. Taking the square root of both sides of Equation 12 yields

σ_ȳ = σ_y/√N
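A quick simulation, with an assumed σ_y and N that are not from the article, confirms the √N shrinkage.

```python
# Sketch: the scatter of averages of N measurements shrinks as sigma_y/sqrt(N)
# (Equation 12 and its square root).
import numpy as np

rng = np.random.default_rng(2)
sigma_y, N, trials = 2.0, 16, 20000

averages = rng.normal(0.0, sigma_y, size=(trials, N)).mean(axis=1)
print("observed std. dev. of averages: %.3f" % averages.std())
print("sigma_y / sqrt(N)             : %.3f" % (sigma_y / np.sqrt(N)))
```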
The other equation needed here gives the standard deviation (sometimes called the standard error) of the difference between two independent averages:

Var{ȳ_2 - ȳ_1} = Var{ȳ_1} + Var{ȳ_2}    (13)
Equation 14 follows from 13 when the error in y_1 and in y_2 measurements is the same:

Var{ȳ_2 - ȳ_1} = σ_y²(1/N_1 + 1/N_2)    (14)

If, as is often convenient, N_1 = N_2 = N/2, then

Var{ȳ_2 - ȳ_1} = 4σ_y²/N    (15)
From Equation 15 it is seen that the standard error of a difference between two equally precise means is 2σ_y/√N, when N independent measurements in all have been taken. This relation plays an important part in industrial experimentation. It is the fashion among chemical engineers to attribute to others the random variation that appears in their data. The assay laboratory cannot check its own results, the analytical chemists make all kinds of errors or, turning in the other direction, the raw material we have to use varies all over the place, no two batches are ever the same, and so on. It is quite feasible to get quantitative estimates of all these sources of variability by taking data in a balanced, nested arrangement. As a simplified example, assume that two separate carloads of raw material can be followed through the process, that two runs can be made on each carload, and that duplicate, but differently coded, samples from each run can be sent to the assay laboratory. Eight results will be produced. The four differences between duplicate assays, properly averaged (by averaging their squares), can be used to estimate the assay variance completely separated from the other sources of variability under study here. The two differences between the averages of the pairs of runs from the same carload estimate run-to-run variability with some contamination of assay variability but with none from raw-material carload-to-carload scatter. The run variation can, by subtraction, be estimated clear of assay variation. Finally, the single difference between the two carload averages gives an estimate of the raw-material variability. The repetition of this whole cycle of "hierarchal" measurements, including if necessary further splits, can be carried far enough to specify, with desired precision, just what proportion of the total variability is being contributed from each source. But the purpose in criticizing here the engineers' attitude toward assay error is a different one. It is to emphasize that most engineers do not have a clear idea of the order of magnitude of the error standard deviation of their own results. It is to emphasize, further, that without an accurate estimate of the "run standard deviation" one knows neither how many duplicate runs to make nor how many duplicate assays to request. It is obvious that if the duplicate-run standard deviation (which we will call σ_R) is much larger than the duplicate-assay standard deviation, σ_a, no gain in precision can be expected by increasing the number of duplicate assays. If N_R duplicate runs are made and N_a
duplicate assays are made on the results of each run, and if, as can generally be assumed, run variability is not correlated with assay variability, then the reliability of the average of all N_R N_a measurements is given by the equation

Var{grand average} = σ_R²/N_R + σ_a²/(N_R N_a)
Obviously there is a sharp limit to the improvement in precision to be attained by increasing N_a only. It is a common occurrence that improvements in yield (or whatever other dependent variable is being measured) about as large as σ_R are of industrial importance. In planning experimental work, then, it is well to be quite sure that effects as large as σ_R are detected and not missed. To ensure this outcome, a minimal requirement might be that the observed effect (ȳ_2 - ȳ_1 in the simplest cases) be assuredly within ±σ_R/2 of the true effect. If "assuredly" be translated modestly "with 95% assurance," then the true value will assuredly be covered by an interval centered on the observed effect (D = ȳ_2 - ȳ_1) and of length four times the standard error of the observed D. Thus

4(2σ_R/√N_R) = σ_R
From this equation N_R, the total number of runs required (half at each level of the factor whose effect is being studied), emerges as 64. Most engineers will shy at this number, feeling that if this were a correct line of reasoning hardly anything would ever have been discovered. It should suffice to remind them that effects of magnitude 2σ_R can be measured within ±σ_R with only 16 runs (eight under each condition of the independent variable) and that effects of the order of 4σ_R can be estimated with the same relative precision by four runs. It is these factors whose effects are discovered first, and little or no statistical sophistication is required to interpret data obtained by varying factors of such influence. But the principle of simultaneous and balanced variation of several factors can now be brought to bear, with impressive gains in effectiveness per run. As a first step a full 2⁶ factorial experiment can be made in 64 runs (2⁶ = 64) and thus the average effects of six separately variable factors can be estimated, each one as precisely as if the whole 64 runs were devoted to studying only the effect of a single factor. It is worth adding that all the fifteen (6 × 5/2) two-factor interactions among six factors are also estimable, and one can see whether the "average effects" observed are in fact roughly the same at high and low levels of other factors or whether they are quite disparate. It will have been noted that only 6
+ 15, or 21, conclusions have been drawn from the 64 runs imagined above. The remaining 38 conclusions can also be drawn, but they estimate such abstruse properties of the data as 3-, 4-, 5-, and 6-factor interactions. Just as it was shown above to be possible to answer some questions about the effects of three factors in only one half the 2³, so it is also possible to fractionate the full factorials in still larger numbers of factors without serious loss of information about main effects and two-factor interactions. Indeed, if three-factor and higher order interactions can be assumed negligible (this assumption is usually a safe one), then eight factors and their 28 two-factor interactions can all be estimated in 64 runs. Since 64 is one fourth of 2⁸, or 256, such a design is called a "quarter replicate of a 2⁸," or more compactly, a 2⁸⁻². The actual combinations of runs that would be needed are given in (7, 8, 74) and in many other places. They can be generated immediately from the six sets of letters ab, adf, bc, de, fg, and gh. Each of these sets represents the conditions of a run. The product of each pair of these (dropping all squared letters) gives another run. In fact, all possible products of the six sets given, plus the "low set," with all factors at their low levels, give the 64 runs required. Thirty-six conclusions can be drawn, each one using all the data. Other fractional replicates are of course possible, the 2⁶⁻¹, 2⁸⁻², 2¹¹⁻⁴, and 2¹⁵⁻⁷ being the most efficient in the sense that each permits drawing about half as many conclusions as there are runs. Put another way, the four designs require 32, 64, 128, and 256 runs respectively; if one of these numbers of runs has been chosen for precision reasons, then the corresponding number of factors (6, 8, 11, or 15) should be marshaled and built into the sequence of runs. Put a third way, if only seven factors are thought to be important, then 64 runs must be made in any case, since the 2⁷⁻¹ is the smallest fraction having all two-factor interactions separately estimable. It would then be a pity not to add another factor, since the 2⁸⁻² also requires only 64 runs. The run specifications for all these designs can be found in (7). Efficient methods of analysis and interpretation can be found in (2, 9, 74).
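The generation rule quoted above (all products of the six sets, dropping squared letters) is easy to carry out mechanically. The sketch below is not from the article; the product of two treatment combinations is taken as the symmetric difference of their letter sets.

```python
# Sketch: generate the 64 runs of the quarter replicate of the 2^8 from the
# six generating treatment combinations, multiplying letter sets and dropping
# squared letters (symmetric difference of the sets of letters).
generators = ["ab", "adf", "bc", "de", "fg", "gh"]

runs = {frozenset()}                              # start from the low set, (1)
for g in generators:
    runs |= {r ^ frozenset(g) for r in runs}      # multiply every run by g

print(len(runs))                                  # 64 distinct treatment combinations
print(sorted("".join(sorted(r)) or "(1)" for r in runs)[:8])
```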
Even though factors are being varied over ranges that are expected to produce effects of magnitude σ_R, it will of course happen that some factor or factors may produce much larger effects. It will be necessary to wait until all N_R runs are in to find this out if the designs just given are used in toto. But balanced fractions of these fractions are available which, while they measure each main effect along with several two-factor interactions, do nevertheless suffice to pick out any very influential factors in a smaller number of runs. The fractions of fractions are the 2^(6-1-1), 2^(8-2-2), 2^(11-4-3), and 2^(15-7-3), requiring 16, 16, 16, and 32 runs respectively. They can be easily derived by any statistician experienced in this field. It will sometimes be practical to augment the 2^(p-q) fractional factorials with some additional measurements taken at the center of the design to permit an over-all estimation of the curvature of the (p-dimensional) response surface. The contrast which measures over-all curvature is the difference between the average of the "outside" points and the average response at the center. If one third of the 2^(p-q) points are taken at the center, the variance of the curvature contrast will be the same as that of the effect contrasts. A less precise judgment is not likely to be useful; indeed, more precision will sometimes be required. The augmentation of the two-level factorials brings with it a further advantage in that an unbiased estimate of σ_R² can be obtained from the duplicated center points. This estimate can be compared with that obtained from the comparisons due to combinations of factors that can be assumed roughly equivalent. (Many such comparisons are to be found in any large experiment. Their unsupported use as error estimates is likely to err on the conservative side; that is, the error is likely to be somewhat overestimated.)
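The one-third rule can be checked directly. With N factorial points, n_c center points, and unit error variance, an effect contrast has variance 1/(N/2) + 1/(N/2) = 4/N, while the curvature contrast has variance 1/N + 1/n_c; taking n_c = N/3 makes the two equal. A two-line numerical check (illustrative values only):

```python
N = 64                               # factorial points in a 2^(p-q) design
n_c = N // 3                         # roughly one third as many center points (21 here)
print(2 / (N / 2), 1 / N + 1 / n_c)  # effect-contrast variance vs. curvature-contrast variance
                                     # (unit error variance): 0.0625 vs. about 0.063
```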
Only one other experimental principle derived from statistics awaits discussion; this is the principle of randomization. Of the 64 runs in the 2^(8-2) just discussed, 32 are at "low A" and 32 at "high A." It would be an elementary scientific error to run off the 32 runs at either level of A first, simply on the grounds that such a grouping was convenient. If some factor not under control, e.g., cooling-water temperature (c.w.t.), were to increase gradually during the sequence of runs, then obviously the runs at one level of A would have been carried through at a higher cooling-water temperature than those at the other level. If c.w.t. in fact has quite an influence on the outcome, then the effect reported as "due to A" is actually "due to A + c.w.t." There are a multitude of such uncontrolled variables operative in most experimental situations. It is imperative to evade as far as possible the effects not only of linear trends in these uncontrolled factors, but of any other changes or combinations of changes. This neutralization can only take the form of spreading the trends as equally as possible across all the levels of the factors being varied. This can be carried through by doing the runs in an objectively randomized order. Tables of random digits available in many statistical texts (9, 10) are used for this purpose.
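Where a table of random digits was once consulted, a pseudo-random generator now serves the same purpose. The few lines below are an illustration only, with arbitrary run labels and seed, producing an objectively randomized running order for the 64 planned combinations.

```python
import random

planned = [f"run {i + 1:02d}" for i in range(64)]   # the 64 planned treatment combinations
rng = random.Random(2718)                           # a recorded seed keeps the order reproducible
running_order = planned[:]
rng.shuffle(running_order)                          # the order in which the runs are actually made
print(running_order[:6])
```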
The practice of randomization, invented by Fisher (as was the principle of symmetry and balance, or orthogonalization, described above), must often be carried through several times independently in one experimental sequence. As an example, the material produced in a set of randomized runs should usually not be sent to the analytical laboratory in that order, but rather should be held at least until some major fraction of the total is available and then sent in a new random order. A too-brief summary of the recommendations just made might run as follows. Block the experiment into sections judged more homogeneous than the whole; orthogonalize (balance) inside the blocks with respect to all factors and two-factor interactions thought to influence the outcome; randomize with respect to all conditions which might influence the results but which cannot be controlled. Returning now to our strictures against the analysis of unplanned data, it will be seen that the following points have been covered.
a. Deliberate variation, in random order, of the conditions under which runs are made guarantees that observed effects, if consistent, are due to the factors varied.
b. Simultaneous variation of several factors permits the use of each run in drawing several conclusions.
c. Balanced variation of the factors permits adequate precision in estimating the effect of each factor and removes undesirable intercorrelations between factors.
d. Extreme variation of the factors, measurements of y being taken at x values as far apart as possible, maximizes the sensitivity of the experiment compared with the relative insensitivity of poorly placed, unplanned data.
e. Fractionating the fractional factorial design allows early discovery of influential factors, thus making redesign practicable when necessary.
Questions of analysis and interpretation are discussed elsewhere, e.g., in (9). Suffice it to conclude this section with a reassurance. The calculations required to analyze even the largest of the designs discussed above are much simpler than the corresponding multiple regression calculations and usually do not require the use of computing machinery. In some of those cases in which curvature is detected, the development engineer will want to study the response surface in more detail, just in the region already covered. On the other hand, the responses found may suggest to him, by their magnitudes and directions, that further tests should be run at new values of the independent variables. In the latter case rough approximations to steepest ascent (or descent) directions will be desirable. In the former case efficient designs for studying curvatures will be wanted.
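For the steepest-ascent case, a rough direction can be read directly off the estimated main effects of the two-level design: in coded units the fitted first-order surface rises fastest along the vector of coefficients (each effect divided by two), and multiplying each coefficient by its factor's half-range converts the step back into natural units. A sketch with made-up numbers follows; the factor names, effects, and ranges are all hypothetical.

```python
# Estimated main effects (change in yield from low to high level) and the half-ranges used.
effects = {"temperature": 3.0, "pressure": -1.0, "feed rate": 0.5}
half_range = {"temperature": 10.0, "pressure": 0.5, "feed rate": 2.0}

# First-order coefficient in coded units is effect/2; steepest ascent moves in proportion to
# coefficient * half_range when expressed in natural units.
step = {f: (effects[f] / 2) * half_range[f] for f in effects}

# Express the path as increments per step, using a 5-degree temperature advance as the yardstick.
scale = 5.0 / step["temperature"]
print({f: round(v * scale, 2) for f, v in step.items()})
```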
One scheme for accomplishing these ends is that developed by Box and his associates (3, 6). It is compactly described in (9), of which Box is coauthor.
Improvements in the Art of Statistical Inference
Tests of Significance and Confidence Intervals. The familiar reporting of some result of experimentation as "very highly significant" has probably misled more engineering readers than has any other single piece of statistical jargon. If the statistician had written instead that "as divergent a result as this would only very rarely be observed if there were no real effect," then the engineer reader would be less impressed. The engineer would then probably further translate the statement as, "He means that the true value is very likely on this (or that) side of zero." In this form the conclusion is hardly news, and rarely useful. Indeed it had better be admitted straight off that tests of significance, while representing an important stage in the recent history of statistics, correspond to a quite primitive state of the engineering field to which the statistics are being applied. Nearly always the chemical engineer would like to know within what range of values he can be quite sure, on the basis of these data, that the true value lies. Since many significance tests (or tests of hypotheses) can be recast in this form, and since actually much more information is given by this means, the so-called "estimation by confidence interval" procedures are in general recommended. More information is given by saying, for example, "The true effect on yield of varying the temperature from T1 to T2 lies, with 99% certainty, in the range 0.2 to 4.0," than is given by saying, "There is a highly significant positive effect of raising the temperature from T1 to T2."
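A small worked illustration of the recommended form of report follows; the yield figures and temperature labels are invented, and the t multiplier is read from an ordinary table.

```python
from statistics import mean, stdev
from math import sqrt

yield_T1 = [81.2, 79.8, 80.5, 82.0, 80.9, 81.4, 79.9, 80.7]   # eight runs at T1 (made-up data)
yield_T2 = [83.1, 82.4, 84.0, 82.8, 83.6, 81.9, 83.3, 82.7]   # eight runs at T2 (made-up data)

n1, n2 = len(yield_T1), len(yield_T2)
diff = mean(yield_T2) - mean(yield_T1)

# Pooled estimate of the run-to-run standard deviation and the resulting standard error.
s_pooled = sqrt(((n1 - 1) * stdev(yield_T1) ** 2 + (n2 - 1) * stdev(yield_T2) ** 2) / (n1 + n2 - 2))
se_diff = s_pooled * sqrt(1 / n1 + 1 / n2)

t_99 = 2.977   # two-sided 99% point of t with 14 degrees of freedom, from a t table
low, high = diff - t_99 * se_diff, diff + t_99 * se_diff
print(f"effect of T1 -> T2 on yield: {diff:.2f}, 99% interval ({low:.2f}, {high:.2f})")
```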
Multiple Comparisons. The standard tests of significance and confidence-interval statements are derived in such a way that the long-run relative frequency of their giving incorrect results, i.e., of reporting significant differences among a set of K means when in fact the true means are identical, has the stated value provided only one significance judgment or confidence-interval statement is made by use of the particular average values needed for that test. The average greedy engineer can hardly be content with one use of each average value: he will want at least to make all possible pair-wise comparisons among, say, K means. The resulting K(K - 1)/2 comparisons are by no means independent, nor does making all of them conform to the derivation of the significance tests mentioned above. Even when only three average values are produced, say as the result of testing one factor at three levels, it is not safe to use the t test for comparing the largest with the smallest (8, page 68). In fact, the t value calculated from the two extreme means of three will exceed the tabled 0.05 value 13% of the time when there are no true differences. This mistake is of course made worse when the extremes of a still larger set are chosen. Thus for six means, use of the 0.05 t-value on the largest versus the smallest mean will give apparent significance 40% of the time when no difference exists. Tukey has shown (21) how to use more recently derived multipliers that will operate with the desired over-all probability of error and give the narrowest possible intervals when only pairwise comparisons are to be made. The required multipliers, called "studentized ranges," are tabulated in Dixon and Massey's text (10), in Bennett and Franklin (2), and in Pearson and Hartley's tables (15). When other comparisons are also to be made (for example, in a set of seven means it may be that the lowest four are to be compared with the upper two), then the studentized range does not give the shortest intervals (though it may be used at the given level of confidence). Many alternatives appear, depending on how one wants to distribute one's errors, on how much freedom is demanded to mull over the data after they are collected, and on how much calculation one is willing to carry out. A monograph on this whole matter by Tukey is in press (21). A paper by H. Scheffé (19) compares Tukey's range method for pairs with the (infinite) set of contrasts that is equivalent to the usual analysis-of-variance test. Another approach to the multiple-comparison problem is exemplified in the recent work of Bechhofer, Sobel, and Dunnett (1). Picking the largest of K varieties, each measured with uncertainty, is easier the greater the distance between the largest and its closest contender. Bechhofer and coworkers have shown how to decide the number of measurements that must be taken under each of the K conditions, so as to guarantee, with a predetermined probability, that when the largest has some predetermined degree of superiority it will in fact be chosen.
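The inflation figures quoted above (13% for three means, about 40% for six) are easy to reproduce by simulation. The sketch below treats each mean as having unit standard error and uses the normal 0.05 point, i.e., the many-degrees-of-freedom version of the t test; the trial count is arbitrary and the simulated rates are only in the neighborhood of the quoted figures.

```python
import random
from math import sqrt

def false_alarm_rate(K, trials=100_000, crit=1.960):
    """How often the largest of K equal-true-mean averages 'differs significantly' from the
    smallest, when each pairwise difference is judged by an ordinary two-sided 0.05 test."""
    rng = random.Random(0)
    hits = 0
    for _ in range(trials):
        means = [rng.gauss(0.0, 1.0) for _ in range(K)]   # K means, each with unit standard error
        hits += (max(means) - min(means)) / sqrt(2.0) > crit
    return hits / trials

print("three means:", false_alarm_rate(3))   # roughly 0.13
print("six means:  ", false_alarm_rate(6))   # roughly 0.4
```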
Prospects. The field of research in problems of chance variation is growing rapidly. The relations between subfields are becoming somewhat clearer. Research in decision-function theory, in sequential-approximation procedures, in time-series spectral analysis, in statistical linear programming, in multivariate analysis, and in other areas is occasionally seen to approach applicability. No space need be taken here to summarize or judge this work. The greatest immediate prospects are, in the writer's judgment, elsewhere. The backlog of statistical knowledge ready to be
applied to chemical engineering problems is quite large. The best immediate prospect is that some engineers will study this body of research and will decide to use some of it. First-rate texts are available, the journal literature is growing, and more and more research statisticians are becoming interested in chemical engineering problems.
Literature Cited
(1) Bechhofer, R. E., Sobel, M., Dunnett, C. W., Biometrika 41, 170 (1954).
(2) Bennett, C. A., Franklin, N. L., "Statistical Analysis in Chemistry and the Chemical Industry," Wiley, New York, 1954.
(3) Box, G. E. P., Biometrics 10, 16 (1954).
(4) Box, G. E. P., Biometrika 39, 49 (1952).
(5) Box, G. E. P., Hay, W. A., Biometrics 9, 304 (1953).
(6) Box, G. E. P., Wilson, K. B., J. Roy. Statistical Soc. B13, 1 (1951).
(7) Brownlee, K. A., Kelley, B. K., Loraine, P. K., Biometrika 35, 268 (1948).
(8) Cochran, W. G., Cox, G., "Experimental Designs," Wiley, New York, 1950.
(9) Davies, O. L., ed., "Design and Analysis of Industrial Experiments," Hafner, New York, 1954.
(10) Dixon, W. J., Massey, F., "Introduction to Statistical Analysis," McGraw-Hill, New York, 1951.
(11) Fisher, R. A., "Contributions to Mathematical Statistics," Wiley, New York, 1950.
(12) Fisher, R. A., "Design of Experiments," 6th ed., Oliver and Boyd, London, 1951.
(13) Hald, A., "Statistical Theory with Engineering Applications," Wiley, New York, 1952.
(14) Kempthorne, O., "Design and Analysis of Experiments," Wiley, New York, 1952.
(15) Pearson, E. S., Hartley, H. O., "Biometrika Tables for Statisticians," vol. 1, Cambridge Univ. Press, New York, 1954.
(16) Perry, J. H., "Chemical Engineers' Handbook," 3rd ed., McGraw-Hill, New York, 1950.
(17) Plackett, R. L., Burman, J. P., Biometrika 33, 305 (1946).
(18) Rao, C. R., "Advanced Statistical Methods in Biometric Research," Wiley, New York, 1952.
(19) Scheffé, H., Biometrika 40, 87 (1953).
(20) Sherwood, T. K., Reed, C. E., "Applied Mathematics in Chemical Engineering," McGraw-Hill, New York, 1939.
(21) Tukey, J. W., "The Problem of Multiple Comparisons," in press.
(22) Walker, W. H., Lewis, W. K., McAdams, W. H., Gilliland, E. R., "Principles of Chemical Engineering," 3rd ed., McGraw-Hill, New York, 1937.
(23) Wilson, E. B., Jr., "Introduction to Scientific Research," McGraw-Hill, New York, 1952.
(24) Yates, F., Commonwealth Bur. Soil Sci. Tech. Commun. No. 35 (1937).
(25) Youden, W. J., "Statistical Methods for Chemists," Wiley, New York, 1951.
(26) Youden, W. J., Science 120, 627 (1954).

RECEIVED for review March 6, 1956. ACCEPTED May 8, 1956.