FRED H. TINGEY¹
Atomic Energy Division, Phillips Petroleum Co., Idaho Falls, Idaho
Identification and Estimation of Variation in Process Measurements

Here is a sound application of standard statistical techniques in the chemical processing field: a technique by which a comparative novice can statistically analyze data obtained in a great variety of error-component studies.

¹ Present address: Technical Operations, Inc., Monterey, Calif.

A fundamental problem associated with essentially all chemical processing is the determination of the reliability of the measurements involved in a material balance calculation. Not only is such information required to evaluate objectively the significance of an apparent […], but the efficient allocation of effort (i.e., number of samples, number of preparations, number of […]) within a given economic framework or error limit is contingent upon identifying the major sources of variation associated with each measurement and estimating their magnitudes. The area of mathematical statistics that applies to problems of this type is called "analysis of variance" (5-7). This is a technique by which the variation in a set of measurements, obtained according to a predesigned scheme, is resolved into component parts identifiable with the particular sources of variation pertinent to the measurements. In any given study, the particular statistical technique used to identify and estimate the component variations in a process measurement follows from the assumed manner in which the components combine to give the final observed measurement. Thus, before specifying the design by which the observations in a given study are to be obtained, and the method of analysis, a mathematical model relating each component to the observed measurement must be derived.

Pertinent to the construction of an appropriate mathematical model in any given study is the classification of each component source of variation as essentially "crossed" or "nested" with respect to the other components. A component variable is said to be crossed if its contribution to the total error in a given measurement can reasonably be assumed to be uniquely identified by the particular level of that variable alone. For example, if a measurement procedure included a mass spectrometer assay step, the component error associated with, say, mass spectrometer I in a given measurement would probably in no way depend upon the particular levels of the other components involved in the measurement. Contrast this with a component variable that is said to be nested, as in the usual batch-sample-analysis chain. Here the component deviation associated with, say, sample one from batch one could not logically be assumed to be identical to that for sample one from batch two; hence the sampling component in this case, to be uniquely defined, must be identified by both the batch and the particular sample within the batch. Most variance component studies involve both crossed and nested variables; however, the method of analysis and the computational procedure for a study in which all the variables are assumed to be crossed can, by a simple technique, be adapted to all classifications, nested and mixed alike.

The mathematical model in which all the component sources of variation are crossed with respect to each other results in the classical factorial type design. This design is characterized by observations at all combinations of the factor levels involved in the study. As a general example, consider a process measurement in which three major sources of variation are assumed. If the sources are denoted by the letters A, B, C, and the corresponding levels under which the investigation is to take place by
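$$A:\ a_1, a_2, \ldots, a_h, \ldots, a_H \qquad B:\ b_1, b_2, \ldots, b_i, \ldots, b_I \qquad C:\ c_1, c_2, \ldots, c_j, \ldots, c_J$$

where the experiment is to be replicated $K$ times, then the mathematical model that results in a factorial type experiment assumes that an observation $X_{hijk}$, with $h = 1, 2, \ldots, H$; $i = 1, 2, \ldots, I$; $j = 1, 2, \ldots, J$; $k = 1, 2, \ldots, K$, is related to the sources of variation in the following manner: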
$$X_{hijk} = m + a_h + b_i + c_j + (ab)_{hi} + (ac)_{hj} + (bc)_{ij} + (abc)_{hij} + e_{hijk} \tag{1}$$
The first term, $m$, is assumed to be a constant. The other terms, as identified by their subscripts, are assumed to be random variations from normal populations whose means are zero and whose variances are defined as follows:

$$\begin{aligned}
V(a_h) &= V(a), & V(b_i) &= V(b), & V(c_j) &= V(c), & V\big((ab)_{hi}\big) &= V(ab),\\
V\big((ac)_{hj}\big) &= V(ac), & V\big((bc)_{ij}\big) &= V(bc), & V\big((abc)_{hij}\big) &= V(abc), & V(e_{hijk}) &= V(e)
\end{aligned} \tag{2}$$
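To make model (1) concrete, the following Python sketch draws synthetic observations from it. It is an illustration only: the function name `simulate_crossed`, the dictionary of variances keyed by `"a"`, `"b"`, ..., `"e"`, the 0-based indexing, and the assumption that all random terms are drawn independently of one another are choices made here, not details taken from the article.

```python
import math
import random


def simulate_crossed(H, I, J, K, m, var, seed=None):
    """Draw observations X[h][i][j][k] from model (1) for a fully crossed
    three-factor experiment with K replicates.

    Assumptions of this sketch: every random term is drawn independently
    from N(0, V(.)), indices start at 0, and `var` maps "a", "b", "c", "ab",
    "ac", "bc", "abc", "e" to variances (missing keys are treated as zero).
    """
    rng = random.Random(seed)

    def draws(n, key):
        sd = math.sqrt(var.get(key, 0.0))
        return [rng.gauss(0.0, sd) for _ in range(n)]

    a, b, c = draws(H, "a"), draws(I, "b"), draws(J, "c")
    # One interaction effect per combination of the interacting levels
    ab = [draws(I, "ab") for _ in range(H)]                        # (ab)_{hi}
    ac = [draws(J, "ac") for _ in range(H)]                        # (ac)_{hj}
    bc = [draws(J, "bc") for _ in range(I)]                        # (bc)_{ij}
    abc = [[draws(J, "abc") for _ in range(I)] for _ in range(H)]  # (abc)_{hij}
    sd_e = math.sqrt(var.get("e", 0.0))

    return [[[[m + a[h] + b[i] + c[j] + ab[h][i] + ac[h][j] + bc[i][j]
               + abc[h][i][j] + rng.gauss(0.0, sd_e)  # replicate error e_{hijk}
               for _k in range(K)]
              for j in range(J)]
             for i in range(I)]
            for h in range(H)]


# Hypothetical settings, chosen only for illustration
X = simulate_crossed(3, 4, 2, 2, m=100.0, var={"a": 4.0, "ab": 1.0, "e": 0.5}, seed=1)
print(len(X), len(X[0]), len(X[0][0]), len(X[0][0][0]))  # 3 4 2 2
```

Data generated this way can be passed through the computing forms that follow as a check: averaged over many simulated experiments, the estimated components should come close to the variances that were put in.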
Thus, in a three-factor experiment in which all three factors are crossed, there may be as many as eight sources of random variation when the combined (interaction) effects are also considered. The analysis of variance technique provides the method by which the total variation in the $HIJK$ observations can be resolved into component parts, and it yields estimates of the eight variances listed above. The arithmetical detail is given in any standard statistical text (7, 2, 5, 6). For the sake of completeness, the more convenient computing forms are given below. If we define

$$\begin{aligned}
T_{hij\cdot} &= \text{total of the } K \text{ observations corresponding to the } h\text{th level of A, } i\text{th level of B, and } j\text{th level of C}\\
T_{hi\cdot\cdot} &= \text{total of the } JK \text{ observations corresponding to the } h\text{th level of A and } i\text{th level of B}\\
T_{h\cdot\cdot\cdot} &= \text{total of the } IJK \text{ observations corresponding to the } h\text{th level of A}\\
T_{\cdot\cdot\cdot\cdot} &= \text{total of all } HIJK \text{ observations}
\end{aligned} \tag{3}$$
and analogous definitions for such totals as $T_{\cdot ij\cdot}$, $T_{\cdot\cdot j\cdot}$, etc., where the rule is simply that a dot (·) in place of a subscript indicates that the corresponding factor has been summed out, and if we compute
$$S_a = \sum_{h} \frac{T_{h\cdot\cdot\cdot}^{2}}{IJK} - \frac{T_{\cdot\cdot\cdot\cdot}^{2}}{HIJK}$$

VOL. 50, NO. 7, JULY 1958, 1017
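The dot rule and the computing form for $S_a$ translate directly into code. This minimal Python sketch forms the totals by summing out the dropped subscripts and then applies the formula. The helper names `margin_totals` and `s_a` are hypothetical, indices are 0-based, and the data are made up so that the result can be checked by hand.

```python
from itertools import product


def margin_totals(X, keep):
    """Totals of X[h][i][j][k] with every factor not listed in `keep` summed
    out (the dot rule). `keep` holds positions 0, 1, 2 for the subscripts
    h, i, j; the replicate subscript k is always summed out."""
    H, I, J, K = len(X), len(X[0]), len(X[0][0]), len(X[0][0][0])
    T = {}
    for h, i, j, k in product(range(H), range(I), range(J), range(K)):
        key = tuple((h, i, j)[p] for p in keep)
        T[key] = T.get(key, 0.0) + X[h][i][j][k]
    return T


def s_a(X):
    """S_a = sum_h T_{h...}^2 / (IJK) - T_{....}^2 / (HIJK)."""
    H, I, J, K = len(X), len(X[0]), len(X[0][0]), len(X[0][0][0])
    T_h = margin_totals(X, (0,))        # T_{h...}, one total per level of A
    T_all = margin_totals(X, ())[()]    # T_{....}, the grand total
    return (sum(t * t for t in T_h.values()) / (I * J * K)
            - T_all ** 2 / (H * I * J * K))


# Hypothetical data: H = I = J = K = 2, every observation at level h of A equals 1 + 2h
X = [[[[1.0 + 2 * h for k in range(2)] for j in range(2)] for i in range(2)] for h in range(2)]
print(s_a(X))  # 16.0
```

Here the totals $T_{h\cdot\cdot\cdot}$ are 8 and 24 and $T_{\cdot\cdot\cdot\cdot} = 32$, so $S_a = 640/8 - 1024/16 = 16$, which agrees with the deviation form $IJK\sum_h(\bar X_{h} - \bar X)^2 = 8(1 + 1)$. The totals form needs only one pass over the data and no means, which is why it is the more convenient for hand computation.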
Table I. Analysis of Variance for Three-Factor Factorial Experiment

| Source | Degrees of Freedom | Mean Square | Expected Mean Square (7)ᵃ |
|---|---|---|---|
| A | $H-1$ | $S_a/(H-1)$ | $V(e) + KV(abc) + JKV(ab) + IKV(ac) + IJKV(a)$ |
| B | $I-1$ | $S_b/(I-1)$ | $V(e) + KV(abc) + JKV(ab) + HKV(bc) + HJKV(b)$ |
| C | $J-1$ | $S_c/(J-1)$ | $V(e) + KV(abc) + IKV(ac) + HKV(bc) + HIKV(c)$ |
| AB | $(H-1)(I-1)$ | $S_{ab}/(H-1)(I-1)$ | $V(e) + KV(abc) + JKV(ab)$ |
| AC | $(H-1)(J-1)$ | $S_{ac}/(H-1)(J-1)$ | $V(e) + KV(abc) + IKV(ac)$ |
| BC | $(I-1)(J-1)$ | $S_{bc}/(I-1)(J-1)$ | $V(e) + KV(abc) + HKV(bc)$ |
| ABC | $(H-1)(I-1)(J-1)$ | $S_{abc}/(H-1)(I-1)(J-1)$ | $V(e) + KV(abc)$ |
| Error | $HIJ(K-1)$ | $S_e/HIJ(K-1)$ | $V(e)$ |
| Total | $HIJK-1$ | | |

ᵃ All components of variation refer to sampling from infinite populations. Corrections to the pertinent coefficients in the expected mean square equations must be made if some of the populations are assumed to be finite (1, p. 394).
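Equating each mean square in Table I to its expectation and solving from the bottom row upward gives moment estimates of the eight variances. Writing $M_x$ for the mean square of source $x$ (a label introduced here, not the article's), for example $\hat V(abc) = (M_{abc} - M_e)/K$, $\hat V(ab) = (M_{ab} - M_{abc})/JK$, and $\hat V(a) = (M_a - M_{ab} - M_{ac} + M_{abc})/IJK$. The Python sketch below is one way to compute the whole table and these estimates from raw data, assuming infinite populations as in footnote a, at least two levels per factor, and $K \ge 2$. The function names are hypothetical; interaction sums of squares are obtained by subtracting lower-order margins, a standard computing form rather than one spelled out in this section.

```python
from itertools import product


def anova_three_factor(X):
    """Crossed three-factor ANOVA with K replicates, laid out as in Table I.

    X[h][i][j][k] uses 0-based indices. Every factor needs at least two levels
    and K >= 2, otherwise some degrees of freedom are zero.
    Returns {source: (degrees of freedom, sum of squares, mean square)}.
    """
    H, I, J, K = len(X), len(X[0]), len(X[0][0]), len(X[0][0][0])
    levels, N = (H, I, J), H * I * J * K
    obs = [((h, i, j), X[h][i][j][k])
           for h, i, j, k in product(range(H), range(I), range(J), range(K))]

    def raw(keep):
        # Sum over one margin of T^2 / (number of observations in each total)
        T = {}
        for idx, x in obs:
            key = tuple(idx[p] for p in keep)
            T[key] = T.get(key, 0.0) + x
        cells = 1
        for p in keep:
            cells *= levels[p]
        return sum(t * t for t in T.values()) / (N // cells)

    R = {keep: raw(keep) for keep in
         [(), (0,), (1,), (2,), (0, 1), (0, 2), (1, 2), (0, 1, 2)]}
    CF = R[()]  # correction term T_{....}^2 / HIJK
    ss = {
        "a": R[(0,)] - CF,
        "b": R[(1,)] - CF,
        "c": R[(2,)] - CF,
        # Two-factor interactions: two-way margin minus both main-effect margins
        "ab": R[(0, 1)] - R[(0,)] - R[(1,)] + CF,
        "ac": R[(0, 2)] - R[(0,)] - R[(2,)] + CF,
        "bc": R[(1, 2)] - R[(1,)] - R[(2,)] + CF,
        "abc": (R[(0, 1, 2)] - R[(0, 1)] - R[(0, 2)] - R[(1, 2)]
                + R[(0,)] + R[(1,)] + R[(2,)] - CF),
        # Replicate (error) variation within each hij cell
        "e": sum(x * x for _, x in obs) - R[(0, 1, 2)],
    }
    df = {
        "a": H - 1, "b": I - 1, "c": J - 1,
        "ab": (H - 1) * (I - 1), "ac": (H - 1) * (J - 1),
        "bc": (I - 1) * (J - 1), "abc": (H - 1) * (I - 1) * (J - 1),
        "e": H * I * J * (K - 1),
    }
    return {s: (df[s], ss[s], ss[s] / df[s]) for s in ss}


def variance_components(table, H, I, J, K):
    """Moment estimates: set each mean square equal to its expectation in
    Table I and solve. Negative estimates are returned as-is."""
    M = {s: row[2] for s, row in table.items()}
    return {
        "e": M["e"],
        "abc": (M["abc"] - M["e"]) / K,
        "ab": (M["ab"] - M["abc"]) / (J * K),
        "ac": (M["ac"] - M["abc"]) / (I * K),
        "bc": (M["bc"] - M["abc"]) / (H * K),
        "a": (M["a"] - M["ab"] - M["ac"] + M["abc"]) / (I * J * K),
        "b": (M["b"] - M["ab"] - M["bc"] + M["abc"]) / (H * J * K),
        "c": (M["c"] - M["ac"] - M["bc"] + M["abc"]) / (H * I * K),
    }


# Hypothetical data: every observation at level h of A equals 1 + 2h
X = [[[[1.0 + 2 * h for k in range(2)] for j in range(2)] for i in range(2)] for h in range(2)]
table = anova_three_factor(X)
print(table["a"])                                   # (1, 16.0, 16.0)
print(variance_components(table, 2, 2, 2, 2)["a"])  # 2.0
```

Because the eight sums of squares add up to the total sum of squares about the grand mean, that identity is a useful arithmetic check on any hand or machine computation. When a true component is small, its moment estimate can come out negative; this sketch leaves such values untouched rather than truncating them at zero.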
Table II. Analysis of Variance for First Three-Factor Example (Degrees of Freedom, Sum of Squares, Mean Square, Expected Mean Squareᵃ)

Table III. Sum of Squares