Application of Tests for Randomness

The study of runs and the use of mean square successive difference are useful methods in testing for many types of nonrandomness commonly encountered in chemical data. Use of the mean square successive difference seems empirically more powerful in most cases. However, the use of runs is computationally easier, is theoretically less dependent on distribution, and is surprisingly efficient. These tests can be used as a preliminary to other statistical tests or to the use of the data for estimation purposes, or they may serve as an end in themselves in detecting the presence of systematic variation. Their use can avoid faulty conclusions because of incorrect assumptions or can indicate the need for investigation of factors systematically affecting the results obtained.

CARL A. BENNETT
General Electric Co., Hanford Works, Richland, Wash.
THE primary purpose of modern statistical methods is to draw inferences from a sample of experimental data concerning the large mass or population of data from which the sample was drawn. Most sets of experimental observations or measurements are considered as representing only a relatively few of the many observations or measurements that could have or might have been obtained; and it is the laws governing the behavior of this larger real or hypothetical mass of data that are to be determined, since it is on the basis of these laws that past behavior can be explained or future behavior predicted. In general, the success of the methods of statistical inference used to accomplish these aims depends on two types of assumption:

1. Assumptions concerning the fundamental nature of the population being studied, which lead to a statistical or mathematical model of the true situation.
2. The assumption that the sample of observations or measurements available for study is a random sample from this population, which justifies the application of the theory of probability in checking the former assumptions and in estimating the characteristics of the model assumed.

This second assumption of randomness is fundamental to all statistical work. Unless certain errors of observation or measurement or other deviations from the “exact” laws which govern the behavior of the experimental data can be assumed to be random, the best statistical methods are frequently of very little use and may be completely misleading. However, data are sometimes used to obtain averages, estimate precision, and perform other common statistical tests without any investigation of the validity of this assumption of randomness or of the presence of a state of “statistical control.”

It is difficult to give a practical working definition of randomness.
The concept can best be exemplified by the haphazard occurrence of heads or tails in individual tosses of a coin, or by the haphazard occurrence of the different faces in tossing a die, or by the variations which are always present in the most accurate chemical analyses or the best controlled production process. No attempt will be made to define this concept or to say how it can be achieved. On the other hand, most people have an intuitive feeling for things which are not random. For example, if a coin, on being tossed 25 times, showed alternate series of 5 heads and 5 tails, something systematic rather than haphazard would certainly be suspected about the behavior of the coin. Similarly, the sudden occurrence of a 72% yield by a process which had been consistently giving a yield of between 60 and 65% would not go unnoticed, and neither would a consistent upward trend in the analyses of a laboratory standard. To use the terminology of Shewhart (4), it would be concluded in each of these cases that there was some “assignable cause” present and that the variations were not simply of a random nature due to “chance causes.” In general, these intuitive concepts of nonrandomness can be summed up in the following four characteristics of observed data:
1. The presence of extreme values
2. The presence of discontinuities
3. The presence of trends
4. The presence of cyclic movements or periodicities

The last two could be combined under the general heading of “trends” but are most frequently thought of as unique characteristics. Extreme examples of these types of nonrandomness are shown in Figure 1 (a, b, c, d, and e); (f) shows a series of data chosen as nearly as possible to be random.

A second point which must always be considered is that these concepts are a function of the order in which the observations occur. Thus a series of observations which appear completely random when placed in one order may appear quite nonrandom when placed in another order. In every case except the first, the obvious nonrandomness of the examples of Figure 1 would disappear if the points were randomly rearranged. As a further example, consider the results of three series of 20 tosses of a coin, shown in Figure 2. The first shows the actual occurrence of heads and tails in a series of 20 random tosses. The second and third lines show two different rearrangements of this series of tosses, each containing the same number of heads and tails. Although there is nothing unusual about the first series of tosses, one would immediately conclude, on looking at the second or third series, that some systematic effect was present. It is not the number of heads and tails occurring which causes us to suspect some nonrandomness in the second and third cases; it is the order in which they occur. Thus in studying the randomness or nonrandomness of a series of observations, they must always be considered as ordered in some way, i.e., by the natural order of occurrence, the magnitude of some associated variable, or some similar characteristic; the tests for nonrandomness have meaning only with respect to this order.
As might be gathered from this discussion, there is no way of concluding from the examination of a given set of data that the data are random. However, it is possible to make tests which enable the conclusion, with a given degree of confidence, that a given set of data are not random, i.e., that systematic behavior of one of the types discussed is indicated. The Shewhart control chart is one of the simplest of these. It is primarily a means of segregating the extreme deviations, which are most likely due to nonrandom assignable causes, from the deviations due to random chance causes which it is usually economically unfeasible to attempt to eliminate. Another means of detecting nonrandomness is the use of regression methods, i.e., the fitting of a trend line to determine whether there is any evidence of a systematic increase or decrease in observations with the increase or decrease in some other variable or, frequently, with time. Another method of detecting nonrandomness, chiefly used in the study of economic or business cycles, is that of serial correlation or the correlation of a series of observations with succeeding observations of the same series. Two other methods of detecting nonrandomness, which are the principal subject of this paper, are:
(1) the study of runs or the successive occurrence of like elements, and (2) the more complex and more sensitive mean square successive difference.

INDUSTRIAL AND ENGINEERING CHEMISTRY, Vol. 43, No. 9, September 1951

Figure 1. Types of Nonrandomness
(Panel titles: Observations Showing Extreme Variations; Observations Showing Discontinuity; Observations Showing Long Term Periodicity; Observations Showing Rapid Periodic Fluctuations; Random Observations)

Figure 2. Series of Heads and Tails

Toss Number   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
Series 1      T  T  T  T  H  H  T  T  H  T  H  T  T  H  T  H  T  T  H  T
Series 2      H  T  T  H  T  T  H  T  T  H  T  T  H  T  T  H  T  T  H  T
Series 3      T  T  T  T  T  T  T  H  H  H  H  H  H  H  T  T  T  T  T  T

As a simple example of the use of the theory of runs, let us consider the sequences of heads and tails of Figure 2. In the first sequence, obtained randomly, there are runs of four tails, two heads, two tails, one head, etc. Note that there are thirteen runs in this sequence of 20 tosses, varying in length from one to four. In the second sequence there is a total of fourteen runs, all of length one or two. The nonrandom behavior of these tosses is reflected in the increased number of runs and in their consistency. In the third sequence there are only three runs: two of tails, of length six and seven, and one of heads of length seven. Here it is the small number of runs and their great length that make it improbable that this series of tosses was obtained in a random fashion. Thus in a sequence of events of two kinds, possible nonrandom behavior is reflected by either a few long runs or many short runs. If the actual number of runs is so small or so large as to make it improbable that such a series could have occurred randomly, it can then be concluded that some type of nonrandom behavior is indicated.

The distribution of the number of runs in a series of $n$ elements, of which $n_1$ are of one kind and $n_2$ of another, has been determined under the assumption that the series is actually random (5). From this distribution, upper and lower limits on the number of runs which will be exceeded with a given small probability can be determined. For the special case in which the number of elements of each kind is equal, these limits are given in Table I for series of various lengths. For example, in a series of $n = 16$ events of which $n_1 = n/2 = 8$ are of one kind and the other eight of another, as few or fewer than 5 runs would be expected only 5% of the time and as many or more than 13 runs only 5% of the time. If this few or this many runs were actually observed, it would thus be likely that some nonrandom behavior was present.

In order to use this method of detecting nonrandomness on measurement data, it is necessary to somehow transform these data into a sequence of events of two kinds. The method most commonly used is the designation of each element as being (a) above or (b) below the median. This has the advantage that it always leads to a series containing equal numbers of a's and b's if the median itself is excluded when there is an odd number of measurements, and therefore the limits given in Table I are immediately applicable. Notice that all the types of nonrandomness shown in Figure 1, with the exception of (a), would be reflected by either too few or too many runs in sequences
formed in this manner.

As an example of the use of this method of detecting nonrandomness in chemical problems, let us consider the data shown graphically in Figure 3, which gives the weekly average coke yield for a period of 26 weeks in a coke oven plant, as recorded in a British Coke Research Association Report. The median in this case is 80.12, halfway between the thirteenth and fourteenth measurements, and hence 13 values are above and 13 are below the median. Counting runs of consecutive elements above and below the median, there is a total of six runs. Entering Table I for $m = 13$, it is found that these few runs would be expected in less than 1% of the cases where the measurements were actually random; hence, some systematic behavior on the part of the yields over and above the random fluctuations from week to week is indicated. Notice that this would immediately preclude the use of these weekly averages in studying the variability in coke yields or the average coke yield to be expected, without some further study of the causes of the systematic variation present; it would also indicate the possibility of systematically higher yields if these causes could be economically eliminated.

As a second example of the use of this method, consider the data shown in Figure 4. These data represent the differences between two laboratories in determining the density of 99 samples of uranium. These samples were analyzed over a period of time and are arranged in order of time of analysis. It is immediately apparent that the difference is systematically negative. However, further investigation may show whether this difference can be assumed to have remained constant over the period studied, i.e., whether it can be assumed that the variations in the differences are simply random fluctuations about some constant difference.
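The run-counting procedure used in these examples can be sketched in a few lines of Python. This is a minimal illustration, not the author's procedure: the function names are hypothetical, the three coin series are transcribed from Figure 2, the yield values are invented for the example, and values equal to the median are simply dropped, which is a simplifying assumption (the text assigns tied values individually).

```python
from statistics import median


def count_runs(labels):
    """Return the number of runs (maximal blocks of identical neighbors)."""
    labels = list(labels)
    if not labels:
        return 0
    # A new run starts wherever a label differs from its predecessor.
    return 1 + sum(1 for prev, cur in zip(labels, labels[1:]) if prev != cur)


def longest_run(labels):
    """Return the length of the longest run in the sequence."""
    best = current = 0
    previous = object()  # sentinel that compares unequal to every label
    for label in labels:
        current = current + 1 if label == previous else 1
        best = max(best, current)
        previous = label
    return best


def dichotomize(values):
    """Label each value 'a' (above) or 'b' (below) the median, keeping order.

    Assumption: values exactly equal to the median are dropped.
    """
    med = median(values)
    return ["a" if v > med else "b" for v in values if v != med]


# The three series of Figure 2, tosses 1 to 20.
series_1 = "TTTTHHTTHTHTTHTHTTHT"
series_2 = "HTTHTTHTTHTTHTTHTTHT"
series_3 = "TTTTTTTHHHHHHHTTTTTT"

# Hypothetical yields (not data from the paper): a high stretch, then a low one.
hypothetical_yields = [80.5, 80.9, 81.2, 80.7, 79.4, 79.0, 79.6, 79.9]

for name, series in [("1", series_1), ("2", series_2), ("3", series_3)]:
    print(f"Series {name}: {count_runs(series)} runs, longest {longest_run(series)}")
print("Hypothetical yields:", count_runs(dichotomize(hypothetical_yields)), "runs")
```

The same `count_runs` function serves for coin tosses and for measurements once the latter have been reduced to two kinds of events, which is the only step that depends on the median.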
TABLE I. CRITICAL VALUES FOR RUNS

                  Lower Critical Values      Upper Critical Values
m = n1 = n2       α = 0.05    α = 0.01       α = 0.05    α = 0.01
     5                3           2              9          10
     6                3           2             11          12
     7                4           3             12          13
     8                5           4             13          14
     9                6           4             14          16
    10                6           5             16          17
    11                7           6             17          18
    12                8           6             18          20
    13                9           7             19          21
    14               10           8             20          22
    15               11           9             21          23
    16               11          10             23          24
    17               12          10             24          26
    18               13          11             25          27
    19               14          12             26          28
    20               15          13             27          29
    21               16          14             28          30
    22               17          14             29          32
    23               17          15             31          33
    24               18          16             32          34
    25               19          17             33          35
    26               20          18             34          36
    27               21          19             35          37
    28               22          19             36          39
    29               23          20             37          40
    30               24          21             38          41

The median value or 50th value in order of magnitude is in this case -0.02. However, since the values obtained by the two laboratories were recorded only to the nearest 0.01 gram per cc., there are 14 differences having this value. Of the remaining 85 differences, 38 are less than -0.02 and 47 greater than -0.02. Hence to apply the test, 11 of the differences having value -0.02 should be considered as below the median and 2 as above; the remaining value is considered the median and is omitted, since there is an odd number of points. This will give $(99 - 1)/2 = 49$ values on either side of the median. In this example, the two values which are considered as above the median (indicated by arrows pointing upward in Figure 4) and the value to be considered as the median (indicated by a crossed dot) were chosen at random. A more conservative approach to this situation is to choose those values to be above and below the median in such a fashion as to maximize the number of runs. A still better solution, which was not feasible in this case, would be to obtain the original analytical data and carry the results of the density determinations, and hence the differences, to another decimal place, thereby reducing the number of identical differences.

When the values -0.02 are assigned as above or below the median, there are 47 runs. In this example $m = 49$, which is beyond the largest value of $m$ given in Table I. However, for $m$ this large, the distribution of the number of runs in a random sequence is closely approximated by a normal distribution with expected value $m + 1$ and standard deviation $\sigma = \sqrt{m/2}$. For $m = 49$, $m + 1 = 50$ and $\sqrt{m/2} = 4.95$; hence the observed number of 47 is less than one standard deviation from the expected value and not at all unlikely to have occurred by chance. Thus, the test gives no indication of nonrandomness on the part of the variations in the observed differences.

Runs above and below the median can be used in other ways to detect nonrandomness. Mosteller (3) has obtained upper limits on the length of the longest run above, below, either above or below, and both above and below the median which would be expected in a given sequence. For example, in the series of coke yields shown in Figure 3, the longest single run was of length eight; the chances of occurrence of a run of this length or longer in a sequence of length 26 are less than one in 100. Similarly, there is a run of length seven or longer on each side of the median; the chances of such an occurrence in a random sequence of 26 events are again less than one in 100. Hence either of these considerations would lead to the same conclusion as before concerning the nonrandomness of these data. Mood (2) has considered in some detail the expected distribution of runs of various lengths, and these are valuable in studying longer sequences, such as the data shown in Figure 4. Other authors have considered runs obtained in other fashions, such as using some other value than the median or considering the direction of change in successive observations, i.e., runs “up” or “down.”
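The probabilities behind such limits can be explored with a short sketch. It assumes the standard combinatorial count of arrangements of $m$ elements of each kind having exactly $r$ runs, and the function names are hypothetical. It is meant as an illustration of the reasoning rather than as the procedure by which the published table was compiled; tabulated values may follow a slightly different convention in borderline cases.

```python
from math import comb, sqrt


def runs_pmf(m):
    """Exact distribution of the number of runs R when m elements of one kind
    and m of another are arranged at random (R ranges from 2 to 2m)."""
    total = comb(2 * m, m)
    pmf = {}
    for r in range(2, 2 * m + 1):
        k = r // 2
        if r % 2 == 0:
            # r = 2k: k runs of each kind, and either kind may come first.
            ways = 2 * comb(m - 1, k - 1) ** 2
        else:
            # r = 2k + 1: one kind has k + 1 runs, the other has k.
            ways = 2 * comb(m - 1, k - 1) * comb(m - 1, k)
        pmf[r] = ways / total
    return pmf


def lower_tail(m, r):
    """P(R <= r) for a random arrangement."""
    return sum(p for runs, p in runs_pmf(m).items() if runs <= r)


def lower_critical(m, alpha):
    """Largest r with P(R <= r) <= alpha, or None if there is none."""
    cumulative, best = 0.0, None
    for r, p in sorted(runs_pmf(m).items()):
        cumulative += p
        if cumulative > alpha:
            break
        best = r
    return best


def upper_critical(m, alpha):
    """Upper limit obtained from the symmetry of R about m + 1 when n1 = n2."""
    low = lower_critical(m, alpha)
    return None if low is None else 2 * m + 2 - low


def normal_deviate(m, runs):
    """Large-m approximation from the text: mean m + 1, std. dev. sqrt(m / 2)."""
    return (runs - (m + 1)) / sqrt(m / 2)


print(lower_critical(8, 0.05), upper_critical(8, 0.05))  # limits for n = 16
print(lower_tail(13, 6))        # six runs with m = 13 (coke yields)
print(normal_deviate(49, 47))   # 47 runs with m = 49 (density differences)
```

For the coke yields the exact lower-tail probability of six or fewer runs comes out at roughly one in a thousand, in line with the "less than 1%" reading of the table, while the density differences give a deviate of well under one standard deviation.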
Figure 4. Differences in Density Determination (horizontal axis: Sample Number)

These methods of detecting nonrandomness by the study of runs are easily and quickly applied and furnish a good quick method of checking the randomness of a given set of data. They also have the advantage that they do not depend on the distribution of the measurements involved, since the critical values given are obtained completely from a study of the possible arrangements of a given series of elements of two kinds. However, they are not very powerful in a statistical sense when applied to measurements, i.e., they will frequently not detect nonrandomness when it is present. In order to illustrate this, let us look at the data shown in Figure 5. These are results obtained from the analysis of a standard sample of uranium oxide (U3O8) and are arranged in the order of analysis. Considering the runs above and below the median, there is a total of 9; although this is smaller than the expected number of 11, it is not significantly low at the 5% level. The longest run on either side is of length six, and the chances of a run of this length or longer in a series of 20 values are greater than one in ten. However, there seems to be some systematic variation in the data which the test in this instance has not detected.

A second method of determining nonrandomness, which is in general more powerful than the study of runs but which involves additional computations, is the use of the mean square successive difference. As the name implies, this is simply the average of the squares of the $n - 1$ differences between successive elements in a series of $n$ measurements, or, symbolically,

$$\delta^2 = \frac{\sum_i (x_{i+1} - x_i)^2}{n - 1}$$

For a series of random measurements the expected value of this quantity is twice the variance of the population sampled, or $2\sigma^2$. Hence, since the expected value of the usual sample variance

$$s^2 = \frac{\sum_i (x_i - \bar{x})^2}{n - 1}$$

is $\sigma^2$, the ratio $\eta = \delta^2/s^2$ would have expected value 2. However, when nonrandomness such as a continuous upward trend, or a long-term periodic movement, is present in the data, $\delta^2$, which depends only on the difference between adjacent elements, will increase much less than $s^2$; hence $\eta$ would be expected to be considerably less than the expected value of 2. On the other hand, short rapid fluctuations in the observed values, although increasing both $s^2$ and $\delta^2$, will cause a proportionately greater increase in $\delta^2$, and values of the ratio greater than 2 would be expected. Again, the distribution of $\eta = \delta^2/s^2$ has been determined for a sample of $n$ measurements from a normal distribution (6), and from this, values of $\eta$ which would have a very small probability of being exceeded unless some nonrandomness were present can again be obtained (1). Both upper and lower 5 and 1% limits for $\eta$ for various sample sizes are given in Table II.

TABLE II. CRITICAL VALUES FOR $\eta$ (a)

Sample Size,     Lower Critical Values, k_a     Upper Critical Values, k_b
     n           α = 0.05      α = 0.01         α = 0.05      α = 0.01
     4             0.78          0.63             3.22          3.37
     5             0.82          0.54             3.18          3.46
     6             0.89          0.56             3.11          3.44
     7             0.94          0.61             3.06          3.39
     8             0.98          0.66             3.02          3.34
     9             1.02          0.71             2.98          3.29
    10             1.06          0.75             2.94          3.25
    11             1.10          0.79             2.90          3.21
    12             1.13          0.83             2.87          3.17
    15             1.21          0.92             2.79          3.08
    20             1.30          1.04             2.70          2.96
    25             1.37          1.13             2.63          2.87

(a) These values have been reduced by a factor from those given in (1) because of the difference in definition of the sample variance $s^2$; also, because of the nature of the distribution of $\eta$, the limits $k_a$ and $k_b$ for α = 0.01 are actually narrower for $n = 4$ than $n = 5$.
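The ratio $\eta$ is simple to compute, and a short sketch may make its behavior concrete. The function names below are hypothetical. The twenty results are the uranium oxide analyses as read from the source's Table III, the trend and alternating series are invented for illustration, and the standard deviation $\sqrt{(n-2)/(n^2-1)}$ used in the large-sample check is an assumption, chosen because it reproduces the value 0.0995 reported in Table IV for $n = 99$.

```python
from math import sqrt


def mean_square_successive_difference(x):
    """delta^2: sum of squared successive differences divided by n - 1."""
    return sum((b - a) ** 2 for a, b in zip(x, x[1:])) / (len(x) - 1)


def sample_variance(x):
    """s^2 with the n - 1 divisor, as used in the text."""
    mean = sum(x) / len(x)
    return sum((v - mean) ** 2 for v in x) / (len(x) - 1)


def eta(x):
    """Ratio of the mean square successive difference to the variance."""
    return mean_square_successive_difference(x) / sample_variance(x)


def large_sample_deviate(n, eta_value):
    """t = epsilon / sigma_epsilon with epsilon = 1 - eta / 2.

    Assumption: sigma_epsilon = sqrt((n - 2) / (n**2 - 1)).
    """
    epsilon = 1 - eta_value / 2
    return epsilon / sqrt((n - 2) / (n ** 2 - 1))


# Uranium oxide results in order of analysis (as read from Table III).
uranium_oxide = [
    83.50, 83.63, 84.16, 83.25, 83.36, 84.26, 84.00, 84.61, 84.46, 84.20,
    84.40, 84.50, 84.88, 84.54, 84.70, 84.80, 84.24, 84.11, 84.52, 84.14,
]

# Invented series showing the two directions in which eta departs from 2.
steady_trend = list(range(10))   # continuous upward trend
rapid_swings = [0, 1] * 5        # short rapid fluctuations

print("eta, uranium oxide:", round(eta(uranium_oxide), 3))
print("eta, steady trend: ", round(eta(steady_trend), 3))
print("eta, rapid swings: ", round(eta(rapid_swings), 3))
print("t for n = 99, eta = 1.516:", round(large_sample_deviate(99, 1.516), 2))
```

For the uranium oxide results the ratio falls below the lower 1% limit of 1.04 listed in Table II for $n = 20$, whereas the run count for the same data was not significant.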
As an example of the use of this statistic, consider again the series of analytical results shown in Figure 5. Table III gives the numerical results with the successive differences and the computation of $\eta$. As there are only two chances in 100 of $\eta$ falling outside the 1% critical limits for $n = 20$ given in Table II (one chance in 100 of being above the upper limit and one chance in 100 of being below the lower limit), the value of $\eta$ obtained is significant, and some nonrandomness is indicated. From Figure 5 it appears that there is a systematic trend in these analyses with time, over and above the expected random fluctuation. It should be noted that since $\delta^2$ is expected to have the value $2\sigma^2$, $\delta^2/2$ is an estimate of $\sigma^2$, and hence $\sqrt{\delta^2/2}$ is an estimate of the standard deviation. This estimate of the precision of the analysis is much lower than the estimate obtained by computing the standard deviation in the usual manner (Table III) and gives a truer picture of the random variations to be expected in this analysis in the absence of the systematic variation present.

TABLE III. COMPUTATION OF $\eta$ FOR DATA OF FIGURE 5

Sample   Result,   Difference,          Sample   Result,   Difference,
 No.       x_i     x_(i+1) - x_i         No.       x_i     x_(i+1) - x_i
  1       83.50        0.13               11      84.40        0.10
  2       83.63        0.53               12      84.50        0.38
  3       84.16       -0.91               13      84.88       -0.34
  4       83.25        0.11               14      84.54        0.16
  5       83.36        0.90               15      84.70        0.10
  6       84.26       -0.26               16      84.80       -0.56
  7       84.00        0.61               17      84.24       -0.13
  8       84.61       -0.15               18      84.11        0.41
  9       84.46       -0.26               19      84.52       -0.38
 10       84.20        0.20               20      84.14        ...

$n = 20$, $\sum x = 1684.26$, $\sum x^2 = 141840.7214$
$\sum_i (x_i - \bar{x})^2 = 4.1341$, $\sum_i (x_{i+1} - x_i)^2 = 3.4664$
$s^2 = \sum_i (x_i - \bar{x})^2 / (n - 1) = 0.2176$
$\delta^2 = \sum_i (x_{i+1} - x_i)^2 / (n - 1) = 0.1824$
$\eta = \delta^2/s^2 = 0.838$
$\sqrt{\delta^2/2} = 0.302$, $s = 0.466$

Table IV shows the computation of $\eta$ for the data of Figure 4. For values of $n$ beyond those in Table II, the significance of $\eta$ can be tested by considering $\epsilon = 1 - \eta/2$ as a normally distributed variable with expected value zero and standard deviation

$$\sigma_\epsilon = \sqrt{\frac{n - 2}{n^2 - 1}}$$

Actually this approximation is quite good for values of $n$ as small as 10. A value of $t = \epsilon/\sigma_\epsilon$ greater in absolute value than that obtained in Table IV will occur less than twice in 100 by chance; hence some systematic variation in these density differences is indicated by this test, whereas no such conclusion was possible on the basis of the run test previously performed. What caused this significant value is probably best indicated by considering Figure 6, which shows the averages of successive subgroups of four differences. In the absence of some of the random variations of Figure 4 and ignoring the extreme value of the average of measurements 36 to 40, it appears that at about measurement 44 the difference in determinations shifted from an average value of about -0.03 to an average value of about -0.01. However, this interpretation of the existing nonrandomness should not be considered as statistically conclusive but merely as an indication of the direction of the investigation of the systematic variation present.

TABLE IV. COMPUTATION OF $\eta$ FOR DATA OF FIGURE 4

$n = 99$, $\sum x = -1.73$, $\sum x^2 = 0.1092$
$\sum_i (x_{i+1} - x_i)^2 = 0.1197$, $\sum_i (x_i - \bar{x})^2 = 0.078969$
$s^2 = \sum_i (x_i - \bar{x})^2 / (n - 1) = 0.0008058$
$\delta^2 = \sum_i (x_{i+1} - x_i)^2 / (n - 1) = 0.0012214$
$\eta = \delta^2/s^2 = 0.0012214/0.0008058 = 1.516$
$\epsilon = 1 - \eta/2 = 0.242$
$\sigma_\epsilon = \sqrt{97/9800} = 0.0995$
$t = 0.242/0.0995 = 2.43$

Figure 6. Averages of Four Density Differences (horizontal axis: Sample Numbers)

As mentioned before, the study of runs and the use of the mean square successive difference are only two of many methods of detecting nonrandomness, and certainly they are not the best or most powerful tests for all types of alternatives. However, they are useful in testing for many of the types of nonrandomness commonly encountered in chemical data. The use of the mean square successive difference seems empirically more powerful in most cases; however, the use of runs is computationally easier, is theoretically less dependent on distribution, and is surprisingly efficient, especially if something is known concerning the type of nonrandomness expected, so that the test based on runs best able to detect this can be chosen for use. As has been indicated in the examples, these tests can be used as a preliminary to other statistical tests or to the use of the data for estimation purposes, or they may serve as an end in themselves in the detection of the presence of systematic variation. Their use can therefore aid in avoiding faulty conclusions because of incorrect assumptions or can indicate the need for investigation of factors systematically affecting the results obtained.

LITERATURE CITED
(1) Hart, B. I., Ann. Math. Stat., 13, 445 (1942).
(2) Mood, A. M., Ibid., 11, 367 (1940).
(3) Mosteller, F., Ibid., 12, 228 (1941).
(4) Shewhart, W. A., "Economic Control of Quality of Manufactured Product," New York, D. Van Nostrand, 1931.
(5) Swed, F. S., and Eisenhart, C., Ann. Math. Stat., 14, 66 (1943).
(6) Von Neumann, J., Ibid., 12, 367 (1941).

RECEIVED May 2, 1951.