VOLUME 27, NO. 2, FEBRUARY 1955

$k_3 = (-1)(-0.00010439)(-0.00036819)(-0.07516473)(1.00245575) = 0.00000000$ (from column under Ac$^{228}$) or, by inspection, $(1 \times 10^{-4})(4 \times 10^{-4})(8 \times 10^{-2}) = 3.2 \times 10^{-9} \approx 0.00000000$

$k_4 = (-1)(-0.39583333)(1.00036819)(1.00527292)(1.00000091) = 0.39806740$ (from column under Th$^{228}$)

$k_5 = (-1)(-0.00148968)(1.07516473)(-0.00527292)(1.00015332) = -0.00000845$ (from column under Ra$^{224}$)

$k_6 = (-1)(-3 \times 10^{-7})(-2 \times 10^{-6})(-9 \times 10^{-7})(-2 \times 10^{-4}) \approx 0.00000000$ (from column under Rn$^{220}$)

Therefore,

$$n'_6(t) = 1.24247417\left(e^{-\lambda_1 t} - 1.39805897\,e^{-\lambda_2 t} + 0.39806740\,e^{-\lambda_4 t} - 0.00000845\,e^{-\lambda_5 t}\right)$$
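As a check on the arithmetic, the coefficient products can be multiplied out directly. The Python sketch below uses only the standard library. It transcribes the printed factors, keyed by the nuclide column each product was read from, and rounds to eight decimals as the text does. The name `FACTORS` is illustrative. The Th$^{228}$ product is taken with 1.00527292, the complement of the 0.00527292 that appears in the Ra$^{224}$ product, and with that value the printed 0.39806740 is reproduced. The sketch also confirms that the coefficients inside the parentheses of the final expression sum to about -2 × 10⁻⁸, so the expression vanishes at t = 0 to within rounding.

```python
from math import prod

# Factors of each k product as printed (leading -1 included),
# keyed by the nuclide column the product was read from.
FACTORS = {
    "Ac228": [-1, -0.00010439, -0.00036819, -0.07516473, 1.00245575],
    "Th228": [-1, -0.39583333, 1.00036819, 1.00527292, 1.00000091],
    "Ra224": [-1, -0.00148968, 1.07516473, -0.00527292, 1.00015332],
    "Rn220": [-1, -3e-7, -2e-6, -9e-7, -2e-4],
}

# Multiply out each product and round to eight decimals, as in the text.
k = {name: round(prod(f), 8) for name, f in FACTORS.items()}

# Coefficients inside the parentheses of the final expression: the first two
# are taken as printed, the last two come from the products above.
bracket = [1.0, -1.39805897, k["Th228"], k["Ra224"]]

for name, value in k.items():
    print(f"{name}: {value:.8f}")
print(f"sum of bracketed coefficients: {sum(bracket):.8f}")
```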
RECEIVED for review June 14, 1954. Accepted October 14, 1954. Mound Laboratory is operated by Monsanto Chemical Co. for the United States Atomic Energy Commission under Contract Number AT-33-1-GEN-53.
The Rank Correlation Method

JOHN T. LITCHFIELD, JR., and FRANK WILCOXON
Stamford Research Laboratories and Lederle Laboratories Division, American Cyanamid Co., Stamford, Conn., and Pearl River, N. Y.
The usefulness of the rank correlation method has been increased by providing a table of critical totals for two levels of probability and a nomograph that gives the rank correlation coefficient directly. Use of the method is illustrated.
Correlation or association between two variables is widely used as a basis for product control and evaluation. It is frequently difficult and time-consuming, and therefore costly, to measure directly the property of a product that is of major interest. In such cases a search is made for some other property that can be measured quickly, easily, and inexpensively, and that is closely related to the property of major interest. Generally, a number of properties are examined and the one showing the best correlation is selected. A rapid test for correlation is very useful, because it avoids unnecessary computation of correlation coefficients in cases where the correlation is poor or insignificant. The Spearman rank correlation method (4, 6, 9) is generally satisfactory for this purpose. Like other quick methods for detecting association (4, 7), it is nonparametric; that is, the distribution sampled need not be assumed normal. However, the tables of critical totals given by Olds (6), Kendall (4), and Dixon and Massey (1) are brief or incomplete. Furthermore, the values given by Dixon and Massey correspond to probabilities of 0.05 and 0.01 only if the experimenter is interested in a one-sided test. In many cases one is interested in the probability of exceeding the critical value of the coefficient in either the + or - direction, and wishes to work at the 5 or 1% level of significance.

To facilitate use of the rank correlation method, two aids were prepared: a table of critical totals of squared rank differences, and a nomograph that permits direct reading of the rank correlation coefficient. Both cover 6 to 40 pairs of observations and two probability levels, 0.05 and 0.01. These probability levels apply to the two-sided case described above, in which the test is made for the presence of correlation, whether positive or negative. If an observed total falls within the critical limits in this test, the conclusion is that the correlation may well be zero; that is, no significant correlation has been demonstrated.
COMPUTATIONS
The Spearman rank correlation coefficient, designated as ρ, has limits of +1 to -1 and is obtained by solution of:

$$\rho = 1 - \frac{6\,\Sigma(\text{R.D.})^2}{n^3 - n} \tag{1}$$

where R.D. = difference between paired ranks, n = number of pairs, and Σ = summation over the n values of (R.D.)². Kendall (4) calculated the exact distribution of the sum of squared rank differences for values of n from 1 through 8 and found considerable departure from normality, particularly in the tails of the curve. He showed that for values of n greater than 8 and less than 21, Student's t distribution with n - 2 degrees of freedom gives a good approximation:

$$t = \rho\sqrt{\frac{n - 2}{1 - \rho^2}} \tag{2}$$

Substituting the expression for ρ (Equation 1) and solving for Σ(R.D.)² gives:

$$\Sigma(\text{R.D.})^2 = \frac{n^3 - n}{6}\left(1 \mp \frac{t}{\sqrt{t^2 + n - 2}}\right) \tag{3}$$

Here the upper sign gives the lower critical total (positive correlation) and the lower sign gives the upper critical total (negative correlation).
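Equation 1, Kendall's t approximation, and the expression obtained by solving them for Σ(R.D.)² can all be checked numerically. The Python sketch below is illustrative only, and its function names are hypothetical. The t values must be supplied from a t table, as the authors did with Snedecor's; for example, 2.262 and 3.250 are the usual two-sided 5% and 1% points for 9 degrees of freedom. Rounding the critical totals to the nearest integer is an assumption, but it reproduces the Table I entries tried here.

```python
import math
from typing import Tuple

def spearman_rho(sum_sq_rd: float, n: int) -> float:
    """Equation 1: rho = 1 - 6 * sum(R.D.)^2 / (n^3 - n)."""
    return 1.0 - 6.0 * sum_sq_rd / (n**3 - n)

def student_t(rho: float, n: int) -> float:
    """Kendall's approximation: t = rho * sqrt((n - 2) / (1 - rho^2)), n - 2 d.f."""
    return rho * math.sqrt((n - 2) / (1.0 - rho * rho))

def critical_totals(t: float, n: int) -> Tuple[int, int]:
    """Lower and upper critical sum(R.D.)^2 for a two-sided tabled t value.

    Uses sum = (n^3 - n)/6 * (1 -/+ t / sqrt(t^2 + n - 2)),
    rounded to the nearest integer (assumed rounding rule).
    """
    centre = (n**3 - n) / 6.0          # sum(R.D.)^2 when rho = 0
    r = t / math.sqrt(t * t + n - 2)   # |rho| at the critical point
    lower = centre * (1.0 - r)         # positive-correlation limit
    upper = centre * (1.0 + r)         # negative-correlation limit
    return math.floor(lower + 0.5), math.floor(upper + 0.5)

print(critical_totals(2.262, 11), critical_totals(3.250, 11))
```

With the 9-degree-of-freedom t values, the two calls give (88, 352) and (58, 382), matching the n = 11 row of Table I. `spearman_rho(44, 11)` returns 0.8 for the worked example.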
This equation was solved for the lower and upper critical totals of squared rank differences for all values of n from 8 through 40, using the values of t for P = 0.05 or 0.01 and n - 2 degrees of freedom. Values of t were taken from the table in Snedecor (8), except for 31 through 34 and 36 through 38 degrees of freedom, where graphic interpolation was used. The interpolated figures were as accurate as the tabular values. Table I gives the lower and upper critical totals of squared rank differences for probabilities of 0.05 and 0.01, for n = 5 to n = 40. The values for n from 5 through 8 were taken from [...]

[...] and DuBois (3) give methods of correcting ρ for cases of tied ranks. However, when ties are not very numerous, this correction makes little difference in the final result. An exact correction for confidence limits when ties occur appears not to have been published.

The y values are then ranked independently from 1 to n (low to high). Next, each y rank is subtracted from its corresponding x rank to give n rank differences. These should add to zero as a check on the arithmetic. Each rank difference is squared, and the squares are summed to give Σ(R.D.)². With Σ(R.D.)² for n pairs, Table I shows whether the correlation is significantly different from zero at the 5 or 1% level of significance. A straight-edge placed on the nomograph gives an estimate of ρ, the rank correlation coefficient.

EXAMPLE
In the production of a special blend of corrosion inhibitors, uniformity from one batch to another was controlled by analysis for per cent zinc. This was an expensive control, because the analysis was very slow and production batches had to be held in storage. A study was therefore made to find other methods of analysis that were quick and easy and that were correlated with per cent zinc. The following data were obtained when both per cent zinc and per cent ash were determined on each of eleven different batches. Per cent zinc and per cent ash were ranked separately, and Σ(R.D.)² was obtained as shown below:
USE OF RANK CORRELATION METHOD

Figure 1. Nomograph for obtaining ρ, the rank correlation coefficient
Table I. Critical Values of Σ(R.D.)²ᵃ for the Rank Correlation Method

Use of table. If the observed total is equal to or less than the appropriate lower tabular value, or equal to or greater than the appropriate higher tabular value, the correlation is significant for that probability. High values correspond to negative correlations, and low values to positive correlations.

nᵇ    P = 0.05     P = 0.01       nᵇ    P = 0.05        P = 0.01
5     0-40         ...            21    873-2207        695-2385
6     4-66         0-70           22    1022-2520       820-2722
7     12-100       4-108          23    1187-2861       960-3088
8     22-146       10-158         24    1370-3230       1115-3485
9     40-200       24-216         25    1570-3630       1287-3913
10    61-269       39-291         26    1789-4061       1475-4375
11    88-352       58-382         27    2028-4524       1681-4871
12    121-451      84-488         28    2287-5021       1906-5402
13    163-565      115-613        29    2569-5551       2149-5971
14    213-697      154-756        30    2873-6117       2414-6576
15    272-848      201-919        31    3199-6721       2700-7220
16    342-1018     257-1103       32    3550-7362       3008-7904
17    423-1209     322-1310       33    3926-8042       3338-8630
18    515-1423     398-1540       34    4328-8762       3693-9397
19    621-1659     484-1796       35    4757-9523       4073-10,207
20    740-1920     583-2077       36    5213-10,327     4476-11,064
                                  37    5698-11,174     4908-11,964
                                  38    6213-12,065     5366-12,912
                                  39    6758-13,002     5853-13,907
                                  40    7334-13,986     6367-14,953

ᵃ Σ(R.D.)² = sum of squared rank differences.
ᵇ n = number of pairs ranked.
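The "Use of table" rule amounts to a simple lookup. The sketch below copies only three rows of Table I for illustration, and the names `TABLE_I` and `judge` are hypothetical. A complete version would carry every row from n = 5 to 40.

```python
from typing import Dict, Tuple

# Three rows copied from Table I: n -> {probability: (lower, upper)}.
TABLE_I: Dict[int, Dict[float, Tuple[int, int]]] = {
    10: {0.05: (61, 269), 0.01: (39, 291)},
    11: {0.05: (88, 352), 0.01: (58, 382)},
    12: {0.05: (121, 451), 0.01: (84, 488)},
}

def judge(sum_sq_rd: float, n: int, p: float) -> str:
    """Apply the Table I rule to an observed sum of squared rank differences."""
    lower, upper = TABLE_I[n][p]
    if sum_sq_rd <= lower:      # low totals: positive correlation
        return "significant positive"
    if sum_sq_rd >= upper:      # high totals: negative correlation
        return "significant negative"
    return "not significant"    # within the limits: rho may well be zero

print(judge(44, 11, 0.01))
```

Ties with a tabular value count as significant, following the "equal to or less than" and "equal to or greater than" wording of the rule.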
Batch   Zinc, %   Ash, %   Rank of Zinc   Rank of Ash   Rank Difference   (R.D.)²
1       5.33      17.97    2.5            5.0           -2.5              6.25
2       5.63      18.13    11.0           9.0           +2.0              4.00
3       5.42      18.06    9.0            8.0           +1.0              1.00
4       5.39      18.05    7.0            7.0           0.0               0.00
5       5.45      18.16    10.0           10.0          0.0               0.00
6       5.33      17.73    2.5            1.5           +1.0              1.00
7       5.36      17.75    4.0            1.5           +2.5              6.25
8       5.37      17.92    5.0            3.5           +1.5              2.25
9       5.39      18.01    7.0            6.0           +1.0              1.00
10      5.39      19.02    7.0            11.0          -4.0              16.00
11      5.30      17.92    1.0            3.5           -2.5              6.25
Total                                                   +9.0, -9.0        Σ(R.D.)² = 44.00
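The ranking procedure and the zero-sum check on the rank differences can be sketched as follows. The function names are illustrative. `average_ranks` gives tied values the mean of the ranks they occupy, which is what the half-integer and shared ranks in the example indicate. The example itself starts from the ranks of the table rather than from the raw percentages, and takes zinc as x and ash as y.

```python
from typing import List, Sequence

def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank from 1 (lowest) to n (highest); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks

def sum_squared_rank_differences(x_ranks: Sequence[float], y_ranks: Sequence[float]) -> float:
    """Subtract each y rank from its x rank, check the differences add to zero, square and sum."""
    diffs = [x - y for x, y in zip(x_ranks, y_ranks)]
    if abs(sum(diffs)) > 1e-9:
        raise ValueError("rank differences do not add to zero; check the ranking")
    return sum(d * d for d in diffs)

# Ranks from the example table (zinc taken as x, ash as y).
zinc_ranks = [2.5, 11.0, 9.0, 7.0, 10.0, 2.5, 4.0, 5.0, 7.0, 7.0, 1.0]
ash_ranks = [5.0, 9.0, 8.0, 7.0, 10.0, 1.5, 1.5, 3.5, 6.0, 11.0, 3.5]

total = sum_squared_rank_differences(zinc_ranks, ash_ranks)
n = len(zinc_ranks)
rho = 1 - 6 * total / (n**3 - n)  # Equation 1
print(total, rho)
```

The computed ρ of 0.8 agrees with the value read from the nomograph in the example.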
Reference to Table I for n = 11 shows that the observed total (44) is less than the critical total of 58 for P = 0.01. There is, therefore, a significant positive correlation. There is less than 1 chance in 100 that experimental error alone would produce a correlation as large as or larger than the one corresponding to a critical total of 58. The rank correlation coefficient was read from the nomograph as ρ = +0.8, by connecting n = 11 and Σ(R.D.)² = 44 with a transparent straight-edge.

LITERATURE CITED
(1) Dixon, W. J., and Massey, F. J., "Introduction to Statistical Analysis," p. 261, McGraw-Hill Book Co., New York, 1951.
(2) Douglas, R., and Adams, D., "Elements of Nomography," McGraw-Hill Book Co., New York, 1947.
(3) DuBois, P. H., Psychol. Record, 3, 46 (1939).
(4) Kendall, M. G., "Rank Correlation Methods," Griffin and Co., Ltd., London, 1948.
(5) Levens, A. S., "Nomography," Wiley, New York, 1948.
(6) Olds, E. G., Ann. Math. Statistics, 9, 133 (1938); 20, 117 (1949).
(7) Olmstead, P. S., and Tukey, J. W., Ibid., 18, 495 (1947).
(8) Snedecor, G. W., "Statistical Methods," Tables 3 and 8, Iowa State College Press, Ames, Iowa, 1940.
(9) Spearman, C., Brit. J. Psychol., 2, 89 (1906).
(10) Woodbury, M. A., Ann. Math. Statistics, 11, 368 (1940).

RECEIVED for review March 20, 1954. Accepted October 14, 1954. Full-size copies of the nomograph will be made available by the authors on request.