INDUSTRIAL AND ENGINEERING CHEMISTRY
PUBLISHED BY THE AMERICAN CHEMICAL SOCIETY
WALTER J. MURPHY, EDITOR
Efficient Statistical Methods in Chemistry
JOHN MANDEL¹, Columbia University, N. Y.

Statistical procedures are described for the evaluation and comparison of precisions, and for the estimation of accuracies. Illustrative use is made of numerical data taken from articles published in the ANALYTICAL EDITION. Special emphasis is placed on the practical interpretation of the statistical tests and computations.
ONE of the outstanding characteristics of modern scientific research is the importance attached to numerical data.
In certain cases, the conclusions obtained from such numerical material are obvious: Thus, a quantity can appear to be proportional to the square of another variable with a high degree of accuracy. It is true that a conclusion of such degree of certainty requires the experimental elimination of all perturbing factors or an exact knowledge of their influences. Furthermore, the objects under investigation must be accurately defined: Thus, an experimenter measuring the conductivity of a certain substance will use a sample the purity of which corresponds to the generally accepted standards at the time of his experimentation. These ideal conditions cannot always be fulfilled. Secondary factors, such as variations in temperature, atmospheric pressure, and many others, however slight they may be, often have detectable effects. Nor are our measuring instruments perfect, and in addition to their intrinsic imperfections there are the subjective errors due to the reading of the instrument. On the other hand, among the objects of scientific investigation there are numerous classes which are not capable of an unequivocal definition: Thus, the concept "colloidal silver", even if it is further elucidated by an exact procedure for its preparation, does not represent an entity whose quantitative properties are constant within the precision of the measuring instruments. The fluctuations due to these and similar causes often obscure the true meaning of the experimental findings. Whenever an improvement can be obtained by a possible rearrangement of the experiment, this is generally the thing to do. But this again is not always possible, and, as this paper intends to show, not even always necessary.
The modern methods of mathematical statistics, when properly applied, can throw considerable light on situations of this type. In this connection, three facts require special emphasis:

1. The computational labor involved is much less cumbersome than is often thought, provided that one uses the shortest methods of calculation. This can be further reduced by the use of a calculating machine, now available in many chemical laboratories.
2. Modern statistical methods are not confined to mere tabulation or estimation of parameters, such as averages and standard deviations; they make it possible to express in terms of probability the significance of such estimates, thus giving an objective meaning to the conclusions drawn from the experiment.
3. Finally, modern methods, without requiring more computational work, are more efficient than the older ones.
To illustrate this, let us assume, for example, that an object is weighed by ten different persons, to 0.01 gram, with an estimation of the milligram from the scale reading. If we are asked for a way of estimating the correct weight, to 1 mg., on the basis of these ten weighings, we have several courses of action; one, for example, would consist in taking the arithmetic mean of the ten numbers, another in taking the mean value between the lowest and the highest among them. It can now be proved, on the basis of certain assumptions concerning the probability distribution of these experimental readings (such as "normality" and "independence"), that the variability of the estimate obtained by the former method is less than the variability of the estimate obtained by the latter. This fact is expressed by saying that the first method is the more efficient one. In other words, if, for example, the experiment in question were repeated daily for a period of a year and the results plotted on two graphs, one for the first method and one for the second, taking the estimate as abscissa and the frequency of its occurrence as ordinate, the first graph would very likely show a better clustering of the estimates about the true value than would the second graph. What appears here is one of the main features of statistics, and it can be expressed for this particular example by the assertion that while there is no justification for stating that "the former method always yields more accurate estimates than the latter", it is nevertheless true that, in the long run, the former method of estimation is more precise.

In this article some questions confronting the practical analyst as well as the analytical research chemist are examined from the statistical point of view. The numerical material serving as illustration is taken entirely from previous publications in the ANALYTICAL EDITION. The emphasis is upon the practical statistical procedure.
Such statistical treatment requires, however, that certain underlying assumptions be at least approximately fulfilled; whether or not this is the case in the examples treated is left out of consideration; the reader who is willing to apply statistical methods to the interpretation of his own data should, however, give some thought to these matters. One of the most important of these underlying assumptions, statistical independence, is briefly discussed.
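The weighing example can be checked with a small simulation. This is a sketch under exactly the two assumptions just named, normality and independence of the readings; the true weight, the spread σ = 0.004 g, the 5000 simulated "days" and the random seed are hypothetical choices, and the second estimator is the "mean value between the lowest and the highest" reading (the midrange).

```python
import random
import statistics


def compare_estimators(true_weight=12.345, sigma=0.004, n=10, days=5000, seed=7):
    """Simulate `days` repetitions of n independent, normally distributed
    weighings (hypothetical parameters) and return the variances of two
    estimators of the true weight: the arithmetic mean and the midrange."""
    rng = random.Random(seed)
    means, midranges = [], []
    for _ in range(days):
        readings = [rng.gauss(true_weight, sigma) for _ in range(n)]
        means.append(statistics.fmean(readings))
        midranges.append((min(readings) + max(readings)) / 2)
    return statistics.variance(means), statistics.variance(midranges)


var_mean, var_midrange = compare_estimators()
print(f"variance of the mean:     {var_mean:.3e}")
print(f"variance of the midrange: {var_midrange:.3e}")
print(f"relative efficiency of the midrange: {var_mean / var_midrange:.2f}")
```

Both estimators cluster about the true weight, but the variance of the mean should come out close to σ²/10 while that of the midrange is noticeably larger, which is what "more efficient" means in the long-run sense described above.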
¹ Present address, Rti Corporation, New York, N. Y.
ESTIMATION AND COMPARISON OF PRECISIONS
The precision of an analytical method, or more generally of a measurement, is a measure of its reproducibility within certain limits of error. In the case of a measuring instrument, the precision is often expressed by adding to the observed value the symbol "±e", e being the "maximum error". The analytical chemist, however, who is dealing with data resulting from an often long and complicated sequence of operations, needs replicate values in order to evaluate their precision. It is therefore imperative to utilize objective means for the evaluation of a precision from a given series of replicate determinations. The most satisfactory way of doing this consists in computing an estimate of the "variance", σ², which is the square of the more commonly used "standard deviation", σ. Before defining s², we will observe:

1. That σ² is a better way of expressing precisions, even in the case of measuring instruments, than "±e", on account of the vagueness of the "maximum error" concept.
2. That σ² is a "population parameter", the term "population" or "hypothetical infinite population" denoting the totality of all (hypothetical) replicate values which could have been obtained under the given conditions of experimentation, these conditions being subject to their ordinary fluctuations only.

The existence of such a population is postulated. It is true that in actual practice an experiment is never completely "controlled"; the extent to which this circumstance invalidates statistical estimates should not be exaggerated, for two reasons: An experiment or an analysis is not an end in itself: It is supposed to prove or disprove something or to lead to some definite action. But any nonstatistical conclusion is just as much subject to the uncertainty resulting from lack of "control" as a statistical one. Moreover, the lack of control itself can be statistically detected, and probably better than by nonstatistical means. Evidently all knowledge about any change in conditions should at once be utilized; statistically this is often accomplished by subdividing the population into subpopulations.

To get the "true" variance, or "population variance", we add the squares of the deviations of all replicate values from the "true" value ("population mean value") and divide by the number of replicates. Since this number is in general infinite for the total population, a mathematical limit is taken and the summation becomes an integral. Obviously the population variance is not determinable by experiment; the best we can obtain from a finite series of measurements is a "sample estimate" of σ², denoted by s². Now, for several reasons, one of which will soon become apparent, the often used formula

$$s^2 = \frac{1}{N}\sum_{a=1}^{N}(x_a - \bar{x})^2,$$

where x_a stands for the a-th of a series of N measurements and x̄ for their arithmetic mean, suffers from certain shortcomings and can very advantageously be replaced by

$$s^2 = \frac{1}{N-1}\sum_{a=1}^{N}(x_a - \bar{x})^2.$$

We will now say that:

1. The sample estimate of the "true" value of x is
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_N}{N} = \frac{1}{N}\sum_{a=1}^{N} x_a.$$
2. The sample estimate of σ² is
$$s^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_N - \bar{x})^2}{N - 1}.$$
3. s² is estimated with N − 1 degrees of freedom.

The latter point requires some clarification: The measurements x_1 to x_N are independent observations, which means that the probability that any one of them has some given value is unaffected by the values already found for the previous measurements. There would, for example, be lack of independence if the eye of the experimenter became more tired as the number of observations increased. Another example would be given by the drawing of a slip from an urn containing ten slips marked from 1 to 10: Having drawn slip No. 6, the probability that any subsequent drawing be slip No. 6 is zero; in this case replacement of the drawn slip in the urn restores independence. On the other hand, the N numbers

x_1 − x̄, x_2 − x̄, ..., x_N − x̄

are obviously not independent, since the knowledge of any N − 1 among them yields immediately the value of the Nth, as results from the fact that their sum is equal to zero:

$$(x_1 - \bar{x}) + (x_2 - \bar{x}) + \cdots + (x_N - \bar{x}) = (x_1 + x_2 + \cdots + x_N) - N\bar{x} = N\bar{x} - N\bar{x} = 0.$$

Therefore, only N − 1 of these N numbers can vary freely; hence the statement that s² is estimated with N − 1 degrees of freedom, and, incidentally, the use of N − 1 in the denominator of s².

The usefulness of a "statistic", i.e., a function of observations such as s² or the "mean" x̄, as an estimate of the corresponding "true value", σ² or μ, is necessarily limited by the inaccuracy of these estimates in the case of small samples. ("Sample" here obviously refers to the statistical concept: a number of observations from the same statistical population.) But the real justification for their computation lies in the fact that, even in the case of small samples, they make it possible to obtain answers to a variety of problems, known in statistical terminology as "problems of testing hypotheses". We will now examine some of these problems on the basis of data published in the ANALYTICAL EDITION.

The data in Table I represent 10 determinations of chromium and nickel made on a single sample by a spectrographic method (1). They are sufficient material for the estimation of the precision.

Table I. Determination of Chromium and Nickel

Determination    % Cr     % Ni
 1               28.35    17.68
 2               28.80    18.17
 3               28.80    17.42
 4               28.75    17.70
 5               29.05    17.78
 6               28.59    18.16
 7               28.75    17.70
 8               28.99    17.45
 9               28.52    17.72
10               28.38    17.83

It will be extremely helpful always to keep in mind the two following theorems when calculating sample means and sample variances:

1. If the same constant, c, is subtracted from N numbers, their arithmetic mean is diminished by this constant; in symbols:
$$\frac{1}{N}\sum_{a=1}^{N}(x_a - c) = \bar{x} - c$$
2. The sum of squares of the deviations of N numbers from their arithmetic mean x̄ is less than the sum of squares of their deviations from any other number c; the difference amounts to N(c − x̄)²:
$$\sum_{a=1}^{N}(x_a - \bar{x})^2 = \sum_{a=1}^{N}(x_a - c)^2 - N(c - \bar{x})^2 \qquad \text{(D)}$$

These two formulas constitute a considerable simplification in the calculations, provided the constant c, which is arbitrary, is chosen as a round number near a roughly estimated mean. In this case we might take

c = 28.50 for chromium and
c = 17.70 for nickel.

Table II exhibits the calculations.

ANALYTICAL EDITION, Vol. 17, No. 4, April, 1945
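The two theorems can be put to work directly on the Table I data. In this minimal sketch, `coded_summary` is a hypothetical helper: it subtracts the round constant c, recovers the mean by the first theorem and the sum of squares about the mean by formula (D), and then divides by N − 1. The standard deviation with divisor N is computed alongside for comparison with the author's figures, and the result is cross-checked against Python's `statistics.variance`.

```python
import math
import statistics

# Table I: spectrographic determinations on a single sample (reference 1)
chromium = [28.35, 28.80, 28.80, 28.75, 29.05, 28.59, 28.75, 28.99, 28.52, 28.38]
nickel = [17.68, 18.17, 17.42, 17.70, 17.78, 18.16, 17.70, 17.45, 17.72, 17.83]


def coded_summary(values, c):
    """Return (mean, sum of squares about the mean, s^2 with divisor N - 1),
    working only with the small coded deviations x_a - c."""
    n = len(values)
    coded = [x - c for x in values]
    mean = c + sum(coded) / n                  # first theorem: mean is shifted by c
    ss_about_c = sum(u * u for u in coded)
    ss = ss_about_c - n * (c - mean) ** 2      # second theorem, formula (D)
    return mean, ss, ss / (n - 1)


results = {}
for name, data, c in (("Cr", chromium, 28.50), ("Ni", nickel, 17.70)):
    mean, ss, s2 = coded_summary(data, c)
    s_with_n = math.sqrt(ss / len(data))       # divisor N instead of N - 1
    results[name] = (mean, ss, s2, s_with_n)
    # Cross-check against the library's sample variance (divisor N - 1)
    assert math.isclose(s2, statistics.variance(data))
    print(f"{name}: mean = {mean:.3f}, SS = {ss:.4f}, s^2 = {s2:.4f}, "
          f"s (divisor N) = {s_with_n:.3f}")
```

The sums of squares should come out near 0.5010 for chromium and 0.5603 for nickel, and the divisor-N standard deviations agree to within 0.001 with the author's values of 0.223 and 0.237.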
The author of the method obtained for the values of s:

For chromium: s = 0.223
For nickel: s = 0.237

The difference arises from his use of N instead of N − 1 in the denominator of s². In the last paragraph of the paper considered, data are given representing the "% deviations" obtained by means of 7 determinations on the same sample by chemical methods. If we consider that they were calculated as

$$\frac{\%\ \text{deviation}}{100} = \frac{\sqrt{\sum_{a=1}^{7}(x_a - \bar{x})^2 / 7}}{\%\ \text{Cr or Ni}},$$

we find for s² (as we defined it):

$$s^2 = \frac{\text{sum of squares}}{6} = \left[\frac{(\%\ \text{deviation}) \times (\%\ \text{Cr or Ni})}{100}\right]^2 \times \frac{7}{6}.$$

Hence,

For chromium: $$s^2 = \left[\frac{(\%\ \text{deviation}) \times 28.698}{100}\right]^2 \times \frac{7}{6}$$
For nickel: $$s^2 = \left(\frac{0.89 \times 17.761}{100}\right)^2 \times \frac{7}{6} = 0.0291$$

… concerning the nature of the experimental errors affecting our data: We will assume that these errors follow the "normal law of …
Suppose that a statistician, who is often confronted with questions of this type, consistently bases his decision on the application of this test, using every time the same level of significance, say 5%; it will then occasionally occur that he rejects a hypothesis which happened to be true; but the probability for this being 5%, the fraction of such erroneous statements will actually tend to 5% as the number of decisions increases indefinitely. This does not mean that 95% of his decisions … in the long run, commit a type I error 10 times in a hundred. We could reduce this risk to 2% by choosing the critical value F = 5.35, but in that case we increase the risk of the type II error, i.e., of considering as identical precisions which are actually different.
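The long-run meaning of a significance level can be illustrated by simulation. This sketch assumes normal, independent errors and two samples of 10 (9 degrees of freedom each) drawn from one and the same population, so that the hypothesis of equal precision is true by construction and every rejection is a type I error. With the larger variance always in the numerator, F = 3.18 (the tabulated 5% point for 9 and 9 degrees of freedom) should be exceeded in about 10% of the trials, and F = 5.35 (the 1% point) in about 2%. The function name, seed and number of trials are arbitrary choices.

```python
import random
import statistics


def two_sided_rejection_rate(critical_f, n=10, trials=10000, seed=1):
    """Fraction of trials in which F (larger s^2 over smaller s^2) exceeds
    critical_f when both samples come from the same normal population."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        a = [rng.gauss(0.0, 1.0) for _ in range(n)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n)]
        va, vb = statistics.variance(a), statistics.variance(b)
        if max(va, vb) / min(va, vb) > critical_f:
            rejections += 1  # a true hypothesis rejected: a type I error
    return rejections / trials


rate_at_3_18 = two_sided_rejection_rate(3.18)  # 5% point of F, 9 and 9 df
rate_at_5_35 = two_sided_rejection_rate(5.35)  # 1% point of F, 9 and 9 df
print(f"type I error rate: {rate_at_3_18:.3f} at F = 3.18, "
      f"{rate_at_5_35:.3f} at F = 5.35")
```

Raising the critical value lowers the type I error rate, but, as the text notes, only at the price of a larger risk of a type II error.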
Our numerical example gives a value for F far below both tabulated values: We will say that "F is not significant" (on either level of significance), which means that the evidence from this experiment is insufficient to detect any difference in precision between the two spectrographic methods. We would probably have anticipated this result from the very nature of the spectrographic analysis. If then, consistently with this, we attribute the same variance to both methods, we will obtain a better estimate for this common variance by adding the sums of squares and dividing by the sum of degrees of freedom:

$$s^2 = \frac{0.5010 + 0.5603}{9 + 9} = 0.0590$$
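The variance ratio and the pooled estimate can be sketched as follows. This is a minimal illustration rather than the author's own procedure: the helper names `pooled_variance` and `variance_ratio` are hypothetical, the sums of squares 0.5010 and 0.5603 (9 degrees of freedom each) are the spectrographic values used in the text, and 0.0291 is the chemical-method variance for nickel given earlier. The larger variance is placed in the numerator, as in a two-sided comparison.

```python
def pooled_variance(groups):
    """Pool (sum of squares, degrees of freedom) pairs from groups that may
    have different means but are assumed to share one population variance."""
    total_ss = sum(ss for ss, _ in groups)
    total_df = sum(df for _, df in groups)
    return total_ss / total_df, total_df


def variance_ratio(var_a, var_b):
    """F with the larger variance in the numerator (two-sided comparison)."""
    return max(var_a, var_b) / min(var_a, var_b)


# Spectrographic sums of squares and degrees of freedom, as used in the text
ss_cr, ss_ni, df_each = 0.5010, 0.5603, 9

f_spec = variance_ratio(ss_cr / df_each, ss_ni / df_each)
s2_pooled, df_pooled = pooled_variance([(ss_cr, df_each), (ss_ni, df_each)])

# Chemical-method variance for nickel (7 determinations, 6 df), from the text
f_nickel = variance_ratio(s2_pooled, 0.0291)

print(f"F, chromium vs nickel (spectrographic): {f_spec:.2f}")
print(f"pooled s^2 = {s2_pooled:.4f} on {df_pooled} degrees of freedom")
print(f"F, nickel (spectrographic vs chemical): {f_nickel:.2f}")
```

It should reproduce the pooled value 0.0590 on 18 degrees of freedom and the nickel ratio 2.03; the spectrographic chromium/nickel ratio comes out near 1.12, far below the tabulated points.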
(Whenever a variance is to be estimated from a set of data which can be apportioned among several groups with different population means, this procedure should be used. The number of degrees of freedom is the difference between the total number of observations and the number of groups.) This estimate is now based on 18 degrees of freedom, and measures the precision (or rather the lack of precision) of the spectrographic method. In exactly the same way, we can now compare the precisions of the spectrographic method and the chemical methods. We thus obtain:

For chromium: F = (s² of chemical method)/0.0590 = 4.77

while the F table, for 6 and 18 degrees of freedom, gives

2.66 for the 10% level
4.01 for the 2% level

We conclude that F is significant (even on the 2% level) and consequently consider the spectrographic procedure as definitely more precise than the chemical procedure for chromium.

For nickel: $$F = \frac{0.0590}{0.0291} = 2.03$$

The relevant table values, now for 18 and 6 degrees of freedom, are larger than those above, so that here F is not significant; there is consequently no reason to consider either of the two methods, spectrographic and chemical, more precise than the other for nickel.

It will now be apparent that the use of N − 1 as the number of degrees of freedom in the denominator of an estimated variance is but a special case of the more general situation referred to in the parenthetical note above; its merits become evident when the variance estimate is to be used in a test of significance.

ESTIMATION OF THE ACCURACY OF A SERIES OF MEASUREMENTS

It is a commonly known fact that some analytical methods yield very close results on replication and are nevertheless faulty: This is the case wherever a systematic error, affecting equally all replicates, is present. From the statistical point of view we can consider such cases as being obtained by a horizontal shifting of the distribution function: The frequency of occurrence of any particular value x becomes the frequency of occurrence of the value x + ε, where ε is the same constant for all values x and represents the magnitude of the systematic error, also called bias. The population mean, instead of representing the true value, now represents the "true value plus systematic error". The accuracy of the method can now be expressed by the smallness of this bias. Evidently the variance is still a valid measure of the precision, but the mean fails to represent the "true value".

How are we to test the accuracy of a method? It is at once clear that from the sole knowledge of a series of measurements belonging to the same statistical population it is impossible to state anything about the accuracy. We need some means of comparison: either the true value itself or a series of corresponding measurements, obtained by a different method, of known accuracy. As an example consider the interesting determination of o-cresol in phenol by a cloud-point method (5). Table III contains the data given by the authors. Our problem is to ascertain the accuracy of the values given in the column "calculated cloud point", by comparing them with the corresponding accurate values in the column "mean of determinations", keeping in mind that the latter data are not the original observations, but rather means of varying numbers of determinations, as indicated in column 2. Therefore, in evaluating the average deviation of a single observation from its "calculated" value, we will "weight" the observed deviations, δ, before summation, as indicated in the third column of Table IV. This average, denoted by δ̄ and found to be equal to 0.00615, is a measure of the systematic error, ε, which possibly affects the "calculated" values. But are we really justified in regarding this small value as a systematic error? Could it not just be a sampling error, and our method perfectly accurate?

Table III. Determination of o-Cresol in Phenol

Composition of      No. of            Mean of               Calculated
sample, cresol, %   determinations    determinations, ° C.  cloud point, ° C.
0.00                10                66.39                 66.40
0.00                 5                66.41                 66.40
0.45                10                67.00                 67.00
0.67                 5                67.36                 67.29
1.03                10                67.70                 67.77
1.37                10                68.19                 68.22
1.40                10                68.33                 68.34
2.08                10                69.18                 69.16
2.33                10                69.40                 69.49
2.83                10                70.23                 70.15
3.43                10                70.77                 70.81
4.17                10                71.59                 71.68
4.71                10                72.34                 72.31
5.59                10                73.42                 73.33

Table IV. Calculations

W_i    δ_i × 10²    W_iδ_i × 10²    W_iδ_i² × 10⁴
10        1            10              10
 5       −1            −5               5
10        0             0               0
 5       −7           −35             245
10        7            70             490
10        3            30              90
10        1            10              10
10       −2           −20              40
10        9            90             810
10       −8           −80             640
10        4            40             160
10        9            90             810
10       −3           −30              90
10       −9           −90             810
ΣW_i = 130    ΣW_iδ_i × 10² = 80    ΣW_iδ_i² × 10⁴ = 4210

Mean value of δ: δ̄ = 0.80/130 = 0.00615
Sum of squares: 0.4210 − (0.00615)(0.80) = 0.416
Variance of δ: 0.416/13 = 0.032 (13 degrees of freedom)
Variance of δ̄ᵃ: 0.032/130 = 0.000246
Standard deviation of δ̄: s_δ̄ = 0.0157
t = δ̄/s_δ̄ = 0.00615/0.0157 = 0.391

The table giving the probability distribution of "Student's t" (2, 3) shows that the probability of a greater value of t, in the case of 13 degrees of freedom, is 0.7.
Conclusion: t is not significant.

ᵃ The variance of the arithmetic mean of N independent observations of the same precision equals one Nth of the variance of a single observation; this is a simple consequence of the following two important theorems: 1. The variance of an algebraic sum of several independent observations equals the arithmetic sum of their variances. 2. The variance of the product of a variable by a constant equals the variance of this variable, multiplied by the square of the constant.

The clue to the answer lies in the following fact: The bias, ε, affects equally all the differences, δ, between columns 3 and 4 of Table III (and consequently also their average, δ̄); therefore the variability of δ̄, as expressed by its variance or its standard deviation, is not affected by ε. If we therefore consider the ratio t of δ̄ to its standard deviation, σ_δ̄, we will conclude that only the numerator is affected by ε, and consequently the value found for this ratio is indicative of the magnitude of ε. In particular, a high value of t, having a low probability of occurrence when ε = 0, makes the alternative hypothesis ε ≠ 0 more credible. The test will thus consist in:
1. Assuming momentarily that ε = 0 (no bias; method accurate)
2. Calculating on this basis (and on the assumption of "normality" and "independence" of the errors) the probability of obtaining a value of δ̄ as high as or higher than 0.00615, using the square root of the sample variance as a means of comparison
3. Rejecting the hypothesis ε = 0 if this probability is low, say …

… On the other hand, δ̄ would become … we would then find for t: …

… the sample value. Therefore the following two problems arise: (a) to calculate confidence limits corresponding to some specified confidence coefficient, i.e., a probability indicating in what fraction of cases the confidence interval will, in the long run, actually cover the true value; and (b) to judge, in any given case, whether the result is significantly different from 0. The t test is here appropriate and will provide an answer to both questions. Thus for magnesium carbonate (first item in Table V) we find:

Mean of sample determinations: x̄ = 7.9
Mean of blank determinations: ȳ = 2.35
Difference: d = x̄ − ȳ = 5.55

We conclude that d is significantly different from 0. Confidence limits for d are found as follows:

Table value for t corresponding to P = 0.05: t = 4.30
(0.18)(4.30) = 0.77
5.55 − 0.77 = 4.78
5.55 + 0.77 = 6.32

Finally, for 1 gram of magnesium carbonate, the limits are 2.39 and 3.16 p.p.m. of Pb.

Results obtained in a similar fashion for the eight other items, together with the corresponding central values, are given in Table VI. It is seen that in the case of calcium carbonate and talc the lower limit is negative; the interval includes 0, which, as we know, is to be interpreted as indicating that the observed difference in p.p.m. of lead between sample and blank is not significantly different from 0. In other words, there is no evidence, on the 5% level of significance, for considering the sample as containing more lead than the blank.

Table VI. Determination of Lead (p.p.m. of Pb)

Material        Lower Limit    Central Value    Upper Limit
MgCO3           2.39           2.78             3.16
ZnO             11.7           16.0             20.3
Zn stearate     9.6            13.3             17.0
Ocher           24.1           37.0             49.9
CaCO3           −0.16          0.08             0.32
TiO2            47.1           64.0             60.9
Talc            −0.63          1.45             3.33
Kaolin          12.5           17.7             22.9
BaSO4           23.2           24.4             25.6
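Both t tests of this section can be sketched in standard-library Python. Because the standard library has no Student's t distribution, the hypothetical helpers `t_pdf`, `two_sided_p` and `t_critical` integrate the t density numerically with Simpson's rule and invert it by bisection; this stands in for the printed t table. δ is taken as calculated minus observed cloud point, which reproduces the signs of Table IV, and the weights are the numbers of determinations. For lead, the conversion to 1 gram assumes a 2-gram sample, an inference from 4.78/2 = 2.39 rather than a figure stated in the text.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t, df, n=2000):
    """P(|T| >= |t|), integrating the density over [0, |t|] (Simpson, n even)."""
    t = abs(t)
    h = t / n
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1.0 - 2.0 * total * h / 3.0


def t_critical(p, df):
    """Two-sided critical value of t for probability p, found by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if two_sided_p(mid, df) > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Accuracy test, Tables III and IV: weights are the numbers of determinations.
weights = [10, 5, 10, 5, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10]
observed = [66.39, 66.41, 67.00, 67.36, 67.70, 68.19, 68.33,
            69.18, 69.40, 70.23, 70.77, 71.59, 72.34, 73.42]
calculated = [66.40, 66.40, 67.00, 67.29, 67.77, 68.22, 68.34,
              69.16, 69.49, 70.15, 70.81, 71.68, 72.31, 73.33]
deltas = [c - o for c, o in zip(calculated, observed)]
sum_w = sum(weights)
sum_wd = sum(w * d for w, d in zip(weights, deltas))
mean_delta = sum_wd / sum_w                                   # weighted mean bias
ss_delta = sum(w * d * d for w, d in zip(weights, deltas)) - mean_delta * sum_wd
df_delta = len(deltas) - 1                                    # 14 means, 13 df
sd_mean = math.sqrt(ss_delta / df_delta / sum_w)              # s of weighted mean
t_cresol = mean_delta / sd_mean
p_cresol = two_sided_p(t_cresol, df_delta)

# Sample versus blank for lead in magnesium carbonate (Table V, first item).
sample, blank = [7.8, 8.0], [2.2, 2.5]


def ss_about_mean(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs)


d = sum(sample) / len(sample) - sum(blank) / len(blank)
df_pb = (len(sample) - 1) + (len(blank) - 1)
s2 = (ss_about_mean(sample) + ss_about_mean(blank)) / df_pb  # single determination
s_d = math.sqrt(s2 / len(sample) + s2 / len(blank))
t_pb = d / s_d
p_pb = two_sided_p(t_pb, df_pb)
half_width = t_critical(0.05, df_pb) * s_d
SAMPLE_GRAMS = 2.0  # assumption, inferred from 4.78 / 2 = 2.39 in the text
limits_per_gram = ((d - half_width) / SAMPLE_GRAMS, (d + half_width) / SAMPLE_GRAMS)

print(f"o-cresol: mean bias {mean_delta:.5f}, t = {t_cresol:.3f}, P = {p_cresol:.2f}")
print(f"lead: d = {d:.2f}, t = {t_pb:.1f}, P = {p_pb:.4f}, limits "
      f"{limits_per_gram[0]:.2f} to {limits_per_gram[1]:.2f} p.p.m.")
```

Run as is, it should give t ≈ 0.39 with P ≈ 0.70 for the o-cresol data, and t ≈ 30.8, P ≈ 0.001 and limits of 2.39 to 3.16 p.p.m. for magnesium carbonate.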
Calculation for magnesium carbonate:

Estimate of variance for a single determination:
$$s^2 = \frac{[(7.8 - 7.9)^2 + (8.0 - 7.9)^2] + [(2.2 - 2.35)^2 + (2.5 - 2.35)^2]}{(2 - 1) + (2 - 1)} = 0.0325$$
Degrees of freedom = (2 − 1) + (2 − 1) = 2

Estimate of variance of d:
$$s_d^2 = \frac{s^2}{2} + \frac{s^2}{2} = \frac{0.0325}{2} + \frac{0.0325}{2} = 0.0325$$
$$t = \frac{5.55}{\sqrt{0.0325}} = \frac{5.55}{0.18} = 30.8$$

Corresponding probability (found in the t table for 2 degrees of freedom): P = 0.001 (approximately)

LITERATURE CITED
(1) Coulliette, J. H., IND. ENG. CHEM., ANAL. ED., 15, 732-4 (1943).
(2) Fisher, R. A., "Statistical Methods for Research Workers", New York, G. E. Stechert & Co., 1941.