counter. For such, since the combustion itself requires only a few minutes, the procedure makes combustion of many samples per day possible. The standard error of 0.2 to 0.3% in specific carbon-14 activities determined by Raaen and Ropp indicates the reproducibility of both the counting procedure and the combustion with the "anhydrous" mixture.

ACKNOWLEDGMENT

The author acknowledges the generous support of Eli Lilly and Co.

LITERATURE CITED

(1) Backlin, E., Biochem. Z., 217, 483 (1930).
(2) Bernstein, W., and Ballentine, R., Rev. Sci. Instr., 20, 347 (1949).
(3) Boivin, A., Bull. soc. chim. biol., 11, 1270 (1929).
(4) Buchanan, D. L., J. Am. Chem. Soc., 74, 2389 (1952).
(5) Farrington, P. S., Niemann, C., and Swift, E. H., ANAL. CHEM., 21, 1423 (1949).
(6) Folch, J., Ascoli, I., Lees, M., Meath, J. A., and LeBaron, F. N., J. Biol. Chem., 191, 833 (1951).
(7) Hoagland, C. L., Ibid., 136, 533 (1940).
(8) Ibid., p. 543.
(9) Hoagland, C. L., and Fischer, D. J., Proc. Soc. Exptl. Biol. Med., 40, 581 (1939).
(10) Kirk, E., J. Biol. Chem., 106, 191 (1934).
(11) Kirk, P. L., and Williams, P. A., IND. ENG. CHEM., ANAL. ED., 4, 403 (1932).
(12) Küster, F. W., and Stallberg, A., Ann., 278, 215 (1893).
(13) Lindenbaum, A., Schubert, J., and Armstrong, W. D., ANAL. CHEM., 20, 1120 (1948).
(14) McCready, R. M., and Hassid, W. Z., IND. ENG. CHEM., ANAL. ED., 14, 526 (1942).
(15) Messinger, J., Ber., 23, 2756 (1890).
(16) Meyer, Hans, "Organic Analysis," Berlin, Julius Springer, 1938; Ann Arbor, Mich., Edwards Bros., 1943.
(17) Neville, O. K., J. Am. Chem. Soc., 70, 3499 (1948).
(18) Nicloux, M. M., Bull. soc. chim. biol., 9, 639 (1927).
(19) Raaen, V. F., and Ropp, G. A., ANAL. CHEM., 25, 174 (1953).
(20) Sinex, F. M., Plazin, J., Clareus, D., Bernstein, W., Van Slyke, D. D., and Chase, R., in press.
(21) Thompson, W. B., N. Y. State Dept. Health, Ann. Rept. Div. Labs. and Research, 1945, 21.
(22) Van Slyke, D. D., and Folch, J., J. Biol. Chem., 136, 509 (1940).
(23) Van Slyke, D. D., and Kreysa, F., Ibid., 142, 765 (1942).
(24) Van Slyke, D. D., Page, I. H., and Kirk, E., Ibid., 102, 635 (1933).
(25) Van Slyke, D. D., and Plazin, J. P., unpublished work.
(26) Van Slyke, D. D., Plazin, J. P., and Weisiger, J. R., J. Biol. Chem., 191, 299 (1951).
(27) Van Slyke, D. D., Steele, R., and Plazin, J. P., Ibid., 192, 769 (1951).

RECEIVED for review August 28, 1953. Accepted July 19, 1954. Presented before the Division of Analytical Chemistry at the 123rd Meeting of the AMERICAN CHEMICAL SOCIETY, Los Angeles, Calif., March 1953. Research done under the auspices of the Atomic Energy Commission.
Application of Statistical Analysis to Analytical Data

P. D. LARK
School of Applied Chemistry, The New South Wales University of Technology, Broadway, Sydney, N.S.W., Australia
The principle of least squares provides a means of separating and estimating the systematic and random errors in determinations by an analytical method if it is investigated over a sufficient range of concentrations. The method of obtaining the estimates of error and interpreting them depends on the adoption of some suitable hypothesis about the relationship of the two types of error to the amount of substance being assayed. Unless this is done, tests of significance or statements involving probability may not be made. In the extended treatment of an example the necessary steps in a statistical examination (choice of hypothesis, computation of regression equations and associated errors, and testing for rejection of suspected values) are illustrated. The results of such an examination are applied to the prediction of the true amount of substance in a sample from that found by chemical analysis.
IN THE investigation of a method of quantitative analysis it is desirable to carry out tests over a range of concentrations or weights of the substance analyzed. For any given amount of substance taken the amount found will vary, and the difference or error may be divided into a random part, the cumulative result of steps in procedure and measurement, and a systematic part, sometimes a constant error or blank, which is the fault of the method. Statistical analysis of regression, which may be carried out if the range of the investigation is reasonable, provides a means of separating these two sources of error and enables the reliability of the method to be assessed as precisely and accurately as the data permit. An example of the application of regression analysis to data of this type is presented by Youden (13). In this example, the dependent variate, Y, is the amount of substance found, the independent variable, X, is the exactly known amount taken, and an equation, Ŷ = a + bX, is fitted by the method of least squares. Here it is stated (13) and in a previous paper (12) and note (11) that
the intercept, a, in this equation is "an estimate of any constant error," and that there is no evidence of constant error or blank in the analytical process if a does not differ significantly from zero. If, in addition, the slope, b, does not differ significantly from unity, it seems to be implied that there is no systematic error in the method but only the random error of analytical determination, i.e., Ŷ = a + bX now indicates the ideal relationship, Y = X. However, while this may sometimes be true, a slightly more elaborate treatment reveals a different state of affairs in the example discussed. The position becomes clearer if the regression of total error of analytical determination, Z = Y - X, on amount taken is considered. The alternative and more instructive equation takes the form: Ẑ = a + b′X, in which b′ = b - 1, and which has a standard error of estimate, s(e), equal to that of the equation of Y on X. To interpret the constants so obtained, it must be assumed that this equation estimates a true relationship of the same form. To be more precise, the hypothesis is made that the systematic error, Z′, varies and is related to X by a linear equation, Z′ = α + β′X, and that the random error, ε, is normally distributed with constant standard deviation, σ. We may write

Z = Z′ + ε = α + β′X + ε

for total error equals systematic error plus random error. Accepting this, Z′, α, β′, and σ are best estimated by Ẑ, a, b′, and s(e). An alternative hypothesis might be adopted, that there is no β′ and that the systematic error is a constant, Z′ = α″. If this is so, no trend should be shown by the Z values, and the slope b′ should be close to zero and b to unity. Thus, if the reasonableness of either hypothesis is to be judged from the data themselves, it is the value of b′, or of b, that is the criterion of constant or other systematic error, while tests on a have a confirmatory value only. Furthermore, a and b are not independent and tests of significance must be made with due care. These two do not exhaust the number of feasible hypotheses which might be made, although they are the most practical.
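As a purely illustrative numerical check, the identity between the two fits may be verified with any paired values; the figures in the following sketch are hypothetical and serve only to show that regressing Z = Y - X on X returns the slope b - 1, the same intercept a, and the same residuals, so that s(e) is unchanged.

    # Illustrative sketch with hypothetical data: fitting Y on X and Z = Y - X
    # on X by least squares gives b' = b - 1, the same intercept, and
    # identical residuals, hence the same standard error of estimate s(e).
    import numpy as np

    X = np.array([5.0, 10.0, 15.0, 20.0, 25.0, 30.0])   # amount taken (hypothetical)
    Y = np.array([4.8, 9.9, 14.6, 19.7, 24.5, 29.6])    # amount found (hypothetical)
    Z = Y - X                                           # total error of determination

    def fit(x, y):
        """Ordinary least squares for y = a + b*x; returns b, a, residuals."""
        b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
        a = y.mean() - b * x.mean()
        return b, a, y - (a + b * x)

    b, a, res_y = fit(X, Y)
    b_prime, a_z, res_z = fit(X, Z)

    print(b_prime, b - 1.0)             # equal: b' = b - 1
    print(a, a_z)                       # equal intercepts
    print(np.allclose(res_y, res_z))    # True: identical residuals, same s(e)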
Youden suggests a third: that the systematic error is zero when X is zero, but is directly proportional to X, i.e., Z′ = β″X, the variability of random error being unchanged. This is discussed below, but in addition, any of these assumptions with regard to systematic error might be combined with some assumption of varying random error (e.g., σ = kX, or σ² = kX, or σ = j + kX), and the methods of estimating constants and of making tests of significance would be altered accordingly.
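If, for instance, the random error were assumed to grow in proportion to X (σ = kX), the constants would no longer be estimated by the equal-weight formulas used here but by weighted least squares, each observation being weighted inversely as its assumed variance. The sketch below, with hypothetical figures and under that assumed error model only, merely indicates the form such an altered computation would take.

    # Illustrative sketch, assuming sigma = k*X: weighted least squares with
    # weights proportional to 1/X**2 replaces the ordinary (equal-weight) fit.
    import numpy as np

    def weighted_fit(x, y, w):
        """Weighted least squares for y = a + b*x; w should be ~ 1/variance."""
        xbar = np.sum(w * x) / np.sum(w)
        ybar = np.sum(w * y) / np.sum(w)
        b = np.sum(w * (x - xbar) * (y - ybar)) / np.sum(w * (x - xbar) ** 2)
        a = ybar - b * xbar
        return a, b

    X = np.array([4.0, 8.0, 16.0, 24.0, 32.0, 40.0])    # hypothetical amounts taken
    Y = np.array([3.9, 7.9, 15.7, 23.8, 31.5, 39.6])    # hypothetical amounts found

    a_ols, b_ols = weighted_fit(X, Y, np.ones_like(X))  # constant-sigma assumption
    a_wls, b_wls = weighted_fit(X, Y, 1.0 / X ** 2)     # sigma proportional to X
    print(a_ols, b_ols)
    print(a_wls, b_wls)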
APPLICATION TO THE EXAMPLE
The values of the example, given in Table I, are taken from the results of analyses of calcium oxide in calcium oxide-magnesium oxide mixtures (8). The equation fitted to these values is, from the computations in Table I,

Ŷ = -0.2281 + 0.994757X     (1)
in which the intercept per se does not differ significantly from zero, nor the slope per se from unity, as may be seen from their confidence limits or from the appropriate t tests. Yet it is obvious from a casual inspection of the data that Y is low and Z negative in nine analyses out of ten, and that the ideal relationships, Y = X and Z′ = 0, are unlikely to occur. In fact, the observed differences have a mean of -0.35 with a standard error of 0.0654, and assuming constant error, it can be shown by a t test that the probability of such a value arising by accident, were Z′ really zero, is less than one in a thousand. Thus, although a does not differ from zero by more than can be attributed to chance, no conclusion can be drawn from its value and standard error. The equation for the regression of Z on X is
Ẑ = -0.2281 - 0.005243X     (2)
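The constants of Equations 1 and 2, and the t test on the mean error, can be checked directly from the data of Table I. The following sketch is merely a restatement of the standard least-squares and t-test formulas in program form and should reproduce the quoted values (a = -0.2281, b = 0.99476, b′ = -0.00524, s(e) = 0.2067, t about -5.3) to rounding.

    # Check of Equations 1 and 2 against the Table I data (CaO taken and found, mg),
    # using the ordinary least-squares formulas and a one-sample t test on Z.
    import numpy as np

    X = np.array([4.0, 8.0, 12.5, 16.0, 20.0, 25.0, 31.0, 36.0, 40.0, 40.0])
    Y = np.array([3.7, 7.8, 12.1, 15.6, 19.8, 24.5, 31.1, 35.5, 39.4, 39.5])
    Z = Y - X
    n = len(X)

    Sxx = np.sum((X - X.mean()) ** 2)                   # 1568.625
    b = np.sum((X - X.mean()) * (Y - Y.mean())) / Sxx   # slope of Y on X, ~0.99476
    a = Y.mean() - b * X.mean()                         # common intercept, ~-0.2281
    b_prime = b - 1.0                                   # slope of Z on X, ~-0.00524
    resid = Y - (a + b * X)
    s_e = np.sqrt(np.sum(resid ** 2) / (n - 2))         # standard error of estimate, ~0.207

    # t test of the mean error against zero (constant-error hypothesis), 9 d.f.
    t = Z.mean() / (Z.std(ddof=1) / np.sqrt(n))         # about -5.3, P < 0.001

    print(a, b, b_prime, s_e, t)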
Since the standard errors of the constants are unchanged, their 90% confidence limits are:

For the intercept, -0.4845 to +0.0283
For the slope, -0.014956 to +0.004470

However, intercept and slope are not independent (as a = Z̄ - b′X̄), nor are their confidence intervals, and they may not be zero simultaneously. This may be seen from Figure 1, and is consistent with the significant negative value of Z̄. Taking the data as they stand, we may adopt one of the following plausible hypotheses about the systematic error of the method.

1. Z′ is linearly related to X by an equation of the form, Z′ = α + β′X, of which Equation 2 is the best estimate. The data are not consistent with both α and β′, i.e., Z′, being zero.

2. Z′ = α″, a constant. If this assumption is made, α″ is best estimated by the mean of the ten values of Z. From the standard error of this quantity based on n - 1 = 9 degrees of freedom, the 90% confidence limits are found to be -0.3500 ± 0.1199, i.e., -0.23 to -0.47, thus excluding Z′ = 0.

3. Z′ = β″X. In this case the constant is best estimated by b″ = ΣXZ/ΣX², with variance

V(b″) = [ΣZ² - (ΣXZ)²/ΣX²] / [(n - 1)ΣX²]
with n - 1 = 9 degrees of freedom (14). Substituting, we find that b″ = -0.012847, and the 90% confidence limits of β″, which it estimates, are approximately -0.008 to -0.018, and again Z′ ≠ 0.

Table I. Analytical Determination of Calcium Oxide in Known Mixtures

CaO, mg.
Taken, X    Found, Y    Error of Y, Z
  4.0         3.7         -0.3
  8.0         7.8         -0.2
 12.5        12.1         -0.4
 16.0        15.6         -0.4
 20.0        19.8         -0.2
 25.0        24.5         -0.5
 31.0        31.1         +0.1
 36.0        35.5         -0.5
 40.0        39.4         -0.6
 40.0        39.5         -0.5
Sum  232.5   229.0        -3.5
Mean  23.25   22.90       -0.35

Computations (x = X - X̄, Σx² = Σ(X - X̄)², etc.):
Σx² = 1568.625, Σxy = 1560.4
Σy² = 1552.560, Σxz = -8.226
Σz² = 0.385
b = Σxy/Σx² = 0.994757
b′ = Σxz/Σx² = -0.005243
a = Z̄ - b′X̄ = -0.2281
V(e) = (Σy² - bΣxy)/(n - 2) = (Σz² - b′Σxz)/(n - 2) = 0.042737
s(e) = 0.20672 (8 d.f.)
V(b) = V(b′) = V(e)/Σx² = 0.00002727
s(b) = s(b′) = 0.00522 (8 d.f.)
90% limits (β′) = b′ ± t.05 s(b′) = -0.005243 ± 0.009713
V(a) = V(e)/n + X̄²V(b) = 0.019000
s(a) = 0.13784 (8 d.f.)
90% limits (α) = a ± t.05 s(a) = -0.2281 ± 0.2564
V(Z) = Σz²/(n - 1) = 0.04278; s(Z) = 0.2068 (9 d.f.)
V(Z̄) = V(Z)/n = 0.004278; s(Z̄) = 0.0654

Thus, it can be seen that all hypotheses lead to the conclusion that systematic error of some kind must be contended with and that the role of a in the original regression equation (Equation 1) is ambiguous. In this example, it is consistent with both the second and third hypotheses, but taken by itself it is merely the best estimate of error when the amount of substance to be analyzed is zero. This is, of course, also true of the analytically determined blank, which may indicate only a small part of the error when the amount of substance present is substantial. On the other hand, b and b′, although perhaps not so helpful here, do reflect conditions over the whole range of the variable, and if their values are sufficiently close to unity and zero, respectively, the hypothesis of constant error may be a