Statistical Operating Rule for Analytical Chemists

ANALYTICAL CHEMISTRY, VOLUME 23, NO. 11, NOVEMBER 1951

is a good round percentage of common sense. The final expression of a standard is the work of human hands. No matter by what physical means the length of a meter can be established, it is, practically, the distance between two marks on that prototype bar preserved in Paris, and the real value of any other meter is the accuracy of the intervening comparisons. My boyhood schoolmate found out something of that sort when his carpenter father set him to cut some six hundred pickets for a fence. The father cut one for a pattern, but the boy each time used the one last cut as the measure for the next. How often we see all the good intentions wrapped up in a standard laid waste by the absence of some practical experience.

RECEIVED July 5, 1951.

4th Annual Summer Symposium-Standards

Statistical Operating Rule for Analytical Chemists

H. A. LIEBHAFSKY, H. G. PFEIFFER, AND E. W. BALIS
Research Laboratory, General Electric Co., Schenectady, N. Y.

A simple statistical rule relating to the precision of chemical analyses is presented and discussed. The rule specifies the establishing of the standard deviation, s, by replicate determinations and “guarantees” the result of a single determination to within ±3s. A large number of results from the microcombustion of organic compounds have been reexamined, and 3s limits for carbon and hydrogen have been suggested. Statistical analysis of the most comprehensive published investigation of constant-boiling hydrochloric acid as an acidimetric standard has yielded 4 parts per thousand as the 3s limits, which means that the reliability of a single batch of this standard has been overrated in the literature.

IN CONNECTION with the development of a method for the microcombustion of organosilicon compounds (1), and with an investigation into the reliability of constant-boiling hydrochloric acid (6), it became necessary to examine published data from the viewpoint of a statistical operating rule formulated in this laboratory. The rule is presented here along with the conclusions reached from the statistical analysis of the published data in question.

The analytical chemist can have either a research or an operating interest in an analytical method. (Naturally, these interests may coexist.) As regards statistics, the research interest revolves around causes of variation, known and unknown; the operating interest is concerned with the reliability of the method as a measuring stick. The research interest is served by analysis of variance and tests of significance; the operating, by the methods governing prediction and specification, methods in which the standard deviation is conveniently applied. A recent, important statistical analysis of the microdetermination of carbon and hydrogen illustrates the former interest (8).

In connection with the reproducibility of analytical methods, there is a growing tendency to speak of “confidence limits,” and to single out the 95.45 or the 99.73% limit as the precision measure. (In a normal universe, 95.45% of a sufficiently large number of observations will lie within 2σ of the mean, σ being the standard deviation of the universe; 99.73 is the corresponding percentage for 3σ.) The procedure in this laboratory has been not to speak of “confidence limits” (11); to regard any limits as approximate guides; to take 3s as the preferred limit, where s (the estimate of σ) is defined by

$$ s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} \tag{1} $$

$x_i - \bar{x}$ being the deviation from the mean of the i’th of n determinations; and to regard even 99% as too optimistic an estimate of the proportion of results usually lying within 3s of the mean.

The following operating rule is used here:

As a basis for action, it is considered certain that a single determination by an accurate analytical method will give a result within 3s of the true value, where s is the standard deviation established by the results of at least 5 replicate determinations according to the method.

This rule defines the authors’ attitude toward the important problem of doing a large number of analyses, all of the same kind. The problem is subdivided into the establishing of s, and the obtaining of results on the “customer’s” samples. It is assumed that the kind and distribution of errors are the same in both parts. The comments below will clarify the rule.

1. “As a basis for action” means that the analytical chemist “guarantees” the result to within 3s limits, and that the “customer” is justified in acting on that guarantee. The guarantee may be expected to fail in 1 case out of not less than 20 (see below).
2. If the customer considers 3s limits too large, they should be reduced by making additional determinations (to a total of n′), so that they become $3s/\sqrt{n'}$.
3. The rule says “accurate analytical method” because only precision is being considered.
4. 3s limits are chosen because 2s limits are not conservative enough. Further justification of this choice is given below. It has the sanction of control chart experience (10), where n is large and reliable limits are consequently easier to establish.
5. What is the chance that a single determination will fall outside the 3s limits? Manifestly, it is 1 in about 370 (corresponding to 0.27%) if n is very large and the universe is normal. In the usual chemical analysis, however, this chance will be considerably greater for these reasons: The universe may not be normal; a sufficient number of replicate observations to establish its character will seldom have been made; and s, not σ, must usually serve as a precision measure. The Camp-Meidell inequality (4) predicts that this chance will be 1 out of 20 or more; hence 1 in not less than 20 (corresponding to 95% or more within 3s limits) is a reasonable description of the chance in question.
6. The number of replicate determinations done to establish s is often limited by economies and is governed by the precision to which s must be known. A running adjustment of s is especially desirable in analytical control methods.
7. If, instead of a single result, the operating datum is the mean of four or more analytical results approximately normal in distribution, then the chance of paragraph 5 will be 1 in considerably more than 20 because the distribution of such means may be assumed normal in practice (2). The standard deviation of means is written as $s_m$ below.
8. The operating rule is applicable to processes (such as the preparation of a standard) other than analytical methods.

CHOICE OF PRECISION LIMITS
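As a rough numerical illustration of why 3s rather than 2s limits are preferred, the chances quoted in comment 5 of the operating rule can be recomputed. The Python sketch below is not part of the original paper: the function names are hypothetical, the replicate values are invented for illustration, s is taken with the n − 1 divisor (an assumption about Equation 1), and the Camp-Meidell bound is used in its usual form 4/(9k²).

```python
import math
import statistics


def standard_deviation(results):
    """Estimate s from replicate determinations (n - 1 divisor assumed)."""
    if len(results) < 5:
        raise ValueError("the operating rule asks for at least 5 replicates")
    return statistics.stdev(results)


def guarantee_limit(s, n_prime=1):
    """3s limit for a single result, or 3s/sqrt(n') for the mean of n' results."""
    return 3.0 * s / math.sqrt(n_prime)


def chance_outside_normal(k):
    """Two-sided chance of a deviation beyond k*sigma in a normal universe."""
    return math.erfc(k / math.sqrt(2.0))


def chance_outside_camp_meidell(k):
    """Camp-Meidell upper bound on the chance of a deviation beyond k*sigma."""
    return 4.0 / (9.0 * k * k)


if __name__ == "__main__":
    # Hypothetical replicate results (per cent carbon), for illustration only.
    replicates = [68.80, 68.92, 68.75, 68.88, 68.85]
    s = standard_deviation(replicates)
    print(f"s = {s:.3f}; single result guaranteed to within {guarantee_limit(s):.3f}")
    print(f"mean of 4 results guaranteed to within {guarantee_limit(s, 4):.3f}")
    for k in (2, 3):
        print(f"{k}s limits: normal universe 1 in {1 / chance_outside_normal(k):.0f}, "
              f"Camp-Meidell bound 1 in {1 / chance_outside_camp_meidell(k):.2f}")
```

For 3s limits the normal universe gives about 1 in 370 and the Camp-Meidell bound about 1 in 20, which are the two figures quoted in comment 5; for 2s limits the bound loosens to 1 in 9, which is one way to see why 2s limits are called not conservative enough.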

Every treatment of precision in analytical chemistry should include a consideration of the article by Power (9), whose classifications “one chemist on many samples” and “many chemists on one sample” are used here. Variations in technique from one chemist to another and from one laboratory to another are implicit in the definition of the second classification. It will be shown that 3s limits are desirable for each.

One Chemist on Many Samples. As an adjunct to the work on constant-boiling hydrochloric acid, L. Bronk recently carried out in this laboratory five careful titrations against 1 N sulfuric acid on each of six high-quality sodium carbonates purchased from well-known chemical supply houses. The thirty (consecutive) results when plotted on a common basis give a skewed distribution curve and there is only 1 chance in 40 (12) that they belong to a normal universe. This supports adopting the Camp-Meidell inequality as the basis for the operating rule given above.

Power (9) does not give detailed results for the 281 samples analyzed by five chemists to establish his “one-chemist-on-many-samples” case. The standard deviations (for which Power’s symbol σ is retained here) for this case are: carbon, 2.5 parts per thousand; hydrogen, 18 parts per thousand. Table I of (9) shows how these standard deviations varied among the five chemists participating.

Many Chemists on One Sample. The operating rule may now be tested by applying it to (9), Table II. The test consists of calculating s (in parts per thousand) for all determinations reported and examining the individual results to establish the number lying outside 2s and outside 3s limits.
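The test just described (compute s for a group, then count the individual results outside the 2s and 3s limits) can be sketched in a few lines of Python. This is an illustration only: the function name is hypothetical, the n − 1 divisor is assumed for s, deviations are measured from the group mean, and the sample values are invented because Power's individual results are not reproduced here.

```python
import statistics


def count_outside(results, k):
    """Count results deviating from the group mean by more than k times s."""
    mean = statistics.mean(results)
    s = statistics.stdev(results)  # n - 1 divisor assumed
    return sum(1 for x in results if abs(x - mean) > k * s)


if __name__ == "__main__":
    # Hypothetical carbon results (per cent) from several laboratories; illustration only.
    carbon = [68.84, 68.91, 68.70, 68.95, 68.88, 68.79, 69.40, 68.86, 68.82, 68.90]
    print("outside 2s:", count_outside(carbon, 2))
    print("outside 3s:", count_outside(carbon, 3))
```

Because s is computed from the very data being judged, a single wild result inflates s and is less likely to be counted outside 3s than it would be against an independently established standard deviation.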

Table I. Test of Operating Rule on Original Unselected Results (9, Table II)

                                              s                   Number Outside
                                                               Carbon        Hydrogen
Group  Compound                 No.^a   Carbon  Hydrogen     2s    3s      2s           3s
1      Benzoic acid             46      3.2     37           3     0       3            0
2      Ephedrine hydrochloride  51      5.7     26           2     0       [illegible]  0

^a Number of determinations for each element.

Table I shows the guarantee of the operating rule to be valid in every case and indicates that the 3s limits ought not to be relaxed to 2s.

The practical value of the operating rule can be shown by comparing the foregoing application with Power’s treatment of the same data. Power rejected results from his Table II when they lay beyond $4\sigma_a$ of the true value. He eliminated results for other reasons also. While the authors do not criticize Power’s reconstruction of his Table II, they have preferred to proceed on the assumption that no result considered worth reporting by competent analysts ought to be rejected. Power’s criterion of rejection uses the true value and the standard deviations from the one-chemist-on-many-samples case. The operating rule uses the mean and standard deviations established by the data that are to be judged. The authors believe that the latter procedure is preferable. (For Group 2, 18 carbon results lie more than $2\sigma_a$, and 11 more than $3\sigma_a$, from the mean.)

Table I shows that the standard deviations for “many chemists on one sample” are considerably larger than those quoted above from Power for “one chemist on many samples.” [As is made clear below, the data in (8) do not support this conclusion.] Two important points are involved. First, analysts usually manage to attain the greatest precision in consecutive replicate determinations on the same sample. Second, changing analysts and/or laboratories reduces precision. The second point is especially important in any determination, such as those under discussion, that is subjective in character.

Analysis of variance illuminates both points. This analysis rests upon the fundamental additive character of variance. Necessarily, estimated variances (i.e., standard deviations squared) must be substituted for variances so that

$$ s_T^2 = s_L^2 + s_a^2 \tag{2} $$

that is, the total (estimated) variance $s_T^2$ may be divided into two parts: the variance among laboratories, $s_L^2$, and the variance within a laboratory on one sample, $s_a^2$. (For purposes of the present discussion, $s_T$ and s as defined by Equation 1 may be considered identical.) A similar analysis is described by Snedecor (13) on the variation in the birth weight of pigs in litters of various sizes; to apply his analysis to the present problem, read “laboratories” for “litters” ($s_L$), and “determinations within a laboratory on one sample” for “pigs of same litter” ($s_a$). The standard deviations (in parts per thousand) found in this way for Table I are given below.

              Carbon                 Hydrogen
Group    s_a    s_L    s_T      s_a    s_L    s_T
1        2.2    2.4    3.3      16     36     39
2        3.3    5.0    6.0      12     24     27

The values of $s_a$ are comparable (as they should be) with Power’s standard deviations for “one chemist on many samples,” and are usually smaller than the corresponding values of $s_L$.

PRECISION OF CARBON AND HYDROGEN DETERMINATIONS

Data for the microdeterminations reported in (8), Table II, are considered along with those reported by Power (9) to establish suggested 3s limits for carbon and hydrogen, partly because it is desirable to compare these with similar limits from the work on organosilicon compounds. The variances given in (8), Table II, are values of $s_a^2$ for the individual analysts. Over-all values of $s_a$ for each of the two substances analyzed were calculated from the relation

[Equation 3 is not legible in this copy.]

where $s_{a,i}$ is the individual value for the i’th collaborator reporting results for $n_i$ samples. After one error in (8), Table II, had been corrected, the following standard deviations (all in parts per thousand) were obtained.

                                        Carbon                      Hydrogen
Group  Substance Analyzed   No.^a   s_a      s_L      s_T       s_a    s_L    s_T
3      Nicotinic acid       97      5.7^b    3.0^b    6.4^b     32     42     53
4      Benzylisothiourea    81      3.8      2.8      4.7       27     27     38

^a Number of determinations for each element.
^b Omitting results of one collaborator reduces standard deviations to 3.7, 2.4, and 4.4, respectively.

Here the values of $s_a$ and $s_L$ tend to be more nearly comparable than for the results cited by Power.

Objections could be raised to the combining of the four groups of results, especially because Ogg, Willits, Ricciuti, and Connelly (8) found that one point of technique, the replacing of oxygen in the absorption tubes by air, significantly influenced the precision. Unfortunately, large bodies of analytical results seldom conform to the neat experimental designs that are so dear to statisticians. Inasmuch as the proposed 3s limits will be chosen arbitrarily anyhow, the authors have elected to proceed as simply as possible. Power’s investigation shows that varying the chemist is likely to lead to greater standard deviations than varying the compound burned by one chemist. It is consequently expedient to set 3s limits for each case. The suggested limits follow.

Case 1. One chemist on many samples. 3s (carbon), 7.5 parts per thousand; 3s (hydrogen), 54 parts per thousand. Based on 281 determinations of each element (9).

Case 2. Many chemists on one sample. 3s (carbon), 12 parts per thousand; 3s (hydrogen), 90 parts per thousand. Based on 275 determinations of each element [Groups 1 and 2, Table I, and Groups 3 and 4 from (8)].
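As a rough check on the variance analysis above, the additivity of Equation 2 can be tested against the tabulated standard deviations for the four groups. The Python sketch below is an illustration, not the authors' procedure: the names are hypothetical, and the tolerances are an assumption reflecting the rounding of the tabulated values (one decimal for carbon, whole numbers for hydrogen).

```python
import math

# (group, element): (s_a, s_L, s_T) in parts per thousand, as tabulated above.
TABULATED = {
    (1, "carbon"): (2.2, 2.4, 3.3),
    (2, "carbon"): (3.3, 5.0, 6.0),
    (3, "carbon"): (5.7, 3.0, 6.4),
    (4, "carbon"): (3.8, 2.8, 4.7),
    (1, "hydrogen"): (16, 36, 39),
    (2, "hydrogen"): (12, 24, 27),
    (3, "hydrogen"): (32, 42, 53),
    (4, "hydrogen"): (27, 27, 38),
}

# Assumed rounding tolerances for the comparison (parts per thousand).
TOLERANCE = {"carbon": 0.1, "hydrogen": 1.0}


def total_sd(s_a, s_l):
    """Total standard deviation from Equation 2: s_T^2 = s_L^2 + s_a^2."""
    return math.sqrt(s_a ** 2 + s_l ** 2)


def discrepancies():
    """Difference between the recomputed and the tabulated s_T for each entry."""
    return {key: total_sd(s_a, s_l) - s_t
            for key, (s_a, s_l, s_t) in TABULATED.items()}


if __name__ == "__main__":
    for (group, element), diff in sorted(discrepancies().items()):
        ok = abs(diff) < TOLERANCE[element]
        print(f"Group {group} {element:8s} recomputed minus tabulated s_T: {diff:+.2f}"
              f" ({'within' if ok else 'outside'} rounding)")
```

Every tabulated $s_T$ agrees with $\sqrt{s_a^2 + s_L^2}$ to within the assumed rounding, which supports the reading of the columns as $s_a$, $s_L$, $s_T$.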

The authors appreciate the seriousness of suggesting that a single determination of hydrogen by microcombustion cannot be guaranteed to better than 9% of the amount present when carried out by a presumably competent microanalyst chosen at random.


The 3s limits for Case 2 are considerably greater than the “allowable error” of the Liebig macrocombustion method, which Niederl and Niederl (7) consider to be approximately applicable also to microcombustion. These “allowable errors” are: carbon, 5 parts per thousand; hydrogen, 30 parts per thousand. The 275 carbon and 275 hydrogen results of Case 2 were re-examined to test the various criteria, deviations in each group being measured from the mean for the group. Table II summarizes the tests.

Table II. Number of Results Lying Outside Various Limits (Case 2)

          Allowable Error
          (Niederl and Niederl)         2s^a                   3s^a
Group     Carbon     Hydrogen      Carbon    Hydrogen     Carbon    Hydrogen
Total     66         109           21        30           5         12

^a s = 4 parts per thousand (carbon) and 30 parts per thousand (hydrogen).
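The totals of Table II can be expressed as failure rates and set against the “1 in not less than 20” expectation of the operating rule. The short Python sketch below only redoes that arithmetic; the names are hypothetical and nothing beyond the Table II totals and the 275 determinations of Case 2 is used.

```python
TOTAL_RESULTS = 275  # Case 2 determinations of each element

# Totals from Table II: number of results outside each limit.
OUTSIDE = {
    "allowable error": {"carbon": 66, "hydrogen": 109},
    "2s": {"carbon": 21, "hydrogen": 30},
    "3s": {"carbon": 5, "hydrogen": 12},
}

GUARANTEE = 1 / 20  # "1 in not less than 20"


def failure_rate(limit, element):
    """Fraction of the 275 results lying outside the named limit."""
    return OUTSIDE[limit][element] / TOTAL_RESULTS


if __name__ == "__main__":
    for limit, counts in OUTSIDE.items():
        for element in counts:
            rate = failure_rate(limit, element)
            verdict = "meets" if rate <= GUARANTEE else "fails"
            print(f"{limit:15s} {element:8s} 1 in {1 / rate:5.1f}  ({verdict} 1 in 20)")
```

Only the 3s limits keep both elements inside the 1-in-20 expectation; the 2s limits and the allowable errors do not.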

The Niederl and Niederl criterion is obviously unrealistic for these data, nor can 2s limits be used as the basis for a guarantee. The guarantee of the operating rule fails for 5 out of 275 carbons and 12 out of 275 hydrogens. In neither case is the chance of failure greater than the expectation (“1 in not less than 20”) estimated above for the operating rule. It is, of course, highly probable that some at least of these 5 carbons and 12 hydrogens should not have been reported for various reasons. But they were reported, and, because the operating rule is concerned with single determinations, we cannot afford to reject them.

CONSTANT-BOILING HYDROCHLORIC ACID

The attempt to establish the reproducibility of constant-boiling hydrochloric acid by re-examining the relevant published data (6) involves both the research and the operating interests mentioned in the introduction. Inasmuch as the emphasis in this paper is on statistics, only the most comprehensive published investigation of this acidimetric standard is considered here. The operating rule (3s limits) is applied both to the preparation of the standard and to the analytical results that establish its acid content.

In 1942, the Association of Official Agricultural Chemists accepted as official, first action, “the method of simultaneous preparation and standardization of hydrochloric acid solutions from constant-boiling hydrochloric acid.” This action was taken subsequent to work by King and four collaborators, all skilled chemists, and the detailed results thereof have been published (5). Each of the five collaborators prepared one batch of constant-boiling acid according to instructions and diluted it to 0.1 N, and each (except perhaps King) titrated the dilute acid in triplicate against borax and also against potassium acid phthalate. [King either ran four sets of triplicate titrations on the same acid, or he used two batches of constant-boiling acid prepared at the identical corrected barometric pressure (see Table 2, 5). It is assumed that he did the former, but the conclusions from the statistical analysis would be about the same in either case.]

The authors have applied the t-test (3) to the null hypothesis that there is no significant difference between the detailed results against borax and those against potassium acid phthalate. The test is carried out by forming the six differences between the means of a set of three borax results and the means of the corresponding set of three phthalate results, and by finding the mean and standard deviation of these six differences. The result, t = 1.73 for five degrees of freedom, falls between the 10 and 25% levels of significance; hence, the null hypothesis is valid. Variance analysis confirmed this conclusion. Consequently, the subsequent statistical analysis was done on six means, each the mean of six values of the strength of the same 0.1 N acid.

Reproducibility of Standard. The data used to test the reproducibility of the A.O.A.C. constant-boiling hydrochloric acid are given in Table III. Column 2 contains normalities calculated for the dilute acids by taking into account the barometric pressure during the distillation of the constant-boiling acid. For each collaborator, this normality, μ, should be statistically equivalent to the mean of a universe of mean normalities, one of which, $\bar{x}$, has been determined and is listed in column 3. Furthermore, because this is a universe of means, the distribution may be assumed normal. The standard deviation of the data in column 3 gives 3s′ limits of over 4 parts per thousand. These limits are distinguished by a prime because they are due in part to uncertainties in the preparation of the acid and in part to uncertainties associated with the analyses. An attempt will be made to see whether for any collaborator the former uncertainty must have greatly exceeded the latter. Statistically this attempt takes the form of finding an answer to the question: “Given six titration results with mean $\bar{x}$ and standard deviation $s_m$, what is the chance that $\bar{x}$ is taken at random from the universe with mean μ described above?” [Note that $s_m = s/\sqrt{n}$, s being calculated by Equation 1 from data in (5), Table 2. Table III contains $3s_m$ values in column 4 for reasons which become clear below.] The t-test applies in the form

$$ t = \frac{|\bar{x} - \mu|}{s_m} \quad \text{(See Table III, column 5)} \tag{4} $$

and the chance sought has been taken from the t-table and is given in column 6. We must conclude that in 2 out of 6 cases (a large proportion) $\bar{x}$ could not have been taken from universes of mean μ. In these 2 cases (Fine and Snider), most of the difference $\bar{x} - \mu$ can therefore logically be attributed to irreproducibility in the preparation of the constant-boiling acid.

Table III. Data on Constant-Boiling Hydrochloric Acid

                                              3s_m, Parts               Chance
Collaborator   N, Calcd.   Mean N, Found      in 10,000^a      t        (See Text)
Schurman       0.10047     0.10044            6                1.5      1 in 10
Fine           0.10004     0.09977            6                12.7     1 in 40,000
Couroy         0.10000     0.10003            6                1.4      1 in 10
Snider         0.10000     0.10013            5                7.3      1 in 2600
King           0.10000     0.10000            11                        .....
King                       0.10005            10               1.5      1 in 10

^a Precision limits for analyses (not to be confused with 3s′, where s′ is the standard deviation of data in column 3).
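The entries of Table III can be reworked as a sketch of Equation 4. The Python below is illustrative only and rests on stated assumptions: the names are hypothetical; $s_m$ is recovered as one third of the rounded column 4 value, so the recomputed t values only approximate the printed ones (roughly 13 for Fine against the printed 12.7); both King rows are assumed to share the calculated normality 0.10000; and s′ is computed from the deviations of column 3 from column 2 with the n − 1 divisor, which is one reading of “the standard deviation of the data in column 3.”

```python
import statistics

# (collaborator, N calculated, mean N found, 3*s_m in parts per 10,000) from Table III.
# Assumption: both King rows share the calculated normality 0.10000.
TABLE_III = [
    ("Schurman", 0.10047, 0.10044, 6),
    ("Fine", 0.10004, 0.09977, 6),
    ("Couroy", 0.10000, 0.10003, 6),
    ("Snider", 0.10000, 0.10013, 5),
    ("King", 0.10000, 0.10000, 11),
    ("King", 0.10000, 0.10005, 10),
]


def deviation_parts_per_10000(calcd, found):
    """Relative difference between found and calculated normality."""
    return (found - calcd) / calcd * 1e4


def t_statistic(calcd, found, three_sm):
    """Equation 4, with s_m taken as one third of the tabulated 3*s_m."""
    return abs(deviation_parts_per_10000(calcd, found)) / (three_sm / 3.0)


def exceeds_guarantee(calcd, found, three_sm):
    """Operating rule: is the difference larger than the 3*s_m limit?"""
    return abs(deviation_parts_per_10000(calcd, found)) > three_sm


def three_s_prime():
    """3s' in parts per 10,000 from the six found-minus-calculated deviations."""
    devs = [deviation_parts_per_10000(c, f) for _, c, f, _ in TABLE_III]
    return 3.0 * statistics.stdev(devs)  # n - 1 divisor assumed


if __name__ == "__main__":
    for name, calcd, found, three_sm in TABLE_III:
        t = t_statistic(calcd, found, three_sm)
        flag = "exceeds 3s_m" if exceeds_guarantee(calcd, found, three_sm) else "within 3s_m"
        print(f"{name:9s} t = {t:5.1f}  {flag}")
    print(f"3s' = {three_s_prime() / 10:.1f} parts per thousand")
```

Under these assumptions only Fine and Snider exceed their $3s_m$ limits, and 3s′ comes out slightly above 4 parts per thousand, in line with the figure quoted in the text.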

Application of the operating rule leads in a rough-and-ready way more directly to the same conclusion. According to the rule, the analyst “guarantees” $\bar{x}$ to within $3s_m$. For Fine and Snider $|\bar{x} - \mu|$ does exceed $3s_m$; in these cases, accordingly, there is a “basis for action” (i.e., a justification for suspecting constant-boiling hydrochloric acid as a standard).

Another simple test of the conclusion can be made by applying the usual square-root rule for the combination of error:

$$ s' = \sqrt{s_P^2 + s_A^2} \tag{5} $$

where $s_P$ and $s_A$ are standard deviations, respectively, for preparation and for analysis. With 3s′ over 4 parts per thousand, the contribution of $3s_A$ (Table III, column 4) is small, even if the highest value (11 parts in 10,000) is taken for the analytical precision limit.

The re-examination of the data assembled by King thus leads to the conclusion on the basis of the operating rule that the normality of a single batch of constant-boiling hydrochloric acid cannot be guaranteed to much better than 4 parts per thousand. This conclusion is markedly different from the way an unwary reader might interpret the following factual statement by King (5) about the same data: “The average of 18 comparisons by five collaborators showed 1 part in 10,000 deviation with borax and 0.3 part per thousand with potassium acid phthalate.”

Finally, it may be asked whether the statistical analysis might have resulted differently had the original data contained one more significant figure, e.g., 0.10049 instead of 0.1005. The authors have supplied such additional figures to the detailed data of Fine and Snider as would minimize $\bar{x} - \mu$. This heroic measure did not materially alter the conclusions.

In the authors’ opinion, constant-boiling hydrochloric acid is less reliable as an acidimetric standard than some claims made for it would indicate.

ACKNOWLEDGMENT

The authors are grateful to C. O. Willits and C. L. Ogg, Eastern Regional Research Laboratory, for making possible the application of the operating rule to the analyses underlying (8).

LITERATURE CITED

(1) Balis, E. W., Sixth Annual Microchemical Symposium, Brooklyn College, Brooklyn, N. Y., February 1951.
(2) Davies, O. L., “Statistical Methods in Research and Production,” p. 55, London, England, Oliver and Boyd, 1949.
(3) Ibid., p. 57, par. 4.42.
(4) Ibid., p. 238.
(5) King, W. H., J. Assoc. Offic. Agr. Chemists, 25, 653 (1942).
(6) Liebhafsky, H. A., Fourth Summer Symposium, Analytical Division, A.C.S., Washington, D. C., June 1951.
(7) Niederl, J. B., and Niederl, V., “Micromethods of Quantitative Organic Analysis,” p. 132, New York, John Wiley & Sons, 1942.
(8) Ogg, C. L., Willits, C. O., Ricciuti, Constantine, and Connelly, J. A., ANAL. CHEM., 23, 911 (1951).
(9) Power, F. W., IND. ENG. CHEM., ANAL. ED., 11, 660 (1939).
(10) Shewhart, W. A., “Economic Control of Quality of Manufactured Product,” p. 277, New York, D. Van Nostrand Co., 1931.
(11) Simon, L. E., “Engineers’ Manual of Statistical Methods,” pp. 22, 42, 43, New York, John Wiley & Sons, 1941.
(12) Snedecor, G. W., “Statistical Methods,” p. 176, Ames, Iowa, Iowa State College Press, 1946.
(13) Ibid., p. 232.

RECEIVED August 31, 1951.

4th Annual Summer Symposium-Standards

Commercial Development of Primary Standards

HENRY V. FARR, ALBERT Q. BUTLER, AND SAMUEL M. TUTHILL
Mallinckrodt Chemical Works, St. Louis 7, Mo.

The term “primary standard” is applied to chemical substances which by virtue of their purity can be weighed out and used directly either for standardizing a volumetric solution of unknown strength or for preparing a determinate solution of the substance itself. The impetus to the commercial development of primary standards was supplied nearly 40 years ago by the National Bureau of Standards, when it arranged to issue primary standards for sale to American chemists. The first primary standard made available by the bureau was sodium oxalate. Some of the problems connected with the commercial preparation of this first standard sodium oxalate, which was made in the authors’ laboratory for the Bureau of Standards, are described. Other primary standards have been developed and made available by the National Bureau of Standards over the intervening years. Primary standards also have been made available through normal commercial channels. Examples include arsenic trioxide, benzoic acid, potassium acid phthalate, potassium dichromate, sodium carbonate, and sodium oxalate. The criteria which primary standards should meet are given, as well as special requirements which must be considered in their commercial preparation. Primary standards are analyzed for impurities and the purity factor is determined by assay. Methods of assay are outlined briefly.

PRIMARY standards are chemical substances which by virtue of their purity can be weighed out directly, either for the purpose of assaying a volumetric solution of unknown strength or for the preparation of a determinate solution of the substance itself. For many years individual users of primary standards prepared these chemicals to meet their own needs. Later, primary standards were developed on a commercial scale and became available along with other reagent chemicals. The impetus to the widespread use of primary standards was supplied by the National Bureau of Standards through its publications and its certified samples. The bureau should take great pride in its achievement in this direction during the past 40 years. It should be credited, in addition, with stimulating the industry to develop and market many chemicals approaching primary standard quality in tonnage volume. The commercial production of primary standards has been a direct result of investigations carried out at the bureau and its development of specifications for these substances. The bureau has constantly emphasized the necessity for improvement of chemical standards and methods of analysis and it is doubtful whether without this interest on its part the large scale production of chemicals of primary standard quality would have taken place.

HISTORICAL

Volumetric analysis had its beginnings more than a century ago, when Gay-Lussac (4, 5) from 1824 to 1835 pioneered in the fields of chlorimetry and alkalimetry. Arsenic trioxide and sodium carbonate, which are discussed in this paper, were first employed as standard substances by Gay-Lussac. He used a standard solution of the former, in hydrochloric acid, in his system of chlorimetry and suggested the latter for the standardization of acid solutions. Margueritte (11), in 1846, first used potassium permanganate for the determination of iron. To standardize his permanganate