Accuracy and Precision of Microanalytical Determination of Carbon and Hydrogen

A Statistical Study

FRANCIS W. POWER, S. J., Fordham University, New York, N. Y.

A direct empirical test of the accuracy and precision of the microanalytical determination of carbon and hydrogen, embracing 349 individual analyses of about 200 pure compounds by 23 experienced analysts, yields the information that this process is conducted at present with an over-all precision of about 2.9 parts per 1000 of carbon and about 22 parts per 1000 of hydrogen; both elements are determined slightly too high, the error on the carbon being probably significant and that on the hydrogen probably not. The statistical methods used are described and illustrated briefly. While tolerance limits expressed in per cent on the sample are not in accord with the usual custom among American chemists, such expression is sound in principle for carbon and hydrogen determinations. The outside tolerances for hydrogen as found from the present study agree very well with those commonly accepted; those on carbon, however, are a little wider than commonly accepted values. The precision attained by microanalysts varies considerably and should be given due consideration by organic chemists. Microanalysis is an art as well as a science.

Such a direct empirical test may be conducted in two ways: We may study the results obtained from the analysis of one pure compound by many chemists, or from the analyses of many pure compounds by one chemist. Both methods have been used in the present study.

Accuracy and Precision

In none of the summaries on this general question is adequate distinction drawn between these two terms. Accuracy is here used as the conformity between the obtained result and the “true result”. Precision is taken to mean the consistency of the obtained results among themselves. The distinction is in a broad sense that which exists between the objective and the subjective element in a set of physical measurements, although precision as usually expressed may possess a certain objective value. Indeed, after the exclusion of all known sources of error it becomes our chief criterion of accuracy. If accuracy be taken to mean the agreement between a given microanalysis and the theoretical percentage of carbon or hydrogen in a pure substance of known composition, the definition is sufficient for the purpose of this paper. It will be unnecessary, therefore, to go into the epistemological implications involved in the concepts of accuracy and precision, although many pages are devoted to this subject by statistical writers. For a study of precision alone it would be sufficient to perform calculations on any controlled series of analyses of any homogeneous substance; but if the substances are pure and of known composition, a study of their analyses compared with theory will serve to calculate the accuracy as well as the precision of the analytical process.

THE accuracy of the microanalytical determination of carbon and hydrogen is a matter of considerable importance in organic chemistry, but unfortunately it seems also to be rather controversial. Niederl (28) sums up what the various authorities have to say on the matter: that is, for compounds not too abnormal in composition, the results of an experienced analyst should ordinarily come within ±0.2 per cent of the theoretical carbon and hydrogen content of the substance, and should not exceed the limits ±0.3 per cent. Since these limits are given by the authors on the basis of general experience, without presentation of specific and concrete data, it should be not only very interesting but also of value to chemists if these limits of accuracy were made more objective and definite by subjecting them to a direct empirical test, with the data analyzed afterwards by modern statistical methods.

Methods of Approach

At the start of this study I had intended to rely primarily on the method of “one sample by many chemists”, as has been done by the National Bureau of Standards on standard samples, by the Committee on Uniformity in Technical Analysis on samples of various ores @), and by Lundell (26). Samples of Bureau of Standards benzoic acid (standard sample 39e) were sent to several microanalysts both here and abroad, who were asked to report results on duplicate determinations of carbon and hydrogen. It seemed desirable also to try a compound of more complex composition, so a sample of Merck’s ephedrine hydrochloride (serial 52,763), which had been used

DECEMBER 15, 1939

ANALYTICAL EDITION

satisfactorily as a test substance in our laboratory, was also sent. It melted at 218.3° (corrected), a value slightly higher than that given in the literature. The instructions called for giving these substances a preliminary drying at about 50° in the Pregl microdesiccator. Reports on 18 samples were received (13 from the United States and 5 from Europe), to which are added the results of my assistant, Joseph Alicino, and my own.

The other method (“many samples by one chemist”, Method I) was extended to “many samples by five chemists”, as follows: Four experienced microanalysts were asked to pick out at random from their laboratory notebooks 30 or more routine analyses on compounds of whose identity and purity there could be no question. Donald Price had carried out most of the microanalyses connected with the work of the late Dr. Hooker (25) and of Adelson and Bogert ( I , W), and the results under his name were taken from these articles. Adelbert Elek of Rockefeller Institute and Wm. Saschek of the College of Physicians and Surgeons, Columbia University, wrote out lists of such analyses after consulting with the chemists who submitted the compounds for analysis, to make sure that they were reporting on substances of high purity. Alicino did the same and I have added a few of my own.

Methods of Calculation

All the errors dealt with in this paper were originally calculated in parts per 1000 of carbon and hydrogen, because (1) many different substances of varying carbon-hydrogen content are concerned; (2) at the outset the relation between precision and content was not known; and (3) this is the more common practice among American chemists. However, the practice in most microanalytical literature, i.e., expressing the error in per cent on the sample, has considerable justification. The original data from the five laboratory notebooks (Method I) would be prohibitively lengthy if published in easily usable form; hence only a summary is presented. The original data of Method II (many chemists on one sample) are given in full.

The deviation of each individual analysis from theory was calculated in parts per 1000 of carbon or hydrogen, the difference being taken as positive where the analysis was higher than theory and as negative when it was lower. The algebraic sum of these deviations, divided by the number of analyses, gave the mean error of any particular series, that is, the accuracy of the analyses on the assumption that all the compounds were pure known substances. The differences between this mean error and the individual results were written down together with their squares. If we designate these differences from the mean error, without regard to sign, by d, and the number of analyses in the series by N, we may compute the so-called average deviation, a, and the standard deviation, s, by the usual formulas

$$a = \frac{\sum d}{N}, \qquad s = \sqrt{\frac{\sum d^2}{N}}$$

In the ideal case, as N approaches ∞, s approaches a limit designated as σ. Symbolically,

$$\sigma = \lim_{N \to \infty} s$$

An estimate of σ may conveniently be written σ_s, to show that it is estimated from a sample. The most common estimate is

$$\sigma_s = s\sqrt{\frac{N}{N-1}} = \sqrt{\frac{\sum d^2}{N-1}}$$

which for practical purposes, especially when N is large, may be written

$$\sigma_s = \sqrt{\frac{\sum d^2}{N}}$$

The calculation of the average deviation, a, has been included in what follows, because it is probably more familiar to chemists and it seems to afford a fairly close estimate of the standard deviation from the analytical results in this paper. The standard deviation, s, is a convenient measure of dispersion about the arithmetic mean of a distribution of errors, and the average deviation, a, is a convenient measure of dispersion about the median, a quantity practically never used by chemists. In the normal error curve, the mean and median coincide, and the theoretical relation between the average deviation and the standard deviation in an infinity of normal observations is the expression

$$\sigma = \sqrt{\pi/2}\; a = 1.253\,a$$

Not knowing how closely these analytical results would approximate a normal distribution, I preferred to calculate the standard deviation by its own formula rather than by the use of the average deviation. The approximation to normality turns out to be close enough, with a corresponding agreement between means and medians. One of the specifications given by Bond (4) for a median type of distribution is that the precision of the individual determinations should vary in a haphazard manner, whereas the corresponding specification for a normal distribution is that the precision should be about the same for all the determinations. In the work described here, done by experienced analysts using very nearly the same detailed technique, the condition for a normal distribution would be expected. The use of the expression

$$\sigma_s = 1.253\,a$$

as an estimate of the standard deviation is usually frowned on by statistical writers, however, since it is not only less efficient from a mathematical standpoint, but its indiscriminate use necessarily involves the assumption of a normal distribution in cases where this may obtain only very approximately.

The analyses in the five laboratory notebooks (Method I) were all calculated on the basis of C = 12.00, using the corresponding gravimetric factor 0.2727. If the more recent value of C = 12.01 (5) and its gravimetric factor 0.2729 had been used, the final figure for the precision would suffer no change, but a slight effect would be noted on the average agreement between analysis and theory. Rather than ask the analysts to recompute their results on the basis of C = 12.01, it is considered sufficient to correct the average error of each analyst’s list by adding 0.5 part per 1000. This is arrived at by computing the per cent of carbon calculated and found on a compound of about the average composition of all these substances, i.e., about 63.9 per cent carbon, and comparing the results using C = 12.00 with those for C = 12.01. If the analytical error on such a compound is found to be, for instance, 4.0 parts per 1000 high on the basis of C = 12.00, the error will be 4.5 parts per 1000 high on the basis of C = 12.01. Table I, a summary of all these analyses, has been corrected in this way.

TABLE I. METHOD I
(Summary of results, one chemist on many samples; all errors and deviations in parts per 1000)

                  Number of       Standard deviation,     Standard deviation
                  individual      calculated              calculated from a,
Analyst           analyses        σ_s = √(Σd²/N)          σ_s = 1.253a

Carbon (C = 12.01)
Elek              58              1.2                     1.2
Power             37              4.2                     4.3
Price             48              2.4                     2.4
Alicino           77              2.8                     2.7
Saschek           61              2.3                     2.1

Hydrogen (H = 1.008)
Elek              58              9                       8
Power             37              20                      20
Price             48              25                      27
Alicino           77              21                      20
Saschek           61              18                      19

Mean error and median error columns, carbon, as printed: +0.4 +0.4 +1.2 -0.3 +1.8 -0.1 +1.1 +1.3 +1.1 +0.8
Mean error and median error columns, hydrogen, as printed: +5: 10 +8 ?; + 15 -6 +p -0

CONDENSED FORM OF ABOVE DATA

                                           C        H
Number of individual analyses              281      281
Standard deviation, parts per 1000         2.5      18
Mean error of analysis, parts per 1000     +0.8     +2.6


INDUSTRIAL AND ENGINEERING CHEMISTRY

Any analyst by using the statistical methods employed here can calculate the accuracy and precision of his own results on the same assumptions as are made here, namely, that the compounds are pure substances of known composition, that he chooses the analyses without prejudice, and that enough is taken to give a reasonable approximation to the statistical laws for large samples. About 150 individual analyses would be a minimum for this purpose.
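The calculation described under Methods of Calculation can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the author's own worksheet: the function name `error_statistics` and the four sample results are hypothetical, and far fewer than the 150 analyses recommended above.

```python
import math
from statistics import median

def error_statistics(found, theory):
    """Mean error, average deviation a, and standard deviation estimates,
    all in parts per 1000 of the element determined.

    found, theory: sequences of percentages (analysis and theoretical value).
    """
    # Signed error of each analysis, positive when the analysis is high.
    errors = [1000.0 * (f - t) / t for f, t in zip(found, theory)]
    n = len(errors)
    mean_error = sum(errors) / n
    # d: differences from the mean error, taken without regard to sign.
    d = [abs(e - mean_error) for e in errors]
    a = sum(d) / n                              # average deviation
    s = math.sqrt(sum(x * x for x in d) / n)    # standard deviation
    return {
        "mean_error": mean_error,
        "median_error": median(errors),
        "a": a,
        "s": s,
        "sigma_s": s * math.sqrt(n / (n - 1)),  # estimate of sigma from the sample
        "sigma_from_a": 1.253 * a,              # assumes nearly normal errors
    }

# Hypothetical carbon results (per cent) on a compound with 68.84 per cent carbon.
stats = error_statistics([68.90, 68.75, 69.01, 68.84], [68.84] * 4)
```

Comparing `stats["sigma_s"]` with `stats["sigma_from_a"]` on one's own notebook entries gives a quick indication of how far the errors depart from a normal distribution.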

VOL. 11, NO. 12

Comparison of the Methods

When a preliminary report was presented @I), the results by these two methods of approach agreed fairly well, but as more data accumulated it was clear that the accuracy and precision estimated by the cooperative analysis on the two test samples were going to be considerably lower than those obtained from the five laboratory notebooks. Method II is probably the more objective, being dependent only on the purity of the test substances and not open to choice or interpretation by me or by the cooperating analysts. Against it is the fact that it does not correspond to the ordinary routine procedure in organic research, where the composition of a substance is usually judged on its repeated analysis by the same person; hardly ever does a research director send an unknown compound to many different analysts. Furthermore, the use of such a method, involving only two rather simple substances, would not give a true cross section of a procedure which in practice involves all sorts of substances of widely differing composition.

Method I is open to the serious objection that it is less objective; that is, anyone would naturally be inclined to select analyses on the basis of their close agreement with theory rather than on the basis of sample purity. In collecting these 281 individual analyses, however, this danger was reduced to a minimum, and while they probably do not satisfy all the requirements for a truly random sampling in a statistical sense, they approach it nearly enough for practical purposes. I propose, therefore, to use the results obtained from these analyses as a criterion to be applied to the results obtained by Method II. Some individual analyses on this list were so far away from theory that a fair estimate of accuracy would necessitate applying some criterion of rejection, which would look like begging the question. It would seem, however, that such a procedure can be justified.

It was not proposed to make a crude estimate of the accuracy of a process whose error had never been investigated before, but rather to attempt a methodical refinement of a quantity which has been known by many years of practical experience. Only a few of the cooperating analysts had any idea what use was to be made of their reports; it did not suit the purpose to have them run the analyses with any special precautions. Results obtained in the regular course of the day’s work were wanted. Under these conditions it is to be expected that an occasional analysis will go wrong; and the few results that are highly discrepant, and which will subsequently be rejected, illustrate that when a man says he can “check a result to 0.3 per cent” he should add “in so many analyses out of a hundred”.

Criteria of Significance and Rejection

Of the indefinitely large number of possible frequency curves, the normal curve, usually associated with the name of Gauss, is most commonly used as a basis for the statistical study of physical measurements. For this purpose it is usually written

$$y = \frac{N}{\sigma\sqrt{2\pi}}\, e^{-x^2/2\sigma^2}$$

where y is the frequency of occurrence of an error of magnitude x away from the arithmetic mean of N measurements, which are controlled by standard deviation σ. In statistical work, however, it is customary to plot the frequency as a function of t rather than of x, where t = x/σ, thus expressing the error as some fraction or multiple of the standard deviation. In a strictly normal distribution of an indefinitely large number of measurements, the expression

$$\frac{1}{\sqrt{2\pi}} \int_0^t e^{-t^2/2}\, dt$$

gives the probability of occurrence of a chance, random, or indeterminate error between zero and t. Since in analytical work we expect errors of both signs, it is usually more convenient to take twice this integral, giving that fraction of the measurements affected by random errors over the range -t to +t.

While the normal curve is widely used as a basis for statistical treatment of physical measurements, its limitations for this purpose must be taken into account, and the treatment of its various advantages and disadvantages is gone into per longum et latum by the statistical writers. For practical purposes of error theory the normal curve is only a mathematical convenience, the use of which is based largely on pragmatic grounds. Its most obvious limitation lies in the fact that it can be approximated only when very large numbers of measurements are available, which may not always be the case in chemical work. Then, too, the very large random errors which the normal curve would require cannot occur when an experienced investigator performs repeated high-precision measurements by recognized techniques on the same invariable object; this is the chief basis for the statement that the normal curve is only a fairly close approximation to the distribution of measurements made in a very poorly conducted experiment. This very point, however, is made the basis of a simple and very commonly used criterion of rejection. If we consider a measurement giving a very large value of t, we see that it has a very low probability of having occurred by chance, and for practical purposes this statement may be reversed to read that highly discrepant measurements have a high probability of not belonging to the series of concordant measurements affected only by random errors; that is to say, measurements of high t values may be rejected as being affected by errors which are real or significant.
Just what t value to select as a criterion of rejection is a matter of opinion; the larger we take it the more lenient we shall be in admitting widely discrepant values among the measurements accepted as valid. Fisher (12) and many other statistical writers regard those deviations as significant which exceed twice the standard deviation. In the limiting case of a normal distribution of a very large number of measurements, this would mean that we enter the probability table (19) at t = 2 and find the corresponding probability to be 0.955, which under the assumptions mentioned may be taken to mean that 955 measurements out of 1000 may be expected to show chance deviations (from the mean) whose magnitude will not exceed twice the standard deviation. Conversely, the other 45 measurements, whose deviations from the mean exceed twice the standard deviation, may be said to be affected by errors which are not due to chance but are determinate or significant errors. Some authorities set this “critical ratio” higher than t = 2; for example, Ostwald and Luther (29) take t = 2.5, while Shewhart (42) uses t = 3, not so much on account of the particular value of the probability associated with it (0.9973) but because “experience indicates that t = 3 seems to be an acceptable economic value”. In certain statistical applications to experimental psychology and education even higher values are used.
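The probabilities attached to these critical ratios need not be read from a table: twice the normal integral from zero to t equals erf(t/√2). A short Python check using only the standard library follows; the function name `fraction_within` is merely illustrative.

```python
import math

def fraction_within(t):
    """Fraction of normally distributed errors lying between -t and +t
    standard deviations, i.e. twice the normal integral from 0 to t."""
    return math.erf(t / math.sqrt(2.0))

for t in (2.0, 2.5, 3.0):
    print(f"t = {t}: {fraction_within(t):.4f}")
```

For t = 2 and t = 3 this agrees with the 0.955 and 0.9973 quoted above.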

For most analytical work one will be safe in taking the value t = 2; that is, if an individual measurement differs from the mean of the series by an amount greater than twice the standard deviation of the individual measurements in the series, one will usually be justified in suspecting it as a valid member of the series. Or if a given analysis, conducted with the usual care necessary to exclude all known errors, gives a result differing from the expected value by more than twice the standard deviation of the analytical process, one will usually be safe in suspecting either the identity or purity of the compound. If these are shown to be unexceptionable, the chances are that a determinate error has crept in despite the vigilance of the analyst.

Observations or measurements which may be treated statistically may be grouped in two classes, according to their material objects. In one class would be, for example, the stature of men, the size of maple leaves, the blowing time of fuses, the weight of newborn infants, the rainfall in New York, the market price of a group of commodities, etc. In the other class would be most physico-chemical measurements, such as the atomic weight of an element, the velocity of light under a given set of conditions, the equatorial diameter of the earth, etc. For the purposes of this paper it is necessary and sufficient to note one important distinction between these two classes of measurements. In the first class the mean value which the investigator is looking for has itself no physical existence in nature prior to the calculations; the investigator makes it. In the second class the mean value actually exists in nature prior to the calculations; the investigator is trying to find it. The variations encountered among individual observations or measurements of the first class are due primarily to the diversity of the individuals themselves and secondarily or not at all to the imperfections and limitations of the measuring process. In the second class the variations encountered are not due to the object being measured, since it is a definite extra-mental entity, but arise entirely from the imperfections and limitations of the measuring process itself. It is advisable to emphasize these points because statisticians usually object strenuously to rejecting any observation; and rightly so, if it belongs to the first class described.
For example, if one is getting the average height of men one cannot reject an individual in the agreed-on population merely because his height is 3σ below that of the group. He might be rejected as a prospective police officer, but he cannot be rejected as a man. On the other hand, certain of our better physico-chemical techniques are per se incapable of such wide variations, and if such should be observed, the experienced investigator who has checked and rechecked this technique may logically ascribe unusually wide discrepancies to the sample rather than to the method of measurement. In other words, he may justly have such confidence in the method as to conclude that he is taking measurements coming under the first classification rather than the second. The classical instance of this is the discovery of the rare gases through high-precision measurements of the density of nitrogen from different sources. While the ordinary daily procedures of microanalysis do not fall under this category, they do possess a certain precision which when once established and applied can serve as a basis for judging highly discrepant results as being due either to the impurity or identity of the sample or to some inadvertence on the part of the analyst in controlling the known sources of error. The present article will serve as a general estimate of the precision of these processes, representing a cross section for many analysts; but no chemist will be justified in using these numerical data to establish a criterion of rejection for his own work unless he has found by actual test that his accuracy and precision are the same or very nearly the same as those set down here as a sort of over-all estimate.
Furthermore, the use of any such criterion in a given analysis will be governed less by statistical considerations than by a critical evaluation of the analysis from a purely microchemical viewpoint, namely, the personal assurance that all the necessary precautions have been taken to exclude all known and determinate errors in a given analysis.
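The t = 2 rule for a series tested against its own mean can be illustrated as follows. This is a sketch under stated assumptions: the function `suspect_values` and the ten carbon percentages are hypothetical, and, as the text stresses, a flagged value is only a candidate for rejection pending chemical judgment.

```python
import math

def suspect_values(values, t_crit=2.0):
    """Return the values lying more than t_crit standard deviations
    from the mean of their own series."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [v for v in values if abs(v - mean) > t_crit * s]

# Hypothetical carbon percentages; the last one is deliberately discrepant.
series = [68.84, 68.90, 68.79, 68.88, 68.81, 68.86, 68.83, 68.87, 68.80, 69.60]
flagged = suspect_values(series)
```

Here only the last result is flagged. Because the discrepant value inflates the standard deviation it is tested against, the rule becomes less sensitive in short series.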

An interesting statistical test analogous to the use of t as a criterion of rejection would be to apply this calculation to the mean values from Table I. Just as the standard deviation of the individual determinations measures their scatter or dispersion around their arithmetical mean, so the standard deviation of the mean value itself (called the standard error of the mean) is a measure of the dispersion of various mean values of N₁, N₂, etc., individual determinations around the “grand mean” of a very large number of determinations, the standard error of the mean of N observations being

$$\sigma_M = \frac{\sigma}{\sqrt{N}}$$

The corresponding t value will be

$$t = \frac{\Delta x}{\sigma_M}$$

where the magnitude of t will indicate in a general way whether any given discrepancy, Δx for example, between a given analytical mean and the theoretical value, may be reasonably ascribed to accidental errors or whether it should be considered to indicate a constant error of analytical significance. From Table I we have estimates of σ_M as follows, with N = 281:

$$\sigma_M = \frac{2.5}{16.76} = 0.15 \text{ part per 1000 for carbon}$$

$$\sigma_M = \frac{18}{16.76} = 1.07 \text{ parts per 1000 for hydrogen}$$

and the values of t for the mean discrepancies from theory will be

$$t = \frac{0.8}{0.15} = 5.3 \text{ for carbon}$$

$$t = \frac{2.6}{1.07} = 2.4 \text{ for hydrogen}$$

The reader should distinguish carefully between the magnitude of an error and its significance. In this case, the pooled results of Table I indicate that both the carbon and hydrogen analyses were affected by a small positive error; but the t value would incline one to conclude that this error would not ordinarily have arisen by chance or indeterminate errors, but that rather it represents a real, although perhaps trivial, error inherent in the analytical process.
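The test on the pooled means can be repeated numerically. The sketch below takes the condensed Table I figures (standard deviations 2.5 and 18, mean errors +0.8 and +2.6 parts per 1000, N = 281); the function name `t_for_mean` is an illustrative choice.

```python
import math

def t_for_mean(mean_error, sigma, n):
    """Standard error of the mean and the ratio t = mean_error / sigma_M."""
    sigma_m = sigma / math.sqrt(n)
    return sigma_m, mean_error / sigma_m

# Condensed Table I figures, parts per 1000, N = 281.
sigma_m_c, t_c = t_for_mean(0.8, 2.5, 281)    # carbon
sigma_m_h, t_h = t_for_mean(2.6, 18.0, 281)   # hydrogen
```

Carried through without intermediate rounding, t for carbon comes out near 5.4 rather than the 5.3 obtained from the rounded 0.15; the conclusion is unaffected.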

Method of “Many Chemists on One Sample”

In Table II are given the original unselected analyses reported on the two test substances. The analysts are designated by letters according to the order in which their reports were received. The carbon results, both calculated and found, are on the basis of C = 12.01. A summary of Table II is given in Table III, as it may be considered the most objective set of data presented in this paper.

TABLE II. ORIGINAL UNSELECTED ANALYSES (METHOD II)

Analysts, in order of listing: A, B, C, D, E(c), F, G, H, I, J, K, L, M, N, O, P, Q, R, S, FWP(e)

Benzoic Acid
C, %      H, %
69.03     5.00
68.96     4.91
69.18     5.22
69.22     5.19
69.58b    5.54b
69.43     5.38b
68.79     5.01
68.98     5.10
69.23a    5.06a
69.17a    4.81a
69.17     4.81
68.85     4.88
68.67a    5.10a
69.14     5.03
69.42     5.06
68.86     4.92
68.73     4.96
68.86     5.04
68.81a    5.02a
68.84     5.03
68.80     5.03
68.83a    4.87a
68.95     4.96
69.14     5.32
69.24     5.27
69.06     5.05
68.99     5.01
69.01     5.56b
69.23     5.51b
68.55     5.40b
68.70     5.13d
68.99     5.05
69.10     5.09
68.72     5.02
68.81     4.99
68.94     4.99
68.86     4.93
68.92     5.19
69.46     5.15
68.92     4.93
68.85     4.96
68.93     5.04
68.92     4.96
69.24a    4.92a
68.88     4.89
69.04     4.76

Ephedrine Hydrochloride
C, %      H, %
59.22     8.19
59.66     8.31
59.23a    7.76a
59.27     7.86
59.22     7.87
59.60     7.56
59.46     7.72
59.55     8.07
59.23     8.08
59.54a    8.12a
59.68     8.02
59.63     7.94
59.87     7.80
59.86     7.66
59.43     7.95
59.59     8.02
59.09     7.79
59.17a    7.85a
59.27a    7.72a
59.13a    7.64a
59.10     7.88
59.14a    7.65a
59.77a    7.64a
59.81     7.71
59.72     7.66
59.84     7.51
59.84     7.55
59.65d    7.82d
59.49     7.97
59.56     8.17
60.44b    7.86
60.19b    7.84
60.42b    8.01
60.21b    8.10
59.45     8.07
59.63     7.94
59.54     7.99
59.56     8.01
59.91a    7.86a
59.12     8.27
59.46     8 . OR ..
59.70     7.59
59.86     7.68
59.79     8.00
59.65     7.98
59.51     8.24
59.75a    8.16a
60.24b    8.26
59.61a    8.04a
60.27b    8.07
59.60     8.00

a Excluded at random from Table III.
b Excluded as too inaccurate for Table III.
c Analyst, J. Alicino.
d Excluded from Table III for want of a duplicate.
e Analyst, F. W. Power.

TABLE III. SUMMARY OF RESULTS OF TABLE II

                                                      Benzoic Acid        Ephedrine Hydrochloride
                                                      C        H          C        H
Theoretical per cent                                  68.84    4.952      59.55    7.995
Mean per cent from analyses                           69.00    5.140      59.62    7.930
Median per cent of analyses                           68.96    5.03       59.60    7.94
Actual error of mean (from theory), parts per 1000    +2.3     +38        +1.2     -8
a, %                                                  0.18     0.17       0.26     0.17
s, %                                                  0.22     0.20       0.34     0.21
s, parts per 1000                                     3.2      40         5.7      26
s, parts per 1000, calculated from a                  3.3      43         5.4      27

DISCUSSION OF TABLE II. These results are rather disconcerting. Testing them against their own means and standard deviations they are not bad; theoretically only four or five carbons and hydrogens should exceed twice the standard deviation, and actually only five carbons and one hydrogen are outside this range. The chemist, however, is interested in testing the analyses against the theoretical values, and using a more reasonable estimate for the standard deviations of the process. If we assume these to be 2.9 parts per 1000 for carbon and 22 parts per 1000 for hydrogen (the final values arrived at in this paper), we find 18 carbons and 16 hydrogens differing from theory by more than twice these standard deviations. Since this is really testing a series of measurements against a mean and a standard deviation not derived from the series itself, these figures cannot be given a strictly statistical interpretation, but the practical analyst’s interpretation would probably not be very complimentary. Furthermore, one may count 31 carbons and 18 hydrogens which fail to meet Pregl’s outside tolerance of ±0.3 per cent (from theory), or about 25 per cent of all the individual analyses. The accuracy of the mean analytical results is much better on the ephedrine, a compound presumably inferior in purity to the Bureau of Standards benzoic acid; the precision on the latter is better than on the former for carbon, but much worse for hydrogen.

RECONSTRUCTION OF TABLE II. It is only fair to take into account certain extenuating circumstances in connection with some of these analyses. One of the two men who reported very high results did the analyses on one of the hottest and most humid days of the summer of 1937, as was the case with one of my own analyses. The other used a technique which may occasionally admit undried air to the absorption tubes. Another analyst, whose results are rather badly out of line, used a gas velocity different from that usually recommended, as he was working on a problem which required such a modification and did not reset his pressure regulators. In this case, too, there was some question as to contamination of one of the samples. There are two sets of high carbons reported by European analysts and one high carbon of my own for which no apparent explanation can be given. From a purely statistical standpoint one would hesitate to reject any of these analyses, but from a chemical standpoint some question can be raised as to the purely random character of some of the larger errors. There should be no serious objection, therefore, to reconstructing Table II.

In the first place I propose to apply to all the data of this table a very lenient criterion of rejection, namely, the theoretical percentages plus or minus four times the standard deviations already deduced from the results of Table I. These results have certain points in their favor; and besides, such a wide tolerance will exclude only those few analyses which have evidently been affected by some determinate errors. The theoretical percentages are selected as the basis of the criterion of rejection so as to give information as to the accuracy of the process; but in the reconstructed table the standard deviations will naturally be calculated from the actual means of the analyses. From the analyses meeting this tolerance I propose to select at random two results in those cases where more than two were reported, so as not to overweight the results of any one analyst. The reconstructed table will consist of 17 duplicate analyses within the following tolerance ranges:

                           C, %               H, %
Benzoic acid               68.15 to 69.53     4.60 to 5.32
Ephedrine hydrochloride    58.95 to 60.15     7.43 to 8.57
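The carbon limits above can be reproduced by taking the theoretical percentage plus or minus four times the Table I standard deviation. The sketch below assumes that the "standard deviations already deduced from the results of Table I" are the condensed values of 2.5 parts per 1000 for carbon and 18 for hydrogen; the function name `tolerance_range` is hypothetical.

```python
def tolerance_range(theory_pct, sigma_ppt, t=4.0):
    """Limits theory ± t·σ, with σ in parts per 1000 of the element,
    returned in per cent on the sample."""
    half_width = theory_pct * t * sigma_ppt / 1000.0
    return theory_pct - half_width, theory_pct + half_width

# Carbon, with the Table I standard deviation of 2.5 parts per 1000.
benzoic_c = tolerance_range(68.84, 2.5)
ephedrine_c = tolerance_range(59.55, 2.5)
# Hydrogen, with 18 parts per 1000.
benzoic_h = tolerance_range(4.952, 18.0)
ephedrine_h = tolerance_range(7.995, 18.0)
```

On these assumptions the carbon limits agree with the printed ranges to two decimals, and the hydrogen limits agree to within roughly 0.01 per cent, a difference that presumably reflects rounding in the original arithmetic.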

The new table is not given in extenso, but can be made by writing down those analyses in Table II not noted in footnotes as having been excluded. If a hydrogen analysis failed to meet the above tolerances, the corresponding carbon value was not used, and vice versa. Table IV gives a summary, made up in the same way as was Table III.

TABLE IV. SUMMARY OF RECONSTRUCTED TABLE II

                                                      Benzoic Acid       Ephedrine Hydrochloride
                                                      C        H         C        H
Theoretical per cent                                  68.84    4.952     59.55    7.995
Mean per cent from analyses                           68.99    5.022     59.55    7.916
Median per cent of analyses                           68.95    5.02      59.59    7.96
Actual error of mean (from theory), parts per 1000    +2.2     +14       Zero     -10
a, per cent                                           0.14     0.088     0.18     0.174
s, per cent                                           0.18     0.119     0.23     0.212
s, parts per 1000                                     2.6      24        3.8      26
s, parts per 1000 calculated from a                   2.5      22        3.8      27

Since in all but one instance each chemist reported more than one analysis, the final standard deviation, representing the total variance of all the analyses, is really a composite quantity; part of the variance is due to discrepancies among the results of any one analyst and part to discrepancies among the chemists themselves.
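The rejection rule and the last rows of Table IV can be checked with a short Python sketch. It is only an illustration under stated assumptions: the function names are hypothetical, "a" is taken to be the average deviation, and the factor sqrt(pi/2) (about 1.2533) used to convert a into s is the usual normal-theory relation, which the text does not state explicitly.

```python
import math

# Tolerance ranges (per cent) quoted in the text for the reconstructed table.
TOLERANCES = {
    "benzoic acid": {"C": (68.15, 69.53), "H": (4.60, 5.32)},
    "ephedrine hydrochloride": {"C": (58.95, 60.15), "H": (7.43, 8.57)},
}


def keep_pair(compound, carbon, hydrogen):
    """Keep a C/H pair only if BOTH values meet their tolerance.

    This mirrors the rule that a failed hydrogen value also excludes
    the corresponding carbon value, and vice versa.
    """
    c_low, c_high = TOLERANCES[compound]["C"]
    h_low, h_high = TOLERANCES[compound]["H"]
    return c_low <= carbon <= c_high and h_low <= hydrogen <= h_high


def parts_per_1000(deviation_percent, mean_percent):
    """Express an absolute deviation (per cent) relative to the mean."""
    return 1000.0 * deviation_percent / mean_percent


def s_from_a(a_percent):
    """Estimate s from the average deviation a.

    ASSUMPTION: normal-theory factor sqrt(pi/2); not stated in the source.
    """
    return math.sqrt(math.pi / 2.0) * a_percent


# Benzoic acid carbon, using the Table IV entries s = 0.18, a = 0.14, mean = 68.99.
print(round(parts_per_1000(0.18, 68.99), 1))
print(round(parts_per_1000(s_from_a(0.14), 68.99), 1))
```

With the benzoic acid carbon entries this reproduces the tabulated 2.6 and 2.5 parts per 1000, which suggests (but does not prove) that the same factor was used in the original calculation.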

DECEMBER 15, 1939

ANALYTICAL EDITION

665

Statistical methods (known as the analysis of variance) are available (18, 37, 46) whereby one may assign a certain degree of statistical probability to the influence of these two factors. On performing this operation on Table II (before it was quite complete), the various chemists checked their own results much better than their colleagues' results; that is, the variations in the results of the different analysts were much larger than could be accounted for solely by the variations within the work of each one. A similar variance analysis on reconstructed Table II showed that for the hydrogen determination on benzoic acid and for both carbon and hydrogen on the ephedrine, the agreement within the duplicate analyses done by any one man was significantly better than that between the different analysts. The reverse was true of the benzoic acid carbon. The mathematical analog of these carbon and hydrogen analyses in Snedecor's book (46) is the yield of bacon from different breeds of hogs! It is a good illustration of wide application of modern statistical methods.

SUMMARY OF ANALYTICAL DATA. Giving equal weights to the results from Tables I and IV we may make the following general summary:

                                Standard Deviation,     Mean Error (Actual Deviation
                                Parts per 1000          from Theoretical), Parts per 1000
Method                          C        H              C        H
One chemist on many samples     2.5      18             +0.8     +2.6
Many chemists on one sample     3.2      25             +1.1     +2.0
Av.                             2.9      22             +0.9     +2.3

A test for significance on these final figures gives:

       N       Δx      σ_M (estimated)    t
C      349     0.9     0.153              5.9
H      349     2.3     1.20               1.9

showing a high probability that the slight over-all positive error on the carbon determination is significant, but that the considerably larger positive error on the hydrogen is accidental.

Another test on the entire collection of analyses would be to see how closely they approximate a normal distribution. One standard method for doing this is to compute the skewness and the flatness of the entire collection, involving the third and fourth moments, respectively, of the distribution. The information thus gained, however, while of considerable academic interest, would hardly compensate for the added amount of computation, which increases considerably when one goes beyond the squares of the deviates. The skewness for some of the individual lists in Table I was found near enough to zero to warrant using the normal curve for the distribution rather than a Gram-Charlier series or any of the many other frequency distributions noted in books on statistical analysis.

In order more easily to visualize the statistical distribution, all the analytical data from Table I and reconstructed Table II have been plotted as frequency curves, the results for carbon in Figure 1 and those for hydrogen in Figure 2. Here the analytical errors in parts per 1000 are plotted as abscissas and the frequency of their occurrence as ordinates. Along with the actual frequencies are shown those required for the corresponding theoretical normal curves. In these cases each analysis is given equal weight, while in the summary above the pooled results of Table IV were weighted equally with those from Table I. The carbon results in both are corrected to the basis of C = 12.01.

FIGURE 1. FREQUENCY CURVE FOR CARBON ANALYSES
The analytical errors on carbon, in parts per 1000, are plotted as abscissas and the frequency of their occurrence is plotted as ordinate. The total number is 349. The smooth curve in dashed line is that of the normal curve for N = 349, σ = 2.9.

A visual comparison of the observed frequencies with those calculated is rather deceptive. The curves are leptokurtic, i.e., too sharply peaked. Once a mean value has been estimated, the form of any frequency curve is set for all practical purposes by three parameters: the standard deviation, the coefficient of skewness, and the coefficient of flatness, functions controlled, respectively, by the second, third, and fourth powers of the deviates. These curves are not noticeably skewed, but the departure from normality appears most marked in the center, in that the measurements are better than normal; that is, there are more analyses of high precision than a strictly normal distribution would require. There is, however, a mathematical criterion for the “goodness of fit” of a set of observed values to a hypothetical or calculated set of values; this is the so-called “chi-square test” of Karl Pearson, whereby one calculates

$$\chi^2 = \sum \frac{(\text{frequency calculated} - \text{frequency found})^2}{\text{frequency calculated}}$$
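The two calculations just described, the significance ratio t and Pearson's chi-square, can be sketched in Python as follows. This is a minimal illustration, not the author's worksheet: the function names are hypothetical, and the class frequencies passed to `chi_square` are made-up numbers, since the actual class counts are not reproduced in the text.

```python
import math


def standard_error(sigma, n):
    """Standard deviation of a mean of n results: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


def t_ratio(mean_error, sigma_m):
    """Ratio of the observed mean error to its standard error."""
    return mean_error / sigma_m


def chi_square(found, calculated):
    """Pearson's chi-square: sum of (calculated - found)^2 / calculated."""
    return sum((c - f) ** 2 / c for f, c in zip(found, calculated))


# Tabulated values from the text: mean error and estimated sigma_M.
print(round(t_ratio(0.9, 0.153), 1))  # carbon
print(round(t_ratio(2.3, 1.20), 1))   # hydrogen

# Plain sigma / sqrt(N) with sigma = 2.9 and 22 parts per 1000, N = 349.
print(round(standard_error(2.9, 349), 3), round(standard_error(22.0, 349), 2))

# HYPOTHETICAL class frequencies, for illustration only.
print(chi_square(found=[8, 12, 10], calculated=[10, 10, 10]))
```

The plain σ/√N figures come out close to, but not identical with, the tabulated estimates of σ_M; the text does not say exactly how those estimates were formed, so the small difference is left unexplained here.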

This test is described in many books (13, 17, 36). The values of chi-square are tabled as a function of the number of pairs of values or groups of values summed up, and one may see from these tables the frequency by which any chi-square is exceeded by chance. In general, for a given number of classes

INDUSTRIAL AND ENGINEERING CHEMISTRY

666

summed, the larger chi-square, the worse the fit. This test on the foregoing data yields the following results (kindly checked by Joseph Kubis of the Department of Psychology):

                              Carbon
Number of analyses            349
Number of classes summed      15
Chi-square                    31.9
Probability

... 1.84) may be rejected, since here the mean of the universe should be outside this rather wide range. In the table from which Figure 4 was calculated there are actually 4 values where z > 1.84 instead of the 5 cases required by theory, an excellent agreement. In order to evaluate this as an empirical test, however, it is necessary to see that actual analytical errors are involved in the cases of these four means which the z test has rejected. Here again it will be necessary to assume some range for their rejection, independent of this test; for this is used twice the standard deviation for carbon already deduced as a final result in this present article, reduced here to that of a mean of four, i.e., 2 × 2.9/√4 = 2.9 parts per 1000. Any sample means in Figure 4, therefore, differing from the mean of the universe by this amount, should be rejected independently of the z test, and those within this range should be retained. The mean of the universe of 180 (which is not so extensive as it might be) is known to be +0.4 part per 1000; hence the absolute range of acceptance will be -2.5 to +3.3 parts per 1000 of carbon. Of the four cases rejected by the z test none in my table is outside this range. Here, then, are four cases where results which would be acceptable in an absolute sense are rejected by the test; that is, four errors of the first kind. Considering the errors of the second kind (again from Figure 4 where the mean of the universe is assumed to be known), we inspect the actual errors of the 100 sample means and find 3 outside the above absolute range whose z values are less than 1.84; that is, there are three errors of the second kind. It would seem that one kind of error is just as bad from the standpoint of the organic chemist as the other, so here we have a total of 7 errors out of 100 possibilities of random sampling. This is not so bad with such small samples, but this very restricted experiment is not offered as a justification or criticism of the wide use of these significance tests by statistical writers.
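The bookkeeping in this experiment can be sketched in Python. The acceptance range uses the figures given in the text (universe mean +0.4, σ = 2.9, samples of four); the list of sample means and z values below is hypothetical, since the author's table of 100 sample means is not reproduced, and the function names are illustrative only.

```python
import math


def acceptance_range(universe_mean, sigma, n, k=2.0):
    """Range universe_mean +/- k * sigma / sqrt(n), in parts per 1000."""
    half_width = k * sigma / math.sqrt(n)
    return universe_mean - half_width, universe_mean + half_width


def tally_errors(samples, z_limit, low, high):
    """Count errors of the first and second kind.

    samples: list of (sample_mean_error, z_value) pairs.
    First kind: rejected by the z test although inside the absolute range.
    Second kind: passed by the z test although outside the absolute range.
    """
    first = sum(1 for m, z in samples if abs(z) > z_limit and low <= m <= high)
    second = sum(1 for m, z in samples if abs(z) <= z_limit and not low <= m <= high)
    return first, second


low, high = acceptance_range(universe_mean=0.4, sigma=2.9, n=4)
print(round(low, 1), round(high, 1))

# HYPOTHETICAL sample means (parts per 1000) with their z values.
samples = [(0.1, 2.0), (3.5, 1.0), (1.0, 0.5)]
print(tally_errors(samples, z_limit=1.84, low=low, high=high))
```

Applied to the author's 100 sample means, the same tally would give the 4 errors of the first kind and 3 of the second kind reported above.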

If the original universe is far from normal and the samples drawn from it are small, the errors would undoubtedly be large; Shewhart (46) shows an example somewhat like this in his monograph. In such cases, particularly, one should be hesitant about drawing conclusions concerning structure until derivatives have been prepared and analyzed, as can almost always be done. I doubt whether the authors lay enough stress on the point that a normal universe is a prerequisite for “Student’s” or Fisher’s small-sample technique. What is more important is to be sure that a universe exists at all, normal or otherwise; none does until statistical control has been established.

Advantages of Higher Precision

The point that one’s time is better spent in improving precision than in running large numbers of rechecks is important enough to warrant one or two numerical examples. When the old chemists were confronted with a choice between the alternative formulas for cholesterol

              C, %      H, %
C26H44O       83.80     11.90
C27H46O       83.87     11.99

they realized that combustion analysis on the substance itself would never get them very far, as can be seen from the following figures:

Carbon        Hydrogen

This is clearly a hopeless task. This particular situation was first clarified 50 years ago by the combustion analyses of Reinitzer (34) on cholesterol acetyl dibromide; he checked the theoretical analysis of this derivative to 1.7 and 9 parts per 1000 of carbon and hydrogen, respectively. If this difficult choice on cholesterol itself had to be made on the basis of only one analysis, however, it could be made decisively by the technique of Baxter and Hale. Fieser and Jacobsen (9) went into this matter very thoroughly a few years ago and reported very interesting results, including a choice between two formulas differing by 0.34 per cent in carbon and 0.14 per cent in hydrogen on a sapogenin which had given widely discrepant results in the hands of several experienced analysts who used ordinary macro- and microprocedures. They settled the matter by two precision combustions, and one would really have been enough. This sort of atomic weight technique cannot be recommended for ordinary research and control work, since even this precision does not often compensate for the fact that it involves running one combustion every other day on gram samples; but it is a striking exemplification of what a square root sign means in an equation.
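The arithmetic behind this argument can be sketched in Python. The sketch is illustrative only: C = 12.01 is the value used in the paper, while H = 1.008 and O = 16.00 are assumed atomic weights, and the criterion in `analyses_needed` (the two theoretical values must differ by k standard deviations of the mean) is an assumption made here, not a rule stated by the author.

```python
import math

# C = 12.01 as in the paper; H and O values are ASSUMED for illustration.
ATOMIC_WEIGHTS = {"C": 12.01, "H": 1.008, "O": 16.00}


def percent(formula, element):
    """Theoretical per cent of one element; formula maps element -> count."""
    total = sum(ATOMIC_WEIGHTS[e] * n for e, n in formula.items())
    return 100.0 * ATOMIC_WEIGHTS[element] * formula[element] / total


def analyses_needed(difference, sigma, k=2.0):
    """Replicates n such that k * sigma / sqrt(n) <= difference.

    ASSUMPTION: both arguments are in the same units (e.g. parts per 1000).
    """
    return math.ceil((k * sigma / difference) ** 2)


c26 = {"C": 26, "H": 44, "O": 1}
c27 = {"C": 27, "H": 46, "O": 1}
for formula in (c26, c27):
    print(round(percent(formula, "C"), 2), round(percent(formula, "H"), 2))

# Difference between the two carbon values, in parts per 1000 of carbon.
diff_ppt = 1000.0 * (percent(c27, "C") - percent(c26, "C")) / percent(c26, "C")
print(analyses_needed(diff_ppt, sigma=2.9))
```

Because the required number of replicates grows as the square of σ divided by the difference, halving σ cuts the work by a factor of four, which is the point of the remark about the square root sign.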

VOL. 11, NO. 12

Bad Compound vs. Bad Analysis

A perennial controversy arises when the analyst’s figures do not agree with the formula expected by the person who did the synthesis; the latter is as sure that the analyst did a poor job as the analyst is that the compound was “no good”. Let us suppose that an analyst runs 500 combustions a year with an over-all precision of 2.9 and 22 parts per 1000 of carbon and hydrogen, respectively; this will allow him an absolute error of about 0.36 per cent on carbon and 0.30 per cent on hydrogen, taking average values for many compounds to cover the year’s analyses. If the errors are normally distributed, 95.5 per cent should lie within this range of ±2σ, and 4.5 per cent should lie outside this range. According to the theorem of Tchebycheff (21, @), which applies to this case as stated, the estimated error on these figures should not exceed t^-2, or about 25 per cent. When confronted with badly discrepant figures by the individual who synthesized the compounds, our average analyst could admit that he was in the wrong about 23 times a year, or about once every two weeks; the rest of the time he could lay the blame on the impurity of the compounds. From practical experiences, however, I doubt if such statistical considerations would afford him sufficient protection!
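These figures are easy to verify. The following Python sketch (function names are illustrative) computes the fraction of normally distributed errors beyond ±2σ, the Tchebycheff bound 1/t² for t = 2, and the expected number of out-of-range analyses in a year of 500 combustions.

```python
import math


def normal_fraction_outside(k):
    """Fraction of a normal distribution lying beyond +/- k sigma."""
    return 1.0 - math.erf(k / math.sqrt(2.0))


def chebyshev_bound(k):
    """Tchebycheff upper bound on the fraction beyond k sigma: 1 / k^2."""
    return 1.0 / k ** 2


outside = normal_fraction_outside(2.0)
print(round(100.0 * outside, 1))   # per cent outside 2 sigma
print(chebyshev_bound(2.0))        # distribution-free upper bound
print(round(500 * outside))        # expected misses per 500 combustions
```

The normal-curve figure gives about 23 misses in 500 analyses, in line with the estimate above, while the Tchebycheff bound holds for any distribution and is correspondingly much looser.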

Recommendations

Those who depend largely on microanalytical data in their research work should, first, know their analyst and be sure that he always acts on the scriptural injunction, “Prove all things; hold fast that which is good.” Secondly, they should not take too seriously results obtained under very unfavorable weather conditions. Thirdly, they should see that the analyst runs test substances frequently, especially when a particularly critical point is being decided. Lastly, they should remember that his results follow in a general way the error function, whose curve has a certain finite spread; in other words, he is not expected to be right all the time.

Acknowledgments

I wish to express sincere thanks to all who have cooperated with me on this problem, especially to the four chemists whose figures were used in Table I. I am also deeply indebted to Josef Solterer of Georgetown University and to W. A. Shewhart of the Bell Telephone Laboratories, Inc., for their valuable criticisms and suggestions along the lines of the statistical theory involved; and I owe a special debt of gratitude to Jack W. Dunlap of the University of Rochester for the help he has always generously given in studying and applying statistical methods to this and other problems undertaken in our laboratory. W. Edwards Deming of the U. S. Department of Agriculture has also rendered invaluable assistance, and I wish to thank him for putting his skill and experience at my disposal with such generosity and patience.

Literature Cited

Among the books and articles dealing with modern statistical methods, which the chemist can read and study with a great deal of profit, the article of Deming and Birge, the monograph and book of Shewhart, and the books of R. A. Fisher, Goulden, and Rider are especially recommended.

(1) Adelson, D. E., and Bogert, M. T., J. Am. Chem. Soc., 58, 653, 2236-8 (1936); 59, 599 (1937).
(2) Adelson, D. E., Hasselstrom, T., and Bogert, M. T., Ibid., 58, 871-2 (1936).
(3) Baxter, G. P., and Hale, A. H., Ibid., 58, 510 (1936); 59, 506 (1937).
(4) Bond, W. N., “Probability and Random Errors”, p. 41, London, Edward Arnold and Co., 1935.
(5) Committee on Uniformity in Technical Analysis, J. Am. Chem. Soc., 26, 1644 (1904).
(6) Deming, W. E., J. Am. Statistical Assoc., 31, 124 (1936).
(7) Deming, W. E., and Birge, R. T., “On the Statistical Theory of Errors”, p. 140, Graduate School, U. S. Dept. Agriculture, Washington; revised from Rev. Modern Phys., 6, 119 (1934).
(8) Ibid., p. 128.
(9) Fieser, L. F., and Jacobsen, R. P., J. Am. Chem. Soc., 58, 943 (1936).
(10) Fisher, H. L., “Laboratory Manual of Organic Chemistry”, 4th ed., p. 328, New York, John Wiley & Sons, 1938.
(11) Fisher, R. A., Metron, 5 (a), 90 (1925).
(12) Fisher, R. A., “Statistical Methods for Research Workers”, 5th ed., p. 44, London, Oliver & Boyd, 1934.
(13) Ibid., Chap. IV.
(14) Ibid., p. 118.
(15) Gossett, W. S., Biometrika, 6, 1 (1908).
(16) Goulden, C. H., “Methods of Statistical Analysis”, Chap. IV, New York, John Wiley & Sons, 1939.
(17) Ibid., Chap. IX.
(18) Ibid., Chap. XI.
(19) “Handbook of Chemistry and Physics”, Cleveland, Ohio, Chemical Rubber Co., Mathematical Tables. Also any book on statistics.
(20) Haynes, Dorothy, and Judd, H. M., Biochem. J., 13, 272 (1919).
(21) Holmes, M. C., “Outline of Probability and Its Uses”, p. 28, Ann Arbor, Mich., Edwards Bros., 1936.
(22) Ibid., p. 48.
(23) Hooker, S. C., J. Am. Chem. Soc., 58, 1165-7, 1178 (1936).
(24) Ingram, G., J. Soc. Chem. Ind., 58, 34 (1939).
(25) Lundell, G. E. F., Ind. Eng. Chem., Anal. Ed., 5, 221 (1933).
(26) Munch, J. C., J. Am. Pharm. Assoc., 37, 404 (1938).
(27) Neyman, J., “Lectures and Conferences on Mathematical Statistics”, p. 45, Washington, D. C., Graduate School, U. S. Dept. of Agriculture, 1938.
(28) Niederl, J. B., and Niederl, Victor, “Micro Methods of Quantitative Organic Analysis”, p. 109, New York, John Wiley & Sons, 1938.
(29) Ostwald, Wilhelm, and Luther, Robert, “Hand- und Hilfsbuch zur Ausfuhrung physico-chemische Messungen”, 5th ed., p. 13, Leipzig, Akademische Verlagsgesellschaft, 1931.
(30) Pearson, E. S., “Application of Statistical Methods to Industrial Standardization and Quality Control”, Section 8, Table 13, London, British Standards Institute, 1935.
(31) Power, F. W., “Probable Error of Microdetermination of Carbon and Hydrogen”, presented before Microchemical Section, American Chemical Society, Chapel Hill meeting, April, 1937.
(32) Power, F. W., “Some Temperature Effects in Microchemical Weighing”, presented at Milwaukee meeting, American Chemical Society, September, 1938.
(33) Pregl, Fritz, and Roth, H., “Quantitative Organic Microanalysis”, 3rd English ed., tr. by E. B. Daw, p. 13, Philadelphia, P. Blakiston’s Son & Co., 1937.
(34) Reinitzer, F., Monatsh., 9, 421 (1888).
(35) Rider, P. R., “Introduction to Modern Statistical Methods”, Chap. VI, New York, John Wiley & Sons, 1939.
(36) Ibid., Chap. VII.
(37) Ibid., Chap. VIII.
(38) Schwarz-Bergkampf, Erich, Z. anal. Chem., 69, 321 (1936).
(39) Scott, E. L., J. Biol. Chem., 73, 81 (1927).
(40) Shewhart, W. A., “Economic Control of Manufactured Product”, p. 95, New York, D. Van Nostrand Co., 1931.
(41) Ibid., p. 184.
(42) Ibid., p. 277.
(43) Ibid., p. 390.
(44) Shewhart, W. A., “The Statistical Method from the Viewpoint of Quality Control”, issued for Dept. of Inspection Engineering, Bell Telephone Laboratories, Inc., New York (1937); to be published by Graduate School, U. S. Dept. of Agriculture, Washington, D. C.
(45) Ibid., p. 30.
(46) Snedecor, G. W., “Calculation and Interpretation of Analysis of Variance and Co-variance”, pp. 13 ff., Ames, Iowa, Collegiate Press, 1934.
(47) Wieland, H., Ann., 507, 226 (1933).
(48) Wieland, H., and Kotzschmar, A., Ibid., 530, 152 (1937).
(49) Williams, R. J., Ind. Eng. Chem., Anal. Ed., 8, 229 (1936).

Volumetric Estimation of Lac on Glazed Candies

NICHOLAS M. MOLNAR AND JOSEPH GRUMER, Molnar Laboratories, New York, N. Y.

The widespread use of lac for coating candies and the establishment by the new Food, Drug, and Cosmetic Act of a maximum permissible lac content have stimulated interest in the development of a method for the estimation of semimicroquantities of the material. The method presented here, based on the extraction and the titration of the lac acids with standard sodium carbonate, is quick and simple enough to be used in control testing in the manufacture of glazed candies.

Requirements of the Food, Drug, and Cosmetic Act

The production of arsenic- and lead-free lac enabled candy manufacturers to use lac coating on candies to serve a double purpose: (1) to form a protective seal, and (2) to produce a desirable glaze. The Food, Drug, and Cosmetic Act (2) allows the use of such harmless glaze, but not in excess of 0.4 per cent. A manufacturer of candies, well before the law went into effect, asked the authors to analyze the lac content of his candies, in order to modify his manufacturing procedure if necessary and to ascertain that his products were within the law. A search of the literature showed no methods for the estimation of lac on glazed candy, only various articles (1, 3, 7, 8, 11) dealing with the chemical composition of shellac, none of which could be used for quantitative estimation in a complex system such as candy. The United States Department of Agriculture, Food and Drug Administration, advised (9) that they had had little or no occasion to determine glaze on candy quantitatively; hence they have no immediately available method for quantitative analysis of glazed candy for shellac content. The need for a quick and accurate method is obvious. In manufacturing, a quick control with results obtainable within a few hours is essential, so that before the candies are packed the manufacturer may be certain that his product is within the requirements of the law.

Discussion

In the manufacture of glazed candies the glaze is applied by means of a solution of pure lac in specially denatured alcohol 35, an authorized formula (10) for a solvent in manufacturing candy glazes under code 015. This formula is prepared by the addition of 35 gallons (132.5 liters) of ethyl acetate to 100 gallons (378.5 liters) of pure ethyl alcohol. Tests run on samples of refined lac such as is used in the candy industries showed that alcohols 35, 2B, and 3A were satisfactory solvents. Alcohol 2B is made by adding 0.5 gallon of benzene to 100 gallons of ethyl alcohol, and alcohol 3A by adding 5 gallons of commercially pure methyl alcohol to 100 gallons of ethyl alcohol. (Though the use of alcohols 2B and 3A is permissible in analysis, they are not to be recommended for use as solvents for lac used for glazing candies.) Inasmuch as other constituents of many candies (namely, dextrose, coloring matter, fatty matter, and alkaloids such as theobromine) would be extracted in part or completely, evaporation of the alcoholic extract and subsequent weighing could not be used. Water could be added to precipitate the lac from the alcoholic solution, but the partial formation of colloidal dispersion as well as the precipitation of fats made the method of filtration and weighing impractical. It was then decided to base the determination on the solvent proper-