Lloyd S. Nelson. J. Chem. Educ., 1956

THE EXPERIMENTAL DETERMINATION OF AN ERROR DISTRIBUTION

LLOYD S. NELSON
General Electric Research Laboratory, Schenectady, New York

IN RECENT years chemists and chemical engineers have come to recognize the utility and power of statistics as an integral research and development tool. Unfortunately students are seldom exposed to any of the ideas, much less the techniques, of statistical analysis. These are gained later on the job, if at all. It seems reasonable, therefore, to consider how students might be introduced to some of the basic concepts of statistics.

Peterson¹ has described a laboratory experiment in physical chemistry which consists of having pairs of students measure the length of an object to the nearest tenth of a millimeter 100 or more times with a common meter stick. The students were required to construct a histogram of their results and to calculate the average length, the average deviation, and the probable error of a measurement. Four typical examples of student results were presented.

It is the purpose of the present paper to discuss the experiment described by Peterson and to show how its value can be greatly increased if the students are shown how their results are admirably suited to several simple statistical tests. The opportunity for the student to work with his own data should not be underestimated. Most experimenters never come to appreciate the utility of statistics until they begin to apply the ideas to their own data.

¹ PETERSON, S., J. CHEM. EDUC., 26, 408 (1949).

The results of this measurement experiment can serve to introduce the concept of testing a hypothesis and illustrate the ideas of statistical significance and confidence limits. In addition, since most students will find that the hypothesis of a normal distribution is not contradicted by their sample, they should gain an appreciation of the validity of the assumption of a normally distributed error which underlies much present-day statistical analysis. And lastly, the student should come to realize that the sample data he collects is of interest only in so far as it gives information about the hypothetical infinite population which it represents.

Detailed calculations are given to illustrate the comparison of observed and expected frequencies of lengths and to show how two students can compare their precisions and their average lengths. Statistical tables not to be found in the "Handbook of Chemistry and Physics" are included.

As was pointed out,¹ any long narrow object can be used for this experiment. Glass tubing sealed at both ends is particularly satisfactory because it is readily available and poorly enough defined at the rounded ends to provide a reasonable test of the students' ability to estimate. The measurements might be made by aligning one end of the object with a mark on the meter stick and estimating the decimal part of a millimeter at the other end. However, they should be made by laying the object down at random (i.e., without looking) on the meter stick and estimating the decimal part of a millimeter at each end.

The first problem confronting the student is: what sort of a distribution would be generated if the measurement were repeated a very large number of times?
In this type of operation where positive and negative errors should occur with equal frequency, and large errors should occur less frequently than small errors, we have reason to suspect that the normal (or Gaussian) distribution will be the underlying one. For this experiment the mathematical model is Length(obs.) = Length(true) + Error. The hypothesis is that the error is normally distributed.

The second problem facing the student is: is the distribution of my sample sufficiently non-normal to warrant rejecting the hypothesis that the measuring operation produces a normally distributed error? Figure 1 shows the result of estimating each end of an 8-mm. o.d. object with rounded ends to the nearest tenth of a millimeter 100 times and plotting the 100 differences as a histogram. The theoretical normal distribution curve is superimposed on the histogram.

Figure 1. Results of Estimating the Length of an Object to the Nearest 0.01 Centimeter Using a Meter Stick (horizontal axis: centimeters)

VOLUME 33, NO. 3, MARCH, 1956

It is easy to test the hypothesis that a sample of, for example, 100 length estimates came from a normally distributed population. The test to be used is the chi square test which was developed by Karl Pearson and published in 1900. This is one of the most widely used statistical tests because it can be applied to test whether an observed frequency differs from the expected frequency. The steps to be followed here are: (a) calculate the average and standard deviation of the sample, (b) define a normal curve with the same average, standard deviation, and total number as the sample, (c) take differences between the observed frequencies and the frequencies predicted by the normal curve, and (d) calculate chi square by the use of equation (1). The chi square value calculated is used to enter Table 4 to determine the chance that the observed frequencies would have arisen if the hypothesis of normality is correct.

$$\chi^2 = \sum \frac{(O - E)^2}{E} \qquad (1)$$

where O = observed frequency and E = expected frequency.

For the first step, the standard deviation can be calculated quickly and easily by a method in which the average is assumed (and later corrected). The data for the histogram in Figure 1 are given in Table 1 where this technique is illustrated. The most convenient assumption for the average is the class length 12.37. By having the assumed average near the center of the distribution, the arithmetic is lightened. The third column of Table 1 states the deviations in terms of class intervals from the assumed mean. The first class is three classes below the assumed mean, so it is labeled -3, etc. The fourth column is the product of each class deviation and the number of lengths (i.e., the frequency in that class). The fifth column contains the products of the items in the third and fourth columns. The standard deviation for a single measurement is given now by:

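The true average of the sample can be obtained from the total of the fd column in Table 1 by the formula:

$$\text{True average} = \bar{X}' + C_i \, \frac{\sum fd}{N} \qquad (3)$$

where $\bar{X}'$ = assumed average, $C_i$ = the class interval, and N = total number of measurements. Substitution in equation (3) gives:

$$\text{True average} = 12.37 + \left(\frac{-39}{100}\right)(0.01) = 12.3661 \text{ cm.}$$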
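The assumed-mean short cut described above can be sketched in a few lines of Python. This is an illustration only, under stated assumptions: the class frequencies below are not taken from Table 1 (whose entries are not available here) but are values chosen to agree with the totals quoted in the text (N = 100, Σfd = -39, s ≈ 0.0114 cm.). The divisor N - 1 in the variance is likewise an assumption, and the function name `shortcut_mean_sd` is hypothetical.

```python
from math import sqrt

# ASSUMPTION: illustrative frequencies consistent with the quoted totals,
# not a transcription of the original Table 1.
lengths = [12.34, 12.35, 12.36, 12.37, 12.38, 12.39]  # class values, cm
freqs = [5, 7, 36, 30, 18, 4]                         # observed frequency f


def shortcut_mean_sd(lengths, freqs, assumed_mean, class_interval):
    """Mean and standard deviation by the assumed-mean (coded) method."""
    n = sum(freqs)
    # Class deviation d: whole number of class intervals from the assumed mean.
    d = [round((x - assumed_mean) / class_interval) for x in lengths]
    sum_fd = sum(f * di for f, di in zip(freqs, d))
    sum_fd2 = sum(f * di * di for f, di in zip(freqs, d))
    # Correct the assumed mean by the average class deviation.
    mean = assumed_mean + class_interval * sum_fd / n
    # ASSUMPTION: variance uses the divisor N - 1.
    variance_in_classes = (sum_fd2 - sum_fd ** 2 / n) / (n - 1)
    return mean, class_interval * sqrt(variance_in_classes)


mean, sd = shortcut_mean_sd(lengths, freqs, 12.37, 0.01)
# Check suggested in the text: a different assumed mean gives the same result.
mean2, sd2 = shortcut_mean_sd(lengths, freqs, 12.36, 0.01)
print(f"mean = {mean:.4f} cm, s = {sd:.4f} cm")
```

Because only whole-number class deviations enter the sums, the arithmetic stays light enough to do by hand, which is the point of choosing an assumed mean near the center of the distribution.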

The calculations of the standard deviation and the average can be readily checked by repeating the computation with a different assumed mean.

Steps (b) and (c) relate to the calculation of expected frequencies. Table 2 exemplifies this calculation. The third column represents the distance of each length class from the average (12.3661 cm.). The fourth column states these deviations in terms of the standard deviation (0.0114 cm.). The fifth column headed Handbook table value is from the table of "Areas, Ordinates, and Derivatives of Normal Curve of Error" in the "Handbook of Chemistry and Physics."² The values required are ordinates. The deviations in terms of standard deviation units in column four of Table 2 are the values of t in the handbook table. Finally, it is necessary to convert the handbook table values (which are for a unit normal curve) to the units of the present problem. The equation for the curve is:

$$f' = \frac{N C_i}{s} \left[ \frac{1}{\sqrt{2\pi}} \, e^{-t^2/2} \right]$$

where $C_i$ = the class interval, N = total number of measurements, and s = the standard deviation. Substituting the data from Table 1 gives:

$$f' = \frac{(100)(0.01)}{0.0114} \left[ \frac{1}{\sqrt{2\pi}} \, e^{-t^2/2} \right] = 87.7 \left[ \frac{1}{\sqrt{2\pi}} \, e^{-t^2/2} \right]$$

² HODGMAN, C. D., Editor, "Handbook of Chemistry and Physics," 35th ed., Chemical Rubber Publishing Co., Cleveland, 1953, p. 229.

TABLE 1
Calculation of Standard Deviation, Assumed Mean = $\bar{X}'$ = 12.37
Columns: Length (class); Observed frequency, f; Class deviation, d; fd; fd²

TABLE 2
Calculation of Expected Frequencies
Columns: Length (class); Observed frequency, f; Deviation from true average, x; Deviation in s units, t; Handbook table value; Expected frequency, f'
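The conversion of unit-normal ordinates into expected frequencies can be sketched as follows. This is a minimal illustration, not the author's worksheet: the ordinate is computed directly from the normal curve instead of being read from the handbook table, so values may differ in the last digit from figures obtained with t rounded to two decimals. The function names are hypothetical.

```python
from math import exp, pi, sqrt


def normal_ordinate(t):
    """Ordinate of the unit normal curve at t standard deviations."""
    return exp(-t * t / 2) / sqrt(2 * pi)


def expected_frequency(length, mean, sd, n, class_interval):
    """Expected count in a class: (N * Ci / s) times the unit-normal ordinate."""
    t = (length - mean) / sd
    return n * class_interval / sd * normal_ordinate(t)


# Values quoted in the text for the data of Figure 1.
mean, sd, n, ci = 12.3661, 0.0114, 100, 0.01
scale = n * ci / sd  # the multiplier applied to every handbook ordinate

for length in (12.34, 12.35, 12.36, 12.37, 12.38, 12.39):
    f_expected = expected_frequency(length, mean, sd, n, ci)
    print(f"{length:.2f} cm  expected frequency {f_expected:.1f}")

# An extra point for drawing the curve, three standard deviations from the mean.
extra_point = scale * normal_ordinate(3.0)
```

Any number of additional points for the smooth curve can be produced the same way by choosing other values of t.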

JOURNAL OF CHEMICAL EDUCATION

The table referred to in the handbook, when entered in the ordinate column, gives the values of the factor in brackets. This conversion is done by multiplying each ordinate of the unit curve by (N)($C_i$)/s. In this problem, it means that each value in column five of Table 2 be multiplied by (100)(0.01)/0.0114 or 87.7. In this way column six is produced. An expected frequency of 2.5 means that in a large number of experiments, each consisting of 100 measurements of the particular object in question, half the time (on the average) one could expect to get two values of 12.34 cm. and half the time three values of 12.34 cm.

The smooth curve in Figure 1 is drawn through the expected values (ordinates) shown in Table 2. To aid in drawing this curve any number of additional points can be calculated. For example, to find the ordinate value three standard deviation units (i.e., (3)(0.0114) = 0.0342 cm.) away from the mean (either above or below, since the normal curve is symmetrical), t = 3.00, handbook table ordinate value = 0.0044, and in terms of the units of this problem, (0.0044)(87.7) = 0.4.

Step (d), the calculation of chi square values, is shown in Table 3.

TABLE 3
Calculation of Chi Square

Observed        Expected         (f - f')²/f'
frequency, f    frequency, f'
                12.9 }           0.75
36              30.2             1.11
30              33.0             0.27
                16.6 }
                 3.9 }           0.11
                                 2.24 = χ²

A chi square value is calculated for each line provided each class has an expected frequency of at least five. Classes having fewer than five cases are combined with adjacent classes until the total expected frequency is at least five and a chi square is computed using the corresponding combined observed frequencies.

The total of the individual chi squares is the chi square by which the normality hypothesis is tested. It has n - 3 degrees of freedom, where n = number of individual chi square values summed. The normal curve which is fitted to the experimental data has been constrained to agree with these data in three respects, viz., it has the same number of cases, the same arithmetic average, and the same standard deviation as the original figures. Each constraint uses up a degree of freedom. The chi square value from Table 3 is 2.24 with 4 - 3 = 1 degree of freedom. Table 4 gives the probability that the observed frequencies shown by the histogram in Figure 1 would be obtained if the measuring operation produces a normally distributed error. Since the probability is greater than 10 per cent, it can be concluded that there is no evidence against the normality hypothesis.

If the probability had turned out to be one per cent, for example, it would have been concluded that the method does not produce a normally distributed error. Why? Because the chance of getting a sample with that high a chi square value if the parent population were normal would be only one in a hundred. We would prefer to conclude that the parent population is not normal rather than conclude that an event having only one chance in a hundred of occurring had come about.

For the present example it can be said that about 15 per cent of the time we would get a poorer approximation to normality and about 85 per cent of the time we would produce samples which would more closely fit the normal curve. It is necessary to decide a priori on what probability (significance) level one will use to reject the hypothesis. A convenient value would be one per cent. This would mean that any sample giving a chi square value equal to or exceeding the critical probability value of one per cent would be considered as indicating that the parent population from which the sample was drawn was not normal. This is a calculated risk. On the average, the conclusion based on this significance level will be wrong once every hundred times it is applied. From this it can be seen that a significance level states the probability of rejecting the hypothesis when it is really true. This is an undesirable error to make and so the probability level is set fairly low.

Table 4 also gives chi square values corresponding to probabilities close to unity (100 per cent). A value in this range would occur if the experimental results followed the normal distribution too closely, i.e., more closely than chance would predict. A student reporting a value for chi square of 0.02 with two degrees of freedom could be said (to the 99 per cent significance level) to have invented his data. The 99 per cent significance level is just as convincing as the one per cent significance level.

The importance of the normal distribution must be emphasized because it forms the backbone of much of present-day statistical analysis. Students should observe that it is a normal curve which is usually generated in an experiment such as described here.

Using the calculation technique just detailed, the student data given as typical by Peterson¹ can be evaluated. His Example 1 is plotted in Figure 2 along with the theoretical normal curve. Despite the symmetry, the value found for chi square was 57.9 with three degrees of freedom. The hypothesis of normality is rejected with very much greater significance (lower numerical value) than 0.1 per cent. The parent population might have been normal, but the chance that it was and gave us the sample it did is very much less than 0.1 per cent (actually, it is about one in ten billion!). Indeed, these data can be shown to follow the "median law"³ which is the distribution generated when a few measurements are made with considerable care and many measurements are made with less care. Students

³ BOND, W. N., "Probability and Random Errors," Edward Arnold and Co., London, 1935, p. 61.


TABLE 4
Chi Square Distribution*
Columns: D.F.†; Probability, %: 99, 90, 50, 10, 5, 1, 0.1

* Abridged from FISHER, R. A., AND F. YATES, "Statistical Tables for Biological, Agricultural and Medical Research," Oliver and Boyd, Edinburgh, 1953, Table IV, by permission of the authors and publishers.
† D.F. = degrees of freedom.
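Where a chi square table is not at hand, the calculation of equation (1) and the corresponding tail probability can be sketched in Python. This is an illustration under stated assumptions: the observed totals 12 and 22 for the combined end classes are inferred from the surviving chi square contributions and the total of 100 readings, not read from Table 3, and the function names are hypothetical. The tail probability uses the standard closed-form series for whole-number degrees of freedom in place of a printed table.

```python
from math import erfc, exp, pi, sqrt


def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over the classes, as in equation (1)."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_tail(x, df):
    """Probability that chi square with df degrees of freedom is >= x."""
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum of (x/2)^k / k! for k = 0 .. df/2 - 1.
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return exp(-x / 2) * total
    # Odd df: normal tail plus a finite series in odd powers of sqrt(x).
    chi = sqrt(x)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return erfc(chi / sqrt(2)) + sqrt(2 / pi) * exp(-x / 2) * total


# ASSUMPTION: end classes combined so every expected frequency is at least 5;
# observed 12 and 22 are inferred values, expected 15.4 = 2.5 + 12.9 and
# 20.5 = 16.6 + 3.9.
observed = [12, 36, 30, 22]
expected = [15.4, 30.2, 33.0, 20.5]

x2 = chi_square(observed, expected)
df = len(observed) - 3  # number, mean, and standard deviation were fitted
p = chi_square_tail(x2, df)
print(f"chi square = {x2:.2f}, D.F. = {df}, probability = {p:.2f}")

# A fit that is "too good": chi square of 0.02 with two degrees of freedom.
p_too_good = chi_square_tail(0.02, 2)
```

A probability above the chosen significance level (here well above one per cent) gives no evidence against normality, while a probability very close to unity, as in the second example, signals agreement closer than chance would predict.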

should be cautioned that all measurements must be made with the same care.

Figure 2. Results of Estimating the Length of an Object to the Nearest 0.01 Centimeter Using a Meter Stick (Data from Peterson, loc. cit.; horizontal axis: centimeters)

Peterson's Example 2 gives no evidence against the normality hypothesis. The probability level of chi square was 50 per cent indicating that, on the average, half the time a repetition of this experiment would give better results and half the time poorer results. One could not expect a more favorable indication of normality.

Concerning his Example 4, Peterson says that a double peak can result if one partner (collecting half the data) tends to read higher than the other. It is reasonable to test whether the partners differ with regard to (a) precision and (b) average measurement to determine whether their results could be combined.

The precisions must be compared first. The standard deviation obtained by each student is squared to give the variance. The larger variance is divided by the smaller to give a quotient which is assessed by means of Table 5. The table is set up to accommodate the situation in which each variance is based on N observations. If this so-called F ratio equals or exceeds the tabulated value for P probability level, the two variances are said to differ at the P significance level. The interpretation here is identical to that described for the significance of the chi square test.

The data shown in Figure 1 yielded a standard deviation of 0.0114 cm. If a second experimenter had measured the length of the same (or a different) object 100 times and had obtained a standard deviation of 0.0125, an F ratio of

$$F = \frac{(0.0125)^2}{(0.0114)^2} = 1.20$$

shows (Table 5, N = 100, D.F. ≈ 100) that the two workers do not differ in their precision. The probability is greater than 10 per cent that the two variances came from the same hypothetical population of variances. Only when this probability is, for example, 5 per cent or less are we willing to risk stating that the variances differ.

TABLE 5
Distribution of F Ratio (Two-tailed)*
Columns: N; D.F.†; Probability, %

* Reproduced by permission of Professor E. S. Pearson from MERRINGTON, MAXINE, AND CATHERINE M. THOMPSON, "Tables of Percentage Points of the Inverted Beta (F) Distribution," Biometrika, 33, 73 (1943). Values for D.F. = 100 were obtained by interpolation.
† D.F. = degrees of freedom = N - 1.

If the variances do not differ they may be combined by simple averaging, since the number of measurements in each is the same. The best value for the standard deviation of a single measurement is then the square root of this average variance. If the variances can be combined, it is then possible to test whether the two averages differ using the following formula:

$$t = \frac{|\bar{X}_1 - \bar{X}_2|}{s} \sqrt{\frac{N}{2}} \qquad (4)$$

The absolute difference between the two students' average values is divided by the best standard deviation estimate (from the combined variances) and multiplied by the square root of one-half the number of determinations each student made. The value of t obtained is used to enter the same table² from the "Handbook of Chemistry and Physics" previously used, only this time one minus twice the area corresponding to the t is required. The table is thus made to give the decimal significance level reached by the difference being tested. Again an arbitrary significance level (customarily the 0.05 or 0.01; i.e., the 5 per cent or 1 per cent level) is settled on before the comparison is made.

As an example, suppose this second experimenter found an average length of 12.3636 cm. for the object referred to in Figure 1. The two standard deviations, since they were shown not to differ significantly, should be combined to give a better estimate. The difference between the two average lengths is now tested by substitution in equation (4).

Looking up this value of t in the table of "Areas, Ordinates, and Derivatives of the Normal Curve" in the handbook one finds it corresponds to an area of 0.4279. This value is the area under the curve from the mean to +t standard deviations. Our interest lies in the percentage of the area lying beyond this point for both halves of the curve. Hence, 1 - 2(0.4279) = 0.14 or 14 per cent. This falls short of the usual arbitrary significance level of 5 per cent (or 1 per cent) and so the difference is labeled statistically nonsignificant. Recourse is made to the table only if the probability is to be calculated, otherwise it can be said that the calculated value for t must equal or exceed 1.96 for 5 per cent significance or 2.58 for 1 per cent significance. These values assume an infinite number of measurements but can be applied with negligible error whenever the number of measurements exceeds 50.

The standard deviation of a single measurement has already been calculated. There would seem to be no special reason for calculating the probable error which is 0.6745 times the standard deviation. Instead, in conformance with current practice, the 95 per cent or 99 per cent confidence limits should be computed. These limits are ±1.96s and ±2.58s, respectively, where s is the standard deviation of the value being delimited.

It should be of interest for the student to calculate, for example, the 95 per cent confidence limits for his average value. The standard deviation of the average of N values is the standard deviation of a single measurement divided by the square root of N. Applying this to the data given in Table 1:

$$\text{95 per cent confidence limits} = 12.3661 \pm \frac{(1.96)(0.0114)}{\sqrt{100}} = 12.3661 \pm 0.0022$$

This means that if there is no bias on the part of the experimenter or the meter stick, the true length of the object lies somewhere in the range 12.364 to 12.368 cm. unless the 100 - 95 = 5 per cent chance of being misled through sampling variations has occurred. The 99 per cent confidence limits reduce the chance of being wrong about the statement as to the value of the true length to 100 - 99 = 1 per cent, but the cost of this added protection is a larger range (i.e., 12.363 to 12.369).

Peterson¹ also comments that some students show prejudice in their readings favoring last figures which are zero or five, or perhaps, even integers. The chi square test previously used is equally applicable to test this situation. The final integers of the readings taken at the right hand end of the object used in gathering the data for Figure 1 are operated on in Table 6. The readings obtained at the left end of the object could have been used instead, but not both because the two sets of readings are not independent. The chi square test, as well as all other tests described here, requires that the errors in the individuals making up the data to be tested be uncorrelated (i.e., independent).

TABLE 6
Calculation of Chi Square

Final      Observed        Expected         (f - f')²/f'
integer    frequency, f    frequency, f'
0          23              10               16.9
1           5              10                2.5
2           8              10                0.4
3          10              10                0.0
4           7              10                0.9
5          13              10                0.9
6          11              10                0.1
7           9              10                0.1
8           8              10                0.4
9           6              10                1.6

This chi square has n - 1 = 9 degrees of freedom because only the totals of the observed and expected frequencies have been made to agree. The expected value of N/10 for each class was set up independently. The value of chi square, 23.8, with 9 degrees of freedom lies between 0.1 and 1 per cent significance. Examination of the data shows that the chi square associated with zero is responsible for more than half the total chi square. It can be concluded, therefore, that this experimenter preferred zeros! This is further borne out by the observation that ones and nines appear with the lowest frequencies. Apparently, whenever the end of the object lay fairly near zero, the reading was recorded as zero at the expense of the adjacent integers.

To test whether the even integers are preferred over the odd, or vice versa, two categories, even and odd, would be set up with expected frequencies of 50 per cent of the total in each. The total chi square would have n - 1 = 1 degree of freedom. It should be emphasized here that the hypotheses to be tested should be set up before the results are examined. It is unfair to search the data for peculiarities or trends and then expect the probability tables to yield valid significance levels when these are tested. The test exemplified in Table 6 should be the most generally useful.

In conclusion it should be emphasized that although classical large-sample (N > 50) statistics have been exemplified here, small-sample statistics (the basis for modern experimental design) differs only in the tables used and not in interpretation. The techniques and concepts treated here form a background which the student will find indispensable to an understanding and appreciation of modern statistical experimental design.

For those who are interested in reading further on this subject, three books can be recommended especially for the chemist. These are Dixon and Massey,⁴ Davies,⁵ and Brownlee.⁶ Each in its own way can serve to extend the chemistry teachers' horizon in the subject of statistics. The first book is universally considered to be an excellent beginning text. The second is a superbly-written primer which all chemistry teachers and chemists who handle numerical data should have at hand. Brownlee's book stands virtually alone in the way it has inspired chemists to "get their feet wet" in statistics. The latter were written especially for chemists. Each of these books deals with the techniques described in this paper.

⁴ DIXON, W. J., AND F. J. MASSEY, "Introduction to Statistical Analysis," McGraw-Hill Book Co., New York, 1951.
⁵ DAVIES, O. L., Editor, "Statistical Methods in Research and Production," 2nd ed., Oliver and Boyd, London, 1949.
⁶ BROWNLEE, K. A., "Industrial Experimentation," 4th American ed., Chemical Publishing Co., New York, 1953.
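The comparison of two experimenters described in the paper (F ratio, combined standard deviation, equation (4), and confidence limits) can be sketched in Python as follows. This is a rough illustration under stated assumptions: the two-tailed probability is computed from the normal curve directly, not looked up in the handbook table, so t and the probability may differ slightly from values obtained with rounded intermediate figures. All function names are hypothetical.

```python
from math import erfc, sqrt


def f_ratio(sd_a, sd_b):
    """Larger variance divided by the smaller."""
    var_a, var_b = sd_a ** 2, sd_b ** 2
    return max(var_a, var_b) / min(var_a, var_b)


def combined_sd(sd_a, sd_b):
    """Square root of the average variance (equal numbers of measurements)."""
    return sqrt((sd_a ** 2 + sd_b ** 2) / 2)


def t_for_means(mean_a, mean_b, sd, n):
    """Equation (4): |difference of averages| / s * sqrt(N / 2)."""
    return abs(mean_a - mean_b) / sd * sqrt(n / 2)


def two_tailed_probability(t):
    """One minus twice the normal-curve area between the mean and t."""
    return erfc(t / sqrt(2))


def confidence_limits(mean, sd_single, n, multiplier=1.96):
    """Limits for an average of n values: mean +/- multiplier * s / sqrt(n)."""
    half_width = multiplier * sd_single / sqrt(n)
    return mean - half_width, mean + half_width


# Values quoted in the text: first worker (Figure 1) and a second worker.
n = 100
f = f_ratio(0.0114, 0.0125)
s = combined_sd(0.0114, 0.0125)
t = t_for_means(12.3661, 12.3636, s, n)
p = two_tailed_probability(t)
limits_95 = confidence_limits(12.3661, 0.0114, n, 1.96)
limits_99 = confidence_limits(12.3661, 0.0114, n, 2.58)

print(f"F = {f:.2f}, combined s = {s:.4f} cm, t = {t:.2f}, probability = {p:.2f}")
print(f"95 per cent limits: {limits_95[0]:.3f} to {limits_95[1]:.3f} cm")
print(f"99 per cent limits: {limits_99[0]:.3f} to {limits_99[1]:.3f} cm")
```

The order of the steps matters: the averages are compared only after the F ratio has shown that the two variances may reasonably be combined.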
