I/EC STATISTICAL DESIGN
A Workbook Feature
by W. J. Youden, National Bureau of Standards
Statistical Table for Duplicate Results
A technique useful in sampling problems and vendor-consignee disagreements
TRADITIONALLY the course of action has been so to improve the experimental facilities and the instrumentation that experimental errors will not obscure the gains or losses that may arise when operating conditions are changed. In many cases such efforts are very successful. In other cases difficulties may prevent such reduction in error. For example, many results depend upon a sample. The sample may be taken at the end of the process, or the process may depend upon the composition of one of the raw materials which has to be sampled. In either event the acquisition of a sample that appropriately represents the whole of a supply of material is seldom easy. Often a rather elaborate ritual of taking a composite sample is specified. Somewhere along the line the sampling error has to be determined and a procedure evolved to keep it within bounds.
Checking a Sampling Procedure

Some believe that the way to determine the sampling error (if this be the chief source of error) is to take a considerable number of samples in some particular instance of the process concerned. From the viewpoint of an immediate estimate of the sampling error, this approach has an advantage. On the other hand, the extension of this information to all subsequent examples of this process depends completely upon the hope that the instance so intensively investigated is in fact a typical example. If, for convenience, the sampling technique is based upon only one or two instances, its adequacy should
be tested over a considerable period of operation. This can be achieved with relatively little additional work. All that is necessary is to do the sampling in duplicate for a reasonable sequence of instances, be these lots, batches, or any chosen segment of operations.

The argument is advanced that the difference between duplicates evaluates the sampling technique. Taking a difference drops out the magnitude of the quantity being estimated, so that variation among lots in the quantity being estimated does not interfere. True, one difference does not give much of an idea of the agreement to be expected between duplicates. But this is the whole point. It should be relatively easy to accumulate a fair number of such differences in a continuing process.

Several gains result from such a record, besides the direct one of being able to compute the average of all the differences. First, the investigation is spread over a substantial number of instances, so that protection is afforded against the optimistic or pessimistic idea of the variation that intensive sampling of just one instance might give. Second, examination of the variation exhibited among the differences gives some idea of the largest difference that may turn up.

An unusually large difference generally leads to further sampling. Someone has to set a particular difference which, if exceeded, calls for further investigation.

In many cases such differences can arise from paired values. Sometimes material is sampled before loading for shipment and another sample is taken from the loaded carrier. Repeatedly there are available the results for samples taken by shipper and consignee. These form natural pairs. Each pair supplies a difference. Any judgment about a particular presumed overlarge difference must, if it is to be soundly based, rest upon a clear picture of the kind of differences that may turn up under normal circumstances.

Two types of situations must be clearly distinguished. In one type the two results are of such a nature that there is no way to distinguish between them; that is, there is no way to decide which is to be subtracted from the other. Here the difference between the two results is taken as the absolute difference and given a positive sign. The sum of these differences divided by n, the number of differences, gives the average difference.

In the other type the origins of the two results being compared give an unmistakable identification. Then it is imperative to take the difference in the same order for every pair: before loading minus after loading, or shipper's result minus consignee's result. The sign as well as the magnitude of each difference must be carefully recorded. The algebraic average of these differences gives a measure of any bias or persistent tendency for the difference to be in a particular direction. The individual differences therefore consist of two parts: the bias and the variation arising from sampling and measurement. All these differences must accordingly be corrected by subtracting, with due regard for signs, this average difference or bias. The remainders or corrected differences are now considered to be

VOL. 49, NO. 4 · APRIL 1957
Table I. Proportion of Differences Between Duplicates Which Exceed Indicated Multiple of Average Difference Between Duplicates

Multiple    0.0      0.2      0.4      0.5      0.6      0.8
0           1.0000   0.8733   0.7496   0.6900   0.6321   0.5233
1.0         0.4249   0.3383   0.2640   0.2314   0.2017   0.1509
2.0         0.1105   0.0792   0.0555   0.0461   0.0380   0.0255
3.0         0.0167   0.0107   0.0067   0.0053   0.0041   0.0024
4.0         0.0014   0.0008   0.0004   0.0003   0.0002   0.0001

Table II. Distribution of 50 Differences from 50 Pairs of Duplicate Spectrographic Determinations of Tin (Approx. Sn = 3.3%)

Diff. 0.00 0.10 0.20 0.30 0.40 0.50
0.00 1 2 1 1 1
0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
4 2 1 1 2 2 2 1 1 3 1 1 2 3 1 1 3 4 1 1 1 2 1 1
1 1

Entries show number of differences for each size of difference. Thus there was one case of perfect agreement between duplicates; four cases where the difference between duplicates was 0.24. The sum of the 50 differences is 8.59; av. diff. = d = 0.1718. The standard deviation for a single determination may be computed from the average difference using the relation (if there are 10 or more pairs) S.D. = $d\sqrt{\pi}/2$ = 0.886 d = 0.1522.
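The entries of Table I and the S.D. relation in the note to Table II can be reproduced with a short calculation. This is a minimal sketch under an assumption the column does not state explicitly: that the errors are normally distributed, so that the unsigned difference between duplicates follows a half-normal law. On that assumption the proportion of differences exceeding k times the average difference is erfc(k/√π). The function names are hypothetical.

```python
import math


def proportion_exceeding(multiple: float) -> float:
    """Proportion of differences larger than `multiple` times the average
    difference, ASSUMING normally distributed errors (half-normal |difference|)."""
    return math.erfc(multiple / math.sqrt(math.pi))


def sd_single_determination(average_difference: float) -> float:
    """S.D. of a single determination from the average difference between
    duplicates: S.D. = d * sqrt(pi) / 2, the relation quoted under Table II."""
    return average_difference * math.sqrt(math.pi) / 2.0


if __name__ == "__main__":
    # A few Table I entries: multiples 0.5, 1.0, 2.0, 2.5 and 3.0.
    for k in (0.5, 1.0, 2.0, 2.5, 3.0):
        print(f"{k:3.1f}  {proportion_exceeding(k):.4f}")
    # Tin data of Table II: average difference 0.1718.
    print(f"S.D. = {sd_single_determination(0.1718):.4f}")
```

The multiple is read from Table I by adding the row label to the column label, so the entry in row 1.0 under column 0.5 corresponds to `proportion_exceeding(1.5)`.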
just as they would have been if there had been no persistent tendency of shipper's results to be higher (lower) than consignee's results. The signs of all of these remainders are changed to plus and the average of the absolute values is used as in Type 1. The divisor should be (n - 1) to compensate, in part, for the adjustments.

This column is concerned to show something about the way such differences behave. By this is meant, if we know the average difference between duplicates, what proportion of the observed differences can be expected to exceed twice or three times this average difference? The word "expected" here refers to differences that occur in the normal course of events and not as a result of a blunder. Individual differences will range from zero to some quantity considerably larger than the average difference.
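The two kinds of average difference described in the text (unsigned differences for indistinguishable pairs, and bias-corrected differences with the n - 1 divisor for identifiable pairs) can be sketched as follows. The function names and the four sample pairs are hypothetical and serve only to illustrate the arithmetic; they are not data from the column.

```python
from typing import List, Tuple

Pair = Tuple[float, float]


def average_difference_unordered(pairs: List[Pair]) -> float:
    """Type 1: the two results cannot be told apart, so each difference is
    taken as positive and the sum is divided by n."""
    n = len(pairs)
    return sum(abs(a - b) for a, b in pairs) / n


def bias_and_corrected_average(pairs: List[Pair]) -> Tuple[float, float]:
    """Type 2: results are identifiable (e.g. shipper minus consignee).
    Returns (bias, average corrected difference). The bias is the algebraic
    mean of the signed differences; the corrected differences are averaged
    without sign using the divisor n - 1."""
    n = len(pairs)
    if n < 2:
        raise ValueError("at least two pairs are needed for the n - 1 divisor")
    signed = [a - b for a, b in pairs]          # always first minus second
    bias = sum(signed) / n
    corrected = sum(abs(d - bias) for d in signed) / (n - 1)
    return bias, corrected


# Hypothetical (shipper, consignee) results, for illustration only.
example_pairs: List[Pair] = [(10.2, 10.0), (9.9, 10.1), (10.4, 10.1), (10.0, 10.0)]

if __name__ == "__main__":
    print(average_difference_unordered(example_pairs))
    print(bias_and_corrected_average(example_pairs))
```

Taking every difference in the same order is what makes the bias meaningful; reversing the order for even one pair would change both returned values.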
Table III. Comparison of Table II Differences with Theory

Multiples of          No. of Differences Exceeding This Difference
Av. Difference        Obsd.      Expected
0.5 d = 0.086         34         34.5
1.0 d = 0.172         26         21.2
1.5 d = 0.258         10         11.6
2.0 d = 0.344          5          5.5
2.5 d = 0.430          2          2.3
3.0 d = 0.515          1          0.8
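The "Expected" column of Table III is the number of pairs, 50, multiplied by the corresponding proportion from Table I. A minimal sketch, again assuming normally distributed errors (an assumption not stated in the column) and using hypothetical names:

```python
import math

N_PAIRS = 50  # number of duplicate pairs in Table II


def proportion_exceeding(multiple: float) -> float:
    """Proportion of differences exceeding `multiple` times the average
    difference, ASSUMING normally distributed errors."""
    return math.erfc(multiple / math.sqrt(math.pi))


def expected_counts(multiples, n_pairs=N_PAIRS):
    """Expected number of differences exceeding each multiple, to one decimal."""
    return [round(n_pairs * proportion_exceeding(k), 1) for k in multiples]


# Observed counts as given in Table III.
multiples = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
observed = [34, 26, 10, 5, 2, 1]

if __name__ == "__main__":
    for k, obs, exp in zip(multiples, observed, expected_counts(multiples)):
        print(f"{k:3.1f}  observed {obs:2d}  expected {exp:4.1f}")
```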
INDUSTRIAL AND ENGINEERING CHEMISTRY

Table I shows that 69% of the differences may be expected to be larger than half the average difference; that 42.49% are, in the long run, larger than the average difference. About 1 difference in 9 exceeds twice the average difference, and about 1 difference in 60 turns up larger than three times the average difference. These larger differences occur in the normal course of events. They belong and do not indicate that one or the other of the two results is at fault. Even more important (in well behaved data) is the absence of any correlation between the correctness of the average of two results and the size of the difference between them.

Table I is useful to set up standards for asking for repeat results. It tells what it will cost in unnecessary repetition of work for any chosen level of difference at which additional samples are demanded. Thus if an observed difference exceeds 2.5 times the average difference further samples may be examined. But 4.61% of perfectly good results may be expected to show such a large discrepancy. Therefore nearly 5% of the material is needlessly re-examined as a price for catching real blunders that cause differences of that size. Suppose this standard for taking action is set and over a long time it turns out that 8.6% of the lots are re-examined for reason of excessive differences. This would indicate that about 4% of the results (8.6% less the 4.61% expected from good work) really do have excessive errors for one reason or another. To catch this 4%, more than that amount of good material also has to be re-examined. There is no way to escape further work in order to decide whether a somewhat large difference comes from a blunder or is one of those that do not reflect on either duplicate. This explains why re-examination of material often only confirms the average already in hand.

Table I makes it easy to select that multiple of the average difference that will feed back any desired smaller percentage of the good work. Of course, somewhat larger real mistakes will then escape detection. The table puts on a quantitative basis decisions which are usually formed intuitively wherever this question of re-examination arises.

Table II shows 50 actual differences for duplicate spectrographic determinations of tin. Although the average difference is 0.1718, one pair of duplicates shows a difference of 0.54, or over three times the average difference. Such a difference might well arouse some skepticism regarding one or the other of the two results. The fact is that statistical theory predicts one such large difference in 50 pairs. This is not to say that this entirely removes this pair from suspicion. It does say that one should not be surprised if further determinations on this lot of alloy check the average of this somewhat discrepant pair.

Table III shows the good agreement between observed and expected frequencies for various multiples of the average difference (0.172).

Announcement
The Southern Regional Graduate Summer Sessions in Statistics will be held this year at Virginia Polytechnic Institute, Blacksburg, Va., June 12 to July 20. Four universities cooperate with the Southern Regional Education Board to give each year a program with degree credits accepted by all the cooperating schools. This year an impressive seminar program has also been arranged.