Statistical Table for Duplicate Results

I/EC STATISTICAL DESIGN
A WORKBOOK FEATURE

by W. J. Youden, National Bureau of Standards

A technique useful in sampling problems and vendor-consignee disagreements

Traditionally the course of action has been so to improve the experimental facilities and the instrumentation that experimental errors will not obscure the gains or losses that may arise when operating conditions are changed. In many cases such efforts are very successful. In other cases difficulties may prevent such reduction in error. For example, many results depend upon a sample. The sample may be taken at the end of the process, or the process may depend upon the composition of one of the raw materials which has to be sampled. In either event the acquisition of a sample that appropriately represents the whole of a supply of material is seldom easy. Often a rather elaborate ritual of taking a composite sample is specified. Somewhere along the line the sampling error has to be determined and a procedure evolved to keep it within bounds.

Checking a Sampling Procedure

Some believe that the way to determine the sampling error (if this be the chief source of error) is to take a considerable number of samples in some particular instance of the process concerned. From the viewpoint of an immediate estimate of the sampling error, this approach has an advantage. On the other hand, the extension of this information to all subsequent examples of this process depends completely upon the hope that the instance so intensively investigated is in fact a typical example.

If, for convenience, the sampling technique is based upon only one or two instances, its adequacy should be tested over a considerable period of operation. This can be achieved with relatively little additional work. All that is necessary is to do the sampling in duplicate for a reasonable sequence of instances, be these lots, batches, or any chosen segment of operations.

The argument is advanced that the difference between duplicates evaluates the sampling technique. Taking a difference drops out the magnitude of the quantity being estimated, so that variation among lots in the quantity being estimated does not interfere. True, one difference does not give much of an idea of the agreement to be expected between duplicates. But this is the whole point. It should be relatively easy to accumulate a fair number of such differences in a continuing process. Several gains result from such a record, besides the direct one of being able to compute the average of all the differences. First, the investigation is spread over a substantial number of instances, so that protection is afforded against the optimistic or pessimistic idea of the variation that intensive sampling of just one instance might give. Second, examination of the variation exhibited among the differences gives some idea of the largest difference that may turn up. An unusually large difference generally leads to further sampling. Someone has to set a particular difference which, if exceeded, calls for further investigation.

In many cases such differences can arise from paired values. Sometimes material is sampled before loading for shipment and another sample is taken from the loaded carrier. Repeatedly there are available the results for samples taken by shipper and consignee. These form natural pairs. Each pair supplies a difference. Any judgment about a particular presumed overlarge difference must, if it is to be soundly based, rest upon a clear picture of the kind of differences that may turn up under normal circumstances.

Two types of situations must be clearly distinguished. In one type (Type 1) the two results are of such a nature that there is no way to distinguish between them; that is, there is no way to decide which is to be subtracted from the other. Here the difference between the two results is taken as the absolute difference and given a positive sign. The sum of these differences divided by n, the number of differences, gives the average difference. In the other type (Type 2) the origins of the two results being compared give an unmistakable identification. Then it is imperative to take the difference in the same order for every pair: before loading minus after loading, or shipper's result minus consignee's result. The sign as well as the magnitude of each difference must be carefully recorded. The algebraic average of these differences gives a measure of any bias or persistent tendency for the difference to be in a particular direction. The individual differences therefore consist of two parts: the bias and the variation arising from sampling and measurement. Therefore all these differences must be corrected by subtracting, with due regard for signs, this average difference or bias. The remainders or corrected differences are now considered to be

VOL 49, NO. 4 · APRIL 1957

Table I. Proportion of Differences Between Duplicates Which Exceed Indicated Multiple of Average Difference Between Duplicates

Multiple   0.0      0.2      0.4      0.5      0.6      0.8
0          1.0000   0.8733   0.7496   0.6900   0.6321   0.5233
1.0        0.4249   0.3383   0.2640   0.2314   0.2017   0.1509
2.0        0.1105   0.0792   0.0555   0.0461   0.0380   0.0255
3.0        0.0167   0.0107   0.0067   0.0053   0.0041   0.0024
4.0        0.0014   0.0008   0.0004   0.0003   0.0002   0.0001

Table II. Distribution of 50 Differences from 50 Pairs of Duplicate Spectrographic Determinations of Tin (Approx. Sn = 3.3%)

Diff. (rows): 0.00 0.10 0.20 0.30 0.40 0.50
Second decimal (columns): 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
Counts in column 0.00: 1 2 1 1 1
Counts in columns 0.01 to 0.09, in reading order with blank cells omitted: 4 2 1 1 2 2 2 1 1 3 1 1 2 3 1 1 3 4 1 1 1 2 1 1 1 1

Entries show number of differences for each size of difference. Thus there was one case of perfect agreement between duplicates; four cases where the difference between duplicates was 0.24. The sum of the 50 differences is 8.59; av. diff. = d = 0.1718. The standard deviation for a single determination may be computed from the average difference using the relation (if there are 10 or more pairs) $\mathrm{S.D.} = d\sqrt{\pi}/2 = 0.886\,d = 0.1522$.

just as they would have been if there had been no persistent tendency of shipper's results to be higher (lower) than consignee's results. The signs of all of these remainders are changed to plus and the average of the absolute values is used as in Type 1. The divisor should be (n - 1) to compensate, in part, for the adjustments.

This column is concerned to show something about the way such differences behave. By this is meant, if we know the average difference between duplicates, what proportion of the observed differences can be expected to exceed twice or three times this average difference? The word "expected" here refers to differences that occur in the normal course of events and not as a result of a blunder. Individual differences will range from zero to some quantity considerably larger than the average difference.
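
The two averaging procedures described above, together with the conversion to a standard deviation quoted with Table II, may be sketched in a few lines of Python. This is an illustration only and not part of the original column: the function names and the sample pairs are hypothetical, and the factor sqrt(pi)/2 (about 0.886) assumes normally distributed errors and, as the table note says, ten or more pairs.

```python
import math


def average_difference_unordered(pairs):
    """Type 1: the two results in a pair are interchangeable.

    Each difference is taken as an absolute value; the divisor is n.
    """
    diffs = [abs(a - b) for a, b in pairs]
    return sum(diffs) / len(diffs)


def average_difference_ordered(pairs):
    """Type 2: each pair has an identifiable order (e.g. shipper, consignee).

    Returns (bias, average corrected absolute difference). The bias is the
    algebraic mean of the signed differences; it is subtracted from every
    difference, and the absolute remainders are averaged with divisor n - 1.
    """
    signed = [a - b for a, b in pairs]
    n = len(signed)
    if n < 2:
        raise ValueError("at least two pairs are needed for the n - 1 divisor")
    bias = sum(signed) / n
    remainders = [abs(d - bias) for d in signed]
    return bias, sum(remainders) / (n - 1)


def sd_single_determination(avg_diff):
    """Standard deviation of one determination from the average difference.

    Uses S.D. = d * sqrt(pi) / 2, which assumes normally distributed errors.
    """
    return avg_diff * math.sqrt(math.pi) / 2


# Hypothetical shipper/consignee results, for illustration only.
example_pairs = [(3.30, 3.10), (3.25, 3.35), (3.40, 3.40)]
print(average_difference_unordered(example_pairs))
print(average_difference_ordered(example_pairs))
print(sd_single_determination(0.1718))  # average difference from Table II
```

Applied to the average difference of 0.1718 from Table II, the last function gives about 0.152, in line with the value quoted in the table note.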

Table III. Comparison of Table II Differences with Theory

Multiples of Av. Difference    No. of Differences Exceeding This Difference
                               Obsd.    Expected
0.5 d = 0.086                  34       34.5
1.0 d = 0.172                  26       21.2
1.5 d = 0.258                  10       11.6
2.0 d = 0.344                  5        5.5
2.5 d = 0.430                  2        2.3
3.0 d = 0.515                  1        0.8
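
The "Expected" column can be reproduced under one assumption that the column does not state explicitly: that the differences are normally distributed about zero. The absolute difference then has mean s*sqrt(2/pi), where s is the standard deviation of the differences, and the proportion exceeding k times that mean works out to erfc(k/sqrt(pi)). The sketch below uses Python's standard `math.erfc`; the function names are hypothetical.

```python
import math


def proportion_exceeding(multiple):
    """Proportion of absolute differences expected to exceed
    `multiple` times the average absolute difference.

    Assumes normally distributed differences with zero mean, so that
    P(|d| > k * mean|d|) = erfc(k / sqrt(pi)).
    """
    return math.erfc(multiple / math.sqrt(math.pi))


def expected_count(multiple, n_pairs):
    """Expected number of differences, out of n_pairs, exceeding the multiple."""
    return n_pairs * proportion_exceeding(multiple)


# Multiples used in Table III, with n = 50 pairs as in Table II.
for k in (0.5, 1.0, 1.5, 2.0, 2.5, 3.0):
    print(f"{k:.1f}  {proportion_exceeding(k):.4f}  {expected_count(k, 50):.1f}")
```

The printed proportions agree closely with the corresponding entries of Table I, and multiplying by 50 gives the expected frequencies to compare with the observed counts.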


Table I shows that 69% of the differences may be expected to be larger than half the average difference; that 42.49% are, in the long run, larger than the average difference. About 1 difference in 9 exceeds twice the average difference, and about 1 difference in 60 turns up larger than three times the average difference. These larger differences occur in the normal course of events. They belong and do not indicate that one or the other of the two results is at fault. Even more important (in well behaved data) is the absence of any correlation between the correctness of the average of two results and the size of the difference between them.

Table I is useful to set up standards for asking for repeat results. It tells what it will cost in unnecessary repetition of work for any chosen level of difference at which additional samples are demanded. Thus if an observed difference exceeds 2.5 times the average difference, further samples may be examined. But 4.61% of perfectly good results may be expected to show such a large discrepancy. Therefore nearly 5% of the material is needlessly re-examined as a price for catching real blunders that cause differences of that size. Suppose this standard for taking action is set and over a long time it turns out that 8.6% of the lots are re-examined for reason of excessive differences. This would indicate that about 4% of the results (the 8.6% observed less the 4.6% expected from good results alone) really do have excessive errors for one reason or another. To catch this 4%, more than that amount of good material also has to be re-examined. There is no way to escape further work in order to decide whether a somewhat large difference comes from a blunder or is one of those that do not reflect on either duplicate. This explains why re-examination of material often only confirms the average already in hand.

Table I makes it easy to select that multiple of the average difference that will feed back any desired smaller percentage of the good work. Of course, somewhat larger real mistakes will then escape detection. The table puts on a quantitative basis decisions which are usually formed intuitively wherever this question of re-examination arises.

Table II shows 50 actual differences for duplicate spectrographic determinations of tin. Although the average difference is 0.1718, one pair of duplicates shows a difference of 0.54, or over three times the average difference. Such a difference might well arouse some skepticism regarding one or the other of the two results. The fact is that statistical theory predicts one such large difference in 50 pairs. This is not to say that this entirely removes this pair from suspicion. It does say that one should not be surprised if further determinations on this lot of alloy check the average of this somewhat discrepant pair.

Table III shows the good agreement between observed and expected frequencies for various multiples of the average difference (0.172).

INDUSTRIAL AND ENGINEERING CHEMISTRY

Announcement

The Southern Regional Graduate Summer Sessions in Statistics will be held this year at Virginia Polytechnic Institute, Blacksburg, Va., June 12 to July 20. Four universities cooperate with the Southern Regional Education Board to give each year a program with degree credits accepted by all the cooperating schools. This year an impressive seminar program has also been arranged.