AN ECONOMIC BASIS FOR SETTING CONFIDENCE LIMITS
The Influence of Subjectivity in Hypothesis Testing
FRANK P. VANCE
Ind. Eng. Chem., 1966

Every choice between alternatives is beset by uncertainties because absolute information is never available. This is illustrated by the diagram below. The body of information represented by FACT is a mysterious area which we attempt to define by analyzing random samples. By increasing sample size, we can cause α and β to approach zero. Robert Schlaifer, in "Probability and Statistics for Business Decisions" (2), complains that the techniques of "objectivist" statistics, while providing formal procedures for controlling the frequency of Type I and Type II decision errors, provide no guidance as to whether these should be set equal or unequal, and at the 1%, 5%, or higher level. The point is worthwhile. The purpose of this study is to present an analysis of the factors influencing the economic limit for these errors by taking account of expenses due to the two types of error and the expense of gathering the necessary data.

Customarily, the magnitude of Type I error is set at 5% for want of a better limit. When a 95% confidence interval is constructed, it is equivalent to setting Type I error, usually denoted α, at 5%. Type II error is often ignored. To achieve systematization in the choice of magnitude of decision errors, the lack of which Schlaifer decries, it is necessary to evaluate the dollar consequences of each type of error, the expense of making measurements, and the frequency distribution parameters of the factor being measured. The last item would include the mean and standard deviation if the variable is normally distributed; but in some circumstances the variable might have a Poisson, binomial, or other frequency distribution. The one most often encountered, particularly for chemical or physical measurements, is the normal. In any case, many frequency distributions can be closely approximated by the normal. However, this factor should not be disregarded. The standard deviation is obtained from the square root of the variance estimated by the computation:

σ̂² = [Σxi² - (Σxi)²/N]/(N - 1)

where
xi = results of individual samples
N = sample size
σ̂² = variance estimate

The attempt to determine the FACT concerning an observation is beset with uncertainties. Every CHOICE is subject to possible errors, defined as shown in the table. The nature of decision errors is shown further in the diagram, where x̄c is the critical value used to determine whether a sample belongs to the population on the right.
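As a modern aside (not part of the 1966 article), the variance estimate just defined can be computed directly; the bench-test values below are invented for illustration:

```python
import math

def variance_estimate(x):
    """Computational form used in the article:
    sigma-hat^2 = (sum(x_i^2) - (sum x_i)^2 / N) / (N - 1)."""
    n = len(x)
    s = sum(x)
    ss = sum(v * v for v in x)
    return (ss - s * s / n) / (n - 1)

results = [48.0, 52.0, 50.0, 51.0, 49.0]  # hypothetical bench-test results
sigma_hat = math.sqrt(variance_estimate(results))
print(sigma_hat)  # about 1.58
```

The one-pass form above matches the article's computation; for long data series a two-pass formula (subtracting the mean first) is numerically safer.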

The larger N can be made, the more reliable will be the risk calculations in which σ̂ is subsequently employed. A formal procedure for finding the minimized risk, or point of diminished return on sample size, will be developed. Fundamental theory and derivation of equations are given in a later section.

Risks

The decision to reprocess a lot of monomer, erroneously judged not to be polymerization grade (Type I error); the decision to ship a lot of material erroneously judged to be within specification limits (Type II error) (1); the decision to change operating variables in a polymer washing step, having erroneously judged the ash level to be too high (Type I error); and the like are actions that must of necessity be based upon the inexact information provided by the laboratory measurements. In each case the consequence is expense that could have been reduced had more reliable information been available, or had the information available been more reliably interpreted. Erroneous decisions cannot be eliminated entirely, except at intolerable expense. Therefore, it is important to detect the point of diminished return of useful information from sampling.

Calculation of Risks

The term "calculated risk" implies that the hazards of erroneous decisions have been nicely balanced. These hazards can be expressed in terms of expense, for example, that due to reprocessing a lot of polymerizable monomer or to shutdown of a reactor system due to charging unpolymerizable monomer. The null hypothesis would be that the monomer is polymer grade; the alternative would be that it is sufficiently far off-specification to cause loss of reaction. In process surveillance, this type of decision arises repeatedly. The dollar consequences of each type of decision error can be evaluated. In the case of judging the utility of a lot of monomer, the measurement performed usually consists of a bench-scale polymerization test using a sample of the monomer and standard operating conditions, including catalyst and solvent if any. This test has some mean and standard deviation based on long-term use. On the basis of an average of N such bench tests, the lot of monomer will be charged to the polymerization system, or it will be reprocessed. Denoting C1 as the dollar expense of reprocessing a lot of monomer, C2 the expense caused by reactor shutdown, and C3 the unit expense of performing the bench test, we have

RI = C1α + C3N   (2)

RII = C2β + C3N   (3)

where α and β are the frequency of Type I and Type II decision errors shown in the matrix in the illustration above. As shown in the derivation section below, when the null hypothesis is true and the monomer is, in fact, polymer grade:

α = F(MaN^(1/2))

On the other hand, when the alternate hypothesis is true:

β = F[-MN^(1/2)(a + 1)]

where
M = distance between the null and alternate hypotheses in units of the standard deviation, that is, (μ0 - μ1)/σ̂
a = critical value for the test result average, normalized to the distance between the two hypotheses, (x̄c - μ0)/(μ0 - μ1)
F(t) = the fractiles of the cumulative normal distribution

Tingey (3) has shown a mini-max solution to a similar problem involving material balance in processing material of high dollar value, e.g., enriched uranium, as a means of optimization of the risks in apprehending diversion of the material. In his case, the over-all standard deviation of the material imbalance calculation was used. In the present illustration, the decisions are of a somewhat different nature in that they have been extended generally to process surveillance and apprehension of the point of diminished return from increasing sample size. Further, since a reliable basis exists for favoring the null hypothesis because of conservative process design, this prior information is used to show the influence of subjectivity. Thus, as an illustration, let the following properties of a system apply:

C1 = $10,000
C2 = $45,000
C3 = $300
μ0 - μ1 = 50
σ̂ = 50

from which M = 1. Figure 1 has been constructed to show the influence of this weighting of the two types of decision errors at the indicated expense of gathering data for the decision rule. The abscissa is in units of a = (x̄c - μ0)/(μ0 - μ1) and the ordinate is in units of R/C1. As the critical value for the test result average, x̄c, approaches the alternate mean, μ1, risks due to Type II errors increase, and conversely for Type I. Note that when a = 0, x̄c = μ0, and when a = -1, x̄c = μ1. The lower asymptotes for indicated sample sizes equal 0.03N because C3/C1 = 0.03, and by moving x̄c far enough to the left or right, α or β can be made to approach zero (see Derivation of Equations). At the intersection of the two curves, RI/C1 and RII/C1, the maximum risk has been minimized, whichever hypothesis be true. The following table can be constructed from Figure 1:

N      a        x̄c - μ0    RI/C1 = RII/C1
1     +0.14      +7.0           0.58
2     -0.12      -6.0           0.55
5     -0.30     -15.0           0.40
10    -0.40     -20.0           0.41
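The curves of Figure 1 and their intersections can be reproduced numerically. The sketch below is an illustration, not the author's computation: it uses the example's normalized figures M = 1, g = C2/C1 = 4.5, h = C3/C1 = 0.03, implements F with `math.erf`, and bisects on RI/C1 - RII/C1, which increases with a.

```python
import math

def F(t):
    """Cumulative standard normal distribution."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def r1(a, n, m=1.0, h=0.03):
    """R_I / C1 = F(M a N^(1/2)) + h N."""
    return F(m * a * math.sqrt(n)) + h * n

def r2(a, n, m=1.0, g=4.5, h=0.03):
    """R_II / C1 = g F(-M N^(1/2) (a + 1)) + h N."""
    return g * F(-m * math.sqrt(n) * (a + 1.0)) + h * n

def minimax_a(n, lo=-1.0, hi=0.5):
    """Bisect on r1 - r2, which is increasing in a; the root is the minimax point."""
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if r1(mid, n) - r2(mid, n) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for n in (1, 2, 5, 10):
    a = minimax_a(n)
    print(n, round(a, 2), round(r1(a, n), 2))
```

The resulting (a, R/C1) pairs land close to the values tabulated from Figure 1, e.g., roughly (-0.30, 0.40) at N = 5; small differences reflect reading the published figure.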


The lower asymptote for RI/C1 is reached when the probability of Type I error is zero, that is, when a has large negative values. Accordingly, risk would be limited to the expense of collecting data, in this case 0.03N. The upper asymptote is reached when Type I error equals 1, that is, when a has large positive values. Therefore total risk equals C1 plus the expense of collecting data. Because risks are shown normalized to C1, the upper asymptotes are at 1.0 + 0.03N. Similarly for Type II errors (β = 0), when a has large positive values and Type II error equals zero, the lower asymptote for RII/C1 is the same as for RI/C1, namely 0.03N. For large negative values of a, the probability of Type II error approaches unity, and risk equals C2/C1 + 0.03N = 4.5 + 0.03N. However repetitive decisions are reached, the following factors must be taken into account:

-Expense due to erroneous decisions; i.e., Type I and Type II as described
-Expense attributable to the information collection process

If these data are reliably obtained and, in addition, the standard deviation of the measurement is reliably evaluated, then we know how to assemble this information to find the decision rule to lead to minimization of the maximum risk.
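The asymptote values quoted above follow directly from the two normalized risk expressions; a quick numerical check (same illustrative figures g = 4.5, h = 0.03, M = 1; F via `math.erf`):

```python
import math

def F(t):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def r1_norm(a, n, m=1.0, h=0.03):
    """R_I / C1."""
    return F(m * a * math.sqrt(n)) + h * n

def r2_norm(a, n, m=1.0, g=4.5, h=0.03):
    """R_II / C1."""
    return g * F(-m * math.sqrt(n) * (a + 1.0)) + h * n

n = 5
print(r1_norm(-8.0, n))  # lower asymptote: h*N = 0.15
print(r1_norm(+8.0, n))  # upper asymptote: 1 + h*N = 1.15
print(r2_norm(+8.0, n))  # lower asymptote: h*N = 0.15
print(r2_norm(-8.0, n))  # upper asymptote: g + h*N = 4.65
```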

Derivation of Equations

The construction of statistical hypothesis tests is well described in the literature of statistical methodology (Dixon and Massey, 1). Schematically, the principles are illustrated by the diagram on page 30. The two frequency distributions shown are of sample means, x̄ = Σxi/N, where independent samples, i = 1, 2, ..., N, are taken. Under the null hypothesis, the population mean, μ, equals μ0. Alternatively, the true mean could be μ1. To determine which it is, the N samples are taken. On the basis of the average computed from these results, it will be decided from which population the samples came (or did not come). When in fact μ = μ0, it will be decided with frequency α that μ = μ1; conversely, when in fact μ = μ1, it will be decided with frequency β that μ = μ0. In the usual test, the levels of α and β are set essentially arbitrarily, as Schlaifer has complained (2). Thus, it may be argued that an element of subjectivity enters an objective hypothesis test as soon as the levels of Type I and Type II errors are ventured subjectively. However, the levels of α and β should never be set without quantitative evaluation of the consequences of each. Then, as shown by the example on page 33, the levels usually seen, for example 5%, 1%, and the like, are redundant.

The evaluation of risks in decision-making follows from knowledge of the consequences of the two types of error, coupled with their expected frequencies. The dollar consequences of each error should be available from cost accounting. The expected frequencies are found as shown below. The frequency distribution of most measurements can be closely approximated by the Gaussian, the fractiles of which, for zero mean and unit variance, have been widely tabulated as the normal distribution. Thus, if the xi's have a Gaussian frequency distribution with mean, μ, and variance, σ², then

t = (xi - μ)/σ

has the normal frequency distribution with zero mean and unit variance. Accordingly, knowing μ and σ, we can find the expected frequencies of Type I and Type II errors. When we refer to the diagram on page 30, it is seen that

α = F(tα)

where F(t) symbolizes the fractile of the cumulative normal. Further,

β = 1 - F(tβ) = F(-tβ)

with

tα = (x̄c - μ0)/σx̄ and tβ = (x̄c - μ1)/σx̄

where
μ0 = mean under the null hypothesis, i.e., H0: μ = μ0
μ1 = mean under the alternate hypothesis, i.e., H1: μ = μ1
σx̄ = σ/N^(1/2)
x̄c = critical value for the decision rule; if x̄ < x̄c, reject μ0; otherwise, reject μ1

For convenience we denote

M = (μ0 - μ1)/σ

and

a = (x̄c - μ0)/(μ0 - μ1)

We have then

tα = MaN^(1/2)

tβ = MN^(1/2)(a + 1)
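In this notation the two error frequencies take one line each. A sketch (F implemented with `math.erf`; the parameter values are arbitrary illustrations, with a = -0.5 placing the cutoff midway between the hypotheses so that α = β):

```python
import math

def F(t):
    """Cumulative standard normal distribution."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def error_rates(m, a, n):
    """alpha = F(M a N^(1/2)); beta = F(-M N^(1/2) (a + 1))."""
    alpha = F(m * a * math.sqrt(n))
    beta = F(-m * math.sqrt(n) * (a + 1.0))
    return alpha, beta

alpha, beta = error_rates(m=1.0, a=-0.5, n=4)
print(alpha, beta)  # both equal F(-1), about 0.159
```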

When μ = μ0, risk due to erroneous decisions is given by:

RI = C1F(MaN^(1/2)) + C3N

and when μ = μ1:

RII = C2F[-MN^(1/2)(a + 1)] + C3N

where
C1 = dollar expense due to Type I error
C2 = dollar expense due to Type II error
C3 = unit expense due to each measurement performed to produce the xi

Further, for convenience, we can normalize all risks to C1. If we let C2/C1 = g and C3/C1 = h, then:

RI/C1 = F(MaN^(1/2)) + hN   (7)

RII/C1 = gF[-MN^(1/2)(a + 1)] + hN   (8)

Thus, the subjectivity in the choice of levels ventured for μ0 and μ1 would have been eliminated from the risk evaluation, if we knew which risk function to use. Clearly, if we knew which to use, the hypothesis test would be redundant. Accordingly, it is necessary to introduce a weighting factor, γ, which physically represents the unconditional probability that the null hypothesis is true, i.e., μ = μ0. This allows us to write

RT/C1 = γF(MaN^(1/2)) + (1 - γ)gF[-MN^(1/2)(a + 1)] + hN   (9)

Note that, in any operating process, the true process mean may very well be a random variable, fluctuating over a range that embraces both μ0 and μ1. However, we are interested only in situations where the true mean gets as low as μ1, as in the illustrative example of the lot of monomer mentioned earlier. In such an application, when μ falls below μ0, reduction in reaction rate, if it occurs, can usually be compensated for by increasing catalyst rate, temperature, or by manipulation of other primary operating variables. In the hypothetical situation discussed here, these devices fail when μ reaches μ1, as the reaction dies completely, resulting in expense C2. Thus, μ0 and μ1 are treated mathematically as though they were dichotomous.

Equation 9 is differentiable, whence we can find conditions for zero slope in terms of sample size, N, and critical value, x̄c (expressed as a). First we note that

f(MaN^(1/2)) = (1/√2π) exp[-(MaN^(1/2))²/2]

and

f[-MN^(1/2)(a + 1)] = (1/√2π) exp[-M²N(a + 1)²/2]

where f(t) = dF(t)/dt; i.e., f(t) is the ordinate of the normal curve at t, and F(t) is the area to the left of t under the normal curve. Collecting terms and simplifying, we have from ∂(RT/C1)/∂N:

γaf(MaN^(1/2)) - (1 - γ)g(a + 1)f[-MN^(1/2)(a + 1)] + 2hN^(1/2)/M = 0

We have from ∂(RT/C1)/∂a the following (having substituted and taken logarithms):

-(1/2)[M²N(a + 1)² - M²a²N] = ln[γ/((1 - γ)g)]

whence

a = ln[((1 - γ)/γ)g]/(M²N) - 0.5   (10)

Further, substituting ∂(RT/C1)/∂a = 0 into ∂(RT/C1)/∂N = 0, we have

N^(1/2) = (M/2h)(1 - γ)gf[-MN^(1/2)(a + 1)]   (11)
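The iterative simultaneous solution of Equations 10 and 11 can be sketched as a fixed-point loop. This is an illustration, not the author's procedure: it reuses the example's g = 4.5, h = 0.03, M = 1, assumes a prior weight γ = 0.5, and damps the N update geometrically to stabilize the alternation.

```python
import math

SQRT_2PI = math.sqrt(2.0 * math.pi)

def f(t):
    """Ordinate of the standard normal curve, f(t) = dF(t)/dt."""
    return math.exp(-0.5 * t * t) / SQRT_2PI

def solve(gamma, g=4.5, h=0.03, m=1.0, n=5.0, iters=200):
    """Alternate Eq. 10 (a from N) and Eq. 11 (N from a), damping N."""
    for _ in range(iters):
        a = math.log(((1.0 - gamma) / gamma) * g) / (m * m * n) - 0.5    # Eq. 10
        root_n = (m / (2.0 * h)) * (1.0 - gamma) * g * \
            f(-m * math.sqrt(n) * (a + 1.0))                             # Eq. 11
        n = math.sqrt(n) * root_n  # geometric mean of old N and root_n**2
    return a, n

a_opt, n_opt = solve(gamma=0.5)
print(a_opt, n_opt)
```

Under these assumed inputs the loop settles near a = -0.27 and N = 7 tests; raising γ (stronger prior belief in the null hypothesis) shrinks the optimal sample size in this sketch.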

Now, Equation 10 gives admissible values for a [equals (x̄c - μ0)/(μ0 - μ1)] in terms of γ, g, M, and N. Equation 11 gives values for N in terms of γ, g, M, h, a, and N. Iterative simultaneous solution of Equations 10 and 11 provides relatively quick solutions for a, giving x̄c, and the sample size corresponding to minimum risk, as illustrated by the example given.

LITERATURE CITED

(1) Dixon, W. J., Massey, F. J., "Introduction to Statistical Analysis," McGraw-Hill, New York, 1951.
(2) Schlaifer, Robert, "Probability and Statistics for Business Decisions," McGraw-Hill, New York, 1954.
(3) Tingey, F. H., Ind. Eng. Chem. 54 (4), 36 (1962).
