14 Downloaded by IMPERIAL COLLEGE LONDON on February 18, 2015 | http://pubs.acs.org Publication Date: November 6, 1985 | doi: 10.1021/bk-1985-0292.ch014
The Alpha and Beta of Chemometrics
George T. Flatman and James W. Mullins
Environmental Monitoring Systems Laboratory, U.S. Environmental Protection Agency, Las Vegas, NV 89114
Because of the importance of their decisions and the need for statistical justification of their results, monitoring statisticians and chemometricians are being asked by their customers to use hypothesis testing, with its attention to false positives and false negatives. This paper explains the prerequisite assumptions, logic flow, and customary confidence values (alpha, beta) of classical random variable hypothesis testing. An algorithm that equates the expected loss of a false positive with the expected loss of a false negative calculates the ratio of alpha to beta, giving a site-specific beta rather than the customary, arbitrarily fixed value. Two real-world examples illustrate the extreme variability of estimated beta values. The conclusion states the need for hypothesis testing in monitoring activities and the need for site-specific alpha and beta algorithms in hypothesis testing.
Chemometrics and monitoring statistics are often used to make very exacting decisions with potentially costly and contested consequences. Conclusions are presented in the vocabulary of statistics textbooks but not always with statistical reliability. A statistically significant difference may indicate the presence of pollution, or it may indicate only that the variance or skewness of the distribution of the test statistic was underestimated. In hypothesis testing, this latter case is called a false positive, and its probability is called alpha. The power of the test to detect clean as clean is one minus alpha. A sustained null hypothesis may indicate no pollution, or it may indicate only that the sample size was too small. In hypothesis testing, this latter case is called a false negative, and its probability is called beta. The power of the test to detect polluted as polluted is one minus beta.
Chemometrics, like monitoring statistics, needs to use all of hypothesis testing: alpha, beta, the power to detect clean as clean (1 - alpha), and the power to detect dirty as dirty (1 - beta).
This chapter not subject to U.S. copyright. Published 1985, American Chemical Society
In Environmental Applications of Chemometrics; Breen, J., et al.; ACS Symposium Series; American Chemical Society: Washington, DC, 1985.
Monitoring statistics starts with a random variable design to find a toxic chemical site and describe it by a mean and a variance. If a large and intense plume is found, then geostatistics is used to find the structural pattern of the toxic substance in time and/or space. If a strong correlation structure exists, then monitoring statistics can draw a contour map of the toxic substance plume by means of spatial variable methods such as Kriging. As the environmental scientists calculate means, variances, and contour maps, the risk assessors and health scientists need to know how good these statistics are. They are asking what alpha (probability of calling clean "polluted"), what beta (probability of calling polluted "clean"), and what powers of the test (1 - alpha, the probability of calling clean "clean," and 1 - beta, the probability of calling polluted "polluted") the site selection criteria or cleanup criteria have. In response to these questions, monitoring statistics and chemometrics must apply the statistical abstractions "alpha," "beta," and "powers of the test" meaningfully. These are well defined for random variables. This presentation discusses especially the beta problems plaguing monitoring statistics in random variable hypothesis testing. The U.S. EPA's Environmental Monitoring Systems Laboratory-Las Vegas is extending "alpha," "beta," and "powers" to spatial statistics. The task is complicated by the shifts from single inference to multiple inference and from random variable to spatial variable.
The logic of hypothesis testing was developed by R. A. Fisher for the needs of the agricultural experiment station. The logic is simple and obvious but should be worked out carefully, step by step. In the rush of the workaday world, overworked scientists often fail to think through clearly the hypotheses they are testing. This can lead to a powerless experiment that proves only that the number of samples taken was too small. First the hypotheses must be chosen. There are two: (1) the null hypothesis, denoted H0 (H sub zero), which is assumed true until rejected, and (2) the alternative hypothesis, denoted H1 or HA (A for alternative), which is assumed false until the null hypothesis is rejected. The logic of the test requires that the hypotheses be "mutually exclusive" and "jointly exhaustive." "Mutually exclusive" means that only one of the hypotheses can be true; "jointly exhaustive" means that one or the other of the hypotheses must be true. Both cannot be false. The null hypothesis should reflect the status quo, so that failing to reject it means only the continuation of a present loss. For the agricultural station, failure to improve the status quo means that the old brand of seed, pesticide, or fertilizer is used when, in fact, a new and better brand is available. This is a status quo loss of productivity (e.g., 10 percent lower yield), but one which the farmer unknowingly accepts. The loss from wrongly adopting the alternative hypothesis might be 90 percent of the crop destroyed by disease or insects to which the old strain was immune, or a new fertilizer or pesticide that is found to leave a carcinogenic residue exceeding an action level in 90 percent of the crop.
Obviously, in the extreme case the status quo loss is smaller than the potential alternative loss. Management science statistics often uses worst-case expected losses in evaluating alternatives: if the decision maker can tolerate the worst-case loss, then he can use that alternative. The expected value of the losses will be very important in the discussion of beta. Will the loss from calling a polluted area clean be minimal among the losses associated with the tests of pollution hypotheses? For monitoring toxic substances, such as in a dioxin cleanup, assume we have calculated an x̄ and s for each unit area or rectangular panel potentially needing cleanup and have been given an action level of 1 ppb. The action level is a constant and has no variance. The x̄ and s are computed from a field triplicate of a composite of subsamples equally spaced on a uniform grid covering the panel. The null hypothesis says "no difference" and represents the status quo. Hopefully, nonpolluted (less than 1 ppb) is the status quo, and polluted (equal to or larger than 1 ppb) is the exception.
Let
$\bar{x}_i$ be the mean in ppb from panel $i$,
$s_i$ be the standard deviation in ppb from panel $i$,
$\mu_i$ be the true mean concentration in ppb of panel $i$,
TS be a test statistic which approximates Student's t-distribution,
$t_{df,\alpha}$ be the table value of the t-distribution for the appropriate degrees of freedom (df) and significance level alpha ($\alpha$) for a one-tailed test, and
CV be the critical value $t_{df,\alpha}$ from the t-table.
Null hypothesis (this panel is clean): $H_0\colon \mu_i < 1$ ppb
Alternative hypothesis (this panel is polluted): $H_A\colon \mu_i \ge 1$ ppb
$$df = 3 - 1 = 2, \qquad \alpha = .05$$
$$TS = \frac{\bar{x}_i - 1}{s_i/\sqrt{3}}, \qquad CV = t_{df,\alpha} = t_{2,.05}$$
If TS < CV, there is no reason to reject the null hypothesis; if TS ≥ CV, the null hypothesis is rejected, implying that the alternative hypothesis is true.
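A minimal Python sketch of the panel decision rule above, assuming the three field-triplicate results are roughly normal and using the standard one-sample form of TS. The critical value 2.920 is the one-tailed t-table entry for df = 2 and α = .05, hard-coded here so the sketch needs no statistics package; the function name `panel_test` and the sample concentrations are hypothetical.

```python
import math
import statistics

# One-tailed critical value t_{df=2, alpha=.05} from a standard t-table,
# hard-coded (an assumption of this sketch) so no SciPy is needed.
T_CRIT_DF2_ALPHA05 = 2.920

# Action level from the text: a constant with no variance.
ACTION_LEVEL_PPB = 1.0


def panel_test(triplicate_ppb, action_level=ACTION_LEVEL_PPB, cv=T_CRIT_DF2_ALPHA05):
    """Hypothetical helper: return (TS, reject_null) for one panel.

    H0: panel mean < action level (clean)
    HA: panel mean >= action level (polluted)
    """
    n = len(triplicate_ppb)
    if n != 3:
        raise ValueError("cv is tabulated for df = 2, i.e. a field triplicate")
    x_bar = statistics.mean(triplicate_ppb)
    s = statistics.stdev(triplicate_ppb)  # sample SD with divisor n - 1
    if s == 0:
        raise ValueError("zero spread: TS is undefined")
    ts = (x_bar - action_level) / (s / math.sqrt(n))
    # Reject H0 (call the panel polluted) only when TS >= CV.
    return ts, ts >= cv
```

With the illustrative triplicate 1.2, 1.5, 1.8 ppb the mean is 1.5 ppb, half again the action level, yet TS is about 2.89, just short of CV, so the null hypothesis is sustained and the panel is called clean. A tighter triplicate such as 1.4, 1.5, 1.6 ppb gives TS of about 8.66 and rejects it. With only three samples, a panel above the action level can easily be called clean, which is the beta problem this paper is concerned with.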
In Figure 1 the decision space is represented by the bottom horizontal line, which is divided by the vertical line representing the critical value (CV). The segment of the line less than CV represents the part of the decision space sustaining the null hypothesis. The segment of the line equal to or greater than CV represents the part of the decision space rejecting the null hypothesis and accepting the alternative hypothesis. The upper horizontal line represents the real line and the value of the test statistic (TS). Again the line is divided by the value of CV. The height (ordinate) of each curve represents the probability density of the test statistic (TS) at the value of the abscissa. The bell shape of the curves shows that TS has a high probability of taking abscissa values near the center of each distribution (H0 or HA) and a low probability of taking values in the tails. The dashed shaded area represents the distribution of TS under the null hypothesis, and the dotted shaded area represents the distribution of TS under the alternative hypothesis. Classical statistics assumes that the test statistic is identically distributed, with equal variances, under both hypotheses; therefore, the shaded areas have the same shape and spread but different locations (different means). Note that the decision space (bottom line) is discrete, but the "real world" data of the real line and the shaded distributions overlap. This overlap gives rise to the possibility of error, labeled in Figure 1 as alpha, the dashed area right of CV, and beta, the dotted area left of CV. Alpha and beta appear equal in Figure 1; their relative size is the concern of this paper.
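The tail areas of Figure 1 can be computed directly. The sketch below is a simplification: it treats both curves as normal with equal standard deviation (the classical assumption described above) rather than as t-distributions, and the location of the alternative curve, `mean_ha = 3.0` in standardized units, is a hypothetical choice. The helper names are likewise illustrative.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def error_rates(cv, mean_h0=0.0, mean_ha=3.0, sd=1.0):
    """Areas of Figure 1, assuming two normal curves with equal sd.

    alpha:   H0 area at or right of CV (clean called polluted, type I)
    1-alpha: H0 area left of CV (clean called clean)
    beta:    HA area left of CV (polluted called clean, type II)
    1-beta:  HA area at or right of CV (power of the test)
    """
    alpha = 1.0 - normal_cdf((cv - mean_h0) / sd)
    beta = normal_cdf((cv - mean_ha) / sd)
    return {
        "alpha": alpha,
        "1-alpha": 1.0 - alpha,
        "beta": beta,
        "1-beta": 1.0 - beta,
    }
```

With CV = 1.5, midway between the two hypothetical means, alpha and beta are both about .067, as they appear in Figure 1. Moving CV to 2.0 lowers alpha to about .023 but raises beta to about .159, which is the trade-off between the two errors at a fixed sample size.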
Since the area (cumulative probability) of a probability distribution must add to one, the area of no error (correct decision) is represented by the dashed area below CV, one minus alpha, and the dotted area above CV, one minus beta. Then:
Alpha (α) is the probability of calling a clean panel polluted, or the type I error, shown as the dashed area to the right of CV in Figure 1.
Beta (β) is the probability of calling a polluted panel clean, or the type II error, shown as the dotted area to the left of CV in Figure 1.
One minus alpha (1 - α) is the probability of calling a clean panel clean, shown as the dashed area to the left of CV in Figure 1.
One minus beta (1 - β) is the probability of calling a polluted panel polluted; it is called the power of the test and is shown as the dotted area to the right of CV in Figure 1.
The conventional value for alpha is .05, giving a .95 probability of calling a clean panel clean. The probability beta (β) depends on the true value of the mean in the alternative distribution and on the testing assumptions of (1) equal standard deviations and (2) identical frequency distributions.
Figure 1. The lower line represents the discrete decision space, the upper line represents the real values that the test statistic (TS) may take, and the overlapping shaded areas represent the probability that the test statistic takes these real values under each hypothesis.
Since the difference between the null and alternative hypotheses is the difference between the dispersion of no pollutant and the dispersion of a pollutant, it seems reasonable that there would be a different standard deviation and frequency distribution under each, contradicting the assumptions of hypothesis testing; however, answering this problem is beyond the scope of this paper. Assuming equal alternative standard deviation and distribution, an acceptable beta (β) has classically been set at .20 or less in USDA Forest Service (USFS) Experiment Station work. However, this beta, four times larger than alpha, is based on the assumption that a type II error has a lower loss value. What are the loss values of (1) cleaning a panel that is already clean (type I) and (2) leaving dirty a panel that is in fact polluted (type II)? Management science statistics uses expected loss to make probabilistic losses comparable:
$$E(\text{LOSS}) = \text{probability of loss} \times \text{value of loss}$$
For type I:
$$E(\text{LOSS}) = \alpha \times \text{value of loss from committing a type I error}$$
For type II:
$$E(\text{LOSS}) = \beta \times \text{value of loss from committing a type II error}$$
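A minimal sketch of the two expected-loss formulas above. The function name is hypothetical, and the dollar figures in the usage note are invented round numbers, chosen only to show what the classical pairing α = .05, β = .20 implies when the two errors cost the same.

```python
def expected_losses(alpha, beta, loss_type_i, loss_type_ii):
    """E(LOSS) = probability of loss x value of loss, for each error type.

    Returns (expected type I loss, expected type II loss).
    """
    for p in (alpha, beta):
        if not 0.0 <= p <= 1.0:
            raise ValueError("alpha and beta must be probabilities")
    return alpha * loss_type_i, beta * loss_type_ii
```

With equal hypothetical losses of 500 dollars, `expected_losses(0.05, 0.20, 500, 500)` gives about 25 dollars for type I and 100 dollars for type II. The classical beta balances the expected losses only if a type II error costs about a quarter of a type I error.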
With a fixed sample size, the magnitudes of α and β are inversely related; that is, if alpha decreases by moving the CV to the right, then beta increases, and if alpha increases, then beta decreases. Increasing the sample size would reduce both alpha and beta, but samples, and especially their analyses, cost money. Intuitively, the minimal actual loss should occur when the expected losses are equal. So the relative sizes of alpha and beta should be found by equating the expected loss from type I error with the expected loss from type II error:
$$E(\text{type I loss}) = E(\text{type II loss})$$
$$\alpha \times (\text{loss from type I error}) = \beta \times (\text{loss from type II error})$$
$$\beta : \alpha :: (\text{loss from type I error}) : (\text{loss from type II error})$$
For example, in a soil cleanup, the loss from a type I error, cleaning a clean panel, might be the cost of scraping up six inches of soil within the panel, trucking the soil away, and disposing of it; probably a cost measured in hundreds to thousands of dollars. The loss from a type II error, leaving a polluted panel, would have a wide range of potential costs, from nothing to adverse human health effects. I suggest that, realistically, the cost of the health effects is at least as high as the cost of the unneeded cleanup, or on the order of hundreds to thousands of dollars. Mathematically this means:
$$\beta : \alpha :: (\text{loss from type I error}) : (\text{loss from type II error})$$
$$\beta : \alpha :: (\text{cost of cleaning a panel}) : (\text{cost of human health effects})$$
$$\beta : \alpha :: (\text{hundreds of dollars}) : (\text{hundreds of dollars}) \implies \beta = \alpha$$
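The ratio rule above can be written as a one-line solve for beta. In this sketch alpha is fixed at the conventional .05, the function names are hypothetical, and the words "hundreds" and "tens of thousands" in the paper's two examples are stood in for by the invented round figures 500, 100, and 10,000 dollars.

```python
# Classical USFS Experiment Station value for beta cited in the text.
CONVENTIONAL_BETA = 0.20


def site_specific_beta(alpha, loss_type_i, loss_type_ii):
    """Solve alpha * L_I = beta * L_II for beta (beta : alpha :: L_I : L_II)."""
    if loss_type_i < 0 or loss_type_ii <= 0:
        raise ValueError("L_I must be non-negative and L_II positive")
    return alpha * loss_type_i / loss_type_ii


def conventional_excess(beta_site):
    """How many times larger the conventional beta of .20 is than beta_site."""
    return CONVENTIONAL_BETA / beta_site
```

For the soil example, `site_specific_beta(0.05, 500, 500)` returns .05, four times smaller than the conventional .20. For the ground-water example that follows, `site_specific_beta(0.05, 100, 10_000)` returns .0005, 400 times smaller than the conventional value.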
Note that if beta equals alpha, beta is one fourth of the traditionally allowed type II error (i.e., .05 instead of .20). This shows that the unthinking use of textbook examples or traditional confidence levels can be dangerous to the environment and public health. Pollution monitoring statistics must have its own beta calculations.
Next, apply this analysis of hypothesis testing logic to the proposed monitoring of a RCRA dump site using the Fisher-Behrens test (another test based on Student's t-distribution). If a clean ground-water sample is diagnosed as polluted (type I error), the corrective action is resampling and reanalysis, which would cost a few hundred dollars; but diagnosing a polluted ground-water sample as clean (type II error) may allow a ground-water pollution plume to grow to a size that will require a cleanup costing thousands or tens of thousands of dollars. Mathematically this means:
$$\beta : \alpha :: (\text{loss from type I error}) : (\text{loss from type II error})$$
$$\beta : \alpha :: (\text{cost of resampling and analysis}) : (\text{cost of ground-water cleanup})$$
$$\beta : \alpha :: (\text{hundreds}) : (\text{tens of thousands})$$
$$\beta : \alpha :: 1 : 100$$
Note that in this case beta should be one one-hundredth of alpha. Again, the unthinking use of textbook examples or traditional confidence levels can be dangerous to the environment and public health. Even the beta calculated for the soil cleanup example would be two orders of magnitude too large here.
In conclusion, chemometrics, like monitoring statistics, requires an alpha and beta that differ from classical values. Beta especially must be calculated from statistical expectations for each application. Conventional values of beta, or values from a previous pollution site, may be incorrect by orders of magnitude for the current site.
Statistics is not a tool that can be used by rote; thorough understanding and site-specific thought are essential. The alpha and beta of monitoring statistics are site specific. If alpha is an acceptable type I error for the test, then one minus alpha is an acceptable power for calling clean "clean," and if beta is an acceptable type II error for the test, then one minus beta is an acceptable power for calling polluted "polluted." All four values must be thought out.
RECEIVED July 17, 1985