Comment on “Parametric Distributions of Regional Lake Chemistry: Fitted and Derived”

Environ. Sci. Technol. 1988, 22, 1367-1368

Registry No. Water, 7732-18-5; atrazine, 1912-24-9; p-xylene, 106-42-3.

Received for review January 4, 1988. Accepted June 6, 1988. The research was supported by Biomedical Sciences Research Grant No. 34131 and by NC Water Resources Research Institute Grant No. 86-10-70069.

CORRESPONDENCE

Comment on “Parametric Distributions of Regional Lake Chemistry: Fitted and Derived”

SIR: In a recent article by Small, Sutton, and Milke (1), the authors have done an excellent job of analyzing water quality data. However, they appear to have misinterpreted the use of the Kolmogorov-Smirnov statistic as a measure of the goodness of fit of their derived distributions to these data. On page 197 it is stated that “As a quantitative measure of fit, the Kolmogorov-Smirnov statistic, the maximum expected difference (D) between the observed and fitted cumulative probabilities, has a value ranging from 0.13 to 0.11 for sample sizes ranging from n = 102 to n = 155 (significance level 0.05).” The authors appear to have used the approximation formula for D given by Massey (2) and Birnbaum (3) for n > 100, $D = 1.36/n^{1/2}$ at the 0.05 significance level. This formula is correct whenever the hypothesized distribution is completely specified, that is, when the parameters of the distribution are fixed in advance without using any information contained in the data set itself. Because the authors derive the three log normal parameters (mean, standard deviation, and lower bound) from their data, they may be fitting some of the noise in the data set; consequently, one must expect a closer fit (a smaller D) than for a completely specified distribution, so the critical value of D at the 0.05 significance level must be correspondingly lower. Lilliefors (4) has performed a Monte Carlo analysis of this effect on D for normal transform distributions where two parameters (mean and standard deviation) are determined from random samples of size n. The modified critical values of D at the 0.05 significance level for n > 30 are described by the approximation formula $D = 0.886/n^{1/2}$. With this usage, the D values for n = 102 and n = 155 must be less than 0.088 and 0.071, respectively, if the distributions are not to be rejected at the 0.05 level of significance.
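The two critical-value formulas, and the reason that estimating parameters from the data matters, can be checked numerically. The following is a minimal Python sketch, assuming a two-parameter normal fitted to already-transformed data using the sample mean and standard deviation; the function names are hypothetical illustrations and the code is not taken from Small et al. or from Lilliefors. It computes D, evaluates both approximation formulas, and runs a small Lilliefors-style Monte Carlo estimate of the 0.05 critical value when the mean and standard deviation are re-estimated for every simulated sample.

```python
import math
import random
from statistics import NormalDist, fmean


def ks_statistic(sample, cdf):
    """Kolmogorov-Smirnov D = max |F_n(x) - F(x)| for a continuous CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF steps from i/n to (i + 1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def critical_d_specified(n):
    """Approximate 0.05 critical D for a fully specified distribution (n > 100)."""
    return 1.36 / math.sqrt(n)


def critical_d_lilliefors(n):
    """Approximate 0.05 critical D when mean and sd are estimated (n > 30)."""
    return 0.886 / math.sqrt(n)


def ks_estimated_normal(sample):
    """D against a normal whose mean and sd are estimated from the same sample."""
    m = fmean(sample)
    s = math.sqrt(sum((x - m) ** 2 for x in sample) / (len(sample) - 1))
    return ks_statistic(sample, NormalDist(m, s).cdf)


def simulated_critical_d(n, trials=2000, seed=1):
    """Monte Carlo 95th percentile of D under the null, parameters re-estimated."""
    rng = random.Random(seed)
    ds = sorted(
        ks_estimated_normal([rng.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(trials)
    )
    return ds[int(0.95 * trials)]


if __name__ == "__main__":
    for n in (102, 155):
        print(n, round(critical_d_specified(n), 3), round(critical_d_lilliefors(n), 3))
    print(round(simulated_critical_d(102), 3))
```

The first two printed lines are `102 0.135 0.088` and `155 0.109 0.071`, which reproduce the values in the letter. The simulated critical value for n = 102 should fall near the Lilliefors figure rather than near 0.135; its exact value depends on the seed and the number of trials, so it is only a rough check.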
Since these values are based on two parameters fit to a data set and the authors have fit three parameters, their critical D values must be even lower than these; however, to the writer’s knowledge, the extension of Lilliefors’ work (4) to three parameters determined from a normal transform data set has not appeared in the statistical literature. The authors could therefore compare their maximum D values with the critical values given by the Lilliefors formula to test for rejection at the 0.05 level of significance. If the models are not rejected on this basis, the authors should then use another statistical test, such as $\chi^2$, which can account for the three parameters determined from the data set by a reduction in the degrees of freedom. Because of the controversy over the choice of log normal models for fitting environmental data (5), the authors should justify to readers their use of these three-parameter log normal models by establishing that they are not rejected at the 0.05 level using an appropriate statistical test.
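The degrees-of-freedom reduction for a $\chi^2$ test can be sketched as follows. This is a minimal Python illustration, assuming the data have been grouped into bins and that the three-parameter log normal is parameterized so that ln(x − τ) is normal with mean μ and standard deviation σ, where τ is the lower bound; the function names are hypothetical. The critical value itself is not computed here and would come from a $\chi^2$ table or a library routine such as `scipy.stats.chi2.ppf`. A known caveat: the rule df = k − 1 − p strictly assumes the p parameters are estimated from the binned counts; with estimates from ungrouped data, the null distribution lies between $\chi^2$ with k − 1 − p and k − 1 degrees of freedom.

```python
import math
from statistics import NormalDist


def lognormal3_cdf(x, mu, sigma, lower):
    """CDF of a three-parameter log normal: ln(x - lower) ~ Normal(mu, sigma)."""
    if x <= lower:
        return 0.0
    if math.isinf(x):
        return 1.0
    return NormalDist(mu, sigma).cdf(math.log(x - lower))


def expected_counts(edges, n, cdf):
    """Expected counts n * [F(b) - F(a)] for consecutive bin edges a < b."""
    return [n * (cdf(b) - cdf(a)) for a, b in zip(edges, edges[1:])]


def chi_square_gof(observed, expected, n_estimated_params):
    """Pearson chi-square statistic and its reduced degrees of freedom.

    df = (number of bins) - 1 - (number of parameters estimated from the data),
    so a three-parameter log normal fitted to k bins leaves k - 4 degrees of freedom.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected need the same number of bins")
    df = len(observed) - 1 - n_estimated_params
    if df < 1:
        raise ValueError("too few bins for the number of estimated parameters")
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, df
```

For example, with made-up counts, `chi_square_gof([12, 18, 20, 25, 15, 10], [10, 20, 20, 20, 20, 10], 3)` returns a statistic of about 3.1 with 2 degrees of freedom, whereas ignoring the three fitted parameters would wrongly give 5.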

Literature Cited
(1) Small, M. J.; Sutton, M. C.; Milke, M. W. Environ. Sci. Technol. 1988, 22, 196-204.
(2) Massey, F. J., Jr. J. Am. Stat. Assoc. 1951, 46, 68-78.
(3) Birnbaum, Z. W. J. Am. Stat. Assoc. 1952, 47, 425-441.
(4) Lilliefors, H. W. J. Am. Stat. Assoc. 1967, 62, 399-402.
(5) Georgopoulos, P. G.; Seinfeld, J. H. Environ. Sci. Technol. 1982, 16, 401A-416A.

David T. Mage
P.O. Box 12550
50782 Kuala Lumpur, Malaysia

SIR: Mage raises two issues: (1) the applicability of the Kolmogorov-Smirnov test for evaluating the fit of the lognormal ANC distributions and (2) the need for an alternative statistical test of the distributions. On the first issue, Mage is correct. The use of the Kolmogorov-Smirnov test with standard test statistics is inappropriate for testing whether a set of observations comes from a distribution whose parameters have been estimated from the data. The result of this misuse is that models may be accepted when they should be rejected. Additional discussion of this issue is provided by Crutcher (1). The test should not have been used; however, as discussed below, the error does not alter the essential findings or conclusions of our paper.

The key issue to consider in this regard is Mage’s second point. Mage contends that a statistical test of some type must be used to test the fitted distributions. This is certainly correct when distributions are used for regulatory or design purposes. Examples include applications to air pollution regulation, where fitted distributions are used to determine compliance with an air quality standard (2), or the use of the upper tail of a fitted stream flow distribution for the design of a dam spillway (3). Our goals in this work are more modest: to demonstrate the general lognormal character of lake chemistry data and to explore possible physical mechanisms that may lead to this phenomenon. Indeed, much of our discussion focuses on reasons for deviation from the idealized distributions. This is similar to the approach to air pollution distributions taken by Bencala and Seinfeld (4). Given the exploratory nature of the paper, it should be noted that statistical tests of significance are likely to reject

© 1988 American Chemical Society

Environ. Sci. Technol., Vol. 22, No. 11, 1988 1367