Environ. Sci. Technol. 1985, 19, 747-749
NOTES
Regression Method in Ecotoxicology: A Better Formulation Using the Geometric Mean Functional Regression
Efraim Halfon
National Water Research Institute, Canada Centre for Inland Waters, Burlington, Ontario, Canada L7R 4A6
The least-squares linear regression procedure commonly used in ecotoxicology assumes that one variable is measured with error (the Ys) while the other is fixed (the Xs). In reality, the “fixed” X variables, for example, the octanol-water partition coefficient $K_{ow}$, solubility, bioaccumulation $K_B$, etc., are often also measured with error; thus, when investigators look for lawlike relationships, the geometric mean (GM) functional regression method should be used to compute the slope and intercept coefficients. Examples are taken from the literature, and new equations are derived for data presented by Geyer et al. (1) and Mackay (2). Theoretical considerations persuaded Mackay that a slope of 1 should be forced in the equation between log $K_B$ and log $K_{ow}$. If the GM functional regression method is used to compute the coefficients, the linear model is

$$\log K_B = 1.000 \log K_{ow} - 1.336 \qquad n = 44,\ r = 0.977$$

and no a priori constraints are necessary to compute the theoretically correct slope of 1 from Mackay’s data.
Introduction

The linear regression method commonly used in ecotoxicology to derive a regression line between two variables assumes that one variable is measured with error and/or subject to natural variability (the Ys), while the other is fixed (the Xs). The regression equation is then used to find out what proportion of the variability in the Ys is due to the regression and what proportion is due to random errors of observation. Many of these equations have been published in the literature over the years (1, 2); examples are the relations between the octanol-water partition coefficient ($K_{ow}$) and bioconcentration $K_B$, between $K_{ow}$ and solubility, etc. However, users have not recognized that the “fixed” X variables are usually also measured with error or, in general, that the observed values are subject to additive random variation; for example, for a given compound several measures of $K_{ow}$ are reported, and they are often different (1). The method used to compute the regression coefficients should take into account the fact that both the Xs and the Ys are uncertain, since the resultant linear models are used in practical applications and in modeling efforts where lawlike, or functional, relationships with a predictive role are of great importance.

In the rest of the paper I suggest an alternative method to compute the regression coefficients and, further, that this method should always be used when there is uncertainty in the measurement of the Xs. The statistical method is not new (3-6), but as far as I know it has not been extensively used in ecotoxicology. Ricker (5) recommends the geometric mean (GM) regression when measurement variances are approximately proportional to the total variance of each variate, and he also states that it is the best estimate available for short series with moderate or large variability.

Published 1985 by the American Chemical Society
Geometric Mean Functional Regression Method

A statistical index often associated with paired ecotoxicological data is the correlation coefficient r. The correlation coefficient is computed as

$$r = \frac{\mathrm{Cov}(X,Y)}{[\mathrm{Var}(X)\,\mathrm{Var}(Y)]^{1/2}}$$

which takes into account the variability of both the Xs and the Ys and therefore the fact that both are measured with error, i.e.

$$X_i = A_i + s_i \qquad Y_i = P_i + t_i$$

with the assumptions that (7) “$s_i$ and $t_i$ are independent of one another for different values of i, that they have the same distributions with zero means and variances Var(s) and Var(t), respectively, with Cov($s_i$, $t_i$) = 0, and that they are uncorrelated with $X_i$ and $Y_i$.” These assumptions are the same in the geometric mean (GM) functional regression method. The common regression line is based on a least-squares model in which the sum of the squared distances from the data to the line in the vertical (y) direction is minimized. The standard way of computing the slope of a regression line y = a + bx based on n observations is
$$b'' = \frac{S_{xy}}{S_{x^2}} \tag{1}$$

where

$$S_{xy} = \sum xy - \frac{\sum x \sum y}{n} \qquad \text{and} \qquad S_{x^2} = \sum x^2 - \frac{(\sum x)^2}{n}$$

The computation of the slope takes into account the covariance of the Xs and the Ys, assuming that the Xs are measured with no error, i.e.

$$E(b'') = \frac{\mathrm{Cov}(X,Y)}{\mathrm{Var}(X)} \tag{2}$$

When, however, the Xs are measured with error, i.e., statistically both X and Y are considered random variables, then the estimate of the slope is

$$E(b) = \frac{\mathrm{Cov}(X,Y)}{\mathrm{Var}(X) + \mathrm{Var}(s)} \tag{3}$$

Thus, b'' (eq 2) is a biased estimator of b (eq 3), and the value of b'' is always nearer to zero than that of b. Numerically, slope b must be computed with the following formula (3-6):
$$b = \mathrm{sign}(r)\left[\frac{S_{y^2}}{S_{x^2}}\right]^{1/2} \tag{4}$$

where

$$S_{y^2} = \sum y^2 - \frac{(\sum y)^2}{n}$$

or, depending on what statistics are available,

$$b = \frac{b''}{|r|} \tag{5}$$

where b'' is computed according to eq 1 and r is the correlation coefficient. The sign of b is of course the same as the sign of b'' and r (5). The GM functional regression minimizes the sum of the products of the vertical and horizontal distances of each point from the line. The intercept coefficient a is computed according to eq 10.

Environ. Sci. Technol., Vol. 19, No. 8, 1985 747

Lindley (3) proved that eq 3 is the maximum likelihood estimator of slope b, but the GM linear model is so named because the slope of the regression line is also the geometric mean of the slopes of the two regression lines

$$y = a + b''x \tag{6}$$

and

$$x = c + dy \tag{7}$$

i.e., the regression lines of Y on X (eq 6) and X on Y (eq 7). The regression of X on Y minimizes the sum of squares of horizontal distances from the points to the line, on a graph of Y vs. X. Ricker (5) refers to b as the “geometric mean estimate of the functional regression of Y on X”. Kermack and Haldane (7) called it the “reduced major axis”, while Teissier (4) called it “la relation d’allométrie”. In fact, GM slope b is computed as the geometric mean of b'' and 1/d:
$$b = (b''/d)^{1/2} \tag{8}$$

Slope 1/d, or b', is chosen because the frame of reference of eq 7 is X on Y. If eq 7 is transformed to Y on X, it becomes y = -c/d + x/d, and if e = -c/d and b' = 1/d, then

$$y = e + b'x$$

The geometric mean b of $b' = 1/d = \sum y^2 / \sum xy$ and $b'' = \sum xy / \sum x^2$ is

$$b = (b'b'')^{1/2} = (b''/d)^{1/2} = \left[\frac{\sum xy / \sum x^2}{\sum xy / \sum y^2}\right]^{1/2} = \left(\frac{\sum y^2}{\sum x^2}\right)^{1/2} \tag{9}$$
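As a numerical illustration of eq 8 and 9, the following minimal Python sketch computes the two ordinary least-squares slopes and checks that their geometric mean coincides with eq 4 and eq 5. The data values and the function name `slopes` are hypothetical and serve only as an illustration; they are not taken from any of the cited papers.

```python
import math


def slopes(x, y):
    """Return the OLS slopes and the GM slope computed three ways."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    # Corrected sums of squares and cross products, as defined for eq 1 and 4.
    sxy = sum(a * b for a, b in zip(x, y)) - sum_x * sum_y / n
    sx2 = sum(a * a for a in x) - sum_x ** 2 / n
    sy2 = sum(b * b for b in y) - sum_y ** 2 / n

    r = sxy / math.sqrt(sx2 * sy2)        # correlation coefficient
    b_ols = sxy / sx2                     # eq 1: slope of Y on X
    d = sxy / sy2                         # slope of X on Y (eq 7)
    sign = math.copysign(1.0, r)

    b_eq4 = sign * math.sqrt(sy2 / sx2)   # eq 4
    b_eq5 = b_ols / abs(r)                # eq 5
    b_eq8 = sign * math.sqrt(b_ols / d)   # eq 8, with the sign of r restored
    return b_ols, d, r, b_eq4, b_eq5, b_eq8


# Hypothetical paired observations, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1]

b_ols, d, r, b_eq4, b_eq5, b_eq8 = slopes(x, y)
print("OLS slope of Y on X:", round(b_ols, 4))
print("OLS slope of X on Y:", round(d, 4))
print("GM slope (eq 4, 5, 8):", round(b_eq4, 4), round(b_eq5, 4), round(b_eq8, 4))
```

The three GM values agree to within floating-point rounding, and the GM slope is larger in absolute value than the OLS slope whenever |r| < 1.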
Equations 4 and 9 are the same when x and y in eq 9 are taken as deviations from their means; intercept a is computed as usual as

$$a = \bar{Y} - b\bar{X} \tag{10}$$

where $\bar{X}$ and $\bar{Y}$ are the X and Y averages, respectively.

Examples

Geyer et al. (1) presented equations, derived by linear regression, to estimate the bioaccumulation of organic chemicals by the alga Chlorella using the octanol-water partition coefficient $K_{ow}$. Geyer et al. realized that several $K_{ow}$ measures were available for the 41 compounds they used to compute a regression equation; therefore, they included them in their Table I and Figure 4, but they used an average of the measured $K_{ow}$ values as Xs. As is well-known in statistics, using means instead of raw data implies losing useful information; if they had derived the equation using all 70 data points, instead of 41, the resulting linear model would have been

$$\log \mathrm{BF}_1 = 0.680 \log K_{ow} + 0.171 \qquad n = 70,\ r = 0.913 \tag{11}$$

rather than

$$\log \mathrm{BF}_1 = 0.681 \log K_{ow} + 0.164 \qquad n = 41,\ r = 0.902 \tag{12}$$

where $\mathrm{BF}_1$ is the bioaccumulation factor by Chlorella on a wet weight basis after 1 day and $K_{ow}$ is the partition coefficient of the chemical between octanol and water (1). The two equations are quite similar, but they do not take into account the fact that all $K_{ow}$ values are measured with error. With the geometric mean functional regression method the equation would have been

$$\log \mathrm{BF}_1 = 0.737 \log K_{ow} - 0.046 \qquad n = 41,\ r = 0.902 \tag{13}$$

by using the 41 data points used by Geyer et al. (1), or better

$$\log \mathrm{BF}_1 = 0.745 \log K_{ow} - 0.053 \qquad n = 70,\ r = 0.913 \tag{14}$$

by using all 70 data points. In this case the slopes of eq 13 and 14 are higher than those of eq 11 and 12.

Mackay (2) recently reanalyzed data by Veith et al. (8) that show an obvious correlation between log $K_{ow}$ and log $K_B$. In his Table II Mackay reported 71 values, stating in the text that he excluded 27 points because of data inconsistency. Figure 1 in the same paper, however, shows that he used 51 points and excluded 20. I therefore could not reproduce his calculations to obtain the linear model

$$\log K_B = \log K_{ow} - 1.32 \qquad n = \text{unspecified},\ r = 0.975 \tag{15}$$

because Mackay (9) could not confirm which data he used, given the length of time elapsed since the publication of the paper. My calculations using 44 data points from his Table II show a similar model:

$$\log K_B = 0.978 \log K_{ow} - 1.239 \qquad n = 44,\ r = 0.977 \tag{16}$$

Mackay claims that a slope of 1 should be forced between log $K_B$ and log $K_{ow}$ because of theoretical considerations. This forcing is not necessary if the geometric mean method is used to compute the linear model. In this case the model is

$$\log K_B = 1.000 \log K_{ow} - 1.336 \qquad n = 44,\ r = 0.977 \tag{17}$$

and no a priori constraints are necessary to compute a higher and theoretically correct slope from the data for the regression line.
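The behavior seen in the Mackay example, where the ordinary slope falls below 1 and the GM slope does not, can be imitated with synthetic data. The sketch below is only an illustration under stated assumptions: the "true" line (slope 1, intercept -1.3), the error standard deviation, the sample size, and the function name `fit_lines` are all hypothetical, and the errors in X and Y are given equal variance so that Ricker's condition (measurement variances proportional to total variances) holds. Under other error structures the GM slope need not coincide with the true slope. The last lines apply eq 5 to the published slope and r of eq 16 as an arithmetic check against eq 17.

```python
import math
import random


def fit_lines(x, y):
    """Return r plus (intercept, slope) for the OLS and GM lines of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sx2 = sum((a - x_bar) ** 2 for a in x)
    sy2 = sum((b - y_bar) ** 2 for b in y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    r = sxy / math.sqrt(sx2 * sy2)
    b_ols = sxy / sx2                                # eq 1
    b_gm = math.copysign(math.sqrt(sy2 / sx2), r)    # eq 4
    return {
        "r": r,
        "ols": (y_bar - b_ols * x_bar, b_ols),       # eq 10 with the OLS slope
        "gm": (y_bar - b_gm * x_bar, b_gm),          # eq 10 with the GM slope
    }


# Hypothetical "true" relationship and error size (assumptions, not data).
TRUE_INTERCEPT, TRUE_SLOPE = -1.3, 1.0
ERROR_SD = 0.5                      # same error standard deviation for X and Y

rng = random.Random(1985)           # fixed seed so the run is repeatable
truth = [rng.uniform(1.0, 7.0) for _ in range(2000)]
x = [t + rng.gauss(0.0, ERROR_SD) for t in truth]
y = [TRUE_INTERCEPT + TRUE_SLOPE * t + rng.gauss(0.0, ERROR_SD) for t in truth]

fit = fit_lines(x, y)
print("r        = %.3f" % fit["r"])
print("OLS line : a = %.3f, b = %.3f" % fit["ols"])
print("GM line  : a = %.3f, b = %.3f" % fit["gm"])

# Eq 5 applied to the published values of eq 16 (slope 0.978, r = 0.977).
mackay_gm_slope = 0.978 / abs(0.977)
print("eq 16 slope / |r| = %.3f" % mackay_gm_slope)
```

In this setup the OLS slope is pulled toward zero by the error in X, while the GM slope stays close to the assumed value of 1. The eq 5 check agrees with the slope of eq 17 to within the rounding of the published three-decimal values.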
Discussion

When the relationships between chemical characteristics of contaminants and biological properties such as bioaccumulation are studied with statistical methods, often the correlation coefficient r and the linear regression method are used. The correlation coefficient takes into account the variability of both the Xs and the Ys, but often ecotoxicologists use the regression method (eq 1) without realizing that the independent variables, the Xs, are also often measured with error. In this paper I have suggested that the geometric mean (GM) functional regression method (eq 4) should be used instead when the coefficients of linear models are calculated. This usage is particularly important when regression results are extrapolated to new compounds and when the linear models are used for predictive purposes. Given the fact that the geometric mean fitted line takes into account errors in the Xs and the Ys, the inverse equation can also be calculated easily, since b' of X on Y is equal to 1/b, where b is the slope of Y on X. When the normal method of computing the regression slope is used (eq 1), $b' = r^2/b''$; thus, the two slopes are not the inverse of each other. Mackay (9) noted that, in some instances, theoretical calculations show that the slope of a regression line between two variables such as solubility and $K_{ow}$, or $K_{ow}$ and $K_B$, should be 1, while the slopes of published regression lines are always less than 1. However, as shown in eq 17, the correct use of a regression formula allows the derivation of the theoretically expected slopes without any a priori constraints.

Acknowledgments

I am indebted to K. Kaiser, J. M. Ribo, and D. Mackay for fruitful discussions and comments. Four anonymous reviewers also provided useful editorial comments.

Literature Cited

(1) Geyer, H.; Politzki, G.; Freitag, D. Chemosphere 1984, 13, 269.
(2) Mackay, D. Environ. Sci. Technol. 1982, 16, 274.
(3) Lindley, D. V. J. R. Stat. Soc., Supp. 1947, 9, 218.
(4) Teissier, G. Biometrics 1948, 4, 14.
(5) Ricker, W. E. J. Fish. Res. Board Can. 1973, 30, 409.
(6) Sprent, P. “Models in Regression and Related Topics”; Methuen & Co.: London, 1969; pp 1-173.
(7) Kermack, K. A.; Haldane, J. B. Biometrika 1950, 37, 30.
(8) Veith, G. D.; DeFoe, D. L.; Bergstedt, B. V. J. Fish. Res. Board Can. 1979, 36, 1040.
(9) Mackay, D. University of Toronto, personal communication, 1984.

Received for review June 4, 1984. Revised manuscript received December 31, 1984. Accepted February 21, 1985.

Environ. Sci. Technol. 1985, 19, 749-752
Peroxyacetyl Nitrate: Comparison of Alkaline Hydrolysis and Chemiluminescence Methods Daniel Grosjean* and Jeffrey Harrison
Daniel Grosjean and Associates, Inc., Suite 645, 350 N. Lantana Street, Camarillo, California 93010
Peroxyacetyl nitrate (PAN; $\mathrm{CH_3C(O)OONO_2}$) was prepared from sunlight irradiation of organic-$\mathrm{NO}_x$ and chlorine-organic-$\mathrm{NO}_x$ mixtures in air, and its concentration was measured by using two methods. The first method involved ion chromatography following alkaline hydrolysis of PAN to acetate, and the second method involved PAN measurements using a chemiluminescent $\mathrm{NO}_x$ analyzer. The two methods were found to be in good agreement in the range of PAN concentrations tested, 0-400 ppb. Applications and limitations of the two methods are discussed for both laboratory and ambient measurements of PAN.

Introduction

Peroxyacetyl nitrate (PAN; $\mathrm{CH_3C(O)OONO_2}$) is a major product of photochemical reactions involving hydrocarbons and oxides of nitrogen in the atmosphere (1). Levels of PAN in urban areas such as Los Angeles sometimes exceed 40 ppb (2, 3). PAN has been studied for its phytotoxic (4) and mutagenic (5) properties and for its importance in the long-range transport of oxides of nitrogen in the troposphere (6, 7). PAN measurements in smog chamber studies of HC-$\mathrm{NO}_x$ reactions are essential to the development of a better understanding of these complex reactions and serve as input to the testing and validation of computer kinetic models describing the atmospheric chemistry of hydrocarbon pollutants (8, 9).

We recently described a portable PAN generator and its application to on-site calibration of PAN analyzers (10). The PAN output of the generator, which can be varied in the range ~2-400 ppb, was determined by ion chromatography following alkaline hydrolysis of PAN to acetate:

$$\mathrm{CH_3C(O)OONO_2 + 2OH^- \rightarrow CH_3COO^- + NO_2^- + O_2 + H_2O} \tag{1}$$

Chemiluminescent $\mathrm{NO}_x$ analyzers have been shown to respond to a number of nitrogenous pollutants besides $\mathrm{NO_2}$, including PAN (11-13). This interference from PAN may be a serious problem when ambient levels of $\mathrm{NO_2}$ are monitored by using chemiluminescent analyzers (2).
In turn, the chemiluminescence method can be employed for measurements of PAN under certain conditions, for example, by difference following removal of PAN in alkaline solutions (reaction 1). In this paper, we compare the alkaline hydrolysis and chemiluminescence methods with PAN prepared in situ from a number of organic-$\mathrm{NO}_x$ mixtures.

Experimental Methods

Test atmospheres containing ppb levels of PAN were prepared in 4-m³ outdoor chambers constructed from FEP 200A Teflon film. The matrix air was supplied by an Aadco Model 737-14 purified air generator. PAN was measured by electron capture gas chromatography (EC-GC) as described before (10). The EC-GC instrument was calibrated against the output of the portable PAN generator (10). The PAN generator output was measured by ion chromatography (IC) following alkaline hydrolysis of PAN to acetate in dilute KOH impingers (10). The chemiluminescent $\mathrm{NO}_x$ analyzer, Teco Model 14 B/E, was calibrated by gas-phase titration as is described in detail elsewhere (13). The converter efficiency (molybdenum converter, T = 450 °C) was measured as part of the calibration and was ≥0.98. Due to evaporation of dilute KOH solutions from the impinger, sampling lines downstream of KOH impingers should be checked and cleaned or replaced periodically. A Teflon line coated with KOH becomes a very efficient PAN denuder tube. Removal of PAN by nylon filters (Ghia, 1-μm pore size, washed with deionized water prior to use) was measured directly by EC-GC and was found to be negligible (