Environ. Sci. Technol. 1990, 24, 927-929
ponents designated as carcinogens have been adequately shown to cause pulmonary cancer in animals via inhalation (2). Others have commented on “an absence of metaplastic changes in the bronchi of nonsmokers” (3) that would be expected if ETS or its components were causally associated with lung cancer. Löfroth et al. also cited the National Research Council and others as reporting “excess lung cancer deaths in individuals exposed to ETS.” Those conclusions were based on studies that have been widely criticized because they associated risk not with ETS but only with marriage of nonsmokers to smokers, they were subject to significant misclassification errors, and they failed to consider relevant confounding factors (4-10). Speculation about the properties of ETS and its components was not supported by the authors’ findings.
Literature Cited
(1) Löfroth, G.; Burton, R. M.; Forehand, L.; Hammond, S. K.; Seila, R. L.; Zweidinger, R. B.; Lewtas, J. Characterization of environmental tobacco smoke. Environ. Sci. Technol. 1989, 23, 610-614.
(2) Aviado, D. M. Suspected pulmonary carcinogens in environmental tobacco smoke. In Indoor and Ambient Air Quality; Perry, R., Kirk, P. W., Eds.; Selper Ltd.: London, 1988; pp 141-146.
(3) Wynder, E. In Roundtable Discussion. Prev. Med. 1984, 13, 730-746.
(4) Ahlborn, W.; Überla, K. Passive smoking and lung cancer: Reanalyses of Hirayama’s data. In Indoor and Ambient Air Quality; Perry, R., Kirk, P. W., Eds.; Selper Ltd.: London, 1988; pp 169-178.
(5) Balter, N. J.; Schwartz, S. L.; Kilpatrick, S. J.; Witorsch, P. Causal relationship between environmental tobacco smoke and lung cancer in non-smokers: A critical review of the literature. Proc.--APCA Annu. Meet. 1986, 86-80.9.
(6) Kilpatrick, S. J.; Viren, J. Age as a modifying factor in the association between lung cancer in non-smoking women and their husbands’ smoking status. In Indoor and Ambient Air Quality; Perry, R., Kirk, P. W., Eds.; Selper Ltd.: London, 1988; pp 195-202.
(7) Lee, P. N. An alternative explanation for the increased risk of lung cancer in non-smokers married to smokers. In Indoor and Ambient Air Quality; Perry, R., Kirk, P. W., Eds.; Selper Ltd.: London, 1988; pp 149-158.
(8) Lee, P. N. Lung cancer and passive smoking: Association an artefact due to misclassification of smoking habits? Toxicol. Lett. 1987, 35, 157-162.
(9) Letzel, H.; Blümner, E.; Überla, K. Meta-analysis on passive smoking and lung cancer effects of study selection and misclassification of exposure. In Indoor and Ambient Air Quality; Perry, R., Kirk, P. W., Eds.; Selper Ltd.: London, 1988; pp 293-302.
(10) Überla, K. Lung cancer from passive smoking: Hypothesis or convincing evidence? Int. Arch. Occup. Environ. Health 1987, 59, 421-437.
Alan W. Katzenstein
Katzenstein Associates
51 Rockwood Drive
Larchmont, New York 10538
Comment on “Prediction of Aqueous Solubility of Organic Chemicals Based on Molecular Structure. 2. Application to PNAs, PCBs, PCDDs, etc.”
SIR: A recent series of papers published in this journal (1-3) provides a means of estimating aqueous solubility on the basis of structural information. A model that is based on a combination of group contribution terms and connectivity terms treats a large set of non-hydrogen-bonding organic compounds. Unfortunately, there are some errors and omissions in the data set. These errors and omissions bias the conclusions in favor of the model proposed in those papers. Also, certain data have been accepted or rejected on the basis of screening by the very model that they are supposed to verify. Furthermore, because it fails to take into account the effects of crystallinity upon solubility, the model is conceptually weak and should not be considered seriously.
A large number of models have been proposed for the estimation of the aqueous solubility of organic compounds. Those models based entirely upon group contribution approaches are doomed to failure because they cannot account for the differences in the solubilities of isomeric groups of compounds. These differences are usually due to the effects of the crystallinity of the solute on its solubility. The role of crystallinity in decreasing solubility is easily quantitated by the van’t Hoff equation. The resulting quantity has been called the crystal-liquid solubility ratio, the ideal solubility of the crystal, and the fugacity ratio by various workers. Its applicability has been amply demonstrated (4-11). As a first approximation, the van’t Hoff equation predicts that the room temperature solubility of a rigid molecule will decrease 10-fold for every 100 °C increase in melting point. Thus, for isomeric groups of compounds having similar values for log K_ow, connectivity indexes, polarizability, etc., the log of the solubility should decrease linearly with increasing melting point. This is clearly evident from the data in Table I.
From the above it would be expected that an approach such as that of Nirmalakhandan and Speece, which ignores the effect of crystallinity, would result in a systematic overestimation of the solubilities of high-melting isomers.
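The rule of thumb of a 10-fold decrease per 100 °C can be reproduced with a short sketch. The entropy of fusion of 56.5 J/(mol K) (Walden's rule for rigid molecules) and the function name are assumptions made for illustration; neither is stated in this letter. The melting points and log S values are those listed for phenanthrene and anthracene in the tables.

```python
R = 8.314          # gas constant, J/(mol K)
T_ROOM = 298.15    # 25 °C expressed in kelvin
DS_FUSION = 56.5   # ASSUMED entropy of fusion, J/(mol K) (Walden's rule)


def log_crystal_liquid_ratio(mp_celsius, t=T_ROOM, ds_fusion=DS_FUSION):
    """Return log10(crystal solubility / supercooled-liquid solubility)."""
    t_melt = mp_celsius + 273.15
    if t_melt <= t:
        return 0.0  # solute is already liquid at t: no crystal penalty
    return -ds_fusion * (t_melt - t) / (2.303 * R * t)


# Change in log solubility for a 100 °C rise in melting point (125 -> 225 °C).
slope_per_100 = log_crystal_liquid_ratio(225.0) - log_crystal_liquid_ratio(125.0)

# Isomer pair from the tables: phenanthrene (mp 101.0 °C) and anthracene (mp 216.2 °C).
predicted_gap = log_crystal_liquid_ratio(216.2) - log_crystal_liquid_ratio(101.0)
observed_gap = -6.38 - (-5.16)  # experimental log S, anthracene minus phenanthrene

print(f"change in log S per 100 °C: {slope_per_100:.2f}")
print(f"anthracene vs phenanthrene, predicted: {predicted_gap:.2f}")
print(f"anthracene vs phenanthrene, observed:  {observed_gap:.2f}")
```

Under these assumptions the slope comes out close to one log unit per 100 °C, and the melting point difference alone accounts for most of the observed solubility gap between the two isomers.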
This bias is obscured in the papers because of six types of error.
1. Incorrect values are reported for some of the high-melting solutes.
2. Certain compounds have been omitted.
3. Some of the data have been “preferentially accepted or rejected ... on the basis of the premise that ... it is reasonable to use the model to screen the data.”
4. Most of the data are for liquids.
5. Correction factors, such as the constant for dioxins, inadvertently partially compensate for the effects of the crystallinity of the group.
6. Some of the solubilities listed represent the hypothetical supercooled liquid while others do not. In other words, some of the data have already been corrected for the effects of crystallinity.
It is clear from the data in the tables that the model proposed in ref 3 yields gross errors, even for the relatively simple compounds selected. This is because a model for solubility that does not account for crystallinity is simply inappropriate.
The papers give the impression that the solubility estimations are based upon only three variables, when in reality there are as many variables as there are functional groups. This is because the polarizability parameter Φ is really an accumulation of a large number of group contribution values, i.e.,
$$\Phi = a(\mathrm{Cl}) + b(\mathrm{H}) + c(\mathrm{F}) + d(\mathrm{I}) + f(\text{ketone or aldehyde}) + g(\text{dioxin}) + h(\mathrm{NH_2}) + i(\mathrm{NH}) + j(\mathrm{NO_2}) + l(\text{double bond}) + m(\text{alkane or alkene})$$
where the lower case letters are regression-determined coefficients for the corresponding atoms and groups that appear in parentheses. It has already been demonstrated
© 1990 American Chemical Society
Environ. Sci. Technol., Vol. 24, No. 6, 1990
Table I

compound                    solubility, mol/L   log S    log S (Speece)   log S (pred)   residual (reported)   residual (actual)
phenanthrene                                    -5.16    -5.16            -5.40                                 0.24
anthracene                  4.21 × 10^-7        -6.38                     -5.40 b                              -0.98
chrysene                    8.76 × 10^-9        -8.06                     -7.07 b                              -0.99
triphenylene                                    -6.73    -6.73            -7.07           0.34                  0.34
naphthacene                 2.50 × 10^-9        -8.60                     -7.07 b                              -1.53
Chlorinated Dibenzo-p-dioxins a
dibenzo-p-dioxin            4.70 × 10^-6        -5.33                     -5.00                                -0.33
1-chloro                    1.91 × 10^-6        -5.72                     -5.37 b                              -0.35
2-chloro                    1.35 × 10^-6        -5.87    -5.24            -5.37           0.13                 -0.50
2,3-dichloro                5.90 × 10^-8        -7.23                     -5.91 b                              -1.32
2,7-dichloro                1.48 × 10^-8        -7.83    -6.00            -5.91          -0.09                 -1.92
2,8-dichloro                6.60 × 10^-8        -7.18                     -5.91 b                              -1.27
1,2,4-trichloro             2.61 × 10^-8        -7.58                     -6.50 c                              -1.08
1,2,3,4-tetrachloro         1.70 × 10^-9        -8.77    -7.14            -7.10          -0.04                 -1.67
1,2,3,7-tetrachloro         1.30 × 10^-9        -8.89                     -7.10 b                              -1.79
1,3,6,8-tetrachloro         9.94 × 10^-10       -9.00    -7.09            -7.10 b         0.01                 -1.90
2,3,7,8-tetrachloro         6.00 × 10^-12       -11.22                    -7.10 b                              -4.12
1,2,3,4,7-pentachloro       3.31 × 10^-10       -9.48    -7.79            -7.64          -0.15                 -1.84
1,2,3,4,7,8-hexachloro      1.13 × 10^-10       -9.95                     -8.18 c                              -1.77
1,2,3,4,6,7,8-heptachloro   5.64 × 10^-12       -11.25   -8.88            -8.70          -0.18                 -2.55
octachloro                  1.61 × 10^-12       -11.79                    -9.24 c                              -2.55

a The authors appear to have used the hypothetical solubility of the liquid instead of the actual solubility for all the dibenzo-p-dioxins. However, they have used actual solubility values for the polycyclic aromatic hydrocarbons. b Assumed to be similar to the value calculated for isomer. c Estimated from the equation in ref 3.

Table II

                                                                log S      log S       log S       residual   residual
compound                    mp, °C    log K_ow     (expt)     (pred) a    (pred) b    A          B
phenanthrene                101.0     4.49         -5.16      -5.40       -4.70       -0.24      0.46
anthracene c                216.2     4.49         -6.38      -5.40       -5.85       0.98       0.52
chrysene c                  255.0     5.66         -8.06      -7.07       -7.41       0.99       0.64
triphenylene                198.1     5.66         -6.73      -7.07       -6.84       -0.34      -0.11
naphthacene c               357.0     5.66         -8.60      -7.07       -8.43       1.53       0.17
Chlorinated Dibenzo-p-dioxins
dibenzo-p-dioxin            123.0     4.64         -5.33      -5.00       -5.07       0.32       0.25
1-chloro d                  105.5     5.46         -5.72      -5.37       -5.72       0.34       0.00
2-chloro                    89.0      5.46         -5.87      -5.37       -5.55       0.49       0.31
2,3-dichloro d              164.0     6.23         -7.23      -5.91       -7.07       1.31       0.15
2,7-dichloro                210.0     6.23         -7.83      -5.91       -7.53       1.91       0.29
2,8-dichloro d              151.0     6.23         -7.18      -5.91       -6.94       1.27       0.24
1,2,4-trichloro d           129.0     6.96         -7.58      -6.50       -7.46       1.08       0.12
1,2,3,4-tetrachloro         190.0     7.69         -8.77      -7.10       -8.80       1.66       0.02
1,2,3,7-tetrachloro d       175.0     7.69         -8.89      -7.10       -8.65       1.78       0.24
1,3,6,8-tetrachloro d       219.0     7.69         -9.00      -7.10       -9.09       1.90       0.08
2,3,7,8-tetrachloro d       305.0     7.69         -11.22     -7.10       -9.95       4.12       1.27
1,2,3,4,7-pentachloro       196.0     8.41         -9.48      -7.64       -9.58       1.84       0.09
1,2,3,4,7,8-hexachloro d    273.0     9.13         -9.95      -8.18       -11.06      1.76       1.11
1,2,3,4,6,7,8-heptachloro   265.0     9.84         -11.25     -8.70       -11.70      2.54       0.04
octachloro                  332.0     10.55        -11.79     -9.24       -13.07      2.55       1.27
average absolute error                                                                1.45       0.37

a Predicted from the method of Nirmalakhandan and Speece (3). b Predicted from the method of Yalkowsky and Valvani (7). c Data taken from ref 7. d Data taken from ref 10.
(4-11) that the aqueous solubilities of compounds such as those in refs 1-3 can be estimated with good accuracy by using only two parameters: melting point and log K_ow. This approach has also been shown to be applicable to a number of drugs and pollutants having highly complex organic structures (7-9). Ironically, some of these references are cited by Nirmalakhandan and Speece (1-3).
Yalkowsky (7) proposed a convenient relationship for estimating aqueous solubilities for organic compounds. This equation
$$\log S_w\ (\mathrm{mol/L}) = -0.01\,\mathrm{mp} - \log K_{ow} + 0.8 \qquad (1)$$
is not based upon regression and has been shown to be applicable to a wide variety of structures. The solubilities calculated by the above equation and their corresponding residuals are given in Table II. It is clear that the values calculated by eq 1 are closer to the observed values (average absolute error 0.37 log unit) than those calculated by the regression-based equation of Nirmalakhandan and Speece (1, 3) (average absolute error 1.45 log units).
Literature Cited
(1) Nirmalakhandan, N. N.; Speece, R. E. Environ. Sci. Technol. 1988, 22, 328.
(2) Nirmalakhandan, N. N.; Speece, R. E. Environ. Sci. Technol. 1988, 22, 1349.
(3) Nirmalakhandan, N. N.; Speece, R. E. Environ. Sci. Technol. 1989, 23, 708.
(4) Yalkowsky, S. H.; Valvani, S. C.; Mackay, D. Residue Rev. 1983, 85, 43.
(5) Yalkowsky, S. H.; Orr, J. R.; Valvani, S. C. Ind. Eng. Chem. Fundam. 1979, 18, 351.
(6) Yalkowsky, S. H.; Valvani, S. C. J. Chem. Eng. Data 1979, 24, 127.
(7) Yalkowsky, S. H.; Valvani, S. C. J. Pharm. Sci. 1980, 69, 912.
(8) Pinal, R. Ph.D. Thesis, University of Arizona, 1988.
(9) Yalkowsky, S. H.; Banerjee, S.; Pinal, R. Estimation of the Aqueous Solubilities of Organic Compounds; Marcel Dekker: New York, in preparation.
(10) Shiu, W. Y.; Doucette, W.; Gobas, F. A. P. C.; Andren, A.; Mackay, D. Environ. Sci. Technol. 1988, 22, 651.
(11) Yalkowsky, S. H. In Techniques of Solubilization of Drugs; Marcel Dekker: New York, 1981.
S. H. Yalkowsky,* D. S. Mishra
Department of Pharmaceutical Sciences
University of Arizona
Tucson, Arizona 85721
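Equation 1 of the comment above is simple enough to check by hand or with a few lines of code. The sketch below applies it to the five polycyclic aromatic hydrocarbons of Table II and compares the result with the regression-model values listed there. The function and variable names are illustrative assumptions; the numerical inputs are copied from Table II.

```python
# (name, mp in °C, log Kow, experimental log S, regression-model log S), from Table II
PAHS = [
    ("phenanthrene", 101.0, 4.49, -5.16, -5.40),
    ("anthracene",   216.2, 4.49, -6.38, -5.40),
    ("chrysene",     255.0, 5.66, -8.06, -7.07),
    ("triphenylene", 198.1, 5.66, -6.73, -7.07),
    ("naphthacene",  357.0, 5.66, -8.60, -7.07),
]


def eq1_log_s(mp_celsius, log_kow):
    """Eq 1 of the letter: log S (mol/L) = -0.01*mp - log Kow + 0.8."""
    return -0.01 * mp_celsius - log_kow + 0.8


def compare(rows):
    """Return (name, eq 1 prediction, |eq 1 error|, |regression error|) per compound."""
    results = []
    for name, mp, log_kow, expt, regression in rows:
        pred = eq1_log_s(mp, log_kow)
        results.append((name, round(pred, 2), abs(pred - expt), abs(regression - expt)))
    return results


results = compare(PAHS)
# Number of compounds for which eq 1 lies closer to experiment than the regression model.
eq1_closer = sum(1 for _, _, err_eq1, err_reg in results if err_eq1 < err_reg)

for name, pred, err_eq1, err_reg in results:
    print(f"{name:13s} eq 1: {pred:6.2f}  |error| eq 1: {err_eq1:.2f}  regression: {err_reg:.2f}")
print(f"eq 1 closer to experiment for {eq1_closer} of {len(results)} compounds")
```

The rounded predictions should agree with the log S (pred) b column of Table II, which is a quick way to confirm that the table was generated with melting points in °C entered directly into eq 1.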
SIR: A number of points concerning the contents of ref 1 have been raised by Yalkowsky and Mishra. Before presenting our comments on their observations, we would like to reiterate the rationale behind our approach as indicated in our first paper (2) in this series: “... derive reliable predictive equations for aqueous solubility with molecular descriptors which are simple to formulate and fundamental in nature, could be derived or calculated purely from structural information, would encode physicochemical properties, and are consistent and calculable for environmentally relevant compounds without any experimental inputs”.
One of the major concerns is regarding the PCDD data set. As pointed out by them, we have inadvertently used the “hypothetical solubility” for this subset. We have now reworked the entire data set using the “actual solubility” for the PCDDs, and the overall result remains practically unchanged. However, the indicator for PCDDs in the Φ term has to be changed from -1.823 to -3.332. Apart from this correction, the overall model (eq 6) as reported in ref 1 remains the same. Further, eq 7, which correlated all 470 compounds in our data set, also remains the same in terms of model parameters and statistical goodness of fit:
$$\log S = 1.564 + 1.627\,{}^{0}\chi - 1.372\,{}^{0}\chi^{v} + 1.000\,\Phi$$
n = 470; r = 0.990; r² = 0.98; SE = 0.355
The experimental data and the fitted data as per this revised model are now presented in Table I, along with the results of Yalkowsky and Mishra. [In the case of 2,3,7,8-tetrachlorodibenzo-p-dioxin, the error appears to be unreasonably high. However, Shiu et al. reported four different experimental values for this compound (0.000 62, 0.000 984, 0.000 06, and 0.000 024 6 mmol/m³) and recommended the third value. If the average of the first two values is used, the fitting error decreases to 0.45.]
While we concede the comment of Yalkowsky and Mishra about the inability of log K_ow, connectivity indexes, polarizability, etc. to predict the effect of crystallinity on the solubility of isomers, such descriptors could be quite useful as a preliminary model in estimating solubility in the absence of any experimental data such as melting point. They would enable estimation of solubility even before making the chemical. In addition, descriptors such as molecular connectivity indexes encode information on melting point also. For instance, in the case of PCDDs, the connectivity index used in our study correlates with melting point with an r of 0.87.
Another comment raised by Yalkowsky and Mishra is that certain compounds have been omitted and some data have been “... preferentially accepted or rejected ...”. Regarding the PCDD data set covered in our study, all the data for the 15 PCDDs reported in the original reference were included. Nine of these were used as the training compounds and the remaining six were used as testing compounds. However, only the testing set was published. (As suggested by the editors of Environmental Science and Technology, the training set was not published due to space limitations.) Further, no chemical was preferentially included or excluded; rather, in the case of chemicals with multiple data, the model was used to pick the best value.
The justification for doing this was outlined in the paper under Discussion: in the absence of any other method to resolve and reconcile such discrepant data, the use of a correlation model to screen multiple data is a rational and productive approach, rather than to use, blindly, all the
Table I. Comparison between log K_ow and Connectivity Approaches a

                                              calculated log S            fitting error
compound                       exp log S     approach 1   approach 2     approach 1   approach 2
dioxin                         -5.33         -5.07        -6.44          -0.26        1.11
1-chlorodioxin                 -5.72         -5.72        -6.99          0.00         1.27
2-chlorodioxin                 -5.87         -5.55        -6.99          -0.32        1.12
2,3-dichlorodioxin             -7.23         -7.07        -7.53          -0.16        0.30
2,7-dichlorodioxin             -7.83         -7.53        -7.53          -0.30        -0.30
2,8-dichlorodioxin             -7.18         -6.94        -7.53          -0.24        0.35
1,2,4-trichlorodioxin          -7.58         -7.46        -8.22          -0.12        0.64
1,2,3,4-tetrachlorodioxin      -8.77         -8.80        -8.77          0.03         0.00
1,2,3,7-tetrachlorodioxin      -8.89         -8.65        -8.77          -0.24        -0.12
1,3,6,8-tetrachlorodioxin      -9.00         -9.09        -8.77          0.09         -0.23
2,3,7,8-tetrachlorodioxin      -10.22        -9.95        -8.77          -0.27        -1.45
1,2,3,4,7-pentachlorodioxin    -9.48         -9.58        -9.32          0.10         -0.16
1,2,3,4,7,8-hexachlorodioxin   -9.95         -11.06       -9.87          1.11         -0.09
heptachlorodioxin              -11.25        -11.70       -10.42         0.45         -0.83
octachlorodioxin               -11.79        -13.07       -10.97         1.28         -0.82
average absolute error                                                   0.33         0.58

a Approach 1, log K_ow approach. Approach 2, connectivity approach.
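The entries of Table I can be cross-checked with a short sketch. It converts the four 2,3,7,8-tetrachlorodibenzo-p-dioxin values quoted from Shiu et al. from mmol/m³ to log S in mol/L, and recomputes the average absolute fitting error of each approach from the tabulated columns. The conversion factor (1 mmol/m³ = 10^-6 mol/L) is ordinary unit arithmetic; the function and variable names are assumptions made for illustration.

```python
import math

# Four experimental values for 2,3,7,8-TCDD quoted in the text, in mmol/m^3
SHIU_TCDD = [0.00062, 0.000984, 0.00006, 0.0000246]

# Columns of Table I, dioxin through octachlorodioxin
EXP = [-5.33, -5.72, -5.87, -7.23, -7.83, -7.18, -7.58, -8.77,
       -8.89, -9.00, -10.22, -9.48, -9.95, -11.25, -11.79]
APPROACH_1 = [-5.07, -5.72, -5.55, -7.07, -7.53, -6.94, -7.46, -8.80,
              -8.65, -9.09, -9.95, -9.58, -11.06, -11.70, -13.07]
APPROACH_2 = [-6.44, -6.99, -6.99, -7.53, -7.53, -7.53, -8.22, -8.77,
              -8.77, -8.77, -8.77, -9.32, -9.87, -10.42, -10.97]


def mmol_per_m3_to_log_s(c_mmol_m3):
    """Convert a concentration in mmol/m^3 to log10 of mol/L."""
    return math.log10(c_mmol_m3 * 1e-6)  # 1 mmol/m^3 = 1e-6 mol/L


def mean_abs_error(exp, calc):
    """Average absolute difference between experimental and calculated log S."""
    return sum(abs(e - c) for e, c in zip(exp, calc)) / len(exp)


recommended = mmol_per_m3_to_log_s(SHIU_TCDD[2])  # third value, recommended by Shiu et al.
mae_1 = mean_abs_error(EXP, APPROACH_1)
mae_2 = mean_abs_error(EXP, APPROACH_2)

print(f"log S from the recommended TCDD value: {recommended:.2f}")
print(f"average absolute error, approach 1: {mae_1:.2f}")
print(f"average absolute error, approach 2: {mae_2:.2f}")
```

Because the tabulated log S values are rounded to two decimals, the recomputed averages can differ from the printed ones in the last digit.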