Environ. Sci. Technol. 1992, 26, 1023-1030
Multivariate Characterization and Modeling of Polychlorinated Dibenzo-p-dioxins and Dibenzofurans
Mats Tysklind,*,† Kjell Lundgren,† Christoffer Rappe,† Lennart Eriksson,† Jörgen Jonsson,‡ Michael Sjöström,‡ and Ulf G. Ahlborg§
Institute of Environmental Chemistry, University of Umeå, S-901 87 Umeå, Sweden, Research Group for Chemometrics, University of Umeå, S-901 87 Umeå, Sweden, and Institute of Environmental Medicine, Karolinska Institute, S-104 01 Stockholm, Sweden
A multivariate physicochemical characterization of 136 tetra- to octachloro-substituted dibenzo-p-dioxins (PCDDs) and dibenzofurans (PCDFs) is reported. By principal component analysis (PCA), an overview of similarities and differences between congener groups and substitution patterns is obtained. Two examples of quantitative structure-activity relationships (QSARs) are reported. The first model concerns the photolytic half-lives of the PCDDs in different organic solvents. The second model correlates various dibenzofuran structures with their potential as inducers of aryl hydrocarbon hydroxylase (AHH). The QSARs are established using partial least-squares modeling with latent variables (PLS). The analysis can be used to rank congeners for future chemical or biological testing. An alternative and informationally efficient way to select candidates for toxicological testing is also discussed.
†Institute of Environmental Chemistry. ‡Research Group for Chemometrics. §Institute of Environmental Medicine.

Introduction

We are, in our daily lives, surrounded by an enormous number of chemical compounds whose chemical and biological activity patterns are largely unknown. In practice, testing all existing chemicals is neither economically nor experimentally feasible. Predictions of physicochemical properties and biological activities of closely related compounds could give important knowledge of compounds for which less information exists. In this respect, quantitative structure-activity relationships (QSARs) may represent a useful tool (1). A QSAR can be described as a mathematical expression relating the variation in biological or physical activity in a series of compounds to the variation in their chemical structures. The basic assumption in QSAR is that the variation in biological activities and physicochemical properties is influenced by the same underlying factors and that changes in chemical structure may causally alter the particular "activity" under investigation (2).

Most physical and biological systems are complex, and it is unlikely that only a few variables will suffice to describe them. Thus, it is necessary to characterize the compounds with a multitude of physicochemical variables describing their properties. It is expected that such a battery of structural variables together will capture the underlying, hidden factors that correlate with the response of interest. Usually the biological activity of a compound is related to lipophilicity and steric and electronic effects, but variables originating from chromatographic, spectroscopic, and reactivity measurements have been found to be informative as well.

Polychlorinated dibenzo-p-dioxins and dibenzofurans have attracted increasing interest during the past two decades. These large groups of chlorinated aromatic compounds are ubiquitous in the environment. PCDDs and PCDFs have been identified in emissions from a number of sources and in all compartments of the ecosystem (3). Among the 210 congeners (mono- to octachlorinated congeners) of PCDDs and PCDFs, there is large variation in physical and chemical properties (4-8) as well as biological activities (9, 10). Extensive research has been focused on these compounds during recent years and in particular on 2,3,7,8-TCDD, which has been found to be an extremely toxic compound when tested on laboratory animals (10). The chemical structures and numbering of the carbon atoms of the PCDDs and PCDFs are given in Figure 1.

QSARs have earlier been established for both the PCDDs and the PCDFs to model both toxicity and different physicochemical properties. Several studies report correlations between the potential of polychlorinated aromatic compounds as inducers of aryl hydrocarbon hydroxylase (AHH) and their toxic response in vivo. The responses in toxicity show a similar and characteristic pattern for polychlorinated dioxins, dibenzofurans, and biphenyls (PCBs), as well as other chlorinated and/or brominated aromatic compounds (11-17). Quantitative structure-activity relationships based on multiple linear regression analysis have also been reported (18-21). Other examples are found in the literature where linear regression techniques have been used to correlate photolysis half-lives (22) and infrared spectroscopy data (23) with LD50 values, i.e., LD50 for guinea pig and chicken embryos.

The primary step toward the construction of QSAR models for a series of related compounds, such as the PCDDs and PCDFs, is the characterization of their chemical and structural properties, preferably by compilation of as many relevant physicochemical variables as possible (2). Depending on which method is used, different requirements regarding the proportion of objects and variables have to be met.
Regression analysis methods, e.g., multiple regression (MR) and stepwise multiple regression (SMR), are the most widely used methods for establishing QSARs. However, these two regression methods suffer from two major limitations (24). The first is that they require the number of objects (here, compounds) to be substantially larger than the number of independent variables (here, chemical variables). Second, these methods assume that the independent variables are uncorrelated. This latter condition is rarely fulfilled in practice. Recently, other methods have gained increased attention for QSARs. These are projection-based methods, such as principal component analysis (PCA) combined with multiple regression, so-called principal component regression (PCR), and partial least-squares projections (PLS) (25, 26). These methods can deal with a large number of chemical variables, which may in fact exceed the number of objects (27). Further, PCA can be used to delineate general trends and irregularities in the physicochemical descriptors. This may serve not only as a basis to understand which chemical factors are correlated with a certain behavior but also to identify compounds that are likely to behave differently. PLS is used to establish QSARs. Neither PLS nor PCA requires that the variables be orthogonal to each other or that the data be free of "missing data". These properties of the PCA and PLS methods make them especially suitable for establishing QSARs for groups of compounds such as the PCDDs and PCDFs.

Multivariate characterization by physicochemical properties of compounds for use in QSARs has been frequently reported. It has been applied to several other classes of compounds, including polycyclic aromatic hydrocarbons (PAHs) (28), halogenated hydrocarbons (29), monosubstituted benzenes (30), solvents (31), peptides (32), and DNA sequences (33). The use of the PLS method in QSAR modeling has earlier been demonstrated in a series of studies, e.g., QSAR for the binding of PAHs to the rat liver TCDD receptor (28, 34), and in a strategy for ranking environmentally occurring chemicals (35).

To date, only a limited number of studies report physical and chemical measurements on a larger set of PCDD and PCDF congeners. Most studies found in the literature are focused on the highly toxic 2,3,7,8-substituted congeners, i.e., 7 PCDD congeners and 10 PCDF congeners. However, in recent years, both experimental and semiempirical data have been obtained for a number of basic properties, which makes it possible to attempt a multivariate chemical characterization.

The aims of the present article are (i) to report on the multivariate characterization of PCDDs and PCDFs, (ii) to develop multivariate QSARs for these two classes of compounds, (iii) to assess the predictive capabilities of the QSARs, and (iv) to discuss an alternative selection of candidates for chemical and biological testing. The analysis is restricted to congeners with four to eight chlorine atoms since these compounds are the most extensively studied.

© 1992 American Chemical Society

Figure 1. Structural formulas for the PCDDs and PCDFs.

Materials and Methods

Physicochemical Properties. Prior to the construction of a QSAR it is necessary to decide which chemical factors might influence the activity of interest. If any knowledge of the importance of certain features is available, this information should be utilized. Here, the analysis is based on literature data obtained for larger sets of congeners, and we consider variables describing electronic, steric, and lipophilic factors to be relevant. The variables used in the principal component analysis are given in Table I. The first four variables (1-4) are data on molecular properties obtained by semiempirical molecular orbital calculations by Koester et al. (36). Variable 5 is the octanol/water partition coefficient (log Kow) reported by Sijm et al. (37). Two variables are based on gas chromatographic retention times (6 and 7) (38). Variable 8 is the number of chlorine atoms attached to the molecule. The halogenation pattern is described by variables 9-16. These variables were obtained by giving each of the eight positions open for chlorine substitution an indicator variable of "1" or "0" depending on the presence or absence of chlorine substitution. Information from infrared spectra of the tetra- to octa-PCDDs by Grainger et al. is summarized in variables 17-20 and is presently only available for the dioxins (39). Variable 17 is the number of absorption bands in the region 1300-1600 cm⁻¹, variable 18 is the bands of the asymmetric C-O-C stretch in the region 1270-1330 cm⁻¹, and variable 19 is the number of absorption bands in the region 900-1000 cm⁻¹. The last IR variable (20) is the wavelength of the "tri-ring bending" band.

Table I. Physicochemical and Biological Variables Used in Multivariate Characterization and Modeling of PCDDs and PCDFs

variable no.   variable
X Matrix
1              ionization potential
2              heat of formation
3              dipole moment
4              lowest unoccupied molecular orbital
5              octanol/water partition coefficients
6              GC retention times, Supelco SP-2330
7              GC retention times, CP-Sil 8
8              no. of chlorine atoms
9-16           substitution pattern, position of chlorine atoms (a)
17-20          GC-FTIR spectroscopy (b)
Y Matrix
21             photolytic half-lives
22             AHH induction EC50

(a) Carbon atom no. 1 (C1) = variable 9 (V9), C2 = V10, C3 = V11, C4 = V12, C6 = V13, C7 = V14, C8 = V15, and C9 = V16. (b) V17 = no. of absorption bands 1300-1600 cm⁻¹, V18 = bands in the region 1270-1330 cm⁻¹, V19 = number of absorption bands 900-1000 cm⁻¹, and V20 = wavelength of the tri-ring bending band.

Response Data. The two responses used in the QSAR modeling are the photolytic half-lives (t1/2) for some PCDDs in organic solvents (8, 40, 41) and the aryl hydrocarbon hydroxylase (AHH) induction (EC50) obtained by the H4IIE rat hepatoma cell bioassay (15, 20) for some PCDFs. The photolytic half-lives have been normalized to the value of OCDD in order to exclude the influence of different experimental conditions. The EC50 values for AHH induction are expressed relative to the most active inducer, 2,3,7,8-TCDD. The precision of the rat hepatoma cell bioassay has been extensively studied (42) and found to be fairly high: the coefficients of variation associated with these estimates are generally in the range of 5-15%. In addition, the bioassay was found to be reproducible among different laboratories.

Data Analysis. Principal Component Analysis (PCA). Multivariate projection methods, such as PCA (26), provide a means by which compiled physicochemical data may be analyzed and interpreted. PCA combines the included variables into a few underlying descriptive dimensions, summarizing the systematic information present in the data matrix. The primary purpose of PCA is to give an overview of the dominant patterns or major trends in the data. Interpretation of such patterns is possible, since the general features are easy to survey when visualized in plots. Thus, this procedure reveals information concerning relationships between objects and variables. In this study the PCDDs and PCDFs constitute the objects.
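The projection described above can be illustrated with a minimal sketch. It assumes autoscaled data and a plain singular value decomposition in NumPy; it is not the SIMCA implementation used in this work, and the function names (`autoscale`, `pca_svd`) and the small random data matrix are hypothetical.

```python
import numpy as np


def autoscale(X):
    """Mean-center each column and scale it to unit variance."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    return (X - mean) / std, mean, std


def pca_svd(X, n_components):
    """Return scores T, loadings P, residuals E and explained variance fractions.

    The autoscaled matrix Xs satisfies Xs = T @ P.T + E.
    """
    Xs, _, _ = autoscale(X)
    U, s, Vt = np.linalg.svd(Xs, full_matrices=False)
    T = U[:, :n_components] * s[:n_components]  # scores, one row per object
    P = Vt[:n_components].T                     # loadings, one row per variable
    E = Xs - T @ P.T                            # residuals left after A components
    explained = s[:n_components] ** 2 / np.sum(s ** 2)
    return T, P, E, explained


# Hypothetical data: 6 objects (compounds) described by 4 correlated variables.
rng = np.random.default_rng(0)
latent = rng.normal(size=(6, 2))
X_demo = latent @ rng.normal(size=(2, 4)) + 0.01 * rng.normal(size=(6, 4))

T, P, E, explained = pca_svd(X_demo, n_components=2)
```

Plotting the two columns of `T` against each other would give a score plot, and the two columns of `P` a loading plot. The number of components is fixed by hand here, whereas the study selects it by cross-validation.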
Mathematically, PCA corresponds to a decomposition of the data matrix X into means ($\bar{x}_k$), scores ($t_{ia}$), loadings ($p_{ak}$), and residuals ($e_{ik}$) according to the equation

$$x_{ik} = \bar{x}_k + \sum_{a=1}^{A} t_{ia}\, p_{ak} + e_{ik} \qquad (1)$$
Here, $x_{ik}$ are the physicochemical descriptors compiled in the multivariate characterization of the polychlorinated dioxins and dibenzofurans. The index i is used for the compounds, and the index k for the physicochemical descriptors. The score ($t_{ia}$) describes the location of object (compound) i along the ath principal component (PC). The absolute value of a loading tells how much variable k contributes to a PC. The sign of a loading shows whether the corresponding variable is positively or negatively correlated with a PC. The first calculated PC reflects the linear combination associated with the main variation in the data, the second PC explains the next largest variance, and so on. The resulting object plot, the score plot, is an optimal projection showing the relation between the different objects, here congeners. Objects close to each other in the score plot have similar physicochemical characteristics. The corresponding variable plot, the loading plot, shows how the variables are related to each other and how they influence the different PCs. The number of meaningful principal components, A in the equation above, is determined with a significance test called cross-validation (25). The cross-validation technique provides a means of obtaining optimal predictive power without overfitting the model.

Figure 2. Score plot of principal component 2 versus component 1 for the PCDDs and PCDFs. The 136 congeners are distributed in two prominent groups, the PCDDs in the lower part of the plot and the PCDFs in the upper part.

Partial Least-Squares Projections to Latent Structures (PLS). To establish QSARs for the two examined classes of compounds, we used PLS. The PLS method calculates latent variables for the two matrices plus a relation between them. In the present example there are two tables of data, X and Y. In the Y block we have only one variable, i.e., the photolytic half-life or AHH activity, and in the X block there are 20 (PCDDs) and 16 (PCDFs) variables. The relation between the X and Y blocks of data is formed as a model of latent variables, U and T. The predictions for new objects (compounds in the test set) are obtained from the X data of these compounds through the "inner relation" existing between U and T.
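For a single response variable, the latent-variable model described above can be sketched with the classical NIPALS form of PLS (often called PLS1). This is an illustrative assumption rather than the SIMCA code used in this work: the function names (`pls1_fit`, `pls1_predict`) and the demonstration data are hypothetical, and only X is scaled to unit variance while y is merely centered.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Fit a one-response PLS model by NIPALS; returns regression coefficients."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, x_std = X.mean(axis=0), X.std(axis=0, ddof=1)
    y_mean = y.mean()
    Xr = (X - x_mean) / x_std   # autoscaled X block, deflated step by step
    yr = y - y_mean             # centered response, deflated step by step
    W, P, c = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)          # X weights
        t = Xr @ w                      # X scores (latent variable T)
        tt = t @ t
        p = Xr.T @ t / tt               # X loadings
        ca = yr @ t / tt                # inner relation: y is modeled as ca * t
        Xr = Xr - np.outer(t, p)        # remove the modeled part of X
        yr = yr - ca * t                # remove the modeled part of y
        W.append(w)
        P.append(p)
        c.append(ca)
    W, P, c = np.array(W).T, np.array(P).T, np.array(c)
    b = W @ np.linalg.solve(P.T @ W, c)  # coefficients in autoscaled X units
    return {"b": b, "x_mean": x_mean, "x_std": x_std, "y_mean": y_mean}


def pls1_predict(model, X_new):
    """Predict the response for new objects from their X data only."""
    Xs = (np.asarray(X_new, dtype=float) - model["x_mean"]) / model["x_std"]
    return model["y_mean"] + Xs @ model["b"]


# Hypothetical data: 8 objects, 3 descriptors, response exactly linear in X.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(8, 3))
y_demo = X_demo @ np.array([1.0, -2.0, 0.5]) + 3.0
model = pls1_fit(X_demo, y_demo, n_components=3)
y_fit = pls1_predict(model, X_demo)
```

With as many components as X has independent columns, the fit coincides with ordinary least squares, so the demonstration data are reproduced exactly. In practice fewer components are retained, and their number is chosen by cross-validation.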
Analogous to PCA, the number of significant terms of the PLS model is determined by cross-validation (25), in order to ensure that the components are statistically significant and to avoid overparameterization ("overfitting") of the model. The PCA and PLS methods have been thoroughly described in the literature (24-27).

Prior to the actual data analysis, the data were preprocessed by means of autoscaling and mean centering. All calculations were performed on an IBM PS/2 computer with the SIMCA program package (Umetri AB, S-901 24 Umeå, Sweden). The SIMCA program is a menu-driven software package for multivariate data analysis.

Results and Discussion

Principal Component Analysis of 136 PCDDs and PCDFs. The PCA of the 136 dioxins and dibenzofurans, based on variables 1-8, resulted in a two-dimensional model. In the model the first and second principal components (PCs) explained 52% and 24%, respectively, of the variance in the physicochemical variables. As seen in Figure 2, where the second PC is plotted versus the first (the score plot), the 136 compounds are distributed in two prominent groups. The dioxins constitute one group in the lower part of the plot, whereas the other group in the upper part contains all the dibenzofurans. When moving from left to right within these groups, one first encounters the tetrachloro congeners and then, successively, the penta-, hexa-, hepta-, and octachloro congeners. Thus, it appears that the first principal component reflects the degree of chlorination among the compounds, while the second component separates the two classes of compounds.

The variable loadings (see Figure 3) show that the first PC is influenced by the lipophilicity (i.e., indirectly the degree of chlorination) of the compounds, as log Kow (variable 5) has a high loading for this component. Moreover, the closely related variables describing GC retention times (6 and 7) and the total number of chlorine atoms (variable 8) are located near log Kow. The LUMO parameter (variable 4) also strongly contributes to the first PC, though with an inverse correlation with regard to the lipophilicity variables. The second PC, which separates the dibenzofurans from the dioxins, is primarily influenced by the heat of formation (variable 2) and the ionization potential (variable 1). This means that the dioxins are distinguished from the dibenzofurans mainly by the differences occurring in these two variables.

On the basis of these findings, a natural way to proceed was to divide the dioxins and dibenzofurans into the two groups indicated by the score plot in Figure 2. Such a subdivision also allowed the use of additional chemical descriptors found in the literature that cover only one of the two groups of compounds.

Figure 3. Loading plot for the PCDDs and PCDFs; the second vector is plotted versus the first. The variables and their corresponding variable numbers are given in Table I.

Principal Component Analysis of the Dioxins.
In the PC analysis of the 49 studied dioxins (1-49), the physicochemical descriptor matrix was extended with four FTIR variables (variables 17-20). In addition, the battery of qualitative 1/0 variables depicting the chlorine substitution pattern was added (variables 9-16). Thus the number of physicochemical descriptors collected for the dioxins was raised to 20 variables in all. For the numbering of the congeners, see Table II. The principal component analysis of the 49 × 20 data matrix gave a two-dimensional model accounting for 46% (36% + 10%) of the variation in the data. The resulting score plot is shown in Figure 4. The resolution of the dioxins is much better than in the corresponding plot of the overall analysis (Figure 2). Perhaps the most interesting feature in Figure 4 is the striking tendency of
Figure 4. Score plot of principal component 2 versus component 1 for the 49 PCDDs. The congeners are distributed in five prominent groups, the tetrachlorodioxins to the left and then, successively, the penta-, hexa-, hepta-, and octachlorodioxin. For the numbering of the congeners, see Table II.

Figure 6. Score plot of principal component 2 versus principal component 1 for the PCDFs. The 87 congeners are distributed in five groups, with the tetrachlorodibenzofurans to the left and the octachlorodibenzofurans to the right. For the numbering of the congeners, see Table II.

Figure 7. Loading plot for the PCDFs, with the second vector plotted versus the first. The variables and the corresponding variable numbers are given in Table I.
and 1,2,3,9-substituted congeners to slightly separate from the other compounds. The 1,2,3,9-substituted PCDF congeners can be found in the upper part of the plot in contrast to the 1,4,6,9-substituted congeners found for the PCDDs. The 1,4,6,9-substituted PCDF congeners are situated in the middle of the projection. The corresponding loadings plotted in Figure 7 show that variables like log Kow (variable 5), the ionization potential (variable 1), and the number of chlorine atoms (variable 8) dominate the first dimension, whereas the second dimension is mainly influenced by the substitution pattern (variables 9-16). Thus, the variable distribution pattern for the PCDFs resembles that of the PCDDs.

Table II. Numbering of the 136 Tetra- to Octachlorinated PCDDs and PCDFs
[Table body not reproduced. Congeners are numbered 1-136; the TCDDs are numbered 1-22.]

Structure-Activity Relationships. Photolytic Degradation. The chemical descriptors for the PCDDs tested are listed in Table I and the corresponding photolytic half-lives in Table III. The PLS model is based on a randomly selected training set of seven different congeners and validated with a test set of the remaining four congeners. The data were log-transformed prior to the calculations. Two significant components describe 91% of the variation in the Y variable, i.e., the variation in the photolytic half-lives. The first component explains 75% and the second 16% of the variation in photolytic activity. Figure 8 shows the plot of predicted/calculated versus observed half-lives. A linear regression of the data yielded an r² of 0.86. The congeners with the shortest half-lives, 2,3,7,8-TCDD (22), 1,2,7,8-TCDD (13), and 1,2,3,7,8,9-HxCDD (44), can be recognized in the lower left corner. The congeners with the longest half-lives are found in the upper right corner, i.e., 1,2,4,6,7,9-HxCDD (45) and 1,2,3,4,6,7,9-HpCDD (48). Note that the 2,3,7,8-substituted congeners have higher photolytic activity than the corresponding 1,4,6,9-substituted congeners. This substitution effect is clearly observed if 1,2,3,4,6,7,8-HpCDD (47) and 1,2,3,4,6,7,9-HpCDD (48) are compared.

In the PLS loading plots (not shown) the first dimension is dominated by substitution pattern-related variables, namely, IR absorption bands (variables 17-20), chlorine position 9 (variable 16), and GC retention time (variables 6 and 7). The second PLS dimension is related to size, the important parameters being the number of chlorine atoms (variable 8) and the heat of formation (variable 2).

The predicted values of t1/2 indicate two types of substitution patterns with different behavior. The most stable chlorination configuration is found in the 1,4,6,9-substitution pattern. When at least three of the four positions are occupied, the molecule has special properties in this respect. The model predicts 1,2,4,9-TCDD, 1,2,6,9-TCDD, 1,2,7,9-TCDD, 1,2,4,6,9-PeCDD, 1,2,3,6,9-PeCDD, and 1,2,4,6,7,9-HxCDD to be congeners with expected long half-lives. Congeners with the highest expected photolytic activity have at least three out of four chlorine atoms in the 2,3,7,8-positions. Examples are 1,2,3,7-TCDD, 1,3,7,8-TCDD, 2,3,7,8-TCDD, 1,2,3,6,8-PeCDD, 1,2,3,7,8-PeCDD, 1,2,3,4,7,8-HxCDD, and 1,2,3,4,6,7,8-HpCDD.

Table III. Photolytic Half-Lives Relative to OCDD for Some PCDDs (a)

no.   congener                 half-life   set
2     1,2,3,6-TCDD             0.57        training
13    1,2,7,8-TCDD             0.34        training
22    2,3,7,8-TCDD             0.18        training
28    1,2,3,7,8-PeCDD          0.56        test
41    1,2,3,6,7,8-HxCDD        0.48        training
42    1,2,3,6,7,9-HxCDD        0.52        training
44    1,2,3,7,8,9-HxCDD        0.34        test
45    1,2,4,6,7,9-HxCDD        2.94        test
47    1,2,3,4,6,7,8-HpCDD      1.10        test
48    1,2,3,4,6,7,9-HpCDD      2.26        training
49    1,2,3,4,6,7,8,9-OCDD     1.00        training

(a) From Tysklind et al. (8), Nestrick et al. (40), and Dobbs (41).

Figure 8. Correlation between calculated/predicted and observed photolytic half-lives (log t1/2 relative to OCDD) for 11 PCDDs. r² = 0.86. The training set congeners are marked with filled symbols and the test set congeners with open symbols. For the numbering of the congeners, see Table III.

Enzyme Induction. The log EC50 values for the AHH induction are listed in Table IV. The data were treated in two different ways. In the first case all objects were included in the PLS model, and in the second case the data set was arbitrarily divided into a training set and a test set. In all, 22 PCDF congeners were included in both models. Figure 9 shows the calculated versus observed AHH induction values for the first case, when the training set included all 22 congeners. The two significant components describe 88% (r² = 0.88) of the variance in the response variable. The first component explains 70% and the second 18% of the variance. The 2,3,7,8-substituted PCDFs are situated in the upper right corner of the plot, with 2,3,4,7,8-PeCDF, 1,2,3,4,7,8-HxCDF, and 2,3,4,6,7,8-HxCDF exhibiting the highest AHH induction potencies.

Table IV. log EC50 for AHH Induction Relative to 2,3,7,8-TCDD for Some PCDFs in in Vitro Tests (a)

no.   congener               AHH      set
51    1,2,3,6-TCDF           -6.16    test
52    1,2,3,7-TCDF           -5.52    training
57    1,2,4,8-TCDF           -5.22    training
79    2,3,4,6-TCDF           -4.30    test
80    2,3,4,7-TCDF           -2.40    training
81    2,3,4,8-TCDF           -2.77    training
83    2,3,6,8-TCDF           -4.16    test
84    2,3,7,8-TCDF           -1.74    training
90    1,2,3,4,8-PeCDF        -2.46    training
95    1,2,3,7,8-PeCDF        -1.55    test
96    1,2,3,7,9-PeCDF        -3.10    test
98    1,2,4,6,7-PeCDF        -3.66    test
99    1,2,4,6,8-PeCDF        -4.16    training
101   1,2,4,7,8-PeCDF        -3.17    training
102   1,2,4,7,9-PeCDF        -2.72    training
109   1,3,4,7,8-PeCDF        -1.35    training
111   1,3,6,7,8-PeCDF        -2.04    training
115   2,3,4,7,8-PeCDF        -0.55    test
119   1,2,3,4,7,8-HxCDF      -0.70    test
122   1,2,3,6,7,8-HxCDF      -1.31    training
126   1,2,4,6,7,8-HxCDF      -2.77    training
131   2,3,4,6,7,8-HxCDF      0.96     training

(a) From Bandiera et al. (15) and Mason et al. (20).

Figure 9. Correlation between calculated and observed log EC50 value for induction of AHH for 22 PCDFs. r² = 0.88. All congeners included in the training set. For the numbering of the congeners, see Table IV.

To test the validity of the PLS model, we recalculated the QSAR on a training set of 14 congeners (filled symbols in Figure 10) and used a test set of eight PCDF congeners (open symbols) for which the log EC50 values were predicted. A two-component model explained 76% of the variance in Y. Figure 10 shows the resulting plot of calculated/predicted versus observed log EC50 values. Thus, the correlation has good predictive power (r² = 0.71) and can be considered for further predictive and screening purposes.

Figure 10. Correlation between calculated/predicted and observed log EC50 value for AHH induction. r² = 0.71. The training set congeners are marked with filled symbols and the test set congeners with open symbols. For the numbering of the congeners, see Table IV.

Again, no PLS loading plots are shown, but variables of importance in the first model dimension are ionization potential (variable 1), heat of formation (variable 2), LUMO (variable 4), and total number of chlorine atoms (variable 8). In the second dimension, substitution pattern variables play a more significant role, and the positions 1, 3, 7, and 8 (variables 9, 11, 14, and 15) show the highest modeling capability.
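The correspondence between ring positions and the indicator variables 9-16 (Table I, footnote a) can be made concrete with a short sketch. The helper name `substitution_indicators` and the assumption that a congener is written as a comma-separated position list followed by a hyphen and a name are illustrative choices, not part of the original data handling.

```python
# Ring positions open for chlorine substitution, in the order of variables
# 9-16 (Table I, footnote a): C1 = V9, C2 = V10, C3 = V11, C4 = V12,
# C6 = V13, C7 = V14, C8 = V15, C9 = V16.
POSITIONS = (1, 2, 3, 4, 6, 7, 8, 9)


def substitution_indicators(congener):
    """Return the eight 1/0 indicators (V9..V16) and the chlorine count (V8).

    `congener` is assumed to look like "2,3,7,8-TCDD" (hypothetical format).
    """
    prefix = congener.split("-")[0]                  # "2,3,7,8"
    chlorinated = {int(p) for p in prefix.split(",")}
    unknown = chlorinated - set(POSITIONS)
    if unknown:
        raise ValueError(f"not a substitutable position: {sorted(unknown)}")
    indicators = [1 if pos in chlorinated else 0 for pos in POSITIONS]
    return indicators, sum(indicators)


print(substitution_indicators("2,3,7,8-TCDD"))
# -> ([0, 1, 1, 0, 0, 1, 1, 0], 4)
```

In this coding, positions 1, 3, 7, and 8 correspond to the first, third, sixth, and seventh entries of the list, i.e., variables 9, 11, 14, and 15.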
Conclusions

The PCDDs and PCDFs have been intensively studied over the last decades, but their mechanism of action is still poorly understood (43). The multivariate approach using principal component analysis and partial least-squares modeling of latent variables may make it possible to draw more general conclusions about specific groups of compounds, e.g., ranking of photolytic activity or toxic potential.

In the analysis of all 136 PCDD and PCDF congeners studied, the two main groups of compounds as well as the different homologous groups within each class are clearly discriminated. The plots of loadings reveal that this separation is due to variables such as degree of chlorination, heat of formation, and ionization potential. This first step in the multivariate characterization is based on a limited number of variables, and the "resolution" between different substitution patterns is not as pronounced as in the separate analyses of the two classes. In the analysis of both PCDDs and PCDFs the 2,3,7,8-substituted congeners are separated from the others in the score plots. This is interesting, as these congeners exhibit chemical and biological properties that in many respects clearly differ from those of the 2,3,7,8-unsubstituted ones.

This work presents a tool with which it is possible to analyze several chemical variables of PCDDs and PCDFs simultaneously. Two types of chemical variables can be identified. The first type is the size-related variables, e.g., number of chlorine atoms, GC retention time, and ionization potential. These variables contain information about the separation of the different homologous groups. The second type of variables describes the substitution pattern. The "extreme patterns" are 2,3,7,8 and 1,4,6,9 substitution for the PCDDs and 2,3,7,8 and 1,2,3,9 substitution for the PCDFs.
The substitution pattern variables are of importance for the predictive power of the two models, and variables reflecting the substitution pattern (e.g., IR and UV) should therefore be further developed in the future.

The composition of the training set is crucial for all QSARs. It should contain a sufficient number of compounds to span the whole physicochemical domain in which predictions are sought. Suitable training sets can be selected from the PC score plots based on a multivariate characterization of the class of compounds. This selection can be done with, for example, a factorial design. Such a strategy has been shown to give small training sets with good predictive power for a number of classes of compounds (35). In future work we will discuss the design applied to the PCDDs and PCDFs.

The use of toxic equivalency factors for so-called "dioxin-like" effects has been discussed during recent years. The equivalency factors are constructed by merging several biological and toxicological effects. So far this approach includes PCDDs and PCDFs, but the inclusion of some PCBs has been suggested (44). A developed QSAR model, based on physicochemical descriptors not only for the PCDDs/PCDFs and PCBs but also for other related compounds, may be used to predict the relative toxicities of a large number of substances. In a long-term perspective, interaction effects between various compounds may also be included.

The focus on the 2,3,7,8-substituted congeners of PCDDs and PCDFs in series of experiments has generated a lot of data on these compounds. The concern for and scientific interest in these highly toxic congeners are of course of importance. However, we believe that the current knowledge must be supplemented with tests on congeners not expected to give high toxic responses. If candidates for biological testing are selected in an efficient way, general conclusions can be drawn also for not-yet-tested but related substances. It is not possible to know in advance the exact mechanism behind the effects studied. Thus, as many chemical and physical descriptors as possible should be added to the characterization.

The PCDDs and PCDFs have been studied for many years. However, the conclusion that can be drawn from this pilot study is that more basic physical and chemical data are needed in order to understand both physical and biological mechanisms.
Preferably, such data should be collected on a much larger set of congeners than that available today, in order to cover the structural domains of the PCDDs and PCDFs in a balanced and representative manner.
Registry No. 1, 30746-58-8; 2, 71669-25-5; 3, 67028-18-6; 4, 53555-02-5; 5, 71669-26-6; 6, 71669-27-7; 7, 71669-28-8; 8, 71669-29-9; 9, 71665-99-1; 10, 40581-90-6; 11, 67323-56-2; 12, 40581-91-7; 13, 34816-53-0; 14, 71669-23-3; 15, 62470-54-6; 16, 33423-92-6; 17, 71669-24-4; 18, 50585-46-1; 19, 62470-53-5; 20, 40581-93-9; 21, 40581-94-0; 22, 1746-01-6; 23, 67028-19-7; 24, 39227-61-7; 25, 71925-15-0; 26, 71925-16-1; 27, 82291-34-7; 28, 40321-76-4; 29, 71925-17-2; 30, 71925-18-3; 31, 82291-35-8; 32, 71998-76-0; 33, 82291-36-9; 34, 58802-08-7; 35, 82291-37-0; 36, 82291-38-1; 37, 58200-66-1; 38, 58200-67-2; 39, 58200-68-3; 40, 39227-28-6; 41, 57653-85-7; 42, 64461-98-9; 43, 58200-69-4; 44, 19408-74-3; 45, 39227-62-8; 46, 58802-09-8; 47, 35822-46-9; 48, 58200-70-7; 49, 3268-87-9; 50, 24478-72-6; 51, 83704-21-6; 52, 83704-22-7; 53, 62615-08-1; 54, 83704-23-8; 55, 71998-73-7; 56, 83719-40-8; 57, 64126-87-0; 58, 83704-24-9; 59, 83704-25-0; 60, 83710-07-0; 61, 70648-18-9; 62, 58802-20-3; 63, 83704-26-1; 64, 70648-22-5; 65, 83704-27-2; 66, 70648-16-7; 67, 92341-04-3; 68, 83704-28-3; 69, 57117-36-9; 70, 71998-72-6; 71, 83690-98-6; 72, 57117-35-8; 73, 64560-17-4; 74, 66794-59-0; 75, 82911-58-8; 76, 70648-19-0; 77, 83704-29-4; 78, 83704-33-0; 79, 83704-30-7; 80, 83704-31-8; 81, 83704-32-9; 82, 57117-39-2; 83, 57117-37-0; 84, 51207-31-9; 85, 57117-38-1; 86, 58802-19-0; 87, 57117-40-5; 88, 83704-47-6; 89, 83704-48-7; 90, 67517-48-0; 91, 83704-49-8; 92, 57117-42-7; 93, 83704-51-2; 94, 83704-52-3; 95, 57117-41-6; 96, 83704-53-4; 97, 83704-54-5; 98, 83704-50-1; 99, 69698-57-3; 100, 70648-24-7; 101, 58802-15-6; 102, 71998-74-8; 103, 70648-23-6; 104, 69433-00-7; 105, 70872-82-1; 106, 83704-36-3; 107, 83704-55-6; 108, 70648-15-6; 109, 58802-16-7; 110, 70648-20-3; 111, 70648-21-4; 112, 83704-35-2; 113, 57117-43-8; 114, 67481-22-5; 115, 57117-31-4; 116, 79060-60-9; 117, 69698-60-8; 118, 91538-83-9; 119, 70648-26-9; 120, 91538-84-0; 121, 92341-07-6; 122, 57117-44-9; 123, 92341-06-5; 124, 75198-38-8; 125, 72918-21-9; 126, 67562-40-7; 127, 75627-02-0; 128, 69698-59-5; 129, 71998-75-9; 130, 92341-05-4; 131, 60851-34-5; 132, 67562-39-4; 133, 70648-25-8; 134, 69698-58-4; 135, 55673-89-7; 136, 39001-02-0; AHH, 9037-52-9.
Literature Cited
(1) Wold, S.; Dunn, W. J., III. J. Chem. Inf. Comput. Sci. 1983, 23, 6.
(2) Hellberg, S. A Multivariate Approach to QSAR. Thesis, University of Umeå, Sweden, 1986.
(3) Rappe, C.; Andersson, R.; Bergqvist, P.-A.; Brohede, C.; Hansson, M.; Kjeller, L.-O.; Lindström, G.; Marklund, S.; Nygren, M.; Swanson, S. E.; Tysklind, M.; Wiberg, K. Waste Manage. Res. 1987, 5, 225.
(4) Rordorf, B. F. Chemosphere 1986, 15, 1325.
(5) Shiu, W. Y.; Doucette, W.; Gobas, F. A. P. C.; Andren, A.; Mackay, D. Environ. Sci. Technol. 1988, 22, 651.
(6) Fiedler, H.; Hutzinger, O.; Timms, C. W. Toxicol. Environ. Chem. 1990, 29, 157.
(7) Van den Berg, M.; Olie, K.; Hutzinger, O. Toxicol. Environ. Chem. 1985, 9, 171.
(8) Tysklind, M.; Rappe, C. Chemosphere 1991, 23, 1365.
(9) Ahlborg, U. G. Chemosphere 1989, 19, 603.
(10) WHO. Environmental Health Criteria 88, IPCS International Programme of Chemical Safety; WHO: Geneva, 1989.
(11) Poland, A.; Glover, E. Mol. Pharmacol. 1973, 9, 736.
(12) Poland, A.; Knutson, J. C. Annu. Rev. Pharmacol. Toxicol. 1982, 22, 517.
(13) Safe, S. Annu. Rev. Pharmacol. Toxicol. 1986, 26, 371.
(14) Safe, S.; Sawyer, T.; Bandiera, S.; Safe, L.; Zmudzka, B.; Mason, G.; Romkes, M.; Denomme, M. A.; Fujita, T. Banbury Rep. 1984, 18, 135.
(15) Bandiera, S.; Sawyer, T.; Romkes, M.; Zmudzka, B.; Safe, L.; Mason, G.; Keys, B.; Safe, S. Toxicology 1984, 32, 131.
(16) Safe, S.; Fujita, T.; Romkes, M.; Piskorska-Pliszczynska, J.; Homonko, K.; Denomme, M. A. Chemosphere 1986, 15, 1657.
(17) Romkes, M.; Safe, S.; Piskorska-Pliszczynska, J.; Fujita, T. Chemosphere 1987, 16, 1719.
(18) Prokipcak, R. D.; Golas, C. L.; Manchester, D. K.; Okey, A. B.; Safe, S.; Fujita, T. Chemosphere 1990, 20, 1221.
(19) Mason, G.; Farrell, K.; Keys, B.; Piskorska-Pliszczynska, J.; Safe, L.; Safe, S. Toxicology 1986, 41, 21.
(20) Mason, G.; Sawyer, T.; Keys, B.; Bandiera, S.; Romkes, M.; Piskorska-Pliszczynska, J.; Zmudzka, B.; Safe, S. Toxicology 1985, 37, 1.
(21) Long, G.; McKinney, J.; Pedersen, L. Quant. Struct.-Act. Relat. 1987, 6, 1.
(22) Mamantov, A. Environ. Sci. Technol. 1984, 18, 808.
(23) Grainger, J.; Reddy, V. V.; Patterson, D. G., Jr. Chemosphere 1988, 18, 981.
(24) Wold, S.; Dunn, W. J., III; Hellberg, S. Environ. Health Perspect. 1985, 61, 257.
(25) Wold, S. Technometrics 1978, 20, 379.
(26) Wold, S.; Albano, C.; Dunn, W. J., III; Edlund, U.; Esbensen, K.; Geladi, P.; Hellberg, S.; Johansson, E.; Lindberg, W.; Sjöström, M. Multivariate Analysis in Chemistry; Kowalski, B. R., Ed.; NATO ASI Series C138; Reidel: Dordrecht, Holland, 1984; pp 4-96.
(27) Ståhle, L.; Wold, S. Progress in Medicinal Chemistry; Ellis, G. P., West, G. B., Eds.; Elsevier: Amsterdam, 1988; p 291.
(28) Nordén, B.; Edlund, U.; Johnels, D.; Wold, S. Quant. Struct.-Act. Relat. 1983, 2, 73.
(29) Eriksson, L.; Jonsson, J.; Hellberg, S.; Lindgren, F.; Skagerberg, B.; Sjöström, M.; Wold, S.; Berglind, R. Environ. Toxicol. Chem. 1990, 9, 1339.
(30) Skagerberg, B.; Bonelli, D.; Clementi, S.; Cruciani, G.; Ebert, C. Quant. Struct.-Act. Relat. 1989, 8, 32.
(31) Carlsson, R.; Lundstedt, T.; Albano, C. Acta Chem. Scand. 1985, B39, 79.
(32) Wold, S.; Eriksson, L.; Hellberg, S.; Jonsson, J.; Sjöström, M.; Skagerberg, B.; Wikström, C. Can. J. Chem. 1987, 65, 1814.
(33) Jonsson, J.; Eriksson, L.; Hellberg, S.; Lindgren, F.; Sjöström, M.; Wold, S. Acta Chem. Scand. 1991, 45, 186.
(34) Johnels, D.; Gillner, M.; Nordén, B.; Toftgård, R.; Gustafsson, J.-Å. Quant. Struct.-Act. Relat. 1989, 8, 83.
(35) Eriksson, L. A strategy for ranking environmentally occurring chemicals. Thesis, University of Umeå, Sweden, 1991.
(36) Koester, C. J.; Hites, R. H. Chemosphere 1988, 17, 2355.
(37) Sijm, D. T. H. M.; Wever, H.; de Vries, P. J.; Opperhuizen, A. Chemosphere 1989, 19, 263.
(38) Tysklind, M.; Lundgren, K., Institute of Environmental Chemistry, University of Umeå, Sweden. Unpublished data.
(39) Grainger, J.; Reddy, V. V.; Patterson, D. G., Jr. Chemosphere 1989, 19, 249.
(40) Nestrick, T. J.; Lamparski, L. L.; Townsend, D. I. Anal. Chem. 1980, 52, 1865.
(41) Dobbs, A. J.; Grant, C. Nature 1979, 278, 8.
(42) Tillitt, D. E.; Giesy, J. P.; Ankley, G. T. Environ. Sci. Technol. 1991, 25, 87.
(43) Biological Basis for Risk Assessment of Dioxins and Related Compounds; Gallo, M. A., Scheuplein, R. J., Van der Heijden, K. A., Eds.; Banbury Report 35; Cold Spring Harbor Laboratory Press: New York, 1991.
(44) Safe, S. Crit. Rev. Toxicol. 1990, 21, 51.
Received for review September 12, 1991. Revised manuscript received January 21, 1992. Accepted January 22, 1992. Financial support from the Center for Environmental Research in Umeå (CMF) is gratefully acknowledged.
Environ. Sci. Technol. 1992, 26, 1030-1035
High-Resolution Mass Spectrometry Method for the Analysis of 3-Chloro-4-(dichloromethyl)-5-hydroxy-2(5H)-furanone in Waters M. Judith Charles,* Gong Chen, Rohini Kanniganti, and G. Dean Marbury Department of Environmental Sciences and Engineering, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599-7400
A method was developed for the analysis of 3-chloro-4-(dichloromethyl)-5-hydroxy-2(5H)-furanone (MX) in drinking waters. The method is based on electron-ionization high-resolution mass spectrometry and selected-ion monitoring of the m/z 199 and 201 ions of the methyl derivative of MX, with isotopically labeled benzoic acid as the internal standard. High-resolution mass spectrometry is needed to provide sensitivity, selectivity, and confidence in the identification of MX in water extracts. In the analysis of matrix spikes and replicate samples, the method provided recoveries of 102-108% and a precision of 8% RSD. Extrapolation of a signal:noise ratio that was measured on a chlorinated water sample provides a detection limit of 0.6 ng/L at a S:N of 3:1, thereby enabling the detection of pptr levels of MX in waters.
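The detection-limit extrapolation mentioned above can be illustrated with a minimal sketch. It assumes that the signal scales linearly with concentration and that the noise level stays constant, which is a common simplification and not necessarily the exact procedure used in this work. The function name and the numerical inputs are hypothetical and are not taken from the measurements reported here.

```python
def extrapolate_detection_limit(concentration, measured_sn, target_sn=3.0):
    """Estimate the concentration that would give `target_sn`.

    Assumes signal is proportional to concentration and noise is constant,
    so the signal:noise ratio scales linearly with concentration.
    """
    if concentration <= 0 or measured_sn <= 0 or target_sn <= 0:
        raise ValueError("all inputs must be positive")
    return concentration * target_sn / measured_sn


# Hypothetical example values (illustrative only, not data from the paper):
# a sample at 2.0 ng/L observed with a signal:noise ratio of 12.
lod = extrapolate_detection_limit(concentration=2.0, measured_sn=12.0)
print(f"estimated detection limit at S:N 3:1: {lod:.2f} ng/L")
```

An estimate obtained this way is only as reliable as the linearity assumption near the detection limit, so it would normally be confirmed with low-level spiked samples.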
Introduction Environmental interest in the presence of the compound 3-chloro-4-(dichloromethyl)-5-hydroxy-2(5H)-furanone (MX) in drinking waters is growing for two reasons. One reason is that MX is a potent bacterial mutagen. The other reason is that MX has been shown to account for a significant (20-50%) portion of the mutagenic activity in chlorinated drinking waters (1-4). It appears that the mutagenic activity of MX is affected by the pH- and temperature-sensitive equilibrium between the closed hydroxyfuranone form of MX and its tautomer, an open oxobutenoic acid (Figure 1). The mutagenic activity of the open form of MX parallels that of aflatoxin. The activity of MX in the Ames assay has been reported as 25.9 and 46.3 revertants/ng (1, 3, 5), and the activity of aflatoxin in the Ames assay has been reported as 22.62 revertants/ng (6).
Analytical methods that exist to measure the concentration of MX in waters rely on derivatization to methylate the hydroxyl group on the molecule followed by high-resolution gas chromatography/low-resolution mass spectrometry (HRGC/LRMS) by selected-ion monitoring (SIM) of fragment ions at m/z 147, 199, 201, and 203. Quantification is accomplished by using either external standardization (7) or internal standardization (3). Internal standardization is a more accurate approach because any changes in the response of the analyte, due to the presence of other compounds or changes in the ion source conditions, are normalized to the response of the internal standard. In the method proposed by Hemming et al. (3),
mucobromic acid was used as the internal standard and the response factor of MX to mucobromic acid was generated by measuring the response of 0.40 pg/L MX to 3.66 pg/L mucobromic acid. By definition, this method is semiquantitative because quantification is achieved on the basis of the response of a single standard of MX to mucobromic acid. A quantitative mass spectrometry method based on internal standardization is thus urgently needed. In this study, we utilize information from previous work (8) to develop and validate a quantitative method for the analysis of MX in waters. The method described herein is based on electron-ionization high-resolution mass spectrometry and selected-ion monitoring of the m/z 199 and 201 ions of the methyl derivative of MX and the use of isotopically labeled benzoic acid as the internal standard.
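The difference between single-point and multi-level internal standardization discussed above can be sketched as follows. This is a generic illustration of a relative response factor calculation, not the calibration procedure of this study. All peak areas and concentrations are hypothetical, and the function names are introduced here only for illustration.

```python
from statistics import mean, stdev


def response_factor(area_analyte, area_is, conc_analyte, conc_is):
    """Relative response factor of the analyte versus the internal standard."""
    return (area_analyte / area_is) * (conc_is / conc_analyte)


def quantify(area_analyte, area_is, conc_is, rf):
    """Analyte concentration in a sample from its area ratio and a known RF."""
    return (area_analyte / area_is) * conc_is / rf


# Hypothetical calibration standards:
# (analyte area, internal-standard area, analyte conc, internal-standard conc),
# with concentrations in pg/uL.
standards = [
    (4100.0, 5000.0, 50.0, 50.0),
    (7900.0, 5000.0, 100.0, 50.0),
    (40500.0, 5000.0, 500.0, 50.0),
    (79000.0, 5000.0, 1000.0, 50.0),
]

rfs = [response_factor(*s) for s in standards]
mean_rf = mean(rfs)                     # multi-level estimate of the RF
rsd_percent = 100.0 * stdev(rfs) / mean_rf  # spread across calibration levels

# Hypothetical sample extract spiked with the internal standard at 50 pg/uL.
sample_conc = quantify(16050.0, 5000.0, 50.0, mean_rf)
print(f"mean RF = {mean_rf:.4f}, RSD = {rsd_percent:.1f}%")
print(f"sample concentration = {sample_conc:.1f} pg/uL")
```

A response factor taken from a single standard gives no information on its variation with concentration, whereas the relative standard deviation across several levels indicates whether a constant factor is a reasonable model over the calibrated range.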
Experimental Section Purity of MX Standard Material. Standard material of MX was obtained from the U.S. Environmental Protection Agency, Health Effects Research Laboratory. The purity of the material was determined to be 84%. We calculated this value in our laboratory by dividing the response (area) of MX by the total response (area) of all the peaks present in a chromatogram obtained by high-resolution gas chromatography/flame-ionization detection (GC/FID). In this analysis, the injector port of the gas chromatograph was 200 °C to prevent thermal degradation of MX. We and others (9) have observed the formation of 2-(dichloromethyl)-3-chloropropenal from the decarboxylation of MX at temperatures greater than 200 °C. Derivatization of Standard Materials and Sample Extracts. Solutions of the methyl derivatives of mucobromic acid (Aldrich Chemical Co., Inc., Milwaukee, WI), mucochloric acid (Aldrich Chemical Co., Inc.), MX (50-1000 pg/µL), benzoic acid, and [13C,]benzoic acid (MDS Isotopes, Merck Chemical Division, St. Louis, MO; 50 pg/µL) were prepared by reacting the standard materials independently or together, as indicated, with solutions of 14% (v/v) BF3 in methanol (Alltech Inc., Deerfield, IL) for 12 h. The resulting solutions were then neutralized with 3 mL or 500 µL of 2% (v/v) NaHCO3, extracted twice with 250 µL of hexane, and concentrated to 100 µL under a stream of N2 gas. Preparation of Chlorination and Chloramination Extracts of Fulvic Acid. Monochloramine solutions were prepared by the method of Johnson and Overby (10),
0013-936X/92/0926-1030$03.00/0
© 1992 American Chemical Society