Environ. Sci. Technol. 2003, 37, 5559-5565
Analysis of Sources of Dioxin Contamination in Sediments and Soils Using Multivariate Statistical Methods and Neural Networks
RAINER GÖTZ* AND RAIMUND LAUER
Behörde für Umwelt und Gesundheit, Institut für Hygiene und Umwelt, Marckmannstrasse 129 b, D-20539 Hamburg, Germany
Multivariate statistical methods and neural networks were used to evaluate the dioxin concentration patterns of a large data set (407 samples) in order to identify the dioxin sources of contaminated waters (sediment and suspended particulate matter samples). The evaluations indicated that a considerable proportion of the dioxin contamination of the river Elbe in the section between the Mulde tributary and the North Sea, of its flood plains (soil samples), and of the Port of Hamburg was caused by pollution originating from the Bitterfeld region, an industrial area of the former German Democratic Republic. The dioxin patterns of sediment samples from tributaries of the river Elbe in the Bitterfeld area itself are similar to dioxin patterns that can be attributed to metalworking processes. The dioxin patterns of the Hamburg inner city waters could be attributed to “incineration” dioxin sources, for example waste incineration plants. The results of cluster analysis applying different modes of distance measure and linkage compared well with those of neural networks. The number of clusters was determined based on the stability of the results of different cluster analyses and on background information.
Introduction
Several investigations using multivariate statistical methods for the determination of sources of dioxin contamination have been published. Fiedler et al. (1) investigated dioxin contamination in a river system in South Mississippi using PCA (principal component analysis) and hierarchical cluster analysis. Hagenmaier et al. (2) applied hierarchical cluster analysis to determine the source of dioxin in sewage sludge samples. Fattore et al. (3) used PCA and discriminant analysis to look for the sources of dioxin contamination in the Venice Lagoon in Italy. Götz et al. (4) used PCA, hierarchical cluster analysis, and the combination of PCA and hierarchical cluster analysis to analyze dioxin patterns. The usefulness of PCA as a cluster method was discussed. Götz et al. (5) carried out cluster analyses on a data pool of 407 dioxin samples using neural networks. In this study the suitability of cluster analyses was tested by applying different modes of distance measure and linkage to a data pool of 407 dioxin samples. The results were compared with those of neural networks.
* Corresponding author phone: (Int+40)42845-3876; fax: (Int+40)42845-3873; e-mail:
[email protected]. 10.1021/es030073t CCC: $25.00 Published on Web 11/14/2003
2003 American Chemical Society
Methods
Dioxins. For simplicity, the two groups of substances polychlorinated dibenzodioxins (PCDD) and polychlorinated dibenzofurans (PCDF) are called dioxins in this text. Dioxins are formed in the presence of chlorine in practically all incineration processes (e.g. waste incineration, house fires, traffic, forest fires) and other thermal processes (e.g. in the metal-working industry) and also in certain chemical production processes involving chlorine and its compounds. Dioxins occur ubiquitously in the environment: in air, water, and soil samples, in animals (e.g. fish), and in human adipose tissue, blood samples, and breast milk (6-9). Dioxins are among the 12 POPs (persistent organic pollutants) of the United Nations Environment Programme (UNEP) (10). The dioxin concentrations are reported as dioxin toxicity equivalents (PCDD/F WHO-TEQ). Each of the concentrations of the 17 most toxic compounds is multiplied by its respective toxicity equivalent factor (TEF), and the products are summed. The TEFs for dioxins according to the WHO (11) were used.
Description of the Various Dioxin Sample Groups (Table 1). Among the reference environmental samples classified by source, PCP and PCB samples are listed first. Organochloro pesticide production samples are from sediment cores of a canal in Hamburg near a former pesticide plant (lindane, 2,4,5-T). Some sediment samples were taken near a magnesium factory in Norway. The copper slag residue samples (“Kieselrot”) originate from an ore roasting process in Marsberg, Germany. A plant in Sweden is the source of the sludge samples from graphite anodes used in the chloralkali production of chlorine. For the following sample groups with dioxin contamination the sources were to be determined.
Dioxin in the River Elbe. The results of two sampling expeditions for dioxin samples from the Elbe are shown in Figure 1. The sampling points are situated from the Czech border down to the North Sea. The dioxin profile was discussed in detail in ref 5.
In sediment samples of this sample group dioxin concentrations ranged from 1.7 to 142 pg WHO-TEQ g⁻¹ dw.
Dioxin in the Port of Hamburg. The mean dioxin pollution of the Port of Hamburg was around 70-80 pg WHO-TEQ g⁻¹ dw.
Dioxin in the Flood Plains of the River Elbe. In soil samples from the flood plains of the Elbe in Lower Saxony, Germany, dioxin concentrations of up to 2300 pg WHO-TEQ g⁻¹ dw were measured, and in soil samples from the flood plains of the Dove-Elbe, a tributary of the Elbe in Hamburg, concentrations of up to 230 pg WHO-TEQ g⁻¹ dw were measured.
Dioxin in the Bille Residential Estate in Hamburg-Moorfleet. The Bille residential estate was 31 hectares in size with 240 houses. Prior to development, the areas of the Bille residential estate were designated (in the 1940s) as disposal sites for the dredging material from the Elbe and the Port of Hamburg. In addition, various wastes were dumped there (rubble from the bombed city, old oils). The surface soils of the Bille residential estate showed maximum dioxin concentrations of about 4000 pg WHO-TEQ g⁻¹ dw. The area has been cleared of contamination.
Dioxin in the Bitterfeld Region. The Bitterfeld group of samples contains sediments, suspended particulate matter, and soil samples from the area of the river Mulde and its tributary, Spittelwasser. The Mulde flows through the Bitterfeld region, an industrial area of the former German Democratic Republic. In surface sediments of the Mulde up
VOL. 37, NO. 24, 2003 / ENVIRONMENTAL SCIENCE & TECHNOLOGY
TABLE 1. List of Samples with Dioxin Patterns Used for Cluster Analysis (Hierarchical Cluster Analysis, K-Means Cluster Analysis, and Neural Networks, Kohonen Network): 407 Samples
sample groups | sample label^a | n | references
Environmental Samples Classified by Source (External Samples for Comparison)
PCP samples (pentachlorophenol) | Z11-Z14 | 4 | (12)
PCB samples (polychlorinated biphenyls) | Z21-Z24 | 4 | (13, 14)
chloralkali process (sludge samples, Sweden) | Z31, Z32 | 2 | (15)
organochloro pesticide production (sediment core samples from the Moorfleet Canal in front of the former pesticide plant, Hamburg, Germany) | f1-f5 | 5 |
magnesium production (sediment samples, Norway) | c1-c5 | 5 | (16)
copper production; Kieselrot, copper slag residue | d1, d2 | 2 | (17, 18)
(soil samples, Germany) | d3-d24 | 22 |
sinter plants (gas samples, Germany) | Z51-Z515 | 15 |
municipal waste incinerators (MWI)
fly ash samples, Japan | Z61, Z62 | 2 | (19)
flue gas samples, Germany (average from 22 samples) | Z63 | 1 | (20)
Air Samples
deposition samples, Hamburg, Germany | h1-h43 | 43 |
urban air samples, Hamburg, Germany | L1-L98 | 98 | (21)
 | L99-L102 | 4 |
Environmental Samples Not Classified by Source: River Elbe, Hamburg Port, and Inner City Surface Waters (Hamburg)
river Elbe; sediment and SPM (suspended particulate matter) samples | E1, E2, E4-E22, E24, E26-E44, E53-E58 | 47 | (22)
 | E45-E52 | 8 |
river Schwarze Elster and river Saale (tributaries of Elbe) (SPM samples) | Els1, Sa1 | 2 |
Hamburg Port (sediment samples) | H1-H5, H7-H17 | 16 |
inner city surface waters in Hamburg not connected to the river Elbe (sediment samples) | S1-S6 | 6 |
Flood Plains of the River Elbe and the River Dove Elbe
flood plains of the river Elbe in Lower Saxony, Germany (soil samples) | AE1-AE21 | 21 |
flood plains of the river Dove-Elbe (tributary of the river Elbe) in Hamburg, Germany (soil samples) | ADE1-ADE5 | 5 |
Bille Residential Estate, Hamburg, Germany
soil samples | x4-x28 | 25 |
Bitterfeld Region
river Mulde (tributary of the river Elbe) (SPM samples) | B43-B48, B50, B51 | 8 |
river Spittelwasser (tributary of the river Mulde) (sediment and SPM samples) | B1-B21, B35, B37-B42, B49 | 28 |
flood plains of the river Mulde (soil samples) | B22-B34, B36 | 15 |

sewage sludge, Hamburg, Germany | K1-K4 | 4 |
oil leachate samples, sanitary landfill Hamburg-Georgswerder, Germany | g1-g15 | 15 |
a The same sample labels were used in the tables and dendrograms in the Supporting Information.
to 600 pg WHO-TEQ g⁻¹ dw and in the Spittelwasser up to 3000 pg WHO-TEQ g⁻¹ dw were measured. Sediment core samples from the Spittelwasser reached 14 500 pg WHO-TEQ g⁻¹ dw and soil samples from the flood plains of the Spittelwasser up to 180 000 pg WHO-TEQ g⁻¹ dw.
Dioxin in Hamburg Inner City Waters. A further group contains sediment data from Hamburg inner city waters not influenced by the river Elbe. In these sediments up to 300 pg WHO-TEQ g⁻¹ dw were measured.
Dioxin in the Sanitary Landfill Hamburg-Georgswerder. In leachate oils of the sanitary landfill Hamburg-Georgswerder dioxin concentrations of up to 740 000 pg WHO-TEQ g⁻¹ were measured. Chemical waste from the above-mentioned organochloro pesticide plant was dumped here (23).
Dioxin in the Sewage Sludge of the Wastewater Treatment Plant in Hamburg. The maximum concentrations were 39 pg WHO-TEQ g⁻¹ dw.
The measured dioxin concentration data of the 407 samples from Table 1 used for statistical analyses are provided as Supporting Information, Table SI.1.
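The WHO-TEQ values quoted for the sample groups follow the weighted sum described under Dioxins: each congener concentration is multiplied by its TEF and the products are added. The following is a minimal sketch of that arithmetic. The TEF table is the WHO 1998 scheme for humans and mammals as it is commonly tabulated and should be checked against ref 11 before any real use; the function name `who_teq` and the example concentrations are hypothetical and are not taken from Table SI.1.

```python
# WHO-TEQ as a TEF-weighted sum over the 17 2,3,7,8-substituted congeners.
# Assumption: TEF values as commonly tabulated for the WHO 1998 scheme
# (humans/mammals); verify against ref 11 before relying on them.
WHO_TEF = {
    "2,3,7,8-TCDD": 1.0,
    "1,2,3,7,8-PeCDD": 1.0,
    "1,2,3,4,7,8-HxCDD": 0.1,
    "1,2,3,6,7,8-HxCDD": 0.1,
    "1,2,3,7,8,9-HxCDD": 0.1,
    "1,2,3,4,6,7,8-HpCDD": 0.01,
    "OCDD": 0.0001,
    "2,3,7,8-TCDF": 0.1,
    "1,2,3,7,8-PeCDF": 0.05,
    "2,3,4,7,8-PeCDF": 0.5,
    "1,2,3,4,7,8-HxCDF": 0.1,
    "1,2,3,6,7,8-HxCDF": 0.1,
    "1,2,3,7,8,9-HxCDF": 0.1,
    "2,3,4,6,7,8-HxCDF": 0.1,
    "1,2,3,4,6,7,8-HpCDF": 0.01,
    "1,2,3,4,7,8,9-HpCDF": 0.01,
    "OCDF": 0.0001,
}


def who_teq(concentrations):
    """Return sum(concentration * TEF); congeners not supplied contribute 0."""
    unknown = set(concentrations) - set(WHO_TEF)
    if unknown:
        raise KeyError(f"no TEF for: {sorted(unknown)}")
    return sum(c * WHO_TEF[name] for name, c in concentrations.items())


# Hypothetical sample in pg per g dry weight (illustration only).
sample = {"2,3,7,8-TCDD": 2.0, "2,3,4,7,8-PeCDF": 10.0, "OCDD": 5000.0}
teq = who_teq(sample)  # 2.0*1 + 10.0*0.5 + 5000.0*0.0001
```

In this illustration OCDD dominates the raw concentration but contributes least to the TEQ, which is why TEQ totals and congener patterns are treated separately in the evaluation.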
FIGURE 1. Dioxin in the river Elbe.
The Preparation of the Dioxin Data for Mathematical Evaluation Methods. For the cluster analysis the dioxin raw data (the measured concentrations of the 17 toxic, 2,3,7,8-substituted dioxin and furan congeners) have to be suitably transformed. The concentration of each of the 2,3,7,8-substituted tetra- to hepta-congeners was divided by the sum of the concentrations of the respective dioxin homologue group; e.g., the concentration of 2,3,7,8-tetrachlorodibenzo-p-dioxin (2,3,7,8-TCDD) was divided by the sum of the concentrations of the 22 tetrachlorinated dibenzodioxins (sum TCDD). The concentrations of OCDD and OCDF were divided by the concentrations of PCDD and PCDF, respectively (2). As the 18th variable, the quotient of the PCDD concentration and the total concentration of PCDD plus PCDF was added. This transformation is based on two assumptions (2). First, the physicochemical properties of the PCDD and PCDF of the same degree of chlorination, that is, the isomers within one homologue group, differ less than those of congeners that belong to different homologue groups. Second, the behavior of the PCDD/PCDF in the individual environmental compartments, e.g. partition, migration, and microbiological and chemical decomposition, is determined largely by their physicochemical properties. Therefore Hagenmaier et al. (2) postulate that the concentrations in the environment of all isomers within one homologue group show the same tendency and that the profiles of relative congener concentrations should be subject to changes in the environment to a lesser degree than the homologue profiles.
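The transformation of the 17 congener concentrations into 18 pattern variables can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: "PCDD" and "PCDF" are taken here to mean the sums of the tetra- to octachlorinated homologues, the names `GROUPS` and `dioxin_pattern` are hypothetical, and the handling of homologue sums equal to zero is not addressed because the text does not specify it.

```python
# Homologue group -> its 2,3,7,8-substituted congeners (tetra- to hepta-).
GROUPS = {
    "TCDD": ["2,3,7,8-TCDD"],
    "PeCDD": ["1,2,3,7,8-PeCDD"],
    "HxCDD": ["1,2,3,4,7,8-HxCDD", "1,2,3,6,7,8-HxCDD", "1,2,3,7,8,9-HxCDD"],
    "HpCDD": ["1,2,3,4,6,7,8-HpCDD"],
    "TCDF": ["2,3,7,8-TCDF"],
    "PeCDF": ["1,2,3,7,8-PeCDF", "2,3,4,7,8-PeCDF"],
    "HxCDF": ["1,2,3,4,7,8-HxCDF", "1,2,3,6,7,8-HxCDF",
              "1,2,3,7,8,9-HxCDF", "2,3,4,6,7,8-HxCDF"],
    "HpCDF": ["1,2,3,4,6,7,8-HpCDF", "1,2,3,4,7,8,9-HpCDF"],
}


def dioxin_pattern(congeners, homologue_sums):
    """Return the 18 transformed variables of one sample.

    congeners: concentration of each of the 17 congeners (incl. OCDD, OCDF).
    homologue_sums: total concentration of each group listed in GROUPS.
    """
    # Assumption: total PCDD / PCDF = tetra- to hepta- homologue sums + octa.
    total_pcdd = sum(v for g, v in homologue_sums.items()
                     if g.endswith("CDD")) + congeners["OCDD"]
    total_pcdf = sum(v for g, v in homologue_sums.items()
                     if g.endswith("CDF")) + congeners["OCDF"]
    pattern = {}
    for group, names in GROUPS.items():
        for name in names:
            # Congener relative to its own homologue group.
            pattern[name] = congeners[name] / homologue_sums[group]
    pattern["OCDD"] = congeners["OCDD"] / total_pcdd
    pattern["OCDF"] = congeners["OCDF"] / total_pcdf
    # 18th variable: share of PCDD in PCDD plus PCDF.
    pattern["PCDD/(PCDD+PCDF)"] = total_pcdd / (total_pcdd + total_pcdf)
    return pattern


# Hypothetical sample: every congener 1.0, every homologue sum 10.0.
congeners = {n: 1.0 for names in GROUPS.values() for n in names}
congeners.update({"OCDD": 20.0, "OCDF": 5.0})
sums = {g: 10.0 for g in GROUPS}
pattern = dioxin_pattern(congeners, sums)
```

Because every variable is a ratio, samples with very different absolute contamination levels (for example 39 and 180 000 pg WHO-TEQ g⁻¹) become comparable by pattern alone.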
Mathematical Classification Methods
Multivariate Statistical Classification Methods. Cluster analysis serves to aggregate the dioxin patterns of dioxin samples into clusters in such a way that the patterns within any one cluster are as similar to each other as possible and as different as possible from the patterns in other clusters. The multivariate classification methods were carried out with the SPSS program package (24).
Hierarchical Cluster Analysis. Clustering begins by finding the closest pair of dioxin samples according to a distance or similarity measure and linking them to form a cluster. Distances measure how far apart two dioxin samples are, and similarities measure how close they are. The algorithm continues step by step, linking pairs of dioxin samples, pairs of clusters, or a dioxin sample with a cluster, until all 407 dioxin samples are in one cluster. Table 2 shows the formulas of the distances and similarities as calculated by the SPSS program package. SPSS offers seven methods for linking clusters. The default method in SPSS is between-groups linkage. The distance/similarity between two clusters is taken as the average of the distances/similarities of all pairs of dioxin samples with one dioxin sample from each of the two clusters. All dioxin samples are taken into consideration so that the distance/similarity is not influenced by individual dioxin samples. The within-groups linkage cluster method also includes in the calculation dioxin sample pairs from within the same
TABLE 2. Measures (SPSS)^a
distance measures
squared Euclidean distance: $\sum_i (X_i - Y_i)^2$
Euclidean distance: $\sqrt{\sum_i (X_i - Y_i)^2}$
Chebychev: $\max_i |X_i - Y_i|$
block: $\sum_i |X_i - Y_i|$
Minkowski (p = 3): $\sqrt[p]{\sum_i |X_i - Y_i|^p}$
similarity measures
cosine: $\sum_i (X_i Y_i) / \sqrt{\sum_i X_i^2 \, \sum_i Y_i^2}$
Pearson correlation: $\sum_i Z_{X_i} Z_{Y_i} / (N - 1)$
a $X_i$: value of the dioxin sample X in the ith dioxin variable. $Y_i$: value of the dioxin sample Y in the ith dioxin variable. $Z_{X_i}$: standardized value of the dioxin sample X in the ith dioxin variable. $Z_{Y_i}$: standardized value of the dioxin sample Y in the ith dioxin variable. N: number of the dioxin variables.
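The measures of Table 2 can be written out directly for two samples given as equal-length lists of variables. The following sketch uses only the Python standard library; the function names are hypothetical, and the standardization for the Pearson correlation is assumed to use the sample standard deviation (divisor N - 1), which is what makes the tabulated formula equal to the usual correlation coefficient.

```python
import math


def squared_euclidean(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))


def euclidean(x, y):
    return math.sqrt(squared_euclidean(x, y))


def chebychev(x, y):
    # Largest absolute difference over all variables.
    return max(abs(a - b) for a, b in zip(x, y))


def block(x, y):
    # Sum of absolute differences (city-block distance).
    return sum(abs(a - b) for a, b in zip(x, y))


def minkowski(x, y, p=3):
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    return dot / math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))


def pearson(x, y):
    n = len(x)

    def standardize(v):
        mean = sum(v) / n
        # Assumption: sample standard deviation with divisor n - 1.
        sd = math.sqrt(sum((a - mean) ** 2 for a in v) / (n - 1))
        return [(a - mean) / sd for a in v]

    zx, zy = standardize(x), standardize(y)
    return sum(a * b for a, b in zip(zx, zy)) / (n - 1)


# Hypothetical three-variable patterns (illustration only).
x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 7.0]
d2 = squared_euclidean(x, y)   # 1 + 4 + 16
```

The two similarity measures are insensitive to a common scale factor of a pattern, whereas the distance measures are not; this is one way to read the lower discrimination reported for the squared Euclidean distance.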
cluster. In our opinion this means that the selectivity between the clusters can become lower. In the very simple cluster methods nearest neighbor and furthest neighbor, the distance/similarity between two clusters is taken as the distance/similarity between their nearest neighboring and furthest neighboring dioxin samples, respectively. In the case of the sample groups here, some of which are large, the use of these two cluster methods can lead to implausible results, as a large proportion of the dioxin samples is not taken into consideration. Centroid clustering differs from the above-mentioned methods in that the first step is not the determination of the distance measures. First the mean values of the dioxin variables of the dioxin samples contained in the cluster are
calculated. The distance between two clusters is then calculated with the squared Euclidean measure. Median clustering is similar to centroid clustering. Ward’s method also first calculates the mean values of the individual dioxin variables in one cluster. Then the squared Euclidean distances of the dioxin variables of each dioxin sample to the cluster mean value are calculated and added together. The two clusters whose linking gives the lowest possible increase in the total sum of squared distances are linked to form a new cluster (25-28).
K-Means Cluster Analysis. The algorithm used for determining cluster membership in the K-Means Cluster Analysis procedure is based on nearest centroid sorting. That is, a dioxin sample is assigned to the cluster with the smallest distance between the dioxin sample and the center of the cluster (centroid). The distance measure is the Euclidean distance (25, 26, 28).
Neural Networks - Kohonen Network. The SPSS program package (29) was used to perform the calculations. Here the input vector space of the Kohonen net is of dimension 18, and each input vector consists of the 18 transformed dioxin concentrations of a dioxin sample. Each neuron contains a weight vector of the same dimension as the input vectors. After an input vector (dioxin sample) enters the network, only the neuron with the most similar weight vector in the active layer is selected (the “winner takes all” method). The weights of the winning neuron are adapted, and the weights of the neurons in the vicinity of the winning neuron are also adapted. These steps are repeated for all input vectors to complete one iteration. Three outputs were produced. The first, the vector quantization codebook, is the code of the winning node; each neuron in the Kohonen layer is given a code number. The second is the vector quantization, i.e., the set of weights of the winning neuron. A colored neuron plot is also produced (26, 30, 31).
Discriminant Analysis. The discriminant analysis was applied to analyze differences among the groups resulting from the clustering by hierarchical cluster analysis and the neural networks (25-28).
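The Kohonen training step described above (select the winning neuron, adapt its weights and those of its neighbors, repeat for all input vectors) can be illustrated with a small self-contained sketch. It is a generic self-organizing map and not the SPSS Neural Connection implementation: the 4 x 4 grid layout, the squared Euclidean winner criterion, the fixed learning rate, the half-rate neighbor update, and all function names are assumptions made for illustration.

```python
import random

GRID = 4  # assumed 4 x 4 layout for a 16-neuron Kohonen layer


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def winner(weights, x):
    """Code (index) of the neuron whose weight vector is closest to x."""
    return min(range(len(weights)), key=lambda i: sq_dist(weights[i], x))


def train_step(weights, x, rate=0.5, radius=1):
    """Move the winner toward x at full rate, grid neighbors at half rate."""
    win = winner(weights, x)
    win_row, win_col = divmod(win, GRID)
    for i, w in enumerate(weights):
        row, col = divmod(i, GRID)
        if max(abs(row - win_row), abs(col - win_col)) > radius:
            continue  # outside the neighborhood: weights stay unchanged
        step = rate if i == win else rate / 2
        weights[i] = [a + step * (b - a) for a, b in zip(w, x)]
    return win


def train(data, dim, iterations=20, seed=0):
    rng = random.Random(seed)
    weights = [[rng.random() for _ in range(dim)] for _ in range(GRID * GRID)]
    for _ in range(iterations):
        for x in data:  # one pass over all input vectors is one iteration
            train_step(weights, x)
    return weights


# Hypothetical 18-variable patterns (illustration only).
data = [[0.10] * 18, [0.12] * 18, [0.80] * 18, [0.82] * 18]
weights = train(data, dim=18)
codes = [winner(weights, x) for x in data]   # codebook-style output
quantized = [weights[c] for c in codes]      # vector-quantization-style output
```

The two lists at the end correspond in spirit to the codebook and vector quantization outputs named in the text: the first gives the code of the winning neuron per sample, the second the weight vector of that neuron.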
Results and Discussion
The Dioxin Data Pool with 407 Dioxin Samples. Hierarchical Cluster Analysis. Classification with the Cluster Method Between-Groups Linkage. First we present the cluster result which we consider most plausible based on comparison of the results of the different classification methods with varying cluster numbers on one hand and with background information on the other hand. The dendrogram in the Supporting Information (Figure SI.1) visualizes the result of the cluster method cosine/between-groups linkage. Since a large dendrogram is hard to read, Table SI.2, which lists cluster memberships, is added in the Supporting Information. Table 3 summarizes this classification result for the seven-cluster solution with reference to the respective dioxin sample groups from Table 1. The two largest clusters are the “Bitterfeld-Elbe cluster” and the “air cluster”. The “Bitterfeld-Elbe cluster” contains the sediment and suspended particulate matter (SPM) samples from the river Elbe from the mouth of the river Mulde down to the North Sea, the sediment samples from the Port of Hamburg, the soil samples from the flood plains of the Elbe and the Dove-Elbe, and the samples from the Bitterfeld region. This result leads to the hypothesis that the Bitterfeld contamination is the source of a considerable part of the dioxin pollution of the Elbe and its flood plains and also of the Port of Hamburg, having been transported as suspended particulate matter.
TABLE 3. Classification Results: 407 Dioxin Samples, 7 Clusters: (a) Hierarchical Cluster Analysis (Cosine/Between-Groups Linkage) and (b) Neural Networks, Kohonen Network (16 Neurons)
7 clusters (sample groups)
Bitterfeld-Elbe cluster
  magnesium production
  copper production
  Bitterfeld region
  river Elbe (river Mulde to North Sea)
  Port of Hamburg
  flood plains (river Elbe)
  flood plains (river Dove-Elbe)
  Bille residential estate, Hamburg
PCP cluster
PCB cluster
organochloro pesticide cluster
  organochloro pesticide production
  sanitary landfill Hamburg-Georgswerder (oil leachate)
chloralkali process cluster
air cluster
  municipal waste incinerators
  deposition
  urban air
  sewage sludge
  river Elbe (upstream river Mulde)
  inner city surface waters, Hamburg
sinter plants cluster
The “Bitterfeld-Elbe cluster” also contains the soil samples from the Bille residential estate. With values of up to 4000 pg WHO-TEQ g⁻¹ dw, the dioxin concentrations of the Bille residential estate are, however, more than an order of magnitude above the current pollution of the river Elbe. To further substantiate the assertion that the peak values of the dioxin pollution of the soil in the Bille residential estate originated from the Elbe, it had to be shown that the Elbe pollution in former times contained a similarly high level of dioxin. Therefore, in 1995 sediment cores from the Elbe meadows near Pevestorf (near Gartow, district of Lüchow-Dannenberg, Lower Saxony, river km 485), far upstream from Hamburg, and in Heuckenlock (where the Elbe begins to divide within the area of Hamburg, river km 610.5) were investigated. The result of dating the Heuckenlock core is shown in Figure 2a. The dating was done by gamma spectrometry of the cesium isotopes Cs-134 and Cs-137. The time axis was divided into equal segments starting with the main peaks of Cs-134/Cs-137 marking the year 1986 (Chernobyl accident) and the main peak of Cs-137 alone indicating the year 1964 (maximum nuclear fallout). The first cesium contaminations in the environment start after 1940. The Heuckenlock core shows a very narrow segment between the surface (1995) and the 1986 mark. This upper layer was therefore classified as the 1986-1995 area. Figure 2b shows the dioxin concentrations of the individual sediment core layers. In the sediment layer that can be related to the period of about 1940-1950, there is an abrupt increase in the dioxin concentrations to approximately 2200 pg WHO-TEQ g⁻¹ dw. The disposal of dredgings from the Port of Hamburg in the areas of the Bille residential estate also occurred in this period.
The sediment core from Pevestorf contains the highest dioxin concentrations, about 7600 pg WHO-TEQ g⁻¹ dw and 3600 pg WHO-TEQ g⁻¹ dw, at sediment depths which can be assigned by the radiochemical dating to the period 1940-1960 (Figure 2c,d). An additional classification showed that the sediment core layers from Heuckenlock and Pevestorf belong to the “Bitterfeld-Elbe cluster”.
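One reading of the dating procedure is that the time axis between two dated markers is divided evenly, i.e., that a constant sedimentation rate is assumed between the surface (1995), the Cs-134/Cs-137 peak (1986), and the Cs-137 peak (1964), and that the last rate is continued below the deepest marker. The sketch below illustrates this reading only. The marker depths are hypothetical, since the text gives no depths, and `layer_year` is an assumed helper name.

```python
def layer_year(depth_cm, markers):
    """Estimate the deposition year of a layer by linear interpolation.

    markers: (depth_cm, year) pairs sorted by increasing depth.
    Below the deepest marker the rate of the last segment is extrapolated.
    """
    if depth_cm < markers[0][0]:
        raise ValueError("depth lies above the shallowest marker")
    for (d0, y0), (d1, y1) in zip(markers, markers[1:]):
        if d0 <= depth_cm <= d1:
            return y0 + (y1 - y0) * (depth_cm - d0) / (d1 - d0)
    (d0, y0), (d1, y1) = markers[-2], markers[-1]
    return y1 + (y1 - y0) * (depth_cm - d1) / (d1 - d0)


# Hypothetical marker depths in cm (illustration only).
markers = [(0.0, 1995), (5.0, 1986), (40.0, 1964)]
mid_year = layer_year(22.5, markers)   # halfway between the 1986 and 1964 marks
old_year = layer_year(60.0, markers)   # extrapolated below the 1964 mark
```

Dates obtained below the 1964 marker are extrapolations and therefore less certain than those between markers, which is consistent with the broad periods (1940-1950, 1940-1960) given in the text.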
FIGURE 2. Dating and dioxin concentration (WHO-TEQ) in sediment cores from the Elbe meadows near Pevestorf (river km 485) and Heuckenlock, Hamburg (river km 610.5): (a) Dating of the sediment core from Heuckenlock. (b) Dioxin concentration in the sediment core from Heuckenlock. (c) Dating of the sediment core from Pevestorf. (d) Dioxin concentration in the sediment core from Pevestorf.
Very high dioxin concentrations were also found in the Bitterfeld region in old sediment core samples (Spittelwasser) whose age has not been determined. Thus the examination of the absolute levels of the dioxin contents also provided conclusive support for the hypothesis of a causal chain leading from the Bitterfeld region via the river Elbe and the Port of Hamburg to the Bille residential estate. As far as the sources of the Bitterfeld dioxin contamination influencing the river Elbe are concerned, there are indications that these are related to thermal processes in the metallurgical industry: the sample groups magnesium production and copper production are in the Bitterfeld-Elbe cluster. According to the Bitterfeld chronicle (32), there were several metal-working companies in Bitterfeld: an aluminum works, electrochemical plants for the production of magnesium, production of the magnesium alloy Elektron, the production of various magnesium-aluminum alloys, the production of magnesium powder, plates, and blocks, the production of ferrotitanium, plants for ferrozinc, special steel alloys, and nickel-aluminum alloys, and a large pyrite roasting plant. Rappe (7), who investigated the soil samples from the Bille residential estate (the old dioxin contamination of the Elbe) for dioxins and polychlorinated dibenzothiophenes (PCDTs), supposed that the production of iron and steel was a major source of the contamination in this area. Probably the sulfur of the PCDTs originates from the pyrite of the roasting plant mentioned. The samples of the dioxin sources PCP, PCB, chloralkali electrolysis, and organochloro pesticide production (all samples from chlorochemical processes) form separate clusters. The sediment core samples from Pevestorf and Heuckenlock showed no correlation between dioxin and the organochlorine substances chlorobenzenes and HCH. The “organochloro pesticide cluster” contains the sediment core samples from the port canal (Moorfleet canal) in front of the former organochloro pesticide company and the oil leachate samples from the sanitary landfill Hamburg-Georgswerder. The cluster formation is plausible, as in former times waste from that pesticide company was deposited in this sanitary landfill. As the samples from the Moorfleet canal are not in the same cluster as the dioxin samples from other areas of the Port of Hamburg, which are in the Bitterfeld-Elbe cluster, it is to be concluded
that there is no significant contribution from the former pesticide company to the dioxin contamination of the Port of Hamburg. The appearance of the sediments from Hamburg inner city waters in the “air cluster” indicates that these sediments are mainly contaminated through wet and dry deposition from the atmosphere, and thus the source of their pollution is to be found in the “incineration” dioxin sources (waste incineration plants and similar). The fact that the sediment samples from the river Elbe above the mouth of the Mulde, i.e., above the Bitterfeld region, are not in the same cluster as the Bitterfeld samples is plausible, as these sediments cannot be contaminated by the Bitterfeld region. The dioxin samples from the sinter plants (emission air samples) form a separate cluster. This finding is difficult to interpret. As these dioxin samples are from the metal industries, they would have been expected to be found in the cluster with the other dioxin samples from magnesium and copper production. According to ref 28 a cluster result can be characterized by its stability with respect to the number of clusters, to a change of the distance or similarity measure, and to a change of the linking method. Figure SI.1 and Table SI.2 show that the “Bitterfeld-Elbe cluster” from Table 3 is stable (i.e. the cluster content remains unchanged) in the range of six to at least nine clusters, as are the “air cluster” and the “organochloro pesticide cluster” in the range of seven to at least eleven clusters. The four clusters from chlorochemical processes remain unchanged in the range of seven to at least eleven clusters. Since in the six-cluster solution the “air cluster” and the “organochloro pesticide cluster” combine, which seems implausible, the seven-cluster solution is considered most adequate.
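The stability criterion with respect to the number of clusters can be made concrete with a small agglomerative sketch: between-groups (average) linkage on a cosine measure is run down to k clusters, and the range of k over which a given cluster reappears with unchanged content is reported. This is a naive illustration in plain Python and not the SPSS procedure; the cosine similarity is converted to the distance 1 - cosine (which leaves the merge order of average linkage unchanged), and the function names and the two-variable example patterns are hypothetical.

```python
import math


def cosine_distance(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    return 1.0 - dot / math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))


def average_linkage(data, k):
    """Merge the two clusters with the smallest mean pairwise distance
    (between-groups linkage) until k clusters remain."""
    d = [[cosine_distance(a, b) for b in data] for a in data]
    clusters = [[i] for i in range(len(data))]

    def between(c1, c2):
        return sum(d[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > k:
        _, a, b = min((between(clusters[a], clusters[b]), a, b)
                      for a in range(len(clusters))
                      for b in range(a + 1, len(clusters)))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return {frozenset(c) for c in clusters}


def stable_range(data, members, ks):
    """Cluster numbers k for which `members` forms exactly one cluster."""
    target = frozenset(members)
    return [k for k in ks if target in average_linkage(data, k)]


# Hypothetical patterns forming three groups (illustration only).
data = [[1.0, 0.10], [1.0, 0.12], [1.0, 0.11],   # group A: samples 0-2
        [0.10, 1.0], [0.12, 1.0],                # group B: samples 3-4
        [1.0, 1.0], [1.0, 1.05]]                 # group C: samples 5-6
stable = stable_range(data, [0, 1, 2], range(2, 7))
```

In this toy example group A is unchanged for two to five clusters, in the same sense in which the “Bitterfeld-Elbe cluster” is reported as stable for six to at least nine clusters.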
The same classification was obtained using the similarity measure Pearson correlation and the distance measure Chebychev, and similar results were obtained with the other distance measures. The dendrogram from the distance measure squared Euclidean is provided as Supporting Information, Figure SI.2. In this case the discrimination power is lower: the sample group river Elbe (upstream river Mulde) is combined with the “Bitterfeld-Elbe cluster”. Sewage sludge is a cluster of its own. With seven clusters “sinter plants” join the “air cluster”; with eight clusters they are separated.
Classification Using Other Linkage Methods. For the linkage methods centroid clustering, median clustering, and Ward’s method only the squared Euclidean distance is considered a suitable distance measure (24, 28). With the other linkage methods cluster analyses were calculated with all similarity and distance measures. Centroid clustering with eight clusters gives a result similar to that shown in Table 3. There is an eighth cluster with a single sample, the sample group river Elbe (upstream river Mulde) combines with the “Bitterfeld-Elbe cluster”, and the sample group PCP joins the sample group sewage sludge. With seven clusters the “Bitterfeld-Elbe cluster” and the “air cluster” come together (see dendrogram in Figure SI.3, Supporting Information). In the case of the linkage methods nearest neighbor, furthest neighbor, and median clustering, the “Bitterfeld-Elbe cluster” and the “air cluster” from Table 3 are joined in one cluster for the majority of the combinations of similarity or distance measure and linkage method. In the case of two combinations the “Bitterfeld-Elbe cluster” is divided into two and three clusters, respectively. This classification result does not appear plausible. The linkage methods nearest neighbor and furthest neighbor were already questioned above for methodological reasons.
The doubts expressed about the within-groups linkage cluster method are reflected in a rather implausible classification result. With Ward’s method, the “Bitterfeld-Elbe cluster” appears in three clusters, and the sample groups PCB and sinter plants are in one cluster.
Perhaps the implausible result is explained by the statement in ref 29 that Ward’s method only works well if the expected clusters are of similar size, which is not the case here.
Neural Networks - Kohonen Network. The calculation using the Kohonen network was carried out with 16 neurons and single network size. The evaluation of the output codebook and the vector quantization output produces the following result: neuron 1 contains the sample group PCB, neuron 15 the sample group PCB, neuron 8 the two sample groups organochloro pesticide production and the sanitary landfill Hamburg-Georgswerder, which together form the “organochloro pesticide cluster” in the hierarchical cluster analysis. Neuron 5 contains the sample group chloralkali electrolysis and neuron 6 the sinter plant sample group. This corresponds exactly with the classification by the hierarchical cluster analysis as shown in Table 3. Neuron 16 contains all sample groups of the “air cluster” except for the sample group Elbe (above the Mulde). Neuron 10 contains most of the “Bitterfeld-Elbe cluster”: all samples of the two groups flood plains of the river Elbe and Bille residential estate, and also samples of the groups river Elbe (Mulde to North Sea), Port of Hamburg, and Bitterfeld region. The remaining samples are divided over the remaining neurons. The colored node plot does not provide a clear indication as to how the remaining neurons can be combined. Therefore the vector quantization output was used to carry out a cluster analysis with hierarchical cluster analysis (cosine/between-groups linkage) and with the K-Means cluster analysis. The first method produced the same classification as shown in Table 3. The result of the second method differs from Table 3 in that the Bitterfeld sample group is divided between two clusters and the sample group sinter plants and the “air cluster” of Table 3 are together in one cluster. Other parameter settings of the Kohonen network were also tested.
Calculations with 36 and 100 neurons led to unfavorable cluster results. With 16 neurons and double network size the same result was achieved as with single network size. The inclusion of the neighborhood decay rate produced implausible results.
Discriminant Analysis. It was investigated how well discriminant analyses would reproduce the cluster memberships given in Table 3. The discriminant analysis was able to group 99% of the dioxin samples as grouped by hierarchical cluster analysis and neural networks (see Table SI.3, Supporting Information).
The Dioxin Data Pool with 140 Dioxin Samples. Classification with Hierarchical Cluster Analysis and Neural Networks. This is a partial set of the main data set with 407 dioxin samples. It contains neither the sample group from the sanitary landfill Hamburg-Georgswerder nor that of the Bille residential estate. The hierarchical cluster analysis cosine/between-groups linkage shows the same classification result as for the large data pool of 407 dioxin samples in Table 3 (see the dendrogram in Figure SI.4, Supporting Information). This result is a strong argument for the stability of the cluster solution from Table 3 because it shows that the solution does not depend on the particular dioxin samples in the clusters. The application of the Kohonen network gives a slightly divergent result. The cluster “sinter plants” joins the “air cluster”, and the Mg-production and Elbe flood plains sample groups separate from the “Bitterfeld-Elbe cluster”.
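Comparisons of two classifications of the same samples, such as the 99% agreement between the discriminant analysis and the cluster memberships of Table 3, amount to a cross-tabulation of the two label lists. The following sketch shows that bookkeeping only; it does not perform a discriminant analysis. It assumes that both classifications use the same group names, as is the case when the cluster memberships serve as the grouping variable, and the function names and example labels are hypothetical.

```python
from collections import Counter


def agreement(labels_a, labels_b):
    """Share of samples that receive the same group in both classifications."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both classifications must cover the same samples")
    hits = sum(a == b for a, b in zip(labels_a, labels_b))
    return hits / len(labels_a)


def cross_table(labels_a, labels_b):
    """Count of samples for each (group in a, group in b) combination."""
    return Counter(zip(labels_a, labels_b))


# Hypothetical memberships of five samples (illustration only).
clustered = ["air", "air", "Bitterfeld-Elbe", "PCP", "Bitterfeld-Elbe"]
predicted = ["air", "air", "Bitterfeld-Elbe", "PCP", "air"]
share = agreement(clustered, predicted)
table = cross_table(clustered, predicted)
```

The off-diagonal entries of the cross table show which sample groups move between clusters, which is the information summarized in the text for the K-Means and 140-sample comparisons.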
Acknowledgments
We gratefully acknowledge the provision of dioxin data from the Bitterfeld region by the Landratsamt Bitterfeld, Dezernat Umweltschutz, Naturschutz und Abfallwirtschaft, Germany, and the Landesamt für Umweltschutz Sachsen-Anhalt, Wittenberg, Germany, data from flood plains of the river Elbe by the Niedersächsisches Landesministerium für Ernährung, Landwirtschaft und Forsten, Hannover, Germany, and sinter plant data by the Landesumweltamt Nordrhein-Westfalen, Essen, Germany.
Supporting Information Available Two tables and four figures. This material is available free of charge via the Internet at http://pubs.acs.org.
Literature Cited
(1) Fiedler, H.; Lau, C.; Kjeller, L.-O.; Rappe, C. Chemosphere 1996, 32, 421-432.
(2) Hagenmaier, H.; Lindig, C.; She, J. Chemosphere 1994, 29, 2163-2174.
(3) Fattore, E.; Benfenati, E.; Mariani, G.; Fanelli, R. Environ. Sci. Technol. 1997, 31, 1777-1784.
(4) Götz, R.; Steiner, B.; Friesel, P.; Roch, K.; Walkow, F.; Maass, V.; Reincke, H.; Stachel, B. Chemosphere 1998, 37, 1987-2002.
(5) Götz, R.; Steiner, B.; Sievers, S.; Friesel, P.; Roch, K.; Schwörer, R.; Haag, F. Wat. Sci. Technol. 1998, 37, 207-215.
(6) Rappe, C. Chemosphere 1993, 27, 211-225.
(7) Rappe, C. Fresenius J. Anal. Chem. 1994, 348, 63-75.
(8) Schecter, A., Ed. Dioxins and Health; Plenum Press: New York, 1994.
(9) Ballschmiter, K.; Bacher, R. Dioxine, Chemie, Analytik, Vorkommen, Umweltverhalten und Toxikologie der halogenierten Dibenzo-p-dioxine und Dibenzofurane; VCH Verlag Chemie: Weinheim, New York, 1996; pp 247-328.
(10) United Nations Environment Programme (UNEP). Dioxin and furan inventories, national and regional emissions of PCDD/PCDF; UNEP Chemicals: Geneva, Switzerland, 1999.
(11) Van den Berg, M.; Birnbaum, L.; Bosveld, A. T.; Brunström, B.; Cook, P.; Feeley, M.; Giesy, J. P.; Hanberg, A.; Hasegawa, R.; Kennedy, S. W.; Kubiak, T.; Larsen, J. C.; van Leeuwen, F. X.; Liem, A. K.; Nolt, C.; Petersen, R. E.; Poellinger, L.; Safe, S.; Schrenk, D.; Tillitt, D.; Tysklind, M.; Younes, M.; Waern, F.; Zacharewski, T. Environ. Health Perspect. 1998, 106, 775-792.
(12) Hagenmaier, H.; Brunner, B. Chemosphere 1987, 16, 1759-1764.
(13) Wakimoto, T.; Kannan, N.; Ono, M.; Tatsukawa, R.; Masuda, Y. Chemosphere 1988, 17, 743-750.
(14) Brunner, H. Dissertation, Universität Tübingen, Germany, 1990.
(15) Rappe, C.; Kjeller, L. O.; Kulp, S. E.; de Witt, C.; Hasselsten, I.; Palm, O. Chemosphere 1991, 23, 1629-1636.
(16) Oehme, M.; Mano, S.; Brevik, E. M.; Knutzen, J. Fresenius J. Anal. Chem. 1989, 335, 987-997.
(17) Theisen, J.; Maulshagen, A.; Fuchs, J. Chemosphere 1993, 26, 881-896.
(18) Rotard, W.; Christmann, W.; Knoth, W.; Mailahn, W. UWSF-Z. Umweltchem. Ökotox. 1995, 7, 3-9.
(19) Yasuhara, A.; Ito, H.; Morita, M. Environ. Sci. Technol. 1987, 21, 971-979.
(20) Hagenmaier, H. Polychlorierte Dibenzodioxine und polychlorierte Dibenzofurane - Bestandsaufnahme und Handlungsbedarf; VDI-Berichte: Germany, 1989; Nr. 745, pp 939-978.
(21) Rappe, C.; Kjeller, L. O.; Bruckmann, P.; Hackhe, K. H. Chemosphere 1988, 17, 3-27.
(22) Götz, R.; Friesel, P.; Roch, K.; Päpke, O.; Ball, M.; Lis, A. Chemosphere 1993, 27, 105-111.
(23) Götz, R. Chemosphere 1986, 15, 1981-1984.
(24) SPSS 10.0 for Windows; Chicago.
(25) SPSS Inc. SPSS Base 10.0. Applications Guide; Chicago, 1999; pp 243-315.
(26) Vandeginste, B. G. M.; Massart, D. L.; Buydens, L. M. C.; de Jong, S.; Lewi, P. J.; Smeyers-Verbeke, J. Handbook of Chemometrics and Qualimetrics: Part B; Elsevier Science: Amsterdam, 1998; pp 57-86, 207-242, 687-691.
(27) Einax, J. W.; Zwanziger, H. W.; Geiss, S. Chemometrics in Environmental Analysis; VCH Verlag Chemie: Weinheim, New York, 1997; pp 153-195.
(28) Backhaus, K.; Erichson, B.; Plinke, W.; Weiber, R. Multivariate Analysemethoden, 8th ed.; Springer: Berlin, Heidelberg, New York, 1996; pp 90-165, 261-321.
(29) SPSS Neural Connection 2.1 for Windows; Chicago.
(30) SPSS Inc. Neural Connection 2.1 Applications Guide; Chicago, 1997.
(31) Zupan, J.; Gasteiger, J. Neural Networks for Chemists; VCH Verlag Chemie: Weinheim, New York, 1993; pp 79-98, 167-182.
(32) Bitterfelder Chronik. 100 Jahre Chemiestandort Bitterfeld-Wolfen; Herausgeber: Vorstand der Chemie AG Bitterfeld-Wolfen, 1993.
Received for review June 11, 2003. Revised manuscript received September 19, 2003. Accepted September 23, 2003. ES030073T
VOL. 37, NO. 24, 2003 / ENVIRONMENTAL SCIENCE & TECHNOLOGY