Environ. Sci. Technol. 1994, 28, 1015-1022
Visual Neural Mapping Technique for Locating Fine Airborne Particles Sources
Dietrich Wienke† and Philip K. Hopke*
Department of Chemistry, Clarkson University, Box 5810, Potsdam, New York 13699-5810
* Author to whom correspondence should be addressed; e-mail address: [email protected].
† Present address: Catholic University of Nijmegen, Laboratory for Analytical Chemistry, Toernooiveld 1, 6525 ED Nijmegen, The Netherlands.
0013-936X/94/0928-1015$04.50/0 © 1994 American Chemical Society

A combination of two pattern recognition methods has been developed that allows the generation of geographical emission maps from multivariate environmental data. During such a projection into a visually interpretable subspace by a Kohonen self-organizing feature map, the topology of the higher dimensional variable space can be preserved, but part of the information about the correct neighborhood among the sample vectors is lost. This loss can partly be compensated for by the additional projection of Prim's minimal spanning tree onto the trained neural network. This new environmental receptor site modeling technique is discussed theoretically for measurements from single sampling sites. To obtain a further quantitative evaluation of such a combined mapping of minimal spanning tree and Kohonen neural network, the concept of a geographic unit circle (GUC) is also introduced. The GUC around the single sampling site in Granite City, IL, yielded estimates of the emission levels, the trace element profiles, and the geographic directions for a number of airborne particle sources.

Introduction

For the evaluation of the air pollution situation in a particular geographical area, it would be helpful to have mapping procedures based on sampling site measurements that could be compared with the emission inventory. The presentation of multivariate measurements, such as multielemental particle sample compositions, in low-dimensional and visually interpretable projections is especially needed. Wienke and Hopke (1) have recently demonstrated that a Kohonen self-organizing artificial neural network can be used as such a projection technique for multivariate airborne particle data. Kohonen (2, 3) developed this kind of neural network more than 10 years ago as an unsupervised pattern recognition working model of the brain. It is based on the finding from the neural sciences that the brain seems to be able to map images, noise, language, knowledge, etc. into highly organized forms by physically low-dimensional 1-D, 2-D, and 3-D arrangements of neurons. In addition to numerous technical applications in language analysis, speech recognition, and image processing, the self-organizing feature map has also attracted interest in environmental, chemical, and spectroscopic research in the last few years (4-9).

Kohonen's neural network has the interesting property of preserving the correct topology of the original m-dimensional variable space in a graphically interpretable low-dimensional map. As recently illustrated by Wienke and Hopke (1), distinct trends in the m-dimensional space such as straight and curved lines can be identified as such in the 2-D neural map. Discontinuous data structures (clusters) can also be identified in such a map. However, Kohonen's method has a deficiency, as does any other projection method. During the projection of the data, information about the correct neighborhood is lost. A neighbor in the lower dimensional map is not necessarily a neighboring object in the original higher dimensional space. Wienke and Hopke (1) have shown that a second pattern recognition method, the minimal spanning tree, can help to partly compensate for this deficiency if it is projected in a particular way onto a trained Kohonen map. In contrast to the Kohonen network, the minimal spanning tree yields information about the correct neighborhood relationships between sample vectors or groups of them. The minimal spanning tree is one of the oldest unsupervised pattern recognition methods (10-12) and can be traced back to the work of Florek (10) in the early fifties. In the present study, this new algorithmic combination of Prim's minimal spanning tree with a modified Kohonen self-organizing feature map is applied to the mapping of multivariate environmental data. In this way, the described individual deficiencies of both methods can be overcome.

In Wienke and Hopke (1), the multivariate trace elemental patterns of the coarse fraction of airborne particulate matter samples taken in Granite City, IL, were explored. The analysis provided a directly geographically interpretable map that reflected the directions of several types of known industrial air pollution sources in the Granite City area. However, the interpretation of the corresponding trace element pattern of the fine particle fraction was expected to be more difficult. Fine particles tend to be transported over longer distances, to become mixed with particles from distant sources, and to be more subject to chemical transformations.
The present work focuses on this more difficult problem by applying the newly developed mapping technique to the fine particle fraction. In this way, the limitations of the new combined mapping method are explored relative to the potentially easier analysis of the coarse fraction data. In the next section, the underlying theoretical model for a single sampling site is developed.

Model for the Single Sampling Site

The conceptual framework for a single sampling site is presented in Figure 1. The site is affected by several sources of air pollution that can contribute either identical or distinctly different multivariate patterns of component species. If m species are analyzed in each sample taken at the site, then every source is also characterized by an m-dimensional vector of species concentrations. However, neighboring sources and continuously changing meteorological situations (wind direction, wind speed, etc.) influence the multivariate profile
Environ. Sci. Technol., Vol. 28, No. 6, 1994
Figure 1. Single sampling site (X) superimposed onto a Kohonen self-organizing feature map. Multivariate airborne particle compositions affect each site through a weighting resulting from their origin as well as from the meteorological situation (sample vectors symbolized by different types of arrows). It is assumed that these chemical data vectors form a topological structure in the Kohonen map that is similar to the geographical locations of the air pollution sources, S.
of a given source. This modification of the pure source profile during its transport is coded in Figure 1 by the choice of distinct lengths, directions, and line thicknesses of the vector arrows. The pure source profiles are thereby transformed to produce the multivariate sample vectors.

For the internal representation in the computer model, the single sampling site is thought to be superimposed onto a two-dimensional Kohonen neural network containing u neurons. This superposition is made to obtain the desired geographical map. In Figure 1, a square array of 9 × 9 neurons has been chosen (u = 81). Alternative neural arrangements can be a circle, a rectangle, or an ellipse. In reality, the u neurons only represent an index for u weight vectors of the same length m as the n sample vectors. The essence of Kohonen's algorithm is a repeated comparison of the n sample vectors with these u weight vectors using a distance metric such as the Euclidean distance or the correlation coefficient. During each comparison, a winner, i, among all u weight vectors can be found that has the highest degree of similarity (smallest distance) to a particular sample vector, $x_k$. After finding this winning vector, the elements, $w_{ij}(\mathrm{old})$, of the winning weight vector i are adjusted to be a small step closer to the elements, $x_{kj}$, of the kth input sample vector using the Kohonen learning rule

$$w_{ij}(\mathrm{new}) = w_{ij}(\mathrm{old}) + \eta\,[x_{kj} - w_{ij}(\mathrm{old})] \qquad (1)$$

The learning rate, $\eta$, is chosen as a positive real number with a value < 0.1. Together with the winner, i, further weight vectors are modified within a topological neighborhood of radius R around the winner. Squares, hexagons, or circles have been used as topological neighborhoods (1, 3, 13, 14). At the beginning of the analysis, R = $R_0$ is large, but it decreases slowly with the training time and reaches R = $R_e$. After these adaptations, the subset of weight vectors within R becomes slightly more similar to the actual input vector in terms of the chosen distance metric. The comparison of all n sample vectors with all u neural weight vectors and their modification is called one epoch. After repeating this process over a large number of epochs, $n_e$, with $n_e$ > 500u (3, 7), a self-organized behavior of the n samples in the low-dimensional neural array can be observed. They form a visible topological structure that results from their arrangement in the original m-dimensional variable space.

It has been found (13) that an efficient approach to this training is to split the total number of epochs into two parts of lengths $n_{e1}$ and $n_{e2}$. During the initial phase, $n_{e1}$, the size of R is reduced from $R_0$ to $R_e$ as described by Kohonen (13). He also found that for several applications the values of the initial learning rate, $\eta_1$ = 0.1, and the final learning rate, $\eta_2$ = 0.008, provided optimal results; these values have therefore been used in these studies. During the $n_{e2}$ epochs, the map is refined and stabilized to its final form. Details of this process are provided by Wienke and Hopke (1) and Kohonen (13).

The final result of this training process is an aggregation of sample vector subsets to certain weight vectors in the neural array. The trained weight vector for such an aggregated cluster of input vectors comes numerically very close to their mean vector. The corresponding neuron can be termed a 'loaded neuron'. Other weight vectors may not collect any input vectors. Their corresponding neurons are defined as 'unloaded neurons'. Up to this point, only an unsupervised pattern recognition result has been obtained, similar to that which could be obtained from hierarchical cluster analysis, nonlinear mapping, or principal component scores plots. However, despite the interesting fact that the topology of the m-space is preserved, the quantitative interpretation of the Kohonen map remains difficult, particularly because information about the correct neighborhood is lost during the projection into a 2-D plane. It is straightforward and logical to add as a missing step a procedure that allows the visualization of the global relationships between all loaded neurons simultaneously in the map.
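The training procedure described above (winner search, the learning rule of eq 1, a shrinking neighborhood, and two training phases) can be illustrated with a short Python sketch. It is not the authors' 3MAP code. The function names `train_som` and `loaded_neurons`, the random initialization, the square (Chebyshev) neighborhood, and the linear decrease of both radius and learning rate during the first phase are assumptions made for illustration only.

```python
import math
import random


def nearest(weights, x):
    """Index of the weight vector with the smallest Euclidean distance to x."""
    return min(range(len(weights)), key=lambda i: math.dist(weights[i], x))


def train_som(samples, rows, cols, n_e1, n_e2, r0, r_e, eta1, eta2, seed=0):
    """Train a rows x cols Kohonen map; returns one weight vector per neuron."""
    rng = random.Random(seed)
    m = len(samples[0])
    # Assumption: weights start as uniform random numbers in [0, 1).
    weights = [[rng.random() for _ in range(m)] for _ in range(rows * cols)]
    for epoch in range(n_e1 + n_e2):
        # Phase 1: radius and learning rate shrink linearly (assumed schedule).
        # Phase 2: both stay at their final values while the map is refined.
        t = min(epoch / max(n_e1 - 1, 1), 1.0)
        radius = round(r0 + t * (r_e - r0))
        eta = eta1 + t * (eta2 - eta1)
        for x in samples:
            win = nearest(weights, x)
            wr, wc = divmod(win, cols)
            for i, w in enumerate(weights):
                r, c = divmod(i, cols)
                # Square neighborhood: Chebyshev distance on the neuron grid.
                if max(abs(r - wr), abs(c - wc)) <= radius:
                    for j in range(m):
                        w[j] += eta * (x[j] - w[j])  # learning rule, eq 1
    return weights


def loaded_neurons(weights, samples):
    """Map neuron index -> list of sample indices aggregated to that neuron."""
    loads = {}
    for k, x in enumerate(samples):
        loads.setdefault(nearest(weights, x), []).append(k)
    return loads
```

Neurons that appear as keys in the dictionary returned by `loaded_neurons` correspond to the 'loaded neurons' of the text; all others are 'unloaded'. The epoch counts used in the paper are far larger than anything one would use to try out this sketch.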
The technique chosen in this work is a calculation of the minimal spanning tree between the loaded neurons of the trained Kohonen self-organizing feature map. The concept of loaded and unloaded neurons developed in this work permits the view that the Kohonen map is a data compression step applied before the minimal spanning tree. After this compression of the n input samples to a few loaded neurons, they are presented in a topology-preserving, low-dimensional map. For the complete combined algorithm, given as a flow chart, see Wienke and Hopke (1).

The model can be expanded to include the more general case of multiple sampling sites that are affected by multiple sources of air pollution. This expanded model and its application to the appropriate data sets are presented in a companion paper (15).
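As a rough illustration of the second step, the following Python sketch computes a minimal spanning tree with Prim's algorithm over a list of vectors, which here would be the weight vectors of the loaded neurons. The function name `prim_mst` is hypothetical, and the use of Euclidean distances between weight vectors in the original m-dimensional space is an assumption; the complete combined algorithm is given only in ref 1.

```python
import math


def prim_mst(points):
    """Return MST edges (i, j, length) over points using Prim's algorithm."""
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best_dist = [math.inf] * n   # shortest known link from the tree to each point
    best_from = [-1] * n         # tree vertex that provides that link
    best_dist[0] = 0.0           # grow the tree from the first point
    edges = []
    for _ in range(n):
        # Pick the point outside the tree that is closest to the tree.
        v = min((i for i in range(n) if not in_tree[i]),
                key=best_dist.__getitem__)
        in_tree[v] = True
        if best_from[v] >= 0:
            edges.append((best_from[v], v, best_dist[v]))
        # Update the cheapest links of the remaining points.
        for u in range(n):
            if not in_tree[u]:
                d = math.dist(points[v], points[u])
                if d < best_dist[u]:
                    best_dist[u], best_from[u] = d, v
    return edges
```

Because the tree is computed only between the few loaded neurons rather than between all n samples, the number of pairwise distances stays small, which is consistent with the data compression view given above.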
Experimental Section

Airborne particle samples were taken by the Illinois State Water Survey at a single sampling site located in Granite City, IL, between March 1986 and June 1987. A dichotomous sampler fitted with a PM10 inlet was used with sampling intervals of 12 and 24 h. Forty-eight coarse fraction filters (providing a particle fraction of 2.5-10 μm aerodynamic diameter) and 49 fine fraction filters (below 2.5 μm aerodynamic diameter) were analyzed using neutron activation analysis and X-ray fluorescence spectroscopy so that, for each sample, a vector of 48 elemental concentrations (variables) was determined. From these variables, 33 analytes (Na, Al, Si, P, S, Cl, K, Ca, Sc, Ti, V, Cr, Mn, Fe, Co, Ni, Cu, Zn, As, Se, Br, Rb, Sr, Sb, La, Ce, Eu, Dy, Yb, Hf, Pb, Th, and U) were selected as having concentrations above the detection limits for most of the samples. For further analytical details, see Glover et al. (16).

Figure 2. Map of the 48 coarse airborne particle samples characterized by 33 trace elemental concentrations and two wind direction variables projected onto a 9 × 9 neuron 2-D Kohonen map. Training parameters (7): matrix was row and column unit length scaled, Euclidean distance measure, $R_0$ = 9, $R_e$ = 3, $n_{e1}$ = 3000, $n_{e2}$ = 1000, $k_1$ = 0.1, $k_2$ = 0.008. Numbers within the loaded neurons equal the number of samples that are aggregated to the particular neuron.

Figure 3. Map of 49 samples of fine airborne particles characterized by 33 trace elemental concentrations and two wind direction variables projected onto a 9 × 9 neuron 2-D Kohonen map. Training parameters are the same as in Figure 2. Numbers within the loaded neurons equal the number of samples that are aggregated to the particular neuron.

Figure 4. Coarse particle map from Figure 2 with the minimal spanning tree projected into it. The edges of the tree connect the loaded neurons in the map.
These data were studied by Glover et al. (16) using principal components analysis (PCA) with varimax rotation and chemical mass balance (CMB) analysis. Trace element patterns were detected in the fine and coarse particle fractions and were identified as being caused by emission sources in the Granite City area. The particle sources included non-ferrous metal smelters, the steel industry, motor vehicle traffic, soil, etc.

Figure 5. Fine particle map from Figure 3 with the minimal spanning tree projected into it. The edges of the tree connect the loaded neurons in the map.

For the present study, the 48 × 33 coarse fraction data matrix and the 49 × 33 fine fraction data matrix were augmented with the mean wind direction during the sampling period, coded as two additional variables, the sine and the cosine of the mean direction. Each of the 35 variables was scaled to unit length to give it an equal influence on the analysis. In a subsequent scaling step, the sample vectors were also scaled to unit length. The two doubly scaled data matrices formed the inputs to the combined algorithm of Prim's minimal spanning tree and the Kohonen neural network.

Computation
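Before the program itself is described, the data preparation reported in the Experimental Section (coding of the wind direction as sine and cosine, followed by unit length scaling of the variables and then of the sample vectors) can be sketched in Python. The names `unit_length` and `prepare_matrix` are hypothetical, and the assumption that the wind direction is supplied in degrees is made here only for illustration.

```python
import math


def unit_length(vec):
    """Scale a vector to Euclidean length 1 (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def prepare_matrix(concentrations, wind_deg):
    """Append sin/cos wind coding, then scale columns and rows to unit length.

    Returns (single, double): the column-scaled matrix and the matrix whose
    rows were additionally scaled. Wind directions are assumed to be degrees.
    """
    rows = [list(c) + [math.sin(math.radians(a)), math.cos(math.radians(a))]
            for c, a in zip(concentrations, wind_deg)]
    # Step 1: every variable (column) to unit length -> single scaled matrix.
    cols = [unit_length(col) for col in zip(*rows)]
    single = [list(r) for r in zip(*cols)]
    # Step 2: every sample vector (row) to unit length -> doubly scaled matrix.
    double = [unit_length(r) for r in single]
    return single, double
```

In this sketch the doubly scaled matrix would be the input to the mapping, while the single scaled matrix retains the differences in overall concentration level between samples.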
The complete algorithm of the minimal spanning tree projected into the Kohonen neural network has been implemented in a program package called 3MAP (minimal spanning tree self-organizing feature map). It was written in Turbo Pascal 5.5 for MS-DOS computers. 3MAP/DOS presents an on-line view of the self-organization process within the map on the screen for two-dimensional feature maps of up to 19 × 19 neurons. Because the speed of the Kohonen algorithm decreases rapidly with an increasing number of samples, an 80486 processor is recommended. ASCII files are used for the input of an n × m data matrix and for the output of the resulting tree projected into a Kohonen map. 3MAP/DOS can process 100 input samples with up to 100 variables, depending on the available computer memory and calculation time. In a subsequent development phase, 3MAP was implemented in Standard Pascal on an IBM 6000/RISC workstation under the UNIX operating system. 3MAP/UNIX is able to project data sets of 1000 samples and 100 variables onto much larger arrangements of neurons in up to 20 times less CPU time compared to an 80486 PC (50 MHz). For that estimate, it has been assumed that the workstation is used exclusively for this task. 3MAP/DOS and 3MAP/UNIX are available upon request from the authors.

Results and Discussion
The pure Kohonen maps of the coarse fraction data (Figure 2) and of the fine fraction data (Figure 3) yield comparable results. Sample clusters are found in the four corners of the maps, in the centers, and in the four positions that are halfway along each of the map's edges. However, the addition of the minimal spanning tree (Figures 4 and 5) shows the topological differences between the two data sets in the 35-dimensional variable space. The coarse fraction data (Figure 4) form a chain of clusters, arranged as an open circle with the opening visible at the lower left corner of the map. In contrast to this configuration, the fine fraction data are arranged in a cross-like pattern in the map (Figure 5). A recalculation of the maps without the two variables that code for the wind direction is shown in Figures 6 and 7. The tree for the fine fraction data keeps the cross-like structure (Figure 7). In contrast, the circle formerly seen in the coarse data (Figure 4) has changed to a more cross-like arrangement (Figure 6).

Figure 6. Map of coarse particle samples recalculated after excluding the two variables that code the wind direction. Training parameters are identical with those in Figure 2.

Figure 7. Map of fine particle samples recalculated after excluding the two variables that code the wind direction. Training parameters are identical with those in Figure 2.

These results are in agreement with those of Glover et al. (16) in that, in Granite City, the composition of the coarse fraction particles is strongly dependent on the wind direction. The fine fraction composition in Granite City was much less dependent on the wind direction. Coarse particles need stronger and more directed wind to be transported from the sources to the sampling site. If the wind is not strong enough, the particles will deposit after being transported over only short distances. Note that an exclusive map of only the two wind variables would yield an open and nearly ideal circle comparable to Figure 8, which displays the measured mean wind direction for each of the sampling intervals. However, in comparison to the other 33 unit length scaled chemical variables, the two wind direction variables contribute only 2/35 = 5.7% of the information input to the map. Given this consideration, the sensitivity of the maps of the coarse fraction data to the two wind variables is remarkable.

Figure 8. Projection of the measured wind directions for each of the 49 sampling intervals used at the single sampling site in Granite City, IL, between 1986 and 1987.

An increase of the number of neurons from u = 81 to u = 256, a factor of more than 3, allows a better unfolding of the minimal spanning tree (Figures 9 and 10). The topological structure of an open circle (Figure 9) is kept for the coarse fraction data, including the two wind direction variables. Wienke and Hopke (1) have shown that this circle-like chain structure corresponds to an arrangement of several local emission sources around the sampling site in Granite City. Furthermore, they have shown that the chemical composition of the coarse particles was the result of local industrial sources in Granite City.

Figure 9. Map of the same data set (coarse fraction) as Figure 4 using a larger scale (16 × 16) Kohonen map and linkage of the loaded neurons as described by the minimal spanning tree. Training parameters (1): matrix was row and column unit length scaled, Euclidean distance measure, $R_0$ = 19, $R_e$ = 4, $n_{e1}$ = 5000, $n_{e2}$ = 2000, $k_1$ = 0.1, $k_2$ = 0.008. Numbers within the loaded neurons equal the number of samples that are aggregated to the particular neuron. Four decreasing line thicknesses represent the four categories of increasing length of the tree's edges (very thick, D < 0.1; thick, D > 0.1; medium, D > 0.2; thin, D > 0.3).
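The four edge-length categories listed in the caption of Figure 9 amount to a simple threshold lookup, sketched below in Python. The function name `edge_category` is hypothetical, and assigning a length that falls exactly on a boundary to the shorter category is an assumption, since the caption does not state how such cases are handled.

```python
def edge_category(d, bounds=(0.1, 0.2, 0.3)):
    """Classify an MST edge length D into one of four line-thickness categories."""
    labels = ("very thick", "thick", "medium", "thin")
    # Count how many boundaries the length exceeds; the count indexes the label.
    return labels[sum(d > b for b in bounds)]
```

For example, an edge of length 0.15 exceeds only the first boundary and is therefore drawn as a thick line.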
Figure 10. Map of the same data set (fine fraction) as Figure 5 using a larger scale (16 × 16) Kohonen map and linkage of the loaded neurons as described by the minimal spanning tree. Training parameters are the same as in Figure 9. Numbers within the loaded neurons equal the number of samples that are aggregated to the particular neuron. Four decreasing line thicknesses represent the four categories of increasing length of the tree's edges (very thick, D < 0.1; thick, D > 0.1; medium, D > 0.2; thin, D > 0.3). Dotted lines and large bold characters label the 14 identified clusters. The numbers correspond to the sample numbers.
Those results correspond well to those of Glover et al. (16) on the basis of the PCA and chemical mass balance calculations. For the fine fraction data set (Figure 10), the cross-like arrangement remained stable compared to the smaller neural networks (Figures 5 and 7).

For a more quantitative evaluation of the several clusters within the map for the fine fraction data (Figure 10), the concept of a geographical unit circle (GUC) is introduced (Figure 11). The clustered groups of samples, labeled in Figure 10 in alphabetic order from A to N, have been arranged according to their mean wind direction since there are variations of the wind direction within a cluster and of pollution levels. A critical step is the determination of the number of clusters in the map (Figure 10) to obtain the GUC (Figure 11). In the present work, it was decided to classify the edge lengths of the projected minimal spanning tree into four categories (very close, short, medium, distant). On this basis, loaded neurons whose mutual distance was classified as short or very close were combined into a cluster. As a second condition, at least two samples should be present in one cluster. On this basis, 14 clusters, A-N, were detected in Figure 10 and used for the design of the GUC in Figure 11. However, a test statistic is needed for a more quantitative decision about the number of clusters, significant distances, outliers, etc. in such a combined projection of Prim's minimal spanning tree onto a Kohonen neural network. The search for an appropriate statistic will be the subject of future studies.

Figure 11. Arrangement of 14 clusters (as labeled in Figure 10) around a geographic unit circle (GUC). The mean cluster position and its variation correspond to the mean wind direction and standard deviation of the wind direction of the assigned sample vectors. Four pattern levels represent the pollution levels relative to the median value calculated over the 33 unit length scaled trace element concentrations as given in Figure 12. (The medians for clusters A-N are 0.096, 0.076, 0.105, 0.089, 0.070, 0.138, 0.139, 0.13, 0.057, 0.066, 0.043, 0.030, 0.029, and 0.046.)

The GUC (Figure 11) shows that the strongest pollution sources are located in a southern direction from the sampling site in Granite City. There were no sampling intervals with winds arriving from the northwestern direction (Figure 8). Northeastern and eastern wind directions affected the site with much lower concentrations of particulate matter. The GUC pattern for the fine particle fraction is basically quite similar to the GUC that was obtained for the coarse particle data (Figure 8 in ref 1). The main difference between them is the larger variation in the wind directions for the fine particles. In general, both GUCs reflect the lower degree of industrialization to the north of Granite City compared to the south. However, there are exceptions such as clusters F and I in Figure 11. The wind direction for the samples within those clusters varies from north to south with a much less constrained geographical angle. The particle sources for these two clusters seem to be diffusely distributed over the whole area. This result could be caused by a rapidly changing wind direction during the sampling interval or by multiple sources with a similar emission pattern but different locations around the site.

A comparison of the two GUCs (Figure 11; Figure 8 in ref 1) with their corresponding maps (Figures 10 and 9, respectively) shows that most of the clusters have rather narrow bands of wind directions. Clusters A, G, N, M, and K in Figure 10, for example, are arranged in a counterclockwise direction, corresponding to the geographical directions southwest, south-southeast, southeast, east, and north-northeast. Cluster F, with a less specific direction, is located in the center of the map. Clusters I and J are exceptions with wider bands of wind direction influence. The edges of the minimal spanning tree also seem to have environmental meaning. One type of edge forms a circle along the borders of both maps (Figures 9 and 10). This type of edge connects clusters that are located in order according to the windrose. The second type of edge, which crosses the map, can be seen only in Figure 10. These edges seem to geographically connect 'mixed' or less specific sources of air pollution.
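The two steps that lead from the map to the GUC, grouping loaded neurons along short tree edges and summarizing the wind direction of each group, can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' procedure: the names `clusters_from_tree` and `circular_mean_std` are hypothetical, the cut-off of 0.2 for 'short or very close' edges is inferred from the line-thickness categories in the caption of Figure 10, and the circular standard deviation sqrt(-2 ln R) is one common definition that the paper does not specify.

```python
import math


def clusters_from_tree(edges, loads, max_len=0.2, min_samples=2):
    """Group loaded neurons linked by short MST edges; return lists of sample ids.

    edges: (neuron_a, neuron_b, length) tuples; loads: neuron -> sample ids.
    """
    parent = {n: n for n in loads}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for a, b, length in edges:
        if length <= max_len:              # assumed cut-off for "short" edges
            parent[find(a)] = find(b)
    groups = {}
    for n, sample_ids in loads.items():
        groups.setdefault(find(n), []).extend(sample_ids)
    # Second condition from the text: at least two samples per cluster.
    return [sorted(g) for g in groups.values() if len(g) >= min_samples]


def circular_mean_std(angles_deg):
    """Mean direction and circular standard deviation, both in degrees."""
    s = sum(math.sin(math.radians(a)) for a in angles_deg) / len(angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg) / len(angles_deg)
    mean = math.degrees(math.atan2(s, c)) % 360.0
    r = min(math.hypot(s, c), 1.0)         # mean resultant length, clipped to 1
    std = math.degrees(math.sqrt(-2.0 * math.log(r))) if r > 0 else math.inf
    return mean, std
```

Averaging the sine and cosine components rather than the raw angles avoids the wrap-around problem at north, where directions such as 350° and 10° should average to about 0° and not to 180°.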
The source profiles (Figure 12a-n) corresponding to the 14 extracted clusters were obtained from the median values of the unit length scaled variables (single scaled data matrix). It should be noted that the clusters were obtained from mapping the doubly scaled data matrix. After the classification based on the shape of the scaled sample vectors, the 49 rows of the 49 × 35 data matrix were rescaled for the computation of the profiles shown in Figure 12. In contrast to the doubly scaled data, this single scaling allows a distinction between different levels of pollution. The use of an identical range of axis labels in Figure 12 allows a direct comparison of the profiles with their median pollution levels shown by the GUC in Figure 11.

Figure 12. Median sample vectors dedicated to each cluster (as detected by the minimal spanning tree, Figure 10) for the identification and characterization of 14 different emission sources in Granite City. Variables have been scaled to unit length.

The profiles in Figure 12 are in generally good agreement with the 'Factor' profiles found by Glover et al. (Table VII in ref 16) by PCA. Clusters A and B (Figure 12a,b) correspond to the emission pattern 'factor 4' of the local and regional steel plants to the west and southwest (two local plants and two plants in St. Louis, MO). Cluster B also includes information about the local non-ferrous smelters. Cluster C corresponds to 'factor 1' (motor vehicle), mixed with the heavy metal emission pattern of the local TerraCorp (site of a former secondary lead smelter). Cluster D matches the soil/flyash pattern ('factor 2'). Clusters E and H correspond to 'factor 3', called regional sulfate, whereby the Ti emission in E suggests paint pigment production. Cluster F shows the catalytic cracker pattern of the northern regional refineries between Alton and Wood River, IL ('factor 9') (17). The high potassium peak ('factor 5', fertilizer or incinerator) in Figure 12n is a clear indicator of the fertilizer company to the southeast between East St. Louis and Granite City, IL.

Clusters I, J, K, and L (Figure 12i-l) have lower emission levels compared with the others. This result is due to the low level of industrialization of the area to the northeast of Granite City. These patterns are less specific and rather difficult to interpret. However, on careful inspection of cluster J, a higher level of heavy metals is apparent. The numerous metal works in the region for copper, zinc, and lead are seen as potential sources for those elements. A copper smelter in the northeastern direction, a brass plant north of Granite City, and a lead smelter close to the sampling site could cause the heavy metal patterns in the fine particle fraction. Also, clusters I, K, and L carry information about other heavy metals such as Hf, Se, Ce, and Sr.

Conclusions
A pattern recognition method has been developed by projecting Prim's minimal spanning tree into a trained Kohonen neural network. Its use as a new environmental receptor modeling technique provided reasonable results for the mapping of a 35-dimensional space of chemical concentration (33) and meteorological (2) variables into a two-dimensional, visually interpretable picture. The arrangement of clusters in this 2-D map correctly reflected the geographical and industrial situation around a single sampling site in Granite City, IL. In this way, the chemical 'coding' of the different air pollution sources has been 'decoded' into a geographical map. The map has been transferred to a geographical unit circle (GUC) around the site. Using this GUC, the main sources of air pollution in Granite City could be characterized by their geographical direction and their elemental emission pattern. A comparison of the GUCs for fine and coarse particle data showed a less directed origin for the fine particles. This result is in agreement with prior results obtained by principal component analysis and chemical mass balance calculations. The source profiles obtained also agree very well with the factor profiles obtained by the PCA.

The Kohonen self-organizing neural network forms an a priori step of data compression. After that, Prim's minimal spanning tree is calculated between the loaded neurons. The edges of the spanning tree form environmentally meaningful connection lines between the clusters in the Kohonen map. In a subsequent study (15), data sets from a multiple sampling site network from the Southern California Air Pollution Study have been evaluated. In this way, the advantages and drawbacks of this new receptor modeling technique combining Prim's minimal spanning tree with a Kohonen self-organizing neural network were explored further.
Acknowledgments
This work was supported by the National Science Foundation under Grant ATM 9114750. We would like to thank Drs. Clyde Sweet and Stephen Vermette and the Illinois State Water Survey for taking the samples, the Illinois Environmental Protection Agency for their support of the neutron activation analysis of the Granite City samples, and Sheldon Landsberger for performing the neutron activation analyses of these samples.

Literature Cited

(1) Wienke, D.; Hopke, P. K. Projection of Prim's Minimal Spanning Tree into Kohonen's Neural Network for Identification of Airborne Particle Sources by Their Multielemental Trace Pattern. Anal. Chim. Acta 1994, 291, 1.
(2) Kohonen, T. Proceedings of the 2nd Scandinavian Conference on Image Analysis; Suomen Hahmontunnistustutkimuksen Seura r.y.: Helsinki, 1981; p 214.
(3) Kohonen, T. Self-Organization and Associated Memory; Springer Verlag: Heidelberg, 1989.
(4) Gross, M.; Seibert, F. Neural Network for Image Analysis of Environmental Protection. In Visualisierung von Umweltdaten; Denzer, R., Ed.; Springer Verlag: Berlin-Heidelberg-New York, 1991.
(5) Arrigo, P.; Giuliano, F.; Scalia, F.; Rapallo, A.; Damiani, G. Comput. Appl. Biosci. 1991, 7 (3), 353.
(6) Ross, V. S.; Croall, I. F.; MacFie, H. J. H. Quant. Struct.-Act. Relat. 1991, 10 (1), 6.
(7) Melssen, W. J.; Smits, J. R. M.; Rolf, G. H.; Kateman, G. Chemom. Intell. Lab. Syst. 1993, 18, 195.
(8) Zupan, J.; Gasteiger, J. Anal. Chim. Acta 1991, 248 (1), 1.
(9) Zupan, J.; Gasteiger, J. Neural Networks for Chemists; VCH Publishers: Weinheim, 1993.
(10) Florek, K. Colloq. Math. 1951, 2, 282.
(11) Kruskal, J. B. Proc. Am. Math. Soc. 1956, 7, 48.
(12) Prim, R. C. Bell Syst. Tech. J. 1957, 36, 1389.
(13) Kohonen, T. Speech Recognition Based on Topology-Preserving Neural Maps. In Neural Computing Architectures; Aleksander, I., Ed.; MIT Press: Cambridge, MA, 1989; p 26.
(14) Wasserman, P. D. Neural Computing: Theory and Practice; Van Nostrand Reinhold: New York, 1989; p 65.
(15) Wienke, D.; Gao, N.; Hopke, P. K. Multiple Receptor Modeling by a Minimal Spanning Tree Projected into a Kohonen Neural Network. Environ. Sci. Technol. 1994, following paper in this issue.
(16) Glover, D. M.; Hopke, P. K.; Vermette, S. J.; Landsberger, S.; D'Auben, D. R. J. Air Waste Manage. Assoc. 1991, 41 (3), 294.
(17) Olmez, I.; Gordon, G. E. Science 1985, 229, 966.
Received for review May 18, 1993. Revised manuscript received January 12, 1994. Accepted March 7, 1994. Abstract published in Advance ACS Abstracts, April 1, 1994.