Environmental Applications of Chemometrics - American Chemical

9

Downloaded by NATL UNIV OF SINGAPORE on May 6, 2018 | https://pubs.acs.org Publication Date: November 6, 1985 | doi: 10.1021/bk-1985-0292.ch009

Cluster Analysis of Chemical Compositions of Individual Atmospheric Particles Data

T. W. Shattuck (1,2), M. S. Germani (2), and P. R. Buseck (2)

(1) Department of Chemistry, Colby College, Waterville, ME 04901
(2) Departments of Chemistry and Geology, Arizona State University, Tempe, AZ 85287

Atmospheric particle types are identified using k-means cluster analysis. Nearest neighbor classification is used to produce particle number versus type histograms that allow identification of spatial and temporal emission patterns. Factor analysis is carried out on the particle-type results from several sampling periods or sites to identify relationships between particle types and for source identification. The methods are applied to the elemental composition of particles from the Phoenix aerosol which are obtained using an automated analytical scanning electron microscope. Seven methods are considered for choosing cluster seedpoints. Cluster significance is judged using the ratio of the sum of squared distances between clusters to the sum of squared distances within clusters. In order to account for the full variability in the data set, more clusters are necessary than may be statistically significant.

Data obtained from the analysis of individual atmospheric particles are ideal for the identification of particle sources and for the study of particle dynamics and emission patterns (1). Using an analytical scanning electron microscope (ASEM) equipped for energy-dispersive X-ray spectrometry (EDS), the elemental composition, size, shape and morphology of particles can be determined. This information is necessary for determining the effects of particles on such important areas as health, climate and visibility. Individual particle analysis is particularly useful for studying elemental speciation and association, particle agglomeration, surface coatings and the distribution of elements as a function of particle size (2-6).

The ASEM in our laboratory is automated so that analyses of about 1000 particles are commonly used to characterize each sample. The ability to rapidly analyze large numbers of particles necessitates the development of statistical methods for data reduction and analysis of these large data sets.

0097-6156/85/0292-0118$06.00/0 © 1985 American Chemical Society
Breen and Robinson; Environmental Applications of Chemometrics ACS Symposium Series; American Chemical Society: Washington, DC, 1985.


Cluster analysis is used to determine the particle types that occur in an aerosol. These types are used to classify the particles in samples collected from various locations and sampling periods. The results of the sample classifications, together with meteorological data and bulk analytical data from methods such as instrumental neutron activation analysis (INAA), are used to study emission patterns and to screen samples for further study. The classification results are used in factor analysis to characterize spatial and temporal structure and to aid in source attribution. The classification results are also used in mass balance comparisons between ASEM and bulk chemical analyses. Such comparisons allow the combined use of the detailed characterizations of the individual particle analyses and the trace-element capability of bulk analytical methods.

These methods, while being developed for the study of the Phoenix aerosol, are also applicable to a wide range of studies. The vast majority of particles >1 µm in diameter in the Phoenix aerosol are crustal in origin, representing a wide variety of mineral particles. They thus provide a stringent test case for the methods, since these particles produce many large, closely spaced clusters, and these tend to obscure smaller, atypical clusters that are of anthropogenic origin.

Cluster Analysis

There are three goals for cluster analysis.

1) The most immediate is the qualitative identification of the types of particles that occur in an aerosol. The compositions of the clusters often directly indicate sources. For example, particles containing Pb, Cl and Br indicate auto exhaust. The clusters may also provide information on formation mechanisms. For example, a cluster composed mostly of calcium and sulfur but with a small amount of silicon and a few percent of transition metals suggests a CaSO4 particle with a silicate core, which is most likely formed as a result of combustion processes.

2) The next goal is to reduce the mass of data to a tractable size, but in a way that emission patterns can be easily discerned. This is done by using the cluster centroids from representative samples to define the particle types in the aerosol. Particles from the remainder of the data set are assigned to the various particle types. Histograms of the number of particles for each particle type, for each sampling site and period, provide a rapid way to follow temporal and spatial emission patterns. The particle type classifications also are used as input for factor analysis.

3) The third goal is to allow poorly populated clusters to be treated separately from the clusters containing many particles. An example of the need for this separation arises in the Phoenix aerosol: about 75% of the particles >1.0 µm in diameter are quartz or alumino-silicate mineral particles, which makes it difficult to monitor particles of similar size that are not of crustal origin. Particles that are not represented by a cluster are left unassigned. These unassigned particles are particularly useful for studying unusual events. However, this requires that the cluster analysis be sufficiently inclusive so that only unusual particles are in the set of unassigned particles.
Such separation is particularly important if there is a subsequent need to return to those particles for further analysis.

There are three steps to nonhierarchical cluster analysis. The first is to choose seedpoints; these are approximate compositions from which to start cluster analysis. Choosing seedpoints is by far the most critical step. Second, a cluster analysis algorithm is applied to define the clusters. Finally, the statistical significance of the clusters must be determined; in other words, are the clusters well resolved or do they overlap? The three steps are detailed below.

Choosing Seedpoints. A group of successive observations or a set of observations chosen at random from the data set may be used for seedpoints. However, the results of such simple procedures are often not reliable. Seven different methods are considered for choosing seedpoints in this study. The first four are standard hierarchical techniques: single, complete, and average (between merged groups) linkage and Ward's method (7). Nearest centrotype sorting (d [...]
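As an illustration of how a hierarchical technique can condense a pool of candidate observations into seedpoints, the following Python sketch applies average linkage between merged groups until a requested number of groups remains, then returns the group means. The function name, the toy data, and the use of group means as seedpoints are assumptions made for illustration; they are not taken from the statistical package used in this study.

```python
import math
from itertools import combinations


def average_linkage_seedpoints(observations, n_seeds):
    """Merge the two closest groups (average linkage) until n_seeds groups
    remain, then return the mean composition of each group.

    Assumption: group means serve as seedpoints; this choice is illustrative.
    """
    groups = [[tuple(obs)] for obs in observations]

    def average_distance(a, b):
        # Mean Euclidean distance over all pairs drawn from the two groups.
        return sum(math.dist(p, q) for p in a for q in b) / (len(a) * len(b))

    while len(groups) > n_seeds:
        # combinations() yields i < j, so deleting index j keeps index i valid.
        i, j = min(
            combinations(range(len(groups)), 2),
            key=lambda ij: average_distance(groups[ij[0]], groups[ij[1]]),
        )
        groups[i] = groups[i] + groups[j]
        del groups[j]

    return [tuple(sum(col) / len(g) for col in zip(*g)) for g in groups]


# Toy two-variable "compositions" (hypothetical values).
candidates = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
seeds = average_linkage_seedpoints(candidates, n_seeds=3)
```

Swapping `average_distance` for the minimum or maximum pairwise distance would give single or complete linkage, respectively, under the same assumptions.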

Cluster Algorithm. The Forgy variety of k-means cluster analysis (7) is chosen because of its speed for large data sets. Forgy k-means cluster analysis is an iterative process. In the first iteration observations are assigned to the nearest centroid. This defines the initial clusters. The compositions of the observations in each cluster are then averaged to find approximate centroids. Let $\bar{x}_k$ be the centroid vector for cluster $k$, with components $\bar{x}_{kj}$ for all variables $j$. Then the average is given by

$$\bar{x}_{kj} = \frac{1}{n_k} \sum_{i=1}^{n_k} x_{ij}^{k}$$

for the $n_k$ observations, $x_{ij}^{k}$, in the cluster. If the initial seedpoints are far from the true centroids, then the true and approximate centroids so calculated may not be very close. This may be improved through successive iterations by using the approximate centroids as seedpoints and then repeating the assignment and averaging steps. This continues until the centroids no longer change on subsequent iterations. Cluster centroids are updated at the end of each assignment cycle. The Euclidean distance measure is used. Outliers are excluded by choosing a maximum distance for cluster assignment. Convergence of the centroids may take as many as five iterations of the k-means procedure.

Cluster Significance. There are two goals for significance testing. The first is to estimate the number of clusters in the data and the second is to identify the amount of overlap between the various clusters. Unfortunately, no completely satisfactory statistical test exists. One is faced with a difficult decision, either to ignore the problem or to make do with available testing methods. The simplest, most straightforward test is chosen for this study, the sum of squares ratio test. Even though the test method may be flawed, it is necessary to underscore the importance and usefulness of statistical measures of cluster separation.

The sum of squares ratio test compares two clusters by finding the ratio of the between-clusters sum of squares (B) to the within-clusters sum of squares (W). This is based on the well-known sum of squares decomposition,

$$T = B + W$$

where $T$ is the total sum of squares for the two clusters. For each cluster $k$ with $n_k$ members and centroid vector $\bar{x}_k$,

$$T = \sum_{k=1}^{2} \sum_{i=1}^{n_k} (x_i^k - \bar{x})' (x_i^k - \bar{x})$$

where $x_i^k$ is observation vector $i$ from cluster $k$ and $\bar{x}$ is the mean vector over all the observations in the data set. The prime indicates vector transposition. Then

$$B = n_1 (\bar{x}_1 - \bar{x})' (\bar{x}_1 - \bar{x}) + n_2 (\bar{x}_2 - \bar{x})' (\bar{x}_2 - \bar{x})$$

and

$$W = \sum_{k=1}^{2} \sum_{i=1}^{n_k} (x_i^k - \bar{x}_k)' (x_i^k - \bar{x}_k) = \sum_{k=1}^{2} \sum_{i=1}^{n_k} d_{ik}^2$$

The term $d_{ik}$ is the Euclidean distance between observation $i$ and centroid $k$. The test statistic is then

$$C = B/W$$
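The quantities defined above can be checked numerically. The Python sketch below computes B, W, T and C for a pair of clusters and confirms the decomposition T = B + W on a toy example. The function names and the four toy observations are hypothetical, and here the mean vector is taken over the two clusters being compared, which is an assumption about how the pairwise test is applied.

```python
def centroid(points):
    """Mean vector of a list of equal-length observation tuples."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def squared_distance(p, q):
    """Squared Euclidean distance, i.e. (p - q)'(p - q)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def sum_of_squares_ratio(cluster1, cluster2):
    """Return (B, W, T, C) for two clusters given as lists of tuples.

    Assumption: the overall mean is computed from these two clusters only.
    C is undefined (division by zero) if both clusters have zero dispersion.
    """
    clusters = [list(cluster1), list(cluster2)]
    grand_mean = centroid(clusters[0] + clusters[1])
    between = sum(
        len(c) * squared_distance(centroid(c), grand_mean) for c in clusters
    )
    within = sum(squared_distance(p, centroid(c)) for c in clusters for p in c)
    total = sum(squared_distance(p, grand_mean) for c in clusters for p in c)
    return between, within, total, between / within


# Two well-separated toy clusters in two variables (hypothetical values).
B, W, T, C = sum_of_squares_ratio([(0, 0), (0, 2)], [(4, 0), (4, 2)])
```

A large C indicates centroids that are far apart relative to the scatter within the clusters; moving the two toy clusters closer together lowers B while leaving W unchanged.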

The C-statistic is used to test the significance of pairs of clusters under the null hypothesis that the observations are a sample from a single normal population. Hartigan (11) and Engelman and Hartigan (12) compiled a set of percentage points for C for clustering on one variable, assuming a normal distribution of the observations, for optimal clustering obtained by maximizing B/W. These percentage points cannot strictly be used to test the significance of clusters in this study since a) the clustering occurs over many variables (dimensions), b) the clusters obtained are usually at best only locally optimal and c) the underlying observations are not normally distributed. However, applying the C-statistic in a simulation study, using k-means clustering of synthetic data over 3 to 9 variables generated using a rectangular distribution, shows the Engelman and Hartigan percentage points to be useful. The percentage points seem to be rather insensitive to the number of variables. A low confidence level (50%) is normally chosen when applying the percentage points to actual data.

Regardless of the failings of a given statistical test, it is the philosophy of the use of the test that is most important. This is especially clear when addressing the problem of estimating the number of clusters in the data set. In some standard statistical packages this is normally handled in the following way. Cluster analysis is carried out by intentionally using too many seedpoints. The distance between the resulting centroids or the variance of the variables in each cluster is then used to decide which clusters to combine and which clusters to split. Using the intercentroid distance as a criterion has the danger of combining two well-resolved but closely spaced clusters. Using the variance as a criterion has the danger of arbitrarily dividing a single large cluster. However, using the sum of squares ratio, as in this study, is a more reliable criterion because it takes into account both the between-centroid distance and the dispersion of the clusters.

In this study, we purposely started with too many seedpoints. The number of seedpoints for analysis and the final seedpoint set is determined in the following way. After an initial round of cluster analysis, the seedpoint which gives the largest number of test failures is rejected. After a seedpoint is rejected, cluster analysis is repeated. This process continues until the number of unassigned particles begins to increase rapidly and the number of significant clusters decreases. In our case, it is quite likely that there are several clusters in the final set that are not significant (especially in the group of alumino-silicate clusters), but it is necessary to keep some of them in order to describe adequately the variations that occur when the centroids are used in discriminant analysis for other sampling sites and periods. This situation arises in part because k-means analysis works best on spherical clusters, but many clusters are certainly not spherical. In addition, the natural variability of aerosol particles undoubtedly produces significant overlap between clusters.

Since there is, at present, no adequate statistical test for significance and no rapid method for clustering non-spherical clusters, the actual use of the cluster centroids for the classification of particles must serve as the test for the adequacy of the cluster analysis. That is, the usefulness and validity of the results is the ultimate test of the cluster analysis.
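Because the cluster centroids carry the rest of the analysis, a compact sketch of the Forgy k-means iteration described under Cluster Algorithm may be helpful. The Python below assigns each observation to its nearest centroid, leaves observations beyond a maximum distance unassigned, averages the members of each cluster, and repeats until the centroids stop changing. All names, the default iteration cap, and the toy data are assumptions; the handling of empty clusters (keeping the previous centroid) is also a choice made here, not one stated in the text.

```python
import math
from collections import Counter


def nearest(point, centroids, max_dist=math.inf):
    """Index of the nearest centroid, or None if it lies beyond max_dist."""
    dists = [math.dist(point, c) for c in centroids]
    k = min(range(len(centroids)), key=dists.__getitem__)
    return k if dists[k] <= max_dist else None


def forgy_kmeans(points, seeds, max_dist=math.inf, max_iter=100):
    """Forgy-style k-means: centroids are updated only after a full
    assignment pass. Returns (centroids, labels); label None = unassigned."""
    centroids = [tuple(float(v) for v in s) for s in seeds]
    labels = []
    for _ in range(max_iter):
        labels = [nearest(p, centroids, max_dist) for p in points]
        updated = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                updated.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                updated.append(old)  # assumption: empty cluster keeps its centroid
        if updated == centroids:
            break  # centroids no longer change
        centroids = updated
    return centroids, labels


def type_histogram(points, centroids, max_dist=math.inf):
    """Particle count per type; the key None collects unassigned particles."""
    return Counter(nearest(p, centroids, max_dist) for p in points)


# Toy data (hypothetical): two tight groups and one outlier.
toy = [(0, 0), (0, 1), (10, 10), (10, 11), (50, 50)]
centroids, labels = forgy_kmeans(toy, seeds=[(0, 0), (10, 10)], max_dist=5.0)
hist = type_histogram([(0, 0.4), (9, 10), (30, 30)], centroids, max_dist=5.0)
```

The same `nearest` rule with a distance cutoff, applied to fixed centroids, gives a particle number versus particle type histogram for a new sample, with unassigned particles counted separately.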


Particle Classification

Particle classification is carried out using a nearest neighbor criterion with Euclidean distance. Histograms of the size distribution within each particle type can be generated in addition to particle number versus particle type histograms. Particles are not classified if they are further than a chosen maximum distance from the nearest centroid. Histograms of the distribution of elements in the unassigned particles are useful for following unusual events. (Linear discriminant analysis is not used because of the extreme inhomogeneity of the cluster variance-covariance matrices in the data.)

A particularly powerful use of the classification results is in factor analysis. This will help to uncover interrelationships among the particle types and will provide additional information for source attribution. The results of the factor analysis are also helpful for judging the significance of the cluster analysis: if the occupations of two similar particle types are uncorrelated over several samples, then the particle types and the clusters from which they are derived are significantly different.

Experimental Methods

The elemental compositions of the individual particles used in this study were determined by energy-dispersive X-ray spectrometry (EDS). The data were acquired using an automated analytical scanning electron microscope (JEOL JSM-35). The automation system includes both sample stage and electron-beam automation, allowing unattended operation.

Elemental compositions were obtained from the particle X-ray spectrum by integration of the background-corrected X-ray peak in a region of interest about one of the characteristic X-ray lines for each element. The region-of-interest integrals are converted to relative abundance concentrations by dividing the integral for each element by the sum over all the elements detected in the particle. No other variable normalization was used, in order to avoid the inclusion of noise in the form of analytical uncertainty due to the relatively large detection limits inherent in EDS analysis. Data for 31 elements can be determined rapidly, but with some interferences between elements. No spectral curve fitting or matrix (ZAF) correction schemes were used in this survey study. ZAF correction does not seem to markedly aid the cluster analysis process, presumably because the natural dispersion of the clusters is so large and because of the errors in applying thick-film ZAF corrections to small particles. The elements used in this study were Na, Mg, Al, Si, Fe, K, Ca, S, P, Cl, Ti, Mn, Cu, Zn, Cr, Ni, As, Br and Pb. The particles for this study were collected on Nuclepore filters. Particles in the size range of 1 to 15 µm in diameter were analyzed.

Cluster analysis is far from an automatic technique; each stage of the process requires many decisions and therefore close supervision by the analyst. It is imperative that the procedure be as interactive as possible. Therefore, for this study, a menu-driven interactive statistical package was written for PDP-11 and VAX (VMS and UNIX) series computers, which includes adequate computer graphics capabilities. The graphical output includes a variety of histograms and scatter plots based on the raw data or on the results of principal-components analysis or canonical-variates analysis (14). Hierarchical cluster trees are also available. All of the methods mentioned in this study were included as an integral part of the package.

Results

The seven seedpoint methods were tested using a data set containing 1000 particles from a representative aerosol sample collected in downtown Phoenix. The first 70 successive observations were chosen from the data as the initial set for choosing seedpoints for each method. Each of the seven methods was applied to reduce this set to 30 seedpoints. The 30 seedpoints were then used in k-means cluster analysis. No two of the seedpoint sets were identical; however, 25 out of 30 final clusters were found in each set.
The unique seedpoints were found to be statistically insignificant, and in general, the different methods seemed to be dividing large complex clusters in slightly different ways. Single linkage gave the most unusual set of seedpoints and would seem to be an excellent companion method to the "merge" procedure, especially since it gave an unusually small total number of test failures. However, for general use single linkage does not do a good enough job on clusters with typical composition, such as the alumino-silicate clusters. Ward's method gave a slightly smaller number of test failures than complete and average linkage, but otherwise all three gave comparable results. Nearest centrotype sorting also gave comparable results with this data set. Its use is probably warranted only for clustering data containing similar, closely spaced clusters with few atypical clusters. The "refine" method gave the largest number of pairwise test failures. The "merge" procedure also gave a relatively large number of test failures, but the seedpoints were well balanced between the clusters of typical and atypical composition. All of the methods gave only 2 clusters that contain no significant test failures, except "refine", which gave only one. However, all seven methods gave the same number of clusters with less than 4 test failures. The differences between the methods would have been more pronounced if the final number of seedpoints had been smaller.

The "merge" method was used in all of the following studies, with the two-round procedure, described above, for choosing seedpoints. The sum-of-squares ratio test was used to eliminate some of the nonsignificant clusters. These methods, when applied to the downtown Phoenix aerosol sample, produced a satisfying range of particle types and left unassigned only about 4% of the particles (Table I). The major particle type was quartz, which accounted for 19% of the particles. Various alumino-silicate types were the next most abundant. Easily identifiable types included clusters rich in only one to three elements, including iron (7%), calcium (3%), calcium-silicon-iron (4%), calcium-sulfur (1%), lead (3%), lead-chloride-bromide (3%) and titanium (2%). The abundances of these particle types, indicated in parentheses, vary widely from site to site. Many particles rich in heavy metals were found in the unassigned group at this point.

Table I. Cluster Composition for Representative Phoenix Aerosol Sample

Elemental Composition    Similar Mineral            % Abundance
Si K Al Fe               Orthoclase                 7
Si Al K Fe               Muscovite                  15
Si Al Fe Ca              Albite/Montmorillonite     14
Si Ca Fe Al              (Epidote)                  6
Si Fe Al K               Biotite                    4
Si                       Quartz                     19
Fe Si Al Mg              Ripidolite/Chlorite        2
Fe                       Magnetite                  7
Ca Si Fe                 Pyroxene                   4
Ca                       Calcite                    3
Ca S Si                  Gypsum                     1
Ca Si Fe                 (Tremolite/Actinolite)     2
Ti Si                    (Rutile)                   2
Ti Fe Si                                            0.5
K Cl Si                                             0.5
Pb Cl Br                                            3
Pb Si                                               3
Fe Zn Si S                                          1
S Si Na                                             1
Unassigned                                          4

a. Si indicates that Si may be present in the particles or may be due to a spectral artifact (carbon absorption edge). ( ) indicates only a possible mineral assignment for the cluster.

In a further test of the clustering procedure, analyses of particles of standard clay minerals (ripidolite, montmorillonite and nontronite) as well as muscovite mica were clustered. The procedure easily identified the different minerals, giving rise to well-resolved clusters. These results, and results from other standard mineral particles, were compared to the clusters determined from the Phoenix aerosol and listed in Table I. This comparison indicated that, while many clusters were well resolved (e.g., those mentioned above), the alumino-silicate clusters in the Phoenix samples were probably mixtures of several mineral types. The minerals indicated in Table I have been identified in the Phoenix aerosol in the 5 to 50 µm diameter size range (13). They were listed not as absolute assignments but as suggestions for the most prominent mineral type in the given cluster. Obviously, many of the particles were not necessarily crustal in origin. For example, there are many sources of iron and iron oxide particles other than magnetite. Also, evidence from other sites indicated that the titanium cluster may result from an anthropogenic source.

Table II. Classification Results for Chandler, Arizona, as percent of total particles classified.

Date      Quartz    Orthoclase    Muscovite    Calcite    Pyroxenes
Feb 22     8.8       8.3          21.3         1.3        4.0
Feb 23    10.0       7.5          22.0         4.3        5.8
Feb 24     8.1       8.7          32.0         2.2        4.2
Feb 26    11.9       6.5          24.7         2.0        1.7
Feb 27    15.8       7.7          18.2         1.1        1.7
Feb 28    10.6       9.4          21.6         1.4        2.0
Mar 3     10.6       5.6          19.3         8.0        5.2
Mar 4     10.6       6.4          22.5         3.5        1.8

Using the particle types outlined in Table I, a series of samples from Chandler, Arizona, were classified. The samples were collected over a two-week period in late February and early March. The results for several particle types are listed in Table II. The first interesting result is that muscovite is always more abundant than quartz, in contrast with the downtown Phoenix sample. In addition, the pyroxene, muscovite and calcite types are negatively correlated, over time, with quartz. The classification results were used as input for principal components analysis, with the observations being the different samples and the variables the particle types. The first principal component has a predominant weighting on muscovite, explaining 52% of the variance of the data set. The second principal component has strong positive weightings on the pyroxenes and calcite and strong negative weightings on quartz, explaining 23% of the variance. Therefore the crustal particles show a striking difference in behavior, counter to what one would have expected. This does not appear to be simply random behavior because the sample scores on principal component two show a good correlation with the east-west direction of upper level winds.
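The correlations and the leading principal component can be explored with the five particle types listed in Table II. The Python sketch below computes Pearson correlations with quartz and estimates the first principal component of the covariance matrix by power iteration. It is only a partial illustration: the analysis in the text used the full set of particle types, so the variance fractions obtained here should not be expected to match those quoted above. The function names, the unscaled covariance matrix, and the power-iteration approach are assumptions.

```python
import math

# Percent of classified particles, transcribed from Table II (Chandler, Arizona).
TABLE_II = {
    "Quartz":     [8.8, 10.0, 8.1, 11.9, 15.8, 10.6, 10.6, 10.6],
    "Orthoclase": [8.3, 7.5, 8.7, 6.5, 7.7, 9.4, 5.6, 6.4],
    "Muscovite":  [21.3, 22.0, 32.0, 24.7, 18.2, 21.6, 19.3, 22.5],
    "Calcite":    [1.3, 4.3, 2.2, 2.0, 1.1, 1.4, 8.0, 3.5],
    "Pyroxenes":  [4.0, 5.8, 4.2, 1.7, 1.7, 2.0, 5.2, 1.8],
}


def centered(values):
    """Subtract the mean from each value."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    dx, dy = centered(x), centered(y)
    num = sum(a * b for a, b in zip(dx, dy))
    return num / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


def first_principal_component(table, n_iter=500):
    """Leading eigenvector of the covariance matrix by power iteration.

    Returns (loadings by variable name, fraction of variance explained).
    """
    names = list(table)
    cols = [centered(table[name]) for name in names]
    n_obs, n_var = len(cols[0]), len(cols)
    cov = [[sum(a * b for a, b in zip(ci, cj)) / (n_obs - 1) for cj in cols]
           for ci in cols]
    v = [1.0] * n_var
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(n_var)) for i in range(n_var)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(
        v[i] * sum(cov[i][j] * v[j] for j in range(n_var)) for i in range(n_var)
    )
    explained = eigenvalue / sum(cov[i][i] for i in range(n_var))
    return dict(zip(names, v)), explained


correlations = {
    name: pearson(TABLE_II["Quartz"], values)
    for name, values in TABLE_II.items() if name != "Quartz"
}
loadings, explained = first_principal_component(TABLE_II)
```

With only these five columns, the signs of the quartz correlations and the dominant loading can be compared against the qualitative statements in the text; the sign of an eigenvector is arbitrary, so only the magnitudes of the loadings are meaningful.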


Summary


K-means cluster analysis is an excellent method for the reduction of individual-particle data, if extra clusters are used to allow for the non-spherical shape and natural variability of atmospheric particles. The "merge" method for choosing seedpoints is useful for detecting the types of low abundance particles that are interesting for urban atmospheric studies. Application to the Phoenix aerosol suggests that the ability to discriminate between various types of crustal particles may yield valuable information in addition to that derived from particle types more commonly associated with anthropogenic activity.

Acknowledgments

Financial support for this work was provided by grants ATM-8022849 and ATM-8404022 from the Atmospheric Chemistry Division of the National Science Foundation.

Literature Cited

1. Post, J. T.; Buseck, P. R. Environ. Sci. Technol. 1984, 18, 35-42.
2. Armstrong, J. T.; Buseck, P. R. Electron Microsc. X-Ray Appl. Environ. Occup. Health Anal., [Symp.], [2nd] 1978, 211-228.
3. Bradley, J. P.; Goodman, P.; Chan, I. Y. T.; Buseck, P. R. Environ. Sci. Technol. 1981, 15, 1208-1212.
4. Bradley, J. P.; Buseck, P. R. Nature 1983, 306, 770-772.
5. Buseck, P. R.; Bradley, J. P. In "Heterogeneous Atmospheric Chemistry"; Schryer, D. R., Ed.; Geophys. Monogr. No. 26, Am. Geophys. Union: Washington, D.C., 1982; pp. 57-76.
6. Thomas, E.; Buseck, P. R. Atmospheric Environment 1983, 17, 2299-2301.
7. Anderberg, M. R. "Cluster Analysis for Applications"; Academic Press: New York, 1973.
8. Massart, D. L.; Kaufman, L. "The Interpretation of Analytical Chemical Data by the Use of Cluster Analysis"; Wiley: New York, 1983; p. 107.
9. SAS Institute Inc. "SAS User's Guide: Statistics"; SAS Institute Inc.: Cary, NC, 1982; pp. 417-434.
10. Tou, J. T.; Gonzalez, R. C. "Pattern Recognition Principles"; Addison-Wesley: Reading, MA, 1974; pp. 90-92.
11. Hartigan, J. A. "Clustering Algorithms"; Wiley: New York, 1975; p. 97.
12. Engelman, L.; Hartigan, J. A. J. Am. Stat. Assoc. 1969, 64, 1647-1648.
13. Péwé, T. L.; Péwé, E. A.; Péwé, R. H.; Journaux, A.; Slatt, R. M. Spec. Pap.--Geol. Soc. Am. 1981, No. 186.
14. Friedman, H. P.; Rubin, J. J. Am. Stat. Assoc. 1967, 62, 1159-1178.

RECEIVED July 17, 1985
