3 Downloaded by UNIV OF CALIFORNIA SANTA BARBARA on March 11, 2018 | https://pubs.acs.org Publication Date: November 6, 1985 | doi: 10.1021/bk-1985-0292.ch003
Exploratory Data Analysis of Rainwater Composition
Richard J. Vong¹, Ildiko E. Frank², Robert J. Charlson¹, and Bruce R. Kowalski²
¹Environmental Engineering and Science Program, Department of Civil Engineering, University of Washington, Seattle, WA 98195
²Laboratory for Chemometrics, Department of Chemistry, University of Washington, Seattle, WA 98195
While some aspects of rainwater composition are understood, a large number of important questions remain unresolved, particularly those relating to sources and controlling factors. In search of the chemical and meteorological factors controlling rainwater composition, we have utilized SIMCA, PLS, principal component factor analysis, and cluster analysis in the analysis of data consisting of rainwater samples collected in Western Washington State in 1982-83. Major steps of this type of analysis include initial data scaling and transformation, outlier detection, determination of the underlying factors, and evaluation of the effect of experimental error. To reduce potential masking of source-receptor relationships by meteorological variability, a data normalization technique was utilized. The components identified for Western Washington rainwater were interpreted to represent the influence of atmospheric oxidation of sulfur and nitrogen compounds, seasalt, soil, and the emissions of a nearby copper smelter.
Considerable interest in the composition of rainwater has been expressed by members of the scientific community in the United States and elsewhere. "Acid rain" has been suggested as the culprit for observed degradation of terrestrial and aquatic ecosystems in the Northeastern United States, Canada, Germany, and Scandinavia. While some aspects of rainwater composition are understood, a large number of important questions remain unresolved, particularly those relating to sources and controlling factors. Studies of rainwater composition typically include the measurement of the concentrations of a number of chemical species, conductivity, and rain volume, and sometimes include supporting measurements of winds or other meteorological parameters.
0097-6156/85/0292-0034$06.00/0 © 1985 American Chemical Society
Breen and Robinson; Environmental Applications of Chemometrics ACS Symposium Series; American Chemical Society: Washington, DC, 1985.
Much of the desired
information, the intercorrelations among the measurements, may remain hidden in the complexity of the data. Multivariate pattern recognition techniques attempt to identify underlying factors contained in the measurements while reducing the dimensionality of the data. The measurement of available information (such as the concentration of an element) is used as a step towards identifying these underlying factors since the factors themselves (e.g., the influence of a smelter or seasalt) are not directly measurable. In a search for the chemical and meteorological factors controlling rainwater composition we have demonstrated the performance of these techniques in the analysis of data consisting of rainwater samples collected weekly at three sites in Western Washington State in 1982-83.
The approach we have undertaken involves the identification of the underlying factors governing precipitation composition at individual sites, supplemented by identification of the factors which link the local composition at different sites within a region. Major steps in this type of analysis include initial data scaling and transformation, outlier detection, determination of the underlying factors, and evaluation of the effect that experimental procedures may have on the variance of the results. Most of the calculations were performed with the ARTHUR software package (1).
Methodology
We have combined classical statistical techniques with graphical techniques which allow the user a more direct interaction with the data than would be achieved by a "black box" operation of purely mathematical techniques.
For a data set where many samples are available, the data reduction begins with treatment of missing values: samples with more than one missing measurement are eliminated to avoid introducing the bias associated with filling in a large number of missing values. Single missing values are mean-filled. Due to the low concentrations of many species in rain, measurements below the detection limit of the analytical technique must be specially treated. Substitution of a random number between zero and the lower detection limit avoids introducing the correlations which would occur if a constant or zero value were used. This approach preserves the useful information that the undetected species has a very small concentration relative to other samples and to other species.
A problem in the analysis of these data is the potential masking of some sources of variability by other correlated variables which may be difficult to quantify. For example, the potential meteorological influences of atmospheric dispersion and mixing, scavenging differences between warm and cold clouds, variable rates of oxidation of sulfur and nitrogen species, and the dilution effect of variable rain volume may mask source-receptor chemical relationships. A particular problem is that meteorological data and source-receptor locations share directional dependence. To help reduce these influences, various data normalization techniques may be applied. Analysis of deposition (concentration times volume) rather than concentration alone may help avoid variability associated with precipitation amount. Another approach which was previously applied to aerosol measurements in Sweden (2)
involves converting concentrations to the ratio of an individual species to the total concentration of all dissolved species. The data analysis is then performed on these normalized or relative concentrations. To the degree that an assumption of constant scavenging efficiency holds (each element is removed from the atmosphere with equal efficiency), relative concentrations might be expected to better reflect the influence of a pollution source which, over time, might experience differing amounts of dilution by air and water. This technique may produce spurious correlations due to closure (the constant sum), depending on the data structure before normalization (3).
The multivariate techniques which reveal underlying factors, such as principal component factor analysis (PCA), soft independent modeling of class analogy (SIMCA), partial least squares (PLS), and cluster analysis, work optimally if each measurement or parameter is normally distributed in the measurement space. Frequency histograms should be calculated to check the normality of the data to be analyzed. Skewed distributions are often observed in atmospheric studies due to the process of mixing of plumes with ambient air. They should be transformed before further data analysis (4). Often the natural logarithm will convert a skewed distribution to a roughly gaussian shape. All further data analysis is performed on these transformed measurements. Normalized or transformed measurements are termed "features" in the following discussion.
Pattern recognition techniques represent each sample as a point in N-dimensional space whose coordinates along the axes are the values of the corresponding measurements. For only two measurements per sample this is equivalent to representing the sample as a point on standard two dimensional graph paper.
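The data reduction steps described so far (random substitution below the detection limit, deposition, normalization to fractional concentrations, and the logarithmic transform) can be illustrated with a minimal sketch. The function names, the example concentrations, and the detection limit are hypothetical and chosen only for illustration; they are not taken from the study or from the ARTHUR package.

```python
import math
import random


def fill_below_detection(values, detection_limit, rng):
    """Replace None (not detected) with a random number between zero and the detection limit."""
    return [rng.uniform(0.0, detection_limit) if v is None else v for v in values]


def deposition(concentration, rainfall_amount):
    """Deposition is concentration times precipitation amount."""
    return concentration * rainfall_amount


def fractional_concentrations(sample):
    """Divide each species by the total concentration of all dissolved species in the sample."""
    total = sum(sample.values())
    return {species: conc / total for species, conc in sample.items()}


def log_transform(values):
    """Natural logarithm, intended for features with a skewed, roughly lognormal distribution."""
    return [math.log(v) for v in values]


rng = random.Random(0)

# Hypothetical lead concentrations for five samples; None marks "below detection limit".
lead = [0.8, None, 2.5, None, 1.1]
lead_filled = fill_below_detection(lead, detection_limit=0.5, rng=rng)
log_lead = log_transform(lead_filled)

# Hypothetical single sample with four dissolved species (arbitrary mass units).
sample = {"SO4": 40.0, "NO3": 15.0, "Cl": 25.0, "Na": 20.0}
fractions = fractional_concentrations(sample)
```

By construction the fractions of one sample sum to one, which is the constant-sum (closure) property mentioned above.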
Projection of N-dimensional data onto two dimensional principal component plots provides a good demonstration of the fundamentals of any multivariate technique. As in two dimensional graphical techniques, the data must be scaled before further analysis. If no a priori knowledge about the importance of the different features is available, scaling is done to equally weight the variance of each feature. A common approach is termed "autoscaling" (5), where the mean of a feature is subtracted, followed by division by the standard deviation of that feature. In this manner each feature is transformed to zero mean and unit variance. Alternatively, the features may be weighted to reflect the uncertainty in their measurement, thus giving poorly determined features less influence on the result (6).
SIMCA and PLS techniques generally utilize a training set for modeling and predicting the underlying factors in the data and for classification of unknown samples. This training set must be homogeneous and representative of the data to be modeled and/or classified. Therefore, once the initial data scaling and transformation is completed it is important to identify outliers among the samples so that they will not bias the estimation of model parameters. Identification of outliers also aids in identification of controlling factors when the peculiarities of a particular sample can be explained in terms of physical processes. We have used exploratory data analysis tools to eliminate outlier samples and choose the most informative features. Cluster analysis and PCA group the data in the measurement space to observe natural clusters and outliers. Projection of the samples onto the first two principal component
axes, which represent the bulk of the variance, identifies outliers as samples far from the rest of the data. Figure 1 is an example of a principal component projection for rainwater samples collected at the Tolt reservoir site near Seattle, Washington, projected onto axes representing seasalt and aerosol principal components. One sample near the upper left corner of the plot is far from the bulk of the data and is considered an outlier.
The determination of the underlying factors which affect the precipitation composition at a site is done by PC analysis in combination with clustering of sample features. The first step in this process is to identify the "intrinsic dimensionality", the number of controlling factors which are significant in characterizing the rainwater composition. The original features are thus reduced to a smaller number of components which contain the information of those original features. The choice of significant factors for a site can be verified by cross-validation (7). Determining which features make up each underlying factor provides a basis for attaching a physical interpretation to the factors. Varimax rotation of the PCA may be utilized to aid in the interpretation of the factors. Hierarchical dendrograms indicate feature clusters whose composition is analogous to PC factors. The physical interpretation of the clusters and principal components indicates the influence of pollution emission sources or meteorological processes on the rainwater composition at an individual monitoring site. If the original data contain information on the uncertainties associated with each measurement, the sensitivity of the variance of the results to these errors can be studied.
Approaches include uncertainty weighting during the autoscaling procedure, which is provided for in ARTHUR; uncertainty scaling (the data standard deviation used for autoscaling is replaced by the measurement absolute error, such as presented in Table VII); and Monte Carlo simulation for estimating the variance of the statistics based on the error perturbed data (6).
After determining the underlying factors which affect local precipitation composition at an individual site, an analysis of the similarity of factors between different sites can provide valuable information about the regional character of precipitation and its sources of variability over that spatial scale. SIMCA (8) is a classification method that performs principal component factor analysis for individual classes (sites) and then classifies samples by calculating the distance from each sample to the PCA model that describes the precipitation character at each site. The percentage of samples which are correctly classified by the PCA models provides an indication of the separability of the data by sites and, therefore, the uniqueness of the precipitation at a site as modeled by PCA.
Spatial interrelationships in the chemical composition among two or more blocks (sites) can be calculated by partial least squares (PLS) (9). PLS calculates latent variables similar to PC factors except that the PLS latent variables describe the correlated variance of features between sites (the variance common to both sites). Regional influences on rainwater composition are thus identified from the composition of latent variables extracted from the measurements made at several sites. Comparison of the results
obtained from PCA, SIMCA, and PLS models allows the data analyst to separate local and regional influences on precipitation composition.
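The SIMCA idea of one PCA model per class, with classification by distance to each model, can be sketched as follows. This is a simplified illustration that assumes NumPy is available: it uses the plain Euclidean residual distance to each class subspace and omits the residual-variance scaling and significance tests of the full method (8). The two synthetic "sites", the number of components, and all function names are hypothetical.

```python
import numpy as np


def fit_class_model(X, n_components):
    """Fit a PCA model (class mean and loading vectors) to the samples of one class."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]


def residual_distance(x, model):
    """Distance from a sample to the subspace spanned by a class PCA model."""
    mean, loadings = model
    centered = x - mean
    reconstructed = loadings.T @ (loadings @ centered)
    return float(np.linalg.norm(centered - reconstructed))


def classify(x, models):
    """Assign the sample to the class whose PCA model leaves the smallest residual."""
    return min(models, key=lambda name: residual_distance(x, models[name]))


rng = np.random.default_rng(1)
# Hypothetical autoscaled features for two well separated sites (30 samples, 8 features each).
site_a = rng.normal(loc=0.0, size=(30, 8))
site_b = rng.normal(loc=4.0, size=(30, 8))

models = {"A": fit_class_model(site_a, 2), "B": fit_class_model(site_b, 2)}
predicted = [classify(x, models) for x in site_b]
percent_correct = 100.0 * predicted.count("B") / len(predicted)
```

The percentage of samples assigned to their own site plays the role of the classification score described above; sites with similar composition would yield low percentages.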
Results
We have applied the above approach to a data base consisting of weekly measurements of 14 chemical species in Western Washington State rainwater (ammonium, nitrate, chloride, sulfate, arsenic, cadmium, copper, lead, zinc, potassium, magnesium, sodium, calcium, and hydrogen ion from pH), conductivity, rainfall volume, rainfall rate, surface wind speed (U), and frequency of wind direction from four sectors (NE, SE, SW, NW). Samples were collected at three sites, in Seattle and in the foothills of the Cascade Mountains in Washington State, over one year (10). Figure 2 indicates the location of the monitoring sites and a nearby copper smelter which is a major sulfur dioxide emission source. Additional emissions occur in the Seattle area, primarily between the West Seattle and Maple Leaf sites. The wind rose presents data for the frequency that the wind is from a given direction. Variation in composition associated with wind direction was deliberately minimized in advance by site selection directly downwind of the smelter.
The chemical analyses were performed in the USEPA Manchester, WA water quality labs by atomic absorption and autoanalyzer techniques. Charge balance calculations indicated that all dissolved species of significance were analyzed. Comparison of filtered and unfiltered aliquots suggested that un-ionized species were not present in appreciable quantities. Sampling and analysis uncertainties were determined by the operation of two co-located samplers for 16 weeks.
The calcium and sulfate data were corrected for the influence of sea salt to aid in the separation of the factors. This correction was calculated from bulk sea water composition and the chloride concentration in rainwater (11). Non-seasalt sulfate and calcium are termed "excess" and flagged by a * in the following discussion.
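The seasalt correction can be written as excess = measured - (ratio to chloride in sea water) x (chloride in rain). A minimal sketch follows, assuming mass concentrations and approximate bulk sea water mass ratios to chloride of about 0.140 for sulfate and 0.0213 for calcium. These ratio values and the example concentrations are assumptions made for illustration and are not quoted from the source or from reference (11).

```python
# Approximate mass ratios to chloride in bulk sea water (assumed values, not from the source).
SEAWATER_RATIO_TO_CL = {"SO4": 0.140, "Ca": 0.0213}


def excess_concentration(species, measured, chloride):
    """Subtract the seasalt contribution, estimated from chloride, from a measured concentration."""
    return measured - SEAWATER_RATIO_TO_CL[species] * chloride


# Hypothetical rain sample, concentrations in mg/L.
excess_sulfate = excess_concentration("SO4", measured=2.0, chloride=1.5)
excess_calcium = excess_concentration("Ca", measured=0.20, chloride=1.5)
```

With these hypothetical numbers the seasalt share of sulfate is 0.21 mg/L, so most of the sulfate would be counted as excess.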
Histograms revealed approximately lognormal distributions for Cl, Na, Mg, K, Ca*, As, Pb, Cd, Cu, Zn, and H, so those features were transformed by the natural logarithm. SO4, NO3, and NH4 distributions were roughly gaussian and were not transformed. After initial data reduction (treatment of missing values, transformation, and autoscaling), cluster analysis and PCA were used to visually identify outliers among the samples and to determine which features did not contribute to the interpretation of the underlying factors.
PCA and cluster analysis were performed first on the transformed and scaled but unnormalized data. Figure 3 presents the dendrogram (complete link method) for the clustering of all 22 chemical concentrations and meteorological features at the West Seattle site. Variables connected at high similarity values on this dendrogram contain similar information about the rainwater composition. Relatively tight groupings exist for Na, Mg, and Cl or for NO3 and NH4. The separate branch for As, Pb, Cu, SO4*, H+, wind speed, SW wind direction, and Cd demonstrates that these variables are connected with the remainder of the data set at very low similarity values. This is consistent with a separate source of variability in the data due to emissions from the Tacoma copper smelter (the smelter routinely reduces emissions during low wind
Figure 1: Principal component projection for rainwater samples collected at the Tolt River site. (Recoverable axis label: NH4, NO3 component, 19.7% of variance.)
Figure 2: Map of Western Washington with wind direction during rain and locations of the Tacoma copper smelter (1) and monitoring sites at West Seattle (2), Maple Leaf (3), and the Tolt Reservoir (4).
Figure 3: Hierarchical dendrogram for the clustering of all 22 features (unnormalized) at the West Seattle site. (Axis: similarity values, 0.1 to 0.9.)
speed or winds from the north). Dendrograms for the other sites were similar. Zn, K, conductivity, and some of the meteorological data were subsequently eliminated from the data set because they did not contribute to the separation of factors or the interpretation of the results.
PC projections (such as illustrated in Figure 1) and clustering of samples (as opposed to clustering of features, which is displayed in Figure 3) were used to identify rainwater samples which were outliers (as previously described) and might bias the estimation of the PLS and PCA model parameters. These samples were eliminated from the data set before further analysis.
Principal component factor analysis followed by varimax rotation of six factors was performed on four different subsets of the remaining data (each with different preprocessing): 1) concentration of 14 species with wind direction and rainfall amount, 2) concentration of 12 species, 3) deposition of 12 species (concentration times rainfall amount), 4) fractional concentration of 12 species. The results of the PCA from each subset are similar, except that the data subsets which neither included the meteorological data nor normalized the data to reduce meteorological variability (subsets 2 and 3) were not able to separate several of the components, probably due to the atmospheric masking effect. Information on the wind direction and rainfall quantity dependence of seasalt and metals is obtained when meteorological data are included in the analysis. From the standpoint of separation of chemical factors, the fourth subset (normalization to fractional composition) provided the best resolution of the data.
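The outlier screening used here (autoscaling, projection onto the first two principal components, and identification of samples far from the bulk of the data) can be sketched with NumPy. In the study the outliers were identified visually from the projections; the numerical threshold below is a hypothetical stand-in for that visual judgment, and the data are synthetic.

```python
import numpy as np


def autoscale(X):
    """Subtract each feature's mean and divide by its standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


def pca_scores(Z, n_components=2):
    """Sample scores and explained-variance fractions from an SVD of the scaled data."""
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    explained = s**2 / np.sum(s**2)
    return scores, explained[:n_components]


def flag_outliers(scores, threshold=3.0):
    """Flag samples whose standardized distance from the score centroid exceeds the threshold."""
    standardized = scores / scores.std(axis=0, ddof=1)
    distance = np.sqrt((standardized**2).sum(axis=1))
    return distance > threshold


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))   # hypothetical: 50 samples, 6 features
X[:, 1] += 0.8 * X[:, 0]       # two correlated features
X[0] = 8.0                     # one deliberately aberrant sample

Z = autoscale(X)
scores, explained = pca_scores(Z)
outliers = flag_outliers(scores)
```

Plotting the two columns of `scores` against each other gives the kind of projection shown in Figure 1, with the aberrant sample lying far from the rest.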
A component that, using deposition or concentrations, indicated a combined influence of sulfate, nitrate, lead, and calcium emission sources was resolved into separate components when the fractional composition data were analyzed by PCA.
In the interpretation of these results it is important to consider the normalization of the data to fractional concentrations and potential spurious correlations due to closure of the data set. Recent work (3) indicates that closure is not a problem when the data set consists of more than eight variables of equal means and variance. If one or several variables are large relative to the others, closure may result in an artificial negative correlation between the larger variables and, sometimes, a positive correlation among the smaller variables. Comparison of the pairwise correlations from the rainwater concentrations to the correlations for the normalized concentrations for our data reveals that only the hydrogen and sulfate correlations with sea salt elements are appreciably altered when the data set is closed. These elements are large relative to the rest of the data (SO4 is approximately 40 percent of the total ionic mass) such that closure might be influencing the negative correlation between seasalt elements and SO4/H+. However, physical processes present an alternate explanation which indicates that this negative correlation would be expected to actually occur, as follows: 1) seasalt (Na, Cl, Mg) should be a higher fraction of the ions in winter when high wind speeds generate more salt particles, 2) hydrogen ion and SO4* should be a higher fraction of the ions in summer when low wind speeds produce less atmospheric dispersion. When the data are not
normalized, all ions were more concentrated during summer when rainfall volume was small. Apparently meteorology dominates the fluctuations in composition in such a manner that the separate pollution influences could be observed only after meteorological variability, especially variable rainfall volume, was reduced by the normalization procedure. Since the normalization technique helps to reduce variability associated with atmospheric dispersion and scavenging, this result implies that meteorological variability was an important influence on these data.
The weekly sampling period resulted in a variety of meteorological conditions for each sample and, therefore, precluded any resolution of samples by unique wind direction or representative rainfall rate. Therefore, it was not possible to directly evaluate these meteorological influences on the composition of our precipitation samples.
Tables I, II, and III present the results of the PCA for the three sampling sites for the fractional concentrations of 12 species. All loadings greater than 0.3 are included. These data indicate separate influences of Na, Mg, and Cl (interpreted to represent seasalt); NH4, NO3, SO4, and H (acid aerosol); and As, Cd, and Pb (smelter marker elements). The exact combinations of these species vary from site to site. Hydrogen ion was associated with sulfate at the West Seattle site but with both sulfate and nitrate at the other two sites. This is in agreement with the location of major SO2 and NOx emission sources. An additional factor involving Pb and Ca* was observed at two sites. This is interpreted to represent the influence of local soil or road dust. These results account for about 87 to 91 percent of the total variance in the original data set.
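The closure effect discussed above can be reproduced with synthetic data. In the sketch below, six mutually independent hypothetical concentrations are generated, one of them much larger than the rest. After normalization to fractions, the dominant variable acquires a negative correlation with a minor one, and two minor variables acquire a positive correlation with each other, even though none exists before normalization. The distributions and sample size are arbitrary assumptions chosen for illustration.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(42)
n_samples = 5000

# Hypothetical independent concentrations: one dominant species and five minor ones.
dominant = [rng.uniform(20.0, 60.0) for _ in range(n_samples)]
minors = [[rng.uniform(1.0, 3.0) for _ in range(n_samples)] for _ in range(5)]

totals = [dominant[i] + sum(m[i] for m in minors) for i in range(n_samples)]
frac_dominant = [dominant[i] / totals[i] for i in range(n_samples)]
frac_minor_0 = [minors[0][i] / totals[i] for i in range(n_samples)]
frac_minor_1 = [minors[1][i] / totals[i] for i in range(n_samples)]

r_raw = pearson(dominant, minors[0])                      # near zero: independent by construction
r_closed_large_small = pearson(frac_dominant, frac_minor_0)   # negative after closure
r_closed_small_small = pearson(frac_minor_0, frac_minor_1)    # positive after closure
```

Comparing correlations before and after normalization in this way mirrors the check applied to the rainwater data, where only the hydrogen and sulfate correlations with sea salt elements changed appreciably.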
The possible spurious negative correlations between seasalt elements and sulfate are flagged to note the possible influence of closure.
Since the PCA and cluster analysis results were similar for the three sites and since one emission source has been suggested (12) as the source of many of the species detected in Western Washington rain, an analysis of the regional similarities in composition was appropriate. SIMCA modeling was utilized to determine the separability of the samples collected at the three different sites. The results presented in Table IV indicate that the model cannot separate the samples from the West Seattle and Maple Leaf sites. Since both of these sites are located downwind of the major regional emission sources and experience similar meteorology, their rainwater composition is similar. The Tolt reservoir site is separated from the Seattle sites, with 79 percent of the samples collected there correctly classified by the SIMCA model. This site is believed to be influenced by the same emission sources as the other two sites but experiences different meteorological conditions (primarily longer transport times and more frequent and larger quantities of rainfall) due to its location in the foothills of the Cascade Mountains (elevation 550 meters). Considering the uncertainty in the reported concentrations (see Table VII) and the similar air pollution emission sources, the SIMCA results are reasonable.
The final step in the analysis was utilization of PLS to examine the correlated variance of the features between different sites.
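The notion of latent variables that describe variance common to two sites can be illustrated with a small NumPy sketch. It extracts only the first pair of latent variables, taking the weights as the leading singular vectors of the cross-covariance matrix between the two blocks, which is how the first component of a two-block PLS model is commonly obtained. It is not the full iterative procedure of reference (9), and the two synthetic blocks with a shared "regional" signal are hypothetical.

```python
import numpy as np


def first_latent_pair(X, Y):
    """Weights and scores of the first latent variable pair of two data blocks.

    The weights are the leading left and right singular vectors of X'Y after centering.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    U, _, Vt = np.linalg.svd(Xc.T @ Yc, full_matrices=False)
    w_x, w_y = U[:, 0], Vt[0]
    return w_x, w_y, Xc @ w_x, Yc @ w_y


rng = np.random.default_rng(2)
regional = rng.normal(size=40)               # hypothetical signal shared by both sites

X = rng.normal(scale=0.3, size=(40, 4))      # upwind site, 4 features
Y = rng.normal(scale=0.3, size=(40, 4))      # downwind site, 4 features
X[:, 0] += regional                          # the shared signal enters feature 0 of X
Y[:, 1] += regional                          # and feature 1 of Y

w_x, w_y, t, u = first_latent_pair(X, Y)
r = np.corrcoef(t, u)[0, 1]
```

The largest weights in `w_x` and `w_y` fall on the features that carry the shared signal, which is the sense in which the composition of a latent variable points to a regional influence.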
Table I: West Seattle (fractional concentrations)
Varimax Rotation of Principal Factor Pattern
Factor Loadings Species
a
a
l
2
NH. 4
.610
N0
.625
3
a
3
a
4
a
5
a
6
-.460**
CI SO, 4
.486
.573**
As
.716
Cd
• 570 .925
Cu .464
Pb Na
-.455**
Mg
-.469**
*
.468
.731
Ca
.890
H
Percent of Total Variance 19.3 18.8 13.8 12.6 9.5 8.8
*Corrected for seasalt based on chlorinity ratio.
**These loadings are believed to be realistic although a potential closure problem exists (see text).
Table II: Maple Leaf (fractional concentrations)
Varimax Rotation of Principal Factor Pattern
Factor Loadings Species
a
l
a
2
NH, 4
.634
N0
.408
3
CI
a
3
a
4
a
5
a
6
-.556
-.481**
* .534**
SO. 4 As
.320 .910
Cd .834
Cu Pb
.776
Na
-.450**
Mg
-.302**
.373
* Ca H
.433**
Percent of Total Variance
26.7
.369
.413
10.9
10.4
-.433
15.9
14.9
8.8
*Corrected for seasalt based on chlorinity ratio.
**These loadings are believed to be realistic although a potential closure problem exists (see text).
Table III: Tolt Reservoir (fractional concentrations)
Varimax Rotation of Principal Factor Pattern
Factor Loadings Species
a
l
NH. 4
.607
N0
.568
3
CI SO. 4
a
2
a
3
a
4
a
5
a
6
.365 .392**
-.622
-.602**
.363 .930
As
.332
.688
Cd .624
Cu
.605
Pb Na
.456**
Mg
.533**
*
.355
.663
Ca
.500
-.378**
H
Percent of Total Variance 19.7 17.8 13.7 13.3 12.1 9.1
*Corrected for seasalt based on chlorinity ratio.
**These loadings are believed to be realistic although a potential closure problem exists (see text).
Table IV: SIMCA results, classification matrix for fractional concentrations at three sites.
West Seattle
West Seattle
Maple Leaf
Tolt
Maple Leaf
Tolt River
7
11
20
23 45% correct
8
4
3
26 79% correct
19 51% correct
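As a consistency check on Table IV, the percent correct for a class is the number of its samples assigned to their own site divided by the total number of samples from that site. The grouping of counts into rows below is an assumption inferred from the arithmetic (each group of three counts reproduces one of the stated percentages); the table layout itself does not survive in this copy, so apart from the 79 percent row, which the text attributes to the Tolt site, the rows are left unlabeled.

```python
# Each tuple: (correctly classified, misclassified into the two other classes).
# The grouping is inferred from the stated percentages, not read from the table layout.
rows = {
    "45% row": (23, 20, 8),
    "51% row": (19, 7, 11),
    "79% row (Tolt)": (26, 4, 3),
}

percent_correct = {}
for label, (correct, *misclassified) in rows.items():
    total = correct + sum(misclassified)
    percent_correct[label] = round(100 * correct / total)

overall = 100 * sum(r[0] for r in rows.values()) / sum(sum(r) for r in rows.values())
```

Under this grouping the three rows contain 51, 37, and 33 samples, and the rounded percentages agree with the values printed in the table.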
Any regional influence on rainwater composition would be expected to affect all three sites reported here. A PLS two block model (9) was used to predict the variance in rainwater composition at one site from the variance in rainwater composition at an upwind site. PLS results are presented in Tables V and VI. Loadings greater than 0.3 have been underlined. Loadings which may be influenced by closure are flagged.
The regression of Maple Leaf composition (fractional) on West Seattle composition reveals four components:
1a) Hydrogen ion, lead, sulfate, nitrate (positive correlation);
1b) Sodium (negative correlation);
2a) Arsenic, cadmium, lead (positive correlation);
2b) Nitrate (negative correlation);
3) Sodium, magnesium, chloride, ammonium;
4) Cadmium, copper.
The first three components suggest regional sources of acidic anthropogenic aerosol, the marker elements of a copper smelter, and seasalt, respectively. Neither the fourth component nor the ammonium in component three provides a ready interpretation in terms of a known emission or meteorological source of variability. The negative correlation of nitrate with component two is consistent with separate influences of the copper smelter and automobile emissions.
The regression of the Tolt River rainwater composition on Maple Leaf data indicated four components:
1a) Sodium, magnesium, chloride (negative correlation);
1b) Hydrogen ion, sulfate, lead (positive correlation);
2) Arsenic, cadmium, lead;
3) Copper, lead;
4a) Sulfate, magnesium (negative correlation);
4b) Ammonium (positive correlation).
Three components are similar to the results for the West Seattle-Maple Leaf PLS model except that the acid aerosol component no longer has a high loading from nitrate. This species is ordinarily associated with automobile emissions.
The Tolt site is remote enough that auto emissions are not as important an influence on the variability in rainwater composition as in Seattle. The fourth component for this PLS model might represent emissions from a cement plant which does not influence the West Seattle site. The soil factor is apparently local in nature since it appears in the PCA results but not the PLS results.
With emission source chemical signatures and corresponding aerosol or rainwater sample measurements, PLS can be used to calculate a chemical element mass balance (CEB). Exact emission profiles for the copper smelter and for a power plant located farther upwind were not available for calculation of source contributions to Western Washington rainwater composition. This type of calculation is more difficult for rainwater than for aerosol samples due to atmospheric gas-to-particle conversion of sulfur and nitrogen species and due to variations in scavenging efficiencies among species. Gatz (14) has applied the CEB to rainwater samples and discussed the effect of variable solubility on the evaluation of the soil or road dust factor.
Table VII presents data for Maple Leaf rainwater collected in two co-located samplers operated for 16 weeks for the purpose of determining experimental uncertainty. These data reveal that Cu,
Table V: Outer relationship coefficients of the PLS model
Two block PLS: West Seattle -> Maple Leaf
Specie / latent variable (1. to 4.) / West Seattle, Maple Leaf

.04 .08   .32 .39   .23 .16   .31 .16
NH4
.03 .13   .23 .28   .37 .33   .39 .33
NO3
-.14 -.17   .41 .44   -.21 .13   -.37** -.05**
Cl
.08 .15   .33 .19   .04 .01   .22** .43**
SO4
-.21 -.29   .23 .29   -.57 -.51   .22 .16
As
.73 .80   .06 .06   -.30 -.45   .21 .23
Cd
.56 .42   .23 .15   .25 .15   .03 .09
Cu
-.11 -.08   .29 .24   .38 .39   .37 .40
Pb
Mg
.01 .06   .30 .36   .08 .20   .16 .06
.30 .37   .34 .07   -.39** -.26 -.17** .04
Na
.15 .11   .24 .13   -.15 .41   .09 .31
Ca*
.10 .01   .01 .10   .34** .39**   -.13 .12

* Corrected for seasalt based on chlorinity ratio.
** These loadings are believed to be realistic although a potential closure problem exists (see text).
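As an illustration of how outer relationship coefficients of this kind can arise, the following is a minimal sketch of a two-block PLS fit using the generic NIPALS iteration. It is not the authors' program and may differ in detail from the algorithm of reference (9). The synthetic `upwind` and `downwind` arrays, the helper names, and the choice of three species are assumptions made for illustration only; the numbers are not data from this study.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def unit(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def matvec(A, v):
    """A (n x p) times v (length p)."""
    return [dot(row, v) for row in A]

def tmatvec(A, v):
    """Transpose of A (n x p) times v (length n)."""
    return [dot(col, v) for col in zip(*A)]

def autoscale(A):
    """Centre each column and scale it to unit standard deviation."""
    n = len(A)
    cols = list(zip(*A))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / (n - 1))
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in A]

def pls2(X, Y, n_components, tol=1e-10, max_iter=500):
    """Generic NIPALS two-block PLS (an assumed, textbook formulation).

    Returns one (w, c, b) tuple per latent variable: unit-length X-block
    weights w, unit-length Y-block weights c, and the inner-relation slope b.
    """
    X = [list(row) for row in X]
    Y = [list(row) for row in Y]
    result = []
    for _ in range(n_components):
        u = [row[0] for row in Y]              # starting Y-block scores
        t_old = None
        for _ in range(max_iter):
            w = unit(tmatvec(X, u))            # X-block weights
            t = matvec(X, w)                   # X-block scores
            c = unit(tmatvec(Y, t))            # Y-block weights
            u = matvec(Y, c)                   # Y-block scores
            if t_old is not None and max(
                    abs(new - old) for new, old in zip(t, t_old)) < tol:
                break
            t_old = t
        tt = dot(t, t)
        p = [x / tt for x in tmatvec(X, t)]    # X loadings used for deflation
        b = dot(u, t) / tt                     # inner relation: u ~ b * t
        X = [[x - ti * pj for x, pj in zip(row, p)] for row, ti in zip(X, t)]
        Y = [[y - b * ti * cj for y, cj in zip(row, c)] for row, ti in zip(Y, t)]
        result.append((w, c, b))
    return result

def noise():
    return 0.2 * random.gauss(0, 1)

random.seed(1)
upwind, downwind = [], []
for _ in range(30):                            # 30 hypothetical weekly samples
    acid, salt = random.gauss(0, 1), random.gauss(0, 1)
    upwind.append([acid + noise(), acid + noise(), salt + noise()])    # H, SO4, Na
    downwind.append([acid + noise(), acid + noise(), salt + noise()])  # H, SO4, Na

model = pls2(autoscale(upwind), autoscale(downwind), n_components=2)
for k, (w, c, b) in enumerate(model, start=1):
    print(k, [round(x, 2) for x in w], [round(x, 2) for x in c], round(b, 2))
```

In this sketch `w` plays the role of the upwind-site coefficients and `c` the downwind-site coefficients for each latent variable; species sharing a common source would be expected to load together in both blocks.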
Table VI: Outer relationship coefficients of the PLS model
Two block PLS: Maple Leaf -> Tolt Reservoir
Specie / latent variable (1. to 4.) / Maple Leaf, Tolt Reservoir

.40 .39   -.33 .20   .03 .03   .15 .12
NH4
.33 .16   .10 .05   .21 .20   .29 .18
NO3
.03 .33   .22 .04   .09 .09   -.48** -.43**
Cl
As
.47 .30   .07 .20   .08 .01   .14 .40
.29 .27   .72 .71   .31** -.18 .28** -.07
SO4
.25 .27   .29 .02   .28 .44   .09 -.04
Cd
.22 .33   .49 .44   .13 .01   .01 .10
Cu
.26 .05   .55 .63   .29 .29   .32 .36
Pb
Mg
.28 .13   .08 .12   .08 .22   .43 .47
-.21 -.01   .03 .09   -.35** -.38** -.52** -.40**
Na
-.15 -.21   .03 .49   .46 .26   .20 .17
Ca*
.17 .06   .25 .01   .14 .18   .36** .33**
H

* Corrected for seasalt based on chlorinity ratio.
** These loadings are believed to be realistic although a potential closure problem exists (see text).
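The closure flag in Tables V and VI concerns composition expressed as fractions of a total. As a hedged illustration of the general effect (not a reanalysis of these data), the sketch below draws three independent, hypothetical concentrations, converts each sample to fractions that sum to one, and compares the correlation between the first two species before and after closure. The sample size, the lognormal distribution, and all names are assumptions for the example.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

random.seed(0)
# Three independent, hypothetical species concentrations per sample.
raw = [[random.lognormvariate(0.0, 0.5) for _ in range(3)] for _ in range(500)]
# Closure: express each sample as fractions of its own total.
closed = [[v / sum(row) for v in row] for row in raw]

r_raw = pearson([row[0] for row in raw], [row[1] for row in raw])
r_closed = pearson([row[0] for row in closed], [row[1] for row in closed])
print(f"raw r = {r_raw:.2f}, closed r = {r_closed:.2f}")
```

Because the fractions must sum to one, an increase in one species forces the others down, so the closed variables show a negative correlation even though the raw variables are independent. A negative loading among major species in fractional data may therefore be partly an artifact of the constant-sum constraint.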
Table VII: Sampling and analysis precision for co-located rain samplers of the Maple Leaf site (units = ppm unless indicated)

Species     Mean (1)   Absolute Error (2)*   (2)/(1)   N
NH4-N       0.30       0.05                  .15       32
NO3-N       0.45       0.07                  .16       32
Cl          1.04       0.16                  .15       32
SO4         3.67       0.81                  .22       32
As (ppb)    7.18       1.94                  .27       20
Cd (ppb)    0.65       0.74                  1.15      26
Cu (ppb)    7.97       5.66                  .71       32
Pb (ppb)    17.3       3.73                  .22       30
Zn (ppb)    17.5       7.14                  .41       26
K           0.13       0.08                  .66       32
Na          0.68       0.08                  .12       30
Mg          0.10       0.02                  .15       30
Ca          0.26       0.04                  .16       30

* The standard deviation was calculated assuming that the average of each co-located sample pair was the pair's true value. Random error was assumed. N/2 degrees of freedom were used for the N/2 sample pairs since no overall mean for the data set was calculated. The absolute error is defined as this standard deviation of paired sample collections for a 16 week period. The data for the entire 52 week sampling period have been reported elsewhere (10).
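One reading of the footnote to Table VII can be sketched as follows: each sample's deviation is taken from its own pair average, the squared deviations are summed over all N samples, and the sum is divided by N/2 degrees of freedom. The function name and the example readings are hypothetical and are not data from this study; the exact computation used by the authors is not given beyond the footnote.

```python
import math

def paired_precision(pairs):
    """Return (mean, absolute_error, ratio, n_samples) for co-located pairs.

    Each pair's average is treated as that pair's true value. The squared
    deviations of both samples from their pair average are summed and
    divided by the number of pairs (N/2 degrees of freedom for N samples).
    """
    n_samples = 2 * len(pairs)
    dof = len(pairs)                      # N/2 degrees of freedom
    sum_sq = 0.0
    total = 0.0
    for a, b in pairs:
        pair_mean = (a + b) / 2.0
        sum_sq += (a - pair_mean) ** 2 + (b - pair_mean) ** 2
        total += a + b
    mean = total / n_samples              # column "Mean (1)"
    abs_error = math.sqrt(sum_sq / dof)   # column "Absolute Error (2)"
    return mean, abs_error, abs_error / mean, n_samples

# Hypothetical co-located readings (ppm).
example_pairs = [(0.30, 0.34), (0.25, 0.21), (0.40, 0.46), (0.28, 0.28)]
mean, abs_err, ratio, n = paired_precision(example_pairs)
print(f"mean={mean:.3f} abs_error={abs_err:.3f} (2)/(1)={ratio:.2f} N={n}")
```

Under this reading the absolute error reduces to the square root of the summed squared pair differences divided by N, and the ratio (2)/(1) gives a relative precision that can be compared across species.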
Cd, K and Zn are not precisely determined. Previously reported (13) results for identical split samples indicate that most of this experimental error was due to analytical imprecision rather than collection and handling. Many of the samples were near the detection limit for the five trace metals (As, Cd, Cu, Pb, Zn). To determine the effect of these measurement errors, the PCA was repeated with uncertainty-scaled data. (The data standard deviation used in autoscaling was replaced with the measurement absolute error.) The effect of including the measured analytical and sampling errors in the data scaling and PCA was to split factors consisting of several trace metals (which had higher uncertainties than the other species). In many cases the error-weighted PCA indicates primarily single features, such as arsenic, cadmium, or copper, loading on a component. This is consistent with a source of variance in the data set which is associated with random measurement variations rather than emission sources or meteorological processes. This emphasizes the importance of using accurate and precise analytical techniques for rainwater measurements.

Conclusions

The four techniques (PCA, hierarchical clustering, SIMCA, and PLS) are complementary in resolving precipitation chemistry data. Interpretation of these results allows a hypothesis as to what factors influence precipitation chemistry in Western Washington. Since the choice of which species to chemically analyze is subjective, other factors may be undetected due to lack of measurement. These results indicate the presence of seasalt, acidic sulfate and nitrate aerosol, road or soil dust, emission of metals from a copper smelter located to the southwest, and the occurrence of rain accompanied by strong southwesterly winds. These results are consistent with previous work (15).
Further identification of meteorological influences on composition is limited by the weekly sampling period, which results in a variety of wind and rain patterns for each sampling period. Although the measurement uncertainties limit the conclusions which can be drawn from these results, the data set proved useful for the determination of general influences on rainwater composition in the Seattle area and for the demonstration of the application of these exploratory data analysis techniques. Current efforts to collect and analyze aerosol and rainwater samples over meteorologically appropriate time scales with precise analytical techniques are expected to provide better resolution of the factors controlling the composition of rainwater.
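The uncertainty scaling described in the error analysis above can be sketched as follows. Autoscaling divides each mean-centred variable by its own standard deviation, whereas uncertainty scaling divides by the measurement absolute error, so imprecisely measured species are down-weighted before PCA. The absolute errors for Cd and Na are taken from Table VII; the concentration values and function names are hypothetical.

```python
import statistics

def autoscale(values):
    """Mean-centre and divide by the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

def uncertainty_scale(values, abs_error):
    """Mean-centre and divide by the measurement absolute error."""
    mean = statistics.mean(values)
    return [(v - mean) / abs_error for v in values]

# Hypothetical weekly concentrations; not data from this study.
samples = {
    "Cd": [0.2, 0.9, 0.5, 1.1, 0.6],   # ppb
    "Na": [0.3, 1.2, 0.5, 0.9, 0.6],   # ppm
}
abs_error = {"Cd": 0.74, "Na": 0.08}   # absolute errors listed in Table VII

for species, values in samples.items():
    auto_sd = statistics.stdev(autoscale(values))
    unc_sd = statistics.stdev(uncertainty_scale(values, abs_error[species]))
    print(f"{species}: autoscaled sd = {auto_sd:.2f}, "
          f"uncertainty-scaled sd = {unc_sd:.2f}")
```

After autoscaling every species has unit spread. After uncertainty scaling the spread equals the ratio of the observed standard deviation to the measurement error, so a species whose variation is comparable to its error carries far less weight in the PCA than one that is measured precisely.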
Literature Cited

1. Duewer, D.L., Harper, A.M., Koskinen, J.R., Fasching, J.L., and Kowalski, B.R., ARTHUR, Version 3-7-77 (1977).
2. Hansson, H.C., Martinsson, B.G., and Lannefors, H.O., accepted for publication in Nuclear Instruments and Methods, (1984).
3. Johansson, E., Wold, S. and Sjodin, K, Analytical Chemistry, 56, 1685, (1984).
4. Brown, S.D., Skogerboe, R.K., and Kowalski, B.R., Chemosphere, 9, 265, (1980).
5. Kowalski, B.R. and Bender, C.F., J. Am. Chem. Soc., 94, 5632 (1972).
6. Duewer, D.L., Kowalski, B.R., and Fasching, J.L., Anal. Chem., 48, 13, 2002, (1976).
7. Wold, S., Technometrics, 20, 4, 397, (1978).
8. Wold, S., J. Pattern Recognition, 8, 127, (1976).
9. Frank, I.E. and Kowalski, B.R., J. Chem. Inf. Comput. Sci., 24, 1, 20, (1984).
10. Vong, R.J., Larson, T.V., Covert, D.C., and Waggoner, A.P., accepted for publication Water, Air, and Soil Pollution, (1985).
11. Junge, C., Air Chemistry and Radioactivity, New York (1963).
12. Larson, T.V., Charlson, R.J., Kundson, G.J., Christian, G.D., and Harrison, H., Water, Air, and Soil Pollution, 4, 319, (1975).
13. Vong, R.J. and Waggoner, A.P., EPA 910/9-83-105, USEPA Region 10, Seattle, WA. (1983).
14. Gatz, D.F., "Source Apportionment of Rain Water Impurities in Central Illinois," presented at 76th A.P.C.A. Meeting, Atlanta, (1983).
15. Knudson, E.J., Duewer, D.L., Christian, G.L. and Larson, T.V. in: Chemometrics, Theory and Applications, (ed. B.R. Kowalski), ACS Symposium Series 52, Wash, D.C. (1977).

RECEIVED June 28, 1985