Use of a Nonnegative Constrained Principal Component Regression

Environ. Sci. Technol. 2009, 43, 8867–8873

Use of a Nonnegative Constrained Principal Component Regression Chemical Mass Balance Model to Study the Contributions of Nearly Collinear Sources

GUO-LIANG SHI,† YIN-CHANG FENG,*,† FANG ZENG,† XIANG LI,†,‡ YU-FEN ZHANG,† YU-QIU WANG,† AND TAN ZHU†

State Environmental Protection Key Laboratory of Urban Ambient Air Particulate Matter Pollution Prevention and Control, College of Environmental Science and Engineering, Nankai University, Tianjin, 300071, China, and Department of Computer Science, University of Georgia, Athens, Georgia

Received September 15, 2009. Accepted October 11, 2009.

Chemically similar sources may result in near collinearity when more specific chemical markers are lacking (12), and if two or more sources have similar composition profiles, negative results may be obtained by the CMB model (13). When conducting source apportionment on ambient air, the near collinearity problem often affects the apportionment process and may lead to unreasonable results (e.g., negative source contributions). One example of the near collinearity problem in China is that the chemical profile of urban resuspended dust (RD) is very similar to the profiles of soil dust and coal combustion fly ash (14, 15). The near collinearity problem arises from the solution algorithm of least-squares regression. In an ordinary least-squares (OLS) regression process, the matrix notation is

$$Y = XB + \text{residuals} \tag{1}$$

where Y is the dependent variable vector (n × 1), X (n × m) is the independent variable matrix, B is the regression coefficient vector (m × 1), and n > m. To obtain the regression coefficient vector, the pseudoinverse matrix X⁺ (m × n) should first be calculated (X′ is the transpose of X):

$$X^{+} = (X'X)^{-1}X' \tag{2}$$

In this study, a nonnegative constrained principal component regression chemical mass balance (NCPCRCMB) model was used to solve the near collinearity problem among source profiles for source apportionment. The NCPCRCMB model added the principal component regression route into the CMB model iteration. The model was tested with synthetic data sets, which involved contributions from eleven actual sources with a serious near collinearity problem among them. The actual source profiles were randomly perturbed and then applied to create the synthetic receptor. The resulting synthetic receptor concentrations were also randomly perturbed to simulate measurement errors. The synthetic receptors were separately apportioned by the CMB and NCPCRCMB models. The results showed that source contributions estimated by the NCPCRCMB model were much closer to the true values than those estimated by the CMB model. Next, five real ambient data sets from five cities in China were analyzed using the NCPCRCMB model to test the practicability of the model. Reasonable results were obtained in all cases. It is shown that the NCPCRCMB model has an advantage over the traditional CMB model when dealing with near collinearity problems in source apportionment studies.

Introduction

Identification and apportionment of pollutants to their sources is very important for air quality management (1). Several receptor models have been developed (2, 3) and applied widely, such as the chemical mass balance (CMB) model (4-6), principal component analysis/multiple linear regression (PCA/MLR) (7, 8), Unmix (5, 9), and positive matrix factorization (PMF) (5, 10, 11), among others. Watson reviewed the strengths and weaknesses of a number of receptor models. Similar to some other receptor models, CMB is sensitive to the near collinearity problem.

* Corresponding author phone: +86 22 23507962; fax: +86 22 23503397; e-mail: [email protected].
† Nankai University.
‡ University of Georgia.
10.1021/es902785c CCC: $40.75

Published on Web 10/20/2009

 2009 American Chemical Society

and then

$$B = X^{+}Y \tag{3}$$
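The effect of near collinearity on eqs 1-3 can be illustrated numerically. The following is a minimal sketch, not part of the original study: the matrix `X`, the coefficients `B_true`, and the noise levels are hypothetical values chosen only to make two columns of X almost identical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical design matrix: n = 8 observations, m = 3 regressors.
# Column 2 is almost a copy of column 1, so X is nearly collinear.
n = 8
x1 = rng.uniform(0.1, 1.0, n)
x2 = x1 + 1e-4 * rng.standard_normal(n)
x3 = rng.uniform(0.1, 1.0, n)
X = np.column_stack([x1, x2, x3])

B_true = np.array([2.0, 1.0, 3.0])               # assumed "true" coefficients
Y = X @ B_true + 0.01 * rng.standard_normal(n)   # eq 1 with small residuals

# eq 2: pseudoinverse X+ = (X'X)^-1 X'
X_pinv = np.linalg.inv(X.T @ X) @ X.T
# eq 3: B = X+ Y
B_ols = X_pinv @ Y

cond = np.linalg.cond(X)
print("condition number of X:", cond)
print("OLS estimate of B:", B_ols)
print("sum of the two collinear coefficients:", B_ols[0] + B_ols[1])
```

In a run of this kind the sum of the two nearly collinear coefficients and the third coefficient are well determined, whereas the individual values of the collinear pair are very sensitive to the noise and can become negative, which mirrors the negative source contributions described in the text.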

CMB applies an effective variance least-squares regression procedure, which introduces the uncertainties of X and Y into an iterative regression. However, the near collinearity problem is left unsolved. To solve the near collinearity problem in CMB, several methods have been introduced, such as the ridge regression weighted least-squares algorithm (13). In this study, we use the principal component regression (PCR) method, which has been widely used in chemometrics. PCR combines linear regression and principal component analysis (PCA) (16, 17) and is often applied when dealing with near collinearity in mathematical or statistical problems (18). The calculation process of PCR can be described as follows. First, the independent variable matrix X (n × m) is analyzed by PCA, and the factor score matrix T (n × m) and loading matrix L (m × m) are obtained:

$$X = TL' \tag{4}$$

and

$$T = XW \tag{5}$$

where W (m × m) is the factor score coefficient matrix:

$$W = (L')^{+} = (LL')^{-1}L \tag{6}$$

There are m factors in matrices T and L, and each factor corresponds to an eigenvalue. If the collinearity problem exists in matrix X, some factors may have very small eigenvalues, which indicates that these factors contain little information about matrix X. To solve the near collinearity problem, it may be necessary to remove factors with low eigenvalues from the matrices T and L. Supposing that only the first p (p < m) eigenvalues are large and the remaining eigenvalues are very small, the last m - p factors in T and L can be neglected. In this way, eq 4 can be rewritten as follows:

$$X_{(n\times m)} = T_{(n\times m)}L'_{(m\times m)} \approx T^{*}_{(n\times p)}L^{*\prime}_{(p\times m)} \tag{7}$$

and

$$T^{*} = XW^{*} \tag{8}$$

where T* and L* contain the p retained factors and W* is the corresponding factor score coefficient matrix.

VOL. 43, NO. 23, 2009 / ENVIRONMENTAL SCIENCE & TECHNOLOGY 8867
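The decomposition and truncation in eqs 4-8 can be sketched as follows. This is an illustrative example under stated assumptions: the matrix `X` is hypothetical, and PCA is taken here as the eigen-decomposition of the uncentered matrix X′X, which the text does not specify.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical X (n = 10, m = 3) whose second column nearly equals the first.
n = 10
x1 = rng.uniform(0.1, 1.0, n)
X = np.column_stack([
    x1,
    x1 + 1e-3 * rng.standard_normal(n),
    rng.uniform(0.1, 1.0, n),
])

# PCA via the eigen-decomposition of X'X (uncentered; an assumption).
eigvals, L = np.linalg.eigh(X.T @ X)   # columns of L are the loadings
order = np.argsort(eigvals)[::-1]      # largest eigenvalue first
eigvals, L = eigvals[order], L[:, order]

W = np.linalg.pinv(L.T)                # eq 6: W = (L')+
T = X @ W                              # eq 5: T = XW
full_ok = np.allclose(X, T @ L.T)      # eq 4: X = TL'

# Keep only the first p factors (the last eigenvalue is very small).
p = 2
L_star = L[:, :p]
W_star = np.linalg.pinv(L_star.T)      # score coefficients of retained factors
T_star = X @ W_star                    # eq 8: T* = XW*
X_approx = T_star @ L_star.T           # eq 7: X is approximated by T*L*'

rel_err = np.linalg.norm(X - X_approx) / np.linalg.norm(X)
print("eigenvalues:", eigvals)
print("relative reconstruction error with p = 2:", rel_err)
```

The smallest eigenvalue is several orders of magnitude below the largest, and dropping its factor changes X only slightly, which is the situation eq 7 describes.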

After PCA, the OLS relationship between the dependent variable vector Y (n × 1) and the new factor score matrix T* (n × p) is established (p is the number of retained factors, p < m):

$$Y = T^{*}A + \text{residuals} \tag{9}$$

where vector A (p × 1) is the intermediate regression coefficient vector. The final regression coefficient vector B in eq 1 can be expressed as follows:

$$B = W^{*}A \tag{10}$$

Equation 10 specifies the transformation by which the MLR coefficients are related to the PCA results. However, during source apportionment and other real situations, the matrices T and L (in eq 7) should not contain negative values (which are physically unreasonable). Therefore, the PCA should be performed with nonnegative constraints. The method for nonnegative constrained PCA has been described previously (8). Factor loadings and scores from the original matrices T and L (obtained according to eq 7) are rotated:

$$X = TRR^{-1}L' \tag{11}$$

where R is a score-transformation matrix. Matrix R can be computed based on the factor score matrix T:

$$R = (T)^{+}T^{*} = (T'T)^{-1}T'T^{*} \tag{12}$$

where T* is a factor score matrix wherein negative entries are replaced by zero. Matrices T and L can then be rotated as follows:

$$T_{1} = TR \tag{13}$$

$$L_{1}' = R^{-1}L' \tag{14}$$

$$X = T_{1}L_{1}' \tag{15}$$

Therefore, T1 is the new score matrix, which contains no negative values. Similarly, another transformation matrix, S, can be calculated based on the rotated factor loading, L1, as:

$$S = L^{*}(L_{1}')^{+} = L^{*}(L_{1}L_{1}')^{-1}L_{1} \tag{16}$$

where S is the loading-transformation matrix, based on factor loadings, and L* is a factor loading matrix whose negative entries are replaced by zero. L1 and T1 are further rotated by

$$L_{2}' = SL_{1}' \tag{17}$$

$$T_{2} = T_{1}S^{-1} \tag{18}$$

so that

$$X = T_{2}L_{2}' \tag{19}$$

The process is repeated until the sum of squares of negative values in the factor loading matrix is below a certain value and the final score matrix contains no negative values. If the above NCPCR method was applied directly in source apportionment studies, the near collinearity problem might be resolved. However, it fails to account for the uncertainties in both the source profile and the receptor data. In this study, a nonnegative constrained principal component regression chemical mass balance (NCPCRCMB) model was developed, which applies an effective variance weighted principal component regression algorithm. The study proceeded as follows: first, the algorithm of the NCPCRCMB model was introduced; then, synthetic receptor data sets were generated to compare the performance of the CMB and NCPCRCMB models; and finally, the model was applied to some ambient receptor data sets from the real world to test its practicability.

NCPCRCMB Model Algorithm. The CMB model was constructed on the basis of six assumptions (19). The relationship between the receptor and sources is established as follows:

$$C = FS \tag{20}$$

where C (n × 1) is the receptor species concentration vector (µg/m3), which is associated with the vector Y in eq 1; F (n × m) is the source profile matrix (µg/µg), which is associated with the matrix X in eq 1; and S (m × 1) is the source contribution vector (µg/m3), which is associated with the vector B in eq 1. n is the number of species measured, and m is the number of source categories. To take uncertainty into account, the CMB solution establishes an OLS relationship among the weighted source profile matrix Fw (n × m), the weighted receptor vector Cw (n × 1), and the source contribution vector S (m × 1) in each iterative step:

$$C_{w} = F_{w}S \tag{21}$$

where

$$C_{w} = (V_{e})^{-1/2}C \tag{22}$$

$$F_{w} = (V_{e})^{-1/2}F \tag{23}$$

$$(V_{e})^{-1} = (V_{e})^{-1/2}(V_{e})^{-1/2} \tag{24}$$

$$(V_{e})^{-1/2} = ((V_{e})^{-1/2})' \tag{25}$$

Ve (n × n) is the diagonal effective variance matrix. This diagonal matrix reflects both source profile and receptor data uncertainties. Thus, the source contribution vector can be calculated as follows:

$$
\begin{aligned}
S &= (F_{w})^{+}C_{w} \\
  &= ((F_{w})'F_{w})^{-1}(F_{w})'C_{w} \\
  &= (((V_{e})^{-1/2}F)'(V_{e})^{-1/2}F)^{-1}((V_{e})^{-1/2}F)'(V_{e})^{-1/2}C \\
  &= ((F'(V_{e})^{-1/2})(V_{e})^{-1/2}F)^{-1}(F'(V_{e})^{-1/2})(V_{e})^{-1/2}C \\
  &= (F'(V_{e})^{-1}F)^{-1}F'(V_{e})^{-1}C
\end{aligned}
\tag{26}
$$
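The chain of equalities in eq 26 can be checked numerically. The sketch below is illustrative only: the profile matrix `F`, the contributions `S_true`, and the effective variances `ve_diag` are hypothetical values, and the effective variance matrix is treated as known rather than iterated.

```python
import numpy as np

# Hypothetical inputs: n = 4 species, m = 2 sources.
F = np.array([[0.30, 0.05],
              [0.10, 0.40],
              [0.05, 0.10],
              [0.20, 0.02]])        # source profiles (ug/ug)
S_true = np.array([20.0, 10.0])     # source contributions (ug/m3)
C = F @ S_true                      # eq 20, noise-free receptor

# Diagonal of the effective variance matrix Ve (assumed known here).
ve_diag = np.array([0.5, 0.8, 0.1, 0.3])
Ve_inv_sqrt = np.diag(ve_diag ** -0.5)

Cw = Ve_inv_sqrt @ C                # eq 22
Fw = Ve_inv_sqrt @ F                # eq 23

# First line of eq 26: S = (Fw)+ Cw
S_pinv = np.linalg.pinv(Fw) @ Cw

# Last line of eq 26: S = (F' Ve^-1 F)^-1 F' Ve^-1 C
Ve_inv = np.diag(1.0 / ve_diag)
S_closed = np.linalg.solve(F.T @ Ve_inv @ F, F.T @ Ve_inv @ C)

print("S from weighted pseudoinverse:", S_pinv)
print("S from closed form:", S_closed)
```

Both routes give the same contributions, and with a noise-free receptor they recover `S_true`; solving the linear system with `np.linalg.solve` avoids forming an explicit inverse.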

In each CMB iterative step, the effective variance matrix, Ve, changes, so the source contribution vector varies accordingly. In this study, the NCPCRCMB model added the principal component regression route into the CMB iteration. The calculation procedure is shown below, where the superscript k is used to show the value of a variable at the kth iteration (19):

(1) Set the initial estimate of source contributions to zero.

$$s_{j}^{k=0} = 0, \quad j = 1, \ldots, m \tag{27}$$

(2) Calculate the effective variance matrix, Ve. All off-diagonal components of this matrix are equal to zero (19).

$$v_{e,ii}^{k} = \sigma_{c_{i}}^{2} + \sum_{j=1}^{m} (s_{j}^{k})^{2}\,\sigma_{f_{ij}}^{2}, \quad i = 1, \ldots, n; \; j = 1, \ldots, m \tag{28}$$

where σci is the standard deviation (precision) of the ci measurement and σfij is the standard deviation (precision) of the fij measurement.

(3) Establish the relationship between the column vector of the weighted receptor profile (Cw: n × 1), the column vector of the source contribution (S: m × 1), and the weighted source profile matrix (Fw: n × m):

$$(C_{w})^{k} = (F_{w})^{k}S^{k+1} \tag{29}$$

Cw and Fw are calculated according to eqs 22 and 23.

(4) PCA is performed on the weighted profile matrix (Fw)k (n × m), and the factor score matrix Tk (n × m) and factor loading matrix Lk (m × m) are calculated.

$$(F_{w})^{k} = T^{k}(L^{k})' \tag{30}$$

(5) If there are small eigenvalues in PCA, the dimension of matrices Tk and Lk should be reduced according to eq 7, to obtain the new score and loading matrices, T*k and L*k.

(6) Matrices T*k and L*k are rotated via nonnegative constraints according to eqs 11-19. Here, the iteration is repeated until the sum of squares of negative values in the factor loading matrix is below 0.01.

(7) Calculate the (k + 1)th iterative intermediate regression coefficient Ak+1 (p × 1) (19). Because

$$(C_{w})^{k} = (F_{w})^{k}S^{k+1} \tag{31}$$

then

$$(C_{w})^{k} = T^{*k}(L^{*k})'S^{k+1} = T^{*k}A^{k+1} \tag{32}$$

and

$$A^{k+1} = (T^{*k})^{+}(C_{w})^{k} = ((T^{*k})'T^{*k})^{-1}(T^{*k})'(C_{w})^{k} \tag{33}$$

(8) Calculate the (k + 1)th iterative column vector of the source contribution S (m × 1):

$$S^{k+1} = W^{k}A^{k+1} \tag{34}$$

where

$$(F_{w})^{k}W^{k} = T^{*k} \tag{35}$$

and

$$W^{k} = ((L^{*k})')^{+} = ((L^{*k})(L^{*k})')^{-1}L^{*k} \tag{36}$$

(9) Test the (k + 1)th iteration of the sj against the kth iteration (19). If |(sjk+1 - sjk)/sjk+1| > 0.01 for any j (j = 1, 2, ..., m), go to step 2; if |(sjk+1 - sjk)/sjk+1| < 0.01 for all j, go to step 10.

(10) Assign the (k + 1)th iteration result to sj and σsj (19).

$$\sigma_{s_{j}}^{2} = \left[(F'(V_{e}^{k+1})^{-1}F)^{-1}\right]_{jj}, \quad j = 1, \ldots, m \tag{37}$$

The final estimated source contributions (µg/m3) and their standard deviations can be obtained at the (k + 1)th iteration. Because the matrices F and C and the estimated contribution vector S remain in the same form, the performance indices in the NCPCRCMB model are the same as in the CMB model (19):

Reduced chi square:

$$\mathrm{chi}^{2} = \frac{1}{n-m}\sum_{i=1}^{n}\left[\frac{\left(C_{i} - \sum_{j=1}^{m}F_{ij}S_{j}\right)^{2}}{V_{e,ii}}\right] \tag{38}$$

Percent mass:

$$\mathrm{pm} = 100\left(\sum_{j=1}^{m}S_{j}\right)\Big/C_{t} \tag{39}$$

where Ct is the total measured mass.

R square:

$$R^{2} = 1 - \left[(n-m)\,\mathrm{chi}^{2}\right]\Big/\left[\sum_{i=1}^{n}\frac{C_{i}^{2}}{V_{e,ii}}\right] \tag{40}$$
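The iteration in steps 1-10 and the performance indices of eqs 38-40 can be sketched in a few lines. This is a simplified illustration, not the authors' implementation: the nonnegative rotation of step 6 (eqs 11-19) is omitted, PCA is taken as an uncentered SVD of the weighted profile matrix, and the function names, the profiles `F`, the uncertainties, and `total_mass` are all hypothetical.

```python
import numpy as np

def ncpcr_cmb_sketch(C, F, sigma_c, sigma_f, p, tol=0.01, max_iter=50):
    """Effective-variance PCR iteration (steps 1-5 and 7-10; step 6 omitted)."""
    n, m = F.shape
    S = np.zeros(m)                                    # step 1, eq 27
    for _ in range(max_iter):
        ve = sigma_c**2 + (sigma_f**2) @ (S**2)        # step 2, eq 28
        w = 1.0 / np.sqrt(ve)
        Cw, Fw = w * C, w[:, None] * F                 # eqs 22 and 23
        # Step 4: PCA of Fw via SVD; the rows of Lt are the loadings.
        _, _, Lt = np.linalg.svd(Fw, full_matrices=False)
        L_star = Lt[:p].T                              # step 5: keep p factors
        W = np.linalg.pinv(L_star.T)                   # eq 36
        T_star = Fw @ W                                # eq 35
        A = np.linalg.pinv(T_star) @ Cw                # step 7, eq 33
        S_new = W @ A                                  # step 8, eq 34
        rel = np.abs(S_new - S) / np.maximum(np.abs(S_new), 1e-12)
        S = S_new
        if np.all(rel < tol):                          # step 9
            break
    ve = sigma_c**2 + (sigma_f**2) @ (S**2)
    cov = np.linalg.inv(F.T @ (F / ve[:, None]))       # step 10, eq 37
    return S, np.sqrt(np.diag(cov)), ve

def performance_indices(C, F, S, ve, total_mass):
    n, m = F.shape
    chi2 = np.sum((C - F @ S) ** 2 / ve) / (n - m)     # eq 38
    pm = 100.0 * S.sum() / total_mass                  # eq 39
    r2 = 1.0 - (n - m) * chi2 / np.sum(C**2 / ve)      # eq 40
    return chi2, pm, r2

# Hypothetical example: 5 species, 3 sources; profiles 1 and 2 nearly collinear.
F = np.array([[0.30, 0.300, 0.02],
              [0.10, 0.100, 0.40],
              [0.05, 0.051, 0.10],
              [0.20, 0.199, 0.05],
              [0.02, 0.020, 0.30]])
S_true = np.array([20.0, 10.0, 15.0])
C = F @ S_true
sigma_c, sigma_f = 0.05 * C, 0.10 * F

S, sd, ve = ncpcr_cmb_sketch(C, F, sigma_c, sigma_f, p=2)
chi2, pm, r2 = performance_indices(C, F, S, ve, total_mass=50.0)
print("estimated contributions:", S)
print("chi2, percent mass, R2:", chi2, pm, r2)
```

In this simplified form, removing one factor keeps all estimates positive and recovers the combined contribution of the two collinear sources and the contribution of the third source, but it does not by itself determine how the mass is split between the collinear pair; how the nonnegative rotation of step 6 changes that split is not shown here.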

Synthetic Receptor Data Set Development. The synthetic data were generated according to eq 20. The synthetic receptor data set was composed of chemical species contributions from eleven actual PM10 sources, including resuspended dust, soil dust, coal combustion fly ash, vehicle exhaust, cement dust, vegetable burning, oil-fired power plant, sea salt, steel mill, lead smelter, and municipal incinerator. The actual source profiles included here were given by Bi et al. (15) and Javitz et al. (20). The values of source profiles and simulated source contributions can be found in Supporting Information (SI) Table S1. (The simulated source contributions follow ref 20.) Referring to the works of Javitz et al. (20) and Lowenthal et al. (21), the mean source profiles (fij) were randomly perturbed and multiplied by the "true" source contributions (sj). The resulting receptor concentrations were also randomly perturbed to simulate measurement errors. The random uncertainties were generated from a log-normal distribution to avoid negative values (20, 21). The coefficient of variation (CV) is used to calculate the perturbation of source profiles and receptor concentrations. CV is defined as the standard deviation of a species concentration divided by the mean species concentration (20). This method of developing synthetic receptors is described in detail by Javitz et al. (20). Similar to the work of Javitz et al. (20), several simulation runs were included in the synthetic data tests. In each simulation run, we created 100 days of data, using constant true source contributions. In addition, in each simulation run all of the source profile CVs were set to a common value, and the receptor measurement error CVs were set to another common value. The values of these CVs (taken from the work of Javitz et al. (20)) are listed in Table 1.

Use of the CMB and NCPCRCMB Models to Study the Synthetic Data Sets. Before the CMB and NCPCRCMB model analysis, the source profile matrix (listed in SI Table S1) was analyzed in order to identify the near collinearity problem among the profiles. The concepts of condition index (CI) and variance-decomposition proportion (VDP) were introduced, as described by Belsley et al. (22). The source profile matrix (16 × 11; number of species by number of source categories, all listed in SI Table S1) was analyzed. According to Belsley et al., the near collinearity problem can be identified as occurring when (1) there is a singular value with a high CI (>5) that (2) is associated with high VDP values (>0.5) for two or more sources (22). The CI and VDP values are presented in SI Table S2. In SI Table S2, there are two high CI values, 81.95 and 1212.25. For CI = 81.95, only one VDP > 0.5. For CI = 1212.25, seven VDPs > 0.5: resuspended dust (1.00), soil dust (1.00), coal combustion (0.50), cement (0.71), vegetable burning (0.96), sea salt (0.69), and municipal incinerator (0.77). Thus, there was a serious near collinearity problem in the source profile matrix. Then, the synthetic receptors were analyzed by the CMB and NCPCRCMB models (all the species in SI Table S1 were selected for fitting). As described above, 100 receptors were developed for each simulation run. The 100 receptors were analyzed by the CMB and NCPCRCMB models. So in each simulation run, there were 100 sets of source contributions

TABLE 1. (A) CMB Model Performance; (B) NCPCRCMB Model Performance

(A) AAE of CMB estimated source contribution (%)

| simulation run | source profile CV (%) | receptor measurement error CV (%) | RD(a) | soil | coal | vehicle | cement | vegetable burning | oil-fired power plant | sea salt | steel mill | lead smelter | municipal incinerator |
| 1 | 25 | 10 | 453 | 672 | 40 | 29 | 26 | 84 | 12 | 24 | 22 | 7 | 46 |
| 2 | 25 | 20 | 673 | 1020 | 51 | 36 | 38 | 134 | 12 | 34 | 31 | 11 | 62 |
| 3 | 50 | 10 | 1759 | 2651 | 124 | 80 | 56 | 322 | 24 | 85 | 79 | 14 | 192 |
| 4 | 50 | 20 | 1684 | 2554 | 118 | 83 | 62 | 319 | 23 | 91 | 78 | 16 | 203 |
| 5 | 100 | 10 | 4445 | 6777 | 298 | 193 | 123 | 787 | 42 | 236 | 191 | 21 | 543 |
| 6 | 100 | 20 | 4578 | 6891 | 294 | 225 | 136 | 886 | 47 | 218 | 234 | 22 | 514 |
| 7 | 200 | 10 | 6730 | 10359 | 404 | 267 | 112 | 1218 | 72 | 322 | 267 | 29 | 740 |
| 8 | 200 | 20 | 8988 | 13772 | 557 | 398 | 197 | 1672 | 88 | 464 | 413 | 40 | 1081 |

(B) AAE of NCPCRCMB estimated source contribution (%)

| simulation run | source profile CV (%) | receptor measurement error CV (%) | RD(a) | soil | coal | vehicle | cement | vegetable burning | oil-fired power plant | sea salt | steel mill | lead smelter | municipal incinerator |
| 1 | 25 | 10 | 35 | 44 | 17 | 27 | 19 | 28 | 12 | 16 | 18 | 9 | 23 |
| 2 | 25 | 20 | 39 | 45 | 25 | 32 | 36 | 42 | 12 | 24 | 26 | 12 | 35 |
| 3 | 50 | 10 | 35 | 53 | 32 | 37 | 34 | 38 | 20 | 25 | 23 | 18 | 40 |
| 4 | 50 | 20 | 35 | 56 | 34 | 50 | 43 | 61 | 21 | 27 | 29 | 18 | 48 |
| 5 | 100 | 10 | 40 | 82 | 60 | 84 | 56 | 85 | 36 | 38 | 50 | 32 | 64 |
| 6 | 100 | 20 | 39 | 80 | 54 | 78 | 64 | 79 | 38 | 48 | 50 | 26 | 81 |
| 7 | 200 | 10 | 47 | 131 | 97 | 101 | 81 | 113 | 52 | 62 | 79 | 36 | 114 |
| 8 | 200 | 20 | 50 | 118 | 79 | 158 | 93 | 154 | 57 | 69 | 80 | 54 | 117 |

(a) RD: resuspended dust.

obtained by the CMB and NCPCRCMB models, respectively. To quantify the total difference between estimated and true contributions for one source category, the value of AAE (average absolute error) (20) was used:

$$\mathrm{AAE} = \frac{1}{n}\sum_{i=1}^{n}\frac{|E_{i} - T_{i}|}{T_{i}} \tag{41}$$

where n is the number of days of data subjected to the model (in this study, n = 100); Ei is the estimated contribution for a particular source on the ith day; and Ti is the true contribution for that source on the ith day. The closer the estimated source contributions are to the true source contributions, the smaller the value of AAE. During the NCPCRCMB analysis, the factors with small eigenvalues should be eliminated to obtain a reasonable solution. The objective criterion we recommend is to refer to the CI and VDPs. In SI Table S2, for example, only the last factor was associated with the high CI and VDPs, so the last factor should be eliminated. It has been shown in the literature (22) that a small eigenvalue may lead to a high CI value. The values of AAE in each simulation are listed in Table 1. In all the runs, the values of AAE for the CMB results were much higher than those for the NCPCRCMB results, mainly because CMB obtained several negative contributions for the nearly collinear sources (these sources have high AAE values in Table 1A). These results show that the source contributions estimated by NCPCRCMB were closer to the true values. Javitz et al. (20) suggested that AAEs less than 50% would represent acceptable precision for an individual model analysis. So according to the values in Table 1B, it can be concluded that the NCPCRCMB model can obtain acceptable accuracy and precision even when the CV of source profiles is 25% and the CV of measurement error is 20% (with a serious near collinearity problem present). These CV ranges are similar to those in the study of Javitz et al. (20).

Use of the NCPCRCMB Model to Study an Ambient Receptor Data Set from Wuxi City. For ambient data, completely compatible source and receptor measurements are not commonly available (12). Thus, if there were serious near collinearity problems among the fitting source profiles, it would be difficult to obtain reasonable results using the CMB model. In this section, an ambient data set from Wuxi, China was apportioned by the NCPCRCMB model.

Sampling Site Description. The city of Wuxi (108°08′-102°15′E, 26°05′-27°21′N) is located on the Yangtze River Delta, Jianghujian Corridor, near (128 km) Shanghai. Wuxi is one of the most prominent tourist and economically important cities in Jiangsu province, with an area of 4650 km2 and a population of 4.42 million. The climate of Wuxi is semitropical, with a southern monsoon climate, warm temperatures (17 °C annual average), and plenty of rainfall (1126 mm annual average). Wuxi is located on a large plain but is higher in the southwestern region, with a basin terrain in the east. The prevailing wind directions are NW in winter, SE in spring and summer, and N in autumn, with an annual average wind speed of 3.5 m/s. The ambient PM10 concentration data were obtained during sampling campaigns in 2005, with a total of 168 samples. All samples were collected by filtration with a medium-volume air sampler, situated about 5 m above the ground. The pump was set at 100 L/min and ran continuously for 24 h. Two parallel medium-volume air samplers were used for obtaining PM10 data on polypropylene membrane filters and quartz fiber filters. The sampling process was described in the literature (15, 23).

TABLE 2. Ambient Receptor (µg/m3) and Source Profiles (%) from Wuxi

| species | ambient receptor (µg/m3) | RD(a) | soil dust | coal combustion | cement dust | vehicle exhaust | steel manufacture | secondary nitrate | secondary sulfate |
| number of samples | 168 | 30 | 30 | 16 | 13 | 4 | 5 | | |
| Na | 2.16 ± 0.12 | 2.70 ± 0.91 | 1.73 ± 0.52 | 1.75 ± 0.52 | 1.59 ± 0.30 | 0.49 ± 0.18 | 0.86 ± 0.01 | 0.00 ± 0.00 | 0.00 ± 0.00 |
| Mg | 0.79 ± 0.18 | 1.02 ± 0.13 | 0.71 ± 0.26 | 0.24 ± 0.17 | 1.48 ± 0.49 | 0.29 ± 0.15 | 1.56 ± 0.04 | 0.00 ± 0.00 | 0.00 ± 0.00 |
| Al | 6.28 ± 0.46 | 8.08 ± 1.02 | 12.81 ± 2.68 | 5.31 ± 4.63 | 5.12 ± 1.79 | 0.24 ± 0.13 | 0.53 ± 0.01 | 0.00 ± 0.00 | 0.00 ± 0.00 |
| Si | 10.82 ± 0.38 | 19.13 ± 3.0 | 25.68 ± 1.77 | 12.03 ± 8.26 | 8.62 ± 1.27 | 0.67 ± 0.58 | 1.31 ± 0.13 | 0.00 ± 0.00 | 0.00 ± 0.00 |
| K | 1.08 ± 0.08 | 1.51 ± 0.49 | 0.97 ± 0.29 | 0.98 ± 0.29 | 0.89 ± 0.17 | 0.24 ± 0.21 | 3.28 ± 0.07 | 0.00 ± 0.00 | 0.00 ± 0.00 |
| Ca | 7.28 ± 1.16 | 14.31 ± 2.00 | 1.78 ± 1.28 | 4.31 ± 3.57 | 41.10 ± 2.38 | 2.27 ± 0.81 | 9.23 ± 0.19 | 0.00 ± 0.00 | 0.00 ± 0.00 |
| Ti | 0.18 ± 0.05 | 0.48 ± 0.12 | 0.22 ± 0.12 | 0.46 ± 0.17 | 1.71 ± 0.37 | 0.48 ± 0.07 | 0.04 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
| Cr | 0.05 ± 0.02 | 0.02 ± 0.01 | 0.01 ± 0.001 | 0.00 ± 0.00 | 0.03 ± 0.01 | 0.01 ± 0.02 | 0.03 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
| Mn | 0.11 ± 0.02 | 0.07 ± 0.01 | 0.06 ± 0.02 | 0.02 ± 0.01 | 0.08 ± 0.02 | 0.01 ± 0.00 | 0.98 ± 0.01 | 0.00 ± 0.00 | 0.00 ± 0.00 |
| Fe | 1.90 ± 0.14 | 1.80 ± 0.42 | 2.51 ± 0.32 | 1.74 ± 1.63 | 1.18 ± 0.20 | 0.48 ± 0.17 | 36.68 ± 0.02 | 0.00 ± 0.00 | 0.00 ± 0.00 |
| Co | 0.01 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.02 ± 0.01 | 0.03 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
| Ni | 0.02 ± 0.02 | 0.01 ± 0.00 | 0.01 ± 0.001 | 0.00 ± 0.00 | 0.01 ± 0.000 | 0.03 ± 0.02 | 0.01 ± 0.01 | 0.00 ± 0.00 | 0.00 ± 0.00 |
| Cu | 0.04 ± 0.01 | 0.02 ± 0.01 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.01 ± 0.000 | 0.01 ± 0.01 | 0.04 ± 0.01 | 0.00 ± 0.00 | 0.00 ± 0.00 |
| Zn | 0.69 ± 0.10 | 0.13 ± 0.03 | 0.01 ± 0.01 | 0.02 ± 0.03 | 0.01 ± 0.000 | 0.23 ± 0.13 | 0.98 ± 0.09 | 0.00 ± 0.00 | 0.00 ± 0.00 |
| Pb | 0.06 ± 0.02 | 0.05 ± 0.02 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.01 ± 0.02 | 0.32 ± 0.03 | 0.00 ± 0.00 | 0.00 ± 0.00 |
| OC(b) | 12.50 ± 1.69 | 5.41 ± 5.13 | 0.28 ± 0.22 | 11.58 ± 8.17 | 0.95 ± 0.22 | 66.92 ± 5.16 | 0.82 ± 0.08 | 0.00 ± 0.00 | 0.00 ± 0.00 |
| Cl⁻ | 1.33 ± 0.38 | 0.22 ± 0.15 | 0.00 ± 0.00 | 0.01 ± 0.02 | 0.02 ± 0.02 | 1.48 ± 0.28 | 0.72 ± 0.23 | 0.00 ± 0.00 | 0.00 ± 0.00 |
| NO₃⁻ | 4.06 ± 0.59 | 0.36 ± 0.25 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 | 1.42 ± 0.40 | 0.00 ± 0.00 | 77.50 ± 7.75 | 0.00 ± 0.00 |
| SO₄²⁻ | 8.21 ± 1.43 | 6.31 ± 2.45 | 0.02 ± 0.05 | 0.06 ± 0.10 | 1.41 ± 0.47 | 1.83 ± 1.21 | 1.66 ± 0.23 | 0.00 ± 0.00 | 72.70 ± 7.27 |
| total mass | 112.33 ± 29.42 | | | | | | | | |

(a) RD: resuspended dust. (b) OC: organic carbon.

Source Sampling. Samples of the soil dust source were swept from representative portions of the ground surface with a plastic brush and tray (24). Coal combustion samples were collected from particulate pollution control devices (electrostatic precipitators, fabric filters, or wet scrubbers) or sampled by a dilution stack sampler (14). Cement source samples were obtained from cement plants and building construction sites. Steel manufacturing dust and vehicle exhaust dust were sampled from a steel plant and vehicle exhaust pipes, respectively. Dilution stack and vehicle exhaust samplers were able to collect PM10 samples on filters. All powder samples were sieved and suspended in a resuspension chamber or separated by a Bahco centrifugal instrument for PM10 (14). For secondary aerosol sources, ammonium sulfate and ammonium nitrate were expressed as "pure" secondary source profiles (25, 26).

Chemical Analysis. A range of elements (Na, Mg, Al, Si, K, Ca, Ti, Cr, Mn, Fe, Co, Ni, Cu, Zn, and Pb) were analyzed by ICP (IRIS Intrepid II, Thermo Electron) (27, 28). Water-soluble Cl⁻, NO₃⁻, and SO₄²⁻ were extracted by an ultrasonic extraction system (AS3120, AutoScience) and analyzed by ion chromatography (DX-120, DIONEX) (29, 30). TC (total carbon) and OC (organic carbon) were analyzed by a carbon elemental analyzer (Vario EL, GmbH) (14). Further details about sampling, treatment, and analysis of source and ambient samples are given in prior work (14, 15).

Results and Discussion

Table 2 shows that the annual average PM10 concentration in Wuxi was 112.33 µg/m3. Compared with other published data, the average PM10 concentration in Wuxi was higher than those in a number of other cities of China, including Guangzhou (86 µg/m3 in 2004 (31)), Shenzhen (75 µg/m3 in 2004 (32)), and Zhuhai (44 µg/m3 in 2004 (32)), but lower than those in Tianjin (123 µg/m3 in 2002 (15)) and Taiyuan (186 µg/m3 in 2002 (15)). According to the ambient investigation and emission inventory in the sampling area, eight source categories were identified for the NCPCRCMB calculation: resuspended dust, soil dust, coal combustion, cement dust, vehicle exhaust, steel manufacture, secondary nitrate, and secondary sulfate. These source profiles were measured in Wuxi, and the values are given in Table 2.

TABLE 3. Estimated Source Contributions (%) by NCPCRCMB Model, for Five Ambient Data Sets

| | Wuxi | Ji’nan | Tianjin | Yinchuan | Taiyuan |
| RD(a) | 29.88 | 17.05 | 14.48 | 25.23 | 17.40 |
| soil dust | 8.72 | 18.32 | 16.35 | 16.14 | 15.59 |
| coal dust | 21.77 | 22.51 | 21.58 | 15.80 | 21.95 |
| cement dust | 2.43 | 8.87 | 5.45 | 11.27 | 10.70 |
| vehicle exhaust | 10.46 | 17.68 | 18.77 | 14.87 | 18.46 |
| secondary sulfate | 7.04 | | 6.39 | 2.00 | 11.20 |
| secondary nitrate | 4.36 | | 1.76 | 1.00 | 4.01 |
| steel manufacture | 2.90 | | | | 6.77 |
| sea salt | | | 3.94 | | |
| others | 12.44 | 15.57 | 11.28 | 13.71 | |
| factors removed | 1 | 2 | 1 | 1 | 1 |
| chi2 | 3.37 | 0.67 | 2.25 | 0.24 | 0.41 |
| R2 | 0.96 | 0.96 | 0.93 | 0.99 | 0.96 |
| percent mass | 87.56 | 84.43 | 88.72 | 86.29 | 106.07 |

(a) RD: resuspended dust.

The source profile and receptor data were then apportioned by the NCPCRCMB model, with one factor removed. The average source contributions (%) to ambient PM10 receptors in Wuxi are listed in Table 3. The results of the NCPCRCMB model calculations were examined in terms of the performance indices described above, such as R2, chi2, and the percent mass explained. The values of these performance indices, listed in Table 3, all meet the requirements of the CMB model (19). The results show that resuspended dust was the most significant source in Wuxi, contributing 29.88% of the measured PM10 mass. Coal combustion gave the second highest contribution, accounting for 21.77%. Vehicle exhaust, soil dust, secondary sulfate, secondary nitrate, steel manufacture, and cement dust accounted for 10.46, 8.72, 7.04, 4.36, 2.90, and 2.43%, respectively. To evaluate the accuracy of the model, ambient data sets from another four cities were analyzed by NCPCRCMB. Information on the ambient receptors and source profiles was reported in our previous studies (14, 15). The results are listed in Table 3. The performance indices of the source apportionment results for these four cities (chi2: 0.24-2.25, R2: 0.93-0.99, percent mass: 84.43%-106.07%) are all within the ranges of the criteria for the CMB model. Also, the estimated source contributions conform to the actual situation in these four cities, according to our prior studies (14, 15). In addition, all five ambient data sets were studied by the CMB model. The results in SI Table S3 show that several nearly collinear sources in some cities resulted in negative contributions. Finally, according to eq 20, the estimated species concentrations can be calculated from the source profiles and contributions. The fits between the estimated and measured species concentrations for the five cities are presented in Figure 1. From the plot, the regression slopes (p < 0.01) ranged from 1:0.86 to 1:1.05, and the values of R2 ranged from 0.97 to 1.00, which means that most estimated species concentrations were close to the measured species concentrations. It can be concluded that the NCPCRCMB model obtained reasonable results for the ambient data sets from the five cities.

FIGURE 1. Fit between model results and measurements in five cities. The fitting species selected for the NCPCRCMB model are labeled on the plot.

In this study, the NCPCRCMB model was introduced to conduct source apportionment in the presence of nearly collinear source profiles. Model data sets were analyzed by the NCPCRCMB model to evaluate its accuracy. The model was also applied to the source apportionment of five real ambient receptor data sets from five cities in China. The reasonable results obtained using the model in ambient studies indicate that the NCPCRCMB model is useful for source apportionment studies.

Acknowledgments

This study was supported by SDP of Tianjin (No. 043804611), the Innovation Foundation of Nankai University, and the Combined Laboratory of the Tianjin Meteorological Bureau and Nankai University.

Supporting Information Available

Three tables provide further information for results and discussion. This material is available free of charge via the Internet at http://pubs.acs.org.

Literature Cited

(1) Paatero, P.; Hopke, P. K.; Begum, B. A.; Biswas, S. K. A graphical diagnostic method for assessing the rotation in factor analytical models of atmospheric pollution. Atmos. Environ. 2005, 39, 193–201.
(2) Watson, J. G.; Zhu, T.; Chow, J. C.; Engelbrecht, J.; Fujita, E. M.; Wilson, W. E. Receptor modeling application framework for particle source apportionment. Chemosphere 2002, 49, 1093–1136.
(3) Hopke, P. K. Recent developments in receptor modeling. J. Chemom. 2003, 17, 255–265.
(4) Chow, J. C.; Watson, J. G. Review of PM2.5 and PM10 apportionment for fossil fuel combustion and other sources by the chemical mass balance receptor model. Energy Fuel 2002, 16, 222–260.
(5) Chen, L. W. A.; Watson, J. G.; Chow, J. C.; Magliano, K. L. Quantifying PM2.5 source contributions for the San Joaquin Valley with multivariate receptor models. Environ. Sci. Technol. 2007, 41, 2818–2826.
(6) Henry, R. C. Dealing with near collinearity in chemical mass balance receptor models. Atmos. Environ. 1992, 26A, 933–938.
(7) Thurston, G. D.; Spengler, J. D. A quantitative assessment of source contributions to inhalable particulate matter pollution in Metropolitan Boston. Atmos. Environ. 1985, 19, 9–25.
(8) Rachdawong, P.; Christensen, E. R. Determination of PCB sources by a principal component method with nonnegative constraints. Environ. Sci. Technol. 1997, 31, 2686–2691.
(9) Henry, R. C. Multivariate receptor modeling by N-dimensional edge detection. Chemom. Intell. Lab. Syst. 2003, 65, 179–189.
(10) Paatero, P. Least squares formulation of robust non-negative factor analysis. Chemom. Intell. Lab. Syst. 1997, 37, 23–35.
(11) Hwang, I.; Hopke, P. K.; Pinto, J. P. Source apportionment and spatial distributions of coarse particles during the regional air pollution study. Environ. Sci. Technol. 2008, 42, 3524–3530.
(12) Watson, J. G.; Chen, L. W. A.; Chow, J. C.; Doraiswamy, P.; Lowenthal, D. H. Source apportionment: Findings from the U.S. supersites program. J. Air Waste Manage. Assoc. 2008, 58, 265–288.
(13) Hopke, P. K. Receptor Modeling in Environmental Chemistry; John Wiley & Sons, Inc.: New York, 1985; pp 132–140.
(14) Zhao, P. S.; Feng, Y. C.; Zhu, T.; Wu, J. H. Characterizations of resuspended dust in six cities of North China. Atmos. Environ. 2006, 40, 5807–5814.
(15) Bi, X. H.; Feng, Y. C.; Wu, J. H.; Wang, Y. Q.; Zhu, T. Source apportionment of PM10 in six cities of northern China. Atmos. Environ. 2007, 41, 903–912.
(16) Otto, M. Chemometrics: Statistics and Computer Application in Analytical Chemistry; Wiley-VCH: New York, 1999; p 196.
(17) Pires, J. C. M.; Martins, F. G.; Sousa, S. I. V.; Alvim-Ferraz, M. C. M.; Pereira, M. C. Selection and validation of parameters in multiple linear and principal component regressions. Environ. Modell. Software 2008, 23, 50–55.
(18) Chatterjee, S.; Hadi, A. S. Regression Analysis by Example, 4th ed.; Wiley: New York, 2006; p 262.
(19) U.S. Environmental Protection Agency. EPA CMB8.2 User’s Manual; EPA Office of Air Quality Planning and Standards: Research Triangle Park, NC, 2004.
(20) Javitz, H. S.; Watson, J. G.; Robinson, N. Performance of the chemical mass balance model with simulated local-scale aerosols. Atmos. Environ. 1988, 22, 2309–2322.
(21) Lowenthal, D. H.; Chow, J. C.; Watson, J. G.; Neuroth, G. R.; Robbins, R. B.; Shafritz, B. P.; Countess, R. J. The effects of collinearity on the ability to determine aerosol contributions from diesel- and gasoline-powered vehicles using the chemical mass balance model. Atmos. Environ. 1992, 26A, 2341–2351.
(22) Belsley, D. A.; Kuh, E.; Welsch, R. E. Regression Diagnostics: Identifying Influential Data and Sources of Collinearity; John Wiley & Sons: New York, 1980.
(23) Chan, Y. C.; Simpson, R. W.; Mctainsh, G. H.; Vowles, P. D.; Cohen, D. D.; Bailey, G. M. Source apportionment of PM2.5 and PM10 aerosols in Brisbane (Australia) by receptor modeling. Atmos. Environ. 1999, 33, 3251–3268.
(24) Samara, C.; Kouimtzis, Th.; Tsitouridou, R.; Kanias, G.; Simeonov, V. Chemical mass balance source apportionment of PM10 in an industrialized urban area of Northern Greece. Atmos. Environ. 2003, 37, 41–54.
(25) Mazzera, D. M.; Lowenthal, D. H.; Chow, J. C.; Watson, J. G. Sources of PM10 and sulfate aerosol at McMurdo station, Antarctica. Chemosphere 2001, 45, 347–356.
(26) Park, S. S.; Kim, Y. J. Source contributions to fine particulate matter in an urban atmosphere. Chemosphere 2005, 59, 217–226.
(27) Baldwin, D. P.; Zamzow, D. S.; D’Silva, A. P. Aerosol mass measurement and solution standard additions for quantization in laser ablation-inductively coupled plasma atomic emission spectrometry. Anal. Chem. 1994, 66, 1911–1917.
(28) Watson, J. G.; Chow, J. C.; Frazier, C. A. X-ray fluorescence analysis of ambient air samples. In Elemental Analysis of Airborne Particles; Landsberger, S., Creatchman, M., Eds.; Gordon and Breach: Newark, NJ, 1999; pp 67–96.
(29) Carvalho, L. R. F.; Souza, S. R.; Martinis, B. S.; Korn, M. Monitoring of the ultrasonic irradiation effect on the extraction of airborne particle matter by ion chromatography. Anal. Chim. Acta 1995, 317 (1-3), 171–179.
(30) Chow, J. C.; Watson, J. G. Ion chromatography in elemental analysis of airborne particles. In Elemental Analysis of Airborne Particles; Landsberger, S., Creatchman, M., Eds.; Gordon and Breach: Newark, NJ, 1999; pp 539–573.
(31) Liu, S.; Hu, M.; Slanina, S.; He, L. Y.; Niu, Y. W.; Bruegemann, E.; Gnauk, T.; Herrmann, H. Size distribution and source analysis of ionic compositions of aerosols in polluted periods at Xinken in Pearl River Delta (PRD) of China. Atmos. Environ. 2008, 42, 6284–6295.
(32) Cao, J. J.; Lee, S. C.; Ho, K. F.; Zou, S. C.; Fung, K.; Li, Y.; Watson, J. G.; Chow, J. C. Spatial and seasonal variations of atmospheric organic carbon and elemental carbon in Pearl River Delta region, China. Atmos. Environ. 2004, 38, 4447–4456.
