Environ. Sci. Technol. 2000, 34, 4974-4981

Iterative Inverse Modeling and Direct Sensitivity Analysis of a Photochemical Air Quality Model

ALBERTO MENDOZA-DOMINGUEZ* AND ARMISTEAD G. RUSSELL

School of Civil and Environmental Engineering, Georgia Institute of Technology, 200 Bobby Dodd Way, Atlanta, Georgia 30332-0512

* Corresponding author phone: (404) 385-0570; fax: (404) 894-8266; e-mail: [email protected].

A four-dimensional data assimilation approach that combines emission-based modeling and inverse modeling techniques has been developed to provide a method to help identify emissions inventory improvements. The method is based on linking formal direct sensitivity analysis of three-dimensional air quality models with an inverse modeling technique such that observational data of multiple species can be used to suggest improvements in emission strengths, patterns, and compositions of various source categories simultaneously. Information regarding the characteristics of the data and the emissions is incorporated into the model by means of weighting factors. A penalty function allows one to determine whether the method is fully observation-driven, fully emission-driven, or mixed. The assimilation of the data requires an iterative process since the sensitivity coefficients change as emissions are adjusted, though results suggest that only three or four iterations are necessary. The method has been applied to the Atlanta Metro Area and tested with pseudo-observations. Perturbed emissions were adjusted close to their known original values in these test scenarios. When noisy pseudo-observations were used, the final simulated concentration errors were of the order of the noise applied to the pseudo-observations.

1. Introduction

Four-dimensional data assimilation (FDDA) is a method that directly, or indirectly, incorporates observational data into a deterministic model by modifying model inputs or parameters. FDDA seeks a better description of the processes the model is simulating by adding information about the real state of the system. This has not yet been developed widely for air quality models (AQMs), in part due to the lack of an appropriate approach.

Historically, emissions-based models (EBMs) have been the primary tool used to determine how source controls will affect air quality (1, 2). If EBMs and their inputs were perfect, they would be an ideal approach for generating control strategies since they are based on describing the physics and chemistry of pollutant dynamics. However, EBMs rely on emissions inventories that are suspected of being highly uncertain (e.g., ref 2). In typical EBM applications to photochemical smog episodes there are significant differences between model-derived concentrations and the corresponding observations, particularly for the primary species, including individual volatile organic compounds (VOCs) and nitrogen oxides (NOx) [e.g., refs 3 and 4].


Thus, a significant lack of confidence remains in EBM applications. In particular, if the emissions inventories were incorrect, then using the results of the model application to define source-receptor relationships and control strategies would be suspect.

Recently, observation-based methods (OBMs) [e.g., refs 5 and 6] have been suggested to overcome some of the perceived weaknesses in EBMs. Since observations of ozone and its precursors are used to provide information on how air quality will respond to emissions changes, such a method is less impacted by the uncertainties (errors) in the emissions. It is recognized that this approach, too, has limitations; in particular, there is no direct link between model predictions and emissions. However, one of the attractive features of OBMs is that there is an increasing amount of measurement data available, and OBMs present an opportunity to mine that information. Of interest is coupling the EBM and OBM approaches such that the problems of each are removed, without introducing other significant limitations. Such a hybrid EBM-OBM technique would be able to suggest changes in the emissions inventory being used, leading to better performance of the AQM.

Past studies have made use of different inverse modeling techniques to derive adjusted emissions based on ambient data. Some applications have made use of direct matrix inversion techniques (7) or Kalman filtering (8, 9) to adjust emissions of low-reactivity species at regional and global scales. Kalman filters have also been used to adjust urban-scale emissions of SO2 (10), CO (11, 12), and isoprene (13). However, deriving inverse-adjusted emissions with this technique has proven to be a challenge in some applications (12, 13). Others (e.g., ref 14) have implemented variational assimilation approaches for chemistry/transport models, though these variational techniques have not yet been applied to adjust emission parameters.

No matter what inverse method is chosen, the most serious problem that one faces in trying to deduce emissions using atmospheric measurements is the "ill-posedness" of the inverse problem (15, 16). This is reflected when small variations in the observational data produce large variations in the estimated emission adjustments. Physically, in the atmosphere the signals from the different sources are smoothed out by mixing and chemical reactions. This smoothing translates into information from the emission sources being damped below the noise level in the observed data. Thus, this information is essentially lost, i.e., it cannot be differentiated from the noise associated with the measurements.

The FDDA method proposed here has been implemented to suggest adjustments to the emissions used to drive EBMs. The technique is based on linking direct sensitivity analysis of three-dimensional photochemical AQMs with inverse modeling techniques to produce a hybrid observation/emissions-based modeling (O/EBM) approach that fully incorporates the emission estimates, meteorology, and observations, providing a bridge between EBM and OBM techniques. By using an EBM, the FDDA method includes a first-principles approach for assessing the impact of transport, deposition, and chemistry and also brings all the available measurements to bear in a spatially and temporally consistent fashion.
The FDDA technique iteratively adjusts emission strengths and compositions by minimizing the difference between observations and simulated concentrations. A penalty function is defined such that one can determine the relative importance of the OBM and the EBM components in the assimilation. The aim in this application is to produce a top-down approach for suggesting emissions inventory improvements.


However, the method is not limited to deducing emission changes and can also be applied to suggest modifications to other model inputs and parameters. The major assumption used here to drive the FDDA method is that the emissions are the major source of discrepancy between the observations and the model predictions, i.e., the AQM and its other inputs are subject to smaller uncertainties. It is evident that other uncertainties do exist, and the applicability of this approach depends on the characteristics of the AQM being used and the uncertainties in other inputs, as well as the properties of the observational data set. An application of the method is presented for the Atlanta Metropolitan Area in which model-derived observations were used to test the technique.

2. Formulation of the FDDA Approach

The FDDA method proposed is based on using a photochemical AQM, with direct sensitivity analysis, combined with inverse modeling. Here the FDDA approach is implemented by linking the CIT model (3, 17, 18), extended with the direct decoupled method for three-dimensional models (DDM-3D), which conducts direct sensitivity analysis calculations (19, 20), with ridge regression (21). The CIT model, extended with DDM-3D, calculates local sensitivity coefficients of model outputs to seminormalized model parameters and inputs (20). The seminormalized sensitivity coefficients for gas-phase species have units of ppb per % increase in parameter j. DDM-3D computes the sensitivity fields of all the pollutants to each chosen parameter within a single execution. With this approach the model can be applied once, and the concentration fields of all species and the sensitivity fields of all the primary and secondary pollutants to different emission sources can be calculated simultaneously. Further details on the coupling of DDM-3D with the CIT model can be found in the Supporting Information.

The difference between observations and simulated concentrations, along with the sensitivities, is used to estimate how much emissions from a specific source should be altered to optimize the performance of the AQM. Let Oik be the kth observation (in space and time) of the ith species and Pik⟨adjusted⟩ be an adjusted simulated concentration at the same location and time as the measured concentration; the error can then be defined as

$$e_{ik} = O_{ik} - P_{ik}^{\langle \mathrm{adjusted} \rangle} \qquad (1)$$

The response of the simulated concentration due to an adjustment in the strengths of given sources (i.e., Pik〈adjusted〉) can be calculated by means of a linear combination of the product of the sensitivity coefficients and the perturbed emission strengths (assuming the EBM responds linearly to its inputs)

$$P_{ik}^{\langle \mathrm{adjusted} \rangle} = P_{ik} + \sum_{j=1}^{J} s_{ikj}\,\mu_j \qquad (2)$$

where sikj is the sensitivity of the ith species to emissions from the jth source at the location and time of the kth observation, J is the total number of sources, and μj is the scaling factor of the jth source from its base level. Combining eqs 1 and 2 yields

$$O_{ik} - P_{ik} = d_{ik} = \sum_{j=1}^{J} s_{ik,j}\,\mu_j + e_{ik} \qquad (3)$$

or, in matrix notation, d = Sμ + e, where μ is a vector of length J, d and e are vectors of length K·I, S is the (K·I × J) matrix of sensitivity coefficients, I is the total number of species, and K is the total number of observations. Note that the matrix dik is transformed into a one-dimensional vector where the first K elements correspond to the first species (i.e., d1k), the next K elements to the second species, and so on. A similar transformation is applied to the sikj elements to obtain the corresponding sik,j components of the S matrix. Given that, typically, there are more observations available than the number of major source categories (e.g., power plants, automobile exhaust, etc.) and that the model will not allow perfect agreement (i.e., emission perturbations cannot bring all predictions into agreement with all observations), there will be some residual error. We find the amount each source strength should be adjusted to minimize the error in a least-squares sense. It should be recognized that the sensitivities will not remain constant as the emission strengths are varied, so iteration, with updating of the sensitivity coefficients, is performed to achieve a stable response.

Additional information about the measurements and about known, or presumed, properties of the emissions can be incorporated in the analysis (22). This additional information can recognize, for example, uncertainties in the data gathering and characteristics of the monitoring network, and it is built into the model by weighting the observations. Here we identified five factors that can be used to weight the observations: (1) the number of valid data points for each observed species, (2) the known (or presumed) error of each measurement, (3) the relative deviation of each predicted value with respect to its corresponding measurement, and the relative value of each measurement with respect to (4) the spatial mean of the same species at the same hour and (5) the temporal mean of the same species at the same location. These factors are introduced into the model by means of a diagonal weighting matrix We. Each element of We (ωik) is computed as

$$\omega_{ik} = \omega_{ik}^{\langle N \rangle} \cdot \omega_{ik}^{\langle c \rangle} \cdot \omega_{ik}^{\langle co \rangle} \cdot \omega_{ik}^{\langle xy \rangle} \cdot \omega_{ik}^{\langle t \rangle} \qquad (4)$$

where ωik〈N〉 is a weighting factor that takes into account the number of point measurements available for species i, ωik〈c〉 is the weighting factor due to uncertainties in the observations, ωik〈co〉 is the weighting factor due to the difference between simulated and observed concentrations, ωik〈xy〉 is the weighting factor for the location of the observation, and ωik〈t〉 is the weighting factor for the time of the observation. Additionally, a scaling factor is added to each sensitivity coefficient:

$$g_{ikj} = \omega_{j}^{\langle E \rangle} \cdot s_{ikj} \qquad (5)$$

This factor is used to over-relax the emission adjustments of those sources that are suspected of being highly uncertain and are likely to change the most in the assimilation process. The form of the scaling factor and further comments on its purpose are described in detail in the Supporting Information. In essence, the scaling factors transform the S matrix into a (K·I × J) G matrix with elements gik,j. This transformation also redefines eq 3 to be d = Gm + e, where m is the unknown vector of emission rate scaling factors obtained from using the scaled sensitivity coefficients. The weighting factors ωik〈N〉, ωik〈c〉, ωik〈co〉, ωik〈xy〉, ωik〈t〉, and ωj〈E〉 can be assigned values following the formulations found in the Supporting Information or can be suppressed by assigning them a value of one. It has to be acknowledged that the weighting factors ωik〈co〉, ωik〈xy〉, and ωik〈t〉 are not typically used in standard regression calculations; only the ωik〈N〉 and ωik〈c〉 weights have a statistical justification. The three additional factors were developed with the intention of accelerating convergence of the iterative assimilation process. The approach is to apply the three additional weights (or some combination of them) during the first one or two iterations and then to retain only the weights that take into account the error in the observations and the number of measurements for the remaining iterations.


Some of the experiments conducted here had the aim of determining whether the weights helped achieve faster convergence and, if so, the appropriate number of iterations over which to apply them. The effects of these factors on the stability of the solutions obtained were also investigated.

Here, our approach is to minimize an objective function, Γ, based on the square of the weighted error between the predicted and observed concentrations, i.e., Γ = eᵀWe e. Minimizing this objective function yields

$$m = (G^{T} W_e G)^{-1} G^{T} W_e\, d \qquad (6)$$

The above formulation implies that the method is fully observation-driven, i.e., the weighted difference between observations and predictions completely drives the values of the estimated emission changes. However, there is no assurance that after applying eq 6 the emission change estimates will be physically meaningful. These estimates have to fulfill certain conditions. First, no change can be less than -100% (i.e., emissions cannot be negative). A more restrictive condition is that the new emissions should lie inside a range defined by uncertainty limits of the base emission estimates. For example, uncertainties in emission estimates are often represented by a log-normal distribution. Assuming that the baseline inventory is located at the median of the distribution, a confidence interval of the emission estimate covers the range from 1/σg^n to σg^n, where σg is the geometric standard deviation and n depends on the confidence interval selected (e.g., for a 99.9% CI, n = 3.3). One can select a value of n for each source to define the range within which the new estimates should lie. Here we define a penalty function that allows us to control the assimilation process by constraining the scaling factors to be within prescribed bounds. The penalty function is represented by a (J × J) diagonal matrix Wm with parameters (positive constants) wjj on the diagonal. Note that when this matrix is incorporated into the linear system of eq 6, it modifies the method from being fully observation-driven to a mixed EBM-OBM approach. The introduction of the penalty function is accomplished by defining a new objective function

$$\Gamma = e^{T} W_e\, e + m^{T} W_m\, m \qquad (7)$$

This form allows weighting the relative importance of the EBM and OBM parts in the calculation of the emission scaling factors. The first term tries to explain the data, while the second term constrains the solution to be of minimum length. For wjj = 0, the error depends only on the weighted difference between the observed and simulated concentrations and does not incorporate any penalty for how much the emissions were perturbed. Alternatively, as wjj → ∞, the differences between the observed and simulated concentrations are not factored in; this limit is exactly the EBM application. The least-squares estimator derived from minimizing the objective function in eq 7 is

$$m = (G^{T} W_e G + W_m)^{-1} G^{T} W_e\, d \qquad (8)$$

When all elements of Wm are identical, Hoerl and Kennard's (23) ridge regression (RR) estimator is obtained (i.e., Wm = λI, where λ is known as the ridge parameter and I is the identity matrix). The estimator in eq 8 is known as a "biased" estimator, which can give more stable predictions than ordinary least squares when there are few or noisy data or when the generalized inverse is highly ill-conditioned (24).


Reviews of the properties of ridge regression can be found elsewhere (21, 25). Of note, the use of ridge regression has also proven to help convergence in the iterative estimation of nonlinear parameters (26). (See the Supporting Information for additional comments on convergence.)

An important issue now is to determine how to obtain a good value for each wjj that constrains the estimated emission changes within the prescribed bounds discussed above. The optimal value of Wm depends on the characteristics of the problem under consideration, the quality of the data, and the adequacy of the problem. If Hoerl and Kennard's RR estimator is used, a number of methods exist to estimate the ridge parameter (λ) [e.g., refs 23, 27-30]. In general, the differences between the regression estimates obtained using different methods to calculate λ are not large since the regressors are usually stable over the range of possible values of λ (31). Here a stepwise approach to calculating Wm is taken. First, the ridge parameter λ is estimated based on the work of Hoerl et al. (32). Then, a length parameter (l), which expands or shrinks the RR solution vector, is computed (24). The length parameter retains the direction of the RR estimator but modifies its length if necessary. The need to further shrink (or enlarge) the correction vector beyond what is suggested by simple ridge regression (using only the ridge parameter) is a function of the uncertainty in the correction vector itself. This brings into the derivation of the correction estimates information from both the data and the estimates themselves. If the solution given by the computation of the purely stochastic parameters (λ and l) lies outside the uncertainty limits prescribed for a given source, a further penalization is applied by increasing the corresponding value of the diagonal element of Wm. Details of this calculation can be found in the Supporting Information.

The method as described here makes no assumption about the temporal window selected to conduct the assimilation. The smallest window that can be used is defined by the resolution of the observational data. Thus, assimilation can occur for hourly, daily, or episodic emission estimate corrections.
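To make the sequence of eqs 4-8 concrete, the following is a minimal numerical sketch (in Python/NumPy) of a single assimilation update: observation weights are combined as in eq 4, the sensitivities are scaled as in eq 5, and the emission scaling factors are obtained from the penalized estimator of eq 8. The function name, the array layout, the simple Hoerl-Kennard-type choice of λ, and the interpretation of m_j as a fractional emission change are illustrative assumptions of this sketch, not the exact implementation coupled to the CIT/DDM-3D model.

```python
import numpy as np

def assimilation_update(S, d, w_obs, w_src, lam=None, bounds=None):
    """One FDDA update: m = (G^T We G + Wm)^-1 G^T We d  (eq 8).

    S      : (K*I, J) array of sensitivity coefficients s_ik,j
    d      : (K*I,)   residuals O_ik - P_ik (eq 3)
    w_obs  : (K*I,)   combined observation weights omega_ik (eq 4)
    w_src  : (J,)     emission scaling weights omega_j<E> (eq 5)
    lam    : ridge parameter; if None, a Hoerl-Kennard-type value is estimated
    bounds : (J, 2)   optional lower/upper limits on the implied scaling factors
    """
    G = S * w_src[np.newaxis, :]              # eq 5: g_ik,j = omega_j<E> * s_ik,j
    We = np.diag(w_obs)

    A = G.T @ We @ G                          # normal-equation matrix
    b = G.T @ We @ d
    m_ols = np.linalg.solve(A, b)             # eq 6 (no penalty)

    if lam is None:
        # Hoerl-Kennard-style ridge parameter, lam = J * s^2 / (m'm), using the
        # weighted residual variance s^2 of the unpenalized fit (an assumed,
        # simplified variant of the estimator cited in the text).
        resid = d - G @ m_ols
        dof = max(len(d) - G.shape[1], 1)
        s2 = float(resid @ We @ resid) / dof
        lam = G.shape[1] * s2 / float(m_ols @ m_ols)

    Wm = lam * np.eye(G.shape[1])             # penalty matrix; here Wm = lam * I
    m = np.linalg.solve(A + Wm, b)            # eq 8

    if bounds is not None:
        # m_j is interpreted here as a fractional change from the base inventory,
        # so the implied scaling is 1 + m_j; sources falling outside their
        # prescribed uncertainty range are flagged (in the approach above, the
        # corresponding w_jj would then be increased and the solve repeated).
        scaling = 1.0 + m
        outside = (scaling < bounds[:, 0]) | (scaling > bounds[:, 1])
        return m, outside
    return m, None
```

In an actual application this update sits inside the iterative loop described above: after each solve the emission fields are rescaled, the AQM and DDM-3D sensitivity fields are recomputed, and the update is repeated until the scaling factors stabilize (three or four iterations in the tests reported below).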

3. FDDA Application to Atlanta

3.1. Modeling Domain and Testing Approach. The FDDA method was tested using a constructed data set developed for the purpose of exploring the limitations of the method where a "known" solution exists. By known we mean that a predefined perturbation is applied to the base emissions inventory to generate predictions that serve as observations (or pseudo-observations) in the assimilation process. The objective is to see if the FDDA method can identify the predefined perturbation applied. In essence, we construct a data set where other model limitations are removed from impacting the analysis. Further, the use of the constructed data set allows a more guided exploration of what type of weighting scheme provides the most rapid convergence. The historical episode of August 9-10, 1992, of the Southern Oxidants Study (SOS) 1992 Atlanta Intensive (33) is used as the basis for deriving the constructed data set. Here, the CIT model is run with known perturbations in one or more source categories, and the model-derived concentrations computed at the locations where equivalent observations were made as part of the SOS study are taken as the observations used in the assimilation process. Thus, the pseudo-observations retain the temporal and spatial distribution of the actual observations. The CIT model is then run with the base emission inventory, and the FDDA module is employed to predict the changes required in the emissions to recover the results from the perturbed case. The FDDA module is expected to estimate emission corrections close to the known changes applied. This procedure helps assess the performance of the assimilation technique under controlled conditions. In subsequent work, ambient data from the SOS field study are used.
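The logic of this twin-experiment test can be illustrated with a small self-contained sketch in which a toy, mildly nonlinear forward model stands in for the CIT model (an assumption made purely so the example runs on its own): a known perturbation applied to two sources generates the pseudo-observations, and the base run is then adjusted iteratively using finite-difference sensitivities until the known scaling is recovered.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy forward model standing in for the AQM: concentrations respond almost
# linearly to the two source strengths, with a small quadratic term so that
# the sensitivities change as emissions are adjusted (which is why the
# assimilation has to iterate).
A = rng.uniform(0.2, 1.0, size=(30, 2))        # 30 "observations", 2 sources

def forward(e):
    c = A @ e
    return c + 0.02 * c**2

e_base = np.array([10.0, 10.0])                # base emission strengths
true_scaling = np.array([2.0, 0.5])            # the "known" perturbation
pseudo_obs = forward(e_base * true_scaling)    # pseudo-observations

e = e_base.copy()
for it in range(4):
    pred = forward(e)
    # Finite-difference sensitivities to a fractional change in each source
    # (standing in for the DDM-3D direct sensitivities).
    S = np.column_stack([
        (forward(e * (1.0 + 0.01 * (np.arange(2) == j))) - pred) / 0.01
        for j in range(2)
    ])
    d = pseudo_obs - pred                      # residuals, as in eq 3
    mu = np.linalg.lstsq(S, d, rcond=None)[0]  # unweighted least-squares adjustment
    e = e * (1.0 + mu)                         # rescale emissions and iterate
    print(f"iteration {it + 1}: recovered scaling = {np.round(e / e_base, 3)}")
# The recovered scaling approaches the known values (2.0, 0.5) within a few iterations.
```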

FIGURE 1. Location of monitoring stations during the SOS Atlanta Intensive within the modeling domain and the species observed at each station (N: nitrogen-containing species; C: carbonyls): (a) O3; (b) O3, N, VOCs; (c) VOCs; (d) O3, VOCs; (e) O3, N, C; (f) O3, C; (g) O3, N, CO, C; (h) O3, N, CO, VOCs, C. Thin black lines represent county boundaries, and thick blue lines represent major interstates and highways. Also shown is the location of the modeling domain relative to the State of Georgia in the southeastern United States.

The modeling domain is the same as that previously applied to this episode (12). The horizontal structure of the computational grid is a 40 by 40 array of 4 × 4 km square cells (Figure 1). Details on model inputs and conditions during the episode being modeled can be found elsewhere (12, 13).

3.2. Observational Data and Test Scenarios Description. Five test scenarios were investigated, each with a different mix of observed species assimilated and emission sources analyzed. Species include O3, NO, NO2, peroxyacetyl nitrate (PAN), NOy (NOx + other inorganic and organic nitrates), and VOCs. Only a subset of the measured VOC species was used for testing purposes. The VOCs used were as follows: benzene, butane, ethane, ethene, ethylbenzene, isoprene, 2-methylpentane, 1,2,4-trimethylpentane, propene, toluene, o-xylene, and carbonyls (formaldehyde + acetaldehyde). At each iteration of the assimilation process, several combinations of the weighting factors were tested and evaluated for their impact on the convergence and stability of the estimated emission changes. The main combinations were as follows: ωik(nw) = 1.0 (i.e., no weights applied), ωik(a) = ωik〈t〉, ωik(b) = ωik〈xy〉, ωik(c) = ωik〈co〉, and ωik(d) = ωik〈t〉·ωik〈xy〉·ωik〈co〉. In all cases the ωik〈N〉 factor was applied. Following is a description of each of the five test scenarios. In the descriptions, mobile sources are represented by M, area sources by A, point sources by P, and biogenic sources by B; thus, CO(M) represents CO emissions from mobile sources.

Case I: CO Pseudo-Sources. Two single-cell area pseudo-sources with baseline emissions of 10 metric tons of CO per hour were added to the base emissions inventory. The first source, CO(1S), was located in cell (7, 27), and the second source, CO(2S), was located in cell (23, 27). All hourly emissions from CO(1S) were doubled, and CO(2S) emissions were halved to generate the pseudo-observations. Assimilation was performed for all CO observations in the episode; thus, a single emission estimate change per source for the whole episode was computed. No uncertainty weights were applied (i.e., ωik〈c〉 = ωj〈E〉 = 1), and the penalty function was disabled (Wm = 0).

Case II: CO Area and Mobile Sources. Case II was designed to compare the results obtained by the FDDA module against

the results of Chang et al. (12). CO(A) emissions were scaled down by 60%, and CO(M) emissions were increased by 50%. In addition, hourly mobile emissions from the grid cell where the Georgia Tech station is located [cell CO(2C)] were swapped with the emissions from the adjacent cell to the right [cell CO(1C)], and the resulting emissions were multiplied by 50%. Again, no uncertainty weights were applied, but the penalty function was used.

Case III: Mobile Source CO and NOx and Biogenic Isoprene. One of the relevant properties of the method proposed here is its ability to treat multiple sources at a time while maintaining a stable response. Here, NOx(M) emissions were scaled by 2.5, CO(M) by 3.0, and biogenic isoprene [ISOP(B)] by 4.0 times their base values. The observation species used were CO, NO, NO2, O3, PAN, and isoprene.

Case IV: Mobile Source CO and NOx, Area Source VOCs, and Biogenic Isoprene. Here, in addition to NOx(M), CO(M), and ISOP(B) being scaled in the same way as in Case III, VOCs from area sources were increased by 75%. Two applications were investigated, each using a different set of observation species. Case IVa used NO, NO2, O3, CO, PAN, and isoprene as observation species; Case IVb used all observed species (inorganic and organic) in the data set. The objective here is to show the contribution of primary and secondary species in constraining the solution for the VOC category, since in many cases VOC data are sparse.

Case V: Noisy Pseudo-Observations. In the previous test scenarios, the pseudo-observations used were perfect. Real observations have intrinsic errors due to the measurement techniques. Further, the measurements are point observations, while the model simulates volume averages. To test how these effects can impact the technique, a certain degree of noise was applied to the pseudo-observations, and the FDDA approach was then used to find the correct emissions with these "noisy" observations. As noted previously, in real applications there are also differences between model simulations and observations due to other uncertainties in the model formulation, inputs, and parameters. These additional uncertainties were not factored in, in order to better demonstrate the details of the technique.

In this test case the model was first run using the base case emissions with no perturbations. Then, a noise component, derived from the sampling of random variables where each variable corresponded to a particular species, was added to the model-simulated concentrations. The noise components were assumed to come from a normal distribution. Coefficients of variation (CV) for each species were taken from the literature (Table 1), and independent realizations from standard normal distributions were obtained using a Latin hypercube sampling technique (37) and scaled with the appropriate CV to obtain each noise component. In the assimilation process, the ωik〈c〉 weighting factor [computed as ωik〈c〉 = 1/(σik〈o〉)², where σik〈o〉 is the measurement standard deviation] was enabled. The σik〈o〉 for each species was assumed to have the same value, computed as the product of the CV of species i and the pseudo-observed episode mean of the same species. Four independent sets of pseudo-observations were created so that four different runs starting with different source perturbations could be analyzed. The runs were labeled N1 through N4. Table 1 lists the values of σg,j〈E〉 used to define each gik,j [gik,j = sik,j/(σg,j〈E〉)²] and the bounds established for each source [e.g., final NOx(A,M) emissions cannot be less than 0.51 times the original emissions, nor more than 1.95 times the original emissions]. Table 2 lists the sources used and the initial scaling applied for each of the four test runs.
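A minimal sketch of this noise-generation step is given below. The Latin hypercube routine is a basic textbook construction, not the FORTRAN sampler of ref 37, the coefficients of variation are a few of the Table 1 values, and the pseudo-observation arrays are random placeholders; only the structure of the calculation (species-wise σ from the episode mean, additive normal noise, and ωik〈c〉 = 1/σ² weights) follows the description above.

```python
import numpy as np
from scipy.stats import norm

def latin_hypercube_normal(n, rng):
    """n standard-normal samples from a basic Latin hypercube: one uniform draw
    per equal-probability stratum, randomly permuted, mapped through the normal
    inverse CDF."""
    u = (rng.permutation(n) + rng.uniform(size=n)) / n
    return norm.ppf(u)

rng = np.random.default_rng(42)

# Coefficients of variation (fraction of the episode mean) for a few species,
# as listed in Table 1.
cv = {"O3": 0.05, "NO": 0.10, "NO2": 0.20, "CO": 0.15, "PAN": 0.20}

# pseudo_obs[s] would hold the model-derived "observations" of species s at the
# monitor locations and hours; random placeholders are used here.
pseudo_obs = {s: rng.uniform(1.0, 50.0, size=200) for s in cv}

noisy_obs, w_c = {}, {}
for s, obs in pseudo_obs.items():
    sigma = cv[s] * obs.mean()                  # one sigma per species, from the
    z = latin_hypercube_normal(obs.size, rng)   # pseudo-observed episode mean
    noisy_obs[s] = obs + sigma * z              # additive normal noise component
    w_c[s] = 1.0 / sigma**2                     # weight omega_ik<c> = 1 / sigma^2
```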

TABLE 1. Uncertainty Measures for Observation Species and Emission Source Categories

  species          coeff of variation (%)
  NO^c             10
  NO2^c            20
  NOx^c            20
  NOy^c            20
  O3^c             5
  carbonyls^d      8
  other VOCs^d     35
  PAN^d            20
  CO^c             15

  emission source^a    geometric SD    99.9% CI^b
  NOx(A,M)^e           1.22            (0.51, 1.95)
  VOC(A,M)^e           1.49            (0.27, 3.73)
  CO(A,M)^e            1.28            (0.44, 2.26)
  NOx(P)^e             1.16            (0.61, 1.63)
  ISOP(B)^f            1.92            (0.11, 8.60)
  OVOC(B)^f            1.34            (0.38, 2.60)

  ^a A: area sources, B: biogenic sources, M: mobile sources, P: point sources. ^b Values as scaling factors of the original emission strength. ^c Reference 2 and references therein. ^d Reference 34 and references therein. ^e Reference 35. ^f Reference 36.

TABLE 2. Initial Scaling Factors Applied to the Baseline Inventory for Each Emission Source Category under the Test with Noisy Pseudo-Observations

  source      N1 run    N2 run    N3 run    N4 run
  NOx(A)      0.725     1.474     0.590     1.386
  VOC(A)      1.815     0.495     2.695     0.385
  CO(A)       0.620     1.800     0.680     1.445
  NOx(M)      0.720     0.685     1.461     1.387
  VOC(M)      1.735     0.490     0.400     2.180
  CO(M)       2.180     0.400     0.490     1.735
  ISOP(B)     0.130     0.390     2.990     5.330
  NOx(P)      1.317     0.697     1.255     0.790

TABLE 3. History of Episodic Estimated Emissions Perturbations for the CO(1S) Area Pseudo-Source^a

  iteration    ωik(nw)    ωik(a)    ωik(b)    ωik(c)    ωik(d)
  1            1.768      1.794     1.786     1.786     1.831
  2            1.947      1.951     1.952     1.959     1.969
  3            1.989      1.991     1.992     2.011     2.022

  ^a A value of 2.0 is the correct scaling. Values in bold represent the scaling factor obtained from the weighting scheme that gave the lowest total variance (see text for details); this factor is used to scale the emissions for the next iteration.

TABLE 4. History of Episodic Estimated Emissions Perturbations for CO(M) Emissions under the Case II Test^a

  iteration    ωik(nw)    ωik(a)    ωik(b)    ωik(c)    ωik(d)
  1            1.387      1.390     1.392     1.399     1.403
  2            1.492      1.491     1.492     1.488     1.485
  3            1.501      1.500     1.499     1.508     1.509
  4            1.500      1.497     1.498     1.488     1.480

  ^a A value of 1.5 is the correct scaling. See Table 3 for details.

4. Results

In this section the results obtained for each test scenario described in the previous section are presented. The relevant points reported are the method's ability to estimate the known perturbation applied to derive the pseudo-observations, the stability of the estimates, and the speed of convergence. In all cases the weighting schemes ωik(nw), ωik(a), ωik(b), ωik(c), and ωik(d) were tested at each iteration in the assimilation process, and the emission scaling factors used in subsequent iterations were determined by the weighting scheme that gave the lowest total variance of the emission change estimates (i.e., the scheme for which the sum of the variances of the individual estimates is a minimum). For the test with noisy pseudo-observations, uncertainty bounds on the emission change estimates were computed.
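The rule used to choose among the weighting schemes at each iteration — retain the scheme whose emission-change estimates have the lowest total variance — can be sketched as follows. The covariance expression used to rank the schemes is the standard one for the penalized weighted least-squares estimator of eq 8 under the assumption Cov(d) ∝ We⁻¹; it is an assumption of this illustration rather than a formula quoted from the paper.

```python
import numpy as np

def estimate_and_total_variance(G, d, w_obs, Wm):
    """Return the eq 8 estimate and the sum of the variances of its components,
    assuming Cov(d) = s^2 * diag(1 / w_obs), with s^2 the weighted residual
    variance of the fit."""
    We = np.diag(w_obs)
    A_inv = np.linalg.inv(G.T @ We @ G + Wm)
    m = A_inv @ G.T @ We @ d
    resid = d - G @ m
    dof = max(len(d) - G.shape[1], 1)
    s2 = float(resid @ We @ resid) / dof
    cov = s2 * A_inv @ (G.T @ We @ G) @ A_inv
    return m, float(np.trace(cov))

def pick_weighting_scheme(G, d, schemes, Wm):
    """schemes maps a label ('nw', 'a', 'b', 'c', 'd') to the combined observation
    weight vector for that scheme; the label and estimate with the lowest total
    variance are returned and used to scale the emissions for the next iteration."""
    results = {name: estimate_and_total_variance(G, d, w, Wm)
               for name, w in schemes.items()}
    best = min(results, key=lambda name: results[name][1])
    return best, results[best][0]
```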


Table 3 presents the assimilation results for Case I. The values listed correspond to the emission adjustments predicted after each iteration step. The scaling factor obtained from the weighting scheme that gave the lowest total variance of the regressors is highlighted in bold, and this estimate was the one used to scale the emissions for the next iteration. There were small differences between the solutions provided by the different combinations of weighting factors in this application. The best solution (i.e., closest to the real perturbation) for both pseudo-sources was found at the first iteration using the option with all the weights (ωik(d)), at the second iteration using ωik(d) or ωik(c), and at the final step the ωik(nw), ωik(a), and ωik(b) combinations gave approximately the same result. However, the solutions from the ωik(c) and ωik(d) options tended to diverge in the third assimilation step. After three iterations the estimated emissions for both sources were within 1.5% of the correct value for all weight combinations.

Results from Case II are summarized in Table 4. The table presents the iterative solution found for the CO(M) category. After three iterations the episode emissions were scaled by the correct amount. The scheme that used all the weights (ωik(d)) gave the best result in the first iteration, but applying all the weights (and the difference-in-concentration weighting factor, ωik(c)) at further iterations degraded the solution. Similar results were obtained for the CO(A) category. Figure 2 presents the hourly emissions for the two swapped cells after six iterations. At the third iteration the estimated and correct emissions for cell CO(1C) were within 3%, on average, and at the sixth iteration the estimated and correct emissions for cell CO(2C) were within 5%, on average. Values of the wjj components of the penalty function were on average 0.1 for the CO(1C) and CO(2C) sources and were set to zero for the CO(A) and CO(M) sources.

In Case III we investigate the response of the method when treating reactive species. Table 5 presents the assimilation history for NOx(M) and ISOP(B). In general, applying all weights in the first assimilation led to more rapid convergence, and using the temporal (ωik(a)) and spatial (ωik(b)) factors in the second iteration helped.

FIGURE 2. Hourly estimated emission perturbations for the CO emissions of two adjacent grid cells, CO(1C) and CO(2C), after six iterations. The “baseline” lines represent the correct scaling.

TABLE 5. History of Episodic Estimated Emissions Perturbations for NOx(M) and ISOP(B) under the Case III Test^a

  source     iteration    ωik(nw)    ωik(a)    ωik(b)    ωik(c)    ωik(d)
  NOx(M)     1            2.021      2.039     2.067     1.920     2.002
  NOx(M)     2            2.341      2.347     2.338     2.345     2.314
  NOx(M)     3            2.446      2.445     2.444     2.448     2.381
  ISOP(B)    1            2.854      3.322     2.616     2.411     3.492
  ISOP(B)    2            3.741      3.864     3.706     3.680     3.900
  ISOP(B)    3            3.971      3.991     3.989     3.920     3.808

  ^a The correct scalings are 2.5 for NOx(M) and 4.0 for ISOP(B). See Table 3 for details.

TABLE 6. History of Episodic Estimated Emissions Perturbations for VOC(A) Emissions Using No Observations of VOC Species^a

  iteration    ωik(nw)    ωik(ab)    ωik(c)    ωik(d)
  1            0.980      1.257      0.473     1.426
  2            1.483      1.538      1.378     1.547
  3            1.613      1.624      1.615     1.628
  4            1.680      1.685      1.685     1.669
  5            1.704      1.706      1.694     1.693
  6            1.715      1.713      1.608     1.508

  ^a The correct scaling is 1.75. See Table 3 for details.

After that, the weighting factors do little to improve the convergence of the solution (and in some cases they degrade it). In Case IV a lumped VOC source class was used to contrast the performance of the method when using explicit (e.g., isoprene) or lumped VOCs. Table 6 presents the results for VOC(A) under Case IVa conditions, and Figure 3 compares the results from Case IVa and Case IVb. The FDDA module was able to determine the correct emission change required in both cases, but it took twice as many iterations for the assimilation process that did not use additional VOC species. Note that the spatial and temporal factors were combined (ωik(ab) = ωik〈t〉·ωik〈xy〉) since in previous tests both factors responded similarly when used independently.

Finally, the noisy pseudo-observations scenario provides a means to test the method under more realistic conditions. Table 7 presents the final estimates for each of the four independent runs performed. In all cases, the estimates are within 30% of their correct value, and the majority of the estimates are within 20%. The highest error was calculated for the CO sources, probably an indication of the low spatial resolution of the observation field for this species (only four sites measured CO).

FIGURE 3. Comparison of episodic estimated emission perturbations for VOC(A) sources using observations of VOC species and using no observations of VOC species. The correct scaling is 1.75.

TABLE 7. Final Inventory Fractions (with Respect to the Baseline Inventory) Estimated for Each Emission Source Category (Estimate ± 1 SD Relative to the Mean) under the Test with Noisy Pseudo-Observations^c

  source      N1 run^a       N2 run^a       N3 run^b       N4 run^b
  NOx(A)      1.08 ± 0.01    1.09 ± 0.02    0.75 ± 0.02    1.21 ± 0.01
  VOC(A)      0.92 ± 0.02    1.01 ± 0.05    0.80 ± 0.05    1.04 ± 0.03
  CO(A)       0.81 ± 0.09    1.33 ± 0.29    1.16 ± 0.23    0.98 ± 0.15
  NOx(M)      0.89 ± 0.01    0.99 ± 0.01    1.12 ± 0.01    0.93 ± 0.01
  VOC(M)      0.99 ± 0.02    1.09 ± 0.07    1.20 ± 0.06    1.22 ± 0.04
  CO(M)       1.07 ± 0.03    0.90 ± 0.01    0.95 ± 0.07    0.97 ± 0.05
  ISOP(B)     1.02 ± 0.03    0.91 ± 0.10    0.99 ± 0.08    0.88 ± 0.05
  NOx(P)      1.04 ± 0.01    0.94 ± 0.03    1.05 ± 0.04    0.86 ± 0.02

  ^a After 4 iterations. ^b After 3 iterations. ^c A value of 1.0 represents the baseline inventory.

TABLE 8. Statistical Model Performance Evaluation Results for Each Species Used in the Assimilation Process for the Test with Noisy Pseudo-Observations for the Second Day of the Simulation

               -------- normalized error (%)^a --------    --------- absolute bias (%)^b ---------
  species      after 1st iter.  after final iter.  real^c   after 1st iter.  after final iter.  real^c
  CO           29.5             16.2               16.0     23.1             2.6                2.5
  O3           12.9             5.8                5.8      9.7              1.8                1.8
  NO           32.0             18.8               18.1     6.7              1.6                0.5
  NO2          24.5             18.9               19.0     14.6             4.6                5.2
  PAN          87.2             25.5               24.1     78.5             21.0               14.9
  isoprene     196.0            20.2               19.7     196.0            7.8                7.4
  toluene      60.5             21.0               20.6     54.0             6.7                4.4
  ethene       56.1             23.8               23.6     53.1             10.0               6.5
  carbonyls    61.6             7.0                6.8      59.0             1.2                1.1

  ^a Average normalized error of the four runs with noisy pseudo-observations (N1-N4). ^b Average absolute bias of the four runs with noisy pseudo-observations (N1-N4). ^c Model concentrations obtained with the base case inventory.

Table 8 presents model performance statistics for the four runs. The comparison is between the simulated concentrations and the noisy pseudo-observations. In addition, the comparison between the base case simulation ("real" emissions) and the noisy pseudo-observations is also provided. On average, the predictions obtained with the FDDA-derived emissions compare closely to the predictions obtained with the base case emissions. The normalized error of the concentrations derived from the simulation with FDDA-estimated emissions is close to the noise applied to the pseudo-observations, i.e., the FDDA module would not be expected to resolve any emission changes when observation and prediction errors are within the measurement errors.
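For reference, statistics of the kind reported in Table 8 can be computed from the paired hourly values with the usual definitions; the sketch below assumes the standard mean normalized gross error and mean normalized bias, with the table's "absolute bias" interpreted as the absolute value of the latter.

```python
import numpy as np

def normalized_error(pred, obs):
    """Mean normalized gross error (%): average of |P - O| / O over paired values."""
    pred, obs = np.asarray(pred, dtype=float), np.asarray(obs, dtype=float)
    return 100.0 * np.mean(np.abs(pred - obs) / obs)

def normalized_bias(pred, obs):
    """Mean normalized bias (%): average of (P - O) / O over paired values."""
    pred, obs = np.asarray(pred, dtype=float), np.asarray(obs, dtype=float)
    return 100.0 * np.mean((pred - obs) / obs)

# Example with placeholder hourly values for one species:
obs = np.array([40.0, 55.0, 62.0, 48.0])
pred = np.array([38.0, 60.0, 58.0, 50.0])
print(round(normalized_error(pred, obs), 1), round(abs(normalized_bias(pred, obs)), 1))
```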


Further information on the behavior of the components of the penalty function (Wm) and their significance can be found in the Supporting Information.

5. Discussion

Five different scenarios were used to test the FDDA approach developed here. All test cases were conducted with constructed data sets, or pseudo-observations. In all cases the FDDA method was able to identify the predefined perturbation used to scale the different emission source categories. Results indicate that three or four iterations are sufficient to obtain a stable response in the assimilation process. The spatial (ωik〈xy〉), temporal (ωik〈t〉), and difference-in-concentration (ωik〈co〉) weighting factors, and different combinations of these three, were tested for their ability to accelerate the convergence of the process. The combination of all three factors (ωik(d)) accelerated the convergence when applied in the first assimilation step, and in the second step the combinations ωik(a) and ωik(b) gave similar results. After two iterations, the effect of the weighting factors on accelerating the convergence is minimal, and using the ωik〈co〉 and ωik(d) factors introduced divergence from the expected solution. Thus, after two iterations it is suggested that only the weights commonly used in regression analysis (i.e., weighting by the uncertainty in the observations and by the number of valid data points) be retained.

Testing with noisy pseudo-observations allows one to observe the response of the model under more realistic working conditions. The normalized error between the simulated and observed concentrations for each species after the assimilation process was similar to the relative magnitude of the noise component added to the corresponding species, indicating that the FDDA approach cannot bring agreement between the simulated and observed concentration fields beyond the noise in the observed field (i.e., there is no way of recovering information about a source that is lost in the noise of the data). However, as noted by Brown (7), high-resolution inversions can still be obtained by introducing additional information in order to constrain the deduced emissions. This is accomplished here by the weighting factors and the penalty function. Even though an error bound (expressed as ±1σ) was given for the estimated emission changes in Table 7, it has to be noted that the distributions of the estimates are non-Gaussian since the inverse problem is nonlinear (22). Here the last iteration is used to estimate the variance-covariance matrix of the emission scaling factors, assuming that the problem is not overly nonlinear in the neighborhood of the solution (22). The error bounds also have to be treated with caution because the application of the penalty function biases the solution to some extent and reduces the variance of each estimate. This effect has to be taken into account when analyzing the error bounds.

In all but one test, the assimilation was done with an assimilation window that covered the two-day episode of the application. The additional test assimilated the observations in an hourly fashion. The window of assimilation selected depends on the application and the interpretation that is sought; in some instances it might be of interest to compare daily estimate changes, whereas in other cases greater temporal resolution might be of concern. Additionally, greater resolution in the sources can be specified, either by treating different VOC species explicitly or by differentiating some categories spatially (e.g., mobile source NOx emissions in different quadrants of a domain).

An important issue in the ability to generate reliable corrections to emission estimates is the need for a good observational data set.
The quality of the data used in the assimilation, as well as their spatial and temporal resolution, is key to the confidence in the FDDA outputs. Here, the response of the FDDA approach was satisfactory under controlled conditions.


It is still necessary to test the method with real data under a variety of conditions to obtain confidence in its applicability.

One of the outputs of the FDDA module is how much each source should be modified to maximize model performance for all species being assimilated. The scaling used to optimize model performance contains information on the bias in the emission strengths of individual sources. However, it is important to recognize that the scaling value should be used cautiously. The sensitivity of model results to the scaling factor is also important since, if the sensitivity is small, the degree of scaling can be magnified by uncertainties in the measurements or by modeling and emissions inventory errors. Increased confidence in the FDDA application can be obtained by having two separate inventories as starting points of the assimilation run. If both routes lead to similar results, then more solid conclusions on emissions inventory uncertainties and areas for improvement can be drawn. Further, this will identify the degree to which such improvements will impact source attribution results. For example, using the sensitivity coefficients derived from the FDDA application is a direct route to identifying the most effective control strategies. These issues will be addressed in a future paper where ambient observations are used in conjunction with current emissions inventories.

In this paper, the FDDA method was tested using observations and emissions of gas-phase species, though the approach is not limited to the gas phase. The incorporation of primary and secondary aerosols is straightforward once the AQM is capable of following the dynamics of these pollutants and the local sensitivity coefficients for particulate matter can be computed in an efficient manner. An additional assimilation step can also be incorporated into the technique: adding a perturbation concentration field (optimally interpolated) for each of the measured species to the simulated field to bring the simulated concentrations into complete agreement with the observations. This would be a further use of FDDA, though the resulting field would not obey mass conservation, and the source-receptor relationships would be clouded. However, one could argue, much in the spirit of OBMs, that this is the appropriate state from which to conduct the sensitivity of pollutant concentrations to changes in emissions. These issues are the subject of further work.

Finally, it has to be noted that a limitation of this technique is its inability to guide emission models in the development of emission projections for future years. The FDDA method proposed here can only identify problems with inventories developed for historical episodes. Further, in this paper we created a data set that addressed only errors in the observations, assuming that other processes (e.g., dry deposition) were error-free. Further studies need to be conducted to analyze uncertainties in other processes. In fact, the extension to estimating, for example, adjustments to initial conditions or dry deposition is straightforward given that DDM-3D can readily compute the sensitivities of concentrations to these (and other) parameters.

Acknowledgments

This work was supported in part by the National Science Foundation under contract no. BES-9613729 and by Georgia Power. A. Mendoza-Dominguez acknowledges the Consejo Nacional de Ciencia y Tecnologia (Mexico) for partial financial support during his research stay at Georgia Tech. The authors thank Michael E. Chang for providing the UAM-IV input files used in this work and one of the reviewers for comments that were valuable in improving the paper.

Supporting Information Available

Details on the coupling of DDM-3D with the CIT model, definitions of the weighting factors, additional comments on convergence, and Tables S1 (number of hourly observations used in the assimilation process) and S2 (values of λ and l at each assimilation step for the pseudo-data runs). This material is available free of charge via the Internet at http://pubs.acs.org.

Literature Cited

(1) Seinfeld, J. H. JAPCA 1988, 38, 616.
(2) National Research Council. Rethinking the ozone problem in urban and regional air pollution; National Academy Press: Washington, DC, 1991.
(3) Harley, R. A.; Russell, A. G.; McRae, G. J.; Cass, G. R.; Seinfeld, J. H. Environ. Sci. Technol. 1993, 27, 378.
(4) Harley, R. A.; Cass, G. R. Atmos. Environ. 1995, 29, 905.
(5) Cardelino, C. A.; Chameides, W. L. J. Air Waste Manage. Assoc. 1995, 45, 161.
(6) Chang, T. Y.; Chock, D. P.; Nance, B. I.; Winklers, S. L. Atmos. Environ. 1997, 31, 2787.
(7) Brown, M. J. Geophys. Res. 1993, 98(D7), 12639.
(8) Hartley, D.; Prinn, R. J. Geophys. Res. 1993, 98(D3), 5183.
(9) Haas-Laursen, D. E.; Hartley, D. E. J. Geophys. Res. 1996, 101(D17), 22823.
(10) Mulholland, M. Atmos. Environ. 1989, 23, 1443.
(11) Mulholland, M.; Seinfeld, J. H. Atmos. Environ. 1995, 29, 497.
(12) Chang, M. E.; Hartley, D. E.; Cardelino, C.; Haas-Laursen, D.; Chang, W.-L. J. Geophys. Res. 1997, 102(D13), 16023.
(13) Chang, M. E.; Hartley, D. E.; Cardelino, C.; Chang, W.-L. Geophys. Res. Lett. 1996, 23, 3007.
(14) Elbern, H.; Schmidt, H. J. Geophys. Res. 1999, 104(D15), 18583.
(15) Newsam, G. N.; Enting, I. G. Inverse Problems 1988, 4, 1037.
(16) Enting, I. G.; Newsam, G. N. J. Atmos. Chem. 1990, 11, 69.
(17) McRae, G. J.; Goodin, W.; Seinfeld, J. H. Atmos. Environ. 1982, 16, 679.
(18) Russell, A. G.; McCue, K. F.; Cass, G. R. Environ. Sci. Technol. 1988, 22, 263.
(19) Dunker, A. M. J. Chem. Phys. 1984, 81, 2385.
(20) Yang, Y.-J.; Wilkinson, J. G.; Russell, A. G. Environ. Sci. Technol. 1997, 31, 2859.
(21) Draper, N. R.; Van Nostrand, R. C. Technometrics 1979, 21, 451.
(22) Menke, W. Geophysical Data Analysis: Discrete Inverse Theory (International Geophysics Series, Vol. 45); Academic Press: San Diego, CA, 1989.
(23) Hoerl, A. E.; Kennard, R. W. Technometrics 1970, 12, 55.
(24) Aldrin, M. Computational Statistics Data Analysis 1997, 28, 377.
(25) Frank, I. E.; Friedman, J. H. Technometrics 1993, 35, 109.
(26) Marquardt, D. W. J. Soc. Indust. Appl. Math. 1963, 2, 431.
(27) Mallows, C. L. Technometrics 1973, 15, 661.
(28) Allen, D. M. Technometrics 1974, 13, 469.
(29) Golub, G. H.; Heath, M.; Wahba, G. Technometrics 1979, 21, 215.
(30) Vinod, H. D. J. Econometrics 1995, 68, 287.
(31) Marquardt, D. W.; Snee, R. D. Am. Statistician 1975, 29(1), 3.
(32) Hoerl, A. E.; Kennard, R. W. Commun. Statis.-Theor. Meth. 1976, A5(1), 77.
(33) Fleming Group. Data from the 1992 Atlanta Intensive, Southern Oxidants Study; Archive site, East Syracuse, NY, July 22-August 31, 1992.
(34) Sillman, S.; Al-Wali, K. I.; Marsik, F. J.; et al. Atmos. Environ. 1995, 29, 3055.
(35) Hanna, S. R.; Chang, J. C.; Fernau, M. E.; Hansen, D. A. Proceedings of the 22nd NATO/CCMS International Technical Meeting on Air Pollution Modeling and its Application; Plenum Press: New York, 1998; p 711.
(36) Geron, C. D.; Pierce, T. E.; Guenther, A. B. Atmos. Environ. 1995, 29, 1569.
(37) Iman, R. L.; Shortencarier, M. J. A FORTRAN77 Program and User's Guide for the Generation of Latin Hypercube and Random Samples for Use with Computer Models; NUREG/CR-3624; Sandia National Laboratories: Albuquerque, NM, March 1984.

Received for review September 9, 1999. Revised manuscript received September 18, 2000. Accepted September 22, 2000. ES991040+
