Subscriber access provided by University of Florida | Smathers Libraries
Article 3
Quantifying O impacts in urban areas due to wildfires using a Generalized Additive Model Xi Gong, Aaron S. Kaulfus, Udaysankar S Nair, and Daniel A. Jaffe Environ. Sci. Technol., Just Accepted Manuscript • DOI: 10.1021/acs.est.7b03130 • Publication Date (Web): 25 Oct 2017 Downloaded from http://pubs.acs.org on October 25, 2017
Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a free service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are accessible to all readers and citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.
Environmental Science & Technology is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.
Page 1 of 25
Environmental Science & Technology
1
1
Quantifying O3 impacts in urban areas due to wildfires using a
2
Generalized Additive Model
3 4 5 6 7 8 9 10 11 12 13 14 15 16
Xi Gong1,2, Aaron Kaulfus3, Udaysankar Nair3, Daniel A. Jaffe2,4* 1
School of Resource and Environmental Sciences, Wuhan University, Wuhan 430079, China School of Science, Technology, Engineering and Mathematics, University of Washington Bothell, 18115 Campus Way NE, Bothell, WA 98011, USA 3 Department of Atmospheric Sciences, University of Alabama-Huntsville, Huntsville, AL 35899, USA 4 Department of Atmospheric Sciences, University of Washington Seattle, Seattle, WA 98195, USA * Corresponding author (University of Washington,
[email protected]) 2
KEYWORDS. Ozone; Generalized Additive Model; Wildfires; Exceptional events
ACS Paragon Plus Environment
Environmental Science & Technology
Page 2 of 25
2
17
ABSTRACT
18
Wildfires emit O3 precursors but there are large variations in emissions, plume
19
heights and photochemical processing. These factors make it challenging to model O3
20
production from wildfires using Eulerian models. Here we describe a statistical approach
21
to characterize the maximum daily 8-hour average O3 (MDA8) for 8 cities in the U.S. for
22
typical, non-fire, conditions. The statistical model represents between 35-81% of the
23
variance in MDA8 for each city. We then examine the residual from the model under
24
conditions with elevated particulate matter (PM) and satellite observed smoke (“smoke
25
days”). For these days, the residuals are elevated by an average of 3-8 ppb (MDA8)
26
compared to non-smoke days. We found that while smoke days are only 4.1% of all days
27
(May-Sept) they are 19% of days with an MDA8 greater than 75 ppb. We also show that
28
a published method that does not account for transport patterns gives rise to large over-
29
estimates in the amount of O3 from fires, particularly for coastal cities. Finally, we apply
30
this method to a case study from August 2015, and show that the method gives results
31
that are directly applicable to the EPA guidance on excluding data due to an
32
uncontrollable source.
33 34
TOC Art
35 36 37
1.
38
particulate organic carbon and many other compounds. The NOx and VOCs are
39
precursors that can contribute to O3 formation.1 A review of more than 100 published
40
studies found that wildfires usually result in O3 production downwind, with the amount of
INTRODUCTION Wildfires emit nitrogen oxides (NOx) and Volatile Organic Compounds (VOCs),
ACS Paragon Plus Environment
Page 3 of 25
Environmental Science & Technology
3 41
O3 produced increasing with transport time away from the fire.2 Some studies have
42
reported rapid O3 production, on the order of 50 ppb in 6 hours.3, 4 But there are large
43
variations in amount and rate of O3 produced, likely associated with variability in
44
emissions, plume heights and many other factors.
45
In extra-tropical regions, wildfire emissions have a relatively high molar VOC/NOx
46
ratio, around 10-100,1 which makes O3 production in smoke plumes very sensitive to
47
NOx concentrations. Mixing of a wildfire plume into a NOx rich urban area can enhance
48
O3 production.5-13 The VOC emissions also have a very high proportion of Oxygenated
49
VOCs (OVOCs), which results in a different photochemical environment compared to
50
typical urban chemistry.1 The high emissions of OVOCs, especially acetaldehyde, result
51
in rapid sequestration of NOx, forming peroxyacetyl nitrate (PAN) or other organic
52
nitrates.2, 14-17 In 5 fire plumes observed in the Pacific Northwest, PAN was found to
53
average 44% of the observed NOy species.18 This is much higher than the 12-15% usually
54
observed in urban plumes.19
55
The large variability in emissions and plume heights, the sub-grid processes,
56
chemical and physical processes with respect to aerosols and the large emissions of
57
OVOCs make it challenging to model O3 production from wildfires with traditional
58
Eulerian models. It is common for Eulerian models to over-predict O3 close to
59
wildfires.20-22 In a rather extreme case, Baker et al23 report wildfire O3 production of up
60
80 ppb, with simultaneous O3 over-predictions up to 60 ppb using the CMAQ model. Lu
61
et al22, using the GEOS-Chem model, show significant over-predictions at sites close to
62
the fires and under-predictions at sites further away. While there are many challenges in
63
modeling wildfire plumes and O3 production, one of the most important factors is the
64
rapid sequestration of NOx, which limits nearby O3 production. As this PAN moves
65
downwind and decomposes, O3 production will be enhanced, in contrast to most other
66
nitrogen oxides. Since the PAN can be transported from a region of high to low
67
concentrations, this may have the effect of increasing overall O3 production. Thus
68
accurately quantifying O3 production from wildfires is a major challenge.
69
In 2015, the U.S. Environmental Protection Agency (EPA) tightened the National
70
Ambient Air Quality Standards (NAAQS) for O3 to 70 ppb for the maximum daily 8-hour
71
average (MDA8).24, 25 Because non-controllable sources can significantly impact O3 in
ACS Paragon Plus Environment
Environmental Science & Technology
Page 4 of 25
4 72
some areas, the EPA has developed the “exceptional events” rule, which allows a state to
73
petition the EPA to exclude data that have been substantially impacted by specific non-
74
controllable sources. For wildfires, the EPA has developed a guidance document that
75
makes recommendations on best practices to document wildfire impacts on O3.25 This
76
guidance document presents several possible tools that can be used, but there are
77
significant challenges with some of these. For example, the guidance document suggests
78
that fire emissions divided by the distance from the fires (Q/D) is one indicator of a local
79
wildfire influence, with higher values of Q/D indicative of stronger influence. However
80
this method has not been extensively evaluated and data in the literature suggests that O3
81
production continues for 1-3 days downwind of a large fire.2 A second tool mentioned in
82
the guidance document is the use of statistical models. These relate O3 concentrations to a
83
number of meteorological factors. Statistical models can predict O3 based on
84
meteorological factors with R2 up to 0.8.26 The residual (observed O3 minus model fitted
85
O3) can give information on additional or unusual sources of O3, which could include
86
unusual industrial emissions, stratospheric intrusions or a contribution from wildfires.
87
This approach has been used in a number of prior studies documenting O3 impacts from
88
wildfires.6, 10, 27
89
Generalized Additive Models (GAMs) are flexible regression models that can
90
incorporate both a linear and non-linear dependence on numerical and categorical
91
variables. These have been used previously to describe the relationship between
92
meteorological variables and PM2.5 (particulate matter with an aerodynamic diameter less
93
than 2.5µm), O3, SO2, and CO. 28-31 Camalier et al.26 demonstrated the use of a slightly
94
different approach called GLM (Generalized Linear Model) to model urban O3 and
95
showed that the daily maximum 8-hour average was related to temperature, humidity and
96
other factors.
97
To identify the presence of smoke we used the NOAA Hazard Mapping System-Fire
98
and Smoke Product (HMS-FSP) combined with surface PM2.5 data. The HMS product is
99
based on analysis of multiple satellite products, both geostationary and polar orbiting. As
100
noted by previous researchers, the frequency of smoke detection at any site is likely an
101
under-estimate.12, 32 Kaulfus et al33 combined surface PM data with the HMS product and
102
examine the PM distributions for days in presence and absence of overhead HMS smoke.
ACS Paragon Plus Environment
Page 5 of 25
Environmental Science & Technology
5 103
They found that on 20% of the days that were above the current 24-hour PM2.5 standard,
104
the HMS smoke product showed smoke in the region.
105
In this work, we apply GAMs to O3 concentrations for 8 cities in the U.S. and
106
examine the residuals in the presence/absence of smoke. We then demonstrate how this
107
approach can be used to estimate O3 production due to wildfires and relate this back to
108
the EPA guidance document on exceptional events. As a case study we focus on the large
109
wildfires that burned in the Pacific Northwest in the summer of 2015 and demonstrate
110
substantial impacts on MDA8 O3 at distances up to 1200 km from the wildfires.
111 112
2.
113
2.1. Variables and Data Sources
MATERIALS AND METHODS
114
Our analysis considers MDA8 O3 from May to September for 2008 to 2015 in 8
115
cities in the western US (see Table S1 for locations). The presence of smoke is identified
116
using the NOAA Hazard Mapping System-Fire and Smoke Product (FSP)
117
(http://www.ospo.noaa.gov/Products/land/hms.html), combined with surface PM2.5 data.
118
A variety of data sources were used for the analysis. The MDA8 O3 and PM2.5 data
119
came from the EPA’s Air Data database and the National Park Service O3 database. The
120
meteorological data came from the NCEP/NCAR reanalysis and AirNow tech dataset
121
(http://www.airnowtech.org/). A full description of the meteorological variables used is
122
given in Table S2. There is some missing data in the meteorological datasets for each site.
123
We use series mean values in the GAMs for a small number of missing meteorological
124
data. In all cases, we only include meteorological parameters that have at least 80% data
125
completeness.
126
The HYSPLIT model (v4.9) was run for each day using the GDAS 1ox1o dataset to
127
calculate a 12-hour backward trajectory from each monitor site. Each trajectory was used
128
to calculate the ending transport quadrant and direct point to point transport distance. For
129
each city we examined the model performance for morning and afternoon trajectory times
130
and choose the best O3 predictors. For all cities, we used the trajectories initialized at
131
2pm local time. By using the direct distance and quadrant between starting and end
132
points, we can incorporate a measure of air mass recirculation or stagnation in the
133
statistical model. We also examined model performance using 24-hour backward
ACS Paragon Plus Environment
Environmental Science & Technology
Page 6 of 25
6 134
trajectories, but found the 12-hour back trajectories slightly superior based on the
135
Akaike’s Information Criterion (AIC) and R2 values for each analysis
136 137
2.2. GAM Method and Model Development
138
Generalized Additive Models are regression models involving a sum of smooth
139
functions of covariates instead of linear coefficients.34, 35 The MDA8 O3 of each city has
140
been modelled separately using the GAM in R software with the “mgcv package” .35 The
141
equation can be written as:
142 143
= + + + ⋯ + (1) where i indicates the ith day’s observation. fj(x) are smooth functions of the
144
predictors or meteorological data. The element g(µi) is the “link” function, which
145
specifies the relationship between the linear formulation on the right side of equation and
146
the expected response µi. Xiθ represents an ordinary linear model component for
147
predictors (elements of the vector Xi) not subject to nonlinear transformations, mainly
148
categorical indicators. Non-linear functions fj(x) are used to represent the complex
149
relationship between ozone and meteorological parameters.36 In this study, we found
150
Gaussian distributions and the identity link function most appropriate because the
151
distribution of ozone (or, more precisely, model residuals) is close to a normal
152
distribution. In some analyses of ozone, non-Gaussian models and/or non-identity link
153
functions, most commonly the log link that results in a multiplicative model, have been
154
used. For example, Camalier et al.26 chose the unusual combination of a Gaussian model
155
and a log link while Aldrin et al. 37and Pearce et al.38 modeled log-transformed pollutant
156
concentrations with an additive model like that given above. In our work, we found very
157
little difference in model results using the log-link function, and given the added
158
computational time, we used only the identity link.
159
Penalized cubic regression splines (CRS) were used for the smoothing functions fj(x)
160
to allow a non-linear response between ozone and each meteorological parameter. The
161
smoothness of each function f in the equation is controlled by the number of knots or the
162
effective number of degrees of freedom. Models fit the observed ozone concentrations
163
better as the number of knots increases but it will become less smooth and effectively
164
over-fit the data. The number of knots are estimated with cross-validation.35 The default
165
method of cross-validation was used to choose the degree of smoothing (penalization) of
ACS Paragon Plus Environment
Page 7 of 25
Environmental Science & Technology
7 166
k=10 dimensional regression splines. Figure S1 shows an example of the O3 response
167
function vs relative humidity and the resulting spline fit.
168
As there are a large number of meteorological datasets, the challenge to building a
169
succinct model is how to select the key parameters out of a large number of parameters.
170
The same combination of parameters was used in the 8 cities of the US. The model
171
considered only the main effects; the possible parameter interaction effects (such as wind
172
speed and direction together) were ignored.37
173
Various methods are available for selection of meteorological parameters, including
174
“regularized” (e.g. “lasso”) methods and the default selection method (of the “select”
175
argument) in the “mgcv” package. However, we chose to adopt a more structured,
176
scientifically-motivated forward selection procedure based on AIC with a goal of
177
obtaining a common model specification that works well across all of the cities.35, 38 We
178
followed this process to build the model:
179
1. Choose the common parameters that are important for all sites. We selected DOY
180
(day of year), year and trajectory direction and trajectory distance as the basic
181
parameters.
182
2. Group the parameters with similar physical information (e.g. temperature
183
variables, pressure variables, etc.) and add these variables one by one in the model to
184
choose the ones have the greatest impact on the AIC.
185
3. Identify the most important parameters for each city. The final model selection is
186
a compromise of these processes so that all cities use the same variable set which
187
includes either 18 or 10 variables models (Table S2). This allows us to compare the
188
model residuals across cities for specific dates.
189
As variables are added, the AIC decreases and R2 increases up to a point. Figure S2
190
shows the R2 and AIC trend for Salt Lake City as variables are added. It was found that
191
the AIC decreased as the adjusted R2 increased when the important parameters were
192
included in the model. For the 8 cities, we evaluated a model with 18 and 10 parameters.
193
Since the AIC decreased continuously up to 18 variables, we can be confident that the
194
model has not been overfit.
195 196
2.3. Model Quality Control
ACS Paragon Plus Environment
Environmental Science & Technology
Page 8 of 25
8 197
For each model, we used graphical tools to assess the underlying assumptions of
198
homogeneity, normality, and independence.35, 39 There are several methods we used: (1)
199
Use the gam.check code in R software to check the QQ plots (sample quantiles against
200
theoretical quantiles), scatterplots (residuals against linear predictor), histogram of the
201
residuals and scatter plots (response against fitted values). These plots are shown in
202
Figure S3; (2) Plot model residuals against fitted values; (3) Plot the residuals against
203
each explanatory variable that was used in the model; (4) Examine the autocorrelation of
204
both the original O3 data and residuals.
205
We examined autocorrelation in the original O3 data and the model residuals. Figure
206
S4 shows the autocorrelation functions for Provo, UT. For the Provo site, the original O3
207
MDA8 dataset has a significant autocorrelation out to at least 15 days. This likely reflects
208
the strong seasonality in O3 concentrations. The autocorrelation in the residuals is much
209
less. The ACF (Auto Correlation Function) value for the residuals drops to a small value
210
in a couple of days. Thus while there is still a small degree of autocorrelation in the
211
results, it is not a large factor. This likely reflects the fact that the fit values include a day
212
of year parameter that will remove the seasonality in residuals. Thus any autocorrelation
213
may reflect temporal correlation in factors that cause high or low O3 that were not
214
considered by the model (eg. a stratospheric intrusion or wildfire). Similar
215
autocorrelation results are seen for the other sites examined.
216 217
3.
218
3.1. Summary of GAM Results for Each City
219
RESULTS Table 1 shows the R2 for the model fits for all sites. Table S3 shows the model fits
220
(R2) for both the 18- and 10-variable models. The change in the R2 when going from the
221
18 parameters to the 10 parameter model is small. Figure 1 shows an example of the
222
observed MDA8 vs model fit for Houston, TX.
223
Table 1 shows that the GAM models have adjusted R2 values ranging from 0.35 to
224
0.81. The residuals are normally distributed. The standard deviation, 95th and 97.5th
225
percentiles of the residuals are shown in Table 1. The highest adjusted R2 is seen for
226
Houston (0.81), and the lowest is seen for Yellowstone (0.35). This likely reflects the
227
greater occurrence of high O3 days in Houston (and correspondingly, greater variance to
ACS Paragon Plus Environment
Page 9 of 25
Environmental Science & Technology
9 228
be explained), and greater local meteorological control on O3.
229
It is useful to contrast the ability of the GAM approach to capture variability in daily
230
MDA8 values with typical Eulerian models. Simon et al40 summarized Eulerian model
231
results for O3 from 69 Eulerian model studies reported in the literature and found typical
232
R2 performance of between 0.56-0.73 for model calculated MDA8s, but much lower R2
233
values (0.05 to 0.28) if only high MDA8 days were considered (>75 ppb). They also
234
report that the Eulerian models tend to under-predict O3 on the highest days. For the
235
Houston GAM results reported here, we find an R2 value (0.82) if all data are included
236
But the GAM R2 values drop to 0.61, 0.52, 0.38 and 0.23 if only MDA8s greater than 40,
237
50, 60 and 70 ppb, respectively, are included. Thus the GAM approach demonstrates
238
comparable ability at capturing daily variability in MDA8 compared to most Eulerian
239
models, keeping in mind the two approaches are very different and give different types of
240
information.
241 242
Table 1: Adjusted R2 and statistics on model residuals for the 8 cities using 18 meteorological parameters. Adjusted R2 One standard 97.5th 95th percentile of City (18 deviation of percentile of residuals (ppb) parametersa) residuals (ppb) residuals (ppb) Houston 0.81 8.1 13.5 17.8 Boise 0.55 5.6 9.3 11.1 Denver 0.54 8.0 11.4 14.2 Fort Collins 0.61 6.5 10.4 12.7 Yellowstone 0.35 5.7 9.6 11.3 Provo 0.52 6.1 9.6 11.7 Salt Lake 0.62 6.2 10.2 12.0 City Spokane 0.64 5.3 8.3 10.4 a 2 Adjusted R values for the 10-variable model are reported in Table S3.
243 244 245
3.2. Interpretation of Residuals with HMS and PM Data
246
A high residual alone does not identify the source of high O3. For this we used
247
surface PM2.5 combined with the HMS satellite smoke product to identify “smoky” days
248
in each city, similar to the approach used previously12. This analysis considers 6 out of
249
the 8 sites, since Yellowstone and Provo do not have surface PM2.5 data. We calculated
250
the standard deviation of daily PM2.5 values by month for all cities with data. These range
ACS Paragon Plus Environment
Environmental Science & Technology
Page 10 of 25
10 251
from 3 to 7 µg-m-3 (shown in Table S4). We use the standard deviation for each site to
252
identify whether the daily (24 hour mean) PM2.5 concentration is elevated or not for each
253
city. We define Del PM2.5 as:
254 255
Del PM2.5 = Daily PM2.5 – Monthly mean PM2.5 (2) In choosing the appropriate monthly mean to use, we want to consider that high fire
256
periods will have enhanced monthly means, but also reductions in anthropogenic
257
emissions may result in downward trends at some locations. So for locations with no
258
significant downward trend, we use the monthly mean calculated using all years in the
259
analysis for each month in equation 2. For sites that had months with a statistically
260
significant downward trend in the monthly means, we use the monthly mean for that
261
month-year in equation 2. Only Salt Lake City (May) and Houston (May and September)
262
had statistically significant downward trends in PM2.5. If Del PM2.5 was higher than the
263
standard deviation for that city and there was overhead smoke, as seen in the HMS
264
product, these days were regarded as “smoke days” similar to a previous study.12 Table 2
265
shows statistics for smoke and smoke-free days in each city, and Figure S5 shows the
266
distribution of the residuals for one location (Salt Lake City). The distribution of
267
residuals is nearly normal on smoke-free days, whereas for “smoke days” the distribution
268
is skewed towards higher values, and the mean value of the residuals on smoke days is
269
higher than on smoke-free days. Using a two sample t-test, we found a statistically
270
significant difference (P