Machine Learning Approach To Estimate Hourly Exposure to Fine

Oct 24, 2018 - Jiayun Yao*† , Michael Brauer† , Sean Raffuse‡ , and Sarah B. Henderson†§. † School of Population and Public Health, Univers...
2 downloads 0 Views 1MB Size
Subscriber access provided by UNIV OF LOUISIANA

Environmental Modeling

A machine learning approach to estimate hourly exposure to fine particulate matter for urban, rural, and remote populations during wildfire seasons Jiayun Yao, Michael Brauer, Sean M. Raffuse, and Sarah Henderson Environ. Sci. Technol., Just Accepted Manuscript • DOI: 10.1021/acs.est.8b01921 • Publication Date (Web): 24 Oct 2018 Downloaded from http://pubs.acs.org on October 25, 2018

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 28

Environmental Science & Technology

1

A machine learning approach to estimate hourly exposure to

2

fine particulate matter for urban, rural, and remote

3

populations during wildfire seasons

4

Jiayun Yao1*, Michael Brauer1, Sean Raffuse2, Sarah B. Henderson1,3

5 6 7

1. School of Population and Public Health, University of British Columbia, Vancouver, Canada,

8

V6T 1Z3

9

2. Air Quality Research Center, University of California, Davis, USA, 95616

10

3. Environmental Health Services, British Columbia Centre for Disease Control, Vancouver,

11

Canada, V5Z 4R4

12 13

*Corresponding to Jiayun Yao at: School of Population and Public Health, University of British

14

Columbia, 2206 East Mall, Vancouver, V6T 1Z3 Canada [email protected]

15 16

Main text word count: 5037

17

18

19

20

1 ACS Paragon Plus Environment

Environmental Science & Technology

Page 2 of 28

21

Abstract

22

Exposure to wildfire smoke averaged over 24-hour periods has been associated with a wide range of

23

acute cardiopulmonary events, but little is known about the effects of sub-daily exposures immediately

24

preceding these events. One challenge for studying sub-daily effects is the lack of spatially and

25

temporally resolved estimates of smoke exposures. Inexpensive and globally applicable tools to reliably

26

estimate exposure are needed. Here we describe a Random Forests machine learning approach to

27

estimate 1-hour average population exposure to fine particulate matter during wildfire seasons from

28

2010 to 2015 in British Columbia, Canada, at a 5km by 5km resolution. The model uses remotely sensed

29

fire activity, meteorology assimilated from multiple data sources, and geographic/ecological information.

30

Compared with observations, model predictions had a correlation of 0.93, root mean squared error of

31

3.2 µg/m3, mean fractional bias of 15.1%, and mean fractional error of 44.7%. Spatial cross-validation

32

indicated an overall correlation of 0.60, with an interquartile range from 0.48 to 0.70 across monitors.

33

This model can be adapted for global use, even in locations without air quality monitoring. It is useful for

34

epidemiologic studies on sub-daily exposure to wildfire smoke, and for informing public health actions if

35

operationalized in near-real-time.

36 37

Table of Contents (TOC)/Abstract Art 2 ACS Paragon Plus Environment

Page 3 of 28

Environmental Science & Technology

38

Introduction

39

Wildfire frequency and intensity have significantly increased in North America over recent decades,1-5

40

resulting in destruction of property and loss of human life. Smoke generated from wildfires can be

41

transported to populations in regions distant from the actual fires, leading to degradation of air quality

42

and threats to public health.6-9 Estimates from 1997-2006 suggest that deaths attributable to global

43

landscape fire smoke exposure can range from 260,000 to 600,000 every year.10 Smoke is an important

44

outdoor air pollution source that contributes to the health burden in many regions of the world,11 and

45

the impacts are expected to increase in the future with the changing climate.12-14

46

Exposure to wildfire smoke has been associated with a wide range of acute cardiopulmonary outcomes,

47

from increased reporting of symptoms to elevated risk of mortality.15, 16 However, most of the

48

epidemiologic evidence has been generated using exposures averaged over the 24-hour period on the

49

day of the observed outcomes, and very little is known about the effects of sub-daily exposures in the

50

hours before the outcome. As a result, it remains unclear how much of the cardiopulmonary effects of

51

wildfire smoke should be attributed to short (i.e. hours) but extreme levels of exposure compared with

52

longer (i.e. days) periods of cumulative exposure. Wildfire smoke levels within a community can change

53

very rapidly, so such evidence is needed to develop air quality guidelines and optimal response plans

54

based on relevant averaging periods. Knowledge about the potential impacts of short-term increases in

55

exposure and the potential benefits of short-term interventions is critical for protecting public health.

56

One of the biggest challenges for studying sub-daily smoke exposures is the ability to generate spatially

57

and temporally resolved estimates of fine particulate matter (PM2.5) concentrations. Although 1-hour

58

average measurements of PM2.5 from fixed air quality monitoring stations may be available, they are

59

typically sparse in space, located in urban areas, and may fail during periods of extreme smoke.17

60

Further, while wildfire smoke can affect well-monitored urban populations, it more frequently affects 3 ACS Paragon Plus Environment

Environmental Science & Technology

Page 4 of 28

61

dispersed populations in rural and remote areas. In addition, assigning exposures from central

62

monitoring stations assumes no spatial variability over potentially large geographic areas, which can

63

introduce non-differential exposure misclassification and bias epidemiologic effect estimates towards

64

the null.18 The one epidemiologic study that has used such data to evaluate sub-daily impacts of wildfire

65

smoke reported a borderline positive association with out-of-hospital cardiac arrest.19

66

The province of British Columbia, Canada, has urban, rural, and remote populations that are routinely

67

affected by seasonal wildfire smoke. Our previous work in this context has shown that a spatially

68

resolved model of 24-hour average wildfire smoke exposures produced more precise and robust

69

epidemiologic effect estimates than measurements taken from the provincial air quality monitoring

70

network, despite this network being relatively dense compared with other parts of the world.20 We

71

therefore hypothesize that epidemiologic evidence on sub-daily, 1-hour exposures can be improved by

72

developing a spatially resolved sub-daily exposure model. There are two possible ways to achieve this:

73

deterministic and statistical modeling.

74

Many air pollution models are deterministic, meaning that they use a variety of meteorological and

75

other data inputs to simulate emissions and dispersion based on a set of equations describing the

76

physical and chemical atmospheric processes. Deterministic wildfire smoke models usually consume

77

information about the fires (location and intensity), fuel types, fuel loads, and meteorological conditions.

78

This information is then used by different component models that simulate fire behavior, smoke

79

emissions, and dispersion.21, 22 Such models require extensive expertise and considerable computing

80

power to develop and maintain. Statistical models, on the other hand, estimate pollutant concentrations

81

as a stochastic function of (1) the relevant variables that are currently available, and (2) the empirical

82

relationship between those variables and the pollutant concentrations derived from historical data.23, 24

83

Conventional approaches to statistical air pollution modeling have relied on linear regression, but rapid

4 ACS Paragon Plus Environment

Page 5 of 28

Environmental Science & Technology

84

developments in the fields of data mining, machine learning, and computation have made these

85

alternative methods more widely available in recent years.25, 26

86

Machine learning is a range of computational methods that produce predictions based on algorithms

87

trained using historical data, in an automated and iterative process without explicit programming.27-29

88

Many machine learning approaches can accommodate complex relationships that are non-linear, or that

89

are best explained by high-dimensional interactions. These properties have made them attractive for a

90

wide range of applications, and several recent studies have used them to model air quality and pollutant

91

exposures.30 Unlike deterministic models, statistical models derived from machine learning predict

92

exposure based purely on the relationships between the response and predictive variables, without

93

explicit knowledge of the complex and inter-dependant underlying physical processes. Although prior

94

knowledge of the basic physical processes is necessary to offer these models a relevant set of predictive

95

variables, achieving the most predictive model is the primary objective of machine learning approaches,

96

not the interpretability of the output. These models tend to produce estimates that agree better with

97

observations than their deterministic counterparts,31-34 while requiring less expertise and resources to

98

operationalize.

99

The Optimized Statistical Smoke Exposure Model (OSSEM) is a statistical model that we previously

100

developed to estimate 24-hour average PM2.5 for the entire population of British Columbia applied linear

101

regression to integrate data from multiple sources, including remote sensing images of fire and smoke,

102

routine air quality monitoring, and meteorological models.35 It has been operationalized for public

103

health surveillance since 2012.36 Here, we expand on the 24-hour model (OSSEM-24h) to build a 1-hour

104

model (OSSEM-1h) that estimates PM2.5 exposures for the same population at the same resolution using

105

more advanced statistical modeling techniques and data sources with higher temporal resolution. To

106

identify relevant variables for this work, we also completed a thorough investigation of factors

5 ACS Paragon Plus Environment

Environmental Science & Technology

Page 6 of 28

107

associated with the presence of smoke at the surface level, where populations are exposed.37 The

108

OSSEM-1h model will be useful for conducting epidemiologic studies on sub-daily exposure to wildfire

109

smoke, as well as informing public health actions once operationalized in near-real-time.

110 111 112 113 114 115 116 117 118

Figure 1. Map of British Columbia, Canada. Grey squares indicate grid cells that cover the populated areas in the province, where predictions from the model are made. Orange crosses indicate large fires with high radiative power (>1000 MW) that occurred during the study period (2010-2015). The box with the dashed line indicates the central interior region for case study 1, while the box with solid line indicates the area for case study 2, which included the greater Vancouver area. The locations of the monitoring stations presented in case studies are also labeled. Base map data source: BC Stats and BC Ministry of Health

119

6 ACS Paragon Plus Environment

Page 7 of 28

Environmental Science & Technology

120

Materials and Methods

121

Study area and period

122

British Columbia is the westernmost province of Canada, with a total land area of 925,186 km2, and a

123

population of 4.8 million people.38 Over half of this population resides in the greater Vancouver area,

124

while the vast majority of the landmass is sparsely populated39 (Figure 1). Almost two-thirds of the

125

landmass is forested, making the province prone to seasonal wildfires.40 The study period covers the

126

wildfire seasons (1 April to 30 September) of 2010 through 2015, during which the province experienced

127

three severe fire seasons in 2010, 2014, and 2015, with nearly 3,300 km2 burned each year. The recent

128

increase in extreme wildfire seasons has been attributed to global climate change and widespread forest

129

damage caused by the mountain pine beetle over the last decades.41-43 Most large and intense wildfires

130

occurred in the northern and central regions of the province during the study period (Figure 1).

131

Data sources and variables

132

The following data sources were used to derive the response variable and potentially predictive

133

variables for the OSSEM-1h model, based on previous work to develop the OSSEM-24h35 and vertical

134

height37 models. The spatial resolutions of the input datasets described below varied between 1 and

135

50 km, so a prediction grid with a resolution of 5 km by 5 km was created for the province. Values for

136

each of the potentially predictive variables were assigned to each cell. If more than one value was

137

available for a single cell, the mean value was assigned for continuous variables and the closest value to

138

the cell centroid was assigned for categorical variables. If no value was available within a cell, the value

139

closest to the cell centroid was assigned.

140

PM2.5 measurements: We obtained 1-hour average PM2.5 measurements from 72 air quality monitoring

141

stations from the Provincial Air Data Archive Website maintained by the British Columbia Ministry of

142

Environment and Climate Change Strategy. The total number of 1-hour observations available at each 7 ACS Paragon Plus Environment

Environmental Science & Technology

Page 8 of 28

143

station during the study period ranged from 2263 to 19325 out of the 26352 hours. These 1-hour

144

average data were used as the PM 1-hour response variable for model training, as well as the

145

observations against which we compared the predictions for model evaluation. The response variable

146

PM 1-hour was log-transformed due to its right-skewed distribution. The calendar month of the

147

observation (Month) was included as a potentially predictive variable based on previous work 37.

148

Ecozone: The provincial digital Biogeoclimatic Ecosystem Classification map (Version 10.0) was obtained

149

from the GeoBC Data Discovery Service. The province was classified into 16 ecozones named by the

150

major vegetation species present in each (Figure S1), which have similar growing conditions due to a

151

broad, homogeneous macroclimate.44, 45 While not temporally variable over the study period, these data

152

may indicate how smoke emissions factors vary by fuel type and climate.46-49 The potentially predictive

153

variable Ecozone was assigned to each grid cell as the ecozone of the nearest fire.

154

Fire activity: Data from the Moderate Resolution Spectroradiometer (MODIS) instruments aboard the

155

polar orbit Aqua and Terra satellites were used to assign fire locations and intensity. Fire locations were

156

taken as the centroids of all 1 km by 1 km fire pixels identified in the active fire product (MCD14ML,

157

Collection 6), and fire intensities were approximated by the fire radiative power (FRP) variable. Because

158

these two satellites only passed over the study area four times daily, information was not available for

159

every hour. As a result, hotspots caused by presumed vegetation fires observed within the 12-hour

160

window before or after the 1-hour prediction period were rasterized to the prediction grid cells. Five

161

potentially predictive variables were derived from these data (Table 1): (1) FRP Within 500km as

162

summed FRP values for all fires within 500 km of the grid cell; (2) Closest FRP Cluster as the summed FRP

163

value for the closest fire cluster, where a cluster was defined as any group of grid cells with fire

164

detections that shared any boundaries using the Queen’s case criterion50; (3) IDW FRP as the inverse

165

distance weighted (IDW) average to the centroids of all fires within 500 km, which would reflect larger

8 ACS Paragon Plus Environment

Page 9 of 28

Environmental Science & Technology

166

impacts from closer fires; and (4) Minimum Fire Distance as the distance to the nearest detected fire;

167

and (5) Fire Direction as the direction in degrees bearing from the grid cell to the nearest fire. More

168

details on the derivation of these variables can be found in previous work.37

169

Meteorology: Hourly meteorological information was retrieved from the NASA Modern Era

170

Retrospective-analysis for Research and Applications (MERRA) program, available at a spatial resolution

171

of 2/3-degree longitude by 1/2-degree latitude. We used these data to generate potentially predictive

172

variables for the planetary boundary layer height (PBLH) and the eastward and northward wind

173

components at 50 m above the surface (E50m, N50m). We also extracted eastward and northward

174

winds at the 500 hPa (E500hPa, N500hPa) and 250 hPa (E250hPa, N250hPa) pressure levels. These

175

variables were predictive of the dispersion of wildfire smoke in previous studies.25, 37 More details about

176

these variables can be found in previous work.37

177

Elevation: The average elevation of the grid cell was calculated using data from the GTOPO30 product

178

from the US Geological Survey Earth Resources Observation and Science Center. This is a global digital

179

elevation model with a horizontal grid spacing of approximately 1km by 1km.

180

Model building and evaluation

181

Data reduction

182

Given the large number of 1-hour PM2.5 observations available, using all observations for the modeling

183

would be computationally expensive. In addition, a large proportion of the observations reflect low,

184

background PM2.5 concentrations, which are not suited to the objective of modeling the air quality

185

impacts of wildfire smoke. In consideration of these factors, we chose to train and evaluate OSSEM-1h

186

using a reduced dataset. This included all observations with PM2.5 concentration > 15 µg/m3,

187

approximately the 95th percentile of all 1-hour PM2.5 observations, and twice as many randomly-selected

188

observations with PM2.5 concentrations  15 µg/m3 (Figure S2). 9 ACS Paragon Plus Environment

Environmental Science & Technology

189

Page 10 of 28

Table 1. Data sources and variables. Variable

Description

Data source

1-hour PM2.5 from 72 monitoring stations

BC Ministry of Environment

Response variable PM 1-hour

Potentially predictive variables Month Ecozone FRP Within 500km Closest FRP IDW FRP Minimum Fire Distance

Categorical variable indicating the calendar month Ecozone from the Biogeoclimatic Ecosystem Classification map, where the closest fire is observed Summed fire radiative power (FRP) for all fire within 500 km Summed FRP for the closest fire cluster The inverse distance weighted averaged FRP for fire within 500 km Distance to the closest fire

E50m N50m E500hPa N500hPa E250hPa N250hPa

Direction as degrees bearing to the closest fire Planetary boundary layer height above land surface Eastward wind at 50 meters above surface Northward wind at 50 meters above surface Eastward wind at 500 hPa pressure level Northward wind at 500 hPa pressure level Eastward wind at 250 hPa pressure level Northward wind at 250 hPa pressure level

Elevation

Elevation of the land surface above sea level

Fire Direction PBLH

BC Ministry of Forests and Range

US National Aeronautics and Space Administration (NASA) data from the Moderate Resolution Imaging Spectroradiometer (MODIS)

NASA Modern Era Retrospectiveanalysis for Research and Applications (MERRA) program

US Geological Survey Earth Resources Observation and Science Center

190 191

Model training and out-of-bag evaluation

192

The reduced data were fitted with a Random Forests model, which is an ensemble of regression trees.51

193

Each tree was trained with a random subset of the predictive variables on a random subset of the

194

reduced data. We constructed the Random Forests models with a total of 500 trees using five predictive

195

variables in each tree. To evaluate the model performance, predictions were made with the unsampled

196

data from each tree, otherwise known as the “out-of-bag” data. Because a single observation could fall 10 ACS Paragon Plus Environment

Page 11 of 28

Environmental Science & Technology

197

into the out-of-bag subset multiple times, all values predicted for the same PM 1-hour observation were

198

averaged to obtain the final out-of-bag prediction. Four statistics were calculated to compare the out-of-

199

bag predictions against the observations: (1) Pearson’s correlation coefficient; (2) root mean squared

200

error (RMSE); (3) mean fractional bias (MFB); and (4) mean fractional error (MFE) (Table S1). The out-of-

201

bag predictions were also examined on days with different fire activity levels. The provincial sum of FRP

202

was calculated for each day of the study period, and days with FRP in 80 percentile were categorized as low, moderate and high fire days, respectively. Out-of-bag

204

predictions from the model were compared with observations stratified by these groupings.

205

The importance of each model variable was assessed by generating a random permutation of the

206

variable, calculating the increase in model prediction error with the new values, and comparing it with

207

the error in the model that used the real values. If a variable is important for prediction, the error will be

208

dramatically increased when that variable is replaced with random values. A backward variable selection

209

process was used based on variable importance. If a reduced model without the least important variable

210

performed better or the same as the full model in out-of-bag prediction, that variable would be

211

permanently removed from the final model. Otherwise, the variable would be kept, and the full model

212

was regarded as the final model.

213

Leave-region-out cross-validation

214

To further test the model performance in areas away from the locations of the training data, a leave-

215

region-out cross-validation was conducted. For the 72 monitoring stations used in the study, those

216

within 50km of each other were clustered as a regional group. The 50km criterion was chosen based on

217

a previous study that found the median distance from the populated grid cells of the model to the

218

nearest monitor was approximately 50km.35 The clustering produced a total of 30 groups, each of which

219

included 1-4 stations, except for the regional group in the greater Vancouver area, where 13 stations

11 ACS Paragon Plus Environment

Environmental Science & Technology

Page 12 of 28

220

were clustered into one group. We then trained 16 models, 15 of which excluded data from a random

221

selection of two regional groups, and one of which excluded data from the greater Vancouver regional

222

group. Then we used the models to make predictions for the excluded stations. Using this spatial cross-

223

validation design, we were able to test model performance when none of the training stations were

224

within 50km of the testing locations. The same four evaluation statistics (Table S1) were calculated for

225

each station and overall.

226

Case studies

227

In addition to the province-wide quantitative evaluation, the final model was qualitatively evaluated

228

using two case studies. The first one examined an extreme fire and smoke event that occurred in August

229

2010 across the central interior region of British Columbia52, which is frequently affected by seasonal

230

wildfires and smoke53 (Figure 1). The second one focused on the southwestern region around greater

231

Vancouver, where more than half of the population of the province resides (Figure 1). This region was

232

more moderately impacted by smoke from wildfires burning north of the area in early July 2015.54 The

233

monitoring stations used for evaluation in these case studies were chosen to represent impacts from the

234

smoke events at different extents and timing. Predictions were made for populated grid cells (Figure 1)

235

in the case study area during the case study period.

236

Sensitivity Analysis

237

Inclusion of PM lag 24-hour

238

The OSSEM-24h model included PM2.5 concentrations from the previous day as its most important

239

predictor35, but this limits its utility in areas without dense monitoring networks. Here we chose to

240

exclude lagged PM2.5 concentrations from the main analyses, but to evaluate in a sensitivity analysis

241

whether we could achieve a better model with these data. The 24-hour average of the PM2.5

242

concentrations on each day were calculated using the 1-hour data. The values from the closest stations

12 ACS Paragon Plus Environment

Page 13 of 28

Environmental Science & Technology

243

on the previous day were assigned to grid cells as the potentially predictive variable PM Lag 24-hour, to

244

describe the general air quality in the time before the prediction was made. Evaluation statistics were

245

calculated for out-of-bag predictions from model with PM Lag 24-hour and compared with those from

246

the primary model.

247

Inclusion of GOES AOD

248

We originally intended to use the aerosol optical depth (AOD) product from the Geostationary

249

Operational Environmental Satellite (GOES) West as a potentially predictive variable, because it was the

250

only AOD product available at the timescale relevant to OSSEM-1h. These data are available at a 30-

251

minute interval and 4 km X 4 km spatial resolution during the sunlit portion of the day, and have been

252

correlated with ground-level PM2.5 concentrations in previous studies.25, 55-57 However, the variable was

253

omitted because a large proportion (64%) of the data were missing. Given that MODIS AOD was an

254

important contributor to OSSEM-24h, we tested OSSEM-1h models developed with and without GOES

255

AOD using the subset of available data. We applied the cloud filter provided with the product to remove

256

invalid retrievals and assigned the average of the two retrievals for each hour to the prediction grid. If

257

the value at a grid cell was missing or invalid, the value from the closest cell with valid value within 50km

258

was assigned, which was the same approach used for OSSEM-24h.35

259

Results

260

There were 789,337 1-hour observations with complete information for the response variable and all

261

potentially predictive variables, among which 31,688 (4%) had PM2.5 concentrations higher than 15

262

µg/m3. The PM2.5 observations had a median [25%ile, 75%ile] of 4 [2, 7] µg/m3. After the sampling

263

procedure (Figure S2), a reduced dataset of 95,064 observations was used to train the final model. The

264

PM2.5 observations in the reduced data had a median of 6 [3, 17] µg/m3.

13 ACS Paragon Plus Environment

Environmental Science & Technology

Page 14 of 28

265

The final model included all 15 potentially predictive variables after the backward selection process. The

266

eastward wind component at 50 m above surface (E50m) was the most important variable in the model,

267

followed by Ecozone of the closest fire and other meteorological variables, and FRP Within 500km,

268

which reflected the intensity of fire activity in area (Figure S3). The final model had an overall

269

correlation of 0.93, RMSE of 9.34 µg/m3, MFB of -0.68%, and MFE of 45% between the out-of-bag

270

predictions and observations. Some of the evaluation statistics were influenced by data reduction,

271

where low PM2.5 values were under-represented relative to the entire distribution (Figure S2). When we

272

applied the model to make predictions to the complete dataset (789,337 observations), the predictions

273

had an overall correlation of 0.94, RMSE of 3.2 µg/m3, MFB of -15.1%, and MFE of 44.7% compared with

274

observations. The model performed best on days with high fire activity (r = 0.94), compared with

275

moderate fire activity (r = 0.90) and low fire activity (r = 0.77). The model tended to underestimate

276

observed concentrations in all of these categories (Figure S4).

277

The leave-region-out cross-validation resulted in an overall correlation of 0.60 between predictions and

278

observations. Station-specific correlations ranged from -0.09 to 0.86, with an interquartile range from

279

0.48 to 0.70. Some spatial patterns were observed in the distribution of the performance statistics

280

across different stations (Figure 2). The model performed well around the greater Vancouver region

281

with high correlations, low RMSE and MFE, and MFB close to zero. Stations in the central interior region

282

also had high correlations, but they tended have large RMSE, negative MFB, and high MFE values (Figure

283

2). This may have been due to the higher PM2.5 concentrations and thus larger variability in the central

284

interior region compared with the greater Vancouver region. Stations with fewer observations and

285

lower variability tended to have lower correlation: all stations with correlation below 0.3 had < 500

286

observations and a PM2.5 interquartile range (IQR) of < 6 µg/m3, compared with the medians of 1326

287

observations and 12 µg/m3 IQR among all stations.

14 ACS Paragon Plus Environment

Page 15 of 28

Environmental Science & Technology

288 289 290 291 292 293 294 295

Figure 2. The (a) correlation (r), (b) root mean squared error (RMSE), (c) mean fractional bias (MFB), and (d) mean fractional error (MFE) between model predictions and observations in the leave-region-out cross-validation. The dashed box indicates the central interior area in case study 1, and the solid box with inset indicate the area in case study 2, which includes greater Vancouver. Base map data source: BC Stats and BC Ministry of Health

296

Several large wildfires began burning near Williams Lake (Figure 1) in the central interior region on 28

297

July 2010. These fires remained partially contained until 17 August 2010, when a wind event caused

298

rapid fire growth and wide smoke dispersion across the region. Three of the PM2.5 monitoring stations in

299

the area had distinctly different time series of observed 1-hour concentrations between 17 and 19

Case study 1

15 ACS Paragon Plus Environment

Environmental Science & Technology

Page 16 of 28

300

August 2010, which were generally well-predicted by OSSEM-1h (Figure 3 a-c). However, the model

301

tended to overestimate when PM2.5 concentrations were low and to underestimate when PM2.5

302

concentrations were high. The spatial distribution of OSSEM-1h predictions reflected the rapid

303

dispersion of smoke from Williams Lake to the surrounding area farther and farther removed from the

304

fires (Figure 3 e-f).

305

Case Study 2

306

Starting 5 July 2015, the greater Vancouver area was affected by smoke from multiple fires at the north

307

of the case study boundary (Figure 1). The 24-hour averaged PM2.5 concentrations across greater

308

Vancouver were elevated over the provincial air quality objective (25 µg/m3) for multiple days, which is

309

rare for the region. Stations in both Squamish and Chilliwack observed significant elevations in PM2.5

310

concentrations, but with the peaks occurring at different times, while the Colwood station observed

311

limited impacts (Figure 4). Similar to Case Study 1, OSSEM-1h predictions generally agreed well with

312

observed 1-hour PM2.5 concentrations but consistently underestimated the high concentrations (Figure

313

4).

314

Sensitivity Analyses

315

The model including the PM Lag 24-hour variable had a correlation of 0.93, RMSE of 9.41 µg/m3, MFB of

316

-0.23%, and MFE of 44% between the out-of-bag predictions and observations. The correlations on days

317

with high, moderate and low fire activity were 0.94, 0.98 and 0.76, respectively. This performance was

318

very similar to the performance of the primary model without PM Lag 24-hour.

319

There were 245,464 1-hour observations available across 66 monitoring stations with complete data for

320

GOES AOD and all other predictive variables. Using the same approach described previously (Figure S2),

321

34,758 observations were sampled to train the models with and without AOD. The distribution of the

322

PM2.5 observations in this subset of data was similar to that of the full reduced dataset, with a median of 16 ACS Paragon Plus Environment

Page 17 of 28

Environmental Science & Technology

323

7 [3, 17] µg/m3. The out-of-bag predictions made with the model including AOD had a correlation of

324

0.90, RMSE of 10.7 µg/m3, MFB of 0.12% and MFE of 46% when compared with observations, while the

325

model excluding AOD had a correlation of 0.91, RMSE of 9.7 µg/m3, MFB of 0.01% and MFE of 45%.

326 327

17 ACS Paragon Plus Environment

Environmental Science & Technology

328 329 330 331 332 333 334 335 336 337

Page 18 of 28

Figure 3. Case study 1. On the left, the time series of 1-hour PM2.5 model predictions (grey bars) are compared with observations (black line) at monitoring stations in (a) Williams Lake, (b) Prince George, and (c) Burns Lake from 17 – 19 August 2010. The maps on the right show model predictions at (d) 17:00 on 17 August, (e) 10:00 on 18 August, and (f) 17:00 on 18 August in all populated grid cells. The time points correlating to the right-hand panels are labeled on each of the time series plots (color coded), and the locations correlating to the left-hand panels are labeled on each of the maps. Base map data: Google.

18 ACS Paragon Plus Environment

Page 19 of 28

338 339 340 341 342 343 344 345 346

Environmental Science & Technology

Figure 4. Case study 2. On the left, the time series of PM2.5 model predictions (grey bars) are compared with observations (black line) at monitoring stations at (a) Chilliwack, (b) Colwood, and (c) Squamish from 05 – 07 July 2015. Maps on the right show model predictions at (d) 00:00 on 06 July 6, (e) 10:00 on 06 July, and (f) 05:00 on 07 July in all populated grid cells. The time points correlating to the right-hand panels are labeled on each of the time series plots (color coded), and the locations correlating to the left-hand panels are labeled on each of the maps. Base map data: Google.

19 ACS Paragon Plus Environment

Environmental Science & Technology

Page 20 of 28

347

Discussion

348

A statistical model was developed to estimate 1-hour PM2.5 concentrations at 5 x 5 km resolution during

349

wildfire seasons over the entire province of British Columbia, Canada using a Random Forests model and

350

multiple data sources. The two case studies also showed that the model identified the temporal and

351

spatial peaks in smoke exposures. Predictions from the final model had a high correlation with

352

observations, while spatial cross-validation revealed some variability in prediction performance. Overall,

353

the model performed best closer to the coast, especially around the densely populated greater

354

Vancouver area, where concentrations were generally lower. Although correlations remained high in the

355

interior region, the error and bias were increased further from the coast. To the best of our knowledge,

356

this is the first statistical model developed for smoke-related PM2.5 at a temporal resolution of 1-hour.

357

All other models that produce 1-hour estimates of smoke-related PM2.5 use a deterministic approach.

358

One example is the BlueSky framework, which is an operational smoke forecasting system run in both

359

the US and Canada. An evaluation of BlueSky performance at the 1-hour resolution in California

360

reported MFB values of −84.4% and −1.3% and MFE values of 100.8% and 91.8% for two case studies.58

361

Another example is the FireWork forecasting system developed by Environment Canada. An evaluation

362

of its performance during the 2015 fire season found a correlation of 0.49 and RMSE of 18 µg/m3

363

compared with observations in western Canada.22 Although the evaluation statistics for BlueSky and

364

FireWork were less favorable than OSSEM-1hr, these modeling systems were forecasting hours ahead of

365

time while the OSSEM-1hr model was retrospectively estimating concentrations using already-observed

366

data.

367

Deterministic and statistical models have been developed to estimate 1-hour PM2.5 concentrations due

368

to sources other than wildfire smoke. One study examined the PM2.5 estimates from ten deterministic air

369

quality modeling systems in regions of North America and Europe in 2006 and found that their 20 ACS Paragon Plus Environment

Page 21 of 28

Environmental Science & Technology

370

correlations with observations were between 0.53 and 0.66.59 On the other hand, a statistical regression

371

model with remote sensing data built for southern Ontario, Canada resulted in a correlation of 0.8160 in

372

2004, while a neural network coupled with K-mean clustering approach developed in Auckland, New

373

Zealand, had a correlation of 0.7961 between 2008 and 2011. The out-of-bag evaluation suggests that

374

OSSEM-1hr performed well compared with these models, possibly due to the large training dataset, the

375

choice of modeling approach, the selection of relevant predictive variables, and the difference in

376

distribution of observations included in the analysis. The leave-region-out cross-validation analysis

377

showed that model performance can vary when applied to areas more than 50km from the locations of

378

the training data. However, the model can still provide reasonable estimates for most of the exposed

379

population in BC, because 80% of people reside within 10 km of a monitoring station.35

380

The inclusion of PM2.5 measurements from the previous day as a potentially predictive variable did not

381

improve the model performance in the out-of-bag evaluation, suggesting the potential of the model to

382

be applied in regions without large air quality monitoring networks. When the data were subset to

383

observations for which there were complete AOD data, the inclusion of AOD did not improve the model

384

performance. This was surprising, given the importance of AOD in previously-developed 24-hour

385

models.25, 55, 57 However, there are several factors to consider. First, the quality of the GOES AOD data

386

used here may be affected by the location of the study area, where the larger solar zenith angle at high

387

latitude can lead to larger bias in AOD retrieval.62 In addition, British Columbia is at edge of the GOES-

388

West image swath, and the larger satellite viewing angle results in a larger scattering angle, which

389

relates to larger bias compared with data retrieved at the centre of the image.63 Second, the standard

390

cloud filter from the product may not be appropriate for the study period because heavy smoke may

391

have been incorrectly classified as cloud.25 Previous studies have shown that a less restrictive cloud filter

392

derived from local estimates of surface brightness could improve the utility of AOD products from other

393

satellites during smoke events.64, 65 Third, the mountainous terrain in the province may result in more 21 ACS Paragon Plus Environment

Environmental Science & Technology

Page 22 of 28

394

smoke aloft, leading to lower correlation between satellite AOD and surface concentrations. Fourth, the

395

variables used to represent fire activity and meteorological conditions may adequately explain the

396

response variable in the absence of AOD, leaving little room for improvement.

397

The OSSEM-1h model has some unique strengths. First, all data sources are publicly available and easy

398

to acquire. Second, most of the data have global coverage such that these methods could be locally

399

adapted and expanded to other parts of the world, and could be especially useful in fire-affected areas

400

without sufficient PM2.5 data from existing air quality monitoring networks. Third, the simplicity of the

401

model makes it relatively straightforward to operationalize for near-real-time surveillance. Finally,

402

OSSEM-1hr could also serve as a tool for more spatially comprehensive evaluation of smoke forecasting

403

models.

404

The OSSEM-1h model also has some important limitations. First, errors and uncertainty in the

405

measurement of the predictive variables, such as those from satellite data retrieval or meteorological

406

modeling, will be inherited by the model, affecting the model performance. Second, the fire activity

407

information is currently retrieved from MODIS, which is not updated hourly. As a result, the fire

408

information fed into the model may not reflect the situation at the exact hour of prediction, especially

409

when fire behavior is changing rapidly or there is active and effective effort to suppress fires. We did

410

explore the possibility of using the GOES fire product, which is updated every 30 minutes, but the fire

411

detection sensitivity is affected by the low spatial resolution66 and FRP measurements are not available

412

in the archived data. However, the launch of the GOES-R series of geostationary satellites since 2016 is

413

expected to improve the capacity of fire detection, potentially useful for developing similar models in

414

the future. The imaging instrument aboard of these satellites can provide three times more spectral

415

information, four times the spatial resolution, and more than five times faster temporal coverage than

416

the previous system.67 Third, this approach cannot fully account for smoke generated well beyond the

22 ACS Paragon Plus Environment

Page 23 of 28

Environmental Science & Technology

417

British Columbia study area. Although smoke can travel to British Columbia from other regions,

418

especially the western United States and Siberia, a recent study found that most smoke affecting the

419

province originated from fires within the province.68 Fourth, while the machine learning approach can

420

make good estimates of the response variable, the precise nature of the relationships between the

421

response and predictive variables are impossible to ascertain. As such, users must simply trust the

422

output in context of its performance, in the absence of specific information about the model structure.

423

In addition, the model output may not be interpreted with the physical process of smoke dispersion.

424

Finally, the model selection process based on variable importance may not be adequate to select the

425

optimal model, as the permutation-based variable importance calculation may be influenced by

426

correlated variables in the model.69

427

The OSSEM-1h model presented here can be used to temporally resolved estimates of population

428

exposure to PM2.5 during episodes of wildfire smoke. These can be useful for epidemiologic studies on

429

the very acute health effects of sub-daily exposure, given health outcome measures with sub-daily time

430

stamps such as ambulance dispatch data.19, 70 It can also be used as a tool for public health surveillance

431

of the exposure in near-real-time, which can be used inform timely actions to mitigate the adverse

432

population health impacts of wildfire smoke exposure.

433

Supporting Information

434

Table S1 and Figure S1-S4 in the Supporting Information, which contains additional information about

435

the variable input to the model, calculation of evaluation statistics, data sampling strategies, variable

436

importance and model evaluations.

437 438 23 ACS Paragon Plus Environment

Environmental Science & Technology

439

Acknowledgement

440

This work is supported by the Australian Research Council Linkage Program (LP130100146) and the

441

British Columbia Lung Association. We would like to thank all the organizations that generate and

442

distribute the data used in the model, without which this work will not be possible. We also thank the

443

editor and reviewers for their valuable insights that make this work better.

Page 24 of 28

444 445

References

446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476

1. Westerling, A. L.; Hidalgo, H. G.; Cayan, D. R.; Swetnam, T. W., Warming and earlier spring increase western US forest wildfire activity. science 2006, 313, (5789), 940-943. 2. Gillett, N. P.; Weaver, A. J.; Zwiers, F. W.; Flannigan, M. D., Detecting the effect of climate change on Canadian forest fires. Geophysical Research Letters 2004, 31, (18), L18211. 3. Dennison, P. E.; Brewer, S. C.; Arnold, J. D.; Moritz, M. A., Large wildfire trends in the western United States, 1984–2011. Geophysical Research Letters 2014, 41, (8), 2928-2933. 4. Jolly, W. M.; Cochrane, M. A.; Freeborn, P. H.; Holden, Z. A.; Brown, T. J.; Williamson, G. J.; Bowman, D. M., Climate-induced variations in global wildfire danger from 1979 to 2013. Nature Communications 2015, 6, 7537. 5. Jain, P.; Wang, X.; Flannigan, M. D., Trend analysis of fire season length and extreme fire weather in North America between 1979 and 2015. International Journal of Wildland Fire 2018, 26, (12), 1009-1020. 6. Dempsey, F., Forest Fire Effects on Air Quality in Ontario: Evaluation of Several Recent Examples. B Am Meteorol Soc 2013, 94, (7), 1059-1064. 7. Jeong, J. I.; Park, R. J.; Youn, D., Effects of Siberian forest fires on air quality in East Asia during May 2003 and its climate implication. Atmos Environ 2008, 42, (39), 8910-8922. 8. Miller, D. J.; Sun, K.; Zondlo, M. A.; Kanter, D.; Dubovik, O.; Welton, E. J.; Winker, D. M.; Ginoux, P., Assessing boreal forest fire smoke aerosol impacts on U.S. air quality: A case study using multiple data sets. J. Geophys. Res.-Atmos. 2011, 116, D22. 9. Viswanathan, S.; Eria, L.; Diunugala, N.; Johnson, J.; McClean, C., An analysis of effects of San Diego wildfire on ambient air quality. Journal of the Air & Waste Management Association 2006, 56, (1), 56-67. 10. Johnston, F. H.; Henderson, S. B.; Chen, Y.; Randerson, J. T.; Marlier, M.; Defries, R. S.; Kinney, P.; Bowman, D. M.; Brauer, M., Estimated global mortality attributable to smoke from landscape fires. Environmental health perspectives 2012, 120, (5), 695-701. 11. Lelieveld, J.; Evans, J.; Fnais, M.; Giannadaki, D.; Pozzer, A., The contribution of outdoor air pollution sources to premature mortality on a global scale. Nature 2015, 525, (7569), 367-371. 12. Spracklen, D. V.; Mickley, L. J.; Logan, J. A.; Hudman, R. C.; Yevich, R.; Flannigan, M. D.; Westerling, A. L., Impacts of climate change from 2000 to 2050 on wildfire activity and carbonaceous aerosol concentrations in the western United States. Journal of Geophysical Research: Atmospheres 2009, 114, D20301.

24 ACS Paragon Plus Environment

Page 25 of 28

477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522

Environmental Science & Technology

13. Liu, J. C.; Mickley, L. J.; Sulprizio, M. P.; Dominici, F.; Yue, X.; Ebisu, K.; Anderson, G. B.; Khan, R. F.; Bravo, M. A.; Bell, M. L., Particulate air pollution from wildfires in the Western US under climate change. Climatic change 2016, 138, (3-4), 655-666. 14. Liu, J. C.; Mickley, L. J.; Sulprizio, M. P.; Yue, X.; Peng, R. D.; Dominici, F.; Bell, M. L., Future respiratory hospital admissions from wildfire smoke under climate change in the Western US. Environmental Research Letters 2016, 11, (12), 124018. 15. Liu, J. C.; Pereira, G.; Uhl, S. A.; Bravo, M. A.; Bell, M. L., A systematic review of the physical health impacts from non-occupational exposure to wildfire smoke. Environmental Research 2015, 136, (0), 120-132. 16. Reid, C. E.; Brauer, M.; Johnston, F. H.; Jerrett, M.; Balmes, J. R.; Elliott, C. T., Critical review of health impacts of wildfire smoke exposure. Environmental health perspectives 2016, 124, (9), 1334. 17. Yao, J. Evaluation of the BlueSky smoke forecasting system and its utility for public health protection in British Columbia. University of British Columbia, 2012. 18. Rothman, K. J., Epidemiology: an introduction. Oxford university press: 2012. 19. Dennekamp, M.; Straney, L. D.; Erbas, B.; Abramson, M. J.; Keywood, M.; Smith, K.; Sim, M. R.; Glass, D. C.; Del Monaco, A.; Haikerwal, A.; Tonkin, A. M., Forest Fire Smoke Exposures and Out-ofHospital Cardiac Arrests in Melbourne, Australia: A Case-Crossover Study. Environmental health perspectives 2015, 123, (10), 959-64. 20. Yao, J.; Eyamie, J.; Henderson, S. B., Evaluation of a spatially resolved forest fire smoke model for population-based epidemiologic exposure assessment. Journal of Exposure Science and Environmental Epidemiology 2016, 26, (3), 233. 21. Larkin, N. K.; O’Neill, S. M.; Solomon, R.; Raffuse, S.; Strand, T.; Sullivan, D. C.; Krull, C.; Rorig, M.; Peterson, J.; Ferguson, S. A., The BlueSky smoke modeling framework. International Journal of Wildland Fire 2010, 18, (8), 906-920. 22. Pavlovic, R.; Chen, J.; Anderson, K.; Moran, M. D.; Beaulieu, P.-A.; Davignon, D.; Cousineau, S., The FireWork air quality forecast system with near-real-time biomass burning emissions: Recent developments and evaluation of performance for the 2015 North American wildfire season. Journal of the Air & Waste Management Association 2016, 66, (9), 819-841. 23. Zannetti, P.; Anfossi, D., Air Quality Modeling: Theories, Methodologies, Computational Techniques, and Available Databases and Software. EnviroComp: 2003. 24. Mallet, V.; Sportisse, B., Air quality modeling: From deterministic to stochastic approaches. Computers & Mathematics with Applications 2008, 55, (10), 2329-2337. 25. Reid, C. E.; Jerrett, M.; Petersen, M. L.; Pfister, G. G.; Morefield, P. E.; Tager, I. B.; Raffuse, S. M.; Balmes, J. R., Spatiotemporal Prediction of Fine Particulate Matter During the 2008 Northern California Wildfires Using Machine Learning. Environmental Science & Technology 2015, 49, (6), 3887-3896. 26. Lassman, W.; Ford, B.; Gan, R. W.; Pfister, G.; Magzamen, S.; Fischer, E. V.; Pierce, J. R., Spatial and temporal estimates of population exposure to wildfire smoke during the Washington state 2012 wildfire season using blended model, satellite, and in situ data. GeoHealth 2017, 1, (3), 106-121. 27. Mohri, M.; Rostamizadeh, A.; Talwalkar, A., Foundations of machine learning. MIT press: 2012. 28. Langley, P., Elements of machine learning. Morgan Kaufmann: 1996. 29. Samuel, A. L., Some studies in machine learning using the game of checkers. IBM Journal of research and development 1959, 3, (3), 210-229. 30. Bellinger, C.; Jabbar, M. S. M.; Zaïane, O.; Osornio-Vargas, A., A systematic review of data mining and machine learning for air pollution epidemiology. BMC public health 2017, 17, (1), 907. 31. Lightstone, S. D.; Moshary, F.; Gross, B., Comparing CMAQ Forecasts with a Neural Network Forecast Model for PM2. 5 in New York. Atmosphere 2017, 8, (9), 161.

25 ACS Paragon Plus Environment

Environmental Science & Technology

523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569

Page 26 of 28

32. Ma, D.; Zhang, Z., Contaminant dispersion prediction and source estimation with integrated Gaussian-machine learning network model for point source emission in atmosphere. Journal of hazardous materials 2016, 311, 237-245. 33. Xi, X.; Wei, Z.; Xiaoguang, R.; Yijie, W.; Xinxin, B.; Wenjun, Y.; Jin, D. In A comprehensive evaluation of air pollution prediction improvement by a machine learning method, Service Operations And Logistics, And Informatics (SOLI), 2015 IEEE International Conference on, 2015; IEEE: 2015; pp 176181. 34. Pelliccioni, A.; Cristofari, A.; Silibello, C.; Gherardi, M.; Cecinato, A.; Lamberti, M. In Estimation of PAHs concentration fields in an urban area by means of Support Vector Machines, 7th Intl. Congress on Env. Modelling and Software, San Diego, CA, 2014; Ames, D. P.; Quinn, N. W. T.; Rizzoli, A. E., Eds. San Diego, CA, 2014. 35. Yao, J.; Henderson, S. B., An empirical model to estimate daily forest fire smoke exposure over a large geographic area using air quality, meteorological, and remote sensing data. J Expos Sci Environ Epidemiol 2014, 24, (3), 328-335. 36. McLean, K. E.; Yao, J.; Henderson, S. B., An Evaluation of the British Columbia Asthma Monitoring System (BCAMS) and PM2.5 Exposure Metrics during the 2014 Forest Fire Season. Int J Environ Res Public Health 2015, 12, (6), 6710-24. 37. Yao, J.; Raffuse, S. M.; Brauer, M.; Williamson, G. J.; Bowman, D. M.; Johnston, F. H.; Henderson, S. B., Predicting the minimum height of forest fire smoke within the atmosphere using machine learning and data from the CALIPSO satellite. Remote Sensing of Environment 2018, 206, 98-106. 38. Statistics Canada, Table 051-0001 - Estimates of population, by age group and sex for July 1, Canada, provinces and territories, annual (persons unless otherwise noted), CANSIM (database). Accessed: 2018-01-12. 2017. 39. Statistics Canada, Focus on geography series, 2011 Census. 2012. 40. Ministry of Forests and Ministry of Sustainable Resource Management, British Columbia’s forests: a geographical snapshot. In 2003; p 4. 41. British Columbia Ministry of Environment, Indicators of Climate Change for British Columbia 2016 Update. In 2016. 42. Aukema, B. H.; Carroll, A. L.; Zhu, J.; Raffa, K. F.; Sickley, T. A.; Taylor, S. W., Landscape level analysis of mountain pine beetle in British Columbia, Canada: spatiotemporal development and spatial synchrony within the present outbreak. Ecography 2006, 29, (3), 427-441. 43. Taylor, S. W.; Carroll, A. L.; Alfaro, R. I.; Safranyik, L., Forest, climate and mountain pine beetle outbreak dynamics in western Canada. The mountain pine beetle: A synthesis of biology, management, and impacts on lodgepole pine 2006, 67-94. 44. Delong, C.; Griesbauer, H.; Mackenzie, W.; Foord, V., Corroboration of biogeoclimatic ecosystem classification climate. Journal of Ecosystems and Management 2010, 10, (3), 49–64. 45. Pojar, J.; Klinka, K.; Meidinger, D., Biogeoclimatic ecosystem classification in British Columbia. Forest Ecology and Management 1987, 22, (1-2), 119-154. 46. Wiedinmyer, C.; Quayle, B.; Geron, C.; Belote, A.; McKenzie, D.; Zhang, X.; O’Neill, S.; Wynne, K. K., Estimating emissions from fires in North America for air quality modeling. Atmos Environ 2006, 40, (19), 3419-3432. 47. Andreae, M. O.; Merlet, P., Emission of trace gases and aerosols from biomass burning. Global biogeochemical cycles 2001, 15, (4), 955-966. 48. Kim, Y. H.; Warren, S. H.; Krantz, Q. T.; King, C.; Jaskot, R.; Preston, W. T.; George, B. J.; Hays, M. D.; Landis, M. S.; Higuchi, M., Mutagenicity and Lung Toxicity of Smoldering vs. Flaming Emissions from Various Biomass Fuels: Implications for Health Effects from Wildland Fires. Environmental health perspectives 2018, 126, (1), 017011-017011. 26 ACS Paragon Plus Environment

Page 27 of 28

570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616

Environmental Science & Technology

49. Kistler, M.; Schmidl, C.; Padouvas, E.; Giebl, H.; Lohninger, J.; Ellinger, R.; Bauer, H.; Puxbaum, H., Odor, gaseous and PM10 emissions from small scale combustion of wood types indigenous to Central Europe. Atmos Environ 2012, 51, 86-93. 50. Lloyd, C., Spatial data analysis: an introduction for GIS users. Oxford university press: 2010; p 57. 51. Breiman, L., Random forests. Machine learning 2001, 45, (1), 5-32. 52. Yao, J.; Brauer, M.; Henderson, S. B., Evaluation of a Wildfire Smoke Forecasting System as a Tool for Public Health Protection. Environmental health perspectives 2013, 121, (10), 1142. 53. Elliott, C. T.; Henderson, S. B.; Wan, V., Time series analysis of fine particulate matter and asthma reliever dispensations in populations affected by forest fires. Environmental health : a global access science source 2013, 12, 11. 54. Henderson, S. B.; Morrison, K. T.; McLean, K. E.; Yao, J.; Shaddick, G.; Buckeridge, D. L., Staying ahead of the epidemiologic curve: Demonstration of the British Columbia Asthma Prediction System (BCAPS) during a wildfire smoke event. Environmental Health Perspectives 2018 (submitted) 55. Green, M.; Kondragunta, S.; Ciren, P.; Xu, C., Comparison of GOES and MODIS aerosol optical depth (AOD) to aerosol robotic network (AERONET) AOD and IMPROVE PM2. 5 mass at Bondville, Illinois. Journal of the Air & Waste Management Association 2009, 59, (9), 1082-1091. 56. Liu, Y.; Paciorek, C. J.; Koutrakis, P., Estimating regional spatial and temporal variability of PM2. 5 concentrations using satellite data, meteorology, and land use information. Environmental health perspectives 2009, 117, (6), 886. 57. Paciorek, C. J.; Liu, Y.; Moreno-Macias, H.; Kondragunta, S., Spatiotemporal associations between GOES aerosol optical depth retrievals and ground-level PM2.5. Environmental Science & Technology 2008, 42, (15), 5800-6. 58. Strand, T. M.; Larkin, N.; Craig, K. J.; Raffuse, S.; Sullivan, D.; Solomon, R.; Rorig, M.; Wheeler, N.; Pryden, D., Analyses of BlueSky Gateway PM2.5 predictions during the 2007 southern and 2008 northern California fires. J. Geophys. Res.-Atmos. 2012, 117. 59. Solazzo, E.; Bianconi, R.; Pirovano, G.; Matthias, V.; Vautard, R.; Moran, M. D.; Appel, K. W.; Bessagnet, B.; Brandt, J.; Christensen, J. H., Operational model evaluation for particulate matter in Europe and North America in the context of AQMEII. Atmos Environ 2012, 53, 75-92. 60. Tian, J.; Chen, D., A semi-empirical model for predicting hourly ground-level fine particulate matter (PM2. 5) concentration in southern Ontario from satellite remote sensing and ground-based meteorological measurements. Remote Sensing of Environment 2010, 114, (2), 221-229. 61. Elangasinghe, M.; Singhal, N.; Dirks, K.; Salmond, J.; Samarasinghe, S., Complex time series analysis of PM10 and PM2. 5 for a coastal site using artificial neural network modelling and k-means clustering. Atmos Environ 2014, 94, 106-116. 62. Prados, A. I.; Kondragunta, S.; Ciren, P.; Knapp, K. R., GOES aerosol/smoke product (GASP) over North America: comparisons to AERONET and MODIS observations. Journal of Geophysical Research: Atmospheres 2007, 112, D15201. 63. Zhang, H.; Lyapustin, A.; Wang, Y.; Kondragunta, S.; Laszlo, I.; Ciren, P.; Hoff, R., A multi-angle aerosol optical depth retrieval algorithm for geostationary satellite data over the United States. Atmos Chem Phys 2011, 11, (23), 11977. 64. Raffuse, S. M.; McCarthy, M. C.; Craig, K. J.; DeWinter, J. L.; Jumbam, L. K.; Fruin, S.; James Gauderman, W.; Lurmann, F. W., High‐resolution MODIS aerosol retrieval during wildfire events in California for use in exposure assessment. Journal of Geophysical Research: Atmospheres 2013, 118, (19), 11-242. 65. van Donkelaar, A.; Martin, R. V.; Levy, R. C.; da Silva, A. M.; Krzyzanowski, M.; Chubarova, N. E.; Semutnikova, E.; Cohen, A. J., Satellite-based estimates of ground-level fine particulate matter during extreme events: A case study of the Moscow fires in 2010. Atmos Environ 2011, 45, (34), 6225-6232. 27 ACS Paragon Plus Environment

Environmental Science & Technology

617 618 619 620 621 622 623 624 625 626 627 628 629 630

Page 28 of 28

66. Schroeder, W.; Prins, E.; Giglio, L.; Csiszar, I.; Schmidt, C.; Morisette, J.; Morton, D., Validation of GOES and MODIS active fire detection products using ASTER and ETM+ data. Remote Sensing of Environment 2008, 112, (5), 2711-2726. 67. NOAA & NASA Advanced Baseline Imager (ABI) Fact Sheet. https://www.goesr.gov/education/docs/Factsheet_ABI.pdf (2018-02-28), 68. Brey, S. J. Using operational hms smoke observations to gain insights on north american smoke transport and implications for air quality. Colorado State University, Fort Collins, Colorado, 2016. 69. Nicodemus, K.; Shugart, Y. In Impact of linkage disequilibrium and effect size on the ability of machine learning methods to detect epistasis in case-control studies, Genetic Epidemiology, 2007; WILEY-LISS DIV JOHN WILEY & SONS INC, 111 RIVER ST, HOBOKEN, NJ 07030 USA: 2007; pp 611-611. 70. Salimi, F.; Henderson, S. B.; Morgan, G. G.; Jalaludin, B.; Johnston, F. H., Ambient particulate matter, landscape fire smoke, and emergency ambulance dispatches in Sydney, Australia. Environment international 2017, 99, 208-212.

28 ACS Paragon Plus Environment