Environmental Modeling
Satellite-Based Estimates of Daily NO2 Exposure in China Using Hybrid Random Forest and Spatiotemporal Kriging Model Yu Zhan, Yuzhou Luo, Xunfei Deng, Kaishan Zhang, Minghua Zhang, Michael L. Grieneisen, and Baofeng Di Environ. Sci. Technol., Just Accepted Manuscript • DOI: 10.1021/acs.est.7b05669 • Publication Date (Web): 16 Mar 2018 Downloaded from http://pubs.acs.org on March 16, 2018
Environmental Science & Technology
Satellite-Based Estimates of Daily NO2 Exposure in China Using Hybrid Random Forest and Spatiotemporal Kriging Model
Yu Zhan,†,‡,§ Yuzhou Luo,‖ Xunfei Deng,⊥ Kaishan Zhang,† Minghua Zhang,‖ Michael L. Grieneisen,‖ Baofeng Di*,‡,†
† Department of Environmental Science and Engineering, Sichuan University, Chengdu, Sichuan 610065, China
‡ Institute for Disaster Management and Reconstruction, Sichuan University, Chengdu, Sichuan 610200, China
§ Sino-German Centre for Water and Health Research, Sichuan University, Chengdu, Sichuan 610065, China
‖ Department of Land, Air, and Water Resources, University of California, Davis, CA 95616, USA
⊥ Institute of Digital Agriculture, Zhejiang Academy of Agricultural Sciences, Hangzhou, Zhejiang 310021, China
* Corresponding author. Tel: +86 13982079978; fax: +86 2885405613; e-mail: [email protected]
ABSTRACT
A novel model named random-forest-spatiotemporal-kriging (RF-STK) was developed to estimate the daily ambient NO2 concentrations across China during 2013-2016 based on the satellite retrievals and geographic covariates. The RF-STK model showed good prediction performance, with cross-validation R2=0.62 (RMSE=13.3 µg/m3) for daily and R2=0.73 (RMSE=6.5 µg/m3) for spatial predictions. The nationwide population-weighted multiyear average of NO2 was predicted to be 30.9±11.7 µg/m3 (mean±standard deviation), with a slowly but significantly decreasing trend at a rate of -0.88±0.38 µg/m3/year. Among the main economic zones of China, the Pearl River Delta showed the fastest decreasing rate of -1.37 µg/m3/year, while the Beijing-Tianjin Metro did not show a temporal trend (P=0.32). The population-weighted NO2 was predicted to be the highest in North China (40.3±10.3 µg/m3) and lowest in Southwest China (24.9±9.4 µg/m3). Approximately 25% of the population lived in nonattainment areas with annual-average NO2>40 µg/m3. A piecewise linear function with an abrupt point around 100 people/km2 characterized the relationship between the population density and the NO2, indicating a threshold of aggravated NO2 pollution due to urbanization. Leveraging the ground-level NO2 observations, this study fills the gap of statistically modeling nationwide NO2 in China, and provides essential data for epidemiological research and air quality management.
TOC Art
INTRODUCTION
Ambient nitrogen dioxide (NO2) causes direct damage to human health and contributes to the formation of ozone (O3), particulate matter (PM), and acid rain.1, 2 Exposure to NO2 has been associated with adverse human health outcomes, such as asthma, respiratory disorders, lung cancer, and premature mortality.2, 3 As a valuable marker of air pollution mixtures, NO2 is an important precursor of O3 and PM, and their adverse effects on human health may be mutually reinforcing.2, 4 Atmospheric NO2 is mainly emitted from anthropogenic sources, such as motor vehicles and industrial boilers.1 In China, one of the most NO2-polluted countries in the world,5 the number of private cars has increased by more than 500% in the past decade.6 More than a thousand state-managed sites have been established in China since 2013 to monitor the concentrations of ambient NO2 and other air pollutants. High uncertainty arises when exposure levels are assessed through site-matching, especially for areas distant from the nearest monitoring site. It is therefore critical to accurately predict the complete spatiotemporal distribution of NO2 across the entire country for population exposure assessment.
Land use regression (LUR) models are commonly employed to predict spatial distributions of NO2 worldwide.7-10 In a narrow sense, land uses surrounding monitoring sites (as predictors) and observed NO2 are used to parameterize linear regression (LR) models, which predict NO2 at unmonitored locations based on the data from the predictors.11 In a broad sense, predictors of LUR models include many other geographic factors, such as population densities and meteorological conditions.12, 13 The satellite-retrieved vertical column density (VCD) of NO2 in the atmosphere is a particularly informative predictor for estimating NO2.7, 14 For instance, the Ozone Monitoring Instrument (OMI) onboard the Aura satellite provides tropospheric NO2 density with global coverage on a daily basis. A side effect of including many predictors in an LUR model (e.g., ≥800 predictors in a previous LUR model7) is severe multicollinearity among predictors, resulting in unreliable parameter estimation and lower prediction accuracy. Various techniques, such as stepwise variable selection, partial least squares regression (PLSR), or least absolute shrinkage and selection operator (LASSO), are employed to resolve this multicollinearity problem.7, 10, 15 Nevertheless, LR is still inadequate for modeling NO2 given the complex relationships between the predictors and NO2, including nonlinearity and high-order interactions.
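To make the narrow-sense LUR formulation concrete, the following minimal Python sketch fits a linear regression of observed NO2 on two predictors and applies it to an unmonitored location. The predictor choice (road density and satellite VCD), the toy values, and the helper names `fit_ols` and `predict` are illustrative assumptions and are not taken from any cited LUR study.

```python
def fit_ols(X, y):
    """Fit y ~ b0 + b1*x1 + ... by solving the normal equations (X'X) b = X'y."""
    rows = [[1.0] + list(r) for r in X]  # prepend an intercept column
    p = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda k: abs(A[k][c]))
        A[c], A[piv] = A[piv], A[c]
        for k in range(p):
            if k != c:
                f = A[k][c] / A[c][c]
                A[k] = [a - f * b for a, b in zip(A[k], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


def predict(coefs, x):
    """Apply the fitted coefficients to one location's predictor values."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], x))


# Hypothetical monitoring sites: (road density, satellite VCD), arbitrary units
X = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
# Hypothetical observed NO2, constructed as 5 + 2*x1 + 3*x2 for illustration
y = [13.0, 12.0, 23.0, 22.0, 30.0]

coefs = fit_ols(X, y)
predicted = predict(coefs, (2, 2))  # an unmonitored location
```

With only two weakly correlated predictors the normal equations are well conditioned; with hundreds of correlated predictors they would not be, which is the multicollinearity problem described above.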
Compared to LR models commonly used in LUR, machine learning models (e.g., random forests and neural networks) generally show higher prediction accuracy due to their strength in modeling complex relationships between response and predictor variables.16, 17 By investigating patterns from large amounts of data, machine learning algorithms develop sophisticated model structures for capturing relationships that are otherwise too complex to specify in parametric models such as LR. While machine learning models are usually considered black boxes, several statistical metrics are available for model interpretation, such as variable importance measures and partial dependence plots for random forests.18 For prediction-oriented tasks with primary concerns of prediction accuracy, machine learning models are generally more suitable than LR models when training data are abundant. Machine learning models have shown high performance in predicting ambient concentrations of multiple air pollutants, such as fine particulate matter (PM2.5) and O3.19-22 With the availability of the national air quality monitoring network in China and the millions of data points collected,23 machine-learning-based LUR models are more feasible than linear-regression-based LUR models for predicting the spatiotemporal distributions of NO2 for China.
Most existing modeling studies on NO2 focus on the prediction of spatial distributions, and only a few studies have considered the intra-annual variation of NO2.13, 14, 24 A previous LUR study used temporal scaling to derive monthly NO2 from annual predictions based on monthly patterns observed at monitoring sites.14 This “top-down” scaling approach is satisfactory on a monthly scale but inadequate for predicting daily NO2, because daily variation is much more irregular than monthly variation (Figure S1). To predict temporally resolved NO2, the temporal variation of predictors should be accounted for in LUR models.12, 25 A few previous studies used predictors with temporal variation (e.g., satellite OMI retrievals and/or meteorological conditions) to predict daily or monthly NO2 with linear-regression-based LUR models.12, 25 However, as with spatial LUR, the predictive performance of spatiotemporal LUR can be improved by replacing LR models with machine learning models. In addition, while national or continental-scale LUR modeling work for NO2 has been conducted for Western countries, only a few local or regional-scale NO2-LUR studies exist for China.8, 24
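The “top-down” temporal scaling described above can be sketched as follows. This is a minimal illustration that assumes the scaling is a simple ratio of each monthly mean to the annual mean at monitoring sites; the cited study may define its scaling differently, and the function names and all numbers here are hypothetical.

```python
def monthly_scaling_factors(site_monthly_means):
    """Ratio of each monthly mean to the annual mean (assumed scaling form)."""
    annual_mean = sum(site_monthly_means) / len(site_monthly_means)
    return [m / annual_mean for m in site_monthly_means]


def scale_annual_to_monthly(annual_prediction, factors):
    """Distribute one annual prediction over months using the site-based factors."""
    return [annual_prediction * f for f in factors]


# Hypothetical monthly mean NO2 (µg/m3) observed at nearby monitoring sites, Jan-Dec
site_monthly_means = [45, 42, 36, 32, 28, 25, 24, 25, 30, 35, 40, 46]

factors = monthly_scaling_factors(site_monthly_means)

# Hypothetical annual LUR prediction (µg/m3) at an unmonitored grid cell
monthly_prediction = scale_annual_to_monthly(20.0, factors)
```

Because the factors average to one, the monthly values preserve the annual prediction on average. The same construction applied to days would need a stable day-of-year pattern, which is what the irregular daily variation undermines.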
This study aims to fill that gap by estimating the spatiotemporal distributions of daily NO2 across China (0.1°×0.1° grid; 98341 cells) to facilitate nationwide exposure assessment. The spatial resolution of 0.1° is consistent with our previous work estimating spatiotemporal distributions of ambient ozone across China,22 and this resolution is commonly used in global or national exposure assessments.5, 26 A novel hybrid model of a random forest submodel and spatiotemporal kriging (RF-STK) is proposed to predict the daily NO2 concentrations based on the OMI satellite retrievals and various geographic covariates, as well as the spatiotemporal autocorrelations. Variable importance and partial dependence plots were employed to evaluate the effect of each predictor on the NO2 prediction. On the basis of the predicted NO2, we assessed the spatiotemporal patterns of NO2 and the resulting population exposure levels. Moreover, the relationship between the population density and the NO2 was characterized with a piecewise linear function. Filling the gap of statistically modeling nationwide NO2 for China, this study provides essential data for epidemiological analyses and air quality management.
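As an illustration of how population exposure levels can be derived from gridded predictions, the sketch below computes a population-weighted mean and standard deviation and the share of population above a concentration threshold. The weighting formulas are the standard ones and are assumed here rather than quoted from the manuscript; the three grid cells, their values, and the function names are hypothetical.

```python
import math


def population_weighted_stats(no2, pop):
    """Population-weighted mean and standard deviation of gridded NO2."""
    total = sum(pop)
    mean = sum(c * p for c, p in zip(no2, pop)) / total
    var = sum(p * (c - mean) ** 2 for c, p in zip(no2, pop)) / total
    return mean, math.sqrt(var)


def population_share_above(no2, pop, threshold):
    """Fraction of the population living in cells above the threshold."""
    return sum(p for c, p in zip(no2, pop) if c > threshold) / sum(pop)


# Hypothetical grid cells: predicted annual NO2 (µg/m3) and resident population
no2 = [20.0, 30.0, 50.0]
pop = [100, 300, 100]

pw_mean, pw_sd = population_weighted_stats(no2, pop)
share = population_share_above(no2, pop, 40.0)  # 40 µg/m3 annual threshold
```

Weighting by population rather than by area shifts the average toward densely populated cells, so it describes what people experience rather than what the land surface experiences.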
MATERIALS AND METHODS
Ground-level NO2 Observations. The hourly NO2 measurements were collected from 1657 monitoring sites scattered throughout mainland China, Taiwan, and Hong Kong during 2013-2016 (Figure S2).23, 27, 28 The NO2 concentrations were measured using the chemiluminescence method. The number of monitoring sites increased from 744 to 1604 during 2013-2016. The hourly concentrations were averaged by day for each site, and days with fewer than 12 hourly measurements were excluded. Strong diurnal patterns in NO2 were observed, with two peaks at 08:00 and 21:00 (Figure S3). While sampling bias towards urban areas was considerable, approximately 8% of the NO2 data were retrieved from areas with population densities lower than 400 people/km2 (Figure S4). These sites provided important training samples for modeling NO2 in suburban or rural areas. The units of NO2 were standardized to µg/m3 by using the unit conversion factor (1 ppb = 1.88 µg/m3 NO2) in order to be consistent with the air quality guidelines set by WHO and the Chinese government.2 Approximately 1.67 million daily NO2 observations were included for the model development. The number of observations per site was 1008±344 (mean ± standard deviation), with negligible seasonal trend in the missing data (Figure S5).
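The daily aggregation and unit conversion described above can be sketched as follows. This is a minimal illustration, assuming missing hours are encoded as `None` and that each record's unit is known; the function name `daily_mean` and the sample values are hypothetical, while the 12-hour minimum and the 1.88 conversion factor come from the text.

```python
PPB_TO_UGM3 = 1.88  # conversion factor stated in the text (1 ppb = 1.88 µg/m3 NO2)
MIN_HOURS = 12      # days with fewer valid hourly values are excluded


def daily_mean(hourly_values, unit="ugm3"):
    """Average one site's hourly NO2 for one day, or return None if too sparse.

    hourly_values: up to 24 readings, with None marking a missing hour (assumed encoding).
    unit: "ugm3" or "ppb"; ppb readings are converted to µg/m3.
    """
    valid = [v for v in hourly_values if v is not None]
    if len(valid) < MIN_HOURS:
        return None
    mean = sum(valid) / len(valid)
    return mean * PPB_TO_UGM3 if unit == "ppb" else mean


# Hypothetical days of data
too_sparse = daily_mean([10.0] * 11 + [None] * 13)               # excluded
converted = daily_mean([10.0] * 12 + [None] * 12, unit="ppb")    # kept, converted
complete = daily_mean([20.0] * 24)                               # kept as-is
```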
During 2013-2016 in China, the daily NO2 observations were 34±21 µg/m3, with a median of 29 µg/m3 and an interquartile range of 26 µg/m3. The annual averages of the observed NO2 decreased from 40±25 µg/m3 to 32±20 µg/m3 during 2013-2016. Seasonally, the NO2 observations were the highest in winter (42±25 µg/m3) and the lowest in summer (25±14 µg/m3), with similar levels between spring (33±20 µg/m3) and fall (35±21 µg/m3). At the regional level, the highest and lowest mean NO2 levels were observed in North China (40±25 µg/m3) and South China (28±18 µg/m3), respectively.
Satellite Retrievals. The tropospheric column densities of NO2 (molecules/cm2) were obtained from the OMI-NO2 level-3 data product (OMNO2d version 3; 0.25°×0.25° resolution).29 The OMI satellite retrievals had nearly global coverage on a daily basis, with a local overpass time between 12:00 and 15:00. The level-3 data were placed on a regular latitude-longitude grid by calculating area-weighted means of all good-quality retrievals within each grid cell. The main quality screening criteria included terrain reflectivity