
Environmental Modeling

Satellite-Based Estimates of Daily NO2 Exposure in China Using Hybrid Random Forest and Spatiotemporal Kriging Model Yu Zhan, Yuzhou Luo, Xunfei Deng, Kaishan Zhang, Minghua Zhang, Michael L. Grieneisen, and Baofeng Di Environ. Sci. Technol., Just Accepted Manuscript • DOI: 10.1021/acs.est.7b05669 • Publication Date (Web): 16 Mar 2018 Downloaded from http://pubs.acs.org on March 16, 2018

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

Environmental Science & Technology is published by the American Chemical Society, 1155 Sixteenth Street N.W., Washington, DC 20036. Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


Environmental Science & Technology

Satellite-Based Estimates of Daily NO2 Exposure in China Using Hybrid Random Forest and Spatiotemporal Kriging Model

Yu Zhan,†,‡,§ Yuzhou Luo,‖ Xunfei Deng,⊥ Kaishan Zhang,† Minghua Zhang,‖ Michael L. Grieneisen,‖ Baofeng Di*,‡,†

† Department of Environmental Science and Engineering, Sichuan University, Chengdu, Sichuan 610065, China

‡ Institute for Disaster Management and Reconstruction, Sichuan University, Chengdu, Sichuan 610200, China

§ Sino-German Centre for Water and Health Research, Sichuan University, Chengdu, Sichuan 610065, China

‖ Department of Land, Air, and Water Resources, University of California, Davis, CA 95616, USA

⊥ Institute of Digital Agriculture, Zhejiang Academy of Agricultural Sciences, Hangzhou, Zhejiang 310021, China

*Corresponding author. Tel: +86 13982079978; fax: +86 2885405613; e-mail: [email protected]

1 ACS Paragon Plus Environment

ABSTRACT

A novel model named random-forest-spatiotemporal-kriging (RF-STK) was developed to estimate the daily ambient NO2 concentrations across China during 2013-2016 based on satellite retrievals and geographic covariates. The RF-STK model showed good prediction performance, with cross-validation R²=0.62 (RMSE=13.3 µg/m³) for daily predictions and R²=0.73 (RMSE=6.5 µg/m³) for spatial predictions. The nationwide population-weighted multiyear average of NO2 was predicted to be 30.9±11.7 µg/m³ (mean±standard deviation), with a slow but significant decreasing trend of -0.88±0.38 µg/m³/year. Among the main economic zones of China, the Pearl River Delta showed the fastest decrease (-1.37 µg/m³/year), while the Beijing-Tianjin Metro showed no significant temporal trend (P=0.32). The population-weighted NO2 was predicted to be highest in North China (40.3±10.3 µg/m³) and lowest in Southwest China (24.9±9.4 µg/m³). Approximately 25% of the population lived in nonattainment areas with annual-average NO2>40 µg/m³. A piecewise linear function with a breakpoint around 100 people/km² characterized the relationship between population density and NO2, indicating a threshold beyond which urbanization aggravates NO2 pollution. Leveraging ground-level NO2 observations, this study fills the gap in statistically modeling nationwide NO2 in China and provides essential data for epidemiological research and air quality management.
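A piecewise linear relationship with a breakpoint can be estimated by a simple grid search. The sketch below is a minimal illustration, not the authors' fitting procedure: it assumes a continuous two-segment (hinge) model fitted on log10 population density, y = a + b·x + c·max(x - k, 0), where the breakpoint k is the candidate that minimizes the residual sum of squares. The function name `fit_breakpoint` and the synthetic data are hypothetical.

```python
import numpy as np


def fit_breakpoint(x, y, candidates):
    """Fit y = a + b*x + c*max(x - k, 0) for each candidate breakpoint k.

    Returns (best_k, coefficients, sse) for the k with the smallest residual
    sum of squares. Coefficients are (a, b, c); c is the change in slope
    above the breakpoint.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    best = None
    for k in candidates:
        # Design matrix: intercept, slope, and a hinge term active above k
        X = np.column_stack([np.ones_like(x), x, np.maximum(x - k, 0.0)])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        sse = float(np.sum((y - X @ coef) ** 2))
        if best is None or sse < best[2]:
            best = (float(k), coef, sse)
    return best


# Synthetic example (hypothetical data): NO2 is flat below 100 people/km^2
# (log10 density = 2) and rises with log10(density) above it.
rng = np.random.default_rng(0)
density = 10 ** rng.uniform(0, 4, size=500)  # 1 to 10,000 people/km^2
log_d = np.log10(density)
no2 = 15 + 12 * np.maximum(log_d - 2.0, 0) + rng.normal(0, 1.0, size=500)

k, coef, sse = fit_breakpoint(log_d, no2, candidates=np.arange(0.5, 3.51, 0.05))
print(f"breakpoint ~ {10 ** k:.0f} people/km^2, slope change = {coef[2]:.1f}")
```

The hinge basis keeps the fitted line continuous at the breakpoint, so only the slope changes there; a discontinuous model would need separate intercepts for each segment.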


INTRODUCTION

Ambient nitrogen dioxide (NO2) causes direct damage to human health and contributes to the formation of ozone (O3), particulate matter (PM), and acid rain [1, 2]. Exposure to NO2 has been associated with adverse health outcomes, such as asthma, respiratory disorders, lung cancer, and premature mortality [2, 3]. NO2 is also a valuable marker of air pollution mixtures and an important precursor of O3 and PM, and the adverse health effects of these pollutants may be mutually reinforcing [2, 4]. Atmospheric NO2 is mainly emitted from anthropogenic sources, such as motor vehicles and industrial boilers [1]. China is one of the most heavily NO2-polluted countries in the world [5], and its number of private cars has increased by more than 500% over the past decade [6]. Since 2013, more than a thousand state-managed sites have been established in China to monitor the concentrations of ambient NO2 and other air pollutants. However, assessing exposure levels by matching locations to monitoring sites is highly uncertain, especially for areas far from the nearest site. It is therefore critical to accurately predict the complete spatiotemporal distribution of NO2 across the entire country for population exposure assessment.
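Site-matching assigns each location the concentration measured at its nearest monitor, so the error tends to grow with the distance to that monitor. The following minimal sketch shows nearest-site assignment using great-circle (haversine) distance; it is an illustration of the general approach rather than a procedure described in this paper, and the site identifiers, coordinates, and NO2 values are hypothetical.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def nearest_site(lat, lon, sites):
    """Return (site_id, distance_km, no2) for the closest monitoring site.

    `sites` maps site_id -> (lat, lon, daily_no2).
    """
    best_id = min(sites, key=lambda s: haversine_km(lat, lon, sites[s][0], sites[s][1]))
    s_lat, s_lon, no2 = sites[best_id]
    return best_id, haversine_km(lat, lon, s_lat, s_lon), no2


# Hypothetical monitors: (lat, lon, NO2 in ug/m3)
sites = {
    "A": (30.66, 104.06, 52.0),  # urban site
    "B": (31.50, 103.60, 18.0),  # rural site
}
site_id, dist, value = nearest_site(30.70, 104.10, sites)
```

The returned distance is a rough indicator of how much the assigned value should be trusted; a location 100 km from its nearest monitor inherits that monitor's value just as a location 5 km away does.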

Land use regression (LUR) models are commonly employed to predict the spatial distributions of NO2 worldwide [7-10]. In a narrow sense, LUR uses the land uses surrounding monitoring sites as predictors and the observed NO2 as the response to parameterize linear regression (LR) models, which then predict NO2 at unmonitored locations from the predictor values there [11]. In a broad sense, LUR predictors include many other geographic factors, such as population density and meteorological conditions [12, 13]. The satellite-retrieved vertical column density (VCD) of NO2 in the atmosphere is a particularly informative predictor for estimating ground-level NO2 [7, 14]. For instance, the Ozone Monitoring Instrument (OMI) onboard the Aura satellite provides tropospheric NO2 column densities with global coverage on a daily basis. A side effect of including many predictors in an LUR model (e.g., ≥800 predictors in a previous LUR model [7]) is severe multicollinearity among predictors, which results in unreliable parameter estimates and lower prediction accuracy. Various techniques, such as stepwise variable selection, partial least squares regression (PLSR), and the least absolute shrinkage and selection operator (LASSO), are employed to resolve this multicollinearity problem [7, 10, 15]. Nevertheless, LR remains inadequate for modeling NO2 given the complex relationships between the predictors and NO2, including nonlinearity and high-order interactions.
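Multicollinearity among LUR predictors is commonly diagnosed with the variance inflation factor, VIF_j = 1/(1 - R_j^2), where R_j^2 is the R² obtained by regressing predictor j on all other predictors. The numpy sketch below computes it for hypothetical predictors (road density in two nested buffers, which are nearly redundant, plus an unrelated elevation variable). It is a diagnostic illustration only; the techniques named above (stepwise selection, PLSR, LASSO) are remedies, not shown here.

```python
import numpy as np


def vif(X):
    """Variance inflation factor for each column of predictor matrix X (n x p)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        # Regress column j on an intercept plus all other columns
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out


# Hypothetical predictors: road density within 500 m and 1000 m buffers
# (strongly correlated) and an unrelated elevation variable.
rng = np.random.default_rng(1)
road_500 = rng.normal(size=1000)
road_1000 = road_500 + rng.normal(scale=0.1, size=1000)
elevation = rng.normal(size=1000)

vifs = vif(np.column_stack([road_500, road_1000, elevation]))
print(vifs.round(1))
```

Values above roughly 10 are a widely used rule-of-thumb warning sign; here the two buffer variables inflate each other's variance by about two orders of magnitude, while the independent predictor stays near 1.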

Compared with the LR models commonly used in LUR, machine learning models (e.g., random forests and neural networks) generally show higher prediction accuracy because of their strength in modeling complex relationships between response and predictor variables [16, 17]. By learning patterns from large amounts of data, machine learning algorithms develop sophisticated model structures that capture relationships too complex to specify in parametric models such as LR. Although machine learning models are usually considered black boxes, several statistical tools are available for model interpretation, such as variable importance measures and partial dependence plots for random forests [18]. For prediction-oriented tasks in which accuracy is the primary concern, machine learning models are generally more suitable than LR models when training data are abundant. Machine learning models have shown high performance in predicting ambient concentrations of multiple air pollutants, such as fine particulate matter (PM2.5) and O3 [19-22]. Given the national air quality monitoring network in China and the millions of data points it has collected [23], machine-learning-based LUR models are better suited than linear-regression-based LUR models for predicting the spatiotemporal distributions of NO2 in China.
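Variable importance and partial dependence can be computed for any fitted model that maps a predictor matrix to predictions. The sketch below is model-agnostic and uses only numpy. It assumes permutation importance (the increase in mean squared error after shuffling one predictor) as the importance measure; random forest software may instead report impurity-based importance. The `model` function is a hypothetical stand-in for a trained random forest, not the paper's model.

```python
import numpy as np


def permutation_importance_mse(predict, X, y, rng):
    """Increase in MSE when each predictor column is randomly permuted."""
    base = np.mean((y - predict(X)) ** 2)
    scores = []
    for j in range(X.shape[1]):
        Xp = X.copy()
        Xp[:, j] = rng.permutation(Xp[:, j])  # break the link between column j and y
        scores.append(np.mean((y - predict(Xp)) ** 2) - base)
    return np.array(scores)


def partial_dependence(predict, X, j, grid):
    """Average prediction when predictor j is fixed at each grid value."""
    pd_values = []
    for v in grid:
        Xv = X.copy()
        Xv[:, j] = v  # set predictor j to v for every sample
        pd_values.append(predict(Xv).mean())
    return np.array(pd_values)


# Hypothetical stand-in for a trained model: NO2 responds nonlinearly to a
# satellite column covariate (col 0) and weakly to a second covariate (col 1).
def model(X):
    return 10 + 20 * np.sqrt(np.clip(X[:, 0], 0, None)) + 0.5 * X[:, 1]


rng = np.random.default_rng(2)
X = np.column_stack([rng.uniform(0, 4, 300), rng.normal(size=300)])
y = model(X) + rng.normal(0, 1, 300)

importance = permutation_importance_mse(model, X, y, rng)
grid = np.linspace(0, 4, 5)
pd_col0 = partial_dependence(model, X, 0, grid)
```

Plotting `pd_col0` against `grid` gives the partial dependence curve for the first predictor, which here traces the concave square-root response averaged over the other covariate.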

Most existing NO2 modeling studies focus on predicting spatial distributions, and only a few have considered the intra-annual variation of NO2 [13, 14, 24]. A previous LUR study used temporal scaling to derive monthly NO2 from annual predictions based on the monthly patterns observed at monitoring sites [14]. This "top-down" scaling approach is satisfactory at the monthly scale but inadequate for predicting daily NO2, because daily variation is much more irregular than monthly variation (Figure S1). To predict temporally resolved NO2, LUR models should account for the temporal variation of the predictors [12, 25]. A few previous studies used time-varying predictors (e.g., OMI satellite retrievals and/or meteorological conditions) in linear-regression-based LUR models to predict daily or monthly NO2 [12, 25]. However, as with spatial LUR, the predictive performance of spatiotemporal LUR can be improved by replacing LR models with machine learning models. In addition, while national- or continental-scale LUR modeling of NO2 has been conducted for western countries, only a few local- or regional-scale NO2 LUR studies exist for China [8, 24].
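The "top-down" temporal scaling approach can be sketched as follows. This minimal version assumes the monthly scaling factor is the ratio of the pooled monthly mean of site observations to their pooled annual mean; the cited study's exact formulation may differ, and the function names and numbers are hypothetical.

```python
from collections import defaultdict


def monthly_scaling_factors(site_obs):
    """Ratio of each month's mean observed NO2 to the overall annual mean.

    `site_obs` is a list of (month, no2) tuples pooled across monitoring sites.
    """
    by_month = defaultdict(list)
    for month, value in site_obs:
        by_month[month].append(value)
    annual_mean = sum(v for _, v in site_obs) / len(site_obs)
    return {m: (sum(vals) / len(vals)) / annual_mean for m, vals in by_month.items()}


def scale_to_monthly(annual_prediction, factors):
    """Derive monthly NO2 at a location from its annual prediction."""
    return {m: annual_prediction * f for m, f in sorted(factors.items())}


# Hypothetical observations: higher NO2 in January than in July
obs = [(1, 48.0), (1, 52.0), (7, 22.0), (7, 18.0)]
factors = monthly_scaling_factors(obs)
monthly = scale_to_monthly(30.0, factors)  # annual prediction of 30 ug/m3
```

Because every day within a month receives the same factor, this approach cannot reproduce the irregular day-to-day variation noted above.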

This study aims to fill that gap by estimating the spatiotemporal distributions of daily NO2 across China (0.1°×0.1° grid; 98341 cells) to facilitate nationwide exposure assessment. The spatial resolution of 0.1° is consistent with our previous work estimating the spatiotemporal distributions of ambient ozone across China [22], and this resolution is commonly used in global or national exposure assessments [5, 26]. A novel hybrid of a random forest submodel and spatiotemporal kriging (RF-STK) is proposed to predict daily NO2 concentrations from the OMI satellite retrievals and various geographic covariates while accounting for spatiotemporal autocorrelation. Variable importance measures and partial dependence plots were employed to evaluate the effect of each predictor on the NO2 predictions. On the basis of the predicted NO2, we assessed the spatiotemporal patterns of NO2 and the resulting population exposure levels. Moreover, the relationship between population density and NO2 was characterized with a piecewise linear function. By filling the gap in statistically modeling nationwide NO2 for China, this study provides essential data for epidemiological analyses and air quality management.
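The general idea of a hybrid "trend model plus kriged residuals" can be sketched as follows: a regression submodel (a random forest in RF-STK) predicts NO2 from covariates, and kriging interpolates the submodel's residuals at monitoring sites to the target location and day. This numpy sketch assumes simple kriging of a zero-mean residual field with a separable exponential space-time covariance. The covariance form, its parameters, and the `rf_trend` values are hypothetical placeholders, not the fitted RF-STK specification.

```python
import numpy as np


def st_covariance(d_km, d_days, sill=100.0, range_km=50.0, range_days=2.0):
    """Separable exponential space-time covariance (assumed form)."""
    return sill * np.exp(-d_km / range_km) * np.exp(-d_days / range_days)


def krige_residual(obs_xyt, obs_resid, target_xyt, **cov_params):
    """Simple-kriging estimate of the residual at one space-time target.

    obs_xyt: (n, 3) array of projected x (km), y (km), and day index.
    obs_resid: (n,) residuals (observed NO2 minus trend prediction).
    target_xyt: (3,) target x, y, day.
    """
    xy, t = obs_xyt[:, :2], obs_xyt[:, 2]
    # Covariance among observations
    d_obs = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    C = st_covariance(d_obs, np.abs(t[:, None] - t[None, :]), **cov_params)
    # Covariance between each observation and the target
    d_tgt = np.linalg.norm(xy - target_xyt[:2], axis=-1)
    c0 = st_covariance(d_tgt, np.abs(t - target_xyt[2]), **cov_params)
    weights = np.linalg.solve(C, c0)
    return float(weights @ obs_resid)


def hybrid_predict(trend_at_target, obs_xyt, obs_resid, target_xyt, **cov_params):
    """Trend prediction (e.g., from a random forest) plus kriged residual."""
    return trend_at_target + krige_residual(obs_xyt, obs_resid, target_xyt, **cov_params)


# Hypothetical monitors: x, y in km and day index; observed NO2 and the
# trend model's prediction at each monitor.
obs_xyt = np.array([[0.0, 0.0, 0], [30.0, 10.0, 0], [5.0, 40.0, 1], [60.0, 60.0, 1]])
observed = np.array([45.0, 38.0, 30.0, 25.0])
rf_trend = np.array([40.0, 40.0, 33.0, 24.0])  # hypothetical submodel output
residuals = observed - rf_trend

estimate = hybrid_predict(36.0, obs_xyt, residuals, np.array([10.0, 10.0, 0]))
```

Simple kriging treats the residual mean as known (zero here); ordinary kriging would instead add a constraint that the weights sum to one. Without a nugget term, the kriged surface reproduces the residuals exactly at the monitors.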

MATERIALS AND METHODS

Ground-level NO2 Observations. Hourly NO2 measurements were collected from 1657 monitoring sites throughout mainland China, Taiwan, and Hong Kong during 2013-2016 (Figure S2) [23, 27, 28]. NO2 concentrations were measured using the chemiluminescence method. The number of monitoring sites increased from 744 to 1604 during 2013-2016. Hourly concentrations were averaged to daily means for each site, and days with fewer than 12 hourly measurements were excluded. NO2 showed strong diurnal patterns, with two peaks at 08:00 and 21:00 (Figure S3). Although sampling was considerably biased toward urban areas, approximately 8% of the NO2 data were collected in areas with population densities lower than 400 people/km² (Figure S4). These sites provided important training samples for modeling NO2 in suburban and rural areas. All NO2 concentrations were expressed in µg/m³, using the conversion factor 1 ppb = 1.88 µg/m³ for NO2, to be consistent with the air quality guidelines set by WHO and the Chinese government [2]. Approximately 1.67 million daily NO2 observations were included for model development. The number of observations per site was 1008±344 (mean±standard deviation), with a negligible seasonal trend in the missing data (Figure S5).
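The daily aggregation and unit harmonization described above can be sketched as follows. The 12-hour completeness rule and the 1.88 µg/m³ per ppb factor come from the text; the record layout (site, date, hour, value, unit) and the function name `daily_means` are hypothetical.

```python
from collections import defaultdict

PPB_TO_UGM3_NO2 = 1.88  # 1 ppb NO2 = 1.88 ug/m3 (factor stated in the text)
MIN_HOURS = 12          # days with fewer hourly values are excluded


def daily_means(hourly_records):
    """Average hourly NO2 to daily means per site, in ug/m3.

    `hourly_records` is an iterable of (site, date, hour, value, unit) tuples,
    where unit is "ppb" or "ug/m3". Missing hours are absent or have value None.
    """
    groups = defaultdict(dict)
    for site, date, hour, value, unit in hourly_records:
        if value is None:
            continue
        ugm3 = value * PPB_TO_UGM3_NO2 if unit == "ppb" else value
        groups[(site, date)][hour] = ugm3  # keep one value per hour
    return {
        key: sum(hours.values()) / len(hours)
        for key, hours in groups.items()
        if len(hours) >= MIN_HOURS
    }


# Hypothetical input: site S1 has 24 hourly values in ppb, site S2 only 10.
records = [("S1", "2016-01-01", h, 20.0, "ppb") for h in range(24)]
records += [("S2", "2016-01-01", h, 40.0, "ug/m3") for h in range(10)]
result = daily_means(records)
```

Converting before averaging keeps each daily mean in a single unit even if a site's reporting unit were mixed within a day.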

During 2013-2016 in China, the daily NO2 observations were 34±21 µg/m³, with a median of 29 µg/m³ and an interquartile range of 26 µg/m³. The annual averages of the observed NO2 decreased from 40±25 µg/m³ in 2013 to 32±20 µg/m³ in 2016. Seasonally, the NO2 observations were highest in winter (42±25 µg/m³) and lowest in summer (25±14 µg/m³), with similar levels in spring (33±20 µg/m³) and fall (35±21 µg/m³). At the regional level, the highest and lowest mean NO2 levels were observed in North China (40±25 µg/m³) and South China (28±18 µg/m³), respectively.
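Summary statistics of the kind reported above (mean±SD, median, interquartile range, and seasonal means) can be computed with the Python standard library. This sketch assumes meteorological seasons (winter = December to February, and so on), the sample standard deviation, and the default "exclusive" quantile method of `statistics.quantiles`; the paper's exact choices are not stated here, and the data are hypothetical.

```python
import statistics
from collections import defaultdict

SEASON_OF_MONTH = {12: "winter", 1: "winter", 2: "winter",
                   3: "spring", 4: "spring", 5: "spring",
                   6: "summer", 7: "summer", 8: "summer",
                   9: "fall", 10: "fall", 11: "fall"}


def describe(values):
    """Mean, sample SD, median, and interquartile range of a list of values."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
        "median": statistics.median(values),
        "iqr": q3 - q1,
    }


def seasonal_means(daily):
    """Mean NO2 per season from (month, value) pairs."""
    groups = defaultdict(list)
    for month, value in daily:
        groups[SEASON_OF_MONTH[month]].append(value)
    return {season: statistics.mean(vals) for season, vals in groups.items()}


# Hypothetical daily observations: (month, NO2 in ug/m3)
daily = [(1, 50.0), (1, 40.0), (4, 30.0), (7, 20.0), (7, 10.0), (10, 30.0)]
overall = describe([v for _, v in daily])
by_season = seasonal_means(daily)
```

Different quantile definitions can shift the interquartile range slightly for small samples; with over a million observations the choice has little practical effect.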

Satellite Retrievals. The tropospheric column densities of NO2 (molecules/cm²) were obtained from the OMI-NO2 level-3 data product (OMNO2d version 3; 0.25°×0.25° resolution) [29]. The OMI satellite retrievals had nearly global daily coverage, with a local overpass time of 12:00-15:00. The level-3 data were gridded on a regular latitude-longitude grid by calculating the area-weighted mean of all good-quality retrievals within each grid cell. The main quality screening criteria included terrain reflectivity