Constrained Mixed-Effect Models with Ensemble Learning for

Jul 20, 2017 - Department of Epidemiology, University of California, Los Angeles, ... in Public Health, College of Health Sciences, University of Cali...
0 downloads 0 Views 2MB Size
Subscriber access provided by UNIV OF WATERLOO

Article

Constrained Mixed-Effect Models with Ensemble Learning for Prediction of Nitrogen Oxides Concentrations at a High Spatiotemporal Resolution Lianfa Li, Fred Lurmann, Rima Habre, Robert Urman, Edward Rappaport, Beate Ritz, Jiu-Chiuan Chen, Frank Gilliland, and Jun Wu Environ. Sci. Technol., Just Accepted Manuscript • DOI: 10.1021/acs.est.7b01864 • Publication Date (Web): 20 Jul 2017 Downloaded from http://pubs.acs.org on August 7, 2017

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a free service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are accessible to all readers and citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

Environmental Science & Technology is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 31

Environmental Science & Technology

Constrained Mixed-Effect Models with Ensemble Learning for Prediction of Nitrogen Oxides Concentrations at High Spatiotemporal Resolution

Lianfa Li*,1,2, Fred Lurmann3, Rima Habre*,1, Robert Urman1, Edward Rappaport1, Beate Ritz4, JiuChiuan Chen1, Frank D. Gilliland1, Jun Wu*, 5 1. Department of Preventive Medicine, University of Southern California, Los Angeles, CA, USA 2. State Key Laboratory of Resources and Environmental Information System, Institute of Geographical Sciences and Natural Resources, Chinese Academy of Sciences, Beijing, China 3. Sonoma Technology, Inc., Petaluma, CA, USA 4. Department of Epidemiology, University of California, Los Angeles, CA, USA 5. Program in Public Health, College of Health Sciences, University of California, Irvine, CA, USA

*Corresponding authors’ contact information: Lanfa Li: Phone, 650-388-0268; Email, [email protected] or [email protected]. Rima Habre: Phone, 323-442-8283; Email, [email protected]. Jun Wu: Phone, 949-824-0548; Email, [email protected].

1 ACS Paragon Plus Environment

Environmental Science & Technology

1

Abstract

2

Spatiotemporal models to estimate ambient exposures at high spatiotemporal resolutions are crucial in

3

large-scale air pollution epidemiological studies that follow participants over extended periods.

4

Previous models typically rely on central-site monitoring data and/or covered short periods, limiting

5

their applications to long-term cohort studies. Here we developed a spatiotemporal model that can

6

reliably predict nitrogen oxide concentrations with a high spatiotemporal resolution over a long time

7

span (>20 years). Leveraging the spatially extensive highly-clustered exposure data from short-term

8

measurement campaigns across 1-2 years and long-term central site monitoring in 1992-2013, we

9

developed an integrated mixed-effect model with uncertainty estimates. Our statistical model

Page 2 of 31

10

incorporated non-linear and spatial effects to reduce bias. Identified important predictors included

11

temporal basis predictors, traffic indicators, population density, and sub-county-level mean pollutant

12

concentrations. Substantial spatial autocorrelation (11-13%) was observed between neighboring

13

communities. Ensemble learning and constrained optimization were used to enhance reliability of

14

estimation over a large metropolitan area and a long period. The ensemble predictions of biweekly

15

concentrations resulted in an R2 of 0.85 (RMSE: 4.7 ppb) for NO2 and 0.86 (RMSE: 13.4 ppb) for NOx.

16

Ensemble learning and constrained optimization generated stable time series, which notably improved

17

the results compared with those from initial mixed-effects models.

18 19 20 21 22 23 2 ACS Paragon Plus Environment

Page 3 of 31

24 25

Environmental Science & Technology

1. Introduction Exposure to air pollution is associated with acute and chronic adverse health outcomes, such as

26

respiratory and cardiovascular morbidity 1, 2. Spatiotemporal models that optimally characterize the

27

environment are crucial to estimate exposures to ambient air pollutants with high spatiotemporal

28

resolutions for large-scale epidemiologic studies. However, obtaining reliable estimates of long-term

29

exposures relies on spatiotemporal models that fully capture complex temporal structure (e.g., both short

30

and long-term temporal trends) jointly with multi-scale spatial variations (e.g., regional- and local-scale

31

spatial variation). Developing such spatiotemporal models is challenging because measurement data are

32

limited in space and time, and complex, and non-linear associations exist between predictors (such as

33

meteorology and traffic) and pollutant concentrations 3.

34

Many earlier studies used land-use regression or conventional kriging approaches to develop

35

individual spatial models of exposure that capture details across space but not time. Such conventional

36

approaches tend to over-fit irregularities of the training data relying on a set of assumptions 4, 5,

37

including having access to an unbiased sample of monitoring sites for the population and homogeneity

38

of spatial variation for kriging. When the temporal variability of exposure is ignored in the modeling

39

process, especially when statistical assumptions are violated, the resulting exposure estimates could be

40

affected by large error causing significant biases or a large variance 6, 7.

41

In more recent years 3, 8-12, air pollution exposure modelers started to employ the variants of principal

42

components called empirical orthogonal functions (EOFs) 11 for spatiotemporal modeling of air

43

pollutants. In the approach, the first and second principal components accounting for the dominant

44

temporal structure often explain the majority of the long-term and seasonal but not the short-term

45

temporal pollutant variation in the study region. Two of these studies parameterized the temporal basis

46

functions by incorporating time-invariant spatial variables such as elevation, distance to the shorelines 3 ACS Paragon Plus Environment

Environmental Science & Technology

47

and meteorological factors 3, 12. However, this two-stage approach generally assumes spatial-temporal

48

independence and is limited in capturing short-term temporal variability in its estimates.

Page 4 of 31

Other methods such as Bayesian maximum entropy have been used to estimate the spatiotemporal

49 50

concentrations of particulate matter