Article pubs.acs.org/est
Using High-Resolution Satellite Aerosol Optical Depth To Estimate Daily PM2.5 Geographical Distribution in Mexico City Allan C. Just,*,† Robert O. Wright,‡ Joel Schwartz,† Brent A. Coull,§ Andrea A. Baccarelli,† Martha María Tellez-Rojo,∥ Emily Moody,⊥ Yujie Wang,# Alexei Lyapustin,∇ and Itai Kloog¶ †
Department of Environmental Health, Harvard T.H. Chan School of Public Health, Boston, Massachusetts 02215, United States Department of Preventive Medicine, Icahn School of Medicine at Mount Sinai, New York, New York 10029, United States § Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts 02115, United States ∥ Center of Nutrition and Health Research, National Institute of Public Health, Cuernavaca, Morelos 62100, Mexico ⊥ Department of Internal Medicine−Pediatrics, University of Minnesota Medical Center, Minneapolis, Minnesota 55455, United States # University of Maryland, Baltimore County, Baltimore, Maryland 21250, United States ∇ National Aeronautics and Space Administration (NASA) Goddard Space Flight Center (GSFC), Code 613, Greenbelt, Maryland 20771, United States ¶ Department of Geography and Environmental Development, Ben-Gurion University of the Negev, Beer Sheva 8410501, Israel ‡
ABSTRACT: Recent advances in estimating fine particle (PM2.5) ambient concentrations use daily satellite measurements of aerosol optical depth (AOD) for spatially and temporally resolved exposure estimates. Mexico City is a dense megacity that differs from other previously modeled regions in several ways: it has bright land surfaces, a distinctive climatological cycle, and an elevated semi-enclosed air basin with a unique planetary boundary layer dynamic. We extend our previous satellite methodology to the Mexico City area, a region with higher PM2.5 than most U.S. and European urban areas. Using a novel 1 km resolution AOD product from the MODIS instrument, we constructed daily predictions across the greater Mexico City area for 2004−2014. We calibrated the association of AOD to PM2.5 daily using municipal ground monitors, land use, and meteorological features. Predictions used spatial and temporal smoothing to estimate AOD when satellite data were missing. Our model performed well, resulting in an out-of-sample cross-validation R2 of 0.724. Cross-validated root-mean-squared prediction error (RMSPE) of the model was 5.55 μg/m3. This novel model reconstructs long- and short-term spatially resolved exposure to PM2.5 for epidemiological studies in Mexico City.
■
INTRODUCTION Mexico City is a megacity, with a population of more than 24 million. Air pollution is a major public health threat, particularly in the urban regions of Mexico. The Mexican National Institute of Statistics and Geography estimates that the total cost of atmospheric contamination was 3.4% of the national gross domestic product (GDP) in 2012.1 Effective quantitation of exposure to air pollution for epidemiologic studies and policy analysis requires measurements that go beyond the relatively sparse air-monitoring network to more finely capture the exposure variability that may be driving health effects. Mexico City’s unique geography, climate, and demography contribute to the high levels of air pollution. The city sits in an elevated basin, 2240 m above sea level and surrounded on three sides by mountain ridges, but with a broad opening to the north and a narrower gap to the south−southeast. There are over 40 000 industries and >5.5 million vehicles passing through the city daily. The combined high altitude and tropical insolation contribute to incomplete combustion and the formation of © 2015 American Chemical Society
secondary particulate matter (PM). Air quality is generally worse in the winter, when rain is less common and thermal inversions are more frequent.2,3 These distinctive characteristics also make modeling air pollution in Mexico City challenging. Until recent years, epidemiologic studies on the health effects of PM have used exposure estimates with limited geospatial information (such as central ground monitors), assigning measurements to populations within a specified distance of the monitor.4,5 These types of exposure assignment methods have been used in studies of the health impacts of air pollution in Mexico City.6−8 This exposure assignment method can introduce error and most likely biases the effect estimates downward because of spatial misalignment.9 To address this exposure misclassification, various tools have been developed Received: Revised: Accepted: Published: 8576
February 16, 2015 June 5, 2015 June 10, 2015 June 10, 2015 DOI: 10.1021/acs.est.5b00859 Environ. Sci. Technol. 2015, 49, 8576−8584
Article
Environmental Science & Technology
Figure 1. Map of the study area showing the border of the AOD grid, location of the PM2.5 monitoring stations, major roadways, and elevation contours.
over the past few years.10−13 A commonly used approach is land use regression (LUR), which uses geographic covariates to expand in situ measurements of PM2.5 concentrations to large areas. Unfortunately, because these geographic covariates are mostly non-time varying, the temporal resolution of most LUR models is limited and these types of exposure models are used mainly to assess chronic health effects. Recent improvements of LUR address some of its inherent limitations.14−16 Some of these approaches use newly available satellite data to greatly expand the temporal variability as well as spatial coverage of LUR. Satellite-based aerosol optical depth (AOD) is a derived geophysical daily measurement that can be used to estimate air quality and pollution. AOD is a measure of the scattering/ extinction of electromagnetic radiation at a given wavelength because of the presence of aerosols in the atmospheric column. Chang and colleagues have used new statistical downscaling and data fusion techniques to predict PM2.5 concentrations at spatial point locations in the southeastern United States during the period 2003−2005 using moderate resolution imaging spectroradiometer (MODIS) satellite data.17 Their model performed well in cross-validated predictions [R2 = 0.78, and a root mean-squared error (RMSE) = 3.61 μg/m3]. Similarly, Chudnovsky and colleagues18 used 1 year of observations from the newly developed multi-angle implementation of atmospheric correction (MAIAC) algorithm based on MODIS.19,20 They used AOD data at 1 km spatial resolution to generate daily PM2.5 estimates for co-located AOD and PM2.5 sites. The model predictive performance spatially was similar to that estimated by Chang et al. (spatial R2 = 0.79). More recently, our group used high-resolution MAIAC data to develop models to predict daily PM2.5 at a 1 × 1 km resolution across the entire northeastern U.S.A. for the years 2003−2011, which allowed us to better differentiate daily and long-term exposure between urban, suburban, and rural areas.21 Our model performance improved upon these initial breakthroughs in satellite-based modeling (cross-validated predictions, R2 = 0.88). In addition, our results revealed very low bias (slope of predictions versus
withheld observations of 0.99), which suggests that residual error in estimates from this approach will be non-differential. Although demonstrated successfully in the northeastern U.S.A., it remains uncertain how well the hybrid-satellite approach would perform in areas that are significantly different from the northeastern U.S.A. in geography, climate, and PM characteristics. Other groups have used lower resolution (∼10 km) AOD products in estimating the global burden of disease, including estimating PM2.5 long-term averages over Mexico City, but have not tailored the fit for daily predictions.22 A number of geographic features distinguish the Mexico City metropolitan area (MCMA) from other areas in which hybrid models have been generated, particularly in the northeastern U.S.A. Differences include the meteorological patterns and seasonality, the semi-enclosed basin confining regional pollution, the higher elevation, and consequent less efficient motor vehicle combustion. In addition, Mexico City has brighter land surfaces related to surface soils, limited vegetation, and prevalent building materials, which contribute to error in the derivation of the satellite AOD estimates. These physical/ meteorological differences, the additional error in the AOD estimates, and limited data availability on local pollution sources (such as traffic density or roadway classifications) in the Mexico City region add to the challenge of generating PM2.5 prediction models. The goal of this paper was to use the new MAIAC AOD satellite data to estimate PM2.5 in the MCMA, adapting and extending the previous hybrid-model approach to account for the unique challenges of the Mexico City region.
■
METHODS Study Area. The study area included the region of Mexico City within longitude from −99.42 to −98.67 and latitude from 19.09 to 19.77. To avoid making predictions in the largely uninhabited outlying mountains to the southwest and east of the city, the study area was restricted to the contiguous Mexico City valley area of topologically connected grid cells with 8577
DOI: 10.1021/acs.est.5b00859 Environ. Sci. Technol. 2015, 49, 8576−8584
Article
Environmental Science & Technology
Planetary Boundary Layer (PBL). Hourly PBL (mixing height) measurements between 6:00 and 18:00 from a fixed site within the city were provided by the Secretariá del Medio Ambiente28 and included in the model as the mean of the morning measurements. The height of the boundary layer may vary with local meteorological conditions, influencing the concentration and vertical profile of pollutants. The boundary layer not only controls transport and location of pollutants and aerosols, but also their concentrations would be different in variable boundary layer structures.29 Daily Precipitation. Daily precipitation measurements (in millimeters) were entered as a daily citywide mean obtained through Sistema de Aguas de la Ciudad de México.30 Population at Risk Estimates. Demographic data were obtained from the 2010 Mexican National Census.31 We calculated the average population for each 1 km grid cell centroid based on the census statistical areas (AGEBs) intersecting these grid cells. Statistical Methods. All modeling was performed using the R statistical software, version 3.2.0.32 In the Mexico City area, there are large day-to-day differences in mean PM 2.5 concentration, PBL, temperature, and precipitation. These daily differences create a varying relationship between the AOD surface and PM2.5 for every single day, and thus, we chose to use a mixed effect model as we have done in some of our previous studies.21,33,34 We incorporate spatial and temporal predictors and day-specific random effects to account for these temporal variations in the PM2.5−AOD relationship. To generate the daily 1 km PM2.5 predictions in each grid cell for the entire period, we developed an analytic process as follows: The first stage calibrates the AOD grid-level observations to the PM2.5 monitoring data using all monitor-day observations, with the closest available AOD value within 1.1 km during the study period, while adjusting for spatial and temporal covariates. Specifically, we fit the following model (calibration stage):
elevation less than 3000 m using the mean elevation within each 1 km grid cell from the ASTER GDEM V2.23 The resulting region included 4694 1 km grid cells. AOD Data. Daily MAIAC spectral AOD was derived from MODIS Aqua Collection 6 L1B data for the period of 2004− 2014. MAIAC algorithm provides a high-resolution (1 km) AOD product that was used in our analysis for the Mexico City area. The MAIAC data from MODIS Aqua represent a midafternoon measurement [mean daily acquisition time of 2:57 pm Greenwich mean time (GMT) −5 h; range 2:10−3:45 pm local overpass time]. An in-depth description of the MAIAC product and algorithm details can be found in previous publications.19,20 Because AOD values may be spurious at cloud edges, AOD data were filtered to exclude values with adjacent cloud or high uncertainty flags and with a moving window variance in the top 2.5th percentile. The resulting MAIAC data set included 19.8% of all possible observations, which is consistent with our previous work in the New England region.21 Monitoring Data. Data for daily PM2.5 mass concentrations across Mexico City for the study period (from January 2, 2004 to April 30, 2014) were downloaded from the automated air quality monitoring network of Mexico City (RAMA).24 Daily PM2.5 concentrations were calculated as the mean of the hourly measurements (≥18 available) collected on tapered element oscillating microbalance (TEOM) devices. The atmospheric monitoring system of Mexico City (SIMAT) also provided monitor locations and meteorological network data, including temperature and relative humidity, which were assigned on the basis of the closest available station (e.g., from 23 stations reporting temperature). We analyzed local features of monitor locations and saw that specific locations had much lower MAIAC retrieval. For example, in the grid cell containing the station “PER”, the surface brightness analyzed by the MAIAC algorithm was 30−50% brighter than at the other sites, making the MAIAC data less reliable in these locations. Because of these features, three monitors in atypical locations were excluded, leaving a total of 12 monitors used in the remaining analyses (all locations shown in Figure 1). Measurements from Christmas and New Years included many outlying values, potentially associated with the use of fireworks. These two dates include 18 of the 20 highest monitor-day observations in the data set and were excluded from model calibration. Spatial and Temporal Predictors of PM2.5. The multistep regression modeling approach included AOD, classic land use predictors, and temporal predictors. Spatial covariates were generated using ArcMap, version 10.2, and mapped with QGIS, version 2.8.25,26 All spatial data were projected into UTM zone 14N. We used the following spatial predictors: Roadway Density. Roadway data were obtained from OpenStreetMap (downloaded July 31, 2013; http://www. openstreetmap.org/).27 Total road density was calculated using the line density tool in the ArcGIS spatial analyst toolbox. This calculates the total polyline distance of all roads per square kilometer around the centroids of the satellite AOD cells across the study area. We used the following temporal predictors: Meteorological Data. Temperature and relative humidity were obtained from the SIMAT, except for 100 days of relative humidity in 2007 that were missing from the SIMAT database and were imputed using the airport meteorological station. Grid cells were matched to the closest weather station with available daily means for these variables (≥18 hourly measurements).
PMij = (α + uj) + (β1 + vj)AODij + β2 temperatureij + β3relative humidityij + β4 mean AM PBLj + β5 √ precipitationj + β6 roadway densityi + εij (ujvj) ∼ [(0, 0), Σ]
where PMij denotes the measured PM2.5 concentration at a spatial site i on a day j; α and uj are the fixed and random daily intercepts, respectively, AODij is the AOD value in the grid cell corresponding to site i on day j, β1 and vj are the fixed and random slopes, respectively, temperatureij and relative humidityij are the assigned closest available values in the grid cells corresponding to site i on a day j, mean AM PBLj and precipitationj are the values on date j, roadway densityi is the mean in the grid cell containing site i, and finally, Σ is an unstructured variance−covariance matrix for the random effects. Following this process, we predicted daily PM2.5 concentrations in grid cells without monitors but with available AOD measurements using the calibration stage model coefficients. This resulted in PM2.5 prediction for all day-grid cell combinations with available satellite-based AOD. 8578
DOI: 10.1021/acs.est.5b00859 Environ. Sci. Technol. 2015, 49, 8576−8584
Article
Environmental Science & Technology In the second stage (full coverage model), to predict daily PM2.5 in grid cells with no AOD on that day across the study area, we make use of the city-wide association between grid-cell AOD and PM2.5 levels and the association between the PM2.5 level in a given grid cell with that in neighboring grid cells. Because the spatial pattern of air pollution varies by season, we fit a separate smooth spatial surface for each season (cold-dry, November−February; warm-dry, March−April; and rainy, May−October) across the study time period.2 To constrain this surface to positive variation around the time series, the full coverage model was fit on the square root of the calibration stage predictions and, subsequently, back-transformed. Specifically, we fit the following model (full coverage stage):
Table 1. Calibration Model Regression Fixed Effects intercept AOD temperature relative humidity mean AM PBL/100 √precipitation roadway density
estimate
standard error
t value
12.94 14.41 0.58 0.12 −1.72 −4.85 0.21
1.35 0.93 0.07 0.02 0.13 0.66 0.01
9.60 15.53 8.88 7.97 −12.92 −7.33 14.06
p value