Land-Use Regression Modeling of Source-Resolved Fine Particulate

Jul 17, 2019 - This study presents land-use regression (LUR) models for submicron particulate matter (PM1) components from an urban area. Models are ...
Article Cite This: Environ. Sci. Technol. XXXX, XXX, XXX−XXX

pubs.acs.org/est

Land-Use Regression Modeling of Source-Resolved Fine Particulate Matter Components from Mobile Sampling

Ellis Shipley Robinson,†,‡ Rishabh Urvesh Shah,†,‡ Kyle Messier,§ Peishi Gu,†,‡ Hugh Z. Li,†,‡ Joshua Schulz Apte,∥ Allen L. Robinson,†,‡ and Albert A. Presto*,†,‡

† Department of Mechanical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
‡ Center for Atmospheric Particle Studies, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
§ Department of Environmental and Molecular Toxicology, Oregon State University, Corvallis, Oregon 97333, United States
∥ Department of Civil, Architectural & Environmental Engineering, University of Texas at Austin, Austin, Texas 78705, United States




* Supporting Information

ABSTRACT: This study presents land-use regression (LUR) models for submicron particulate matter (PM1) components from an urban area. Models are presented for mass concentrations of inorganic species (SO4, NO3, NH4), organic aerosol (OA) factors, and total PM1. OA is source-apportioned using positive matrix factorization (PMF) of data collected from aerosol mass spectrometry deployed on a mobile laboratory. PMF yielded a three-factor solution: cooking OA (COA), hydrocarbon-like OA (HOA), and less-oxidized oxygenated OA (LO-OOA). This study represents the first time that LUR has been applied to source-resolved OA factors. We sampled a roughly 20 km² area of West Oakland, California, USA, over 1 month (mid-July to mid-August, 2017). The road network of the sampling domain was comprehensively sampled each day using a randomized driving route to minimize temporal and spatial bias. Mobile measurements were aggregated both spatially and temporally for use as discrete spatial observations for LUR model building. LUR model performance was highest for those species with more spatial variability (primary OA factors: COA R² = 0.80, HOA R² = 0.67) and lowest for secondary inorganic species (SO4 R² = 0.47, NH4 R² = 0.43) that were more spatially homogeneous. Notably, the stepwise selective LUR algorithm largely selected predictors for primary OA factors that correspond to the associated land-use categories (e.g., cooking land-use variables were selected in cooking-related PM models). This finding appears to be robust, as we demonstrate the predictive link between land-use variables and the corresponding source-resolved PM1 components through a subsampling analysis.



Environmental Science & Technology
Received: March 29, 2019 Revised: June 27, 2019 Accepted: July 3, 2019
© XXXX American Chemical Society
DOI: 10.1021/acs.est.9b01897 Environ. Sci. Technol. XXXX, XXX, XXX−XXX

INTRODUCTION

Exposure to ambient air pollution poses a large environmental threat to human health. Fine particulate matter (particles smaller than 2.5 μm diameter, PM2.5) is the greatest environmental contributor to the global burden of disease1 and is associated with adverse health effects (morbidity,2 mortality3) even at low concentrations.4,5 PM2.5 mass concentrations have long been used as the benchmark variable in health effects studies of PM (e.g., Harvard Six Cities study3). However, recent research identifies substantial intraurban concentration gradients of PM2.5.6,7 In many cases, concentration differences within urban areas may be larger than those between them.8 Therefore, to most accurately describe exposure there exists a strong need for spatially resolved PM2.5 concentrations within urban areas. Unfortunately, most urban areas contain a limited number of regulatory monitors, which insufficiently capture this variability.9

Resolving fine-scale concentration gradients using fixed-site monitors requires a substantial number of instruments and sampling locations.10 Mobile sampling provides an attractive alternative to fixed-site monitoring because it requires comparatively fewer instruments (including just a single one in many cases).9,11−13 The primary weakness of mobile sampling, however, is the convolution of spatial with temporal variations in pollutant concentrations. Mobile sampling studies that use a relatively small number of sampling visits may be insufficient to determine pollutant spatial patterns that match long-term (e.g., annual) conditions. Repeated visits across multiple sampling days are required to deconvolute temporal from spatial variability in mobile sampling.9,14,15

Pollutant variability can also be predicted at fine spatial scales (∼25−100 m), though ultimately such modeling approaches rely on representative, spatially resolved measurements as a starting point. Kriging,16,17 inverse-distance weighting (IDW),18 and land-use regression (LUR), for example,19,20 are commonly used to predict pollutant concentrations at high spatial resolution. Kriging and IDW are methods of spatial interpolation, while LUR is an application of multilinear regression (MLR) with spatially referenced covariates. Despite the promise that LUR can assess the link, at least in theory, between pollutants and sources (or other physical elements affecting their concentrations), it can suffer from a lack of transferability between different areas.21

Figure 1. (a) Sampling domain showing 200 m spatial aggregation locations (grid cells). Includes Port of Oakland, West Oakland neighborhoods, and most of Downtown Oakland. Polygons used for mobile sampling planning (numbered) are roughly coincident with City of Oakland neighborhoods and include: (1) Port, (2) The Bottoms, (3) Clawson, (4) Acorn, (5) Oak Center, (6) McClymonds, (7) “Slim” (parts of Ralph Bunche, Oak Center, and Acorn neighborhoods), (8) Ghosttown/Hoover-Foster, and (9) Downtown. The inset map in the upper left of the left panel shows the location of the sampling domain within the larger Bay Area metropolitan area. Map data: Google. (b) Unique day visits vs number of measurement locations (grid cells). The red line shows that 208 measurement locations meet the 15 day minimum threshold.

PM2.5 is composed of numerous components, which are derived from a wide range of processes and sources. One of the major components of fine PM mass, between 20% and 90%,22 is the organic fraction (organic aerosol, OA). Secondary OA (SOA) is formed through atmospheric chemical reactions, and primary OA (POA) is composed of particles emitted directly from sources. Positive matrix factorization (PMF) of aerosol mass spectrometry data has been routinely used in recent years to apportion OA into “factors” that are linked to sources such as cooking OA (COA),23 biomass burning OA (BBOA),24,25 and hydrocarbon-like OA (HOA). HOA is attributed to vehicular combustion processes.26 PM components that arise from different processes and sources can have correspondingly different spatial patterns in urban environments.11,27 Previous studies have shown significant intraurban concentration variations in POA factors such as COA and HOA.11,28

While LUR models of PM2.5 mass concentrations have become common,29 very few LUR studies have focused on the components that comprise PM. Some studies have developed LURs for metals30−32 and examined closely the source−element connections. Li et al.
developed LURs for fractions of particulate organic carbon (OC) based on volatility, and found certain fractions to have strong connections with source categories (e.g., semivolatile OC and traffic).27 However, none of these studies use explicitly source-resolved PM measurements as a starting point, nor do they have speciated measurements for all of the major components of fine PM mass (inorganic species, OA factors, and black carbon), as we do here.

Our aim is to connect LUR to each of the major components comprising fine PM, allowing us to gain insights on what drives the variability for each component separately. We leverage a data set of mobile measurements that are comprehensive in space for the domain in which we sampled, as well as comprehensive in providing mass concentrations for the major components of PM. Additionally, source-resolved PM provides a unique opportunity to test LUR ideas because it separates portions of PM mass that should be directly connected to land uses from those that should not be.

The hypothesis driving this work is that primary particulate species should be modeled well by land-use regression, or at least better than secondary species. Primary species emitted by commonly distributed sources (e.g., restaurants, vehicles) should be closely linked to land-use variables that describe those emissions sources. Secondary species, which form on long atmospheric time scales and are subject to regional transport, should not be as well-described by land-use variables. The main contributions of this work are (1) to develop LUR models for all components of fine PM, and (2) to evaluate the connection between PM components and selected LUR predictor variables. Lastly, we compare the performance of two approaches for LUR modeling of total submicron PM (PM1): an LUR model fitted to PM1 measurements versus the sum of individual component LUR models. These findings have implications for LUR model transferability for fine PM mass.



MATERIALS AND METHODS

Spatial Domain and Site Selection. Mobile sampling was conducted in West Oakland, Alameda County, California. Oakland is a midsize city (population: ∼430 000) within the Bay Area metropolitan statistical area (population: ∼4 700 000).33 Figure 1a shows a map of the specific sampling domain and its situation within the larger metropolitan area. We subdivide our ∼20 km² sampling domain into three main parts. First is the Port of Oakland, which is the seventh largest container-shipping port in the U.S.34 Second is the West Oakland neighborhoods, home to ∼36 000 people (2010 census). These neighborhoods are encircled by interstate highways on all sides and include a mix of residential, commercial, and light-industrial land uses. Third is downtown Oakland, which is bordered by the same interstate highways on two sides and contains a high density of commercial activities, including restaurants. Downtown Oakland also serves as one of the region’s major transit hubs.

Multiple air quality studies have been conducted in West Oakland in recent years. This focus has been driven by environmental justice concerns about air quality due to emissions from the Port of Oakland and their effects on the adjacent neighborhoods. Legislation has since been imposed to curb these emissions, including regulations on fuel for ships at the port,35 improved queuing protocols to alleviate drayage truck congestion,36 and mandates for diesel particulate filters on all drayage vehicles.37 These measures have resulted in substantially improved local air quality38,39 in West Oakland, though the area as a whole remains very source-rich in the context of typical urban areas in the U.S.

Mobile Sampling Strategy. The sampling strategy is described in detail by Shah et al.40 and summarized here. We performed saturation sampling in the study domain to quantify PM1 mass concentration and chemical composition at high spatial resolution. Nominally, we drove each road segment at least once per day for 22 total days of driving. Due to various factors (traffic, construction, instrument down-time, etc.) not all segments in the domain were visited each day.
The sampling hours for daily driving were roughly 9:00 to 18:00 local time, and the study was conducted between July 10 and August 2, 2017; therefore, our data are representative of daytime summer pollution levels. To minimize any spatial bias and distribute our visits to different parts of the sampling domain across time of day, we used the following approach: we divided the domain into small polygons that took ∼1 h to drive (shown by numbers in Figure 1a). We drove the street network within each polygon comprehensively but randomized the order in which we visited the polygons each day. We used 200 m grid cells for spatial aggregation. There are 311 grid cells in the sampling domain, as shown in Figure 1a. Several recent mobile air quality studies highlight the importance of repeated measurement visits to produce representative concentrations at a given location.9,14,15 We considered 15 days to be the minimum threshold of daily visits required to produce concentrations representative of summer daytime averages based on the findings of Apte et al.,9 and 208 grid cells meet this daily visits threshold (see Figure 1b). The concentrations aggregated for these grid cells serve as the input data for LUR modeling.

Air Quality Data Collection. All measurements were conducted using the Carnegie Mellon University mobile laboratory, the details of which have been described previously.32 Briefly, the instrument suite is powered by an on-board alternator coupled to the vehicle’s engine. Ambient air was sampled through a 0.5 in. OD stainless steel tube mounted on the roof of the vehicle and passed through a cyclone separator with a 2.5 μm particle diameter cut-size. The instrument suite included a high-resolution time-of-flight aerosol mass spectrometer (HR-ToF-AMS, referred to as “AMS” for the remainder of this work; Aerodyne Research Inc.) for measuring nonrefractory PM1 chemical composition and mass concentration;41,42 an aethalometer for measuring black carbon (BC) mass concentration (AE33, Magee Scientific); a condensation particle counter for measuring particle number concentrations (200P, Aerosol Dynamics Inc.); and CO and CO2 gas monitors (T300U, Teledyne API, and LI-820, LICOR, respectively). Flow to the AMS was dried using a Nafion drier (MD-110-24, PermaPure). Position data were collected at 1 Hz using a GPS logger (BE-2200 GPS Pro, Bad Elf) with stated accuracy of ±3 m.

The focus of this work is mobile concentration measurements of the subcomponents that comprise PM1 mass (inorganic species, OA, and BC). All nonrefractory inorganic and organic components were measured using the AMS, while BC was measured using the aethalometer. The full details of AMS operation as well as all other instruments, data analysis, and spatial aggregation are presented in Shah et al.40 The nominal upper size limit (1 μm) of particle transmission into the AMS is set by the focusing lens, which is why we present PM1 measurements here instead of PM2.5. We operated the AMS with 20 s averaging time. The nominal speed of the mobile laboratory was 10 m s⁻¹, giving a spatial resolution of roughly 200 m. All data timestamps were shifted to account for transit time within the sampling line. Additionally, each AMS sample timestamp was shifted by 10 s, ensuring that each data point was assigned spatial coordinates corresponding to the midpoint of the 20 s sample. Upon assigning the appropriate spatial coordinates to each AMS sample, we aggregated data within 200 m grid cells. Within each grid cell, we took daily averages to eliminate any potential temporal biases (e.g., stopping at a light).
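A minimal sketch of the spatial and temporal aggregation in this paragraph, assuming each sample already carries projected coordinates in meters and a sampling-day label. The function names, the tuple layout, and the toy values are illustrative assumptions, not the authors' code; the 200 m cell size, the daily averaging, the per-cell median across days, and the minimum-visit threshold follow the text.

```python
from collections import defaultdict
from statistics import mean, median

CELL_M = 200.0  # grid cell edge length stated in the text


def cell_of(x_m, y_m, size=CELL_M):
    """Map projected coordinates (meters) to an integer grid-cell index."""
    return (int(x_m // size), int(y_m // size))


def aggregate(samples, min_days=15):
    """Return {cell: median of daily means} for cells visited on >= min_days days.

    samples: iterable of (day_label, x_m, y_m, concentration) tuples
    (a hypothetical layout chosen for this sketch).
    """
    # Step 1: collect every sample that falls in the same cell on the same day.
    per_cell_day = defaultdict(list)
    for day, x_m, y_m, conc in samples:
        per_cell_day[(cell_of(x_m, y_m), day)].append(conc)

    # Step 2: one average per cell per day, so a day with many samples
    # (e.g., waiting at a light) counts once.
    daily_means = defaultdict(list)
    for (cell, _day), values in per_cell_day.items():
        daily_means[cell].append(mean(values))

    # Step 3: median across days, kept only if the visit threshold is met.
    return {
        cell: median(values)
        for cell, values in daily_means.items()
        if len(values) >= min_days
    }


# Toy data: one cell seen on three days, a second cell seen on one day.
samples = [
    ("d1", 10, 10, 1.0),
    ("d1", 50, 20, 3.0),
    ("d2", 150, 190, 4.0),
    ("d3", 199, 0, 10.0),
    ("d1", 250, 10, 5.0),
]
# The threshold is lowered to 2 days only so the toy data produce a result.
representative = aggregate(samples, min_days=2)
```

Averaging within a day before taking the median across days keeps a single sample-heavy day from dominating a cell's representative value.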
The median of these grid cell daily average values was chosen as the representative concentration of each cell (similar to Apte et al.9).

Source-Resolved OA Factors. One of the novel aspects of this paper is to couple LUR modeling with measurements of the components of PM, including source-resolved OA factors. We performed PMF on the AMS OA mass spectra, which yielded a three-factor solution: COA, HOA, and less-oxygenated oxygenated organic aerosol (LO-OOA). Two of these factors are related to primary emissions (COA and HOA), while LO-OOA is an SOA factor. A full discussion of the three-factor PMF solution used in this work is presented in Shah et al.40 Residual mass not fitted by the PMF three-factor solution was less than 1% and is considered negligible.

For the purposes of this Article, we assert that the PMF source-apportionment is accurate: COA and HOA are direct primary emissions from their respective source categories, and LO-OOA is a secondary factor. We base this on the following: (1) The mass spectra of each factor are consistent with many previous laboratory and ambient identifications25,26 (see Figure S1a). (2) The diurnal patterns for each factor agree qualitatively with the expected temporal activity of their respective sources (see Figure S1b). (3) Both of the POA factors have a high degree of skewness (ratio of hourly mean:hourly median concentrations >1.6) compared to LO-OOA (mean:median ratio = 1.07). High skewness indicates a greater degree of spatial variability, which corresponds to intense emission plumes that are captured on our driving route. Given these three aspects, we have a high degree of confidence in attributing HOA to traffic-related activities, COA to cooking, and LO-OOA to secondary chemistry.

Figure 2. Overview of aggregated concentration measurements of OA factors and select PM1 components using 200 m grid cell aggregation. (a) Histogram of grid cell concentrations. (b) Ordered normalized concentrations for each component, illustrating the relative spatial variability between each. Concentrations of each component were normalized by their median concentration, and ordered from highest to lowest. (c) Box plot of grid cell concentrations for OA factors grouped by spatial subdomain (left panel) and coarse land-use categories (right panel).

LUR Model Development. We developed LUR models for all nonrefractory components of PM1, BC, and total PM1 mass (sum of mass concentrations of all components: inorganics + OA + BC). Land-use variables used in this study fall into broad categories including traffic, industry, food-cooking, land-cover, and other environmental variables (e.g., elevation above sea level). A full list of all LUR variables used can be found in Table S1. To build LUR models, we follow the same forward selection approach outlined by Eeftens et al. and used by the ESCAPE project43 and others. Briefly, the model building uses stepwise variable selection. First, univariate regression is conducted using all possible predictors, and the best performing model is selected. Model performance during this stepwise selection is determined by adjusted R². This model serves as the starting point for multilinear regression from the remaining set of predictors, where each predictor is then individually added to the previous model and evaluated, and again the best performing model is selected. This sequential addition of predictors is continued until predictor variable inclusion does not increase model performance by 1% or more.

The following quality-control measures and diagnostic tests are performed after the stepwise model-building algorithm is completed. Variables with p-values greater than 0.10 are removed from the model. The variance inflation factor (VIF) is computed for all predictors, and those with VIF > 3 are sequentially removed. Cook’s D values are computed for each observation to identify potential high-leverage locations for
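The forward-selection procedure described in this section can be sketched as follows. This is a hedged illustration on synthetic data, not the authors' or ESCAPE's implementation: the function names are hypothetical, the "1%" stopping rule is interpreted here as an absolute gain of 0.01 in adjusted R² (an assumption), and the p-value and Cook's D checks are omitted for brevity. Only the adjusted-R² forward selection and the sequential VIF > 3 removal are shown.

```python
import numpy as np


def _ols_r2(X, y):
    """Plain R^2 of an OLS fit of y on the columns of X plus an intercept."""
    A = np.column_stack([np.ones(X.shape[0]), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return 1.0 - float(resid @ resid) / float(((y - y.mean()) ** 2).sum())


def adjusted_r2(X, y):
    """Adjusted R^2, penalizing the number of predictors p."""
    n, p = X.shape
    return 1.0 - (1.0 - _ols_r2(X, y)) * (n - 1) / (n - p - 1)


def forward_select(X, y, min_gain=0.01):
    """Add one predictor at a time while adjusted R^2 rises by >= min_gain."""
    selected, best = [], -np.inf
    remaining = list(range(X.shape[1]))
    while remaining:
        # Score every candidate when added to the current model.
        scores = {j: adjusted_r2(X[:, selected + [j]], y) for j in remaining}
        j_best = max(scores, key=scores.get)
        if scores[j_best] - best < min_gain:
            break  # best candidate no longer improves the model enough
        selected.append(j_best)
        remaining.remove(j_best)
        best = scores[j_best]
    return selected, best


def vif(X, j):
    """Variance inflation factor of column j given the other columns of X."""
    return 1.0 / (1.0 - _ols_r2(np.delete(X, j, axis=1), X[:, j]))


def drop_high_vif(X, selected, max_vif=3.0):
    """Sequentially drop the predictor with the largest VIF above max_vif."""
    kept = list(selected)
    while len(kept) > 1:
        vifs = [vif(X[:, kept], i) for i in range(len(kept))]
        worst = int(np.argmax(vifs))
        if vifs[worst] <= max_vif:
            break
        kept.pop(worst)
    return kept


# Synthetic stand-in for land-use predictors: only columns 0 and 3 matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(208, 6))
y = 2.0 * X[:, 0] + 1.0 * X[:, 3] + rng.normal(scale=0.5, size=208)

selected, adj_r2 = forward_select(X, y)
kept = drop_high_vif(X, selected)
```

Using adjusted rather than plain R² matters here because plain R² never decreases when a predictor is added, so it could not serve as a stopping criterion on its own.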


RESULTS

Mobile Measurements. Aggregated measurements of PM1 components are presented in Figure 2. Histograms of grid cell concentrations in Figure 2a illustrate the absolute concentration differences between PM1 components. We show all primary aerosol components (BC, HOA, COA), the dominant secondary aerosol components (LO-OOA, SO4), and total PM1 mass. Total OA (HOA + COA + LO-OOA) contributes roughly 50% of total PM1 mass. The median grid cell concentration of total OA is 5.2 μg m⁻³ within our domain. LO-OOA makes up the majority of OA (57%, median = 2.96 μg m⁻³), followed by COA (26%, median = 1.35 μg m⁻³) and HOA (17%, median = 0.88 μg m⁻³). While COA is the larger of the two primary OA factors, both show a strong diurnal pattern, and HOA is greater during the morning rush hour. COA peaks during lunch time. COA also increases during the late afternoon and presumably has another evening meal-time peak, though our measurements did not include evening hours. LO-OOA concentrations show a broad mid-day peak, reflecting production via photochemical oxidation. See Figure S1b for diurnal patterns of OA factors. Inorganic species have similar diurnal patterns to LO-OOA. Inorganic SO4 is the most abundant inorganic component (median = 2.5 μg m⁻³), followed by NH4 (median = 1.3 μg m⁻³) and NO3 (median = 0.94 μg m⁻³). Inorganic NO3 and NH4 are omitted from Figure 2 for the sake of visual clarity. Figure 2b illustrates the relative spatial variability between components. Here, we normalized the representative grid cell concentrations for each component by their median value and ordered these from highest to lowest. COA is the most spatially variable species (representative grid cell concentrations vary across the domain by a factor of 6), followed closely by HOA. LO-OOA is considerably less variable than either POA factor, and SO4 is the least variable of all components.
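The arithmetic behind the OA shares and the median-normalized ordering of Figure 2b can be sketched as below. The OA medians are the values quoted in the text; the function names and the toy grid-cell values are hypothetical, and reading "vary by a factor of 6" as the ratio of the highest to the lowest cell is an assumption. Medians are not additive in general, so the closeness of the summed factor medians to the reported total OA median is a property of this data set rather than a rule.

```python
from statistics import median

# Median concentrations (ug m^-3) quoted in the text for the three OA factors.
oa_medians = {"LO-OOA": 2.96, "COA": 1.35, "HOA": 0.88}
oa_total = sum(oa_medians.values())
# Percentage share of each factor in the summed medians.
oa_share = {name: 100.0 * value / oa_total for name, value in oa_medians.items()}


def normalized_ordered(cell_concs):
    """Divide each cell value by the median, then sort from highest to lowest."""
    med = median(cell_concs)
    return sorted((c / med for c in cell_concs), reverse=True)


def variability_factor(cell_concs):
    """Ratio of the highest to the lowest representative cell concentration."""
    return max(cell_concs) / min(cell_concs)


# Hypothetical representative concentrations for five grid cells.
toy_cells = [0.5, 1.0, 1.5, 2.0, 3.0]
curve = normalized_ordered(toy_cells)
factor = variability_factor(toy_cells)
```

Normalizing by the median puts components with very different absolute concentrations on a common scale, which is what allows their spatial variability to be compared on one plot.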
Total PM1 falls in between, as expected, given that it is a composite of all primary and secondary components. Maps in Figure S2 further illustrate this relative spatial variability between COA and SO4. Figure 2c shows the distribution of OA factor concentrations aggregated both by neighborhood (left panel) and broad land-use type (right panel). Neighborhoods used for aggregation are shown in Figure 1a. Broad land-use categorization is done by classifying grid cells as a binary “High” or “Low” with respect to traffic and food-cooking land uses. Grid cells are “High” with respect to food-cooking if they fall in the upper quintile of restaurant or food-truck density. “High” traffic grid cells are those whose centroids are within 200 m of any interstate highway. HOA concentrations are significantly (as determined by ANOVA, p-value