Article
Using building heights and street configuration to enhance intra-urban PM10, NOX and NO2 land use regression models
Robert Tang, Marta Blangiardo, and John Gulliver
Environ. Sci. Technol., Just Accepted Manuscript • Publication Date (Web): 03 Sep 2013
Downloaded from http://pubs.acs.org on September 3, 2013
Environmental Science & Technology is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.
Page 1 of 27
Environmental Science & Technology
Using building heights and street configuration to enhance intra-urban PM10, NOX and NO2 land use regression models

Robert Tang1*, Marta Blangiardo1, and John Gulliver1

1. Small Area Health Statistics Unit, MRC-PHE Centre for Environment and Health, School of Public Health, Imperial College London, St Mary’s campus, London, W2 1PG, UK.

* Corresponding author:
[email protected]
Tel: +44 (0)20 7594 5027
Fax: +44 (0)20 7594 0768
Small Area Health Statistics Unit, MRC-PHE Centre for Environment and Health, School of Public Health, Imperial College London, St Mary’s campus, London, W2 1PG, UK.
1 ACS Paragon Plus Environment
TOC/Abstract Art
Abstract
Land use regression (LUR) models have been widely used to provide long-term air pollution exposure assessment in epidemiological studies. However, models have rarely offered variables that account for the dispersion environment close to source (e.g. street canyons, position and dimensions of buildings, road width). This study used newly-available data on building heights and geometry to enhance the representation of land use and the dispersion field in LUR. Models were developed for PM10, NOX and NO2 for 2008-2011 for London, UK. A separate set of models using ‘traditional’ land use and traffic indicators (e.g. distance from road, area of housing within circular buffers) was also developed, and its performance was compared with that of the ‘enhanced’ models. Models were evaluated using leave-one-out (n-1) cross-validation (LOOCV) and grouped (n-25%) cross-validation (GCV). LOOCV R2 values were 0.71, 0.50 and 0.66 for the traditional PM10, NOX and NO2 models, respectively, and 0.73, 0.79 and 0.78 for the corresponding enhanced models. GCV R2 values were 0.71, 0.53 and 0.64 for the traditional models and 0.68, 0.77 and 0.77 for the enhanced models. Data on building volume within the area common to a 20 m road buffer and a 25 m circular buffer substantially improved the performance (R2 increase > 13%) of the NOX and NO2 LUR models.
Keywords
GIS, land use regression, street canyon, street configurations, building heights
1. Introduction
Land use regression (LUR) models have been widely used for exposure assessment in epidemiological studies.1-3 The structure of LUR models varies between studies (i.e. number of variables, different distances and buffer sizes) according to the location and size of the study area (i.e. city, national, continental), but models typically include information on traffic, road length, land use, population and altitude. Air pollution is quickly dispersed in open country, but elevated levels of pollutants are often seen in densely built-up areas, especially in heavily trafficked street canyons.4 Source data in LUR models are, however, often relatively coarse in spatial resolution (e.g. 100 m CORINE land cover used in the ESCAPE project5) and thus do not represent the physical characteristics of dispersion environments (i.e. buildings and street configuration). Some dispersion models (e.g. ADMS-Urban) simulate the physical properties of air pollution dispersion around buildings, but are usually limited to 10-15 buildings (i.e. one street or road intersection) in any model run. Dispersion modeling with buildings is therefore not practical for city-wide exposure assessment. A few studies have developed ‘hybrid’ approaches by combining outputs from dispersion models with other variables in LUR,6-8 but thus far these models share the same limitation of not including the effects of buildings on the dispersion of air pollution.

Information on building heights together with street widths has been used in previous LUR studies to calculate the aspect ratio as an indicator of pollutant trapping.9,10 This has been limited to the height of buildings on either side of the street at the point of interest, and thus does not account for the effects of street canyon length or of other buildings in the vicinity of sources. One constraint has been the availability of data on building heights, which in the past had to be measured in the field and/or imputed from aerial photography or satellite imagery.9,10

During recent years, new datasets on building heights and geometry, as well as detailed urban land use data, have been made available in several countries, including the UK. The Landmap dataset (www.landmap.ac.uk) provides city-wide data on building heights and geometry (with horizontal and vertical accuracy of +/- 0.5 m) for much of urban UK, whilst OS MasterMap (http://www.ordnancesurvey.co.uk/oswebsite/products/os-mastermap/index.html) provides highly detailed topography and land use data, including areas of roads (i.e. road width) and land use at fine spatial scale (i.e. 70% of daily mean concentration available for each year between 2008-2011 were selected. Supporting information about these sites, including site type (as defined by LAQN, i.e. kerbside, roadside, industrial, suburban and urban background) and British National Grid X-Y site coordinates, was downloaded from the LAQN website. Site coordinates were checked using Google Maps (maps.google.com). Daily mean concentrations of pollutants were downloaded and then averaged over the four-year period 2008-2011 to provide long-term average concentrations (i.e. to avoid the influence of meteorology in individual years). There were 42, 57 and 56 monitoring sites for PM10, NOX and NO2, respectively, that passed the site selection criteria. Locations of these monitoring sites can be seen in Figures S1-S3 (Supporting Information). The 4-year average concentrations of each pollutant were normally distributed.
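The site selection and averaging step can be sketched as follows. This is a minimal illustration, not the authors' processing. The completeness rule is assumed to be "at least 70% of days with a valid daily mean in every year", because the exact comparison was lost in the extracted text. The hypothetical helper `four_year_mean` pools all valid daily means rather than averaging the four annual means.

```python
import calendar
from statistics import mean

# Assumed reading of the selection rule: >= 70% valid daily means in each year.
MIN_CAPTURE = 0.70
YEARS = (2008, 2009, 2010, 2011)

def four_year_mean(daily_by_year, years=YEARS, min_capture=MIN_CAPTURE):
    """Return the long-term mean for one site, or None if any year fails capture.

    daily_by_year maps year -> list of daily means, with None for missing days.
    """
    for year in years:
        n_days = 366 if calendar.isleap(year) else 365
        valid = [v for v in daily_by_year.get(year, []) if v is not None]
        if len(valid) / n_days < min_capture:
            return None  # site fails the (assumed) data-capture criterion
    # Pool every valid daily mean over the period (an assumption; averaging the
    # four annual means instead would weight each year equally).
    return mean(v for year in years for v in daily_by_year[year] if v is not None)
```

For example, a site with complete records of 40.0 on every day returns 40.0, while a site with only 200 valid days in 2010 returns `None`.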

Development of predictor variables
A range of predictor variables was derived from GIS datasets for developing the LUR models; a full list of variables can be found in Table S1 (Supporting Information). Traffic, land use and population variables were similar to those used in the ESCAPE study.5 The London Atmospheric Emissions Inventory 2008 (LAEI 2008) was used for road and traffic variables. It comprises a digital road geography for Greater London within the M25 boundary, attributed with information on type of road (‘motorways’, ‘A-roads’ and some significantly trafficked ‘minor roads’) and on traffic flows and speeds, with separate categories for fleet type (e.g. buses, light and heavy vehicles). For land use variables, we used a combination of CORINE land cover for Europe (CLC) and the Land Cover Map of Great Britain 2007 (LCM2007) (Source: Centre for Ecology and Hydrology) on a 25 m grid. The land cover datasets were aggregated into five groups: high density urban, low density urban, industry, ports and railways, and urban green space. Further details on this procedure can be found in Supporting Information. Population variables (i.e. number of inhabitants and households) were extracted from the UK 2001 Census (Source: Office for National Statistics), the closest available year.
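Buffer-based traffic variables, such as the traffic load within a 25 m buffer used to represent local traffic, are commonly computed by summing, over road segments, the traffic flow multiplied by the segment length falling inside the buffer. This ESCAPE-style definition is assumed here. The minimal sketch below uses only the standard library. Roads are treated as straight segments, each segment is clipped to a circular buffer analytically, and the function names are hypothetical.

```python
import math

def length_inside_circle(a, b, centre, r):
    """Length of the straight segment a-b that lies inside a circle of radius r."""
    (ax, ay), (bx, by), (cx, cy) = a, b, centre
    dx, dy = bx - ax, by - ay
    fx, fy = ax - cx, ay - cy
    qa = dx * dx + dy * dy
    if qa == 0.0:
        return 0.0  # degenerate segment
    # Solve |a + t(b - a) - centre|^2 = r^2 for the parameter t
    qb = 2.0 * (fx * dx + fy * dy)
    qc = fx * fx + fy * fy - r * r
    disc = qb * qb - 4.0 * qa * qc
    if disc <= 0.0:
        return 0.0  # the line misses (or only touches) the circle
    root = math.sqrt(disc)
    t0 = (-qb - root) / (2.0 * qa)
    t1 = (-qb + root) / (2.0 * qa)
    # Clip the inside interval [t0, t1] to the segment's own range [0, 1]
    lo, hi = max(t0, 0.0), min(t1, 1.0)
    return max(0.0, hi - lo) * math.sqrt(qa)

def traffic_load(site, segments, r):
    """Sum of flow x length within a circular buffer; segments are (a, b, flow)."""
    return sum(flow * length_inside_circle(a, b, site, r) for a, b, flow in segments)
```

A road passing through the site contributes its flow multiplied by 2r, because 2r of its length lies inside the buffer.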

Four variables on buildings and/or street configuration were offered to create enhanced models: two on aspect ratio, as used in other studies,9,10,12 and two new variables designed as part of this study. They are: (1) aspect ratio (i.e. average building height / road width); (2) maximum aspect ratio (i.e. maximum building height / road width); (3) building volume (i.e. sum of building area × height); and (4) building volume / road width. The new variables are intended to capture the intensity and stature of buildings close to roads with traffic information; in other words, they aim to represent both the height and the continuity of street canyons. The building volume / road width variable is thus an extension of aspect ratio that accounts for both the shape of the canyon and the density of buildings. Higher values of these variables are expected to be associated with higher pollutant concentrations (i.e. a positive effect). Building areas and heights were extracted from Landmap. Road widths and areas of urban green space surfaces were extracted from OS MasterMap.
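The four building/street configuration variables can be illustrated on a set of buildings already clipped to a buffer. In this sketch the helper name and input format are hypothetical. "Average building height" is taken as an unweighted mean of building heights, because the text does not say whether an area-weighted mean was used.

```python
from statistics import mean

def building_street_variables(buildings, road_width):
    """Compute the four candidate variables for one site and buffer.

    buildings: list of (footprint_area_m2, height_m) pairs already clipped to the
    buffer (hypothetical input format); road_width in metres.
    """
    if not buildings:
        # No buildings in the buffer: no canyon, so all variables are zero
        return {"aspect_ratio": 0.0, "max_aspect_ratio": 0.0,
                "building_volume": 0.0, "volume_per_road_width": 0.0}
    heights = [h for _, h in buildings]
    volume = sum(area * h for area, h in buildings)  # sum of area x height
    return {
        "aspect_ratio": mean(heights) / road_width,    # (1) unweighted mean height
        "max_aspect_ratio": max(heights) / road_width,  # (2)
        "building_volume": volume,                      # (3)
        "volume_per_road_width": volume / road_width,   # (4)
    }
```

For two buildings of 100 m2 × 12 m and 200 m2 × 18 m beside a 15 m wide road, this gives an aspect ratio of 1.0, a maximum aspect ratio of 1.2, a volume of 4800 m3 and a volume / road width of 320.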

The building and road width data were extracted only at locations where an LAEI road emission source (i.e. a road with traffic information) is present (i.e. at roadside and kerbside sites). Extraction was limited to a distance of 100 m, representing the distance over which buildings and street configuration influence pollution trapping in the wake of sources.4,11 Two types of buffers were used: (1) circular buffers, as used in other studies; and (2) road buffers, for which buffers of road center-lines were created and intersected with circular buffers, and features within the area common to both buffers (e.g. buildings alongside roads) were clipped and extracted for each of the four building/street configuration variables. Road buffer distances were 20, 30, 40 and 50 m from the center of roads, within 25, 50 and 100 m circular buffers. The road buffers thus focus on areas in the immediate wake of road sources. Figures S4 and S5 in Supporting Information illustrate how road and circular buffers were created.
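The road-buffer-within-circular-buffer extraction was done by GIS clipping in ArcGIS. The sketch below is a rough stand-in that needs no external libraries. It rasterises the circular buffer around a site into small cells and keeps only cells that also lie within the road buffer distance of a road center-line. It then sums cell area × height over cells covered by a building. Buildings are assumed to be axis-aligned rectangles and the road a single straight segment. Both simplifications and all names are hypothetical, and the result approximates a polygon clip rather than reproducing it.

```python
import math
from dataclasses import dataclass

@dataclass
class Building:
    # Hypothetical axis-aligned footprint in local metres, plus height in metres
    x0: float
    y0: float
    x1: float
    y1: float
    height: float

def dist_to_segment(px, py, ax, ay, bx, by):
    """Shortest distance from point P to segment AB."""
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:
        return math.hypot(px - ax, py - ay)
    # Project P onto AB and clamp the projection to the segment
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def building_volume_in_buffers(site, road, buildings, circle_r, road_d, cell=0.5):
    """Approximate building volume inside (circular buffer AND road buffer)."""
    sx, sy = site
    (ax, ay), (bx, by) = road
    n = int(round(2 * circle_r / cell))
    volume = 0.0
    for i in range(n):
        cx = sx - circle_r + cell * (i + 0.5)  # cell-centre x
        for j in range(n):
            cy = sy - circle_r + cell * (j + 0.5)  # cell-centre y
            if math.hypot(cx - sx, cy - sy) > circle_r:
                continue  # outside the circular buffer
            if dist_to_segment(cx, cy, ax, ay, bx, by) > road_d:
                continue  # outside the road buffer
            for b in buildings:
                if b.x0 <= cx < b.x1 and b.y0 <= cy < b.y1:
                    volume += cell * cell * b.height
                    break  # footprints assumed not to overlap
    return volume
```

With a 20 m road buffer inside a 25 m circular buffer, a building extending 30 m from the road centre-line contributes only the part lying within 20 m of it.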

Model development
Models were developed for PM10, NOX and NO2 using ArcGIS (v10.0; ESRI) to generate predictor variables and SPSS (v20.0; IBM) for regression analysis. Variables were regressed against monitored concentrations of each pollutant. For traditional models, the predictor variable showing the highest correlation with monitored concentrations was entered into the model first, and subsequent variables were entered following a supervised forward stepwise method.13-15
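The supervised forward stepwise procedure can be approximated by greedy selection on adjusted R2. The sketch assumes that the ">1% increment in adjusted R2" inclusion rule means an absolute gain of 0.01. It omits the p-value and other checks applied by the authors. `ols_r2` and `forward_stepwise` are hypothetical helpers; a small normal-equations solver keeps the example within the standard library. For a single predictor, ranking by adjusted R2 is equivalent to ranking by absolute correlation, so the first variable entered is the one most correlated with monitored concentrations.

```python
def ols_r2(columns, y):
    """R^2 of an OLS fit of y on the given predictor columns plus an intercept."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(rows[0])
    # Normal equations (X'X) beta = X'y
    A = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    c = [sum(r[p] * yi for r, yi in zip(rows, y)) for p in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for p in range(k):
        piv = max(range(p, k), key=lambda i: abs(A[i][p]))
        A[p], A[piv] = A[piv], A[p]
        c[p], c[piv] = c[piv], c[p]
        for i in range(k):
            if i != p:
                f = A[i][p] / A[p][p]
                A[i] = [a - f * b for a, b in zip(A[i], A[p])]
                c[i] -= f * c[p]
    beta = [c[p] / A[p][p] for p in range(k)]
    fitted = [sum(b * x for b, x in zip(beta, r)) for r in rows]
    y_bar = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def forward_stepwise(candidates, y, min_gain=0.01):
    """Greedy forward selection; candidates maps name -> column of site values."""
    n = len(y)
    selected, current_adj = [], 0.0
    while True:
        best_name, best_adj = None, None
        for name, col in candidates.items():
            if name in selected:
                continue
            k = len(selected) + 1
            if n - k - 1 <= 0:
                continue  # too few sites for this many predictors
            r2 = ols_r2([candidates[v] for v in selected] + [col], y)
            adj = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
            if best_adj is None or adj > best_adj:
                best_name, best_adj = name, adj
        # Stop when no remaining variable adds more than min_gain adjusted R^2
        if best_name is None or best_adj - current_adj <= min_gain:
            return selected
        selected.append(best_name)
        current_adj = best_adj
```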

Two approaches were used to develop enhanced models. Firstly, similar to Eeftens et al.,12 each of the four building and street configuration variables was regressed in turn against the residuals from the traditional models (i.e. the coefficients of the variables in the traditional model were not allowed to change). Secondly, models were developed from scratch, with all variables offered in the regression analysis. We ensured that building/street configuration variables were only entered into the model if a local traffic variable (i.e. traffic load within a 25 m buffer) was already present. Initially, the best performing variable representing local traffic sources was identified and then coupled, stepwise, with building/street configuration variables. Once the best combination of a local traffic variable and a building/street configuration variable had been selected, other variables reflecting background traffic (i.e. road and traffic variables beyond the extent of the local traffic variable) and non-traffic sources (e.g. population, housing, industry, green space) were offered.
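The first enhanced-model approach regresses each building/street variable against the residuals of a traditional model whose coefficients are held fixed. It can be sketched with simple linear regression. The function names and the ranking by residual R2 are illustrative assumptions. In practice, significance and the expected positive sign of the coefficient would also be checked.

```python
def simple_ols(x, y):
    """Slope, intercept and R^2 of a one-predictor OLS fit of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxx == 0.0 or syy == 0.0:
        return 0.0, my, 0.0  # constant predictor or residuals: nothing to explain
    slope = sxy / sxx
    return slope, my - slope * mx, sxy * sxy / (sxx * syy)

def rank_on_residuals(observed, traditional_pred, building_vars):
    """Fit each building/street variable to the residuals of the fixed
    traditional model; return (name, (slope, intercept, r2)) sorted by R^2."""
    residuals = [o - p for o, p in zip(observed, traditional_pred)]
    fits = {name: simple_ols(x, residuals) for name, x in building_vars.items()}
    return sorted(fits.items(), key=lambda kv: kv[1][2], reverse=True)
```

Because the traditional model's predictions are passed in unchanged, its coefficients cannot move. Only the added variable's slope and intercept are estimated.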
A variable is included in a final model if it provides: (1) greater than 1% increment in
165
adjusted R2; (2) p-value 1%; and (2) p-value < 0.05 are
highlighted. For PM10, no enhanced variables were found to be statistically significant (p
0.05, thus
model residuals were deemed not to be spatially dependent.

To compare the predictive power of each variable and its contribution to absolute predicted values, we calculated the regression coefficient × (90th percentile - 10th percentile) for each variable.12 Predicted long-term average concentrations associated with each variable by site can be found in the compound graphs presented in Figures S7-S9 (Supporting Information).
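The coefficient × (90th - 10th percentile) contribution can be computed as below. Percentile definitions differ between packages. The sketch assumes Python's `statistics.quantiles` with `method="inclusive"`, which may not match the definition used in the study's software. The function name is hypothetical.

```python
from statistics import quantiles

def p90_p10_contribution(coefficient, values):
    """Regression coefficient x (90th - 10th percentile) of a predictor across sites."""
    deciles = quantiles(values, n=10, method="inclusive")  # 9 cut points
    p10, p90 = deciles[0], deciles[-1]
    return coefficient * (p90 - p10)
```

For a predictor spread evenly from 0 to 100 across sites, P90 - P10 = 80, so a coefficient of 0.05 contributes 4 concentration units.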

Model evaluation
Table 2 shows summary statistics from cross-validation. Figure 1 shows predicted concentrations from GCV plotted against monitored concentrations. Summary statistics for each evaluation group in GCV can be found in Table S5.

For LOOCV, traditional models yielded R2 values (MSE-R2 in brackets) of 0.71 (0.70), 0.50 (0.48) and 0.66 (0.64), and enhanced models yielded R2 values of 0.73 (0.71), 0.79 (0.78) and 0.78 (0.77) for PM10, NOX and NO2, respectively (Table 2). For all pollutants, enhanced models outperformed traditional models, explaining 2, 29 and 12 percentage points more of the variance (i.e. LOOCV R2) in monitored concentrations for PM10, NOX and NO2, respectively. The fall in R2 between model development (Table 1) and LOOCV was below 10% (a drop of 3-7% for all models), comparable to the changes in R2 between model development and evaluation in the recent ESCAPE project (within 15%), indicating stable models.5

The R2 values (MSE-R2 in brackets) obtained from GCV for PM10, NOX and NO2, respectively, were 0.71 (0.69), 0.53 (0.51) and 0.64 (0.63) for traditional models and 0.68 (0.67), 0.77 (0.77) and 0.77 (0.76) for enhanced models. Enhanced models explained 24 and 13 percentage points more of the variation in NOX and NO2 concentrations, respectively, but the PM10 model was not improved (GCV R2 of 0.68 versus 0.71 for the traditional model).
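Both cross-validation schemes can be sketched with one routine that refits the model on the training sites and predicts the held-out sites. Here a one-predictor OLS stands in for refitting the LUR model with its selected variables. The grouped folds use a hypothetical round-robin split into four groups of about 25% of sites each, because the text does not state how sites were assigned to groups.

```python
def fit_simple(x, y):
    """One-predictor OLS; a stand-in for refitting the full LUR model."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    return lambda v: intercept + slope * v

def loocv_folds(n):
    """Leave-one-out: every site is held out on its own."""
    return [[i] for i in range(n)]

def grouped_folds(n, k=4):
    """Hypothetical round-robin split into k groups (~25% of sites when k=4)."""
    return [list(range(g, n, k)) for g in range(k)]

def cross_validated_predictions(x, y, folds):
    """Predict each held-out site from a model fitted without it."""
    preds = [None] * len(y)
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        model = fit_simple([x[i] for i in train], [y[i] for i in train])
        for i in held_out:
            preds[i] = model(x[i])
    return preds
```

The resulting out-of-sample predictions are then compared with monitored concentrations using the evaluation statistics reported in Table 2.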

Values of RMSE calculated from LOOCV and GCV were similar, and RMSE was higher for traditional models, with the exception of PM10 in GCV. IOA ranged from 0.89 to 0.94 for LOOCV and from 0.83 to 0.94 for GCV. IOAs were higher in enhanced models, with 11% and 5% more agreement between predicted and monitored values in GCV for NOX and NO2, respectively. Values of FB for all models were small (-0.002 to 0.004 for LOOCV; -0.015 to 0.009 for GCV), indicating that the models are relatively free from bias (i.e. low levels of over- or under-prediction). The largest bias was the under-prediction of the enhanced PM10 model in GCV (FB = -0.015, i.e. under-prediction of about 1.5%). Values of beta (i.e. regression slopes) fell within the 95% confidence interval in all cases.
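The evaluation statistics can be sketched using common definitions that are assumptions rather than confirmed by the text:

- R2 is the squared Pearson correlation between predicted and monitored values.
- MSE-R2 is 1 - MSE / variance of the monitored values.
- IOA is Willmott's index of agreement.
- FB is the fractional bias, 2(mean predicted - mean monitored) / (mean predicted + mean monitored). Its sign convention (negative for under-prediction) matches the interpretation of FB = -0.015 given above.

The function name is hypothetical.

```python
import math

def evaluation_stats(obs, pred):
    """Common LUR evaluation statistics for monitored (obs) vs predicted values."""
    n = len(obs)
    o_bar, p_bar = sum(obs) / n, sum(pred) / n
    sse = sum((p - o) ** 2 for o, p in zip(obs, pred))
    ss_obs = sum((o - o_bar) ** 2 for o in obs)
    ss_pred = sum((p - p_bar) ** 2 for p in pred)
    cov = sum((o - o_bar) * (p - p_bar) for o, p in zip(obs, pred))
    # Willmott's potential error term used in the IOA denominator
    ioa_den = sum((abs(p - o_bar) + abs(o - o_bar)) ** 2 for o, p in zip(obs, pred))
    return {
        "R2": cov * cov / (ss_obs * ss_pred),          # squared Pearson correlation
        "MSE_R2": 1.0 - sse / ss_obs,                  # 1 - MSE / var(obs)
        "RMSE": math.sqrt(sse / n),
        "IOA": 1.0 - sse / ioa_den,
        "FB": 2.0 * (p_bar - o_bar) / (p_bar + o_bar),  # negative = under-prediction
    }
```

Predictions uniformly 10% below the monitored values, for example, give a negative FB of about -0.105.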

Discussion
We developed enhanced intra-urban PM10, NOX and NO2 land use regression models for London using data on building heights and street configuration alongside traditional LUR variables (i.e. land use, population, roads etc.), and compared their performance with that of traditional models. To our knowledge, this is the first study to employ both city-wide building geometry/heights and detailed land use datasets (with spatial accuracy of +/- 1 m in urban data) to represent the effects of the built environment on pollutant dispersion in LUR models.

We offered the enhanced variables in addition to the traditional models, using an approach similar to the one applied in the Netherlands.12 We subsequently developed new models by offering enhanced and traditional variables together. Models with enhanced geographical variables provided substantial improvements in performance for NOX (an increase in MSE-R2 of 26 percentage points in GCV) and NO2 (an increase in MSE-R2 of 13 percentage points in GCV). Although the enhanced PM10 model achieved a higher R2 than the traditional model during model building, its performance relative to the traditional model in evaluation was mixed.

LUR model variables and model performance
We developed a range of variables using circular and road buffers of varying size to represent building densities and street canyons, based on canyon length, height and width and on building areas. These variables represented both the immediate street-level dispersion environment and the dispersion field in the area surrounding monitoring sites. This improves on the simple street canyon classifications (e.g. no canyon, partial or full canyon) and aspect ratios used in LUR models, which typically consider only the height of buildings on opposite sides of the street at the air pollution monitoring site.9,10,12

In some enhanced models, variables extracted with road buffers were retained in preference to variables from traditional circular buffers. Smaller sized road buffers (