Subscriber access provided by READING UNIV
Article
Distinguishing petroleum (crude oil and fuel) from smoke exposure within populations based on the relative blood levels of benzene, toluene, ethylbenzene, and xylenes (BTEX), styrene and 2,5dimethylfuran by pattern recognition using artificial neural networks David Michael Chambers, Christopher Mark Reese, Lydia Grace Thornburg, Eduardo Sanchez, Jessica Patricia Rafson, Benjamin C. Blount, John Russell Erskine Ruhl, and Victor Raul De Jesus Environ. Sci. Technol., Just Accepted Manuscript • DOI: 10.1021/acs.est.7b05128 • Publication Date (Web): 07 Dec 2017 Downloaded from http://pubs.acs.org on December 13, 2017
Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a free service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are accessible to all readers and citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.
Environmental Science & Technology is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.
Page 1 of 27
Environmental Science & Technology
1 2 3
Distinguishing petroleum (crude oil and fuel) from smoke exposure within populations based on the relative blood levels of benzene, toluene, ethylbenzene, and xylenes (BTEX), styrene and 2,5dimethylfuran by pattern recognition using artificial neural networks
4 5
Chambers, D.M.*; Reese, C.M.; Thornburg, L.G.; Sanchez, E.; Rafson, J.P.; Blount, B.C.; Ruhl III, J.R.E.; De Jesús, V.R.
6 7
Tobacco and Volatiles Branch, Division of Laboratory Sciences, National Center for Environmental Health, US Centers for Disease Control and Prevention, Atlanta, GA 30341
8
*Corresponding author: 4770 Buford Hwy., NE, Mail Stop F-47, Atlanta, GA 30341
9
E-mail address:
[email protected] 10 11
Abstract Studies of exposure to petroleum (crude oil/fuel) often involve monitoring benzene, toluene,
12
ethylbenzene, xylenes (BTEX) and styrene (BTEXS) because of their toxicity and gas-phase prevalence,
13
where exposure is typically by inhalation. However, BTEXS levels in the general U.S. population are
14
primarily from exposure to tobacco smoke, where smokers have blood levels on average up to eight
15
times higher than nonsmokers. This work describes a method using partition theory and artificial neural
16
network (ANN) pattern recognition to classify exposure source based on relative BTEXS and 2,5-
17
dimethylfuran blood levels. A method using surrogate signatures to train the ANN was validated by
18
comparing blood levels among cigarette smokers from the National Health and Nutrition Examination
19
Survey (NHANES) with BTEXS and 2,5-dimethylfuran signatures derived from the smoke of machine-
20
smoked cigarettes. Classification agreement for an ANN model trained with relative VOC levels was up
21
to 99.8 % for nonsmokers and 100.0% for smokers. As such, because there is limited blood level data on
22
individuals exposed to crude oil/fuel, only surrogate signatures derived from crude oil and fuel were
23
used for training the ANN. For the 2007-2008 NHANES data, the ANN model assigned 7 out of 1998
24
specimens (0.35%) and for the 2013-2014 NHANES data 12 out of 2906 specimens (0.41%) to the
25
crude oil/fuel signature category.
ACS Paragon Plus Environment
Environmental Science & Technology
26 27
Page 2 of 27
1. Introduction As oil and natural gas extraction capacity grows to meet increased demand for petroleum-based
28
products, the potential for exposure to volatile organic compounds (VOCs) emanating from drilling1,
29
industrial2, and consumer product3 sources increases. Exposure to VOCs from these sources is important
30
to identify and minimize because these VOCs include toxicants such as benzene, toluene, ethylbenzene,
31
and xylenes (BTEX) that can cause cancer, neurological damage and impairment, and pulmonary and
32
cardiovascular disease4-8. BTEX concentrations are commonly measured in environmental and
33
biomonitoring samples to assess exposure levels associated with petroleum sources because they are
34
prominent volatile components in petroleum9 and are chemically stable10 allowing them to be measured
35
as intact compounds in the body11.
36
Absorption of VOCs occurs through three major routes—inhalation, dermal absorption and
37
ingestion. Of these inhalation is generally the primary absorption route for nonpolar VOCs because
38
100% of the blood circulates through the lungs. Only approximately 20% of the blood passes through
39
the digestive tract and the dermal stratum corneum barrier generally inhibits VOC adsorption12.
40
Furthermore, relative gas-phase levels of BTEX that have similarly low molecular weights are
41
conserved as they evaporate from crude oil or fuel, a nonpolar matrix with relatively low-
42
intermolecular-forces13. For these reasons, relative BTEX blood levels in this study are associated with
43
inhalation absorption.
44
Nevertheless, a common impediment to petroleum exposure assessment using biomarkers such as
45
BTEX has mainly been confounded by exposure to BTEX in cigarette smoke. In fact, tobacco smoke
46
contains ppm levels14 of these and other petroleum biomarkers as both are derived from the biomass.
47
Among cigarette smokers, blood BTEX levels can reach the low ng/mL range, whereas nonsmokers are
48
typically in the pg/mL range11. In the same vein, confounding from other types of smoke exist such as
49
from marijuana smoke, which is increasing in prevalence15,16. Therefore, if including individuals
ACS Paragon Plus Environment
Page 3 of 27
Environmental Science & Technology
50
exposed to smoke in biomonitoring studies that investigates petroleum exposure, smoke exposed
51
individuals need to be accurately classified because of their potential to confound the assessment.
52
In this work, we describe a biomonitoring approach that distinguishes exposure specific to petroleum
53
through pattern recognition of relative biomarker levels of BTEX and 2,5-dimethylfuran using artificial
54
neural network (ANN) modeling. As a way to minimize confounding seen by other significant exposure
55
sources such as tobacco smoke, we investigated the use of relative combustion biomarker levels of
56
BTEX + styrene (BTEXS) and 2,5-dimethylfuran. Styrene levels, although not typically measured with
57
BTEX and not always available, are included in this work where possible because styrene is a
58
monoaromatic similar to BTEX and a BTEX concomitant at appreciable levels in smoke and
59
petroleum11,17. 2,5-Dimethylfuran is included because it is a smoke biomarker that exists at levels
60
comparable to BTEX and styrene18,19. This application of ANN, novel to exposure assessment, arose out
61
of the need to objectively distinguish exposure between different VOC sources involving thousands of
62
specimens. As such, we demonstrate that relative blood BTEXS levels among smokers and cigarette
63
tobacco smoke remain consistent over a range of cigarette brand varieties and smoking techniques. This
64
approach takes advantage of the similarities of the physical properties of BTEXS and partition theory.
65
Next, this approach is used for the identification of individuals having relative levels of these
66
compounds in their blood that are relatable to relative concentrations in petroleum (e.g., crude oil and
67
fuel). Although BTEXS blood levels vary with source concentration, relative proportions can remain
68
consistent and unique to a particular source, especially for nonpolar VOCs with similar chemical
69
properties in nonpolar matrices, as in the case of BTEXS in petroleum.
70
2. Experimental
71
2.1.Population Data
72
Population data were taken from the 2007-2008 and 2013-2014 National Health and Nutrition
73
Examination Survey (NHANES) provided by the National Center for Health Statistics (NCHS)20.
74
Survey data for the 2007-2008 NHANES cycle on recent VOC exposure was included in SAS export ACS Paragon Plus Environment
Environmental Science & Technology
Page 4 of 27
75
file, VOCWB_E.XPT, and recent tobacco use and drug use were collected at the mobile examination
76
center (SAS export file: SMQRTU_E.XPT and DUQ_E.XPT, respectively) on the day of the health
77
exam. Corresponding survey data for the 2013-2014 NHANES cycle on recent VOC exposure was
78
reported in SAS export file VTQ_H.XPT and recent tobacco and drug use in file SMQRTU_H.XPT and
79
DUQ_H.XPT, respectively. Specific questionnaire data regarding recent VOC exposure and tobacco use
80
for evaluating ANN model results are specified in the Discussion section.
81
Whole blood specimens are collected at NHANES mobile examination centers (MEC) by certified
82
phlebotomists and are on average collected approximately 40 min. after check-in. Analyses of these
83
hermetically collected blood specimens were performed by CDC’s Division of Laboratory Sciences
84
(DLS) at the National Center for Environmental Health (NCEH). The specific analysis method used by
85
the DLS for determining VOC blood levels by equilibrium headspace solid phase microextraction
86
(SPME)/gas chromatography (GC)/mass spectrometry (MS) is described in detail elsewhere17,21,22. For
87
accurate quantification this laboratory used stable isotopically labeled analogs for each VOC for internal
88
standardization to offset any matrix and competition effects in sample preparation and analysis23.
89
Whole blood from 3415 participants for the 2007-2008 NHANES (SAS export file:
90
VOCWB_E.XPT) and 3489 for the 2013-2014 NHANES (SAS export file: VOCWB_H.XPT) was
91
analyzed, where both cycles consisted of a half subsample of ≥12 years of age. These whole blood levels
92
for benzene, toluene, ethylbenzene, m/p-xylene, o-xylene, styrene, and 2,5-dimethylfuran were used in
93
two different pattern recognition analyses. One analysis involved classifying specimens only between
94
other/nonsmoker and smoker categories for proof of concept, and the other for classifying specimens
95
among other/nonsmoker, smoker and crude oil/fuel categories. Because styrene was only available in the
96
2007-2008 NHANES data set, only the 2007-2008 NHANES data set was used in pattern recognition
97
proof of concept. Participants were excluded from the analyses if they were missing blood levels for any
98
of the six VOCs. Therefore, for the 2007-2008 NHANES data, 1858 out of 3415 were used where
99
styrene was included (other/nonsmoker vs. smoker classification model) and 1998 out of 3415 where ACS Paragon Plus Environment
Page 5 of 27
Environmental Science & Technology
100
styrene was not used (other/nonsmoker, smoker, crude oil/fuel classification model). For the 2013-2014
101
NHANES data, which did not have styrene data, 2906 specimens of the 3489 were used
102
(other/nonsmoker, smoker, crude oil/fuel classification model).
103
2.2 Estimation of Surrogate Signatures of BTEXS and 2,5-Dimethylfuran from Source Concentration
104
Cigarette tobacco smoke BTEXS and 2,5-dimethylfuran levels used in this study were quantified
105
using Tedlar bag collection of machine smoked cigarettes followed by equilibrium headspace
106
SPME/GC/MS analysis as described elsewhere24. In summary, Tedlar bags fitted with butyl o-rings
107
were attached to an ASM500 smoking machine (Cerulean, Milton Keynes, U.K.) and then analyzed
108
directly by SPME with a specially configured CTC Combi-PAL autosampler (Leap Technologies,
109
Carrboro, NC). SPME collection was by 75-µm carboxen-PDMS SPME fiber (Supelco, Bellefonte, PA)
110
for 1 min at room temperature. As similar to the blood VOC method, isotopically labeled internal
111
standard analogs were used to offset analyte losses and competition effects. These cigarette smoke VOC
112
levels were generated using the International Organization for Standardization (ISO) (3308:2012) and
113
Canadian Intense (CI) protocols from 50 U.S. brand varieties of cigarettes from brand families that
114
comprise 78% of the market share at the time of the sampling14. Surrogate estimates of blood BTEXS
115
and 2,5-dimethylfuran levels were produced by multiplying these smoke levels by their respective
116
blood/air partition constant, Kblood/air, and normalizing the levels relative to toluene, a high detection rate
117
analyte in blood. Although, m/p-xylene blood levels had a slightly higher detection rate than toluene,
118
toluene was used because available crude oil and fuel BTEX data combined m/p-xylene with o-xylene
119
and o-xylene had a lower detection rate than toluene. The corresponding Kblood/air values used for this
120
adjustment were taken from Meulenberg and Vijverberg25, where benzene is 7.37, toluene is 15.11,
121
ethylbenzene is 28.2, styrene is 55.6, total xylene is 36.13 (the average of 33.2 for m-xylene, 38.9 for p-
122
xylene, and 36.3 for o-xylene), and m/p-xylene is 36.05 (the average Kblood/air for m-xylene and p-
123
xylene). A measured value of Kblood/air for 2,5-dimethylfuran was not available, but was estimated to be
124
8.5 using previous published regression data26. The use of these surrogate VOC signatures (i.e., relative ACS Paragon Plus Environment
Environmental Science & Technology
125
BTEXS and 2,5-dimethylfuran levels derived from the source material) is compared with the use of
126
measured blood VOC signatures (i.e., relative BTEXS and 2,5-dimethylfuran levels in blood) for
127
training the ANN.
128
Page 6 of 27
The surrogate blood BTEX (styrene not available) and 2,5-dimethylfuran signatures for crude oil and
129
fuel exposure used for training the ANN were estimated from crude oil and fuel characterizations from
130
the Environment Canada Environmental Technology Centre Oil Properties Database27. Reported in the
131
database are ppm concentrations (by weight) of benzene, toluene, ethylbenzene and total xylene, but not
132
styrene, in a number of the 450 different petroleums. Unweathered crude oil selected were representative
133
of different regions of the world and fuels were representative of different types of fuels. The petroleums
134
as listed in the database with location of the oil field in parentheses that were used include Alaska North
135
Slope (2002) (Alaska), Amauligak (Canada), Arabian Light (2000) (Saudi Arabia), Arabian Medium
136
(Saudi Arabia), BCF 24 (Venezuela), Boscan (Venezuela), Cano Limon (Columbia), Carpinteria
137
(California), Chayvo #6 (Sakhalin), Cold Lake Blend (Alaska), Cusiana (Columbia), Dos Cuadras
138
(California), Ekofisk (Norway), Eugene Island Block 32 ( Louisiana), Isthmus (Mexico), Maya
139
(Mexico), Maya (1997) (Mexico), Mississippi Canyon Block 194 (Mississippi), South Louisiana (2001)
140
(Louisiana), Terra Nova (Canada), West Detlat Block 97 (Louisiana), West Texas (2000) (Texas), West
141
Texas Intermediate (Texas), West Texas Sour (Texas), and Vasconia (Columbia). Fuels included
142
Aviation Gasoline 100LL, Diesel Fuel Oil (2002), Diesel Fuel Oil (Alaska), Diesel Fuel Oil (Southern
143
U.S.A., 1994), Diesel Fuel Oil (Southern U.S.A., 1997), Fuel Oil No. 5 (2000), High Viscosity Fuel Oil,
144
Jet A/Jet A-1, Jet B (Alaska). The automobile gasoline BTEX signature used was from the Total
145
Petroleum Hydrocarbon Criteria Working Group9 because a gasoline signature was not available in
146
Environment Canada’s database. These surrogate blood signatures were created using the same
147
procedure as described for producing the surrogate blood BTEXS signatures for cigarette smoke
148
exposure.
ACS Paragon Plus Environment
Page 7 of 27
149
Environmental Science & Technology
2.3 Statistical Analysis
150
JMP 12.0 was used for all the statistical analyses. Central tendency for blood VOC levels is
151
expressed as geometric mean because the data followed a log normal distribution. For the artificial
152
neural network (ANN) analyses, model parameters were fit via penalized maximum likelihood
153
maximization28. Separate log-likelihoods were computed for each response, and the overall log-
154
likelihood for all the responses was taken as the sum of the log-likelihoods of the individual responses.
155
The model used one hidden layer with 15 hidden nodes each using a hyperbolic tangent (TanH)
156
activation function and was validated using a holdback fraction of 0.33. The optimal number of hidden
157
nodes was determined as the minimal number of nodes that yielded, on average, the best classification
158
accuracy in the training set with regard to the surrogate crude oil/fuel signatures. The output layer had
159
three output nodes, corresponding to smoker, to crude oil/fuel, and to neither smoker nor crude oil/fuel,
160
which is referred to as other/smoker. The input layer variables for the characterization of
161
other/nonsmoker (not exposed to either crude oil/fuel or smoke) and smoker included blood levels for
162
benzene, toluene, ethylbenzene, o-xylene, m/p-xylene, styrene and 2,5-dimethylfuran where smoking
163
status was designated using the 2,5-dimethylfuran cutpoint of 0.014 ng/mL rather than questionnaire
164
data for convenience, as 2,5-dimethylfuran blood levels are available with the BTEXS levels. The
165
surrogate blood levels used as input variables for the classification of crude oil/fuel signatures were
166
calculated from petroleum and fuel source concentrations using the same approach as for creating the
167
surrogate blood levels from cigarette smoke levels described above. Specifically, benzene, toluene,
168
ethylbenzene, total xylene and 2,5-dimethylfuran concentrations were multiplied by their respective
169
Kblood/air and normalized relative to the toluene concentration. 2,5-Dimethylfuran was imputed as below
170
LOD (i.e., LOD/√2) and styrene levels were not used because they were was not available. Models that
171
involved classifying only between smokers and other/nonsmokers used the entire 2007-2008 NHANES
172
data set where there were approximately 4 times more other/nonsmokers than smokers. However, for
173
models created with the other/nonsmoker, smoker and crude oil/fuel categories, the size of the three ACS Paragon Plus Environment
Environmental Science & Technology
174
input groups were adjusted so that each group had similar numbers of results of approximately 500,
175
where 333 of these were set aside for training and 167 for validating the ANN. In this case, the training
176
set for the other/nonsmoker group was randomly sampled to decrease its size to 525 participants, all
177
smokers were used (N=494) and the surrogate crude oil/fuels signatures were oversampled29 by
178
duplicating their results (14 replicates of the 35 petroleums, N=490). The random selection was
179
performed by creating a new column and making the initial data values randomly equal to 1 for 25% or
180
0 for 75% of the data. Rows where the column result equals 1 were used. Output value (probability
181
ranging from 0 to 1) cutpoint for assignment to a category is set to 0.5. Blood levels were normalized to
182
toluene because the surrogate signature used for crude oil and fuel exposure expresses relative level and
183
not absolute concentration. Rows with missing values were ignored.
184
The robustness and appropriateness of using measured vs. surrogate signatures were assessed in an
185
experiment involving classification of smokers and other/nonsmokers using the 2007-2008 NHANES
186
data. The percentage of correct classification assignments was based on 2,5-dimethylfuran cut-point.
187
Note that the 2007-2008 NHANES data had only two categories, other/nonsmoker and smoker, any
188
individuals that may have had a signature corresponding to the surrogate crude oil/fuel signature were
189
initially assigned to either other/nonsmoker or smoker in the training. Following this experiment, an
190
ANN model used for classification of all three groups—other/nonsmoker, smoker, and crude oil/fuel—
191
was constructed by combining the surrogate crude oil/fuel signatures with the other/nonsmoker and
192
smoker signatures from the 2007-2008 NHANES data. Once this final model with the three categories
193
was created, it was applied to the 2013-2014 NHANES data.
194
3. Results
195
Without a sufficient number of known crude oil or fuel blood signatures with which to train the
196
ANN, it was necessary to generate surrogate a BTEX and 2,5-dimethylfuran signature from petroleum
197
concentrations that could be compared with specimen blood levels. For this surrogate signature, blood
ACS Paragon Plus Environment
Page 8 of 27
Page 9 of 27
Environmental Science & Technology
198
BTEX and 2,5-dimethylfuran signatures needed to be estimated from relative levels of these chemicals
199
in different crude oils and fuels by adjusting the relative crude oil or fuel BTEX concentrations based on
200
blood-air partition constant, Kblood/air. This approach is validated in the following data where relative
201
blood BTEXS levels of cigarette smokers are estimated from the relative BTEXS levels measured in
202
cigarette smoke from different cigarettes.
203
Shown in Fig. 1 is a comparison of blood levels measured among smokers with different smoking
204
frequencies. The BTEXS and 2,5-dimethylfuran blood VOC levels were from the 2007-2008 NHANES
205
and are baseline adjusted mean levels for participants who reported smoking only 5, 10, 15, 20, 30, or 40
206
cigarettes per day (CPD). The number of individuals (N) in these categories ranged from 14 to 91 as
207
noted in the figure caption. Baseline adjustment of cigarette smoker blood levels was performed by
208
subtracting the mean blood level of individuals that were classified as other/nonsmoker (N=4876). This
209
adjustment was only performed on Fig. 1 and 2 data for the sake of comparing BTEXS and 2,5-
210
dimethylfuran levels among smokers and with levels in cigarette smoke in Fig. 2. It is important to note
211
that the data used to produce the crude oil/fuel ANN classification models described later are not
212
baseline adjusted. Other/nonsmoker were identified as having blood 2,5-dimethylfuran levels < 0.014
213
ng/mL. Levels in Fig. 1 were not normalized so that magnitude of BTEXS exposure could be compared.
214
Error bars were not added to this figure because they complicated the graphics, however they are
215
included in the composite signature shown in Fig. 2. Upon normalization of each CPD category to
216
toluene, the percent relative standard deviation (RSD) that existed across these selected CPD categories
217
were 9.0% for m/p-xylene, 6.3% for benzene, 9.3% for ethylbenzene, 13.3% for styrene, 14.7% for o-
218
xylene, and 5.5% for 2,5-dimethylfuran.
219
Surrogate blood BTEXS signatures estimated from the cigarette smoke of 50 U.S. brand varieties
220
smoked using two different regimens are compared with a normalized blood BTEXS and 2,5-
221
dimethylfuran composite from smokers in Fig. 2. These surrogate signatures were created by first
222
multiplying the amount of the compounds per smoked cigarette (generated using either the Canadian ACS Paragon Plus Environment
Environmental Science & Technology
Page 10 of 27
223
Intense and ISO protocols) by the respective Kblood/air, and then normalizing these levels with respect to
224
toluene. The BTEXS and 2,5-dimethylfuran blood signature is a composite for all 2007-2008 NHANES
225
smokers (N=456) normalized to toluene and background subtracted, where smokers are classified using
226
a 2,5-dimethylfuran cutpoint level above 0.014 ng/mL. Error bars for Fig. 2 are represented as 1
227
standard error.
228
The same procedure used to estimate blood VOC level signatures for smokers from cigarette smoke
229
was used to generate surrogate blood signatures consistent with those deduced from crude oil or fuel.
230
Specifically, crude oil or fuel benzene, toluene, ethylbenzene, total xylene and 2,5-dimethylfuran ppm
231
(by weight) concentrations were multiplied by their respective Kblood/air and normalized relative to
232
toluene. In this application, BTEX levels among different crude oil and fuels were obtained primarily
233
from Environment Canada’s Oil Properties Database. Petroleums selected included 25 crude oils from
234
the United States and other large oil producing countries, and nine fuels27. Because a signature for
235
automobile gasoline was not available in this database, the one from the Total Petroleum Hydrocarbon
236
Working Group was used9. In the Environment Canada’s database, levels of xylene’s structural isomers
237
(m-xylene, p-xylene, and o-xylene) were combined and there were no data for styrene. Levels for 2,5-
238
dimethylfuran, which is not a substantial compound in petroleum, were added to the signature and set to
239
below the LOD (imputed as LOD/√2). Correlation between analyte pairs were high to moderate where
240
Pearson correlation coefficients were 0.82 (toluene vs. xylenes), 0.80 (benzene vs. toluene), 0.74
241
(benzene vs. ethylbenzene), 0.71 (benzene vs. xylenes), and 0.59 (toluene vs. ethylbenzene), and 0.42
242
(ethylbenzene vs. xylenes). Accordingly, the BTEX levels for different petroleums were adjusted by
243
multiplying them by their corresponding Kblood/air, and then normalizing relative to toluene for
244
comparison with the NHANES blood signatures. Shown in Fig. 3 is a comparison of the raw composite
245
petroleum signature, the surrogate blood level composite signature derived from the petroleum
246
signatures (adjusted based on blood/air partition constants and normalized relative to toluene), and an
247
individual blood signature from a known fuel exposure22. ACS Paragon Plus Environment
Page 11 of 27
Environmental Science & Technology
248
Shown in Fig. 4 are density plots characterizing ANN predictions for the 2013-2014 NHANES data
249
using a model trained and validated with 2007-2008 NHANES data. Toluene, total xylenes (m/p-xylene
250
+ o-xylene), benzene, ethylbenzene, and 2,5-dimethylfuran levels were used as input variables in
251
modeling. For the ANN model the prediction variables were other/nonsmoker, smoker, and crude
252
oil/fuel groups, where training signatures were from individuals identified as other/nonsmoker and
253
smokers (established using blood 2,5-methylfuran cutpoint of 0.014 ng/mL11) and surrogate crude
254
oil/fuel signatures. The training set for the other/nonsmoker signature potentially included crude oil/fuel
255
signatures that had not yet been identified as so. Fitting the NHANES data repeatedly ten times resulted
256
in varying assignments where the number of predicted crude oil/fuel signatures ranged from 7 to 24 for
257
the 2013-2014 NHANES and from 1 to 7 for the 2007-2008 NHANES. Each of these 10 models
258
properly identified a positive control blood signature from an individual with known fuel exposure
259
presumed to be from inhalation22 with probabilities ranging from 0.9800 to 0.9999. Visual inspection of
260
each sample identified as having the crude oil/fuel signature by any of these 10 models were generally
261
similar to the surrogate and known exposure blood sample signatures. The first model fit of the 2007-
262
2008 NHANES data classified 7 individuals as crude oil/fuel (0.35%), 1591 as other/nonsmoker
263
(79.63%) and 400 as smoker (20.02%). Prediction results using the 2013-2014 NHANES blood VOC
264
data from the first model fit are graphed in Fig. 4a and consisted of 12 crude oil/fuel (0.41%), 2290
265
other/nonsmoker (78.80%) and 604 smoker (20.79%). Note that the crude oil/fuel probability cutpoint is
266
0.5, however because of smoothing by JMP, the graphical boundary for the crude oil/fuel distribution
267
extends below 0.5. Most smokers and other/nonsmoker had a probability below 0.25 with 5
268
other/nonsmoker and 1 smoker between 0.25 and 0.5. Shown in Fig. 4b are corresponding probabilities
269
for other/nonsmoker and smoker with the petroleum/fuel group identified in Fig. 4a. Among the samples
270
classified as either other/nonsmoker or smoker, most (i.e., 2791) had probabilities less than 0.25 or
271
greater than 0.75, where 103 samples fell between these limits.
ACS Paragon Plus Environment
Environmental Science & Technology
Page 12 of 27
272
Compared in Fig. 5 are BTEXS and 2,5-dimethylfuran composite signatures based on assignments
273
for the three exposure categories (crude oil/fuel, other/nonsmoker, and smoker) for the 2007-2008 and
274
2013-2014 NHANES data with m/p-xylene and o-xylene separated and with styrene included for the
275
2007-2008 data. Blood concentrations are expressed as geometric means because the data are
276
lognormally distributed. Error bars associated with geometric mean are not included because the
277
geometric means are presented on a linear scale. Individuals classified in the crude oil/fuel group
278
generally had levels of combined xylene that were higher than toluene, but overall high toluene and
279
ethylbenzene as well. Specimens classified as other/nonsmoker typically had lowest levels of BTEXS
280
and 2,5-dimethylfuran, where toluene and total xylenes were typically comparable. The signature for the
281
smoker category had higher levels of toluene and comparable levels of xylenes and benzene. Benzene
282
and styrene levels were comparable between smoker and crude oil/fuel groups.
283
4. Discussion
284
Elimination half-life for BTEXS, which have similar blood stabilities, polarities and solubilities, fall
285
within a narrow range from 8-30 hrs and are generally an order of magnitude longer than α-phase half-
286
lives30-33. Therefore, blood levels can persist above detectable levels even if blood samples are collected
287
beyond the α-phase half-life, where α-phase half-life is the half-life of the VOC if it is only in the blood
288
(compartment) and elimination half-life incorporates equilibration of the VOCs between the blood and
289
the body tissues (muscle, adipose, organ compartments). Because we are evaluating relative BTEXS and
290
2,5-dimethylfuran blood levels rather than absolute blood concentrations, exposure intensity is
291
unimportant as long as relative BTEXS and 2,5-dimethylfuran levels remain conserved. Studies on
292
weathering of petroleums demonstrate fast evaporation of BTEX because they exist as volatile
293
compounds with similarly low molecular weights in a relatively nonpolar matrix, causing relative gas
294
phase levels to be generally conserved13. Furthermore, BTEXS signatures resulting from crude oil/fuel
295
exposure can be confounded by exposure to tobacco smoke, especially from cigarette use, as smoking
ACS Paragon Plus Environment
Page 13 of 27
Environmental Science & Technology
296
prevalence in the United States is 16.4%34 and tobacco smoke has high levels of BTEXS
297
(µg/cigarette)14. Individuals who are not exposed to smoke often have BTEXS blood levels near or
298
below detection as this VOC source substantially drives BTEXS levels in the general U.S. population11.
299
Based on findings shown in Figs. 1 and 2, blood BTEXS signature among smokers is conserved (based
300
on relative level) despite demographic and smoking technique differences, although absolute levels vary
301
substantially. This consistency in relative blood BTEXS level is apparent from the moderate to high
302
correlation among the BTEXS compounds for smokers who report smoking between 5 and 40 cigarettes
303
per day (Fig. 1), where Pearson r values range from 0.55 (o-xylene vs. benzene) to 0.96 (m/p-xylene vs.
304
ethylbenzene) and are consistent with those previously reported11. The basis for this consistency is
305
attributed to the conservation of relative BTEXS and 2,5-dimethylfuran levels from mainstream
306
cigarette smoke where these relative levels have been shown to exist within a narrow range.
307
Specifically, in a study of 50 different U.S. brand varieties generated with two different smoking
308
machine protocols, the Pearson r values among the BTEXS ranged from 0.83 (o-xylene vs. benzene) to
309
0.98 (m/p-xylene vs. o-xylene)14. This high degree of consistency and correlation persists despite
310
differences in influential cigarette parameters such as percent tip ventilation and number of puffs per
311
cigarette14 and confirms that there is a characteristic BTEXS signature (based on relative level) among
312
the different brand varieties.
313
This consistency in relative BTEXS levels in mainstream smoke makes possible a proportional
314
relationship between tobacco smoke and blood levels that is attributed to quick equilibration of inhaled
315
VOCs. Blood VOC levels dependent on blood-air partitioning, Kblood/air, equilibrate quickly because of
316
relatively small pulmonary venous blood volume and high flowrate where the entire body blood volume
317
completely circulates through the lungs in approximately 1 min35,36. This relationship is confirmed by
318
the similarity of smoker blood levels and VOC levels in machine-generated smoke that has been
319
adjusted with Kblood/air by multiplying smoke BTEXS and 2,5-dimethylfuran levels by their respective
320
Kblood/air. During exposure to the BTEXS and 2,5-dimethylfuran in cigarette smoke, the proportions of ACS Paragon Plus Environment
Environmental Science & Technology
Page 14 of 27
321
VOC levels in the blood are based on their affinity that corresponds to Kblood/air. Because magnitude of
322
BTEXS and 2,5-dimethylfuran levels can vary depending on smoking topography, that is, the manner in
323
which a smoker smokes a cigarette, two machine smoking protocols with different smoking intensities
324
were evaluated—the ISO and CI37,38. Shown in Fig. 2 is a comparison of the normalized adjusted
325
BTEXS and 2,5-dimethylfuran signatures produced by the ISO and CI protocols from a U.S. market
326
study of 50 brand varieties. Although the signatures associated with these two protocols are similar,
327
there are some minor differences that are statistically significant. Specifically, m/p-xylene and styrene
328
levels were substantially lower relative to toluene for the ISO protocol than with the CI protocol.
329
Because the blood BTEXS and 2,5-dimethylfuran signature of smokers more closely resembles that
330
produced by the ISO protocol than the CI protocol (Fig. 2), the ISO signature was used in ANN training
331
comparison experiments discussed below. The similarity between BTEXS and 2,5-dimethylfuran levels
332
in smokers and adjusted BTEXS and 2,5-dimethylfuran levels in smoke generated by the ISO method is
333
a noteworthy finding, and underscores the consistency of relative BTEXS and 2,5-dimethylfuran levels
334
in cigarette smoke and among smokers.
335
Because of the consistency of relative blood levels of BTEXS and 2,5-dimethylfuran among smokers
336
given vastly different exposure magnitudes, we investigated whether a BTEXS signature associated with
337
crude oil or fuel could be identified in the general population using ANN pattern recognition. To train
338
the ANN, best practice is to use relative levels of blood BTEXS of petroleum exposed individuals,
339
however because such data is not available we used surrogates signatures derived from different crude
340
oils and fuel. The limitation in using surrogate signatures produced from source concentrations to train
341
the ANN is that they lack magnitude information, which can be used to adjust the contribution of any
342
baseline levels that might exist. For example, in the comparison shown in Fig. 2, the mean baseline level
343
of nonsmokers was subtracted from the smoker composite signature. In reality, the smoker composite
344
includes baseline levels as shown in Fig. 5. In models using surrogate signatures constructed from
345
cigarette smoke concentrations, this baseline level was absent. The resulting ANN models (10 model ACS Paragon Plus Environment
Page 15 of 27
Environmental Science & Technology
346
iterations) made with these surrogate signatures agreed with classifications defined by the 0.014 ng/mL
347
2,5-dimethylfuran cutpoint (discussed below) for 99.5–99.9 % of nonsmokers (N=1502), but only 85.7–
348
90.7 % of smokers (N=356). On the contrary, ANN models (10 model iterations) created using blood
349
BTEXS signatures from smokers where classification was defined by the 2,5-dimethylfuran cutpoint,
350
classification agreement of 99.0–99.8 % for nonsmokers (N-1502) and 97.5–100.0% for smokers
351
(N=356) were achieved. The lower level of classification agreement using surrogate signatures is
352
attributed to not being able to add a baseline level to surrogate signatures causing the model to classify
353
individuals with low-level smoke BTEXS exposure as other/nonsmoker as the baseline level becomes a
354
more prominent component of the signature. In conclusion, using the surrogate signature slightly under
355
estimated exposed individuals.
356
For training the ANN using smokers, smokers were initially classified using the smoke biomarker
357
2,5-dimethylfuran with a cutpoint of 0.014 ng/mL. It was convenient to use 2,5-dimethylfuran levels
358
versus questionnaire data because 2,5-dimethylfuran is measured simultaneously with the other VOCs
359
and is a well-established smoking biomarker19,39. Training and validation of the ANN involved using the
360
the 2007-2008 NHANES data and did not include specimens with missing data. The ANN model
361
classified participants who reported recent use of other combustible tobacco products [e.g., pipe
362
(SMQ755 = 1, N = 1), cigar (SMQ785 = 1, N = 6)] as cigarette smokers, with the exception of three
363
cigar smokers. Two of these cigar smokers (SEQN# 46111 and 47799) had 2,5-dimethylfuran levels
364
below the established 0.014 ng/mL cutpoint and were identified as nonsmokers in all 10 iterations of the
365
model. BTEXS levels for these two samples are more consistent with those of the other/nonsmoker
366
category having m/p-xylene below 0.1 ng/mL. The third cigar smoker (SEQN# 51320) had a blood 2,5-
367
dimethylfuran level of 0.055 ng/mL and m/p-xylene = 0.457 ng/mL, but was classified as an
368
other/nonsmoker in 5 of the 10 ANN models, where the smoker classification probability average was
369
0.52 ± 0.25 and other/nonsmoker classification probability average was 0.48 ± 0.25. This average result,
370
which assigns this third cigar smoker as a smoker, suggests that averaging probabilities over multiple ACS Paragon Plus Environment
Environmental Science & Technology
Page 16 of 27
371
ANN model instances may better predict borderline signatures. Furthermore, recent marijuana use did
372
not interfere with proper classification of tobacco smokers. Specifically, for the 2 tobacco smokers
373
(SEQN#41623 and 47126) who reported recent marijuana use (DUQ220Q = 0/DUQ220U = 1), both
374
signatures were assigned as smoker and all had blood 2,5-dimethylfuran level above the smoker
375
biomarker cutpoint. Complete BTEXS data was not available for exclusive marijuana smokers (N=1) for
376
the 2007-2008 NHANES data, but was for the 2013-2014 NAHNES data (N=8) and is discussed below.
377
Although marijuana is most commonly smoked, information regarding the method of use was not
378
collected and, therefore, it is possible that these marijuana users may have exclusively used ingestion.
379
The petroleum blood signature was defined using a surrogate approach because of a lack of VOC
380
data for petroleum exposed individuals. To create crude oil and fuel signatures with which to train the
381
ANN, crude oil and fuel BTEX levels from the Oil Properties Database27 and a literature source for
382
automobile gasoline9 were multiplied by the corresponding Kblood/air and normalized relative to toluene
383
as demonstrated with the cigarette smoke signatures. Despite the fact that fuel composition varies across
384
crude oil sources, manufacturers, and the time of year9,13, especially in terms of absolute levels, relative
385
BTEX levels among low-end (e.g., gasoline) and middle distillates (diesel, jet, and home heating fuel)
386
were similar to those seen in crude oil. This consistency is apparent in Fig. 3 for the adjusted and
387
normalized BTEX signature where standard deviation error bars do not overlap with the exception of
388
toluene and ethylbenzene. Relative total xylene levels were highest, toluene and ethylbenzene levels
389
were moderate and benzene levels were the lowest. These adjusted crude oil and fuel signatures are
390
consistent with previously published blood VOC data resulting from a fuel inhalation exposure subject
391
included in Fig. 3, which was used as a blinded positive control22.
392
Using the ANN model to evaluate the 2013-2014 NHANES data, 12 individuals out of 2906 had
393
exposure patterns that placed them in the crude oil/fuel exposure category with a probability that ranged
394
from 0.52 to 0.98 (Fig. 4a). Of the 12 crude oil/fuel classified individuals, all were nonsmokers based on
395
the 2,5-dimethylfuran cutpoint and questionnaire responses. There was no consistent trend with recent ACS Paragon Plus Environment
Page 17 of 27
Environmental Science & Technology
396
use of gas for cooking (VTQ241A, 4 out of 12), pumping gas or diesel (VTQ244A and VTQ281C, 7 out
397
of 12), or time spent near smoke in the last 10 hours or less (VTQ265B, 1 out of 12), however all had
398
reported having an attached garage (VTQ210, 12 out of 12). Association between attached garage and
399
increased blood BTEX levels has been previously reported40,41. This association is attributed to
400
evaporative fuel emissions from vehicles, which have been identified as an important sources of VOC
401
emissions in the U.S. and other industrialized nations3,42,43.
402
Of the 32 individuals that reported pumping gas in the last hour, one (SEQN# 80105) was classified
403
as crude oil/fuel and had a high m/p-xylene level of 1.140 ng/mL. It is important to note that this
404
identification was made based on relative levels of VOCs and not magnitude because data fed into the
405
model were normalized relative to toluene. In this subset of 32 individuals, the 19 individuals were
406
classified as other/nonsmoker with m/p-xylene blood levels ranging from