Document not found! Please try again

Distinguishing petroleum (crude oil and fuel) from ... - ACS Publications

Dec 5, 2017 - Tobacco and Volatiles Branch, Division of Laboratory Sciences, National Center for Environmental. 6. Health, US Centers for Disease Cont...
1 downloads 0 Views 1MB Size
Subscriber access provided by READING UNIV

Article

Distinguishing petroleum (crude oil and fuel) from smoke exposure within populations based on the relative blood levels of benzene, toluene, ethylbenzene, and xylenes (BTEX), styrene and 2,5dimethylfuran by pattern recognition using artificial neural networks David Michael Chambers, Christopher Mark Reese, Lydia Grace Thornburg, Eduardo Sanchez, Jessica Patricia Rafson, Benjamin C. Blount, John Russell Erskine Ruhl, and Victor Raul De Jesus Environ. Sci. Technol., Just Accepted Manuscript • DOI: 10.1021/acs.est.7b05128 • Publication Date (Web): 07 Dec 2017 Downloaded from http://pubs.acs.org on December 13, 2017

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a free service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are accessible to all readers and citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

Environmental Science & Technology is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 27

Environmental Science & Technology

1 2 3

Distinguishing petroleum (crude oil and fuel) from smoke exposure within populations based on the relative blood levels of benzene, toluene, ethylbenzene, and xylenes (BTEX), styrene and 2,5dimethylfuran by pattern recognition using artificial neural networks

4 5

Chambers, D.M.*; Reese, C.M.; Thornburg, L.G.; Sanchez, E.; Rafson, J.P.; Blount, B.C.; Ruhl III, J.R.E.; De Jesús, V.R.

6 7

Tobacco and Volatiles Branch, Division of Laboratory Sciences, National Center for Environmental Health, US Centers for Disease Control and Prevention, Atlanta, GA 30341

8

*Corresponding author: 4770 Buford Hwy., NE, Mail Stop F-47, Atlanta, GA 30341

9

E-mail address: [email protected]

10 11

Abstract Studies of exposure to petroleum (crude oil/fuel) often involve monitoring benzene, toluene,

12

ethylbenzene, xylenes (BTEX) and styrene (BTEXS) because of their toxicity and gas-phase prevalence,

13

where exposure is typically by inhalation. However, BTEXS levels in the general U.S. population are

14

primarily from exposure to tobacco smoke, where smokers have blood levels on average up to eight

15

times higher than nonsmokers. This work describes a method using partition theory and artificial neural

16

network (ANN) pattern recognition to classify exposure source based on relative BTEXS and 2,5-

17

dimethylfuran blood levels. A method using surrogate signatures to train the ANN was validated by

18

comparing blood levels among cigarette smokers from the National Health and Nutrition Examination

19

Survey (NHANES) with BTEXS and 2,5-dimethylfuran signatures derived from the smoke of machine-

20

smoked cigarettes. Classification agreement for an ANN model trained with relative VOC levels was up

21

to 99.8 % for nonsmokers and 100.0% for smokers. As such, because there is limited blood level data on

22

individuals exposed to crude oil/fuel, only surrogate signatures derived from crude oil and fuel were

23

used for training the ANN. For the 2007-2008 NHANES data, the ANN model assigned 7 out of 1998

24

specimens (0.35%) and for the 2013-2014 NHANES data 12 out of 2906 specimens (0.41%) to the

25

crude oil/fuel signature category.

ACS Paragon Plus Environment

Environmental Science & Technology

26 27

Page 2 of 27

1. Introduction As oil and natural gas extraction capacity grows to meet increased demand for petroleum-based

28

products, the potential for exposure to volatile organic compounds (VOCs) emanating from drilling1,

29

industrial2, and consumer product3 sources increases. Exposure to VOCs from these sources is important

30

to identify and minimize because these VOCs include toxicants such as benzene, toluene, ethylbenzene,

31

and xylenes (BTEX) that can cause cancer, neurological damage and impairment, and pulmonary and

32

cardiovascular disease4-8. BTEX concentrations are commonly measured in environmental and

33

biomonitoring samples to assess exposure levels associated with petroleum sources because they are

34

prominent volatile components in petroleum9 and are chemically stable10 allowing them to be measured

35

as intact compounds in the body11.

36

Absorption of VOCs occurs through three major routes—inhalation, dermal absorption and

37

ingestion. Of these inhalation is generally the primary absorption route for nonpolar VOCs because

38

100% of the blood circulates through the lungs. Only approximately 20% of the blood passes through

39

the digestive tract and the dermal stratum corneum barrier generally inhibits VOC adsorption12.

40

Furthermore, relative gas-phase levels of BTEX that have similarly low molecular weights are

41

conserved as they evaporate from crude oil or fuel, a nonpolar matrix with relatively low-

42

intermolecular-forces13. For these reasons, relative BTEX blood levels in this study are associated with

43

inhalation absorption.

44

Nevertheless, a common impediment to petroleum exposure assessment using biomarkers such as

45

BTEX has mainly been confounded by exposure to BTEX in cigarette smoke. In fact, tobacco smoke

46

contains ppm levels14 of these and other petroleum biomarkers as both are derived from the biomass.

47

Among cigarette smokers, blood BTEX levels can reach the low ng/mL range, whereas nonsmokers are

48

typically in the pg/mL range11. In the same vein, confounding from other types of smoke exist such as

49

from marijuana smoke, which is increasing in prevalence15,16. Therefore, if including individuals

ACS Paragon Plus Environment

Page 3 of 27

Environmental Science & Technology

50

exposed to smoke in biomonitoring studies that investigates petroleum exposure, smoke exposed

51

individuals need to be accurately classified because of their potential to confound the assessment.

52

In this work, we describe a biomonitoring approach that distinguishes exposure specific to petroleum

53

through pattern recognition of relative biomarker levels of BTEX and 2,5-dimethylfuran using artificial

54

neural network (ANN) modeling. As a way to minimize confounding seen by other significant exposure

55

sources such as tobacco smoke, we investigated the use of relative combustion biomarker levels of

56

BTEX + styrene (BTEXS) and 2,5-dimethylfuran. Styrene levels, although not typically measured with

57

BTEX and not always available, are included in this work where possible because styrene is a

58

monoaromatic similar to BTEX and a BTEX concomitant at appreciable levels in smoke and

59

petroleum11,17. 2,5-Dimethylfuran is included because it is a smoke biomarker that exists at levels

60

comparable to BTEX and styrene18,19. This application of ANN, novel to exposure assessment, arose out

61

of the need to objectively distinguish exposure between different VOC sources involving thousands of

62

specimens. As such, we demonstrate that relative blood BTEXS levels among smokers and cigarette

63

tobacco smoke remain consistent over a range of cigarette brand varieties and smoking techniques. This

64

approach takes advantage of the similarities of the physical properties of BTEXS and partition theory.

65

Next, this approach is used for the identification of individuals having relative levels of these

66

compounds in their blood that are relatable to relative concentrations in petroleum (e.g., crude oil and

67

fuel). Although BTEXS blood levels vary with source concentration, relative proportions can remain

68

consistent and unique to a particular source, especially for nonpolar VOCs with similar chemical

69

properties in nonpolar matrices, as in the case of BTEXS in petroleum.

70

2. Experimental

71

2.1.Population Data

72

Population data were taken from the 2007-2008 and 2013-2014 National Health and Nutrition

73

Examination Survey (NHANES) provided by the National Center for Health Statistics (NCHS)20.

74

Survey data for the 2007-2008 NHANES cycle on recent VOC exposure was included in SAS export ACS Paragon Plus Environment

Environmental Science & Technology

Page 4 of 27

75

file, VOCWB_E.XPT, and recent tobacco use and drug use were collected at the mobile examination

76

center (SAS export file: SMQRTU_E.XPT and DUQ_E.XPT, respectively) on the day of the health

77

exam. Corresponding survey data for the 2013-2014 NHANES cycle on recent VOC exposure was

78

reported in SAS export file VTQ_H.XPT and recent tobacco and drug use in file SMQRTU_H.XPT and

79

DUQ_H.XPT, respectively. Specific questionnaire data regarding recent VOC exposure and tobacco use

80

for evaluating ANN model results are specified in the Discussion section.

81

Whole blood specimens are collected at NHANES mobile examination centers (MEC) by certified

82

phlebotomists and are on average collected approximately 40 min. after check-in. Analyses of these

83

hermetically collected blood specimens were performed by CDC’s Division of Laboratory Sciences

84

(DLS) at the National Center for Environmental Health (NCEH). The specific analysis method used by

85

the DLS for determining VOC blood levels by equilibrium headspace solid phase microextraction

86

(SPME)/gas chromatography (GC)/mass spectrometry (MS) is described in detail elsewhere17,21,22. For

87

accurate quantification this laboratory used stable isotopically labeled analogs for each VOC for internal

88

standardization to offset any matrix and competition effects in sample preparation and analysis23.

89

Whole blood from 3415 participants for the 2007-2008 NHANES (SAS export file:

90

VOCWB_E.XPT) and 3489 for the 2013-2014 NHANES (SAS export file: VOCWB_H.XPT) was

91

analyzed, where both cycles consisted of a half subsample of ≥12 years of age. These whole blood levels

92

for benzene, toluene, ethylbenzene, m/p-xylene, o-xylene, styrene, and 2,5-dimethylfuran were used in

93

two different pattern recognition analyses. One analysis involved classifying specimens only between

94

other/nonsmoker and smoker categories for proof of concept, and the other for classifying specimens

95

among other/nonsmoker, smoker and crude oil/fuel categories. Because styrene was only available in the

96

2007-2008 NHANES data set, only the 2007-2008 NHANES data set was used in pattern recognition

97

proof of concept. Participants were excluded from the analyses if they were missing blood levels for any

98

of the six VOCs. Therefore, for the 2007-2008 NHANES data, 1858 out of 3415 were used where

99

styrene was included (other/nonsmoker vs. smoker classification model) and 1998 out of 3415 where ACS Paragon Plus Environment

Page 5 of 27

Environmental Science & Technology

100

styrene was not used (other/nonsmoker, smoker, crude oil/fuel classification model). For the 2013-2014

101

NHANES data, which did not have styrene data, 2906 specimens of the 3489 were used

102

(other/nonsmoker, smoker, crude oil/fuel classification model).

103

2.2 Estimation of Surrogate Signatures of BTEXS and 2,5-Dimethylfuran from Source Concentration

104

Cigarette tobacco smoke BTEXS and 2,5-dimethylfuran levels used in this study were quantified

105

using Tedlar bag collection of machine smoked cigarettes followed by equilibrium headspace

106

SPME/GC/MS analysis as described elsewhere24. In summary, Tedlar bags fitted with butyl o-rings

107

were attached to an ASM500 smoking machine (Cerulean, Milton Keynes, U.K.) and then analyzed

108

directly by SPME with a specially configured CTC Combi-PAL autosampler (Leap Technologies,

109

Carrboro, NC). SPME collection was by 75-µm carboxen-PDMS SPME fiber (Supelco, Bellefonte, PA)

110

for 1 min at room temperature. As similar to the blood VOC method, isotopically labeled internal

111

standard analogs were used to offset analyte losses and competition effects. These cigarette smoke VOC

112

levels were generated using the International Organization for Standardization (ISO) (3308:2012) and

113

Canadian Intense (CI) protocols from 50 U.S. brand varieties of cigarettes from brand families that

114

comprise 78% of the market share at the time of the sampling14. Surrogate estimates of blood BTEXS

115

and 2,5-dimethylfuran levels were produced by multiplying these smoke levels by their respective

116

blood/air partition constant, Kblood/air, and normalizing the levels relative to toluene, a high detection rate

117

analyte in blood. Although, m/p-xylene blood levels had a slightly higher detection rate than toluene,

118

toluene was used because available crude oil and fuel BTEX data combined m/p-xylene with o-xylene

119

and o-xylene had a lower detection rate than toluene. The corresponding Kblood/air values used for this

120

adjustment were taken from Meulenberg and Vijverberg25, where benzene is 7.37, toluene is 15.11,

121

ethylbenzene is 28.2, styrene is 55.6, total xylene is 36.13 (the average of 33.2 for m-xylene, 38.9 for p-

122

xylene, and 36.3 for o-xylene), and m/p-xylene is 36.05 (the average Kblood/air for m-xylene and p-

123

xylene). A measured value of Kblood/air for 2,5-dimethylfuran was not available, but was estimated to be

124

8.5 using previous published regression data26. The use of these surrogate VOC signatures (i.e., relative ACS Paragon Plus Environment

Environmental Science & Technology

125

BTEXS and 2,5-dimethylfuran levels derived from the source material) is compared with the use of

126

measured blood VOC signatures (i.e., relative BTEXS and 2,5-dimethylfuran levels in blood) for

127

training the ANN.

128

Page 6 of 27

The surrogate blood BTEX (styrene not available) and 2,5-dimethylfuran signatures for crude oil and

129

fuel exposure used for training the ANN were estimated from crude oil and fuel characterizations from

130

the Environment Canada Environmental Technology Centre Oil Properties Database27. Reported in the

131

database are ppm concentrations (by weight) of benzene, toluene, ethylbenzene and total xylene, but not

132

styrene, in a number of the 450 different petroleums. Unweathered crude oil selected were representative

133

of different regions of the world and fuels were representative of different types of fuels. The petroleums

134

as listed in the database with location of the oil field in parentheses that were used include Alaska North

135

Slope (2002) (Alaska), Amauligak (Canada), Arabian Light (2000) (Saudi Arabia), Arabian Medium

136

(Saudi Arabia), BCF 24 (Venezuela), Boscan (Venezuela), Cano Limon (Columbia), Carpinteria

137

(California), Chayvo #6 (Sakhalin), Cold Lake Blend (Alaska), Cusiana (Columbia), Dos Cuadras

138

(California), Ekofisk (Norway), Eugene Island Block 32 ( Louisiana), Isthmus (Mexico), Maya

139

(Mexico), Maya (1997) (Mexico), Mississippi Canyon Block 194 (Mississippi), South Louisiana (2001)

140

(Louisiana), Terra Nova (Canada), West Detlat Block 97 (Louisiana), West Texas (2000) (Texas), West

141

Texas Intermediate (Texas), West Texas Sour (Texas), and Vasconia (Columbia). Fuels included

142

Aviation Gasoline 100LL, Diesel Fuel Oil (2002), Diesel Fuel Oil (Alaska), Diesel Fuel Oil (Southern

143

U.S.A., 1994), Diesel Fuel Oil (Southern U.S.A., 1997), Fuel Oil No. 5 (2000), High Viscosity Fuel Oil,

144

Jet A/Jet A-1, Jet B (Alaska). The automobile gasoline BTEX signature used was from the Total

145

Petroleum Hydrocarbon Criteria Working Group9 because a gasoline signature was not available in

146

Environment Canada’s database. These surrogate blood signatures were created using the same

147

procedure as described for producing the surrogate blood BTEXS signatures for cigarette smoke

148

exposure.

ACS Paragon Plus Environment

Page 7 of 27

149

Environmental Science & Technology

2.3 Statistical Analysis

150

JMP 12.0 was used for all the statistical analyses. Central tendency for blood VOC levels is

151

expressed as geometric mean because the data followed a log normal distribution. For the artificial

152

neural network (ANN) analyses, model parameters were fit via penalized maximum likelihood

153

maximization28. Separate log-likelihoods were computed for each response, and the overall log-

154

likelihood for all the responses was taken as the sum of the log-likelihoods of the individual responses.

155

The model used one hidden layer with 15 hidden nodes each using a hyperbolic tangent (TanH)

156

activation function and was validated using a holdback fraction of 0.33. The optimal number of hidden

157

nodes was determined as the minimal number of nodes that yielded, on average, the best classification

158

accuracy in the training set with regard to the surrogate crude oil/fuel signatures. The output layer had

159

three output nodes, corresponding to smoker, to crude oil/fuel, and to neither smoker nor crude oil/fuel,

160

which is referred to as other/smoker. The input layer variables for the characterization of

161

other/nonsmoker (not exposed to either crude oil/fuel or smoke) and smoker included blood levels for

162

benzene, toluene, ethylbenzene, o-xylene, m/p-xylene, styrene and 2,5-dimethylfuran where smoking

163

status was designated using the 2,5-dimethylfuran cutpoint of 0.014 ng/mL rather than questionnaire

164

data for convenience, as 2,5-dimethylfuran blood levels are available with the BTEXS levels. The

165

surrogate blood levels used as input variables for the classification of crude oil/fuel signatures were

166

calculated from petroleum and fuel source concentrations using the same approach as for creating the

167

surrogate blood levels from cigarette smoke levels described above. Specifically, benzene, toluene,

168

ethylbenzene, total xylene and 2,5-dimethylfuran concentrations were multiplied by their respective

169

Kblood/air and normalized relative to the toluene concentration. 2,5-Dimethylfuran was imputed as below

170

LOD (i.e., LOD/√2) and styrene levels were not used because they were was not available. Models that

171

involved classifying only between smokers and other/nonsmokers used the entire 2007-2008 NHANES

172

data set where there were approximately 4 times more other/nonsmokers than smokers. However, for

173

models created with the other/nonsmoker, smoker and crude oil/fuel categories, the size of the three ACS Paragon Plus Environment

Environmental Science & Technology

174

input groups were adjusted so that each group had similar numbers of results of approximately 500,

175

where 333 of these were set aside for training and 167 for validating the ANN. In this case, the training

176

set for the other/nonsmoker group was randomly sampled to decrease its size to 525 participants, all

177

smokers were used (N=494) and the surrogate crude oil/fuels signatures were oversampled29 by

178

duplicating their results (14 replicates of the 35 petroleums, N=490). The random selection was

179

performed by creating a new column and making the initial data values randomly equal to 1 for 25% or

180

0 for 75% of the data. Rows where the column result equals 1 were used. Output value (probability

181

ranging from 0 to 1) cutpoint for assignment to a category is set to 0.5. Blood levels were normalized to

182

toluene because the surrogate signature used for crude oil and fuel exposure expresses relative level and

183

not absolute concentration. Rows with missing values were ignored.

184

The robustness and appropriateness of using measured vs. surrogate signatures were assessed in an

185

experiment involving classification of smokers and other/nonsmokers using the 2007-2008 NHANES

186

data. The percentage of correct classification assignments was based on 2,5-dimethylfuran cut-point.

187

Note that the 2007-2008 NHANES data had only two categories, other/nonsmoker and smoker, any

188

individuals that may have had a signature corresponding to the surrogate crude oil/fuel signature were

189

initially assigned to either other/nonsmoker or smoker in the training. Following this experiment, an

190

ANN model used for classification of all three groups—other/nonsmoker, smoker, and crude oil/fuel—

191

was constructed by combining the surrogate crude oil/fuel signatures with the other/nonsmoker and

192

smoker signatures from the 2007-2008 NHANES data. Once this final model with the three categories

193

was created, it was applied to the 2013-2014 NHANES data.

194

3. Results

195

Without a sufficient number of known crude oil or fuel blood signatures with which to train the

196

ANN, it was necessary to generate surrogate a BTEX and 2,5-dimethylfuran signature from petroleum

197

concentrations that could be compared with specimen blood levels. For this surrogate signature, blood

ACS Paragon Plus Environment

Page 8 of 27

Page 9 of 27

Environmental Science & Technology

198

BTEX and 2,5-dimethylfuran signatures needed to be estimated from relative levels of these chemicals

199

in different crude oils and fuels by adjusting the relative crude oil or fuel BTEX concentrations based on

200

blood-air partition constant, Kblood/air. This approach is validated in the following data where relative

201

blood BTEXS levels of cigarette smokers are estimated from the relative BTEXS levels measured in

202

cigarette smoke from different cigarettes.

203

Shown in Fig. 1 is a comparison of blood levels measured among smokers with different smoking

204

frequencies. The BTEXS and 2,5-dimethylfuran blood VOC levels were from the 2007-2008 NHANES

205

and are baseline adjusted mean levels for participants who reported smoking only 5, 10, 15, 20, 30, or 40

206

cigarettes per day (CPD). The number of individuals (N) in these categories ranged from 14 to 91 as

207

noted in the figure caption. Baseline adjustment of cigarette smoker blood levels was performed by

208

subtracting the mean blood level of individuals that were classified as other/nonsmoker (N=4876). This

209

adjustment was only performed on Fig. 1 and 2 data for the sake of comparing BTEXS and 2,5-

210

dimethylfuran levels among smokers and with levels in cigarette smoke in Fig. 2. It is important to note

211

that the data used to produce the crude oil/fuel ANN classification models described later are not

212

baseline adjusted. Other/nonsmoker were identified as having blood 2,5-dimethylfuran levels < 0.014

213

ng/mL. Levels in Fig. 1 were not normalized so that magnitude of BTEXS exposure could be compared.

214

Error bars were not added to this figure because they complicated the graphics, however they are

215

included in the composite signature shown in Fig. 2. Upon normalization of each CPD category to

216

toluene, the percent relative standard deviation (RSD) that existed across these selected CPD categories

217

were 9.0% for m/p-xylene, 6.3% for benzene, 9.3% for ethylbenzene, 13.3% for styrene, 14.7% for o-

218

xylene, and 5.5% for 2,5-dimethylfuran.

219

Surrogate blood BTEXS signatures estimated from the cigarette smoke of 50 U.S. brand varieties

220

smoked using two different regimens are compared with a normalized blood BTEXS and 2,5-

221

dimethylfuran composite from smokers in Fig. 2. These surrogate signatures were created by first

222

multiplying the amount of the compounds per smoked cigarette (generated using either the Canadian ACS Paragon Plus Environment

Environmental Science & Technology

Page 10 of 27

223

Intense and ISO protocols) by the respective Kblood/air, and then normalizing these levels with respect to

224

toluene. The BTEXS and 2,5-dimethylfuran blood signature is a composite for all 2007-2008 NHANES

225

smokers (N=456) normalized to toluene and background subtracted, where smokers are classified using

226

a 2,5-dimethylfuran cutpoint level above 0.014 ng/mL. Error bars for Fig. 2 are represented as 1

227

standard error.

228

The same procedure used to estimate blood VOC level signatures for smokers from cigarette smoke

229

was used to generate surrogate blood signatures consistent with those deduced from crude oil or fuel.

230

Specifically, crude oil or fuel benzene, toluene, ethylbenzene, total xylene and 2,5-dimethylfuran ppm

231

(by weight) concentrations were multiplied by their respective Kblood/air and normalized relative to

232

toluene. In this application, BTEX levels among different crude oil and fuels were obtained primarily

233

from Environment Canada’s Oil Properties Database. Petroleums selected included 25 crude oils from

234

the United States and other large oil producing countries, and nine fuels27. Because a signature for

235

automobile gasoline was not available in this database, the one from the Total Petroleum Hydrocarbon

236

Working Group was used9. In the Environment Canada’s database, levels of xylene’s structural isomers

237

(m-xylene, p-xylene, and o-xylene) were combined and there were no data for styrene. Levels for 2,5-

238

dimethylfuran, which is not a substantial compound in petroleum, were added to the signature and set to

239

below the LOD (imputed as LOD/√2). Correlation between analyte pairs were high to moderate where

240

Pearson correlation coefficients were 0.82 (toluene vs. xylenes), 0.80 (benzene vs. toluene), 0.74

241

(benzene vs. ethylbenzene), 0.71 (benzene vs. xylenes), and 0.59 (toluene vs. ethylbenzene), and 0.42

242

(ethylbenzene vs. xylenes). Accordingly, the BTEX levels for different petroleums were adjusted by

243

multiplying them by their corresponding Kblood/air, and then normalizing relative to toluene for

244

comparison with the NHANES blood signatures. Shown in Fig. 3 is a comparison of the raw composite

245

petroleum signature, the surrogate blood level composite signature derived from the petroleum

246

signatures (adjusted based on blood/air partition constants and normalized relative to toluene), and an

247

individual blood signature from a known fuel exposure22. ACS Paragon Plus Environment

Page 11 of 27

Environmental Science & Technology

248

Shown in Fig. 4 are density plots characterizing ANN predictions for the 2013-2014 NHANES data

249

using a model trained and validated with 2007-2008 NHANES data. Toluene, total xylenes (m/p-xylene

250

+ o-xylene), benzene, ethylbenzene, and 2,5-dimethylfuran levels were used as input variables in

251

modeling. For the ANN model the prediction variables were other/nonsmoker, smoker, and crude

252

oil/fuel groups, where training signatures were from individuals identified as other/nonsmoker and

253

smokers (established using blood 2,5-methylfuran cutpoint of 0.014 ng/mL11) and surrogate crude

254

oil/fuel signatures. The training set for the other/nonsmoker signature potentially included crude oil/fuel

255

signatures that had not yet been identified as so. Fitting the NHANES data repeatedly ten times resulted

256

in varying assignments where the number of predicted crude oil/fuel signatures ranged from 7 to 24 for

257

the 2013-2014 NHANES and from 1 to 7 for the 2007-2008 NHANES. Each of these 10 models

258

properly identified a positive control blood signature from an individual with known fuel exposure

259

presumed to be from inhalation22 with probabilities ranging from 0.9800 to 0.9999. Visual inspection of

260

each sample identified as having the crude oil/fuel signature by any of these 10 models were generally

261

similar to the surrogate and known exposure blood sample signatures. The first model fit of the 2007-

262

2008 NHANES data classified 7 individuals as crude oil/fuel (0.35%), 1591 as other/nonsmoker

263

(79.63%) and 400 as smoker (20.02%). Prediction results using the 2013-2014 NHANES blood VOC

264

data from the first model fit are graphed in Fig. 4a and consisted of 12 crude oil/fuel (0.41%), 2290

265

other/nonsmoker (78.80%) and 604 smoker (20.79%). Note that the crude oil/fuel probability cutpoint is

266

0.5, however because of smoothing by JMP, the graphical boundary for the crude oil/fuel distribution

267

extends below 0.5. Most smokers and other/nonsmoker had a probability below 0.25 with 5

268

other/nonsmoker and 1 smoker between 0.25 and 0.5. Shown in Fig. 4b are corresponding probabilities

269

for other/nonsmoker and smoker with the petroleum/fuel group identified in Fig. 4a. Among the samples

270

classified as either other/nonsmoker or smoker, most (i.e., 2791) had probabilities less than 0.25 or

271

greater than 0.75, where 103 samples fell between these limits.

ACS Paragon Plus Environment

Environmental Science & Technology

Page 12 of 27

272

Compared in Fig. 5 are BTEXS and 2,5-dimethylfuran composite signatures based on assignments

273

for the three exposure categories (crude oil/fuel, other/nonsmoker, and smoker) for the 2007-2008 and

274

2013-2014 NHANES data with m/p-xylene and o-xylene separated and with styrene included for the

275

2007-2008 data. Blood concentrations are expressed as geometric means because the data are

276

lognormally distributed. Error bars associated with geometric mean are not included because the

277

geometric means are presented on a linear scale. Individuals classified in the crude oil/fuel group

278

generally had levels of combined xylene that were higher than toluene, but overall high toluene and

279

ethylbenzene as well. Specimens classified as other/nonsmoker typically had lowest levels of BTEXS

280

and 2,5-dimethylfuran, where toluene and total xylenes were typically comparable. The signature for the

281

smoker category had higher levels of toluene and comparable levels of xylenes and benzene. Benzene

282

and styrene levels were comparable between smoker and crude oil/fuel groups.

283

4. Discussion

284

Elimination half-life for BTEXS, which have similar blood stabilities, polarities and solubilities, fall

285

within a narrow range from 8-30 hrs and are generally an order of magnitude longer than α-phase half-

286

lives30-33. Therefore, blood levels can persist above detectable levels even if blood samples are collected

287

beyond the α-phase half-life, where α-phase half-life is the half-life of the VOC if it is only in the blood

288

(compartment) and elimination half-life incorporates equilibration of the VOCs between the blood and

289

the body tissues (muscle, adipose, organ compartments). Because we are evaluating relative BTEXS and

290

2,5-dimethylfuran blood levels rather than absolute blood concentrations, exposure intensity is

291

unimportant as long as relative BTEXS and 2,5-dimethylfuran levels remain conserved. Studies on

292

weathering of petroleums demonstrate fast evaporation of BTEX because they exist as volatile

293

compounds with similarly low molecular weights in a relatively nonpolar matrix, causing relative gas

294

phase levels to be generally conserved13. Furthermore, BTEXS signatures resulting from crude oil/fuel

295

exposure can be confounded by exposure to tobacco smoke, especially from cigarette use, as smoking

ACS Paragon Plus Environment

Page 13 of 27

Environmental Science & Technology

296

prevalence in the United States is 16.4%34 and tobacco smoke has high levels of BTEXS

297

(µg/cigarette)14. Individuals who are not exposed to smoke often have BTEXS blood levels near or

298

below detection as this VOC source substantially drives BTEXS levels in the general U.S. population11.

299

Based on findings shown in Figs. 1 and 2, blood BTEXS signature among smokers is conserved (based

300

on relative level) despite demographic and smoking technique differences, although absolute levels vary

301

substantially. This consistency in relative blood BTEXS level is apparent from the moderate to high

302

correlation among the BTEXS compounds for smokers who report smoking between 5 and 40 cigarettes

303

per day (Fig. 1), where Pearson r values range from 0.55 (o-xylene vs. benzene) to 0.96 (m/p-xylene vs.

304

ethylbenzene) and are consistent with those previously reported11. The basis for this consistency is

305

attributed to the conservation of relative BTEXS and 2,5-dimethylfuran levels from mainstream

306

cigarette smoke where these relative levels have been shown to exist within a narrow range.

307

Specifically, in a study of 50 different U.S. brand varieties generated with two different smoking

308

machine protocols, the Pearson r values among the BTEXS ranged from 0.83 (o-xylene vs. benzene) to

309

0.98 (m/p-xylene vs. o-xylene)14. This high degree of consistency and correlation persists despite

310

differences in influential cigarette parameters such as percent tip ventilation and number of puffs per

311

cigarette14 and confirms that there is a characteristic BTEXS signature (based on relative level) among

312

the different brand varieties.

313

This consistency in relative BTEXS levels in mainstream smoke makes possible a proportional

314

relationship between tobacco smoke and blood levels that is attributed to quick equilibration of inhaled

315

VOCs. Blood VOC levels dependent on blood-air partitioning, Kblood/air, equilibrate quickly because of

316

relatively small pulmonary venous blood volume and high flowrate where the entire body blood volume

317

completely circulates through the lungs in approximately 1 min35,36. This relationship is confirmed by

318

the similarity of smoker blood levels and VOC levels in machine-generated smoke that has been

319

adjusted with Kblood/air by multiplying smoke BTEXS and 2,5-dimethylfuran levels by their respective

320

Kblood/air. During exposure to the BTEXS and 2,5-dimethylfuran in cigarette smoke, the proportions of ACS Paragon Plus Environment

Environmental Science & Technology

Page 14 of 27

321

VOC levels in the blood are based on their affinity that corresponds to Kblood/air. Because magnitude of

322

BTEXS and 2,5-dimethylfuran levels can vary depending on smoking topography, that is, the manner in

323

which a smoker smokes a cigarette, two machine smoking protocols with different smoking intensities

324

were evaluated—the ISO and CI37,38. Shown in Fig. 2 is a comparison of the normalized adjusted

325

BTEXS and 2,5-dimethylfuran signatures produced by the ISO and CI protocols from a U.S. market

326

study of 50 brand varieties. Although the signatures associated with these two protocols are similar,

327

there are some minor differences that are statistically significant. Specifically, m/p-xylene and styrene

328

levels were substantially lower relative to toluene for the ISO protocol than with the CI protocol.

329

Because the blood BTEXS and 2,5-dimethylfuran signature of smokers more closely resembles that

330

produced by the ISO protocol than the CI protocol (Fig. 2), the ISO signature was used in ANN training

331

comparison experiments discussed below. The similarity between BTEXS and 2,5-dimethylfuran levels

332

in smokers and adjusted BTEXS and 2,5-dimethylfuran levels in smoke generated by the ISO method is

333

a noteworthy finding, and underscores the consistency of relative BTEXS and 2,5-dimethylfuran levels

334

in cigarette smoke and among smokers.

335

Because of the consistency of relative blood levels of BTEXS and 2,5-dimethylfuran among smokers

336

given vastly different exposure magnitudes, we investigated whether a BTEXS signature associated with

337

crude oil or fuel could be identified in the general population using ANN pattern recognition. To train

338

the ANN, best practice is to use relative levels of blood BTEXS of petroleum exposed individuals,

339

however because such data is not available we used surrogates signatures derived from different crude

340

oils and fuel. The limitation in using surrogate signatures produced from source concentrations to train

341

the ANN is that they lack magnitude information, which can be used to adjust the contribution of any

342

baseline levels that might exist. For example, in the comparison shown in Fig. 2, the mean baseline level

343

of nonsmokers was subtracted from the smoker composite signature. In reality, the smoker composite

344

includes baseline levels as shown in Fig. 5. In models using surrogate signatures constructed from

345

cigarette smoke concentrations, this baseline level was absent. The resulting ANN models (10 model ACS Paragon Plus Environment

Page 15 of 27

Environmental Science & Technology

346

iterations) made with these surrogate signatures agreed with classifications defined by the 0.014 ng/mL

347

2,5-dimethylfuran cutpoint (discussed below) for 99.5–99.9 % of nonsmokers (N=1502), but only 85.7–

348

90.7 % of smokers (N=356). On the contrary, ANN models (10 model iterations) created using blood

349

BTEXS signatures from smokers where classification was defined by the 2,5-dimethylfuran cutpoint,

350

classification agreement of 99.0–99.8 % for nonsmokers (N-1502) and 97.5–100.0% for smokers

351

(N=356) were achieved. The lower level of classification agreement using surrogate signatures is

352

attributed to not being able to add a baseline level to surrogate signatures causing the model to classify

353

individuals with low-level smoke BTEXS exposure as other/nonsmoker as the baseline level becomes a

354

more prominent component of the signature. In conclusion, using the surrogate signature slightly under

355

estimated exposed individuals.

356

For training the ANN using smokers, smokers were initially classified using the smoke biomarker

357

2,5-dimethylfuran with a cutpoint of 0.014 ng/mL. It was convenient to use 2,5-dimethylfuran levels

358

versus questionnaire data because 2,5-dimethylfuran is measured simultaneously with the other VOCs

359

and is a well-established smoking biomarker19,39. Training and validation of the ANN involved using the

360

the 2007-2008 NHANES data and did not include specimens with missing data. The ANN model

361

classified participants who reported recent use of other combustible tobacco products [e.g., pipe

362

(SMQ755 = 1, N = 1), cigar (SMQ785 = 1, N = 6)] as cigarette smokers, with the exception of three

363

cigar smokers. Two of these cigar smokers (SEQN# 46111 and 47799) had 2,5-dimethylfuran levels

364

below the established 0.014 ng/mL cutpoint and were identified as nonsmokers in all 10 iterations of the

365

model. BTEXS levels for these two samples are more consistent with those of the other/nonsmoker

366

category having m/p-xylene below 0.1 ng/mL. The third cigar smoker (SEQN# 51320) had a blood 2,5-

367

dimethylfuran level of 0.055 ng/mL and m/p-xylene = 0.457 ng/mL, but was classified as an

368

other/nonsmoker in 5 of the 10 ANN models, where the smoker classification probability average was

369

0.52 ± 0.25 and other/nonsmoker classification probability average was 0.48 ± 0.25. This average result,

370

which assigns this third cigar smoker as a smoker, suggests that averaging probabilities over multiple ACS Paragon Plus Environment

Environmental Science & Technology

Page 16 of 27

371

ANN model instances may better predict borderline signatures. Furthermore, recent marijuana use did

372

not interfere with proper classification of tobacco smokers. Specifically, for the 2 tobacco smokers

373

(SEQN#41623 and 47126) who reported recent marijuana use (DUQ220Q = 0/DUQ220U = 1), both

374

signatures were assigned as smoker and all had blood 2,5-dimethylfuran level above the smoker

375

biomarker cutpoint. Complete BTEXS data was not available for exclusive marijuana smokers (N=1) for

376

the 2007-2008 NHANES data, but was for the 2013-2014 NAHNES data (N=8) and is discussed below.

377

Although marijuana is most commonly smoked, information regarding the method of use was not

378

collected and, therefore, it is possible that these marijuana users may have exclusively used ingestion.

379

The petroleum blood signature was defined using a surrogate approach because of a lack of VOC

380

data for petroleum exposed individuals. To create crude oil and fuel signatures with which to train the

381

ANN, crude oil and fuel BTEX levels from the Oil Properties Database27 and a literature source for

382

automobile gasoline9 were multiplied by the corresponding Kblood/air and normalized relative to toluene

383

as demonstrated with the cigarette smoke signatures. Despite the fact that fuel composition varies across

384

crude oil sources, manufacturers, and the time of year9,13, especially in terms of absolute levels, relative

385

BTEX levels among low-end (e.g., gasoline) and middle distillates (diesel, jet, and home heating fuel)

386

were similar to those seen in crude oil. This consistency is apparent in Fig. 3 for the adjusted and

387

normalized BTEX signature where standard deviation error bars do not overlap with the exception of

388

toluene and ethylbenzene. Relative total xylene levels were highest, toluene and ethylbenzene levels

389

were moderate and benzene levels were the lowest. These adjusted crude oil and fuel signatures are

390

consistent with previously published blood VOC data resulting from a fuel inhalation exposure subject

391

included in Fig. 3, which was used as a blinded positive control22.

392

Using the ANN model to evaluate the 2013-2014 NHANES data, 12 individuals out of 2906 had

393

exposure patterns that placed them in the crude oil/fuel exposure category with a probability that ranged

394

from 0.52 to 0.98 (Fig. 4a). Of the 12 crude oil/fuel classified individuals, all were nonsmokers based on

395

the 2,5-dimethylfuran cutpoint and questionnaire responses. There was no consistent trend with recent ACS Paragon Plus Environment

Page 17 of 27

Environmental Science & Technology

396

use of gas for cooking (VTQ241A, 4 out of 12), pumping gas or diesel (VTQ244A and VTQ281C, 7 out

397

of 12), or time spent near smoke in the last 10 hours or less (VTQ265B, 1 out of 12), however all had

398

reported having an attached garage (VTQ210, 12 out of 12). Association between attached garage and

399

increased blood BTEX levels has been previously reported40,41. This association is attributed to

400

evaporative fuel emissions from vehicles, which have been identified as an important sources of VOC

401

emissions in the U.S. and other industrialized nations3,42,43.

402

Of the 32 individuals that reported pumping gas in the last hour, one (SEQN# 80105) was classified

403

as crude oil/fuel and had a high m/p-xylene level of 1.140 ng/mL. It is important to note that this

404

identification was made based on relative levels of VOCs and not magnitude because data fed into the

405

model were normalized relative to toluene. In this subset of 32 individuals, the 19 individuals were

406

classified as other/nonsmoker with m/p-xylene blood levels ranging from