
Policy Analysis

Mining Available Data from the United States Environmental Protection Agency to Support Rapid Life Cycle Inventory Modeling of Chemical Manufacturing

Sarah Allanson Cashman, David Edward Meyer, Ashley Edelen, Wesley W. Ingwersen, John Abraham, William Martin Barrett, Michael Albert Gonzalez, Paul Randall, Gerardo J. Ruiz-Mercado, and Raymond L. Smith

Environ. Sci. Technol., Just Accepted Manuscript • DOI: 10.1021/acs.est.6b02160 • Publication Date (Web): 12 Aug 2016


Environmental Science & Technology is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


Sarah A. Cashman1, David E. Meyer2,*, Ashley Edelen3, Wesley Ingwersen2, John Abraham2, William Barrett2, Michael Gonzalez2, Paul Randall2, Gerardo Ruiz-Mercado2, Raymond L. Smith2

1 Eastern Research Group, 110 Hartwell Ave., Lexington, MA 02421, USA
2 United States Environmental Protection Agency, National Risk Management Research Laboratory, 26 West Martin Luther King Drive, Cincinnati, OH 45268, USA
3 Oak Ridge Institute of Science and Education (ORISE) Postdoctoral Research Participant hosted by U.S. Environmental Protection Agency Office of Research and Development, 26 West Martin Luther King Drive, Cincinnati, OH 45268, USA

*Corresponding author: David E. Meyer, PhD
U.S. Environmental Protection Agency, National Risk Management Research Laboratory
26 West Martin Luther King Drive (MS 483), Cincinnati, OH 45268
Phone: (513) 569-7194
Email: [email protected]

Journal: Environmental Science & Technology

Keywords: LCI, Data Mining, Chemical Manufacture, US EPA Database, Semantic

Abstract

Demands for quick and accurate life cycle assessments create a need for methods to rapidly generate reliable life cycle inventories (LCI). Data mining is a suitable tool for this purpose, especially given the large amount of available governmental data. These data are typically applied to LCIs on a case-by-case basis. As linked open data becomes more prevalent, it may be possible to automate LCI using data mining by establishing a reproducible approach for identifying, extracting, and processing the data. This work proposes a method to standardize and eventually automate the discovery and use of publicly available data at the United States Environmental Protection Agency for chemical manufacturing LCI. The method is developed using a case study of acetic acid. The data quality and gap analyses for the generated inventory found that the selected data sources can provide information of equal or better reliability and representativeness on air, water, hazardous waste, on-site energy usage, and production volumes, but with key data gaps including material inputs, water usage, purchased electricity, and transportation requirements. A comparison of the generated LCI with existing data revealed that the data mining inventory is in reasonable agreement with existing data and may provide a more comprehensive inventory of air emissions and water discharges. The case study highlighted challenges for current data management practices that must be overcome to successfully automate the method using semantic technology. A benefit of the method is that openly available data can be compiled in a standardized and transparent approach that supports potential automation, with flexibility to incorporate new data sources as needed.


Introduction

Recently, the National Research Council of the National Academy of Sciences made recommendations to the United States Environmental Protection Agency (US EPA) for implementing sustainability as part of its decision-making process by incorporating life cycle thinking.1 A key tool for this purpose is Life Cycle Assessment (LCA) as defined by the ISO 14040 and 14044 Standards.2,3 The practicality of adopting LCA to support decision-making can be limited by the generic nature of the assessment and the resource-intensive nature of data collection and life cycle inventory (LCI) modeling. However, efforts to reduce the generic nature of LCA are further contributing to the second challenge by increasing the quantity and type of data required for performing assessments. To address these growing data requirements, there is a need to develop automated data mining methods that can efficiently generate reliable LCIs using readily accessible secondary data sources, such as government databases describing the chemical manufacturing sector. While the various data can fill a variety of inventory modeling needs, there are challenges to automating their seamless integration into a suitable LCI.

Data Mining and Life Cycle Inventory Modeling. An LCI is an accounting of the material, energy, waste, and emission flows associated with the life cycle of a product. An LCI can be provided at the system level and describe the entire product life cycle from raw material acquisition to final disposal (i.e., cradle-to-grave), or can be created for specific unit processes within the product life cycle (i.e., cradle-to-gate, gate-to-gate, or gate-to-grave) and sequentially linked to cover the life cycle. The gate-to-gate (or gated) approach is preferred because the modular nature of the inventories enables the same data to be applied across multiple LCAs.

Chemical and product manufacturing inventories are typically the most challenging to model because of the lack of available data and the large number of material and energy flows. Ideally, the inventory data are obtained directly from manufacturers. However, companies often treat the required information as confidential business information (CBI) and are unwilling to make the data public. One exception is the USLCI database because the chemical LCIs were primarily created through industry association efforts to aggregate and sanitize facility-level CBI for use as industry benchmarks.4,5 Such industry association studies are done on a case-by-case basis, require significant resources, may lack full industry coverage, and are challenging to update automatically. Detailed process-level industry association LCIs derived from primary facility-level data collection should be supplemented with LCI information from other sources. In the absence of the necessary primary data, LCIs must be constructed using secondary data sources. Given the multitude of potential data sources, this approach to inventory modeling will be most efficient using data mining.

Data mining, or knowledge discovery in databases, is the study of collecting, harmonizing, processing, and analyzing data to gain useful insights.6 While the term data mining can be used loosely, it generally refers to extracting meaningful new information from data. However, data mining is only a tool and does not eliminate the need to know the system under study or to understand the limitations and proper use of the data. It is often applied for inventory modeling in LCA when a practitioner does not have access to primary data and must rely on secondary data from journal articles, patents, government reports, and existing LCI databases to fill data gaps. However, data mining for this purpose is typically not conducive to rapid and reproducible inventory modeling. For example, Suh discussed the development of a database of environmental flows for use in economic input-output (IO) LCA.7 This involved integrating sources from US EPA such as the Toxics Release Inventory (TRI) and National Emissions Inventory (NEI) to develop sector-average emission data for all industries covered in the IO tables from the US Bureau of Economic Analysis (BEA).8,9,10 The key issues identified for this approach included overlapping chemical coverage of data sources and limitations of the data based on reporting requirements. These thoughts were echoed by Sengupta et al. when developing a method to include data from NEI and TRI in process-based LCI for national-average biofuel production.11 Although both studies used NEI as the primary source and supplemented the resulting LCIs with information from TRI, the approaches to calculating average emission quantities differed. Suh combined the sectoral emission data from TRI and NEI with the corresponding sectoral economic output from BEA.7 However, Sengupta et al. discovered this can lead to underestimation of emissions if not all facilities report to both data sources.11 To correct for this, Sengupta and coworkers matched releases (e.g., emissions) with output at the facility-level to calculate facility-by-facility release factors, with sector averages only reflecting facilities for which both production and releases could be assigned.11 An advantage to this approach is the incorporation of all relevant site-specific factors (e.g., differences in equipment and production methods) into the aggregated US gated chemical manufacturer LCI. Using facility-level information as the foundation of the LCI also allows for aggregation at different spatial or temporal scales depending on the desired end use of the LCI.
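The difference between the two averaging approaches can be illustrated with a minimal sketch. It is not the calculation used in either cited study: the function names, the dictionary layout, and the output-weighted form of the matched average are assumptions made for illustration, and the facility values are invented.

```python
def sector_ratio_factor(releases, outputs):
    """Sector-level factor: all reported releases divided by all sector output.

    Facilities that report output but no releases still add to the
    denominator, which is how underestimation can arise.
    """
    return sum(releases.values()) / sum(outputs.values())


def matched_facility_factor(releases, outputs):
    """Facility-matched factor: only facilities present in both sources count.

    Assumption: the sector average is output-weighted over matched facilities.
    """
    matched = releases.keys() & outputs.keys()
    if not matched:
        raise ValueError("no facility appears in both data sources")
    total_release = sum(releases[f] for f in matched)
    total_output = sum(outputs[f] for f in matched)
    return total_release / total_output


# Hypothetical data: facility C reports output but no releases.
releases_kg = {"A": 10.0, "B": 30.0}
outputs_kg = {"A": 100.0, "B": 200.0, "C": 700.0}

print(sector_ratio_factor(releases_kg, outputs_kg))
print(matched_facility_factor(releases_kg, outputs_kg))
```

In this invented example the sector-level ratio is lower than the matched factor because facility C contributes output but no releases.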

The results of these two examples are LCIs with a varying degree of uncertainty and heterogeneity. For LCA, controlling and/or reducing the uncertainty using quality data increases the defensibility of life cycle impact assessment (LCIA) results. This may be achievable using data mining to consistently select, harmonize, and process higher quality data into LCIs. Furthermore, the examples were implemented with a focus on sector-specific data that incorporates all unit operations within the included facilities, which is not adequate for modeling a modular gated LCI. Thus, a new method for modeling chemical manufacturing at the facility-level is needed. Methodical data mining is a suitable approach for this purpose because of the ability to include metadata, or data describing the data, with the LCI for data quality analysis.

Transitioning to Linked Open Data for Inventory Modeling. The United States Open Government Directive was issued to all federal agencies to make government transparent, participatory, and collaborative.12 As a result, federal agencies are making data available publicly as linked open data (LOD),13 a machine-readable format that supports facilitated searching and retrieval. This transition has been guided by the emergence of the semantic web, an approach to structure and describe data to enable sharing across multiple systems for a variety of applications.14 The US EPA is exploring semantic management of LOD for key environmental databases such as TRI and the Chemical Data Reporting tool (CDR).8,15,16 This governmental shift can support a transition to semantic data mining for improved inventory modeling.

Research on the use of semantic tools to support LCI has focused primarily on the interoperability and accessibility of existing inventory data to make LCA more efficient and less resource intensive. For example, Bertin and coworkers applied semantic modeling to improve identification of linkages in inventory databases within their CarbonDB life cycle database.17,18 Similarly, Perez and coworkers proposed the use of LOD within the LCA community as an example of using semantic modeling to enhance knowledge discovery within life cycle models.19 Cooper and coworkers identified the use of Semantic Web technology as a means to create interoperable “Big Data” across the various public LCA repositories.20 Ingwersen and coworkers described the implementation of the Resource Description Framework (RDF), a set of Semantic Web standards, to facilitate data harmonization and enable the creation of an LOD LCA network.21 Finally, Zhang and coworkers demonstrated the integration of a semantic LCI database with the openLCA software platform to support streamlined life cycle modeling based on product-specific querying.22,23

Towards Rapid Inventory Modeling. Moving beyond the basic objective of data interoperability and examining how semantic modeling can be harnessed to support rapid inventory modeling offers the potential to automate the process and greatly reduce the resources required to conduct an LCA. However, it is important to first develop a conceptual approach for consistent data mining to guide the development of semantic inventory modeling and identify potential challenges with using LOD. The calculations required to transform mined data into meaningful inventory data need to be documented so they can be translated into machine-readable queries.

The US EPA maintains large repositories of environmental data, with much of these data obtained directly from state governments and industry as part of regulatory reporting. Given that environmental data account for a majority of an LCI, US EPA data are a desirable source upon which to begin developing an automated inventory modeling method. However, various factors may affect the quality of the data for this purpose, including CBI claims by businesses for information contained in regulatory filings and variability in both rule reporting requirements and how the raw data are processed into databases. These factors will then influence the quality of inventories generated using the data. Identifying data that are publicly available and of sufficient quality for decision support is key if this method is to be useful to the LCA community at large. In order to achieve the long-term vision of an automated inventory modeling system, the objectives of this research are: (1) identify public data sources within US EPA that support inventory modeling; (2) propose a method to apply data mining to US EPA data for inventory modeling; (3) develop a gated LCI for the manufacture of a case study chemical using the data mining method; (4) evaluate the quality of the case study inventory and identify data gaps; (5) investigate the utility of the generated inventory for decision support; and (6) identify challenges to automating the data mining method using semantic tools based on insights provided by the case study. The results of this research will advance the practice of inventory modeling for LCA by supporting development of a semantic modeling tool that generates LCI in a rapid and consistent way.

Methods

Data Source Identification. The scope of this work was limited to publicly available US EPA data sources to support widespread use of an eventual automated inventory tool. The method described here focuses on the collection of data and subsequent calculations for inventory modeling, with only limited discussion of potential automation. The initial focus was to identify suitable electronic sources at US EPA as opposed to other formats that would be more difficult to translate into machine-readable data. A search of US EPA’s website identified ten potential data sources for chemical manufacturing LCIs as summarized in Table S1 of the Supporting Information (SI). Of these, six were selected for further use based on the data sources’ applicability to creating chemical-specific LCI unit processes:

1. CDR: facility-level data on chemical production;
2. NEI: facility- and process-level criteria air pollutants (CAP) and hazardous air pollutants (HAP);
3. TRI: facility-level toxic air, water, and waste flows;
4. Discharge Monitoring Report (DMR) Pollutant Loading Tool: National Pollutant Discharge Elimination System-permit and TRI-water discharges at the facility-level;
5. Greenhouse Gas Reporting Program (GHGRP): greenhouse gases (GHGs) at the facility- and unit-level; and
6. Resource Conservation and Recovery Act Information (RCRAInfo): type, fate, and quantity of hazardous waste generated at the facility-level.

The selected sources provide inventory data on a facility-level rather than a product-level because they were initially developed for facility-level regulatory purposes and/or for informing the public. The identified databases build on the work completed by Sengupta and coworkers by compiling facility-specific information that extends the possible sources beyond NEI and TRI.11 This allows for the consideration of GHGs, hazardous wastes, more comprehensive water discharges, and chemical production volumes (PVs) in addition to the CAP, HAP, and TRI-reportable chemical data from NEI and TRI. The facility-level focus, when combined with the annual reporting basis, makes it critical to obtain facility PV data. IO LCA is able to handle this limitation because it is normalized by US economic activity. For chemicals, CDR provides the necessary annual PVs for most chemicals manufactured in the US except for those excluded through exemptions. Thus, the method developed here may be applied to other sectors if suitable PV data for these other sectors are identified.

Selection of a Suitable Case Study. Development of the data mining method through case study enabled identification of the step-by-step activities and calculations needed to leverage the data for LCIs while supporting a better gap analysis of the resulting output. When selecting the chemical case study, a key criterion was the availability of existing LCI unit processes to which the data mining inventory could be compared. Available US-specific LCI unit processes for chemicals were first identified in the USLCI database.4 Of the available chemicals, a relatively simple, widely used chemical with known chemical production processes was preferred. Acetic acid was selected due to its comparatively simple chemistry and common usage. There are currently three main routes for production of acetic acid: methanol carbonylation, acetaldehyde oxidation, and butane-naphtha catalytic liquid-phase oxidation, with the greatest commercial production of virgin synthetic acetic acid based on methanol carbonylation.24 All production routes for acetic acid are described in detail in Section 1 of the SI. For comparison, the USLCI database contains an inventory for acetic acid via methanol carbonylation while the European ecoinvent database has datasets representing all three acetic acid production processes that can be combined in a market average.25

System Boundaries. The focus of the rapid inventory method is gate-to-gate production of a chemical. The boundaries for this process were restricted to the acetic acid production process. As such, production of feedstocks and energy, as well as the treatment of waste, were modeled as separate inventories connected through intermediate flows. As will be discussed in the description of data quality analysis methods, a theoretical process flow diagram was developed for acetic acid synthesis and used to evaluate the completeness of the process.

Data Mining Method for Inventory Modeling. The data mining and calculation activities performed during the case study were documented as a framework of sequential steps for application to a generic chemical product as illustrated in Figure 1. A benefit of this approach is the ability to separate emissions that fall outside the process boundaries, such as those related to on-site fuel combustion for energy, and use them to calculate the required fuel or energy inputs for the process. Identifying and segregating these data is important for the modular use of gated chemical LCI in broader product system inventories because it enables a manufacturing process to be connected to either facility-specific fuel combustion unit processes or more generic data from background databases. The framework is structured to support translation into a series of machine-readable queries for future automation. The current approach is only applicable for creating gated chemical production LCI data. For other chemicals in the supply chain, the method can be repeated to generate additional inventories that are connected through intermediate flows, provided the necessary data are contained in the sources upon which the framework is based. Otherwise, there would be a need to connect to existing background inventories. The following steps, with numbering corresponding to Figure 1, were developed for the data mining method:

1. Identify US facilities that manufacture (not import) the product in the US based on matching the chemical product name in CDR for a specific reporting year.
2. Select CDR data for sites that have associated non-CBI PV data for the product.
3. Rank facilities by PV (highest to lowest), and select the largest PV to be Facility A.
4. Compile air emissions from TRI, NEI, and GHGRP, water discharges from the DMR pollutant loading tool and TRI, and hazardous waste information from RCRAInfo and TRI for Facility A for the selected reporting year (a simplified depiction of data sources used for Facility A is provided in Figure S1a). Ensure the facility address matches across all sources.
5. Assume the production process is responsible for all water discharges and hazardous waste from the facility.
6. Sort NEI emissions by the standard classification code (SCC) descriptors, thus identifying each air emission as belonging to the production process, fuel combustion, or disposal. Based on SCC codes, assign emissions to the primary production or supporting processes as depicted in Figure S1b. Supporting processes in this method refer to combustion or disposal processes that are associated with the primary chemical production. Allocating emissions to more detailed supporting processes will allow in-depth contribution analyses to be conducted in subsequent LCAs involving the data.
7. Derive on-site fuel quantity use from GHGRP by dividing the CO2 emissions for each fuel type by the GHGRP emission factors (EFs) or by dividing appropriate NEI emissions (i.e., emissions originally calculated based on a US EPA EF with no control efficiency used) by US EPA EFs.26
8. Use CDR to compile PVs for other chemicals produced at Facility A for the selected reporting year.
9. Allocate emissions for the production and supporting processes across all products of Facility A on a mass basis.
10. Standardize Facility A releases to a common functional unit, for example, a weight basis of one kilogram of the chemical product.
11. Repeat steps 4-10 for all other facilities making the chemical product.
12. Create a weighted-average product manufacture unit process based on results for all reporting facilities. The following equation describes the process to complete this on the basis of a single pollutant flow.

$$\overline{EF}_{\text{Pollutant X, PD}} = \frac{\sum_{i=1}^{N} EF_{\text{Pollutant X, Facility } i} \times PV_{\text{PD, Facility } i}}{\sum_{i=1}^{N} PV_{\text{PD, Facility } i}} \qquad \text{(Eqn. 1)}$$

Where:
- $\overline{EF}_{\text{Pollutant X, PD}}$ is the weighted-average EF, specific to pollutant X and, in this example, the production of the chemical product (kg/kg)
- $EF_{\text{Pollutant X, Facility } i}$ is an EF for pollutant X at a specific facility (a pollutant release normalized by total chemical production, kg/kg)
- $PV_{\text{PD, Facility } i}$ is the PV of the chemical product at a specific facility (kg)
- Subscript Pollutant X refers to a unique pollutant-media combination (e.g., CO2 emissions to air, ammonia emissions to water)
- Subscript Facility i refers to a specific facility (e.g., Facility A)
- N is the total number of all facilities
- PD refers to the chemical product of interest
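Equation 1 can be sketched in a few lines of Python. The function name and the dictionary inputs keyed by facility are assumptions for illustration. Treating a facility with no reported EF as zero follows the no-report assumption stated in the uncertainty discussion of the text.

```python
def weighted_average_ef(facility_efs, facility_pvs):
    """Production-volume-weighted average EF for one pollutant-media pair (Eqn. 1).

    facility_efs: facility id -> EF (kg pollutant per kg product)
    facility_pvs: facility id -> PV of the product at that facility (kg)
    A facility with a PV but no reported EF contributes an EF of zero.
    """
    total_pv = sum(facility_pvs.values())
    if total_pv <= 0:
        raise ValueError("total production volume must be positive")
    weighted_sum = sum(
        facility_efs.get(facility, 0.0) * pv
        for facility, pv in facility_pvs.items()
    )
    return weighted_sum / total_pv


# Hypothetical values, not taken from the case study.
efs = {"Facility A": 0.002, "Facility B": 0.004}
pvs = {"Facility A": 3000.0, "Facility B": 1000.0}
print(weighted_average_ef(efs, pvs))  # (0.002*3000 + 0.004*1000) / 4000
```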

Figure 1. General schematic for data mining method applied to acetic acid case study.
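Steps 1-3, 7, and 9-10 of the method are simple enough to sketch as code. The record layout (`CdrRecord` and its field names), the function names, and all numeric values below are hypothetical and do not reflect the actual CDR or GHGRP schemas. The emission factor is passed in as a parameter rather than hard-coded because the appropriate value depends on the fuel and the source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CdrRecord:
    facility: str
    chemical: str
    pv_kg: Optional[float]  # None stands for a PV withheld as CBI
    manufactured: bool      # False for import-only records


def rank_facilities(records, chemical):
    """Steps 1-3: keep domestic manufacturers with non-CBI PVs, largest PV first."""
    hits = [
        r for r in records
        if r.chemical == chemical and r.manufactured and r.pv_kg is not None
    ]
    return sorted(hits, key=lambda r: r.pv_kg, reverse=True)


def fuel_use_from_co2(co2_kg, ef_kg_co2_per_fuel_unit):
    """Step 7: fuel quantity = reported CO2 emissions / emission factor."""
    return co2_kg / ef_kg_co2_per_fuel_unit


def allocate_and_normalize(facility_release_kg, product_pv_kg, all_product_pvs_kg):
    """Steps 9-10: allocate a facility release to one product by mass share,
    then express it per kg of that product."""
    mass_share = product_pv_kg / sum(all_product_pvs_kg)
    return facility_release_kg * mass_share / product_pv_kg


records = [
    CdrRecord("F1", "Acetic acid", 1000.0, True),
    CdrRecord("F2", "Acetic acid", None, True),     # CBI, dropped in step 2
    CdrRecord("F3", "Acetic acid", 5000.0, True),
    CdrRecord("F4", "Acetic acid", 9000.0, False),  # importer, dropped in step 1
]
ranked = rank_facilities(records, "Acetic acid")
print([r.facility for r in ranked])  # the first entry plays the role of Facility A
print(allocate_and_normalize(500.0, 2000.0, [2000.0, 3000.0]))
```

Under mass allocation the product's own PV cancels, so every kilogram of any co-product at the facility carries the same burden per kilogram.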

Assigning Uncertainty. As represented in Equation 1, the EFs for the primary acetic acid production process are based on the production-volume-weighted-average of all included facilities. This is also the case for the energy factor inputs for fuel combustion supporting processes developed as part of the method. The maximum emission and energy factors were recorded as the maximum value at a particular facility. Similarly, the minimum emission or energy factor was recorded as the minimum value at a specific facility. If no quantity of a specific flow was listed at a facility, it was assumed the value was zero and, therefore, the minimum value would be zero. This is a potential limitation to using US EPA databases because it is possible facilities may not report emission quantities below reporting thresholds established for regulatory programs. For all flows with average, minimum, and maximum values, a triangular distribution was assumed. It was not possible to use the relevant database reporting threshold limits as minimums because of variations in how threshold limits are defined and applied across the various sources. These variations are noted in Section 2 of the SI, and Table S1 of the SI lists reporting thresholds, exemptions, and reporting period cycles for each source.
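A minimal sketch of how the three distribution parameters could be assembled and sampled is given here. Using the weighted average as the mode of the triangular distribution is an assumption of this sketch, as are the function names and the example values; the text only states that average, minimum, and maximum values define the distribution.

```python
import random


def triangular_parameters(facility_values, facility_pvs):
    """Return (minimum, weighted average, maximum) for one flow.

    Facilities with a PV but no listed value for the flow are assigned zero,
    so the minimum becomes zero whenever any facility does not report.
    """
    values = {f: facility_values.get(f, 0.0) for f in facility_pvs}
    total_pv = sum(facility_pvs.values())
    average = sum(values[f] * facility_pvs[f] for f in facility_pvs) / total_pv
    return min(values.values()), average, max(values.values())


def sample_flow(low, mode, high, n, seed=0):
    """Draw n values from a triangular distribution (standard library only)."""
    if low == high:
        return [low] * n
    rng = random.Random(seed)
    return [rng.triangular(low, high, mode) for _ in range(n)]


# Hypothetical values: facility C produces the chemical but lists no flow.
efs = {"A": 0.002, "B": 0.004}
pvs = {"A": 3000.0, "B": 1000.0, "C": 1000.0}
low, mode, high = triangular_parameters(efs, pvs)
draws = sample_flow(low, mode, high, n=1000)
print(low, mode, high, min(draws), max(draws))
```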

Data Quality Analysis. The ISO 14044 LCA standard requires that the quality of LCI data for use in one or more LCA studies be evaluated according to a number of criteria.2,3 For this study, a data quality assessment method was employed using a modified form of published data quality indicators.27 Quantitative assessments were completed for the primary chemical synthesis inventory, any supporting processes, and the comparison datasets from USLCI and ecoinvent. The data quality assessment applied criteria both at the individual flow-level within each unit process developed and for the whole unit processes. These criteria are provided in Tables S2 and S3 of the SI.

Specific emissions shared among processes were scored at the flow-level for data reliability, temporal correlation, geographical correlation, technological correlation, and data collection methods. Data reliability scores were assigned based on the original data generation method reported for each emission found in the data source (Table S4 in the SI). Temporal scores were based on the difference between the data generation year and the technology scope year. Geographical scores were based on the resolution of the data, as defined in Table S5 in the SI. The technological scores were based on the ability to represent the technology mix in the US from the three major acetic acid production methods: methanol carbonylation 83.3%, acetaldehyde 14.2%, and butane 2.5%.28 The sampling method correlation assesses how adequately the collected data represent the desired sample size and time duration. One year was considered an adequate period of time for collecting data because the databases used yearly crosschecks to determine outlier data, thus ensuring that yearly data variances were minimal. Process validation and completeness were scored at the unit process-level. Process validation is a scoring of the level of review the unit processes underwent by LCA experts and/or experts in the acetic acid production industry. Process completeness quantifies the data gaps and was determined by creating a ratio of included flows versus overall flows into the system.

Dataset Refinement and Comparison. In an attempt to reduce the number of misappropriated emissions (i.e., emissions from other unit processes on-site), an exclusion method was developed using organic chemistry and chemical process knowledge to generate a qualitative list of potential emissions. This qualitative inventory was first created for each facility by considering all potential acetic acid synthesis routes, including reactant feeds, desired products and by-products, anticipated intermediates, and projected emissions (SI Section 1). The facility-specific inventories were merged to create a master list of potential emissions for average acetic acid production. The inventory generated using the data mining method was refined to remove possible misappropriated emissions through comparison with the potential emissions list.
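The refinement step amounts to filtering the mined inventory against the master list of potential emissions. The sketch below assumes flows are matched by case-insensitive name, which is a simplification; the function name and the example flows and amounts are hypothetical and are not results from the case study.

```python
def refine_inventory(mined_inventory, potential_emissions):
    """Split a mined inventory into flows kept and flows excluded.

    mined_inventory: flow name -> amount (e.g., kg per kg product)
    potential_emissions: iterable of flow names expected from the chemistry
    Matching is by case-insensitive name (an assumption of this sketch).
    """
    allowed = {name.strip().lower() for name in potential_emissions}
    kept, excluded = {}, {}
    for flow, amount in mined_inventory.items():
        target = kept if flow.strip().lower() in allowed else excluded
        target[flow] = amount
    return kept, excluded


# Hypothetical example: styrene is not expected from acetic acid synthesis.
mined = {"Methanol": 1.2e-4, "Carbon monoxide": 3.0e-4, "Styrene": 5.0e-6}
master_list = ["methanol", "carbon monoxide", "acetic acid"]
kept, excluded = refine_inventory(mined, master_list)
print(sorted(kept), sorted(excluded))
```

Returning the excluded flows alongside the kept ones leaves a record of what was removed, which supports the transparency goal of the method.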

Both the raw and filtered inventories were compared to acetic acid inventories from USLCI and ecoinvent. The analysis examined the number and type of flows contained within each dataset. The alignment of the datasets was gauged based on the number and value of common flows. LCIA results were then calculated to determine if the use of data mining to produce inventories can substantially alter the interpretation of results for decision-making. Impact assessment for the data mined inventory, USLCI inventory, and average inventory from ecoinvent v2.2 was performed in openLCA v1.4 using the Tool for the Reduction and Assessment of Chemical and Other Environmental Impacts (TRACI) version 2.1, a method developed by US EPA specifically for US conditions.23,25,29 The system boundaries of all processes were modified to support assessment of only the gate-to-gate acetic acid process emissions. The elementary flows for all unit processes were harmonized against TRACI 2.1 using US EPA’s recently developed LCA Harmonization Tool.30 Contribution analysis and Monte Carlo uncertainty analysis results were exported from openLCA for interpretation.

Results

Eight facilities were identified with publicly available acetic acid PVs from CDR (Table S6). Combined, these facilities produced 99.5 million pounds of acetic acid in 2011, the common year identified between data sources. CDR reported the total US production plus import of acetic acid as 8.5 billion pounds.31 Based on these statistics, the data mining method captured 1.17% of total acetic acid production using non-CBI facilities. Although the true percentage is likely larger because the US total includes imports, the low level of data coverage underscores the importance of incorporating CBI to improve the coverage and accuracy of LCIs. For the eight facilities, the number of co-products ranged from three to ninety-two, and the percentage of facility chemical production attributed to acetic acid ranged from 0.1% to 43% (Table S7). For purposes of this work, facilities with low relative production of acetic acid were not filtered from the analysis. Not all facilities reported to every database for the 2011 reporting year: all eight reported to TRI, seven to NEI, four to DMR, three to RCRAInfo, and three to GHGRP (Table S6). This inconsistency in reporting requirements is a key challenge for the proposed method and required careful consideration. For example, weighted-average EFs for GHGs were based on the three facilities reporting to GHGRP, while non-GHG air emissions incorporated all seven facilities reporting to NEI. These variations must be well documented in an inventory if it is to be used in LCA.
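The coverage figure and the database-specific averaging can be illustrated together. The coverage calculation uses the two totals reported above. The weighted-average EF is a sketch that assumes a production-weighted average over only those facilities that reported to a given database, which is equivalent to dividing their summed emissions by their summed production. The facility names and quantities in the EF example are hypothetical.

```python
def coverage_percent(captured, total):
    """Percentage of total production captured by the mined facilities."""
    return 100.0 * captured / total


def weighted_average_ef(emissions, production):
    """Production-weighted average EF over the facilities that reported.

    emissions: facility -> emission for one flow (reporting facilities only).
    production: facility -> production volume of the modeled chemical.
    """
    reporting = [f for f in emissions if f in production]
    total_production = sum(production[f] for f in reporting)
    if total_production <= 0:
        raise ValueError("no production among reporting facilities")
    return sum(emissions[f] for f in reporting) / total_production


# Totals from the text: 99.5 million lb captured of 8.5 billion lb.
coverage = coverage_percent(99.5e6, 8.5e9)

# Hypothetical example: three facilities, only two report this flow.
production = {"A": 10.0, "B": 30.0, "C": 60.0}
ghg_emissions = {"A": 5.0, "C": 9.0}
ef = weighted_average_ef(ghg_emissions, production)
```

Because facility B does not report the flow in this invented case, it is left out of both the numerator and the denominator. Recording which facilities contributed to each EF is one way to document the variations noted above.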

Data Gap Analysis. The gate-to-gate system boundaries for acetic acid manufacture are displayed in Figure 2. Ten unit processes describing gate-to-gate acetic acid synthesis and various on-site supporting processes were created:

• Acetic acid; manufacture; at plant; production mix
• Bituminous coal external combustion boilers; spreader stoker; at plant
• Natural gas external combustion boilers; 10-100 million BTU/hr.; at plant
• Natural gas external combustion boilers; [...]

[...] X% total facility production) to filter out facilities producing very low quantities of the modeled chemical relative to their total chemical production. However, determining the threshold for a PV filter will be a challenge that requires a larger set of case studies to implement. Another concern with allocation issues at facilities is the possibility that chemical co-products are not identified in CDR because of reporting limitations. As shown in Table S1 of the SI, chemical production did not have to be reported to the 2012 CDR if facility manufacturing quantities were below 25,000 lbs annually or the chemical was a polymer, alkyd, or oxylated compound. Such exclusions not only affect the co-product allocation values applied, but may also lead to underestimation of the total facility production volumes.
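A PV filter of the kind discussed here could take the following form. This is only a sketch: the threshold is not defined in the text, so the 1% value below is an arbitrary placeholder, and the facility records and field names are hypothetical.

```python
def apply_pv_filter(facilities, min_share):
    """Keep facilities whose modeled-chemical share of total PV meets the threshold.

    facilities: name -> {"chemical_pv": ..., "total_pv": ...} in consistent units.
    min_share: threshold as a fraction (e.g., 0.01 for 1%); an assumed value.
    """
    return {
        name: record
        for name, record in facilities.items()
        if record["chemical_pv"] / record["total_pv"] >= min_share
    }


# Hypothetical facilities with shares of 0.1% and 43%.
facilities = {
    "Facility A": {"chemical_pv": 1.0, "total_pv": 1000.0},
    "Facility B": {"chemical_pv": 430.0, "total_pv": 1000.0},
}
retained = apply_pv_filter(facilities, min_share=0.01)
```

With the placeholder threshold, only the facility with the larger share is retained. If co-products are missing from the reported totals, the computed share is biased upward, so a facility could pass the filter when it should not.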

Discussion

Further areas of refinement to the method should focus on improving the quality and completeness of the resulting datasets. Regarding quality, methods for identifying the technology market representativeness of the data are needed because the generated LCI currently represents an unknown technology blend. A potential solution is the use of sanitized CBI where companies disclose process details. The inclusion of CBI depends on the ability to sanitize data for public use and could immediately alleviate the sample size issue encountered during the acetic acid case study. The data quality analysis demonstrated the difficulty of comparing unit processes based on different LCI methods and chemical production technologies because the variations between top-down and bottom-up inventory modeling are not well understood.

The immediate research need is filling the data gaps for material and electricity inputs. One approach could be expanding the number of included data sources. As mentioned above, CBI data within US EPA, after sanitization, may provide additional process knowledge such as chemical synthesis routes and feedstocks. Other Federal data sources outside of US EPA may also contain relevant input data. A second approach worth considering is augmenting existing LCIs, such as those from USLCI and ecoinvent, with emission profiles from data mining. This will be most feasible for existing datasets that represent market average technology mixes, and it will be limited by both the availability of existing chemical manufacturing LCIs and any copyright protection of the data. A final approach that may provide the most flexibility is the integration of the data mining method with bottom-up process simulation techniques for LCI modeling. Process simulation can provide an estimate of material and energy requirements, which can be matched with data mining emission profiles for facilities operating the same basic synthesis technology.

The final objective of this work is to identify potential challenges to automating the LCI data mining method using semantic modeling. Before discussing these challenges, it is important to clarify that no inventory modeling should ever be a fully automated black box because of the potential to unknowingly generate erroneous datasets. The intent is to automate the data acquisition and calculations in a stepwise fashion using the approach presented here. This will encourage users to engage in the process and understand the quality of the data. Based on the case study, the first challenge to automation is data access. Many of the data sources, although publicly available, are not in an LOD format compatible with semantic queries. Given this variability in formats, it will be necessary to describe those datasets with ontologies and convert them to LOD graphs for machine-readability.33 Work is already underway in US EPA to convert many of the required databases to LOD, and the development of an automated tool can logically build on these efforts by augmenting the underlying ontologies with LCA concepts. In addition to the format of the data, the underlying ontologies used to relate the various pieces of data need to be linked. For example, the facility identification code used for TRI data is not the same as the NEI facility code. A relationship between the two facility codes must be made to support compatible querying. These linkages generally exist within US EPA data because the agency maintains a Facility Registry Service containing a crosswalk of all identification codes used to represent a facility in the various databases. Once harmonization connections like these are recognized and facilitated with an ontology, further connections must be established between the types of data that can be extracted from each source and what the data represent conceptually in LCA. A second challenge identified during the case study is the need to develop queries to translate the modeling method presented here into machine-executable steps. The sorting and calculations used to develop weighted EFs were more complex than originally anticipated, especially given the use of a chemistry-based exclusion filter and data quality analysis. A first practical automated discovery tool will likely not include automation of the chemistry-based exclusion without some form of user input to define the exclusion rules because no easily accessible database of chemical reaction mechanisms exists. To enable this capability in the future, either such a database will have to be created or, as a more novel approach, a simplified chemical reaction simulator will need to be added to the system to establish exclusion criteria based on predicted intermediates and by-products. The final challenge identified through the case study is the logistics of maintaining such a system. The advantage of LOD is the ability to preserve the metadata associated with a data point. However, tracking the large quantities of metadata identified during the case study meant larger storage requirements, slower system performance during querying activities, and an increased cost for system hosting. Thus, the future system design must optimize these tradeoffs to rapidly provide a high quality inventory.
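The facility-code linkage described above can be sketched as a lookup through a crosswalk table. The example assumes a crosswalk that maps each program-specific identifier to one shared registry identifier; the row layout, all identifiers, and the payload fields are hypothetical and do not reflect the actual Facility Registry Service schema.

```python
from collections import defaultdict

# Hypothetical crosswalk rows: (registry_id, program, program_facility_id).
CROSSWALK = [
    ("REG-0001", "TRI", "TRI-AAA"),
    ("REG-0001", "NEI", "NEI-111"),
    ("REG-0002", "TRI", "TRI-BBB"),
]


def index_crosswalk(rows):
    """Map (program, program_facility_id) to the shared registry identifier."""
    return {(program, pid): registry_id for registry_id, program, pid in rows}


def merge_by_registry(records, index):
    """Group program-specific records under their shared registry identifier.

    records: iterable of (program, program_facility_id, payload).
    Records whose identifier is missing from the crosswalk are skipped.
    """
    merged = defaultdict(dict)
    for program, pid, payload in records:
        registry_id = index.get((program, pid))
        if registry_id is None:
            continue  # no linkage available for this record
        merged[registry_id][program] = payload
    return dict(merged)


# Hypothetical records from two databases describing the same facility.
records = [
    ("TRI", "TRI-AAA", {"methanol_kg": 10.0}),
    ("NEI", "NEI-111", {"voc_kg": 4.0}),
    ("NEI", "NEI-999", {"voc_kg": 1.0}),
]
merged = merge_by_registry(records, index_crosswalk(CROSSWALK))
```

In this invented case, the TRI and NEI records resolve to the same registry identifier and are combined, while the unmatched NEI record is dropped. In an LOD setting the same relationship would be expressed in the ontology rather than in a lookup table, but the logic of the join is the same.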

Once these and other challenges with data acquisition and processing are overcome, it will be possible to develop a semantic inventory modeling system (Figure 5). The proposed structure for automated data discovery will allow the output to be harmonized and stored in a central database. The database will support both inventory modeling and the application of the data to LCA using suitable software. The long-term vision for the system is to support the LCA community by enabling LCIs to be efficiently generated and shared. The method developed here is applicable to production of chemicals tracked in CDR. For chemicals not in CDR, but for which the associated emissions are contained in EPA data sources, suitable PV data will be required to resolve issues related to modeling at the facility level.


Figure 5. A proposed system for automated inventory modeling based on linked data architecture.


Disclaimer

The views expressed in this article are those of the authors and do not necessarily represent the views or policies of the US Environmental Protection Agency.


Supporting Information

Additional text, 4 figures, and 18 tables with descriptions of acetic acid production processes, selection and utilization of US EPA databases, data quality scoring, identified acetic acid manufacture facilities, developed life cycle inventory, and life cycle impact assessment of supporting processes. This material is available free of charge via the Internet at http://pubs.acs.org.

References

(1) National Research Council. Sustainability Concepts in Decision-Making, Tools and Approaches for the US Environmental Protection Agency; National Academies Press: Washington, DC, 2014.

(2) Environmental management -- Life cycle assessment -- Principles and framework; ISO No. 14040; ISO: Switzerland, Jan 07, 2006.

(3) Environmental management -- Life cycle assessment -- Requirements and guidelines; ISO No. 14044; ISO: Switzerland, Jan 07, 2006.

(4) US Life Cycle Inventory (LCI) database; US Department of Energy, National Renewable Energy Lab (NREL), 2012. www.lcacommons.gov/nrel/search (accessed March 2015).

(5) Franklin Associates, a Division of Eastern Research Group, Inc. Cradle-to-Gate Life Cycle Inventory of Nine Plastics Resins and Four Polyurethane Precursors; The Plastics Division of the American Chemistry Council: Washington D.C., 2011.

(6) Aggarwal, C. C. An Introduction to Data Mining. In Data Mining; Springer International Publishing: Switzerland 2015; pp 1-26.

(7) Suh, S. Developing a sectoral environmental database for input-output analysis: the comprehensive data archive of the US. Economic Systems Research. 2005, 17 (4), 449-697.

(8) Toxics Release Inventory (TRI) Data and Tools. http://www2.epa.gov/toxics-release-inventory-triprogram/tri-data-and-tools (accessed August 12, 2015).

(9) National Emissions Inventory (NEI). https://www3.epa.gov/ttnchie1/net/2011inventory.html (accessed August 12, 2015).

(10) US Bureau of Economic Analysis (BEA). Annual input–output table – make and use matrices for 1998. Washington, DC: Department of Commerce, 2002.

(11) Sengupta, D.; Hawkins, T.R.; Smith, R.L. Using national inventories for estimating environmental impacts of products from industrial sectors: a case study of ethanol and gasoline. Int. J. Life Cycle Assess. 2015, 20 (5), 597-607.

(12) Open Government Directive, M10-06; Executive Office of the President, Office of Management and Budget, 2009. https://www.whitehouse.gov/open/documents/open-government-directive (accessed August 21, 2015).

(13) Data.gov. http://www.data.gov/ (accessed April 27, 2016).

(14) Berners-Lee, T.; Hendler, J.; Lassila, O. The semantic web. Scientific American. 2001, 284, 34-43.

(15) Hyland, B. Linked Data: Structured Data on the Web. 2014, http://www.slideshare.net/3roundstones/linked-data-overview-structured-data-on-the-web-for-us-epa20140203 (accessed April 27, 2016).

(16) Chemical Data Reporting. http://www.epa.gov/cdr/ (accessed July 28, 2015).

(17) Bertin, B.; Scuturici, V.M.; Risler, E.; Pinon, J.M. A Semantic Approach to Life Cycle Assessment Applied on Energy Environmental Impact Data Management. Proceedings of the International Conference on Extending Database Technology, Berlin, Germany, March 27–30 2012; Association for Computing Machinery: New York, NY, 2012a.

(18) Bertin, B.; Scuturici, V.M.; Pinon, J.M.; Risler, E. CarbonDB: a semantic life cycle inventory database. Proceedings of the 21st ACM international conference on information and knowledge management, Maui, USA, Oct 29 - Nov 2, 2012; Association for Computing Machinery: New York, NY, 2012b.

(19) Perez, A.; Larrinaga, F.; Curry, E. The Role of Linked Data and Semantic-Technologies for Sustainability Idea Management. In Software Engineering and Formal Methods, Proceedings of the International Symposium on Modelling and Knowledge Management for Sustainable Development, Madrid, Spain, September 24, 2013; Counsell, S., Nunez, M., Eds.; Springer International Publishing: Switzerland, 2013.

(20) Cooper, J.; Noon, M.; Jones, C.; Kahn, E.; Arbuckle, P. Big data in life cycle assessment. J. Ind. Ecol. 2013, 17 (6), 796-799.

(21) Ingwersen, W.W.; Hawkins, T.R.; Transue, T.R.; Meyer, D.E.; Moore, G.; Kahn, E.; Arbuckle, P.; Paulsen, H.; Norris, G.A. A new data architecture for advancing life cycle assessment. Int. J. Life Cycle Assess. 2015, 20 (4), 520-526.

(22) Zhang, Y.; Luo, X.; Buis, J.J.; Sutherland, J.W. LCA-oriented semantic representation for the product life cycle. J. Cleaner Prod. 2015, 86, 146-162.

(23) openLCA, 1.4.2; GreenDelta: Berlin, Germany, 2015.

(24) Wagner, F.S. Acetic acid. In Kirk-Othmer encyclopedia of chemical technology, 5th ed.; John Wiley & Sons Publishing: New York, NY, 2014; Vol. 1, pp 115-136.

(25) Ecoinvent database, version 2.2; Swiss Centre for Life Cycle Inventories: Dübendorf, Switzerland, 2010.

(26) Technology Transfer Network Clearinghouse for Inventories & Emission Factors. http://cfpub.epa.gov/webfire/index.cfm?action=fire.detailedSearch (accessed August 6, 2015).

(27) Weidema, B.P.; Bauer, C.; Hischier, R.; Mutel, C.; Nemecek, T.; Reinhard, J.; Vadenbo, C.O.; Wernet, G. Overview and methodology: Data quality guidelines for the ecoinvent database version 3. In Swiss Centre for life cycle Inventories, 2013.

(28) Ecoinvent database, version 3.2; Swiss Centre for Life Cycle Inventories: Dübendorf, Switzerland, 2015.

(29) Bare, J.C.; Norris, G.A.; Pennington, D.W.; McKone, T. TRACI: The tool for the reduction and assessment of chemical and other environmental impacts. J. Ind. Ecol. 2002, 6, 49–78.

(30) Ingwersen, W.W.; Transue, T.; Meyer, D.E.; Bergmann, M.; Paulsen, H.; Kahn, E.; Arbuckle, P. LCA Harmonization Tool. Proceedings of the Life Cycle Assessment (LCA) XV conference, Vancouver, Canada, October 6-8, 2015.

(31) Chemical Data Access Tool (CDAT). http://java.epa.gov/oppt_chemical_search/ (accessed August 20, 2015).

(32) RCRAInfo. http://www.epa.gov/enviro/facts/rcrainfo/ (accessed July 28, 2015).

(33) Heath, T.; Bizer, C. Linked Data: Evolving the Web into a Global Data Space (1st edition). Synthesis Lectures on the Semantic Web: Theory and Technology. 2011, 1:1, 1-136. Morgan & Claypool. http://linkeddatabook.com/editions/1.0/#htoc16 (accessed February 8, 2016).

[Graphic labels: Airborne Emissions; Water Discharges; Toxic Releases; Hazardous Waste; Greenhouse Gases; Production Volumes; Chemical Manufacturing Life Cycle Inventory]