Policy Analysis
Mining Available Data from the United States Environmental Protection Agency to Support Rapid Life Cycle Inventory Modeling of Chemical Manufacturing Sarah Allanson Cashman, David Edward Meyer, Ashley Edelen, Wesley W. Ingwersen, John Abraham, William Martin Barrett, Michael Albert Gonzalez, Paul Randall, Gerardo J. Ruiz-Mercado, and Raymond L. Smith Environ. Sci. Technol., Just Accepted Manuscript • DOI: 10.1021/acs.est.6b02160 • Publication Date (Web): 12 Aug 2016 Downloaded from http://pubs.acs.org on August 16, 2016
Environmental Science & Technology is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.
Environmental Science & Technology
Mining Available Data from the United States Environmental Protection Agency to Support Rapid Life Cycle Inventory Modeling of Chemical Manufacturing
Sarah A. Cashman1, David E. Meyer2,*, Ashley Edelen3, Wesley Ingwersen2, John Abraham2, William Barrett2, Michael Gonzalez2, Paul Randall2, Gerardo Ruiz-Mercado2, Raymond L. Smith2
1 Eastern Research Group, 110 Hartwell Ave., Lexington, MA 02421, USA
2 United States Environmental Protection Agency, National Risk Management Research Laboratory, 26 West Martin Luther King Drive, Cincinnati, OH 45268, USA
3 Oak Ridge Institute for Science and Education (ORISE) Postdoctoral Research Participant hosted by U.S. Environmental Protection Agency Office of Research and Development, 26 West Martin Luther King Drive, Cincinnati, OH 45268, USA
*Corresponding author: David E. Meyer, PhD, U.S. Environmental Protection Agency, National Risk Management Research Laboratory, 26 West Martin Luther King Drive (MS 483), Cincinnati, OH 45268. Phone: (513) 569-7194. Email: [email protected]
Journal: Environmental Science & Technology
Keywords: LCI, Data Mining, Chemical Manufacture, US EPA Database, Semantic
Abstract
Demands for quick and accurate life cycle assessments create a need for methods to rapidly generate reliable life cycle inventories (LCI). Data mining is a suitable tool for this purpose, especially given the large amount of available governmental data. These data are typically applied to LCIs on a case-by-case basis. As linked open data becomes more prevalent, it may be possible to automate LCI using data mining by establishing a reproducible approach for identifying, extracting, and processing the data. This work proposes a method to standardize and eventually automate the discovery and use of publicly available data at the United States Environmental Protection Agency for chemical manufacturing LCI. The method is developed using a case study of acetic acid. The data quality and gap analyses for the generated inventory found that the selected data sources can provide information of equal or better reliability and representativeness on air, water, hazardous waste, on-site energy usage, and production volumes, but with key data gaps including material inputs, water usage, purchased electricity, and transportation requirements. A comparison of the generated LCI with existing data revealed that the data mining inventory is in reasonable agreement with existing data and may provide a more comprehensive inventory of air emissions and water discharges. The case study highlighted challenges for current data management practices that must be overcome to successfully automate the method using semantic technology. Benefits of the method are that the openly available data can be compiled in a standardized and transparent approach that supports potential automation, with flexibility to incorporate new data sources as needed.
Introduction
Recently, the National Research Council of the National Academy of Sciences made recommendations to the United States Environmental Protection Agency (US EPA) for implementing sustainability as part of its decision-making process by incorporating life cycle thinking.1 A key tool for this purpose is Life Cycle Assessment (LCA) as defined by the ISO 14040 and 14044 Standards.2,3 The practicality of adopting LCA to support decision-making can be limited by the generic nature of the assessment and the resource-intensive nature of data collection and life cycle inventory (LCI) modeling. However, efforts to reduce the generic nature of LCA are further contributing to the second challenge by increasing the quantity and type of data required for performing assessments. To address these growing data requirements, there is a need to develop automated data mining methods that can efficiently generate reliable LCIs using readily accessible secondary data sources, such as government databases describing the chemical manufacturing sector. While the various data can fill a variety of inventory modeling needs, there are challenges to automating their seamless integration into a suitable LCI.
56
Data Mining and Life Cycle Inventory Modeling. An LCI is an accounting of the material, energy,
57
waste, and emission flows associated with the life cycle of a product. An LCI can be provided at the
58
system level and describe the entire product life cycle from raw material acquisition to final disposal
59
(i.e., cradle-to-grave), or can be created for specific unit processes within the product life cycle (i.e.,
60
cradle-to-gate, gate-to-gate, or gate-to-grave) and sequentially linked to cover the life cycle. The gate-
61
to-gate (or gated) approach is preferred because the modular nature of the inventories enables the
62
same data to be applied across multiple LCAs.
Chemical and product manufacturing inventories are typically the most challenging to model because of the lack of available data and the large number of material and energy flows. Ideally, the inventory data are obtained directly from manufacturers. However, companies often treat the required information as confidential business information (CBI) and are unwilling to make the data public. One exception is the USLCI database, whose chemical LCIs were primarily created through industry association efforts to aggregate and sanitize facility-level CBI for use as industry benchmarks.4,5 Such industry association studies are done on a case-by-case basis, require significant resources, may lack full industry coverage, and are challenging to update automatically. Detailed process-level industry association LCIs derived from primary facility-level data collection should be supplemented with LCI information from other sources. In the absence of the necessary primary data, LCIs must be constructed using secondary data sources. Given the multitude of potential data sources, this approach to inventory modeling will be most efficient using data mining.
Data mining, or knowledge discovery in databases, is the study of collecting, harmonizing, processing, and analyzing data to gain useful insights.6 While the term data mining can be used loosely, it generally refers to extracting meaningful new information from data. However, data mining is only a tool and does not eliminate the need to know the system under study or to understand the limitations and proper use of the data. It is often applied for inventory modeling in LCA when a practitioner does not have access to primary data and must rely on secondary data from journal articles, patents, government reports, and existing LCI databases to fill data gaps. However, data mining for this purpose is typically not conducive to rapid and reproducible inventory modeling. For example, Suh discussed the development of a database of environmental flows for use in economic input-output (IO) LCA.7 This involved integrating US EPA sources such as the Toxics Release Inventory (TRI) and National Emissions Inventory (NEI) to develop sector-average emission data for all industries covered in the IO tables from the US Bureau of Economic Analysis (BEA).8,9,10 The key issues identified for this approach included overlapping chemical coverage of data sources and limitations of the data based on reporting requirements. These concerns were echoed by Sengupta et al. when developing a method to include data from NEI and TRI in process-based LCI for national-average biofuel production.11 Although both studies used NEI as the primary source and supplemented the resulting LCIs with information from TRI, the approach to calculating average emission quantities varied. Suh combined the sectoral emission data from TRI and NEI with the corresponding sectoral economic output from BEA.7 However, Sengupta et al. discovered this can lead to underestimation of emissions if not all facilities report to both data sources.11 To correct for this, Sengupta and coworkers matched releases (e.g., emissions) with output at the facility level to calculate facility-by-facility release factors, with sector averages only reflecting facilities for which both production and releases could be assigned.11 An advantage of this approach is the incorporation of all relevant site-specific factors (e.g., differences in equipment and production methods) into the aggregated US gated chemical manufacturer LCI. Using facility-level information as the foundation of the LCI also allows for aggregation at different spatial or temporal scales depending on the desired end use of the LCI.
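To make the underestimation point concrete, the following Python sketch compares the two aggregation approaches on a hypothetical three-facility sector. The facility names, outputs, and releases are invented for illustration, and writing the matched approach as an output-weighted average of facility release factors is an assumption of this sketch, not a reproduction of Sengupta et al.'s exact procedure.

```python
from typing import Optional

# Hypothetical facility records: annual output and reported release (same mass units).
# A release of None means the facility did not report to the emissions database.
FACILITIES: dict[str, dict[str, Optional[float]]] = {
    "F1": {"output": 100.0, "release": 10.0},
    "F2": {"output": 100.0, "release": 20.0},
    "F3": {"output": 200.0, "release": None},  # produces, but has no emissions record
}


def sector_ratio(records) -> float:
    """Sector-level ratio: all reported releases over all sector output.
    Output from non-reporting facilities still enters the denominator."""
    total_release = sum(r["release"] or 0.0 for r in records.values())
    total_output = sum(r["output"] for r in records.values())
    return total_release / total_output


def matched_average(records) -> float:
    """Facility-matched average using only facilities with both output and releases.
    The output-weighted mean of release_i / output_i simplifies to the ratio below."""
    matched = [r for r in records.values() if r["release"] is not None and r["output"] > 0]
    return sum(r["release"] for r in matched) / sum(r["output"] for r in matched)


print(sector_ratio(FACILITIES))     # 0.075
print(matched_average(FACILITIES))  # 0.15
```

Because F3's output enters the denominator of the sector ratio while its emissions are missing from the numerator, the sector ratio in this example is half the matched value.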
The results of these two examples are LCIs with varying degrees of uncertainty and heterogeneity. For LCA, controlling and/or reducing the uncertainty using quality data increases the defensibility of life cycle impact assessment (LCIA) results. This may be achievable using data mining to consistently select, harmonize, and process higher-quality data into LCIs. Furthermore, the examples were implemented with a focus on sector-specific data that incorporate all unit operations within the included facilities, which is not adequate for modeling a modular gated LCI. Thus, a new method for modeling chemical manufacturing at the facility level is needed. Methodical data mining is a suitable approach for this purpose because of the ability to include metadata, or data describing the data, with the LCI for data quality analysis.
Transitioning to Linked Open Data for Inventory Modeling. The United States Open Government Directive was issued to all federal agencies to make government transparent, participatory, and collaborative.12 As a result, federal agencies are making data publicly available as linked open data (LOD),13 a machine-readable format that facilitates searching and retrieval. This transition has been guided by the emergence of the semantic web, an approach to structure and describe data to enable sharing across multiple systems for a variety of applications.14 The US EPA is exploring semantic management of LOD for key environmental databases such as TRI and the Chemical Data Reporting tool (CDR).8,15,16 This governmental shift can support a transition to semantic data mining for improved inventory modeling.
Research on the use of semantic tools to support LCI has focused primarily on the interoperability and accessibility of existing inventory data to make LCA more efficient and less resource intensive. For example, Bertin and coworkers applied semantic modeling to improve identification of linkages in inventory databases within their CarbonDB life cycle database.17,18 Similarly, Perez and coworkers proposed the use of LOD within the LCA community as an example of using semantic modeling to enhance knowledge discovery within life cycle models.19 Cooper and coworkers identified the use of Semantic Web technology as a means to create interoperable “Big Data” across the various public LCA repositories.20 Ingwersen and coworkers described the implementation of the Resource Description Framework (RDF), a set of Semantic Web standards, to facilitate data harmonization and enable the creation of an LOD LCA network.21 Finally, Zhang and coworkers demonstrated the integration of a semantic LCI database with the openLCA software platform to support streamlined life cycle modeling based on product-specific querying.22,23
Towards Rapid Inventory Modeling. Moving beyond the basic objective of data interoperability and examining how semantic modeling can be harnessed to support rapid inventory modeling offers the potential to automate the process and greatly reduce the resources required to conduct an LCA. However, it is important to first develop a conceptual approach for consistent data mining to guide the development of semantic inventory modeling and identify potential challenges with using LOD. The calculations required to transform mined data into meaningful inventory data need to be documented so they can be translated into machine-readable queries.
The US EPA maintains large repositories of environmental data, with much of these data obtained directly from state governments and industry as part of regulatory reporting. Given that environmental data account for a majority of an LCI, US EPA data are a desirable source upon which to begin developing an automated inventory modeling method. However, various factors may affect the quality of the data for this purpose, including CBI claims by businesses for information contained in regulatory filings and variability in both rule reporting requirements and how the raw data are processed into databases. These factors will then influence the quality of inventories generated using the data. Identifying data that are publicly available and of sufficient quality for decision support is key if this method is to be useful to the LCA community at large. To achieve the long-term vision of an automated inventory modeling system, the objectives of this research are to: (1) identify public data sources within US EPA that support inventory modeling; (2) propose a method to apply data mining to US EPA data for inventory modeling; (3) develop a gated LCI for the manufacture of a case study chemical using the data mining method; (4) evaluate the quality of the case study inventory and identify data gaps; (5) investigate the utility of the generated inventory for decision support; and (6) identify challenges to automating the data mining method using semantic tools based on insights provided by the case study. The results of this research will advance the practice of inventory modeling for LCA by supporting development of a semantic modeling tool that generates LCI in a rapid and consistent way.
Methods
Data Source Identification. The scope of this work was limited to publicly available US EPA data sources to support widespread use of an eventual automated inventory tool. The method described here focuses on the collection of data and subsequent calculations for inventory modeling, with only limited discussion of potential automation. The initial focus was to identify suitable electronic sources at US EPA as opposed to other formats that would be more difficult to translate into machine-readable data. A search of US EPA’s website identified ten potential data sources for chemical manufacturing LCIs, as summarized in Table S1 of the Supporting Information (SI). Of these, six were selected for further use based on their applicability to creating chemical-specific LCI unit processes:
1. CDR: facility-level data on chemical production;
2. NEI: facility- and process-level criteria air pollutants (CAP) and hazardous air pollutants (HAP);
3. TRI: facility-level toxic air, water, and waste flows;
4. Discharge Monitoring Report (DMR) Pollutant Loading Tool: National Pollutant Discharge Elimination System permit and TRI water discharges at the facility level;
5. Greenhouse Gas Reporting Program (GHGRP): greenhouse gases (GHGs) at the facility and unit level; and
6. Resource Conservation and Recovery Act Information (RCRAInfo): type, fate, and quantity of hazardous waste generated at the facility level.
The selected sources provide inventory data at the facility level rather than the product level because they were initially developed for facility-level regulatory purposes and/or for informing the public. The identified databases build on the work completed by Sengupta and coworkers by compiling facility-specific information that extends the possible sources beyond NEI and TRI.11 This allows for the consideration of GHGs, hazardous wastes, more comprehensive water discharges, and chemical production volumes (PVs) in addition to the CAP, HAP, and TRI-reportable chemical data from NEI and TRI. The facility-level focus, when combined with the annual reporting basis, makes it critical to obtain facility PV data. IO LCA is able to handle this limitation because it is normalized by US economic activity. For chemicals, CDR provides the necessary annual PVs for most chemicals manufactured in the US, except for those excluded through exemptions. Thus, the method developed here may be applied to other sectors if suitable PV data for these other sectors are identified.
Selection of a Suitable Case Study. Developing the data mining method through a case study enabled identification of the step-by-step activities and calculations needed to leverage the data for LCIs while supporting a better gap analysis of the resulting output. When selecting the chemical case study, a key criterion was the availability of existing LCI unit processes to which the data mining inventory could be compared. Available US-specific LCI unit processes for chemicals were first identified in the USLCI database.4 Of the available chemicals, a relatively simple, widely used chemical with known production processes was preferred. Acetic acid was selected due to its comparatively simple chemistry and common usage. There are currently three main routes for production of acetic acid: methanol carbonylation, acetaldehyde oxidation, and butane-naphtha catalytic liquid-phase oxidation, with the greatest commercial production of virgin synthetic acetic acid based on methanol carbonylation.24 All production routes for acetic acid are described in detail in Section 1 of the SI. For comparison, the USLCI database contains an inventory for acetic acid via methanol carbonylation, while the European ecoinvent database has datasets representing all three acetic acid production processes that can be combined in a market average.25
System Boundaries. The focus of the rapid inventory method is gate-to-gate production of a chemical. The boundaries for this process were restricted to the acetic acid production process. As such, production of feedstocks and energy, as well as the treatment of waste, were modeled as separate inventories connected through intermediate flows. As will be discussed in the description of data quality analysis methods, a theoretical process flow diagram was developed for acetic acid synthesis and used to evaluate the completeness of the process.
Data Mining Method for Inventory Modeling. The data mining and calculation activities performed during the case study were documented as a framework of sequential steps for application to a generic chemical product, as illustrated in Figure 1. A benefit of this approach is the ability to separate emissions that fall outside the process boundaries, such as those related to on-site fuel combustion for energy, and use them to calculate the required fuel or energy inputs for the process. Identifying and segregating these data is important for the modular use of gated chemical LCI in broader product system inventories because it enables a manufacturing process to be connected to either facility-specific fuel combustion unit processes or more generic data from background databases. The framework is structured to support translation into a series of machine-readable queries for future automation. The current approach is only applicable for creating gated chemical production LCI data. For other chemicals in the supply chain, the method can be repeated to generate additional inventories that are connected through intermediate flows, provided the necessary data are contained in the sources upon which the framework is based. Otherwise, there would be a need to connect to existing background inventories. The following steps, with numbering corresponding to Figure 1, were developed for the data mining method:
1. Identify US facilities that manufacture (not import) the product by matching the chemical product name in CDR for a specific reporting year.
2. Select CDR data for sites that have associated non-CBI PV data for the product.
3. Rank facilities by PV (highest to lowest), and select the facility with the largest PV as Facility A.
4. Compile air emissions from TRI, NEI, and GHGRP; water discharges from the DMR Pollutant Loading Tool and TRI; and hazardous waste information from RCRAInfo and TRI for Facility A for the selected reporting year (a simplified depiction of the data sources used for Facility A is provided in Figure S1a). Ensure the facility address matches across all sources.
5. Assume the production process is responsible for all water discharges and hazardous waste from the facility.
6. Sort NEI emissions by the Source Classification Code (SCC) descriptors, thus identifying each air emission as belonging to the production process, fuel combustion, or disposal. Based on the SCCs, assign emissions to the primary production or supporting processes as depicted in Figure S1b. Supporting processes in this method refer to combustion or disposal processes that are associated with the primary chemical production. Allocating emissions to more detailed supporting processes will allow in-depth contribution analyses to be conducted in subsequent LCAs involving the data.
7. Derive on-site fuel use from GHGRP by dividing the CO2 emissions for each fuel type by the GHGRP emission factors (EFs), or by dividing appropriate NEI emissions (i.e., emissions originally calculated based on a US EPA EF with no control efficiency used) by US EPA EFs.26
8. Use CDR to compile PVs for other chemicals produced at Facility A for the selected reporting year.
9. Allocate emissions for the production and supporting processes across all products of Facility A on a mass basis.
10. Standardize Facility A releases to a common functional unit, for example, a weight basis of one kilogram of the chemical product.
11. Repeat steps 4-10 for all other facilities making the chemical product.
12. Create a weighted-average product manufacture unit process based on results for all reporting facilities. The following equation describes this calculation for a single pollutant flow:

$$\overline{EF}_{\text{Pollutant }X,\,PD} = \frac{\sum_{i=1}^{N} EF_{\text{Pollutant }X,\,\text{Facility }i} \times PV_{PD,\,\text{Facility }i}}{\sum_{i=1}^{N} PV_{PD,\,\text{Facility }i}} \qquad \text{(Eqn. 1)}$$

Where:
- $\overline{EF}_{\text{Pollutant }X,\,PD}$ is the weighted-average EF, specific to pollutant X and, in this example, the production of the chemical product (kg/kg)
- $EF_{\text{Pollutant }X,\,\text{Facility }i}$ is an EF for pollutant X at a specific facility (a pollutant release normalized by total chemical production, kg/kg)
- $PV_{PD,\,\text{Facility }i}$ is the PV of the chemical product at a specific facility (kg)
- Subscript Pollutant X refers to a unique pollutant-media combination (e.g., CO2 emissions to air, ammonia emissions to water)
- Subscript Facility i refers to a specific facility (e.g., Facility A)
- N is the total number of all facilities
- PD refers to the chemical product of interest
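Steps 7, 9, 10, and 12 reduce to a few arithmetic operations once the facility data are compiled. The Python sketch below shows one way to express them. The `Facility` record, the function names, the flow label, and all numbers are hypothetical; the sketch treats a flow that a facility does not list as zero, as described under Assigning Uncertainty, and leaves it to the caller to choose which facilities enter the average for a given flow (for example, only facilities reporting to GHGRP for GHG flows).

```python
from dataclasses import dataclass, field


@dataclass
class Facility:
    """Hypothetical per-facility record compiled in steps 1-8."""
    name: str
    product_pv_kg: float   # PV of the product of interest (CDR)
    total_pv_kg: float     # PV of all chemicals produced on site (CDR)
    releases_kg: dict[str, float] = field(default_factory=dict)  # flow -> kg/yr


def fuel_use(co2_kg: float, co2_per_unit_fuel: float) -> float:
    """Step 7: back-calculate fuel use by dividing CO2 emissions by an emission factor."""
    return co2_kg / co2_per_unit_fuel


def facility_efs(fac: Facility) -> dict[str, float]:
    """Steps 9-10: mass-basis allocation to the product, then per kg of product.
    release * (product_pv / total_pv) / product_pv equals release / total_pv."""
    share = fac.product_pv_kg / fac.total_pv_kg
    return {flow: kg * share / fac.product_pv_kg for flow, kg in fac.releases_kg.items()}


def weighted_average_ef(facilities: list[Facility], flow: str) -> float:
    """Step 12 / Eqn. 1: PV-weighted average EF for one pollutant-media flow.
    A facility that lists no quantity for the flow contributes an EF of zero."""
    numerator = sum(facility_efs(f).get(flow, 0.0) * f.product_pv_kg for f in facilities)
    denominator = sum(f.product_pv_kg for f in facilities)
    return numerator / denominator


# Hypothetical two-facility example (values are illustrative only).
a = Facility("Facility A", product_pv_kg=1000.0, total_pv_kg=2000.0,
             releases_kg={"methanol, air": 400.0})
b = Facility("Facility B", product_pv_kg=500.0, total_pv_kg=500.0,
             releases_kg={"methanol, air": 50.0})
print(facility_efs(a))                                  # {'methanol, air': 0.2}
print(weighted_average_ef([a, b], "methanol, air"))     # about 0.1667
print(fuel_use(co2_kg=1000.0, co2_per_unit_fuel=50.0))  # 20.0
```

Because the allocated, normalized facility EF simplifies to the site release divided by total site production, the sketch is consistent with the Eqn. 1 note that a facility EF is a pollutant release normalized by total chemical production.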
Figure 1. General schematic for data mining method applied to acetic acid case study.
Assigning Uncertainty. As represented in Equation 1, the EFs for the primary acetic acid production process are based on the production-volume-weighted average of all included facilities. This is also the case for the energy factor inputs for fuel combustion supporting processes developed as part of the method. The maximum emission or energy factor was recorded as the highest value at any single facility. Similarly, the minimum emission or energy factor was recorded as the lowest value at any single facility. If no quantity of a specific flow was listed at a facility, the value was assumed to be zero and, therefore, the minimum value would be zero. This is a potential limitation of using US EPA databases because facilities may not report emission quantities below reporting thresholds established for regulatory programs. For all flows with average, minimum, and maximum values, a triangular distribution was assumed. It was not possible to use the relevant database reporting threshold limits as minimums because of variations in how threshold limits are defined and applied across the various sources. These variations are noted in Section 2 of the SI, and Table S1 of the SI lists reporting thresholds, exemptions, and reporting period cycles for each source.
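The triangular distribution described here can be parameterized directly from the facility-level values. The Python sketch below assumes the weighted-average EF serves as the mode (an assumption, since the text only states that average, minimum, and maximum values were used); the function names and the EF values are hypothetical. It uses the standard library's `random.triangular`, which takes its arguments in the order (low, high, mode).

```python
import random


def triangular_params(facility_efs: list[float], weighted_avg: float) -> tuple[float, float, float]:
    """Return (minimum, mode, maximum) for one flow.
    facility_efs holds one EF per included facility, with 0.0 for a facility
    that listed no quantity of the flow. Using the weighted average as the
    mode is an assumption of this sketch."""
    low, high = min(facility_efs), max(facility_efs)
    if not low <= weighted_avg <= high:
        raise ValueError("weighted average must lie between the facility extremes")
    return low, weighted_avg, high


def sample_flow(params: tuple[float, float, float], n: int, seed: int = 0) -> list[float]:
    """Draw n Monte Carlo samples from the triangular distribution."""
    low, mode, high = params
    rng = random.Random(seed)
    # random.triangular takes (low, high, mode), in that order.
    return [rng.triangular(low, high, mode) for _ in range(n)]


# Hypothetical EFs (kg/kg) for three facilities; the third listed no quantity.
params = triangular_params([0.2, 0.1, 0.0], weighted_avg=0.12)
samples = sample_flow(params, n=1000)
print(params)  # (0.0, 0.12, 0.2)
print(min(samples) >= 0.0, max(samples) <= 0.2)  # True True
```

A production-volume-weighted average of facility EFs always lies between the smallest and largest facility EF, so the range check only fails if inputs are inconsistent. Setting the minimum to zero for non-reporting facilities widens the lower tail, which reflects the reporting-threshold limitation noted above.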
Data Quality Analysis. The ISO 14044 LCA standard requires that the quality of LCI data for use in one or more LCA studies be evaluated according to a number of criteria.2,3 For this study, a data quality assessment method was employed using a modified form of published data quality indicators.27 Quantitative assessments were completed for the primary chemical synthesis inventory, any supporting processes, and the comparison datasets from USLCI and ecoinvent. The data quality assessment applied criteria both at the individual flow level within each unit process developed and for the whole unit processes. These criteria are provided in Tables S2 and S3 of the SI.
Specific emissions shared among processes were scored at the flow level for data reliability, temporal correlation, geographical correlation, technological correlation, and data collection methods. Data reliability scores were assigned based on the original data generation method reported for each emission found in the data source (Table S4 in the SI). Temporal scores were based on the difference between the data generation year and the technology scope year. Geographical scores were based on the resolution of the data, as defined in Table S5 in the SI. The technological scores were based on the ability to represent the US technology mix of the three major acetic acid production methods: methanol carbonylation (83.3%), acetaldehyde oxidation (14.2%), and butane oxidation (2.5%).28 The data collection (sampling) method score assesses how adequately the collected data represent the desired sample size and time duration. One year was considered an adequate period for collecting data because the databases used yearly crosschecks to determine outlier data, thus ensuring that yearly data variances were minimal. Process validation and completeness were scored at the unit process level. Process validation scores the level of review the unit processes underwent by LCA experts and/or experts in the acetic acid production industry. Process completeness quantifies the data gaps and was determined as the ratio of included flows to overall flows into the system.
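Process completeness, as described here, is a simple ratio. A minimal Python sketch follows, assuming the overall flows are taken from a theoretical process flow diagram such as the one mentioned under System Boundaries. The function name and flow names are hypothetical placeholders, and only the ratio is shown because the scoring criteria for the indicators are given in the SI tables rather than here.

```python
def process_completeness(included: set[str], expected: set[str]) -> float:
    """Fraction of the expected flows for a unit process that the inventory contains."""
    if not expected:
        raise ValueError("the expected flow list must not be empty")
    return len(included & expected) / len(expected)


# Hypothetical expected input flows for a gated acetic acid process.
expected = {"methanol", "carbon monoxide", "process water",
            "purchased electricity", "natural gas"}
# Hypothetical mined inventory: on-site fuel captured, material inputs missing.
included = {"natural gas"}
print(process_completeness(included, expected))  # 0.2
```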
Dataset Refinement and Comparison. In an attempt to reduce the number of misappropriated emissions (i.e., emissions from other unit processes on-site), an exclusion method was developed using organic chemistry and chemical process knowledge to generate a qualitative list of potential emissions. This qualitative inventory was first created for each facility by considering all potential acetic acid synthesis routes, including reactant feeds, desired products and by-products, anticipated intermediates, and projected emissions (SI Section 1). The facility-specific inventories were merged to create a master list of potential emissions for average acetic acid production. The inventory generated using the data mining method was refined to remove possible misappropriated emissions through comparison with the potential emissions list.
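The merge-and-filter refinement can be expressed as set operations. The Python sketch below is illustrative only: the function names, the facility lists, and the raw inventory values (including the flagged flow) are hypothetical. It matches flows by plain names, which assumes flow names have already been harmonized (in practice, identifiers such as CAS numbers would be more robust).

```python
def master_list(facility_lists: list[set[str]]) -> set[str]:
    """Merge facility-specific lists of potential emissions (set union)."""
    merged: set[str] = set()
    for flows in facility_lists:
        merged |= flows
    return merged


def refine(inventory: dict[str, float], potential: set[str]) -> tuple[dict[str, float], dict[str, float]]:
    """Split an inventory into flows on the potential-emissions list and flows
    flagged as possibly misappropriated (from other on-site unit processes)."""
    kept = {flow: v for flow, v in inventory.items() if flow in potential}
    flagged = {flow: v for flow, v in inventory.items() if flow not in potential}
    return kept, flagged


# Hypothetical inputs: two facility lists and a raw mined inventory (kg/kg).
potential = master_list([{"acetic acid", "methanol", "carbon monoxide"},
                         {"acetic acid", "acetaldehyde"}])
raw = {"acetic acid": 1e-3, "methanol": 2e-4, "styrene": 5e-5}
kept, flagged = refine(raw, potential)
print(sorted(kept))     # ['acetic acid', 'methanol']
print(sorted(flagged))  # ['styrene']
```

Returning the flagged flows rather than discarding them keeps both the raw and filtered inventories available for comparison.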
Both the raw and filtered inventories were compared to acetic acid inventories from USLCI and ecoinvent. The analysis examined the number and type of flows contained within each dataset. The alignment of the datasets was gauged based on the number and value of common flows. LCIA results were then calculated to determine whether the use of data mining to produce inventories can substantially alter the interpretation of results for decision-making. Impact assessment for the data mined inventory, the USLCI inventory, and the average inventory from ecoinvent v2.2 was performed in openLCA v1.4 using the Tool for the Reduction and Assessment of Chemical and Other Environmental Impacts (TRACI) version 2.1, a method developed by US EPA specifically for US conditions.23,25,29 The system boundaries of all processes were modified to support assessment of only the gate-to-gate acetic acid process emissions. The elementary flows for all unit processes were harmonized against TRACI 2.1 using US EPA’s recently developed LCA Harmonization Tool.30 Contribution analysis and Monte Carlo uncertainty analysis results were exported from openLCA for interpretation.
Results
Eight facilities with publicly available acetic acid PVs were identified from CDR (Table S6). Combined, these facilities produced 99.5 million pounds of acetic acid in 2011, the reporting year common to the data sources. CDR reported the total US production plus import of acetic acid as 8.5 billion pounds.31 Based on these statistics, the data mining method captured 1.17% of total acetic acid production using non-CBI facilities. Although this percentage is likely larger because the US total includes imports, the low level of data coverage underscores the importance of incorporating CBI to improve the coverage and accuracy of LCIs. For the eight facilities, the number of co-products ranged from three to ninety-two, and the percentage of facility chemical production attributed to acetic acid ranged from 0.1% to 43% (Table S7). For purposes of this work, facilities with low relative production of acetic acid were not filtered from the analysis. Not all facilities reported to every database for the 2011 reporting year: all eight reported to TRI, seven to NEI, four to DMR, three to RCRAInfo, and three to GHGRP (Table S6). This inconsistency in reporting is a key challenge for the proposed method and required careful consideration. For example, weighted-average EFs for GHGs were based on the three facilities reporting to GHGRP, while non-GHG air emissions incorporated all seven facilities reporting to NEI. These variations must be well documented in an inventory if it is to be used in LCA.
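The sketch below shows why each flow's EF can rest on a different subset of facilities. It makes three assumptions: the weights are acetic acid production volumes, emissions have already been allocated to acetic acid, and the facility data are invented. It returns the number of contributing facilities alongside each EF, so the subset can be documented with the inventory. The function name `weighted_average_ef` and the record layout are hypothetical.

```python
# Hypothetical facilities: acetic acid production (kg) and the emissions (kg)
# each one reported, grouped by the EPA database that holds the record.
facilities = [
    {"pv_kg": 2.0e7, "reports": {"GHGRP": {"CO2": 4.0e6}, "NEI": {"methanol": 1.0e3}}},
    {"pv_kg": 1.0e7, "reports": {"NEI": {"methanol": 5.0e2}}},
    {"pv_kg": 1.0e7, "reports": {"GHGRP": {"CO2": 1.0e6}, "NEI": {"methanol": 1.0e3}}},
]


def weighted_average_ef(facilities, database, flow):
    """Production-weighted EF (kg flow per kg product) from one database.

    Only facilities that reported to `database` contribute; a reporting
    facility that lists no value for `flow` is counted as zero.
    """
    reporters = [f for f in facilities if database in f["reports"]]
    total_pv = sum(f["pv_kg"] for f in reporters)
    if total_pv == 0:
        raise ValueError(f"no production data for facilities reporting to {database}")
    # The PV-weighted mean of facility EFs, sum(PV_i * E_i / PV_i) / sum(PV_i),
    # simplifies to total emissions divided by total production.
    total_emission = sum(f["reports"][database].get(flow, 0.0) for f in reporters)
    return total_emission / total_pv, len(reporters)


for database, flow in [("GHGRP", "CO2"), ("NEI", "methanol")]:
    ef, n = weighted_average_ef(facilities, database, flow)
    print(f"{flow}: {ef:.3g} kg/kg acetic acid ({n} {database} facilities)")
```

Here the CO2 factor comes from two facilities and the methanol factor from three. This mirrors, on a small scale, the GHGRP and NEI situation described above.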
Data Gap Analysis. The gate-to-gate system boundaries for acetic acid manufacture are displayed in Figure 2. Ten unit processes describing gate-to-gate acetic acid synthesis and various on-site supporting processes were created:
• Acetic acid; manufacture; at plant; production mix
• Bituminous coal external combustion boilers; spreader stoker; at plant
• Natural gas external combustion boilers; 10-100 million BTU/hr.; at plant
• Natural gas external combustion boilers;

X% total facility production) to filter out facilities producing very low quantities of the modeled chemical relative to their total chemical production. However, determining the threshold for such a PV filter will require a larger set of case studies. Another allocation concern is the possibility that chemical co-products at a facility are not identified in CDR because of reporting limitations. As shown in Table S1 of the SI, chemical production did not have to be reported to the 2012 CDR if facility manufacturing quantities were below 25,000 lbs annually or the chemical was a polymer, alkyd, or oxylated compound. Such exclusions not only affect the co-product allocation values applied but may also cause total facility production volumes to be underestimated.
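The sketch below combines production-share allocation with the proposed PV filter. It assumes facility emissions are allocated by each chemical's share of reported facility production. The `threshold` parameter stands in for the unspecified X% cutoff. The function names and facility data are hypothetical.

```python
def product_share(pvs, product="acetic acid"):
    """Fraction of a facility's reported chemical production that is `product`."""
    total = sum(pvs.values())
    return pvs.get(product, 0.0) / total if total else 0.0


def allocate_and_filter(facilities, threshold):
    """Allocate facility emissions by production share; drop low-share sites.

    `threshold` plays the role of the unspecified X% cutoff; 0.0 keeps every
    facility, as was done in the acetic acid case study.
    """
    allocated = {}
    for name, data in facilities.items():
        share = product_share(data["pvs"])
        if share < threshold:
            continue  # modeled chemical is only a minor product here
        allocated[name] = {flow: kg * share for flow, kg in data["emissions"].items()}
    return allocated


# Hypothetical facilities (production and emissions in kg/yr).
facilities = {
    "A": {"pvs": {"acetic acid": 430.0, "co-products": 570.0},
          "emissions": {"methanol": 100.0}},
    "B": {"pvs": {"acetic acid": 1.0, "co-products": 999.0},
          "emissions": {"methanol": 50.0}},
}

print(allocate_and_filter(facilities, threshold=0.01))  # facility B is dropped
print(allocate_and_filter(facilities, threshold=0.0))   # both facilities kept
```

Co-products that fall below the CDR reporting threshold, or in an exempt category, never enter the denominator of `product_share`. The computed share for the modeled chemical, and therefore its allocated emissions, would then be overstated.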
Discussion
Further areas of refinement to the method should focus on improving the quality and completeness of the resulting datasets. Regarding quality, methods are needed for determining how well the data represent the market mix of production technologies, because the generated LCI currently represents an unknown technology blend. A potential solution is the use of sanitized CBI in which companies disclose process details. The inclusion of CBI depends on the ability to sanitize data for public use and could immediately alleviate the sample size issue encountered during the acetic acid case study. The data quality analysis demonstrated the difficulty of comparing unit processes based on different LCI methods and chemical production technologies, because the variations between top-down and bottom-up inventory modeling are not well understood.
The immediate research need is filling the data gaps for material and electricity inputs. One approach could be expanding the number of included data sources. As mentioned above, CBI data within US EPA, after sanitization, may provide additional process knowledge such as chemical synthesis routes and feedstocks. Other federal data sources outside of US EPA may contain relevant input data. A second approach worth considering is augmenting existing LCIs, such as those from USLCI and ecoinvent, with emission profiles from data mining. This will be most feasible for existing datasets that represent market-average technology mixes. This approach will be limited by both the availability of existing chemical manufacturing LCIs and any copyright protection of the data. A final approach that may provide the most flexibility is the integration of the data mining method with bottom-up process simulation techniques for LCI modeling. Process simulation can provide an estimate of material and energy requirements, which can be matched with data-mined emission profiles for facilities operating the same basic synthesis technology.
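The hybrid approach can be sketched as pairing simulated inputs with emission profiles from facilities that share a synthesis route. The sketch assumes each mined facility profile has already been labeled with its technology, which the current method cannot do, as noted above. The stoichiometric input values stand in for real simulation output. The unweighted mean across facilities, the function name, and the facility data are all assumptions.

```python
from statistics import mean


def hybrid_unit_process(technology, simulated_inputs, facility_profiles):
    """Pair simulated inputs with mined emissions from same-technology sites."""
    matches = [p["emissions"] for p in facility_profiles
               if p["technology"] == technology]
    if not matches:
        raise LookupError(f"no mined emission profiles for {technology!r}")
    flows = set().union(*matches)
    # Unweighted mean across matching facilities (an assumption); a flow that
    # a facility did not report is treated as zero.
    outputs = {f: mean(m.get(f, 0.0) for m in matches) for f in flows}
    return {"technology": technology,
            "inputs": dict(simulated_inputs[technology]),
            "outputs": outputs}


# Stoichiometric stand-ins for simulation results (kg per kg acetic acid):
# CH3OH + CO -> CH3COOH, i.e. 32/60 kg methanol and 28/60 kg CO.
simulated_inputs = {"methanol carbonylation": {"methanol": 32 / 60,
                                               "carbon monoxide": 28 / 60}}
facility_profiles = [  # hypothetical mined profiles, kg per kg acetic acid
    {"technology": "methanol carbonylation", "emissions": {"carbon dioxide": 0.02}},
    {"technology": "methanol carbonylation", "emissions": {"carbon dioxide": 0.04,
                                                           "methanol": 1e-4}},
    {"technology": "acetaldehyde oxidation", "emissions": {"carbon dioxide": 0.10}},
]

process = hybrid_unit_process("methanol carbonylation", simulated_inputs, facility_profiles)
print(process["outputs"])
```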
The final objective of this work is to identify potential challenges to automating the LCI data mining method using semantic modeling. Before discussing these challenges, it is important to clarify that no inventory modeling should ever be a fully automated black box because of the potential to unknowingly generate erroneous datasets. The intent is to automate the data acquisition and calculations in a stepwise fashion using the approach presented here. This will encourage users to engage in the process and understand the quality of the data. Based on the case study, the first challenge to automation is data access. Many of the data sources, although publicly available, are not in an LOD format compatible with semantic queries. Given this variability, it will be necessary to describe those datasets with ontologies and convert them to LOD graphs for machine readability.33 Work is already underway in US EPA to convert many of the required databases to LOD, and the development of an automated tool can logically build on these efforts by augmenting the underlying ontologies with LCA concepts. In addition to the format of the data, the underlying ontologies used to relate the various pieces of data need to be linked. For example, the facility identification code used for TRI data is not the same as the NEI facility code. A relationship between the two facility codes must be established to support compatible querying. These linkages generally exist within US EPA data because the agency maintains a Facility Registry Service containing a crosswalk of all identification codes used to represent a facility in the various databases. Once harmonization connections like these are recognized and facilitated with an ontology, further connections must be established between the types of data that can be extracted from each source and what the data represent conceptually in LCA. A second challenge identified during the case study is the need to develop queries that translate the modeling method presented here into machine-executable steps. The sorting and calculations used to develop weighted EFs were more complex than originally anticipated, especially given the use of a chemistry-based exclusion filter and data quality analysis. A first practical automated discovery tool will likely not automate the chemistry-based exclusion without some form of user input to define the exclusion rules, because no easily accessible database of chemical reaction mechanisms exists. To enable this capability in the future, either such a database will have to be created or, as a more novel approach, a simplified chemical reaction simulator will need to be added to the system to establish exclusion criteria based on predicted intermediates and by-products. The final challenge identified through the case study is the logistics of maintaining such a system. The advantage of LOD is the ability to preserve the metadata associated with a data point. However, tracking the large quantities of metadata identified during the case study meant larger storage requirements, slower system performance during querying, and increased hosting costs. Thus, the future system design must optimize these tradeoffs to rapidly provide a high-quality inventory.
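Linking TRI and NEI records is essentially a join through a shared registry key. The sketch below uses plain Python dictionaries in place of LOD graphs and ontologies. The crosswalk layout, the registry and program IDs, and the function name `link_by_registry` are invented for illustration; they do not reflect the actual Facility Registry Service schema or identifier formats.

```python
# Hypothetical FRS-style crosswalk: one registry ID per physical facility,
# mapped to the program-specific IDs used in each database.
crosswalk = {
    "FRS-001": {"TRI": "TRI-A", "NEI": "NEI-17"},
    "FRS-002": {"TRI": "TRI-B", "NEI": "NEI-42"},
    "FRS-003": {"TRI": "TRI-C"},  # no NEI record for this facility
}

# Hypothetical program records keyed by each program's own facility ID.
sources = {
    "TRI": {"TRI-A": {"methanol": 1200.0}, "TRI-B": {"methanol": 800.0},
            "TRI-C": {"methanol": 50.0}},
    "NEI": {"NEI-17": {"PM2.5": 300.0}, "NEI-42": {"PM2.5": 90.0}},
}


def link_by_registry(crosswalk, sources):
    """Re-key every program's records onto the shared registry ID."""
    linked = {}
    for registry_id, program_ids in crosswalk.items():
        record = {}
        for program, records in sources.items():
            program_id = program_ids.get(program)
            if program_id in records:  # skip programs with no ID or no data
                record[program] = records[program_id]
        linked[registry_id] = record
    return linked


linked = link_by_registry(crosswalk, sources)
print(linked["FRS-001"])  # {'TRI': {'methanol': 1200.0}, 'NEI': {'PM2.5': 300.0}}
print(sorted(linked["FRS-003"]))  # ['TRI']
```

In a linked-data implementation, this mapping would be expressed as relationships in the ontology rather than as an in-memory dictionary. The join logic stays the same.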
Once these and other challenges with data acquisition and processing are overcome, it will be possible to develop a semantic inventory modeling system (Figure 5). The proposed structure for automated data discovery will allow the output to be harmonized and stored in a central database. The database will support both inventory modeling and the application of the data to LCA using suitable software. The long-term vision for the system is to support the LCA community by enabling LCIs to be efficiently generated and shared. The method developed here is applicable to the production of chemicals tracked in CDR. For chemicals not in CDR but whose associated emissions are contained in EPA data sources, suitable PV data will be required to resolve issues related to modeling at the facility level.

Figure 5. A proposed system for automated inventory modeling based on linked data architecture.
Disclaimer
The views expressed in this article are those of the authors and do not necessarily represent the views or policies of the US Environmental Protection Agency.
Supporting Information
Additional text, 4 figures, and 18 tables with descriptions of acetic acid production processes, selection and utilization of US EPA databases, data quality scoring, identified acetic acid manufacture facilities, the developed life cycle inventory, and life cycle impact assessment of supporting processes. This material is available free of charge via the Internet at http://pubs.acs.org.
References
(1) National Research Council. Sustainability Concepts in Decision-Making, Tools and Approaches for the US Environmental Protection Agency; National Academies Press: Washington, DC, 2014.
(2) Environmental management -- Life cycle assessment -- Principles and framework; ISO No. 14040; ISO: Switzerland, Jan 07, 2006.
(3) Environmental management -- Life cycle assessment -- Requirements and guidelines; ISO No. 14044; ISO: Switzerland, Jan 07, 2006.
(4) US Life Cycle Inventory (LCI) database; US Department of Energy, National Renewable Energy Lab (NREL), 2012. www.lcacommons.gov/nrel/search (accessed March 2015).
(5) Franklin Associates, a Division of Eastern Research Group, Inc. Cradle-to-Gate Life Cycle Inventory of Nine Plastics Resins and Four Polyurethane Precursors; The Plastics Division of the American Chemistry Council: Washington, DC, 2011.
(6) Aggarwal, C. C. An Introduction to Data Mining. In Data Mining; Springer International Publishing: Switzerland, 2015; pp 1-26.
(7) Suh, S. Developing a sectoral environmental database for input-output analysis: the comprehensive data archive of the US. Economic Systems Research 2005, 17 (4), 449-697.
(8) Toxics Release Inventory (TRI) Data and Tools. http://www2.epa.gov/toxics-release-inventory-triprogram/tri-data-and-tools (accessed August 12, 2015).
(9) National Emissions Inventory (NEI). https://www3.epa.gov/ttnchie1/net/2011inventory.html (accessed August 12, 2015).
(10) US Bureau of Economic Analysis (BEA). Annual input–output table – make and use matrices for 1998; Department of Commerce: Washington, DC, 2002.
(11) Sengupta, D.; Hawkins, T.R.; Smith, R.L. Using national inventories for estimating environmental impacts of products from industrial sectors: a case study of ethanol and gasoline. Int. J. Life Cycle Assess. 2015, 20 (5), 597-607.
(12) Open Government Directive, M10-06; Executive Office of the President, Office of Management and Budget, 2009. https://www.whitehouse.gov/open/documents/open-government-directive (accessed August 21, 2015).
(13) Data.gov. http://www.data.gov/ (accessed April 27, 2016).
(14) Berners-Lee, T.; Hendler, J.; Lassila, O. The semantic web. Scientific American 2001, 284, 34-43.
(15) Hyland, B. Linked Data: Structured Data on the Web. 2014. http://www.slideshare.net/3roundstones/linked-data-overview-structured-data-on-the-web-for-us-epa20140203 (accessed April 27, 2016).
(16) Chemical Data Reporting. http://www.epa.gov/cdr/ (accessed July 28, 2015).
(17) Bertin, B.; Scuturici, V.M.; Risler, E.; Pinon, J.M. A Semantic Approach to Life Cycle Assessment Applied on Energy Environmental Impact Data Management. Proceedings of the International Conference on Extending Database Technology, Berlin, Germany, March 27–30, 2012; Association for Computing Machinery: New York, NY, 2012a.
(18) Bertin, B.; Scuturici, V.M.; Pinon, J.M.; Risler, E. CarbonDB: a semantic life cycle inventory database. Proceedings of the 21st ACM International Conference on Information and Knowledge Management, Maui, USA, Oct 29 - Nov 2, 2012; Association for Computing Machinery: New York, NY, 2012b.
(19) Perez, A.; Larrinaga, F.; Curry, E. The Role of Linked Data and Semantic-Technologies for Sustainability Idea Management. In Software Engineering and Formal Methods, Proceedings of the International Symposium on Modelling and Knowledge Management for Sustainable Development, Madrid, Spain, September 24, 2013; Counsell, S., Nunez, M., Eds.; Springer International Publishing: Switzerland, 2013.
(20) Cooper, J.; Noon, M.; Jones, C.; Kahn, E.; Arbuckle, P. Big data in life cycle assessment. J. Ind. Ecol. 2013, 17 (6), 796-799.
(21) Ingwersen, W.W.; Hawkins, T.R.; Transue, T.R.; Meyer, D.E.; Moore, G.; Kahn, E.; Arbuckle, P.; Paulsen, H.; Norris, G.A. A new data architecture for advancing life cycle assessment. Int. J. Life Cycle Assess. 2015, 20 (4), 520-526.
(22) Zhang, Y.; Luo, X.; Buis, J.J.; Sutherland, J.W. LCA-oriented semantic representation for the product life cycle. J. Cleaner Prod. 2015, 86, 146-162.
(23) openLCA, 1.4.2; GreenDelta: Berlin, Germany, 2015.
(24) Wagner, F.S. Acetic acid. In Kirk-Othmer Encyclopedia of Chemical Technology, 5th ed.; John Wiley & Sons Publishing: New York, NY, 2014; Vol. 1, pp 115-136.
(25) Ecoinvent database, version 2.2; Swiss Centre for Life Cycle Inventories: Dübendorf, Switzerland, 2010.
(26) Technology Transfer Network Clearinghouse for Inventories & Emission Factors. http://cfpub.epa.gov/webfire/index.cfm?action=fire.detailedSearch (accessed August 6, 2015).
(27) Weidema, B.P.; Bauer, C.; Hischier, R.; Mutel, C.; Nemecek, T.; Reinhard, J.; Vadenbo, C.O.; Wernet, G. Overview and methodology: Data quality guidelines for the ecoinvent database version 3; Swiss Centre for Life Cycle Inventories, 2013.
(28) Ecoinvent database, version 3.2; Swiss Centre for Life Cycle Inventories: Dübendorf, Switzerland, 2015.
(29) Bare, J.C.; Norris, G.A.; Pennington, D.W.; McKone, T. TRACI: The tool for the reduction and assessment of chemical and other environmental impacts. J. Ind. Ecol. 2002, 6, 49–78.
(30) Ingwersen, W.W.; Transue, T.; Meyer, D.E.; Bergmann, M.; Paulsen, H.; Kahn, E.; Arbuckle, P. LCA Harmonization Tool. Proceedings of the Life Cycle Assessment (LCA) XV Conference, Vancouver, Canada, October 6-8, 2015.
(31) Chemical Data Access Tool (CDAT). http://java.epa.gov/oppt_chemical_search/ (accessed August 20, 2015).
(32) RCRAInfo. http://www.epa.gov/enviro/facts/rcrainfo/ (accessed July 28, 2015).
(33) Heath, T.; Bizer, C. Linked Data: Evolving the Web into a Global Data Space, 1st ed. Synthesis Lectures on the Semantic Web: Theory and Technology 2011, 1 (1), 1-136. Morgan & Claypool. http://linkeddatabook.com/editions/1.0/#htoc16 (accessed February 8, 2016).
[Graphic with labels: Airborne Emissions; Water Discharges; Toxic Releases; Hazardous Waste; Greenhouse Gases; Production Volumes; Chemical Manufacturing Life Cycle Inventory]