Integrating COSMO-Based σ-Profiles with Molecular and


Article

Integrating COSMO-based σ-profiles with molecular and thermodynamic attributes to predict the life cycle environmental impact of chemicals
Raul Calvo-Serrano, Maria Gonzalez-Miquel, and Gonzalo Guillen Gosalbez
ACS Sustainable Chem. Eng., Just Accepted Manuscript • DOI: 10.1021/acssuschemeng.8b06032 • Publication Date (Web): 20 Dec 2018
Downloaded from http://pubs.acs.org on December 25, 2018


ACS Sustainable Chemistry & Engineering

Integrating COSMO-based σ-profiles with molecular and thermodynamic attributes to predict the life cycle environmental impact of chemicals

Raul Calvo-Serrano†, María González-Miquel‡,§, Gonzalo Guillén-Gosálbez†*

†Centre for Process Systems Engineering, Imperial College of Science, Technology and Medicine, South Kensington Campus, Roderic Hill Building, London SW7 2BY, United Kingdom

‡Departamento de Ingeniería Química Industrial y del Medioambiente, Universidad Politécnica de Madrid, Calle José Gutiérrez Abascal num.2, E-28006 Madrid, Spain

§School of Chemical Engineering and Analytical Science, The University of Manchester, The Mill, Sackville Street, Manchester M1 3AL, United Kingdom

*e-mail: [email protected]

Abstract

Life Cycle Assessment (LCA) has become the main method for the environmental impact assessment of chemicals. Unfortunately, LCA studies often require large amounts of data, time and resources. To circumvent this limitation, here we propose a streamlined LCA method that predicts the impact of chemicals from molecular descriptors, thermodynamic properties and the surface charge density distributions of molecules (COSMO-based σ-profiles). Our approach uses mixed-integer nonlinear models that automatically construct predictive equations of the life cycle impact of chemicals from a set of attributes that are easier to obtain than LCA inventories. We applied our method to 90 chemicals, whose impact categories were predicted from three attribute sets: 15 molecular descriptors, 12 thermodynamic properties and discretised σ-profiles. Nine impact categories were estimated, ranging from Global Warming Potential to Eco-Indicator99 metrics. Results show that models based on molecular and σ-profile attributes perform similarly to those based on molecular and thermodynamic attributes. This facilitates the application of streamlined LCA when developing new chemicals and processes, avoiding the experimental determination of thermodynamic properties. Furthermore, molecular, thermodynamic and σ-profile attributes used together provide the most accurate predictions. Overall, this work aims to enhance the environmental assessment of chemicals, facilitating their screening and supporting the development of more sustainable processes and products.

Keywords: Streamlined Life Cycle Assessment, Mathematical programming, Feature selection, sigma-profile, prediction models.

INTRODUCTION

In recent years the number of chemicals developed has increased exponentially, with more than 15,000 substances registered every day in the CAS registry1. Under this scenario, it is necessary to develop fast and reliable methods to properly assess the environmental impact of chemicals considering their life cycle. This need has led to the emergence of a wide range of environmental assessment tools2,3, among which Life Cycle Assessment (LCA)4–11 has become the prevalent approach. The main advantage of LCA is that it quantifies environmental burdens in different categories across the product’s life cycle, thereby avoiding results that shift environmental burdens to other echelons in the chemicals’ supply chain and/or to other environmental categories12–23. The main drawback of LCA is that it requires large amounts of data, most of which might be difficult to gather in practice. Data gaps often arise due to low-quality measurements, confidentiality issues related to companies reluctant to share data and/or total absence of information. This shortcoming is particularly critical in the chemical industry, which involves hundreds of interconnected technologies exchanging materials and energy and consuming shared resources. Hence, applying LCA to chemicals is challenging, and as a consequence full LCAs are at present available in standard repositories24 for only a few hundred chemicals. To circumvent these issues and help to cover data gaps, streamlined LCA (SLCA) methodologies have been developed that facilitate the application of LCA by using proxy and alternative data and/or simplified evaluation methods, including qualitative assessments and regression models. These approaches produce less accurate estimates of the environmental impacts, yet they have the potential to greatly reduce the time and resources required in a standard LCA. For an SLCA to be effective, it needs to be tailored to a specific family of products, as otherwise the accuracy and reliability of the results may be poor. Several SLCA approaches have been developed for different sectors, such as buildings25–27, water treatment plants28,29, electricity power plants30, oil refineries31, general chemical processes32–41 and other manufacturing facilities42,43. Here we are interested in developing an SLCA method for predicting the life cycle impact of chemicals, enabling a fast and reliable screening of products without the need to carry out full LCAs. In a seminal work, Wernet et al. (2008)32 proposed an SLCA method for chemicals that predicts their life cycle production impact (i.e. cradle-to-gate) from their molecular structure.
Capitalising on this approach, in a previous work44 we introduced a new SLCA method for chemicals that constructs linear equations to predict their life cycle impact from thermodynamic and physical properties as well as molecular structure descriptors. Other SLCA methods45–47 estimate impact values using detailed information on the associated production processes (i.e., the amount and type of feedstocks and energy consumed, solvents and catalysts required, emissions generated, etc.). In contrast, both Wernet et al. (2008)32 and our previous approach44 aim to estimate LCA impacts when such process information is unavailable. Instead, they rely on physicochemical properties of the chemical produced, exploiting the underlying assumption that there is a link between those properties and the particular performance of the process being analysed; hence, properties are assumed to be indirectly related to the estimated impact category, as they are connected to the mass and energy flows in chemical processes that dictate their impact. The assumptions and simplifications made in these SLCA approaches (including the one presented here) lead to some errors, the magnitude of which depends on the specific case. Results from these methods, therefore, can be used as benchmarking and preliminary values, becoming valuable at the early stages of the design of new chemicals or synthesis processes. SLCA methods are also very valuable when a large number of chemicals need to be screened, since full LCAs would then require substantial resources and time, making the analysis very costly. As we shall discuss in more detail later in the article, here we propose to enhance our previous SLCA method by adding further predictors to the SLCA equations. In our first approach, presented in Calvo-Serrano et al. (2018)44, the impact of chemicals was estimated from their molecular descriptors, such as molecular weight or carboxylic groups, and thermodynamic properties, such as standard densities or standard formation enthalpies. For many common chemicals, these properties can be easily obtained from databases such as the NIST Chemistry WebBook48 or from material safety data sheets (MSDS).
Alternatively, they can be calculated using thermodynamic methods implemented in, for example, process simulation software such as Aspen Plus or Aspen-HYSYS. However, when dealing with novel molecules whose impact needs to be assessed at the early stages of process design, calculating these properties might be challenging and can lead to large uncertainties. In fact, classical thermodynamic models used in Chemical Engineering, including group contribution methods such as UNIFAC or activity-coefficient models like NRTL, strongly rely on experimental data to adjust specific parameters, and therefore have limited applicability to compounds with new functional groups or completely new chemicals49. To avoid the need to estimate thermodynamic properties of chemicals as a preliminary step to predict their impact, we here propose to use σ-profile data derived from quantum chemical COSMO-based methods49–52 along with molecular descriptors to produce estimates of their life cycle environmental performance. In particular, the COnductor-like Screening MOdel for Real Solvents (COSMO-RS) is a quantum mechanical method combined with a statistical thermodynamic procedure that enables the efficient calculation of the thermophysical properties of compounds50. Moreover, this computational approach provides a representation of the distribution of the charge density σ on the molecular surface through the so-called σ-profiles, which include relevant molecular information about the interaction energies of the species in solution that can serve as the basis to compute key thermodynamic properties. Furthermore, it is worth highlighting that COSMO-RS methods allow molecular interactions and derived properties to be estimated from structural information of the compounds, without the need for experimental data and with general applicability53. In fact, σ-profiles have been extensively used to characterise molecular interactions, with particular focus on the characterisation of solvents54–56 as well as on the estimation of physicochemical properties like viscosity or surface tension57–59.

The advantage of incorporating information on σ-profiles in the predictive equations of impact is twofold: (i) obtaining σ-profiles is often easier than determining thermodynamic and physical properties, which simplifies the development of the SLCA equations to evaluate the environmental impact of new chemicals; (ii) using σ-profile data in combination with thermodynamic properties (when available) and molecular descriptors provides a more flexible and general framework to estimate the chemicals’ impact, which as shown later results in more accurate predictive models. In addition to these advantages, as in our previous work44, this methodology provides a direct link between molecules and their life cycle impact without the need to assemble and analyse a process flowsheet, which can be highly time consuming. In summary, the methodology proposed here aims to improve our previous work44 by adding σ-profiles to thermodynamic and molecular attributes, enhancing both the LCA impact prediction accuracy and its flexibility to be applied when information on thermodynamic properties is scarce or completely missing. The prediction models obtained from this work can be used, for example, to compare the life cycle impact of new chemicals when other information is unavailable at the early stages of their development. Furthermore, these same models could be embedded in molecular design tools so as to screen and rank chemicals according to their environmental performance. These results, however, should be used to narrow down the initially large number of alternatives to be explored prior to conducting a more rigorous assessment with more detailed models. Hence, here we develop a new SLCA method for predicting the life cycle cradle-to-gate impact of producing chemicals using three types of descriptors: (i) molecular descriptors; (ii) thermodynamic properties; and (iii) σ-profiles. We apply our method to data retrieved from the literature and environmental repositories24, showing that the new combined approach produces more accurate estimates of impact. In the ensuing sections, we will further define the problem to be solved and the data used in the calculations. The methodology will then be outlined and applied, followed by an analysis of the numerical results and the conclusions of the work.

PROBLEM STATEMENT


Given a set of attributes of chemicals, including molecular descriptors, thermodynamic properties and molecular information encoded in the σ-profiles, the goal of the analysis is to build streamlined LCA models that will provide estimates of the life cycle impact from a combination of a subset of these attributes. To test the hypothesis that the life cycle impact can be predicted from these attributes, we adapt the dataset presented in our previous work44 by adding seven conventional pesticides (see section S1 in the Supporting Information). All chemicals are analysed following the same approach, without separating them into potentially arbitrary groups that may bias the results and compromise the reliability and generality of the presented method. In addition, data consistency is ensured by removing some incomplete or empty molecular and thermodynamic attribute categories from the database used in our previous approach44. Ultimately, the adapted dataset used here encompasses 90 different chemicals, 12 different thermodynamic properties (labelled here as T1 to T12, directly retrieved from Aspen-HYSYS v8.2) and 15 molecular descriptors (labelled as M1 to M15). In addition to these attributes, we consider the information contained in the σ-profiles of the chemicals, which represent the molecular surface charge density distribution and depend on the specific chemical structure of the compounds and also on its spatial configuration (i.e. different profiles are obtained for different conformations of the same molecule). The σ-profiles of the chemicals were computed using COSMOtherm software (version C30, release 1701) and a given parametrization (BP_TZVP_C30_1701)60. In this work, the profiles are pre-processed as follows before being used as predictors. First, we average all possible σ-profiles of a given molecule. Then, we discretise the σ-profiles into a set of values following the approach proposed by Palomar et al.
(2008b)56, where the profile distribution is divided into eight equal partitions (from -0.03 to 0.03 e/Ų, with 0.0075 e/Ų intervals). For each partition, we integrate the area under the profile so as to obtain a single value per partition, each of which can be used as a predictor in the dataset (labelled as S1 to S8). After including these eight σ-profile attributes, we end up with a database consisting of 35 input attributes for 90 organic chemicals, ranging from pesticides to petrochemicals, with many diverse applications in the chemical industry, most of them as solvents. The 90 chemicals considered here have been selected so as to satisfy three criteria: (i) they are widely produced chemicals, (ii) the required information (i.e. physicochemical attributes and full LCA results) is available, and (iii) they cover a wide range of organic chemicals (including solvents, intermediates, food additives and pesticides). The goal is then to construct linear equations capable of predicting the impact in nine relevant environmental impact categories from a subset of the 35 attributes. The categories to be estimated include: (i) the Cumulative Energy Demand (CED), covering all energy streams consumed in the product’s supply chain; (ii) the Global Warming Potential (GWP), denoting the greenhouse gas emissions throughout the product’s life cycle; (iii) the Chemical Oxygen Demand (COD) and (iv) the Biological Oxygen Demand (BOD5), both related to the degradability of the contaminants present in the aqueous streams in the product’s life cycle; (v) the Total Organic Carbon (TOC), which quantifies the amount of organic chemicals present in the aforementioned life cycle waste; and (vi) the Eco-Indicator99 (EI99 Total). The latter aggregates impacts on (vii) Human Health (EI99 HH), accounting for the effects of emissions to air, soil and water on life expectancy; (viii) Ecosystem Quality (EI99 EQ), connected to emissions affecting biodiversity; and depletion of (ix) Resources (EI99 Res), related to the surplus energy required in the future to extract the non-renewable resources consumed in all the stages in the chemical’s life cycle.
A description of all the attributes and the parameters used for the normalisation carried out before the calculations is provided in section S1 in the Supporting Information. In order to calibrate the different streamlined LCA models, we hence use the information available on these impact categories for the 90 chemicals in the dataset. This information is ultimately retrieved from environmental databases (e.g. Ecoinvent24) and full LCA results calculated by Wernet et al. (2008)32. Note that there are many factors (e.g. type of electricity consumed, solvents, etc.) that affect the environmental performance of a chemical but are not modelled explicitly, i.e., they are not defined as descriptors of impact, although they are reflected in the impact values of the chemicals used to build the models. Hence, while the impact values employed for calibrating the models cover different scenarios (those considered in Ecoinvent24 and Wernet et al. (2008)32), the descriptors employed to make the predictions are based only on information contained in molecular, thermodynamic and σ-profile attributes. The electricity mix, as an example, has a strong impact on the environmental indicators, yet it is not defined as a predictor. This simplification, however, can be overcome in a subsequent step by performing a detailed LCA on a reduced subset of chemicals once the original (larger) set of alternatives has been narrowed down using the streamlined results. Furthermore, our approach could be easily extended to include further predictors, like the electricity mix type, in order to produce better estimates, yet this would also lead to higher data requirements.
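The σ-profile pre-processing described in this section (averaging over conformations, splitting the range from -0.03 to 0.03 e/Ų into eight equal partitions and integrating each one) can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in this work: the function names, the unweighted conformer average, the linear interpolation and the trapezoidal rule are all assumptions, and the input arrays are made-up stand-ins for profiles exported from a COSMO calculation.

```python
import numpy as np


def average_conformers(profiles):
    """Average several sigma-profiles sampled on the same sigma grid.

    Assumption: a plain unweighted mean over conformations.
    """
    return np.mean(np.asarray(profiles, dtype=float), axis=0)


def discretise_sigma_profile(sigma, p_sigma, lo=-0.03, hi=0.03, n_bins=8):
    """Return the bin edges and the integrated area of each partition.

    sigma   : increasing charge-density grid (e/A^2)
    p_sigma : profile values on that grid
    """
    sigma = np.asarray(sigma, dtype=float)
    p_sigma = np.asarray(p_sigma, dtype=float)
    edges = np.linspace(lo, hi, n_bins + 1)  # eight bins -> width 0.0075
    areas = np.empty(n_bins)
    for b in range(n_bins):
        # Resample inside the bin so the integration limits match the edges.
        grid = np.linspace(edges[b], edges[b + 1], 51)
        values = np.interp(grid, sigma, p_sigma)
        # Trapezoidal rule (an assumed integration scheme).
        areas[b] = np.sum(0.5 * (values[1:] + values[:-1]) * np.diff(grid))
    return edges, areas


if __name__ == "__main__":
    # Two made-up "conformer" profiles on a common grid (hypothetical data).
    sigma = np.linspace(-0.03, 0.03, 61)
    conf_a = np.exp(-((sigma - 0.005) / 0.006) ** 2)
    conf_b = np.exp(-((sigma + 0.004) / 0.008) ** 2)
    mean_profile = average_conformers([conf_a, conf_b])
    _, s_attributes = discretise_sigma_profile(sigma, mean_profile)
    for label, area in zip([f"S{i}" for i in range(1, 9)], s_attributes):
        print(label, round(float(area), 6))
```

The eight returned areas play the role of the attributes S1 to S8; any normalisation applied before the regression would be a separate step.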

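METHODS

In essence, the streamlined LCA models of impact are constructed by solving an optimisation problem M of the following form:

(M)

$$\min \; RSS = \sum_i \left( y_i - \hat{y}_i \right)^2 \tag{1}$$

$$\text{s.t.} \quad \hat{y}_i = a + \sum_j b_j x_{ij}, \quad \forall i \tag{2}$$

$$z_j \underline{b}_j \le b_j \le z_j \overline{b}_j, \quad \forall j \tag{3}$$

$$\sum_j z_j = k \tag{4}$$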


where continuous variables $a$ and $b_j$ denote the regression coefficients, while binary variables $z_j$ represent the selection of attributes and parameter $k$ represents the total number of attributes selected. Hence, a binary variable is defined for each attribute that takes the value of one if such attribute (e.g. M1, molecular weight) is included in the regression equation and zero otherwise; through constraint (3), the coefficient $b_j$ is forced to zero whenever $z_j$ is zero. As we shall discuss later on, the use of these variables mitigates the problem of overfitting that appears when too many attributes are included in the predictive equation, thereby wrongly capturing noise as an essential part of the regression model. The optimisation problem M, used to build the streamlined LCA equations, takes the form of an MINLP that minimises the error between the predicted impact ($\hat{y}_i$) and the “true” impact ($y_i$), where the former is calculated by the regression equation while the latter is retrieved from environmental repositories (i.e. Ecoinvent24). The MINLP is solved for an increasing number of attributes (i.e. by varying the value of parameter $k$ in M), and the models obtained that way are validated afterwards. Hence, the MINLP combines Ordinary Least Squares (OLS) with feature selection constraints based on binary variables. Each time the optimisation problem is solved, a candidate regression model is generated. Repeating this for different values of $k$ yields a set of candidate models, which are afterwards assessed via Leave-One-Out (LOO) cross validation. A summary of this methodology can be found in Figure 1, while further details of this method can be found in our previous work44.
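For small attribute sets, the logic of problem M can be illustrated without a mixed-integer solver by enumerating every subset of k attributes and fitting ordinary least squares to each one. The sketch below does exactly that. It is a brute-force stand-in for the optimisation model, not the GAMS/CPLEX formulation used in this work, and the function names and synthetic data are hypothetical. Exhaustive enumeration grows combinatorially with the number of attributes, so it is only meant to clarify what the binary variables select.

```python
from itertools import combinations

import numpy as np


def fit_ols(X, y):
    """Least-squares fit of y ~ a + X @ b; returns (a, b, rss)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return coef[0], coef[1:], float(resid @ resid)


def best_subset(X, y, k):
    """Return (subset, a, b, rss) with the lowest RSS among all k-subsets.

    Choosing a subset is equivalent to fixing which selection variables
    are one (exactly k of them) and setting the other coefficients to zero.
    """
    best = None
    for subset in combinations(range(X.shape[1]), k):
        a, b, rss = fit_ols(X[:, list(subset)], y)
        if best is None or rss < best[3]:
            best = (subset, a, b, rss)
    return best


if __name__ == "__main__":
    # Hypothetical data: 30 "chemicals", 6 attributes, 2 of them relevant.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(30, 6))
    y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 3] + 0.1 * rng.normal(size=30)
    for k in range(1, 5):
        subset, a, b, rss = best_subset(X, y, k)
        print(k, subset, round(rss, 4))
```

Because every (k+1)-subset contains a k-subset, the training RSS returned by this search cannot increase with k, which is why a separate validation step is needed to detect overfitting.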


Figure 1, Methodology for predicting life cycle impacts from different sets of attributes. An example of a streamlined LCA equation is provided which, in this hypothetical case, would give the GWP of the chemical from the number of cyano groups and one interval of the σ-profile. This simple linear equation can then be used in the sustainability assessment and optimisation of chemical processes and products.

RESULTS AND DISCUSSION

We here assess the accuracy of the proposed approach by defining different scenarios that differ in the type of attributes considered in the analysis: using only σ-profile data as predictors (S); only thermodynamic properties (T); σ-profiles and molecular descriptors combined (MS); thermodynamic and molecular attributes combined (MT); and all attributes combined (MTS). Note that the combined use of different types of attributes can exploit specific synergies between them, thereby leading to better predictions44,61. Hence, in total, we consider five different scenarios (S, T, MS, MT and MTS), for each of which we produce a set of candidate predictive models. This is done by solving a series of optimisation models in GAMS 24.4 interfacing with the solver CPLEX 12.6 on an Intel Core i5-4570 3.20 GHz computer. Details of these models can be found in Table 1. We first discuss the results obtained in the training set and then describe the performance of the models in the validation set.

Table 1, Model sizes and time requirements for solving all models in all presented scenarios.

Scenario                  S       T       MS      MT      MTS
# continuous variables    190     194     205     209     217
# binary variables        8       12      23      27      35
# equations               198     206     228     236     252
CPU seconds               2.01    3.49    10.36   41.09   1129.29

Training set results

Different sets of models are identified by solving the MINLP for all the 90 chemicals available (i.e. the training set), considering an increasing number of attributes. The residual sum of squares (RSS) displayed in Figure 2 shows that the errors consistently decrease as the number of attributes used as predictors increases. As an example, in scenario MTS, CED RSS values drop from 1.49 with one attribute selected to 0.83 with 35 attributes. Furthermore, synergies become apparent when comparing the results for the different scenarios, as the error always decreases as we include more attributes in the models. As an example, for GWP, the RSS values for the S and MS cases with seven attributes are 2.44 and 2.16, respectively; hence, the MS model performs better because it can choose among a wider range of attributes. Comparing the MT and MS models, we find that MS performs better for impacts COD and EI99 EQ, while in GWP, TOC and EI99 Total the MT approach is clearly superior. Finally, regardless of the number of features considered, the best model is always the one obtained in the scenario where all the attribute types are available (MTS). As an example, the best GWP RSS values for the S, T, MS, MT and MTS cases are 2.44, 2.08, 1.91, 1.51 and 1.32, respectively, when all the attributes within each scenario are included in the analysis.


Figure 2, Training RSS results for the scenarios S, T, MS, MT and MTS.

Note that other error metrics can be used in the calculations. More precisely, other performance metrics (Average Relative Error, ARE; and Spearman correlation coefficients, rSpearman) could be calculated for the training set besides the RSS. These are provided in section S2 in the Supporting Information. Likewise, the attributes selected in each predictive equation can be found in section S3 in the Supporting Information. A careful inspection of these results reveals no clear selection pattern behind the predictive models, as attributes are included in and excluded from the equations following no clear sequence. This behaviour in model selection with feature selection has been previously observed in the literature61, and is very much dependent on the potential synergies between attributes. Following on from this, in our previous work44 we discussed the specific relationship between molecular and thermodynamic attributes and impact categories, analysing their correlations and their selection in the prediction models. Our findings in that paper indicated that some of the molecular and thermodynamic attributes could be clearly linked to specific impacts (e.g. boiling enthalpy and temperature can be directly linked to CED), while other attributes presented relevant correlations despite having little or no apparent relation with that impact (e.g. molecular weight was linked to EI99 HH). In this case, however, it is quite challenging to directly link the different discretised σ-profile attributes (S1 to S8) to environmental impacts, as these represent only a fraction of a property (the charge distribution). Nevertheless, it should be kept in mind that the σ-profiles of the molecules can be used as the basis to predict the thermodynamic properties of the compounds53 and, therefore, both types of attributes should in principle display similar predictive capabilities. Directly embedding the σ-profile data in the predictive models, however, overcomes the lack of available thermodynamic properties, which can be of chief importance when screening new chemicals for the design of sustainable products and processes.

Validation set results

Following our approach, the candidate models (each involving a given number of attributes) are validated using the Leave-One-Out (LOO) cross validation method. This approach removes one single entry in the dataset at a time (i.e. a given chemical), using the remaining entries to re-adjust the models found in the training stage. Then, the readjusted model is used to predict the impact value of the removed entry (i.e. chemical), using its attribute values as inputs. The predicted results are then compared with impact values from rigorous LCA databases (e.g. Ecoinvent24), obtaining a direct measure of the prediction performance of the proposed model for that particular value.
This approach is repeated for all the entries in the dataset, ultimately quantifying average error metrics that assess the accuracy, generality and reliability of the models evaluated. Additionally, the LOO validation approach also checks their reliability when attempting to predict impacts for the removed chemicals (i.e. extrapolation of results), directly measuring the potential bias introduced by the data used to train the model (i.e. selection of inappropriate prediction attributes) or by the structure of the prediction model itself. In order to compare the performance of the scenarios previously defined, we here use two complementary metrics of performance: the Average Relative Error (ARE), and the Spearman correlation coefficient (rSpearman). On the one hand, ARE values represent the error of the estimate produced by the model, and therefore evaluate its accuracy. On the other hand, rSpearman represents the rank correlation between the predicted and original impact values, so it measures how well the model mimics the behaviour of the predicted impact. Additionally, these correlation values can be used to assess the statistical significance of the models, thereby providing a measure of robustness. The ARE and rSpearman results from the LOO cross validation are displayed in Figures 3 and 4, respectively. As observed in Figure 3, the LOO ARE results show drastic changes in performance between consecutive models. Furthermore, the performance of the models tends to first improve and then worsen as we increase the number of attributes. This behaviour is consistent with the potential occurrence of “overfitting”, where the addition of parameters in a model (in this case, more attributes) may wrongly capture noise as an essential part of the model, thereby adding error to the predictions and worsening the model’s performance. Because of this, models first tend to improve their accuracy with the addition of a few attributes, but after a certain point accuracy decreases, worsening further with the addition of more attributes. Examples of overfitting can be clearly seen in Figure 3 for the prediction of GWP in scenarios S and T, with the errors monotonically increasing after a certain number of attributes. Finally, it is worth mentioning that the errors are quite high for the impact categories COD, BOD5 and TOC.
As discussed in our previous work44, these impact categories are related to the contamination of wastewaters, which in the case of chemicals is mostly linked to the use of solvents and generation of aqueous wastes. These two environmental burdens are quite hard to estimate from the chemical attributes used herein, which leads to poor predictions. Overall, it should be noted that impact categories related to energy demand and environmental emissions (such as CED or GWP) are better predicted than those related to the biodegradability of the products (like COD or BOD5). This can be explained by the fact that the three types of attributes considered herein (σ-profile data, thermodynamic properties and molecular descriptors) are inherently related to the physicochemical properties of the compounds that ultimately drive the energy penalty or effluent emissions during the life cycle of the molecules. However, they do not provide enough information about the rate of waste stabilization, which is related to other biochemical properties that need to be determined through specific biodegradation assays.
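The two metrics discussed above can be illustrated with a minimal sketch. It assumes that ARE is the mean absolute deviation relative to the original value, expressed in percent, and that rSpearman is the Pearson correlation of the ranks (with average ranks for ties); the function names and the four data points are hypothetical and are not taken from the study.

```python
from typing import List, Sequence


def average_relative_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean of |predicted - actual| / |actual|, expressed in percent."""
    terms = [abs(p - a) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(terms) / len(terms)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman_correlation(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Pearson correlation computed on the ranks of both series."""
    ra, rp = average_ranks(actual), average_ranks(predicted)
    n = len(ra)
    mean_a, mean_p = sum(ra) / n, sum(rp) / n
    cov = sum((x - mean_a) * (y - mean_p) for x, y in zip(ra, rp))
    var_a = sum((x - mean_a) ** 2 for x in ra)
    var_p = sum((y - mean_p) ** 2 for y in rp)
    return cov / (var_a * var_p) ** 0.5


# Hypothetical impact values, for illustration only
actual = [10.0, 20.0, 40.0, 80.0]
predicted = [12.0, 18.0, 50.0, 60.0]
are = average_relative_error(actual, predicted)
rho = spearman_correlation(actual, predicted)
```

In this toy case the predictions deviate noticeably from the original values while preserving their order, so a non-zero ARE coexists with a perfect rank correlation. This is why the two metrics are treated as complementary.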


Figure 3. LOO ARE results for the scenarios S, T, MS, MT and MTS.

Figure 4. LOO rSpearman results for the scenarios S, T, MS, MT and MTS.

Figure 4 shows the Spearman rank correlation coefficients (LOO rSpearman) computed in the validation set for the same models as in Figure 3. These correlation values indicate how closely the ranking of the predicted impacts follows that of the original data, thus providing a measure of the robustness of each model. On this basis, it is possible to determine whether the results obtained are statistically significant or may instead be the result of random noise. This is done by comparing the obtained correlation coefficients with a threshold value, which corresponds to rSpearman=0.18 for a significance level α=0.05. Further details about this limit can be found in section S4 in the Supporting Information. As can be seen in Figure 4, most of the models lie above the significance threshold, which confirms their statistical reliability. However, for categories COD, BOD5 and TOC, most prediction models fall below the threshold (198 out of 315 models generated for all scenarios), which indicates that they are not statistically robust. Furthermore, the trends described by the LOO rSpearman are similar to those observed when analysing the LOO ARE, which is further evidence of overfitting. That is, the models first perform better as we add a few attributes, and then start to worsen beyond a given number of selected attributes. In Figure 5, we compare in more detail the best models generated in each scenario (i.e. the models with the lowest LOO ARE values). The explicit form of these models is provided in section S5 in the Supporting Information.
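The significance check described above can be sketched as follows. The exact derivation of the threshold is given in section S4 of the Supporting Information and is not reproduced here. This sketch instead assumes a common large-sample approximation, in which rSpearman multiplied by the square root of (n − 1) is approximately standard normal under the null hypothesis of no correlation, together with a one-sided test. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def spearman_critical_value(n_samples: int, alpha: float = 0.05) -> float:
    """One-sided large-sample critical value: z_(1 - alpha) / sqrt(n - 1)."""
    z = NormalDist().inv_cdf(1.0 - alpha)
    return z / sqrt(n_samples - 1)


def is_significant(r_spearman: float, n_samples: int, alpha: float = 0.05) -> bool:
    """True when the correlation exceeds the critical value (assumed one-sided test)."""
    return r_spearman > spearman_critical_value(n_samples, alpha)


# The dataset used in this work contains 90 chemicals
threshold = spearman_critical_value(90)
```

Under these assumptions, the approximation yields a critical value of about 0.17 for 90 chemicals, close to the reported threshold of 0.18. The small difference may stem from the exact procedure followed in section S4.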

Figure 5. Best LOO ARE models (and their rSpearman values) for the scenarios S, T, MS, MT and MTS. The number of attributes considered in each model is shown on the respective error bar. The bars marked in bold indicate the model with the lowest LOO ARE value among all those generated by the MINLP for every impact category. The discontinuous lines indicate the statistical significance threshold for the rSpearman values at a significance level α=0.05, while the continuous line indicates the null correlation threshold.

As seen in Figure 5, most of the best models display correlation values above the required threshold, except for categories COD, BOD5 and TOC, where much lower correlations are found. This finding is consistent with our previous work44, where these three impact categories presented very high errors (similar to the ones shown here) and non-significant correlation values, which is most likely caused by a weak relation between these metrics and the attributes considered here. From Figure 5, we see that models containing σ-profile data appear to be statistically reliable for predicting impact categories closely related to the physicochemical and thermodynamic behaviour of the systems. This is consistent with the fact that σ-profiles capture the relevant molecular information required for the estimation of the thermophysical properties of the chemicals, as supported by previous COSMO-based QSAR/QSPR approaches55,56. Moreover, there are some specific impact categories, such as COD and EI99 EQ, where the σ-profile models (scenario S) perform slightly better than those with thermodynamic properties (scenario T). In this respect, linear and nonlinear techniques have recently been proposed to relate σ-profile descriptors to the ecotoxicity of solvents, demonstrating that σ-profiles are able to capture the structural factors strongly connected to the overall toxicity62. Likewise, models with σ-profiles and molecular descriptors together (scenario MS) generally provide similar results (LOO ARE values less than 5% apart) to those produced by models containing thermodynamic and molecular attributes (scenario MT).
Again, this can be explained considering that σ-profiles resulting from quantum chemical calculations contain the required molecular information to estimate a wide range of thermodynamic properties of chemicals53. Therefore, the performance of both types of attributes could be expected to be similar when embedded in predictive models combined with molecular descriptors that evaluate the environmental impact of chemicals, as supported by the results shown in Figure 5. This indicates that MS models may be useful in applications where σ-profiles are easier to characterise than thermodynamic properties (e.g. when dealing with new molecules for which experimental data are lacking). In most categories, the models that achieve the lowest prediction errors (i.e. highlighted error bars in Figure 5) are those using σ-profiles, thermodynamic properties and molecular descriptors together (scenario MTS). Prediction errors in these best models fall in the range 20–40%, an acceptable value given the uncertainties affecting any standard LCA study, particularly considering the simplifications made here. This further justifies the use of σ-profiles in the prediction of environmental impacts, as they improve the accuracy of models containing only thermodynamic and molecular attributes (scenario MT), which were the focus of our previous work44.

CONCLUSION

In this paper we have presented a method to predict the cradle-to-gate impact of organic chemicals that automatically generates linear prediction models based on molecular, thermodynamic and surface charge density (σ-profile) attributes. The methodology is based on a mixed-integer programming formulation that automatically identifies the best set of prediction attributes and, simultaneously, generates the corresponding prediction models used to produce the impact estimates. This methodology was applied to predict nine LCA impact categories from a dataset containing 35 attributes (15 molecular, 12 thermodynamic and eight intervals of the σ-profiles) for 90 chemicals, ranging from conventional petrochemical solvents to pesticides. The results obtained demonstrate that thermodynamic properties can be replaced by σ-profiles while still producing accurate results, with almost identical performance (less than 5% difference) when combined with molecular descriptors. At the same time, the best prediction models were those constructed by considering molecular, thermodynamic and σ-profile attributes together. This demonstrates that adding COSMO-based σ-profile data to thermodynamic and molecular attributes leads to more accurate estimates, thus enhancing applications in molecular design and environmental assessment of chemical products and processes, and filling data gaps that are hard to cover otherwise. Furthermore, the proposed methodology was able to generate reasonable prediction errors (20 to 40%) for most of the considered impacts (CED, GWP and EI99 categories), only failing to capture the behaviour of three impact categories (COD, BOD5 and TOC), with prediction errors higher than 700%. Hence, our method should not be used to predict COD, BOD5 and TOC. Overall, these methods can in turn facilitate and underpin the design of new chemicals for which thermodynamic data based on experimental measurements are scarce, thereby contributing to improving the sustainability level of the chemical industry.
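The idea of simultaneously selecting attributes and fitting a linear model can be illustrated at small scale. The sketch below is not the mixed-integer formulation used in this work. It swaps in exhaustive enumeration of attribute subsets, fits each candidate by ordinary least squares and scores it by leave-one-out ARE. All function names and the six-sample dataset are hypothetical.

```python
from itertools import combinations


def solve_linear_system(a, b):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_linear(rows, y):
    """Ordinary least squares with intercept, solved via the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def loo_are(rows, y):
    """Leave-one-out ARE (%): each sample is predicted by a model fitted without it."""
    errors = []
    for i in range(len(y)):
        coef = fit_linear(rows[:i] + rows[i + 1:], y[:i] + y[i + 1:])
        pred = coef[0] + sum(c * v for c, v in zip(coef[1:], rows[i]))
        errors.append(abs(pred - y[i]) / abs(y[i]))
    return 100.0 * sum(errors) / len(errors)


def best_subset(x, y, max_size):
    """Enumerate attribute subsets up to max_size; return (lowest LOO ARE, subset)."""
    best = (float("inf"), ())
    for size in range(1, max_size + 1):
        for subset in combinations(range(len(x[0])), size):
            rows = [[sample[j] for j in subset] for sample in x]
            score = loo_are(rows, y)
            if score < best[0]:
                best = (score, subset)
    return best


# Hypothetical data: six samples, three attributes; the impact depends on attribute 0 only
x = [[1.0, 2.0, 5.0], [2.0, 1.0, 3.0], [3.0, 4.0, 6.0],
     [4.0, 3.0, 1.0], [5.0, 6.0, 2.0], [6.0, 5.0, 4.0]]
y = [2.0 + 3.0 * sample[0] for sample in x]
score, subset = best_subset(x, y, max_size=2)
```

On this toy dataset the selected subset should contain attribute 0, with a leave-one-out error close to zero. Enumeration grows combinatorially with the number of attributes, which is one reason a mathematical programming formulation is preferable for the 35 attributes considered here.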

AUTHOR INFORMATION

Corresponding Author
*G. Guillén-Gosálbez. E-mail: [email protected]

ORCID
Gonzalo Guillén-Gosálbez: 0000-0001-6074-8473

Author Contributions
All authors contributed equally.

Notes
The authors declare no competing financial interest.

ACKNOWLEDGEMENTS

Gonzalo Guillén-Gosálbez would like to acknowledge the financial support received from the Spanish Government (CTQ2016-77968-C3-1-P, MINECO/FEDER, UE).

ASSOCIATED CONTENT

The Supporting Information is available free of charge on the ACS Publications website at DOI: 00.0000/acs.est.0000000
Content: Data description, Additional model training results, Statistical correlation analysis, Best LOO prediction models

REFERENCES

(1)

http://www.cas.org/. Chemical Abstracts Service Home Page http://www.cas.org/ (accessed Oct 18, 2016).

(2)

Constable, D. J. C.; Curzons, A. D.; Cunningham, V. L. Metrics to “green” Chemistry - Which Are the Best? Green Chem. 2002, 4 (6), 521–527, DOI 10.1039/b206169b.

(3)

Tobiszewski, M. Metrics for Green Analytical Chemistry. Analytical Methods. The Royal Society of Chemistry 2016, pp 2993–2999, DOI 10.1039/c6ay00478d.

(4)

Guinée, J. B.; Heijungs, R.; Udo de Haes, H. A.; Huppes, G. Quantitative Life Cycle Assessment of Products. 2. Classification, Valuation and Improvement Analysis. J. Clean. Prod. 1993, 1 (2), 81–91, DOI 10.1016/0959-6526(93)90046-E.

(5)

Guinée, J. B.; Udo de Haes, H. A.; Huppes, G. Quantitative Life Cycle Assessment of Products. 1:Goal Definition and Inventory. J. Clean. Prod. 1993, 1 (1), 3–13, DOI 10.1016/0959-6526(93)90027-9.

(6)

Sheldon, R. A. CHAPTER 2. Sustainability, Green Chemistry and White Biotechnology; 2015; pp 9–35, DOI 10.1039/9781782624080-00009.

(7)

Quiroz-Ramírez, J. J.; Sánchez-Ramírez, E.; Hernández-Castro, S.; Segovia-Hernández, J. G.; Ponce-Ortega, J. M. Optimal Planning of Feedstock for Butanol Production Considering Economic and Environmental Aspects. ACS Sustain. Chem. Eng. 2017, 5 (5), 4018–4030, DOI 10.1021/acssuschemeng.7b00015.

(8)

Bessette, A. P.; Teymouri, A.; Martin, M. J.; Stuart, B. J.; Resurreccion, E. P.; Kumar, S. Life Cycle Impacts and Techno-Economic Implications of Flash Hydrolysis in Algae Processing. ACS Sustain. Chem. Eng. 2018, 6 (3), 3580–3588, DOI 10.1021/acssuschemeng.7b03912.

(9)

Sternberg, A.; Bardow, A. Life Cycle Assessment of Power-to-Gas: Syngas vs Methane. ACS Sustain. Chem. Eng. 2016, 4 (8), 4156–4165, DOI 10.1021/acssuschemeng.6b00644.

(10)

Li, Q.; McGinnis, S.; Sydnor, C.; Wong, A.; Renneckar, S. Nanocellulose Life Cycle Assessment. ACS Sustain. Chem. Eng. 2013, 1 (8), 919–928, DOI 10.1021/sc4000225.

(11)

Orfield, N. D.; Fang, A. J.; Valdez, P. J.; Nelson, M. C.; Savage, P. E.; Lin, X. N.; Keoleian, G. A. Life Cycle Design of an Algal Biorefinery Featuring Hydrothermal Liquefaction: Effect of Reaction Conditions and an Alternative Pathway Including Microbial Regrowth. ACS Sustain. Chem. Eng. 2014, 2 (4), 867–874, DOI 10.1021/sc4004983.

(12)

Azapagic, A.; Clift, R. The Application of Life Cycle Assessment to Process Optimisation. Comput. Chem. Eng. 1999, 23 (10), 1509–1526, DOI 10.1016/S0098-1354(99)00308-7.

(13)

Burgess, A. A.; Brennan, D. J. Application of Life Cycle Assessment to Chemical Processes. Chem. Eng. Sci. 2001, 56 (8), 2589–2604, DOI 10.1016/S0009-2509(00)00511-X.

(14)

Anastas, P. T.; Lankey, R. L. Life Cycle Assessment and Green Chemistry: The Yin and Yang of Industrial Ecology. Green Chem. 2000, 2 (6), 289–295, DOI 10.1039/b005650m.

(15)

Jimenez-Gonzalez, C.; Overcash, M. R. The Evolution of Life Cycle Assessment in Pharmaceutical and Chemical Applications - a Perspective. Green Chem. 2014, 16 (7), 3392–3400, DOI 10.1039/C4GC00790E.

(16)

Kralisch, D.; Ott, D.; Gericke, D. Rules and Benefits of Life Cycle Assessment in Green Chemical Process and Synthesis Design: A Tutorial Review. Green Chemistry. 2015, pp 123–145, DOI 10.1039/c4gc01153h.

(17)

Khoo, H. H.; Ee, W. L.; Isoni, V. Bio-Chemicals from Lignocellulose Feedstock: Sustainability, LCA and the Green Conundrum. Green Chem. 2016, 18 (7), 1912–1922, DOI 10.1039/C5GC02065D.

(18)

Cespi, D.; Beach, E. S.; Swarr, T. E.; Passarini, F.; Vassura, I.; Dunn, P. J.; Anastas, P. T. Life Cycle Inventory Improvement in the Pharmaceutical Sector: Assessment of the Sustainability Combining PMI and LCA Tools. Green Chem. 2015, 17 (6), 3390–3400, DOI 10.1039/c5gc00424a.

(19)

Bojarski, A. D.; Laínez, J. M.; Espuña, A.; Puigjaner, L. Incorporating Environmental Impacts and Regulations in a Holistic Supply Chains Modeling: An LCA Approach. Comput. Chem. Eng. 2009, 33 (10), 1747–1759, DOI 10.1016/j.compchemeng.2009.04.009.

(20)

Gerber, L.; Gassner, M.; Maréchal, F. Systematic Integration of LCA in Process Systems Design: Application to Combined Fuel and Electricity Production from Lignocellulosic Biomass. Comput. Chem. Eng. 2011, 35 (7), 1265–1280, DOI 10.1016/j.compchemeng.2010.11.012.

(21)

Wen, Y.; Shonnard, D. R. Environmental and Economic Assessments of Heat Exchanger Networks for Optimum Minimum Approach Temperature. Comput. Chem. Eng. 2003, 27 (11), 1577–1590, DOI 10.1016/S0098-1354(03)00097-8.

(22)

Lazareva, A.; Keller, A. A. Estimating Potential Life Cycle Releases of Engineered Nanomaterials from Wastewater Treatment Plants. ACS Sustain. Chem. Eng. 2014, 2 (7), 1656–1665, DOI 10.1021/sc500121w.

(23)

Gonzalez-Garay, A.; Gonzalez-Miquel, M.; Guillen-Gosalbez, G. High-Value Propylene Glycol from Low-Value Biodiesel Glycerol: A Techno-Economic and Environmental Assessment under Uncertainty. ACS Sustain. Chem. Eng. 2017, 5 (7), 5723–5732, DOI 10.1021/acssuschemeng.7b00286.

(24)

Swiss Centre For Life Cycle Inventories, 2017. Ecoinvent Data V3.3. Ecoinvent Cent. Ecoinvent 3.3 https://www.ecoinvent.org/home.html (accessed May 20, 2017).

(25)

Malmqvist, T.; Glaumann, M.; Scarpellini, S.; Zabalza, I.; Aranda, A.; Llera, E.; Díaz, S. Life Cycle Assessment in Buildings: The ENSLIC Simplified Method and Guidelines. Energy 2011, 36 (4), 1900–1907, DOI 10.1016/j.energy.2010.03.026.

(26)

Yeo, Z.; Ng, R.; Song, B. Technique for Quantification of Embodied Carbon Footprint of Construction Projects Using Probabilistic Emission Factor Estimators. J. Clean. Prod. 2016, 119, 135–151, DOI 10.1016/j.jclepro.2016.01.076.

(27)

Zabalza Bribián, I.; Aranda Usón, A.; Scarpellini, S. Life Cycle Assessment in Buildings: State-of-the-Art and Simplified LCA Methodology as a Complement for Building Certification. Build. Environ. 2009, 44 (12), 2510–2520, DOI 10.1016/j.buildenv.2009.05.001.

(28)

Schulz, M.; Short, M. D.; Peters, G. M. A Streamlined Sustainability Assessment Tool for Improved Decision Making in the Urban Water Industry. Integr. Environ. Assess. Manag. 2012, 8 (1), 183–193, DOI 10.1002/ieam.247.

(29)

Quirante, N.; Caballero, J. A. Optimization of a Sour Water Stripping Plant Using Surrogate Models. Comput. Aided Chem. Eng. 2016, 38, 31–36, DOI 10.1016/B978-0-444-63428-3.50010-2.

(30)

Moreau, V.; Bage, G.; Marcotte, D.; Samson, R. Statistical Estimation of Missing Data in Life Cycle Inventory: An Application to Hydroelectric Power Plants. J. Clean. Prod. 2012, 37, 335–341, DOI 10.1016/j.jclepro.2012.07.036.

(31)

Weston, N.; Clift, R.; Holmes, P.; Basson, L.; White, N. Streamlined Life Cycle Approaches for Use at Oil Refineries and Other Large Industrial Facilities. Ind. Eng. Chem. Res. 2011, 50 (3), 1624–1636, DOI 10.1021/ie1007272.

(32)

Wernet, G.; Hellweg, S.; Fischer, U.; Papadokonstantakis, S.; Hungerbühler, K. Molecular-Structure-Based Models of Chemical Inventories Using Neural Networks. Environ. Sci. Technol. 2008, 42 (17), 6717–6722, DOI 10.1021/es7022362.

(33)

Wernet, G.; Papadokonstantakis, S.; Hellweg, S.; Hungerbühler, K. Bridging Data Gaps in Environmental Assessments: Modeling Impacts of Fine and Basic Chemical Production. Green Chem. 2009, No. 11, 1826–1831, DOI 10.1039/b905558d.

(34)

Hugo, A.; Ciumei, C.; Buxton, A.; Pistikopoulos, E. N. Environmental Impact Minimization through Material Substitution : A Multi-Objective Optimization Approach. Methodology 2004, 407–417, DOI 10.1039/b401868k.

(35)

Eckelman, M. J. Life Cycle Inherent Toxicity: A Novel LCA-Based Algorithm for Evaluating Chemical Synthesis Pathways. Green Chem. 2016, 18 (11), 3257–3264, DOI 10.1039/C5GC02768C.

(36)

Marvuglia, A.; Kanevski, M.; Benetto, E. Machine Learning for Toxicity Characterization of Organic Chemical Emissions Using USEtox Database: Learning the Structure of the Input Space. Environ. Int. 2015, 83, 72–85, DOI 10.1016/j.envint.2015.05.011.

(37)

Tula, A. K.; Babi, D. K.; Bottlaender, J.; Eden, M. R.; Gani, R. A Computer-Aided Software-Tool for Sustainable Process Synthesis-Intensification. Comput. Chem. Eng. 2017, 105, 74–95, DOI 10.1016/j.compchemeng.2017.01.001.

(38)

Karka, P.; Papadokonstantakis, S.; Hungerbühler, K.; Kokossis, A. Environmental Impact Assessment of Biorefinery Products Using Life Cycle Analysis. In Computer Aided Chemical Engineering; 2014; Vol. 34, pp 543–548, DOI 10.1016/B978-0-444-63433-7.50075-4.

(39)

Guillen-Gosalbez, G.; Caballero, J. A.; Esteller, L. J.; Gadalla, M. Application of Life Cycle Assessment to the Structural Optimization of Process Flowsheets. Comput. Aided Chem. Eng. 2007, 24, 1163–1168, DOI 10.1016/S1570-7946(07)80218-5.

(40)

Smith, R. L.; Ruiz-Mercado, G. J.; Meyer, D. E.; Gonzalez, M. A.; Abraham, J. P.; Barrett, W. M.; Randall, P. M. Coupling Computer-Aided Process Simulation and Estimations of Emissions and Land Use for Rapid Life Cycle Inventory Modeling. ACS Sustain. Chem. Eng. 2017, 5 (5), 3786–3794, DOI 10.1021/acssuschemeng.6b02724.

(41)

Mittal, V. K.; Bailin, S. C.; Gonzalez, M. A.; Meyer, D. E.; Barrett, W. M.; Smith, R. L. Toward Automated Inventory Modeling in Life Cycle Assessment: The Utility of Semantic Data Modeling to Predict Real-World Chemical Production. ACS Sustain. Chem. Eng. 2018, 6 (2), 1961–1976, DOI 10.1021/acssuschemeng.7b03379.

(42)

Kaebernick, H.; Sun, M.; Kara, S. Simplified Lifecycle Assessment for the Early Design Stages of Industrial Products. CIRP Ann. - Manuf. Technol. 2003, 52 (1), 25–28, DOI 10.1016/S0007-8506(07)60522-8.

(43)

Bartholomew, T. V.; Mauter, M. S. Multiobjective Optimization Model for Minimizing Cost and Environmental Impact in Shale Gas Water and Wastewater Management. ACS Sustain. Chem. Eng. 2016, 4 (7), 3728–3735, DOI 10.1021/acssuschemeng.6b00372.

(44)

Calvo-Serrano, R.; González-Miquel, M.; Papadokonstantakis, S.; Guillén-Gosálbez, G. Predicting the Cradle-to-Gate Environmental Impact of Chemicals from Molecular Descriptors and Thermodynamic Properties via Mixed-Integer Programming. Comput. Chem. Eng. 2018, 108, 179–193, DOI 10.1016/j.compchemeng.2017.09.010.

(45)

DeRosa, S. E.; Allen, D. T. Comparison of Attributional and Consequential Life-Cycle Assessments in Chemical Manufacturing BT - Reference Module in Earth Systems and Environmental Sciences. In Encyclopedia of Sustainable Technologies; Elsevier, 2017; pp 339–347, DOI 10.1016/B978-0-12-409548-9.10069-7.

(46)

Calvo-Serrano, R.; Guillén-Gosálbez, G. Streamlined Life Cycle Assessment under Uncertainty Integrating a Network of the Petrochemical Industry and Optimization Techniques: Ecoinvent vs Mathematical Modeling. ACS Sustain. Chem. Eng. 2018, 6, 7109–7118, DOI 10.1021/acssuschemeng.8b01050.

(47)

DeRosa, S. E.; Allen, D. T. Impact of Natural Gas and Natural Gas Liquids Supplies on the United States Chemical Manufacturing Industry: Production Cost Effects and Identification of Bottleneck Intermediates. ACS Sustain. Chem. Eng. 2015, 3 (3), 451–459, DOI 10.1021/sc500649k.

(48)

National Institute of Standards and Technology. NIST Chemistry WebBook https://webbook.nist.gov/chemistry/ (accessed Apr 13, 2018).

(49)

Mullins, E.; Oldland, R.; Liu, Y. A.; Wang, S.; Sandler, S. I.; Chen, C. C.; Zwolak, M.; Seavey, K. C. Sigma-Profile Database for Using COSMO-Based Thermodynamic Methods. Ind. Eng. Chem. Res. 2006, 45 (12), 4389–4415, DOI 10.1021/ie060370h.

(50)

Klamt, A. Conductor-like Screening Model for Real Solvents: A New Approach to the Quantitative Calculation of Solvation Phenomena. J. Phys. Chem. 1995, 99 (7), 2224–2235, DOI 10.1021/j100007a062.

(51)

Islam, M. R.; Chen, C. C. COSMO-SAC Sigma Profile Generation with Conceptual Segment Concept. Ind. Eng. Chem. Res. 2015, 54 (16), 4441–4454, DOI 10.1021/ie503829b.

(52)

Klamt, A.; Eckert, F.; Hornig, M. COSMO-RS: A Novel View to Physiological Solvation and Partition Questions. J. Comput. Aided. Mol. Des. 2001, 15 (4), 355–365, DOI 10.1023/A:1011111506388.

(53)

Diedenhofen, M.; Klamt, A. COSMO-RS as a Tool for Property Prediction of IL Mixtures-A Review. Fluid Phase Equilib. 2010, 294 (1–2), 31–38, DOI 10.1016/j.fluid.2010.02.002.

(54)

Mullins, E.; Liu, Y. A.; Ghaderi, A.; Fast, S. D. Sigma Profile Database for Predicting Solid Solubility in Pure and Mixed Solvent Mixtures for Organic Pharmacological Compounds with COSMO-Based Thermodynamic Methods. Ind. Eng. Chem. Res. 2008, 47 (5), 1707–1725, DOI 10.1021/ie0711022.

(55)

Palomar, J.; Torrecilla, J. S.; Ferro, V. R.; Rodríguez, F. Development of an a Priori Ionic Liquid Design Tool. 1. Integration of a Novel COSMO-RS Molecular Descriptor on Neural Networks. Ind. Eng. Chem. Res. 2008, 47 (13), 4523–4532, DOI 10.1021/ie800056q.

(56)

Palomar, J.; Torrecilla, J. S.; Lemus, J.; Ferro, V. R.; Rodríguez, F. Prediction of Non-Ideal Behavior of Polarity/Polarizability Scales of Solvent Mixtures by Integration of a Novel COSMO-RS Molecular Descriptor and Neural Networks. Phys. Chem. Chem. Phys. 2008, 10 (39), 5967–5975, DOI 10.1039/b807617k.

(57)

Zhao, Y.; Huang, Y.; Zhang, X.; Zhang, S. A Quantitative Prediction of the Viscosity of Ionic Liquids Using S σ-Profile Molecular Descriptors. Phys. Chem. Chem. Phys. 2015, 17 (5), 3761–3767, DOI 10.1039/C4CP04712E.

(58)

Járvás, G.; Quellet, C.; Dallos, A. Estimation of Hansen Solubility Parameters Using Multivariate Nonlinear QSPR Modeling with COSMO Screening Charge Density Moments. Fluid Phase Equilib. 2011, 309 (1), 8–14, DOI 10.1016/j.fluid.2011.06.030.

(59)

Kondor, A.; Járvás, G.; Kontos, J.; Dallos, A. Temperature Dependent Surface Tension Estimation Using COSMO-RS Sigma Moments. Chem. Eng. Res. Des. 2014, 92 (12), 1867–1872, DOI 10.1016/j.cherd.2014.06.021.

(60)

COSMOtherm Version C3.0, Release 17.01. COSMOlogic GmbH & Co. KG: Leverkusen, Germany, http://www.cosmologic.de.

(61)

Guyon, I.; Elisseeff, A. An Introduction to Variable and Feature Selection. J. Mach. Learn. Res. 2003, 3, 1157–1182.

(62)

Ghanem, O. Ben; Mutalib, M. I. A.; Lévêque, J. M.; El-Harbawi, M. Development of QSAR Model to Predict the Ecotoxicity of Vibrio Fischeri Using COSMO-RS Descriptors. Chemosphere 2017, 170, 242–250, DOI 10.1016/j.chemosphere.2016.12.003.

TOC art (For Table of Contents Use Only)

Synopsis: Environmental impacts of chemical production processes are automatically predicted using physicochemical attributes of the main product via mathematical programming.
