Gate-to-gate energy consumption in chemical batch plants: Statistical models based on reaction synthesis type
Cecilia Pereira, Ines Hauner, Konrad Hungerbuehler, and Stavros Papadokonstantakis
ACS Sustainable Chem. Eng., Just Accepted Manuscript • DOI: 10.1021/acssuschemeng.7b03769 • Publication Date (Web): 04 Apr 2018
Downloaded from http://pubs.acs.org on April 4, 2018
ACS Sustainable Chemistry & Engineering
Gate-to-gate energy consumption in chemical batch plants: Statistical models based on reaction synthesis type
Cecilia Pereira1, Ines Hauner1, Konrad Hungerbühler1, Stavros Papadokonstantakis2,*
1 ETH Zurich, Institute for Chemical and Bioengineering, Vladimir-Prelog-Weg 1, 8093 Zurich, Switzerland
2 Chalmers University of Technology, Division of Energy Technology, Hörsalsvägen 7B, 41296 Gothenburg, Sweden
* [email protected] / +46 708129280
ABSTRACT
Energy consumption in the chemical industry is an important operating cost and environmental impact factor, and reducing it is also explicitly mentioned as one of the key principles of green chemistry. Energy consumption has thus been included as a key metric in diverse process design and evaluation tools. However, measurements of energy consumption at the process equipment level are scarce, especially in fine chemical production, which is typically performed in multiproduct and multipurpose batch plants. In this work we present a short-cut approach based on statistical models, such as probability density functions (PDF) and classification trees, for estimating steam consumption, which typically represents the highest energy utility consumption in batch plants. The output of these models takes the form of intervals derived from the PDF interquartile ranges and of classes derived from the classification trees, respectively. The validation results (i.e., goodness of fit, cross validation and case studies) show that the models provide satisfactory interval estimations of steam consumption for benchmarking chemical reaction types and performing uncertainty analysis. The models are primarily intended for screening purposes at early design stages, with the reaction type as the minimum required input information; in the case of classification trees, they also allow an analysis of the most influential predictor variables (i.e., reaction type and operating parameters) on steam consumption.
This study also demonstrates the application of the PDF statistical models to a previously published case study for the production of the intermediate 4-(2-methoxyethyl)-phenol, which can be produced via seven different synthesis routes. The ranking of the synthesis routes according to the PDF models shows trends similar to those of an Energy Loss Index proxy indicator, which, however, requires more detailed chemical and process information.
Keywords
Green chemistry; batch processes; energy consumption benchmarking; life cycle energy inventories; early design phase metrics; classification trees.
INTRODUCTION
Energy-related impacts often account for over 50% of the total environmental impact of the chemical industry 1. Therefore, minimisation of energy use in chemical production, which has been recognized as one of the twelve principles of green chemistry, is a key target for the chemical sector and for environmental regulations 2. As a result, evaluation methods that include energy use as a design metric have been developed in academia and industry (e.g., BASF 21 and GlaxoSmithKline 22), in the fields of sustainable process design 3-7, life cycle assessment 8-14, efficient heat transfer 15-17, and pinch analysis 18-20. These methods can be used to compare different design alternatives as part of scenario-based analysis and multi-objective optimization. Screening alternatives in early phases of process design can be more beneficial than in later stages, because more degrees of freedom remain, providing flexibility for modifications and improvements at significantly lower cost. However, the aforementioned evaluation methods require process energy consumption data, which can be scarce or aggregated, because energy flow measurements at the process equipment level are more complicated and costly than those of material flows (i.e., they require mass flow, temperature and pressure measurements of the energy utilities).
Particularly in multiproduct and multipurpose batch plants, energy measurements are highly aggregated (i.e., typically only available at the “production building” level), partly due to complexities caused by the process dynamics of batch operation and partly because material flows are traditionally more important for batch plant economics. Consequently, in many cases process-specific energy data have to be modelled or estimated by process experts using rules of thumb.
Rigorous energy consumption models for batch processes have been addressed in previous works 23-24, proposing bottom-up approaches for modelling energy utility consumption in multipurpose batch plants based on extensive process documentation and sensor data. These models are suitable when high resolution for dynamic optimisation of energy efficiency is needed. Shortcut models of steam consumption in multipurpose batch plants based on standard process documentation, rules of thumb, expert opinion, classical thermodynamics, and model parameterisation have also been proposed 25. These documentation-based models still require detailed process information as input (e.g., standard operating procedures), which can also be partially confidential. Therefore, another modelling approach is needed when standard operating procedures are not available, namely in early phases of process design, or when very fast estimations have to be performed for screening purposes, for instance to streamline environmental assessment studies. In this context, models of steam consumption are developed in this study based on statistical analysis of production data provided by a consortium of industrial partners representing leading companies in fine chemical and pharmaceutical production.
Statistical analysis has previously been applied to diverse kinds of data and applications relevant to process design and environmental assessment. Some examples include relationships between parameterization factors of life cycle inventories and design parameters 26, guidelines for building process and material alternatives within a life cycle inventory 27, identification of the most influential data elements in the life cycle inventory stage 28, multivariate data analysis to characterize environmental impacts of power production technologies and detect cause-effect relationships 29, hierarchical cluster analysis and principal component analysis for benchmarking the relative sustainability of pharmaceutical production processes 30, statistically rigorous reconciliation of life cycle inventory data to satisfy thermodynamic principles 31, and uncertainty analysis in LCA inventories 32.
The models of steam consumption proposed in this work are based on probability density functions (PDF) and classification trees (Witten and Frank, 2005; Breiman et al., 1984). In both cases the output of these models can take the form of intervals rather than point estimates. The hypothesis behind these generic models is that energy utility consumption depends more on the production processes than on the specific reactants, auxiliaries and products. Therefore, the estimation error of these generic models is expected to stem partly, among other factors, from this simplification.
The PDF models describe the variability of the gate-to-gate steam consumption for the production of one kilogram of product for a particular reaction type. The type of distribution and the values of its parameters were selected to maximize the probability of generating the sample data by means of the maximum likelihood method 33. The goodness of fit was evaluated using standard statistical tests (Youden, 1950) and the Akaike information criterion 34.
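As an illustration of the fitting step described above, the following Python sketch fits a few candidate distributions to the steam consumption sample of one reaction type by maximum likelihood and ranks them by AIC (2k - 2 ln L, lower is better). The candidate set (normal, lognormal, exponential), the function names and the sample values are assumptions made for illustration only; the distributions actually considered and the statistical tests applied in this work are not reproduced here.

```python
import math

def fit_normal(x):
    """Maximum likelihood fit of a normal distribution; returns (params, log-likelihood)."""
    n = len(x)
    mu = sum(x) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in x) / n)  # MLE divides by n, not n - 1
    loglik = sum(
        -0.5 * math.log(2 * math.pi * sigma ** 2) - (v - mu) ** 2 / (2 * sigma ** 2)
        for v in x
    )
    return {"mu": mu, "sigma": sigma}, loglik

def fit_lognormal(x):
    """Lognormal MLE: a normal fit on log(x); -sum(log x) is the change-of-variables term."""
    logs = [math.log(v) for v in x]
    params, loglik_logs = fit_normal(logs)
    return params, loglik_logs - sum(logs)

def fit_exponential(x):
    """Exponential MLE: rate = 1 / sample mean."""
    n = len(x)
    rate = n / sum(x)
    loglik = n * math.log(rate) - rate * sum(x)
    return {"rate": rate}, loglik

def aic(loglik, k):
    """Akaike information criterion for k fitted parameters (lower is better)."""
    return 2 * k - 2 * loglik

# Candidate distributions (an assumption of this sketch) and their number of parameters.
CANDIDATES = {
    "normal": (fit_normal, 2),
    "lognormal": (fit_lognormal, 2),
    "exponential": (fit_exponential, 1),
}

def rank_by_aic(x):
    """Fit every candidate by maximum likelihood and return (AIC, name, params) sorted by AIC."""
    results = []
    for name, (fit, k) in CANDIDATES.items():
        params, loglik = fit(x)
        results.append((aic(loglik, k), name, params))
    return sorted(results, key=lambda r: r[0])

# Hypothetical steam consumption sample (kg steam at 6 bar per kg product) for one reaction type.
steam = [2.1, 3.5, 1.2, 8.4, 4.0, 2.7, 15.3, 5.1, 3.3, 6.8]
ranked = rank_by_aic(steam)
for score, name, params in ranked:
    print(f"{name:12s} AIC = {score:7.2f}  {params}")
```

Comparing AIC rather than raw likelihood penalizes the extra parameter of the two-parameter candidates, which matters for the small per-reaction samples typical of this kind of dataset.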
The PDF models can provide interval-based estimations, for instance by determining the respective interquartile ranges. The classification trees directly provide intervals (i.e., in the form of output classes) of the gate-to-gate steam consumption for the production of one kilogram of product, given a particular set of reaction and process attributes, depending on the information available at the respective stage of process design.
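Once a distribution has been fitted, an interval estimate follows from its quartiles. A minimal sketch, assuming a lognormal model with hypothetical parameters (mu and sigma refer to the natural logarithm of the steam consumption), uses the standard library's `statistics.NormalDist` to obtain the 25th and 75th percentiles:

```python
import math
from statistics import NormalDist

def lognormal_quantile(p, mu, sigma):
    """p-quantile of a lognormal whose logarithm is normal with mean mu and std sigma."""
    return math.exp(NormalDist(mu, sigma).inv_cdf(p))

def interquartile_interval(mu, sigma):
    """(25th, 75th) percentile interval of the fitted lognormal."""
    return lognormal_quantile(0.25, mu, sigma), lognormal_quantile(0.75, mu, sigma)

# Hypothetical fitted parameters for one reaction type (natural log of kg steam per kg product).
mu, sigma = 1.3, 0.7
q1, q3 = interquartile_interval(mu, sigma)
median = lognormal_quantile(0.5, mu, sigma)
print(f"Interval estimate: {q1:.2f} to {q3:.2f} kg steam/kg product (median {median:.2f})")
```

Because a lognormal is right-skewed, the resulting interval is symmetric around the median only on the logarithmic scale, not in kilograms of steam.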
The PDF models can be used for benchmarking production processes on the basis of the reaction type, and they facilitate rigorous uncertainty analysis and fast predictions for screening purposes. The classification trees can be used for a finer categorisation of the steam consumption when not only the reaction type but also additional process information is available. Consequently, both types of model should be of high value in early phases of process design, for streamlining LCA, and as benchmarking tools for labelling chemical reactions.
METHODS
Definition of system boundaries
17
As depicted in Figure 1, a chemical synthesis route can in general include one-to-n reaction
18
steps, each of them followed by zero-to-m work-up recovery processes. The processes for
19
which steam consumption models are developed in this work are defined by the grey boxes in
7 ACS Paragon Plus Environment
ACS Sustainable Chemistry & Engineering 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
1
Figure 1 corresponding to a single reaction step plus the work-up processes which immediately
2
follow this reaction step, if existent. In the rest of the document this system is referred to as
3
“reaction”. The empirical yield ranges corresponding to the reaction systems of this work are
4
given in Table S4 in the Supporting Information.
5
Special purification steps (e.g., because of unusually strict requirements for the purity of the chemical to be used in the next synthesis step) and drying of the product (e.g., because of the special form in which the chemical must be stored, packaged or sold) are not addressed within the system, because these steps may depend on the relative position of the reaction step within the synthesis path. Thus, the steam consumption models of this work are independent of whether the corresponding reaction is performed, for instance, as the last step of a synthesis route, in which case additional work-up processes may be required for delivering the traded form of the chemical and meeting customer-specific purities. The available dataset did not allow the definition of sufficiently large sets for a separate statistical analysis of reaction types performed as last synthesis steps. Thus, if a reaction is performed as the last synthesis step, then the special purification and other processes required for the chemical to take its final marketed form should be calculated separately. Moreover, production of raw materials and auxiliaries, solvent recovery and waste treatment lie outside the reaction system boundaries. The respective models and tools for the scope of a comprehensive cradle-to-gate LCA are generally available elsewhere (e.g., Ecoinvent 35, Finechem 8, Ecosolvent 36) and are complementary to the models presented in this work. Thus, for LCA purposes, a chemical synthesis path can be modelled as a sequence of distinct reaction steps, where the individual reaction models proposed here can be used to fill gate-to-gate steam consumption data gaps, and cradle-to-gate life cycle inventories can be provided by typical LCA tools and databases.
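The route-level use described above can be sketched as follows, assuming that for each reaction step both the gate-to-gate steam consumption (per kg of that step's product) and the mass of that step's product required per kg of final product are known. The mass requirements, the hypothetical `ReactionStep` class and the separate term for final purification are illustrative inputs, not part of the models of this work.

```python
from dataclasses import dataclass

@dataclass
class ReactionStep:
    name: str
    steam_per_kg: float     # gate-to-gate steam, kg steam per kg of this step's product
    kg_per_kg_final: float  # hypothetical input: kg of this step's product per kg final product

def route_steam(steps, final_purification_per_kg=0.0):
    """Gate-to-gate steam of a synthesis route per kg of final product.

    Special purification or drying of the final product lies outside the reaction
    boundaries, so it is supplied separately.
    """
    return sum(s.steam_per_kg * s.kg_per_kg_final for s in steps) + final_purification_per_kg

# Hypothetical three-step route.
route = [
    ReactionStep("step 1", steam_per_kg=4.0, kg_per_kg_final=1.6),
    ReactionStep("step 2", steam_per_kg=2.5, kg_per_kg_final=1.2),
    ReactionStep("step 3", steam_per_kg=6.0, kg_per_kg_final=1.0),
]
print(route_steam(route, final_purification_per_kg=1.5))
```

Scaling each step by the mass of its product needed per kg of final product is what allows per-reaction models to be chained along a route; upstream cradle-to-gate inventories would be added from LCA databases.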
Data sources, predictor variables and output classes
The data for the model development were provided by nine industrial partners in Switzerland, Germany, France and the United States, covering different sectors from basic chemicals to pesticides and pharmaceutical products. In most cases the primary data were in the form of standard operating procedures describing in detail how the unit operations are performed in the production plants. These data were used to model the steam consumption (kilograms of steam at 6 bar per kilogram of product) according to the method presented elsewhere 25 for the reaction system boundary described above. Each of these modelled values represents one data point in the training dataset (250 points) for the development of the PDF and the classification tree models. Additional data from the industrial partners that were not used for model development served for testing and comparing the performance of the classification trees and the PDF models. This test set consists of 17 modelled steam consumption values for chemical production in multiproduct/multipurpose batch plants similar to those of the training set, not all of which, however, were obtained with the same modelling approach 25.
The collected data cover a wide range of reactions frequently performed in the chemical industry. Nevertheless, this is not a comprehensive study of all existing reactions in chemical production sites, but rather a methodology and set of models for benchmarking the most common reaction types with respect to their energy (i.e., steam) consumption, which practitioners in academia and industry are encouraged to extend to other reactions of their interest. The selected reaction classes and the reaction classification procedure are shown and explained in the Supporting Information (Figure S1). The reaction type constitutes the main predictor variable for the PDF models and one of the predictor variables for the classification trees.

Classification trees may include several other predictor variables of nominal, binary and continuous type, depending on the stage of process design (Table 1). At the earliest stage of process design (S1), only information regarding the reaction type is available. At the subsequent stages, the type of work-up processes (S2), operating parameters such as operation time and temperatures (S3), and mass flows (S4) become known, respectively. All these variables can be determined in conceptual design stages based on laboratory experiments and simple scale-up calculations (e.g., which kinds of separation are most likely applicable, what the maximum expected temperatures and the expected mass efficiencies are, etc.). The latest stage of design (S5) assumes that the steam consumption of the energy-intensive distillation processes is available. The classification trees developed for S5 can be used to compare the performance of the S1-S4 models, which lack the important predictor of distillation steam consumption.
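The cumulative availability of predictors over the design stages can be expressed as a simple lookup, so that a stage-specific tree only receives the fields known at that stage. In this sketch the predictor names are hypothetical placeholders; the actual variables of dataset-1 are those listed in Table 1.

```python
# Hypothetical predictor names; the actual dataset-1 variables are listed in Table 1.
STAGE_PREDICTORS = {
    "S1": ["reaction_type"],
    "S2": ["workup_types"],
    "S3": ["operation_time_h", "max_temperature_C"],
    "S4": ["mass_flows_kg"],
    "S5": ["distillation_steam_kg"],
}
STAGES = list(STAGE_PREDICTORS)  # dicts keep insertion order: S1 ... S5

def available_predictors(stage):
    """All predictors known at a design stage: those introduced at it and at every earlier stage."""
    idx = STAGES.index(stage)
    return [p for s in STAGES[: idx + 1] for p in STAGE_PREDICTORS[s]]

def select_features(record, stage):
    """Keep only the fields of a process record that a stage-specific tree may use."""
    return {k: record[k] for k in available_predictors(stage) if k in record}

record = {"reaction_type": "condensation", "workup_types": "extraction",
          "operation_time_h": 20.0, "max_temperature_C": 110.0}
print(select_features(record, "S2"))
```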
To test the effect of the number of predictor variables on the performance of the classification trees 37, two different training datasets (dataset-1 and dataset-2) were used: dataset-1 (Table 1) comprises only a subset of the predictors of the more inclusive dataset-2 (Table S2), selected based on empirical knowledge about their influence on steam consumption.

For all classification models, the target variable (i.e., steam consumption) is discretized into a certain number of intervals (classes); thus all classification trees provide a range of steam consumption as output rather than a point estimate. To specify the number and width of the steam consumption intervals, a histogram of the steam consumption data was used to find a trade-off between classes of equal size and a sufficient sample size in every interval. To test the influence of the number and size of classes, two scenarios were tested, with three (Table 2) and five classes (Table S3), respectively.
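Discretizing the continuous target into output classes amounts to comparing each value against a set of class limits. The sketch below assumes two hypothetical limits (3 and 8 kg steam per kg product) purely for illustration; the limits actually used are those of Table 2 (three classes) and Table S3 (five classes). In this sketch a value equal to a limit is assigned to the upper class.

```python
from bisect import bisect_right

# Hypothetical class limits in kg steam per kg product (see Table 2 for the real ones).
THRESHOLDS = [3.0, 8.0]
LABELS = ["low", "middle", "high"]

def to_class(steam, thresholds=THRESHOLDS, labels=LABELS):
    """Map a continuous steam consumption value to its output class."""
    return labels[bisect_right(thresholds, steam)]

def class_counts(values):
    """Number of samples per class, e.g. to check that every interval is sufficiently populated."""
    counts = {label: 0 for label in LABELS}
    for v in values:
        counts[to_class(v)] += 1
    return counts

print(class_counts([1.0, 2.0, 5.0, 9.0]))
```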
Model development and evaluation metrics
Classification trees represent rules underlying the data with hierarchical, sequential structures. These rules partition the data in every tree node based on the value of a particular predictor variable. At every node, the resulting split optimizes the classification for the respective tree depth. The tree is typically grown to its full size, achieving maximum classification performance for the training data (e.g., using the CART algorithm 38), and then pruned back according to a cross validation procedure to safeguard against overfitting.

In this study, cross validation was performed by randomly dividing the data into ten equal partitions (i.e., stratified ten-fold cross validation), keeping in turn one tenth of the data for testing and the remaining nine tenths for training. Thus, the training procedure is performed ten times on different datasets, and every time the cross validation defines the degree to which the tree must be pruned (e.g., based on the accuracy, see Eq. 1). From the average performance during the ten-fold cross validation, the pruning degree of the tree as well as the modelling accuracy and generalisation metrics are inferred. This pruning degree is then imposed on a tree trained with all the available data to obtain the final classification tree. This is a standard procedure when the data for training and testing are limited 37.
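The stratified ten-fold partitioning can be sketched in plain Python: the indices of each class are shuffled and dealt round-robin across the folds, so every fold keeps roughly the class proportions of the full dataset, and each fold is held out once for testing. This is a generic illustration with hypothetical labels (250 samples in an approximate 2:1:1 ratio); tree growing and pruning themselves (e.g., with a CART implementation) are not shown.

```python
import random
from collections import defaultdict

def stratified_kfold(labels, k=10, seed=0):
    """Assign every sample index to one of k folds while preserving class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    offset = 0
    for y in sorted(by_class):
        idx = by_class[y]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[(offset + j) % k].append(i)
        offset += len(idx)  # continue dealing where the previous class stopped
    return folds

def train_test_splits(labels, k=10, seed=0):
    """Yield (train, test) index lists; each fold serves exactly once as the test set."""
    folds = stratified_kfold(labels, k, seed)
    for f in range(k):
        test = folds[f]
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        yield train, test

# Hypothetical class labels for 250 training points.
labels = ["low"] * 125 + ["middle"] * 62 + ["high"] * 63
splits = list(train_test_splits(labels))
print([len(test) for _, test in splits])
```

Continuing the round-robin across classes (the `offset`) keeps the fold sizes balanced even when class counts are not multiples of k.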
Three different metrics, namely sensitivity, specificity and accuracy, are calculated to assess the classification performance for each output class and the overall performance of the classification tree. These metrics can be expressed as follows:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}$$

$$\text{Sensitivity} = \frac{TP}{TP + FN} \tag{2}$$

$$\text{Specificity} = \frac{TN}{TN + FP} \tag{3}$$
where TP (true positives) is the number of instances belonging to a class and predicted within that class, FN (false negatives) is the number of instances belonging to a class but not predicted within that class, FP (false positives) is the number of instances predicted to be in a class but not belonging to it, and TN (true negatives) is the number of instances neither belonging to a class nor predicted into it. While high sensitivity for a certain output class implies a very general model for this class, namely many data points fitting into few rules, low sensitivity denotes a very specific model with many rules, each containing a small portion of the data. This trade-off is best depicted in plots of sensitivity against (1 - specificity), also called receiver operating characteristic (ROC) plots 39.
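Eqs. 1-3 are computed per output class by treating that class as positive and all other classes as negative (one-versus-rest). The following sketch, using a small hypothetical set of actual and predicted classes, also returns the corresponding ROC coordinates (1 - specificity, sensitivity). It assumes that the class occurs at least once among the actual values and that not all samples belong to it; otherwise sensitivity or specificity is undefined.

```python
def one_vs_rest_counts(actual, predicted, cls):
    """TP, FN, FP, TN for one output class treated as 'positive'."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == cls and p == cls for a, p in pairs)
    fn = sum(a == cls and p != cls for a, p in pairs)
    fp = sum(a != cls and p == cls for a, p in pairs)
    tn = sum(a != cls and p != cls for a, p in pairs)
    return tp, fn, fp, tn

def class_metrics(actual, predicted, cls):
    """Accuracy, sensitivity and specificity (Eqs. 1-3) plus the ROC point of one class."""
    tp, fn, fp, tn = one_vs_rest_counts(actual, predicted, cls)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "roc_point": (1 - specificity, sensitivity)}

# Hypothetical actual and predicted classes.
actual = ["low", "low", "low", "middle", "middle", "high", "high", "low"]
predicted = ["low", "low", "middle", "middle", "low", "high", "high", "low"]
for cls in ("low", "middle", "high"):
    print(cls, class_metrics(actual, predicted, cls))
```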
Besides the tree performance for every output class and overall, the extraction of important logical rules is an important step towards model interpretability and transparency. The importance of these rules can be considered in relation to the importance of certain predictor variables, in the sense that an important rule should also contain important predictors. This importance can be quantified by the risk reduction from parent to children nodes due to splitting on every predictor variable (see also the formulas in the Supporting Information, Figure S3).
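The risk reduction used for predictor importance can be illustrated with a generic impurity-based sketch. Here the node risk is assumed to be the Gini impurity weighted by the fraction of all samples reaching the node, and the importance of a predictor is taken as the sum of the risk reductions over all splits on that predictor; the exact formulas used in this work are those given in the Supporting Information. The two-split tree in the example is hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a node: 1 - sum over classes of p_k squared."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def node_risk(labels, n_total):
    """Node risk, assumed here as Gini impurity weighted by the node's share of all samples."""
    return len(labels) / n_total * gini(labels)

def risk_reduction(parent, left, right, n_total):
    """Decrease in risk from a parent node to its two children."""
    return node_risk(parent, n_total) - node_risk(left, n_total) - node_risk(right, n_total)

def predictor_importance(splits, n_total):
    """Sum the risk reduction of every split, grouped by the predictor used for the split."""
    importance = {}
    for predictor, parent, left, right in splits:
        gain = risk_reduction(parent, left, right, n_total)
        importance[predictor] = importance.get(predictor, 0.0) + gain
    return importance

# Hypothetical tree: the root splits on operation time, its left child on reaction type.
root = ["low"] * 4 + ["high"] * 4
left = ["low"] * 4 + ["high"]
right = ["high"] * 3
splits = [
    ("operation_time_h", root, left, right),
    ("reaction_type", left, ["low"] * 4, ["high"]),
]
importance = predictor_importance(splits, n_total=len(root))
print(importance)
```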
The selection of important predictors can also be used for further parameterisation of the PDF models, the second type of models developed in this work. These models are probability density functions fitted to different datasets, each of them defined by one reaction type. Parameterisation of these models means that, for a specific reaction type, the dataset is divided into subsets according to process parameters, such as temperature, operation time, etc. An example of this parameterisation procedure is the partition of the dataset for the condensation reaction into two datasets, a first one for condensations where distillation processes take place and a second one where distillation processes do not take place. In general, the need for further parameterisation arises when the intervals of the initial PDF models are very broad or when the goodness of fit indicates a poor fit to the data.

RESULTS AND DISCUSSION
Classification trees
To analyze the performance of the classification trees, we start by considering dataset-1 as training set and three output classes. The selection of priors (i.e., estimates of the probability that randomly sampling an instance from a population will yield the given class) did not influence the model performance at any of the design stages considered in this work. The results of the cross validation for the five classification trees (S1 to S5) are depicted as an ROC plot in Figure S4a in the Supporting Information. As expected, an improvement from S1 to S5 is observed for both training and test sets, since more process information is available to the models. More importantly, overfitting is avoided, since the model performance for the training and test sets is similar, with differences of less than 11% and 6% for sensitivity and specificity, respectively. The classification trees with five output classes (Figure S4b in the Supporting Information) follow similar trends but generally have lower sensitivity values. Moreover, the additional resolution of the five output classes is not well supported by the independent class analysis, as discussed later in this section. Overall, the results of the preliminary analysis indicate that the case of three output classes and eight predictor variables (dataset-1) presents the best performance and interpretability owing to its lower model complexity (see also the Supporting Information and Figure S2 for further discussion on the model selection).
Figure 2 shows the performance of the selected model per output class. The low and high classes tend to form clusters, while the middle class is rather scattered. The low class presents higher sensitivity and slightly lower specificity than the high and middle classes. This is because more data points belong to the low class than to the other classes, in a ratio of approximately 2 to 1. In general, when the class sizes are not equal, the model favours the larger class in terms of sensitivity and overall success rate or accuracy, but performs less well regarding specificity. After the low class, the high class shows a better performance than the middle class. Moreover, for the middle class we observe a performance improvement from S1 to S5 in terms of sensitivity, while the other two classes improve in terms of specificity. Overall, these results show that, except for the middle class in S1, all classes at the different stages of process design appear on the left side of the ROC plot, indicating satisfactory model performance. Similarly, the classification trees with five output classes (depicted in Figure S5 in the Supporting Information) show a poor performance for the respective middle classes, namely the middle-low, middle, and middle-high classes. These results also support the decision to use three output classes instead of five, since the higher resolution of the middle classes did not improve the performance of the overall tree or of the specific output classes. The sensitivity and specificity values, as well as the distances to the (0,1) point and to the random line, for the three-output-class trees built from dataset-1 can be found in Table S5 in the Supporting Information.
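Both distances can be computed from the sensitivity and specificity of a class. A sketch assuming the usual geometric definitions (the exact definitions used for Table S5 are not restated here): the Euclidean distance of the ROC point (1 - specificity, sensitivity) from the ideal point (0, 1), and the perpendicular distance from the random-guess diagonal, signed so that positive values are better than random.

```python
import math

def distance_to_perfect(sensitivity, specificity):
    """Euclidean distance of the ROC point (1 - specificity, sensitivity) from (0, 1)."""
    return math.hypot(1 - specificity, 1 - sensitivity)

def distance_to_random_line(sensitivity, specificity):
    """Signed perpendicular distance from the diagonal sensitivity = 1 - specificity.

    Positive values lie above the diagonal, i.e. better than random guessing.
    """
    return (sensitivity - (1 - specificity)) / math.sqrt(2)

# Hypothetical class performance.
sens, spec = 0.75, 0.75
print(distance_to_perfect(sens, spec), distance_to_random_line(sens, spec))
```

A small distance to (0, 1) and a large positive distance to the diagonal both correspond to points in the upper-left region of the ROC plot.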
Table 3 summarizes the most important classification rules for the models at each of the S1 to S5 stages, each rule corresponding to a path in the decision tree (the full classification trees for every design stage are presented in Tables S8-S12 in the Supporting Information). For every output class, the most sensitive rule was selected as the most important one, since sensitivity was found to be the most critical metric, especially for the middle class. For example, Figure 3 shows the classification tree developed for S4, with a total of seven paths corresponding to seven rules. The path leading to the highlighted high-class output corresponds to rule-3 for S4 and can be stated as follows:

IF the reaction type is acylation OR alkylation OR complexation OR condensation OR hydrolysis OR polymerisation OR reduction, AND the operation time is higher than 18 hours, THEN the steam consumption is high.
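Rule-3 for S4 translates directly into a conditional. The sketch below encodes only this single path of the tree; the function name is hypothetical, and it returns `None` when the rule does not fire, in which case another path of the full tree (Figure 3) would determine the class.

```python
# Reaction types listed in rule-3 for S4.
RULE3_REACTION_TYPES = {
    "acylation", "alkylation", "complexation", "condensation",
    "hydrolysis", "polymerisation", "reduction",
}

def rule3_s4(reaction_type, operation_time_h):
    """Return 'high' if rule-3 of the S4 tree fires, otherwise None."""
    if reaction_type.lower() in RULE3_REACTION_TYPES and operation_time_h > 18:
        return "high"
    return None

print(rule3_s4("Condensation", 24))  # rule fires
print(rule3_s4("condensation", 12))  # rule does not fire; another path applies
```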
The performance of this model is presented in Figure 4a by means of its resubstitution performance, where the training data are presented on the x-axis and the predicted classes on the y-axis. In agreement with what was observed in Figure 2, the S4 model performs very well for the low and high classes, and satisfactorily for the middle class. Moreover, combining the information in Figure 4a with that of Table 2, the following can be inferred for the performance of the S4 model:

When the model output is “low steam consumption”, there is an 11% probability that this is an underestimation. When the model output is “medium steam consumption”, there is an 18% probability that this is an underestimation and a 9% probability that this is an overestimation. When the model output is “high steam consumption”, there is a 26% probability that this is an overestimation.

The more “severe” misclassification errors (i.e., high-to-low or low-to-high) have a probability of 4% and 5% in the cases where the model predicts “high” or “low” steam consumption, respectively.
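Probabilities of the kind stated above are obtained by reading the confusion matrix column by column, i.e., conditioning on the predicted class. A minimal sketch with hypothetical counts (they do not reproduce the numbers of Figure 4a) splits each predicted class into correct, underestimated and overestimated shares, plus the “severe” share whose actual class lies at the opposite end of the class scale:

```python
ORDER = ["low", "middle", "high"]  # output classes from lowest to highest steam consumption

def error_given_prediction(confusion, predicted):
    """Shares of the samples predicted as `predicted`, conditioned on that prediction.

    confusion[actual][predicted] holds counts. 'under' = actual class higher than predicted,
    'over' = actual class lower, 'severe' = actual class two or more steps away.
    """
    p = ORDER.index(predicted)
    column = {a: confusion[a][predicted] for a in ORDER}
    total = sum(column.values())
    return {
        "correct": column[predicted] / total,
        "under": sum(column[a] for a in ORDER[p + 1:]) / total,
        "over": sum(column[a] for a in ORDER[:p]) / total,
        "severe": sum(column[a] for a in ORDER if abs(ORDER.index(a) - p) >= 2) / total,
    }

# Hypothetical resubstitution counts: outer keys = actual class, inner keys = predicted class.
confusion = {
    "low":    {"low": 80, "middle": 4, "high": 2},
    "middle": {"low": 6, "middle": 30, "high": 5},
    "high":   {"low": 4, "middle": 6, "high": 40},
}
for cls in ORDER:
    print(cls, {k: round(v, 3) for k, v in error_given_prediction(confusion, cls).items()})
```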
Considering that the main use of these models would be in early stages of process design, for prescreening synthesis options or for a fast estimation (i.e., before detailed process simulations and pilot plant experimentation), these numbers are satisfactory, taking into account also the respective inaccuracies of other life cycle assessment metrics, cost estimations, etc., at these early stages.

Moreover, if the target of prescreening process options is mostly to avoid losing “interesting” process options (i.e., those with low steam consumption), then only 5% of such options would be screened out. However, it should be noted that better performance with respect to the specificity of the medium and high consumption classes could be achieved by increasing the number of data points for these categories in future studies.
In Table 3, the reaction type appears in all rules from S1 to S4, indicating the high importance of this predictor variable. This is also confirmed by the predictor importance plot (Figure S3 in the Supporting Information), where the reaction type has the highest importance among all attributes from S1 to S4. The operation time, which appears in two rules in S3 and S4, also has a high influence, as can also be observed in Figure S3 in the Supporting Information. On the other hand, the rules derived for S5 follow a different pattern. As expected, the distillation steam consumption, the main additional predictor in S5, dominates the classification rules, since it constitutes a very important part of the target attribute. Other differences in the S5 classification trees are that the reaction type is not part of the rules, while the temperature, which did not appear in the S3 and S4 rules, is now included in two of the rules. Again, the same trends can also be observed in the predictor importance plot in Figure S3 in the Supporting Information.
In summary, it is verified that the most sensitive rules of the classification trees include the most important predictor variables.
PDF models
The most important predictor variable for S1 to S4 was shown to be the reaction type, which justifies the decision to construct PDF models on this minimum process information. In addition, the rules extracted from the classification trees helped to further parameterize the PDF models, where necessary. Table 4 shows the empirical median, minimum and maximum steam consumption values of the different datasets corresponding to reaction types (first column) and the further parameterized subsets (second column). It also includes the fitted PDF with the corresponding parameters, its median, first and third quartiles, and 2.5th and 97.5th percentiles. The assessment of the goodness of fit is also given and further discussed in Table S6 in the Supporting Information. The performance of the PDF models when providing an interval estimation of the steam consumption based on the interquartile ranges is depicted in Figure 4b.
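As an illustration of how the quantities listed in Table 4 can be derived from a fitted PDF, the sketch below fits a lognormal distribution by maximum likelihood and reports its median, quartiles, and 2.5th and 97.5th percentiles. The lognormal family is an assumption made here for illustration only; the distribution families actually fitted per reaction type, and the model-selection step between families, are not reproduced. The function names and the sample values are hypothetical, and only the Python standard library is used.

```python
import math
from statistics import NormalDist


def fit_lognormal_mle(values):
    """Maximum-likelihood lognormal fit; returns (mu, sigma) of ln(x).

    The MLE of sigma divides by n (not n - 1). All values must be > 0.
    """
    if any(v <= 0 for v in values):
        raise ValueError("lognormal fit requires strictly positive values")
    logs = [math.log(v) for v in values]
    n = len(logs)
    mu = sum(logs) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in logs) / n)
    return mu, sigma


def lognormal_summary(mu, sigma):
    """Median, quartiles and 2.5th/97.5th percentiles of LogNormal(mu, sigma)."""
    std_normal = NormalDist()

    def quantile(p):
        # Lognormal quantile = exp(mu + sigma * standard-normal quantile).
        return math.exp(mu + sigma * std_normal.inv_cdf(p))

    return {
        "p2.5": quantile(0.025),
        "q1": quantile(0.25),
        "median": math.exp(mu),
        "q3": quantile(0.75),
        "p97.5": quantile(0.975),
    }


# Hypothetical steam consumption values (kg steam per kg product) for one reaction type.
sample = [0.4, 0.7, 0.9, 1.1, 1.3, 2.2, 3.5]
mu, sigma = fit_lognormal_mle(sample)
print({k: round(v, 2) for k, v in lognormal_summary(mu, sigma).items()})
```

Because every quantile comes from the fitted distribution rather than from the raw sample, the reported interquartile interval reflects the parametric model, which is what makes later Monte Carlo sampling from the same PDF consistent with the interval estimates.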
The resubstitution performance is similar for most reaction types (60% true positives on average), except for the polymerisation and elimination reactions, which perform slightly worse (40% true positives), and the reduction reaction, which performs slightly better (70% true positives) than the rest. This indicates that, for the polymerisation and elimination reactions, the hypothesis that steam consumption can be well predicted from the reaction type is not verified. In general, although the predictive performance of the PDF models is inferior to that of the classification trees with additional process-related predictor variables, they can still provide useful information for an interval estimation based on the interquartile ranges. Additionally, the PDF models can serve as benchmarks for chemical reactions performed in industrial operations, according to their position in the distribution of the corresponding reaction type family. Furthermore, the PDF models allow for a more rigorous uncertainty analysis than the interval estimations, by sampling from the respective distributions.
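One way to read the interval performance in Figure 4b is as a hit rate: a record counts as a true positive when its observed steam consumption falls inside the interquartile interval predicted for its reaction type. This reading, the function names and the numbers below are assumptions for illustration; the sketch simply groups records by reaction type and counts the hits per group.

```python
from collections import defaultdict


def interval_hit_rate(observed, lower, upper):
    """Share of observed values inside the closed interval [lower, upper]."""
    if not observed:
        raise ValueError("need at least one observed value")
    hits = sum(1 for y in observed if lower <= y <= upper)
    return hits / len(observed)


def hit_rates_by_type(records, intervals):
    """records: iterable of (reaction_type, observed kg/kg);
    intervals: reaction_type -> (q1, q3) predicted by that type's PDF model."""
    grouped = defaultdict(list)
    for reaction_type, value in records:
        grouped[reaction_type].append(value)
    return {
        reaction_type: interval_hit_rate(values, *intervals[reaction_type])
        for reaction_type, values in grouped.items()
    }


# Hypothetical records and interquartile intervals, for illustration only.
RECORDS = [("hydrolysis", 1.0), ("hydrolysis", 3.0), ("reduction", 0.6), ("reduction", 0.8)]
INTERVALS = {"hydrolysis": (0.8, 2.0), "reduction": (0.5, 1.0)}
print(hit_rates_by_type(RECORDS, INTERVALS))
```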
In the cases where further parameterisation was performed for the PDF models (i.e., for 5 of the 13 reaction models), the original dataset for a reaction type was partitioned into two subsets. In all cases, the partition results in one lower and one higher interquartile interval. These interquartile intervals have the same or smaller width than the original interquartile intervals and overlap substantially with them. Only for the polymerisation reaction was no overlap observed, because of the gap of values in the empirical distribution of this reaction group. As a result of the parameterisation, the PDF models for alkylation, condensation and polymerisation maintain the same percentage of true positives with narrower interval predictions. The only exception is the family of acylation reactions, where the narrower intervals are combined with a decrease in the number of true positives (see Table S7 in the Supporting Information); thus, in this case, a comparison between the parameterized and the parent PDF models is not straightforward. Overall, these results support the increased resolution of the models as a result of the further parameterisation. Continuing the parameterization using additional predictor variables could theoretically improve the PDF model performance; however, the available amount of data could not support such a parameterization with statistically significant inference.
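The partitioning step can be sketched as follows: split the records of one reaction type with a rule-derived condition and compare the empirical interquartile intervals of the two subsets with that of the parent dataset. The 18 h operation-time threshold is borrowed from the S3/S4 rules purely for illustration; which predictor was used to split each of the five parameterized reaction types is not stated here, and the function names and data are hypothetical. Empirical quartiles are used for brevity instead of fitted PDFs.

```python
from statistics import quantiles


def interquartile_interval(values):
    """(Q1, Q3) using linear interpolation between order statistics."""
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    return q1, q3


def split_by_condition(records, condition):
    """Split (predictor_value, steam) pairs into (condition true, condition false) steam lists."""
    true_part = [steam for x, steam in records if condition(x)]
    false_part = [steam for x, steam in records if not condition(x)]
    return true_part, false_part


# Hypothetical (operation time in h, steam in kg/kg) pairs for one reaction type.
TIME_STEAM = [
    (5, 0.4), (8, 0.6), (10, 0.8), (12, 1.0), (15, 1.2),
    (20, 2.0), (24, 2.5), (30, 3.0), (36, 3.5), (40, 4.0),
]

parent = interquartile_interval([steam for _, steam in TIME_STEAM])
short_runs, long_runs = split_by_condition(TIME_STEAM, lambda hours: hours <= 18)
lower = interquartile_interval(short_runs)
upper = interquartile_interval(long_runs)

for name, (q1, q3) in [("parent", parent), ("time <= 18 h", lower), ("time > 18 h", upper)]:
    print(f"{name:>13}: [{q1:.3f}, {q3:.3f}], width {q3 - q1:.3f}")
```

In this constructed example both subsets yield narrower intervals, one lower and one higher than the parent; with real data neither property is guaranteed, which is why the text checks the true-positive rate before accepting a split.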
Therefore, for prediction purposes, higher order classification trees should be preferred, assuming that the values of the respective predictor variables are available. On the other hand, when the input information of the classification trees is comparable to that of the PDF models, as for instance in the case of the S1 classification trees, a lower prediction performance than that of the S4 classification tree is expected (Figure S6 in the Supporting Information), which is much closer to the level of the PDF models.
Case study results
Both types of models were applied to the additional case study dataset, and the results are presented in Figure 5. The 30% error limit is considered to be acceptable for shortcut models at early design stages (refs. 40-41). The top of Figure 5 (a) shows the respective resubstitution performance of the models (training set) and the bottom (b) the performance on the external case study dataset (not used for training). In both cases, we observe that the PDF models and the S1 classification tree perform similarly, within a difference of 13% (i.e., regarding the respective sections of the bars). The rest of the trends are also similar between (a) and (b), although the performance of the models is generally inferior on the external dataset. This may be because not all steam consumption target values in the external dataset were derived from standard operating procedure documents, which were always available for the training set. However, in both cases, more than 80% of the predictions were not underestimated by more than 30%. This is an additional positive feature regarding the robustness of the model, in terms of safeguarding the predictions from severe underestimation of steam consumption.
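For point predictions, the 30% criterion can be checked as the share of cases whose relative error, taken with respect to the observed value, is not below -30%. This is one plausible reading of the criterion, assumed here rather than taken from the paper; the function name and the sample values are hypothetical, and observed values must be positive.

```python
def share_not_underestimated(predicted, observed, tolerance=0.30):
    """Share of cases whose relative error (predicted - observed) / observed
    is not below -tolerance, i.e. not underestimated by more than tolerance."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not predicted:
        raise ValueError("no predictions to evaluate")
    if any(y <= 0 for y in observed):
        raise ValueError("observed values must be positive")
    ok = sum(1 for p, y in zip(predicted, observed) if (p - y) / y >= -tolerance)
    return ok / len(predicted)


# Hypothetical point predictions and observations (kg steam per kg product).
predicted = [1.0, 0.6, 2.0, 0.75]
observed = [1.0, 1.0, 1.5, 1.0]
print(share_not_underestimated(predicted, observed))
```

Overestimates of any size pass this check by design, which matches the emphasis in the text on guarding against underestimation rather than on symmetric accuracy.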
To summarize the results presented in this section, for a first approximate estimation of steam consumption and for fast screening purposes, the PDF models or the S1 trees can be used. Higher order classification trees (S4-S5) provide more satisfactory steam consumption estimations and can also be used to reveal key process features that largely determine the level of energy consumption (high, middle, low).
Demonstration of the PDF models in a case study for the production of 4-(2-methoxyethyl)-phenol
The PDF models were applied to a previously published case study (Albrecht et al., 2010) for the production of 4-(2-methoxyethyl)-phenol via seven different synthesis routes (Figure 6). The procedure of estimating the steam consumption of the different synthesis routes using the PDF models is demonstrated in detail, and the resulting ranking of the alternatives is compared to the one derived from the proxy indicators of Albrecht et al. (2010). Only the PDF models are used here, in order to demonstrate what one should expect when the only information available is the chemical synthesis path. Table 5 lists the reaction types at every step of the seven synthesis routes. Of the 30 reactions, 13 belong exactly to reactions considered in the dataset used to develop the PDF statistical models. Seven reaction steps cannot be matched exactly to any of the reactions in the dataset but belong to one of the reaction categories (e.g., the O-alkylation reaction with dimethyl sulfate as reactant in step A-3). In such a case, to fill the data gap, the steam consumption is estimated using the PDF model of the corresponding category (here, alkylation). The remaining ten reactions could not be assigned to any of the reactions included in the training dataset (e.g., the nitration in step A-1). In these cases, a default value of 1.2 kg per kg of product, calculated as the average of the median values of all PDF models, was used to fill the data gaps.
The steam consumption for a complete synthesis route can be estimated by adding the median values derived from the PDF models for every reaction step. The model values, given in kilograms of steam per kilogram of product, were multiplied by the corresponding reaction yield values, assuming stoichiometric ratios.
As can be seen in Figures 7a and 7b, route E represents the best alternative according to the PDF models for steam consumption, the Mass Loss Index (MLI) and the Energy Loss Index (ELI) defined in the work of Albrecht et al. (2010). In this case, the MLI is defined as the sum of the mass ratios of all coupled products and by-products to the intermediate or end product. Other input materials into the system, such as solvents and auxiliaries, are not considered in this definition of the MLI, since this information is not available at the earliest design stages. The ELI proxy indicator is calculated on the basis of four parameters: the concentration of water at the reactor outlet, the difference in boiling point temperature between the product and the substance whose boiling point is closest to that of the product, the MLI value of each reaction step, and the reaction energy. All these parameters are first scaled according to empirical criteria and then weighted and aggregated to give the ELI value. The rankings given by the PDF models and the ELI indicator are generally in agreement, with the exception of routes C and D. Overall, the ranking according to the ELI indicator is more similar to the PDF model predictions than the ranking according to the MLI. This is consistent with the fact that an index based only on the reaction mass yield does not necessarily correlate with energy consumption. A fairly good correlation between a mass index, namely the Process Mass Intensity (PMI), and an energy-related indicator, such as the cradle-to-gate Global Warming Potential (GWP), was observed in the work of Jimenez-Gonzalez et al. (2011). However, in that case, the mass index (PMI) includes the total mass of materials per mass of product (e.g., reactants, reagents, solvents used for reaction and separation, and catalysts), which makes it a more robust, but also a more data-intensive, indicator than the MLI as defined by Albrecht et al. (2010).
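The route-level point estimate described above can be sketched in a few lines: each step contributes the median of its reaction-type PDF model multiplied by the step's reaction yield, following the wording of the text literally, and steps without a matching model use a default equal to the average of all model medians (1.2 kg/kg in the paper). Reactions that only match a reaction category are assumed to have already been mapped to that category's type name before the call. The function names, medians, yields and reaction names below are hypothetical.

```python
def default_from_medians(medians):
    """Default step value: average of the medians of all available PDF models."""
    return sum(medians.values()) / len(medians)


def route_point_estimate(steps, medians, default=None):
    """Point estimate of route steam consumption (kg steam per kg product).

    steps: list of (reaction_type, reaction_yield) with the yield as a fraction.
    Each step contributes its PDF-model median multiplied by its yield;
    steps whose type has no PDF model use `default`.
    """
    if default is None:
        default = default_from_medians(medians)
    return sum(medians.get(reaction_type, default) * reaction_yield
               for reaction_type, reaction_yield in steps)


# Hypothetical medians and route; "nitration" has no PDF model here.
MEDIANS = {"alkylation": 0.6, "hydrolysis": 1.8, "reduction": 0.9}
ROUTE = [("alkylation", 0.90), ("nitration", 0.80), ("reduction", 0.95)]
print(round(route_point_estimate(ROUTE, MEDIANS), 3))
```

Passing `default=1.2` reproduces the paper's fixed default instead of recomputing it from the (here hypothetical) medians.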
To evaluate the uncertainty of the ranking results, a Monte Carlo simulation was carried out. For each reaction step of the synthesis routes, values of steam consumption were repeatedly sampled from the PDF models. The data sample for each synthesis route was obtained by adding the sample values of all reaction steps within the route. Then, standard ANOVA tests were performed to analyze the differences between the means of these samples. For the reaction steps without a corresponding PDF model, a uniform distribution was used to generate the sample. Its minimum and maximum values were set to 0 and 2.4 kg per kg of product, respectively, so that the mean of the distribution equals the default value of 1.2 kg per kg of product.
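A minimal Monte Carlo sketch of this procedure: in every iteration, each step of a route draws one value from its reaction-type distribution, steps without a PDF model draw from Uniform(0, 2.4), whose mean (0 + 2.4)/2 = 1.2 kg/kg equals the default value, and the step values are summed into one route sample. A lognormal family is assumed for the PDF models purely for illustration; the distribution parameters, route compositions and names are hypothetical, and yield weighting is left out because the text does not state whether it was applied during sampling.

```python
import random
import statistics

# Hypothetical lognormal parameters (mu, sigma of ln x) per reaction type.
PDF_PARAMS = {
    "alkylation": (-0.3, 0.6),
    "reduction": (0.1, 0.8),
}
UNIFORM_LOW, UNIFORM_HIGH = 0.0, 2.4  # mean 1.2 kg/kg, the default value


def sample_step(reaction_type, rng):
    """Draw one steam value (kg/kg product) for a single reaction step."""
    params = PDF_PARAMS.get(reaction_type)
    if params is None:
        # No PDF model for this reaction: fall back to the uniform distribution.
        return rng.uniform(UNIFORM_LOW, UNIFORM_HIGH)
    mu, sigma = params
    return rng.lognormvariate(mu, sigma)


def sample_route(steps, n_samples, rng):
    """Monte Carlo sample of route totals: one draw per step, summed, repeated."""
    return [sum(sample_step(step, rng) for step in steps) for _ in range(n_samples)]


rng = random.Random(42)  # fixed seed for reproducibility
routes = {
    "route_X": ["alkylation", "nitration", "reduction"],  # hypothetical
    "route_Y": ["reduction", "reduction"],                 # hypothetical
}
for name, steps in routes.items():
    totals = sample_route(steps, 10_000, rng)
    q1, med, q3 = statistics.quantiles(totals, n=4)
    print(f"{name}: median={med:.2f}, IQR=[{q1:.2f}, {q3:.2f}]")
```

The resulting per-route samples are exactly what a boxplot such as Figure 8 summarizes, and they can be fed directly into an ANOVA.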
A boxplot diagram of the different reaction route samples is shown in Figure 8. Two different groups of synthesis routes can be observed, namely E, G, D and A, with lower median values and interval ranges, and F, B and C, with higher median values and interval ranges. The results of the ANOVA test (not presented here) showed that it is possible to further discriminate among some routes within these two groups. Route C was found to be significantly different from B and F, route E from route G, and route A from route D.
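For reference, the one-way ANOVA F statistic behind such comparisons can be computed from the route samples as the ratio of between-group to within-group mean squares. The sketch below is a standard-library illustration with a hypothetical function name; if SciPy is available, `scipy.stats.f_oneway(*samples)` returns both the F statistic and the corresponding p-value. How the pairwise discriminations reported above were obtained (pairwise ANOVAs or a post-hoc test) is not stated in the text, so applying the function to two routes at a time is only one possible approach.

```python
from statistics import fmean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA on the given samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more observations than groups")
    means = [fmean(g) for g in groups]
    grand_mean = fmean([x for g in groups for x in g])
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


print(one_way_anova([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]))
```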
The results of this case study show the applicability of the PDF models (considering median values or intervals) for a fast ranking of alternative chemical synthesis routes with respect to their energy consumption at early stages of process design. The PDF models provide a ranking similar to that obtained with a more complex and process-data-intensive indicator such as the ELI. Obviously, the PDF models cannot be a “perfect” match to the ELI index, which is relatively richer in information. Moreover, in this specific case study, the prediction for route D (one of the two routes not correctly ranked) is somewhat problematic, because half of the reactions in its synthesis path do not match any of the reactions for which PDF models were developed, and their steam consumption is therefore estimated from default values.
Finally, one should consider the proper way of using the PDF-based ranking (i.e., not only the one based on the point estimations but also the Monte Carlo and ANOVA analysis), namely to prioritize the most promising routes for the next level of more detailed analysis. For instance, looking at the ELI-based ranking in Figure 8, one would prioritize routes E and G from an energy consumption perspective, followed by C and A, while F, B and D would most probably be discarded from the next analysis steps. Based on the point estimations of the PDF models, one would also clearly prioritize E and G, followed by D and A. Knowing the aforementioned data gaps in the PDF models for route D, one would probably still consider it for the next stage of analysis and try to fill the data gaps by process simulations, etc. The fact that route D would also pass to the next stage of detailed process simulation (or perhaps to ELI or other index-based prioritization) should not be considered a drawback of the PDF models, which in principle aim not to lose any major opportunity at this stage, while still reducing the number of options for the next stage of analysis through a relatively simple calculation procedure.
CONCLUSIONS
We have proposed shortcut models of steam consumption for production processes in chemical batch plants, in the form of generic intervals based on the modelling and statistical analysis of industrial production data. The developed models take the form of probability density functions (PDFs) and classification trees and can be used at different levels of information about the chemical process under investigation, the minimum being the reaction type. The resubstitution performance of the PDF models indicates that, for most reaction types except polymerisation and elimination, the interquartile ranges can provide satisfactory interval estimations when the reaction type is the only available process information. This can be of particular interest when a first screening of various chemical synthesis paths is required before proceeding to experimentation or laborious process design calculations, in order to eliminate some of the process options if statistically significant differences in the estimated steam consumption are detected. Additionally, the generalisation capability of the PDF models and the classification trees was validated in a case study. On average, more than 80% of the predictions were not underestimated by more than 30%, which is a satisfactory performance for shortcut models at early design stages.
The new shortcut steam models developed in this work represent a potentially useful tool for estimating the steam consumption of production processes when limited process information is available or when a large number of processes have to be screened in a short time. Although the PDF models allow for reasonable predictions of steam consumption, their most interesting applications will be to benchmark chemical reaction types and to facilitate rigorous uncertainty analysis. Thus, the shortcut models presented here are especially suitable for applications in process design and for streamlining LCA studies.
ACKNOWLEDGEMENTS
We thank Professor Stefanie Hellweg, the Swiss Federal Office for the Environment, the Swiss Federal Office for Energy, and our industrial partners for their support in this project.
REFERENCES
1. Wernet, G.; Mutel, C.; Hellweg, S.; Hungerbuehler, K., The Environmental Importance of Energy Use in Chemical Production. J. Ind. Ecol. 2011, 15 (1), 96-107, doi 10.1111/j.1530-9290.2010.00294.x
2. Jenck, J. F.; Agterberg, F.; Droescher, M. J., Products and processes for a sustainable chemical industry: a review of achievements and prospects. Green Chem. 2004, 6 (11), 544-556, doi 10.1039/b406854h
3. Albrecht, T.; Papadokonstantakis, S.; Sugiyama, H.; Hungerbühler, K., Demonstrating multi-objective screening of chemical batch process alternatives during early design phases. Chem. Eng. Res. Des. 2010, 88 (5-6A), 529-550, doi 10.1016/j.cherd.2009.11.009
4. Cano-Ruiz, J. A.; McRae, G. J., Environmentally conscious chemical process design. Annu. Rev. Energy Env. 1998, 23, 499-536, doi 10.1146/annurev.energy.23.1.499
5. Chen, H.; Shonnard, D. R., Systematic framework for environmentally conscious chemical process design: Early and detailed design stages. Ind. Eng. Chem. Res. 2004, 43 (2), 535-552, doi 10.1021/ie0304356
6. Ruiz-Mercado, G. J.; Smith, R. L.; Gonzalez, M. A., Sustainability indicators for chemical processes: I. Taxonomy. Ind. Eng. Chem. Res. 2012, 51 (5), 2309-2328, doi 10.1021/ie102116e
7. Sugiyama, H.; Fischer, U.; Hungerbuhler, K.; Hirao, M., Decision framework for chemical process design including different stages of environmental, health, and safety assessment. AIChE J. 2008, 54 (4), 1037-1053, doi 10.1002/aic.11430
8. Wernet, G.; Papadokonstantakis, S.; Hellweg, S.; Hungerbuhler, K., Bridging data gaps in environmental assessments: Modeling impacts of fine and basic chemical production. Green Chem. 2009, 11 (11), 1826-1831, doi 10.1039/b905558d
9. Kim, S.; Overcash, M., Energy in chemical manufacturing processes: gate-to-gate information for life cycle assessment. J. Chem. Technol. Biotechnol. 2003, 78 (9), 995-1005, doi 10.1002/jctb.821
10. Kniel, G. E.; Delmarco, K.; Petrie, J. G., Life cycle assessment applied to process design: Environmental and economic analysis and optimization of a nitric acid plant. Environ. Prog. 1996, 15, 221-228, doi 10.1002/ep.670150410
11. Bretz, R.; Frankhauser, P., Screening LCA for large numbers of products. Int. J. Life Cycle Assess. 1996, 1 (3), 139-146, doi 10.1007/BF02978941
12. Van der Vorst, G.; Van Langenhove, H.; De Paep, F.; Aelterman, W.; Dingenen, J.; Dewulf, J., Exergetic life cycle analysis for the selection of chromatographic separation processes in the pharmaceutical industry: preparative HPLC versus preparative SFC. Green Chem. 2009, 11 (7), 1007-1012, doi 10.1039/b901151j
13. Romero-Hernandez, O., To treat or not to treat? Applying chemical engineering tools and a life cycle approach to assessing the level of sustainability of a clean-up technology. Green Chem. 2004, 6 (8), 395-400, doi 10.1039/b401871k
14. Hellweg, S.; Fischer, U.; Scheringer, M.; Hungerbuhler, K., Environmental assessment of chemicals: methods and application to a case study of organic solvents. Green Chem. 2004, 6 (8), 418-427, doi 10.1039/b402807b
15. Vaklieva-Bancheva, N.; Ivanov, B. B.; Shah, N.; Pantelides, C. C., Heat exchanger network design for multipurpose batch plants. Comput. Chem. Eng. 1996, 20 (8), 989-1001, doi 10.1016/0098-1354(95)00217-0
16. Phillips, C. H.; Lauschke, G.; Peerhossaini, H., Intensification of batch chemical processes by using integrated chemical reactor-heat exchangers. Appl. Therm. Eng. 1997, 17 (8-10), 809-824, doi 10.1016/s1359-4311(96)00061-0
17. Oppenheimer, O.; Sorensen, E., Comparative energy consumption in batch and continuous distillation. Comput. Chem. Eng. 1997, 21, 529-534, doi 10.1016/s0098-1354(97)87556-4
18. Smith, R., Chemical process design. McGraw Hill: New York, 1995.
19. Shenoy, U. V., Heat exchanger network synthesis: process optimization by energy and resource analysis. Gulf Publishing Co.: Houston, 1995.
20. Linnhoff, B., Pinch analysis - A state-of-the-art overview. Chem. Eng. Res. Des. 1993, 71 (A5), 503-522
21. Saling, P.; Kicherer, A.; Dittrich-Krämer, B.; Wittlinger, R.; Zombik, W.; Schmidt, I.; Schrott, W.; Schmidt, S., Eco-efficiency analysis by BASF: The method. Int. J. Life Cycle Assess. 2002, 1-16, doi 10.1065/lca2002.06.083.1
22. Jiménez-González, C.; Constable, D. J. C.; Curzons, A. D.; Cunningham, V. L., Developing GSK’s green technology guidance: methodology for case-scenario comparison of technologies. Clean Techn. Environ. Policy 2002, 4, 44-53, doi 10.1007/s10098-001-0134-7
23. Bieler, P. S.; Fischer, U.; Hungerbuhler, K., Modeling the energy consumption of chemical batch plants: Bottom-up approach. Ind. Eng. Chem. Res. 2004, 43 (24), 7785-7795, doi 10.1021/ie049641j
24. Szijjarto, A.; Papadokonstantakis, S.; Fischer, U.; Hungerbühler, K., Bottom-up modeling of the steam consumption in multipurpose chemical batch plants focusing on identification of the optimization potential. Ind. Eng. Chem. Res. 2008, 47, 7323-7334, doi 10.1021/ie071291o
25. Pereira, C.; Papadokonstantakis, S.; Rerat, C.; Hungerbühler, K., Industrial documentation-based approach for modeling the process steam consumption in chemical batch plants. Ind. Eng. Chem. Res. 2013, 52 (44), 15635-15647, doi 10.1021/ie401198w
26. Mueller, K. G.; Lamperth, M. U.; Kimura, F., Parameterised inventories for life cycle assessment - Systematically relating design parameters to the life cycle inventory. Int. J. Life Cycle Assess. 2004, 9 (4), 227-235, doi 10.1065/lca2004.03.147
27. Cooper, J.; Godwin, C.; Hall, E. S., Modeling process and material alternatives in life cycle assessments. Int. J. Life Cycle Assess. 2008, 13 (2), 115-123, doi 10.1065/lca2007.06.341
28. Canter, K. G.; Kennedy, D. J.; Montgomery, D. C.; Keats, J. B.; Carlyle, W. M., Screening stochastic life cycle assessment inventory models. Int. J. Life Cycle Assess. 2002, 7 (1), 18-26, doi 10.1065/lca2001.08.063
29. Cosmi, C.; Loperte, S.; Macchiato, M.; Pietrapertosa, F.; Ragosta, M.; Salvia, M., Life cycle assessment and multivariate data analysis for an integrated characterisation of the technologies for electric energy production. Adv. Air Pollut. Ser. 2004, 14, 67-75
30. Curzons, A. D.; Jimenez-Gonzalez, C.; Duncan, A. L.; Constable, D. J. C.; Cunningham, V. L., Fast life cycle assessment of synthetic chemistry (FLASC™) tool. Int. J. Life Cycle Assess. 2007, 12 (4), 272-280, doi 10.1065/lca2007.03.315
31. Hau, J. L.; Yi, H. S.; Bakshi, B. R., Enhancing life-cycle inventories via reconciliation with the laws of thermodynamics. J. Ind. Ecol. 2007, 11 (4), 5-25, doi 10.1162/jiec.2007.1165
32. Maurice, B.; Frischknecht, R.; Coelho-Schwirtz, V.; Hungerbühler, K., Uncertainty analysis in life cycle inventory. Application to the production of electricity with French coal power plants. J. Clean. Prod. 2000, 8, 95-108, doi 10.1016/S0959-6526(99)00324-8
33. Myung, I. J., Tutorial on maximum likelihood estimation. J. Math. Psychol. 2003, 47 (1), 90-100, doi 10.1016/s0022-2496(02)00028-7
34. Akaike, H., New look at statistical-model identification. IEEE T. Automat. Contr. 1974, AC19 (6), 716-723, doi 10.1109/tac.1974.1100705
35. Frischknecht, R.; Jungbluth, N.; Althaus, H. J.; Doka, G.; Dones, R.; Heck, T.; Hellweg, S.; Hischier, R.; Nemecek, T.; Rebitzer, G.; Spielmann, M., The ecoinvent database: Overview and methodological framework. Int. J. Life Cycle Assess. 2005, 10 (1), 3-9, doi 10.1065/lca2004.10.181.1
36. Capello, C.; Hellweg, S.; Badertscher, B.; Betschart, H.; Hungerbuhler, K., Part 1: The ecosolvent tool - Environmental assessment of waste-solvent treatment options. J. Ind. Ecol. 2007, 11 (4), 26-38, doi 10.1162/jiec.2007.1231
37. Witten, I.; Frank, E., Data mining. 2nd ed.; Morgan Kaufmann Publishers: Massachusetts, 2005.
38. Breiman, L.; Friedman, J. H.; Olshen, R. A.; Stone, C. J., Classification and regression trees. Chapman and Hall: New York, 1984.
39. Youden, W. J., Index for rating diagnostic tests. Biometrics 1950, 6 (2), 172-173, doi 10.2307/3001825
40. Bumann, A. A.; Papadokonstantakis, S.; Sugiyama, H.; Fischer, U.; Hungerbuehler, K., Evaluation and analysis of a proxy indicator for the estimation of gate-to-gate energy consumption in the early process design phases: The case of organic solvent production. Energy 2010, 35 (6), 2407-2418, doi 10.1016/j.energy.2010.02.023
41. Turton, R.; Bailie, R. C.; Whiting, W. B.; Shaeiwitz, J. A., Analysis, synthesis, and design of chemical processes. Prentice Hall: New Jersey, 1998.
SUPPORTING INFORMATION
Methodology followed for chemical reaction classification
Statistics of reaction yields in the dataset
Description of predictor variables, and discretization of steam consumption
Further insights into model selection and predictor importance
Assessment of the “goodness of fit” for the PDF models
Detailed description of the classification trees in the form of IF-THEN rules
TABLES
Table 1: Definition of the predictor variables in dataset-1.

| Stage | Total* | Predictor variable | Type | Description |
|---|---|---|---|---|
| S1 | 2 | reaction type | categorical | Reaction type defined according to Figure S1 |
| | | mechanism | categorical | Reaction mechanism defined according to Table S1 |
| S2 | 4 | distillation | binary | Indicates the presence or absence of distillation processes during the reaction work-up; refers to simple evaporation or distillation under reflux conditions (yes/no) |
| | | reflux | binary | Indicates the presence or absence of reflux conditions during the reaction synthesis or the reaction work-up (yes/no) |
| S3 | 6 | Tmax | continuous | Maximal operation temperature in °C |
| | | time | continuous | Total time in hours required for heating the reaction mixture, solvent evaporation, and keeping the temperature constant above atmospheric temperature (under reflux conditions or not) during the reaction synthesis and work-up processes within the defined system boundary |
| S4 | 7 | PMI | continuous | Process Mass Intensity¹ |
| S5 | 8 | Steamdist | continuous | Steam consumption during distillation processes |

*The total number of predictor variables per design stage is cumulative, meaning that at a certain stage i the variables appearing at the previous stages are also present at stage i.
¹ $\mathrm{PMI} = m_{\mathrm{total}} / m_{\mathrm{product}}$, where $m_{\mathrm{total}}$ is the total input mass of raw materials and $m_{\mathrm{product}}$ is the mass of product.
Table 2: Discretized intervals of steam consumption (target values) considering three output classes. The values are given in kilograms of steam consumption per kilogram of product.

| Class label | Interval (kg steam/kg product) | Number of data points |
|---|---|---|
| High | 3-16 | 61 |
| Middle | 1-3 | 51 |
| Low | 0-1 | 122 |
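Applying the Table 2 discretization to a steam consumption value is a simple threshold lookup. In the sketch below, treating each lower bound as inclusive (so exactly 1 kg/kg is "middle" and exactly 3 kg/kg is "high") is an assumption, since the table does not state how the boundaries are handled; the function and constant names are hypothetical.

```python
# Class boundaries from Table 2 (kg steam per kg product). Treating each lower
# bound as inclusive is an assumption; the table does not state it.
LOW_UPPER = 1.0
MIDDLE_UPPER = 3.0


def steam_class(kg_steam_per_kg_product):
    """Map a steam consumption value to the Table 2 class label."""
    if kg_steam_per_kg_product < 0:
        raise ValueError("steam consumption cannot be negative")
    if kg_steam_per_kg_product < LOW_UPPER:
        return "low"
    if kg_steam_per_kg_product < MIDDLE_UPPER:
        return "middle"
    return "high"  # Table 2 lists 3-16; values above 16 are also mapped to "high"


print([steam_class(v) for v in (0.4, 1.0, 2.9, 3.0, 12.0)])
```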
Table 3: Most important rules of the classification trees at the five stages of process design (S1 to S5).
Class*
Steamdist
PMI
Tmax
Time
Distillation
acylation (cyanur chloride), azo-coupling, diazotisation,
Reflux
Reaction
1
***
Rule
S1
Mechanism
Stage
low
elimination, halogenation, sulfonation 2
acylation,
alkylation,
complexation,
condensation, HC,SN1,SN2,SNAr
high
hydrolysis, polymerisation, reduction S2
1
acylation (cyanur chloride), azo-coupling, diazotisation,
low
elimination, halogenation, sulfonation 2
acylation,
alkylation,
complexation,
condensation, HC,SNAr
no
middle
condensation,
yes
high
hydrolysis, polymerisation, reduction 3
acylation,
alkylation,
complexation,
hydrolysis, polymerisation, reduction S3
1
acylation (cyanur chloride), azo-coupling, diazotisation,
low
elimination, halogenation, sulfonation 2
acylation,
alkylation,
complexation,
condensation, HC,SN2,SNAr
no
18
high
hydrolysis, polymerisation, reduction 3
acylation,
alkylation,
complexation,
condensation,
hydrolysis, polymerisation, reduction
S4
1
acylation (cyanur chloride), azo-coupling, diazotisation,
low
elimination, halogenation, sulfonation 2
acylation,
alkylation,
complexation,
condensation, HC,SN2,SNAr
no
18
high
hydrolysis, polymerisation, reduction 3
acylation,
alkylation,
complexation,
condensation,
hydrolysis, polymerisation, reduction S5
1
1.5
high
3
In this table each row corresponds to one rule, each column from the third onwards to a predictor variable, and the last column to the output class. Grey areas indicate that a predictor variable is not available at the corresponding design stage. For the categorical predictor variables, reaction and mechanism, the listed categories are combined with the logical operator “OR”, while different predictors are combined with the logical operator “AND”. For example, the second rule of S1 reads: IF the reaction type is acylation OR alkylation OR complexation OR condensation OR hydrolysis OR polymerisation OR reduction, AND the mechanism is HC OR SN1 OR SN2 OR SNAr, THEN the steam consumption is high. Reaction mechanisms are included only where at least one of the reaction types can proceed via more than one mechanism, so the mechanism serves as additional information to the reaction type. This choice is consistent with the predictor importance shown in Figure S3 in the Supporting Information, which indicates a higher relevance of the reaction type than of the mechanism at all five stages of process design.
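The OR-within-a-predictor, AND-across-predictors logic described above can be sketched as a small rule evaluator. The `Rule` class and `classify` function are hypothetical; only the two S1 rules are encoded, with rule 2 taken from the worked example in the text and rule 1 from the reaction types listed in the first S1 row. Because the rules of a classification tree are mutually exclusive, returning the first match is a convenience rather than a priority ordering.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional


@dataclass
class Rule:
    """One classification rule (hypothetical representation of a Table 3 row)."""
    conditions: Dict[str, FrozenSet[str]]  # predictor -> allowed categories (OR)
    outcome: str

    def matches(self, record: Dict[str, str]) -> bool:
        # AND across predictors; a missing predictor never matches.
        return all(record.get(name) in allowed
                   for name, allowed in self.conditions.items())


def classify(record: Dict[str, str], rules: List[Rule]) -> Optional[str]:
    """Return the outcome of the first matching rule, or None if no rule applies."""
    for rule in rules:
        if rule.matches(record):
            return rule.outcome
    return None


S1_RULES = [
    Rule({"reaction": frozenset({"acylation (cyanur chloride)", "azo-coupling",
                                 "diazotisation", "elimination", "halogenation",
                                 "sulfonation"})}, "low"),
    Rule({"reaction": frozenset({"acylation", "alkylation", "complexation",
                                 "condensation", "hydrolysis", "polymerisation",
                                 "reduction"}),
          "mechanism": frozenset({"HC", "SN1", "SN2", "SNAr"})}, "high"),
]

print(classify({"reaction": "alkylation", "mechanism": "SN2"}, S1_RULES))  # high
print(classify({"reaction": "sulfonation"}, S1_RULES))                     # low
print(classify({"reaction": "alkylation"}, S1_RULES))                      # None
```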
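In Table 4, the fitted percentiles follow from the fitted parameters p1 and p2 through each distribution's quantile function. The sketch below recomputes them for the closed-form cases. The parameter conventions are an assumption inferred by matching the reported percentiles: lognormal p1 = mean and p2 = standard deviation of the log, Weibull p1 = scale and p2 = shape, exponential p1 = rate. Gamma rows are omitted because their quantile needs the inverse incomplete gamma function (for example `scipy.stats.gamma.ppf`). All function names are hypothetical.

```python
import math
from statistics import NormalDist

# Order of the fitted values in Table 4: median, 25th, 75th, 2.5th, 97.5th percentile.
PERCENTILES = (0.5, 0.25, 0.75, 0.025, 0.975)


def lognormal_quantile(q: float, mu: float, sigma: float) -> float:
    # Q(q) = exp(mu + sigma * z_q), z_q = standard normal quantile
    return math.exp(mu + sigma * NormalDist().inv_cdf(q))


def weibull_quantile(q: float, scale: float, shape: float) -> float:
    # Q(q) = scale * (-ln(1 - q)) ** (1 / shape)
    return scale * (-math.log(1.0 - q)) ** (1.0 / shape)


def exponential_quantile(q: float, rate: float) -> float:
    # Q(q) = -ln(1 - q) / rate
    return -math.log(1.0 - q) / rate


def fitted_values(quantile_fn, *params):
    """Percentiles in Table 4 order, rounded to one decimal like the table."""
    return [round(quantile_fn(q, *params), 1) for q in PERCENTILES]


print(fitted_values(weibull_quantile, 4.9, 1.5))      # [3.8, 2.1, 6.1, 0.4, 11.7]
print(fitted_values(lognormal_quantile, -4.1, 2.9))   # [0.0, 0.0, 0.1, 0.0, 4.9]
print(fitted_values(exponential_quantile, 0.23))      # [3.0, 1.3, 6.0, 0.1, 16.0]
```

The Weibull row (n = 21) and the lognormal row with p1 = -4.1 are reproduced exactly. For complexation the recomputed values are slightly lower than the tabulated 3.1, 6.1 and 16.3, which is consistent with the rate being published rounded to 0.23.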
Table 4: Empirical statistics and probability density function (PDF) model results per reaction type. The values are given in kilograms of steam consumption per kilogram of product.
Reaction type | Parameterisation | n | Empirical median | Empirical min | Empirical max | Model PDF | p1 | p2 | Fitted median | Fitted 25th percentile | Fitted 75th percentile | Fitted 2.5th percentile | Fitted 97.5th percentile
Acylation Time18h Acylation
33 | 1.6 | 0.0 | 9.3 | gamma | 0.6 | 4.1 | 1.2 | 0.3 | 3.2 | 0.0 | 11.1
21 | 0.9 | 0.0 | 6.0 | gamma | 0.5 | 3.8 | 0.7 | 0.14 | 2.3 | 0.0 | 9.1
12 | 2.4 | 0.79 | 9.3 | lognormal | 0.9 | 0.8 | 2.5 | 1.5 | 4.4 | 0.5 | 12.3
22 | 0.0 | 0.0 | 0.8 | lognormal | -4.1 | 2.9 | 0.0 | 0.0 | 0.1 | 0.0 | 4.9
33 | 2.5 | 0.0 | 11.8 | gamma | 0.6 | 5.2 | 1.7 | 0.5 | 4.2 | 0.0 | 14.4
12 | 1.0 | 0.0 | 3.4 | gamma | 0.4 | 2.3 | 0.3 | 0.1 | 1.2 | 0.0 | 5.1
(cyanur chloride) Alkylation no distillation distillation
21 | 3.0 | 0.3 | 11.8 | weibull | 4.9 | 1.5 | 3.8 | 2.1 | 6.1 | 0.4 | 11.7
Azo-coupling | | 25 | 0.1 | 0.0 | 5.3 | gamma | 0.1 | 6.6 | 0.0 | 0.0 | 0.2 | 0.0 | 6.4
Complexation | | 9 | 2.8 | 0.2 | 14.9 | exponential | 0.23 | | 3.1 | 1.3 | 6.1 | 0.1 | 16.3
Time